refactor: optimize RepoFileWatcher sync pipeline #9918

Open
pavkout wants to merge 2 commits into Kong:develop from pavkout:INS-2259

@pavkout pavkout commented May 13, 2026

Summary

A series of targeted performance improvements to RepoFileWatcher, the bidirectional sync layer between NeDB and on-disk Git YAML files.

  • Conditional polling: the 10 s fallback interval only starts when fs.watch() fails (macOS/Linux users no longer pay the cost). pollDirectory now iterates already-tracked files directly instead of a full recursive readdir on every tick.
  • Per-workspace debounce: replaced the single flushDebounce timer with a Map<workspaceId, timer>. The onChange listener resolves which workspace(s) were actually affected per change batch; only those workspaces are serialized. Bursts in one workspace no longer delay flushing another.
  • Targeted flush: flushWorkspacesToDisk accepts an optional workspaceIds set; debounce-triggered flushes only export and write the changed workspaces instead of all workspaces in the project.
  • Batched DB queries: getWorkspacesWithMeta replaced N+1 findOne calls with two bulk db.find queries joined in memory. handleFileDeletion no longer fetches all workspaces; it scans the in-memory lastKnownGitFilePath map instead.
  • docToWorkspace lookup map: populated during upsertDocs; lets resolveWorkspaceId attribute DB change events to specific workspaces without hitting NeDB. Kept consistent via deleteOrphans and removeWorkspaceWithDescendants.
  • Orphan detection: inner .some() per origin doc (O(N×M)) replaced with a Set lookup (O(N+M)).
  • Parallel directory reads: collectYamlFiles recurses into subdirectories in parallel via Promise.all.
  • Avoid redundant syscalls: post-write fs.stat() calls (used only to record mtime) replaced with Date.now(), which is always ≥ the actual mtime of a file just written.
  • Cheaper content parsing: two content.split('\n') allocations in parseAndValidate replaced with indexOf('\n') for the first-line check and a single regex for conflict-marker detection.
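
As an illustration of the per-workspace debounce described above, here is a minimal sketch. The class name `WorkspaceDebouncer`, its methods, and the `flush` callback signature are hypothetical and not taken from the PR; the sketch only shows the idea of keeping one timer per workspace ID in a `Map`.

```typescript
// Hypothetical sketch: names are illustrative, not the PR's actual code.
type TimerHandle = ReturnType<typeof setTimeout>;

class WorkspaceDebouncer {
  // One pending timer per workspace ID instead of a single shared timer.
  private readonly timers = new Map<string, TimerHandle>();
  private readonly flush: (workspaceIds: Set<string>) => void;
  private readonly delayMs: number;

  constructor(flush: (workspaceIds: Set<string>) => void, delayMs: number) {
    this.flush = flush;
    this.delayMs = delayMs;
  }

  // Restart only the timer of the workspace that changed.
  schedule(workspaceId: string): void {
    const pending = this.timers.get(workspaceId);
    if (pending !== undefined) {
      clearTimeout(pending);
    }
    const handle = setTimeout(() => {
      this.timers.delete(workspaceId);
      // Flush just this workspace, not every workspace in the project.
      this.flush(new Set([workspaceId]));
    }, this.delayMs);
    this.timers.set(workspaceId, handle);
  }

  // Cancel every pending flush, e.g. when the watcher is stopped.
  cancelAll(): void {
    this.timers.forEach(handle => clearTimeout(handle));
    this.timers.clear();
  }

  // Number of workspaces that currently have a flush pending.
  get pendingCount(): number {
    return this.timers.size;
  }
}
```

With this shape, a burst of edits in one workspace keeps resetting only that workspace's timer, so a flush already scheduled for another workspace still fires on time.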

Security fixes

  • Path traversal hardening: flushWorkspacesToDisk and loadKnownGitFilePaths were missing the rel.startsWith('..') boundary check that already existed in flushNewerDbWorkspacesToDisk. A poisoned gitFilePath value in the DB (e.g. introduced via a compromised sync API) could have caused the watcher to write YAML to, or unlink, files outside the repo directory. Both paths now reject any resolved path that escapes repoDir.
  • Symlink read prevention: the fs.watch inbound path (scheduleImport → readIfChanged) used fs.stat, which follows symlinks. An attacker with write access to the repo directory could place a symlink whose target gets read into memory before the Insomnia type check runs. Switched to lstat with an early return on isSymbolicLink() so the readFile call is never reached for symlinks.
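
The symlink check can be sketched as follows. The helper `readIfRegularFile` is hypothetical, and it uses the synchronous `fs` calls for brevity; the watcher itself presumably uses the promise-based equivalents.

```typescript
import * as fs from 'node:fs';

// Hypothetical helper (not the PR's actual function): returns the file content,
// or null when the path is a symlink, is not a regular file, or does not exist.
function readIfRegularFile(absPath: string): string | null {
  let stat: fs.Stats;
  try {
    // lstat describes the link itself; stat would follow it to its target.
    stat = fs.lstatSync(absPath);
  } catch {
    return null; // missing or inaccessible path
  }
  if (stat.isSymbolicLink() || !stat.isFile()) {
    return null; // early return: the read below is never reached for symlinks
  }
  return fs.readFileSync(absPath, 'utf8');
}
```

Because the decision is made on the `lstat` result, the symlink target is never opened, regardless of where it points.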

Test plan

  • Open a git-backed project and verify the workspace YAML updates on disk after editing a request
  • Edit requests in two different workspaces in quick succession — both YAMLs should flush independently
  • Edit a YAML file externally (e.g. in VS Code) and verify Insomnia reflects the change
  • Delete a workspace YAML file on disk and verify the workspace is removed from the sidebar
  • Run git pull / git checkout from the Insomnia UI and verify the DB reflects the new state
  • Verify no regressions on a project with multiple workspaces across nested subdirectories

@pavkout pavkout self-assigned this May 13, 2026
@pavkout pavkout requested review from a team and gatzjames May 13, 2026 16:34
@pavkout pavkout marked this pull request as ready for review May 21, 2026 16:02
Copilot AI review requested due to automatic review settings May 21, 2026 16:02

Copilot AI left a comment

Pull request overview

This PR refactors RepoFileWatcher (Git YAML ↔ NeDB sync) to reduce unnecessary work during steady-state syncing, limit DB→disk flush scope to impacted workspaces, and harden a couple of filesystem-edge security cases.

Changes:

  • Makes DB→disk flushing per-workspace (per-workspace debounce + ability to flush a subset of workspaces) and reduces N+1 DB metadata lookups.
  • Optimizes directory polling by primarily iterating tracked files and only scanning the tree for newly discovered YAML files; adds parallel recursion in collectYamlFiles.
  • Adds path-escape guards for gitFilePath resolution and prevents reading symlinked YAML files by switching inbound checks to lstat().
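
To make the N+1 reduction concrete, here is a rough sketch of joining two bulk query results in memory. The `Workspace` and `WorkspaceMeta` shapes and the function name `joinWorkspacesWithMeta` are simplified assumptions for illustration; the real models and getWorkspacesWithMeta may differ.

```typescript
// Hypothetical, simplified document shapes; the real models carry more fields.
interface Workspace {
  _id: string;
  name: string;
}

interface WorkspaceMeta {
  parentId: string; // assumed to hold the _id of the owning workspace
  gitFilePath: string | null;
}

interface WorkspaceWithMeta {
  workspace: Workspace;
  meta: WorkspaceMeta | null;
}

// Join the results of two bulk queries in memory: one pass to index the metas,
// one pass over the workspaces, instead of one findOne call per workspace.
function joinWorkspacesWithMeta(
  workspaces: Workspace[],
  metas: WorkspaceMeta[],
): WorkspaceWithMeta[] {
  const metaByParent = new Map<string, WorkspaceMeta>();
  for (const meta of metas) {
    metaByParent.set(meta.parentId, meta);
  }
  return workspaces.map(workspace => ({
    workspace,
    meta: metaByParent.get(workspace._id) ?? null,
  }));
}
```

The same index-then-probe pattern applies to the orphan detection change, where a `Set` of IDs replaces a nested `.some()` scan.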
Comments suppressed due to low confidence (2)

packages/insomnia/src/sync/git/repo-file-watcher.ts:476

  • Recording lastSyncMtime as Date.now() after writing can be < fs.stat().mtimeMs because mtimeMs can include fractional milliseconds while Date.now() is integer ms. That can trigger an unnecessary re-read/re-import on the next scan. Consider storing Date.now() + 1 (or otherwise ensuring the stored value is >= possible mtimeMs) and adjusting the comment that says this is “always >=”.

        this.lastWrittenHash.set(absPath, hash);
        this.lastKnownGitFilePath.set(workspace._id, absPath);
        // Use Date.now() — always >= the actual mtime of the file just written, saves a stat() syscall
        this.lastSyncMtime.set(absPath, Date.now());
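
The fractional-millisecond concern can be illustrated with a small sketch. Both function names are hypothetical, and the one-millisecond margin is the reviewer's suggestion rather than code from the PR.

```typescript
// True when the on-disk mtime is newer than the timestamp the watcher recorded.
function needsReimport(mtimeMs: number, lastSyncMs: number): boolean {
  return mtimeMs > lastSyncMs;
}

// Suggested margin: Date.now() is an integer, while mtimeMs may carry a
// fractional part, so one extra millisecond covers the difference.
function syncTimestampWithMargin(nowMs: number): number {
  return nowMs + 1;
}

// Example values: the file was written at 1000.4 ms and Date.now() returned 1000.
const mtimeMs = 1000.4;
const nowMs = 1000;
const withoutMargin = needsReimport(mtimeMs, nowMs); // true: spurious re-import
const withMargin = needsReimport(mtimeMs, syncTimestampWithMargin(nowMs)); // false
```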

packages/insomnia/src/sync/git/repo-file-watcher.ts:917

  • The same path traversal guard issue exists here: rel.startsWith('..') will also reject safe in-repo file names like ..foo.yaml. Prefer checking rel === '..' or rel.startsWith('..' + path.sep) so only actual parent-directory escapes are rejected.
        this.lastKnownGitFilePath.set(workspace._id, absPath);
      }
    }
  }
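
A minimal sketch of the narrower guard suggested above, assuming a hypothetical helper `resolveInsideRepo`; only the `path` calls are real Node.js APIs.

```typescript
import * as path from 'node:path';

// Hypothetical helper: returns the absolute path when it stays inside repoDir,
// or null when the resolved path escapes it.
function resolveInsideRepo(repoDir: string, gitFilePath: string): string | null {
  const absPath = path.normalize(path.join(repoDir, gitFilePath));
  const rel = path.relative(repoDir, absPath);
  // Reject only real parent-directory escapes: exactly '..' or a '..' segment
  // followed by a separator. A file name such as '..foo.yaml' stays allowed.
  const escapes =
    rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel);
  return escapes ? null : absPath;
}
```

Compared with a bare `rel.startsWith('..')`, this accepts in-repo names that merely begin with two dots while still rejecting `../outside.yaml` and `a/../../x.yaml`.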


const absPath = path.normalize(path.join(this.repoDir, gitFilePath));

const rel = path.relative(this.repoDir, absPath);
if (rel.startsWith('..') || path.isAbsolute(rel)) {
Comment on lines 497 to 503
watcher.on('error', err => {
console.warn('[repo-file-watcher] fs.watch error:', err);
});

this.fsWatchers.push(watcher);
this.fsWatchActive = true;
} catch (err) {
Comment on lines +519 to 528
// Check known files for mtime changes or deletions — no readdir
for (const [absPath, lastMtime] of this.lastSyncMtime) {
try {
const stat = await fs.promises.stat(absPath);
const lastMtime = this.lastSyncMtime.get(absPath) ?? 0;
if (stat.mtimeMs > lastMtime) {
this.queue.enqueue(() => this.importFile(absPath));
}
} catch {
// File may have been removed between readdir and stat
this.queue.enqueue(() => this.importFile(absPath));
}
this.lastWrittenHash.set(normalised, hash);
- const newStat = await fs.promises.stat(absPath);
- this.lastSyncMtime.set(normalised, newStat.mtimeMs);
+ this.lastSyncMtime.set(normalised, Date.now());