refactor: optimize RepoFileWatcher sync pipeline#9918
Open
pavkout wants to merge 2 commits into
Open
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR refactors RepoFileWatcher (Git YAML ↔ NeDB sync) to reduce unnecessary work during steady-state syncing, limit DB→disk flush scope to impacted workspaces, and harden a couple of filesystem-edge security cases.
Changes:
- Makes DB→disk flushing per-workspace (per-workspace debounce + ability to flush a subset of workspaces) and reduces N+1 DB metadata lookups.
- Optimizes directory polling by primarily iterating tracked files and only scanning the tree for newly discovered YAML files; adds parallel recursion in
collectYamlFiles. - Adds path-escape guards for
gitFilePathresolution and prevents reading symlinked YAML files by switching inbound checks tolstat().
Comments suppressed due to low confidence (2)
packages/insomnia/src/sync/git/repo-file-watcher.ts:476
- Recording
lastSyncMtimeasDate.now()after writing can be <fs.stat().mtimeMsbecausemtimeMscan include fractional milliseconds whileDate.now()is integer ms. That can trigger an unnecessary re-read/re-import on the next scan. Consider storingDate.now() + 1(or otherwise ensuring the stored value is >= possiblemtimeMs) and adjusting the comment that says this is “always >=”.
this.lastWrittenHash.set(absPath, hash);
this.lastKnownGitFilePath.set(workspace._id, absPath);
// Use Date.now() — always >= the actual mtime of the file just written, saves a stat() syscall
this.lastSyncMtime.set(absPath, Date.now());
packages/insomnia/src/sync/git/repo-file-watcher.ts:917
- The same path traversal guard issue exists here:
rel.startsWith('..')will also reject safe in-repo file names like..foo.yaml. Prefer checkingrel === '..'orrel.startsWith('..' + path.sep)so only actual parent-directory escapes are rejected.
this.lastKnownGitFilePath.set(workspace._id, absPath);
}
}
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const absPath = path.normalize(path.join(this.repoDir, gitFilePath)); | ||
|
|
||
| const rel = path.relative(this.repoDir, absPath); | ||
| if (rel.startsWith('..') || path.isAbsolute(rel)) { |
Comment on lines
497
to
503
| watcher.on('error', err => { | ||
| console.warn('[repo-file-watcher] fs.watch error:', err); | ||
| }); | ||
|
|
||
| this.fsWatchers.push(watcher); | ||
| this.fsWatchActive = true; | ||
| } catch (err) { |
Comment on lines
+519
to
528
| // Check known files for mtime changes or deletions — no readdir | ||
| for (const [absPath, lastMtime] of this.lastSyncMtime) { | ||
| try { | ||
| const stat = await fs.promises.stat(absPath); | ||
| const lastMtime = this.lastSyncMtime.get(absPath) ?? 0; | ||
| if (stat.mtimeMs > lastMtime) { | ||
| this.queue.enqueue(() => this.importFile(absPath)); | ||
| } | ||
| } catch { | ||
| // File may have been removed between readdir and stat | ||
| this.queue.enqueue(() => this.importFile(absPath)); | ||
| } |
| this.lastWrittenHash.set(normalised, hash); | ||
| const newStat = await fs.promises.stat(absPath); | ||
| this.lastSyncMtime.set(normalised, newStat.mtimeMs); | ||
| this.lastSyncMtime.set(normalised, Date.now()); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A series of targeted performance improvements to
RepoFileWatcherthe bidirectional sync layer between NeDB and on-disk Git YAML files.fs.watch()fails (macOS/Linux users no longer pay the cost).pollDirectorynow iterates already-tracked files directly instead of a full recursivereaddiron every tick.flushDebouncetimer with aMap<workspaceId, timer>. TheonChangelistener resolves which workspace(s) were actually affected per change batch; only those workspaces are serialized. Bursts in one workspace no longer delay flushing another.flushWorkspacesToDiskaccepts an optionalworkspaceIdsset; debounce-triggered flushes only export and write the changed workspaces instead of all workspaces in the project.getWorkspacesWithMetareplaced N+1findOnecalls with two bulkdb.findqueries joined in memory.handleFileDeletionno longer fetches all workspaces; it scans the in-memorylastKnownGitFilePathmap instead.docToWorkspacelookup map: populated duringupsertDocs; letsresolveWorkspaceIdattribute DB change events to specific workspaces without hitting NeDB. Kept consistent viadeleteOrphansandremoveWorkspaceWithDescendants..some()per origin doc (O(N×M)) replaced with aSetlookup (O(N+M)).collectYamlFilesrecurses into subdirectories in parallel viaPromise.all.fs.stat()calls (used only to record mtime) replaced withDate.now(), which is always ≥ the actual mtime of a file just written.content.split('\n')allocations inparseAndValidatereplaced withindexOf('\n')for the first-line check and a single regex for conflict-marker detection.Security fixes
flushWorkspacesToDiskandloadKnownGitFilePathswere missing therel.startsWith('..')boundary check that already existed influshNewerDbWorkspacesToDisk. A poisonedgitFilePathvalue in the DB (e.g. introduced via a compromised sync API) could have caused the watcher to write YAML to orunlinkfiles outside the repo directory. Both paths now reject any resolved path that escapesrepoDir.fs.watchinbound path (scheduleImport→readIfChanged) usedfs.stat, which follows symlinks. An attacker with write access to the repo directory could place a symlink whose target gets read into memory before the Insomnia type check runs. Switched tolstatwith an early-return onisSymbolicLink()so thereadFilecall is never reached for symlinks.Test plan
git pull/git checkoutfrom the Insomnia UI and verify the DB reflects the new state