
Remote upload watch mode was effectively unusable on large workspaces#49

Merged
m1rl0k merged 1 commit into Context-Engine-AI:test from voarsh2:upload-client-hash-perf
Dec 9, 2025
Conversation


@voarsh2 voarsh2 commented Dec 9, 2025

Remote upload watch mode was effectively unusable on large workspaces (e.g. ~6K files) on Windows due to two issues:

  • The standalone client’s SimpleHashCache reloaded file_cache.json and re-validated every cached path on each access, causing O(N^2) Path.exists() calls and constant file_cache.json rewrites.
  • Both standalone and non-standalone clients re-hashed the contents of every candidate file on every watch iteration, even when mtime/size were unchanged.
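To make the first issue concrete, here is a minimal sketch of the per-access reload pattern that produces quadratic work (the helper name is hypothetical, not the project's actual code):

```python
import json
from pathlib import Path

def get_hash_slow(cache_file: str, path: str):
    """Illustrative anti-pattern (hypothetical helper, not the real cache):
    every lookup reloads file_cache.json and re-validates Path.exists() for
    all cached entries, so one watch pass over N files performs roughly N*N
    exists() calls and rewrites the cache file N times."""
    entries = json.loads(Path(cache_file).read_text())
    # O(N) stale-check on every single lookup:
    entries = {p: h for p, h in entries.items() if Path(p).exists()}
    # Constant cache-file rewrites during watch:
    Path(cache_file).write_text(json.dumps(entries))
    return entries.get(path)
```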

This change:

  • Updates standalone_upload_client.SimpleHashCache to:

    • Load file_cache.json once into memory and run the stale-check only once per process.
    • Keep an in-memory dict for all subsequent get/set/all_paths calls.
    • Stop flushing the cache file on every set/remove, eliminating the continuous rewrites during watch.
  • Adds an in-memory stat cache to RemoteUploadClient.detect_file_changes in both scripts/standalone_upload_client.py and scripts/remote_upload_client.py:

    • Tracks (mtime_ns, size) per absolute path.
    • Uses stat() to short-circuit unchanged files, re-hashing only files that are new or whose stat signature changed.
    • Keeps initial behavior of treating the first watch iteration as a full sync (all files "created"), but makes subsequent iterations fast and responsive even on large Windows repos.

Result: both standalone and non-standalone --watch modes now detect single-file edits quickly on a ~6K-file Windows workspace without hammering cache files or re-hashing every file on each interval.
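The stat-based short-circuit can be sketched like this (function and variable names are illustrative; the real logic lives in RemoteUploadClient.detect_file_changes):

```python
import hashlib
from pathlib import Path

# Sketch: track (mtime_ns, size) per absolute path across watch iterations
# and only re-hash files whose stat signature changed or that are new.
_stat_cache: dict[str, tuple[int, int]] = {}

def detect_changed_files(paths) -> list[tuple[str, str]]:
    changed = []
    for p in map(Path, paths):
        key = str(p.resolve())
        st = p.stat()
        sig = (st.st_mtime_ns, st.st_size)
        if _stat_cache.get(key) == sig:
            continue  # stat unchanged: skip the expensive content hash
        _stat_cache[key] = sig
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        changed.append((key, digest))
    return changed
```

On the first iteration the stat cache is empty, so every file is hashed (matching the "full sync" behavior the PR keeps); subsequent iterations cost one `stat()` per file instead of one full read-and-hash.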

@m1rl0k m1rl0k merged commit c0ba444 into Context-Engine-AI:test Dec 9, 2025
1 check passed
@voarsh2 voarsh2 deleted the upload-client-hash-perf branch December 10, 2025 04:09
m1rl0k added a commit that referenced this pull request Mar 1, 2026
Remote upload watch mode was effectively unusable on large workspaces
