
fix: speed up mount bootstrap fallback #185

Merged
khaliqgant merged 2 commits into main from codex/issue-183-bootstrap-performance
May 21, 2026

Conversation

@khaliqgant
Member

Summary

  • add decoded-byte contentHash to tree entries and filesystem events, including websocket forwarding and OpenAPI coverage
  • make incomplete bootstrap pull before local writeback scanning so a kept mirror after state loss is tracked before pushLocal can echo it back
  • skip fallback ReadFile calls when local bytes already match tree contentHash, and fetch remaining tree files through bounded parallelism (RELAYFILE_BOOTSTRAP_READ_CONCURRENCY, default 16, max 64)

Review Notes

  • Verified the current client already uses in-process Go HTTP with an HTTP/2-capable shared transport; the stale subprocess-per-file behavior from the issue is not present on main.
  • During review, found and fixed a recovery ordering bug: state-wipe + kept files would still run pushLocal before bootstrap, defeating skip-on-hash and risking noisy writeback. Added a regression for that exact shape.
  • Kept this PR based directly on origin/main so it does not include the older issue-182/lifecycle branch work.

Tests

  • go test ./...
  • scripts/check-contract-surface.sh

Fixes #183

@coderabbitai

coderabbitai Bot commented May 21, 2026

Walkthrough

This PR introduces content-hash computation (SHA-256) for files and uses it to parallelize bootstrap reads. Content hashes are computed in the store layer, surfaced through tree/event APIs and WebSocket payloads, and then leveraged during bootstrap to skip remote reads when local file content already matches the remote entry hash, while parallelizing remaining reads within a configurable worker bound.
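To illustrate what hashing the decoded bytes means, here is a rough sketch. The function name `decodedContentHash` is a hypothetical stand-in for the PR's `contentHashForFile`, and the lowercase hex output format and the `"base64"` encoding label are assumptions; the walkthrough only states that SHA-256 is computed over file bytes, with base64 content decoded first.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"strings"
)

// decodedContentHash is a hypothetical helper. It hashes the bytes a
// file actually contains, so the same payload yields the same hash
// whether it is stored as plain text or as base64.
func decodedContentHash(content, encoding string) (string, error) {
	data := []byte(content)
	// Assumed encoding label; anything else is treated as raw text.
	if strings.EqualFold(strings.TrimSpace(encoding), "base64") {
		decoded, err := base64.StdEncoding.DecodeString(content)
		if err != nil {
			return "", err
		}
		data = decoded
	}
	sum := sha256.Sum256(data)
	// Hex encoding is an assumption about the wire format.
	return hex.EncodeToString(sum[:]), nil
}
```

With this shape, `decodedContentHash("hello", "utf-8")` and `decodedContentHash("aGVsbG8=", "base64")` return the same string, which is what lets a client compare a hash of its local file bytes against the remote entry regardless of transport encoding.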

Changes

Content Hash and Bootstrap Optimization

| Layer | File(s) | Summary |
|---|---|---|
| ContentHash data types and computation | internal/relayfile/store.go | TreeEntry and Event types gain ContentHash field; new helpers contentHashForFile and normalizeEncodingForHash compute SHA-256 hashes of file bytes (decoding base64 content as needed). |
| ContentHash serialization in tree and event responses | internal/relayfile/store.go | ListTree and listTreeFromFiles populate ContentHash for file entries; recordWriteLocked and applyProviderUpsertLocked include ContentHash in event payloads (omitted for file deletions). |
| WebSocket message structure with ContentHash | internal/httpapi/websocket.go | fileEventMessage struct adds explicit JSON tags and ContentHash field; writeWebSocketEvent uses keyed struct literals to consistently populate the updated schema. |
| OpenAPI schema updates | openapi/relayfile-v1.openapi.yaml | TreeEntry and FilesystemEvent schemas include new contentHash field; exportWorkspace endpoint adds optional path query parameter for path-prefix filtering. |
| Bootstrap read parallelization with hash-aware skipping | internal/mountsync/syncer.go | Introduces defaultBootstrapReadWorkers concurrency setting; updates Syncer.sync with didPoll tracking; replaces per-entry remote reads with a batched job approach: trySkipBootstrapRead skips remote fetches when local hash matches entry.ContentHash, otherwise appends to job queue; readBootstrapFiles and bootstrapReadWorkers parallelize remaining reads (bounded by env var). |
| Bootstrap concurrency test with instrumented client | internal/mountsync/bootstrap_test.go | bootstrapClient test double adds atomic activeReads/maxActiveRead counters to track peak concurrency; ReadFile increments/decrements counters on each call; ListTree populates ContentHash; new TestBootstrapReadsFilesWithBoundedParallelism configures concurrency limit and validates observed peak stays within bounds. |
| Syncer integration tests for hash-based skip logic | internal/mountsync/syncer_test.go | Two new regression tests assert that pullRemoteFullTree skips ReadFile when local hash equals remote ContentHash, and that Reconcile avoids redundant reads/push when local mirror already matches remote hash; fakeClient adds mutex-guarded read counters and ContentHash in tree entries. |
| Store test coverage for ContentHash in events and trees | internal/relayfile/store_test.go | TestListTreeHonorsDepth validates contentHash presence on files and absence on directories; TestStoreEventsAndOps asserts contentHash in creation events and verifies deletion events omit the hash. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • AgentWorkforce/relayfile#90: Both PRs modify internal/mountsync/syncer.go to use ContentHash during pull/bootstrap logic; #90 focuses on revision-reuse detection, this PR on hash-based skip and worker parallelism.
  • AgentWorkforce/relayfile#92: Both PRs optimize bootstrap/full-tree pull paths in internal/mountsync/syncer.go by avoiding expensive ReadFile work via fast-path logic and hash-aware short-circuiting.
  • AgentWorkforce/relayfile#166: Both PRs modify bootstrap/full-tree pull flow in internal/mountsync/syncer.go; #166 addresses bootstrap timeout/progress/resumability, this PR introduces parallel hash-aware reads.

Poem

🐰 A hare hops through hashes so fine,
SHA-256 stitched in each stored line,
Workers parallel, jobs in a queue,
Bootstrap reads bounce—skip what's not new,
Fast and boundless, a fluffy pursue! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.25% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'fix: speed up mount bootstrap fallback' accurately describes the main objective of parallelizing bootstrap file reads and skipping reads when hashes match. |
| Description check | ✅ Passed | The description clearly relates to the changeset, detailing the addition of contentHash, bootstrap ordering changes, and bounded parallelism for bootstrap reads. |
| Linked Issues check | ✅ Passed | The PR addresses all core objectives from issue #183: adds contentHash to tree entries/events [relayfile/store.go], implements bounded worker pool for reads [syncer.go], skips ReadFile when hashes match [syncer.go], and fixes bootstrap ordering [syncer.go]. WebSocket and OpenAPI coverage added. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to issue #183 bootstrap performance objectives: contentHash addition, bootstrap ordering, parallelism, and skip-on-hash logic. No unrelated changes detected. |


@github-actions

github-actions Bot commented May 21, 2026

Relayfile Eval Review

Run: .relayfile/evals/runs/2026-05-21T12-46-40-908Z-HEAD-provider
Mode: provider
Git SHA: 69fad24

Passed: 4 | Needs human: 0 | Reviewable: 0 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

No reviewable human-review cases captured Relayfile output.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.


Comment thread internal/relayfile/store.go Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/relayfile/store.go (1)

1075-1083: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid recomputing full-content SHA-256 during every tree read.

ListTree/listTreeFromFiles now hash file bytes on-demand for each entry, which makes listing CPU-cost scale with total file bytes. This can become a hot path under large workspaces and frequent polls. Prefer storing contentHash at write/upsert time and reusing it in list/event responses.

Also applies to: 4237-4245, 4778-4788
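The suggestion to hash once at write time and reuse the value on reads can be sketched as follows. This is an illustrative toy, not the store in internal/relayfile/store.go: the `hashingStore` and `storedFile` types and their methods are hypothetical, and the hex hash format is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sync"
)

// storedFile is a hypothetical record that keeps the hash next to the
// bytes it was computed from.
type storedFile struct {
	Content     []byte
	ContentHash string // computed once per write, never on read
}

// hashingStore is a hypothetical in-memory store.
type hashingStore struct {
	mu    sync.RWMutex
	files map[string]storedFile
}

func newHashingStore() *hashingStore {
	return &hashingStore{files: make(map[string]storedFile)}
}

// Write hashes the content once, outside the lock, and persists the
// result together with a private copy of the bytes.
func (s *hashingStore) Write(path string, content []byte) {
	sum := sha256.Sum256(content)
	rec := storedFile{
		Content:     append([]byte(nil), content...),
		ContentHash: hex.EncodeToString(sum[:]),
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.files[path] = rec
}

// ListHashes returns path -> stored hash without touching file bytes,
// so listing cost scales with the number of entries, not their size.
func (s *hashingStore) ListHashes() map[string]string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make(map[string]string, len(s.files))
	for p, f := range s.files {
		out[p] = f.ContentHash
	}
	return out
}
```

The trade-off is that every path that mutates content has to refresh the stored hash, otherwise list responses would report a stale value.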

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/relayfile/store.go` around lines 1075 - 1083,
ListTree/listTreeFromFiles is recalculating full-file SHA-256 on every read via
contentHashForFile, causing CPU work proportional to file bytes; instead compute
and persist the content hash at write/upsert time and read that stored value
when constructing TreeEntry. Update the upsert/write paths that create or modify
files to set the file.ContentHash (or Provider/ProviderObjectID-backed hash
field) once, then change ListTree/listTreeFromFiles and the code paths that
build TreeEntry (where ContentHash currently calls contentHashForFile) to use
the persisted file.ContentHash property; also replace other similar callsites
(the other occurrences around the repository that call contentHashForFile) to
read the stored hash rather than recomputing.
🧹 Nitpick comments (1)
internal/mountsync/syncer.go (1)

2444-2450: 💤 Low value

Silent error swallowing in optimization path may mask filesystem issues.

The function correctly returns (false, nil) for ErrNotExist (file doesn't exist locally, so we can't skip). However, line 2449 also returns (false, nil) for other errors (permission denied, I/O errors, etc.), which could mask unexpected filesystem problems during bootstrap.

Consider logging non-ErrNotExist errors at debug level so operators can diagnose bootstrap issues:

 snapshot, err := readLocalSnapshot(localPath, false)
 if err != nil {
     if errors.Is(err, os.ErrNotExist) {
         return false, nil
     }
+    s.logf("trySkipBootstrapRead: local snapshot read failed for %s: %v", remotePath, err)
     return false, nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/mountsync/syncer.go` around lines 2444 - 2450, The current handling
of readLocalSnapshot(localPath, false) swallows all errors by returning (false,
nil) for any non-nil err; change it so only os.ErrNotExist returns (false, nil)
silently, and for any other error you emit a debug-level log entry including the
error and localPath (using the package's logger, e.g. logger.Debugf or
s.log.Debugf) before returning (false, nil) so operators can diagnose
filesystem/permission issues while preserving the existing return behavior;
reference readLocalSnapshot and localPath to locate the code to modify.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openapi/relayfile-v1.openapi.yaml`:
- Around line 201-208: The new query parameter "path" on the exportWorkspace
operation should be constrained to absolute paths like other filesystem
endpoints: update the parameter schema for name "path" (in the exportWorkspace
operation) to include a pattern '^/.*' (and optionally minLength: 1) so only
values starting with '/' are accepted; ensure the schema remains type: string
and keep the existing default and description.

---

Outside diff comments:
In `@internal/relayfile/store.go`:
- Around line 1075-1083: ListTree/listTreeFromFiles is recalculating full-file
SHA-256 on every read via contentHashForFile, causing CPU work proportional to
file bytes; instead compute and persist the content hash at write/upsert time
and read that stored value when constructing TreeEntry. Update the upsert/write
paths that create or modify files to set the file.ContentHash (or
Provider/ProviderObjectID-backed hash field) once, then change
ListTree/listTreeFromFiles and the code paths that build TreeEntry (where
ContentHash currently calls contentHashForFile) to use the persisted
file.ContentHash property; also replace other similar callsites (the other
occurrences around the repository that call contentHashForFile) to read the
stored hash rather than recomputing.

---

Nitpick comments:
In `@internal/mountsync/syncer.go`:
- Around line 2444-2450: The current handling of readLocalSnapshot(localPath,
false) swallows all errors by returning (false, nil) for any non-nil err; change
it so only os.ErrNotExist returns (false, nil) silently, and for any other error
you emit a debug-level log entry including the error and localPath (using the
package's logger, e.g. logger.Debugf or s.log.Debugf) before returning (false,
nil) so operators can diagnose filesystem/permission issues while preserving the
existing return behavior; reference readLocalSnapshot and localPath to locate
the code to modify.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 6fd8dc3a-8296-4f41-b6bc-bd6ce308529b

📥 Commits

Reviewing files that changed from the base of the PR and between a9b7153 and fea1b92.

📒 Files selected for processing (7)
  • internal/httpapi/websocket.go
  • internal/mountsync/bootstrap_test.go
  • internal/mountsync/syncer.go
  • internal/mountsync/syncer_test.go
  • internal/relayfile/store.go
  • internal/relayfile/store_test.go
  • openapi/relayfile-v1.openapi.yaml

Comment thread openapi/relayfile-v1.openapi.yaml
@khaliqgant khaliqgant merged commit 7032a76 into main May 21, 2026
9 checks passed
@khaliqgant khaliqgant deleted the codex/issue-183-bootstrap-performance branch May 21, 2026 12:49
@kjgbot
Contributor

kjgbot commented May 21, 2026

Opened follow-up PR for the post-merge CodeRabbit OpenAPI feedback: #187

Development

Successfully merging this pull request may close these issues.

mount bootstrap: ~1.5 files/sec — subprocess-per-file ReadFile, no parallelism, no skip-on-hash

2 participants