Skip to content

fix(mountsync): short-circuit reconcile when cloud has no new events#92

Merged
khaliqgant merged 1 commit intomainfrom
fix/mountsync-skip-empty-events
May 7, 2026
Merged

fix(mountsync): short-circuit reconcile when cloud has no new events#92
khaliqgant merged 1 commit intomainfrom
fix/mountsync-skip-empty-events

Conversation

@khaliqgant
Copy link
Copy Markdown
Member

The bug

The polling reconcile loop times out and stalls on workspaces with non-trivial content. Concrete repro: a workspace with 132 Notion pages produces this in ~/relayfile-mount/.relay/mount.log repeatedly:

mount stalled: no successful reconcile for 10m
mount sync cycle failed: context deadline exceeded
mount sync stopping: context canceled

relayfile status reports stall: no successful reconcile for 10m indefinitely. Restarting the daemon doesn't help — the next reconcile cycle hits the same deadline. Local file edits never propagate to the cloud while the daemon is in this state.

The threshold is in the low hundreds of files. The default per-cycle timeout is 15s (RELAYFILE_MOUNT_TIMEOUT); the periodic full-tree pull (every defaultFullPullEvery=20 incremental cycles) and the bootstrap full pull both perform sequential ReadFile calls per entry, which exceeds the deadline once the workspace crosses ~100 files.

Root cause

Two locations in internal/mountsync/syncer.go were unconditionally falling into the slow full-tree path:

  1. Periodic full pull (line ~1424, pre-fix): every Nth incremental cycle (fullPullEvery=20, ~10 min at 30s intervals) forced a full tree pull regardless of whether the events feed had reported any activity.
  2. Bootstrap path (line ~1453, pre-fix): when EventsCursor was empty — including on every daemon restart against a workspace whose prior sync had timed out before the cursor was resolved — the reconcile path always ran pullRemoteFull. The state file persisted 289 tracked files + lastEventAt but null eventsCursor, so each restart re-tried the same doomed full pull.

pullRemoteFullTree (syncer.go:1515) iterates every tree entry and calls ReadFile sequentially per file. With 132+ Notion pages this exceeds the 15s deadline deterministically.

Fix

Both paths now leverage the cursor-based events feed the SDK already exposes:

  1. Skip-if-no-events short-circuit (syncer.go:1414). When EventsCursor is set and ListEvents(since=cursor, limit=1) returns empty, pullRemote returns immediately. Most reconcile cycles on a quiet workspace are no-ops; this turns them into a single cheap ListEvents call instead of a full-tree fetch. The trust-but-verify periodic full-pull cadence is still respected, but the counter only advances on non-empty cycles — counting empty cycles toward the threshold would race the periodic pull against the very condition the short-circuit is designed to avoid.

  2. Restart fast-path (syncer.go:1477). When EventsCursor is empty but the state file already records tracked files AND a prior LastEventAt (i.e. a previous daemon successfully observed events from this workspace), seed the cursor from the events tip and skip the bootstrap full pull entirely. The LastEventAt gate keeps the fast-path opt-in: tests and callers that hand-seed a state file without ever observing live events still take the full-pull bootstrap path (necessary for e.g. TestPullDeletesLocalDeniedFile, where the full pull is what detects and tears down the now-denied file).

Both fixes are no-ops on backends without an events feed: ListEvents returns 404 and we fall through to the existing full-pull path.

Verification

Rebuilt the binary against rw_517d60b6 (acme-demo, 289 tracked files):

Pre-fix log:

mount sync cycle failed: context deadline exceeded
mount stalled: no successful reconcile for 10m
mount sync cycle failed: context deadline exceeded

Post-fix log:

restart fast-path: seeded events cursor "evt_1" from 289 tracked files; skipping bootstrap full pull
mount sync cycle completed

Test plan

  • TestPullShortCircuitsWhenNoNewEvents — quiet cycles issue exactly one ListEvents probe and zero ListTree/ReadFile calls; a real event still triggers a fetch and updates the mirror.
  • TestPullRestartFastPathSkipsFullPull — restart with persisted tracked.Files + LastEventAt seeds the cursor instead of running the full pull; subsequent quiet cycle short-circuits cleanly.
  • TestPullPeriodicFullCycle updated to append events between cycles (so the cadence counter still advances under the new short-circuit).
  • All existing internal/mountsync and cmd/relayfile-cli tests pass with go test ./... -count=1.
  • End-to-end repro on acme-demo workspace (289 files): pre-fix permanent stall, post-fix mount sync cycle completed.

🤖 Generated with Claude Code

The polling reconcile loop times out and stalls on workspaces with
non-trivial content (low hundreds of files). Concrete repro: a
workspace with 132 Notion pages produces

  mount sync cycle failed: context deadline exceeded
  mount stalled: no successful reconcile for 10m

over and over. The default per-cycle deadline is 15s
(RELAYFILE_MOUNT_TIMEOUT); the periodic full tree pull (every 20
incremental cycles) and the bootstrap full pull both perform
sequential ReadFile calls per entry, which exceeds the deadline once
the workspace crosses ~100 files.

Two fixes, both leveraging the existing events feed:

1. Skip-if-no-events short-circuit. When EventsCursor is set and the
   events feed reports no new events since that cursor (limit=1
   probe), pullRemote returns immediately. Most reconcile cycles on
   a quiet workspace are no-ops; this turns them into a single cheap
   ListEvents call instead of a full-tree fetch. The trust-but-verify
   periodic full pull cadence is still respected, but only counts
   non-empty cycles toward its threshold (counting empty cycles
   would race the periodic pull against the very condition the
   short-circuit is designed to avoid).

2. Restart fast-path. When EventsCursor is empty but the state file
   already records tracked files AND a prior LastEventAt — meaning a
   previous daemon successfully observed events from this workspace
   — seed the cursor from the events tip and skip the bootstrap full
   pull entirely. This unblocks the production case where a daemon's
   state.json persists 289 tracked files + lastEventAt but null
   eventsCursor (because the previous run's full pull deadlined out
   before resolveLatestEventCursor ran). Without this, every restart
   re-tries the same doomed full pull and the daemon stays stalled
   forever.

   The LastEventAt gate is load-bearing: callers and tests that
   hand-seed a state file without ever observing live events still
   take the full-pull bootstrap path (necessary for e.g.
   TestPullDeletesLocalDeniedFile, where the full pull is what
   detects and tears down the now-denied file).

Verification: rebuilt the binary against rw_517d60b6 (acme-demo, 289
tracked files). Pre-fix, every cycle logged "context deadline
exceeded". Post-fix, the first reconcile after restart logs
"restart fast-path: seeded events cursor evt_1 from 289 tracked
files; skipping bootstrap full pull" and "mount sync cycle
completed".

Tests:
- TestPullShortCircuitsWhenNoNewEvents — quiet cycles issue exactly
  one ListEvents probe and zero ListTree/ReadFile calls.
- TestPullRestartFastPathSkipsFullPull — restart with persisted
  tracked.Files + LastEventAt seeds cursor instead of running full
  pull.
- TestPullPeriodicFullCycle updated to append events between cycles
  (so the cadence counter still advances under the new
  short-circuit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f57917f4f9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1503 to +1505
cursor, err := s.resolveLatestEventCursor(ctx)
if err == nil {
s.state.EventsCursor = cursor
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid advancing cursor past unseen downtime events

When a restarted daemon has tracked files and LastEventAt but no EventsCursor, resolving the cursor to the current tip and returning skips every event that occurred after the persisted mirror was last updated. In that scenario (for example, a remote file is updated or deleted while the daemon is down), the next reconcile probes from this newly seeded tip, sees no events, and short-circuits; because quiet cycles no longer advance the full-pull counter, the local mirror can remain stale indefinitely until some later non-empty cycle happens to trigger verification.

Useful? React with 👍 / 👎.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 7, 2026

Review Change Stack

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

This PR optimizes the mount sync reconciliation logic by adding two short-circuit paths to reduce unnecessary work across restarts and quiet cycles. When events are tracked, a lightweight probe skips work if quiet. When restarting with missing cursor state, it seeds the cursor from prior tracking rather than re-bootstrapping. Tests validate all new paths and ensure periodic full-pull cadence remains intact.

Changes

Reconciliation Short-Circuit and Restart Optimizations

Layer / File(s) Summary
Event-Driven Short-Circuit
internal/mountsync/syncer.go
pullRemote calls ListEvents with limit=1 when EventsCursor is set and returns immediately on quiet cycles (zero new events), avoiding incremental and full-pull work.
Restart Recovery Fast-Path
internal/mountsync/syncer.go
When EventsCursor is empty but LastEventAt indicates prior tracking, pullRemote resolves and seeds the cursor via resolveLatestEventCursor, skipping full-pull bootstrap; falls back to bootstrap on cursor resolution failure.
Test Infrastructure Enhancement
internal/mountsync/syncer_test.go
fakeClient gains call-count tracking fields for ListEvents and ReadFile with increments in both methods and a helper requestedReadCalls() to expose counts.
Periodic Full-Pull Test Updates
internal/mountsync/syncer_test.go
TestPullPeriodicFullCycle appends fresh events before each cycle so the "skip-if-no-events" optimization does not suppress incremental work; validates periodic full-pull cadence independently.
Quiet Reconcile Cycle Tests
internal/mountsync/syncer_test.go
Introduces TestPullShortCircuitsWhenNoNewEvents to assert that quiet cycles probe ListEvents once but avoid ListTree and per-file ReadFile; verifies behavior changes when a new event arrives.
Restart Recovery Tests
internal/mountsync/syncer_test.go
Adds TestPullRestartFastPathSkipsFullPull to validate that restart with on-disk tracked files avoids ListTree and ReadFile, writes success timestamp, and uses only ListEvents probes on subsequent cycles.

Possibly Related PRs

  • AgentWorkforce/relayfile#90: Both PRs modify internal/mountsync pull/reconcile logic and tests—particularly the periodic full-pull and event-driven incremental paths in syncer.go and its test coverage.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A Quiet Pull

When cursors track the event stream true,
One probe per hush—skip work anew.
On restarts past, from timestamp's trace,
Rewind the feed, fast-path the race.
Tests guard both paths with watchful eyes,
No tree-full walks on silent skies! 🌙✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main fix: adding a short-circuit to skip reconcile when there are no new events from the cloud, which directly addresses the reconcile stall bug described in the PR.
Description check ✅ Passed The description comprehensively explains the bug (reconcile stalls on workspaces with hundreds of files), root cause (unconditional full-tree pulls), fixes (short-circuit and restart fast-path), and verification (end-to-end testing), all directly related to the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/mountsync-skip-empty-events

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@khaliqgant khaliqgant merged commit 39e5125 into main May 7, 2026
6 of 7 checks passed
@khaliqgant khaliqgant deleted the fix/mountsync-skip-empty-events branch May 7, 2026 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant