fix(mount): checkpoint incremental event pages #198

Merged
khaliqgant merged 2 commits into main from codex/issue-197-incremental-checkpoints
May 22, 2026
@khaliqgant
Member

Summary

  • persist incremental event cursor progress after each applied ListEvents page
  • keep partial page checkpoints durable across daemon restarts
  • report backlog-draining cycles as syncing instead of fully ready until the feed tail is reached

Closes #197.

Verification

  • go test ./internal/mountsync -run 'TestPullRemoteIncremental(PersistsAppliedPageCursorOnListEventsError|ReturnsDeadlineWhenNoPageProgress|ResumesWithinAppliedPage|CheckpointPreservesChangedPath404Delete)'
  • go test ./internal/mountsync
  • scripts/check-contract-surface.sh
  • git diff --check
  • go test ./...

@github-actions

github-actions Bot commented May 22, 2026

Relayfile Eval Review

Run: .relayfile/evals/runs/2026-05-22T07-19-57-322Z-HEAD-provider
Mode: provider
Git SHA: 9cef1a5

Passed: 4 | Needs human: 0 | Reviewable: 0 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

No reviewable human-review cases captured Relayfile output.

@khaliqgant
Member Author

Reviewed against #197 — design and implementation satisfy what the issue asked for.

Checklist vs #197

  • Cursor persisted per applied page. pullRemoteIncremental now sets state.EventsCursor = resumeCursor AND calls s.saveState() after each successful page apply. Test TestPullRemoteIncrementalPersistsAppliedPageCursorOnListEventsError exercises the disk-persistence path explicitly via a listEventsHook that reads the on-disk state file between pages — exactly the cross-restart durability the issue worried about.
  • Deadline returns cleanly when at least one page progressed. New madeProgress gate: if madeProgress && errors.Is(err, context.DeadlineExceeded) { state.IncrementalBacklogDraining = true; return safeCursor, nil }. Because every applied page is already checkpointed, successive cycles keep advancing the cursor until the feed tail is reached.
  • No-progress deadline still surfaces an error. TestPullRemoteIncrementalReturnsDeadlineWhenNoPageProgress covers the case where the first ListEvents call times out with zero pages applied — deadline propagates, LastError set, Syncing=false. Correct: a configuration/network issue should NOT be silently masked as "syncing".
  • Reset paths covered. IncrementalBacklogDraining is reset to false wherever the backlog has cleared (feed empty, fallback-to-full-pull, restart-fast-path seeds cursor).
  • Bonus: backlog signal in public state. publicStateFlags.Syncing + a new "syncing" status mean dashboards and relayfile status callers can distinguish "behind but draining" from "ready". LastSuccessfulReconcileAt is not bumped during partial draining — only on a true catch-up — which is the right semantic for monitoring.
  • Partial-page checkpoint persists across restart. TestPullRemoteIncrementalResumesWithinAppliedPage and TestPullRemoteIncrementalCheckpointPreservesChangedPath404Delete now construct a fresh Syncer after the deadline (simulating a daemon restart) and assert resume continues from the persisted checkpoint without re-reading already-applied files.
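
The per-page checkpoint and the madeProgress gate from the checklist above can be sketched roughly as follows. This is a minimal, self-contained model, not the code in syncer.go: the state, page, listFunc, drain and simulate names, the Tail field, and the in-memory "disk" are all assumptions made for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// state is a hypothetical stand-in for the syncer's persisted state.
type state struct {
	EventsCursor               string
	IncrementalBacklogDraining bool
}

// page is a hypothetical ListEvents result.
type page struct {
	Cursor string // cursor to resume from once this page is applied
	Tail   bool   // true when this page reaches the end of the feed
}

type listFunc func(ctx context.Context, cursor string) (page, error)

// drain applies pages one at a time and saves the cursor after each one.
func drain(ctx context.Context, st *state, list listFunc, save func(state) error) error {
	madeProgress := false
	for {
		p, err := list(ctx, st.EventsCursor)
		if err != nil {
			// A deadline after at least one applied page is not a failure:
			// the cursor is already durable, so report "still draining".
			if madeProgress && errors.Is(err, context.DeadlineExceeded) {
				st.IncrementalBacklogDraining = true
				return save(*st)
			}
			return err // no progress: surface the error
		}
		// (Applying the page's events to the local mount would happen here.)
		st.EventsCursor = p.Cursor
		if p.Tail {
			st.IncrementalBacklogDraining = false
		}
		if err := save(*st); err != nil {
			return err
		}
		madeProgress = true
		if p.Tail {
			return nil
		}
	}
}

// simulate runs two cycles against a three-page feed. The ListEvents call
// numbered failOnCall returns a deadline error. Each cycle reloads its state
// from "disk", mimicking a daemon restart between cycles.
func simulate(failOnCall int) (after [2]state, errs [2]error) {
	feed := map[string]page{
		"":   {Cursor: "c1"},
		"c1": {Cursor: "c2"},
		"c2": {Cursor: "c3", Tail: true},
	}
	var disk state
	calls := 0
	list := func(_ context.Context, cursor string) (page, error) {
		calls++
		if calls == failOnCall {
			return page{}, context.DeadlineExceeded
		}
		return feed[cursor], nil
	}
	save := func(s state) error { disk = s; return nil }
	for i := range errs {
		st := disk // reload from the persisted checkpoint
		errs[i] = drain(context.Background(), &st, list, save)
		after[i] = disk
	}
	return after, errs
}

func main() {
	after, errs := simulate(3) // deadline on the third ListEvents call
	fmt.Printf("cycle 1: %+v err=%v\n", after[0], errs[0])
	fmt.Printf("cycle 2: %+v err=%v\n", after[1], errs[1])
}
```

With the deadline on the third call, the first cycle ends with the cursor saved after the second page and the draining flag set, and the second cycle resumes from that cursor. With the deadline on the first call, nothing has been applied, so the error propagates.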

Optional, not implemented (was marked optional in #197)

Per-cycle page cap. Not needed; the natural cycle interval + persisted resume cursor already give bounded cycle time + convergence.

CI test failure is unrelated to this PR

Go Test fails on TestEventsAndSyncStatus (server_test.go:2033):

desc&limit=1 must return newest event ("evt_3"), got "evt_4"

This is a flaky test I authored in #196: it computed newest = feed.Events[len(feed.Events)-1] from a prior asc call and asserted that desc returned the same id. That races with any background activity that emits an event between the two calls. PR #198 doesn't touch the httpapi package; the assertion just happened to lose the race on this run.

I'm fixing the assertion (extract the numeric event id and assert desc_id >= asc_tail_id, which is race-tolerant) in a separate small PR rather than gating this one on it.
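
A sketch of what such a race-tolerant check might look like, assuming event ids carry a numeric suffix after an evt_ prefix (as in the failure message above). eventSeq and newestAtLeast are hypothetical helpers for illustration, not functions from the repository.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// eventSeq extracts the numeric suffix from an id such as "evt_4".
// The "evt_" prefix is an assumption based on the ids in the failure message.
func eventSeq(id string) (int, error) {
	const prefix = "evt_"
	if !strings.HasPrefix(id, prefix) {
		return 0, fmt.Errorf("unexpected event id %q", id)
	}
	return strconv.Atoi(strings.TrimPrefix(id, prefix))
}

// newestAtLeast reports whether the id returned by the desc query is at or
// past the tail id seen in the earlier asc query. Events emitted between the
// two calls can only push the desc id forward, so >= tolerates the race.
func newestAtLeast(descID, ascTailID string) (bool, error) {
	desc, err := eventSeq(descID)
	if err != nil {
		return false, err
	}
	tail, err := eventSeq(ascTailID)
	if err != nil {
		return false, err
	}
	return desc >= tail, nil
}

func main() {
	for _, descID := range []string{"evt_3", "evt_4", "evt_10"} {
		ok, err := newestAtLeast(descID, "evt_3")
		fmt.Println(descID, ok, err)
	}
}
```

Comparing the parsed integers rather than the raw strings matters once ids reach two digits, since "evt_10" sorts before "evt_9" lexically.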

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.

🔴 Bootstrap full-pull path does not clear IncrementalBacklogDraining, causing perpetual incorrect 'syncing' status

Every code path in pullRemote that performs a full reset of sync state clears IncrementalBacklogDraining — the forceFullPull closure (syncer.go:2147), the no-events short-circuit (syncer.go:2152), the 404-fallback (syncer.go:2181), and the restart fast-path (syncer.go:2235). However, the main bootstrap full-pull path at syncer.go:2261 only clears IncrementalCheckpoint and omits the IncrementalBacklogDraining = false reset.

If the daemon loads persisted state with IncrementalBacklogDraining=true (from a prior interrupted incremental drain) and then enters the bootstrap path — most concretely when forceFullReconcile=true (via --full-reconcile or RELAYFILE_FORCE_FULL_RECONCILE=true) — the flag is never cleared. Because markSyncSuccess() at syncer.go:4207 gates LastSuccessfulReconcileAt on !IncrementalBacklogDraining, the timestamp is never updated, and the public state perpetually reports status: "syncing" even though complete full-tree pulls succeed every cycle. With forceFullReconcile=true, every cycle re-enters this same path, so the condition persists indefinitely.

(Refers to line 2261)

Member Author

Fixed in b7618d8.

The main bootstrap/full-pull path now clears IncrementalBacklogDraining immediately after pullRemoteFull succeeds, matching the other reset paths. I also extended TestFullReconcileBypassesQuietEventsShortCircuit to persist a stale backlog-draining flag before a forced full reconcile and assert the flag clears and public state returns to ready.
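
The stale-flag scenario and the effect of the fix can be modeled in a few lines. This is only a sketch under stated assumptions: syncState, publicStatus and fullPullCycle are hypothetical names, the status derivation is inferred from the description in this thread, and the real markSyncSuccess in syncer.go may do more than shown here.

```go
package main

import (
	"fmt"
	"time"
)

// syncState is a hypothetical subset of the persisted sync state.
type syncState struct {
	IncrementalBacklogDraining bool
	LastSuccessfulReconcileAt  time.Time
}

// markSyncSuccess bumps the reconcile timestamp only when no backlog is
// draining, mirroring the gate described in the review finding.
func markSyncSuccess(st *syncState) {
	if !st.IncrementalBacklogDraining {
		st.LastSuccessfulReconcileAt = time.Now()
	}
}

// publicStatus is an assumed derivation of the public status string.
func publicStatus(st syncState) string {
	if st.IncrementalBacklogDraining {
		return "syncing"
	}
	return "ready"
}

// fullPullCycle models one bootstrap/full-pull cycle. clearFlag toggles the
// fix: resetting the draining flag once the full pull has succeeded.
func fullPullCycle(st *syncState, clearFlag bool) {
	// (The full pull itself would run here and is assumed to succeed.)
	if clearFlag {
		st.IncrementalBacklogDraining = false
	}
	markSyncSuccess(st)
}

func main() {
	for _, clearFlag := range []bool{false, true} {
		// Stale flag loaded from disk after an interrupted incremental drain.
		st := syncState{IncrementalBacklogDraining: true}
		fullPullCycle(&st, clearFlag)
		fmt.Printf("clearFlag=%v status=%s reconciled=%v\n",
			clearFlag, publicStatus(st), !st.LastSuccessfulReconcileAt.IsZero())
	}
}
```

Without the reset, the model stays in "syncing" and never records a reconcile timestamp no matter how many full pulls succeed; with the reset, a single successful cycle returns it to "ready".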

Verification rerun:

  • go test ./internal/mountsync -run 'TestFullReconcileBypassesQuietEventsShortCircuit|TestPullRemoteIncremental(PersistsAppliedPageCursorOnListEventsError|ReturnsDeadlineWhenNoPageProgress|ResumesWithinAppliedPage|CheckpointPreservesChangedPath404Delete)'
  • go test ./internal/mountsync
  • scripts/check-contract-surface.sh
  • git diff --check
  • go test ./...

@khaliqgant khaliqgant merged commit 9522548 into main May 22, 2026
9 checks passed
@khaliqgant khaliqgant deleted the codex/issue-197-incremental-checkpoints branch May 22, 2026 07:29
Development

Successfully merging this pull request may close these issues.

mount: incremental sync cannot catch up — cursor falls permanently behind on busy workspaces (needs mid-cycle checkpoint like #177)