
Resume incremental pulls within event pages #177

Merged: khaliqgant merged 1 commit into main from codex/issue-175-incremental-watermark on May 21, 2026

@khaliqgant
Member

Summary

  • add an incremental checkpoint in mount state for the currently applied event page: source cursor, page cursor, phase, and last applied path
  • update incremental replay to skip already-applied changed/deleted paths when a prior cycle timed out mid-page
  • clear stale checkpoints when a page completes, or when the daemon falls back to (or full-seeds from) a full reconcile

Issue coverage

Addresses the latest #175 comment: #175 (comment)

This closes the remaining non-convergence case: a cycle consumes an event page, applies some of its ReadFile paths in sorted order, hits `context deadline exceeded`, and then restarts from the same page, rereading the alphabetically early paths forever.
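The non-convergence loop can be reproduced with a small simulation, assuming each cycle can apply only a fixed number of paths (`budget`) before the deadline fires. `cyclesToConverge` is a hypothetical helper, not part of relayfile; it only models the ordering argument, with and without a per-page checkpoint.

```go
package main

import (
	"fmt"
	"sort"
)

// cyclesToConverge replays one event page whose changed paths are applied in
// sorted order. Each cycle applies at most budget paths before a simulated
// deadline. With useCheckpoint, the last applied path survives the failure and
// later cycles skip paths at or before it. Returns the number of cycles needed,
// or -1 if the page never completes within maxCycles.
func cyclesToConverge(paths []string, budget int, useCheckpoint bool, maxCycles int) int {
	sorted := append([]string(nil), paths...)
	sort.Strings(sorted)

	checkpoint := "" // empty means "start from the top of the page"
	for cycle := 1; cycle <= maxCycles; cycle++ {
		applied := 0
		done := true
		for _, p := range sorted {
			if useCheckpoint && checkpoint != "" && p <= checkpoint {
				continue // applied in an earlier cycle
			}
			if applied == budget {
				done = false // deadline hit mid-page
				break
			}
			applied++
			if useCheckpoint {
				checkpoint = p
			}
		}
		if done {
			return cycle
		}
	}
	return -1
}

func main() {
	paths := []string{"d", "a", "e", "c", "b"}
	fmt.Println(cyclesToConverge(paths, 2, false, 10)) // -1: rereads a, b every cycle
	fmt.Println(cyclesToConverge(paths, 2, true, 10))  // 3
}
```

Without the checkpoint, every cycle spends its whole budget on the same leading paths; with it, each cycle makes forward progress, so the page finishes after roughly `len(paths)/budget` cycles.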

Verification

  • go test ./internal/mountsync -run 'TestPullRemoteIncremental(ResumesWithinAppliedPage|PersistsAppliedPageCursorOnListEventsError)|TestQuietEventCyclesEventuallyRunPeriodicFullPull|TestScanLocalFilesLogsOversizedFileOncePerSize' -count=3
  • go test -count=1 ./...
  • scripts/check-contract-surface.sh
  • git diff --check

@coderabbitai

coderabbitai Bot commented May 21, 2026


Warning

Rate limit exceeded

@khaliqgant has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 48 minutes and 54 seconds before requesting another review.


⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d61ca82b-fca0-4c6f-ba7d-155e8bc213da

📥 Commits

Reviewing files that changed from the base of the PR and between 19c8745 and 401f4fc.

📒 Files selected for processing (2)
  • internal/mountsync/syncer.go
  • internal/mountsync/syncer_test.go
📝 Walkthrough


This PR adds resumable checkpointing to incremental remote event pulls. The persisted IncrementalCheckpoint tracks progress within a page (cursor range, phase, path), enabling the syncer to skip already-applied files when it resumes after an interruption. The checkpoint is updated while processing "changed" and "deleted" paths, and on resume it is validated and used to decide which paths to skip.

Changes

Incremental Checkpoint Feature

  • Checkpoint data contract and validation (internal/mountsync/syncer.go): mountState persists an IncrementalCheckpoint pointer; a new incrementalCheckpoint struct holds the page cursors, phase, and path; the applyIncrementalChanges signature is extended to accept the checkpoint and conditionally skip "changed" entries based on checkpoint phase/path ordering.
  • Checkpoint persistence during incremental processing (internal/mountsync/syncer.go): during the "changed" and "deleted" phases, the checkpoint is persisted after each file is applied, capturing the page cursors, phase, and current remote path so a resumed cycle does not reprocess them.
  • Resume testing infrastructure and validation (internal/mountsync/syncer_test.go): fakeClient gains readFileErrAfter and readFileErr fields to inject deterministic ReadFile failures; the new TestPullRemoteIncrementalResumesWithinAppliedPage verifies that the checkpoint records progress before a mid-page failure, and that the resumed cycle skips already-applied files and advances the events cursor.

Sequence Diagram

```mermaid
sequenceDiagram
  participant Syncer
  participant FakeClient
  participant State
  Syncer->>FakeClient: ReadFile (page start)
  FakeClient->>FakeClient: Count reads
  FakeClient-->>Syncer: File data (applied)
  Syncer->>State: Update IncrementalCheckpoint<br/>(phase, cursor, path)
  Syncer->>FakeClient: ReadFile (mid-page)
  FakeClient->>FakeClient: Exceeds readFileErrAfter
  FakeClient-->>Syncer: Deadline error
  Syncer->>State: Persist checkpoint<br/>with partial progress
  Note over Syncer,State: Interrupt/restart
  Syncer->>State: Load checkpoint
  Syncer->>FakeClient: ReadFile (resume)<br/>Skip prior files
  FakeClient-->>Syncer: File data
  Syncer->>State: Clear checkpoint<br/>Advance EventsCursor
```

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly Related PRs

  • AgentWorkforce/relayfile#90: Both PRs modify incremental sync behavior in pullRemoteIncremental; this PR adds per-page resumable checkpointing while the other adds content-hash defenses and periodic full-pull forcing.

Poem

A checkpoint marks the path we've come,
No reprocessing when work's undone—
The page resumes right where it fell,
Through deadline storms our state stays well! 🐰✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check: ✅ Passed. The title 'Resume incremental pulls within event pages' directly and concisely summarizes the main change: adding checkpoint resumption capability for incremental pull operations within event page boundaries.
  • Description check: ✅ Passed. The description clearly relates to the changeset, detailing the addition of incremental checkpoints, skip logic for already-applied paths, and checkpoint clearing, all of which align with the code changes in syncer.go and syncer_test.go.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 21, 2026

Relayfile Eval Review

Run: .relayfile/evals/runs/2026-05-21T10-23-14-685Z-HEAD-provider
Mode: provider
Git SHA: 55ee26c

Passed: 4 | Needs human: 0 | Reviewable: 0 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

No reviewable human-review cases captured Relayfile output.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.


Comment on lines +2986 to +2991
```go
if checkpoint.Phase == "changed" && remotePath <= checkpoint.Path {
	continue
}
if checkpoint.Phase == "deleted" {
	continue
}
```
Contributor


🟡 Checkpoint skip logic drops 404-to-delete transitions for paths sorting before the checkpoint

When resuming from a checkpoint with Phase="changed", paths that sort lexicographically before or equal to checkpoint.Path are skipped entirely (line 2986). However, in the original (failed) run, some of those skipped paths may have received a 404 from ReadFile and been added to the local deleted map (line 2996) — without updating the checkpoint (since the checkpoint only advances on successful applyRemoteFile). Because the delete phase (deletedPaths loop at line 3024) only runs AFTER the changed loop completes, and the changed loop errored out before completing, those 404-redirected deletes were never applied.

On resume, the skipped paths are never re-read via ReadFile, so they're never re-added to the deleted map. The local file persists as stale until the next periodic full pull.

Example scenario

changedPaths (sorted): [A, B, C, D]

  • A: ReadFile → 404 → added to deleted map, no checkpoint update
  • B: ReadFile → success → applyRemoteFile → checkpoint = {Phase: "changed", Path: B}
  • C: ReadFile → timeout → return error

On resume (checkpoint.Path = B):

  • A: skipped (A <= B) — never re-read, never added to deleted
  • B: skipped (B <= B)
  • C, D: processed normally
  • Delete phase: A is missing from the deleted set

A's local file persists indefinitely (until next full pull).

This is a regression: without checkpoints, the entire page was retried and the 404 would be re-encountered, properly routing the file to deletion.

Prompt for agents
In applyIncrementalChanges, when checkpoint.Phase=="changed" and we skip paths <= checkpoint.Path, files that got a 404 (added to the deleted map) in the original run but sort before the checkpoint are silently dropped. The checkpoint only advances on successful applyRemoteFile, so 404'd paths are never checkpointed and are lost on resume.

Possible fix approaches:
1. Record 404'd paths in the checkpoint itself (e.g. a list of paths that need deletion), so they can be replayed on resume.
2. Do NOT skip 404-vulnerable paths — only skip paths that were actually confirmed applied (i.e. paths that appear in s.state.Files with the expected revision from this cycle).
3. Accept the trade-off but document it, relying on the periodic full-pull cadence to self-heal.

The simplest fix might be to not skip paths in the changed loop that don't have a tracked entry matching the current cycle's revision — those are the ones that might have gotten 404 without being checkpointed. But this would require knowing which tracked files were updated in this cycle vs previously, which adds complexity.

Alternatively, for paths being skipped, check if they exist in s.state.Files with a current revision — if not, they may need to be re-read.

@khaliqgant khaliqgant force-pushed the codex/issue-175-incremental-watermark branch from 19c8745 to 401f4fc May 21, 2026 10:21
@khaliqgant
Member Author

Addressed Devin feedback on changed-path 404 handling in 401f4fc.

Changed-path 404 and 403 outcomes now count as checkpointed work only after their local effect is applied, so a later timeout cannot cause an earlier 404-routed delete to be skipped on resume. Added TestPullRemoteIncrementalCheckpointPreservesChangedPath404Delete to cover the exact case.

Validation passed:

  • targeted mountsync tests with -count=3
  • go test -count=1 ./...
  • scripts/check-contract-surface.sh
  • git diff --check
