Resume incremental pulls within event pages#177
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR adds resumable checkpointing to incremental remote event pulls. The persisted ChangesIncremental Checkpoint Feature
Sequence DiagramsequenceDiagram
participant Syncer
participant FakeClient
participant State
Syncer->>FakeClient: ReadFile (page start)
FakeClient->>FakeClient: Count reads
FakeClient-->>Syncer: File data (applied)
Syncer->>State: Update IncrementalCheckpoint<br/>(phase, cursor, path)
Syncer->>FakeClient: ReadFile (mid-page)
FakeClient->>FakeClient: Exceeds readFileErrAfter
FakeClient-->>Syncer: Deadline error
Syncer->>State: Persist checkpoint<br/>with partial progress
Note over Syncer,State: Interrupt/restart
Syncer->>State: Load checkpoint
Syncer->>FakeClient: ReadFile (resume)<br/>Skip prior files
FakeClient-->>Syncer: File data
Syncer->>State: Clear checkpoint<br/>Advance EventsCursor
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly Related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Relayfile Eval ReviewRun: Passed: 4 | Needs human: 0 | Reviewable: 0 | Missing output: 0 | Failed: 0 | Skipped: 0 Human Review CasesNo reviewable human-review cases captured Relayfile output. |
| if checkpoint.Phase == "changed" && remotePath <= checkpoint.Path { | ||
| continue | ||
| } | ||
| if checkpoint.Phase == "deleted" { | ||
| continue | ||
| } |
There was a problem hiding this comment.
🟡 Checkpoint skip logic drops 404-to-delete transitions for paths sorting before the checkpoint
When resuming from a checkpoint with Phase="changed", paths that sort lexicographically before or equal to checkpoint.Path are skipped entirely (line 2986). However, in the original (failed) run, some of those skipped paths may have received a 404 from ReadFile and been added to the local deleted map (line 2996) — without updating the checkpoint (since the checkpoint only advances on successful applyRemoteFile). Because the delete phase (deletedPaths loop at line 3024) only runs AFTER the changed loop completes, and the changed loop errored out before completing, those 404-redirected deletes were never applied.
On resume, the skipped paths are never re-read via ReadFile, so they're never re-added to the deleted map. The local file persists as stale until the next periodic full pull.
Example scenario
changedPaths (sorted): [A, B, C, D]
- A: ReadFile → 404 → added to deleted map, no checkpoint update
- B: ReadFile → success → applyRemoteFile → checkpoint = {Phase: "changed", Path: B}
- C: ReadFile → timeout → return error
On resume (checkpoint.Path = B):
- A: skipped (A <= B) — never re-read, never added to deleted
- B: skipped (B <= B)
- C, D: processed normally
- Delete phase: A is missing from the deleted set
A's local file persists indefinitely (until next full pull).
This is a regression: without checkpoints, the entire page was retried and the 404 would be re-encountered, properly routing the file to deletion.
Prompt for agents
In applyIncrementalChanges, when checkpoint.Phase=="changed" and we skip paths <= checkpoint.Path, files that got a 404 (added to the deleted map) in the original run but sort before the checkpoint are silently dropped. The checkpoint only advances on successful applyRemoteFile, so 404'd paths are never checkpointed and are lost on resume.
Possible fix approaches:
1. Record 404'd paths in the checkpoint itself (e.g. a list of paths that need deletion), so they can be replayed on resume.
2. Do NOT skip 404-vulnerable paths — only skip paths that were actually confirmed applied (i.e. paths that appear in s.state.Files with the expected revision from this cycle).
3. Accept the trade-off but document it, relying on the periodic full-pull cadence to self-heal.
The simplest fix might be to not skip paths in the changed loop that don't have a tracked entry matching the current cycle's revision — those are the ones that might have gotten 404 without being checkpointed. But this would require knowing which tracked files were updated in this cycle vs previously, which adds complexity.
Alternatively, for paths being skipped, check if they exist in s.state.Files with a current revision — if not, they may need to be re-read.
Was this helpful? React with 👍 or 👎 to provide feedback.
19c8745 to
401f4fc
Compare
|
Addressed Devin feedback on changed-path 404 handling in 401f4fc. Changed-path 404 and 403 outcomes now count as checkpointed work only after their local effect is applied, so a later timeout cannot cause an earlier 404-routed delete to be skipped on resume. Added Validation passed:
|
Summary
Issue coverage
Addresses the latest #175 comment: #175 (comment)
This closes the remaining non-convergence case where a cycle can consume an event page, apply some sorted ReadFile paths, hit
context deadline exceeded, then restart from the same page and reread early alphabetical paths forever.Verification
go test ./internal/mountsync -run 'TestPullRemoteIncremental(ResumesWithinAppliedPage|PersistsAppliedPageCursorOnListEventsError)|TestQuietEventCyclesEventuallyRunPeriodicFullPull|TestScanLocalFilesLogsOversizedFileOncePerSize' -count=3\n-go test -count=1 ./...\n-scripts/check-contract-surface.sh\n-git diff --check