Fall back to tree pull when export exceeds cap or DO overloads (#192) #195

Merged
khaliqgant merged 1 commit into main from codex/issue-192-do-overload on May 21, 2026

Conversation

@khaliqgant
Member

Summary

Extends the truncated-export fallback (the PR that landed as 451ff91) to two further failure modes that wedge bootstrap on large workspaces.

On rw_fc7b534b (~29k files, ~3.5 GB export body), bootstrap was wedging because the full-tree export endpoint can't fit the workspace into one response:

  • http 413 payload_too_large — the cloud caps the export body at 128 MB and responds: workspace export body is N bytes, which exceeds the export body limit of 134217728; use paginated tree/read APIs instead. This is the persistent, deterministic blocker.
  • http 500 internal_error: Durable Object is overloaded — under load the export also intermittently fails this way (it's the secondary symptom of the same problem: the export forces the DO to serialize the entire workspace in one invocation).
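
To make the size gap concrete, here is a small Go sketch of the arithmetic behind the 413. The constant mirrors the 134217728-byte limit quoted in the response body, and the observed size is the figure used in the review's 413 test case; the helper names are illustrative and not taken from the codebase.

```go
package main

import "fmt"

// exportBodyLimit mirrors the 128 MB cap quoted in the 413 body (134217728 bytes).
const exportBodyLimit int64 = 128 << 20

// exceedsExportCap reports whether a body of n bytes is over the cap.
// Hypothetical helper, for illustration only.
func exceedsExportCap(n int64) bool {
	return n > exportBodyLimit
}

// minResponses returns how many cap-sized responses n bytes would need
// (ceiling division). Hypothetical helper, for illustration only.
func minResponses(n int64) int64 {
	return (n + exportBodyLimit - 1) / exportBodyLimit
}

func main() {
	const observed int64 = 3524788058 // export body size quoted in the 413 test case
	// Prints: 134217728 true 27
	fmt.Println(exportBodyLimit, exceedsExportCap(observed), minResponses(observed))
}
```

At roughly 26 times the cap, no amount of retrying lets the single-response export succeed, which is why the 413 is described as deterministic.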

exportSnapshotUnsupported recognized neither, so pullRemoteFullExport retried the doomed export every cycle and bootstrap never progressed past where the DO began to fail (observed wedging at ~14k files).

Classify both 413 payload_too_large and 5xx-with-"overloaded" as unsupported so pullRemoteFull falls through to pullRemoteFullTree, whose paginated ListTree + per-file reads are individually bounded and resume from the persisted cursor — exactly what the 413 body instructs. Bare 5xx without the overload signal still retries the export so genuinely transient server errors are unaffected.
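
A minimal sketch of the classification rule described above, assuming an HTTPError type with the StatusCode, Code, and Message fields that appear in the review's test cases. The function name exportShouldFallBack and the exact matching logic are assumptions for illustration; the real exportSnapshotUnsupported may differ (for example, it also handles truncated JSON and untyped error strings).

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// HTTPError is a stand-in for the client's error type. Only the three
// fields are taken from the PR discussion; the rest is assumed.
type HTTPError struct {
	StatusCode int
	Code       string
	Message    string
}

func (e *HTTPError) Error() string {
	return fmt.Sprintf("http %d %s: %s", e.StatusCode, e.Code, e.Message)
}

// exportShouldFallBack (hypothetical name) reports whether an export error
// should send the caller to the paginated tree pull instead of a retry.
func exportShouldFallBack(err error) bool {
	var httpErr *HTTPError
	if !errors.As(err, &httpErr) {
		return false // this sketch only classifies typed HTTP errors
	}
	if httpErr.StatusCode == http.StatusRequestEntityTooLarge {
		return true // 413: the body can never fit, so retrying is pointless
	}
	// 5xx counts only when the message carries the overload signal;
	// a bare 5xx stays retryable.
	overloaded := strings.Contains(strings.ToLower(httpErr.Message), "overloaded")
	return httpErr.StatusCode >= 500 && overloaded
}

func main() {
	tooLarge := &HTTPError{StatusCode: 413, Code: "payload_too_large", Message: "export body too large"}
	overload := &HTTPError{StatusCode: 500, Code: "internal_error", Message: "Durable Object is overloaded"}
	transient := &HTTPError{StatusCode: 500, Code: "internal_error", Message: "boom"}
	fmt.Println(exportShouldFallBack(tooLarge), exportShouldFallBack(overload), exportShouldFallBack(transient))
}
```

Keying the 5xx branch on the message keeps ordinary transient failures on the retry path, which matches the stated intent that genuinely transient server errors are unaffected.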

Test plan

  • go test ./internal/mountsync/... -count=1 — full package green, including:
    • TestReconcileFallsBackToTreeWhenExportJSONTruncated (existing, unaffected)
    • TestReconcileFallsBackToTreeWhenExportDurableObjectOverloaded (new)
    • TestReconcileFallsBackToTreeWhenExportPayloadTooLarge (new)
    • TestExportSnapshotOverloadedClassification (new — positive cases for 413/overload, negative cases asserting transient 5xx still retries)
  • Verified live on rw_fc7b534b: pre-fix, the daemon wedged at ~14.3k files, retrying the failing export. Post-fix, the daemon broke past the wall (14,317 → 15,095 → 16,095 → … → 25,013 BOOTSTRAP COMPLETE, then grew to 29,282 via incremental sync). No "cycle failed: 413" lines appeared after the fix took over; the fallback swallows the error cleanly.
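
The steady climb in file counts follows from the tree path saving its position after every page. The sketch below illustrates that resume-from-cursor pattern with an in-memory fake; the types page, state, and treeLister and the functions pullTree and fakeLister are all hypothetical names, not the real ListTree client or Syncer state.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// page is one bounded slice of the listing plus the cursor for the next one.
type page struct {
	Paths      []string
	NextCursor string // empty when the listing is complete
}

// treeLister is a hypothetical stand-in for a paginated ListTree call.
type treeLister func(cursor string) (page, error)

// state is what a daemon would persist between cycles in this sketch.
type state struct {
	Cursor string
	Files  []string
	Done   bool
}

// pullTree advances from st.Cursor until the listing ends or a page fails.
// Progress is recorded after each page, so a later call resumes, not restarts.
func pullTree(list treeLister, st *state) error {
	for !st.Done {
		p, err := list(st.Cursor)
		if err != nil {
			return err // st still holds the last good cursor
		}
		st.Files = append(st.Files, p.Paths...)
		st.Cursor = p.NextCursor
		st.Done = p.NextCursor == ""
	}
	return nil
}

// fakeLister serves paths in pages of n and fails once at failAtCursor.
func fakeLister(paths []string, n int, failAtCursor string) treeLister {
	failed := false
	return func(cursor string) (page, error) {
		if cursor == failAtCursor && !failed {
			failed = true
			return page{}, errors.New("transient page failure")
		}
		start := 0
		if cursor != "" {
			var err error
			if start, err = strconv.Atoi(cursor); err != nil {
				return page{}, err
			}
		}
		end := start + n
		if end >= len(paths) {
			return page{Paths: paths[start:]}, nil
		}
		return page{Paths: paths[start:end], NextCursor: strconv.Itoa(end)}, nil
	}
}

func main() {
	list := fakeLister([]string{"a", "b", "c", "d", "e"}, 2, "2")
	st := &state{}
	fmt.Println(pullTree(list, st), len(st.Files), st.Cursor) // first cycle stops at cursor 2
	fmt.Println(pullTree(list, st), len(st.Files), st.Done)   // second cycle resumes and finishes
}
```

Because each request is bounded by the page size, a single failure costs one page of work rather than the whole bootstrap.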

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented May 21, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e115e77b-2bab-445d-8f18-83b0a2f2dc81

📥 Commits

Reviewing files that changed from the base of the PR and between 774bd92 and 5b57b57.

📒 Files selected for processing (2)
  • internal/mountsync/syncer.go
  • internal/mountsync/syncer_test.go
📝 Walkthrough

Walkthrough

This PR extends the export error classification to detect truncated JSON responses, Durable Object overload conditions, and payload-too-large errors, treating them as unsupported so the syncer falls back to a paginated list-tree pull instead of retrying. New test coverage validates the classification logic and confirms reconcile fallback behavior under these failure modes.

Changes

Export fallback and classification

| Layer / File(s) | Summary |
|---|---|
| Enhanced export error classification / internal/mountsync/syncer.go | exportSnapshotUnsupported detects truncated JSON (io.ErrUnexpectedEOF, JSON syntax errors), Durable Object overload (5xx with "overloaded" signal), and 413 Request Entity Too Large, delegating to helper functions for each check. Falls through to existing 404 and 400 bad_request logic for remaining HTTP errors. |
| Test infrastructure and classification unit test / internal/mountsync/syncer_test.go | fakeExportClient gains exportErr field; ExportFiles returns it early before enumeration. TestExportSnapshotOverloadedClassification validates that unsupported errors (truncation, overload, 413) return true and transient errors return false. |
| Integration tests for export fallback scenarios / internal/mountsync/syncer_test.go | Three Syncer.Reconcile tests confirm fallback: TestReconcileFallsBackToTreeWhenExportJSONTruncated (truncated JSON), TestReconcileFallsBackToTreeWhenExportDurableObjectOverloaded (HTTP 500 overload), TestReconcileFallsBackToTreeWhenExportPayloadTooLarge (HTTP 413). Each verifies exactly one export attempt followed by list-tree bootstrap and file reads. |
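
The "exactly one export attempt followed by list-tree" behavior can be sketched with a counting fake. Everything here (fakeRemote, pullFull, the sentinel errors) is a hypothetical simplification; the real Syncer, fakeExportClient, and pullRemoteFull carry more state than this.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors used only by this sketch.
var (
	errExportUnsupported = errors.New("export unsupported for this workspace")
	errTransient         = errors.New("transient server error")
)

// fakeRemote counts calls so a test can assert how each path was used.
type fakeRemote struct {
	exportErr   error
	exportCalls int
	treeCalls   int
}

func (f *fakeRemote) export() ([]string, error) {
	f.exportCalls++
	if f.exportErr != nil {
		return nil, f.exportErr // fail early, before any enumeration
	}
	return []string{"a.txt", "b.txt"}, nil
}

func (f *fakeRemote) tree() ([]string, error) {
	f.treeCalls++
	return []string{"a.txt", "b.txt"}, nil
}

// pullFull tries the export once and falls back to the tree pull only when
// the classifier marks the error as unsupported; other errors are returned
// so the next cycle can retry the export.
func pullFull(r *fakeRemote, unsupported func(error) bool) ([]string, error) {
	files, err := r.export()
	if err == nil {
		return files, nil
	}
	if !unsupported(err) {
		return nil, err
	}
	return r.tree()
}

// isUnsupported is the classifier used by this sketch.
func isUnsupported(err error) bool {
	return errors.Is(err, errExportUnsupported)
}

func main() {
	r := &fakeRemote{exportErr: errExportUnsupported}
	files, err := pullFull(r, isUnsupported)
	fmt.Println(len(files), err, r.exportCalls, r.treeCalls)
}
```

Passing the classifier in as a function keeps the fallback decision testable on its own, separately from the pull orchestration.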

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 When exports truncate and Durable Objects sigh,
No more do we retry, we simply say goodbye!
We fall back to the list-tree, that steady, patient way—
Partial JSON truncations won't ruin our day. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: extending fallback behavior when export hits payload limits or DO overload, directly addressing issue #192. |
| Description check | ✅ Passed | The description comprehensively explains the problem, solution, test plan, and live verification on the affected workspace, all directly related to the changeset. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #192's objective: classifying truncated JSON, 413 payload-too-large, and 500-overloaded responses as unsupported to fall back to paginated tree pull, with comprehensive test coverage. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the export-fallback enhancement in syncer.go and related test coverage; no unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
internal/mountsync/syncer_test.go (1)

636-637: ⚡ Quick win

Add typed truncation errors to classification coverage.

These tests currently validate truncation mostly via error-message strings. Add typed cases (io.ErrUnexpectedEOF, *json.SyntaxError) so classification coverage remains stable if message text changes.

Proposed test hardening
 func TestExportSnapshotOverloadedClassification(t *testing.T) {
 	unsupported := []error{
+		io.ErrUnexpectedEOF,
+		&json.SyntaxError{Offset: 12},
 		&HTTPError{StatusCode: 500, Code: "internal_error", Message: "Durable Object is overloaded. Requests queued for too long."},
 		&HTTPError{StatusCode: 503, Message: "Worker overloaded"},
 		errors.New("http 500 internal_error: Durable Object is overloaded. Requests queued for too long."),
 		&HTTPError{StatusCode: 413, Code: "payload_too_large", Message: "workspace export body is 3524788058 bytes, which exceeds the export body limit of 134217728; use paginated tree/read APIs instead"},
 	}

Also applies to: 709-715
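
As a rough illustration of why typed cases are more stable than message strings, the sketch below detects truncation with errors.Is and errors.As from the standard library. The helper name isTruncatedJSON is hypothetical, and like the walkthrough's description it treats any JSON syntax error as a truncation signal.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// isTruncatedJSON (hypothetical helper) matches truncation by error type,
// so it keeps working if the error message text changes.
func isTruncatedJSON(err error) bool {
	if errors.Is(err, io.ErrUnexpectedEOF) {
		return true
	}
	var syntaxErr *json.SyntaxError
	return errors.As(err, &syntaxErr)
}

// unmarshalTruncated returns the error json.Unmarshal gives for a cut-off
// document, which is a *json.SyntaxError.
func unmarshalTruncated() error {
	var v map[string]interface{}
	return json.Unmarshal([]byte(`{"files":[`), &v)
}

// decodeTruncated returns the error a json.Decoder gives when the stream
// ends mid-value, which is io.ErrUnexpectedEOF.
func decodeTruncated() error {
	var v map[string]interface{}
	return json.NewDecoder(strings.NewReader(`{"files":[`)).Decode(&v)
}

func main() {
	// Prints: true true false
	fmt.Println(
		isTruncatedJSON(unmarshalTruncated()),
		isTruncatedJSON(decodeTruncated()),
		isTruncatedJSON(errors.New("connection reset")),
	)
}
```

Both decoding paths are worth covering because the standard library reports the same truncated body differently depending on whether it is read with Unmarshal or a streaming Decoder.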

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/mountsync/syncer_test.go` around lines 636 - 637, Update the test
cases that currently assert truncation by error-message strings to include typed
truncation errors so classification doesn't depend on text: replace or add cases
using io.ErrUnexpectedEOF and a *json.SyntaxError (e.g., construct
&json.SyntaxError{Offset:...}) as the exportErr values in the table used by the
tests (the entries around the exportErr field in
internal/mountsync/syncer_test.go and the similar entries at the later block
near lines 709-715). Ensure the test logic that classifies truncation treats
these typed errors the same way as the current string-based cases so the
assertions still pass.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 3fb64c7d-0600-4d58-9846-518aad5fc969

📥 Commits

Reviewing files that changed from the base of the PR and between aea7f6d and 774bd92.

📒 Files selected for processing (2)
  • internal/mountsync/syncer.go
  • internal/mountsync/syncer_test.go

…loads

On large workspaces the single full-tree export cannot complete:

- The cloud caps the export body at 128MB and responds `http 413
  payload_too_large: workspace export body is N bytes ... use paginated
  tree/read APIs instead` (observed: a 3.5GB export body on rw_fc7b534b).
- Under load the export also intermittently fails `http 500
  internal_error: Durable Object is overloaded`.

exportSnapshotUnsupported recognized neither, so pullRemoteFullExport
retried the doomed export every cycle and bootstrap wedged (~14k files
on rw_fc7b534b, never completing).

Classify both 413 payload-too-large and 5xx-with-"overloaded" as
unsupported so pullRemoteFull falls through to pullRemoteFullTree, whose
paginated ListTree + per-file reads are individually bounded and resume
from the persisted cursor — exactly what the 413 body instructs. Bare
5xx without the overload signal still retries the export, so genuinely
transient server errors are unaffected.

Verified live on rw_fc7b534b: bootstrap broke past the prior export
wall (~14.3k) and resumed climbing via the paginated tree path.

Stacks on the truncated-export fallback (#192).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Relayfile Eval Review

Run: .relayfile/evals/runs/2026-05-21T17-51-43-294Z-HEAD-provider
Mode: provider
Git SHA: f3fa95b

Passed: 4 | Needs human: 0 | Reviewable: 0 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

No reviewable human-review cases captured Relayfile output.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

@khaliqgant khaliqgant merged commit 8f386f1 into main May 21, 2026
8 checks passed
@khaliqgant khaliqgant deleted the codex/issue-192-do-overload branch May 21, 2026 17:55

Development

Successfully merging this pull request may close these issues.

Mount: unexpected end of JSON input on every full-export cycle (distinct from #175 H stall)

1 participant