Skip to content

fix(writeback): mount-daemon dead-letter + writeback status/retry CLI#84

Merged
khaliqgant merged 5 commits intomainfrom
fix/writeback-reliability-impl
May 5, 2026
Merged

fix(writeback): mount-daemon dead-letter + writeback status/retry CLI#84
khaliqgant merged 5 commits intomainfrom
fix/writeback-reliability-impl

Conversation

@khaliqgant
Copy link
Copy Markdown
Member

Summary

Relayfile-side fix for the productized cloud-mount writeback path. Implements phases 1 + 4 of the writeback-reliability spec at AgentWorkforce/cloud#448. This is the inspected + scope-pruned implementation produced by workflow #83.

Phase 1 — mount daemon stops silently dropping writes.

  • Every non-2xx PUT response logged with status + truncated body
  • New FailedWritebacks counter persisted in state.json (additive, backwards compatible)
  • After retries exhausted, daemon writes ~/relayfile-mount/.relay/dead-letter/<opId>.json with op metadata
  • writeback_daemon_test.go drives an httptest 4xx through the daemon and asserts the dead-letter file appears AND failedWritebacks > 0

Phase 4 — CLI surfaces.

  • relayfile writeback status [WORKSPACE] [--json]: pending / failed / dead-lettered counts + most-recent error per provider; --json mode for machine parsing; non-zero exit iff failures
  • relayfile writeback retry --opId OP [WORKSPACE]: re-enqueues, removes dead-letter file on success
  • Both handle missing state.json / dead-letter dir gracefully
  • writeback_status_test.go builds a fixture mount and asserts CLI output

internal/mountsync/syncer.go adds 17 lines threading the failed-writeback event through the existing reconcile pipeline (no behaviour change to the read side).

Cloud-side phases (plus / 2 / 3) ship in AgentWorkforce/cloud#452.

Verification

$ go build ./...
(clean)

$ go test -count=1 -run TestWritebackDeadLetter ./cmd/relayfile-cli/
ok  github.com/agentworkforce/relayfile/cmd/relayfile-cli  1.797s

$ go test -count=1 ./...
ok  github.com/agentworkforce/relayfile/cmd/relayfile          0.192s
ok  github.com/agentworkforce/relayfile/cmd/relayfile-cli      5.493s
ok  github.com/agentworkforce/relayfile/cmd/relayfile-mount    0.769s
ok  github.com/agentworkforce/relayfile/internal/httpapi       1.952s
ok  github.com/agentworkforce/relayfile/internal/mountfuse     0.621s
ok  github.com/agentworkforce/relayfile/internal/mountsync     3.217s
ok  github.com/agentworkforce/relayfile/internal/relayfile     2.427s
ok  github.com/agentworkforce/relayfile/internal/schema        0.907s

All 8 packages green, including the existing test suites — no regressions from the new daemon hooks or state.json schema addition.

Test plan

  • New writeback_daemon_test.go test passes
  • New writeback_status_test.go test passes
  • go build ./... clean
  • go test ./... green across all packages
  • After merge: pair with cloud#452 to validate end-to-end — local edit on a real mount surfaces in Notion within the poll interval, relayfile writeback status reports the right counts during a forced 4xx repro

Closes part of AgentWorkforce/cloud#448.

🤖 Generated with Claude Code

khaliqgant and others added 3 commits May 5, 2026 14:48
Implements relayfile-side phases (1, 4) of the cross-repo writeback
reliability spec at AgentWorkforce/cloud#448.

Pattern: relay-80-100. The workflow does not commit until:
  - new Go test exercises a 4xx response from a httptest server and
    asserts a dead-letter file gets written
  - go build ./... clean
  - go test ./... green (existing suite + the two new tests)
  - claude reviewer signs off on each phase diff

Team split per writing-agent-relay-workflows skill rule §5:
  lead / impl / tester → codex
  reviewer            → claude

The cloud-side phases (plus / 2 / 3) live in a sibling workflow on
the cloud repo.

Validates:
- agent-relay run --dry-run: PASS (0 errors, 0 warnings, 30 steps,
  26 waves, peak concurrency 4)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex-as-lead and codex-as-impl on the same channel both interpreted
"owns the channel" / "edit cmd/relayfile-cli/main.go" literally and
fought over file ownership. The retry budget exhausted at attempt 3:

  08:23 impl-daemon-fix-r3 → "taking ownership... will only edit main.go"
  09:15 lead-coordinate    → "Stop. I have already taken ownership"
  09:29 impl-daemon-fix-r3 → OWNER_DECISION: INCOMPLETE_RETRY

With only two phases × two implementers there's no real coordinator
to add. The impl agents read the spec directly from {{steps.read-spec.output}};
reviewer-claude is the inter-phase quality gate; collect-evidence is
the deterministic merge gate. No lead step needed.

Removed:
- lead-codex agent
- lead-coordinate step
- lead-signoff step

Validates:
- agent-relay run --dry-run: PASS (0 errors, 0 warnings, 28 steps,
  25 waves, peak concurrency 4). impl-daemon-fix starts in wave 5
  directly from read-writeback-region.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements relayfile-side phases (1, 4) of the writeback-reliability
spec (AgentWorkforce/cloud#448). Authored as the fix output of
workflow #83 (workflows/057-writeback-reliability-mount-and-cli.ts),
inspected and pruned to scope.

Phase 1 — mount daemon stops silently dropping writes:
- Every non-2xx PUT response logged with status + truncated body
- New FailedWritebacks counter on the daemon state, persisted in
  state.json (additive, backwards compatible — older mounts deserialize
  cleanly with default 0)
- After retries exhausted, daemon writes
  ~/relayfile-mount/.relay/dead-letter/<opId>.json with op metadata
  ({ opId, path, attempts, lastStatus, lastBody, ts })
- writeback_daemon_test.go: drives an httptest 4xx through the daemon,
  asserts the dead-letter file appears AND failedWritebacks > 0

Phase 4 — CLI surfaces:
- `relayfile writeback status [WORKSPACE] [--json]`: prints pending /
  failed / dead-lettered counts and most-recent error per provider;
  --json mode for machine consumption; exit code non-zero iff there
  are failures (so CI can gate)
- `relayfile writeback retry --opId OP [WORKSPACE]`: re-enqueues a
  dead-lettered op and removes the dead-letter file on success
- Both gracefully handle missing dead-letter dir / state.json
- writeback_status_test.go: builds a fixture mount layout, runs the
  CLI as a subprocess, asserts output

internal/mountsync/syncer.go: 17 lines threading the failed-writeback
event through the existing reconcile pipeline (no behaviour change to
the read side).

Cloud-side phases (plus / 2 / 3) ship in AgentWorkforce/cloud#452.

Verification:
- go build ./... clean
- go test -count=1 -run TestWritebackDeadLetter ./cmd/relayfile-cli/
  → ok 1.797s
- go test -count=1 -run TestWritebackStatus ./cmd/relayfile-cli/
  → ok (covered by full suite)
- go test -count=1 ./...
  → all 8 packages green (cmd/relayfile, cmd/relayfile-cli,
    cmd/relayfile-mount, internal/httpapi, internal/mountfuse,
    internal/mountsync, internal/relayfile, internal/schema)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 5, 2026

Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 6583410b-8c0a-47f1-be72-2fd659342517

📥 Commits

Reviewing files that changed from the base of the PR and between 4351d8c and c7190c9.

📒 Files selected for processing (3)
  • cmd/relayfile-cli/main.go
  • cmd/relayfile-cli/writeback_status_test.go
  • workflows/057-writeback-reliability-mount-and-cli.ts
✅ Files skipped from review due to trivial changes (1)
  • workflows/057-writeback-reliability-mount-and-cli.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/relayfile-cli/writeback_status_test.go

📝 Walkthrough

Walkthrough

Adds local writeback-failure capture and retry tooling: an HTTP transport that samples failed writeback requests and writes per-op dead-letter JSON, a persisted FailedWritebacks counter with mutexed access in .relay/state.json, and relayfile writeback status/retry CLI commands with tests and workflow orchestration.

Changes

Writeback Failure Tracking & CLI Management

Layer / File(s) Summary
Data Shape / Public State
internal/mountsync/syncer.go
Adds FailedWritebacks uint64 to publicState and reads existing persisted value when constructing snapshots (readPublicFailedWritebacks).
State Persistence & Locking
cmd/relayfile-cli/main.go (248–254, 286–287, 3886–3922, 3924–3969)
Adds FailedWritebacks to syncStateFile model, introduces failedWritebacksStateMu and helpers to read/increment persisted FailedWritebacks safely (readPersistedFailedWritebacks, incrementFailedWritebacksInState, helpers).
Mount Snapshot Reconciliation
cmd/relayfile-cli/main.go (3844–3848, 3369–3969)
buildSyncStateSnapshot and writeMirrorStateFile reconcile the persisted FailedWritebacks into emitted .relay/state.json snapshots under the mutex.
HTTP Transport & Failure Capture
cmd/relayfile-cli/main.go (2427–2434, 4266–4541)
Adds writebackFailureTransport wired into the mount HTTP client to intercept PUT /fs/file and POST /fs/bulk, sample/restore request/response bodies, extract op IDs/paths, track per-attempt state, increment persisted failure counter on non-2xx, and emit dead-letter JSON when retry exhaustion is reached.
CLI Subcommands & Retry Logic
cmd/relayfile-cli/main.go (336–409, 1369–1507, 1622–1764, 1766–1894)
Adds writeback subcommands (status, retry), dead-letter record types and printing (--json/human), status reporting inclusive of persisted counter and dead-letter files, and retry flow that validates records, maps remote paths to local files, replays writes via mountsync.HandleLocalChange, and deletes dead-letter files on success.
Daemon Tests
cmd/relayfile-cli/writeback_daemon_test.go
Adds TestWritebackDaemonDeadLettersHTTP400 exercising an httptest server returning 400 to verify dead-letter JSON creation and FailedWritebacks snapshot reporting.
CLI Command Tests
cmd/relayfile-cli/writeback_status_test.go
Adds end-to-end tests covering writeback status (human + --json, failure/no-failure/missing-state cases) and writeback retry (unknown opId, malformed record preserved).
Workflow & Gating
workflows/057-writeback-reliability-mount-and-cli.ts
Adds a phased workflow coordinating daemon and CLI changes, deterministic build/test gates, and targeted acceptance tests for the new behavior.

Sequence Diagram

sequenceDiagram
    participant Client as Local Client
    participant Mount as Mount Loop
    participant Transport as writebackFailureTransport
    participant Server as Remote Server
    participant State as .relay/state.json
    participant CLI as writeback CLI

    Client->>Mount: HandleLocalChange (PUT /fs/file or POST /fs/bulk)
    Mount->>Transport: Perform HTTP request
    Transport->>Server: Send request
    Server-->>Transport: HTTP non-2xx response (e.g., 400)

    alt Retries remain
        Transport->>Transport: Increment attempt counter
        Transport-->>Mount: Return error (retryable)
    else Retries exhausted
        Transport->>State: Increment FailedWritebacks
        Transport->>State: Write dead-letter JSON (opId, path, attempts, lastStatus, lastBody, ts)
        Transport-->>Mount: Return error (dead-lettered)
    end

    Mount->>State: Save sync snapshot (includes FailedWritebacks)
    CLI->>State: writeback status
    State-->>CLI: Return failed count + dead-letter list

    CLI->>State: writeback retry (opId)
    State-->>CLI: Return dead-letter record
    CLI->>Mount: Re-enqueue local change via HandleLocalChange
    Mount->>Transport: Retry HTTP request
    Server-->>Transport: HTTP 200 response
    Transport->>State: On success, dead-letter file removed
    CLI-->>CLI: Return success
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibbled logs by lamplight late,

Dead letters stacked beside the gate.
I count the hops, I patch the track,
Retry the trail and tuck them back.
A hop, a patch — the filesystem sings.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.42% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: adding mount-daemon dead-letter support and writeback status/retry CLI commands for writeback reliability.
Description check ✅ Passed The description comprehensively covers the pull request scope, including phase 1 (mount daemon changes) and phase 4 (CLI commands), with clear verification and test plan sections.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/writeback-reliability-impl

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 38138f1ab7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cmd/relayfile-cli/main.go Outdated
Comment on lines +1441 to +1442
if report.Failed > 0 || len(report.DeadLettered) > 0 {
return errWritebackFailuresPresent
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid failing status for recovered writebacks

When a writeback gets a transient 429/5xx that later succeeds on the built-in HTTP retry path, RoundTrip still increments failedWritebacks for the earlier non-2xx attempt, but success only clears the in-memory attempt key and never decrements or resets that persisted counter. Because writeback status returns errWritebackFailuresPresent whenever report.Failed > 0, a fully recovered mount with no pending work and no dead-letter files will keep exiting non-zero forever after any transient retry, making the new status command unusable as a current-health check.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/relayfile-cli/main.go`:
- Around line 1616-1637: resolveWorkspaceLikeStatus currently returns an error
when loadCredentials() fails, preventing local-only inspection; change it to not
fail fast: if loadCredentials() returns an error, continue with zero/nil creds
(do not return) so resolveToken("", creds) and resolveWorkspaceIDWithToken can
still run with an empty token, then proceed to workspaceRecordByID(workspaceID)
and populate record.ID/Name as before. Keep the existing calls to resolveToken,
resolveWorkspaceIDWithToken, and workspaceRecordByID but remove the early return
on loadCredentials() failure so local workspaces.json + mirror can be inspected
without valid credentials.
- Around line 1441-1443: The exit condition should be driven by
current/unresolved failures, not the lifetime counter; remove any use of the
cumulative FailedWritebacks counter and ensure the non-zero exit is based only
on actionable fields (e.g., report.Failed and len(report.DeadLettered) or any
other current/unresolved failure fields), so update the check around
report.Failed/report.DeadLettered in main.go to not consult or return based on
FailedWritebacks and instead rely solely on the report's live failure state.
- Around line 1740-1788: The retry logic in retryDeadLetterWriteback assumes the
remote mount root is "/" which breaks mounts created with a non-root remote
path; update retryDeadLetterWriteback to read the original mount root from the
workspace record (e.g., record.RemoteRoot or record.RemotePath, defaulting to
"/") and pass that into mountsync.NewSyncer via SyncerOptions.RemoteRoot instead
of "/", and adjust retryRelativePath (or create a variant) to strip that remote
root prefix from dead-letter remote paths (using deadLetterRetryPaths output)
before resolving them to local paths so /github/file.md maps to
<localDir>/file.md for a mount rooted at /github. Ensure references:
retryDeadLetterWriteback, deadLetterRetryPaths, retryRelativePath, and
mountsync.NewSyncer / SyncerOptions.RemoteRoot are updated accordingly.

In `@workflows/057-writeback-reliability-mount-and-cli.ts`:
- Around line 291-320: The workflow steps run-daemon-test and
run-daemon-test-final use the -run flag value "TestWritebackDeadLetter" which
doesn't match the actual test name TestWritebackDaemonDeadLettersHTTP400, so the
new test never executes; update the command strings in steps run-daemon-test and
run-daemon-test-final to use a regex that matches the new test (e.g.,
TestWritebackDaemonDeadLetters or TestWritebackDaemonDeadLettersHTTP400) or
broaden to TestWriteback* so the dead-letter test is actually run, and keep
captureOutput/failOnError/verification behavior the same.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 37f71ded-f556-4043-a1e0-1436d5c918ea

📥 Commits

Reviewing files that changed from the base of the PR and between 5684b42 and 38138f1.

📒 Files selected for processing (5)
  • cmd/relayfile-cli/main.go
  • cmd/relayfile-cli/writeback_daemon_test.go
  • cmd/relayfile-cli/writeback_status_test.go
  • internal/mountsync/syncer.go
  • workflows/057-writeback-reliability-mount-and-cli.ts

Comment thread cmd/relayfile-cli/main.go Outdated
Comment thread cmd/relayfile-cli/main.go
Comment thread cmd/relayfile-cli/main.go
Comment thread workflows/057-writeback-reliability-mount-and-cli.ts
Addresses two coverage gaps in the writeback-reliability spec
acceptance criteria flagged in self-review of PR #84:

P4.3 — "Both subcommands handle a missing dead-letter dir / state.json
gracefully (don't panic)." The original
TestWritebackStatusNoFailures created state.json with
failedWritebacks:0 and no dead-letter dir, so it covers the
missing-dead-letter-dir half. Adds TestWritebackStatusMissingStateJSON
which deliberately omits state.json itself and asserts the CLI prints
"no failures" and exits 0.

P4.2 — `writeback retry --opId OP` had implementation but no test.
Adds two sad-path tests:

- TestWritebackRetryUnknownOpIDFailsCleanly — opId references no
  dead-letter file. Asserts a clear "unknown dead-letter op" error
  rather than a panic.
- TestWritebackRetryRejectsMalformedRecord — dead-letter file exists
  but contents don't parse as JSON. Asserts "invalid dead-letter
  record" error AND that the malformed file is preserved (the user
  can inspect it; we never silently delete bad data).

The retry happy-path is intentionally not E2E-tested here. It would
require a full HTTP mock of the cloud writeback API plus a local file
matching the dead-letter record's path — substantially overlapping
with TestWritebackDaemonDeadLettersHTTP400 from writeback_daemon_test.go
which already exercises the failure-injection side of the same code
path. Documented inline.

Verification:
- go test -count=1 -run TestWriteback ./cmd/relayfile-cli/
  → 6/6 pass (1 daemon + 5 status/retry)
- go test -count=1 ./...
  → 8/8 packages green

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
cmd/relayfile-cli/writeback_status_test.go (2)

141-174: ⚡ Quick win

Add missing-state coverage for writeback status --json.

This test currently validates only human output when state.json is absent. Adding a JSON-mode assertion would close the contract gap for the same graceful path.

Suggested extension
 func TestWritebackStatusMissingStateJSON(t *testing.T) {
@@
 	if got := strings.ToLower(human.String()); !strings.Contains(got, "no failures") {
 		t.Fatalf("expected no-failures marker when state is absent, got: %q", human.String())
 	}
+
+	var jsonOut bytes.Buffer
+	if err := run([]string{"writeback", "status", "demo", "--json"}, strings.NewReader(""), &jsonOut, io.Discard); err != nil {
+		t.Fatalf("expected no error in --json mode when state.json is missing, got %v", err)
+	}
+	var report struct {
+		Failed       int        `json:"failed"`
+		DeadLettered []struct{} `json:"deadLettered"`
+	}
+	if err := json.Unmarshal(jsonOut.Bytes(), &report); err != nil {
+		t.Fatalf("parse --json output failed: %v\npayload:\n%s", err, jsonOut.String())
+	}
+	if report.Failed != 0 || len(report.DeadLettered) != 0 {
+		t.Fatalf("expected failed=0 and no dead-letter entries, got failed=%d deadLettered=%d", report.Failed, len(report.DeadLettered))
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/relayfile-cli/writeback_status_test.go` around lines 141 - 174, Add a
JSON-mode assertion to TestWritebackStatusMissingStateJSON: call
run([]string{"writeback","status","demo","--json"}, ...) capturing output (use a
bytes.Buffer), then unmarshal the output into a map[string]interface{} and
assert that the "failures" key exists and is an empty slice (or equivalent empty
value). Modify the test body around the existing human-mode check to perform
this additional run/unmarshal/assert using the same workspace ID "ws_demo" and
helpers (run, testJWTWithWorkspace) so the graceful missing-state path is
covered for JSON output as well.

60-74: ⚡ Quick win

Isolate JSON stdout from stderr in the --json assertion path.

Line 61 passes the same buffer for stdout and stderr. If any stderr text is emitted in this error-return path, JSON parsing at Line 73 will become flaky.

Suggested change
 var jsonOut bytes.Buffer
-err = run([]string{"writeback", "status", "demo", "--json"}, strings.NewReader(""), &jsonOut, &jsonOut)
+var jsonErr bytes.Buffer
+err = run([]string{"writeback", "status", "demo", "--json"}, strings.NewReader(""), &jsonOut, &jsonErr)
 if !errors.Is(err, errWritebackFailuresPresent) {
 	t.Fatalf("expected errWritebackFailuresPresent in --json mode, got %v", err)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/relayfile-cli/writeback_status_test.go` around lines 60 - 74, The test
uses the same bytes.Buffer (jsonOut) for both stdout and stderr when calling run
in the "--json" error path, which can cause stderr content to contaminate JSON
parsing; change the call to run in writeback_status_test.go to provide a
dedicated stderr buffer (e.g., bytes.Buffer errOut) or io.Discard for stderr
while keeping jsonOut for stdout, then continue to assert errors.Is(err,
errWritebackFailuresPresent) and json.Unmarshal(jsonOut.Bytes(), &report) so
only stdout JSON is parsed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/relayfile-cli/writeback_status_test.go`:
- Around line 267-268: The test currently only fails when os.Stat returns
os.IsNotExist, allowing other stat errors to slip by; change the check that
calls os.Stat(filepath.Join(dlDir, "op_garbled.json")) to fail when statErr !=
nil (i.e., treat any non-nil error as a test failure) so the malformed-file
retention assertion triggers on any unexpected stat error rather than only on
missing files.

---

Nitpick comments:
In `@cmd/relayfile-cli/writeback_status_test.go`:
- Around line 141-174: Add a JSON-mode assertion to
TestWritebackStatusMissingStateJSON: call
run([]string{"writeback","status","demo","--json"}, ...) capturing output (use a
bytes.Buffer), then unmarshal the output into a map[string]interface{} and
assert that the "failures" key exists and is an empty slice (or equivalent empty
value). Modify the test body around the existing human-mode check to perform
this additional run/unmarshal/assert using the same workspace ID "ws_demo" and
helpers (run, testJWTWithWorkspace) so the graceful missing-state path is
covered for JSON output as well.
- Around line 60-74: The test uses the same bytes.Buffer (jsonOut) for both
stdout and stderr when calling run in the "--json" error path, which can cause
stderr content to contaminate JSON parsing; change the call to run in
writeback_status_test.go to provide a dedicated stderr buffer (e.g.,
bytes.Buffer errOut) or io.Discard for stderr while keeping jsonOut for stdout,
then continue to assert errors.Is(err, errWritebackFailuresPresent) and
json.Unmarshal(jsonOut.Bytes(), &report) so only stdout JSON is parsed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4d6ba0b0-96d7-4319-85e5-ac5d6c9ef7d4

📥 Commits

Reviewing files that changed from the base of the PR and between 38138f1 and 4351d8c.

📒 Files selected for processing (1)
  • cmd/relayfile-cli/writeback_status_test.go

Comment thread cmd/relayfile-cli/writeback_status_test.go Outdated
Five actionable review items from chatgpt-codex-connector and
coderabbitai. All fixed:

A. (Codex P2 + CodeRabbit Major, main.go:1442)
   `failedWritebacks` is a lifetime counter that only ever increments
   in this PR. Driving the exit code from `Failed > 0` would keep
   `writeback status` exiting non-zero forever after any single
   transient 429/5xx, even after retries succeed and the dead-letter
   queue empties. Now drives the exit gate from `len(DeadLettered) > 0`
   only — actionable failures, not historical bookkeeping. Lifetime
   counter still surfaces in the report for observability.

   Adds TestWritebackStatusLifetimeCounterDoesNotFailExitCode pinning
   the contract: `failedWritebacks: 7, deadLettered: []` exits 0.

B. (CodeRabbit Major, main.go:1637)
   `resolveWorkspaceLikeStatus` called `loadCredentials()` before
   consulting workspaces.json, blocking offline / expired-creds usage
   that the feature is meant to support. Now tries the local registry
   first when a workspace name/id is given; only falls back to the
   credentials path when nothing matches locally (or no value was
   given and we need the JWT's `wks` claim to identify the default).

   The new lifetime-counter test (above) deliberately omits
   saveCredentials() to also pin this behaviour.

C. (CodeRabbit Major / heavy lift, main.go:1788)
   `writeback retry` hardcoded `RemoteRoot: "/"`. For a mount created
   with `--remote-path /github`, a dead-lettered `/github/file.md`
   would be looked up at `<localDir>/github/file.md` instead of
   `<localDir>/file.md`, so replay would fail even though the mirrored
   file exists. Added `readMountRemoteRoot(localDir)` which reads the
   live `<localDir>/.relay/state.json` (defaults to `/` when missing
   or unparseable). `retryRelativePath` now strips the mount root
   prefix before joining to localDir.

D. (CodeRabbit, workflows/057-...ts phase-4 commands)
   Phase-4 test runner regex `TestWritebackStatus` matched the
   *Status tests but skipped the *Retry tests added in this PR.
   Widened to `TestWriteback(Status|Retry)` so retry coverage
   actually runs in the workflow's gate.

E. (CodeRabbit Minor, writeback_status_test.go:268)
   The malformed-record retention assertion used `os.IsNotExist`
   only — other stat errors silently passed. Now uses a strict
   nil-check on stat so any unexpected access error fails the test.

Verification:
- go build ./... clean
- go test -count=1 ./... → all 8 packages green, 7 writeback tests
  (1 daemon + 6 status/retry, all PASS)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@khaliqgant khaliqgant merged commit 16f76ea into main May 5, 2026
6 of 7 checks passed
@khaliqgant khaliqgant deleted the fix/writeback-reliability-impl branch May 5, 2026 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant