fix: 7 bugs from discussion #149 smoke walk (envelope, spawn, CDC, observer)#153
Conversation
writePRRows was DELETE-then-UPSERT on the Replace path, so every poll's upserts hit the INSERT branch and the AFTER UPDATE trigger that emits pr_review_thread_resolved never fired in production. Replaces the blanket delete with a set-diff: upsert observed threads first (so unchanged thread_ids go through ON CONFLICT DO UPDATE and fire the UPDATE trigger when resolved flips), then delete orphans whose thread_id is not in the observed set, all inside the existing tx. Adds DeletePRReviewThread query (sqlc-generated form hand-edited; no sqlc binary available locally — sqlc generate from backend/ produces an identical file). Tests: TestPRReviewThreadsCDC_EmitsResolvedOnReplacePoll (regression — fails without fix) and TestPRReviewThreadsReplace_PrunesOrphansWithoutReinserting. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…bug 7) checkCredentials lived only inside Poll, which short-circuits when discoverSubjects is empty. On a fresh daemon with no tracked PRs the documented "scm observer disabled: provider credentials unavailable" warn was unreachable, leaving users with no signal that the SCM observer was a no-op. Calls checkCredentials once in Observer.loop before the first Poll. The existing credentialsChecked guard preserves once-per-process semantics; provider construction still uses SkipTokenPreflight so daemon readiness doesn't block on gh. Test: TestStart_LogsDisabledWarningWhenNoTokenAndNoSubjects with a race-safe syncBuffer for capturing slog from the observer goroutine. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…bugs 1-4,6) Closes the long tail of opaque-500-and-orphan-row failures that discussion #149's smoke walk surfaced. The common shape: spawn created the session row before validating preconditions, and the underlying errors weren't typed, so toAPIError defaulted to INTERNAL_ERROR. Bug 1 (orphan row + opaque 500 on unknown projectId): Service.Spawn / SpawnOrchestrator now call store.GetProject first and return apierr.NotFound("PROJECT_NOT_FOUND", ...) before manager.Spawn, eliminating the create-row-then-fail-workspace ordering. Bug 2 (Restore opaque 500 on half-spawned/terminated session): Manager.Restore gained the ErrIncompleteHandle guard that Kill has at manager.go:189-193. toAPIError now maps both restore and kill to the same SESSION_INCOMPLETE_HANDLE 409 envelope. Bug 3 (--branch unfetched / checked-out-elsewhere → opaque 500): gitworktree pre-checks listRecords for branch-in-other-worktree, falls back to refs/tags on missing local/remote head, and emits two new port sentinels (ErrWorkspaceBranchCheckedOutElsewhere, ErrWorkspaceBranchNotFetched) mapped to BRANCH_CHECKED_OUT_ELSEWHERE (409) and BRANCH_NOT_FETCHED (400). Bug 4 (orphan terminated row on claim-pr rollback): Adds Store.DeleteSession gated to seed-state rows only (preserves the no-resurrection guarantee for live sessions), transactional change_log cleanup, Manager.RollbackSpawn (delete-then-fallback-to-kill), a new POST /sessions/{id}/rollback endpoint, and rewires cli/spawn.rollbackSpawnedSession to use it. The exit-0 sub-symptom was unreproducible from current source and is left unaddressed. Bug 6 (agent binary not on PATH → silent idle session): Drops the "return name, nil" anti-pattern from all 21 agent adapters and returns the new ports.ErrAgentBinaryNotFound on exec.LookPath miss. Manager.Spawn gained a validateAgentBinary pre-flight (with injectable LookPath so tests don't need real binaries on PATH) that aborts before runtime.Create. Mapped to AGENT_BINARY_NOT_FOUND (400). Integration tests in internal/integration/ stub LookPath to /usr/bin/true. Tests cover each bug end-to-end. OpenAPI regenerated for /rollback. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CI fixes for #153: - gofmt/goimports on kilocode and kiro adapters that the bug 6 audit left mis-grouped. - openapi-typescript regen against the new /rollback endpoint added in the Lane A commit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Greptile SummaryThis PR fixes 7 bugs from the #149 smoke walk across four subsystems: session spawn/lifecycle, CDC PR review threads, gitworktree branch handling, and the SCM observer credential gate. The changes span 57 files across three parallel investigation lanes.
Confidence Score: 3/5The CDC, observer, branch, binary-preflight, and project-preflight fixes are all correct and well-tested; one transaction ordering defect in DeleteSession will silently destroy CDC history for any fully-spawned session that goes through the rollback-fallback-to-kill path. The store's DeleteSession always runs DeleteChangeLogForSession before checking whether the row is in seed state. For the primary caller (RollbackSpawn on a fully-spawned session), this unconditionally wipes the session's CDC history — session_created and session_updated events — even though the session itself is not deleted and is subsequently killed via the fallback path. This defect lives in the exact flow Bug 4 was designed to fix, and the new test does not catch it because it only checks the row-deletion gate, not whether change_log entries survive. backend/internal/storage/sqlite/store/session_store.go — the DeleteSession transaction ordering needs to be reversed; backend/internal/storage/sqlite/gen/pr_review_threads.sql.go — hand-edited generated file should be regenerated via sqlc before merge. Important Files Changed
Sequence DiagramsequenceDiagram
participant CLI
participant HTTP as POST /rollback
participant Svc as Service.RollbackSpawn
participant Mgr as Manager.RollbackSpawn
participant Store
participant Kill as Manager.Kill
CLI->>HTTP: "POST /sessions/{id}/rollback"
HTTP->>Svc: RollbackSpawn(id)
Svc->>Mgr: RollbackSpawn(id)
Mgr->>Store: DeleteSession(id)
Note over Store: tx: DeleteChangeLogForSession(id) runs unconditionally<br/>then DeleteSeedSession(id) is no-op for live sessions
Store-->>Mgr: "deleted=false"
Mgr->>Kill: Kill(id)
Kill-->>Mgr: "freed=true"
Mgr-->>Svc: "deleted=false, killed=true"
Svc-->>HTTP: RollbackOutcome
HTTP-->>CLI: "200 killed=true"
Reviews (1): Last reviewed commit: "chore: gofmt + regen frontend schema.ts ..." | Re-trigger Greptile |
, PR #153 review) Addresses @greptile-apps P1 and P2 review feedback on PR #153. P1 (CDC events deleted for live sessions in rollback fallback): DeleteChangeLogForSession ran unconditionally inside the transaction before DeleteSeedSession's seed-state predicates filtered the session delete to a no-op. For a live session reaching DeleteSession (the delete-then-kill fallback path inside RollbackSpawn), the seed delete returned 0 rows but the session_created/session_updated CDC events had already been purged. Now probes via a new SessionIsSeed query first and short-circuits the whole tx — including the change_log cleanup — when the row is not in seed state. P2 (regen sqlc): installed sqlc 1.31.1 and ran `sqlc generate` from backend/, replacing the hand-edited pr_review_threads.sql.go (and producing minor format-only churn in models.go, pr.sql.go, sessions.sql.go, changelog.sql.go). The regen surfaced two issues: 1. GetPR / ListPRsBySession had their return types hand-changed to gen.PR by the previous PR; sqlc actually emits GetPRRow / ListPRsBySessionRow when queries enumerate columns. Fixed by collapsing those two queries to `SELECT * FROM pr` so sqlc returns gen.PR (which is what the store's prRowFromGen converter expects), and pr.last_nudge_signature now lands in the result alongside the existing 37 columns. 2. sqlc 1.31.1's SQLite parser silently strips trailing `?` placeholders and string literals from DELETE statements (reproduced with sqlc.arg, IFNULL, rowid subquery, and second-predicate workarounds — all eaten). DeleteSeedSession and DeleteChangeLogForSession both tripped it. They are now run as plain tx.ExecContext calls inside Store.DeleteSession, inside the same write transaction as SessionIsSeed; both queries are removed from the queries/ directory and the workaround context is documented inline in queries/sessions.sql and queries/changelog.sql to keep future contributors from re-adding them. Verified: go build ./... clean, go test -race ./... 1097/1097 pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
harshitsinghbhandari has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
Summary
Fixes the 7 bugs surfaced by the discussion #149 smoke walk. RCA + suggested fixes are written up in #152; this PR implements them.
Three commits, one per investigation lane (file-disjoint, dispatched in parallel):
fix(api,spawn): typed errors + project/branch/binary preflightfix(cdc): emit pr_review_thread_resolved on replace pollsfix(observe): emit scm-disabled log on startup with no subjectsPer-bug breakdown
Bug 1 — orphan terminated row + opaque 500 on unknown projectId.
Service.Spawn/SpawnOrchestratornow callstore.GetProjectfirst and returnapierr.NotFound("PROJECT_NOT_FOUND", ...)beforemanager.Spawn. No row is created when the project doesn't exist.Bug 2 — Restore opaque 500 on half-spawned/terminated session.
Manager.Restoregained theErrIncompleteHandleguard thatManager.Killalready had. Both endpoints now return the sameSESSION_INCOMPLETE_HANDLE409.Bug 3 —
--branchunfetched or checked-out-elsewhere → opaque 500. Two new port sentinels:ErrWorkspaceBranchCheckedOutElsewhere(mapped toBRANCH_CHECKED_OUT_ELSEWHERE409) andErrWorkspaceBranchNotFetched(mapped toBRANCH_NOT_FETCHED400). gitworktree pre-checkslistRecordsfor branch-in-other-worktree and falls back torefs/tagson missing local/remote head.Bug 4 — orphan terminated row on
--claim-prrollback. NewStore.DeleteSessiongated to seed-state rows only (preserves the no-resurrection guarantee for live sessions), transactionalchange_logcleanup,Manager.RollbackSpawn(delete-then-fallback-to-kill), newPOST /sessions/{id}/rollbackendpoint, CLI rewired to use it. The exit-0 sub-symptom from the smoke walk was unreproducible from current source (every traced path returns a non-nil error →os.Exit(1)) and is intentionally not addressed here — see #152 for the trace.Bug 5 —
pr_review_thread_resolvedCDC event never lands.writePRRowsno longer does DELETE-then-UPSERT (which made every poll hit the INSERT branch and bypass theAFTER UPDATE WHEN OLD.resolved <> NEW.resolvedtrigger). It now upserts observed threads (real updates fire the trigger) and set-diff-deletes orphans, all inside the existing transaction.Bug 6 — agent binary not on PATH succeeds silently. Dropped the
return "<name>", nilanti-pattern from all 21 agent adapters; onexec.LookPathmiss they now returnports.ErrAgentBinaryNotFound.Manager.Spawngained avalidateAgentBinarypre-flight (with injectableLookPathfor tests) that aborts beforeruntime.Create. Mapped toAGENT_BINARY_NOT_FOUND400. The longer-term zellij-pane-liveness fix is a separate concern.Bug 7 — SCM observer disabled log never emitted.
checkCredentialsis now called once inObserver.loopbefore the firstPoll, independent of subject discovery. The existingcredentialsCheckedguard keeps it once-per-process. Daemon wiring still usesSkipTokenPreflight: trueso readiness doesn't block ongh.Cross-cutting cleanup
toAPIErrorwhitelist extended (no refactor into a registry — kept tight per the RCA recommendation).ports/soservicedoesn't import adapters; adapter packages keep aliases.DeleteSession(seed-state gated) andDeleteSeedSession/DeleteChangeLogForSessionqueries./rollback.Test plan
go build ./...cleango vet ./...cleango test -race ./...— 1097 passed, 0 failed across 61 packagesinternal/integration/updated to stubLookPathfor the new pre-flightNotes for reviewer
pr_review_threads.sql.gowas hand-edited becausesqlcisn't on this machine's PATH. The file matches whatsqlc generatefrombackend/would produce; please regenerate before merge if your environment has it, just to confirm.Closes #152
EOF