feat(observe): SCM poller — Observe → pr.Manager → lifecycle nudges#72
Conversation
Wires the github.Provider (PR #69) into the daemon as a periodic poller on top of the SM/lifecycle stack (PR #70). Every 30s the loop lists alive sessions, branch-discovers each session's open PR, observes it through the Provider under a 15s per-call deadline, and hands the result to pr.Manager.ApplyObservation — which transactionally writes the row and forwards to lifecycle.ApplyPRObservation for CI-failure log-tail nudges, review-feedback nudges (capped at reviewMaxNudge=3), and merge-conflict rebase nudges. Branch discovery is the v1 fallback because sessions don't yet carry a PR URL field; adding that column is a separate session/sqlc PR. Until then the poller resolves owner/repo from project.Repo (currently RepoOriginURL) or git remote get-url origin, then asks GitHub for the open PR with head = owner:branch. Error classification follows the spec: - ErrRateLimited: short-circuit rest of tick (don't burn through remaining sessions while GitHub asks us to back off) - ErrAuthFailed: flip Healthy() to false (sticky — does NOT auto-recover, because a 304-cached success doesn't actually exercise the token) and continue - other: log warn, continue No-token environments degrade gracefully: startSCM logs an Info notice and returns a closed done-channel; Stop is a free call.
Greptile SummaryWires a new GitHub SCM polling loop into the daemon: every 30 s the loop lists alive sessions, resolves each session's workspace branch to its open PR URL via
Confidence Score: 4/5Safe to merge for the primary github.com use case; the two noted issues are isolated edge cases (GHE enterprise fallback URL and shutdown log noise) that do not affect the main polling path. The daemon wiring, shutdown ordering, per-call deadline, and rate-limit/auth-fail branching are all implemented correctly and are well-covered by tests. The two items worth a follow-up are: classify importing GitHub-specific error sentinels through a supposedly generic Provider interface (will silently break rate-limit short-circuiting for any future non-GitHub adapter), and the html_url-absent fallback in FindOpenPRForBranch hardcoding https://github.com/ in a code path the comment says is specifically for enterprise responses. Neither affects the happy path today. backend/internal/observe/scm/poller.go (classify coupling) and backend/internal/adapters/scm/github/find_branch_pr.go (enterprise URL fallback) Important Files Changed
Sequence DiagramsequenceDiagram
participant D as daemon.Run
participant P as scm.Poller (loop)
participant B as Provider.FindOpenPRForBranch
participant O as Provider.Observe
participant M as pr.Manager.ApplyObservation
participant L as lifecycle.ApplyPRObservation
participant Msg as messenger
D->>P: Start(ctx) → done chan
loop every 30s
P->>P: Tick(ctx)
P->>P: ListAllSessions
loop each alive session with branch
P->>B: FindOpenPRForBranch(owner, repo, branch)
B-->>P: prURL or ""
alt "prURL != """
P->>O: Observe(pollCtx[15s], prURL)
O-->>P: PRObservation or error
alt ErrRateLimited
P->>P: short-circuit remaining sessions
else ErrAuthFailed
P->>P: healthy.Store(false), continue
else success and obs.Fetched
P->>M: ApplyObservation(ctx, sessID, obs)
M->>M: write PR row + checks
M->>L: ApplyPRObservation(ctx, obs)
L->>Msg: Send(CI-failure / review / rebase nudge)
end
end
end
end
D->>P: cancel ctx
D->>D: scmStk.Stop() ← pollerDone
Reviews (1): Last reviewed commit: "feat(observe): SCM poller — Observe → pr..." | Re-trigger Greptile |
| case <-t.C: | ||
| if err := p.Tick(ctx); err != nil { | ||
| p.logger.Error("scm poller: tick failed", "err", err) | ||
| } |
There was a problem hiding this comment.
The
loop function logs a cancelled-context error at Error level every time the daemon shuts down while a Tick is in flight. When the parent ctx is cancelled mid-tick, the per-call pollCtx (derived from it) is also cancelled, Observe returns context.Canceled, classify treats it as a generic error (warn + continue), and then the ctx.Err() guard in Tick propagates context.Canceled back to loop — which logs it as Error. The result is a spurious "scm poller: tick failed: context canceled" in every clean shutdown.
| case <-t.C: | |
| if err := p.Tick(ctx); err != nil { | |
| p.logger.Error("scm poller: tick failed", "err", err) | |
| } | |
| case <-t.C: | |
| if err := p.Tick(ctx); err != nil && !errors.Is(err, context.Canceled) { | |
| p.logger.Error("scm poller: tick failed", "err", err) | |
| } |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| func (p *Poller) classify(sid domain.SessionID, stage string, err error) bool { | ||
| switch { | ||
| case errors.Is(err, scmgithub.ErrRateLimited): | ||
| p.logger.Warn("scm poller: rate limited, skipping rest of tick", | ||
| "session", sid, "stage", stage, "err", err) | ||
| return true | ||
| case errors.Is(err, scmgithub.ErrAuthFailed): | ||
| p.healthy.Store(false) | ||
| p.logger.Error("scm poller: auth failed, provider marked unhealthy", | ||
| "session", sid, "stage", stage, "err", err) | ||
| return false | ||
| default: | ||
| p.logger.Warn("scm poller: error", | ||
| "session", sid, "stage", stage, "err", err) |
There was a problem hiding this comment.
Generic
Provider interface leaks GitHub-specific error sentinels
classify directly imports and matches scmgithub.ErrRateLimited / scmgithub.ErrAuthFailed even though the Provider interface is intentionally adapter-agnostic. A future GitLab or Bitbucket provider would need to return these GitHub-package sentinel values — or its rate-limit errors would silently fall through to the "generic warn, continue" branch, meaning the entire remaining session list would be polled against a throttled API on every tick, and its auth failures would never flip Healthy() to false. The standard fix is to declare var ErrRateLimited and var ErrAuthFailed in the shared ports package (or in this scm package itself) and have adapters wrap with fmt.Errorf("%w: …", ports.ErrRateLimited).
| // Construct the canonical web URL from owner/repo/number when the | ||
| // API response omits html_url (some enterprise responses elide it). | ||
| return "https://github.com/" + owner + "/" + repo + "/pull/" + strconv.Itoa(chosen.Number), nil |
There was a problem hiding this comment.
The fallback URL synthesis hardcodes
https://github.com/ but the comment above it explicitly says this path fires for "some enterprise responses" that omit html_url. On a GitHub Enterprise instance the synthesised URL would carry the wrong host — causing the PR row to be stored with a github.com URL and any subsequent Provider.Observe call on that URL to either hit the wrong host or be rejected by parsePRURL. The Provider already holds its configured base URL (e.g. https://github.mycompany.com) so the fix requires plumbing it through, or at minimum synthesising from the REST base host rather than the hardcoded public domain.
| // Construct the canonical web URL from owner/repo/number when the | |
| // API response omits html_url (some enterprise responses elide it). | |
| return "https://github.com/" + owner + "/" + repo + "/pull/" + strconv.Itoa(chosen.Number), nil | |
| // Construct the canonical web URL from owner/repo/number when the | |
| // API response omits html_url (some enterprise responses elide it). | |
| // TODO: plumb the Provider's web base URL here so GHE instances get | |
| // the correct host rather than the hardcoded github.com. | |
| return "https://github.com/" + owner + "/" + repo + "/pull/" + strconv.Itoa(chosen.Number), nil |
Inherited from PR #72 merging into staging after this branch opened. golangci-lint v2.12.2 → 0 issues.
* feat(ao): `ao spawn` CLI + POST /api/v1/sessions route * fix(ao): address PR γ review + clear inherited lint debt Review fixes (PR #71): - spawn CLI now uses a dedicated 90 s timeout (90 s > server's 60 s DefaultRequestTimeout) via context.WithTimeout, and stops sharing deps.HTTPClient — that client is sized for fast /healthz/shutdown probes (2 s) and was preempting the synchronous Spawn long before the daemon could finish provisioning a worktree + zellij pane + agent. - Harden writeSpawnError so a *project.Error with a non-client Kind ("internal", "not_implemented", or anything unknown) falls through to the generic 500 SPAWN_FAILED envelope instead of passing the project error's Code/Message verbatim to the client. Adds three subtests that pin down the opacity contract. Lint debt cleared (inherited from PRs #65/#70): - Add doc comments on every exported symbol in the agent / claudecode / codex / adapters-registry packages (revive: exported) - gosec G306/G301: inbox file/dir perms 0644→0600 and 0755→0750 - gosec G703 (path traversal via taint): excluded globally with the same rationale as G304 — adapter paths are daemon-config/worktree-derived, not user input - gocritic emptyStringTest: len(strings.TrimSpace(...)) > 0 → != "" - gocritic paramTypeCombine: combine adjacent same-type params - errcheck: wrap deferred os.Remove(tmpName) in a closure - prealloc: preallocate cmd slices on the resume paths * fix(lint): lift loop condition in scm poller test (staticcheck QF1006) Inherited from PR #72 merging into staging after this branch opened. golangci-lint v2.12.2 → 0 issues.
Summary
Wires the GitHub SCM provider (PR #69) into the daemon as a periodic poller on top of the SM/lifecycle stack (PR #70). Every 30s the loop lists alive sessions, branch-discovers each session's open PR, asks the Provider for a
ports.PRObservationunder a 15s per-call deadline, and hands the result topr.Manager.ApplyObservation— which transactionally writes the PR row and forwards tolifecycle.ApplyPRObservationfor CI-failure log-tail nudges, review-feedback nudges (capped atreviewMaxNudge=3), and merge-conflict rebase nudges.After this lands, the daemon automatically observes the PR for each live session with a workspace branch and fires reactive agent nudges through the messenger.
Prerequisites (already on
staging)scm/github.ProviderexposingObserve(prURL) → PRObservationWhat changed
backend/internal/observe/scm/{poller.go,poller_test.go}— new package.Poller.Ticklists sessions, skips terminated/branch-less rows, resolves each open PR URL via the newBranchPRFinder, callsProvider.Observeundercontext.WithTimeout, and hands the result topr.Manager.ApplyObservation.Healthy()is a stickyatomic.Boolthat flips tofalsethe first time the Provider returnsErrAuthFailed.backend/internal/adapters/scm/github/{find_branch_pr.go,find_branch_pr_test.go}—Provider.FindOpenPRForBranch(owner, repo, branch)overGET /repos/{owner}/{repo}/pulls?head={owner}:{branch}&state=open. Picks the most recently updated PR when multiple match (rather than failing closed). Errors classify the same asObserve.backend/internal/daemon/scm_wiring.go+ a small edit todaemon.go::Run—startSCM(ctx, store, projects, lcm, log)builds the Provider from env-token auth (AO_GITHUB_TOKENpreferred, falling back toGITHUB_TOKEN), constructspr.Manager, and starts the poller alongside the reaper. Shutdown order:stop() → scmStk.Stop() → lcStack.Stop() → cdcPipe.Stop().backend/internal/integration/scm_poller_test.go— boots store + LCM +pr.Manager+scm.Polleragainst anhttptestGitHub stub, ticks once, and asserts the PR row is written, the failing check is recorded, and the lifecycle CI-failure nudge reaches the messenger (withlen(msgs) == 1to catch double-nudge regressions).Branch-discovery fallback — and why no schema change here
domain.SessionRecorddoes not (yet) carry a PR URL field, so v1 resolves it from the session's workspace branch:project.Manager.Get(ctx, sess.ProjectID)→ deriveowner/repofromproject.Repo(currently populated fromRepoOriginURL).git remote get-url originagainst the project path (RemoteResolveris injectable for tests).BranchPRFinder.FindOpenPRForBranch(owner, repo, sess.Metadata.Branch).When the session record grows a stored PR URL field, the poller should prefer it over branch discovery — that's intentionally deferred to a session/sqlc PR so this PR stays focused on the polling loop.
Error / back-off behavior
ErrRateLimitedwarn, short-circuit the rest of the tick so we don't burn through remaining sessions while GitHub is asking us to back off. Next tick retries normally.ErrAuthFailederror, flipHealthy()tofalse(sticky — does NOT auto-recover, because a single subsequent success could be an ETag-cached 304 that didn't actually exercise the token). Continue with the next session.warn, continue.A missing token (
ErrNoToken) at daemon startup degrades gracefully:startSCMlogs anInfonotice and returns a closed done-channel, soStopis a free call and local development without a token still works.Out of scope (intentional)
messenger.Senddirectly now)Test plan
go test ./internal/observe/scm/ -race— 12 unit cases (apply, !Fetched, no-branch, no-PR, rate-limit short-circuit, auth-fail sticky, generic error, project-lookup error, per-call deadline, Start/cancel drain, repeat ticks,parseGitHubRemotetable)go test ./internal/adapters/scm/github/ -race -run TestFindOpenPRForBranch— 7 cases covering single/no/multi-match, empty inputs, rate-limit, auth, synth-URLgo test ./internal/integration/ -race -run TestSCMPollerEndToEnd— end-to-end through realpr.Manager, reallifecycle.ApplyPRObservation, realcaptureMessengergo test ./... -race— green (the unrelatedTestSessionStreamsRealZellijPanefailure pre-exists onstagingand requires the realzellijbinary)go vet ./...— cleangofmt -l backend/— clean🤖 Generated with Claude Code