Widen Sentry capture beyond panic recovery #181

Merged
motatoes merged 1 commit into main from sentry-wider-capture on Apr 21, 2026

@motatoes
Contributor

Summary

Follow-up to #179 / #180. The initial Sentry wiring only captured panics via a top-level defer, but Go services rarely panic and the codebase handles errors with log.Printf — so nothing was reaching Sentry in practice (hence the empty dashboard).

This PR adds three capture layers so real errors actually surface:

  • Echo middleware on both the control plane API and the worker HTTP server. Captures panics (before the existing middleware.Recover() converts them to 500s) and any handler error that resolves to HTTP 5xx. Client errors (4xx) are skipped — they're expected flow.
  • Unary + stream gRPC interceptors on the worker. Captures server-side error codes (Internal, Unknown, Unavailable, DataLoss, etc.) and skips client-side codes (InvalidArgument, NotFound, Canceled, DeadlineExceeded, PermissionDenied, etc.) that represent expected flow rather than bugs.
  • observability.Go wrapper for background goroutines. A panic in a long-lived background loop would otherwise die silently; now it's captured with a goroutine tag. Applied to the maintenance loop in the control plane and UploadBaseImageIfNew / MigrateStaleCheckpoints / RollingUpgradeHibernated in the worker.
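
The filtering rule shared by the HTTP and gRPC layers can be sketched without either framework. This is a minimal illustration, not the code in this PR: `Code`, `shouldCaptureHTTP`, and `shouldCaptureGRPC` are hypothetical names, and `Code` is a local stand-in whose values mirror the standard gRPC status code numbers (real code would use `google.golang.org/grpc/codes`). Codes that the description covers only with "etc." are skipped here as an assumption.

```go
package main

import "fmt"

// Code is a local stand-in for the gRPC status code type so that this
// sketch has no external dependencies. Values match the standard numbering.
type Code uint32

const (
	Canceled         Code = 1
	Unknown          Code = 2
	InvalidArgument  Code = 3
	DeadlineExceeded Code = 4
	NotFound         Code = 5
	PermissionDenied Code = 7
	Internal         Code = 13
	Unavailable      Code = 14
	DataLoss         Code = 15
)

// shouldCaptureHTTP reports whether a response status counts as a
// server-side failure. 4xx responses are treated as expected flow.
func shouldCaptureHTTP(status int) bool {
	return status >= 500 && status <= 599
}

// shouldCaptureGRPC reports whether a gRPC code is one of the server-side
// codes named in the PR description.
func shouldCaptureGRPC(c Code) bool {
	switch c {
	case Internal, Unknown, Unavailable, DataLoss:
		return true
	case InvalidArgument, NotFound, Canceled, DeadlineExceeded, PermissionDenied:
		return false
	default:
		// Assumption: codes the description does not name are skipped.
		return false
	}
}

func main() {
	fmt.Println("HTTP 500 captured:", shouldCaptureHTTP(500))
	fmt.Println("HTTP 404 captured:", shouldCaptureHTTP(404))
	fmt.Println("gRPC Internal captured:", shouldCaptureGRPC(Internal))
	fmt.Println("gRPC NotFound captured:", shouldCaptureGRPC(NotFound))
}
```

Keeping the decision in a pure function like this makes the skip list easy to unit test separately from the middleware and interceptors that call it.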

Also adds observability.CaptureError(err, tags...) as a generic helper, and uses it in the control plane maintenance loop to report DB recovery failures that were previously only logged.

All capture calls are safe no-ops when Sentry is not initialized.
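
A helper with this no-op behaviour might look roughly like the sketch below. It is an illustration under stated assumptions rather than the PR's implementation: the `Event` type, `Init`, and the package-level `report` function are hypothetical stand-ins for the Sentry client, and tags are assumed to be alternating key/value strings.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Event is a hypothetical, simplified record of one captured error.
type Event struct {
	Err  error
	Tags map[string]string
}

var (
	mu     sync.RWMutex
	report func(Event) // nil until Init is called; stands in for the Sentry client
)

// errRecovery is a sample error used by the demo in main.
var errRecovery = errors.New("db recovery failed")

// Init installs the reporter. Passing nil disables reporting again.
func Init(r func(Event)) {
	mu.Lock()
	defer mu.Unlock()
	report = r
}

// CaptureError forwards err and its tags to the reporter. It does nothing
// when err is nil or when no reporter has been installed.
// Assumption: tags are alternating key/value strings; a trailing key
// without a value is dropped.
func CaptureError(err error, tags ...string) {
	mu.RLock()
	r := report
	mu.RUnlock()
	if err == nil || r == nil {
		return
	}
	m := make(map[string]string, len(tags)/2)
	for i := 0; i+1 < len(tags); i += 2 {
		m[tags[i]] = tags[i+1]
	}
	r(Event{Err: err, Tags: m})
}

func main() {
	// Before Init: the call is a safe no-op.
	CaptureError(errRecovery, "service", "control-plane")

	var events []Event
	Init(func(e Event) { events = append(events, e) })
	CaptureError(errRecovery, "service", "control-plane")
	fmt.Println(len(events), events[0].Tags["service"])
}
```

Checking for a missing reporter inside the helper means call sites do not need their own "is Sentry enabled" guard.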

Test plan

  • go build ./cmd/server ./cmd/worker ./internal/observability ./internal/api ./internal/worker passes
  • go vet clean on touched packages
  • Deploy and confirm sentry: enabled (...) still appears at startup for both services
  • Hit a handler that returns a 500 on the control plane — event appears in Sentry with service=control-plane and http.route tag
  • Trigger a worker gRPC error with code Internal — event appears with service=worker and grpc.method tag
  • Force a panic in a background goroutine (e.g. via test hook) — event appears with goroutine tag
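
The last check can also be exercised locally. The sketch below assumes a wrapper shaped like `Go(name, fn)` with a swappable capture hook. `PanicReport` and `capturePanic` are hypothetical names, and whether the real `observability.Go` swallows or re-raises the panic after reporting is not stated in this PR. This version swallows it.

```go
package main

import "fmt"

// PanicReport is a hypothetical record of one recovered panic.
type PanicReport struct {
	Goroutine string // value used for the "goroutine" tag
	Value     any    // whatever was passed to panic()
}

// capturePanic stands in for the Sentry call. A test hook can replace it.
var capturePanic = func(PanicReport) {}

// Go runs fn in a new goroutine. A panic inside fn is recovered and
// reported together with the goroutine's name.
// Assumption: the panic is not re-raised, so the process keeps running.
func Go(name string, fn func()) {
	go func() {
		defer func() {
			if r := recover(); r != nil {
				capturePanic(PanicReport{Goroutine: name, Value: r})
			}
		}()
		fn()
	}()
}

func main() {
	// Test hook: route reports into a channel so the caller can wait for them.
	reports := make(chan PanicReport, 1)
	capturePanic = func(r PanicReport) { reports <- r }

	Go("maintenance-loop", func() { panic("forced by test hook") })

	r := <-reports
	fmt.Printf("goroutine=%s panic=%v\n", r.Goroutine, r.Value)
}
```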

What's still not captured

  • log.Printf / log.Fatalf at arbitrary sites across internal packages (would require either replacing the standard logger or adding targeted CaptureError calls site-by-site — out of scope for this PR; can be added incrementally as we find gaps)
  • Panics in per-request streaming goroutines (runExecStreamQEMU, forwardStdin). Errors returned there already bubble up through the gRPC interceptor, so only panics in those goroutines go unreported.
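
For the first gap, one incremental option is a small helper that keeps the existing log line and also reports the error. The sketch below is only an illustration: `logAndCapture` is a hypothetical name, and `captureError` is a local stand-in for `observability.CaptureError` whose tag format (alternating key/value strings) is assumed.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// captureError is a hypothetical stand-in for observability.CaptureError.
var captureError = func(err error, tags ...string) {}

// errDemo is a sample error used by the demo in main.
var errDemo = errors.New("checkpoint migration failed")

// logAndCapture logs err the way the call site already does and also
// reports it, tagged with the call site's name. A nil err is ignored.
func logAndCapture(site string, err error) {
	if err == nil {
		return
	}
	log.Printf("%s: %v", site, err)
	captureError(err, "site", site)
}

func main() {
	var captured []string
	captureError = func(err error, tags ...string) {
		captured = append(captured, fmt.Sprintf("%v %v", err, tags))
	}

	logAndCapture("worker.migrate", errDemo)
	logAndCapture("worker.migrate", nil) // nothing logged or captured

	fmt.Println(len(captured), "error(s) captured")
}
```

Converting call sites one at a time like this avoids replacing the standard logger, at the cost of touching each site by hand.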

🤖 Generated with Claude Code

The initial integration only captured panics via a top-level defer, but
Go services rarely panic and the codebase handles most errors with
log.Printf, so nothing was reaching Sentry in practice.

Adds three capture layers:

- Echo middleware on both control plane and worker HTTP servers —
  captures panics (before the existing Recover middleware handles them)
  and any 5xx responses, attaching request context.
- Unary + stream gRPC interceptors on the worker — capture server-side
  error codes (Internal, Unknown, Unavailable, etc.) while skipping
  client-side codes (InvalidArgument, NotFound, Canceled, etc.) that
  represent expected flow.
- observability.Go wrapper for background goroutines with Sentry-aware
  panic recovery; applied to the maintenance loop (control plane) and
  UploadBaseImageIfNew / MigrateStaleCheckpoints / RollingUpgrade
  loops (worker).

Also adds observability.CaptureError for targeted error capture, used
in the control plane maintenance loop to report DB recovery failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Apr 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: opensandbox | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Apr 21, 2026 9:37pm

Contributor

@breardon2011 left a comment

Approve

@motatoes merged commit cf606e9 into main on Apr 21, 2026
3 checks passed