Centralise Sentry + observability bootstrap, add deploy tags#372
Conversation
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughCentralizes Sentry and metrics/pprof startup: adds ChangesObservability & Logging Abstractions
Sequence Diagram(s)sequenceDiagram
participant Cmd as cmd/{app,worker,analysis}
participant Logging as internal/logging
participant Observ as internal/observability
participant OTel as OTel_Providers
participant Sentry as Sentry_Backend
Cmd->>Logging: InitSentry(SentryOptions)
Logging->>Sentry: sentry.Init(...)
Logging->>Cmd: return sentryFlush
Cmd->>Observ: StartMetricsServer(MetricsServerOptions)
Observ->>OTel: Init OTel providers
Observ->>Cmd: return MetricsServer (http server if bound)
Cmd->>Cmd: defer sentryFlush() / defer metricsSrv.Shutdown()
Note over Observ,OTel: /metrics and optional /debug/pprof/* served by metrics HTTP server
Cmd->>Sentry: on shutdown -> call sentryFlush()
Cmd->>Observ: on shutdown -> metricsSrv.Shutdown(ctx)
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly Related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Review rate limit: 3/5 reviews remaining, refill in 22 minutes and 28 seconds. Comment |
|
Updates to Preview Branch (work/sentry-tags-bootstrap) ↗︎
Tasks are run on every commit but only new migration files are pushed.
View logs for this Workflow Run ↗︎. |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/observability/metrics_server_test.go (1)
61-92: 💤 Low valueConsider adding a positive test for
EnablePprof: true.The test verifies pprof returns 404 when disabled, but there's no test confirming pprof endpoints return 200 when enabled. This would strengthen coverage of the conditional registration logic.
,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/observability/metrics_server_test.go` around lines 61 - 92, Add a new test mirroring TestStartMetricsServerPprofGated that starts StartMetricsServer with MetricsServerOptions{ServiceName:"hover-test", Environment:"test", MetricsAddress: freePort(t), EnablePprof: true}, wait for the server to come up (same polling logic), then perform an HTTP GET to "http://"+addr+"/debug/pprof/" and assert resp.StatusCode == http.StatusOK (and close resp.Body and shutdown srv in t.Cleanup). Reuse the same helper freePort(t) and deadline/polling pattern from TestStartMetricsServerPprofGated to ensure flakeless startup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/observability/metrics_server_test.go`:
- Around line 61-92: Add a new test mirroring TestStartMetricsServerPprofGated
that starts StartMetricsServer with
MetricsServerOptions{ServiceName:"hover-test", Environment:"test",
MetricsAddress: freePort(t), EnablePprof: true}, wait for the server to come up
(same polling logic), then perform an HTTP GET to "http://"+addr+"/debug/pprof/"
and assert resp.StatusCode == http.StatusOK (and close resp.Body and shutdown
srv in t.Cleanup). Reuse the same helper freePort(t) and deadline/polling
pattern from TestStartMetricsServerPprofGated to ensure flakeless startup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b9cf364d-3dde-440e-a2fb-97d763252079
📒 Files selected for processing (7)
cmd/analysis/main.gocmd/app/main.gocmd/worker/main.gointernal/logging/sentry_init.gointernal/logging/sentry_init_test.gointernal/observability/metrics_server.gointernal/observability/metrics_server_test.go
|
🐝 Review App Deployed Homepage: https://hover-pr-372.fly.dev |
Release VersionsApp patch: ChangelogAdded
Fixed
|
|
🐝 Review App Deployed Homepage: https://hover-pr-372.fly.dev |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/observability/metrics_server_test.go`:
- Around line 41-42: Replace uses of the default http.Get (which can hang) with
a dedicated http.Client that sets a timeout (e.g. &http.Client{Timeout: 5 *
time.Second}) and call client.Get("http://"+addr+"/metrics") instead of
http.Get; update all occurrences in the test (the spots that assign resp, err =
http.Get(...)) and add the time import if missing so the test fails fast on
stalled connections.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d961c47c-7f0b-4bd7-ad6f-1ba0df1ca001
📒 Files selected for processing (2)
internal/logging/sentry_init_test.gointernal/observability/metrics_server_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/logging/sentry_init_test.go
|
🐝 Review App Deployed Homepage: https://hover-pr-372.fly.dev |
|
🐝 Review App Deployed Homepage: https://hover-pr-372.fly.dev |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
internal/logging/sentry_init.go (1)
64-68: ⚡ Quick winFix the comment order to match runtime behaviour.
Line 64 says tags are stamped before delegating, but Line 71 delegates first and stamps afterwards. Please update the comment so future changes don’t accidentally invert the order.
Proposed wording fix
-// wrapBeforeSend stamps deploy-identifying tags directly onto every event -// before delegating to the existing BeforeSend normalisation. The earlier +// wrapBeforeSend delegates to the existing BeforeSend normalisation first, +// then stamps deploy-identifying tags directly onto every non-nil event. The earlier // approach used sentry.ConfigureScope, but staging diagnostics showed scope // tags were not reaching events captured via the sentryslog handler — likely // a goroutine-local hub interaction. Stamping in BeforeSend is unconditional.Also applies to: 70-71
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/logging/sentry_init.go` around lines 64 - 68, The comment for wrapBeforeSend is incorrect about ordering: the function delegates to the existing BeforeSend first and then stamps deploy-identifying tags, but the comment says the reverse; update the comment in internal/logging/sentry_init.go (the wrapBeforeSend / BeforeSend block) to state that it delegates to the existing BeforeSend first and then unconditionally adds/stamps the deploy-identifying tags onto the event so the wording matches the actual runtime behavior.internal/logging/sentry_init_test.go (1)
9-33: ⚡ Quick winAdd an explicit non-overwrite test for deploy tags.
This test proves stamping works, but it doesn’t lock in the “don’t overwrite existing
app/region/processtags” contract. A dedicated case will prevent regressions in tag precedence.Suggested additional test
+func TestWrapBeforeSendPreservesExistingDeployTags(t *testing.T) { + fn := wrapBeforeSend("hover-pr-372", "syd", "worker") + + event := &sentry.Event{ + Message: "test", + Tags: map[string]string{ + "app": "preset-app", + "region": "iad", + "process": "analysis", + }, + } + + got := fn(event, nil) + if got == nil { + t.Fatal("expected non-nil event") + } + if got.Tags["app"] != "preset-app" { + t.Errorf("app overwritten: %q", got.Tags["app"]) + } + if got.Tags["region"] != "iad" { + t.Errorf("region overwritten: %q", got.Tags["region"]) + } + if got.Tags["process"] != "analysis" { + t.Errorf("process overwritten: %q", got.Tags["process"]) + } +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/logging/sentry_init_test.go` around lines 9 - 33, Add a new unit test that verifies wrapBeforeSend does not overwrite existing deploy tags: create an event with Tags already containing "app", "region", and "process" set to distinct values, call fn := wrapBeforeSend("hover-pr-372", "syd", "worker") and invoke fn(event, nil), then assert the returned event is non-nil and that got.Tags["app"], got.Tags["region"], and got.Tags["process"] still equal the original values (and not the new "hover-pr-372"/"syd"/"worker"), while other tags remain present; use wrapBeforeSend and the same sentry.Event shape as in TestWrapBeforeSendStampsTags to locate where to add this test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/logging/sentry_init_test.go`:
- Around line 9-33: Add a new unit test that verifies wrapBeforeSend does not
overwrite existing deploy tags: create an event with Tags already containing
"app", "region", and "process" set to distinct values, call fn :=
wrapBeforeSend("hover-pr-372", "syd", "worker") and invoke fn(event, nil), then
assert the returned event is non-nil and that got.Tags["app"],
got.Tags["region"], and got.Tags["process"] still equal the original values (and
not the new "hover-pr-372"/"syd"/"worker"), while other tags remain present; use
wrapBeforeSend and the same sentry.Event shape as in
TestWrapBeforeSendStampsTags to locate where to add this test.
In `@internal/logging/sentry_init.go`:
- Around line 64-68: The comment for wrapBeforeSend is incorrect about ordering:
the function delegates to the existing BeforeSend first and then stamps
deploy-identifying tags, but the comment says the reverse; update the comment in
internal/logging/sentry_init.go (the wrapBeforeSend / BeforeSend block) to state
that it delegates to the existing BeforeSend first and then unconditionally
adds/stamps the deploy-identifying tags onto the event so the wording matches
the actual runtime behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9c5d5190-063c-497d-bd50-e0d36302351d
📒 Files selected for processing (2)
internal/logging/sentry_init.gointernal/logging/sentry_init_test.go
|
🐝 Review App Deployed Homepage: https://hover-pr-372.fly.dev |
|
🐝 Review App Deployed Homepage: https://hover-pr-372.fly.dev |
Summary
logging.InitSentry+observability.StartMetricsServerso the three Fly mains share one Sentry init and one metrics-server setup.app,region,process,release, andserver_nametags read from Fly env vars — staging events were missing all of these, so we couldn't pin connect-error bursts to a specific review app.cmd/analysis's missing metrics-server graceful shutdown fixed for free.Why
Investigating recurring
*pgconn.ConnectError("tenant/user … not found") spikes on staging review apps. Production has zero of these errors over the same window, but every staging event hasrelease: nulland no app/server tag, so we can't tell whether the bursts are concentrated on one review app, all of them, or correlated with deploys.This PR is the test case: once it lands and a review app is built from it, we run a normal job and check whether the bursts return — and if so, which
app/release/processthey're tagged with.What's in each commit
Centralise Sentry init with deploy tags— newinternal/logging/sentry_init.go+ tests. ReadsFLY_RELEASE_VERSION→FLY_IMAGE_REFforRelease;FLY_MACHINE_ID(with hostname fallback) forServerName; sets scope tagsapp=FLY_APP_NAME,region=FLY_REGION,processfrom caller.Centralise observability metrics server setup— newinternal/observability/metrics_server.go+ tests. WrapsInit+ listener bind + Serve goroutine + graceful shutdown behind a singleShutdown(ctx).EnablePproftoggles the pprof handlers (worker/analysis on, app off — matches existing behaviour).Use bootstrap helpers in cmds— swaps the duplicated boilerplate incmd/app,cmd/worker,cmd/analysisfor the helpers. Also fixes a pre-existing bug wherecmd/analysisstarted a metrics HTTP server but never shut it down.Behaviour preserved
cmd/appkeepsTracesSampleRate(0.1 prod / 1.0 elsewhere) andDebugin development.cmd/appdoes not expose pprof on the metrics port (would sit alongside the public listener).flyio/fly-autoscalerimage and don't emit to our Sentry — untouched.Test plan
go test ./...passesbash scripts/security-check.shpassesapp,process,releasetagspgconn.ConnectErrorbursts return with the new tagging — and if so, which app/release they're attributed toSummary by CodeRabbit