Centralise Sentry + observability bootstrap, add deploy tags by simonsmallchua · Pull Request #372 · Good-Native/hover

simonsmallchua · 2026-05-02T01:15:24Z

Summary

Add logging.InitSentry + observability.StartMetricsServer so the three Fly mains share one Sentry init and one metrics-server setup.
Sentry events now carry app, region, process, release, and server_name tags. All except process are read from Fly env vars; process is passed in by each binary. Staging events were missing all of these, so we couldn't pin connect-error bursts to a specific review app.
Net effect on the three mains: −151 / +46 lines, with cmd/analysis's missing metrics-server graceful shutdown fixed for free.

Why

Investigating recurring *pgconn.ConnectError ("tenant/user … not found") spikes on staging review apps. Production has zero of these errors over the same window, but every staging event has release: null and no app/server tag, so we can't tell whether the bursts are concentrated on one review app, all of them, or correlated with deploys.

This PR is the test case: once it lands and a review app is built from it, we run a normal job and check whether the bursts return — and if so, which app / release / process they're tagged with.

What's in each commit

Centralise Sentry init with deploy tags: new internal/logging/sentry_init.go + tests. Release comes from FLY_RELEASE_VERSION, falling back to FLY_IMAGE_REF; ServerName comes from FLY_MACHINE_ID, falling back to the hostname. Scope tags are app=FLY_APP_NAME, region=FLY_REGION, and process (supplied by the caller).
Centralise observability metrics server setup: new internal/observability/metrics_server.go + tests. Wraps Init, the listener bind, the Serve goroutine, and graceful shutdown behind a single Shutdown(ctx). EnablePprof toggles the pprof handlers (on for worker/analysis, off for app, matching existing behaviour).
Use bootstrap helpers in cmds: swaps the duplicated boilerplate in cmd/app, cmd/worker, and cmd/analysis for the helpers. Also fixes a pre-existing bug where cmd/analysis started a metrics HTTP server but never shut it down.

Behaviour preserved

cmd/app keeps TracesSampleRate (0.1 prod / 1.0 elsewhere) and Debug in development.
cmd/app does not expose pprof on the metrics port (it would sit alongside the public listener).
Worker + analysis keep pprof on the metrics port.
Three Fly autoscaler apps run the third-party flyio/fly-autoscaler image and don't emit to our Sentry — untouched.

Test plan

go test ./... passes
bash scripts/security-check.sh passes
Review app builds and boots (worker + app + analysis)
Run a normal crawl job against the review app
Check Sentry: events from this run carry app, process, release tags
Confirm whether the staging pgconn.ConnectError bursts return with the new tagging — and if so, which app/release they're attributed to

Summary by CodeRabbit

Refactor
- Unified error-tracking initialization and metrics/profiling server startup across services for consistent startup/shutdown behavior.
Bug Fixes
- Metrics HTTP server now shuts down gracefully on termination; observability startup tolerates bind failures.
New Features
- Error-tracking events now include app, process, region, release, and server-name tags; pprof exposure is configurable.
Tests
- Added tests for error-tracking helpers, metrics endpoints, pprof gating, and shutdown idempotency.
Documentation
- Changelog updated to reflect tagging and observability changes.

coderabbitai · 2026-05-02T01:15:36Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 546a1d7d-07c5-4639-b481-b2b9c6bc529d

📥 Commits

Reviewing files that changed from the base of the PR and between d7fca98 and 8c6588d.

📒 Files selected for processing (1)

CHANGELOG.md

📝 Walkthrough

Walkthrough

Centralizes Sentry and metrics/pprof startup: adds logging.InitSentry(...) and observability.StartMetricsServer(...), updates cmd/app, cmd/worker, and cmd/analysis to use them (capturing/deferring returned flush and Shutdown), and removes manual in-file Sentry and HTTP/pprof server wiring.

Changes

Observability & Logging Abstractions

Layer / File(s)	Summary
Data / Config Shape `internal/logging/sentry_init.go`, `internal/observability/metrics_server.go`	Adds `SentryOptions` and `MetricsServerOptions` to encapsulate Sentry and metrics server configuration (DSN, Environment, Process, TracesSampleRate, OTLP endpoint/headers, MetricsAddress, EnablePprof, logger).
Core Implementation (Sentry) `internal/logging/sentry_init.go`	Implements `InitSentry(opts) (func(), error)` that conditionally initializes Sentry, composes `sentry.ClientOptions` (release/server name/stacktrace/debug/BeforeSend), applies Fly-derived tags, optionally sets traces sample rate, and returns a flush closure (no-op when DSN empty). Adds `wrapBeforeSend`, `deployRelease`, and `serverName` helpers.
Core Implementation (Metrics) `internal/observability/metrics_server.go`	Implements `StartMetricsServer(ctx, opts) (MetricsServer, error)` which initializes OTel providers, conditionally constructs an `http.ServeMux` for `/metrics` and gated `/debug/pprof/`, attempts to bind `MetricsAddress`, serves in a goroutine when bound, and exposes `(*MetricsServer).Shutdown`.
Cmd Wiring / Integration `cmd/app/main.go`, `cmd/worker/main.go`, `cmd/analysis/main.go`	Replace inline `sentry-go` init and manual `http`/pprof server wiring with `logging.InitSentry(...)` (capture and defer returned flush) and `observability.StartMetricsServer(...)` (capture server and defer `Shutdown`). Adds `appTracesSampleRate` helper in `cmd/app`. Removes now-unused direct imports (`net`, `net/http`, `net/http/pprof`, `errors`, `github.com/getsentry/sentry-go`).
Tests / Validation `internal/logging/sentry_init_test.go`, `internal/observability/metrics_server_test.go`	Adds tests: Sentry BeforeSend wrapper behavior and init (including DSN no-op), `deployRelease()` precedence, `serverName()` fallback; metrics server tests for `/metrics`, pprof gating, idempotent Shutdown and no-address behavior.
Changelog `CHANGELOG.md`	Documents Sentry tagging (app/region/process/release/server_name) and introduces `logging.InitSentry` and `observability.StartMetricsServer`; notes graceful metrics shutdown fix for `cmd/analysis`.
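The `appTracesSampleRate` helper encodes the rule stated under "Behaviour preserved" (0.1 in prod, 1.0 elsewhere). A sketch, assuming the helper takes the environment name as a string and that production is spelled "production"; both the signature and that string are guesses.

```go
package main

import "fmt"

// appTracesSampleRate samples 10% of transactions in production and all of
// them elsewhere. Hypothetical signature; "production" is an assumed env name.
func appTracesSampleRate(env string) float64 {
	if env == "production" {
		return 0.1
	}
	return 1.0
}

func main() {
	for _, env := range []string{"production", "staging", "development"} {
		fmt.Printf("%s -> %.1f\n", env, appTracesSampleRate(env))
	}
}
```

Keeping this in cmd/app rather than in `logging.InitSentry` leaves the shared helper free of per-binary policy; worker and analysis simply leave the rate unset.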

Sequence Diagram(s)

sequenceDiagram
    participant Cmd as cmd/{app,worker,analysis}
    participant Logging as internal/logging
    participant Observ as internal/observability
    participant OTel as OTel_Providers
    participant Sentry as Sentry_Backend

    Cmd->>Logging: InitSentry(SentryOptions)
    Logging->>Sentry: sentry.Init(...)
    Logging->>Cmd: return sentryFlush
    Cmd->>Observ: StartMetricsServer(MetricsServerOptions)
    Observ->>OTel: Init OTel providers
    Observ->>Cmd: return MetricsServer (http server if bound)
    Cmd->>Cmd: defer sentryFlush() / defer metricsSrv.Shutdown()
    Note over Observ,OTel: /metrics and optional /debug/pprof/* served by metrics HTTP server
    Cmd->>Sentry: on shutdown -> call sentryFlush()
    Cmd->>Observ: on shutdown -> metricsSrv.Shutdown(ctx)

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

Stop worker wedging on log backpressure #351 — touches worker startup observability/pprof and logging shutdown wiring, overlapping the same startup/metrics/pprof code paths.

Poem

🐰 I hopped through code with tidy paws,
I wrapped Sentry, started metrics with applause,
pprof sits gated on a tidy route,
flushes deferred, servers close devout,
a small rabbit hop — the startup hums.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 24.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main changes: centralizing Sentry and observability initialization across multiple entry points, and adding deployment-related tags to Sentry events.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

supabase · 2026-05-02T01:15:37Z

Updates to Preview Branch (work/sentry-tags-bootstrap)

Deployments	Status	Updated
Database	✅	Sat, 02 May 2026 04:42:47 UTC
Services	✅	Sat, 02 May 2026 04:42:47 UTC
APIs	✅	Sat, 02 May 2026 04:42:47 UTC

Tasks are run on every commit but only new migration files are pushed.
Close and reopen this PR if you want to apply changes from existing seed or migration files.

Tasks	Status	Updated
Configurations	✅	Sat, 02 May 2026 04:42:49 UTC
Migrations	✅	Sat, 02 May 2026 04:42:51 UTC
Seeding	✅	Sat, 02 May 2026 04:42:53 UTC
Edge Functions	✅	Sat, 02 May 2026 04:42:53 UTC

codecov · 2026-05-02T01:17:38Z

Codecov Report

❌ Patch coverage is 65.94203% with 47 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
cmd/app/main.go	0.00%	14 Missing ⚠️
cmd/analysis/main.go	0.00%	11 Missing ⚠️
cmd/worker/main.go	0.00%	11 Missing ⚠️
internal/observability/metrics_server.go	82.69%	5 Missing and 4 partials ⚠️
internal/logging/sentry_init.go	96.00%	1 Missing and 1 partial ⚠️

coderabbitai

🧹 Nitpick comments (1)

internal/observability/metrics_server_test.go (1)
61-92: 💤 Low value

Consider adding a positive test for EnablePprof: true.

The test verifies pprof returns 404 when disabled, but there's no test confirming pprof endpoints return 200 when enabled. This would strengthen coverage of the conditional registration logic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/observability/metrics_server_test.go` around lines 61 - 92, Add a
new test mirroring TestStartMetricsServerPprofGated that starts
StartMetricsServer with MetricsServerOptions{ServiceName:"hover-test",
Environment:"test", MetricsAddress: freePort(t), EnablePprof: true}, wait for
the server to come up (same polling logic), then perform an HTTP GET to
"http://"+addr+"/debug/pprof/" and assert resp.StatusCode == http.StatusOK (and
close resp.Body and shutdown srv in t.Cleanup). Reuse the same helper
freePort(t) and deadline/polling pattern from TestStartMetricsServerPprofGated
to ensure flakeless startup.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b9cf364d-3dde-440e-a2fb-97d763252079

📥 Commits

Reviewing files that changed from the base of the PR and between c6288ce and ad54867.

📒 Files selected for processing (7)

cmd/analysis/main.go
cmd/app/main.go
cmd/worker/main.go
internal/logging/sentry_init.go
internal/logging/sentry_init_test.go
internal/observability/metrics_server.go
internal/observability/metrics_server_test.go

github-actions · 2026-05-02T01:19:28Z

🐝 Review App Deployed

Homepage: https://hover-pr-372.fly.dev
Dashboard: https://hover-pr-372.fly.dev/dashboard

github-actions · 2026-05-02T01:20:01Z

Release Versions

App patch: v0.34.1 → v0.34.2

Changelog

Added

Sentry events now carry app, process, region, and server_name tags
identifying the Fly app, binary, region, and machine that emitted them.
Review-app errors previously had none of these, so bursts couldn't be
attributed to a specific deploy. New helpers logging.InitSentry and
observability.StartMetricsServer centralise the duplicated bootstrap across
cmd/app, cmd/worker, and cmd/analysis.

Fixed

cmd/analysis now gracefully shuts down its metrics HTTP server on SIGTERM;
previously it spawned the listener but never called Shutdown.

github-actions · 2026-05-02T01:24:13Z

🐝 Review App Deployed

Homepage: https://hover-pr-372.fly.dev
Dashboard: https://hover-pr-372.fly.dev/dashboard

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/observability/metrics_server_test.go`:
- Around line 41-42: Replace uses of the default http.Get (which can hang) with
a dedicated http.Client that sets a timeout (e.g. &http.Client{Timeout: 5 *
time.Second}) and call client.Get("http://"+addr+"/metrics") instead of
http.Get; update all occurrences in the test (the spots that assign resp, err =
http.Get(...)) and add the time import if missing so the test fails fast on
stalled connections.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d961c47c-7f0b-4bd7-ad6f-1ba0df1ca001

📥 Commits

Reviewing files that changed from the base of the PR and between d619dc2 and 01d3ed9.

📒 Files selected for processing (2)

internal/logging/sentry_init_test.go
internal/observability/metrics_server_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

internal/logging/sentry_init_test.go

github-actions · 2026-05-02T01:28:17Z

🐝 Review App Deployed

Homepage: https://hover-pr-372.fly.dev
Dashboard: https://hover-pr-372.fly.dev/dashboard

github-actions · 2026-05-02T01:33:36Z

🐝 Review App Deployed

Homepage: https://hover-pr-372.fly.dev
Dashboard: https://hover-pr-372.fly.dev/dashboard

coderabbitai

🧹 Nitpick comments (2)

internal/logging/sentry_init.go (1)

64-68: ⚡ Quick win

Fix the comment order to match runtime behaviour.

Line 64 says tags are stamped before delegating, but Line 71 delegates first and stamps afterwards. Please update the comment so future changes don’t accidentally invert the order.

Proposed wording fix

-// wrapBeforeSend stamps deploy-identifying tags directly onto every event
-// before delegating to the existing BeforeSend normalisation. The earlier
+// wrapBeforeSend delegates to the existing BeforeSend normalisation first,
+// then stamps deploy-identifying tags directly onto every non-nil event. The earlier
 // approach used sentry.ConfigureScope, but staging diagnostics showed scope
 // tags were not reaching events captured via the sentryslog handler — likely
 // a goroutine-local hub interaction. Stamping in BeforeSend is unconditional.

Also applies to: 70-71

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/logging/sentry_init.go` around lines 64 - 68, The comment for
wrapBeforeSend is incorrect about ordering: the function delegates to the
existing BeforeSend first and then stamps deploy-identifying tags, but the
comment says the reverse; update the comment in internal/logging/sentry_init.go
(the wrapBeforeSend / BeforeSend block) to state that it delegates to the
existing BeforeSend first and then unconditionally adds/stamps the
deploy-identifying tags onto the event so the wording matches the actual runtime
behavior.
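A sketch of the ordering the corrected comment describes: delegate first, then stamp onto non-nil events without overwriting existing tags (the contract the next nitpick asks to lock in). `Event`, `EventHint`, and `beforeSend` are hypothetical stand-ins so the example compiles without sentry-go; the three-argument `wrapBeforeSend` call shape follows the suggested test.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for sentry.Event, trimmed to the fields used here.
type Event struct {
	Message string
	Tags    map[string]string
}

// EventHint stands in for sentry.EventHint.
type EventHint struct{}

// beforeSend stands in for the existing normalisation hook.
// For illustration only, it drops events with an empty message.
func beforeSend(ev *Event, _ *EventHint) *Event {
	if ev == nil || ev.Message == "" {
		return nil
	}
	return ev
}

// wrapBeforeSend delegates to beforeSend first, then stamps deploy tags onto
// every non-nil event, leaving any tag that is already set untouched.
func wrapBeforeSend(app, region, process string) func(*Event, *EventHint) *Event {
	stamps := map[string]string{"app": app, "region": region, "process": process}
	return func(ev *Event, hint *EventHint) *Event {
		ev = beforeSend(ev, hint)
		if ev == nil {
			return nil // dropped upstream: nothing to stamp
		}
		if ev.Tags == nil {
			ev.Tags = make(map[string]string, len(stamps))
		}
		for k, v := range stamps {
			if _, exists := ev.Tags[k]; !exists {
				ev.Tags[k] = v
			}
		}
		return ev
	}
}

func main() {
	fn := wrapBeforeSend("hover-pr-372", "syd", "worker")
	ev := fn(&Event{Message: "boom", Tags: map[string]string{"region": "iad"}}, nil)
	fmt.Println(ev.Tags["app"], ev.Tags["region"], ev.Tags["process"])
	// prints: hover-pr-372 iad worker
}
```

Delegating first means a hook that drops an event short-circuits before any stamping work, and stamping last guarantees the deploy tags survive whatever the normalisation step does to the tag map.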

internal/logging/sentry_init_test.go (1)

9-33: ⚡ Quick win

Add an explicit non-overwrite test for deploy tags.

This test proves stamping works, but it doesn’t lock in the “don’t overwrite existing app/region/process tags” contract. A dedicated case will prevent regressions in tag precedence.

Suggested additional test

+func TestWrapBeforeSendPreservesExistingDeployTags(t *testing.T) {
+	fn := wrapBeforeSend("hover-pr-372", "syd", "worker")
+
+	event := &sentry.Event{
+		Message: "test",
+		Tags: map[string]string{
+			"app":     "preset-app",
+			"region":  "iad",
+			"process": "analysis",
+		},
+	}
+
+	got := fn(event, nil)
+	if got == nil {
+		t.Fatal("expected non-nil event")
+	}
+	if got.Tags["app"] != "preset-app" {
+		t.Errorf("app overwritten: %q", got.Tags["app"])
+	}
+	if got.Tags["region"] != "iad" {
+		t.Errorf("region overwritten: %q", got.Tags["region"])
+	}
+	if got.Tags["process"] != "analysis" {
+		t.Errorf("process overwritten: %q", got.Tags["process"])
+	}
+}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/logging/sentry_init_test.go` around lines 9 - 33, Add a new unit
test that verifies wrapBeforeSend does not overwrite existing deploy tags:
create an event with Tags already containing "app", "region", and "process" set
to distinct values, call fn := wrapBeforeSend("hover-pr-372", "syd", "worker")
and invoke fn(event, nil), then assert the returned event is non-nil and that
got.Tags["app"], got.Tags["region"], and got.Tags["process"] still equal the
original values (and not the new "hover-pr-372"/"syd"/"worker"), while other
tags remain present; use wrapBeforeSend and the same sentry.Event shape as in
TestWrapBeforeSendStampsTags to locate where to add this test.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9c5d5190-063c-497d-bd50-e0d36302351d

📥 Commits

Reviewing files that changed from the base of the PR and between 3290e36 and 85c4119.

📒 Files selected for processing (2)

internal/logging/sentry_init.go
internal/logging/sentry_init_test.go

github-actions · 2026-05-02T02:15:34Z

🐝 Review App Deployed

Homepage: https://hover-pr-372.fly.dev
Dashboard: https://hover-pr-372.fly.dev/dashboard

github-actions · 2026-05-02T04:25:18Z

🐝 Review App Deployed

Homepage: https://hover-pr-372.fly.dev
Dashboard: https://hover-pr-372.fly.dev/dashboard

simonsmallchua added 3 commits May 2, 2026 11:14

Centralise Sentry init with deploy tags

744f992

Centralise observability metrics server setup

60082cb

Use bootstrap helpers in cmds

ad54867

coderabbitai Bot reviewed May 2, 2026

Update changelog for Sentry tags

d619dc2

Cover sentry init and pprof enabled

01d3ed9

coderabbitai Bot reviewed May 2, 2026

Add timeout to metrics server tests

3290e36

Stamp deploy tags in Sentry BeforeSend

85c4119

coderabbitai Bot reviewed May 2, 2026

Address review nits on tag stamping

d7fca98

Tighten changelog for Sentry tags

8c6588d

simonsmallchua merged commit 84f2bd8 into main May 2, 2026
10 of 11 checks passed

simonsmallchua deleted the work/sentry-tags-bootstrap branch May 2, 2026 04:43

coderabbitai Bot mentioned this pull request May 9, 2026

Expand Sentry coverage and tracing #380

Merged

9 tasks