
Centralise Sentry + observability bootstrap, add deploy tags #372

Merged
simonsmallchua merged 9 commits into main from work/sentry-tags-bootstrap
May 2, 2026

Conversation


@simonsmallchua simonsmallchua commented May 2, 2026

Summary

  • Add logging.InitSentry + observability.StartMetricsServer so the three Fly mains share one Sentry init and one metrics-server setup.
  • Sentry events now carry app, region, process, release, and server_name tags read from Fly env vars — staging events were missing all of these, so we couldn't pin connect-error bursts to a specific review app.
  • Net effect on the three mains: −151 / +46 lines, with cmd/analysis's missing metrics-server graceful shutdown fixed for free.

Why

Investigating recurring *pgconn.ConnectError ("tenant/user … not found") spikes on staging review apps. Production has zero of these errors over the same window, but every staging event has release: null and no app/server tag, so we can't tell whether the bursts are concentrated on one review app, all of them, or correlated with deploys.

This PR is the test case: once it lands and a review app is built from it, we run a normal job and check whether the bursts return — and if so, which app / release / process they're tagged with.

What's in each commit

  1. Centralise Sentry init with deploy tags — new internal/logging/sentry_init.go + tests. Reads FLY_RELEASE_VERSION or FLY_IMAGE_REF for Release; FLY_MACHINE_ID (with hostname fallback) for ServerName; sets scope tags app=FLY_APP_NAME, region=FLY_REGION, process from caller.
  2. Centralise observability metrics server setup — new internal/observability/metrics_server.go + tests. Wraps Init + listener bind + Serve goroutine + graceful shutdown behind a single Shutdown(ctx). EnablePprof toggles the pprof handlers (worker/analysis on, app off — matches existing behaviour).
  3. Use bootstrap helpers in cmds — swaps the duplicated boilerplate in cmd/app, cmd/worker, cmd/analysis for the helpers. Also fixes a pre-existing bug where cmd/analysis started a metrics HTTP server but never shut it down.
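
The env-var lookups in commit 1 can be sketched roughly as follows. This is an illustration, not the code in internal/logging/sentry_init.go: the injected getenv and hostname parameters are hypothetical (the walkthrough lists the real helpers as deployRelease() and serverName()), and the precedence of FLY_RELEASE_VERSION over FLY_IMAGE_REF is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// deployRelease picks the value used for the Sentry Release field.
// Assumption: FLY_RELEASE_VERSION wins when set, otherwise FLY_IMAGE_REF.
func deployRelease(getenv func(string) string) string {
	if v := getenv("FLY_RELEASE_VERSION"); v != "" {
		return v
	}
	return getenv("FLY_IMAGE_REF")
}

// serverName prefers FLY_MACHINE_ID and falls back to the hostname.
// It returns "" when neither is available.
func serverName(getenv func(string) string, hostname func() (string, error)) string {
	if id := getenv("FLY_MACHINE_ID"); id != "" {
		return id
	}
	if h, err := hostname(); err == nil {
		return h
	}
	return ""
}

func main() {
	// Outside Fly these variables are usually unset, so release is empty
	// and server_name falls back to the local hostname.
	fmt.Println("release:", deployRelease(os.Getenv))
	fmt.Println("server_name:", serverName(os.Getenv, os.Hostname))
}
```

Passing the lookups in as functions is only there to keep the sketch testable without mutating the process environment.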

Behaviour preserved

  • cmd/app keeps TracesSampleRate (0.1 prod / 1.0 elsewhere) and Debug in development.
  • cmd/app does not expose pprof on the metrics port (would sit alongside the public listener).
  • Worker + analysis keep pprof on the metrics port.
  • Three Fly autoscaler apps run the third-party flyio/fly-autoscaler image and don't emit to our Sentry — untouched.
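
A minimal sketch of the sample-rate rule preserved for cmd/app, assuming the environment is passed as a string and that production is spelled "production" (neither detail is shown in this thread; the real appTracesSampleRate helper may take different input):

```go
package main

import "fmt"

// appTracesSampleRate returns 0.1 in production and 1.0 everywhere else.
// The "production" literal is an assumption made for this sketch.
func appTracesSampleRate(environment string) float64 {
	if environment == "production" {
		return 0.1
	}
	return 1.0
}

func main() {
	for _, env := range []string{"production", "staging", "development"} {
		fmt.Println(env, appTracesSampleRate(env))
	}
}
```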

Test plan

  • go test ./... passes
  • bash scripts/security-check.sh passes
  • Review app builds and boots (worker + app + analysis)
  • Run a normal crawl job against the review app
  • Check Sentry: events from this run carry app, process, release tags
  • Confirm whether the staging pgconn.ConnectError bursts return with the new tagging — and if so, which app/release they're attributed to

Summary by CodeRabbit

  • Refactor
    • Unified error-tracking initialization and metrics/profiling server startup across services for consistent startup/shutdown behavior.
  • Bug Fixes
    • Metrics HTTP server now shuts down gracefully on termination; observability startup tolerates bind failures.
  • New Features
    • Error-tracking events now include app, process, region, release, and server-name tags; pprof exposure is configurable.
  • Tests
    • Added tests for error-tracking helpers, metrics endpoints, pprof gating, and shutdown idempotency.
  • Documentation
    • Changelog updated to reflect tagging and observability changes.


coderabbitai Bot commented May 2, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 546a1d7d-07c5-4639-b481-b2b9c6bc529d

📥 Commits

Reviewing files that changed from the base of the PR and between d7fca98 and 8c6588d.

📒 Files selected for processing (1)
  • CHANGELOG.md

📝 Walkthrough


Centralizes Sentry and metrics/pprof startup: adds logging.InitSentry(...) and observability.StartMetricsServer(...), updates cmd/app, cmd/worker, and cmd/analysis to use them (capturing/deferring returned flush and Shutdown), and removes manual in-file Sentry and HTTP/pprof server wiring.

Changes

Observability & Logging Abstractions

| Layer / File(s) | Summary |
| --- | --- |
| Data / Config Shape: internal/logging/sentry_init.go, internal/observability/metrics_server.go | Adds SentryOptions and MetricsServerOptions to encapsulate Sentry and metrics server configuration (DSN, Environment, Process, TracesSampleRate, OTLP endpoint/headers, MetricsAddress, EnablePprof, logger). |
| Core Implementation (Sentry): internal/logging/sentry_init.go | Implements InitSentry(opts) (func(), error) that conditionally initializes Sentry, composes sentry.ClientOptions (release/server name/stacktrace/debug/BeforeSend), applies Fly-derived tags, optionally sets traces sample rate, and returns a flush closure (no-op when DSN empty). Adds wrapBeforeSend, deployRelease, and serverName helpers. |
| Core Implementation (Metrics): internal/observability/metrics_server.go | Implements StartMetricsServer(ctx, opts) (*MetricsServer, error) which initializes OTel providers, conditionally constructs an http.ServeMux for /metrics and gated /debug/pprof/*, attempts to bind MetricsAddress, serves in a goroutine when bound, and exposes (*MetricsServer).Shutdown. |
| Cmd Wiring / Integration: cmd/app/main.go, cmd/worker/main.go, cmd/analysis/main.go | Replace inline sentry-go init and manual http/pprof server wiring with logging.InitSentry(...) (capture and defer returned flush) and observability.StartMetricsServer(...) (capture server and defer Shutdown). Adds appTracesSampleRate helper in cmd/app. Removes now-unused direct imports (net, net/http, net/http/pprof, errors, github.com/getsentry/sentry-go). |
| Tests / Validation: internal/logging/sentry_init_test.go, internal/observability/metrics_server_test.go | Adds tests: Sentry BeforeSend wrapper behavior and init (including DSN no-op), deployRelease() precedence, serverName() fallback; metrics server tests for /metrics, pprof gating, idempotent Shutdown and no-address behavior. |
| Changelog: CHANGELOG.md | Documents Sentry tagging (app/region/process/release/server_name) and introduces logging.InitSentry and observability.StartMetricsServer; notes graceful metrics shutdown fix for cmd/analysis. |

Sequence Diagram(s)

sequenceDiagram
    participant Cmd as cmd/{app,worker,analysis}
    participant Logging as internal/logging
    participant Observ as internal/observability
    participant OTel as OTel_Providers
    participant Sentry as Sentry_Backend

    Cmd->>Logging: InitSentry(SentryOptions)
    Logging->>Sentry: sentry.Init(...)
    Logging->>Cmd: return sentryFlush
    Cmd->>Observ: StartMetricsServer(MetricsServerOptions)
    Observ->>OTel: Init OTel providers
    Observ->>Cmd: return MetricsServer (http server if bound)
    Cmd->>Cmd: defer sentryFlush() / defer metricsSrv.Shutdown()
    Note over Observ,OTel: /metrics and optional /debug/pprof/* served by metrics HTTP server
    Cmd->>Sentry: on shutdown -> call sentryFlush()
    Cmd->>Observ: on shutdown -> metricsSrv.Shutdown(ctx)

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

Poem

🐰 I hopped through code with tidy paws,
I wrapped Sentry, started metrics with applause,
pprof sits gated on a tidy route,
flushes deferred, servers close devout,
a small rabbit hop — the startup hums.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 24.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main changes: centralizing Sentry and observability initialization across multiple entry points, and adding deployment-related tags to Sentry events. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



supabase Bot commented May 2, 2026

Updates to Preview Branch (work/sentry-tags-bootstrap) ↗︎

Deployments Status Updated
Database Sat, 02 May 2026 04:42:47 UTC
Services Sat, 02 May 2026 04:42:47 UTC
APIs Sat, 02 May 2026 04:42:47 UTC

Tasks are run on every commit but only new migration files are pushed.
Close and reopen this PR if you want to apply changes from existing seed or migration files.

Tasks Status Updated
Configurations Sat, 02 May 2026 04:42:49 UTC
Migrations Sat, 02 May 2026 04:42:51 UTC
Seeding Sat, 02 May 2026 04:42:53 UTC
Edge Functions Sat, 02 May 2026 04:42:53 UTC

View logs for this Workflow Run ↗︎.
Learn more about Supabase for Git ↗︎.


codecov Bot commented May 2, 2026

Codecov Report

❌ Patch coverage is 65.94203% with 47 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| cmd/app/main.go | 0.00% | 14 Missing ⚠️ |
| cmd/analysis/main.go | 0.00% | 11 Missing ⚠️ |
| cmd/worker/main.go | 0.00% | 11 Missing ⚠️ |
| internal/observability/metrics_server.go | 82.69% | 5 Missing and 4 partials ⚠️ |
| internal/logging/sentry_init.go | 96.00% | 1 Missing and 1 partial ⚠️ |



@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
internal/observability/metrics_server_test.go (1)

61-92: 💤 Low value

Consider adding a positive test for EnablePprof: true.

The test verifies pprof returns 404 when disabled, but there's no test confirming pprof endpoints return 200 when enabled. This would strengthen coverage of the conditional registration logic.


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/observability/metrics_server_test.go` around lines 61 - 92, Add a
new test mirroring TestStartMetricsServerPprofGated that starts
StartMetricsServer with MetricsServerOptions{ServiceName:"hover-test",
Environment:"test", MetricsAddress: freePort(t), EnablePprof: true}, wait for
the server to come up (same polling logic), then perform an HTTP GET to
"http://"+addr+"/debug/pprof/" and assert resp.StatusCode == http.StatusOK (and
close resp.Body and shutdown srv in t.Cleanup). Reuse the same helper
freePort(t) and deadline/polling pattern from TestStartMetricsServerPprofGated
to ensure flakeless startup.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b9cf364d-3dde-440e-a2fb-97d763252079

📥 Commits

Reviewing files that changed from the base of the PR and between c6288ce and ad54867.

📒 Files selected for processing (7)
  • cmd/analysis/main.go
  • cmd/app/main.go
  • cmd/worker/main.go
  • internal/logging/sentry_init.go
  • internal/logging/sentry_init_test.go
  • internal/observability/metrics_server.go
  • internal/observability/metrics_server_test.go


github-actions Bot commented May 2, 2026

🐝 Review App Deployed

Homepage: https://hover-pr-372.fly.dev
Dashboard: https://hover-pr-372.fly.dev/dashboard


github-actions Bot commented May 2, 2026

Release Versions

App patch: v0.34.1 → v0.34.2

Changelog

Added

  • Sentry events now carry app, process, region, and server_name tags
    identifying the Fly app, binary, region, and machine that emitted them.
    Review-app errors previously had none of these, so bursts couldn't be
    attributed to a specific deploy. New helpers logging.InitSentry and
    observability.StartMetricsServer centralise the duplicated bootstrap across
    cmd/app, cmd/worker, and cmd/analysis.

Fixed

  • cmd/analysis now gracefully shuts down its metrics HTTP server on SIGTERM;
    previously it spawned the listener but never called Shutdown.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/observability/metrics_server_test.go`:
- Around line 41-42: Replace uses of the default http.Get (which can hang) with
a dedicated http.Client that sets a timeout (e.g. &http.Client{Timeout: 5 *
time.Second}) and call client.Get("http://"+addr+"/metrics") instead of
http.Get; update all occurrences in the test (the spots that assign resp, err =
http.Get(...)) and add the time import if missing so the test fails fast on
stalled connections.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d961c47c-7f0b-4bd7-ad6f-1ba0df1ca001

📥 Commits

Reviewing files that changed from the base of the PR and between d619dc2 and 01d3ed9.

📒 Files selected for processing (2)
  • internal/logging/sentry_init_test.go
  • internal/observability/metrics_server_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/logging/sentry_init_test.go

Comment thread internal/observability/metrics_server_test.go Outdated


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
internal/logging/sentry_init.go (1)

64-68: ⚡ Quick win

Fix the comment order to match runtime behaviour.

Line 64 says tags are stamped before delegating, but Line 71 delegates first and stamps afterwards. Please update the comment so future changes don’t accidentally invert the order.

Proposed wording fix
-// wrapBeforeSend stamps deploy-identifying tags directly onto every event
-// before delegating to the existing BeforeSend normalisation. The earlier
+// wrapBeforeSend delegates to the existing BeforeSend normalisation first,
+// then stamps deploy-identifying tags directly onto every non-nil event. The earlier
 // approach used sentry.ConfigureScope, but staging diagnostics showed scope
 // tags were not reaching events captured via the sentryslog handler — likely
 // a goroutine-local hub interaction. Stamping in BeforeSend is unconditional.

Also applies to: 70-71

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/logging/sentry_init.go` around lines 64 - 68, The comment for
wrapBeforeSend is incorrect about ordering: the function delegates to the
existing BeforeSend first and then stamps deploy-identifying tags, but the
comment says the reverse; update the comment in internal/logging/sentry_init.go
(the wrapBeforeSend / BeforeSend block) to state that it delegates to the
existing BeforeSend first and then unconditionally adds/stamps the
deploy-identifying tags onto the event so the wording matches the actual runtime
behavior.
internal/logging/sentry_init_test.go (1)

9-33: ⚡ Quick win

Add an explicit non-overwrite test for deploy tags.

This test proves stamping works, but it doesn’t lock in the “don’t overwrite existing app/region/process tags” contract. A dedicated case will prevent regressions in tag precedence.

Suggested additional test
+func TestWrapBeforeSendPreservesExistingDeployTags(t *testing.T) {
+	fn := wrapBeforeSend("hover-pr-372", "syd", "worker")
+
+	event := &sentry.Event{
+		Message: "test",
+		Tags: map[string]string{
+			"app":     "preset-app",
+			"region":  "iad",
+			"process": "analysis",
+		},
+	}
+
+	got := fn(event, nil)
+	if got == nil {
+		t.Fatal("expected non-nil event")
+	}
+	if got.Tags["app"] != "preset-app" {
+		t.Errorf("app overwritten: %q", got.Tags["app"])
+	}
+	if got.Tags["region"] != "iad" {
+		t.Errorf("region overwritten: %q", got.Tags["region"])
+	}
+	if got.Tags["process"] != "analysis" {
+		t.Errorf("process overwritten: %q", got.Tags["process"])
+	}
+}
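
For readers without the source open, the ordering and the non-overwrite contract discussed in these two nitpicks can be sketched as follows. Event is a local stand-in for sentry.Event, and the extra next parameter (the existing BeforeSend) is hypothetical; the real wrapBeforeSend takes only app, region, and process, and its body is not shown in this thread.

```go
package main

import "fmt"

// Event is a local stand-in for sentry.Event with only the fields used here.
type Event struct {
	Message string
	Tags    map[string]string
}

// wrapBeforeSend delegates to next first, then stamps deploy tags onto any
// event that survives, leaving tags that are already set untouched.
func wrapBeforeSend(app, region, process string, next func(*Event) *Event) func(*Event) *Event {
	deploy := [][2]string{{"app", app}, {"region", region}, {"process", process}}
	return func(e *Event) *Event {
		if next != nil {
			e = next(e)
		}
		if e == nil {
			return nil // dropped by the delegate, nothing to stamp
		}
		if e.Tags == nil {
			e.Tags = make(map[string]string)
		}
		for _, kv := range deploy {
			if _, exists := e.Tags[kv[0]]; !exists {
				e.Tags[kv[0]] = kv[1]
			}
		}
		return e
	}
}

func main() {
	fn := wrapBeforeSend("hover-pr-372", "syd", "worker", nil)
	got := fn(&Event{Message: "test", Tags: map[string]string{"region": "iad"}})
	// prints: hover-pr-372 iad worker
	fmt.Println(got.Tags["app"], got.Tags["region"], got.Tags["process"])
}
```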
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/logging/sentry_init_test.go` around lines 9 - 33, Add a new unit
test that verifies wrapBeforeSend does not overwrite existing deploy tags:
create an event with Tags already containing "app", "region", and "process" set
to distinct values, call fn := wrapBeforeSend("hover-pr-372", "syd", "worker")
and invoke fn(event, nil), then assert the returned event is non-nil and that
got.Tags["app"], got.Tags["region"], and got.Tags["process"] still equal the
original values (and not the new "hover-pr-372"/"syd"/"worker"), while other
tags remain present; use wrapBeforeSend and the same sentry.Event shape as in
TestWrapBeforeSendStampsTags to locate where to add this test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9c5d5190-063c-497d-bd50-e0d36302351d

📥 Commits

Reviewing files that changed from the base of the PR and between 3290e36 and 85c4119.

📒 Files selected for processing (2)
  • internal/logging/sentry_init.go
  • internal/logging/sentry_init_test.go


@simonsmallchua simonsmallchua merged commit 84f2bd8 into main May 2, 2026
10 of 11 checks passed
@simonsmallchua simonsmallchua deleted the work/sentry-tags-bootstrap branch May 2, 2026 04:43
@coderabbitai coderabbitai Bot mentioned this pull request May 9, 2026
9 tasks