fix(security): gate pprof behind ARC_DEBUG_PPROF on a localhost listener + anchor + normalise PublicPrefixes match#443
Conversation
…ner + anchor + normalise PublicPrefixes match Closes audit finding #2 from the 2026-05-19 external report (Alex Manson / @NeuroWinter). Pre-26.06.1 internal/api/server.go called app.Use(pprof.New()) unconditionally on the public Fiber app, and cmd/arc/main.go added /debug/pprof to the auth middleware's PublicPrefixes list. Any network-reachable caller — no token, no auth — could fetch heap dumps (leaking in-flight SQL strings, msgpack records, cached *TokenInfo), leak goroutine stacks, or pin a CPU core via /debug/pprof/profile?seconds=N. This PR removes pprof from the public Fiber app entirely. The endpoints are now opt-in via ARC_DEBUG_PPROF=1 on a separate listener bound to 127.0.0.1:6060 by default. ARC_DEBUG_PPROF_ADDR overrides the bind; a non-loopback override additionally requires ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1 so a single typo cannot expose pprof to the network. The listener uses a private *http.ServeMux (not http.DefaultServeMux), shuts down via srv.Close() instead of srv.Shutdown(ctx) so a long /debug/pprof/profile?seconds=N capture cannot starve the coordinator's 30s shutdown budget, and bounds slow-client attacks via ReadHeaderTimeout/WriteTimeout/IdleTimeout. Bundled defense-in-depth fix to the auth middleware's PublicPrefixes matcher (same shape as gemini-flagged deniedRoots gap on PR #442): - Anchored match: requires exact-equal or true-subdirectory (prefix + "/"). Sibling paths like /metricsX, /metrics-secret no longer slip through. - path.Clean normalisation before match: non-canonical request shapes like /metrics//foo, /metrics/./x, /metrics/../sensitive are normalised to their canonical form before the bypass branch checks them. Without normalisation a request for /metrics/../api/v1/query would lexically pass the anchored HasPrefix check; after normalisation it becomes /api/v1/query and correctly requires auth. - Empty-prefix guard: skips empty entries so a future config bug that lands "" in PublicPrefixes can't open the whole API. With /debug/pprof removed from the public prefix list, the matcher changes are not currently reachable for any production route — they're guards for any prefix added in the future. Tests added: - TestServer_PprofNotRegisteredOnPublicApp (12 pprof paths against the public Fiber app, all must 404). - TestMiddleware_PublicPrefixes_AnchoredMatch (10 subtests covering exact/trailing-slash/subdir bypass + sibling-byte-prefix shapes + parent-traversal escape shapes + empty-prefix guard). - TestStartDebugPprofIfEnabled_NoopWhenDisabled, _BindsAndServes, TestIsTruthy, TestIsLoopbackBindAddr. Reported by Alex Manson (@NeuroWinter, https://neurowinter.com/). Three rounds of internal multi-agent review caught: shutdown DoS chain via long pprof captures starving downstream shutdown hooks (HIGH), un-normalised c.Path() admitting double-slash / dot-segment / parent- traversal into the bypass branch (HIGH), factually wrong DefaultServeMux comment, duplicate component logger field, listener race window between RegisterHook and goroutine schedule, missing WriteTimeout/IdleTimeout, non-loopback bind needing two opt-ins, empty-string PublicPrefixes guard, builtin min vs reinvented minInt, duplicate /metrics append, stale 30-second http-server comment. All addressed.
There was a problem hiding this comment.
Code Review
This pull request enhances security by moving Go pprof endpoints to an opt-in localhost listener and implementing path normalization and anchored prefix matching in the authentication middleware. Reviewer feedback identifies discrepancies in the release notes regarding log levels and test counts, and suggests simplifying the pprof listener by removing redundant handler registrations.
|
|
||
| - **Default behavior changed**: pprof is no longer reachable on Arc's API port (`:8000` by default). Existing deployments that relied on `curl http://arc:8000/debug/pprof/heap` will start getting `404 Not Found`. Set `ARC_DEBUG_PPROF=1` and reach pprof on `127.0.0.1:6060` instead. | ||
| - **`/metrics` is unchanged**: Prometheus scrapers continue to work as before. Only `/debug/pprof/*` moved. | ||
| - **Override-only-for-explicit-opt-in**: `ARC_DEBUG_PPROF_ADDR` accepts any bind string Go's `net.Listen` understands. Setting it to `0.0.0.0:6060` is the operator opting into cross-host pprof exposure — Arc logs a Warn at startup naming the bound address so the choice is visible in logs. |
There was a problem hiding this comment.
The release note states that Arc logs a Warn for non-loopback pprof binds. However, the implementation in cmd/arc/debug_pprof.go upgrades the log level to Error for non-loopback addresses to increase visibility. Please update the release note to reflect that an Error is logged in this case for accuracy.
|
|
||
| With `/debug/pprof` removed from the public prefix list, items 1 and 2 are not currently reachable for any production route — the fixes are guards for any prefix added in the future. | ||
|
|
||
| Tests added: `TestServer_PprofNotRegisteredOnPublicApp` (12 pprof paths against the public Fiber app, all must 404), `TestMiddleware_PublicPrefixes_AnchoredMatch` (7 paths covering exact match, true subdirectories, and three sibling-shaped paths that must NOT bypass auth), plus the new `cmd/arc/debug_pprof_test.go` (no-op when disabled, binds-and-serves when enabled, isTruthy contract). |
There was a problem hiding this comment.
The description of TestMiddleware_PublicPrefixes_AnchoredMatch mentions that it covers 7 paths. The test implementation is more comprehensive and includes 10 sub-tests, covering important cases like path traversal (/metrics/../sensitive) and the empty-prefix guard. To accurately reflect the test's thoroughness, consider updating this description.
| mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline) | ||
| mux.HandleFunc("/debug/pprof/profile", pprof.Profile) | ||
| mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol) | ||
| mux.HandleFunc("/debug/pprof/trace", pprof.Trace) |
There was a problem hiding this comment.
There was a problem hiding this comment.
Verified empirically that pprof.Index does NOT cover cmdline/profile/symbol/trace. Index dispatches via runtime/pprof.Lookup, which contains only the runtime profiles (heap, goroutine, allocs, mutex, block, threadcreate). The four special endpoints are separate functions in net/http/pprof and require their own registrations.
Test with just pprof.Index registered (Gemini's suggestion):
/debug/pprof/cmdline -> 404 body: Unknown profile
/debug/pprof/profile?... -> 404 body: Unknown profile
/debug/pprof/symbol -> 404 body: Unknown profile
/debug/pprof/trace?... -> 404 body: Unknown profile
/debug/pprof/heap -> 200
/debug/pprof/goroutine -> 200
Source: net/http/pprof/pprof.go — Index extracts name after /debug/pprof/ and calls handler(name).ServeHTTP, which calls pprof.Lookup(name) and 404s on the four non-runtime-profile names.
Keeping the explicit registrations. Strengthened TestStartDebugPprofIfEnabled_BindsAndServes in commit fd781a0 to hit /debug/pprof/cmdline and /debug/pprof/symbol so a future contributor accepting the same suggestion would catch the regression in CI.
…against the gemini-r3 suggestion Addresses Gemini Code Assist review on PR #443: - G1 (release notes:55): said pprof logs a "Warn" for non-loopback binds, but the code upgrades to Error AND requires the second opt-in env var ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1. Drift introduced by my own earlier fixup; release notes now match. - G2 (release notes:65): said the anchored-match test covers "7 paths" but the actual test has 10 subtests (expanded during the review fixup). Updated. - G3 (debug_pprof.go:104): Gemini suggested removing the explicit HandleFunc registrations for /debug/pprof/{cmdline,profile,symbol, trace}, claiming pprof.Index covers them. Empirically refuted: pprof.Index dispatches via runtime/pprof.Lookup, which contains only the runtime profiles (heap, goroutine, allocs, etc.). The four special endpoints are separate functions in net/http/pprof and require their own registrations — without them, requests return 404 "Unknown profile". Strengthened TestStartDebugPprofIfEnabled_BindsAndServes to hit /debug/pprof/cmdline and /debug/pprof/symbol so a future contributor accepting the same suggestion blindly would catch the regression in CI. (profile and trace skipped because they actively capture data over seconds.) Reply posted on the Gemini comment thread with the empirical evidence.
|
@gemini-code-assist please review |
There was a problem hiding this comment.
Code Review
This pull request enhances security by removing pprof endpoints from the public API and relocating them to an opt-in, localhost-bound listener. It also strengthens the authentication middleware by introducing path normalization and anchored prefix matching to mitigate risks from path traversal and prefix-based auth bypasses. Feedback was provided regarding a logic flaw in the anchored matching implementation where prefixes with trailing slashes would fail to match subdirectories, along with a suggested fix to handle these cases correctly.
| // Anchor the match: require exact path or true subdirectory | ||
| // (`prefix + "/"`). Without this, a configured prefix `/metrics` | ||
| // also matches `/metricsX`, `/metrics-secret`, etc. — any route | ||
| // happening to share the same byte prefix would silently bypass | ||
| // auth. Matches the same shape gemini-code-assist flagged on | ||
| // PR #442's deniedRoots check. | ||
| if reqPath == prefix || strings.HasPrefix(reqPath, prefix+"/") { | ||
| return c.Next() | ||
| } |
There was a problem hiding this comment.
The current logic for anchored prefix matching doesn't correctly handle public prefixes that have a trailing slash (e.g., /metrics/). This could lead to unexpected behavior where subpaths are not considered public.
For example, if PublicPrefixes contains /metrics/, a request to /metrics/prometheus would not be matched as a public path because the check becomes strings.HasPrefix("/metrics/prometheus", "/metrics//"), which is false.
To make the logic more robust and handle this case, I suggest differentiating between prefixes with and without a trailing slash.
// Anchor the match: require exact path or true subdirectory.
// This logic handles prefixes with and without trailing slashes
// to prevent unintended auth bypasses for sibling paths.
if strings.HasSuffix(prefix, "/") {
if strings.HasPrefix(reqPath, prefix) {
return c.Next()
}
} else {
if reqPath == prefix || strings.HasPrefix(reqPath, prefix+"/") {
return c.Next()
}
}…ore anchored match Addresses Gemini Code Assist 2nd review on PR #443. A configured prefix with a trailing slash (e.g. `PublicPrefixes = ["/metrics/"]`) silently broke the anchored match because `prefix + "/"` became `/metrics//`, which matches no real request path. Operator who reads "this is a directory prefix" and adds the slash for clarity would silently lose bypass on every subpath under `/metrics/`. Today this is unreachable: neither DefaultMiddlewareConfig nor cmd/arc/main.go uses a trailing-slash entry. The fix is a footgun guard for any future config that does. Fix: `prefix = strings.TrimSuffix(prefix, "/")` inside the loop, before the existing anchored check. Symmetric with the request-path normalisation (`path.Clean`) above; both sides are now canonical before equality and HasPrefix run. Deliberately NOT path.Clean on the prefix because it's operator-supplied static config (not attacker-controlled) and Clean would also collapse a configured `//foo` to `/foo` — masking the misconfiguration instead of treating it as written. Empty-prefix guard moved after the TrimSuffix so a configured `/` (which the trim just emptied) is also caught. Tests added: - TestMiddleware_PublicPrefixes_TrailingSlashNormalisation runs the same 4-case matrix against both `/metrics` and `/metrics/` configured prefixes; both must produce byte-identical bypass/require-auth results.
|
@gemini-code-assist please review |
There was a problem hiding this comment.
Code Review
This pull request significantly improves the security posture of the application by removing Go pprof endpoints from the public API port and replacing them with an opt-in, separate listener that defaults to localhost. Additionally, it implements a defense-in-depth fix for the authentication middleware's public prefix matching logic, which now utilizes anchored matching and path normalization to prevent bypasses via path traversal or sibling-prefix attacks. Comprehensive regression tests and documentation updates are included for both changes. I have no feedback to provide as there were no review comments.
Summary
Closes audit finding #2 from the 2026-05-19 external report (@NeuroWinter): pre-26.06.1 Arc mounted
/debug/pprof/*unconditionally on the public Fiber app AND added/debug/pprofto the auth middleware'sPublicPrefixeslist. Any network-reachable caller — no token, no auth — could fetch heap dumps (leaking in-flight SQL strings, msgpack records, cached*TokenInfo), leak goroutine stacks, or pin a CPU core via/debug/pprof/profile?seconds=N.This PR removes pprof from the public Fiber app entirely. The endpoints are now opt-in via
ARC_DEBUG_PPROF=1on a separate listener bound to127.0.0.1:6060by default. Non-loopback binds require a second explicit opt-in (ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1) so a single typo in the address knob cannot expose pprof to the network.Tied to GHSA-j93g-rp6m-j32m (private draft).
What's in this PR
pprof hardening:
internal/api/server.go: drop the unconditionalapp.Use(pprof.New()).cmd/arc/main.go: drop/debug/pproffromPublicPrefixes, wirestartDebugPprofIfEnabledafter the shutdown coordinator.cmd/arc/debug_pprof.go(NEW): env-gated localhost listener on a private*http.ServeMux. Force-closes viasrv.Close()(notsrv.Shutdown(ctx)) so a long/debug/pprof/profile?seconds=Ncapture cannot starve the coordinator's shared 30s shutdown budget. Server-side timeouts (ReadHeaderTimeout=5s,WriteTimeout=10m,IdleTimeout=60s) bound slow-client attacks.Auth middleware defence-in-depth (same shape as the deniedRoots gap gemini caught on PR #442):
path == prefix || strings.HasPrefix(path, prefix + \"/\"). Sibling paths like/metricsX,/metrics-secretno longer slip through.path.Cleannormalisation before match: non-canonical shapes (/metrics//foo,/metrics/./x,/metrics/../sensitive) are normalised before the bypass branch checks them.PublicPrefixesentries so a future config bug can't open the whole API.With
/debug/pprofremoved from the public prefix list, the middleware changes are not currently reachable for any production route — they're guards for any prefix added in the future.Operator-facing changes
curl http://arc:8000/debug/pprof/heapnow returns 404. Existing deployments that relied on this for production debugging must setARC_DEBUG_PPROF=1and reach pprof on127.0.0.1:6060./metricsunchanged: Prometheus scrapers continue to work as before.ARC_DEBUG_PPROF_ADDR=0.0.0.0:6060alone refuses to start; pair withARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1for the explicit opt-in.Test plan
go build ./...clean.go test ./internal/auth/... ./internal/api/... ./cmd/arc/...— all green.go test -count=1 -short ./... -timeout 300s— all packages green (pre-existingbenchmarks/fts_experimentvet failure unaffected).TestServer_PprofNotRegisteredOnPublicAppexercises the 12 pprof paths the gofiber middleware used to register and asserts each returns 404.TestMiddleware_PublicPrefixes_AnchoredMatchcovers 10 subtests: exact / trailing-slash / true subdir / deep subdir bypass + sibling-byte-prefix shapes (/metricsX,/metrics-secret,/metricsadmin) + parent-traversal escape shapes (/metrics/../sensitive,/metrics/sub/../../etc) + empty-prefix guard.TestStartDebugPprofIfEnabled_NoopWhenDisabled— no listener when env unset.TestStartDebugPprofIfEnabled_BindsAndServes— env-gated listener binds, serves pprof, 404s non-pprof paths (proving private mux not DefaultServeMux), force-closes cleanly.TestIsTruthy+TestIsLoopbackBindAddr— env-var contract + loopback detection (incl. fail-closed on unresolvable hosts).Internal review
Three rounds of multi-agent review per
.claude/CLAUDE.mdcaught:c.Path()admitting//,/./,/../shapes into the bypass branch.DefaultServeMuxcomment, duplicate component logger field, listener-vs-Serve race window, missingWriteTimeout/IdleTimeout, non-loopback bind needing a second opt-in, empty-stringPublicPrefixesguard.minvs reinventedminInt, duplicate/metricsappend, deadsrv.Addrfield, stale30-secondhttp-server comment, unnecessary handler registrations in test.All 13 addressed before opening this PR (single squashed commit on the branch).
🤖 Generated with Claude Code