Skip to content

fix(security): gate pprof behind ARC_DEBUG_PPROF on a localhost listener + anchor + normalise PublicPrefixes match#443

Merged
xe-nvdk merged 3 commits into
mainfrom
fix/pprof-private-opt-in
May 20, 2026
Merged

fix(security): gate pprof behind ARC_DEBUG_PPROF on a localhost listener + anchor + normalise PublicPrefixes match#443
xe-nvdk merged 3 commits into
mainfrom
fix/pprof-private-opt-in

Conversation

@xe-nvdk
Copy link
Copy Markdown
Member

@xe-nvdk xe-nvdk commented May 20, 2026

Summary

Closes audit finding #2 from the 2026-05-19 external report (@NeuroWinter): pre-26.06.1 Arc mounted /debug/pprof/* unconditionally on the public Fiber app AND added /debug/pprof to the auth middleware's PublicPrefixes list. Any network-reachable caller — no token, no auth — could fetch heap dumps (leaking in-flight SQL strings, msgpack records, cached *TokenInfo), leak goroutine stacks, or pin a CPU core via /debug/pprof/profile?seconds=N.

This PR removes pprof from the public Fiber app entirely. The endpoints are now opt-in via ARC_DEBUG_PPROF=1 on a separate listener bound to 127.0.0.1:6060 by default. Non-loopback binds require a second explicit opt-in (ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1) so a single typo in the address knob cannot expose pprof to the network.

Tied to GHSA-j93g-rp6m-j32m (private draft).

What's in this PR

pprof hardening:

  • internal/api/server.go: drop the unconditional app.Use(pprof.New()).
  • cmd/arc/main.go: drop /debug/pprof from PublicPrefixes, wire startDebugPprofIfEnabled after the shutdown coordinator.
  • cmd/arc/debug_pprof.go (NEW): env-gated localhost listener on a private *http.ServeMux. Force-closes via srv.Close() (not srv.Shutdown(ctx)) so a long /debug/pprof/profile?seconds=N capture cannot starve the coordinator's shared 30s shutdown budget. Server-side timeouts (ReadHeaderTimeout=5s, WriteTimeout=10m, IdleTimeout=60s) bound slow-client attacks.

Auth middleware defence-in-depth (same shape as the deniedRoots gap gemini caught on PR #442):

  • Anchored prefix match: path == prefix || strings.HasPrefix(path, prefix + \"/\"). Sibling paths like /metricsX, /metrics-secret no longer slip through.
  • path.Clean normalisation before match: non-canonical shapes (/metrics//foo, /metrics/./x, /metrics/../sensitive) are normalised before the bypass branch checks them.
  • Empty-prefix guard: skips empty PublicPrefixes entries so a future config bug can't open the whole API.

With /debug/pprof removed from the public prefix list, the middleware changes are not currently reachable for any production route — they're guards for any prefix added in the future.

Operator-facing changes

  • Default behavior changed: curl http://arc:8000/debug/pprof/heap now returns 404. Existing deployments that relied on this for production debugging must set ARC_DEBUG_PPROF=1 and reach pprof on 127.0.0.1:6060.
  • /metrics unchanged: Prometheus scrapers continue to work as before.
  • Two env vars for the non-loopback case: ARC_DEBUG_PPROF_ADDR=0.0.0.0:6060 alone refuses to start; pair with ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1 for the explicit opt-in.

Test plan

  • go build ./... clean.
  • go test ./internal/auth/... ./internal/api/... ./cmd/arc/... — all green.
  • go test -count=1 -short ./... -timeout 300s — all packages green (pre-existing benchmarks/fts_experiment vet failure unaffected).
  • TestServer_PprofNotRegisteredOnPublicApp exercises the 12 pprof paths the gofiber middleware used to register and asserts each returns 404.
  • TestMiddleware_PublicPrefixes_AnchoredMatch covers 10 subtests: exact / trailing-slash / true subdir / deep subdir bypass + sibling-byte-prefix shapes (/metricsX, /metrics-secret, /metricsadmin) + parent-traversal escape shapes (/metrics/../sensitive, /metrics/sub/../../etc) + empty-prefix guard.
  • TestStartDebugPprofIfEnabled_NoopWhenDisabled — no listener when env unset.
  • TestStartDebugPprofIfEnabled_BindsAndServes — env-gated listener binds, serves pprof, 404s non-pprof paths (proving private mux not DefaultServeMux), force-closes cleanly.
  • TestIsTruthy + TestIsLoopbackBindAddr — env-var contract + loopback detection (incl. fail-closed on unresolvable hosts).

Internal review

Three rounds of multi-agent review per .claude/CLAUDE.md caught:

  • 2 HIGH: shutdown DoS chain via long pprof captures starving downstream hooks; un-normalised c.Path() admitting //, /./, /../ shapes into the bypass branch.
  • 6 MED: factually wrong DefaultServeMux comment, duplicate component logger field, listener-vs-Serve race window, missing WriteTimeout/IdleTimeout, non-loopback bind needing a second opt-in, empty-string PublicPrefixes guard.
  • 5 style/nits: builtin min vs reinvented minInt, duplicate /metrics append, dead srv.Addr field, stale 30-second http-server comment, unnecessary handler registrations in test.

All 13 addressed before opening this PR (single squashed commit on the branch).

🤖 Generated with Claude Code

…ner + anchor + normalise PublicPrefixes match

Closes audit finding #2 from the 2026-05-19 external report (Alex Manson /
@NeuroWinter). Pre-26.06.1 internal/api/server.go called app.Use(pprof.New())
unconditionally on the public Fiber app, and cmd/arc/main.go added
/debug/pprof to the auth middleware's PublicPrefixes list. Any
network-reachable caller — no token, no auth — could fetch heap dumps
(leaking in-flight SQL strings, msgpack records, cached *TokenInfo),
leak goroutine stacks, or pin a CPU core via
/debug/pprof/profile?seconds=N.

This PR removes pprof from the public Fiber app entirely. The endpoints
are now opt-in via ARC_DEBUG_PPROF=1 on a separate listener bound to
127.0.0.1:6060 by default. ARC_DEBUG_PPROF_ADDR overrides the bind; a
non-loopback override additionally requires
ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1 so a single typo cannot expose
pprof to the network. The listener uses a private *http.ServeMux (not
http.DefaultServeMux), shuts down via srv.Close() instead of
srv.Shutdown(ctx) so a long /debug/pprof/profile?seconds=N capture
cannot starve the coordinator's 30s shutdown budget, and bounds
slow-client attacks via ReadHeaderTimeout/WriteTimeout/IdleTimeout.

Bundled defense-in-depth fix to the auth middleware's PublicPrefixes
matcher (same shape as gemini-flagged deniedRoots gap on PR #442):

  - Anchored match: requires exact-equal or true-subdirectory
    (prefix + "/"). Sibling paths like /metricsX, /metrics-secret no
    longer slip through.
  - path.Clean normalisation before match: non-canonical request
    shapes like /metrics//foo, /metrics/./x, /metrics/../sensitive
    are normalised to their canonical form before the bypass branch
    checks them. Without normalisation a request for
    /metrics/../api/v1/query would lexically pass the anchored
    HasPrefix check; after normalisation it becomes /api/v1/query
    and correctly requires auth.
  - Empty-prefix guard: skips empty entries so a future config bug
    that lands "" in PublicPrefixes can't open the whole API.

With /debug/pprof removed from the public prefix list, the matcher
changes are not currently reachable for any production route — they're
guards for any prefix added in the future.

Tests added:

  - TestServer_PprofNotRegisteredOnPublicApp (12 pprof paths against
    the public Fiber app, all must 404).
  - TestMiddleware_PublicPrefixes_AnchoredMatch (10 subtests covering
    exact/trailing-slash/subdir bypass + sibling-byte-prefix shapes +
    parent-traversal escape shapes + empty-prefix guard).
  - TestStartDebugPprofIfEnabled_NoopWhenDisabled, _BindsAndServes,
    TestIsTruthy, TestIsLoopbackBindAddr.

Reported by Alex Manson (@NeuroWinter, https://neurowinter.com/).

Three rounds of internal multi-agent review caught: shutdown DoS chain
via long pprof captures starving downstream shutdown hooks (HIGH),
un-normalised c.Path() admitting double-slash / dot-segment / parent-
traversal into the bypass branch (HIGH), factually wrong DefaultServeMux
comment, duplicate component logger field, listener race window
between RegisterHook and goroutine schedule, missing
WriteTimeout/IdleTimeout, non-loopback bind needing two opt-ins,
empty-string PublicPrefixes guard, builtin min vs reinvented minInt,
duplicate /metrics append, stale 30-second http-server comment. All
addressed.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances security by moving Go pprof endpoints to an opt-in localhost listener and implementing path normalization and anchored prefix matching in the authentication middleware. Reviewer feedback identifies discrepancies in the release notes regarding log levels and test counts, and suggests simplifying the pprof listener by removing redundant handler registrations.

Comment thread RELEASE_NOTES_2026.06.1.md Outdated

- **Default behavior changed**: pprof is no longer reachable on Arc's API port (`:8000` by default). Existing deployments that relied on `curl http://arc:8000/debug/pprof/heap` will start getting `404 Not Found`. Set `ARC_DEBUG_PPROF=1` and reach pprof on `127.0.0.1:6060` instead.
- **`/metrics` is unchanged**: Prometheus scrapers continue to work as before. Only `/debug/pprof/*` moved.
- **Override-only-for-explicit-opt-in**: `ARC_DEBUG_PPROF_ADDR` accepts any bind string Go's `net.Listen` understands. Setting it to `0.0.0.0:6060` is the operator opting into cross-host pprof exposure — Arc logs a Warn at startup naming the bound address so the choice is visible in logs.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The release note states that Arc logs a Warn for non-loopback pprof binds. However, the implementation in cmd/arc/debug_pprof.go upgrades the log level to Error for non-loopback addresses to increase visibility. Please update the release note to reflect that an Error is logged in this case for accuracy.

Comment thread RELEASE_NOTES_2026.06.1.md Outdated

With `/debug/pprof` removed from the public prefix list, items 1 and 2 are not currently reachable for any production route — the fixes are guards for any prefix added in the future.

Tests added: `TestServer_PprofNotRegisteredOnPublicApp` (12 pprof paths against the public Fiber app, all must 404), `TestMiddleware_PublicPrefixes_AnchoredMatch` (7 paths covering exact match, true subdirectories, and three sibling-shaped paths that must NOT bypass auth), plus the new `cmd/arc/debug_pprof_test.go` (no-op when disabled, binds-and-serves when enabled, isTruthy contract).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The description of TestMiddleware_PublicPrefixes_AnchoredMatch mentions that it covers 7 paths. The test implementation is more comprehensive and includes 10 sub-tests, covering important cases like path traversal (/metrics/../sensitive) and the empty-prefix guard. To accurately reflect the test's thoroughness, consider updating this description.

Comment thread cmd/arc/debug_pprof.go
Comment on lines +101 to +104
mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The pprof.Index handler, registered on line 100 for the /debug/pprof/ path, is designed to handle all standard pprof sub-paths, including cmdline, profile, symbol, and trace. These explicit HandleFunc registrations are therefore redundant and can be removed to simplify the code.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified empirically that pprof.Index does NOT cover cmdline/profile/symbol/trace. Index dispatches via runtime/pprof.Lookup, which contains only the runtime profiles (heap, goroutine, allocs, mutex, block, threadcreate). The four special endpoints are separate functions in net/http/pprof and require their own registrations.

Test with just pprof.Index registered (Gemini's suggestion):

/debug/pprof/cmdline       -> 404 body: Unknown profile
/debug/pprof/profile?...   -> 404 body: Unknown profile
/debug/pprof/symbol        -> 404 body: Unknown profile
/debug/pprof/trace?...     -> 404 body: Unknown profile
/debug/pprof/heap          -> 200
/debug/pprof/goroutine     -> 200

Source: net/http/pprof/pprof.goIndex extracts name after /debug/pprof/ and calls handler(name).ServeHTTP, which calls pprof.Lookup(name) and 404s on the four non-runtime-profile names.

Keeping the explicit registrations. Strengthened TestStartDebugPprofIfEnabled_BindsAndServes in commit fd781a0 to hit /debug/pprof/cmdline and /debug/pprof/symbol so a future contributor accepting the same suggestion would catch the regression in CI.

…against the gemini-r3 suggestion

Addresses Gemini Code Assist review on PR #443:

  - G1 (release notes:55): said pprof logs a "Warn" for non-loopback
    binds, but the code upgrades to Error AND requires the second
    opt-in env var ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1. Drift
    introduced by my own earlier fixup; release notes now match.

  - G2 (release notes:65): said the anchored-match test covers "7
    paths" but the actual test has 10 subtests (expanded during the
    review fixup). Updated.

  - G3 (debug_pprof.go:104): Gemini suggested removing the explicit
    HandleFunc registrations for /debug/pprof/{cmdline,profile,symbol,
    trace}, claiming pprof.Index covers them. Empirically refuted:
    pprof.Index dispatches via runtime/pprof.Lookup, which contains
    only the runtime profiles (heap, goroutine, allocs, etc.). The
    four special endpoints are separate functions in net/http/pprof
    and require their own registrations — without them, requests
    return 404 "Unknown profile".

    Strengthened TestStartDebugPprofIfEnabled_BindsAndServes to hit
    /debug/pprof/cmdline and /debug/pprof/symbol so a future
    contributor accepting the same suggestion blindly would catch
    the regression in CI. (profile and trace skipped because they
    actively capture data over seconds.) Reply posted on the Gemini
    comment thread with the empirical evidence.
@xe-nvdk
Copy link
Copy Markdown
Member Author

xe-nvdk commented May 20, 2026

@gemini-code-assist please review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances security by removing pprof endpoints from the public API and relocating them to an opt-in, localhost-bound listener. It also strengthens the authentication middleware by introducing path normalization and anchored prefix matching to mitigate risks from path traversal and prefix-based auth bypasses. Feedback was provided regarding a logic flaw in the anchored matching implementation where prefixes with trailing slashes would fail to match subdirectories, along with a suggested fix to handle these cases correctly.

Comment on lines +91 to 99
// Anchor the match: require exact path or true subdirectory
// (`prefix + "/"`). Without this, a configured prefix `/metrics`
// also matches `/metricsX`, `/metrics-secret`, etc. — any route
// happening to share the same byte prefix would silently bypass
// auth. Matches the same shape gemini-code-assist flagged on
// PR #442's deniedRoots check.
if reqPath == prefix || strings.HasPrefix(reqPath, prefix+"/") {
return c.Next()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current logic for anchored prefix matching doesn't correctly handle public prefixes that have a trailing slash (e.g., /metrics/). This could lead to unexpected behavior where subpaths are not considered public.

For example, if PublicPrefixes contains /metrics/, a request to /metrics/prometheus would not be matched as a public path because the check becomes strings.HasPrefix("/metrics/prometheus", "/metrics//"), which is false.

To make the logic more robust and handle this case, I suggest differentiating between prefixes with and without a trailing slash.

			// Anchor the match: require exact path or true subdirectory.
			// This logic handles prefixes with and without trailing slashes
			// to prevent unintended auth bypasses for sibling paths.
			if strings.HasSuffix(prefix, "/") {
				if strings.HasPrefix(reqPath, prefix) {
					return c.Next()
				}
			} else {
				if reqPath == prefix || strings.HasPrefix(reqPath, prefix+"/") {
					return c.Next()
				}
			}

…ore anchored match

Addresses Gemini Code Assist 2nd review on PR #443.

A configured prefix with a trailing slash (e.g. `PublicPrefixes = ["/metrics/"]`)
silently broke the anchored match because `prefix + "/"` became `/metrics//`,
which matches no real request path. Operator who reads "this is a directory
prefix" and adds the slash for clarity would silently lose bypass on every
subpath under `/metrics/`.

Today this is unreachable: neither DefaultMiddlewareConfig nor cmd/arc/main.go
uses a trailing-slash entry. The fix is a footgun guard for any future
config that does.

Fix: `prefix = strings.TrimSuffix(prefix, "/")` inside the loop, before the
existing anchored check. Symmetric with the request-path normalisation
(`path.Clean`) above; both sides are now canonical before equality and
HasPrefix run.

Deliberately NOT path.Clean on the prefix because it's operator-supplied
static config (not attacker-controlled) and Clean would also collapse a
configured `//foo` to `/foo` — masking the misconfiguration instead of
treating it as written.

Empty-prefix guard moved after the TrimSuffix so a configured `/` (which
the trim just emptied) is also caught.

Tests added:
- TestMiddleware_PublicPrefixes_TrailingSlashNormalisation runs the same
  4-case matrix against both `/metrics` and `/metrics/` configured
  prefixes; both must produce byte-identical bypass/require-auth results.
@xe-nvdk
Copy link
Copy Markdown
Member Author

xe-nvdk commented May 20, 2026

@gemini-code-assist please review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly improves the security posture of the application by removing Go pprof endpoints from the public API port and replacing them with an opt-in, separate listener that defaults to localhost. Additionally, it implements a defense-in-depth fix for the authentication middleware's public prefix matching logic, which now utilizes anchored matching and path normalization to prevent bypasses via path traversal or sibling-prefix attacks. Comprehensive regression tests and documentation updates are included for both changes. I have no feedback to provide as there were no review comments.

@xe-nvdk xe-nvdk merged commit 32a4091 into main May 20, 2026
6 checks passed
@xe-nvdk xe-nvdk deleted the fix/pprof-private-opt-in branch May 20, 2026 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant