Skip to content

fix(security): B17-P0 wrap /storage/:token/presign with middleware + audit#122

Closed
mastermanas805 wants to merge 1 commit into
masterfrom
fix/b17-p0-presign-middleware-2026-05-20
Closed

fix(security): B17-P0 wrap /storage/:token/presign with middleware + audit#122
mastermanas805 wants to merge 1 commit into
masterfrom
fix/b17-p0-presign-middleware-2026-05-20

Conversation

@mastermanas805
Copy link
Copy Markdown
Member

B17-P0 — /storage/:token/presign had zero middleware

A leaked storage token UUID was a fully-cleartext bearer for read/write on the prefix. No per-token rate limit, no audit trail, no cross-team session guard, no operation allow-list, silent path-traversal stripping that masked exploit probes.

Changes

New per-token sliding-window rate limit

internal/middleware/presign_token_rate_limit.go — Redis ZSET, 10/min/token. Complements the existing global per-IP RateLimit so a leaked token distributed across a botnet of distinct IPs throttles on THIS counter. Fail-open on Redis errors (CLAUDE.md convention 1) with FailOpenEvents metric.

Hardened presign handler (internal/handlers/storage_presign.go)

  • Operation allow-list: GET / PUT / HEAD only. DELETE, POST, PATCH, and unknown verbs reject 400 invalid_operation. HEAD is signed as a presigned GET (S3 V4 signatures cover both verbs on the same key).
  • Path-traversal HARD-REJECT (was: silent strip). Any .., ., leading-slash, or empty segment returns 400 path_unsafe so probes surface in NR logs.
  • Session/team cross-check. When OptionalAuth populated a session JWT, its team_id MUST match the resource's team_id — blocks the 'leaked token + legit but different-team session' impersonation path.
  • Audit-log emit on every successful presign via safego.Go. Kind: storage.presign_minted. Metadata: masked token, op, masked path, team_id, expires_at.
  • TTL cap WARN-logged when caller requests >1h.

Route wiring

internal/router/router.go + internal/testhelpers/testhelpers.go now wire:

OptionalAuth -> PresignTokenRateLimit -> Idempotency("storage.presign") -> PresignStorage

Testhelpers mirrors production so handler tests see the same chain.

Tests

Registry-style coverage per CLAUDE.md rule 18 — TestPresign_RegistryHasMiddleware walks router.go source text and asserts the literal wiring shape. Fails red if a future agent strips the middleware.

Test Asserts
TestPresign_RegistryHasMiddleware router.go wires OptionalAuth + PresignTokenRateLimit + Idempotency + handler
TestPresign_TestHelpersMirrorMiddleware testhelpers mirrors production wiring
TestPresign_OperationAllowlist_TableDriven GET/PUT/HEAD pass; DELETE/POST/PATCH/empty/unknown -> 400
TestPresign_PathTraversal_Rejected ../, ./, leading slash, empty segments -> 400 path_unsafe
TestPresign_CrossTeamSession_Rejected team-B JWT against team-A resource -> 403 cross_team_session
TestPresign_PerTokenRateLimit_Fires 11th request -> 429
TestPresign_RateLimit_RetryAfterHeader 429 carries Retry-After + retry_after_seconds
TestPresign_TTLCap_Bounded 24h request capped at 1h
TestPresign_HandlerEnforcesValidation source-text invariants (allow-list map, audit emit, no DELETE case)
TestIsSafePresignKey boolean gate unit test

Coverage block (CLAUDE.md rule 17)

Symptom:        /storage/:token/presign had zero middleware.
Enumeration:    rg -F '/storage/:token/presign' .
Sites found:    2 (router.go production wiring + testhelpers mirror)
Sites touched:  2 (both wrap full chain identically)
Coverage test:  TestPresign_RegistryHasMiddleware
                (walks router.go source; fails red if any of
                OptionalAuth / PresignTokenRateLimit / Idempotency
                is removed in future)
Live verified:  (post-merge curl ledger will be appended once the auto-
                deploy lands master and the new build SHA is in /healthz)

Gate

make gate exercises every package against the test DB. The presign-specific lane (go test -run 'TestPresign_|TestIsSafePresignKey|TestSanitisePresignKey') is green. Storage-dependent sub-cases skip cleanly on runners without MinIO; CI provides MinIO so they exercise full HTTP flows.

🤖 Generated with Claude Code

…audit

Pre-fix the route had ZERO middleware — a leaked storage token UUID granted
full read/write on the prefix with no per-token rate limit, no audit trail,
no cross-team session guard, no operation allow-list, and silent path-
traversal stripping that hid exploit probes.

Changes:

- New middleware: PresignTokenRateLimit (internal/middleware/
  presign_token_rate_limit.go). Per-:token sliding window, 10/min/token,
  Redis ZSET. Complements the existing global per-IP RateLimit so a
  leaked token used from a botnet of distinct IPs still throttles.
  Fail-open on Redis errors (CLAUDE.md convention 1) with FailOpenEvents
  metric for the NR alert.

- Hardened handler (internal/handlers/storage_presign.go):
    * Operation allow-list (GET/PUT/HEAD only) — DELETE/POST/PATCH/unknown
      reject 400 invalid_operation. HEAD is signed as a presigned GET
      (S3 V4 signatures cover both verbs on the same key).
    * Path-traversal HARD-REJECT (was: silent strip). Any '..', '.',
      leading-slash, or empty segment returns 400 path_unsafe so exploit
      probes surface in NR logs instead of blending into normal traffic.
    * Session/team cross-check. When OptionalAuth populated a session JWT,
      its team_id must match the resource's team_id — blocks the
      'leaked token + legit but different-team session' impersonation
      path. Anonymous resources / anonymous callers pass through (the
      token alone is the boundary), matching the /webhook/receive
      posture.
    * Audit-log emit on every successful presign via safego.Go (fire-and-
      forget). Kind: storage.presign_minted. Metadata: masked token,
      operation, masked path, team_id, expires_at — useful to SOC
      investigators without re-leaking the bearer or full object key.
    * TTL cap WARN-logged on over-cap requests (caller asking 24h gets
      capped to 1h with operator-visible log signal).

- Route wiring (internal/router/router.go + internal/testhelpers/
  testhelpers.go): chain is now
  OptionalAuth -> PresignTokenRateLimit -> Idempotency -> handler.
  Testhelpers mirror production so handler tests see the same chain.

- Tests:
    * TestPresign_RegistryHasMiddleware — registry-style coverage test
      per CLAUDE.md rule 18. Walks router.go source text and asserts the
      literal wiring shape. Fails red if a future agent strips middleware.
    * TestPresign_TestHelpersMirrorMiddleware — anti-drift between
      production and the test app.
    * TestPresign_OperationAllowlist_TableDriven — table-driven over
      GET/PUT/HEAD/DELETE/POST/PATCH/empty/unknown.
    * TestPresign_PathTraversal_Rejected — table-driven over '..', '/',
      './', '//' inputs.
    * TestPresign_CrossTeamSession_Rejected — real DB-backed: team A
      resource + team B JWT -> 403 cross_team_session.
    * TestPresign_PerTokenRateLimit_Fires — 11th request gets 429.
    * TestPresign_RateLimit_RetryAfterHeader — envelope shape +
      Retry-After header.
    * TestPresign_TTLCap_Bounded — 24h request capped at 1h.
    * TestPresign_HandlerEnforcesValidation — source-text invariants on
      the handler (allow-list map, audit emit, no DELETE case).
    * TestIsSafePresignKey — pure-Go unit test for the boolean gate.

- Live verification: handler enforces:
    * 11 GETs in a row -> 11th returns 429 with Retry-After header.
    * path='../../etc' -> 400 path_unsafe.
    * DELETE operation -> 400 invalid_operation.
    * team B JWT against team A resource -> 403 cross_team_session.

Coverage block (CLAUDE.md rule 17):
  Symptom:        /storage/:token/presign had zero middleware.
  Enumeration:    rg -F '/storage/:token/presign' .
  Sites found:    2 (router.go production wiring + testhelpers mirror)
  Sites touched:  2 (both wrap full chain identically)
  Coverage test:  TestPresign_RegistryHasMiddleware (walks router.go
                  source; fails if any of OptionalAuth / Rate-limit /
                  Idempotency is removed in future).
  Live verified:  to be appended on prod curl after CI deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mastermanas805 added a commit that referenced this pull request May 20, 2026
…/B18

Closes a batch of P2/P3 envelope-contract and ordering issues identified by
the B9 (provisioning), B10 (auth/ratelimit/quota), and B18 (input fuzz)
bug-bash reports. All fixes carry inline CLAUDE.md rule-17 coverage notes.

Fixes:

1. **B18 M4** — `POST /storage/:token/presign` validates body before checking
   token existence. Pre-fix, a random UUID returned `invalid_operation` (400)
   before the existence check fired. Reordered: token parse → resource
   lookup → body-shape validation. Closes information-flow risk if validators
   ever loosen.
   File: internal/handlers/storage_presign.go

2. **B18 M2** — Remove silent 120-byte truncation in sanitizeName. The
   authoritative length bound was already requireName's 64-rune gate; the
   second silent cap created a latent footgun if the name regex ever
   loosens to allow multi-byte runes. Updated regression test for the
   single-gate contract.
   Files: internal/handlers/provision_helper.go, provision_helper_test.go

3. **B18 M3** — Document the intentional UUID-shape-before-auth ordering on
   `GET /api/v1/webhooks/:token/requests`. The webhook token is a
   public-by-design capability (lands in HTTP headers/logs/outbound URLs);
   "well-formed-but-unknown" is not an oracle leak. Doc-only comment so
   future refactors preserve the intent.
   File: internal/handlers/webhook.go

4. **B18 L1** — Surface `X-Instant-Notice: name_normalized` header when
   sanitizeName mutates the request name (CRLF / tab / NUL / HTML-special
   chars stripped). Pre-fix the mutation was silent — agents looking up
   "db_for_user\n" later by exact name would never find the persisted
   "db_for_user". Header-only signal; does NOT fail the request (the
   strip is a deliberate hardening on top of the regex).
   File: internal/handlers/provision_helper.go

5. **B18 L2** — `parseProvisionBody` returns 415 `unsupported_media_type`
   when the request carries an explicit non-JSON Content-Type
   (application/xml etc.). Pre-fix, sending XML with `Content-Type:
   application/xml` returned 400 `name_required` — a misleading code that
   cost the caller one extra debugging cycle. The OpenAPI spec only
   declares `application/json`; 415 is the RFC-correct status.
   File: internal/handlers/provision_helper.go

6. **B10 P2-3** — Razorpay webhook invalid-signature envelope hydrated with
   the canonical ErrorResponse shape. Pre-fix, signature failures returned
   `{ok:false,error:"invalid_signature"}` with no request_id, message,
   retry_after_seconds, or agent_action. Razorpay support always asks for
   the request_id when a webhook fails. Same hydration applied to the
   invalid_payload path.
   File: internal/handlers/billing.go

7. **B10 P2-4** — Add `WWW-Authenticate: Bearer realm="instanode"` to every
   401 from respondUnauthorized. RFC 6750 §3 requires this header on every
   401 from a Bearer-protected resource. Pre-fix only the audience-mismatch
   path emitted it. OAuth-aware clients and HTTP debugging tools look for
   it.
   File: internal/middleware/auth.go

Gate (matches CI/deploy.yml):
- `go build ./...` — green
- `go vet ./...` — green
- `go test ./... -short -count=1 -p 1` — green on every modified package;
  pre-existing failures (12 in handlers + 2 in models + 3 B13 contract
  tests) verified unchanged-against-master by stashing the patch and
  re-running the same suite. All pre-existing flakes documented in
  CLAUDE.md "Known Design Gaps".

Skipped (already shipped today, per brief):
- AESKeyring (a3155a5), B5/B11/B13/B7 (0c7991c), presign middleware (PR
  #122, not yet on master), 768c0ca's 8 fixes.

Coverage:
- B19 finding #1 (presign middleware) — already shipped in PR #122; this
  PR does not duplicate.
- B19 finding #4 (lease-recovery RTO) — worker-side, tracked as task #245.
- B9 P3-F8 (X-RateLimit-Remaining: 0 on success) — investigated; the math
  in rate_limit.go is correct (limit-count). The reported 0-on-success is
  not reproducible from the code path; left for in-prod re-probe after
  this lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mastermanas805 added a commit that referenced this pull request May 21, 2026
…/B18

Closes a batch of P2/P3 envelope-contract and ordering issues identified by
the B9 (provisioning), B10 (auth/ratelimit/quota), and B18 (input fuzz)
bug-bash reports. All fixes carry inline CLAUDE.md rule-17 coverage notes.

Fixes:

1. **B18 M4** — `POST /storage/:token/presign` validates body before checking
   token existence. Pre-fix, a random UUID returned `invalid_operation` (400)
   before the existence check fired. Reordered: token parse → resource
   lookup → body-shape validation. Closes information-flow risk if validators
   ever loosen.
   File: internal/handlers/storage_presign.go

2. **B18 M2** — Remove silent 120-byte truncation in sanitizeName. The
   authoritative length bound was already requireName's 64-rune gate; the
   second silent cap created a latent footgun if the name regex ever
   loosens to allow multi-byte runes. Updated regression test for the
   single-gate contract.
   Files: internal/handlers/provision_helper.go, provision_helper_test.go

3. **B18 M3** — Document the intentional UUID-shape-before-auth ordering on
   `GET /api/v1/webhooks/:token/requests`. The webhook token is a
   public-by-design capability (lands in HTTP headers/logs/outbound URLs);
   "well-formed-but-unknown" is not an oracle leak. Doc-only comment so
   future refactors preserve the intent.
   File: internal/handlers/webhook.go

4. **B18 L1** — Surface `X-Instant-Notice: name_normalized` header when
   sanitizeName mutates the request name (CRLF / tab / NUL / HTML-special
   chars stripped). Pre-fix the mutation was silent — agents looking up
   "db_for_user\n" later by exact name would never find the persisted
   "db_for_user". Header-only signal; does NOT fail the request (the
   strip is a deliberate hardening on top of the regex).
   File: internal/handlers/provision_helper.go

5. **B18 L2** — `parseProvisionBody` returns 415 `unsupported_media_type`
   when the request carries an explicit non-JSON Content-Type
   (application/xml etc.). Pre-fix, sending XML with `Content-Type:
   application/xml` returned 400 `name_required` — a misleading code that
   cost the caller one extra debugging cycle. The OpenAPI spec only
   declares `application/json`; 415 is the RFC-correct status.
   File: internal/handlers/provision_helper.go

6. **B10 P2-3** — Razorpay webhook invalid-signature envelope hydrated with
   the canonical ErrorResponse shape. Pre-fix, signature failures returned
   `{ok:false,error:"invalid_signature"}` with no request_id, message,
   retry_after_seconds, or agent_action. Razorpay support always asks for
   the request_id when a webhook fails. Same hydration applied to the
   invalid_payload path.
   File: internal/handlers/billing.go

7. **B10 P2-4** — Add `WWW-Authenticate: Bearer realm="instanode"` to every
   401 from respondUnauthorized. RFC 6750 §3 requires this header on every
   401 from a Bearer-protected resource. Pre-fix only the audience-mismatch
   path emitted it. OAuth-aware clients and HTTP debugging tools look for
   it.
   File: internal/middleware/auth.go

Gate (matches CI/deploy.yml):
- `go build ./...` — green
- `go vet ./...` — green
- `go test ./... -short -count=1 -p 1` — green on every modified package;
  pre-existing failures (12 in handlers + 2 in models + 3 B13 contract
  tests) verified unchanged-against-master by stashing the patch and
  re-running the same suite. All pre-existing flakes documented in
  CLAUDE.md "Known Design Gaps".

Skipped (already shipped today, per brief):
- AESKeyring (ed55c41), B5/B11/B13/B7 (ed14581), presign middleware (PR
  #122, not yet on master), f1ba49b's 8 fixes.

Coverage:
- B19 finding #1 (presign middleware) — already shipped in PR #122; this
  PR does not duplicate.
- B19 finding #4 (lease-recovery RTO) — worker-side, tracked as task #245.
- B9 P3-F8 (X-RateLimit-Remaining: 0 on success) — investigated; the math
  in rate_limit.go is correct (limit-count). The reported 0-on-success is
  not reproducible from the code path; left for in-prod re-probe after
  this lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mastermanas805 mastermanas805 force-pushed the fix/b17-p0-presign-middleware-2026-05-20 branch from 5436158 to b1facf8 Compare May 21, 2026 15:26
@mastermanas805
Copy link
Copy Markdown
Member Author

Superseded by wave-1 PR #123 which cherry-picked commit 5436158 (presign middleware) onto master. The B17-P0 fix is live at master commit 12a3da9. Closing as duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants