fix(bugbash 2026-05-21): API P0/P1 wave-1 - stacks env redeploy, presign middleware, growth plan-ID #123

Merged: mastermanas805 merged 3 commits into master from fix/bugbash-2026-05-21-p0p1 on May 21, 2026.

@mastermanas805
Member

Summary

Wave-1 of P0/P1 fixes from BugBash 2026-05-21 (/tmp/bb-2026-05-21/INDEX.md).

Commits:

  1. B17-P0 [P0]: Storage /storage/:token/presign hardened with per-token rate-limit middleware + path-traversal hard-reject + operation allow-list + audit emit (cherry-picked from fix/b17-p0-presign-middleware-2026-05-20).
  2. bugbash wave-1 [P0+P1]: Stacks redeploy/promote now merge persisted env_vars from migration 062 (closes A08 F1 — was silent data-loss). Adds Growth Razorpay plan-ID env mapping (D28 F3). Router wires presign middleware.
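
The redeploy fix in commit 2 amounts to carrying the persisted env_vars into the redeploy instead of starting from an empty set. A minimal sketch of such a merge follows. It assumes that values supplied with the current request win on key conflicts, which the PR does not state, and `mergeEnv` and its parameters are hypothetical names rather than the actual stack.go code.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeEnv returns a new map holding every persisted variable, overlaid with
// any overrides from the current request.
// Assumption: overrides win on conflict; the real precedence may differ.
func mergeEnv(persisted, overrides map[string]string) map[string]string {
	merged := make(map[string]string, len(persisted)+len(overrides))
	for k, v := range persisted {
		merged[k] = v
	}
	for k, v := range overrides {
		merged[k] = v
	}
	return merged
}

func main() {
	// Example values are placeholders, not real stack configuration.
	persisted := map[string]string{"DATABASE_URL": "postgres://db", "LOG_LEVEL": "info"}
	overrides := map[string]string{"LOG_LEVEL": "debug"}

	merged := mergeEnv(persisted, overrides)

	// Sort keys so the printed order is stable.
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, merged[k])
	}
}
```

Building a fresh map keeps the persisted row untouched, so a failed redeploy cannot corrupt the stored values.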

Local gate

All green against test postgres/redis/mongo Docker stack:

  • build OK · vet OK
  • go test ./... -short -p 1 PASS

P0/P1 remaining (deferred to wave-2)

  • Resource list filter (B11 F1)
  • GetCredentials AES fail-open (B11 F5)
  • Last-owner protection in UpdateMemberRole (B12 F2)
  • SSE follow flag for expired/deleted (B13 F1)
  • Redeploy goroutine timeout 10m→12m (B13 F4)

Test plan

  • Local gate green
  • Reviewer: confirm stack env_vars survive redeploy via PATCH then POST :slug/redeploy
  • Reviewer: confirm presign rejects path-traversal probes
  • After merge: curl https://api.instanode.dev/healthz | jq .commit_id matches new HEAD

🤖 Generated with Claude Code

claude and others added 3 commits May 21, 2026 18:37
…audit

Pre-fix the route had ZERO middleware — a leaked storage token UUID granted
full read/write on the prefix with no per-token rate limit, no audit trail,
no cross-team session guard, no operation allow-list, and silent path-
traversal stripping that hid exploit probes.
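
To see why silent stripping hides probes, compare a sanitizer with a validator. The sketch below is illustrative only: `stripTraversal` is a guess at the shape of the old behavior, and `isSafeKey` is a hypothetical stand-in for the boolean gate that this commit tests. Neither is the handler's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTraversal drops unsafe segments and carries on, so a probe such as
// "../../etc" becomes an ordinary-looking key and nothing is rejected or logged.
func stripTraversal(key string) string {
	var kept []string
	for _, seg := range strings.Split(key, "/") {
		if seg == "" || seg == "." || seg == ".." {
			continue
		}
		kept = append(kept, seg)
	}
	return strings.Join(kept, "/")
}

// isSafeKey reports whether key contains no "..", ".", or empty segment.
// A leading slash yields an empty first segment, so it is rejected as well.
func isSafeKey(key string) bool {
	if key == "" {
		return false
	}
	for _, seg := range strings.Split(key, "/") {
		if seg == "" || seg == "." || seg == ".." {
			return false
		}
	}
	return true
}

func main() {
	probe := "../../etc"
	fmt.Printf("stripped: %q\n", stripTraversal(probe))
	fmt.Printf("safe(probe): %v\n", isSafeKey(probe))
	fmt.Printf("safe(normal): %v\n", isSafeKey("uploads/report.pdf"))
}
```

With the validator, the caller can return an explicit error for the probe instead of quietly signing a rewritten key.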

Changes:

- New middleware: PresignTokenRateLimit (internal/middleware/
  presign_token_rate_limit.go). Per-:token sliding window, 10/min/token,
  Redis ZSET. Complements the existing global per-IP RateLimit so a
  leaked token used from a botnet of distinct IPs still throttles.
  Fail-open on Redis errors (CLAUDE.md convention 1) with FailOpenEvents
  metric for the NR alert.

- Hardened handler (internal/handlers/storage_presign.go):
    * Operation allow-list (GET/PUT/HEAD only) — DELETE/POST/PATCH/unknown
      reject 400 invalid_operation. HEAD is signed as a presigned GET
      (S3 V4 signatures cover both verbs on the same key).
    * Path-traversal HARD-REJECT (was: silent strip). Any '..', '.',
      leading-slash, or empty segment returns 400 path_unsafe so exploit
      probes surface in NR logs instead of blending into normal traffic.
    * Session/team cross-check. When OptionalAuth populated a session JWT,
      its team_id must match the resource's team_id — blocks the
      'leaked token + legit but different-team session' impersonation
      path. Anonymous resources / anonymous callers pass through (the
      token alone is the boundary), matching the /webhook/receive
      posture.
    * Audit-log emit on every successful presign via safego.Go (fire-and-
      forget). Kind: storage.presign_minted. Metadata: masked token,
      operation, masked path, team_id, expires_at — useful to SOC
      investigators without re-leaking the bearer or full object key.
    * TTL cap WARN-logged on over-cap requests (caller asking 24h gets
      capped to 1h with operator-visible log signal).

- Route wiring (internal/router/router.go + internal/testhelpers/
  testhelpers.go): chain is now
  OptionalAuth -> PresignTokenRateLimit -> Idempotency -> handler.
  Testhelpers mirror production so handler tests see the same chain.

- Tests:
    * TestPresign_RegistryHasMiddleware — registry-style coverage test
      per CLAUDE.md rule 18. Walks router.go source text and asserts the
      literal wiring shape. Fails red if a future agent strips middleware.
    * TestPresign_TestHelpersMirrorMiddleware — anti-drift between
      production and the test app.
    * TestPresign_OperationAllowlist_TableDriven — table-driven over
      GET/PUT/HEAD/DELETE/POST/PATCH/empty/unknown.
    * TestPresign_PathTraversal_Rejected — table-driven over '..', '/',
      './', '//' inputs.
    * TestPresign_CrossTeamSession_Rejected — real DB-backed: team A
      resource + team B JWT -> 403 cross_team_session.
    * TestPresign_PerTokenRateLimit_Fires — 11th request gets 429.
    * TestPresign_RateLimit_RetryAfterHeader — envelope shape +
      Retry-After header.
    * TestPresign_TTLCap_Bounded — 24h request capped at 1h.
    * TestPresign_HandlerEnforcesValidation — source-text invariants on
      the handler (allow-list map, audit emit, no DELETE case).
    * TestIsSafePresignKey — pure-Go unit test for the boolean gate.

- Live verification (handler enforces):
    * 11 GETs in a row -> 11th returns 429 with Retry-After header.
    * path='../../etc' -> 400 path_unsafe.
    * DELETE operation -> 400 invalid_operation.
    * team B JWT against team A resource -> 403 cross_team_session.

Coverage block (CLAUDE.md rule 17):
  Symptom:        /storage/:token/presign had zero middleware.
  Enumeration:    rg -F '/storage/:token/presign' .
  Sites found:    2 (router.go production wiring + testhelpers mirror)
  Sites touched:  2 (both wrap full chain identically)
  Coverage test:  TestPresign_RegistryHasMiddleware (walks router.go
                  source; fails if any of OptionalAuth / Rate-limit /
                  Idempotency is removed in future).
  Live verified:  to be appended on prod curl after CI deploy.
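
A registry-style coverage check of this kind can be approximated by searching the router source for the middleware names in order. The sketch below is an assumption about the approach, not the real TestPresign_RegistryHasMiddleware: `hasChainInOrder` is hypothetical, and the sample wiring line is invented for illustration rather than copied from router.go.

```go
package main

import (
	"fmt"
	"strings"
)

// hasChainInOrder reports whether some line of src that mentions route also
// contains every entry of chain, in the given order.
func hasChainInOrder(src, route string, chain []string) bool {
	for _, line := range strings.Split(src, "\n") {
		if !strings.Contains(line, route) {
			continue
		}
		pos := 0
		matched := true
		for _, name := range chain {
			idx := strings.Index(line[pos:], name)
			if idx < 0 {
				matched = false
				break
			}
			pos += idx + len(name)
		}
		if matched {
			return true
		}
	}
	return false
}

func main() {
	// Invented wiring line; the real router.go call shape is not shown in this PR.
	src := `app.Post("/storage/:token/presign", OptionalAuth(cfg), PresignTokenRateLimit(rdb), Idempotency(rdb), h.Presign)`

	// The trailing "(" pins the exact identifier, so a longer name that merely
	// starts with "OptionalAuth" does not satisfy the check.
	chain := []string{"OptionalAuth(", "PresignTokenRateLimit(", "Idempotency("}

	fmt.Println(hasChainInOrder(src, "/storage/:token/presign", chain))
}
```

A source-text check is brittle by design: it fails whenever the literal wiring changes, which is the point of a pin.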

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ars + presign router wiring + Growth Razorpay plan-ID + config plumbing

Covers (from /tmp/bb-2026-05-21/INDEX.md):
- A08 F1 [P0]: stack.go redeploy merges stacks.env_vars from migration 062
- H46 F1 [P0]: router wires B17-P0 presign middleware (cherry-picked in prior commit)
- D28 F3 [P1]: billing.go planIDToTier adds Growth tier mapping
- config.go: RAZORPAY_PLAN_ID_GROWTH env plumbing
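
The Growth mapping can be pictured as a lookup from a configured Razorpay plan ID to a tier name. This is a sketch only: the `Config` struct, its field name, the "growth" tier string, and the behavior for unknown IDs are assumptions, and the real planIDToTier in billing.go may differ. Only the RAZORPAY_PLAN_ID_GROWTH variable name comes from this commit.

```go
package main

import (
	"fmt"
	"os"
)

// Config holds the plan IDs read from the environment (hypothetical shape).
type Config struct {
	RazorpayPlanIDGrowth string // from RAZORPAY_PLAN_ID_GROWTH
}

func loadConfig() Config {
	return Config{RazorpayPlanIDGrowth: os.Getenv("RAZORPAY_PLAN_ID_GROWTH")}
}

// planIDToTier maps a Razorpay plan ID to a tier name. ok is false when the
// ID is empty or does not match a configured plan.
func planIDToTier(cfg Config, planID string) (tier string, ok bool) {
	// Guard against an unset env var matching an empty plan ID.
	if planID == "" {
		return "", false
	}
	switch planID {
	case cfg.RazorpayPlanIDGrowth:
		return "growth", true
	}
	return "", false
}

func main() {
	cfg := loadConfig()
	if cfg.RazorpayPlanIDGrowth == "" {
		cfg.RazorpayPlanIDGrowth = "plan_growth_example" // placeholder for the demo
	}
	tier, ok := planIDToTier(cfg, cfg.RazorpayPlanIDGrowth)
	fmt.Println(tier, ok)
}
```

Returning an explicit `ok` lets the caller decide how to treat an unmapped plan instead of silently defaulting to a tier.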

Remaining P0/P1 (per INDEX): resource list filter (B11 F1), GetCredentials fail-open (B11 F5), last-owner guard (B12 F2), SSE follow flag (B13 F1), redeploy 12m timeout (B13 F4). To land in a follow-up wave.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…uses OptionalAuth

CI caught what local cache missed:
- TestErrorCode_HasAgentAction: register path_unsafe / cross_team_session / env_load_failed
- TestPresign_RegistryHasMiddleware: router uses OptionalAuth(cfg) not OptionalAuthStrict
  (the test pins the literal — OptionalAuth populates team_id for the cross-team
   check; Strict variant is forward-incompatible with this pin).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mastermanas805 merged commit bfea528 into master on May 21, 2026
3 checks passed