feat(api): add GET /api/v1/whoami for agent identity probes#6

Merged: mastermanas805 merged 1 commit into master from feat/whoami-and-diagnostics on May 11, 2026
@mastermanas805
Member

Summary

`GET /api/v1/whoami` is the canonical "am I authenticated?" probe for agents:

  • Returns 401 (via the `/api/v1` RequireAuth middleware) on missing/expired/invalid tokens.
  • Returns 200 + `{ok, user_id, team_id, team_name?, plan_tier?}` when the token works.

`plan_tier` is best-effort enriched from the teams table so agents avoid a second hop to `/billing` — DB lookup failures drop the field silently rather than failing the whole call.

Replaces the previous pattern of agents probing arbitrary paths like `/api/v1/team` and getting 404 instead of 401, which led to wasted token-mint retry cycles.

Test plan

  • `TestOpenAPI_WhoamiPathExists` — guards the path is documented in `/openapi.json`
  • `TestOpenAPI_WhoamiResponseSchema` — guards the field shape stays stable
  • `go build ./...` passes
  • Integration test with real auth — follow-up

🤖 Generated with Claude Code

Friction-tested 2026-05-11: agents that wanted to verify their bearer
token worked were reaching for arbitrary endpoints like /api/v1/team and
getting 404 instead of 401, which led to wasted token-mint retry cycles.

GET /api/v1/whoami is the canonical "am I authenticated?" probe:

  - Returns 401 (via the /api/v1 RequireAuth middleware) when the token
    is missing/expired/invalid.
  - Returns 200 + { ok, user_id, team_id, team_name?, plan_tier? } when
    the token works.

plan_tier is best-effort enriched from the teams table so agents don't
need a second hop to /billing — failures here drop the field silently
rather than failing the whole call.

OpenAPI:
  - New /api/v1/whoami path with bearerAuth security + 401 in responses
  - New WhoamiResponse schema with field docs

Tests:
  - TestOpenAPI_WhoamiPathExists guards the path is in openapi.json so
    an agent reading the spec can discover it.
  - TestOpenAPI_WhoamiResponseSchema guards the field shape (ok,
    user_id, team_id, plan_tier).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mastermanas805 mastermanas805 merged commit d23a352 into master May 11, 2026
@mastermanas805 mastermanas805 deleted the feat/whoami-and-diagnostics branch May 11, 2026 08:23
mastermanas805 added a commit that referenced this pull request May 11, 2026
#10)

Earlier PRs (#4, #6, #9) shipped OpenAPI schema tests but lacked
handler-level behavior tests. Code reviewers flagged the gap — a
schema test catches "field documented" regressions but misses "field
actually emitted" or "input actually parsed" regressions.

This PR backfills behavior tests for four shipped behaviors:

  1. upgradeNote / limitExceededNote copy (PR #9, friction #13)
     TestUpgradeNote_DoesNotMentionTrial         — 2 sub-cases
     TestLimitExceededNote_DoesNotMentionTrial   — 4 sub-cases
     Guards: no "14-day trial" framing, contains "Claim to keep" +
     "$9/mo", no instant.dev/start leakage.

  2. GET /api/v1/whoami (PR #6, friction #9)
     TestWhoami_NoTokenReturns401          — 401 on missing bearer
     TestWhoami_ReturnsIdentityForAuthedRequest
       — 200 with uid/tid claims; plan_tier enrichment when DB hit
     Test app now wires /api/v1/whoami so this and future tests can
     hit it through the full RequireAuth middleware.

  3. POST /deploy/new env_vars JSON parsing (PR #4, friction #11)
     TestDeployNew_EnvVarsJSON_Parsed_Into_InitEnv
       — valid JSON merges into deployment.EnvVars; underscore-prefixed
         keys silently stripped (_secret never leaks)
     TestDeployNew_EnvVarsInvalidJSON_Returns400
       — malformed JSON returns 400 error="invalid_env_vars"
         (not a generic 500)
     Includes a multipartDeployBody helper that other deploy tests
     can reuse without colliding with stack_test.go's name.

  4. upgrade_jwt in provisioning responses (PR #9, friction #16)
     TestAnonymousProvisionEmitsUpgradeJWT_OnDedup
       — dedup response includes raw upgrade_jwt JWT (no parsing)
         alongside the legacy upgrade URL; the two presentations of
         the same token must not drift.
     Skips cleanly when local test DB schema lags (env column).

All 10 new test cases pass against postgres:16-alpine + redis:7-alpine.
Total run time <1s.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mastermanas805 added a commit that referenced this pull request May 11, 2026
… compat (#13)

This PR surfaced from a "test everything" sweep across api/worker/
provisioner/common. Three real production bugs and three test-infra
gaps came out together; they share enough code that splitting them
would force ordering dependencies between PRs.

## 1. respondError used to silently bypass multi-return validators

respondError returned c.Status(status).JSON()'s result — nil on every
successful body write. Helper functions like resolveEnv and
requireTeamMatch composed it as:

    return zeroValue, respondError(c, 400, "invalid_x", "...")

Their callers checked `if err != nil`, which was false on the happy
path of respondError (response written successfully). The handler
then continued PAST the validation gate with zeroValue, producing:

  - silent acceptance of invalid env names (response said
    env="production" while the URL said env=Prod)
  - 500 instead of 403 when an actor's JWT team didn't match the path
    :team_id (handler proceeded with uuid.Nil → DB error)

Fix:
  - respondError now returns ErrResponseWritten (non-nil sentinel)
    regardless of the JSON write result.
  - The custom ErrorHandlers in router.go, testhelpers.go, and the
    four per-test fiber.App builders (stack_test, billing_test,
    vault_test, teams_test) detect this sentinel and return nil so
    the original 400/403/etc. body is preserved.

The visible effect in prod (verified live on v2.2.0):

    $ curl -X POST 'https://api.instanode.dev/db/new?env=Prod' -d '{}'
    Before: 201 with env="production"  (invalid input silently coerced)
    After:  400 {"error":"invalid_env", ...}
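
The failure mode and the sentinel fix can be reproduced without Fiber. The following is a sketch under assumptions: `ctx` is a hypothetical stand-in for the request context, the env allowlist is invented for the demo, and `handle` folds the handler and the top-level error handler into one function.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrResponseWritten signals that a response body is already on the wire.
var ErrResponseWritten = errors.New("response already written")

// ctx is a hypothetical stand-in for the framework's request context.
type ctx struct {
	status int
	body   string
}

// respondErrorOld models the buggy version: it returns the write result,
// which is nil whenever the body was written successfully.
func respondErrorOld(c *ctx, status int, code string) error {
	c.status, c.body = status, code
	return nil
}

// respondError models the fix: always return the non-nil sentinel.
func respondError(c *ctx, status int, code string) error {
	c.status, c.body = status, code
	return ErrResponseWritten
}

// resolveEnv is a multi-return validator in the style described above.
// The allowlist here is an assumption for the demo.
func resolveEnv(c *ctx, raw string, respond func(*ctx, int, string) error) (string, error) {
	if raw != "production" && raw != "staging" {
		return "", respond(c, 400, "invalid_env")
	}
	return raw, nil
}

// handle returns the final status the client sees.
func handle(raw string, respond func(*ctx, int, string) error) int {
	c := &ctx{}
	env, err := resolveEnv(c, raw, respond)
	if err != nil {
		// Error-handler role: the sentinel means the 4xx body is already set.
		if errors.Is(err, ErrResponseWritten) {
			return c.status
		}
		return 500
	}
	_ = env // with the old helper, env is "" here for invalid input
	c.status = 201
	return c.status
}

func main() {
	fmt.Println("old:", handle("Prod", respondErrorOld)) // validation gate bypassed
	fmt.Println("new:", handle("Prod", respondError))    // 400 preserved
}
```

With the old helper the caller's `err != nil` check never fires, so the handler overwrites the 400 with its success status; with the sentinel the gate holds and the error handler leaves the written body alone.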

## 2. Vault tier-check order was misleading

Hobby-tier PUT to /api/v1/vault/staging/X when already at the 20-entry
quota used to return 402 vault_quota_exceeded. The agent reading that
might add seats or upgrade for capacity — but the real block was the
env allowlist (staging isn't a hobby env). Now env check fires first
(403 vault_env_not_allowed) so the agent learns what would actually
help.
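
A sketch of the reordered checks, assuming a hobby allowlist that contains only `production` (the text above only says staging is not a hobby env) and a hypothetical `checkVaultPut` helper:

```go
package main

import "fmt"

const hobbyVaultQuota = 20 // entry quota quoted in the description above

// envAllowed is a hypothetical allowlist check.
func envAllowed(tier, env string) bool {
	if tier == "hobby" {
		return env == "production"
	}
	return true
}

// checkVaultPut runs the env check before the quota check so the caller
// learns about the block that an upgrade for capacity would not remove.
func checkVaultPut(tier, env string, entries int) (int, string) {
	if !envAllowed(tier, env) {
		return 403, "vault_env_not_allowed"
	}
	if tier == "hobby" && entries >= hobbyVaultQuota {
		return 402, "vault_quota_exceeded"
	}
	return 200, ""
}

func main() {
	fmt.Println(checkVaultPut("hobby", "staging", 20))
	fmt.Println(checkVaultPut("hobby", "production", 20))
}
```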

## 3. Test-infra fixes uncovered by the sweep

  - vault_test space-encoding: Go 1.26 httptest.NewRequest panics
    on unescaped spaces in URL paths. Pre-encode to %20.
  - stack_test domain assertion: still checked "instant.dev/start" —
    updated to "instanode.dev/start".
  - deploy_env_vars_test body double-read: readBody(t, resp) was
    being evaluated unconditionally in a require.NotEqual message
    arg, consuming the body before the success-path Decode could
    read it. Read once into a buffer, reuse.
  - dashboardsvc/server_test resourceSelectColumns added the env
    column (migration 009) — sqlmock was emitting 18 columns into a
    model that scans 19, surfacing as "sql: expected 18 destination
    arguments in Scan, not 19".
  - testhelpers wires /api/v1/whoami (the route added in PR #6 was
    in production but missing from the integration test app).

## Sweep results

Before this PR: 12 test failures across api/internal/handlers,
internal/dashboardsvc, internal/middleware.
After: zero failures. Full sweep across api, worker, provisioner,
common all green.

Verified live: v2.2.0-validation-bugs deployed; sample env=Prod
returns 400 invalid_env as expected; valid env=production still
succeeds with 201.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mastermanas805 added a commit that referenced this pull request May 14, 2026
#6) (#110)

The Idempotency-Key middleware silently skipped caching for any 4xx
response produced through respondError* — the production error path
that returns the ErrResponseWritten sentinel after committing the
body to the wire. The middleware's `if err := c.Next(); err != nil
{ return err }` clause treated the sentinel as an abort and bypassed
the cache write.

User-visible impact: agent hits /deploy/new over its tier cap, server
returns 402, agent retries with the same Idempotency-Key, server runs
the handler fresh — re-hitting Razorpay/billing side effects on every
retry. The whole point of idempotency middleware (one logical request
→ one side effect) was broken for error-path responses.

The existing TestIdempotency_4xxIsCached passed for the wrong reason:
the test fixture used c.Status().JSON() (returns nil) instead of the
real handler error path handlers.WriteFiberError -> respondError
(returns ErrResponseWritten). The middleware's bail clause skipped
the cache write in production but not in test — a false-positive.

Fix (≤30 LOC in middleware): special-case the ErrResponseWritten
sentinel and fall through to the cache logic. Real errors (DB down,
panic-recovered, fiber.NewError from upstream) still bypass caching.
Use a registered-callback pattern (handlers.init() wires the check)
to avoid a handlers->middleware->handlers import cycle.

Tests: 13 idempotency tests now pass (up from 9). New coverage:
  - TestIdempotency_4xxIsCached rewritten to use handlers.WriteFiberError
    so it exercises the real respondError path (FAILS before fix).
  - TestIdempotency_RealHandlerErrorPathCaches reproduces the exact
    BB2-D5 scenario: handler returns 402 via respondError, agent retry
    with same Idempotency-Key must replay (handler hits stay at 1).
  - TestIdempotency_5xxFromRespondError_NotCached guards against
    over-correction — 5xx via respondError still bypasses caching so
    retries can complete the work once upstream recovers.
  - TestIdempotency_NonSentinelErrorNotCached pins that plumbing
    errors (errors.New, fiber.NewError) keep bypassing caching.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mastermanas805 added a commit that referenced this pull request May 19, 2026
Load-test finding F2 (LOAD-CHAOS-REPORT-2026-05-19): a 30-way simultaneous
burst from one fingerprint minted 22-29 anonymous tokens instead of capping
at 5.

Root cause — TOCTOU fall-through in the over-cap branch. checkProvisionLimit's
Redis INCR is atomic and correctly hands every burst caller a distinct slot,
so callers 6..N all see limitExceeded == true. The over-cap branch then looks
up an existing resource to dedup against. But during a *simultaneous* burst
the <=5 winning provisions have claimed their INCR slot yet not committed a
`resources` row — so both GetActiveResourceByFingerprintType and the
cross-service GetActiveResourceByFingerprint return ErrResourceNotFound, and
control FELL THROUGH the limitExceeded block to CreateResource. Every burst
caller minted a fresh token.

Fix — shared helper denyProvisionOverCap in provision_helper.go: an over-cap
caller that finds no existing resource is genuinely over the cap (its atomic
slot number proved it); the missing row only means the winners are in-flight.
Such a caller is now hard-denied with 429, never allowed to fall through to a
fresh provision. The atomic INCR (slot claim) plus the hard deny (no
fall-through) together make the cap race-safe: at most `cap` callers ever
reach CreateResource. Applied once in the helper, wired into all seven
anonymous provisioning handlers (db/cache/nosql/queue/storage/webhook + vector).

Fail-open preserved (CLAUDE.md #6): a Redis error in checkProvisionLimit still
returns (false, err) and the caller proceeds.
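
The race-safety argument (atomic slot claim plus hard deny) can be checked with a small simulation. This is a sketch under assumptions: an in-process atomic counter stands in for the Redis INCR, `provision` and `burst` are hypothetical names, and the fail-open path on a Redis error is not modeled.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const provisionCap = 5

// limiter stands in for the Redis INCR: every caller gets a distinct slot.
type limiter struct{ slots atomic.Int64 }

// claim reports whether the caller's slot is over the cap.
func (l *limiter) claim() bool { return l.slots.Add(1) > provisionCap }

// provision returns "created", "existing", or "denied".
func provision(l *limiter, findExisting func() bool) string {
	if l.claim() {
		if findExisting() {
			return "existing" // dedup against a committed resource row
		}
		// No row yet only means the winners are still in flight, so the
		// caller is hard-denied instead of falling through to a create.
		return "denied"
	}
	return "created"
}

// burst fires n simultaneous callers while no resource row is visible,
// which is the race window from the load-test finding.
func burst(n int) (created, denied int64) {
	var l limiter
	var c, d atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			switch provision(&l, func() bool { return false }) {
			case "created":
				c.Add(1)
			case "denied":
				d.Add(1)
			}
		}()
	}
	wg.Wait()
	return c.Load(), d.Load()
}

func main() {
	created, denied := burst(30)
	fmt.Println("created:", created, "denied:", denied)
}
```

Because the counter hands out slots 1..30 exactly once each, exactly five callers see a slot at or under the cap regardless of scheduling.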

Tests (provision_cap_concurrency_test.go, CI under `go test ./...`):
- TestProvisionCap_ConcurrentBurst_CapsAt5 — reproduces F2 deterministically:
  30 goroutines, one fingerprint, race window held open for the whole burst.
  Runs the same burst twice — pre-fix mode mints all 30 (asserted, so the
  test genuinely fails without the fix), fixed mode mints exactly 5, denies 25.
- TestProvisionCap_Sequential_FiveThenExisting — confirms the unchanged
  non-burst contract: 5 succeed, 6th+ return the existing token.
- TestCheckProvisionLimit_AtomicUnderConcurrency — unit-level proof the
  atomic INCR gate clears exactly `cap` of a 40-way burst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mastermanas805 added a commit that referenced this pull request May 20, 2026
…ling/middleware/resource

Sweep of P1/P2 api-repo findings from BUGBASH-2026-05-20/MASTER-LEDGER.md
still open after today's earlier P0 waves (B5-P0, B7-P0-1, B11-F2/F4/F5,
B13-F1, B17-STORAGE-P0-1..4 all already shipped or in flight). Each fix
includes a regression test where one fit cleanly.

—————————————————————————————————————————————————————————————————————————————
B4-F1  POST /auth/email/start per-email RL silent absorption
       internal/metrics/metrics.go: add instant_magic_link_email_rate_limited_total
         counter (Prometheus).
       internal/handlers/magic_link.go: increment counter + WARN log on every
         rate-limited absorption — operator now sees the abuse pattern in NR
         despite the user-visible 202 staying identical to the success path.

B4-F2  emailRateLimitKey sha256[:8] birthday-collision risk
       internal/handlers/magic_link.go:88-89: full sha256 hex (64 chars) instead
         of h[:8] (8 hex chars). Key space 2^32 → 2^256 (birthday bound ~2^16 → ~2^128).
       internal/handlers/magic_link_test.go: TestEmailRateLimitKey_FullHashFingerprint
         pins the full-hash suffix length so a re-truncation regression fails CI.
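
       A minimal sketch of a full-hash key, assuming a hypothetical "rl:email:"
       prefix and lowercase/trim normalization (neither is confirmed by the
       ledger entry above):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// keyPrefix is a hypothetical Redis key prefix.
const keyPrefix = "rl:email:"

// emailRateLimitKey uses the full 64-char sha256 hex digest rather than a
// truncated prefix, so distinct emails do not realistically collide.
func emailRateLimitKey(email string) string {
	normalized := strings.ToLower(strings.TrimSpace(email))
	sum := sha256.Sum256([]byte(normalized))
	return keyPrefix + hex.EncodeToString(sum[:])
}

func main() {
	key := emailRateLimitKey("Agent@Example.com")
	fmt.Println(len(key)-len(keyPrefix), key)
}
```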

B4-F4  RFC 5321 local-part >64 chars not gated
       internal/handlers/magic_link.go:looksLikeEmail: reject local-part > 64
         octets per §4.5.3.1.1 (guaranteed-undeliverable).
       internal/handlers/magic_link_test.go: TestLooksLikeEmail_LocalPartCap
         table-driven (under/at/over/way-over cap).

B4-F5  10MB JSON body accepted on /auth/email/start
       internal/handlers/magic_link.go: 1 KiB body cap on POST /auth/email/start
         (real bodies are ~80 bytes; global Fiber BodyLimit = 50 MiB for
         /deploy/new tarballs is way too generous for a 2-field JSON envelope).

B4-F7 / B5-P1-1 / B10-P1-3 / B13-F6
       internal/handlers/helpers.go: extend codeToAgentAction with three
         missing entries — invalid_email, invalid_email_format,
         provision_limit_reached. The 429 provision-limit envelope now
         carries agent_action + upgrade_url to the agent (CLAUDE.md
         convention #6 promised these, but agents were getting nothing).

B11-F1 8 Razorpay events fell to default 200
       internal/handlers/billing.go: handle subscription.deauthenticated
         (treat as cancel — mandate revoked, cannot charge), subscription.updated
         (route to handleSubscriptionCharged — idempotent re-resolve of tier),
         refund.processed (info-log + span attribute, no tier change).
       The remaining default-200 falls (payment.captured/authorized,
         order.paid, invoice.paid, subscription.authenticated) are
         intentionally informational — Razorpay sends them but tier state
         is already correct from the corresponding subscription.* event.

B18-M1 empty Idempotency-Key header silently coerced to fingerprint path
       internal/middleware/idempotency.go: detect "header present but value
         empty/blank" via raw c.Request().Header.Peek() (Fiber's c.Get()
         returns "" for both omitted-and-present-empty). Reject with 400
         invalid_idempotency_key — matches the OpenAPI contract for
         malformed keys.

B20-P2-1 soft-deleted resources still surfaced via GET /api/v1/resources/:id
       internal/handlers/resource.go:Get: 404 when resource.Status == "deleted".
       Narrow surgical change to the read path only — the DELETE path itself
       still needs to fetch the row to soft-delete it, so leaving
       GetResourceByToken untouched. The customer-visible contract is "the
       resource is gone after DELETE"; this enforces it on the GET.

—————————————————————————————————————————————————————————————————————————————
Coverage block per CLAUDE.md rule 17:

  Symptom:        per-bug, see ledger references in code comments
  Enumeration:    grep -rn for each token, see code-side comments
  Sites found:    1 each (single-site fixes by design)
  Sites touched:  1 each
  Coverage tests: TestEmailRateLimitKey_FullHashFingerprint,
                  TestLooksLikeEmail_LocalPartCap,
                  TestAgentActionContract (covers all three new
                  codeToAgentAction entries automatically — map iteration)
  Live verified:  pending — /healthz commit_id check post-deploy

Gate: `cd api && make gate`-equivalent (go test ./... -short -count=1 -p 1)
green except the 4 pre-existing flakes (TestAdminList_*, TestDBNew_*,
TestBulkTwin_*, TestGetExpiredDeployments_*) explicitly listed in the task
brief as known-acceptable. All other 30+ packages pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>