feat(api): add GET /api/v1/whoami for agent identity probes#6
Merged
Friction-tested 2026-05-11: agents that wanted to verify their bearer
token worked were reaching for arbitrary endpoints like /api/v1/team and
getting 404 instead of 401, which led to wasted token-mint retry cycles.
GET /api/v1/whoami is the canonical "am I authenticated?" probe:
- Returns 401 (via the /api/v1 RequireAuth middleware) when the token
is missing/expired/invalid.
- Returns 200 + { ok, user_id, team_id, team_name?, plan_tier? } when
the token works.
plan_tier is best-effort enriched from the teams table so agents don't
need a second hop to /billing — failures here drop the field silently
rather than failing the whole call.
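The response shape and the "drop the field silently" behavior described above can be sketched with the standard library alone. This is a minimal illustration, not the handler from this PR: `buildWhoami`, `teamLookup`, `lookupOK`, and `lookupDown` are hypothetical names, and the 401 case is assumed to be handled by the auth middleware before this code runs.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// WhoamiResponse mirrors the field list in the description above.
// Optional fields use omitempty so a failed enrichment drops them.
type WhoamiResponse struct {
	OK       bool   `json:"ok"`
	UserID   string `json:"user_id"`
	TeamID   string `json:"team_id"`
	TeamName string `json:"team_name,omitempty"`
	PlanTier string `json:"plan_tier,omitempty"`
}

// teamLookup is a hypothetical stand-in for the teams-table query.
type teamLookup func(teamID string) (name, tier string, err error)

// buildWhoami assumes authentication already succeeded.
// Enrichment is best-effort: a lookup error leaves the optional fields empty.
func buildWhoami(userID, teamID string, lookup teamLookup) WhoamiResponse {
	resp := WhoamiResponse{OK: true, UserID: userID, TeamID: teamID}
	if name, tier, err := lookup(teamID); err == nil {
		resp.TeamName = name
		resp.PlanTier = tier
	}
	return resp
}

// lookupOK and lookupDown are illustrative fixtures, not real queries.
func lookupOK(string) (string, string, error) { return "acme", "hobby", nil }

func lookupDown(string) (string, string, error) {
	return "", "", errors.New("db unavailable")
}

func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(buildWhoami("u1", "t1", lookupOK)))
	fmt.Println(mustJSON(buildWhoami("u1", "t1", lookupDown)))
}
```

With the failing lookup the call still succeeds and the JSON simply omits `team_name` and `plan_tier`, which is the contract an agent can rely on.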
OpenAPI:
- New /api/v1/whoami path with bearerAuth security + 401 in responses
- New WhoamiResponse schema with field docs
Tests:
- TestOpenAPI_WhoamiPathExists guards the path is in openapi.json so
an agent reading the spec can discover it.
- TestOpenAPI_WhoamiResponseSchema guards the field shape (ok,
user_id, team_id, plan_tier).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mastermanas805
added a commit
that referenced
this pull request
May 11, 2026
#10)

Earlier PRs (#4, #6, #9) shipped OpenAPI schema tests but lacked handler-level behavior tests. Code reviewers flagged the gap: a schema test catches "field documented" regressions but misses "field actually emitted" or "input actually parsed" regressions.

This PR backfills behavior tests for four shipped behaviors:

1. upgradeNote / limitExceededNote copy (PR #9, friction #13)
- TestUpgradeNote_DoesNotMentionTrial: 2 sub-cases
- TestLimitExceededNote_DoesNotMentionTrial: 4 sub-cases
Guards: no "14-day trial" framing, contains "Claim to keep" + "$9/mo", no instant.dev/start leakage.

2. GET /api/v1/whoami (PR #6, friction #9)
- TestWhoami_NoTokenReturns401: 401 on missing bearer
- TestWhoami_ReturnsIdentityForAuthedRequest: 200 with uid/tid claims; plan_tier enrichment when DB hit
Test app now wires /api/v1/whoami so this and future tests can hit it through the full RequireAuth middleware.

3. POST /deploy/new env_vars JSON parsing (PR #4, friction #11)
- TestDeployNew_EnvVarsJSON_Parsed_Into_InitEnv: valid JSON merges into deployment.EnvVars; underscore-prefixed keys silently stripped (_secret never leaks)
- TestDeployNew_EnvVarsInvalidJSON_Returns400: malformed JSON returns 400 error="invalid_env_vars" (not a generic 500)
Includes a multipartDeployBody helper that other deploy tests can reuse without colliding with stack_test.go's name.

4. upgrade_jwt in provisioning responses (PR #9, friction #16)
- TestAnonymousProvisionEmitsUpgradeJWT_OnDedup: dedup response includes raw upgrade_jwt JWT (no parsing) alongside the legacy upgrade URL; the two presentations of the same token must not drift. Skips cleanly when local test DB schema lags (env column).

All 10 new test cases pass against postgres:16-alpine + redis:7-alpine. Total run time <1s.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mastermanas805
added a commit
that referenced
this pull request
May 11, 2026
… compat (#13)

This PR surfaced from a "test everything" sweep across api/worker/provisioner/common. Three real production bugs and three test-infra gaps came out together; they share enough code that splitting them would force ordering dependencies between PRs.

## 1. respondError used to silently bypass multi-return validators

respondError returned c.Status(status).JSON()'s result, which is nil on every successful body write. Helper functions like resolveEnv and requireTeamMatch composed it as:

    return zeroValue, respondError(c, 400, "invalid_x", "...")

Their callers checked `if err != nil`, which was false on the happy path of respondError (response written successfully). The handler then continued PAST the validation gate with zeroValue, producing:

- silent acceptance of invalid env names (response said env="production" while the URL said env=Prod)
- 500 instead of 403 when an actor's JWT team didn't match the path :team_id (handler proceeded with uuid.Nil → DB error)

Fix:

- respondError now returns ErrResponseWritten (non-nil sentinel) regardless of the JSON write result.
- The custom ErrorHandlers in router.go, testhelpers.go, and the four per-test fiber.App builders (stack_test, billing_test, vault_test, teams_test) detect this sentinel and return nil so the original 400/403/etc. body is preserved.

The visible effect in prod (verified live on v2.2.0):

    $ curl -X POST 'https://api.instanode.dev/db/new?env=Prod' -d '{}'
    Before: 201 with env="production" (invalid input silently coerced)
    After:  400 {"error":"invalid_env", ...}

## 2. Vault tier-check order was misleading

Hobby-tier PUT to /api/v1/vault/staging/X when already at the 20-entry quota used to return 402 vault_quota_exceeded. The agent reading that might add seats or upgrade for capacity, but the real block was the env allowlist (staging isn't a hobby env). Now the env check fires first (403 vault_env_not_allowed) so the agent learns what would actually help.

## 3. Test-infra fixes uncovered by the sweep

- vault_test space-encoding: Go 1.26 httptest.NewRequest panics on unescaped spaces in URL paths. Pre-encode to %20.
- stack_test domain assertion: still checked "instant.dev/start"; updated to "instanode.dev/start".
- deploy_env_vars_test body double-read: readBody(t, resp) was being evaluated unconditionally in a require.NotEqual message arg, consuming the body before the success-path Decode could read it. Read once into a buffer, reuse.
- dashboardsvc/server_test resourceSelectColumns added the env column (migration 009): sqlmock was emitting 18 columns into a model that scans 19, surfacing as "sql: expected 18 destination arguments in Scan, not 19".
- testhelpers wires /api/v1/whoami (the route added in PR #6 was in production but missing from the integration test app).

## Sweep results

Before this PR: 12 test failures across api/internal/handlers, internal/dashboardsvc, internal/middleware.
After: zero failures. Full sweep across api, worker, provisioner, common all green.

Verified live: v2.2.0-validation-bugs deployed; sample env=Prod returns 400 invalid_env as expected; valid env=production still succeeds with 201.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
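The respondError bug and its sentinel fix can be reproduced without Fiber. The sketch below is an assumption-laden model, not the repository code: `fakeCtx`, `respondErrorOld`, `responder`, `handle`, `errorHandler`, and `run` are hypothetical, the two-entry env allowlist is illustrative, and the default-to-"production" step is only a guess at how the silent coercion arose. Only `ErrResponseWritten`, `respondError`, and `resolveEnv` are names taken from the message above.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrResponseWritten signals "the error body is already on the wire".
var ErrResponseWritten = errors.New("response already written")

// fakeCtx is a hypothetical stand-in for a framework request context.
type fakeCtx struct {
	status int
	body   string
}

// respondErrorOld models the bug: it returns the write result (nil on success).
func respondErrorOld(c *fakeCtx, status int, code string) error {
	c.status, c.body = status, code
	return nil
}

// respondError models the fix: always return the non-nil sentinel.
func respondError(c *fakeCtx, status int, code string) error {
	c.status, c.body = status, code
	return ErrResponseWritten
}

type responder func(c *fakeCtx, status int, code string) error

// resolveEnv is a multi-return validator composed with the responder.
func resolveEnv(c *fakeCtx, raw string, respond responder) (string, error) {
	switch raw {
	case "production", "staging": // illustrative allowlist
		return raw, nil
	}
	return "", respond(c, 400, "invalid_env")
}

func handle(c *fakeCtx, raw string, respond responder) error {
	env, err := resolveEnv(c, raw, respond)
	if err != nil {
		return err // validation gate
	}
	if env == "" {
		env = "production" // assumed downstream default that hid the bug
	}
	c.status, c.body = 201, "env="+env
	return nil
}

// errorHandler swallows the sentinel so the written 4xx body is preserved.
func errorHandler(c *fakeCtx, err error) {
	if errors.Is(err, ErrResponseWritten) {
		return
	}
	c.status, c.body = 500, "internal_error"
}

func run(raw string, respond responder) (int, string) {
	c := &fakeCtx{}
	if err := handle(c, raw, respond); err != nil {
		errorHandler(c, err)
	}
	return c.status, c.body
}

func main() {
	fmt.Println(run("Prod", respondErrorOld)) // gate bypassed
	fmt.Println(run("Prod", respondError))    // gate holds
}
```

The design point is that the helper's return value carries control flow, not I/O status, so it has to be non-nil whenever the request must stop.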
mastermanas805
added a commit
that referenced
this pull request
May 14, 2026
#6) (#110)

The Idempotency-Key middleware silently skipped caching for any 4xx response produced through respondError*, the production error path that returns the ErrResponseWritten sentinel after committing the body to the wire. The middleware's `if err := c.Next(); err != nil { return err }` clause treated the sentinel as an abort and bypassed the cache write.

User-visible impact: agent hits /deploy/new over its tier cap, server returns 402, agent retries with the same Idempotency-Key, server runs the handler fresh, re-hitting Razorpay/billing side effects on every retry. The whole point of idempotency middleware (one logical request → one side effect) was broken for error-path responses.

The existing TestIdempotency_4xxIsCached passed for the wrong reason: the test fixture used c.Status().JSON() (returns nil) instead of the real handler error path handlers.WriteFiberError -> respondError (returns ErrResponseWritten). The middleware's bail clause skipped the cache write in production but not in test: a false positive.

Fix (≤30 LOC in middleware): special-case the ErrResponseWritten sentinel and fall through to the cache logic. Real errors (DB down, panic-recovered, fiber.NewError from upstream) still bypass caching. Use a registered-callback pattern (handlers.init() wires the check) to avoid a handlers->middleware->handlers import cycle.

Tests: 13 idempotency tests now pass (up from 9). New coverage:

- TestIdempotency_4xxIsCached rewritten to use handlers.WriteFiberError so it exercises the real respondError path (FAILS before fix).
- TestIdempotency_RealHandlerErrorPathCaches reproduces the exact BB2-D5 scenario: handler returns 402 via respondError, agent retry with same Idempotency-Key must replay (handler hits stay at 1).
- TestIdempotency_5xxFromRespondError_NotCached guards against over-correction: 5xx via respondError still bypasses caching so retries can complete the work once upstream recovers.
- TestIdempotency_NonSentinelErrorNotCached pins that plumbing errors (errors.New, fiber.NewError) keep bypassing caching.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
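The caching rule described in this commit (cache on the sentinel, bypass on real errors and on 5xx) can be modeled in a few lines. This is a simplified sketch with an in-memory map; `idempotency`, `serve`, `countHits`, and `errPlumbing` are hypothetical names, and the registered-callback wiring that avoids the import cycle is deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrResponseWritten marks "error body already written by the handler".
var ErrResponseWritten = errors.New("response already written")

// errPlumbing stands in for a real failure such as a DB outage.
var errPlumbing = errors.New("db down")

type response struct {
	status int
	body   string
}

type handler func() (response, error)

// idempotency is a toy in-memory model of the middleware's cache.
type idempotency struct {
	cache map[string]response
}

func (m *idempotency) serve(key string, next handler) (response, error) {
	if cached, ok := m.cache[key]; ok {
		return cached, nil // replay: the handler does not run again
	}
	resp, err := next()
	if err != nil && !errors.Is(err, ErrResponseWritten) {
		return resp, err // real error: bypass caching
	}
	if resp.status >= 500 {
		return resp, err // 5xx stays retryable
	}
	m.cache[key] = resp // 2xx and sentinel-path 4xx are cached
	return resp, err
}

// countHits sends the same key twice and reports how often the handler ran.
func countHits(status int, handlerErr error) int {
	m := &idempotency{cache: map[string]response{}}
	hits := 0
	next := func() (response, error) {
		hits++
		return response{status: status, body: "payload"}, handlerErr
	}
	for i := 0; i < 2; i++ {
		_, _ = m.serve("key-1", next)
	}
	return hits
}

func main() {
	fmt.Println("402 via sentinel:", countHits(402, ErrResponseWritten))
	fmt.Println("503 via sentinel:", countHits(503, ErrResponseWritten))
	fmt.Println("plumbing error:  ", countHits(402, errPlumbing))
}
```

A handler-hit counter like `countHits` is also the shape of assertion the new tests describe: a replay is proven by the handler not running a second time.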
mastermanas805
added a commit
that referenced
this pull request
May 19, 2026
Load-test finding F2 (LOAD-CHAOS-REPORT-2026-05-19): a 30-way simultaneous burst from one fingerprint minted 22-29 anonymous tokens instead of capping at 5.

Root cause: TOCTOU fall-through in the over-cap branch. checkProvisionLimit's Redis INCR is atomic and correctly hands every burst caller a distinct slot, so callers 6..N all see limitExceeded == true. The over-cap branch then looks up an existing resource to dedup against. But during a *simultaneous* burst the <=5 winning provisions have claimed their INCR slot yet not committed a `resources` row, so both GetActiveResourceByFingerprintType and the cross-service GetActiveResourceByFingerprint return ErrResourceNotFound, and control FELL THROUGH the limitExceeded block to CreateResource. Every burst caller minted a fresh token.

Fix: shared helper denyProvisionOverCap in provision_helper.go. An over-cap caller that finds no existing resource is genuinely over the cap (its atomic slot number proved it); the missing row only means the winners are in-flight. Such a caller is now hard-denied with 429, never allowed to fall through to a fresh provision. The atomic INCR (slot claim) plus the hard deny (no fall-through) together make the cap race-safe: at most `cap` callers ever reach CreateResource. Applied once in the helper, wired into all seven anonymous provisioning handlers (db/cache/nosql/queue/storage/webhook + vector).

Fail-open preserved (CLAUDE.md #6): a Redis error in checkProvisionLimit still returns (false, err) and the caller proceeds.

Tests (provision_cap_concurrency_test.go, CI under `go test ./...`):

- TestProvisionCap_ConcurrentBurst_CapsAt5: reproduces F2 deterministically with 30 goroutines, one fingerprint, race window held open for the whole burst. Runs the same burst twice: pre-fix mode mints all 30 (asserted, so the test genuinely fails without the fix), fixed mode mints exactly 5, denies 25.
- TestProvisionCap_Sequential_FiveThenExisting: confirms the unchanged non-burst contract: 5 succeed, 6th+ return the existing token.
- TestCheckProvisionLimit_AtomicUnderConcurrency: unit-level proof the atomic INCR gate clears exactly `cap` of a 40-way burst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
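The interaction between the atomic slot claim and the hard deny can be sketched as follows. This is a toy model under stated assumptions: an in-process `atomic.Int64` stands in for the Redis INCR, the dedup lookup is assumed to find nothing for the whole burst (the worst-case race window), and `limiter`, `burst`, and `provisionCap` are hypothetical names. The fail-open path on Redis errors is not modeled.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const provisionCap = 5

// limiter simulates the atomic INCR: every caller gets a distinct slot number.
type limiter struct {
	slots atomic.Int64
}

// checkProvisionLimit reports whether this caller's slot is over the cap.
func (l *limiter) checkProvisionLimit() bool {
	return l.slots.Add(1) > provisionCap
}

// burst runs n concurrent callers against one fingerprint.
// hardDeny=false models the pre-fix fall-through, hardDeny=true the fix.
func burst(n int, hardDeny bool) (minted, denied int64) {
	var l limiter
	var m, d atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if l.checkProvisionLimit() {
				// Over cap, and the dedup lookup finds no committed row
				// because the winners are still in flight.
				if hardDeny {
					d.Add(1) // would be a 429
					return
				}
				// Pre-fix behavior: fall through to a fresh provision.
			}
			m.Add(1) // would be CreateResource
		}()
	}
	wg.Wait()
	return m.Load(), d.Load()
}

func main() {
	fmt.Println(burst(30, false))
	fmt.Println(burst(30, true))
}
```

Because the slot number alone decides who is over the cap, the outcome does not depend on goroutine scheduling: exactly `provisionCap` callers reach the mint step once the fall-through is removed.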
mastermanas805
added a commit
that referenced
this pull request
May 20, 2026
…ling/middleware/resource
Sweep of P1/P2 api-repo findings from BUGBASH-2026-05-20/MASTER-LEDGER.md
still open after today's earlier P0 waves (B5-P0, B7-P0-1, B11-F2/F4/F5,
B13-F1, B17-STORAGE-P0-1..4 all already shipped or in flight). Each fix
includes a regression test where one fit cleanly.
—————————————————————————————————————————————————————————————————————————————
B4-F1 POST /auth/email/start per-email RL silent absorption
internal/metrics/metrics.go: add instant_magic_link_email_rate_limited_total
counter (Prometheus).
internal/handlers/magic_link.go: increment counter + WARN log on every
rate-limited absorption — operator now sees the abuse pattern in NR
despite the user-visible 202 staying identical to the success path.
B4-F2 emailRateLimitKey sha256[:8] birthday-collision risk
internal/handlers/magic_link.go:88-89: full sha256 hex (64 chars) instead
of h[:8] (8 hex chars). Key space grows from 2^32 to 2^256, so the
birthday-collision bound moves from ~2^16 to ~2^128 keys.
internal/handlers/magic_link_test.go: TestEmailRateLimitKey_FullHashFingerprint
pins the full-hash suffix length so a re-truncation regression fails CI.
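A minimal sketch of a full-hash rate-limit key, for illustration only. The function name comes from the message above, but the `rl:email:` prefix (`keyPrefix`) and the lowercase/trim normalization are assumptions rather than details confirmed by the source.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// keyPrefix is a hypothetical namespace for the rate-limit keys.
const keyPrefix = "rl:email:"

// emailRateLimitKey derives the key from the full SHA-256 digest
// (64 hex chars) rather than a truncated 8-char prefix.
func emailRateLimitKey(email string) string {
	normalized := strings.ToLower(strings.TrimSpace(email)) // assumed normalization
	sum := sha256.Sum256([]byte(normalized))
	return keyPrefix + hex.EncodeToString(sum[:])
}

func main() {
	key := emailRateLimitKey("Agent@Example.com")
	fmt.Println(key, len(key)-len(keyPrefix))
}
```

Pinning the suffix length in a test, as the commit does, is what makes an accidental re-truncation fail CI.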
B4-F4 RFC 5321 local-part >64 chars not gated
internal/handlers/magic_link.go:looksLikeEmail: reject local-part > 64
octets per §4.5.3.1.1 (guaranteed-undeliverable).
internal/handlers/magic_link_test.go: TestLooksLikeEmail_LocalPartCap
table-driven (under/at/over/way-over cap).
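The local-part cap can be illustrated with a small check. This sketch covers only the 64-octet rule; whatever other checks the real `looksLikeEmail` performs are not shown in the source, so the surrounding logic and the `addr` helper here are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// maxLocalPart is the RFC 5321 limit on the local-part, in octets.
const maxLocalPart = 64

// looksLikeEmail is a deliberately minimal shape check plus the length gate.
func looksLikeEmail(s string) bool {
	at := strings.LastIndex(s, "@")
	if at <= 0 || at == len(s)-1 {
		return false // no local-part or no domain
	}
	// len on a Go string counts bytes, which matches the octet limit.
	return len(s[:at]) <= maxLocalPart
}

// addr builds an address with an n-octet local-part for the boundary cases.
func addr(n int) string {
	return strings.Repeat("a", n) + "@example.com"
}

func main() {
	for _, n := range []int{63, 64, 65, 200} {
		fmt.Println(n, looksLikeEmail(addr(n)))
	}
}
```

The four inputs in `main` correspond to the under/at/over/way-over cases named in the test description.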
B4-F5 10MB JSON body accepted on /auth/email/start
internal/handlers/magic_link.go: 1 KiB body cap on POST /auth/email/start
(real bodies are ~80 bytes; global Fiber BodyLimit = 50 MiB for
/deploy/new tarballs is way too generous for a 2-field JSON envelope).
B4-F7 / B5-P1-1 / B10-P1-3 / B13-F6
internal/handlers/helpers.go: extend codeToAgentAction with three
missing entries — invalid_email, invalid_email_format,
provision_limit_reached. The 429 provision-limit envelope is now
carrying agent_action + upgrade_url to the agent (CLAUDE.md
convention #6 promised these, agents were getting nothing).
B11-F1 8 Razorpay events fell to default 200
internal/handlers/billing.go: handle subscription.deauthenticated
(treat as cancel — mandate revoked, cannot charge), subscription.updated
(route to handleSubscriptionCharged — idempotent re-resolve of tier),
refund.processed (info-log + span attribute, no tier change).
The remaining default-200 falls (payment.captured/authorized,
order.paid, invoice.paid, subscription.authenticated) are
intentionally informational — Razorpay sends them but tier state
is already correct from the corresponding subscription.* event.
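The routing described above amounts to a switch over the event name. The sketch below covers only the events named in this message; events that were already handled before this change are omitted, and the string action labels returned by `routeEvent` are hypothetical stand-ins for the real handler calls.

```go
package main

import "fmt"

// routeEvent maps a Razorpay webhook event name to an illustrative action label.
func routeEvent(event string) string {
	switch event {
	case "subscription.deauthenticated":
		return "cancel" // mandate revoked, cannot charge
	case "subscription.updated":
		return "resolve_tier" // idempotent re-resolve of tier
	case "refund.processed":
		return "log_only" // info-log, no tier change
	case "payment.captured", "payment.authorized",
		"order.paid", "invoice.paid", "subscription.authenticated":
		return "ack" // informational: acknowledge with 200
	default:
		return "ack"
	}
}

func main() {
	for _, e := range []string{"subscription.deauthenticated", "refund.processed", "order.paid"} {
		fmt.Println(e, "->", routeEvent(e))
	}
}
```

Listing the informational events explicitly, even though they share the default outcome, documents that acknowledging them without action is a decision and not an oversight.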
B18-M1 empty Idempotency-Key header silently coerced to fingerprint path
internal/middleware/idempotency.go: detect "header present but value
empty/blank" via raw c.Request().Header.Peek() (Fiber's c.Get()
returns "" for both omitted-and-present-empty). Reject with 400
invalid_idempotency_key — matches the OpenAPI contract for
malformed keys.
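The "absent versus present-but-blank" distinction can be shown with the standard library. Note that this is a net/http analog and not the Fiber code from the commit: it inspects the header map directly instead of calling `Peek()`, and `keyState`, `classifyIdempotencyKey`, and `classifyValue` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

type keyState int

const (
	keyAbsent  keyState = iota // header not sent: use the fingerprint path
	keyBlank                   // header sent but empty: reject with 400
	keyPresent                 // header sent with a value
)

// classifyIdempotencyKey looks at the raw header map, because Header.Get
// returns "" both for a missing header and for an empty one.
func classifyIdempotencyKey(h http.Header) keyState {
	vals, ok := h[http.CanonicalHeaderKey("Idempotency-Key")]
	if !ok || len(vals) == 0 {
		return keyAbsent
	}
	if strings.TrimSpace(vals[0]) == "" {
		return keyBlank
	}
	return keyPresent
}

// classifyValue builds a header set for the given case and classifies it.
func classifyValue(present bool, value string) keyState {
	h := http.Header{}
	if present {
		h.Set("Idempotency-Key", value)
	}
	return classifyIdempotencyKey(h)
}

func main() {
	fmt.Println(classifyValue(false, ""), classifyValue(true, "  "), classifyValue(true, "abc-123"))
}
```

Only the `keyBlank` case maps to the 400 invalid_idempotency_key response; an absent header keeps the existing fallback behavior.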
B20-P2-1 soft-deleted resources still surfaced via GET /api/v1/resources/:id
internal/handlers/resource.go:Get: 404 when resource.Status == "deleted".
Narrow surgical change to the read path only: the DELETE path itself
still needs to fetch the row to soft-delete it, so GetResourceByToken
is left untouched. The customer-visible contract is "the
resource is gone after DELETE"; this enforces it on the GET.
—————————————————————————————————————————————————————————————————————————————
Coverage block per CLAUDE.md rule 17:
Symptom: per-bug, see ledger references in code comments
Enumeration: grep -rn for each token, see code-side comments
Sites found: 1 each (single-site fixes by design)
Sites touched: 1 each
Coverage tests: TestEmailRateLimitKey_FullHashFingerprint,
TestLooksLikeEmail_LocalPartCap,
TestAgentActionContract (covers all three new
codeToAgentAction entries automatically — map iteration)
Live verified: pending — /healthz commit_id check post-deploy
Gate: `cd api && make gate`-equivalent (go test ./... -short -count=1 -p 1)
green except the 4 pre-existing flakes (TestAdminList_*, TestDBNew_*,
TestBulkTwin_*, TestGetExpiredDeployments_*) explicitly listed in the task
brief as known-acceptable. All other 30+ packages pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
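`GET /api/v1/whoami` is the canonical "am I authenticated?" probe for agents: it returns 401 (via the `/api/v1` RequireAuth middleware) when the token is missing, expired, or invalid, and 200 with `{ ok, user_id, team_id, team_name?, plan_tier? }` when the token works.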
`plan_tier` is best-effort enriched from the teams table so agents avoid a second hop to `/billing` — DB lookup failures drop the field silently rather than failing the whole call.
Replaces the previous pattern of agents probing arbitrary paths like `/api/v1/team` and getting 404 instead of 401, which led to wasted token-mint retry cycles.
Test plan
🤖 Generated with Claude Code