feat(api): add GET /api/v1/whoami for agent identity probes by mastermanas805 · Pull Request #6 · InstaNode-dev/api

mastermanas805 · 2026-05-11T08:23:49Z

Summary

`GET /api/v1/whoami` is the canonical "am I authenticated?" probe for agents:

- Returns 401 (via the `/api/v1` RequireAuth middleware) on missing/expired/invalid tokens.
- Returns 200 + `{ok, user_id, team_id, team_name?, plan_tier?}` when the token works.

`plan_tier` is best-effort enriched from the teams table so agents avoid a second hop to `/billing` — DB lookup failures drop the field silently rather than failing the whole call.

Replaces the previous pattern of agents probing arbitrary paths like `/api/v1/team` and getting 404 instead of 401, which led to wasted token-mint retry cycles.
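
The probe contract described in this summary could be consumed by an agent roughly as follows. This is a minimal sketch under stated assumptions: the field names come from the summary, but the Go field types, the `probeWhoami` helper, the `ErrUnauthenticated` error, and the `newFakeAPI` stand-in server are all hypothetical and not taken from the repository.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// WhoamiResponse uses the documented field names; the Go types are assumed.
type WhoamiResponse struct {
	OK       bool   `json:"ok"`
	UserID   string `json:"user_id"`
	TeamID   string `json:"team_id"`
	TeamName string `json:"team_name,omitempty"` // optional
	PlanTier string `json:"plan_tier,omitempty"` // optional, best-effort
}

// ErrUnauthenticated is a hypothetical client-side error for the 401 case.
var ErrUnauthenticated = errors.New("whoami: token missing, expired, or invalid")

// probeWhoami (hypothetical helper) maps 200 to a decoded identity and 401 to
// ErrUnauthenticated, so the caller knows when re-minting a token is warranted.
func probeWhoami(client *http.Client, baseURL, token string) (*WhoamiResponse, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/api/v1/whoami", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusOK:
		var out WhoamiResponse
		if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
			return nil, err
		}
		return &out, nil
	case http.StatusUnauthorized:
		return nil, ErrUnauthenticated
	default:
		return nil, fmt.Errorf("whoami: unexpected status %d", resp.StatusCode)
	}
}

// newFakeAPI is a stand-in server for this demo only; it is not the real API.
func newFakeAPI(validToken string) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/v1/whoami", func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+validToken {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		// plan_tier is omitted here, as it would be after a failed lookup.
		json.NewEncoder(w).Encode(WhoamiResponse{OK: true, UserID: "u_1", TeamID: "t_1"})
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newFakeAPI("good-token")
	defer srv.Close()

	who, err := probeWhoami(srv.Client(), srv.URL, "good-token")
	fmt.Println(who.UserID, who.TeamID, err)

	_, err = probeWhoami(srv.Client(), srv.URL, "stale-token")
	fmt.Println(err == ErrUnauthenticated)
}
```

Because `plan_tier` and `team_name` are optional, a client written this way should treat empty values as "unknown" rather than as an error.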

Test plan

- `TestOpenAPI_WhoamiPathExists`: guards that the path is documented in `/openapi.json`
- `TestOpenAPI_WhoamiResponseSchema`: guards that the field shape stays stable
- `go build ./...` passes
- Integration test with real auth: follow-up

🤖 Generated with Claude Code

Friction-tested 2026-05-11: agents that wanted to verify their bearer token worked were reaching for arbitrary endpoints like /api/v1/team and getting 404 instead of 401, which led to wasted token-mint retry cycles.

GET /api/v1/whoami is the canonical "am I authenticated?" probe:
- Returns 401 (via the /api/v1 RequireAuth middleware) when the token is missing/expired/invalid.
- Returns 200 + { ok, user_id, team_id, team_name?, plan_tier? } when the token works.

plan_tier is best-effort enriched from the teams table so agents don't need a second hop to /billing; failures here drop the field silently rather than failing the whole call.

OpenAPI:
- New /api/v1/whoami path with bearerAuth security + 401 in responses
- New WhoamiResponse schema with field docs

Tests:
- TestOpenAPI_WhoamiPathExists guards the path is in openapi.json so an agent reading the spec can discover it.
- TestOpenAPI_WhoamiResponseSchema guards the field shape (ok, user_id, team_id, plan_tier).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

#10)

Earlier PRs (#4, #6, #9) shipped OpenAPI schema tests but lacked handler-level behavior tests. Code reviewers flagged the gap: a schema test catches "field documented" regressions but misses "field actually emitted" or "input actually parsed" regressions.

This PR backfills behavior tests for four shipped behaviors:

1. upgradeNote / limitExceededNote copy (PR #9, friction #13)
   - TestUpgradeNote_DoesNotMentionTrial: 2 sub-cases
   - TestLimitExceededNote_DoesNotMentionTrial: 4 sub-cases
   Guards: no "14-day trial" framing, contains "Claim to keep" + "$9/mo", no instant.dev/start leakage.

2. GET /api/v1/whoami (PR #6, friction #9)
   - TestWhoami_NoTokenReturns401: 401 on missing bearer
   - TestWhoami_ReturnsIdentityForAuthedRequest: 200 with uid/tid claims; plan_tier enrichment when DB hit
   Test app now wires /api/v1/whoami so this and future tests can hit it through the full RequireAuth middleware.

3. POST /deploy/new env_vars JSON parsing (PR #4, friction #11)
   - TestDeployNew_EnvVarsJSON_Parsed_Into_InitEnv: valid JSON merges into deployment.EnvVars; underscore-prefixed keys silently stripped (_secret never leaks)
   - TestDeployNew_EnvVarsInvalidJSON_Returns400: malformed JSON returns 400 error="invalid_env_vars" (not a generic 500)
   Includes a multipartDeployBody helper that other deploy tests can reuse without colliding with stack_test.go's name.

4. upgrade_jwt in provisioning responses (PR #9, friction #16)
   - TestAnonymousProvisionEmitsUpgradeJWT_OnDedup: dedup response includes raw upgrade_jwt JWT (no parsing) alongside the legacy upgrade URL; the two presentations of the same token must not drift. Skips cleanly when local test DB schema lags (env column).

All 10 new test cases pass against postgres:16-alpine + redis:7-alpine. Total run time <1s.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
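
The env_vars behavior pinned by item 3 above could look roughly like the following. This is an illustrative sketch only: `parseEnvVars` and `ErrInvalidEnvVars` are hypothetical names, and the real handler parses a multipart form field, which is omitted here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// ErrInvalidEnvVars is a hypothetical error a handler would map to
// 400 error="invalid_env_vars" instead of a generic 500.
var ErrInvalidEnvVars = errors.New("invalid_env_vars")

// parseEnvVars (hypothetical) decodes a JSON object of string values and
// silently drops underscore-prefixed keys so they never reach the deployment.
func parseEnvVars(raw string) (map[string]string, error) {
	out := map[string]string{}
	if strings.TrimSpace(raw) == "" {
		return out, nil // assumption: an absent field means "no env vars"
	}
	var parsed map[string]string
	if err := json.Unmarshal([]byte(raw), &parsed); err != nil {
		return nil, ErrInvalidEnvVars
	}
	for key, value := range parsed {
		if strings.HasPrefix(key, "_") {
			continue // reserved/internal key, stripped without an error
		}
		out[key] = value
	}
	return out, nil
}

func main() {
	env, err := parseEnvVars(`{"PORT":"8080","_secret":"hunter2"}`)
	fmt.Println(env, err)

	_, err = parseEnvVars(`{"PORT":`)
	fmt.Println(err)
}
```

A behavior test then asserts on the returned map and error, which is what a schema-only test cannot observe.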

… compat (#13)

This PR surfaced from a "test everything" sweep across api/worker/provisioner/common. Three real production bugs and three test-infra gaps came out together; they share enough code that splitting them would force ordering dependencies between PRs.

## 1. respondError used to silently bypass multi-return validators

respondError returned c.Status(status).JSON()'s result, which is nil on every successful body write. Helper functions like resolveEnv and requireTeamMatch composed it as:

    return zeroValue, respondError(c, 400, "invalid_x", "...")

Their callers checked `if err != nil`, which was false on the happy path of respondError (response written successfully). The handler then continued PAST the validation gate with zeroValue, producing:
- silent acceptance of invalid env names (response said env="production" while the URL said env=Prod)
- 500 instead of 403 when an actor's JWT team didn't match the path :team_id (handler proceeded with uuid.Nil → DB error)

Fix:
- respondError now returns ErrResponseWritten (non-nil sentinel) regardless of the JSON write result.
- The custom ErrorHandlers in router.go, testhelpers.go, and the per-test fiber.App builders (stack_test, billing_test, vault_test, teams_test) detect this sentinel and return nil so the original 400/403/etc. body is preserved.

The visible effect in prod (verified live on v2.2.0):

    $ curl -X POST 'https://api.instanode.dev/db/new?env=Prod' -d '{}'
    Before: 201 with env="production" (invalid input silently coerced)
    After:  400 {"error":"invalid_env", ...}

## 2. Vault tier-check order was misleading

Hobby-tier PUT to /api/v1/vault/staging/X when already at the 20-entry quota used to return 402 vault_quota_exceeded. The agent reading that might add seats or upgrade for capacity, but the real block was the env allowlist (staging isn't a hobby env). Now the env check fires first (403 vault_env_not_allowed) so the agent learns what would actually help.

## 3. Test-infra fixes uncovered by the sweep

- vault_test space-encoding: Go 1.26 httptest.NewRequest panics on unescaped spaces in URL paths. Pre-encode to %20.
- stack_test domain assertion: still checked "instant.dev/start"; updated to "instanode.dev/start".
- deploy_env_vars_test body double-read: readBody(t, resp) was being evaluated unconditionally in a require.NotEqual message arg, consuming the body before the success-path Decode could read it. Read once into a buffer, reuse.
- dashboardsvc/server_test resourceSelectColumns added the env column (migration 009): sqlmock was emitting 18 columns into a model that scans 19, surfacing as "sql: expected 18 destination arguments in Scan, not 19".
- testhelpers wires /api/v1/whoami (the route added in PR #6 was in production but missing from the integration test app).

## Sweep results

Before this PR: 12 test failures across api/internal/handlers, internal/dashboardsvc, internal/middleware.
After: zero failures. Full sweep across api, worker, provisioner, common all green.

Verified live: v2.2.0-validation-bugs deployed; sample env=Prod returns 400 invalid_env as expected; valid env=production still succeeds with 201.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

#6) (#110)

The Idempotency-Key middleware silently skipped caching for any 4xx response produced through respondError*, the production error path that returns the ErrResponseWritten sentinel after committing the body to the wire. The middleware's `if err := c.Next(); err != nil { return err }` clause treated the sentinel as an abort and bypassed the cache write.

User-visible impact: agent hits /deploy/new over its tier cap, server returns 402, agent retries with the same Idempotency-Key, server runs the handler fresh, re-hitting Razorpay/billing side effects on every retry. The whole point of idempotency middleware (one logical request → one side effect) was broken for error-path responses.

The existing TestIdempotency_4xxIsCached passed for the wrong reason: the test fixture used c.Status().JSON() (returns nil) instead of the real handler error path handlers.WriteFiberError -> respondError (returns ErrResponseWritten). The middleware's bail clause skipped the cache write in production but not in test: a false positive.

Fix (≤30 LOC in middleware): special-case the ErrResponseWritten sentinel and fall through to the cache logic. Real errors (DB down, panic-recovered, fiber.NewError from upstream) still bypass caching. Use a registered-callback pattern (handlers.init() wires the check) to avoid a handlers->middleware->handlers import cycle.

Tests: 13 idempotency tests now pass (up from 9). New coverage:
- TestIdempotency_4xxIsCached rewritten to use handlers.WriteFiberError so it exercises the real respondError path (FAILS before fix).
- TestIdempotency_RealHandlerErrorPathCaches reproduces the exact BB2-D5 scenario: handler returns 402 via respondError, agent retry with same Idempotency-Key must replay (handler hits stay at 1).
- TestIdempotency_5xxFromRespondError_NotCached guards against over-correction: 5xx via respondError still bypasses caching so retries can complete the work once upstream recovers.
- TestIdempotency_NonSentinelErrorNotCached pins that plumbing errors (errors.New, fiber.NewError) keep bypassing caching.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Load-test finding F2 (LOAD-CHAOS-REPORT-2026-05-19): a 30-way simultaneous burst from one fingerprint minted 22-29 anonymous tokens instead of capping at 5.

Root cause: TOCTOU fall-through in the over-cap branch. checkProvisionLimit's Redis INCR is atomic and correctly hands every burst caller a distinct slot, so callers 6..N all see limitExceeded == true. The over-cap branch then looks up an existing resource to dedup against. But during a *simultaneous* burst the <=5 winning provisions have claimed their INCR slot yet not committed a `resources` row, so both GetActiveResourceByFingerprintType and the cross-service GetActiveResourceByFingerprint return ErrResourceNotFound, and control FELL THROUGH the limitExceeded block to CreateResource. Every burst caller minted a fresh token.

Fix: shared helper denyProvisionOverCap in provision_helper.go. An over-cap caller that finds no existing resource is genuinely over the cap (its atomic slot number proved it); the missing row only means the winners are in-flight. Such a caller is now hard-denied with 429, never allowed to fall through to a fresh provision. The atomic INCR (slot claim) plus the hard deny (no fall-through) together make the cap race-safe: at most `cap` callers ever reach CreateResource.

Applied once in the helper, wired into all seven anonymous provisioning handlers (db/cache/nosql/queue/storage/webhook + vector). Fail-open preserved (CLAUDE.md #6): a Redis error in checkProvisionLimit still returns (false, err) and the caller proceeds.

Tests (provision_cap_concurrency_test.go, CI under `go test ./...`):
- TestProvisionCap_ConcurrentBurst_CapsAt5: reproduces F2 deterministically with 30 goroutines, one fingerprint, race window held open for the whole burst. Runs the same burst twice: pre-fix mode mints all 30 (asserted, so the test genuinely fails without the fix), fixed mode mints exactly 5, denies 25.
- TestProvisionCap_Sequential_FiveThenExisting: confirms the unchanged non-burst contract: 5 succeed, 6th+ return the existing token.
- TestCheckProvisionLimit_AtomicUnderConcurrency: unit-level proof the atomic INCR gate clears exactly `cap` of a 40-way burst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
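
The "atomic slot claim plus hard deny" rule can be modelled in a few lines. This is a sketch under loud assumptions: an in-process atomic counter stands in for the Redis INCR, a row counter stands in for the `resources` table, `holdCommits` simulates the race window by keeping winners' rows invisible, and the `provisioner`, `outcome`, and `burst` names are hypothetical. Fail-open on Redis errors is not modelled.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const provisionCap = 5

type outcome int

const (
	minted  outcome = iota // fresh resource created
	deduped                // over cap, existing resource returned
	denied                 // over cap, nothing to dedup against: 429
)

type provisioner struct {
	slots       int64 // stand-in for the Redis INCR counter
	holdCommits bool  // when true, winners never commit a visible row
	mu          sync.Mutex
	rows        int // stand-in for committed `resources` rows
}

// checkProvisionLimit atomically claims a slot and reports whether the
// caller is over the cap.
func (p *provisioner) checkProvisionLimit() bool {
	return atomic.AddInt64(&p.slots, 1) > provisionCap
}

func (p *provisioner) hasExistingResource() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.rows > 0
}

// provision applies the fixed rule: an over-cap caller either dedups or is
// denied, and never falls through to creating a resource.
func (p *provisioner) provision() outcome {
	if p.checkProvisionLimit() {
		if p.hasExistingResource() {
			return deduped
		}
		return denied // winners are still in flight; the slot number proves over-cap
	}
	if !p.holdCommits {
		p.mu.Lock()
		p.rows++
		p.mu.Unlock()
	}
	return minted
}

// burst fires n concurrent provisions and tallies the outcomes.
func burst(p *provisioner, n int) map[outcome]int {
	results := make(chan outcome, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			results <- p.provision()
		}()
	}
	wg.Wait()
	close(results)
	counts := map[outcome]int{}
	for r := range results {
		counts[r]++
	}
	return counts
}

func main() {
	counts := burst(&provisioner{holdCommits: true}, 30)
	fmt.Println(counts[minted], counts[denied]) // prints "5 25"
}
```

The result is deterministic regardless of goroutine scheduling because exactly five callers can ever observe a counter value at or below the cap.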

…ling/middleware/resource

Sweep of P1/P2 api-repo findings from BUGBASH-2026-05-20/MASTER-LEDGER.md still open after today's earlier P0 waves (B5-P0, B7-P0-1, B11-F2/F4/F5, B13-F1, B17-STORAGE-P0-1..4 all already shipped or in flight). Each fix includes a regression test where one fit cleanly.

----------------------------------------------------------------------

B4-F1 POST /auth/email/start per-email RL silent absorption
- internal/metrics/metrics.go: add instant_magic_link_email_rate_limited_total counter (Prometheus).
- internal/handlers/magic_link.go: increment counter + WARN log on every rate-limited absorption. The operator now sees the abuse pattern in NR despite the user-visible 202 staying identical to the success path.

B4-F2 emailRateLimitKey sha256[:8] birthday-collision risk
- internal/handlers/magic_link.go:88-89: full sha256 hex (64 chars) instead of h[:8] (8 hex chars). 2^32 → 2^128 collision space.
- internal/handlers/magic_link_test.go: TestEmailRateLimitKey_FullHashFingerprint pins the full-hash suffix length so a re-truncation regression fails CI.

B4-F4 RFC 5321 local-part >64 chars not gated
- internal/handlers/magic_link.go:looksLikeEmail: reject local-part > 64 octets per §4.5.3.1.1 (guaranteed-undeliverable).
- internal/handlers/magic_link_test.go: TestLooksLikeEmail_LocalPartCap table-driven (under/at/over/way-over cap).

B4-F5 10MB JSON body accepted on /auth/email/start
- internal/handlers/magic_link.go: 1 KiB body cap on POST /auth/email/start (real bodies are ~80 bytes; global Fiber BodyLimit = 50 MiB for /deploy/new tarballs is way too generous for a 2-field JSON envelope).

B4-F7 / B5-P1-1 / B10-P1-3 / B13-F6
- internal/handlers/helpers.go: extend codeToAgentAction with three missing entries: invalid_email, invalid_email_format, provision_limit_reached. The 429 provision-limit envelope now carries agent_action + upgrade_url to the agent (CLAUDE.md convention #6 promised these, agents were getting nothing).

B11-F1 8 Razorpay events fell to default 200
- internal/handlers/billing.go: handle subscription.deauthenticated (treat as cancel: mandate revoked, cannot charge), subscription.updated (route to handleSubscriptionCharged, an idempotent re-resolve of tier), refund.processed (info-log + span attribute, no tier change). The remaining default-200 falls (payment.captured/authorized, order.paid, invoice.paid, subscription.authenticated) are intentionally informational: Razorpay sends them but tier state is already correct from the corresponding subscription.* event.

B18-M1 empty Idempotency-Key header silently coerced to fingerprint path
- internal/middleware/idempotency.go: detect "header present but value empty/blank" via raw c.Request().Header.Peek() (Fiber's c.Get() returns "" for both omitted-and-present-empty). Reject with 400 invalid_idempotency_key, which matches the OpenAPI contract for malformed keys.

B20-P2-1 soft-deleted resources still surfaced via GET /api/v1/resources/:id
- internal/handlers/resource.go:Get: 404 when resource.Status == "deleted". Narrow surgical change to the read path only. The DELETE path itself still needs to fetch the row to soft-delete it, so GetResourceByToken is left untouched. The customer-visible contract is "the resource is gone after DELETE"; this enforces it on the GET.

----------------------------------------------------------------------

Coverage block per CLAUDE.md rule 17:
  Symptom:        per-bug, see ledger references in code comments
  Enumeration:    grep -rn for each token, see code-side comments
  Sites found:    1 each (single-site fixes by design)
  Sites touched:  1 each
  Coverage tests: TestEmailRateLimitKey_FullHashFingerprint, TestLooksLikeEmail_LocalPartCap, TestAgentActionContract (covers all three new codeToAgentAction entries automatically via map iteration)
  Live verified:  pending, /healthz commit_id check post-deploy

Gate: `cd api && make gate`-equivalent (go test ./... -short -count=1 -p 1) green except the 4 pre-existing flakes (TestAdminList_*, TestDBNew_*, TestBulkTwin_*, TestGetExpiredDeployments_*) explicitly listed in the task brief as known-acceptable. All other 30+ packages pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
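
Two of the magic-link fixes above (B4-F2 and B4-F4) are small enough to sketch. The function names `emailRateLimitKey` and `looksLikeEmail` appear in the commit message, but the bodies below are assumptions: the `rl:email:` key prefix, the lowercase/trim normalization, and the minimal "has an @ with text on both sides" check are all hypothetical, and `emailWithLocalLen` is a demo helper.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// rateLimitKeyPrefix is a hypothetical Redis key prefix.
const rateLimitKeyPrefix = "rl:email:"

// emailRateLimitKey uses the full SHA-256 hex digest (64 chars) rather than
// an 8-char truncation, so distinct emails do not share a rate-limit bucket.
func emailRateLimitKey(email string) string {
	normalized := strings.ToLower(strings.TrimSpace(email)) // assumed normalization
	sum := sha256.Sum256([]byte(normalized))
	return rateLimitKeyPrefix + hex.EncodeToString(sum[:])
}

// maxLocalPartOctets is the RFC 5321 §4.5.3.1.1 limit on the local-part.
const maxLocalPartOctets = 64

// looksLikeEmail is a deliberately minimal shape check plus the local-part cap.
func looksLikeEmail(s string) bool {
	at := strings.LastIndex(s, "@")
	if at <= 0 || at == len(s)-1 {
		return false // no "@", empty local-part, or empty domain
	}
	return at <= maxLocalPartOctets // `at` equals the local-part length in octets
}

// emailWithLocalLen builds a test address with an n-octet local-part.
func emailWithLocalLen(n int) string {
	return strings.Repeat("a", n) + "@example.com"
}

func main() {
	key := emailRateLimitKey("user@example.com")
	fmt.Println(len(key) - len(rateLimitKeyPrefix)) // prints 64
	fmt.Println(looksLikeEmail(emailWithLocalLen(64)), looksLikeEmail(emailWithLocalLen(65)))
}
```

Pinning the digest length in a test, as TestEmailRateLimitKey_FullHashFingerprint is described as doing, makes a later re-truncation fail loudly.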

mastermanas805 merged commit d23a352 into master May 11, 2026

mastermanas805 deleted the feat/whoami-and-diagnostics branch May 11, 2026 08:23

mastermanas805 mentioned this pull request May 21, 2026

fix(reliability): migration 064 — forwarder_sent.audit_log_id FK (closes gap #6) #126

Open

6 tasks

mastermanas805 merged 1 commit into master from feat/whoami-and-diagnostics
