fix(api): envelope hygiene bundle 2026-05-30 (BUG-API-020/417/423)#182

Merged
mastermanas805 merged 1 commit into master from fix/api-envelope-hygiene-2026-05-30
May 30, 2026

@mastermanas805 (Member)

Summary

Three localized envelope-hygiene fixes from the QA backlog. Each is a self-contained 4xx shape change in the api repo — no contract surfaces moved, no tier/pricing field touched, no host strings altered. Registry-iterating tests added per CLAUDE.md rule 18.

  • BUG-API-020: the `invalid_token` agent_action drops the INSTANODE_TOKEN noun. The code is emitted from 9 non-auth sites (webhook path token, invitation token, storage path token, onboarding claim JWT, stack manifest needs token, deploy logs path token); the old copy pointed every one of them at the wrong remediation. The real auth/Bearer 401 path stays unchanged.
  • BUG-API-417: /healthz now emits `now` (RFC 3339, millisecond precision) so canaries/SDKs/agents can detect clock skew without an extra round trip.
  • BUG-API-423: the /webhook/receive/:token 404 now uses webhook_not_found instead of the generic not_found; both branches (unknown token + wrong-resource-type) share the surface-specific code, which preserves the no-enumeration property.

Coverage block (rule 17)

| Bug | Enumeration | Sites found | Sites touched | Coverage test |
| --- | --- | --- | --- | --- |
| BUG-API-020 | `rg -nF '"invalid_token"' internal/handlers/` | 9 emit + 1 registry | 1 registry entry (registry-iterating per rule 18) | TestInvalidToken_AgentAction_DoesNotNameInstanodeToken |
| BUG-API-417 | `rg -nF '"now":' internal/router/` | 1 emit site | 1 emit + 1 in-process fixture mirror | TestHealthzShape (asserts now present + parses + drift <5s) |
| BUG-API-423 | `rg -nF '"webhook_not_found"' internal/handlers/` | 0 (new) | 2 emit + 1 new registry entry | TestWebhookNotFound_AgentAction_HasSurfaceSpecificCopy |

Verification plan

Local gate (rule 23):

  • go build ./... — clean
  • go vet ./... — clean
  • golangci-lint run — 0 issues
  • go test ./internal/router/ — green
  • go test ./internal/middleware/ — green
  • go test ./internal/handlers/ — new tests pass; pre-existing local-only failures (TestDBNew_, TestQueue_, TestBulkTwin_, TestAdminList_) reproduce identically on baseline master (NATS / customer-DB unavailable on bare laptop per Makefile gate target comment) — CI provides those backends.

Live verification after merge (rule 14):

  • curl https://api.instanode.dev/healthz | jq .commit_id must equal git rev-parse --short HEAD
  • curl https://api.instanode.dev/healthz | jq .now returns RFC 3339 timestamp
  • curl https://api.instanode.dev/webhook/receive/aaaa | jq .agent_action no longer contains "INSTANODE_TOKEN"
  • curl -X POST https://api.instanode.dev/webhook/receive/00000000-0000-0000-0000-000000000000 -d 'x' | jq .error returns webhook_not_found

Test plan

  • go build ./... green
  • go vet ./... green
  • golangci-lint run clean
  • New tests pass locally
  • Existing tests unchanged (no new failures vs baseline)
  • CI green
  • Live SHA round-trip on /healthz after deploy
  • Live agent_action and webhook_not_found round-trip after deploy

Three localized envelope-hygiene fixes pulled off the QA inbox. Each is a
self-contained 4xx shape change — no contract surfaces moved, no tier or
pricing change, no host-string touch — and each is registry-iterating
tested per CLAUDE.md rule 18.

BUG-API-020 — invalid_token agent_action drops the INSTANODE_TOKEN noun

The `invalid_token` error code is emitted from 9 handler sites, NONE of
which are about the user's Bearer credential: webhook receiver URL path
token (webhook.go:528/811), invitation token (teams.go:170/248), storage
URL path token (storage_presign.go:114), onboarding claim JWT
(onboarding.go:92/282/294), stack manifest needs token (stack.go:539),
deploy logs URL path token (logs.go:148). Pre-fix the agent_action told
every one of those surfaces to "have the user log in at instanode.dev/login
to mint a new INSTANODE_TOKEN" — wrong remediation for all 9 sites. The
new copy stays neutral: the supplied token in the URL path or claim JWT,
with a pointer to the docs. The real auth/Bearer 401 path stays at
middleware/auth.go:61 in `unauthorizedAgentAction` (still names
INSTANODE_TOKEN there — correct wording for that surface).

BUG-API-417 — /healthz emits `now` server timestamp for clock-skew detection

Canaries / SDKs / agents could not detect clock skew between their host
and the api pod without an extra round trip. /healthz now emits `now` as
RFC 3339 with millisecond precision (same format as audit-log and
forwarder_sent rows). build_time stays as the immutable image stamp; a
probe can compute the age of the running image from the difference of
the two.

BUG-API-423 — /webhook/receive/:token 404 uses `webhook_not_found`

Both 404 branches in webhook.Receive (unknown token + wrong-resource-type)
previously emitted the generic `not_found` code which agents grepping on
the error code couldn't disambiguate from any other 404. The new
surface-specific `webhook_not_found` code makes the surface explicit; the
shared code across both branches is intentional (we MUST NOT confirm
whether the token belongs to a different resource type — that would let a
probe enumerate resources by token shape). Wire shape (status + message)
unchanged.

Coverage block (rule 17):

  Symptom:       BUG-API-020 — agents emit "log in to mint a new
                 INSTANODE_TOKEN" for path-token surfaces. BUG-API-417 —
                 no server `now` field for clock-skew detection.
                 BUG-API-423 — webhook senders cannot branch on the
                 surface-specific 404 code.
  Enumeration:   `rg -nF '"invalid_token"' internal/handlers/` (9 emit
                 sites + 1 registry entry); `rg -nF '"webhook_not_found"'
                 internal/handlers/` (2 emit sites + 1 registry entry);
                 `rg -nF '"now":' internal/router/` (1 emit site).
  Sites found:   9 / 2 / 1 respectively.
  Sites touched: invalid_token: registry entry rewrites the agent_action
                 for ALL 9 sites at once (rule 18 — registry-iterating).
                 webhook_not_found: 2 emit sites updated + 1 new
                 registry entry. /healthz now: 1 emit site + 1 mirror
                 in healthz_test.go's in-process fixture.
  Coverage test: TestInvalidToken_AgentAction_DoesNotNameInstanodeToken
                 + TestWebhookNotFound_AgentAction_HasSurfaceSpecificCopy
                 (new file: envelope_hygiene_2026_05_30_test.go);
                 TestHealthzShape now asserts `now` is present + parses
                 + within 5s drift.
  Live verified: pending post-merge SHA round-trip on /healthz (rule 14).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mastermanas805 mastermanas805 merged commit 19dcb5f into master May 30, 2026
14 checks passed