Skip to content

fix(webhook): QA review criticals — shutdown drain, readiness token, pending-map leak, tests#169

Merged
bobakemamian merged 2 commits intomainfrom
fix/webhook-shutdown-and-readiness
Apr 20, 2026
Merged

fix(webhook): QA review criticals — shutdown drain, readiness token, pending-map leak, tests#169
bobakemamian merged 2 commits intomainfrom
fix/webhook-shutdown-and-readiness

Conversation

@bobakemamian
Copy link
Copy Markdown
Contributor

Closes the four findings from the post-merge QA review of #161.

What's in

C1 — in-flight drawer presses were abandoned on Ctrl-C. `h.inFlight` was incremented under a mutex but never read. Replaced with `sync.WaitGroup` + shared `pressCtx`. Shutdown now drains presses up to 30s then cancels the context so cooperative-cancel goroutines unwind cleanly.

C3 — `waitForReady` over-trusted DNS. A stale DNS record pointing the tunnel hostname at some other origin that returns 2xx on `/healthz` would pass a naive status-only check. Server now mints a per-process random readiness token, embeds it in `/healthz` JSON, and `waitForReady` parses + matches before accepting the tunnel. `MintReadyToken()` exposed for `cmd/serve.go` which runs its own mux.

I1 — `pending` map was an unbounded leak. Dropped entirely. `Wait()` reworked as a convenience around explicit `Register()`/`Deregister()`; callers now register BEFORE publishing the external URL, eliminating the "event arrives before Wait" race that `pending` existed to cover.

P1 — zero tests in `internal/webhook/`. Added coverage for correlation-id uniqueness (10k no-dup), register-before-receive, ctx-cancel waiter cleanup, /webhook path guard, /healthz token contract, all four auth types including env-ref resolution and JWT expired/issuer/signature paths.

Test plan

  • `go build ./...` clean
  • `go vet ./...` clean
  • `go test ./...` all green including 12 new webhook cases
  • Manual: verify Ctrl-C on `buttons webhook listen` during an in-flight drawer press drains cleanly
  • Manual: verify the tunnel-readiness check rejects a stale-DNS origin

🤖 Generated with Claude Code

bobakemamian and others added 2 commits April 20, 2026 18:56
Closes four findings from the post-merge review of PR #161:

C1 — in-flight drawer presses were abandoned on listen shutdown.
h.inFlight counter was incremented/decremented but never read, so
Ctrl-C dropped goroutines mid-press. Replaced with sync.WaitGroup +
shared pressCtx: shutdown now drains in-flight presses up to 30s,
then cancels the shared context so cooperative-cancel goroutines
unwind cleanly before runListen returns.

C3 — waitForReady over-trusted DNS. A stale DNS record pointing the
hostname at some unrelated origin that returns 2xx on /healthz would
pass a naive status-only check. Server now mints a per-process
random readiness token, embeds it in /healthz JSON, and waitForReady
parses + matches it before accepting the tunnel as ready. Closes the
loop through edge DNS → CF → local process rather than just testing
any leg of it. MintReadyToken() exposed for callers (cmd/serve.go)
that run their own HTTP mux.

I1 — pending map was an unbounded leak. Dropped entirely. Replaced
Wait() semantics with an explicit Register()/Deregister() API; Wait
remains as a convenience that handles both. Callers now Register
BEFORE publishing the external URL, eliminating the "event arrives
before Wait" race that pending was there to cover. webhook test
updated to match.

P1 — internal/webhook had no tests. Added:
- TestNewCorrelationID_Uniqueness (10k ids, no dups)
- TestServer_RegisterBeforeReceive (main flow)
- TestServer_WaitContextCancelled (no waiter leak)
- TestServer_HandleWebhook_RejectsSlashInID (path guard)
- TestServer_Healthz_TokenEmbedded (readiness contract)
- TestVerifyAuth for none/basic/header/jwt + env-ref + expired token

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vet flagged a context-leak path: cancelPresses was assigned but not
guaranteed to fire if the readyToken mint failed before the later
shutdown call. Added `defer cancelPresses()` immediately after the
context is created so every return path closes it. Idempotent — the
explicit call in the shutdown drain still works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bobakemamian bobakemamian merged commit 38f718c into main Apr 20, 2026
16 checks passed
@bobakemamian bobakemamian deleted the fix/webhook-shutdown-and-readiness branch April 20, 2026 23:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant