
test(onboard): retry callback request to absorb listener-startup race#103

Merged: qarlosh merged 1 commit into master from test/onboard-callback-retry on May 15, 2026
Conversation

@qarlosh
Collaborator

@qarlosh qarlosh commented May 15, 2026

Summary

Hardens TestBuild_OAuthE2E (and its Microsoft variants) against a flake observed on CI for PR #102 (Get http://127.0.0.1:NNNN/callback...: EOF).

The binary in cmd/chaperone-onboard binds the callback listener and prints the auth URL as soon as net.Listen returns. The goroutine that actually calls Accept may not be scheduled yet on a busy runner — the test then fires simulateCallback into a half-ready server and trips EOF.

simulateCallback now retries on transport errors for up to ~1s (10 attempts × 100ms backoff) and respects the test context. The callback handler is single-use (it triggers server.Shutdown after a successful response), so retries only execute when the first attempt never reached the handler — they cannot mask a real callback failure.

Test-only change; no production code touched.

Test plan

  • go test -race -run 'TestBuild_OAuthE2E|TestBuild_MicrosoftE2E' -count=5 ./cmd/chaperone-onboard/ — 15 runs, all green
  • go test -race ./cmd/chaperone-onboard/ — full package green
  • CI green on this PR

TestBuild_OAuthE2E (and the two Microsoft variants) flakes intermittently
on CI with `Get http://127.0.0.1:NNNN/callback...: EOF`. The binary binds
the local listener and prints the auth URL as soon as net.Listen returns,
but the goroutine that calls Accept may not be scheduled yet on a busy
runner — the test then fires the callback GET into a half-ready server
and trips EOF.

simulateCallback now retries on transport errors for up to ~1s
(10 attempts at 100ms backoff) and respects the test context. The
callback handler is single-use (it triggers server.Shutdown after a
successful response), so retries only execute when the first attempt
never reached the handler — they cannot mask a real callback failure.

Verified by running the three E2E tests with -race -count=5 locally
(15 runs, all green).
@qarlosh qarlosh requested a review from arnaugiralt May 15, 2026 08:20
@qarlosh qarlosh closed this May 15, 2026
@qarlosh qarlosh reopened this May 15, 2026
@qarlosh qarlosh merged commit 4eb8879 into master May 15, 2026
20 of 24 checks passed
@qarlosh qarlosh deleted the test/onboard-callback-retry branch May 15, 2026 08:42

2 participants