fix: eliminate flaky timing race in TestReady_BlocksUntilRunStarts#7079
Conversation
The test used a single 100ms context for both phases: confirming Ready() blocks (50ms) and then waiting for it to unblock after Run() starts (remaining ~50ms). On slow CI machines, goroutine scheduling could consume the remaining budget. Split into two independent phases: a 50ms context expected to timeout (proving Ready() blocks), then a fresh 5s context for the second Ready() call that completes after Run() starts.
There was a problem hiding this comment.
Pull request overview
This PR fixes CI flakiness in pkg/grpcbroker by restructuring TestReady_BlocksUntilRunStarts to avoid a shared short deadline across multiple test phases.
Changes:
- Split the test into two phases with independent contexts (short timeout to prove blocking; long timeout to prove eventual readiness).
- Strengthened assertions to explicitly validate
context.DeadlineExceededfor the pre-Run()Ready()call. - Increased the “failure wait” window to avoid scheduler-related false negatives on slow runners.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Azure Dev CLI Install InstructionsInstall scriptsMacOS/Linux
bash: pwsh: WindowsPowerShell install MSI install Standalone Binary
MSI
Documentationlearn.microsoft.com documentationtitle: Azure Developer CLI reference
|
…zure#7079) The test used a single 100ms context for both phases: confirming Ready() blocks (50ms) and then waiting for it to unblock after Run() starts (remaining ~50ms). On slow CI machines, goroutine scheduling could consume the remaining budget. Split into two independent phases: a 50ms context expected to timeout (proving Ready() blocks), then a fresh 5s context for the second Ready() call that completes after Run() starts.
…zure#7079) The test used a single 100ms context for both phases: confirming Ready() blocks (50ms) and then waiting for it to unblock after Run() starts (remaining ~50ms). On slow CI machines, goroutine scheduling could consume the remaining budget. Split into two independent phases: a 50ms context expected to timeout (proving Ready() blocks), then a fresh 5s context for the second Ready() call that completes after Run() starts.
…zure#7079) The test used a single 100ms context for both phases: confirming Ready() blocks (50ms) and then waiting for it to unblock after Run() starts (remaining ~50ms). On slow CI machines, goroutine scheduling could consume the remaining budget. Split into two independent phases: a 50ms context expected to timeout (proving Ready() blocks), then a fresh 5s context for the second Ready() call that completes after Run() starts.
…zure#7079) The test used a single 100ms context for both phases: confirming Ready() blocks (50ms) and then waiting for it to unblock after Run() starts (remaining ~50ms). On slow CI machines, goroutine scheduling could consume the remaining budget. Split into two independent phases: a 50ms context expected to timeout (proving Ready() blocks), then a fresh 5s context for the second Ready() call that completes after Run() starts.
…zure#7079) The test used a single 100ms context for both phases: confirming Ready() blocks (50ms) and then waiting for it to unblock after Run() starts (remaining ~50ms). On slow CI machines, goroutine scheduling could consume the remaining budget. Split into two independent phases: a 50ms context expected to timeout (proving Ready() blocks), then a fresh 5s context for the second Ready() call that completes after Run() starts.
…zure#7079) The test used a single 100ms context for both phases: confirming Ready() blocks (50ms) and then waiting for it to unblock after Run() starts (remaining ~50ms). On slow CI machines, goroutine scheduling could consume the remaining budget. Split into two independent phases: a 50ms context expected to timeout (proving Ready() blocks), then a fresh 5s context for the second Ready() call that completes after Run() starts.
Problem
TestReady_BlocksUntilRunStartsinpkg/grpcbrokerwas flaky in CI. It used a single 100ms context shared across two test phases:Ready()blocks beforeRun()starts.Ready()to unblock afterRun()starts.On slow CI machines (e.g., Windows runners), goroutine scheduling delays could consume the remaining context budget, causing
Ready()to returncontext deadline exceededinstead ofnil(failed build).
Fix
Split the test into two independent phases with separate contexts:
Ready()blocks whenRun()hasn't started.Ready()call that should complete promptly onceRun()starts.This eliminates the shared-deadline race and gives generous time for goroutine scheduling on slow machines.