Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve flaky timeouts in unit tests #136

Merged
merged 16 commits into from
Jun 19, 2022
Merged

Conversation

CBielstein
Copy link
Owner

@CBielstein CBielstein commented Jun 13, 2022

Description

Currently, we execute tests in parallel using our own wall-clock test timeout check to guard against infinite-running tests. However, the parallelism and incomplete test isolation makes the tests impossible to time accurately, resulting in intermittent transient failures.

Specifically, we were facing the following challenges:

  1. Custom synchronous timeout code can be flaky
  2. Running tests in parallel with tests from other classes makes wall-clock timeouts unreliable
  3. Improper AprsIsClient disposal lead to further interactions between tests regressing individual test performances.

The third item in that list was an issue in the interplay between AprsIsClient disposal and mocking. AprsIsClient.Dispose() relied on disposing the underlying TcpConnection object. In production, a disposed object would result in TcpConnection.Connected returning false and terminating the worker task. However, in unit tests, the mocked TcpConnection would return true when querying connected no matter what else happened, resulting in the worker task never terminating. As tests accumulated, background tasks would start to add up and cause timeouts.

To address these issues, the following changes were made:

  1. Switched to using xUnit timeout parameter on async tests and a TaskCompletionSource instead of synchronous polling
  2. Disabled parallelism for tests with timeout (both required by xUnit when using a timeout and a good idea to preserve any sense of meaning for a wall-clock time)
  3. Ensure the background worker task stops by calling Disconnect() from Dispose() in AprsIsClient

This resolves #125.

Changes

  • Disable parallelism for tests with timeout requirements through the introduction of a new test collection
  • Remove WaitForCondition method and switch to using xUnit timeout setting
  • Use await on a TaskCompletionSource instead of polling a specific lambda to know when the test can continue
  • call AprsIsClient.Disconnect() as part of AprsIsClient.Dispose() to ensure background task terminates
  • Removed unnecessary Thread.Yield() code in AprsIsClient receive task as the stream read call at the top of the loop is blocking, so Thread.Yield() should never be needed.

Validation

  • Tests now run in milliseconds instead of seconds repeatedly
  • Will repeat tests in GitHub Actions as well for verification

@CBielstein CBielstein added the bug Something isn't working label Jun 13, 2022
@CBielstein CBielstein self-assigned this Jun 13, 2022
@CBielstein CBielstein changed the title Switch to using xUnit timeout for tests Resolve flaky timeouts in unit tests Jun 17, 2022
@CBielstein CBielstein marked this pull request as ready for review June 17, 2022 06:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update test timeout from custom WaitForCondition to xunit timeout
1 participant