Resolve flaky timeouts in unit tests #136

CBielstein · 2022-06-13T06:09:52Z

Description

Currently, we execute tests in parallel and use our own wall-clock timeout check to guard against tests that never finish. However, the parallelism and incomplete test isolation make the tests impossible to time accurately, resulting in intermittent failures.

Specifically, we were facing the following challenges:

Custom synchronous timeout code can be flaky
Running tests in parallel with tests from other classes makes wall-clock timeouts unreliable
Improper AprsIsClient disposal led to further interactions between tests, degrading the performance of individual tests

The third item in that list was an issue in the interplay between AprsIsClient disposal and mocking. AprsIsClient.Dispose() relied on disposing the underlying TcpConnection object. In production, a disposed connection would result in TcpConnection.Connected returning false, which terminated the worker task. In unit tests, however, the mocked TcpConnection returned true for Connected no matter what else happened, so the worker task never terminated. As tests accumulated, these background tasks piled up and caused timeouts.

To address these issues, the following changes were made:

Switched to the xUnit timeout parameter on async tests and to awaiting a TaskCompletionSource instead of synchronous polling
Disabled parallelism for tests with a timeout (xUnit requires this when a timeout is used, and it is also needed for a wall-clock limit to mean anything)
Ensured the background worker task stops by calling Disconnect() from Dispose() in AprsIsClient
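
A minimal sketch of the first two changes, not the code from this PR. It assumes xUnit v2; `TimedTestCollection`, `FakeReceiver`, `PacketReceived`, and `ReceiveTests` are hypothetical names invented for illustration, and the 500 ms limit is arbitrary. The test awaits a TaskCompletionSource that the event handler completes, and the xUnit `Timeout` on the `Fact` bounds the whole test.

```csharp
using System;
using System.Threading.Tasks;
using Xunit;

// Hypothetical collection: DisableParallelization keeps its tests from running
// alongside other collections, so the wall-clock timeout stays meaningful.
[CollectionDefinition(nameof(TimedTestCollection), DisableParallelization = true)]
public class TimedTestCollection { }

// Hypothetical stand-in for a client that raises an event from a background task.
public sealed class FakeReceiver
{
    public event Action<string> PacketReceived;

    public Task StartAsync(string packet) =>
        Task.Run(() => PacketReceived?.Invoke(packet));
}

[Collection(nameof(TimedTestCollection))]
public class ReceiveTests
{
    // Timeout is in milliseconds and applies to async tests.
    [Fact(Timeout = 500)]
    public async Task ReceivesPacket()
    {
        // Completed by the event handler; replaces polling a condition in a loop.
        var received = new TaskCompletionSource<string>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        var receiver = new FakeReceiver();
        receiver.PacketReceived += packet => received.TrySetResult(packet);

        await receiver.StartAsync("hello");

        // If the event never fires, this await never completes and the
        // xUnit timeout fails the test instead of hanging the run.
        Assert.Equal("hello", await received.Task);
    }
}
```

Because the test resumes as soon as the result is set, it does not pay for a polling interval, and the timeout only matters on the failure path.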

This resolves #125.

Changes

Disable parallelism for tests with timeout requirements through the introduction of a new test collection
Remove WaitForCondition method and switch to using xUnit timeout setting
Use await on a TaskCompletionSource instead of polling a specific lambda to know when the test can continue
Call AprsIsClient.Disconnect() as part of AprsIsClient.Dispose() to ensure the background task terminates
Remove the unnecessary Thread.Yield() call in the AprsIsClient receive task: the stream read at the top of the loop is blocking, so Thread.Yield() should never be needed
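
The disposal fix can be illustrated with a small sketch, under stated assumptions: `IConnection`, `AlwaysConnectedStub`, and `SketchClient` are hypothetical types, not the real AprsIsClient or TcpConnection, and the stop flag is only one way Disconnect() might signal the worker. The stub mimics the mock described above by reporting `Connected` as true even after disposal, so the loop can only end if Dispose() explicitly requests a stop.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical abstraction over the connection.
public interface IConnection : IDisposable
{
    bool Connected { get; }
}

// Mimics the mocked connection: Connected stays true even after Dispose().
public sealed class AlwaysConnectedStub : IConnection
{
    public bool Connected => true;
    public void Dispose() { }
}

// Hypothetical client with a background worker loop.
public sealed class SketchClient : IDisposable
{
    private readonly IConnection connection;
    private volatile bool stopRequested;
    private Task worker;

    public SketchClient(IConnection connection) => this.connection = connection;

    // True when no worker was started or the worker has finished.
    public bool WorkerStopped => worker == null || worker.IsCompleted;

    public void Start()
    {
        worker = Task.Run(() =>
        {
            // Checking Connected alone never exits against the stub above.
            while (connection.Connected && !stopRequested)
            {
                Thread.Sleep(10); // placeholder for a blocking stream read
            }
        });
    }

    public void Disconnect() => stopRequested = true;

    public void Dispose()
    {
        // Stop the worker explicitly rather than relying on the connection
        // reporting itself as disconnected once it has been disposed.
        Disconnect();
        worker?.Wait(TimeSpan.FromSeconds(1));
        connection.Dispose();
    }
}
```

With this shape, each test that disposes its client also ends that client's worker, so background tasks do not accumulate across the test run.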

Validation

Across repeated local runs, tests now complete in milliseconds instead of seconds
Tests will also be run repeatedly in GitHub Actions for verification


CBielstein added 4 commits June 12, 2022 22:06

Shorten timeouts to ensure local repro.

ddfdbda

Switch to xunit timeouts.

8b69bd7

Make WaitForCondition async

96f7d9c

Remove untimed tests from collection.

21c04d6

CBielstein added the bug label Jun 13, 2022

CBielstein self-assigned this Jun 13, 2022

CBielstein added 7 commits June 16, 2022 22:28

Switch from polling to TaskCompletionSource

8c16054

Use same namespace for the fixture.

b800277

Switch to async task for receive, avoid unnecessary yield.

0aabe8b

Ensure AprsIsClient is disposed after each test.

7e69e09

Switch dispose to base class.

224fd64

Move condition wait and parallel disable to base class.

09b060d

Switch receive task back to synchronous as async is unnecessary here.

86497dc

CBielstein changed the title from "Switch to using xUnit timeout for tests" to "Resolve flaky timeouts in unit tests" Jun 17, 2022

CBielstein marked this pull request as ready for review June 17, 2022 06:37

CBielstein requested a review from eddywine as a code owner June 17, 2022 06:37

CBielstein added 5 commits June 19, 2022 14:52

Revert "Move condition wait and parallel disable to base class."

09983cf

This reverts commit 09b060d.

Revert "Switch dispose to base class."

6908f91

This reverts commit 224fd64.

Revert "Ensure AprsIsClient is disposed after each test."

c1dd8fa

This reverts commit 7e69e09.

Ensure client disconnects when disposed.'

aca1321

Reduce timeout.

c05f21b

CBielstein merged commit 4b58fb6 into main Jun 19, 2022

CBielstein deleted the issue-125-test-timeout branch June 19, 2022 22:10

CBielstein mentioned this pull request Jun 19, 2022

WeatherInfo separates out additional information in the comment field for easy reference #105

Closed

3 tasks
