Track test failure statistics across all PRs

The flaky test dashboard has been invaluable, both for improving the stability of our CI and, as a PR author, for identifying which test failures are real and which can be ignored because they're flaky.

However, the test dashboard only reflects builds on main, and those are only a small percentage of all the builds we run. There are plenty of tests that are flaky but haven't failed on main recently.

Could we record the outcome of every test run on PRs too (maybe in a separate database)? Concurrent writes to SQLite could be tricky, but hopefully we can figure something out.
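To make the storage idea concrete, here is a minimal sketch of what recording outcomes could look like. The `test_runs` table, its columns, and the `connect` / `record_outcomes` helpers are all hypothetical, not part of the existing dashboard. It assumes a single SQLite file on a local disk, uses write-ahead logging so readers aren't blocked by the writer, and relies on the `timeout` argument of `sqlite3.connect` so a writer waits for a lock instead of failing immediately.

```python
import sqlite3

# Hypothetical schema: one row per (test, build). Column names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS test_runs (
    test_name  TEXT NOT NULL,
    branch     TEXT NOT NULL,
    pr_number  INTEGER,        -- NULL for builds on main
    commit_sha TEXT NOT NULL,
    outcome    TEXT NOT NULL,  -- e.g. 'passed', 'failed', 'skipped'
    run_at     TEXT NOT NULL   -- ISO 8601 UTC timestamp
)
"""


def connect(path, timeout=30.0):
    """Open the database, waiting up to `timeout` seconds on a locked file."""
    conn = sqlite3.connect(path, timeout=timeout)
    # WAL lets readers proceed while one writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def record_outcomes(conn, rows):
    """Insert all outcomes from one build in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO test_runs VALUES (?, ?, ?, ?, ?, ?)", rows
        )
```

SQLite still allows only one writer at a time, and WAL mode needs all processes to be on the same host, so CI jobs on separate runners would probably have to upload their results and let a single ingest step do the writing.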

This would increase the amount of data we have on test failures by an order of magnitude. More data is very helpful when trying to measure something with nondeterministic behavior (like flaky tests).

Obviously, many test failures in branches are just due to changes in that branch, so the data within a single branch wouldn't be very meaningful. But I have a feeling that common failures _across_ branches are an important signal.

With more data, we could:
1. better identify the failure rate of some tests, and the priority of fixing them (how frequently do they fail? how widely do they fail? what's the chance of a given PR encountering this failure?)
1. better visualize test failures: this failure rate data would make for a great dashboard, especially with a toggle between "only main" and "all PRs"
1. better identify when a particular test started failing (if a test fails on main, search for the first build on any branch where it failed—that's probably closest to the commit that actually broke it). This could help a lot in trying to identify the responsible PR in https://github.com/dask/distributed/issues/6969.
1. better identify problematic tests: if the same test fails in multiple (say >3) different PRs within a time window (1 week?), it's probably flaky (versus related to the changes made in those PRs), even if it hasn't failed on main yet. It could then also qualify for getting an issue opened: https://github.com/dask/distributed/issues/6969.
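The ideas in this list can be sketched as queries over a per-run table. Everything here is an assumption for illustration: the `test_runs` schema, the function names, and the default threshold (at least 4 distinct PRs, i.e. ">3"). Timestamps are assumed to be ISO 8601 strings in UTC so that they sort correctly as text.

```python
import sqlite3

# Hypothetical schema, one row per (test, build).
SCHEMA = """
CREATE TABLE IF NOT EXISTS test_runs (
    test_name  TEXT NOT NULL,
    branch     TEXT NOT NULL,
    pr_number  INTEGER,        -- NULL for builds on main
    commit_sha TEXT NOT NULL,
    outcome    TEXT NOT NULL,  -- 'passed', 'failed', ...
    run_at     TEXT NOT NULL   -- ISO 8601 UTC timestamp
)
"""


def failure_rates(conn):
    """Per-test fraction of recorded runs that failed, worst first."""
    return conn.execute(
        """
        SELECT test_name, AVG(outcome = 'failed') AS failure_rate, COUNT(*) AS runs
        FROM test_runs
        WHERE outcome IN ('passed', 'failed')
        GROUP BY test_name
        ORDER BY failure_rate DESC, test_name
        """
    ).fetchall()


def likely_flaky(conn, since, min_prs=4):
    """Tests that failed in at least `min_prs` distinct PRs since `since`."""
    return conn.execute(
        """
        SELECT test_name, COUNT(DISTINCT pr_number) AS n_prs
        FROM test_runs
        WHERE outcome = 'failed' AND pr_number IS NOT NULL AND run_at >= ?
        GROUP BY test_name
        HAVING COUNT(DISTINCT pr_number) >= ?
        ORDER BY n_prs DESC, test_name
        """,
        (since, min_prs),
    ).fetchall()


def first_failure(conn, test_name):
    """Earliest recorded failure of a test on any branch, or None."""
    return conn.execute(
        """
        SELECT branch, commit_sha, run_at
        FROM test_runs
        WHERE test_name = ? AND outcome = 'failed'
        ORDER BY run_at
        LIMIT 1
        """,
        (test_name,),
    ).fetchone()
```

A dashboard toggle between "only main" and "all PRs" would then amount to adding or dropping a `pr_number IS NULL` filter in `failure_rates`.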

cc @fjetter @hendrikmakait @ian-r-rose

Track test failure statistics across all PRs #6970
