
Track test failure statistics across all PRs #6970

@gjoseph92

Description


The flaky test dashboard has been invaluable, both for improving the stability of our CI and, as a PR author, for telling which test failures are real and which are just flakes that can be ignored.

However, the test dashboard only reflects builds on main, and builds on main are only a small fraction of all the builds we run. There are plenty of tests that are flaky but haven't failed on main recently.

Could we also record the outcome of every test run on PRs (maybe into a separate database)? Concurrent writes to SQLite could be tricky, but hopefully we can figure something out.
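Here is a minimal sketch of what recording every run could look like. It assumes each CI run writes directly to one shared SQLite file. The table `test_runs`, its columns, and the helpers `connect` and `record_outcomes` are hypothetical names, not part of the existing dashboard. SQLite's WAL journal mode plus a busy timeout lets concurrent writers wait their turn instead of failing immediately. Writes are still serialized, so each transaction should stay short.

```python
import sqlite3

# Hypothetical schema: one row per (CI run, test) outcome.
SCHEMA = """
CREATE TABLE IF NOT EXISTS test_runs (
    run_id   TEXT NOT NULL,   -- CI workflow run identifier
    branch   TEXT NOT NULL,   -- "main" or the PR's head branch
    pr       INTEGER,         -- PR number, NULL for builds on main
    test     TEXT NOT NULL,   -- fully qualified test name
    outcome  TEXT NOT NULL,   -- "passed", "failed", "error", "skipped"
    started  TEXT NOT NULL,   -- ISO-8601 timestamp of the run
    PRIMARY KEY (run_id, test)
)
"""


def connect(path: str) -> sqlite3.Connection:
    # timeout: wait up to 30 s for another writer's lock instead of
    # raising "database is locked" right away.
    conn = sqlite3.connect(path, timeout=30)
    # WAL lets readers proceed while one writer is active; writers are
    # still serialized, one at a time.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(SCHEMA)
    return conn


def record_outcomes(conn, run_id, branch, pr, started, outcomes):
    """Insert all outcomes from one CI run in a single short transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.executemany(
            "INSERT OR REPLACE INTO test_runs VALUES (?, ?, ?, ?, ?, ?)",
            [(run_id, branch, pr, test, outcome, started)
             for test, outcome in outcomes.items()],
        )
```

GitHub Actions jobs run on separate machines, so they may not be able to share one file. If that is the case, a different approach avoids write concurrency entirely: each job uploads its results as an artifact, and a single scheduled job performs all the inserts.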

This would increase the amount of data we have on test failures by an order of magnitude. More data is very helpful when trying to measure something with nondeterministic behavior (like flaky tests).
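Here is an idealized calculation that puts rough numbers on why more data helps. It assumes each run of a test fails independently with a fixed probability $p$. Real flaky tests often violate this; for example, failures may cluster on particular runners. The 2% failure rate and the run counts are illustrative, not measured:

```latex
\hat{p} = \frac{k}{n}, \qquad
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}

p = 0.02:\quad
n = 50 \Rightarrow \mathrm{SE} = \sqrt{\tfrac{0.02 \cdot 0.98}{50}} \approx 0.0198, \qquad
n = 500 \Rightarrow \mathrm{SE} = \sqrt{\tfrac{0.02 \cdot 0.98}{500}} \approx 0.0063

P(\text{no failures in } n \text{ runs}) = (1-p)^{n}, \qquad (0.98)^{50} \approx 0.364
```

A tenfold increase in runs narrows the uncertainty by a factor of $\sqrt{10} \approx 3.2$. With only 50 builds, a test that fails 2% of the time would show no failures at all about a third of the time.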

Obviously, many test failures in branches are just due to changes in that branch, so the data within a single branch wouldn't be very meaningful. But I have a feeling that common failures across branches are an important signal.

With more data, we could:

  1. better identify the failure rate of some tests, and the priority of fixing them (how frequently do they fail? how widely do they fail? what's the chance of a given PR encountering this failure?)
  2. better visualize test failures: this failure rate data would make for a great dashboard, especially with a toggle between "only main" and "all PRs"
  3. better identify when a particular test started failing (if a test fails on main, search for the first build on any branch where it failed; that build is probably closest to the commit that actually broke it). This could help a lot in identifying the responsible PR for #6969 (Automatically open issues for failing tests on main).
  4. better identify problematic tests: if the same test fails in multiple (say >3) different PRs within a time window (1 week?), it's probably flaky rather than caused by the changes in those PRs, even if it hasn't failed on main yet. It could then also qualify for having an issue opened automatically (#6969, Automatically open issues for failing tests on main).
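Here is a minimal sketch of the analyses in items 1, 3 and 4. It assumes the outcomes have already been loaded into memory as simple records. The `Outcome` type and the function names are hypothetical. The threshold of more than 3 PRs in a 7-day window is just the example value from item 4, not a tuned parameter.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Outcome:
    """Hypothetical record: one test's result in one CI run."""
    test: str
    pr: Optional[int]  # None for builds on main
    started: datetime
    failed: bool


def failure_rates(outcomes: Iterable[Outcome]) -> dict:
    """Item 1: fraction of recorded runs in which each test failed."""
    runs: dict = defaultdict(int)
    fails: dict = defaultdict(int)
    for o in outcomes:
        runs[o.test] += 1
        fails[o.test] += o.failed  # a bool counts as 0 or 1
    return {test: fails[test] / runs[test] for test in runs}


def first_failure(outcomes: Iterable[Outcome], test: str) -> Optional[Outcome]:
    """Item 3: earliest failing run of `test` on any branch, or None."""
    failing = (o for o in outcomes if o.test == test and o.failed)
    return min(failing, key=lambda o: o.started, default=None)


def flaky_candidates(
    outcomes: Iterable[Outcome],
    now: datetime,
    threshold: int = 3,
    window: timedelta = timedelta(days=7),
) -> list:
    """Item 4: tests that failed in more than `threshold` distinct PRs
    during the `window` ending at `now`."""
    prs_by_test: dict = defaultdict(set)
    for o in outcomes:
        if o.failed and o.pr is not None and now - window <= o.started <= now:
            prs_by_test[o.test].add(o.pr)
    return sorted(t for t, prs in prs_by_test.items() if len(prs) > threshold)
```

The sketch counts distinct PRs rather than raw failures. That way, one broken branch that fails the same test on every push does not look like a widespread flake. This is the cross-branch signal described above.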

cc @fjetter @hendrikmakait @ian-r-rose

Metadata


Labels

enhancement (Improve existing functionality or make things work better), tests (Unit tests and/or continuous integration)
