Build: Add test-retry-gradle-plugin to retry flaky tests #3030
nastra wants to merge 1 commit into apache:master
I would recommend against retrying flaky tests.
I strongly agree with @pvary. Flaky tests should be considered bugs in themselves.
I absolutely agree with you that flaky tests should be treated like bugs. The reason I brought up retrying is that non-committers need to ask committers to restart CI when it fails due to potential flakiness in parts of the code they haven't touched. It therefore seemed reasonable to me to let CI run through, retry the flaky tests, and report them.
How do we want to treat test failures in the future? Do we want to open a ticket when somebody runs into an unrelated failing test, and disable the test in the meantime until it's fixed?
We have a smaller, more tightly knit community here, and we have usually been able to discuss and fix issues as they surfaced. Usually, if someone found an issue, they either tried to fix it or wrote to the dev list / created an issue, and we were able to fix it without a formal process. In Hive we have a much bigger codebase and a larger number of contributors, so we have a more formal process:
OTOH if we have a better codebase, like we have here in Iceberg, we might be better off identifying the commit which caused the flakiness and reverting it (or, if the issue is not caused by the test, fixing the bug).
Wanted to point out that I believe CI can be forced to rerun by closing and reopening the PR.
Yes, reopening a PR will retrigger tests. |
I was curious whether this would bring any value to the project as it seems we occasionally run into flaky tests. The latest flaky failure that happened in one of my PRs is reported in #3033 (which passes locally).
The test-retry-gradle-plugin causes failed tests to be retried within the same task. After executing all tests, any failed tests are retried. The process repeats with tests that continue to fail until the maximum specified number of retries has been attempted, or there are no more failing tests.
By default, if all failed tests pass on retry, the test task does not fail. This mode prevents flaky tests from causing build failure. The setting can be changed so that flaky tests do cause build failure, which can be used to identify flaky tests.
When something goes badly wrong and all tests start failing, it can be preferable to not keep retrying tests. The plugin can be configured to stop retrying after a certain number of total test failures.
In case we don't want to retry everything in general, there's also the option to only retry a given set of tests, which can be specified via includeClasses/includeAnnotationClasses.
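
For illustration, the options described above could be combined roughly as follows. This is only a sketch, assuming a `build.gradle.kts` (Kotlin DSL) build script; the plugin version, the numeric limits, and the class and annotation patterns are hypothetical placeholders and are not taken from this PR.

```kotlin
// build.gradle.kts (sketch; the version and all values below are assumptions)
plugins {
    java
    id("org.gradle.test-retry") version "1.3.1" // assumed version, adjust as needed
}

tasks.test {
    retry {
        // Retry failed tests up to 3 times within the same test task.
        maxRetries.set(3)

        // Stop retrying once 20 tests have failed in total, so a badly
        // broken build is not retried over and over.
        maxFailures.set(20)

        // false (the default): the task passes if all failures pass on retry.
        // true: flaky tests still fail the build, which helps to identify them.
        failOnPassedAfterRetry.set(false)

        filter {
            // Only retry tests matching these patterns (hypothetical names).
            includeClasses.add("*FlakyTest")
            includeAnnotationClasses.add("*Flaky")
        }
    }
}
```

With `failOnPassedAfterRetry` left at `false`, CI would pass when a flaky test succeeds on retry; setting it to `true` would instead surface such tests as failures while still recording the retries.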