Build: Add test-retry-gradle-plugin to retry flaky tests by nastra · Pull Request #3030 · apache/iceberg

nastra · 2021-08-26T08:47:57Z

I was curious whether this would bring any value to the project as it seems we occasionally run into flaky tests. The latest flaky failure that happened in one of my PRs is reported in #3033 (which passes locally).

The test-retry-gradle-plugin causes failed tests to be retried within the same task. After executing all tests, any failed tests are retried. The process repeats with tests that continue to fail until the maximum specified number of retries has been attempted, or there are no more failing tests.

By default, all failed tests passing on retry prevents the test task from failing. This mode prevents flaky tests from causing build failure. This setting can be changed so that flaky tests cause build failure, which can be used to identify flaky tests.

When something goes badly wrong and all tests start failing, it can be preferable to not keep retrying tests. The plugin can be configured to stop retrying after a certain number of total test failures.

In case we don't want to retry everything in general, there's also the option to only retry a given set of tests, which can be specified via includeClasses / includeAnnotationClasses.

pvary · 2021-08-26T11:35:35Z

I would recommend against retrying flaky tests.
In Hive we spent some time on cleaning flaky tests several years ago, and in almost all of the cases it turned out to be a bug which only manifested in some edge cases.
IMHO we should fix, or disable the flaky tests and not waste resources on running them and creating a false security that the code is tested.

rdblue · 2021-08-26T15:31:22Z

I strongly agree with @pvary. Flaky tests should be considered bugs in themselves.

nastra · 2021-08-27T07:03:10Z

I absolutely agree with you that flaky tests should be treated like bugs. The reason I brought up this retrying is that non-committers need to seek help from committers to restart CI when it fails due to potential flakiness in other parts that they haven't touched. Therefore it seemed reasonable to me to let CI run through and retry those flaky ones (and report them).

nastra · 2021-08-27T07:26:55Z

How do we want to treat test failures in the future? Do we want to open a ticket when somebody runs into a non-related failing test + disable the test in the meantime until it's fixed?

pvary · 2021-08-27T07:42:57Z

How do we want to treat test failures in the future? Do we want to open a ticket when somebody runs into a non-related failing test + disable the test in the meantime until it's fixed?

We have a smaller, more tighter knit community here, and we usually were able to discuss and fix the issues as they were surfaced. Usually if someone found an issue, then they tried to fix it, or wrote to the dev list / created an issue and we were able to fix those without a formal process.

In Hive we have much bigger codebase and bigger number of contributors, so we have a more formal process:

We have a jenkins job which runs a specific test case 100 times, and if any of the runs fail we call the test flaky - we use it to confirm a flaky tests
If a test is confirmed flaky, we disable it (even without test runs)
We create a jira "Reenable flaky test: ...." and try to ping users who were working on the tests lately

OTOH if we have a better codebase, like we have here in Iceberg, we might be better off identifying the commit which caused the flakiness and reverting it (or if the issue is not caused by the test, then fixing the bug).

kbendick · 2021-08-27T21:33:26Z

How do we want to treat test failures in the future? Do we want to open a ticket when somebody runs into a non-related failing test + disable the test in the meantime until it's fixed?

Wanted to point out that I believe CI can be forced to rerun by closing and reopening the PR.

rdblue · 2021-08-27T22:15:37Z

Yes, reopening a PR will retrigger tests.

Build: Add test-retry-gradle-plugin to retry flaky tests

202f75a

github-actions bot added the build label Aug 26, 2021

nastra force-pushed the retry-on-test-flakiness branch from dcea03d to 202f75a Compare August 26, 2021 08:48

nastra requested a review from rdblue August 26, 2021 08:53

nastra closed this Sep 14, 2021

nastra deleted the retry-on-test-flakiness branch September 14, 2021 06:23

rakeshsayini mentioned this pull request May 18, 2021

Feature Request: To query iceberg tables from BI tools(like tableau,DBVisualizer) using jdbc/odbc connectors #2605

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Build: Add test-retry-gradle-plugin to retry flaky tests#3030

Build: Add test-retry-gradle-plugin to retry flaky tests#3030
nastra wants to merge 1 commit intoapache:masterfrom
nastra:retry-on-test-flakiness

nastra commented Aug 26, 2021 •

edited

Loading

Uh oh!

pvary commented Aug 26, 2021

Uh oh!

rdblue commented Aug 26, 2021

Uh oh!

nastra commented Aug 27, 2021

Uh oh!

nastra commented Aug 27, 2021

Uh oh!

pvary commented Aug 27, 2021

Uh oh!

kbendick commented Aug 27, 2021

Uh oh!

rdblue commented Aug 27, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

nastra commented Aug 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pvary commented Aug 26, 2021

Uh oh!

rdblue commented Aug 26, 2021

Uh oh!

nastra commented Aug 27, 2021

Uh oh!

nastra commented Aug 27, 2021

Uh oh!

pvary commented Aug 27, 2021

Uh oh!

kbendick commented Aug 27, 2021

Uh oh!

rdblue commented Aug 27, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

nastra commented Aug 26, 2021 •

edited

Loading