Add solution for quarantining flaky tests #9427

lhotari · 2021-02-02T15:40:54Z

Motivation

Pulsar contains a large amount of flaky tests. It will be useful to have a solution for quarantining tests.

This solution takes inspiration from Airflow's Quarantined tests solution. Quarantining (or categorizing) flaky tests has been suggested by @potiuk in the draft PIP "Changes to GitHub Actions based Pulsar CI" and also by @dlg99 in in the draft PIP "Changes to flaky test handling". Thank you @potiuk and @dlg99 for suggesting this.

Modifications

Adds @Quarantined annotation for quarantining flaky tests
Implements TestNG IAnnotationTransformer for integrating with TestNG in QuarantinedTestNGAnnotationTransformer.
- this class is configured as the TestNG listener in the maven build
Quarantined tests are not run by default.
Quarantined tests are executed when runQuarantinedTests system property is set to true
- this is similar feature as Airflow's --include-quarantined flag
No other tests than Quarantined tests are executed when runOnlyQuarantinedTests
system property is set to true.
- this is similar feature as Airflow CI's -m quarantined flag

- Inspired by Airflow's Quarantined tests solution, https://github.com/apache/airflow/blob/master/TESTING.rst#quarantined-tests - Quarantined tests are not run by default. - Quarantined tests are executed when "runQuarantinedTests" system property is set to true - this is similar feature as Airflow's "--include-quarantined" flag - No other tests than Quarantined tests are executed when "runOnlyQuarantinedTests" system property is set to true. - this is similar feature as Airflow CI's "-m quarantined" flag

eolivelli

I am +1 with this framework, but we must create a high priority issue for any test marked as Quarantined

otherwise we are going to lose code coverage.

Code coverage with flaky tests is not very meaningful...so it is okay to disable them.
probably we could have a separate CI job, not bound to PRs that runs quarantined tests

lhotari · 2021-02-02T15:53:38Z

probably we could have a separate CI job, not bound to PRs that runs quarantined tests

Good points @eolivelli. yes that would be required. I believe Airflow CI has such a job.

potiuk · 2021-02-02T19:12:18Z

probably we could have a separate CI job, not bound to PRs that runs quarantined tests

Good points @eolivelli. yes that would be required. I believe Airflow CI has such a job.

Yes we have in airflow a separate job. Failure of that job does not stop anything (it has continue-on-error: true set: https://github.com/apache/airflow/blob/6fd93badaa86d5fd53a8dd9858467ab7e85208a6/.github/workflows/ci.yml#L738 ). But if those Quarantined tests fail, the whole job gets "red" status, however you can clearly see that those are the quarantined tests that failed, not the "stable" ones. And when you see they are not related to a change you can still merge it.

Since those tests are flaky but not "broken", they do succeed more often than they don't so we notice and can react if we see that those tests in "Quarantine" start to fail consistently. This is easy to spot actually if you are active committer who merges a number of commits a day/week.

We also have a separate workflow which is "scheduled" and only runs quarantined tests, but this did not work as well as I wanted. We were submitting status of such quarantined tests to apache/airflow#10118 automatically (last 20 runs were kept there). This however turned out to be flaky on its own.
Some of the flaky tests simply hang and then they make this exercise a bit pointless. Also the flaky tests tend to be less flaky when run in isolation so there are tests that always succeed when run separately. We call them Heisentests (akin to https://en.wikipedia.org/wiki/Heisenbug) and we actually introduced another "marker" for those (@heisentests) and they are always run in isolation.

Also yeah - we have a bug for every quarantined test: https://github.com/apache/airflow/issues?q=is%3Aopen+is%3Aissue+label%3AQuarantine but - due to 2.0.0 release (2 years in the making) and follow up 2.0.1 we are working on on where we fix teething problems of 2.0.0 those are a little neglected - they are all part of the "2.0 Cleanup milestone" and I actually hope to get them fixed as soon as we release 2.0.1

potiuk · 2021-02-02T19:15:59Z

BTW. Glad that you liked the idea :). And the name for those tests is very, very appropriate taking into account the external circumstances ;)

dlg99

LGTM

eolivelli · 2021-02-11T07:10:48Z

@lhotari can you please merge with current master and Kickstart CI again?
It would be useful to merge this patch soon and move forward with CI cleanup

lhotari · 2021-02-11T07:28:10Z

@lhotari can you please merge with current master and Kickstart CI again?
It would be useful to merge this patch soon and move forward with CI cleanup

I'm postponing this PR until I have time to continue with this. It doesn't make sense to merge this in it's current form. Matteo provided some detailed requirements for the quarantined tests solution in the last Pulsar Community meeting. One of the ideas was to have a solution where the retries could be kept for quarantined tests. The rational behind this is that even though some of the tests are flaky, they provide some value to the reviewer. Removing the tests completely would leave a gap and if the tests are flaky, they won't be useful without retries.

Having this kind of hybrid solution in place would require that the quarantined test results are shown on the PR to the reviewer in a certain format. GitHub Actions has this feature called "annotations" which means that you can add custom test results that are displayed in the results for the test runs all the way on the PR. It would be necessary to revisit the solution and take this into account before merging the changes in this PR.

I'm making progress on the CI clean, I'll hopefully have some public updates later today or tomorrow.

eolivelli · 2021-02-11T07:32:07Z

@lhotari I see. thanks for the explanation

eolivelli approved these changes Feb 2, 2021

View reviewed changes

sijie assigned lhotari Feb 3, 2021

sijie approved these changes Feb 3, 2021

View reviewed changes

sijie added this to the 2.8.0 milestone Feb 3, 2021

sijie added the area/test label Feb 3, 2021

dlg99 approved these changes Feb 11, 2021

View reviewed changes

lhotari marked this pull request as draft February 11, 2021 07:19

lhotari closed this Apr 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add solution for quarantining flaky tests #9427

Add solution for quarantining flaky tests #9427

lhotari commented Feb 2, 2021

eolivelli left a comment

lhotari commented Feb 2, 2021

potiuk commented Feb 2, 2021

potiuk commented Feb 2, 2021

dlg99 left a comment

eolivelli commented Feb 11, 2021

lhotari commented Feb 11, 2021

eolivelli commented Feb 11, 2021

Add solution for quarantining flaky tests #9427

Add solution for quarantining flaky tests #9427

Conversation

lhotari commented Feb 2, 2021

Motivation

Modifications

eolivelli left a comment

Choose a reason for hiding this comment

lhotari commented Feb 2, 2021

potiuk commented Feb 2, 2021

potiuk commented Feb 2, 2021

dlg99 left a comment

Choose a reason for hiding this comment

eolivelli commented Feb 11, 2021

lhotari commented Feb 11, 2021

eolivelli commented Feb 11, 2021