Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add solution for quarantining flaky tests #9427

Closed

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented Feb 2, 2021

Motivation

Pulsar contains a large amount of flaky tests. It will be useful to have a solution for quarantining tests.

This solution takes inspiration from Airflow's Quarantined tests solution. Quarantining (or categorizing) flaky tests has been suggested by @potiuk in the draft PIP "Changes to GitHub Actions based Pulsar CI" and also by @dlg99 in in the draft PIP "Changes to flaky test handling". Thank you @potiuk and @dlg99 for suggesting this.

Modifications

  • Adds @Quarantined annotation for quarantining flaky tests

  • Implements TestNG IAnnotationTransformer for integrating with TestNG in QuarantinedTestNGAnnotationTransformer.

    • this class is configured as the TestNG listener in the maven build
  • Quarantined tests are not run by default.

  • Quarantined tests are executed when runQuarantinedTests system property is set to true

    • this is similar feature as Airflow's --include-quarantined flag
  • No other tests than Quarantined tests are executed when runOnlyQuarantinedTests
    system property is set to true.

    • this is similar feature as Airflow CI's -m quarantined flag

- Inspired by Airflow's Quarantined tests solution,
  https://github.com/apache/airflow/blob/master/TESTING.rst#quarantined-tests
- Quarantined tests are not run by default.
- Quarantined tests are executed when "runQuarantinedTests" system property is set to true
  - this is similar feature as Airflow's "--include-quarantined" flag
- No other tests than Quarantined tests are executed when "runOnlyQuarantinedTests"
  system property is set to true.
  - this is similar feature as Airflow CI's "-m quarantined" flag
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am +1 with this framework, but we must create a high priority issue for any test marked as Quarantined

otherwise we are going to lose code coverage.

Code coverage with flaky tests is not very meaningful...so it is okay to disable them.
probably we could have a separate CI job, not bound to PRs that runs quarantined tests

@lhotari
Copy link
Member Author

lhotari commented Feb 2, 2021

probably we could have a separate CI job, not bound to PRs that runs quarantined tests

Good points @eolivelli. yes that would be required. I believe Airflow CI has such a job.

@potiuk
Copy link
Member

potiuk commented Feb 2, 2021

probably we could have a separate CI job, not bound to PRs that runs quarantined tests

Good points @eolivelli. yes that would be required. I believe Airflow CI has such a job.

Yes we have in airflow a separate job. Failure of that job does not stop anything (it has continue-on-error: true set: https://github.com/apache/airflow/blob/6fd93badaa86d5fd53a8dd9858467ab7e85208a6/.github/workflows/ci.yml#L738 ). But if those Quarantined tests fail, the whole job gets "red" status, however you can clearly see that those are the quarantined tests that failed, not the "stable" ones. And when you see they are not related to a change you can still merge it.

Since those tests are flaky but not "broken", they do succeed more often than they don't so we notice and can react if we see that those tests in "Quarantine" start to fail consistently. This is easy to spot actually if you are active committer who merges a number of commits a day/week.

We also have a separate workflow which is "scheduled" and only runs quarantined tests, but this did not work as well as I wanted. We were submitting status of such quarantined tests to apache/airflow#10118 automatically (last 20 runs were kept there). This however turned out to be flaky on its own.
Some of the flaky tests simply hang and then they make this exercise a bit pointless. Also the flaky tests tend to be less flaky when run in isolation so there are tests that always succeed when run separately. We call them Heisentests (akin to https://en.wikipedia.org/wiki/Heisenbug) and we actually introduced another "marker" for those (@heisentests) and they are always run in isolation.

Also yeah - we have a bug for every quarantined test: https://github.com/apache/airflow/issues?q=is%3Aopen+is%3Aissue+label%3AQuarantine but - due to 2.0.0 release (2 years in the making) and follow up 2.0.1 we are working on on where we fix teething problems of 2.0.0 those are a little neglected - they are all part of the "2.0 Cleanup milestone" and I actually hope to get them fixed as soon as we release 2.0.1

@potiuk
Copy link
Member

potiuk commented Feb 2, 2021

BTW. Glad that you liked the idea :). And the name for those tests is very, very appropriate taking into account the external circumstances ;)

@sijie sijie added this to the 2.8.0 milestone Feb 3, 2021
@sijie sijie added the area/test label Feb 3, 2021
Copy link
Contributor

@dlg99 dlg99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eolivelli
Copy link
Contributor

@lhotari can you please merge with current master and Kickstart CI again?
It would be useful to merge this patch soon and move forward with CI cleanup

@lhotari lhotari marked this pull request as draft February 11, 2021 07:19
@lhotari
Copy link
Member Author

lhotari commented Feb 11, 2021

@lhotari can you please merge with current master and Kickstart CI again?
It would be useful to merge this patch soon and move forward with CI cleanup

I'm postponing this PR until I have time to continue with this. It doesn't make sense to merge this in it's current form. Matteo provided some detailed requirements for the quarantined tests solution in the last Pulsar Community meeting. One of the ideas was to have a solution where the retries could be kept for quarantined tests. The rational behind this is that even though some of the tests are flaky, they provide some value to the reviewer. Removing the tests completely would leave a gap and if the tests are flaky, they won't be useful without retries.

Having this kind of hybrid solution in place would require that the quarantined test results are shown on the PR to the reviewer in a certain format. GitHub Actions has this feature called "annotations" which means that you can add custom test results that are displayed in the results for the test runs all the way on the PR. It would be necessary to revisit the solution and take this into account before merging the changes in this PR.

I'm making progress on the CI clean, I'll hopefully have some public updates later today or tomorrow.

@eolivelli
Copy link
Contributor

@lhotari I see. thanks for the explanation

@lhotari lhotari closed this Apr 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants