exile job for unreliable tests. #3781

mbien · 2022-03-15T01:05:11Z

lets give this a try to make job 13 more reliable

identify unreliable tests
isolate them into their own job (easier restartable without having to run well behaving tests multiple times)
maybe use travis_retry

logs from a different PR where i had to restart job 13 four times in a row:

https://app.travis-ci.com/github/apache/netbeans/jobs/563140526
org.netbeans.modules.java.lsp.server.explorer.ProjectViewTest

      failed: testOnceExistedDeletedItemInfo

org.netbeans.modules.java.lsp.server.project.LspBrokenReferencesImplTest

      failed: testIgnoredErrorsTimeout

org.netbeans.modules.java.source.parsing.PartialReparseTest

      failed: testIntroduceParseError2
---
https://app.travis-ci.com/github/apache/netbeans/jobs/563138079
org.netbeans.modules.java.lsp.server.explorer.ProjectViewTest

      failed: testEmptyPackageRemains
---
https://app.travis-ci.com/github/apache/netbeans/jobs/563294404
org.netbeans.modules.java.lsp.server.project.LspBrokenReferencesImplTest

      failed: testIgnoredErrorsTimeout
---
https://app.travis-ci.com/github/apache/netbeans/jobs/563294404
org.netbeans.modules.java.lsp.server.explorer.ProjectViewTest

      failed: testProjectSourcePackages

vieiro

Looks good to me (I've requested a small clarification, though).

vieiro · 2022-03-15T05:32:28Z

.travis.yml

+            - nbbuild/travis/ant.sh $OPTS build
+          script:
+            - travis_retry ant $OPTS -f java/java.lsp.server test
+            - travis_retry hide-logs.sh ant $OPTS -f java/gradle.java test


The "hide-logs.sh" filter in the second line will also show the failed tests of the previous "java/java.lsp.server" test, right? I understand this is on purpose, not a typo, right?

I don't know actually. hide-logs.sh was there before, i just moved the line to a different job. I assumed it being a streaming filter which removes some spammy log lines to make CI not error - but I haven't actually looked at the script.

Part of this test is me figuring out how practical travis_retry is. If you think this is a problem i can remove it. Main intention was to isolate bad behaving tests so that they would be in a short(er) restartable job, separated from reliable tests.

This is also somewhat of a contingency plan in case the tests can't be stabilized despite all efforts. I didn't plan to merge this if it turns out that the pending reliability PRs fix the problem.

But if you want I can merge it anyway (after your PRs are integrated).

I see. "hide-logs.sh" is quite useful, I think.
Regarding retries: if tests are reliable there's no need to retry them, right? :-).
I think we should try to achieve a "Green Travis" ASAP, no need to wait por PRs. There're system-wide refactorings taking place, and we have no safety net.
But maybe we want to discuss in the dev mailing list before merging.

Regarding retries: if tests are reliable there's no need to retry them, right? :-).

right. Thats why I wanted to wait till your PR is integrated to check how the improved test behaves.

Btw travis_retry is used in 14 other places in that travis.yml.

mbien · 2022-03-24T05:37:06Z

@matthiasblaesing @JaroslavTulach @sdedic should we give this a try?

matthiasblaesing · 2022-03-24T08:12:09Z

I would say lets try it. If it helps great, if not it was an idea.

JaroslavTulach · 2022-03-24T08:14:40Z

Apache NetBeans project needs to do something with the unreliable tests. Adding @RandomlyFails (e.g. excluding the tests from the CI runs) would be my preferred choice. However I am ready to support any solution that does something with the failures.

Ideally the owners of the tests (@sdedic & other OracleLabs guys as ServerTest and Gradle is observed by the team) should try their best to address the instability. However I haven't noticed any activity in this respect recently. CCing @MartinBalin

mbien · 2022-03-24T08:20:53Z

awesome. I wanted to make sure everyone knows that some tests might be in a different job now.

sdedic · 2022-03-24T08:32:17Z

@JaroslavTulach re our schedule, the bugfixing period should start next week. I hope I'll be able to address at least the Gradle/java one. I've already found some race - but apparently not all of them.

JaroslavTulach · 2022-03-24T08:55:16Z

the bugfixing period should start next week. I hope I'll be able to address at least the Gradle/java one. I've already found some race - but apparently not all of them.

+1, but:

The purpose of pre-commit CI validation is: prevent others to cause regressions. Keeping unreliable tests in the shared CI pool undermines the main purpose - it only annoys others as they are being punished by your own incapability to stabilize your tests.

Set a private job (like vscode did with java/java.lsp.server tests) to run your tests including @RandomlyFails ones. Execute often and perform stabilization of your code & tests there until the code is stable. Meanwhile just annotate every randomly failing test in pre-commit CI as @RandomlyFails ones.

I fully understand why engineers don't want to disable tests, but consider the harm caused by keeping annoying randomly failing tests in the shared CI pool.

mbien added the CI Continuous Integration label Mar 15, 2022

exile job for unreliable tests.

a52bee6

mbien force-pushed the exile-tests branch from 766325e to 3db87ab Compare March 15, 2022 03:54

vieiro approved these changes Mar 15, 2022

View reviewed changes

mbien force-pushed the exile-tests branch from 3db87ab to a52bee6 Compare March 15, 2022 17:03

mbien mentioned this pull request Mar 19, 2022

github actions ci pipeline. #3785

Merged

mbien marked this pull request as ready for review March 23, 2022 17:07

JaroslavTulach self-requested a review March 24, 2022 08:14

JaroslavTulach approved these changes Mar 24, 2022

View reviewed changes

mbien merged commit 0876224 into apache:master Mar 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

exile job for unreliable tests. #3781

exile job for unreliable tests. #3781

mbien commented Mar 15, 2022 •

edited

vieiro left a comment

vieiro Mar 15, 2022

mbien Mar 15, 2022

vieiro Mar 15, 2022

mbien Mar 15, 2022

mbien commented Mar 24, 2022

matthiasblaesing commented Mar 24, 2022

JaroslavTulach commented Mar 24, 2022

mbien commented Mar 24, 2022

sdedic commented Mar 24, 2022 •

edited

JaroslavTulach commented Mar 24, 2022

exile job for unreliable tests. #3781

exile job for unreliable tests. #3781

Conversation

mbien commented Mar 15, 2022 • edited

vieiro left a comment

Choose a reason for hiding this comment

vieiro Mar 15, 2022

Choose a reason for hiding this comment

mbien Mar 15, 2022

Choose a reason for hiding this comment

vieiro Mar 15, 2022

Choose a reason for hiding this comment

mbien Mar 15, 2022

Choose a reason for hiding this comment

mbien commented Mar 24, 2022

matthiasblaesing commented Mar 24, 2022

JaroslavTulach commented Mar 24, 2022

mbien commented Mar 24, 2022

sdedic commented Mar 24, 2022 • edited

JaroslavTulach commented Mar 24, 2022

mbien commented Mar 15, 2022 •

edited

sdedic commented Mar 24, 2022 •

edited