-
Notifications
You must be signed in to change notification settings - Fork 822
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exile job for unreliable tests. #3781
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me (I've requested a small clarification, though).
- nbbuild/travis/ant.sh $OPTS build | ||
script: | ||
- travis_retry ant $OPTS -f java/java.lsp.server test | ||
- travis_retry hide-logs.sh ant $OPTS -f java/gradle.java test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "hide-logs.sh" filter in the second line will also show the failed tests of the previous "java/java.lsp.server" test, right? I understand this is on purpose, not a typo, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know actually. hide-logs.sh
was there before, i just moved the line to a different job. I assumed it being a streaming filter which removes some spammy log lines to make CI not error - but I haven't actually looked at the script.
Part of this test is me figuring out how practical travis_retry
is. If you think this is a problem i can remove it. Main intention was to isolate bad behaving tests so that they would be in a short(er) restartable job, separated from reliable tests.
This is also somewhat of a contingency plan in case the tests can't be stabilized despite all efforts. I didn't plan to merge this if it turns out that the pending reliability PRs fix the problem.
But if you want I can merge it anyway (after your PRs are integrated).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. "hide-logs.sh" is quite useful, I think.
Regarding retries: if tests are reliable there's no need to retry them, right? :-).
I think we should try to achieve a "Green Travis" ASAP, no need to wait por PRs. There're system-wide refactorings taking place, and we have no safety net.
But maybe we want to discuss in the dev mailing list before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding retries: if tests are reliable there's no need to retry them, right? :-).
right. Thats why I wanted to wait till your PR is integrated to check how the improved test behaves.
Btw travis_retry
is used in 14 other places in that travis.yml.
@matthiasblaesing @JaroslavTulach @sdedic should we give this a try? |
I would say lets try it. If it helps great, if not it was an idea. |
Apache NetBeans project needs to do something with the unreliable tests. Adding Ideally the owners of the tests (@sdedic & other OracleLabs guys as |
awesome. I wanted to make sure everyone knows that some tests might be in a different job now. |
@JaroslavTulach re our schedule, the bugfixing period should start next week. I hope I'll be able to address at least the Gradle/java one. I've already found some race - but apparently not all of them. |
+1, but: The purpose of pre-commit CI validation is: prevent others to cause regressions. Keeping unreliable tests in the shared CI pool undermines the main purpose - it only annoys others as they are being punished by your own incapability to stabilize your tests. Set a private job (like vscode did with I fully understand why engineers don't want to disable tests, but consider the harm caused by keeping annoying randomly failing tests in the shared CI pool. |
lets give this a try to make job 13 more reliable
travis_retry
logs from a different PR where i had to restart job 13 four times in a row: