Test slepc/step-36_parallel_02: very rare test failures #15758

Closed
tamiko opened this issue Jul 17, 2023 · 3 comments

Comments

@tamiko
Member

tamiko commented Jul 17, 2023

[start automated regression testsuite report]

Dear all,

this is the automated regression testsuite reporting a new regression between

Summary:

I have identified the following pull requests as possible candidates:

Notes:

  • The regression reported above is a subset of the following full set of regressions compared to the baseline:
  • I will close and unpin this issue automatically once a full run is complete and compares cleanly to the baseline.
  • If closed I will reopen the issue if the testsuite run identified an additional regression compared to what I have reported so far.

[end automated regression testsuite report]

@tamiko added the Regression tester (issues reported by the regression tester bot) and High Priority ⚠️ labels Jul 17, 2023
@tamiko pinned this issue Jul 17, 2023
@masterleinad
Member

It seems very unlikely that #15757 causes a SLEPc test to fail.

@tamiko
Member Author

tamiko commented Jul 17, 2023

@masterleinad The configuration in question doesn't even have ArborX installed. There was also no change in userland (I didn't upgrade anything). It seems that one of the solvers just randomly returned a slightly different spectrum.

But there is literally no code change between the current and previous run. So if this is really a random failure mode then we're talking about a rate of 1 in 1000 or less.

Rerunning the test configuration in question...
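
To put the "1 in 1000 or less" estimate in perspective, here is a minimal sketch of how many reruns it would take to see at least one failure. It assumes each run fails independently with the same probability, which may not hold if the failure depends on task scheduling. The function name `runs_needed` and the 95% confidence level are illustrative choices, not something from this thread.

```python
import math


def runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest n with P(at least one failure in n runs) >= confidence.

    Assumes independent runs with a fixed per-run failure probability
    (an assumption made for illustration only).
    """
    if not (0.0 < failure_rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("failure_rate and confidence must lie strictly between 0 and 1")
    # P(no failure in n runs) = (1 - p)^n <= 1 - confidence
    # => n >= log(1 - confidence) / log(1 - p)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Per-run failure rate of 1 in 1000, as estimated above.
    print(runs_needed(1.0 / 1000.0, 0.95))
```

Under these assumptions, a rate of 1 in 1000 means roughly three thousand reruns for a 95% chance of hitting the failure once, and more if the true rate is lower.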

@tamiko unpinned this issue Jul 17, 2023
@tamiko removed the Regression tester (issues reported by the regression tester bot) label Jul 17, 2023
@tamiko changed the title from "Regression tester regressed 34ae3d" to "Test slepc/step-36_parallel_02: very rare test failures" Jul 17, 2023
@tamiko added this to the Release 9.6 milestone Jul 17, 2023
@tamiko
Member Author

tamiko commented Jul 17, 2023

@masterleinad And sure enough, rerunning the test succeeds. I will try to "reproduce" this in a quiet minute sometime this week, which will be tricky. My best guess is some undefined behavior within SLEPc combined with just the right scheduling of the tasks on cores. Maybe UBSAN can shed some light.
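
One way to attempt the reproduction mentioned above is to rerun the test command in a loop until it fails. The following is a rough sketch only: the helper `rerun_until_failure` is hypothetical, and the command to run (for example a `ctest -R` invocation matching this test from the build directory) is an assumption about the local setup, not something stated in this thread.

```python
import subprocess
import sys
from typing import Optional, Sequence


def rerun_until_failure(cmd: Sequence[str], max_runs: int) -> Optional[int]:
    """Run cmd up to max_runs times.

    Returns the 1-based index of the first run that exits with a nonzero
    status, or None if every run succeeded. (Hypothetical helper.)
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        if result.returncode != 0:
            # Keep the output of the failing run for later inspection.
            sys.stderr.write(result.stdout)
            return attempt
    return None
```

For instance, `rerun_until_failure(["ctest", "-R", "slepc/step-36_parallel_02"], max_runs=3000)` would stop at the first failing run. Because the suspected cause involves task scheduling, running under a different machine load may change how often the failure shows up.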

@tamiko tamiko closed this as completed Aug 7, 2023