-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing a flaky test #924
Fixing a flaky test #924
Conversation
fix lint issue, otherwise looks good |
demo/lib/orchestrator.go
Outdated
var started bool | ||
for trial < 5 { | ||
for trial := 0; trial < 5; trial += 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to have triggered some extra flakiness in our CI and tests!
Now the "TestLocalOrchestration" test is failing almost always.
We were previously never incrementing the "trial" counter in the loop, so it would keep running non-stop without failing, either until the daemon recovered, or would just "let the other nodes" do their jobs and it used to work thanks to the fact we can support some failure as long as we reached our threshold...
😢
Notice that this just makes sure it fails faster when it actually fails.
This should fix one case where the test
TestRunDKGReshareTimeout
would fail because its daemon weren't yet properly running while the test would carry on.Here is the typical output one would get:
This is now patched by properly waiting on the Status of each Daemon to be "isRunning".
We could also be checking if they are "isStarted", but it's not exactly at the same place in the code path and we need to clarify if we really need so many states or if we could do with just one state replacing both of these.
Notice this does not appear to be the same issue as in #922 sadly, but I've added a comment about a potentially unintended AdvanceMockClock that could maybe be related. I'll have to come back to studying if this test is really properly working later on.
Currently this is just patching the flaky CI, hopefully.