New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix race condition in build wait timers for substantiated workers #6344
fix race condition in build wait timers for substantiated workers #6344
Conversation
Codecov Report
@@ Coverage Diff @@
## master #6344 +/- ##
==========================================
- Coverage 91.99% 91.89% -0.10%
==========================================
Files 335 335
Lines 37583 37582 -1
==========================================
- Hits 34573 34536 -37
- Misses 3010 3046 +36
Continue to review full report at Codecov.
|
master/buildbot/worker/latent.py
Outdated
@@ -236,7 +236,9 @@ def substantiate(self, wfb, build): | |||
return defer.succeed(False) | |||
|
|||
if self.state == States.SUBSTANTIATED and self.conn is not None: | |||
self._setBuildWaitTimer() | |||
# Skip the wait timer if a build is already queued | |||
if not self.building: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is equivalent to deleting self._setBuildWaitTimer()
. self.building checks whether at least one of WorkerForBuilder instances associated with the worker are busy (via isBusy
). They will necessarily be busy, because at least one got its buildStarted
called in the path to Build.startBuild
which eventually indirectly calls LatentWorker.substantiate()
.
However, this also means that _setBuildWaitTimer
is completely unnecessary here. If buildStarted
is guaranteed to be called when we are here, then regardless of what happens buildFinished
will be called too for that particular build. That is the place for _setBuildWaitTimer
which we already do there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All in all, thanks for the PR, it identified an important bug and did 95% of the work fixing it. I'll finish the polishing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me, thanks. Do you think self.building
/ isBusy
should be a part of the state machine? I found that lack of information in the worker's state to be terribly confusing to debug, which led to me exploring way too much of the call tree to find the root issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.building
is quite separate from the internal state machine in that the states aren't dependent on each other. If we merge them then the state machine would grow double in size.
I agree that the latent worker itself is quite confusing. It became such once all edge cases got covered. It seems that there should be a way to refactor everything for simplicity, but nobody found any such way yet.
d191fa8
to
05b507b
Compare
Calling _setBuildWaitTimer() here in substantiate() may cause the worker to be insubstantiated later while build is running. Three places where the timer is reset are _setBuildWaitTimer() itself, stopService() and buildStarted(). The first two are not interesting and buildStarted() has already been called for a particular build when we are in substantiate(): - buildStarted() is called via WorkerForBuilder.buildStarted() which is called from Builder._startBuildFor() - substantiate() is called via LatentWorkerForBuilder.substantiate_if_needed() via Build.startBuild() from Builder._startBuildFor(). Once the current build is finished for any reason, _setBuildWaitTimer() will be called from buildFinished(). Thus calling _setBuildWaitTimer() from substantiate() is not needed. Commit adjusted by: Povilas Kanapickas <povilas@radix.lt>
05b507b
to
1592ac2
Compare
This PR fixes #5988. |
Patched an issue where latent workers that have already been substantiated will start a
build_wait_timer
during substantiation of a new build. This timer starts after thebuild_wait_timer
is cleared from a new build arriving, disconnecting the workerbuild_wait_timeout
seconds later, mid-build or not.Contributor Checklist:
newsfragments
directory (and read theREADME.txt
in that directory)