New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed a deadlock bug related to distinct but equal BuildSteps #3650

Merged
merged 7 commits into from Sep 28, 2017

Conversation

Projects
None yet
3 participants
@egirault
Contributor

egirault commented Sep 24, 2017

If a build factory has a buildstep with a lock, this will trigger a deadlock whenever 3 or more builds of this factory are started at the same time. The main reason is that BaseLock uses the == operator to compare the owners of a given lock. But if owners are BuildSteps, it doesn't work because they inherit from ComparableMixin (#d6c6b550), so two distinct BuildStep objects sharing the same properties will be considered as equal, although they are distinct Python objects: b0 == b1 but b0 is not b1. Because of this, the first two buildsteps acquiring the lock were able to complete, but then the waiting list was empty and the remaining buildsteps were waiting forever. It was not possible to stop them with the Stop button of the web interface; only a master restart stopped them.
We use the is operator instead of == to fix this.

@mention-bot

This comment has been minimized.

Show comment
Hide comment
@mention-bot

mention-bot Sep 24, 2017

@egirault, thanks for your PR! By analyzing the history of the files in this pull request, we identified @catlee, @rutsky and @tardyp to be potential reviewers.

mention-bot commented Sep 24, 2017

@egirault, thanks for your PR! By analyzing the history of the files in this pull request, we identified @catlee, @rutsky and @tardyp to be potential reviewers.

@mention-bot

This comment has been minimized.

Show comment
Hide comment
@mention-bot

mention-bot Sep 24, 2017

@egirault, thanks for your PR! By analyzing the history of the files in this pull request, we identified @catlee, @rutsky and @tardyp to be potential reviewers.

mention-bot commented Sep 24, 2017

@egirault, thanks for your PR! By analyzing the history of the files in this pull request, we identified @catlee, @rutsky and @tardyp to be potential reviewers.

@codecov

This comment has been minimized.

Show comment
Hide comment
@codecov

codecov bot Sep 24, 2017

Codecov Report

Merging #3650 into master will increase coverage by 0.04%.
The diff coverage is 84.61%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3650      +/-   ##
==========================================
+ Coverage    88.3%   88.34%   +0.04%     
==========================================
  Files         323      323              
  Lines       33900    33936      +36     
==========================================
+ Hits        29934    29981      +47     
+ Misses       3966     3955      -11
Impacted Files Coverage Δ
master/buildbot/db/pool.py 87.02% <100%> (+0.09%) ⬆️
master/buildbot/db/buildsets.py 96.09% <100%> (+0.03%) ⬆️
master/buildbot/locks.py 95.97% <100%> (+1.34%) ⬆️
master/buildbot/data/buildsets.py 97.45% <60%> (-1.67%) ⬇️
master/buildbot/process/build.py 92.82% <0%> (+0.03%) ⬆️
master/buildbot/interfaces.py 92.07% <0%> (+0.03%) ⬆️
master/buildbot/worker/docker.py 80% <0%> (+0.1%) ⬆️
worker/buildbot_worker/runprocess.py 75.77% <0%> (+0.18%) ⬆️
master/buildbot/process/remotecommand.py 83.88% <0%> (+0.41%) ⬆️
master/buildbot/schedulers/forcesched.py 86.27% <0%> (+0.89%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b1a73a...161dcb3. Read the comment docs.

codecov bot commented Sep 24, 2017

Codecov Report

Merging #3650 into master will increase coverage by 0.04%.
The diff coverage is 84.61%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3650      +/-   ##
==========================================
+ Coverage    88.3%   88.34%   +0.04%     
==========================================
  Files         323      323              
  Lines       33900    33936      +36     
==========================================
+ Hits        29934    29981      +47     
+ Misses       3966     3955      -11
Impacted Files Coverage Δ
master/buildbot/db/pool.py 87.02% <100%> (+0.09%) ⬆️
master/buildbot/db/buildsets.py 96.09% <100%> (+0.03%) ⬆️
master/buildbot/locks.py 95.97% <100%> (+1.34%) ⬆️
master/buildbot/data/buildsets.py 97.45% <60%> (-1.67%) ⬇️
master/buildbot/process/build.py 92.82% <0%> (+0.03%) ⬆️
master/buildbot/interfaces.py 92.07% <0%> (+0.03%) ⬆️
master/buildbot/worker/docker.py 80% <0%> (+0.1%) ⬆️
worker/buildbot_worker/runprocess.py 75.77% <0%> (+0.18%) ⬆️
master/buildbot/process/remotecommand.py 83.88% <0%> (+0.41%) ⬆️
master/buildbot/schedulers/forcesched.py 86.27% <0%> (+0.89%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b1a73a...161dcb3. Read the comment docs.

@tardyp

This comment has been minimized.

Show comment
Hide comment
@tardyp

tardyp Sep 24, 2017

Member

Hi @egirault I added an integration test in your branch (wasn't that easy...), which reproduced another issue #3591, and another case of ==, which I also fixed.

Member

tardyp commented Sep 24, 2017

Hi @egirault I added an integration test in your branch (wasn't that easy...), which reproduced another issue #3591, and another case of ==, which I also fixed.

egirault and others added some commits Sep 24, 2017

Fixed a deadlock occurring when a lock is taken by 2 distinct owners …
…but o1 == o2

If a build factory has a buildstep with a lock, this will trigger a deadlock whenever 3 or more builds of this factory are started at the same time. The main reason is that BaseLock uses the == operator to compare the owners of a given lock. But if owners are BuildSteps, it doesn't work because they inherit from ComparableMixin (#d6c6b550), so two distinct BuildStep objects sharing the same properties will be considered as equal, although they are distinct Python objects: `b0 == b1` but `b0 is not b1`. We use the `is` operator to fix this.
@egirault

This comment has been minimized.

Show comment
Hide comment
@egirault

egirault Sep 25, 2017

Contributor

Yes indeed I forgot one !=. Thanks!

Contributor

egirault commented Sep 25, 2017

Yes indeed I forgot one !=. Thanks!

tardyp added some commits Sep 27, 2017

@tardyp tardyp merged commit a2fa746 into buildbot:master Sep 28, 2017

29 checks passed

bb Build done.
Details
bb/coverage/2.7/tw:latest/sqla:latest/db:sqlite:// Build done.
Details
bb/coverage/3.5/tw:latest/sqla:latest/db:sqlite:// Build done.
Details
bb/docs/2.7/tw:latest/sqla:latest/db:sqlite:// Build done.
Details
bb/flake8/2.7/tw:latest/sqla:latest/db:sqlite:// Build done.
Details
bb/flake8/3.5/tw:latest/sqla:latest/db:sqlite:// Build done.
Details
bb/isort/2.7/tw:latest/sqla:latest/db:sqlite:// Build done.
Details
bb/js/2.7/tw:latest/sqla:latest/db:sqlite:// Build done.
Details
bb/pylint/2.7/tw:latest/sqla:latest/db:sqlite:// Build done.
Details
bb/smokes/2.7/tw:latest/sqla:latest/db:sqlite:// Build done.
Details
bb/smokes/3.5/tw:latest/sqla:latest/db:sqlite:// Build done.
Details
bb/trial/2.7/tw:14.0.2/sqla:latest/db:sqlite:// Build done.
Details
bb/trial/2.7/tw:15.4.0/sqla:latest/db:sqlite:// Build done.
Details
bb/trial/2.7/tw:15.5.0/sqla:0.8.0/db:sqlite:// Build done.
Details
bb/trial/2.7/tw:15.5.0/sqla:latest/db:sqlite:// Build done.
Details
bb/trial/2.7/tw:latest/sqla:latest/db:mysql+mysqldb://travis@127.0.0.1/bbtest Build done.
Details
bb/trial/2.7/tw:latest/sqla:latest/db:sqlite:// Build done.
Details
bb/trial/2.7/tw:latest/sqla:latest/db:sqlite:////tmp/test_db.sqlite Build done.
Details
bb/trial/3.4/tw:latest/sqla:latest/db:sqlite:// Build done.
Details
bb/trial/3.6/tw:latest/sqla:latest/db:sqlite:// Build done.
Details
bb/trial_worker/2.6/tw:14.0.2/sqla:latest/db:sqlite:// Build done.
Details
bb/trial_worker/2.6/tw:15.4.0/sqla:latest/db:sqlite:// Build done.
Details
bb/trial_worker/2.7/tw:10.2.0/sqla:latest/db:sqlite:// Build done.
Details
bb/trial_worker/2.7/tw:11.1.0/sqla:latest/db:sqlite:// Build done.
Details
bb/trial_worker/2.7/tw:12.2.0/sqla:latest/db:sqlite:// Build done.
Details
bb/trial_worker/2.7/tw:13.2.0/sqla:latest/db:sqlite:// Build done.
Details
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@egirault

This comment has been minimized.

Show comment
Hide comment
@egirault

egirault Sep 28, 2017

Contributor

Thank you!

Contributor

egirault commented Sep 28, 2017

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment