buildbot stop is not reliable in buildbot 0.9 #3535
Comments
|
This is a known issue related to txrequests and the cleanup of txrequests threads. I was never able to find a reliable method to shut down the threads correctly; sometimes it works, sometimes it doesn't. This is why I added support for treq, which doesn't use threads. You may switch to treq as the backend for httpclientservice. |
|
We seem to be using treq. (Though I never manually installed either of them. Buildbot probably used treq by default in my case.) [bash]$ pip freeze | grep txrequests Do you expect this problem with treq as well? |
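As a complement to the `pip freeze` check above, here is a minimal sketch that reports which of the two HTTP client packages can be imported in the current Python environment. The function name `installed_http_backends` is hypothetical, and the sketch only tests importability; it does not show which backend Buildbot actually selects at runtime.

```python
from importlib.util import find_spec


def installed_http_backends(candidates=("txrequests", "treq")):
    """Return the candidate packages that are importable in this environment."""
    # find_spec returns None for a top-level module that cannot be found.
    return [name for name in candidates if find_spec(name) is not None]


if __name__ == "__main__":
    # Run this with the same interpreter (virtualenv) that runs the master.
    print(installed_http_backends())
```

An empty list would mean neither package is importable from that interpreter.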
|
Are you using any latent workers? EC2 and OpenStack use threads I recall. |
|
@seankelly no. @tardyp This problem happens frequently on a heavily loaded master. I think the problem is that even after running The workaround which works for me is:
I think new builds should not start once |
The activity lock should be held for a much shorter time there. |
|
@tardyp The workaround I mentioned above (using --clean) has been working perfectly fine for me for the last few months. I just tried a regular "buildbot stop" and it failed again (as expected).
Yes, I have that change. Also, I am not sure whether buildbot is waiting for activity_lock or is stuck somewhere else. I notice the line "doing housekeeping for master" in the logs. I have added a log statement at https://git.io/vdKlC, but that wasn't printed in the logs (at least this time). So, buildbot might be stuck in _masterDeactivatedHousekeeping() https://git.io/vAWTf or somewhere else. Note that the number of builds currently running doesn't decrease for me (except for a few builds which finish normally). Here is the redacted twistd.log: https://goo.gl/NwzJcC I think one problem which is surely present is that even after running the buildbot stop command, new builds keep starting. "buildbot stop" behavior should be the same as "buildbot stop --clean", in the sense that new builds shouldn't start after issuing the stop command. |
|
@tardyp, for sure, "buildbot stop --clean" prevents new builds from starting, while "buildbot stop" doesn't. Can you please check why? |
|
@tardyp any idea why "buildbot stop" doesn't prevent new builds from starting, while "buildbot stop --clean" does? |
|
My guess is because the first instructs Buildbot to stop immediately while a clean stop doesn't. Because stop should be immediate it's assumed a build won't be able to start. That is less likely to be true on a busy or large Buildbot instance though. |
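The distinction discussed here (a clean stop refuses new builds and waits for running ones to finish) can be illustrated with a toy model. This is a sketch under stated assumptions, not Buildbot code: `ToyMaster` and all of its methods are hypothetical names invented for illustration.

```python
class ToyMaster:
    """Toy stand-in for a build master; hypothetical, not Buildbot's implementation."""

    def __init__(self):
        self.accepting = True   # may new builds start?
        self.running = set()    # names of builds currently in progress

    def start_build(self, name):
        """Start a build unless a clean stop has been requested."""
        if not self.accepting:
            return False
        self.running.add(name)
        return True

    def finish_build(self, name):
        """Mark a build as finished."""
        self.running.discard(name)

    def request_clean_stop(self):
        """Stop accepting new builds; running builds are left to finish."""
        self.accepting = False

    def can_exit(self):
        """The process may exit once no new work is accepted and nothing is running."""
        return not self.accepting and not self.running


master = ToyMaster()
master.start_build("build-1")
master.request_clean_stop()
print(master.start_build("build-2"))  # refused after the stop request
master.finish_build("build-1")
print(master.can_exit())
```

In this model, a stop that never flips `accepting` lets new builds keep arriving on a busy master, so the set of running builds may never drain.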
|
Makes sense. Is this the code which prevents new builds from starting (while using "buildbot stop --clean")? https://github.com/buildbot/buildbot/blob/master/master/buildbot/process/botmaster.py#L105 |
|
That highlighted line adds all running builds to a list so they can be monitored. I don't know where builds are queued; it appears to be somewhere other than this file. My guess is that the BuildRequestDistributor can't distribute them because it's disowned, so the builds naturally queue. |
|
Wow. 5 year old issue and this is still happening for me. I probably can't use buildbot at this point. I'm not even on section 1.2.3 of the tutorial and open source quality rears its head. |
|
@openSourceBugs Which version are you using? The reliability of Note that you need to run |
This was on what I pip installed yesterday from the tutorial docs. There was a "waiting for build to stop" status that never went away for 30 minutes after attempting to stop a failed build. I had to stop the build because the tutorial and the code it uses are obsolete (using git:// instead of https) and haven't been updated for at least 5 years. There was a timeout when running the tutorial as it is now. I probably won't be using buildbot moving forward and this will be my last comment on this. |
Interesting. It would be nice if #6029 can also be fixed. |
|
For the record, the issue with using obsolete code in tutorial was recently fixed: #6599. |
|
This issue was fixed in #6620. |
I have noticed multiple times that 'buildbot stop' never finishes. Running 'buildbot stop' again results in the error "twisted.internet.error.ReactorNotRunning: Can't stop reactor that isn't running". The only option left at that point is "kill -9 pid".
'buildbot stop' should be more reliable.
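The manual fallback described above (ask the process to stop, then "kill -9" if it never exits) can be sketched as a small POSIX helper. This is only an illustration, assuming the master's pid is already known; `pid_alive` and `stop_with_escalation` are hypothetical names, not part of Buildbot. SIGKILL skips all cleanup, so it remains a last resort.

```python
import os
import signal
import time


def pid_alive(pid):
    """Return True if a process with this pid exists.

    Note: an exited child that has not been reaped yet still counts as existing.
    """
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_with_escalation(pid, grace_seconds=30.0, poll_interval=0.5):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL."""
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not pid_alive(pid):
            return "terminated"
        time.sleep(poll_interval)
    os.kill(pid, signal.SIGKILL)  # equivalent to "kill -9 pid"
    return "killed"
```

A caller might use something like `stop_with_escalation(pid, grace_seconds=60)`, with the grace period chosen to suit how long a normal shutdown takes on that master.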
For example, below are logs from a test instance (single master) which was not running any builds at the time of 'buildbot stop'. However, 'buildbot stop' never finished, and issuing another 'buildbot stop' didn't help either.