latent docker worker randomly dies for no apparent reason #3800
I'm having an issue with Docker workers dying or being killed for unknown reasons.
There is no error message other than a note that the docker worker disconnected for an unknown reason. This issue causes rebuilds, wastes time, and messes up the master (so it will not shut down).
I found the issue. It is in how Buildbot generates the docker container name and in the way the docker python library filters by name. When docker workers are given names of a general form with a numeric suffix, such as docker-worker-1 and docker-worker-11,
you can get a case in which docker-worker-1 is being started up to do a build while docker-worker-11 is already running. The code does a lookup on running workers and tries to filter on a name of "buildbot". The filter incorrectly returns docker-worker-11 in the list (it seems that anything that starts with the requested name matches). The code then tries to force stop those workers, killing good builds. The fix I have submitted is to move the hash value to the end of the name to avoid the possibility of an odd match.
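To make the collision concrete, here is a rough sketch in plain Python. It does not call Buildbot or the docker library. The `name_filter` helper only simulates a prefix-style match like the one described above, and the two name layouts, the helper names and the `abc123` hash are all made up for illustration, so treat it as an assumption about the behaviour rather than the actual code.

```python
def name_filter(running, wanted):
    """Simulate a name filter that matches on prefix (assumed behaviour)."""
    return [name for name in running if name.startswith(wanted)]


def name_hash_first(worker, master_hash):
    # Hypothetical "old" layout: hash before the worker name.
    return f"buildbot-{master_hash}-{worker}"


def name_hash_last(worker, master_hash):
    # Hypothetical "fixed" layout: hash moved to the end of the name.
    return f"buildbot-{worker}-{master_hash}"


MASTER_HASH = "abc123"  # made-up value, for illustration only

# Old layout: docker-worker-11 is running while docker-worker-1 starts up.
running_old = [name_hash_first("docker-worker-11", MASTER_HASH)]
starting_old = name_hash_first("docker-worker-1", MASTER_HASH)
stale_old = name_filter(running_old, starting_old)

# Fixed layout: the trailing hash stops "-1" from being a prefix of "-11".
running_new = [name_hash_last("docker-worker-11", MASTER_HASH)]
starting_new = name_hash_last("docker-worker-1", MASTER_HASH)
stale_new = name_filter(running_new, starting_new)

print("old layout would stop:", stale_old)
print("new layout would stop:", stale_new)
```

With the hash first, the name for docker-worker-1 is a prefix of the name for docker-worker-11, so the running container is picked up as a leftover. With the hash last, the character after `docker-worker-1` differs (`-` versus `1`), so the match fails. Comparing the returned names for exact equality before stopping anything would be an extra guard, though that is not part of the fix described here.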