Concourse worker paralyzes whole server #1989
Hello, I have issues with the Concourse worker.
Sometimes the worker container stalls (builds are frozen, check-resources hang indefinitely), even though it still appears as "running" when using the fly CLI worker command.
The problem is that this container cannot be stopped.
Stopping or restarting the Docker service does not work, and rebooting the OS does not work either.
Only hard-rebooting the server makes the problem go away. This is a huge problem imo, since it means I can't run Concourse on any important server (I can't reboot production servers just like that).
If worker is listed as
Are there any logs from the worker when it gets into this state?
Hello, sorry, I don't have any particular logs. I googled them at the time, but they didn't give any particular clue.
I agree that often we do
What about augmenting the worker state reported by
Or even better: assume the subsystems are garden, baggageclaim, foo. We represent each of these as one letter:
The day the Concourse worker has a new subsystem, say
This would GREATLY improve the life of operators :-)
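For illustration only, here is a minimal sketch of what a one-letter-per-subsystem status could look like. Everything in it is an assumption: the function name `status_flags`, the choice of the upper-case initial for a healthy subsystem and `-` for an unhealthy one, and the idea that health arrives as a simple name-to-boolean mapping. It is not how Concourse or fly reports worker state today.

```python
# Hypothetical subsystem list; "foo" is only the placeholder used in this thread.
SUBSYSTEMS = ["garden", "baggageclaim", "foo"]


def status_flags(health, subsystems=SUBSYSTEMS):
    """Return one character per subsystem, in a fixed order.

    A healthy subsystem is shown as its upper-case initial,
    an unhealthy or unreported one as '-'.
    """
    flags = []
    for name in subsystems:
        healthy = health.get(name, False)  # missing entries count as unhealthy
        flags.append(name[0].upper() if healthy else "-")
    return "".join(flags)


if __name__ == "__main__":
    # garden and foo are fine, baggageclaim is not
    print(status_flags({"garden": True, "baggageclaim": False, "foo": True}))  # G-F
```

Adding a subsystem in this sketch only means appending a name to the list, so the status string grows by one character and existing positions keep their meaning.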
I think I should put this comment in a separate ticket, because it is unrelated to whether the worker paralyzes the server or not. It is much broader: it is about easy-to-obtain worker diagnostics.