Worker (.net version) is notoriously flaky (workaround included) #138
Comments
Duplicate of #136 (but adding much more info and a workaround). I'm doing Bret's course (120% useful 😉) and I can confirm the same behaviour with the current Docker CE version.
One issue that has caused this is the lack of correct
I must have an older version of your repo, because the worker is listed before the db and I had this issue. Creating the db service before the worker fixed it for me!
I'm thinking this issue should be good to go now. Thoughts, @BretFisher?
Description
Something in the .NET version of the worker app causes it to randomly fail to connect to db or redis on startup, which results in a crash. Over the last year I've had thousands of students use this app to learn Docker and Swarm, and one of the most common issues is this worker failing on startup. It happens on all modern Docker versions, across platforms, and there is no common theme to why it fails.
Today in testing, we killed the broken service, which had been re-creating its task over and over, with every task failing. Once we recreated the service, it worked, with no changes in how we created it. See the log below for typical behavior. The stack trace says it can't resolve something, but it doesn't show what it failed to look up, so I can't tell what it thinks the problem is. From all the cases I've seen and the testing I've done, it's not related to other services being down or to general network issues.
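For anyone trying to narrow this down, here is a minimal Python sketch of the kind of startup check that would at least name the hostname that fails to resolve, and retry before giving up. It is not what the .NET or Java worker does. It assumes the failure is a transient name lookup, which the stack trace hints at but does not confirm. The helper name `wait_for_host`, its parameters, and the linear backoff are hypothetical choices made for illustration only.

```python
import socket
import time


def wait_for_host(hostname, attempts=5, delay=1.0,
                  resolve=socket.gethostbyname, sleep=time.sleep):
    """Resolve `hostname`, retrying on failure.

    Hypothetical helper: `resolve` and `sleep` are injectable so the
    retry logic can be exercised without a real network.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return resolve(hostname)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            last_error = exc
            # Log which name failed, the detail missing from the stack trace.
            print(f"attempt {attempt}/{attempts}: cannot resolve {hostname!r}: {exc}")
            if attempt < attempts:
                sleep(delay * attempt)  # linear backoff between attempts
    raise RuntimeError(
        f"could not resolve {hostname!r} after {attempts} attempts"
    ) from last_error
```

Calling `wait_for_host("db")` and `wait_for_host("redis")` before opening the real connections would show which lookup fails, assuming the services are reachable under those names in your stack.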
This also happens for Kubernetes, as seen by other issues reported in this repo.
Hundreds of people have reported this problem to me, and deploying the java version fixes the issue.
Workaround
Deploy the Java version of the worker, which I have built here:
bretfisher/examplevotingapp_worker:java
Steps to reproduce the issue, if relevant:
In swarm:
Describe the results you received:
Notice below that we created a worker service with two replicas. One replica works and the other fails, then gets re-created on the same node and works the second time. Whether a replica fails, and which one, is random.
Describe the results you expected:
Worker always works :)
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
Output of docker info:
Additional environment details (AWS, Docker for Mac, Docker for Windows, VirtualBox, physical, etc.):
This has happened on Docker Desktop, Digital Ocean, Docker Toolbox on VirtualBox, and more.