[17.09] Fix reapTime logic in NetworkDB + handle cleanup DNS for attachable container #2017
Hopefully this backport gets into 17.09.1. The current stable release corrupts the managers' quorum when the number of overlay networks gets close to 2000; in addition, some workers go down randomly.
It starts to log lots of
Eventually, it logs a failure
Nov 20, 2017
No, I don't have a date for 17.09.1.
17.10 and 17.11 are edge releases, so they would only get critical ("P0") patches; with 17.11 released, Docker 17.10 has reached EOL.
For your specific issue, the error message indicates it may not be an issue in the networking stack, but rather the raft state becoming too big to sync between managers. A pull request in SwarmKit, docker/swarmkit#2375, raises the maximum size to 128MB; it is being backported to 17.09.1 through docker/docker-ce#323 and should address the direct problem.
A bigger change is being worked on in docker/swarmkit#2458, which will use a streaming mechanism to send raft snapshots.
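To make the difference between the two approaches concrete, here is a rough sketch, not SwarmKit's actual code. It assumes 128MB means 128 × 1024 × 1024 bytes, and the names `MAX_MSG_SIZE`, `fits_in_one_message`, and `split_into_chunks` are hypothetical, chosen only for illustration. With a fixed cap, a snapshot either fits in one message or the sync fails; with streaming, the snapshot is sent as a sequence of smaller pieces, so its total size no longer has to fit under the cap.

```python
# Hypothetical illustration only; none of these names come from SwarmKit.

# Assumed interpretation of the "128MB" limit: 128 * 1024 * 1024 bytes.
MAX_MSG_SIZE = 128 * 1024 * 1024


def fits_in_one_message(snapshot_size: int, limit: int = MAX_MSG_SIZE) -> bool:
    """Return True if a snapshot of this size can be sent as a single message."""
    return snapshot_size <= limit


def split_into_chunks(payload: bytes, chunk_size: int):
    """Yield consecutive slices of payload, each at most chunk_size bytes long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]
```

A receiver would concatenate the chunks in order to rebuild the snapshot, for example `b"".join(split_into_chunks(data, 4))`.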
@thaJeztah the raft state becoming too big only happened when approximately 2000 overlay networks were created within a 30-minute window. In contrast, creating 2000 services worked fine. Supposedly the raft messages for service creation are bigger.
Is there a way to know if
I just tested on
Yes; the size change was in 17.09.0-ce-rc3; https://github.com/docker/docker-ce/blob/v17.09.0-ce-rc3/components/engine/vendor/github.com/docker/swarmkit/manager/manager.go#L59
Note that rc3 is a release candidate, not the final release (17.09.0-ce was released after that).