[17.09] Fix reapTime logic in NetworkDB + handle cleanup DNS for attachable container #2017

Merged
6 commits merged into docker:bump_17.09 on Nov 20, 2017

Conversation

3 participants
@thaJeztah
Member

thaJeztah commented Nov 20, 2017

Cherry-pick of #1944 and #1960 for 17.09

git checkout -b 17.09-backport-netdb-fix-reap upstream/bump_17.09
git cherry-pick -s -S -x 10cd98c56fb17737834c1031cad56a9932a266f4
git cherry-pick -s -S -x 3feb3aadf388d64bb8ea7e0a6614933d3d8b885a
git cherry-pick -s -S -x fbba555561bab4af332d6c1ee592976f16c609c3

Also included #1985 and #1991, which appear to depend on these changes

# PR: https://github.com/docker/libnetwork/pull/1985
git cherry-pick -s -S -x 1c04e1980dfd6debab6e43d9db672c0f01528868

# PR: https://github.com/docker/libnetwork/pull/1991
git cherry-pick -s -S -x 52a9ab55ce92a541798d130a87a84ca34f0b6178

@thaJeztah thaJeztah changed the title from [17.09] Fix reapTime logic in NetworkDB to [17.09] Fix reapTime logic in NetworkDB + handle cleanup DNS for attachable container Nov 20, 2017

@eduardolundgren

eduardolundgren commented Nov 20, 2017

@thaJeztah thank you for backporting this; would it be possible to make a release with it?

@thaJeztah
Member

thaJeztah commented Nov 20, 2017

@eduardolundgren this cherry-pick is in preparation for a possible 17.09.1 release; it's still being decided which changes need to be backported and which do not.

@thaJeztah
Member

thaJeztah commented Nov 20, 2017

Looks like I may have to cherry-pick #1947 into this one to fix the lint errors

@eduardolundgren

eduardolundgren commented Nov 20, 2017

Hopefully, this backport gets into 17.09.1; the current stable release corrupts the managers' quorum when the number of overlay networks gets close to 2000, and in addition some workers go down randomly.

It starts to log lots of netPeers messages:

e0cc780) - netID:kt2qn6cpuq7okhcy3d5pzo6ps leaving:false netPeers:10 entries:20 Queue qLen:0 netMsg/s:0"

Then, segmented wal file info appears:

dockerd[2069]: segmented wal file /mnt/ebs/var/lib/docker/swarm/raft/wal-v3-encrypted/0000000000000039-00

Eventually, it logs a failure:

dockerd[2069]: time="" level=error msg="Failed to commit allocation of network resources for node w58o35tebq6atsq1kgh665cow" error="raft: raft message is too large and can't be sent" module=node node.id=7ih3j4dbd0f3r6zy0uhvicr0o
@fcrisciani
Member

fcrisciani commented Nov 20, 2017

@thaJeztah the lint commit is in, can you try to rebase?

fcrisciani added some commits Sep 19, 2017

Fix reapTime logic in NetworkDB
- Added remainingReapTime field in the table event.
  Without it, a node that did not have a state for the element
  was marking the element for deletion, setting the max reapTime.
  This created the possibility of the entry being resynced
  between nodes forever, defeating the purpose of the reap time
  itself.

- On broadcast of the table event, the node owner was rewritten
  with the local node name. This was not correct, because the owner
  should remain the original one of the message.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit 10cd98c)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Changed ReapTable logic
- Changed the loop to be per network. The previous implementation took a
  ReadLock to update the reapTime, but now, with the residualReapTime,
  the bulkSync also uses the same ReadLock, creating possible
  issues with concurrent reads and updates of the value.
  The new logic fetches the list of networks and proceeds with the
  cleanup network by network, locking the database and releasing it
  after each network. This should ensure fair locking, avoiding
  keeping the database blocked for too long.

  Note: the ticker does not guarantee that the reap logic runs
  precisely every reapTimePeriod; the documentation actually says that
  if the routine takes too long it will skip ticks. In case of a slowdown
  of the process itself, it is possible that the lifetime of the
  deleted entries increases. This still should not be a huge problem,
  because the residual reap time is now propagated among all the nodes;
  a slower node will let the deleted entry be repropagated multiple
  times, but the state will still remain consistent.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit 3feb3aa)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Avoid alignment of reapNetwork and tableEntries
Make sure that the network is garbage collected after
the entries. Deleting entries requires that the network
is present.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit fbba555)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Fix comparison against wrong constant
The comparison was against the wrong constant value.
As described in the comment, the check is there to guarantee
that events related to stale deleted elements are not propagated.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit 6f11d29)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Handle cleanup DNS for attachable container
Attachable containers are tasks with no associated service;
their cleanup was not done properly, so it was possible to leak
their name resolution if that was the last container
on the network.
cleanupServiceBindings was not able to do the cleanup because there
is no service, and the delete notification arrives
after the network is already being cleaned up.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit 1c04e19)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add test for cleanupServiceDiscovery
Unit test for cleanupServiceDiscovery,
follow-up of PR #1985

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit 52a9ab5)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah
Member

thaJeztah commented Nov 20, 2017

rebased 👍

@fcrisciani
Member

fcrisciani commented Nov 20, 2017

LGTM

@fcrisciani fcrisciani merged commit 690b4c0 into docker:bump_17.09 Nov 20, 2017

2 checks passed

ci/circleci Your tests passed on CircleCI!
Details
dco-signed All commits are signed

@thaJeztah thaJeztah deleted the thaJeztah:17.09-backport-netdb-fix-reap branch Nov 20, 2017

@eduardolundgren

eduardolundgren commented Nov 20, 2017

Do you know when the 17.09.1 release is going to happen?
Is this going to also get into 17.10.x and 17.11.x?
Sorry for so many questions :)
Thank you again for the fast response.

@thaJeztah
Member

thaJeztah commented Nov 21, 2017

No, I don't have a date for 17.09.1.

17.10 and 17.11 are edge releases, so they would only get critical ("P0") patches; with 17.11 being released, Docker 17.10 reached EOL.

For your specific issue, the error message indicates it may not be an issue in the networking stack, but rather the raft state becoming too big to sync between managers. This pull request in SwarmKit raises the maximum message size to 128 MB: docker/swarmkit#2375. It is being backported to 17.09.1 through docker/docker-ce#323, and should address the direct problem.

A bigger change is being worked on in docker/swarmkit#2458, which will use a streaming mechanism to send raft snapshots.

@thaJeztah
Member

thaJeztah commented Nov 21, 2017

Actually, the size fix was already included in 17.09.0 through docker/docker-ce#242, so I'm not sure if this will address your situation.

@eduardolundgren

eduardolundgren commented Nov 21, 2017

@thaJeztah the raft state becoming too big was only happening when approximately 2000 overlay networks were created in a 30-minute window. In contrast, if 2000 services were created, it was all fine. Supposedly the raft messages for service creation are bigger.

Is there a way to know if 17.09.0-ce-rc3 contains docker/swarmkit#2375?

I just tested on 17.09.0-ce-rc3 from Thu Sep 21 02:32:26 2017 and it seems to be working well. I only had one manager and one worker on this test, though; I will try again with more managers and see if it's still fine.

@thaJeztah
Member

thaJeztah commented Nov 21, 2017

Yes; the size change was in 17.09.0-ce-rc3: https://github.com/docker/docker-ce/blob/v17.09.0-ce-rc3/components/engine/vendor/github.com/docker/swarmkit/manager/manager.go#L59

Note that rc3 is a release candidate, not the final release (17.09.0-ce was released after that).
