Remove attachable network on swarm leave #30157

aboch · 2017-01-14T04:21:16Z

When a node leaves the cluster, if any user-run
containers are connected to a swarm (attachable) network,
the daemon needs to detach those containers and
remove the network.

cpuguy83 · 2017-01-14T13:28:31Z

Seems like it's the right thing to do... what happens to the networking in the container when this happens?

Also, build failure:

05:42:45 ./docker_cli_swarm_test.go:380: undefined: strings.TrimSpae
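The compiler error above points at a misspelled identifier; the standard library function is `strings.TrimSpace`. As a minimal sketch (the helper name `cleanOutput` and the sample input are assumptions for illustration, not code from this PR), trimming command output might look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// cleanOutput is a hypothetical helper: it strips leading and trailing
// whitespace (including the trailing newline CLI output usually carries).
// Only strings.TrimSpace is a real standard library call here.
func cleanOutput(out string) string {
	return strings.TrimSpace(out)
}

func main() {
	fmt.Printf("%q\n", cleanOutput("  abc123\n"))
}
```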

aboch · 2017-01-14T16:55:53Z

Thanks @cpuguy83

> what happens to the networking in the container when this happens?

The network interface which was connecting the container to the attachable network is removed.

allencloud · 2017-01-16T05:39:55Z

daemon/network.go

@@ -468,3 +468,35 @@ func (daemon *Daemon) deleteNetwork(networkID string, dynamic bool) error {
 func (daemon *Daemon) GetNetworks() []libnetwork.Network {
 	return daemon.getAllNetworks()
 }
+
+// clearAttachableNetworks remove the attachable networks


remove -> removes?

Thanks, I will fix it.

aboch · 2017-01-18T22:15:51Z

ping @mavenugo

thaJeztah · 2017-01-25T15:50:35Z

daemon/network.go

+				continue
+			}
+			containerID := sb.ContainerID()
+			if err := daemon.DisconnectContainerFromNetwork(containerID, nwName, true); err != nil {


Should this use network-id instead of network-name?

Not necessarily. This API accepts either a name or an ID (it eventually calls FindNetwork()); it was in fact the first function to process a network disconnect POST request. ~~I am saying "it was" because later, with the addition of lazily programmed swarm networks, FindNetwork had to be called beforehand anyway.~~
Edit: ^^It looks like that FindNetwork call in postNetworkDisconnect() has now been removed from master as well (it was present in the 1.11.x branch, which I was mistakenly looking at).

In this case the network is known to this node and there is no need to pass the ID.

But I agree that passing the ID will spare the reader from wondering about this logic when reading the code, so I will change it to the ID.

Thanks.

I recall a case where both a "bridge" and an "overlay" network with the same name existed on a node (due to a race?), resulting in the service failing to start. Although it is a corner case, removing the network by name could potentially result in the wrong network being deleted.

Ah, then it makes sense.
If the user created that scenario, they are expected to handle it correctly when calling docker network rm ....
But here, because this is an automatic action taken by the daemon, we must delete the right network with no user intervention.
So the right thing to do is to pass the network ID, as you suggested. Thanks for your comment.
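To illustrate the concern discussed in this thread, here is a minimal sketch of why a name lookup can be ambiguous while an ID lookup is not. The `network` type, the helper functions, and the sample IDs are all hypothetical; this does not reproduce libnetwork's actual lookup code.

```go
package main

import "fmt"

// network is a hypothetical stand-in for a daemon-side network record.
type network struct {
	ID     string
	Name   string
	Driver string
}

// findByName returns every network carrying the given name.
// More than one result means a name-based delete is ambiguous.
func findByName(nets []network, name string) []network {
	var out []network
	for _, n := range nets {
		if n.Name == name {
			out = append(out, n)
		}
	}
	return out
}

// findByID returns the network with the given ID.
// IDs are assumed to be unique in this sketch.
func findByID(nets []network, id string) (network, bool) {
	for _, n := range nets {
		if n.ID == id {
			return n, true
		}
	}
	return network{}, false
}

func main() {
	nets := []network{
		{ID: "aaa111", Name: "mynet", Driver: "bridge"},
		{ID: "bbb222", Name: "mynet", Driver: "overlay"},
	}
	fmt.Println("matches by name:", len(findByName(nets, "mynet")))
	if n, ok := findByID(nets, "bbb222"); ok {
		fmt.Println("match by ID uses driver:", n.Driver)
	}
}
```

With two networks sharing a name, the name lookup returns both candidates, whereas the ID lookup selects exactly one.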

mavenugo · 2017-01-25T17:26:40Z

LGTM

@aboch I am curious how this behaves when all the swarm managers leave while a worker node (with an attachable network and containers) stays. I assume it will fix itself when the worker rejoins new managers. Can you please confirm?

aboch · 2017-01-25T19:10:35Z

@mavenugo
Nothing happens. The worker nodes keep trying to reach the manager. When I restart the manager, they reconnect. The network is never removed and the containers never detached.

I am not sure whether this is the scenario you wanted tested.

Does the worker node go into a failed state, with the cluster provider set to nil, if we wait long enough?
If so, the cleanup logic would kick in.

mavenugo · 2017-01-25T20:11:10Z

@aboch Yes, I think that is reasonable and expected behavior. I am good.

thaJeztah · 2017-01-25T21:18:28Z

daemon/network.go

+// after disconnecting any connected container
+func (daemon *Daemon) clearAttachableNetworks() {
+	var networks []libnetwork.Network
+	for _, n := range daemon.GetNetworks() {


I wonder whether we should combine the two loops and simply do:

for _, n := range daemon.GetNetworks() {
	if !n.Info().Attachable() {
		continue
	}

Or is there a special reason for using the intermediate networks variable?

Thanks. It does in fact look like GetNetworks() returns a new slice of networks,
so there is no need to create a new slice here.
I will fix it shortly.
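The single-loop shape suggested in this review can be sketched as follows. This is a simplified model under stated assumptions: `netInfo`, `fakeDaemon`, and the recorded action strings are hypothetical stand-ins, and the real daemon calls (such as DisconnectContainerFromNetwork and DeleteManagedNetwork) are replaced by log entries.

```go
package main

import "fmt"

// netInfo is a hypothetical, simplified view of a network.
type netInfo struct {
	ID         string
	Name       string
	Attachable bool
	Containers []string
}

// fakeDaemon is a hypothetical daemon that records the actions it would take.
type fakeDaemon struct {
	networks []netInfo
	actions  []string
}

// getNetworks returns a fresh copy of the slice, so callers can iterate
// over it directly without building an intermediate slice of their own.
func (d *fakeDaemon) getNetworks() []netInfo {
	return append([]netInfo(nil), d.networks...)
}

// clearAttachableNetworks filters and cleans up in a single loop:
// non-attachable networks are skipped, containers are detached first,
// and the network is then removed by ID.
func (d *fakeDaemon) clearAttachableNetworks() {
	for _, n := range d.getNetworks() {
		if !n.Attachable {
			continue
		}
		for _, c := range n.Containers {
			d.actions = append(d.actions, "disconnect "+c+" from "+n.ID)
		}
		d.actions = append(d.actions, "delete "+n.ID)
	}
}

func main() {
	d := &fakeDaemon{networks: []netInfo{
		{ID: "n1", Name: "bridge0", Attachable: false, Containers: []string{"c0"}},
		{ID: "n2", Name: "overlay0", Attachable: true, Containers: []string{"c1", "c2"}},
	}}
	d.clearAttachableNetworks()
	for _, a := range d.actions {
		fmt.Println(a)
	}
}
```

Because the accessor already hands back a new slice, the loop can filter and act in one pass.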

aboch · 2017-01-25T21:59:32Z

@thaJeztah I have addressed your comment and retested, thanks. PTAL when you get a chance.

tonistiigi · 2017-01-25T22:01:07Z

@aboch Can DeleteManagedNetwork call back to cluster pkg? Want to make sure that there isn't a possibility for a deadlock as that issue is not fixed in v1.13.

aboch · 2017-01-25T22:16:21Z

Thanks @tonistiigi Unless I missed it, DeleteManagedNetwork and the downstream calls in libnetwork do not call back to the docker cluster pkg.

Also, I do not think this fix will go into 1.13.x.
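The deadlock concern raised above is the general hazard of a cleanup routine calling back into a component whose lock is still held. The following sketch is not the Docker cluster package; the `cluster` type, its fields, and its methods are hypothetical. It shows one common way to avoid the problem: release the lock before invoking the callback.

```go
package main

import (
	"fmt"
	"sync"
)

// cluster is a hypothetical component guarded by a mutex.
type cluster struct {
	mu    sync.Mutex
	nodes int
}

// Nodes takes the lock. Calling it while mu is already held by the same
// goroutine would block forever, because sync.Mutex is not reentrant.
func (c *cluster) Nodes() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.nodes
}

// Leave updates state under the lock, then releases the lock before
// running cleanup, so cleanup may safely call back into the cluster.
func (c *cluster) Leave(cleanup func()) {
	c.mu.Lock()
	c.nodes = 0
	c.mu.Unlock()

	cleanup()
}

func main() {
	c := &cluster{nodes: 3}
	seen := -1
	c.Leave(func() { seen = c.Nodes() })
	fmt.Println("nodes seen by cleanup:", seen)
}
```

If the cleanup path never calls back into the locked component, as reported for DeleteManagedNetwork in this thread, the hazard does not arise in the first place.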

tonistiigi · 2017-01-25T22:18:16Z

LGTM

mavenugo · 2017-01-26T13:03:35Z

@dnephin @aboch are we good to go with this PR? FYI, this is marked for 1.13.1, and @vieux is in the process of cherry-picking the relevant patches to the 1.13.x branch.

aboch · 2017-01-26T18:38:47Z

@dnephin Addressed your comment, PTAL

(I am having some weird build error, so I just pushed the change to speed up the review)

dnephin · 2017-01-26T18:42:14Z

LGTM

Thank you!

aboch · 2017-01-26T19:06:33Z

CI failures due to #28409. @dnephin is looking into it.

- When the node leaves the cluster, if any user run container(s) is connected to the swarm network, then daemon needs to detach the container(s) and remove the network. Signed-off-by: Alessandro Boch <aboch@docker.com>

GordonTheTurtle added the status/0-triage label Jan 14, 2017

cpuguy83 added status/1-design-review and removed status/0-triage labels Jan 14, 2017

allencloud reviewed Jan 16, 2017

mavenugo added this to the 1.13.1 milestone Jan 25, 2017

thaJeztah reviewed Jan 25, 2017

vieux added the process/cherry-pick label Jan 25, 2017

thaJeztah reviewed Jan 25, 2017

Remove attachable network on swarm leave

3cedca5

tiborvass merged commit 43544cf into moby:master Jan 27, 2017

vieux added process/cherry-picked and removed process/cherry-pick labels Jan 27, 2017

vieux mentioned this pull request Jan 27, 2017

1.13.1 rc1 cherrypicks #30331

Merged

aboch mentioned this pull request Apr 7, 2017

Daemon to take care of ingress cleanup on cluster leave and graceful shutdown #32283

Merged

aboch deleted the att branch November 8, 2017 19:52
