New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove attachable network on swarm leave #30157
Conversation
Seems like it's the right thing to do... what happens to the networking in the container when this happens? Also, build failure:
|
Thanks @cpuguy83
The network interface which was connecting the container to the attachable network is removed. |
@@ -468,3 +468,35 @@ func (daemon *Daemon) deleteNetwork(networkID string, dynamic bool) error { | |||
func (daemon *Daemon) GetNetworks() []libnetwork.Network { | |||
return daemon.getAllNetworks() | |||
} | |||
|
|||
// clearAttachableNetworks remove the attachable networks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove -> removes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I will fix
ping @mavenugo |
continue | ||
} | ||
containerID := sb.ContainerID() | ||
if err := daemon.DisconnectContainerFromNetwork(containerID, nwName, true); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use network-id instead of network-name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. This API can take either name of ID (it eventually calls FindNetwork()
), it was in fact the first function which processes a network disconnect POST request. I am saying "it was" because later with the addition of swarm networks being lazily programmed FindNetwork
had to be called before anyway.
Edit: ^^Looks like that FindNetwork call in the postNetworkDisconnect() has now been removed from master as well (it was there in 1.11.x branch where I was mistakenly looking at)
In this case the network is known to this node and there is no need to pass the ID.
But I agree it will spare the reader from wondering about this logic when reading the code, I will change it to ID.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall a case where both a "bridge" and "overlay" network existed on a node (due to a race?), resulting in the service failing to start. Although "corner case", removing the network by name could potentially result in the wrong network being deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, then it makes sense.
If user created that scenario, it is expected from him to handle things right when calling docker network rm ...
.
But in this case, being an automatic action taken by the daemon, we must delete the right network, with no user intervention.
So the right thing to do here is to pass the network ID as you suggested. Thanks for your comment.
LGTM @aboch curious to know how this will behave in a case where all the swarm managers leave, while the worker node (with an attachable network and containers) stays ? I guess, it will fix itself when the worker rejoins to new managers I assume. Can you pls confirm ? |
@mavenugo Not sure if this is the test you wanted to check. Is the worker node going into a failed state if we wait long enough and cluster provider be set to nil ? |
@aboch yes. I think that is reasonable and expected behavior. am good. |
// after disconnecting any connected container | ||
func (daemon *Daemon) clearAttachableNetworks() { | ||
var networks []libnetwork.Network | ||
for _, n := range daemon.GetNetworks() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering, should we combine the two loops, and simply
for _, n := range daemon.GetNetworks() {
if !n.Info().Attachable() {
continue
}
Or is there a special reason for using the intermediate networks
variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it in fact looks like Getnetworks()
returns a new slice of networks.
Then there is no need to create a new slice here.
Will fix it shortly. Thanks.
@thaJeztah Taken care of your comment, and retested it, thanks. PTAL when you get a chance. |
@aboch Can |
Thanks @tonistiigi Unless I missed it, Also, I do not think this fix will go in 1.13.x |
LGTM |
@dnephin Addressed your comment, PTAL (I am having some weird build error, so I just pushed the change to speed up the review) |
LGTM Thank you! |
- When the node leaves the cluster, if any user run container(s) is connected to the swarm network, then daemon needs to detach the container(s) and remove the network. Signed-off-by: Alessandro Boch <aboch@docker.com>
container(s) is connected to the swarm (attachable) network,
then daemon needs to detach the container(s) and
remove the network.
Fixes #30084
- A picture of a cute animal (not mandatory but encouraged)