Is restarting cri-containerd after configuring container networking required? #545
I've posted this over on the kube-router repo because I'm unclear whether this is an issue with cri-containerd or kube-router (or neither, maybe it's just me!). Here's the link: …
CNI config is static, and maybe you are configuring it late in the process, thus requiring a reboot to load the newly changed config? @Random-Liu wdyt
Yes, when kube-router starts it alters the configuration in …
Based on the code, CNI should be updated with the latest configuration. If it doesn't work, it might be a bug.
@Random-Liu This is a weird state I am encountering during cluster bootstrapping only. I do not intend to change the CIDR during normal operation. My intent is to make sure things are configured correctly before I start any (non-host-network) pods. At the moment, it seems I am in a chicken-and-egg scenario: starting kube-router (as a daemonset) requires cri-containerd to be running, and during the startup process of kube-router, the CNI configuration on each node is modified:
Can you help me understand what you mean by this?
@Random-Liu This is a similar issue to the cni stuff that I discussed with you last week. @tkellen I think restarting containerd might trigger a restart of cri-containerd (because of a watch on cri-containerd; @Random-Liu correct me if I am wrong here). So that might be the reason why things start to work after reloading containerd. Feel free to open a bug and I can take a look at this.
Thanks for following up @abhi! I've just confirmed restarting …
Heads up: I've just edited my prior posts (and the title of this issue) to reference restarting …
Let me also share the config contents. Placed on the host during cluster bootstrapping:

```json
{
  "name": "kubernetes",
  "type": "bridge",
  "bridge": "kube-bridge",
  "ipam": {
    "type": "host-local"
  },
  "isDefaultGateway": true
}
```

…and here is how kube-router modifies it during startup (just adding the subnet):

```json
{
  "bridge": "kube-bridge",
  "ipam": {
    "subnet": "10.1.0.0/24",
    "type": "host-local"
  },
  "isDefaultGateway": true,
  "name": "kubernetes",
  "type": "bridge"
}
```
@abhi Is there anything else I can provide that would be useful here? You mentioned having me open a bug, but I'm not sure where to do that.
@tkellen I mean our cni package uses fsnotify to watch the cni config directory. If …
There are debug prints for every event from the watcher (logged as …). A long time ago I think I observed that if …
No, looks like there is an explicit check and error for that case these days.
It seems that … Aside: am I the only one who is very confused by …?
This is not quite accurate. If there is no configuration present during …
Even if a valid configuration is found at start of day we still need to watch for updates to that config. This is best achieved (to avoid racing with the watcher) by moving the initial sync into `monitorNetDir`. Discovered via containerd/cri#545. Signed-off-by: Ian Campbell <ijc@docker.com>
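The race described in the commit message can be illustrated with a stdlib-only sketch. The fsnotify watcher is stood in for by a plain channel; the name `monitorNetDir` comes from the commit message, but the bodies here are invented for illustration. The point: the initial sync runs inside the monitor goroutine, after the watch is already registered, so no config update can land in a gap between the first load and the start of event handling.

```go
package main

import (
	"fmt"
)

type syncer struct {
	conf string // last loaded CNI config, simplified to a string
}

func (s *syncer) loadConf(c string) { s.conf = c }

// monitorNetDir does the initial sync first, then applies watcher events,
// so a config that appears during startup is never missed or raced past.
func (s *syncer) monitorNetDir(initial string, events <-chan string, done chan<- struct{}) {
	s.loadConf(initial) // initial sync, now inside the monitor loop
	for c := range events {
		s.loadConf(c) // apply each subsequent config update
	}
	close(done)
}

func main() {
	s := &syncer{}
	events := make(chan string, 1)
	events <- "conf-with-subnet" // an update arriving during startup
	close(events)
	done := make(chan struct{})
	go s.monitorNetDir("conf-without-subnet", events, done)
	<-done
	fmt.Println(s.conf) // the later update wins
}
```

Had the initial load run before the watch was registered, an update in that window would be lost until the next restart, which matches the symptom reported in this issue.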
cri-o/ocicni#13 is my fix for this; once it is merged I'll PR in an update here.
I've put this and another fix related to this and #573 in my …
Thank you @ijc! If you'd be willing to give me a bit of guidance about how I could test this I'd be glad to verify whether it fixes the issue. If that seems like too much work I'll keep my eyes open for the next release that includes these changes and report my findings then.
@tkellen In your usual … Or you could do: …
To check out exactly my branch into a new local branch called …
I am still seeing this behavior. Could we re-open this? I'm planning to provide some detailed logging output on this in the next day or two.
@tkellen Yeah, it is required now. We discussed with the networking team, and they think that dynamically changing the CNI config should not be supported. If you really want to do it in 2 steps, e.g. one daemonset deploys the initial cni config and the second changes the CIDR, can you have the first daemonset put the cni config in a different directory, and let the second one change the cni config and copy it to …
Thanks for the prompt reply @Random-Liu! You're right, supporting dynamically changing config doesn't make any sense. For what it's worth, from the perspective of someone operating kube-router, the configuration is not changing dynamically; it's just briefly in a bootstrapping state. Would you accept a PR that defers reading a newly appearing configuration file for a few seconds? If that seems like a colossal hack to you, I'll accept that this is a wrinkle in the implementation of kube-router that should be smoothed downstream. Do you have any thoughts on this @murali-reddy?
@tkellen Let's reopen this issue. We can either fix … @tkellen Since you said that … @abhi Are you ok with changing containerd to support dynamic cni config loading? Last time I discussed with @freehan, and he also didn't think we should support dynamic loading.
@Random-Liu I still don't think we should do dynamic config behind the scenes.
SGTM, but maybe we could just set up a file system notifier in dynamic mode.
Fixed by #825
Thanks all! Looking forward to testing this out. Apologies I couldn't be more directly helpful.
Confirmed resolved! Thanks again 💃 |
First, thank you for cri-containerd. I've been using it a lot lately and it works wonderfully!
I'm experiencing an issue related to bootstrapping a cluster that I am hoping you all can help with.
If I don't restart cri-containerd after deploying my container networking pods, any pod not connected to the host network fails to start.
I have a full example process, including the cri-containerd restart documented here:
https://github.com/tkellen/kwm#first-time-user-guide
If you skip the cri-containerd restart step you'll see that kube-dns (or any other pod not connected to the host network) fails to start with the error "Failed create pod sandbox". As soon as you restart cri-containerd, the issue resolves itself. Is this expected? If not, can you help me understand what the correct process for this would be?
Thanks so much!