This repository has been archived by the owner on Mar 9, 2022. It is now read-only.

Is restarting cri-containerd after configuring container networking required? #545

Closed
tkellen opened this issue Jan 15, 2018 · 28 comments

@tkellen
Contributor

tkellen commented Jan 15, 2018

First, thank you for cri-containerd. I've been using it a lot lately and it works wonderfully!

I'm experiencing an issue related to bootstrapping a cluster that I am hoping you all can help with.

If I don't restart cri-containerd after deploying my container networking pods, any pod not connected to the host network fails to start.

I have a full example process, including the cri-containerd restart documented here:
https://github.com/tkellen/kwm#first-time-user-guide

If you skip the cri-containerd restart step you'll see that kube-dns (or any other pod not connected to the host network) fails to start with the error "Failed create pod sandbox". As soon as you restart cri-containerd, the issue resolves itself. Is this expected? If not, can you help me understand what the correct process for this would be?

Thanks so much!

@tkellen
Contributor Author

tkellen commented Jan 15, 2018

I've posted this over on the kube-router repo because I'm unclear if this is an issue with cri-containerd or kube-router (or neither, maybe it's just me!). Here's the link:
cloudnativelabs/kube-router#286

@mikebrow
Member

mikebrow commented Jan 15, 2018

CNI config is static; maybe you are configuring it late in the process, thus requiring a restart to load the newly changed config? @Random-Liu wdyt

@tkellen
Contributor Author

tkellen commented Jan 15, 2018

Yes, when kube-router starts, it alters the configuration in /etc/cni/net.d/10-kuberouter on each node so that it contains the correct CIDR.

@Random-Liu
Member

Based on the code, CNI should be updated with the latest configuration. If it doesn't work, it might be a bug.
However, @tkellen, will you change the CIDR when there are already non-host-network pods running? Won't that confuse CNI?

@tkellen
Contributor Author

tkellen commented Jan 16, 2018

@Random-Liu This is a weird state I am encountering during cluster bootstrapping only. I do not intend to change the CIDR during normal operation. My intent is to make sure things are configured correctly before I start any (non-host-network) pods.

At the moment, it seems I am in a chicken-and-egg scenario. Starting kube-router (as a daemonset) requires cri-containerd to be running, and during the startup process of kube-router, the CNI configuration on each node is modified:

Kube-router does not perform subnet management and relies upon the Kubernetes controller manager to do the subnet allocation for each node. Each node in the cluster is allocated a unique subnet from the cluster CIDR range for allocation of IPs to the pods running on the node.
https://cloudnativelabs.github.io/post/2017-05-22-kube-pod-networking/

Can you help me understand what you mean by this?

Based on the code, CNI should be updated with the latest configuration. If it doesn't work, it might be a bug.

@abhi
Member

abhi commented Jan 16, 2018

@Random-Liu This is similar to the cni issue I discussed with you last week. @tkellen I think restarting containerd might trigger a restart of cri-containerd (because of a watch on cri-containerd; @Random-Liu correct me if I am wrong here), so that might be the reason why things start to work after reloading containerd. Feel free to open a bug and I can take a look at this.

@tkellen
Contributor Author

tkellen commented Jan 16, 2018

Thanks for following up @abhi! I've just confirmed that restarting cri-containerd also resolves the issue. Where should I open a bug?

@tkellen tkellen changed the title from "Is restarting containerd after configuring container networking required?" to "Is restarting cri-containerd after configuring container networking required?" Jan 16, 2018
@tkellen
Contributor Author

tkellen commented Jan 16, 2018

Heads up: I've just edited my prior posts (and the title of this issue) to reference restarting cri-containerd so future readers aren't sent on a wild goose chase. Also, here are my cri-containerd logs from initial failure to resolution (showing initial startup plus the service restart after kube-router modifies its CNI configuration by communicating with the controller manager to find the correct CIDR):

Jan 16 04:43:05 tiny cri-containerd[2019]: E0116 04:43:05.279222    2019 instrumented_service.go:40] RunPodSandbox for &PodSandboxMetadata{Name:kube-dns-574cf969f4-c64mq,Uid:b6d08be0-fa77-11e7-8e33-12612308a448,Namespace:kube-system,Attem
Jan 16 04:43:19 tiny cri-containerd[2019]: time="2018-01-16T04:43:19Z" level=info msg="Got pod network {Name:kube-dns-574cf969f4-c64mq Namespace:kube-system ID:ba21cf3657bb6e300d0d37a31cf68e6db73a718d21a5812e34787924bc737f7a NetNS:/var/ru
Jan 16 04:43:19 tiny cri-containerd[2019]: time="2018-01-16T04:43:19Z" level=info msg="About to add CNI network cni-loopback (type=loopback)"
Jan 16 04:43:19 tiny cri-containerd[2019]: time="2018-01-16T04:43:19Z" level=info msg="Got pod network {Name:kube-dns-574cf969f4-c64mq Namespace:kube-system ID:ba21cf3657bb6e300d0d37a31cf68e6db73a718d21a5812e34787924bc737f7a NetNS:/var/ru
Jan 16 04:43:19 tiny cri-containerd[2019]: time="2018-01-16T04:43:19Z" level=info msg="About to add CNI network kubernetes (type=bridge)"
Jan 16 04:43:19 tiny cri-containerd[2019]: time="2018-01-16T04:43:19Z" level=error msg="Error adding network: no IP ranges specified"
Jan 16 04:43:19 tiny cri-containerd[2019]: time="2018-01-16T04:43:19Z" level=error msg="Error while adding to cni network: no IP ranges specified"
Jan 16 04:43:19 tiny cri-containerd[2019]: E0116 04:43:19.287222    2019 instrumented_service.go:40] RunPodSandbox for &PodSandboxMetadata{Name:kube-dns-574cf969f4-c64mq,Uid:b6d08be0-fa77-11e7-8e33-12612308a448,Namespace:kube-system,Attem
Jan 16 04:43:27 tiny systemd[1]: Stopping Kubernetes containerd CRI shim...
Jan 16 04:43:27 tiny cri-containerd[2019]: E0116 04:43:27.480775    2019 service.go:217] Failed to serve grpc grpc request: accept unix /var/run/cri-containerd.sock: use of closed network connection
Jan 16 04:43:27 tiny cri-containerd[2019]: E0116 04:43:27.480816    2019 events.go:65] Failed to handle event stream: rpc error: code = Canceled desc = context canceled
Jan 16 04:43:27 tiny cri-containerd[2019]: E0116 04:43:27.480841    2019 service.go:198] Failed to start streaming server: http: Server closed
Jan 16 04:43:27 tiny systemd[1]: Stopped Kubernetes containerd CRI shim.
Jan 16 04:43:27 tiny systemd[1]: Started Kubernetes containerd CRI shim.
Jan 16 04:43:27 tiny cri-containerd[4257]: I0116 04:43:27.593002    4257 cri_containerd.go:100] Run cri-containerd &{Config:{ContainerdConfig:{RootDir:/var/lib/containerd Snapshotter:overlayfs Endpoint:/run/containerd/containerd.sock Runt
Jan 16 04:43:27 tiny cri-containerd[4257]: time="2018-01-16T04:43:27Z" level=info msg="CNI network kubernetes (type=bridge) is used from /etc/cni/net.d/10-kuberouter.conf"
Jan 16 04:43:27 tiny cri-containerd[4257]: time="2018-01-16T04:43:27Z" level=info msg="CNI network kubernetes (type=bridge) is used from /etc/cni/net.d/10-kuberouter.conf"
Jan 16 04:43:33 tiny cri-containerd[4257]: time="2018-01-16T04:43:33Z" level=info msg="Got pod network {Name:kube-dns-574cf969f4-c64mq Namespace:kube-system ID:86a2d5ac7915d99c7388e825635140e8e8936e143dd4fc82ca24556e1470bf16 NetNS:/var/ru
Jan 16 04:43:33 tiny cri-containerd[4257]: time="2018-01-16T04:43:33Z" level=info msg="About to add CNI network cni-loopback (type=loopback)"
Jan 16 04:43:33 tiny cri-containerd[4257]: time="2018-01-16T04:43:33Z" level=info msg="Got pod network {Name:kube-dns-574cf969f4-c64mq Namespace:kube-system ID:86a2d5ac7915d99c7388e825635140e8e8936e143dd4fc82ca24556e1470bf16 NetNS:/var/ru
Jan 16 04:43:33 tiny cri-containerd[4257]: time="2018-01-16T04:43:33Z" level=info msg="About to add CNI network kubernetes (type=bridge)"

@tkellen
Contributor Author

tkellen commented Jan 16, 2018

Let me also share the contents of /etc/cni/net.d/10-kuberouter.conf before and after kube-router is started. I'm candidly a bit out of my depth here so apologies if this is noise.

Placed on the host during cluster bootstrapping:

{
  "name": "kubernetes",
  "type": "bridge",
  "bridge": "kube-bridge",
  "ipam": {
    "type": "host-local"
  },
  "isDefaultGateway": true
}

...how kube-router modifies it during startup (just adding the subnet):

{
  "bridge": "kube-bridge",
  "ipam": {
    "subnet": "10.1.0.0/24",
    "type": "host-local"
  },
  "isDefaultGateway": true,
  "name": "kubernetes",
  "type": "bridge"
}

@tkellen
Contributor Author

tkellen commented Jan 17, 2018

@abhi Is there anything else I can provide that would be useful here? You mentioned having me open a bug but I'm not sure where to do that.

@Random-Liu
Member

Based on the code, CNI should be updated with the latest configuration. If it doesn't work, it might be a bug.

@tkellen I mean that our cni package uses fsnotify to watch the cni config directory. If kube-router changes it, cni should pick up the latest configuration. If that doesn't work, we may want to take a look at what is wrong. @abhi Do you want to take a look at this? :)
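
For illustration, here is a minimal Go sketch of the kind of fsnotify-based watch being described: a watcher on /etc/cni/net.d that reloads configuration when files change. This is a rough sketch under that assumption, not the actual ocicni/cri-containerd code; reloadCNIConfig is a hypothetical placeholder.

package main

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

// reloadCNIConfig is a hypothetical stand-in for re-reading /etc/cni/net.d
// and updating the in-memory network configuration used for pod sandboxes.
func reloadCNIConfig() {
	log.Print("re-reading CNI configuration")
}

func main() {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatalf("failed to create watcher: %v", err)
	}
	defer watcher.Close()

	// Watch the CNI configuration directory for changes.
	if err := watcher.Add("/etc/cni/net.d"); err != nil {
		log.Fatalf("failed to watch /etc/cni/net.d: %v", err)
	}

	for {
		select {
		case ev, ok := <-watcher.Events:
			if !ok {
				return
			}
			// On create/write/rename, pick up the latest configuration.
			if ev.Op&(fsnotify.Create|fsnotify.Write|fsnotify.Rename) != 0 {
				log.Printf("CNI config changed: %s", ev.Name)
				reloadCNIConfig()
			}
		case err, ok := <-watcher.Errors:
			if !ok {
				return
			}
			log.Printf("watcher error: %v", err)
		}
	}
}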

@ijc
Contributor

ijc commented Jan 23, 2018

There are debug prints for every event from the watcher (logged as logrus.Debugf), so with debugging turned up @tkellen ought to be able to see those happening whenever something is changed in the pluginDir (default is /etc/cni/net.d).

A long time ago I think I observed that if /etc/cni/net.d didn't already exist when the watcher was started then the watcher did not initialise correctly. Could that be happening here I wonder?

@ijc
Contributor

ijc commented Jan 23, 2018

A long time ago I think I observed that if /etc/cni/net.d didn't already exist when the watcher was started then the watcher did not initialise correctly. Could that be happening here I wonder?

No, looks like there is an explicit check and error for that case these days.

@ijc
Contributor

ijc commented Jan 29, 2018

If kube-router changes it, cni should pick up the latest configuration. If that doesn't work, we may want to take a look at what is wrong.

It seems that ocicni only watches with fsnotify until it has what it considers a suitable/usable default configuration, as a one-shot operation; it does not watch for subsequent updates such as the ones done by kube-router as reported here. Since that config is then written to the plugin's stdin (rather than reread from disk), we end up passing the original/incomplete configuration. I'll look into ways of fixing that.

aside: am I the only one who is very confused by ocicni's naming of pluginDir being the dir where configuration is stored (i.e. /etc/cni/net.d) and cniDirs being an array of directories where plugins might live?

@ijc
Contributor

ijc commented Jan 29, 2018

it does not watch for subsequent updates such as the ones done by kube-router as reported here.

This is not quite accurate. If there is no configuration present during InitCNI, then it will watch for one arriving and then continue to watch (and, I think, correctly pick up updates). However, if there is a valid configuration present during InitCNI, the watcher is never started.
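
To make that distinction concrete, here is a rough Go sketch of the behaviour described above. The names InitCNI and monitorNetDir follow the ocicni terminology used in this thread; everything else (loadNetworkConfig, netConf, defaultNetwork) is a hypothetical placeholder, and this is an illustration rather than the actual ocicni source.

// Illustrative sketch only; not the real ocicni implementation.
package ocicnisketch

type netConf struct {
	Name string
	Type string
}

var defaultNetwork *netConf

// InitCNI mirrors the behaviour described above: if a valid configuration is
// already present at start of day, it is cached once and the watcher is never
// started, so later edits (e.g. kube-router filling in the subnet) are missed.
func InitCNI(pluginDir string) error {
	conf, err := loadNetworkConfig(pluginDir)
	if err == nil && conf != nil {
		defaultNetwork = conf // cached once; subsequent file changes are ignored
		return nil
	}
	// No usable config yet: watch the directory until one appears,
	// then keep watching for further updates.
	go monitorNetDir(pluginDir)
	return nil
}

// loadNetworkConfig and monitorNetDir are placeholders standing in for the
// real config parsing and fsnotify loop.
func loadNetworkConfig(pluginDir string) (*netConf, error) { return nil, nil }
func monitorNetDir(pluginDir string)                       {}

The fix referenced just below (moving the initial sync into monitorNetDir) effectively makes the watcher run in both branches, so updates are picked up even when a valid configuration already exists at startup.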

ijc pushed a commit to ijc/ocicni that referenced this issue Jan 29, 2018
Even if a valid configuration is found at start of day we still need to watch
for updates to that config.

This is best achieved (to avoid racing with the watcher) by moving the initial
sync into `monitorNetDir`.

Discovered via containerd/cri#545.

Signed-off-by: Ian Campbell <ijc@docker.com>
@ijc
Contributor

ijc commented Jan 29, 2018

cri-o/ocicni#13 is my fix for this; once it is merged I'll PR an update here.

@ijc
Contributor

ijc commented Jan 29, 2018

I've put this fix, and another fix related to this issue and #573, in my cni-fsnotify branch while we wait for that ocicni PR.

@tkellen
Contributor Author

tkellen commented Jan 31, 2018

Thank you @ijc! If you'd be willing to give me a bit of guidance about how I could test this, I'd be glad to verify whether it fixes the issue. If that seems like too much work, I'll keep my eyes open for the next release that includes these changes and report my findings then.

ijc pushed a commit to ijc/ocicni that referenced this issue Feb 1, 2018
Even if a valid configuration is found at start of day we still need to watch
for updates to that config.

This is best achieved (to avoid racing with the watcher) by moving the initial
sync into `monitorNetDir`.

Discovered via containerd/cri#545.

Signed-off-by: Ian Campbell <ijc@docker.com>
@ijc
Contributor

ijc commented Feb 1, 2018

@tkellen In your usual cri-containerd git tree it should be enough to do `git pull https://github.com/ijc/cri-containerd cni-fsnotify`, which should fetch and merge my WIP branch into your current branch.

Or you could do:

git fetch https://github.com/ijc/cri-containerd cni-fsnotify
git checkout -b testing-ijc-thing FETCH_HEAD

To check out exactly my branch into a new local branch called testing-ijc-thing (naming is hard...).

Random-Liu pushed a commit to Random-Liu/ocicni that referenced this issue Feb 2, 2018
Even if a valid configuration is found at start of day we still need to watch
for updates to that config.

This is best achieved (to avoid racing with the watcher) by moving the initial
sync into `monitorNetDir`.

Discovered via containerd/cri#545.

Signed-off-by: Ian Campbell <ijc@docker.com>
@Random-Liu Random-Liu added this to the v1.0.0-rc.0 milestone Feb 2, 2018
@tkellen
Contributor Author

tkellen commented Apr 29, 2018

I am still seeing this behavior. Could we re-open this? I'm planning to provide some detailed logging output on this in the next day or two.

@Random-Liu
Member

@tkellen Yeah, it is required now. We discussed this with the networking team, and they think that dynamically changing the CNI config should not be supported.

If you really want to do it in two steps, e.g. one daemonset to deploy the initial cni config and a second to change the CIDR, can you let the first daemonset put the cni config in a different directory, and let the second one change the cni config and copy it to /opt/net.d/cni? This seems to be the right way to achieve what you want.

@tkellen
Contributor Author

tkellen commented Apr 30, 2018

Thanks for the prompt reply @Random-Liu! You're right, supporting dynamically changing config doesn't make any sense. For what it is worth, from the perspective of someone operating kube-router, the configuration is not changing dynamically; it's just briefly in a bootstrapping state.

Would you accept a PR that defers the read on a new configuration file appearing for a few seconds?

If that seems like a colossal hack to you, I'll accept that this is a wrinkle in the implementation of kube-router that should be smoothed downstream.

Do you have any thoughts on this @murali-reddy?

@Random-Liu
Member

@tkellen Let's reopen this issue.

We can either fix kube-router to not rely on this behavior, or change containerd to support dynamic cni config loading.

@tkellen Since you said that kube-router needs to do multi-stage cni config initialization, is it possible to let kube-router do the initialization in a different directory, and run a container to move the config to /etc/net.d/cni after the initialization is done? Given that we don't want the cni config to be changed while the cluster is being used, this seems to be a safer way to do initialization.

@abhi Are you ok with changing containerd to support dynamic cni config loading? Last time I discussed this with @freehan, he also didn't think we should support dynamic loading.

@Random-Liu Random-Liu reopened this May 1, 2018
@Random-Liu Random-Liu removed this from the v1.0.0-rc.0 milestone May 1, 2018
@abhi
Member

abhi commented May 1, 2018

@Random-Liu I still don't think we should do dynamic config loading behind the scenes.
What we could do is add a config option for cri-containerd, --cli-config-load=dynamic,static, which would load the cni config on every status check. I think that would be a controlled way of supporting it for users who want that behavior. WDYT?

@mikebrow
Member

mikebrow commented May 2, 2018

SGTM, but maybe we could just set up a file system notifier in dynamic mode.

@abhi
Member

abhi commented Jun 22, 2018

Fixed by #825

@abhi abhi closed this as completed Jun 22, 2018
@tkellen
Contributor Author

tkellen commented Jun 25, 2018

Thanks all! Looking forward to testing this out. Apologies I couldn't be more directly helpful.

@tkellen
Contributor Author

tkellen commented Jun 29, 2018

Confirmed resolved! Thanks again 💃
