New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent unmanaged pods in GKE's containerd flavors #18486
Conversation
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of question, but overall LGTM.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@nebril can you point to which PR did the backports of this PR for v1.11? Or switch the labels back if this was not backported to 1.11 yet? EDIT: Found it, see the link below. |
@bmcustodio Please consider updating the official upgrade guide in future when communicating to users about necessary steps for them to take during upgrade; users are likely to miss notes marked "important" in the release notes unless both we and they are paying close attention. For v1.11.2 I will call it out explicitly in the announcements. |
Hmm, actually looking back again at this PR I think that in the announcement I miscommunicated this because I assumed that the advice was specific to GKE users. It looks like all users must take action regarding the taints. Can you confirm? So far this is only part of v1.11.2 so shouldn't have a huge impact but I am concerned about communicating this clearly so we don't end up having to field a large number of questions from confused users about why the taints are different than they were before. |
Yes, this is correct ;) |
The changes that we have been doing to
/etc/defaults/kubelet
are reset on node reboots, as is apparently the whole/etc
directory — which also means that/etc/cni/net.d/05-cilium.conf
is removed.This would not be a problem if the assumption we made that the node taint we recommend placing on the nodes would come back upon reboots held true, but in practice it doesn't.
Besides this, it seems that containerd will re-instante its CNI configuration file, and it will do so way before Cilium has had the chance to re-run on the node and re-create its CNI configuration, causing pods to be assigned IPs by the default CNI rather than by Cilium in the meantime.
This commit attempts at preventing that from happening by observing that
/home/kubernetes/bin/kubelet
(i.e. the actual kubelet binary) is kept between reboots and executed concurrently with containerd by systemd. We leverage on this empirical observation to replace this file kubelet with a wrapper script that, under the required conditions, disables containerd, patches its configuration, removes undesired CNI configuration files, re-enables containerd and becomes the kubelet.Additionally, to prevent situations in which the GKE node is forcibly stopped and re-created from causing unmanaged pods, and building on the observation that the node comes back with the same name and pods are already scheduled there, we change the recommended taint effect from
NoSchedule
toNoExecute
, to cause any previously scheduled pods to be evicted, preventing them from getting IPs assigned by the default CNI.This should not impact other environments due to the nature ofNoExecute
, so we recommend it everywhere.