[v1.10] Backport #18486 #18737

bmcustodio · 2022-02-08T11:23:01Z

Backport of #18486 to v1.10 (cc @joamaki).

The changes we have been making to /etc/defaults/kubelet are reset on node reboots, as apparently is the whole /etc directory, which also means that /etc/cni/net.d/05-cilium.conf is removed. This would not be a problem if our assumption held that the node taint we recommend placing on the nodes comes back upon reboot, but in practice it does not. In addition, containerd seems to reinstate its CNI configuration file well before Cilium has had the chance to re-run on the node and re-create its own CNI configuration, so in the meantime pods are assigned IPs by the default CNI rather than by Cilium.

This commit attempts to prevent that by relying on the observation that /home/kubernetes/bin/kubelet (i.e. the actual kubelet binary) is kept between reboots and executed concurrently with containerd by systemd. We leverage this empirical observation to replace the kubelet binary with a wrapper script that, under the required conditions, disables containerd, patches its configuration, removes undesired CNI configuration files, re-enables containerd and becomes the kubelet.

[ upstream commit 36585e4 ]

Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>
Co-authored-by: Alexandre Perrin <alex@kaworu.ch>
Co-authored-by: Chris Tarazi <chris@isovalent.com>
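The "removes undesired CNI configuration files" step can be illustrated with a minimal Python sketch. It is not the wrapper script from the commit: it assumes that 05-cilium.conf is the only file worth keeping in the CNI configuration directory and treats every other regular file there as undesired, which may differ from the rules the real script applies. The function names `undesired_cni_configs` and `remove_undesired_cni_configs` are hypothetical.

```python
from pathlib import Path
from typing import List

# Assumption: Cilium's CNI configuration is the only file to keep.
CILIUM_CNI_CONF = "05-cilium.conf"


def undesired_cni_configs(cni_dir: Path, keep: str = CILIUM_CNI_CONF) -> List[Path]:
    """Return the regular files in cni_dir other than `keep`, sorted by path."""
    if not cni_dir.is_dir():
        # Nothing to clean up if the directory does not exist (yet).
        return []
    return sorted(p for p in cni_dir.iterdir() if p.is_file() and p.name != keep)


def remove_undesired_cni_configs(cni_dir: Path) -> List[str]:
    """Delete the undesired files and return the names that were removed."""
    removed = []
    for path in undesired_cni_configs(cni_dir):
        path.unlink()
        removed.append(path.name)
    return removed
```

On a node this would be called as `remove_undesired_cni_configs(Path("/etc/cni/net.d"))`. In the flow described above, it would only make sense to run it while containerd is disabled, so that the default configuration is not written back in between.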

A GKE node that is forcibly stopped and re-created comes back with the same name, and pods are already scheduled on it. To prevent this from resulting in unmanaged pods, we change the recommended taint effect from NoSchedule to NoExecute, so that any previously scheduled pods are evicted and cannot get IPs assigned by the default CNI. This should not impact other environments due to the nature of 'NoExecute', so we recommend it everywhere.

[ upstream commit b049574 ]

Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>
Co-authored-by: Tam Mach <sayboras@yahoo.com>
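The difference between the two taint effects can be shown with a small Python model. This is a simplified sketch of Kubernetes taint semantics, not Cilium or Kubernetes code: it matches tolerations on key and effect only and ignores operators, values and `tolerationSeconds`. The taint key `example.com/cni-not-ready` and all type and function names are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str  # "NoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str
    effect: Optional[str] = None  # None tolerates any effect for this key


@dataclass
class Pod:
    name: str
    tolerations: List[Toleration]


def tolerates(pod: Pod, taint: Taint) -> bool:
    """Simplified match: same key, and same effect or no effect given."""
    return any(
        t.key == taint.key and t.effect in (None, taint.effect)
        for t in pod.tolerations
    )


def pods_evicted(running: List[Pod], taint: Taint) -> List[str]:
    """Names of pods already on the node that the taint evicts."""
    if taint.effect != "NoExecute":
        # NoSchedule only blocks new scheduling; running pods stay.
        return []
    return [p.name for p in running if not tolerates(p, taint)]


# Hypothetical key, used for illustration only.
KEY = "example.com/cni-not-ready"
pods = [Pod("app", []), Pod("agent", [Toleration(KEY)])]
```

With these definitions, `pods_evicted(pods, Taint(KEY, "NoSchedule"))` is empty, while `pods_evicted(pods, Taint(KEY, "NoExecute"))` contains only `"app"`: the pod that was already on the re-created node and does not tolerate the taint is evicted.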

joamaki · 2022-02-08T15:09:41Z

/test-backport-1.10

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:

Test Name

K8sLRPTests Checks local redirect policy LRP connectivity

Failure Output

FAIL: Cannot retrieve services on cilium pod

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-1.20-kernel-4.19' failed and has not been observed before, so may be related to your PR:

Test Name

K8sLRPTests Checks local redirect policy LRP connectivity

Failure Output

FAIL: Cannot retrieve services on cilium pod

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.20-kernel-4.19 so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-1.16-net-next' failed and has not been observed before, so may be related to your PR:

Test Name

K8sLRPTests Checks local redirect policy LRP connectivity

Failure Output

FAIL: Cannot retrieve services on cilium pod

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.16-net-next so I can create a new GitHub issue to track it.

qmonnet · 2022-03-28T12:00:21Z

@bmcustodio What's the status on this please? Was it expected to drop the backport? Can we please update the backport-pending/1.10 label accordingly on the original PR?

qmonnet · 2022-03-28T12:04:45Z

Apologies, I see this was backported in #18835 instead, sorry for the noise.

bmcustodio and others added 2 commits February 8, 2022 11:19

bmcustodio requested a review from a team as a code owner February 8, 2022 11:23

maintainer-s-little-helper bot added the backport/1.10 and kind/backports ("This PR provides functionality previously merged into master.") labels Feb 8, 2022

This pull request was closed.
