Prevent unmanaged pods in GKE's containerd flavors
The changes that we have been making to /etc/default/kubelet are reset
on node reboots, as apparently is the whole /etc directory, which
also means that /etc/cni/net.d/05-cilium.conf is removed.

This would not be a problem if our assumption held that the node taint we
recommend placing on the nodes comes back upon reboots, but in practice it
does not.

Besides this, it seems that containerd will reinstate its CNI
configuration file, and it will do so well before Cilium has had the
chance to re-run on the node and re-create its own CNI configuration,
causing pods to be assigned IPs by the default CNI rather than by Cilium
in the meantime.

This commit attempts to prevent that from happening by observing that
/home/kubernetes/bin/kubelet (i.e. the actual kubelet binary) is kept between
reboots and executed concurrently with containerd by systemd. We leverage
this empirical observation to replace the kubelet binary with a wrapper script
that, under the required conditions, disables containerd, patches its
configuration, removes undesired CNI configuration files, re-enables
containerd and becomes the kubelet.
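
The wrapper approach described above can be tried out safely in a scratch directory. The following is only a sketch under stated assumptions: the temporary directory stands in for /home/kubernetes/bin, the fake kubelet merely prints its arguments, and the helper names `install_wrapper` and `remove_wrapper` are hypothetical and do not appear in this commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sandbox standing in for /home/kubernetes/bin.
BIN_DIR="$(mktemp -d)"

# Stand-in for the real kubelet: it only prints the arguments it receives.
printf '#!/usr/bin/env bash\necho "real kubelet args: $*"\n' > "${BIN_DIR}/kubelet"
chmod +x "${BIN_DIR}/kubelet"

# Move the real binary aside and put a wrapper in its place.
install_wrapper() {
  local dir="$1"
  # Install only once, so a second run never overwrites the real binary.
  if [[ -f "${dir}/the-kubelet" ]]; then
    return 0
  fi
  mv "${dir}/kubelet" "${dir}/the-kubelet"
  cat > "${dir}/kubelet" <<EOF
#!/usr/bin/env bash
# (Initialization steps would run here before handing over control.)
exec "${dir}/the-kubelet" "\$@" --network-plugin=cni
EOF
  chmod +x "${dir}/kubelet"
}

# Put the real binary back, as a pre-stop hook would.
remove_wrapper() {
  local dir="$1"
  if [[ -f "${dir}/the-kubelet" ]]; then
    mv "${dir}/the-kubelet" "${dir}/kubelet"
  fi
}

install_wrapper "${BIN_DIR}"
install_wrapper "${BIN_DIR}"   # second call is a no-op
WRAPPED_OUTPUT="$("${BIN_DIR}/kubelet" --v=2)"
echo "${WRAPPED_OUTPUT}"

remove_wrapper "${BIN_DIR}"
RESTORED_OUTPUT="$("${BIN_DIR}/kubelet" --v=2)"
echo "${RESTORED_OUTPUT}"
```

Because the wrapper ends with `exec`, it is replaced by the real binary rather than staying around as a parent process, and the appended flag comes after whatever arguments the caller passed.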

Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>
Co-authored-by: Alexandre Perrin <alex@kaworu.ch>
Co-authored-by: Chris Tarazi <chris@isovalent.com>
3 people committed Jan 27, 2022
1 parent d3742a6 commit 36585e4
Showing 2 changed files with 92 additions and 8 deletions.
16 changes: 13 additions & 3 deletions install/kubernetes/cilium/files/nodeinit/prestop.bash
@@ -30,9 +30,19 @@ rm -f /tmp/node-init.cilium.io
 touch /tmp/node-deinit.cilium.io

 {{- if .Values.nodeinit.reconfigureKubelet }}
-echo "Changing kubelet configuration to --network-plugin=kubenet"
-sed -i "s:--network-plugin=cni\ --cni-bin-dir={{ .Values.cni.binPath }}:--network-plugin=kubenet:g" /etc/default/kubelet
-echo "Restarting kubelet..."
+# Check if we're running on a GKE containerd flavor.
+GKE_KUBERNETES_BIN_DIR="/home/kubernetes/bin"
+if [[ -f "${GKE_KUBERNETES_BIN_DIR}/gke" ]] && command -v containerd &>/dev/null; then
+  CONTAINERD_CONFIG="/etc/containerd/config.toml"
+  echo "Reverting changes to the containerd configuration"
+  sed -Ei "s/^\#(\s+conf_template)/\1/g" "${CONTAINERD_CONFIG}"
+  echo "Removing the kubelet wrapper"
+  [[ -f "${GKE_KUBERNETES_BIN_DIR}/the-kubelet" ]] && mv "${GKE_KUBERNETES_BIN_DIR}/the-kubelet" "${GKE_KUBERNETES_BIN_DIR}/kubelet"
+else
+  echo "Changing kubelet configuration to --network-plugin=kubenet"
+  sed -i "s:--network-plugin=cni\ --cni-bin-dir={{ .Values.cni.binPath }}:--network-plugin=kubenet:g" /etc/default/kubelet
+fi
+echo "Restarting the kubelet"
 systemctl restart kubelet
 {{- end }}
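
The two `sed` expressions used by startup.bash and prestop.bash are meant to be inverses of each other. A quick way to convince oneself of that is a round trip on a sample file. This is a sketch only: the sample configuration below is a hypothetical excerpt rather than a real GKE file, and GNU sed is assumed (for `-E`, in-place `-i` without a suffix, and `\s`).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample config; real files contain many more settings.
SAMPLE_CONFIG="$(mktemp)"
cat > "${SAMPLE_CONFIG}" <<'EOF'
[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/home/kubernetes/bin"
  conf_dir = "/etc/cni/net.d"
  conf_template = "/hypothetical/cni.template"
EOF
ORIGINAL_CONTENT="$(cat "${SAMPLE_CONFIG}")"

# startup.bash direction: comment out the indented 'conf_template' line.
sed -Ei 's/^(\s+conf_template)/\#\1/g' "${SAMPLE_CONFIG}"
COMMENTED_COUNT="$(grep -cE '^#\s+conf_template' "${SAMPLE_CONFIG}" || true)"

# prestop.bash direction: strip the leading '#' again.
sed -Ei 's/^\#(\s+conf_template)/\1/g' "${SAMPLE_CONFIG}"
RESTORED_CONTENT="$(cat "${SAMPLE_CONFIG}")"

if [[ "${RESTORED_CONTENT}" == "${ORIGINAL_CONTENT}" ]]; then
  echo "round trip restored the original configuration"
fi
```

Both expressions require leading whitespace before `conf_template`, so a setting written at column 0 would be left untouched in either direction.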

84 changes: 79 additions & 5 deletions install/kubernetes/cilium/files/nodeinit/startup.bash
@@ -22,11 +22,85 @@ fi
 {{- end }}

 {{- if .Values.nodeinit.reconfigureKubelet }}
-# GKE: Alter the kubelet configuration to run in CNI mode
-echo "Changing kubelet configuration to --network-plugin=cni --cni-bin-dir={{ .Values.cni.binPath }}"
-mkdir -p {{ .Values.cni.binPath }}
-sed -i "s:--network-plugin=kubenet:--network-plugin=cni\ --cni-bin-dir={{ .Values.cni.binPath }}:g" /etc/default/kubelet
-echo "Restarting kubelet..."
+# Check if we're running on a GKE containerd flavor.
+GKE_KUBERNETES_BIN_DIR="/home/kubernetes/bin"
+if [[ -f "${GKE_KUBERNETES_BIN_DIR}/gke" ]] && command -v containerd &>/dev/null; then
+  echo "GKE *_containerd flavor detected..."
+
+  # (GKE *_containerd) Upon node restarts, GKE's containerd images seem to reset
+  # the /etc directory and our changes to the kubelet and Cilium's CNI
+  # configuration are removed. This leaves room for containerd and its CNI to
+  # take over pods previously managed by Cilium, causing Cilium to lose
+  # ownership over these pods. We rely on the empirical observation that
+  # /home/kubernetes/bin/kubelet is not changed across node reboots, and replace
+  # it with a wrapper script that performs some initialization steps when
+  # required and then hands over control to the real kubelet.
+
+  # Only create the kubelet wrapper if we haven't previously done so.
+  if [[ ! -f "${GKE_KUBERNETES_BIN_DIR}/the-kubelet" ]];
+  then
+    echo "Installing the kubelet wrapper..."
+
+    # Rename the real kubelet.
+    mv "${GKE_KUBERNETES_BIN_DIR}/kubelet" "${GKE_KUBERNETES_BIN_DIR}/the-kubelet"
+
+    # Initialize the kubelet wrapper which lives in the place of the real kubelet.
+    touch "${GKE_KUBERNETES_BIN_DIR}/kubelet"
+    chmod a+x "${GKE_KUBERNETES_BIN_DIR}/kubelet"
+
+    # Populate the kubelet wrapper. It will perform the initialization steps we
+    # need and then become the kubelet.
+    cat <<'EOF' | tee "${GKE_KUBERNETES_BIN_DIR}/kubelet"
+#!/bin/bash
+set -euo pipefail
+CNI_CONF_DIR="/etc/cni/net.d"
+CONTAINERD_CONFIG="/etc/containerd/config.toml"
+# Only stop and start containerd if the Cilium CNI configuration does not exist,
+# or if the 'conf_template' property is present in the containerd config file,
+# in order to avoid unnecessarily restarting containerd.
+if [[ -z "$(find "${CNI_CONF_DIR}" -type f -name '*cilium*')" || \
+      "$(grep -cE '^\s+conf_template' "${CONTAINERD_CONFIG}")" != "0" ]];
+then
+  # Stop containerd as it starts by creating a CNI configuration from a template
+  # causing pods to start with IPs assigned by GKE's CNI.
+  # 'disable --now' is used instead of stop as this script runs concurrently
+  # with containerd on node startup, and hence containerd might not have been
+  # started yet, in which case 'disable' prevents it from starting.
+  echo "Disabling and stopping containerd"
+  systemctl disable --now containerd
+  # Remove any pre-existing files in the CNI configuration directory. We skip
+  # any possibly existing Cilium configuration file for the obvious reasons.
+  echo "Removing undesired CNI configuration files"
+  find "${CNI_CONF_DIR}" -type f -not -name '*cilium*' -exec rm {} \;
+  # As mentioned above, the containerd configuration needs a little tweak in
+  # order not to create the default CNI configuration, so we update its config.
+  echo "Fixing containerd configuration"
+  sed -Ei 's/^(\s+conf_template)/\#\1/g' "${CONTAINERD_CONFIG}"
+  # Start containerd. It won't create its CNI configuration file anymore.
+  echo "Enabling and starting containerd"
+  systemctl enable --now containerd
+fi
+# Become the real kubelet, and pass it some additionally required flags (and
+# place these last so they have precedence).
+exec /home/kubernetes/bin/the-kubelet "${@}" --network-plugin=cni --cni-bin-dir={{ .Values.cni.binPath }}
+EOF
+  else
+    echo "Kubelet wrapper already exists, skipping..."
+  fi
+else
+  # (Generic) Alter the kubelet configuration to run in CNI mode
+  echo "Changing kubelet configuration to --network-plugin=cni --cni-bin-dir={{ .Values.cni.binPath }}"
+  mkdir -p {{ .Values.cni.binPath }}
+  sed -i "s:--network-plugin=kubenet:--network-plugin=cni\ --cni-bin-dir={{ .Values.cni.binPath }}:g" /etc/default/kubelet
+fi
+echo "Restarting the kubelet..."
 systemctl restart kubelet
 {{- end }}
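
The condition under which the wrapper touches containerd can be exercised without systemd. The sketch below assumes a sandbox directory in place of /etc/cni/net.d and /etc/containerd/config.toml; the function names `needs_containerd_fix` and `remove_foreign_cni_configs`, the file name `10-default.conflist` and the template path are hypothetical. It deliberately leaves out the `systemctl` calls and only simulates Cilium writing its configuration file. GNU sed and GNU find are assumed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 when containerd would need to be stopped and reconfigured: either
# no Cilium CNI configuration exists, or 'conf_template' is still active.
needs_containerd_fix() {
  local cni_conf_dir="$1" containerd_config="$2"
  if [[ -z "$(find "${cni_conf_dir}" -type f -name '*cilium*')" ]]; then
    return 0
  fi
  if grep -qE '^\s+conf_template' "${containerd_config}"; then
    return 0
  fi
  return 1
}

# Deletes every CNI configuration file that does not belong to Cilium.
remove_foreign_cni_configs() {
  find "$1" -type f -not -name '*cilium*' -exec rm {} \;
}

# Hypothetical sandbox standing in for the real paths under /etc.
SANDBOX="$(mktemp -d)"
CNI_CONF_DIR="${SANDBOX}/net.d"
CONTAINERD_CONFIG="${SANDBOX}/config.toml"
mkdir -p "${CNI_CONF_DIR}"
printf '  conf_template = "/hypothetical/cni.template"\n' > "${CONTAINERD_CONFIG}"
touch "${CNI_CONF_DIR}/10-default.conflist"

# State right after a reboot: default CNI config present, template active.
if needs_containerd_fix "${CNI_CONF_DIR}" "${CONTAINERD_CONFIG}"; then
  STATE_AFTER_REBOOT="fix"
else
  STATE_AFTER_REBOOT="ok"
fi

# Apply the file-level changes, then simulate Cilium writing its own config.
remove_foreign_cni_configs "${CNI_CONF_DIR}"
sed -Ei 's/^(\s+conf_template)/\#\1/g' "${CONTAINERD_CONFIG}"
touch "${CNI_CONF_DIR}/05-cilium.conf"

if needs_containerd_fix "${CNI_CONF_DIR}" "${CONTAINERD_CONFIG}"; then
  STATE_AFTER_FIX="fix"
else
  STATE_AFTER_FIX="ok"
fi
REMAINING_FILES="$(ls "${CNI_CONF_DIR}")"
echo "after reboot: ${STATE_AFTER_REBOOT}, after fix: ${STATE_AFTER_FIX}"
```

Once both conditions are false the check is cheap and has no side effects, which is what lets the wrapper skip the containerd restart on ordinary kubelet restarts.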
