Fix cilium installation in GCloud beta "rapid" channel #9959

joestringer · 2020-01-24T01:22:20Z

If the nodeinit script runs on a node where the BPFFS is already
mounted, it is highly likely that existing infrastructure already handles
auto-mounting the BPF filesystem at /sys/fs/bpf, for instance because
the node is running systemd 239 or later.

In this case, skip creating the BPFFS mount unit configuration for
systemd; otherwise systemd may reject it with:

Failed to start sys-fs-bpf.mount:
Unit sys-fs-bpf.mount has a bad unit file setting.

This is because systemd considers mounting the BPFFS to be within its
own domain of control and expects that others will not configure it.
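
A minimal sketch of the conditional check described above, not the actual patch. It assumes the mount state can be read from /proc/mounts; the function names `bpffs_mounted` and `bpffs_action` are hypothetical, and the mounts file is a parameter only so the logic can be exercised against a fixture.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when a filesystem of type "bpf"
# is mounted at /sys/fs/bpf according to the given mounts file.
bpffs_mounted() {
    mounts_file="${1:-/proc/mounts}"
    # /proc/mounts fields: device, mount point, fs type, options, dump, pass
    awk '$2 == "/sys/fs/bpf" && $3 == "bpf" { found = 1 } END { exit !found }' "$mounts_file"
}

# Hypothetical wrapper: prints "skip" when the BPFFS is already mounted
# (something else, such as systemd, owns the mount), and "install" when
# the nodeinit script would need to set up its own mount unit.
bpffs_action() {
    if bpffs_mounted "$1"; then
        echo "skip"
    else
        echo "install"
    fi
}
```

Calling `bpffs_action` with no argument checks the running node; only when it prints `install` would the script go on to write and start its own sys-fs-bpf.mount unit.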

This caused issues in beta gcloud environments, where nodeinit would
fail because the systemd mount unit would not start. The failure
prevented configuration of the kubelet on each node, so Cilium was never
configured as the CNI in the environment. However, the Cilium DaemonSet
itself would still run, so pods would attempt to connect to remote
services (e.g. kube-dns to the API server) and fail, even though the
Cilium agents themselves otherwise appeared healthy. A cursory look at
`cilium endpoint list` in such environments clearly shows that no pods
are being managed by Cilium.

Fixes: #9556


maintainer-s-little-helper · 2020-01-24T01:22:22Z

Release note label not set, please set the appropriate release note.


Signed-off-by: Joe Stringer <joe@cilium.io>

joestringer · 2020-01-24T01:24:00Z

test-me-please

EDIT: Only these failed:

Suite-k8s-1.11.K8sHealthTest checks cilium-health status between nodes
Suite-k8s-1.17.K8sPolicyTest Basic Test Redirects traffic to proxy when no policy is applied with proxy-visibility annotation Tests DNS proxy visibility without policy

So the failures are not consistent (one per k8s version) and not related to BPFFS (restore etc.). I'm not even sure where nodeinit is used in the CI. The k8s tests with other versions all passed though, so I think it's good from a CI perspective.

coveralls · 2020-01-24T01:46:15Z

Coverage increased (+0.002%) to 45.976% when pulling 79d2ef2 on joestringer:submit/fix-gke-rapid into b2c9b07 on cilium:master.

joestringer added the pending-review and sig/k8s labels Jan 24, 2020

joestringer requested a review from a team January 24, 2020 01:22

maintainer-s-little-helper bot added the dont-merge/needs-release-note label Jan 24, 2020

maintainer-s-little-helper bot added this to In progress in 1.7.0 Jan 24, 2020

maintainer-s-little-helper bot added this to Needs backport from master in 1.6.6 Jan 24, 2020

joestringer added the release-note/bug label Jan 24, 2020

maintainer-s-little-helper bot removed the dont-merge/needs-release-note label Jan 24, 2020

joestringer changed the title ~~helm: Make nodeinit systemd mountpoint conditional~~ Fix cilium installation in GCloud beta "rapid" channel Jan 24, 2020

joestringer force-pushed the submit/fix-gke-rapid branch from 60f8497 to 79d2ef2 January 24, 2020 01:23

tgraf approved these changes Jan 24, 2020


tgraf merged commit 1ab474d into cilium:master Jan 24, 2020

1.7.0 automation moved this from In progress to Merged Jan 24, 2020

joestringer deleted the submit/fix-gke-rapid branch January 24, 2020 17:59

aanm added backport-pending/1.6 and removed needs-backport/1.6 labels Jan 31, 2020

maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.6 in 1.6.6 Jan 31, 2020

aanm mentioned this pull request Jan 31, 2020

v1.6 backports 2020-01-31 #10007

Merged

joestringer mentioned this pull request Jan 31, 2020

nodeinit/templates: fix indentation of sys-fs-bpf #10008

Merged

aanm added backport-done/1.6 and removed backport-pending/1.6 labels Feb 3, 2020

maintainer-s-little-helper bot moved this from Backport pending to v1.6 to Backport done to v1.6 in 1.6.6 Feb 3, 2020
