
v1.10 backports 2022-03-07 #19062

Merged
merged 6 commits into cilium:v1.10 from pr/v1.10-backport-2022-03-07 on Mar 17, 2022

Conversation

sayboras
Member

@sayboras sayboras commented Mar 7, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 17840 17856 18897 19017 19036; do contrib/backporting/set-labels.py $pr done 1.10; done

@sayboras sayboras requested a review from a team as a code owner March 7, 2022 11:15
@sayboras sayboras added the backport/1.10 and kind/backports labels Mar 7, 2022
Member

@gandro gandro left a comment


Thanks!

Member

@tklauser tklauser left a comment


My changes look good, thanks Tam!

@sayboras
Member Author

sayboras commented Mar 7, 2022

/test-backport-1.10

@sayboras
Member Author

sayboras commented Mar 8, 2022

/test-runtime

@gandro gandro requested a review from aanm March 8, 2022 08:47
@gandro
Member

gandro commented Mar 8, 2022

Runtime is failing with:

    05:54:23 runtime: Mar 08 05:54:19 runtime cilium-agent[20397]: level=fatal msg="Envoy: Binary \"cilium-envoy\" cannot be executed" error="exit status 1" subsys=envoy-manager

Unclear what commit would have caused that. Have we seen similar issues on the v1.10 branch before?

@sayboras
Member Author

sayboras commented Mar 8, 2022

Thanks to Paul for pointing out #18919

@sayboras
Member Author

sayboras commented Mar 9, 2022

Runtime failure is tracked in the linked issue above, #18919.
CodeQL failure is not related to any change in this PR.

@aanm aanm added the dont-merge/needs-rebase label Mar 9, 2022
@aanm
Member

aanm commented Mar 9, 2022

/test-runtime

tklauser and others added 6 commits March 10, 2022 23:22
[ upstream commit 71452d5 ]

Building the BPF datapath with LLVM 14 leads to the following errors:

    bpf_lxc.c:101:16: error: variable 'daddr' set but not used [-Werror,-Wunused-but-set-variable]
            union v6addr *daddr, orig_dip;
                          ^
    bpf_lxc.c:103:7: error: variable 'encrypt_key' set but not used [-Werror,-Wunused-but-set-variable]
            __u8 encrypt_key = 0;
                 ^
    bpf_lxc.c:102:8: error: variable 'tunnel_endpoint' set but not used [-Werror,-Wunused-but-set-variable]
            __u32 tunnel_endpoint = 0;
                  ^
    bpf_lxc.c:526:7: error: variable 'encrypt_key' set but not used [-Werror,-Wunused-but-set-variable]
            __u8 encrypt_key = 0;
                 ^
    bpf_lxc.c:525:8: error: variable 'tunnel_endpoint' set but not used [-Werror,-Wunused-but-set-variable]
            __u32 tunnel_endpoint = 0;
                  ^

These are normally warnings, but become errors in this case due to the
use of -Werror when compiling Cilium's bpf programs. Fix these by
marking the affected variables as __maybe_unused.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
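
For illustration, a minimal self-contained C sketch of the pattern the fix uses. The variable names come from the error output above; the typedefs and the macro definition are scaffolding for the example (in Cilium the macro comes from the datapath's compiler header):

    /* Stand-ins for the kernel typedefs used by bpf_lxc.c. */
    typedef unsigned int __u32;
    typedef unsigned char __u8;

    #ifndef __maybe_unused
    # define __maybe_unused __attribute__((__unused__))
    #endif

    int handle_traffic(void)
    {
            /* These variables are only read inside conditionally
             * compiled (e.g. IPsec) blocks. Without __maybe_unused,
             * LLVM 14's -Wunused-but-set-variable plus -Werror rejects
             * builds where those blocks are compiled out. */
            __u32 tunnel_endpoint __maybe_unused = 0;
            __u8 encrypt_key __maybe_unused = 0;

            return 0;
    }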
[ upstream commit 3cb9ba1 ]

In the bpf_lxc program's functions ipv6_l3_from_lxc and
handle_ipv4_from_lxc, currently encrypt_key is always looked up in the
encrypt map, regardless of whether IPsec is enabled or not. However, its
value is only actually used when IPsec is enabled. Thus, the lookup can
be avoided when IPsec is disabled.

This also slightly reduces program size if !defined(ENABLE_IPSEC).

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
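
A sketch of the shape of that change; encrypt_map_lookup is a hypothetical stand-in for the real map lookup helper, not the actual bpf_lxc.c code:

    typedef unsigned char __u8;

    #ifdef ENABLE_IPSEC
    /* Hypothetical stand-in for the encrypt-map lookup helper. */
    static __u8 encrypt_map_lookup(void)
    {
            return 7;
    }
    #endif

    __u8 maybe_lookup_encrypt_key(void)
    {
    #ifdef ENABLE_IPSEC
            /* Look up the key only when IPsec can consume the result. */
            return encrypt_map_lookup();
    #else
            /* !ENABLE_IPSEC: the lookup is compiled out entirely, which
             * also slightly reduces the generated program size. */
            return 0;
    #endif
    }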
[ upstream commit 18b10b4 ]

When running in CRD-based IPAM modes (Alibaba, Azure, ENI, CRD), it is
possible to observe spurious "Unable to update CiliumNode custom
resource" failures in the cilium-agent.

The full error message is as follows: "Operation cannot be fulfilled on
ciliumnodes.cilium.io <node>: the object has been modified; please apply
your changes to the latest version and try again".

It means that the Kubernetes `UpdateStatus` call has failed because the
local `ObjectMeta.ResourceVersion` of submitted CiliumNode version is
out of date. In the presence of races, this error is expected and will
resolve itself once the agent receives a more recent version of the
object with the new resource version.

However, it is possible that the resource version of a `CiliumNode`
object is bumped even though the `Spec` or `Status` of the `CiliumNode`
remains the same. This happens, for example, when
`ObjectMeta.ManagedFields` is updated by the Kubernetes apiserver.

Unfortunately, `CiliumNode.DeepEqual` does _not_ consider any
`ObjectMeta` fields (including the resource version). Therefore two
objects with different resource versions are considered the same by the
`CiliumNode` watcher used by IPAM.

But to be able to successfully call `UpdateStatus` we need to know the
most recent resource version. Otherwise, `UpdateStatus` will always fail
until the `CiliumNode` object is updated externally for some reason.

Therefore, this commit modifies the logic to always store the most
recent version of the `CiliumNode` object, even if `Spec` or `Status`
has not changed.  This in turn allows `nodeStore.refreshNode` (which
invokes `UpdateStatus`) to always work on the most recently observed
resource version.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
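
The rule the commit establishes, sketched in C for illustration only (the real logic is Go code in the IPAM watcher; the names and types here are made up):

    /* resourceVersion is an opaque token assigned by the apiserver. */
    struct cilium_node {
            char resource_version[64];
            /* Spec and Status fields elided. */
    };

    static struct cilium_node own_node; /* the agent's cached copy */

    void on_cilium_node_update(const struct cilium_node *observed)
    {
            /* Old behaviour (sketched): skip the store when Spec/Status
             * compared deep-equal. Because DeepEqual ignores ObjectMeta,
             * the cached resource version went stale and UpdateStatus
             * kept failing until something else modified the object.
             *
             * New behaviour: store unconditionally, so refreshNode
             * always submits the most recently observed version. */
            own_node = *observed;
    }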
[ upstream commit 6c15df9 ]

Checking for the existence of the .Values.nodeinit.bootstrapFile
file is a no-op because the file is created by kubelet if it does
not exist. Instead, we should check whether the file has any
contents, which is when we can be sure the node-init DaemonSet has
started.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
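
The exists-versus-non-empty distinction in a standalone C sketch; the real check lives in the Helm chart, and the path below is only an example value for .Values.nodeinit.bootstrapFile:

    #include <stdio.h>
    #include <sys/stat.h>

    /* Returns 1 only when the file exists AND has contents. An
     * existence check alone is a no-op, because kubelet creates the
     * file before node-init has written anything to it. */
    static int node_init_started(const char *bootstrap_file)
    {
            struct stat st;

            if (stat(bootstrap_file, &st) != 0)
                    return 0;            /* does not exist (yet) */
            return st.st_size > 0;       /* exists *and* is non-empty */
    }

    int main(void)
    {
            printf("%d\n", node_init_started("/tmp/cilium-bootstrap-time"));
            return 0;
    }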
[ upstream commit ea9fd6f ]

Turns out that in GKE's 'cos' images the 'containerd' binary is still
present even though '/etc/containerd/config.toml' is not. Hence, the
kubelet wrapper would still be installed for these images according to
the current check, even though it's not necessary. What's worse,
starting the kubelet would fail because the 'sed' command targeting the
aforementioned file would fail.

This PR changes the check to rely on the presence of the
'--container-runtime-endpoint' flag in the kubelet invocation, which is
probably a more reliable way of detecting '*_containerd' flavours, so
that the fix is only applied in those cases.

Fixes cilium#19015.

Signed-off-by: Bruno M. Custódio <brunomcustodio@gmail.com>
Co-authored-by: Alexandre Perrin <alex@kaworu.ch>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
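
As a sketch, the new detection reduces to a substring test on the kubelet invocation; the command lines below are invented for the example:

    #include <stdio.h>
    #include <string.h>

    /* The kubelet wrapper is only needed for '*_containerd' flavours,
     * detected by the presence of --container-runtime-endpoint. */
    static int needs_kubelet_wrapper(const char *kubelet_cmdline)
    {
            return strstr(kubelet_cmdline,
                          "--container-runtime-endpoint") != NULL;
    }

    int main(void)
    {
            printf("%d\n", needs_kubelet_wrapper(
                    "kubelet --container-runtime-endpoint="
                    "unix:///run/containerd/containerd.sock"));       /* 1 */
            printf("%d\n",
                   needs_kubelet_wrapper("kubelet --cloud-provider=gce")); /* 0 */
            return 0;
    }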
[ upstream commit e396b5e ]

The documentation page about setting up Hubble observability wrongly
states that TCP port 4245 needs to be open on all nodes running Cilium
to allow Hubble Relay to operate correctly. This is incorrect. Port 4245
is actually the default port used by Hubble Relay, which is a regular
deployment and doesn't require any particular action from the user.

However, the Hubble server uses port 4244 by default and, because it is
embedded in the Cilium agent and exposed via a host port, that port
must be open on all nodes to allow Hubble Relay to connect to each
Hubble server instance.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
@sayboras sayboras force-pushed the pr/v1.10-backport-2022-03-07 branch from fca8dd1 to f64a507 Compare March 10, 2022 12:23
@sayboras
Member Author

Force pushed to resolve the conflict with the v1.10 branch.

@ti-mo
Contributor

ti-mo commented Mar 11, 2022

/test-backport-1.10

Job 'Cilium-PR-K8s-1.19-kernel-5.4' hit: #17353 (91.08% similarity)

Job 'Cilium-PR-K8s-1.19-kernel-4.9' failed:


Test Name: K8sPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath

Failure Output: FAIL: cannot install connectivity-check

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.19-kernel-4.9 so I can create one.

Job 'Cilium-PR-K8s-1.18-kernel-4.9' failed:


Test Name: K8sPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath

Failure Output: FAIL: cannot install connectivity-check

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.18-kernel-4.9 so I can create one.

@sayboras sayboras removed the dont-merge/needs-rebase label Mar 14, 2022
@ti-mo
Contributor

ti-mo commented Mar 15, 2022

Both 4.9 tests are failing with the following error:

    The Service "echo-b" is invalid: spec.ports[0].nodePort: Invalid value: 31414: provided port is already allocated

Could this be related to the changes in the PR or is this a flake?


Edit: this is #13071.

@ti-mo
Contributor

ti-mo commented Mar 15, 2022

/ci-multicluster-1.10


Multicluster workflow had a node preemption during the test, re-running.

@ti-mo
Contributor

ti-mo commented Mar 15, 2022

/test-1.19-5.4


Connectivity test to the internet failed to connect.

@ti-mo
Contributor

ti-mo commented Mar 15, 2022

/test-1.18-4.9

@ti-mo
Contributor

ti-mo commented Mar 15, 2022

/test-1.19-4.9

@joestringer
Member

Reviews are in, tests are passing except for CodeQL (not required; the new reports are due to updates in CodeQL) and the known runtime job breakage #18919. Merging, thanks!

@joestringer joestringer merged commit 302c805 into cilium:v1.10 Mar 17, 2022
@sayboras sayboras deleted the pr/v1.10-backport-2022-03-07 branch March 17, 2022 21:15
@qmonnet qmonnet mentioned this pull request Mar 30, 2022