v1.10 backports 2022-03-07 #19062
Conversation
Thanks!
My changes look good, thanks Tam!
/test-backport-1.10

/test-runtime

Runtime is failing with

Thanks to Paul for pointing out #18919

The runtime failure is tracked in the linked issue above, #18919.

/test-runtime
[ upstream commit 71452d5 ]

Building the BPF datapath with LLVM 14 leads to the following errors:

    bpf_lxc.c:101:16: error: variable 'daddr' set but not used [-Werror,-Wunused-but-set-variable]
            union v6addr *daddr, orig_dip;
                          ^
    bpf_lxc.c:103:7: error: variable 'encrypt_key' set but not used [-Werror,-Wunused-but-set-variable]
            __u8 encrypt_key = 0;
                 ^
    bpf_lxc.c:102:8: error: variable 'tunnel_endpoint' set but not used [-Werror,-Wunused-but-set-variable]
            __u32 tunnel_endpoint = 0;
                  ^
    bpf_lxc.c:526:7: error: variable 'encrypt_key' set but not used [-Werror,-Wunused-but-set-variable]
            __u8 encrypt_key = 0;
                 ^
    bpf_lxc.c:525:8: error: variable 'tunnel_endpoint' set but not used [-Werror,-Wunused-but-set-variable]
            __u32 tunnel_endpoint = 0;
                  ^

These are normally warnings, but become errors in this case due to the use of -Werror when compiling Cilium's bpf programs. Fix these by marking the affected variables as __maybe_unused.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
[ upstream commit 3cb9ba1 ]

In the bpf_lxc program's functions ipv6_l3_from_lxc and handle_ipv4_from_lxc, encrypt_key is currently always looked up in the encrypt map, regardless of whether IPsec is enabled. However, its value is only actually used when IPsec is enabled, so the call can be avoided when IPsec is disabled. This also slightly reduces program size if !defined(ENABLE_IPSEC).

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
[ upstream commit 18b10b4 ]

When running in CRD-based IPAM modes (Alibaba, Azure, ENI, CRD), it is possible to observe spurious "Unable to update CiliumNode custom resource" failures in the cilium-agent. The full error message is: "Operation cannot be fulfilled on ciliumnodes.cilium.io <node>: the object has been modified; please apply your changes to the latest version and try again". It means that the Kubernetes `UpdateStatus` call has failed because the local `ObjectMeta.ResourceVersion` of the submitted CiliumNode version is out of date.

In the presence of races, this error is expected and will resolve itself once the agent receives a more recent version of the object with the new resource version. However, it is possible that the resource version of a `CiliumNode` object is bumped even though the `Spec` or `Status` of the `CiliumNode` remains the same. This happens, for example, when `ObjectMeta.ManagedFields` is updated by the Kubernetes apiserver. Unfortunately, `CiliumNode.DeepEqual` does _not_ consider any `ObjectMeta` fields (including the resource version). Therefore, two objects with different resource versions are considered the same by the `CiliumNode` watcher used by IPAM. But to be able to successfully call `UpdateStatus`, we need to know the most recent resource version. Otherwise, `UpdateStatus` will always fail until the `CiliumNode` object is updated externally for some reason.

Therefore, this commit modifies the logic to always store the most recent version of the `CiliumNode` object, even if `Spec` or `Status` has not changed. This in turn allows `nodeStore.refreshNode` (which invokes `UpdateStatus`) to always work on the most recently observed resource version.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
[ upstream commit 6c15df9 ]

Checking for the existence of the .Values.nodeinit.bootstrapFile file will be a no-op, because the file is created by kubelet if it does not exist. Instead, we should check whether the file has some contents inside of it, which is when we can be sure the node-init DaemonSet has started.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
[ upstream commit ea9fd6f ]

Turns out that in GKE's 'cos' images the 'containerd' binary is still present even though '/etc/containerd/config.toml' is not. Hence, the kubelet wrapper would still be installed for these images according to the current check, even though it's not necessary. What's worse, starting the kubelet would fail because the 'sed' command targeting the aforementioned file would fail.

This PR changes the check to rely on the presence of the '--container-runtime-endpoint' flag in the kubelet, which is probably a more reliable way of detecting '*_containerd' flavours, and only applies the fix in these cases.

Fixes cilium#19015.

Signed-off-by: Bruno M. Custódio <brunomcustodio@gmail.com>
Co-authored-by: Alexandre Perrin <alex@kaworu.ch>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
[ upstream commit e396b5e ]

The documentation page about setting up Hubble observability wrongly states that TCP port 4245 needs to be open on all nodes running Cilium for Hubble Relay to operate correctly. This is incorrect. Port 4245 is actually the default port used by Hubble Relay, which is a regular deployment and doesn't require any particular action from the user. However, Hubble server uses port 4244 by default and, given that it is embedded in the Cilium agent and uses a host port, port 4244 must be open on all nodes to allow Hubble Relay to connect to each Hubble server instance.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
Force-pushed fca8dd1 to f64a507 to resolve the conflict with the v1.10 branch.
/test-backport-1.10

Job 'Cilium-PR-K8s-1.19-kernel-5.4' hit: #17353 (91.08% similarity)

Job 'Cilium-PR-K8s-1.19-kernel-4.9' failed (test name and failure output collapsed).

Job 'Cilium-PR-K8s-1.18-kernel-4.9' failed (test name and failure output collapsed).

If it is a flake and a GitHub issue doesn't already exist to track it, comment
Both 4.9 tests are failing with the following error:

Could this be related to the changes in this PR, or is this a flake? Edit: this is #13071.
/ci-multicluster-1.10

The multicluster workflow had a node preemption during the test; re-running.

/test-1.19-5.4

The connectivity test to the internet failed to connect.

/test-1.18-4.9

/test-1.19-4.9

Reviews are in, and tests are passing except for CodeQL (not required; the new reports are due to updates in CodeQL) and the known runtime job breakage #18919. Merging, thanks!