-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.12 backports 2022-11-22 #22308
v1.12 backports 2022-11-22 #22308
Conversation
/test-backport-1.12 Job 'Cilium-PR-K8s-1.21-kernel-4.9' failed: Click to show.Test Name
Failure Output
If it is a flake and a GitHub issue doesn't already exist to track it, comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes from #22075 LGTM. 🚀
|
[ upstream commit 4e7209f ] By not returning a specific error message in case of an error, it makes it difficult to find out on which location the Cilium agent has failed to start. Fixes: 8941e96 ("datapath: Fix race with a deleted device after detection") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 2fe3a92 ] Signed-off-by: ArthurChiao <arthurchiao@hotmail.com> Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 2ae1c68 ] Cilium operator would crash when being brought up in an AWS region where there was a IPv6-only ENI and no subnet filters, because it would fail to parse the ENI (logs will show "ENI has no IP address" and "Initial synchronization with instances API failed"). We work around this issue for the moment by filtering the network interfaces we fetch from AWS with 'private-ip-addresses=*', which includes all ENIs with any value in the PrivateIpAddress field. This is the field `parseENI` complains about otherwise. In general, though, it seems that the ENI IPAM mode needs to learn to handle IPv6 ENIs. That will not be a small undertaking, so we fix the obvious bug for now. Co-authored-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: David Bimmler <david.bimmler@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 2fd0c56 ] Signed-off-by: David Bimmler <david.bimmler@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit bf3532e ] This package is imported as a transitive dependency in cilium-cli which is built for linux, darwin and windows. Make sure the package compiles on all these platforms. Ref. cilium/cilium-cli#958 (comment) For cilium#16843 Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 933bdcb ] Signed-off-by: flxman <felix.farjsjo@gmail.com> Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit e3b0095 ] We do this by removing the extraneous "cilium_" prefix from the metrics to align with the other metrics names in this file. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit f072dbd ] This metrics were incorrectly stating that they were enabled by default which confused users. Fix it to mention they are disabled by default and must be enabled explicitly via --metrics. Fixes: 1133bd5 ("docs: Added `Default` column in metrics details") Fixes: cilium#20255 Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 3a5e985 ] Retrieving objects from caches can be useful to prevent doing useless requests to kube-apiserver. In the unlikely event that the object doesn't exist in the local cache Cilium can try to retrieve it from kube-apiserver directly. For this particular case, with CiliumNode, it is causing Cilium to fatal as it is unable to retrieve CiliumNode from the cache, due subsystem initialization issues, thus we will fallback on retrieving the object directly from kube-apiserver. In this case, the subsystem initialization issue happened due to the fact that CiliumNode watcher is blocked on its event handler by the egressGatewayManager [1] which is blocked by the initialization of the identity allocator [2]. Unfortunately, the identity allocator is only initialized at a later stage causing the CiliumNode cache from being populated with all of its nodes. [1] https://github.com/cilium/cilium/blob/933bdcbec9319b0148b12688f720fbaaf55e0dba/pkg/k8s/watchers/cilium_node.go#L56 [2] https://github.com/cilium/cilium/blob/933bdcbec9319b0148b12688f720fbaaf55e0dba/pkg/egressgateway/manager.go#L83 Fixes: 69e4c69 ("k8s: optimize API calls made to kube-apiserver") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 6c98f15 ] When CEP was converted to an internal CEP structure, the UID field was not copied, causing the delete requests of CEPs to have their UID precondition set as empty. When kube-apiserver received this delete request it didn't delete the CEP because an empty CEP UID didn't match an existent UID. Fixes: 6f7bf6c ("Prevent CiliumEndpoint removal by non-owning agent") Reported-by: Bruno Custódio <bruno@isovalent.com> Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 3a650c3 ] When we know the encryption interface, we can jump directly from bpf_host to that interface using bpf_redirect. For that to work, we however need to rewrite the MAC addresses. This is currently done in bpf_host with a FIB lookup to retrieve the MAC addresses. The performance gain we get from that redirect is however expected to be negligible because we already traversed the stack several times for IPsec and we also spent a fair amount of cycles just encrypting the payloads. This commit therefore removes the redirect and related FIB lookup. This change makes the logic for IPsec a little simpler (less error cases without the FIB lookup). It also makes the logic more consistent across setups (the FIB lookup was currently only possible on AKS & GKE). Finally, a later change to IPsec will break the FIB lookup on AKS anyway. Signed-off-by: Paul Chaignon <paul@cilium.io>
57824f0
to
6c5a1e3
Compare
/test-backport-1.12 |
Looks like encryption tests are failing in most cases:
Most likely caused by the last commit. |
They all appear to be known flake #21735 (identified by |
[ upstream commit 0696874 ] When there is an annotation in the k8s node object, the annotation `io.cilium.network.ipv4-cilium-host` is used as the CiliumInternal IP address of the CiliumNode object in [1]. Whenever Cilium is updating any state into the CiliumNode it retrieves all IP address from k8s node, including the ones from annotations, and appends the local node's IP addresses, including the newly correct internal / router IP address, in [2]. Since this is a list, the annotation's IP address is always used first and all other Cilium agents will wrongly use it for any operation. [1] https://github.com/cilium/cilium/blob/927bd8c26904ff92e42c61cec6d00ea8ac062c05/pkg/nodediscovery/nodediscovery.go#L453-L459 [2] https://github.com/cilium/cilium/blob/927bd8c26904ff92e42c61cec6d00ea8ac062c05/pkg/nodediscovery/nodediscovery.go#L474-L489 Fixes: 73d6cae ("install: default AnnotateK8sNode to false") Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 1e947e9 ] When using CiliumNode, the agent's source of truth should be the agent itself and not k8s node annotations. Thus we will not use the annotations for the CiliumInternalIP address when generating a CiliumNode from the k8s Node resource. Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit ee4ea1a ] We try to restore the router IP both from the filesystem (first) and from Kubernetes objects (as a fallback). If the two IP addresses don't match, we emit a warning. There is no good reason for this to happen in CI so we should fail the test if that warning ever shows up. Doing so would have prevented the flake fixed by the previous commit. Signed-off-by: Paul Chaignon <paul@cilium.io>
Added backport of #22127 |
/test-backport-1.12 |
Net next failed with known flake #18054 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only known CI flakes, LGTM.
Skipped:
to backport this one yourself separately?
Once this PR is merged, you can update the PR labels via: