Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.12 backports 2022-11-22 #22308

Merged
merged 14 commits into from
Nov 24, 2022
Merged

Conversation

tklauser
Copy link
Member

@tklauser tklauser commented Nov 22, 2022

Skipped:

Once this PR is merged, you can update the PR labels via:

$ for pr in 22137 22178 22075 22232 22206 22300 22298 22213 22069 22127; do contrib/backporting/set-labels.py $pr done 1.12; done

@tklauser tklauser requested a review from a team as a code owner November 22, 2022 12:56
@tklauser tklauser added backport/1.12 This PR represents a backport for Cilium 1.12.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Nov 22, 2022
@tklauser
Copy link
Member Author

tklauser commented Nov 22, 2022

/test-backport-1.12

Job 'Cilium-PR-K8s-1.21-kernel-4.9' failed:

Click to show.

Test Name

K8sIstioTest Istio Bookinfo Demo Tests bookinfo inter-service connectivity

Failure Output

FAIL: Pods are not ready after timeout

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.21-kernel-4.9 so I can create one.

Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes from #22075 LGTM. 🚀

@tklauser
Copy link
Member Author

tklauser commented Nov 23, 2022

Waiting for #22328 to be merged to fix the GitHub action failures. Rebased

aanm and others added 11 commits November 23, 2022 11:50
[ upstream commit 4e7209f ]

By not returning a specific error message in case of an error, it makes
it difficult to find out on which location the Cilium agent has failed
to start.

Fixes: 8941e96 ("datapath: Fix race with a deleted device after detection")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 2fe3a92 ]

Signed-off-by: ArthurChiao <arthurchiao@hotmail.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 2ae1c68 ]

Cilium operator would crash when being brought up in an AWS region where
there was a IPv6-only ENI and no subnet filters, because it would fail
to parse the ENI (logs will show "ENI has no IP address" and "Initial
synchronization with instances API failed").

We work around this issue for the moment by filtering the network
interfaces we fetch from AWS with 'private-ip-addresses=*', which
includes all ENIs with any value in the PrivateIpAddress field. This is
the field `parseENI` complains about otherwise.

In general, though, it seems that the ENI IPAM mode needs to learn to
handle IPv6 ENIs. That will not be a small undertaking, so we fix the
obvious bug for now.

Co-authored-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 2fd0c56 ]

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit bf3532e ]

This package is imported as a transitive dependency in cilium-cli which
is built for linux, darwin and windows. Make sure the package compiles
on all these platforms.

Ref. cilium/cilium-cli#958 (comment)
For cilium#16843

Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 933bdcb ]

Signed-off-by: flxman <felix.farjsjo@gmail.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit e3b0095 ]

We do this by removing the extraneous "cilium_" prefix from the metrics
to align with the other metrics names in this file.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit f072dbd ]

This metrics were incorrectly stating that they were enabled by default
which confused users. Fix it to mention they are disabled by default and
must be enabled explicitly via --metrics.

Fixes: 1133bd5 ("docs: Added `Default` column in metrics details")
Fixes: cilium#20255

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 3a5e985 ]

Retrieving objects from caches can be useful to prevent doing useless
requests to kube-apiserver. In the unlikely event that the object
doesn't exist in the local cache Cilium can try to retrieve it from
kube-apiserver directly. For this particular case, with CiliumNode, it
is causing Cilium to fatal as it is unable to retrieve CiliumNode from
the cache, due subsystem initialization issues, thus we will fallback on
retrieving the object directly from kube-apiserver.

In this case, the subsystem initialization issue happened due to the
fact that CiliumNode watcher is blocked on its event handler by the
egressGatewayManager [1] which is blocked by the initialization of the
identity allocator [2]. Unfortunately, the identity allocator is only
initialized at a later stage causing the CiliumNode cache from being
populated with all of its nodes.

[1] https://github.com/cilium/cilium/blob/933bdcbec9319b0148b12688f720fbaaf55e0dba/pkg/k8s/watchers/cilium_node.go#L56
[2] https://github.com/cilium/cilium/blob/933bdcbec9319b0148b12688f720fbaaf55e0dba/pkg/egressgateway/manager.go#L83

Fixes: 69e4c69 ("k8s: optimize API calls made to kube-apiserver")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 6c98f15 ]

When CEP was converted to an internal CEP structure, the UID
field was not copied, causing the delete requests of CEPs to have their
UID precondition set as empty. When kube-apiserver received this delete
request it didn't delete the CEP because an empty CEP UID didn't match
an existent UID.

Fixes: 6f7bf6c ("Prevent CiliumEndpoint removal by non-owning agent")

Reported-by: Bruno Custódio <bruno@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 3a650c3 ]

When we know the encryption interface, we can jump directly from
bpf_host to that interface using bpf_redirect. For that to work, we
however need to rewrite the MAC addresses. This is currently done in
bpf_host with a FIB lookup to retrieve the MAC addresses.

The performance gain we get from that redirect is however expected to be
negligible because we already traversed the stack several times for
IPsec and we also spent a fair amount of cycles just encrypting the
payloads.

This commit therefore removes the redirect and related FIB lookup. This
change makes the logic for IPsec a little simpler (less error cases
without the FIB lookup). It also makes the logic more consistent across
setups (the FIB lookup was currently only possible on AKS & GKE).
Finally, a later change to IPsec will break the FIB lookup on AKS
anyway.

Signed-off-by: Paul Chaignon <paul@cilium.io>
@tklauser
Copy link
Member Author

/test-backport-1.12

@jrajahalme
Copy link
Member

Looks like encryption tests are failing in most cases:

Suite-k8s-1.16.K8sDatapathConfig Transparent encryption DirectRouting Check connectivity with transparent encryption and direct routing
Suite-k8s-1.16.K8sDatapathConfig Transparent encryption DirectRouting Check connectivity with transparent encryption and direct routing with bpf_host

Most likely caused by the last commit.

@pchaigno
Copy link
Member

Looks like encryption tests are failing in most cases:

Suite-k8s-1.16.K8sDatapathConfig Transparent encryption DirectRouting Check connectivity with transparent encryption and direct routing
Suite-k8s-1.16.K8sDatapathConfig Transparent encryption DirectRouting Check connectivity with transparent encryption and direct routing with bpf_host

Most likely caused by the last commit.

They all appear to be known flake #21735 (identified by Mismatch of router IPs found during restoration in logs), which was fixed by #22127 in master. We could pick up the backport here if we want to be sure.

@jrajahalme
Copy link
Member

They all appear to be known flake #21735 (identified by Mismatch of router IPs found during restoration in logs), which was fixed by #22127 in master. We could pick up the backport here if we want to be sure.

Sure, let's do it!

[ upstream commit 0696874 ]

When there is an annotation in the k8s node object, the annotation
`io.cilium.network.ipv4-cilium-host` is used as the CiliumInternal IP
address of the CiliumNode object in [1]. Whenever Cilium is updating any
state into the CiliumNode it retrieves all IP address from k8s node,
including the ones from annotations, and appends the local node's IP
addresses, including the newly correct internal / router IP
address, in [2]. Since this is a list, the annotation's IP address is
always used first and all other Cilium agents will wrongly use it for
any operation.

[1] https://github.com/cilium/cilium/blob/927bd8c26904ff92e42c61cec6d00ea8ac062c05/pkg/nodediscovery/nodediscovery.go#L453-L459
[2] https://github.com/cilium/cilium/blob/927bd8c26904ff92e42c61cec6d00ea8ac062c05/pkg/nodediscovery/nodediscovery.go#L474-L489

Fixes: 73d6cae ("install: default AnnotateK8sNode to false")
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 1e947e9 ]

When using CiliumNode, the agent's source of truth should be the agent
itself and not k8s node annotations. Thus we will not use the
annotations for the CiliumInternalIP address when generating a
CiliumNode from the k8s Node resource.

Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit ee4ea1a ]

We try to restore the router IP both from the filesystem (first) and
from Kubernetes objects (as a fallback). If the two IP addresses don't
match, we emit a warning.

There is no good reason for this to happen in CI so we should fail the
test if that warning ever shows up. Doing so would have prevented the
flake fixed by the previous commit.

Signed-off-by: Paul Chaignon <paul@cilium.io>
@jrajahalme
Copy link
Member

Added backport of #22127

@jrajahalme
Copy link
Member

/test-backport-1.12

@jrajahalme
Copy link
Member

Net next failed with known flake #18054

@jrajahalme
Copy link
Member

ConformanceAKS hitting known flake/bug #21430/#22162

@jrajahalme jrajahalme added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 24, 2022
Copy link
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only known CI flakes, LGTM.

@jrajahalme jrajahalme merged commit 170f197 into cilium:v1.12 Nov 24, 2022
@tklauser tklauser deleted the pr/v1.12-backport-2022-11-22 branch November 25, 2022 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.12 This PR represents a backport for Cilium 1.12.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants