v1.14 Backports 2023-11-07 #29030

jibi · 2023-11-07T11:27:21Z

Once this PR is merged, a GitHub action will update the labels of these PRs:

 28840 28787 28808 28884 28923 28704 28898 28941 28886 28927 28745 28966 29006 28642 28795

[ upstream commit 5a0b88d ] Although ipcache might return an error, we should not return early while handling the pod event, because other components that are unrelated to ipcache might depend on this specific event to work correctly. Instead, we log the ipcache error and let the method continue executing. For example, the pod might be in a "Pending" state, in which its status IP addresses are not yet set, but its labels might still change in the meantime. Cilium should react to those label changes, which are unrelated to ipcache. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit da12133 ] `CertManager` throws a warning with the current Helm chart because the `.spec.privateKey.rotationPolicy` is unset.

```
Type     Reason               Age  From                                   Message
----     ------               ---- ----                                   -------
Warning  CannotRegenerateKey  12m  cert-manager-certificates-key-manager  User intervention required: existing private key in Secret "hubble-relay-client-certs" does not match requirements on Certificate resource, mismatching fields: [spec.privateKey.algorithm[], but cert-manager cannot create new private key as the Certificate's .spec.privateKey.rotationPolicy is unset or set to Never. To allow cert-manager to create a new private key you can set .spec.privateKey.rotationPolicy to 'Always' (this will result in the private key being regenerated every time a cert is renewed)
```

Signed-off-by: Samuel Lang <gh@lang-sam.de> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit 904ceb3 ] The Cilium standalone LB does not run as a K8s pod, so Cilium's regular sysdump collection does not work for it. Instead, just show the Docker container logs of the LB. Suggested-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit 28a3cb7 ] l4_load_port() is just a thin wrapper around ctx_load_bytes(), which returns raw kernel errnos. Translate these to a Cilium-internal drop reason before returning to the caller. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
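
The translation can be illustrated with a small Go model of the BPF logic (the real code is C in the datapath). This is a sketch under stated assumptions: `loadBytes` stands in for ctx_load_bytes() and returns 0 or a negative errno, and `dropInvalid` is a hypothetical drop reason with a made-up value; which drop reason the upstream commit actually uses is not stated here.

```go
package main

import "fmt"

// efault is EFAULT's value on Linux (14); loadBytes returns it negated,
// the way BPF helpers report errors.
const efault = 14

type dropReason int

// Hypothetical values for illustration only; the datapath defines its own
// DROP_* codes.
const (
	dropNone    dropReason = 0
	dropInvalid dropReason = 1
)

// loadBytes models ctx_load_bytes(): copy len(dst) bytes from pkt at off,
// returning 0 on success or a negative kernel errno.
func loadBytes(pkt []byte, off int, dst []byte) int {
	if off < 0 || off+len(dst) > len(pkt) {
		return -efault
	}
	copy(dst, pkt[off:])
	return 0
}

// l4LoadPort models the idea behind l4_load_port(): read a 16-bit
// big-endian port and never leak the raw errno to the caller.
func l4LoadPort(pkt []byte, off int) (uint16, dropReason) {
	var buf [2]byte
	if ret := loadBytes(pkt, off, buf[:]); ret < 0 {
		return 0, dropInvalid // translated, Cilium-internal reason
	}
	return uint16(buf[0])<<8 | uint16(buf[1]), dropNone
}

func main() {
	pkt := []byte{0x1f, 0x90, 0x00, 0x50} // ports 8080 and 80
	port, reason := l4LoadPort(pkt, 0)
	fmt.Println(port, reason) // 8080 0
	_, reason = l4LoadPort(pkt, 3)
	fmt.Println(reason == dropInvalid) // true
}
```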

[ upstream commit 36b7802 ] Bump the timeout from 20 to 30 minutes, as otherwise successful workflow runs were repeatedly cancelled after hitting the timeout. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

tklauser

My change looks good, thanks!

daemon/cmd/state.go

[ upstream commit d221e96 ] Cilium does not currently support port ranges in network policies. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
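
A minimal Go sketch of warning on policies that set `endPort`. The `networkPolicyPort` type is a local stand-in that only mirrors the shape of the Kubernetes NetworkPolicyPort (in the real API, `port` is an int-or-string); the function name and log wording are hypothetical.

```go
package main

import (
	"fmt"
	"log"
)

// networkPolicyPort is a simplified local stand-in for the Kubernetes
// NetworkPolicyPort: a port plus an optional endPort.
type networkPolicyPort struct {
	Port    int32
	EndPort *int32
}

// warnOnEndPort logs a warning for every port that sets endPort, since port
// ranges are not supported, and returns how many such ports it found.
func warnOnEndPort(policy string, ports []networkPolicyPort) int {
	n := 0
	for _, p := range ports {
		if p.EndPort != nil {
			log.Printf("policy %s: port range %d-%d (endPort) is not supported", policy, p.Port, *p.EndPort)
			n++
		}
	}
	return n
}

func main() {
	end := int32(8090)
	ports := []networkPolicyPort{{Port: 80}, {Port: 8080, EndPort: &end}}
	fmt.Println(warnOnEndPort("default/allow-web", ports)) // 1
}
```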

giorio94

My PR looks good. Thanks!

aanm

LGTM for Thomas' PR

[ upstream commit df969b7 ] [ backporter's notes: a few conflicts to deal with; I just started fresh and removed all logic for IPsec node encryption ] Node encryption for IPsec hasn't been supported since 1d2674d ("docs: ipsec: remove node-to-node encryption") and subsequent commits. The feature also hadn't been working for several releases. This commit simply removes the code for that feature. This code has no use now and makes changes to IPsec slightly more difficult. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit 1900f4a ] [ backporter's notes: a few conflicts to deal with, as the IPsec methods differ from the ones in main, so I manually moved all the methods from node.go to ipsec.go ] This commit has no functional changes. It simply moves all the linuxNodeHandler functions that pertain to IPsec to a new file, ipsec.go. This will ease review assignments by ensuring that we don't require an IPsec review on non-IPsec code and vice versa. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit c2674ae ] If the `ingressClass` of an Ingress resource is changed from `cilium` to a different value, the corresponding resources (CEC, Endpoints & Service) aren't removed (dedicated mode), or the shared CiliumEnvoyConfig isn't updated (shared mode). Therefore, this commit reflects the change on the corresponding resources when the `ingressClass` of an Ingress is updated from `cilium` to something else. Fixes: #23781 Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
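
The decision the fix adds can be sketched in Go as follows. The `ingress` type, the `action` values, and `onIngressUpdate` are hypothetical simplifications, not the Cilium Ingress controller's code: when the class moves away from `cilium`, previously managed resources are cleaned up according to the load-balancer mode instead of being left behind.

```go
package main

import "fmt"

const ciliumIngressClass = "cilium"

// ingress is a hypothetical, simplified Ingress object.
type ingress struct {
	Name      string
	ClassName string
	Dedicated bool // dedicated vs. shared load-balancer mode
}

// action describes what a reconciler should do; the values are made up.
type action string

const (
	actionNone             action = "none"
	actionUpsert           action = "upsert"
	actionDeleteDedicated  action = "delete CEC, Endpoints and Service"
	actionRemoveFromShared action = "remove from shared CiliumEnvoyConfig"
)

// onIngressUpdate returns the reaction to an Ingress update. The case the
// commit addresses is the class moving away from "cilium".
func onIngressUpdate(oldIng, newIng ingress) action {
	wasOurs := oldIng.ClassName == ciliumIngressClass
	isOurs := newIng.ClassName == ciliumIngressClass
	switch {
	case isOurs:
		return actionUpsert
	case wasOurs && oldIng.Dedicated:
		return actionDeleteDedicated
	case wasOurs:
		return actionRemoveFromShared
	default:
		return actionNone
	}
}

func main() {
	oldIng := ingress{Name: "web", ClassName: "cilium", Dedicated: true}
	newIng := ingress{Name: "web", ClassName: "nginx", Dedicated: true}
	fmt.Println(onIngressUpdate(oldIng, newIng)) // delete CEC, Endpoints and Service
}
```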

[ upstream commit 80d99a6 ] Relates: #18894 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit cf4279c ] [ backporter's note: removed the Group: syncLBMapsControllerGroup parameter from the controller, as the group is not defined in v1.14 ]

fe4dda7 ("services: prevent temporary connectivity loss on agent restart") modified the handling of restored backends to prevent possible temporary connectivity disruption on agent restart if a service is either associated with multiple endpointslices (e.g., it has more than 100 backends, or is dual stack) or has backends spanning multiple clusters (i.e., it is a global service). At a high level, we now keep a list of restored backends, which continue to be merged with the ones we received an update for, until the bootstrap phase completes. At that point, we trigger an update for each service still associated with stale backends, so that they can be removed.

One drawback of this approach, though, is that when clustermesh is enabled we currently wait for full synchronization from all remote clusters before triggering the removal of stale backends, regardless of whether the given service is global (i.e., possibly also includes remote backends) or not. One specific example in which this behavior is problematic relates to the clustermesh-apiserver. Indeed, if it gets restarted at the same time as the agents (e.g., during an upgrade), the associated service might end up including both the address of the previous pod (which is now stale) and that of the new one, which is correct. When kvstoremesh is enabled, local agents connect to it through that service. In this case, there's a circular dependency: the agent may pick the stale backend, and the connection to etcd fails, which in turn prevents the synchronization from starting and, eventually, from completing and triggering the removal of the stale backend. Although this dependency eventually resolves when the service load-balancing algorithm picks a different backend, it introduces unnecessary delay (the same could also happen for remote agents connecting through a NodePort if KPR is enabled).

To remove this dependency, let's perform a two-pass cleanup of stale backends: the first as soon as we synchronize with Kubernetes, targeting non-global services only; the second triggered by full clustermesh synchronization, covering all the remaining ones. Hence, non-global services can be fully functional even before the full clustermesh synchronization completes. It is worth mentioning that this fix applies to all non-global services, which can now converge faster also in large clustermeshes. Co-authored-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
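
The two-pass cleanup can be sketched in Go with hypothetical types (`service`, `staleCleaner`, `OnK8sSynced`, `OnClusterMeshSynced` are illustrative names, not Cilium's API). The first pass runs once Kubernetes is synced and skips global services; the second runs after the full clustermesh sync and handles whatever is left.

```go
package main

import (
	"fmt"
	"sort"
)

// service is a hypothetical, simplified view of a load-balanced service.
type service struct {
	Name          string
	Global        bool     // backends may also come from remote clusters
	StaleBackends []string // restored backends not confirmed since restart
}

type staleCleaner struct {
	services map[string]*service
}

// cleanup drops stale backends from services accepted by include and
// returns the (sorted) names of the services it refreshed.
func (c *staleCleaner) cleanup(include func(*service) bool) []string {
	var refreshed []string
	for name, svc := range c.services {
		if len(svc.StaleBackends) == 0 || !include(svc) {
			continue
		}
		svc.StaleBackends = nil
		refreshed = append(refreshed, name)
	}
	sort.Strings(refreshed)
	return refreshed
}

// OnK8sSynced is the first pass: non-global services only.
func (c *staleCleaner) OnK8sSynced() []string {
	return c.cleanup(func(s *service) bool { return !s.Global })
}

// OnClusterMeshSynced is the second pass: all remaining services.
func (c *staleCleaner) OnClusterMeshSynced() []string {
	return c.cleanup(func(*service) bool { return true })
}

func main() {
	c := &staleCleaner{services: map[string]*service{
		"clustermesh-apiserver": {Name: "clustermesh-apiserver", StaleBackends: []string{"10.0.0.5"}},
		"global-echo":           {Name: "global-echo", Global: true, StaleBackends: []string{"10.1.0.7"}},
	}}
	fmt.Println(c.OnK8sSynced())         // [clustermesh-apiserver]
	fmt.Println(c.OnClusterMeshSynced()) // [global-echo]
}
```

In this model, the clustermesh-apiserver service loses its stale backend in the first pass, so agents no longer depend on the clustermesh sync to reach it.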

[ upstream commit 66ba850 ] Refactor the SyncWithK8sFinished function to return the list of services with stale backends, which should be refreshed, rather than refreshing them directly. This makes the separation clearer, avoids having to pass the refresh function as a parameter, and prevents possible deadlocks caused by incorrect mutex locking (due to the interdependencies between the service subsystem and the service cache). Suggested-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
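
A Go sketch of why returning the list helps. The `serviceCache` type and the signatures here are simplified and hypothetical (only the name SyncWithK8sFinished comes from the commit): the cache computes the list under its lock and releases it, and the caller refreshes afterwards. Calling a refresh that needs the same lock while still holding it would deadlock, because `sync.Mutex` is not reentrant.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// serviceCache is a hypothetical stand-in for the service cache.
type serviceCache struct {
	mu    sync.Mutex
	stale map[string]bool // services with stale backends
}

// SyncWithK8sFinished (simplified signature) only reports which services
// need a refresh; the lock is released before it returns.
func (sc *serviceCache) SyncWithK8sFinished() []string {
	sc.mu.Lock()
	defer sc.mu.Unlock()
	var out []string
	for name := range sc.stale {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

// refresh models a service-subsystem callback that takes the cache lock
// itself; invoking it from inside SyncWithK8sFinished would deadlock.
func (sc *serviceCache) refresh(name string) {
	sc.mu.Lock()
	defer sc.mu.Unlock()
	delete(sc.stale, name)
}

func main() {
	sc := &serviceCache{stale: map[string]bool{"svc-a": true, "svc-b": true}}
	for _, name := range sc.SyncWithK8sFinished() { // lock already released
		sc.refresh(name)
	}
	fmt.Println(len(sc.stale)) // 0
}
```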

[ upstream commit 90ad1cd ] The multi-pool workflow doesn't exercise Envoy in particular, and its verbose logs make the other logs harder to analyze. Disable them. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit d351ae9 ] Adds a reference to the GitHub issue roadmap for mutual authentication and provides an overview of the current status of its features and of what is planned before the feature can be considered stable. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit e27730b ] This is useful for XFRM states which do not have a built-in direction field. Instead, we encode the direction in the packet mark and can therefore rely on that when logging. The same function can be used for XFRM policies, even though they do have a built-in Dir field as well. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
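
A Go sketch of deriving the direction from the packet mark. The mark layout is an assumption: it uses 0x0E00 for encryption (outgoing) and 0x0D00 for decryption (incoming) under the mask 0x0F00, which should be checked against Cilium's linux_defaults before relying on it. The function name `xfrmDirFromMark` is hypothetical.

```go
package main

import "fmt"

// Assumed mark layout (verify against Cilium's linux_defaults): bits 8-11
// of the XFRM mark carry the encrypt/decrypt direction.
const (
	markMask    uint32 = 0x0F00
	markDecrypt uint32 = 0x0D00
	markEncrypt uint32 = 0x0E00
)

// xfrmDirFromMark returns a direction label for logging, for XFRM states
// that have no direction field of their own.
func xfrmDirFromMark(mark uint32) string {
	switch mark & markMask {
	case markEncrypt:
		return "out"
	case markDecrypt:
		return "in"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(xfrmDirFromMark(0x0002_0E00)) // out
	fmt.Println(xfrmDirFromMark(0x0D00))      // in
}
```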

[ upstream commit 89626bc ] The SPI and the source and destination IP addresses (or CIDRs for XFRM policies) are no longer enough to uniquely identify XFRM states and policies. We additionally need the node ID. This commit therefore ensures that we always log the following five pieces of context whenever possible: SPI, source, destination, direction, and node ID. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

[ upstream commit 4506c76 ] The node ID is reported in hexadecimal format in the XFRM states and policies, as well as in the node ID map dump. To make it easier to match the node ID across different sources, we should also dump it in hex format in the agent logs. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
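
A Go sketch that combines the two commits above: it formats the five contextual fields and renders the node ID in hex. The helper name, field names, and plain-string output are assumptions; the agent itself uses a structured logger.

```go
package main

import (
	"fmt"
	"strings"
)

// xfrmLogFields builds the five contextual fields for an XFRM log line.
// The node ID is printed with %#x (e.g. "0x2a") so it can be matched
// against the hex values in XFRM output and the node ID map dump.
func xfrmLogFields(spi uint32, src, dst, dir string, nodeID uint16) string {
	fields := []string{
		fmt.Sprintf("spi=%d", spi),
		"src=" + src,
		"dst=" + dst,
		"dir=" + dir,
		fmt.Sprintf("nodeID=%#x", nodeID),
	}
	return strings.Join(fields, " ")
}

func main() {
	fmt.Println(xfrmLogFields(3, "10.0.1.0/24", "10.0.2.0/24", "out", 42))
	// spi=3 src=10.0.1.0/24 dst=10.0.2.0/24 dir=out nodeID=0x2a
}
```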

[ upstream commit fe08772 ] This function will be used from cilium-dbg, so we need to expose it from a shared package. We already have such a package for IPsec utility functions in pkg/common/ipsec. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

[ upstream commit 37b611e ] The cilium-dbg encrypt flush command removes all XFRM states and policies on the node. That will lead to packet drops until connections are reestablished. Traffic will also be sent in plain text between pods. This commit therefore asks for confirmation when running the command, to ensure nobody performs this action by mistake. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

[ upstream commit 5c7cfe6 ] This is useful, for example, to manually delete the XFRM config corresponding to an old key. It will warn if the user is about to delete all XFRM configs, on the assumption that this isn't the intended action (otherwise, the filter wouldn't be necessary). Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
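
A Go sketch of an SPI filter that detects the "would delete everything" case. The `xfrmState` type and `selectBySPI` are hypothetical simplifications, and the warning text is illustrative.

```go
package main

import (
	"fmt"
	"log"
)

// xfrmState is a hypothetical, trimmed-down XFRM state.
type xfrmState struct {
	SPI uint32
}

// selectBySPI returns the states matching spi, plus whether the selection
// covers every state (the case that deserves a warning).
func selectBySPI(states []xfrmState, spi uint32) (selected []xfrmState, all bool) {
	for _, s := range states {
		if s.SPI == spi {
			selected = append(selected, s)
		}
	}
	all = len(selected) > 0 && len(selected) == len(states)
	return selected, all
}

func main() {
	states := []xfrmState{{SPI: 3}, {SPI: 4}, {SPI: 3}}
	if sel, all := selectBySPI(states, 3); all {
		log.Printf("warning: SPI 3 matches all %d XFRM states", len(states))
	} else {
		fmt.Printf("would delete %d of %d states\n", len(sel), len(states))
	}
}
```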

[ upstream commit dd8920a ] Refactor the filterXFRMBySPI function so that it can filter by things other than the SPI without duplicating the main logic. The new function filterXFRMs takes two predicate functions instead of hardcoding the comparison to "spi". No functional changes in this commit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

[ upstream commit c924bd6 ] We test both a single call to filterXFRMs and two chained calls. The latter is needed because we will chain calls for different filters, which are ANDed. For example, filtering on both the SPI and the node ID should only flush XFRM configs that match both the given SPI and the given node ID. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
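
A Go sketch of predicate-based filtering and of how chaining two calls ANDs the filters. The function name follows the commits above, but the signature and the trimmed-down `xfrmState`/`xfrmPolicy` types are assumptions (real XFRM policies carry the SPI in their templates, for instance).

```go
package main

import "fmt"

// Hypothetical, trimmed-down XFRM state and policy types.
type xfrmState struct {
	SPI    uint8
	NodeID uint16
}

type xfrmPolicy struct {
	SPI    uint8
	NodeID uint16
}

// filterXFRMs keeps the states and policies accepted by their respective
// predicates (sketch of the idea, not the upstream signature).
func filterXFRMs(states []xfrmState, policies []xfrmPolicy,
	keepState func(xfrmState) bool, keepPolicy func(xfrmPolicy) bool) ([]xfrmState, []xfrmPolicy) {
	var fs []xfrmState
	for _, s := range states {
		if keepState(s) {
			fs = append(fs, s)
		}
	}
	var fp []xfrmPolicy
	for _, p := range policies {
		if keepPolicy(p) {
			fp = append(fp, p)
		}
	}
	return fs, fp
}

func main() {
	states := []xfrmState{{SPI: 3, NodeID: 1}, {SPI: 3, NodeID: 2}, {SPI: 4, NodeID: 2}}
	policies := []xfrmPolicy{{SPI: 3, NodeID: 2}, {SPI: 4, NodeID: 1}}

	// Chaining two calls ANDs the filters: first SPI 3, then node ID 2.
	s, p := filterXFRMs(states, policies,
		func(st xfrmState) bool { return st.SPI == 3 },
		func(pol xfrmPolicy) bool { return pol.SPI == 3 })
	s, p = filterXFRMs(s, p,
		func(st xfrmState) bool { return st.NodeID == 2 },
		func(pol xfrmPolicy) bool { return pol.NodeID == 2 })
	fmt.Println(s, p) // [{3 2}] [{3 2}]
}
```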

[ upstream commit 47e1b3f ] We will use this function from cilium-dbg in the subsequent commit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
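
The function being moved, getNodeIDFromXfrmMark, is named in the corresponding commit title in the timeline. Below is a Go sketch of what such a helper can look like, assuming the node ID occupies the upper 16 bits of the mark value. The local `xfrmMark` type mirrors the Value/Mask pair of netlink's XfrmMark so the example runs without dependencies, and the name `nodeIDFromXfrmMark` is hypothetical.

```go
package main

import "fmt"

// xfrmMark mirrors the value/mask pair of an XFRM mark.
type xfrmMark struct {
	Value uint32
	Mask  uint32
}

// nodeIDFromXfrmMark assumes the node ID lives in the upper 16 bits of the
// mark value; a missing mark yields 0.
func nodeIDFromXfrmMark(mark *xfrmMark) uint16 {
	if mark == nil {
		return 0
	}
	return uint16(mark.Value >> 16)
}

func main() {
	m := &xfrmMark{Value: 0x002a_0e00, Mask: 0xffff_0f00}
	fmt.Printf("%#x\n", nodeIDFromXfrmMark(m)) // 0x2a
}
```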

[ upstream commit 0e5d3c3 ] This can be useful to flush the XFRM configs of stale node IDs. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

[ upstream commit 550b56e ] Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
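
The node-ID flag and its parseNodeID test (commit titles in the timeline) suggest a parser for user input. Below is a hypothetical Go sketch; the exact input formats the real parseNodeID accepts are not stated here. It uses `strconv.ParseUint` with base 0 so that both decimal ("42") and hex ("0x2a", the form used in agent logs) are accepted, and bit size 16 so that out-of-range values are rejected.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseNodeID (hypothetical sketch) parses a node ID given on the command
// line. Base 0 accepts "42" as well as "0x2a"; bitSize 16 rejects > 0xffff.
func parseNodeID(s string) (uint16, error) {
	v, err := strconv.ParseUint(s, 0, 16)
	if err != nil {
		return 0, fmt.Errorf("invalid node ID %q: %w", s, err)
	}
	return uint16(v), nil
}

func main() {
	for _, in := range []string{"42", "0x2a", "70000"} {
		id, err := parseNodeID(in)
		fmt.Println(id, err)
	}
}
```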

pchaigno · 2023-11-07T22:17:52Z

/test-backport-1.14

sayboras

Looks good for my commit ✅

brb

Thanks!

pchaigno

My two PRs you backported look good to me. Thanks!

aanm and others added 5 commits November 7, 2023 10:05

jibi added kind/backports backport/1.14 labels Nov 7, 2023

jibi requested review from sayboras, tklauser, brb, nathanjsweet, pchaigno, jrajahalme, mhofstetter, giorio94, aanm and julianwiedmann November 7, 2023 11:27

aanm approved these changes Nov 7, 2023

tklauser approved these changes Nov 7, 2023

mhofstetter approved these changes Nov 7, 2023

jibi commented Nov 7, 2023

daemon/cmd/state.go Outdated

k8s: Log Warning for Policies that Support "EndPort"

0eeedc3

jibi force-pushed the pr/v1.14-backport-2023-11-07 branch from 06a09a7 to 59b76a8 November 7, 2023 13:46

This comment was marked as outdated.

giorio94 approved these changes Nov 7, 2023

aanm approved these changes Nov 7, 2023

pchaigno and others added 4 commits November 7, 2023 18:26

bpf: Add TC_ACT_REDIRECT check for nodeport

9469cca

jibi requested review from a team as code owners November 7, 2023 17:28

jibi force-pushed the pr/v1.14-backport-2023-11-07 branch from 8356e21 to 677c1cb November 7, 2023 17:28

giorio94 and others added 4 commits November 7, 2023 18:32

jibi force-pushed the pr/v1.14-backport-2023-11-07 branch from 677c1cb to d0726cc November 7, 2023 17:32

julianwiedmann approved these changes Nov 7, 2023

pchaigno added 11 commits November 7, 2023 19:23

ipsec: Move getNodeIDFromXfrmMark to pkg/common

c04e850

cmd: New flag to flush only XFRM configs for a given node ID

2182659

cmd: Unit test for parseNodeID

b963ac9

pchaigno force-pushed the pr/v1.14-backport-2023-11-07 branch from 1ffd5c2 to b963ac9 November 7, 2023 20:24

sayboras approved these changes Nov 8, 2023

squeed approved these changes Nov 8, 2023

brb approved these changes Nov 8, 2023

pchaigno approved these changes Nov 8, 2023

nathanjsweet approved these changes Nov 8, 2023

jibi merged commit feb917a into v1.14 Nov 8, 2023
204 of 205 checks passed

jibi deleted the pr/v1.14-backport-2023-11-07 branch November 8, 2023 15:10

thorn3r mentioned this pull request Nov 9, 2023

Prepare for release v1.14.4 #29093

Merged

jibi commented Nov 7, 2023 • edited by nathanjsweet