v1.14 Backports 2023-11-07 #29030

Merged
merged 25 commits into v1.14 from pr/v1.14-backport-2023-11-07 on Nov 8, 2023

aanm and others added 5 commits November 7, 2023 10:05
[ upstream commit 5a0b88d ]

Although ipcache might return an error, we should not perform an early
return in the event handling of the pod event, because other components
unrelated to ipcache might depend on this specific event to perform
correctly. Instead, we log the ipcache error and let the execution of
the method continue.

The pod might, for example, be in a "Pending" state, in which its status
IP addresses are not yet set, while its labels can still change in the
meantime. Cilium should react to those label changes, which are not
related to ipcache.
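The control-flow change can be sketched as follows. This is a minimal illustration, not Cilium's actual pod handler: `podEvent`, `upsertIPCache`, and `handlePodUpdate` are hypothetical names. A Pending pod without IPs makes the ipcache step fail, but the label step still runs because the error is only logged.

```go
package main

import (
	"errors"
	"log"
)

// podEvent is a hypothetical, simplified view of a pod update.
type podEvent struct {
	Name   string
	IPs    []string
	Labels map[string]string
}

// errNoPodIPs is returned by the hypothetical ipcache step when the pod has
// no status IPs yet (e.g. it is still Pending).
var errNoPodIPs = errors.New("pod has no IP addresses")

// upsertIPCache stands in for the ipcache update; it fails for pods without IPs.
func upsertIPCache(ev podEvent) error {
	if len(ev.IPs) == 0 {
		return errNoPodIPs
	}
	return nil
}

// handlePodUpdate logs an ipcache failure instead of returning early, so the
// steps that do not depend on ipcache (here: label processing) still run.
// It returns the names of the steps that completed.
func handlePodUpdate(ev podEvent, updateLabels func(map[string]string)) []string {
	var ran []string
	if err := upsertIPCache(ev); err != nil {
		// Before the fix, the handler would have returned err here.
		log.Printf("unable to update ipcache for pod %s: %v", ev.Name, err)
	} else {
		ran = append(ran, "ipcache")
	}
	updateLabels(ev.Labels)
	ran = append(ran, "labels")
	return ran
}
```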

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit da12133 ]

`CertManager` throws a warning with the current Helm chart because the `.spec.privateKey.rotationPolicy` is unset.

```
  Type     Reason               Age   From                                   Message
  ----     ------               ----  ----                                   -------
  Warning  CannotRegenerateKey  12m   cert-manager-certificates-key-manager  User intervention required: existing private key in Secret "hubble-relay-client-certs" does not match requirements on Certificate resource, mismatching fields: [spec.privateKey.algorithm[], but cert-manager cannot create new private key as the Certificate's .spec.privateKey.rotationPolicy is unset or set to Never. To allow cert-manager to create a new private key you can set .spec.privateKey.rotationPolicy to 'Always' (this will result in the private key being regenerated every time a cert is renewed)
```

Signed-off-by: Samuel Lang <gh@lang-sam.de>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 904ceb3 ]

The Cilium standalone LB does not run as a K8s pod, so Cilium's regular
sysdump collection does not work for it. Instead, just show the Docker
container logs of the LB.

Suggested-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 28a3cb7 ]

l4_load_port() is just a thin wrapper around ctx_load_bytes(), which
returns raw kernel errnos. Translate these to a Cilium-internal drop reason
before returning to the caller.
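The real change lives in the BPF C datapath; the Go sketch below only illustrates the pattern of mapping a raw negative errno from a byte-load helper to an internal drop reason. `loadBytes`, `l4LoadPort`, and the `dropReason` values are hypothetical stand-ins, not Cilium's definitions.

```go
package main

import "syscall"

// dropReason is a hypothetical stand-in for Cilium's internal drop codes;
// the names and values here are illustrative only.
type dropReason int

const (
	dropNone    dropReason = 0
	dropInvalid dropReason = -1 // hypothetical "invalid packet" reason
)

// loadBytes mimics a ctx_load_bytes()-style helper: it copies n bytes at off
// out of pkt and reports failure as a negative errno.
func loadBytes(pkt []byte, off, n int) ([]byte, int) {
	if off < 0 || n < 0 || off+n > len(pkt) {
		return nil, -int(syscall.EFAULT)
	}
	return pkt[off : off+n], 0
}

// l4LoadPort translates any raw errno from the load helper into an internal
// drop reason before returning to the caller, so callers never see errnos.
func l4LoadPort(pkt []byte, off int) (uint16, dropReason) {
	b, errno := loadBytes(pkt, off, 2)
	if errno < 0 {
		return 0, dropInvalid
	}
	// L4 ports are stored in network byte order (big-endian).
	return uint16(b[0])<<8 | uint16(b[1]), dropNone
}
```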

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 36b7802 ]

Bump the timeout from 20 to 30 minutes due to repeated workflow
cancellations caused by timeouts on otherwise successful workflow runs.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
@jibi jibi added kind/backports This PR provides functionality previously merged into master. backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. labels Nov 7, 2023
@tklauser tklauser left a comment

My change looks good, thanks!

Review thread on daemon/cmd/state.go (outdated, resolved)
[ upstream commit d221e96 ]

Cilium does not currently support port ranges in
network policies.

Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
@jibi jibi force-pushed the pr/v1.14-backport-2023-11-07 branch from 06a09a7 to 59b76a8 Compare November 7, 2023 13:46
@giorio94 giorio94 left a comment

My PR looks good. Thanks!

@aanm aanm left a comment

LGTM for Thomas' PR

pchaigno and others added 4 commits November 7, 2023 18:26
[ upstream commit df969b7 ]

[ backporter's notes: a few conflicts to deal with, so I just started
fresh and removed all logic for IPsec node encryption ]

Node encryption for IPsec hasn't been supported since 1d2674d ("docs:
ipsec: remove node-to-node encryption") and subsequent commits. The
feature also hadn't been working for several releases.

This commit simply removes the code for that feature. This code is now
unused and makes changes to IPsec slightly more difficult.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 1900f4a ]

[ backporter's notes: a few conflicts to deal with, as the IPsec methods
differ from the ones in main, so I manually moved all the methods
from node.go to ipsec.go ]

This commit has no functional changes.

It simply moves all the linuxNodeHandler functions that pertain to IPsec
to a new file, ipsec.go. This will ease review assignments by ensuring
that we don't require an IPsec review on non-IPsec code and vice versa.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit c2674ae ]

If an Ingress resource with `ingressClass: cilium` is changed to
a different value, the corresponding resources (CEC, Endpoints &
Service) aren't removed (dedicated mode), or the shared
CiliumEnvoyConfig isn't updated (shared mode).

Therefore, this commit propagates the change to the corresponding
resources when the `ingressClass` of an Ingress gets updated from
`cilium` to something else.
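The decision the fix adds can be sketched as a small function. The `action` values and `decide` are hypothetical and do not mirror the Cilium Ingress controller's code; the point is that an Ingress leaving the `cilium` class must trigger cleanup rather than being ignored.

```go
package main

// action is what a hypothetical reconciler does for one Ingress update.
type action string

const (
	actionNone    action = "none"
	actionUpsert  action = "upsert"  // create/update CEC, Endpoints, Service
	actionCleanup action = "cleanup" // delete dedicated resources or drop from the shared CEC
)

const ciliumClass = "cilium"

// decide returns what to do when an Ingress changes class from oldClass to
// newClass. Without the cleanup branch, an Ingress moving away from "cilium"
// would fall through to "none" and leave stale resources behind.
func decide(oldClass, newClass string) action {
	switch {
	case newClass == ciliumClass:
		return actionUpsert
	case oldClass == ciliumClass:
		return actionCleanup
	default:
		return actionNone
	}
}
```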

Fixes: #23781

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 80d99a6 ]

Relates: #18894
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
@jibi jibi requested review from a team as code owners November 7, 2023 17:28
@jibi jibi force-pushed the pr/v1.14-backport-2023-11-07 branch from 8356e21 to 677c1cb Compare November 7, 2023 17:28
giorio94 and others added 4 commits November 7, 2023 18:32
[ upstream commit cf4279c ]

[ backporter's note: removed the Group: syncLBMapsControllerGroup
parameter from the controller as the group is not defined in v1.14 ]

fe4dda7 ("services: prevent temporary connectivity loss on agent
restart") modified the handling of restored backends to prevent
possibly causing temporary connectivity disruption on agent restart if
a service is either associated with multiple endpointslices (e.g.,
it has more than 100 backends, or is dual stack) or has backends
spanning across multiple clusters (i.e., it is a global service).
At a high level, we now keep a list of restored backends, which continue
being merged with the ones we received an update for, until the bootstrap
phase completes. At that point, we trigger an update for each service
still associated with stale backends, so that they can be removed.

One drawback of this approach, though, is that when
clustermesh is enabled we currently wait for full synchronization from
all remote clusters before triggering the removal of stale backends,
regardless of whether the given service is global (i.e., it possibly
also includes remote backends) or not. One specific example in which
such behavior is problematic relates to the clustermesh-apiserver.
Indeed, if it gets restarted at the same time as the agents
(e.g., during an upgrade), the associated service might end up
including both the address of the previous pod (which is now stale)
and that of the new one, which is correct. When kvstoremesh is
enabled, local agents connect to it through that service. In this
case, there's a circular dependency: the agent may pick the stale
backend, and the connection to etcd fails, which in turn prevents the
synchronization from starting, and hence from completing and
triggering the removal of the stale backend. Although this dependency
eventually resolves once a different backend is picked by the service
load-balancing algorithm, unnecessary delay is introduced (the same
could also happen for remote agents connecting through a NodePort if
KPR is enabled).

To remove this dependency, let's perform a two-pass cleanup of stale
backends: the first one as soon as we synchronize with Kubernetes,
targeting non-global services only; the second triggered by full
clustermesh synchronization, covering all remaining ones. Hence,
non-global services can be fully functional even before full
clustermesh synchronization completes. It is worth mentioning that
this fix applies to all non-global services, which can now converge
faster also in the case of large clustermeshes.
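The two-pass cleanup can be sketched as one selection function called twice: once after Kubernetes sync, and again after full clustermesh sync. The `service` type and `staleToRefresh` are hypothetical simplifications, not Cilium's service cache API.

```go
package main

import "sort"

// service is a hypothetical, simplified view of a load-balanced service.
type service struct {
	Name         string
	Global       bool
	StaleBackend bool // still carries backends restored from before the restart
}

// staleToRefresh selects services whose stale backends can be removed now.
// Pass 1 (clustermeshSynced=false) selects only non-global services; pass 2
// (clustermeshSynced=true) selects every remaining one. done records services
// already handled so the second pass does not repeat them.
func staleToRefresh(svcs []service, clustermeshSynced bool, done map[string]bool) []string {
	var out []string
	for _, s := range svcs {
		if !s.StaleBackend || done[s.Name] {
			continue
		}
		if s.Global && !clustermeshSynced {
			continue // remote backends may still be arriving
		}
		done[s.Name] = true
		out = append(out, s.Name)
	}
	sort.Strings(out)
	return out
}
```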

Co-authored-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 66ba850 ]

Refactor the SyncWithK8sFinished function to return the list of services
with stale backends, which should be refreshed, rather than directly
refreshing them. This makes the separation clearer, avoids having to
pass the refresh function as a parameter, and prevents possible
deadlocks due to incorrect mutex locking (caused by the
interdependencies between the service subsystem and the service cache).
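A minimal sketch of why returning a list helps, assuming a hypothetical `serviceCache` guarded by a single `sync.Mutex`. Only the `SyncWithK8sFinished` name comes from the commit; its signature and the `refresh` helper here are illustrative. Because `sync.Mutex` is not reentrant, invoking a refresh callback that takes the same lock while still holding it would deadlock; returning names lets the caller refresh after the lock is released.

```go
package main

import "sync"

// serviceCache is a hypothetical stand-in for the service cache.
type serviceCache struct {
	mu    sync.Mutex
	stale map[string]bool
}

// SyncWithK8sFinished only reports which services need a refresh; it does not
// call back into the service subsystem while holding c.mu.
func (c *serviceCache) SyncWithK8sFinished() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	var out []string
	for name := range c.stale {
		out = append(out, name)
	}
	return out
}

// refresh takes c.mu itself, so it must be called after the lock is released.
func (c *serviceCache) refresh(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.stale, name)
}
```

Usage: call `names := c.SyncWithK8sFinished()`, then loop over `names` calling `c.refresh(n)` outside any critical section.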

Suggested-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 90ad1cd ]

The multi-pool workflow doesn't exercise Envoy in particular, and its
verbose logs make the workflow logs harder to analyze. Disable them.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit d351ae9 ]

Adds a reference to the GitHub issue roadmap for mutual authentication
and provides an overview of the current status of features and what is
planned before the feature can be considered stable.

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
@jibi jibi force-pushed the pr/v1.14-backport-2023-11-07 branch from 677c1cb to d0726cc Compare November 7, 2023 17:32
[ upstream commit e27730b ]

This is useful for XFRM states which do not have a built-in direction
field. Instead, we encode the direction in the packet mark and can
therefore rely on that when logging. The same function can be used for
XFRM policies, even though they do have a built-in Dir field as well.
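Deriving the direction from the mark can be sketched as below. The mark constants are assumptions modeled on Cilium's IPsec route marks (0x0D00 for decryption, 0x0E00 for encryption, matched under a 0x0F00 mask); check the constants in your Cilium tree before relying on them. `directionFromMark` is a hypothetical name.

```go
package main

// Assumed mark values; verify against your Cilium version.
const (
	markDecrypt uint32 = 0x0D00
	markEncrypt uint32 = 0x0E00
	markMask    uint32 = 0x0F00
)

// directionFromMark derives the traffic direction of an XFRM state or policy
// from its packet mark, since XFRM states carry no direction field. Bits
// outside markMask (e.g. other encoded data) are ignored.
func directionFromMark(mark uint32) string {
	switch mark & markMask {
	case markDecrypt:
		return "in"
	case markEncrypt:
		return "out"
	default:
		return "unknown"
	}
}
```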

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 89626bc ]

The SPI and the source and destination IP addresses (or CIDRs for XFRM
policies) are not enough anymore to uniquely identify XFRM states and
policies. We additionally need the node ID.

This commit therefore ensures that we always log the five pieces of
contextual information whenever possible: SPI, source, destination,
direction, and node ID.
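The uniqueness argument can be illustrated with a hypothetical `xfrmKey` that carries all five fields. Two entries that share SPI, source, destination, and direction but belong to different remote nodes produce distinct log strings only because the node ID is included. Field names and the log format are assumptions, and the node ID is printed in decimal here for simplicity.

```go
package main

import "fmt"

// xfrmKey holds the five pieces of context logged for an XFRM state or policy.
type xfrmKey struct {
	SPI    uint32
	Src    string // IP or CIDR
	Dst    string // IP or CIDR
	Dir    string
	NodeID uint16
}

// String renders all five fields, so otherwise identical entries that differ
// only by node ID remain distinguishable in the logs.
func (k xfrmKey) String() string {
	return fmt.Sprintf("spi=%d src=%s dst=%s dir=%s nodeID=%d",
		k.SPI, k.Src, k.Dst, k.Dir, k.NodeID)
}
```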

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 4506c76 ]

The node ID is reported in hexadecimal format in the XFRM states and
policies, as well as in the node ID map dump. To make it easier to match
the node ID across different sources, we should also dump it in hex
format in the agent logs.
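Formatting the node ID in hex is a one-line change with Go's `fmt` verbs. The `0x` prefix and lack of zero-padding below are assumptions for illustration; the exact format in the agent logs may differ. `formatNodeID` is a hypothetical helper.

```go
package main

import "fmt"

// formatNodeID renders a node ID in hexadecimal so it can be matched against
// XFRM state/policy dumps and the node ID map dump.
func formatNodeID(id uint16) string {
	return fmt.Sprintf("0x%x", id)
}
```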

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit fe08772 ]

This function will be used from cilium-dbg so we need to expose it from
a shared package. We already have such a package for IPsec utility
functions in pkg/common/ipsec.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 37b611e ]

The cilium-dbg encrypt flush command removes all XFRM states and
policies on the node. That will lead to packet drops until connections
are reestablished. Traffic will also be sent in plain text between pods.

This commit therefore asks for confirmation when running the command, to
ensure nobody performs this action by mistake.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 5c7cfe6 ]

This is useful, for example, to manually delete the XFRM config
corresponding to an old key. It will warn if the user is about to delete
all XFRM configs, on the assumption that this isn't the intended action,
or the filter wouldn't be necessary.
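The warning condition can be sketched as "the filter selected every entry". `xfrmEntry` and `selectBySPI` are hypothetical; a real implementation would work on netlink XFRM objects.

```go
package main

// xfrmEntry is a hypothetical, minimal XFRM state/policy record.
type xfrmEntry struct {
	SPI uint32
}

// selectBySPI returns the entries matching spi and whether that selection
// covers every entry: the case worth warning about, since a filter that
// matches everything is probably not what the user meant.
func selectBySPI(entries []xfrmEntry, spi uint32) (selected []xfrmEntry, all bool) {
	for _, e := range entries {
		if e.SPI == spi {
			selected = append(selected, e)
		}
	}
	return selected, len(entries) > 0 && len(selected) == len(entries)
}
```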

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit dd8920a ]

Refactor the filterXFRMBySPI function to be able to filter by criteria
other than the SPI without duplicating the main logic. The new function
filterXFRMs takes two predicate functions instead of hardcoding the
comparison to "spi".
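The predicate-based shape can be sketched as follows, assuming one predicate for policies and one for states. The `xfrmPolicy`/`xfrmState` types are hypothetical minimal stand-ins for the netlink XFRM structs, and this `filterXFRMs` signature is illustrative rather than the one in the Cilium tree.

```go
package main

// Hypothetical minimal XFRM types.
type xfrmState struct {
	SPI    uint32
	NodeID uint16
}

type xfrmPolicy struct {
	SPI    uint32
	NodeID uint16
}

// filterXFRMs keeps the policies and states accepted by the given predicates.
// Filtering on a new attribute only needs new predicates, not a copy of the loops.
func filterXFRMs(policies []xfrmPolicy, states []xfrmState,
	policyPred func(xfrmPolicy) bool, statePred func(xfrmState) bool) ([]xfrmPolicy, []xfrmState) {
	var ps []xfrmPolicy
	for _, p := range policies {
		if policyPred(p) {
			ps = append(ps, p)
		}
	}
	var ss []xfrmState
	for _, s := range states {
		if statePred(s) {
			ss = append(ss, s)
		}
	}
	return ps, ss
}
```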

No functional changes in this commit.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit c924bd6 ]

We test both a single call to filterXFRMs and two chained calls. The
latter is needed because filters on different attributes are applied
by chaining calls and are therefore ANDed. For example, filtering on
both the SPI and the node ID should only flush XFRM configs that match
both the given SPI and the given node ID.
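Chaining two filter calls yields AND semantics, as the following sketch shows. `entry`, `filter`, `bySPI`, and `byNodeID` are hypothetical helpers, not functions from the Cilium tree.

```go
package main

// entry is a hypothetical XFRM config record with the two filterable attributes.
type entry struct {
	SPI    uint32
	NodeID uint16
}

// filter keeps the entries accepted by pred.
func filter(in []entry, pred func(entry) bool) []entry {
	var out []entry
	for _, e := range in {
		if pred(e) {
			out = append(out, e)
		}
	}
	return out
}

// bySPI and byNodeID build predicates for one attribute each.
func bySPI(spi uint32) func(entry) bool  { return func(e entry) bool { return e.SPI == spi } }
func byNodeID(id uint16) func(entry) bool { return func(e entry) bool { return e.NodeID == id } }
```

Usage: `filter(filter(all, bySPI(3)), byNodeID(7))` keeps only entries with SPI 3 *and* node ID 7, since the second call only sees what the first one kept.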

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 47e1b3f ]

We will use this function from cilium-dbg in the subsequent commit.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 0e5d3c3 ]

This can be useful to flush the XFRM configs of stale node IDs.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 550b56e ]

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
pchaigno commented Nov 7, 2023

/test-backport-1.14

@sayboras sayboras left a comment

Looks good for my commit ✅

@brb brb left a comment

Thanks!

@pchaigno pchaigno left a comment

My two PRs you backported look good to me. Thanks!

@jibi jibi merged commit feb917a into v1.14 Nov 8, 2023
204 of 205 checks passed
@jibi jibi deleted the pr/v1.14-backport-2023-11-07 branch November 8, 2023 15:10