v1.14 Backports 2023-11-07 #29030
[ upstream commit 5a0b88d ] Although ipcache might return an error, we should not return early from the pod event handler, because other components that are unrelated to ipcache might depend on this specific event to perform correctly. Instead, we log the ipcache error and let the method continue. The pod might, for example, be in a "Pending" state in which its status IP addresses are not set, yet its labels can still change in the meantime, and Cilium should react to those changes, which are unrelated to ipcache. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
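A minimal sketch of the log-and-continue pattern described above. The `Pod` type, `updateIPCache`, and `handlePodUpdate` are hypothetical names chosen for illustration and are not taken from the Cilium code base.

```go
package main

import (
	"errors"
	"log"
)

// Pod is a hypothetical, simplified stand-in for a Kubernetes pod object.
type Pod struct {
	Name   string
	IPs    []string
	Labels map[string]string
}

// updateIPCache is a hypothetical helper that fails when the pod has no
// status IPs yet, for example while it is still Pending.
func updateIPCache(p Pod) error {
	if len(p.IPs) == 0 {
		return errors.New("pod has no status IPs")
	}
	return nil
}

// handlePodUpdate logs an ipcache failure instead of returning early, so the
// label handling below still runs. It reports whether labels were processed.
func handlePodUpdate(p Pod, onLabels func(map[string]string)) bool {
	if err := updateIPCache(p); err != nil {
		log.Printf("ipcache update failed for pod %s, continuing: %v", p.Name, err)
	}
	onLabels(p.Labels)
	return true
}

func main() {
	pending := Pod{Name: "web-0", Labels: map[string]string{"app": "web"}}
	handlePodUpdate(pending, func(l map[string]string) {
		log.Printf("labels processed: %v", l)
	})
}
```

With an early `return err` after the ipcache call, the label callback would never run for the Pending pod in `main`.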
[ upstream commit da12133 ] `CertManager` throws a warning with the current Helm chart because the `.spec.privateKey.rotationPolicy` is unset.
```
Type     Reason               Age   From                                   Message
----     ------               ----  ----                                   -------
Warning  CannotRegenerateKey  12m   cert-manager-certificates-key-manager  User intervention required: existing private key in Secret "hubble-relay-client-certs" does not match requirements on Certificate resource, mismatching fields: [spec.privateKey.algorithm[], but cert-manager cannot create new private key as the Certificate's .spec.privateKey.rotationPolicy is unset or set to Never. To allow cert-manager to create a new private key you can set .spec.privateKey.rotationPolicy to 'Always' (this will result in the private key being regenerated every time a cert is renewed)
```
Signed-off-by: Samuel Lang <gh@lang-sam.de> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 904ceb3 ] The Cilium standalone LB does not run as a K8s pod, so the regular Cilium's sysdump collection does not work. Instead, just show docker container logs of the LB. Suggested-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 28a3cb7 ] l4_load_port() is just a thin wrapper around ctx_load_bytes(), which returns raw kernel errnos. Translate these to a Cilium-internal drop reason before returning to the caller. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 36b7802 ] Bump timeout from 20 to 30 minutes due to repeated workflow cancellations due to timeout on otherwise successful workflow runs. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
My change looks good, thanks!
[ upstream commit d221e96 ] Cilium does not currently support port ranges in network policies. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
06a09a7 to 59b76a8
My PR looks good. Thanks!
LGTM for Thomas' PR
[ upstream commit df969b7 ] [ backporter's notes: a few conflicts to deal with, I just started fresh and removed all logic for IPSec node encryption ] Node encryption for IPsec hasn't been supported since 1d2674d ("docs: ipsec: remove node-to-node encryption") and subsequent commits. The feature also hasn't been working for several releases. This commit simply removes the code for that feature. This code has no use now and makes changes to IPsec slightly more difficult. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 1900f4a ] [ backporter's notes: a few conflicts to deal with, as the IPSec methods differ from the ones in main, so I just manually moved all the methods from node.go to ipsec.go ] This commit has no functional changes. It simply moves all the linuxNodeHandler functions that pertain to IPsec to a new file, ipsec.go. This will ease review assignments by ensuring that we don't require an IPsec review on non-IPsec code and vice versa. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit c2674ae ] If an Ingress resource with `ingressClass: cilium` is changed to a different value, the corresponding resources (CEC, Endpoints & Service) aren't removed (mode dedicated) or the shared CiliumEnvoyConfig isn't updated (mode shared). Therefore, this commit reflects the changes on the corresponding resources when the `ingressClass` of an Ingress gets updated from `cilium` to something else. Fixes: #23781 Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
8356e21 to 677c1cb
[ upstream commit cf4279c ] [ backporter's note: removed the Group: syncLBMapsControllerGroup parameter from the controller as the group is not defined in v1.14 ] fe4dda7 ("services: prevent temporary connectivity loss on agent restart") modified the handling of restored backends to prevent possibly causing temporary connectivity disruption on agent restart if a service is either associated with multiple endpointslices (e.g., it has more than 100 backends, or is dual stack) or has backends spanning across multiple clusters (i.e., it is a global service). At a high level, we now keep a list of restored backends, which continue being merged with the ones we received an update for, until the bootstrap phase completes. At that point, we trigger an update for each service still associated with stale backends, so that they can be removed. One drawback associated with this approach, though, is that when clustermesh is enabled we currently wait for full synchronization from all remote clusters before triggering the removal of stale backends, regardless of whether the given service is global (i.e., possibly includes also remote backends) or not. One specific example in which such behavior is problematic relates to the clustermesh-apiserver. Indeed, if it gets restarted at the same time as the agents (e.g., during an upgrade), the associated service might end up including both the address of the previous pod (which is now stale) and that of the new one, which is correct. When kvstoremesh is enabled, local agents connect to it through that service. In this case, there's a circular dependency: the agent may pick the stale backend, and the connection to etcd fails, which in turn prevents the synchronization from starting and eventually completing, which is what would trigger the removal of the stale backend. 
Although this dependency eventually resolves as a different backend is picked by the service load-balancing algorithm, unnecessary delay is introduced (the same could also happen for remote agents connecting through a NodePort if KPR is enabled). To remove this dependency, let's perform a two-pass cleanup of stale backends: the first one as soon as we synchronize with Kubernetes, targeting non-global services only; the second triggered by full clustermesh synchronization, covering all remaining ones. Hence, non-global services can be fully functional also before the completion of the full clustermesh synchronization. It is worth mentioning that this fix applies to all non-global services, which can now converge faster also in case of large clustermeshes. Co-authored-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
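The two-pass cleanup described above can be sketched roughly as follows. The `service` type and `cleanupStale` function are hypothetical simplifications for illustration, not the actual Cilium service cache API.

```go
package main

import "fmt"

// service is a hypothetical, simplified view of a load-balanced service.
type service struct {
	name   string
	global bool     // true if backends may live in remote clusters
	stale  []string // restored backends not confirmed by any update yet
}

// cleanupStale drops the stale backends of every service accepted by match
// and returns the names of the services it touched.
func cleanupStale(svcs []*service, match func(*service) bool) []string {
	var cleaned []string
	for _, s := range svcs {
		if len(s.stale) > 0 && match(s) {
			s.stale = nil
			cleaned = append(cleaned, s.name)
		}
	}
	return cleaned
}

func main() {
	svcs := []*service{
		{name: "clustermesh-apiserver", stale: []string{"10.0.0.1"}},
		{name: "shop", global: true, stale: []string{"10.1.0.7"}},
	}
	// Pass 1: right after Kubernetes sync, non-global services only.
	first := cleanupStale(svcs, func(s *service) bool { return !s.global })
	// Pass 2: after full clustermesh sync, everything that remains.
	second := cleanupStale(svcs, func(*service) bool { return true })
	fmt.Println(first, second) // [clustermesh-apiserver] [shop]
}
```

In this sketch the non-global `clustermesh-apiserver` service loses its stale backend in the first pass, without waiting for any remote cluster.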
[ upstream commit 66ba850 ] Refactor the SyncWithK8sFinished function to return the list of services with stale backends, which should be refreshed, rather than directly refreshing them. This makes the separation clearer, removes the need to pass the refresh function as a parameter, and prevents possible deadlocks caused by incorrect mutex locking (due to the interdependencies between the service subsystem and service cache). Suggested-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
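A rough illustration of why returning the list can be safer than invoking a callback under a lock. `serviceCache`, `syncFinished`, and `refresh` are hypothetical names; the real SyncWithK8sFinished signature is not reproduced here.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// serviceCache is a hypothetical stand-in for the service subsystem state.
type serviceCache struct {
	mu    sync.Mutex
	stale map[string]bool // service name -> still has stale backends
}

// syncFinished collects the services that need a refresh while holding the
// lock, and returns them instead of invoking a refresh callback itself.
func (c *serviceCache) syncFinished() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	var out []string
	for name, isStale := range c.stale {
		if isStale {
			out = append(out, name)
			c.stale[name] = false
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

// refresh takes the same non-reentrant lock, so calling it from inside
// syncFinished would deadlock.
func (c *serviceCache) refresh(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	fmt.Println("refreshing", name)
}

func main() {
	c := &serviceCache{stale: map[string]bool{"svc-a": true, "svc-b": false}}
	// The caller refreshes only after syncFinished has released the lock.
	for _, name := range c.syncFinished() {
		c.refresh(name)
	}
}
```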
[ upstream commit 90ad1cd ] The multi-pool workflow doesn't exercise envoy in particular and the verbose logs make it harder to analyze logs. Disable them. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit d351ae9 ] Adds a reference to the GitHub issue roadmap for the mutual authentication and provides an overview of the current status of features and what is planned before the feature can be considered for stable. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
677c1cb to d0726cc
[ upstream commit e27730b ] This is useful for XFRM states which do not have a built-in direction field. Instead, we encode the direction in the packet mark and can therefore rely on that when logging. The same function can be used for XFRM policies, even though they do have a built-in Dir field as well. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 89626bc ] The SPI and the source and destination IP addresses (or CIDRs for XFRM policies) are not enough anymore to uniquely identify XFRM states and policies. We additionally need the node ID. This commit therefore ensures that we always log the five contextual information bits whenever possible: SPI, source, destination, direction, and node ID. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 4506c76 ] The node ID is reported in hexadecimal format in the XFRM states and policies, as well as in the node ID map dump. To make it easier to match the node ID across different sources, we should also dump it in hex format in the agent logs. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
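A small sketch of hex formatting for a node ID. The `formatNodeID` helper, the `uint16` type, and the `0x` prefix are assumptions for illustration; the exact format string used in the agent logs is not shown in this commit message.

```go
package main

import "fmt"

// formatNodeID is a hypothetical helper that renders a node ID in hex so it
// can be matched against hex values shown by other tools.
func formatNodeID(id uint16) string {
	return fmt.Sprintf("0x%x", id)
}

func main() {
	fmt.Println(formatNodeID(6730)) // 0x1a4a
}
```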
[ upstream commit fe08772 ] This function will be used from cilium-dbg so we need to expose it from a shared package. We already have such a package for IPsec utility functions in pkg/common/ipsec. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 37b611e ] The cilium-dbg encrypt flush command removes all XFRM states and policies on the node. That will lead to packet drops until connections are reestablished. Traffic will also be sent in plain text between pods. This commit therefore asks for confirmation when running the command, to ensure nobody performs this action by mistake. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 5c7cfe6 ] This is useful, for example, to manually delete the XFRM config corresponding to an old key. It will warn if the user is about to delete all XFRM configs, on the assumption that this isn't the intended action, since otherwise the filter wouldn't be necessary. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit dd8920a ] Refactor the filterXFRMBySPI function to be able to filter by other things than SPI without duplicating the main logic. The new function filterXFRMs takes two predicate functions instead of hardcoding the comparison to "spi". No functional changes in this commit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit c924bd6 ] We test both a single call to filterXFRMs and two chained calls. The latter is because we will need to chain calls for different filters because they are ANDed. For example, filtering on both the SPI and the node ID should only flush XFRM configs that match for both the given SPI and node ID. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
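The predicate-based filtering and the ANDed chaining described in the two commits above might be sketched as follows. The `xfrmState` and `xfrmPolicy` types and this `filterXFRMs` signature are simplified assumptions; the real function operates on netlink types.

```go
package main

import "fmt"

// xfrmState and xfrmPolicy are hypothetical, simplified stand-ins for the
// kernel XFRM objects.
type xfrmState struct {
	SPI    int
	NodeID uint16
}

type xfrmPolicy struct {
	SPI    int
	NodeID uint16
}

// filterXFRMs keeps the policies and states accepted by their predicates.
func filterXFRMs(
	policies []xfrmPolicy, states []xfrmState,
	keepPolicy func(xfrmPolicy) bool, keepState func(xfrmState) bool,
) ([]xfrmPolicy, []xfrmState) {
	var ps []xfrmPolicy
	for _, p := range policies {
		if keepPolicy(p) {
			ps = append(ps, p)
		}
	}
	var ss []xfrmState
	for _, s := range states {
		if keepState(s) {
			ss = append(ss, s)
		}
	}
	return ps, ss
}

func main() {
	policies := []xfrmPolicy{{SPI: 3, NodeID: 1}, {SPI: 4, NodeID: 1}}
	states := []xfrmState{{SPI: 3, NodeID: 1}, {SPI: 3, NodeID: 2}, {SPI: 4, NodeID: 1}}

	// First filter: SPI == 3.
	p, s := filterXFRMs(policies, states,
		func(p xfrmPolicy) bool { return p.SPI == 3 },
		func(s xfrmState) bool { return s.SPI == 3 })
	// Chained second filter: node ID == 1. The result matches both filters.
	p, s = filterXFRMs(p, s,
		func(p xfrmPolicy) bool { return p.NodeID == 1 },
		func(s xfrmState) bool { return s.NodeID == 1 })

	fmt.Println(len(p), len(s)) // 1 1
}
```

Chaining feeds the output of one call into the next, which is what makes the filters behave as a logical AND.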
[ upstream commit 47e1b3f ] We will use this function from cilium-dbg in the subsequent commit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 0e5d3c3 ] This can be useful to flush the XFRM configs of stale node IDs. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ upstream commit 550b56e ] Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
1ffd5c2 to b963ac9
/test-backport-1.14
Looks good for my commit ✅
Thanks!
My two PRs you backported look good to me. Thanks!
linuxNodeHandler IPsec functions to their own file #28941 (@pchaigno)
encrypt flush command #28795 (@pchaigno)
Once this PR is merged, a GitHub action will update the labels of these PRs: