v1.14 Backports 2023-07-24 #27038

nbusseneau · 2023-07-24T16:54:30Z

Once this PR is merged, you can update the PR labels via:

for pr in 26773 26455 26864 26808 26658 26901 26933 26868 26884 26956 26951 26719 26911 26867 26934 26979 26986 26912 26999 27011 26984; do contrib/backporting/set-labels.py $pr done 1.14; done

or with

make add-labels BRANCH=v1.14 ISSUES=26773,26455,26864,26808,26658,26901,26933,26868,26884,26956,26951,26719,26911,26867,26934,26979,26986,26912,26999,27011,26984

[ upstream commit 5136318 ] Normally the CNI config manager watches for changes, and overwrites the cilium CNI config file if necessary. However, we also need to support cases where end-users would like to modify the config that Cilium generates. So, when `cni.exclusive` is false, then also disable the file watcher. Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit 914f1ad ] If for some reason the operator has an outdated version of a CES, it will not be able to update that CES and will never recover from that state. This can happen not only due to some other client updating the CES, but also when the update from the operator succeeds but for some reason the api-server doesn't return OK (it can fail after updating etcd). Signed-off-by: Alan Kutniewski <kutniewski@google.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit dc2cfb6 ] This will default install Spire with Cilium if Spire auth is enabled. It will remove the need to set two flags when enabling Spire based Mutual Auth with Cilium Managed Spire. Signed-off-by: Maartje Eyskens <maartje.eyskens@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit b146df0 ] This will remove the spire enabled flag from the e2e test config as it is now the default option. Signed-off-by: Maartje Eyskens <maartje.eyskens@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit 3ba76e5 ] Recent bugs with IPsec have highlighted a need to document several caveats of IPsec operations. This commit documents those caveats as well as common XFRM errors. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit 3521998 ] A race condition between when all the resync triggers are set up for an upserted CiliumNode and when the k8s node object is emplaced can cause a crash. Specifically, this can arise between when the ipam nodemanager lock is released and when the call to UpdatedResources occurs. In CI this was likely caused by the "ipam-node-interval-refresh" invoking a resync which resulted in a panic in the ResyncInterfacesAndIPs function. While testing this I was able to cause other, similar, panics by delaying the UpdatedResource call, so this should fix a class of potential crashes. Fixes: #26222 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit a667a7c ] With the current implementation of the workqueue used in the egressgw manager to handle CiliumEndpoint events and retries, it's possible to trigger a race where, by the time we process an add/update event, the related CiliumEndpoint has already been deleted from the pendingEndpointEvents, resulting in the agent panicking due to a nil access. This commit simplifies how the delete events are handled in order to eliminate the race. Fixes: 9fb24de ("egressgw: retry getIdentityLabels on failure") Co-authored-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit 72b560b ] After the switch to the trigger based reconciliation, the reconciliation can handle multiple types of events in one batch. Update the logic so that we don't end up calling updatePoliciesBySourceIP() twice in case we are processing a batch for both endpoint and policy events. Co-authored-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
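The batching idea in 72b560b can be sketched roughly as follows. This is only an illustration and not the actual egressgw manager code: the `manager` type, the `eventKind` values, and the `reconcile` signature are hypothetical, and only the name `updatePoliciesBySourceIP` is taken from the commit message.

```go
package main

import "fmt"

// eventKind is a hypothetical tag for the events collected in one batch.
type eventKind int

const (
	endpointEvent eventKind = iota
	policyEvent
)

// manager is a hypothetical stand-in that only counts how often the
// expensive update runs.
type manager struct {
	updates int
}

// updatePoliciesBySourceIP stands in for the real recomputation.
func (m *manager) updatePoliciesBySourceIP() {
	m.updates++
}

// reconcile walks the whole batch first and records that an update is
// needed, then performs the update at most once, no matter how many
// endpoint or policy events the batch contained.
func (m *manager) reconcile(batch []eventKind) {
	needsUpdate := false
	for _, ev := range batch {
		switch ev {
		case endpointEvent, policyEvent:
			needsUpdate = true
		}
	}
	if needsUpdate {
		m.updatePoliciesBySourceIP()
	}
}

func main() {
	m := &manager{}
	m.reconcile([]eventKind{endpointEvent, policyEvent, endpointEvent})
	fmt.Println("updates after one mixed batch:", m.updates)
}
```

Collecting a single flag per batch, instead of reacting to each event type separately, is what keeps the update from running twice for a mixed batch.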

[ upstream commit 0297c6c ] Azure does not allow having multiple clusters with the same name in the same subscription even if they are hosted on different locations. In order to avoid name conflicts, we previously added the location name to the cluster name, however in some cases this leads to cluster names exceeding the maximum length. As a quick fix, we replace the location name with a simple index. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit c8ce545 ] Additionally, let's use the ready to be included snippet about the request for feedback instead of a custom message. Let's also remove the redundant beta feature information from the "enable cluster mesh" section, as already displayed as part of the link title: "[...] opt in to KVStoreMesh (beta) when enabling [...]". Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit 4e9bbcd ] [ Backporter's note: minor conflicts due to 5da5882, introducing `UNENCRYPTED_TRAFFIC` to the flow proto definition, not being present on v1.14. Regenerated `flow.pb.go` with `make proto` after resolving. ] Make the TTL drops from ipv4_l3() more visible. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

…y path [ upstream commit 375b345 ] Straight-forward fix for a missing drop notification. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit db3def4 ] cil_to_container() has some paths that don't raise a drop notification for DROP_MISSED_TAIL_CALL. Fix them. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

…tainer [ upstream commit 614c174 ] Don't just return DROP_INVALID, but also throw a drop notification. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit 1b68b31 ] This commit moves the MetalLB BGP solution to deprecated and promotes the BGP-CP feature in our docs. Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit d29f101 ] The metric name is called "cilium_services_events_total" yet the variable name is called ServicesCount. The typical code pattern is to name the variable after a substring of the metric name for ease of grep. This commit does so and is a non-functional change. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit b86dab8 ] When the metric variable is defined as a global variable (within the `var` scope at the package level), then it will be instantiated as a NoOp metric. Once the metrics package is initialized, then all the metrics variables will transition from NoOp metrics to a real metric type. This problem occurred because the global variables instantiation happened before the metrics package initialization. This commit fixes it by using the metrics variable after the metrics package has been initialized. We can assume it's been initialized when the code executed is production ("live") code. Fixes: #26511 Fixes: 978b27c ("Metrics: Add services metrics") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
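The initialization-order pitfall described in b86dab8 can be reproduced in a few lines of Go. The sketch below is illustrative only: `Counter`, `noopCounter`, `realCounter`, `ServicesEvents`, and `initMetrics` are hypothetical names and do not mirror Cilium's metrics package.

```go
package main

import "fmt"

// Counter is a hypothetical, minimal metric interface.
type Counter interface {
	Inc()
	Value() int
}

// noopCounter discards every increment, like a placeholder metric.
type noopCounter struct{}

func (noopCounter) Inc()       {}
func (noopCounter) Value() int { return 0 }

// realCounter actually counts.
type realCounter struct{ n int }

func (c *realCounter) Inc()       { c.n++ }
func (c *realCounter) Value() int { return c.n }

// ServicesEvents starts out as a no-op and is swapped by initMetrics.
var ServicesEvents Counter = noopCounter{}

// captured is evaluated during package initialization, before
// initMetrics runs, so it keeps referring to the no-op value.
var captured = ServicesEvents

// initMetrics stands in for the metrics package initialization.
func initMetrics() { ServicesEvents = &realCounter{} }

// recordEarly uses the value captured at package init (the pitfall).
func recordEarly() { captured.Inc() }

// recordLate reads the package variable at call time (the fix).
func recordLate() { ServicesEvents.Inc() }

func main() {
	initMetrics()
	recordEarly()
	recordLate()
	fmt.Println("captured:", captured.Value(), "live:", ServicesEvents.Value())
}
```

Only the increment made through the variable read at call time is recorded; the copy taken at package level stays a no-op even after initialization has replaced the original.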

[ upstream commit 1a9778b ] The SPIRE server image wasn't yet set to the one listed in the values. This fixes that oversight. Signed-off-by: Maartje Eyskens <maartje@eyskens.me> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit 0040720 ] Signed-off-by: Rauan Mayemir <rauan@mayemir.io>

[ upstream commit 12fc68a ] - For the main branch latest docs, clone the Cilium GitHub repo and use "--chart-directory ./install/kubernetes/cilium" flag. - For stable branches, set "--version" flag to the version in the top-level VERSION file. Fixes: #26931 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit c9983ef ] All IPsec traffic between two nodes is always sent on a single IPsec flow (defined by outer source and destination IP addresses). As a consequence, RSS on such traffic is ineffective and throughput will be limited to the decryption performance of a single core. Reported-by: Ryan Drew <ryan.drew@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
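To illustrate why RSS cannot spread this traffic, here is a toy model of queue selection. It assumes the NIC hashes only the outer source and destination addresses; FNV-1a is used as a stand-in for the hash a real NIC would use, and the `packet` type, `queueFor`, and `distinctQueues` are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// packet models an encrypted packet between two nodes. The inner
// addresses are encrypted, so the receive-side hash cannot see them.
type packet struct {
	outerSrc, outerDst string // node IPs, visible to the NIC
	innerSrc, innerDst string // pod IPs, hidden inside the tunnel
}

// queueFor picks a receive queue from the outer addresses only.
// FNV-1a is a stand-in here, not what real hardware uses.
func queueFor(p packet, queues uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(p.outerSrc))
	h.Write([]byte{0})
	h.Write([]byte(p.outerDst))
	return h.Sum32() % queues
}

// distinctQueues sends n packets with different inner sources between
// the same pair of nodes and counts how many queues were hit.
func distinctQueues(n int, queues uint32) int {
	used := map[uint32]bool{}
	for i := 0; i < n; i++ {
		p := packet{
			outerSrc: "192.0.2.1",
			outerDst: "192.0.2.2",
			innerSrc: fmt.Sprintf("10.1.0.%d", i),
			innerDst: "10.2.0.1",
		}
		used[queueFor(p, queues)] = true
	}
	return len(used)
}

func main() {
	fmt.Println("distinct queues used:", distinctQueues(100, 8))
}
```

Because every packet carries the same outer address pair, the hash input never changes and all packets land on one queue, which is the single-core bottleneck the commit message documents.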

[ upstream commit fa9aaa9 ] The object k8sClientRateLimit is already specified in the below lines https://github.com/cilium/cilium/blob/338947e0bbbe49781ba32e668e57f8c45dcf23a1/install/kubernetes/cilium/values.yaml.tmpl#L41-L51 Fixes: e2f475d Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit fe4dda7 ] Cilium already implements a restore path to prevent dropping existing connections on agent restart. Yet, there's currently an issue which causes the removal of valid backends from a service when receiving incomplete service updates, either because backends are spread across multiple endpointslices or some belong to remote clusters. Indeed, all previously known backends get replaced with the ones we just heard about (and present as part of the service cache event), possibly causing connectivity disruptions. The same issue can also occur in case of dual stack services, as they trigger the generation of two different endpointslices, one for each family. More specifically, let's consider the case in which a given service *foo* is associated with the *foo-1* and *foo-2* epslices, each containing a set of backends. Upon restart, the Cilium agent restores services and backends from the BPF map. Then it starts the service and epslice informers, both of which propagate the received events to the service cache. Let's say that we first receive the event about the service: at this point the service is considered not ready (as we have not yet seen any epslice), and nothing gets propagated. Then, we receive the event for the *foo-1* epslice, the service cache processes it and, given that the service has now backends, propagates an event down to the service subsystem for the service *foo*, including all backends part of *foo-1* (i.e., the ones known at the moment). At this point, all previously known backends get replaced by the new ones in the BPF maps, breaking the connections targeting the backends that were part of the *foo-2* epslice. Once an event for that epslice is also seen, then the backends will be merged and restored. The clustermesh case is similar because it triggers the same behavior as if we had a different epslice for each remote cluster. 
Let's prevent this behavior by keeping a list of restored backends for each service, and continuing to merge them with the ones we received an update for, until the bootstrap phase completes. After synchronization, an update is triggered for each service still associated with stale backends, so that they can be removed. Fixes: #23823 Fixes: #26944 Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
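The merge-until-synced approach from fe4dda7 can be sketched as below. This is a simplified model under stated assumptions, not Cilium's service subsystem: the `tracker` type and its `Upsert` and `SyncDone` methods are hypothetical, and backends are reduced to plain address strings.

```go
package main

import (
	"fmt"
	"sort"
)

// tracker is a hypothetical model of the bootstrap-time bookkeeping.
type tracker struct {
	synced   bool
	restored map[string][]string // service -> backends restored from the BPF map
	latest   map[string][]string // service -> backends carried by the last event
}

func newTracker(restored map[string][]string) *tracker {
	return &tracker{restored: restored, latest: map[string][]string{}}
}

// Upsert records the backends carried by an event and returns the set
// to program. Until synchronization completes, restored backends are
// merged in so that a partial update cannot remove them.
func (t *tracker) Upsert(svc string, received []string) []string {
	t.latest[svc] = received
	set := map[string]struct{}{}
	for _, b := range received {
		set[b] = struct{}{}
	}
	if !t.synced {
		for _, b := range t.restored[svc] {
			set[b] = struct{}{}
		}
	}
	out := make([]string, 0, len(set))
	for b := range set {
		out = append(out, b)
	}
	sort.Strings(out)
	return out
}

// SyncDone ends the bootstrap phase and returns the services that
// still hold stale backends (restored but never seen in an event),
// so the caller can trigger one more update for each of them.
func (t *tracker) SyncDone() []string {
	t.synced = true
	var stale []string
	for svc, backends := range t.restored {
		known := map[string]struct{}{}
		for _, b := range t.latest[svc] {
			known[b] = struct{}{}
		}
		for _, b := range backends {
			if _, ok := known[b]; !ok {
				stale = append(stale, svc)
				break
			}
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	t := newTracker(map[string][]string{"foo": {"10.0.0.1", "10.0.0.2", "10.0.0.9"}})
	// Only the foo-1 slice has been seen: restored backends are kept.
	fmt.Println(t.Upsert("foo", []string{"10.0.0.1"}))
	// foo-2 arrives too; 10.0.0.9 was never announced and is stale.
	fmt.Println(t.Upsert("foo", []string{"10.0.0.1", "10.0.0.2"}))
	fmt.Println(t.SyncDone())
	// After synchronization only announced backends remain.
	fmt.Println(t.Upsert("foo", []string{"10.0.0.1", "10.0.0.2"}))
}
```

In this model the service *foo* keeps all three restored backends while only part of its endpointslices have been seen, and the stale one is dropped only by the update triggered after synchronization.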

[ upstream commit 079cee8 ] The 'reuse-values' flag does not exist in 'helm install', only in the 'upgrade' sub-command. Fixes: 0a9b289 ("docs: add mutual-tls authentication") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit 853df80 ] We should give preference to Cilium-cli for installation of getting started guides. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit 5201b3c ] Suggested-by: Sarah Corleissen <sarah.corleissen@isovalent.com> Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit 64513d8 ] Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

[ upstream commit 0254e20 ] Only enabled flags are marked as beta, as other flags will not be taken into consideration if these enabled flags are not set as true. Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

nbusseneau · 2023-07-24T17:05:07Z

@julianwiedmann Please see backporter's notes for your commits, had conflicts to resolve.

michi-covalent

💞

nbusseneau · 2023-07-24T17:06:02Z

/test-backport-1.14

pchaigno

My changes look good. Thanks!

sayboras

Thanks and looks good for my commits 👍

giorio94

My commits look good. Thanks!

meyskens

LGTM for my changes

nbusseneau · 2023-07-25T14:21:57Z

All testing has passed and almost all reviews are in. The remaining ones are for backports that did not hit any conflicts and are not complex, so I will mark this ready-to-merge.

squeed and others added 28 commits July 24, 2023 16:42

bpf: host: add drop notification for missed tail call in to-lxc polic…

e993b2a

Use the server image from values for SPIRE

6d18651

Fix Envoy LB docs incorrect supported annotation values

2bbc437

docs: switch the tab ordering for mutual-auth docs

0d3e695

docs: small improvements in mutual-authentication documentation

a917b70

helm/hubble-ui: use v0.12.0 hubble-ui

f3d26ae

nbusseneau added the kind/backports and backport/1.14 labels Jul 24, 2023

nbusseneau requested review from giorio94, sayboras, pchaigno and jibi July 24, 2023 16:54

michi-covalent approved these changes Jul 24, 2023

nbusseneau marked this pull request as ready for review July 24, 2023 17:05

nbusseneau requested review from a team as code owners July 24, 2023 17:05

nbusseneau requested a review from gandro July 24, 2023 17:05

tommyp1ckles approved these changes Jul 24, 2023

julianwiedmann approved these changes Jul 24, 2023

christarazi approved these changes Jul 24, 2023

pchaigno approved these changes Jul 24, 2023

sayboras approved these changes Jul 24, 2023

giorio94 approved these changes Jul 25, 2023

geakstr approved these changes Jul 25, 2023

jibi approved these changes Jul 25, 2023

gandro approved these changes Jul 25, 2023

aanm approved these changes Jul 25, 2023

squeed approved these changes Jul 25, 2023

meyskens approved these changes Jul 25, 2023

nbusseneau added the ready-to-merge label Jul 25, 2023

aanm merged commit 66dc0b2 into v1.14 Jul 25, 2023
185 checks passed

aanm deleted the pr/v1.14-backport-2023-07-24 branch July 25, 2023 14:36

maintainer-s-little-helper bot removed the ready-to-merge label Jul 25, 2023
