v1.14 Backports 2024-03-12 #31335

jibi · 2024-03-12T08:46:29Z

PRs skipped due to conflicts:

gateway-api: Ensure hostname check when set on both the HTTPRoute and the Gateway Listener #30686 (@cjvirtucio87)
gateway-api: shorten the length of the value of the svc's label. #31292 (@chaunceyjiang)

Once this PR is merged, a GitHub action will update the labels of these PRs:

 30988 31014 31159 31182 31196 31060 31179 31218 31015

[ upstream commit 70b405f ] On Linux/Unix based implementations, exec/cmd.Run will return either context.ContextCancelled or the error "signal: killed" depending on whether the cancellation occurred while the process was running. There's several places we check on ```is.Errors(err, context.Cancelled)``` on whether to emit high level logs about failed program compilations. Because already running cmd.Run() doesn't return an error that satisfies this, this will result in spurious error logs about failed compilation (i.e. "signal: killed") This meant that in cases where a compilation is legitimately cancelled, we would still log an error such as msg="BPF template object creation failed" ... error="...: compile bpf_lxc.o: signal: killed" This can occur occasionally in CI, which enforces no error to pass, causing failures. example: ``` ctx, c := context.WithTimeout(context.Background(), time.Second) go func() { time.Sleep(time.Second) c() }() cmd := exec.CommandContext(ctx, "sleep", "2") fmt.Println(cmd.Run()) ctx, c = context.WithTimeout(context.Background(), time.Second) c() cmd = exec.CommandContext(ctx, "sleep", "2") fmt.Println(cmd.Run()) ``` To fix this, this will join in the ctx.Err() if it is: * context.Cancelled * The process has not exited itself. * The process appeared to be SIGKILL'ed. Addresses: #30991 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit a099bf1 ] [ backporter's node: changes applied to plugins/cilium-cni/main.go rather than plugins/cilium-cni/cmd/cmd.go ] Unlike runtime agent/operator logs, CNI logs are just written to disk so we have no way to attach timestamps to them. This makes it harder to debug CNI issues as we have no way to correlate when things happened between Agent logs and CNI events. This switches CNI to use the same default logger, except with timestamps enabled. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit 927969b ] [ backporter's note: changes applied to cilium/cmd/encrypt_flush.go rather than cilium-dbg/cmd/encrypt_flush ] This commit slightly changes the behavior of the "encrypt flush" command in case of errors when trying to delete XFRM rules. The tool currently lists rules, filters them based on user-given arguments, and then deletes them. If an XFRM rule is deleted by the agent or the user while we're filtering, the deletion will fail. The current behavior in that case is to fatal. On busy clusters, that might mean that we always fatal because XFRM states and policies are constently added and removed. This commit changes the behavior to proceed with subsequent deletions in case one fails. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit 5c2a67f ] [ backporter's note: changes applied to cilium/cmd/encrypt_flush.go rather than cilium-dbg/cmd/encrypt_flush ] This commit refactors the code a bit simplify a latter commit. No functional changes. This may be a bit excessive in commit splitting, but at least I can claim my last commit is free of any refactoring 😅 Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit 5eb27e2 ] [ backporter's note: changes applied to cilium/cmd/encrypt_flush.go rather than cilium-dbg/cmd/encrypt_flush ] This new flag will allow users to clean stale XFRM states and policies based on the node ID map contents. If XFRM states or policies are found with a node ID that is not in the BPF map, then we probably have a leak somewhere. Such leaks can lead in extreme cases to performance degradation when the number of XFRM states and policies grows large (and if using ENI or Azure IPAM). Having a tool to cleanup these XFRM states and policies until the leak is fixed can therefore be critical. The new flag is incompatible with the --spi and --node-id filter flags. We first dump the XFRM rules and then dump the map content. In that way, if a new node ID is allocated while we're running the tool, we will simply ignore the corresponding XFRM rules. If a node ID is removed while running the tool, we will fail to remove the corresponding XFRM rules and continue with the others. Tested on a GKE cluster by adding fake XFRM states and policies that the tool was able to remove. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit f92b528 ] The CNI version should be specify so that in case we have to fallback the installation of k8s via binaries it doesn't fail with the error: ``` 10:29:25 k8s1-1.25: gzip: stdin: not in gzip format 10:29:25 k8s1-1.25: tar: Child returned status 1 10:29:25 k8s1-1.25: tar: Error is not recoverable: exiting now ``` Fixes: ce69afd ("add support for k8s 1.25.0") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

CODEOWNERS

[ upstream commit 7a301a4 ] This commit adds the GH workflow to run on arm machines. This effectively means that we can remove our travis integration and only use GH actions from now on. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit b68cf99 ] This is to extend the existing basic ingress docs with external lockdown CCNP, while still allows in-cluster traffic to Ingress LB IP. Relates: #28126 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit db7b3ef ] Suppress the "Unable to determine next hop address" logs. While it shows the L2 neighbor resolution failure, it does not always indicate a datapath connectivity issue. For example, when "devices=eth+" is specified and the device naming/purposing is not consistent across the nodes in the cluster, in some nodes "eth1" is a device reachable to other nodes, but in some nodes, it is not. As a result, L2 Discovery generates an "Unable to determine next hop address". Another example is ENI mode with automatic device detection. When secondary interfaces are added, they are used for L2 Neighbor Discovery as well. However, nodes can only be reached via the primary interface through the default route in the main routing table. Thus, L2 Neighbor Discovery generates "Unable to determine next hop address" for secondary interfaces. In both cases, it does not always mean the datapath has an issue for KPR functionality. However, the logs appear repeatedly, are noisy, and the message is error-ish, causing confusion. This log started to appear for some users who did not see it before from v1.14.3 (#28858) and v1.15.0 (in theory). For v1.14.3, it affects KPR + Tunnel users because of f2dcc86. Before the commit, we did not perform L2 Neighbor Discovery in tunnel mode, so even if users had an interface unreachable to other nodes, the log did not appear. For v1.15.0, it affects to the users who used to have the unreachable interface. 2058ed6 made it visible. Before the commit, some kind of the errors like EHOSTUNREACH and ENETUNREACH were not caught because FIBMatch option didn't specified. After v1.15.0, users started to see the log. Fixes: #28858 Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit 777b580 ] Set ConnectionRetryTimeSeconds in the component tests to 1s in component tests unless it is specified explicitly. Otherwise, when the initial connect fails, we need to 120s for the next connection by default, which may longer than the timeout of the test itself. Fixes: #31217 Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

[ upstream commit 81f14bb ] This commit adjusts the usage of send_trace_notify in bpf_network.c to enable monitor aggregation for all events emitted at this observation point in the datapath. This change helps improve resource usage by reducing the overall number of events that the datapath emits, while still enabling packet observability with Hubble. The events in bpf_network.c enable observability into the IPSec processing of the datapath. Before this commit, multiple other efforts have been made to increase the aggregation of events related to IPSec to reduce resource usage, see #29616 and #27168. These efforts were related to packets that were specifically marked as encrypted or decrypted by IPSec and did not include events in bpf_network.c that were emitted when either: (a) a plaintext packet has been received from the network, or (b) a packet was decrypted and reinserted into the stack by XFRM. Both of these events are candidates for aggregation because similar to-stack events will be emitted down the line in the datapath anyways. Additionally, these events are mainly useful for root-cause analysis or debugging and are not necessarily helpful from an overall observability standpoint. Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>

sayboras · 2024-03-12T10:09:45Z

gateway-api: Ensure hostname check when set on both the HTTPRoute and the Gateway Listener #30686 (@cjvirtucio87)

gateway-api: shorten the length of the value of the svc's label. #31292 (@chaunceyjiang)

Noted and thanks, I will take care of these two backports for 1.14.

sayboras

Thanks and looks good for my commit.

jibi · 2024-03-12T10:38:47Z

/test-backport-1.14

pchaigno

My PR looks good. Thanks!

YutaroHayakawa

Thanks!

learnitall

Thank you!

tommyp1ckles and others added 6 commits March 12, 2024 09:44

jibi added kind/backports This PR provides functionality previously merged into master. backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. labels Mar 12, 2024

jibi requested review from pchaigno, aanm, sayboras, YutaroHayakawa, learnitall and tommyp1ckles March 12, 2024 08:46

jibi force-pushed the pr/v1.14-backport-2024-03-12 branch from ea8de7a to f9c2032 Compare March 12, 2024 08:58

aanm requested changes Mar 12, 2024

View reviewed changes

CODEOWNERS Outdated Show resolved Hide resolved

aanm and others added 5 commits March 12, 2024 10:51

jibi force-pushed the pr/v1.14-backport-2024-03-12 branch from f9c2032 to de1f8c1 Compare March 12, 2024 09:51

sayboras approved these changes Mar 12, 2024

View reviewed changes

jibi marked this pull request as ready for review March 12, 2024 18:28

jibi requested review from a team as code owners March 12, 2024 18:28

jibi requested a review from aanm March 12, 2024 18:28

aanm approved these changes Mar 12, 2024

View reviewed changes

pchaigno approved these changes Mar 12, 2024

View reviewed changes

YutaroHayakawa approved these changes Mar 13, 2024

View reviewed changes

learnitall approved these changes Mar 13, 2024

View reviewed changes

jibi added the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Mar 13, 2024

tommyp1ckles approved these changes Mar 14, 2024

View reviewed changes

dylandreimerink approved these changes Mar 15, 2024

View reviewed changes

julianwiedmann removed the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Mar 16, 2024

maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Mar 16, 2024

julianwiedmann merged commit 0de4ac9 into v1.14 Mar 16, 2024
230 checks passed

julianwiedmann deleted the pr/v1.14-backport-2024-03-12 branch March 16, 2024 12:49

jrajahalme mentioned this pull request Mar 26, 2024

Prepare for release v1.14.9 #31626

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v1.14 Backports 2024-03-12 #31335

v1.14 Backports 2024-03-12 #31335

jibi commented Mar 12, 2024 •

edited

sayboras commented Mar 12, 2024

sayboras left a comment

jibi commented Mar 12, 2024

pchaigno left a comment

YutaroHayakawa left a comment

learnitall left a comment

v1.14 Backports 2024-03-12 #31335

v1.14 Backports 2024-03-12 #31335

Conversation

jibi commented Mar 12, 2024 • edited

sayboras commented Mar 12, 2024

sayboras left a comment

Choose a reason for hiding this comment

jibi commented Mar 12, 2024

pchaigno left a comment

Choose a reason for hiding this comment

YutaroHayakawa left a comment

Choose a reason for hiding this comment

learnitall left a comment

Choose a reason for hiding this comment

jibi commented Mar 12, 2024 •

edited