
v1.11 backports 2022-01-31 #18669

Merged: 24 commits into cilium:v1.11 on Feb 7, 2022

Conversation

@glibsm (Member) commented Jan 31, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 18612 18527 18627 18585 18448 18636 18647 18608; do contrib/backporting/set-labels.py $pr done 1.11; done

@glibsm glibsm requested a review from a team as a code owner January 31, 2022 01:31
@glibsm glibsm added backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Jan 31, 2022
@tklauser (Member) left a comment:

Thanks

@pchaigno (Member) left a comment:

My changes look good. Thanks!

@pchaigno (Member) commented Jan 31, 2022

/test-backport-1.11

Job 'Cilium-PR-K8s-1.18-kernel-4.9' hit: #17981 (89.69% similarity)

@joestringer (Member) left a comment:

Thanks!

@YutaroHayakawa (Member) left a comment:

LGTM 👍

Thanks!

@brb (Member) left a comment:

My changes LGTM, thanks.

christarazi and others added 22 commits February 3, 2022 16:41
[ upstream commit c3ef235 ]

Add sanity-check coverage that the kube-apiserver policy matching
feature still works when a duplicate policy is added and the previous
one is removed.

Updates: cilium#17829

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit bd7ee73 ]

Now that the limitation has been fixed, remove it from the docs.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit b6a41d2 ]

The init container name doesn't change during tests, so define it as a const.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit f49dda7 ]

Ensure that all endpoints exist and are ready before attempting to
extract a specific endpoint from GetAllEndpointsIds results.

Follows commit 3738680 ("test/runtime: Fix flake on reserved:init
endpoints") which introduced a similar check before calls to
GetEndpointIds.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 96152dc ]

Previously, the bpf_fib_lookup() helper was invoked in nodeport.h with
the BPF_FIB_LOOKUP_{OUTPUT,DIRECT} flags set. This commit drops both
flags to make the NodePort BPF multihoming possible.

The former flag instructed the helper to do the FIB lookup from the egress
perspective, which required us to set the correct ifindex for the lookup.
In the NodePort BPF code we used DIRECT_ROUTING_DEV_IFINDEX, which
corresponds to the iface used to communicate between k8s nodes
(i.e., the iface with the k8s Node InternalIP or ExternalIP). However, the
reliance on DIRECT_ROUTING_DEV_IFINDEX meant that when more than a single
iface was used to connect the nodes, only one of them could be used for
the NodePort BPF traffic (forwarded service requests).

To fix this, we can drop BPF_FIB_LOOKUP_OUTPUT and set the ingress
iface in the FIB lookup parameters. This performs the FIB lookup from the
perspective of the device which received the packet (before forwarding to
the egress iface) and eliminates the need for DIRECT_ROUTING_DEV_IFINDEX
in the NodePort BPF code.

The latter flag was meant to speed up FIB lookups. The kernel commit which
introduced it [1] says:

    BPF_FIB_LOOKUP_DIRECT to do a lookup that bypasses FIB rules and goes
    straight to the table associated with the device (expert setting for
    those looking to maximize throughput)

This means that we bypass IP rules. However, cilium-agent installs IP
rules for some setups (e.g., EKS, WireGuard). Therefore, to avoid any
potential pitfalls, we drop the latter flag so that the IP rules can be
respected.

To measure a potential perf regression, I provisioned [2] scripts on
three AWS EC2 c5a.8xlarge machines (10Gbit, Ubuntu 20.04 LTS, 5.11
kernel). The netperf client was sending requests to the LB node which
was forwarding them to the destination node. The only Helm setting for
Cilium was KPR=strict. The results from three runs of each setup are the
following:

---------------------+-----------------+-------------------------
                     | v1.11.1         | v1.11.1 with FIB changes
---------------------+-----------------+-------------------------
TCP_RR MEAN_LATENCY  | 358.74 +- 27.08 | 359.56 +- 30.50
with STDDEV_LATENCY  | 360.28 +- 26.01 | 361.76 +- 26.71
                     | 368.18 +- 28.22 | 365.33 +- 35.83
---------------------+-----------------+-------------------------

As you can see, the increase in latency is negligible. Therefore, this
commit drops the latter flag unconditionally, i.e., without checking
whether there are any non-default IP rules or whether Cilium is running
in a multi-homing setup.

[1]: torvalds/linux@87f5fc7
[2]: https://github.com/brb/cilium-without-netfilter

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 6c3bf64 ]

According to the Ginkgo e2e test docs, all "AfterEach" statements are
executed before all "AfterAll" statements [0].

Previously, this code assumed that the AfterAll inside a Context would
run first (to delete pods), then the AfterEach outside the Context would
run (to wait for the pods to terminate), and finally the AfterAll outside
the Context would clean up the Cilium pods.

The result was that in issue 18447, the next test to run could
hit a race condition where pods were deleted but did not fully terminate
before Cilium was removed. Cilium ends up getting deleted before all the
pods, which means that there is no longer any way to execute the CNI DEL
operation for those pods, so they get stuck in Terminating state.

Fix it by moving the check that the test pods are fully terminated into
the AfterAll statement where we initiate the deletion of those pods, so
that the check runs before that AfterAll returns.

[0]: https://docs.cilium.io/en/latest/contributing/testing/e2e/#example-test-layout

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
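
For illustration, a minimal Ginkgo-style sketch of the layout described above, using the AfterAll/AfterEach wrappers from Cilium's test/ginkgo-ext package; treat the import path, package name, and exact helper names as assumptions rather than the real test code.

```go
package k8sTest

import (
	. "github.com/cilium/cilium/test/ginkgo-ext"
)

var _ = Describe("K8sExampleTest", func() {
	AfterEach(func() {
		// Runs after each spec and, per the e2e docs, before ANY AfterAll.
		// Waiting for pod termination here is therefore too early: the
		// pods are only deleted in the Context's AfterAll below.
	})

	AfterAll(func() {
		// Outer AfterAll: remove Cilium from the cluster.
	})

	Context("with test pods deployed", func() {
		It("runs a test against the pods", func() {})

		AfterAll(func() {
			// The fix: delete the test pods AND wait for them to fully
			// terminate here, before the outer AfterAll removes Cilium.
		})
	})
})
```
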
[ upstream commit 2ae89c1 ]

This test seems to always be skipped, so it is unclear whether it still
works. But this should at least mitigate issues with cleaning up state
at the end of the test, similar to the issues in the adjacent commits.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit cd7df97 ]

Previously this code was mismatched: the BeforeEach would attempt to
re-create the graceful termination app before each of the It statements,
but the cleanup would only occur once.

While preparing PR 18448, I hit issue cilium#18634, where if we didn't
wait sufficiently long between individual tests in the graceful
termination test Context, the test would fail during the following
BeforeEach because the previous test's client was not yet cleaned up.

Fix this by ensuring that the cleanup and the wait for pod termination
occur in an AfterEach for each It statement in this Context, to match
the behaviour of the overall test. The next commit will then shift the
cleanup logic out of a Describe-level AfterEach and into the individual
Contexts.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 589c215 ]

The library in the tree that we use to interact with kubectl doesn't
automatically wait for all pods to be deleted when we delete them.
Depending on which test runs next, this can mean that we occasionally
hit a race condition like the one in Issue 18447:

* Delete Pods
* Delete Cilium
* Pods can't be fully deleted because Cilium is down
* BeforeAll for next test waits to make sure everything is ready so that
  we can install Cilium
* The next test times out while attempting this process

This commit goes through all of the existing suites, removes cases where
we rely on incorrect scheduling between AfterEach/AfterAll statements,
and adds a wait for pod termination after every place in the suites
where we delete pods.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
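
A rough sketch of the "delete, then wait" pattern this commit adds: the suite itself goes through its kubectl helpers, so the client-go version below is only illustrative, and the function name, poll interval, and timeout are assumptions.

```go
package example

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// deletePodsAndWait deletes every pod in the namespace and then polls
// until none are left, so the next test cannot start while pods are
// still terminating.
func deletePodsAndWait(ctx context.Context, cs kubernetes.Interface, ns string) error {
	if err := cs.CoreV1().Pods(ns).DeleteCollection(ctx,
		metav1.DeleteOptions{}, metav1.ListOptions{}); err != nil {
		return err
	}
	return wait.PollImmediate(5*time.Second, 5*time.Minute, func() (bool, error) {
		pods, err := cs.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{})
		if err != nil {
			return false, err
		}
		return len(pods.Items) == 0, nil // done once everything is gone
	})
}
```
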
[ upstream commit 323cb87 ]

When the test fails because it is waiting for containers to terminate,
add this information into the error that is reported so that developers
can more easily debug the issue that's occurring.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 216bcb1 ]

Previously the skip was on the Context, which meant that the
BeforeAll/AfterAll for the K8sBandwidthTest Describe would still run and
could deploy pods and fail to clean them up properly. Skip these steps
as they can cause problems in test runs where this test is skipped, and
because they are unnecessary work.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 119653b ]

Previously the skip was on the Context, which meant that the
BeforeAll/AfterAll for the K8sCustomCalls Describe would still run and
could deploy pods and fail to clean them up properly. Skip these steps
as they can cause problems in test runs where this test is skipped, and
because they are unnecessary work.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit fb2d11c ]

Deleting all pods after deleting Cilium is a sure-fire way to introduce
flakiness where pods hang in Terminating state during deletion, because
the Cilium pods have been cleaned up and kubelet can't call into the CNI
to properly clean up the pods.

Delete the pods first and wait for termination to finish.

CC: Quentin Monnet <quentin@isovalent.com>
Fixes: 9d4e99d ("tests: rework custom calls's AfterEach/AfterAll blocks to skip if needed")
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 95a46c6 ]

Use the spec.replicas field in the kube-dns deployment to reconfigure
the number of DNS pods in the cluster during pod restart. This should
restart the kube-dns pods correctly, similar to just deleting the pods.

A subsequent patch intends to split this usage out depending on the
status of Cilium deployments, in order to mitigate issues where kube-dns
pods are deployed differently between test runs due to Cilium
configuration (like endpoint routes mode).

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 8302f0e ]

As of the writing of this patch, Cilium does not support transitioning
between enable-endpoint-routes=true and enable-endpoint-routes=false.

Commit c18cfc8 ("test: Redeploy DNS pods in AfterAll for datapath tests")
attempted to ensure that DNS pods are properly deleted from nodes prior
to completing a particular test file. This was to ensure that when
making changes to Cilium configuration like toggling endpoint routes
mode, the pods are properly switched over to the new mode.

Previously, the order of the test cleanup could cause problems with
terminating the kube-dns pods: Cilium was deleted from the cluster
first, so the kube-dns pods would get stuck in the "Terminating" state.
This is believed to be because the nodes would still attempt to call the
cilium-cni binary in order to remove the DNS pods, and that binary would
attempt to reach out to the cilium-agent on the node, which was no
longer present.

The motivation behind this patch, then, is to first scale the pods down
to zero so that no kube-dns pods are present in the cluster during test
file cleanup, and then shut down Cilium. Finally, we restore the number
of replicas to the original value, so that when the next test sets
Cilium back up, the kube-dns pods are ready to be deployed afresh.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
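
A hedged client-go sketch of the scale-down/scale-up approach described in the two commits above; the test suite drives this through its kubectl helpers instead, so the function below and its use of the Scale subresource are illustrative assumptions (the kube-dns name and kube-system namespace are the usual defaults).

```go
package example

import (
	"context"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// scaleKubeDNS sets spec.replicas on the kube-dns Deployment. The cleanup
// flow sketched here: scale to 0, wait for the DNS pods to terminate,
// remove Cilium, then scale back to the original replica count before the
// next test brings Cilium back up.
func scaleKubeDNS(ctx context.Context, cs kubernetes.Interface, replicas int32) error {
	scale := &autoscalingv1.Scale{
		ObjectMeta: metav1.ObjectMeta{Name: "kube-dns", Namespace: "kube-system"},
		Spec:       autoscalingv1.ScaleSpec{Replicas: replicas},
	}
	_, err := cs.AppsV1().Deployments("kube-system").
		UpdateScale(ctx, "kube-dns", scale, metav1.UpdateOptions{})
	return err
}
```
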
[ upstream commit 6f9e1df ]

Currently, we determine whether an XDP program is attached to a device
by checking if `link.Attrs().Xdp == nil`. We need to check
`Xdp.Attached` as well; otherwise, the XDP unloading will be executed on
unrelated devices every time the agent restarts, which might further
cause network interruption for some NIC drivers, e.g. Mellanox mlx5.

Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
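
A minimal sketch of the corrected check, assuming the github.com/vishvananda/netlink package that exposes the `link.Attrs().Xdp` field mentioned above; the function name and device lookup are illustrative.

```go
package example

import "github.com/vishvananda/netlink"

// hasXDPAttached reports whether an XDP program is actually attached to
// the device: Xdp being non-nil alone is not enough, Xdp.Attached must
// also be true, otherwise unrelated devices would have XDP "unloaded"
// on every agent restart.
func hasXDPAttached(name string) (bool, error) {
	link, err := netlink.LinkByName(name)
	if err != nil {
		return false, err
	}
	xdp := link.Attrs().Xdp
	return xdp != nil && xdp.Attached, nil
}
```
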
[ upstream commit 3cd68af ]

Similarly to other guides, we can put incompatibilities between
different features into a Limitations section, at the end.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 411da57 ]

Reported-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 622eb86 ]

Currently, inherit_identity_from_host cannot inherit the identity from
encrypted packets (packets with MARK_MAGIC_ENCRYPT coming from the host
stack). Extend it to handle that case.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 49b31e7 ]

Currently, the return value of inherit_identity_from_host indicates
whether the packet is coming from the proxy. However, in some cases we
also want to know whether the packet was encrypted (and it is impossible
to know after the call, since inherit_identity_from_host wipes the
mark). To cover both cases, we can simply extend it to return the magic
value extracted from the mark.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 8420939 ]

Currently, when packets are passed from the from-container program
(bpf_lxc) to the host stack, src_id is not encoded into the mark. This
is inconvenient for the rest of the datapath, especially when we
generate the trace.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 144e9e0 ]

When IPsec packets come from the host (xfrm) stack to the from-host
program (bpf_host), they bypass send_trace_notify (<- stack). As a
result, we only see the trace for the inner packet (ICMP Request/Reply
with <- endpoint and -> stack). Fix the bpf_host program not to bypass
the trace.

Note that since we moved do_netdev_encrypt after
inherit_identity_from_host, we cannot extract the identity from
ctx->mark anymore, because inherit_identity_from_host wipes the mark.
That is why we need to pass the inherited identity to do_netdev_encrypt.
Otherwise, in tunneling mode, get_identity doesn't work correctly.

Trace on the source node before the fix:

```
<- endpoint ... EchoRequest
-> stack ... EchoRequest
```

Trace on the source node after the fix:

```
<- endpoint ... EchoRequest
-> stack ... EchoRequest
<- stack encrypted ...
```

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
@joamaki force-pushed the pr/v1.11-backport-2022-01-31 branch from 1536d5e to ca4db19 on February 3, 2022 15:41
@joamaki (Contributor) commented Feb 4, 2022

/test-backport-1.11

Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed and has not been observed before, so may be related to your PR:

Test Name: K8sServicesTest Checks service across nodes Tests NodePort BPF Tests with vxlan Tests with secondary NodePort device

Failure Output: FAIL: Request from k8s1 to service tftp://192.168.57.12:31959/hello failed

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.19 so I can create a new GitHub issue to track it.

@joamaki (Contributor) commented Feb 4, 2022

/test-upstream-k8s

k8s-122-kernel-4.19 failed due to #12690 and that test was disabled in master, so going to ignore that. @twpayne shall I backport the commit to disable the flaky test?

k8s-upstream failed due to box download timeout so kicking it off again.

@twpayne (Contributor) commented Feb 4, 2022

> k8s-122-kernel-4.19 failed due to #12690 and that test was disabled in master, so going to ignore that. @twpayne shall I backport the commit to disable the flaky test?

Yes please.

@joamaki (Contributor) commented Feb 4, 2022

/test-upstream-k8s

@joamaki (Contributor) commented Feb 7, 2022

k8s-upstream is still hitting the download issue. I'll merge this and then open new backports that will include the fix for that issue.

@joamaki joamaki merged commit da010b8 into cilium:v1.11 Feb 7, 2022