v1.11 backports 2022-01-31 #18669
Conversation
Thanks
My changes look good. Thanks!
/test-backport-1.11
Job 'Cilium-PR-K8s-1.18-kernel-4.9' hit: #17981 (89.69% similarity)
Thanks!
LGTM 👍
Thanks!
My changes LGTM, thanks.
[ upstream commit c3ef235 ] Additional sanity check coverage that the kube-apiserver policy matching feature still works with a duplicate policy added and the previous removed. Updates: cilium#17829 Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit bd7ee73 ] Now that the limitation has been fixed, remove it from the docs. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit b6a41d2 ] The init container name doesn't change during tests, so define it const. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit f49dda7 ] Ensure that all endpoints exist and are ready before attempting to extract a specific endpoint from GetAllEndpointsIds results. Follows commit 3738680 ("test/runtime: Fix flake on reserved:init endpoints") which introduced a similar check before calls to GetEndpointIds. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 96152dc ] Previously, the bpf_fib_lookup() helper was invoked in nodeport.h with the BPF_FIB_LOOKUP_{OUTPUT,DIRECT} flags set. This commit drops both flags to make NodePort BPF multihoming possible.

The former flag instructed the helper to do the FIB lookup from the egress perspective. This required us to set the correct ifindex for the lookup. In the NodePort BPF code we used DIRECT_ROUTING_DEV_IFINDEX, which corresponds to an iface used to communicate between k8s Nodes (i.e., the iface with the k8s Node InternalIP or ExternalIP). However, the reliance on DIRECT_ROUTING_DEV_IFINDEX meant that when more than a single iface is used to connect the nodes, only one iface could be used for NodePort BPF traffic (forwarded service requests). To fix this, we can drop BPF_FIB_LOOKUP_OUTPUT and set the ingress iface in the FIB lookup parameter. This makes the FIB lookup happen from the perspective of the device which received the packet (before forwarding to the egress iface), and it eliminates the need for DIRECT_ROUTING_DEV_IFINDEX in the NodePort BPF.

The latter flag was meant to speed up FIB lookups. The kernel commit which introduced it [1] says: BPF_FIB_LOOKUP_DIRECT to do a lookup that bypasses FIB rules and goes straight to the table associated with the device (expert setting for those looking to maximize throughput). This means that we bypass IP rules. However, cilium-agent installs IP rules for some setups (e.g., EKS, WireGuard, etc). Therefore, to avoid any potential pitfalls, we drop the latter flag so that the IP rules can be respected.

To measure a potential perf regression, I provisioned [2] scripts on three AWS EC2 c5a.8xlarge machines (10Gbit, Ubuntu 20.04 LTS, 5.11 kernel). The netperf client was sending requests to the LB node which was forwarding them to the destination node. The only Helm setting for Cilium was KPR=strict. The results from three runs of each setup are the following:

---------------------+-----------------+-------------------------
                     | v1.11.1         | v1.11.1 with FIB changes
---------------------+-----------------+-------------------------
TCP_RR MEAN_LATENCY  | 358.74 +- 27.08 | 359.56 +- 30.50
with STDDEV_LATENCY  | 360.28 +- 26.01 | 361.76 +- 26.71
                     | 368.18 +- 28.22 | 365.33 +- 35.83
---------------------+-----------------+-------------------------

As you can see, the increase in latency is negligible. Therefore, this commit drops the latter flag unconditionally, i.e., without checking whether there are any non-default IP rules or whether Cilium is running in a multihoming setup.

[1]: torvalds/linux@87f5fc7
[2]: https://github.com/brb/cilium-without-netfilter

Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 6c3bf64 ] According to the Ginkgo e2e test docs, all "AfterEach" statements are executed before all "AfterAll" statements [0]. Previously, this code was assuming that the AfterAll inside a context would run first (to delete pods), then the AfterEach outside the context (to wait for the pods to terminate), then the AfterAll outside the context to finally clean up the Cilium pods. The result was that in issue 18447, the next test to run could hit a race condition where pods were deleted but did not fully terminate before Cilium was removed. Cilium ends up getting deleted before all the pods, which means that there is no longer any way to execute the CNI DEL operation for those pods, so they get stuck in Terminating state. Fix it by moving the check that the test pods are fully terminated so that it runs before returning from the AfterAll statement where we initiate the deletion of those pods. [0]: https://docs.cilium.io/en/latest/contributing/testing/e2e/#example-test-layout Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
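To make the ordering concrete, here is a minimal sketch of the pattern this commit moves to. It uses Ginkgo v2 Ordered-container syntax and hypothetical helper names rather than the suite's actual ginkgo-ext wrappers and kubectl helpers:
```go
package ordering_test

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	"github.com/onsi/gomega"
)

func TestOrdering(t *testing.T) {
	gomega.RegisterFailHandler(Fail)
	RunSpecs(t, "cleanup ordering sketch")
}

// Hypothetical stand-ins for the suite's kubectl/Cilium helpers.
func deployTestPods()        {}
func deleteTestPods()        {}
func waitForPodsTerminated() {}
func uninstallCilium()       {}

var _ = Describe("K8sExampleTest", Ordered, func() {
	BeforeAll(func() { deployTestPods() })

	It("runs a test against the pods", func() {})

	// Every AfterEach runs before any AfterAll, so waiting for pod
	// termination in an AfterEach would happen before the pods are even
	// deleted below.
	AfterEach(func() {})

	AfterAll(func() {
		deleteTestPods()
		// Fix described above: block until the pods are fully terminated
		// in the same AfterAll that deletes them, while Cilium is still
		// around to serve their CNI DEL operations...
		waitForPodsTerminated()
		// ...and only then tear Cilium down.
		uninstallCilium()
	})
})
```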
[ upstream commit 2ae89c1 ] This test seems like it's always skipped, so who knows whether it works or not. But this should at least mitigate issues with cleaning up state at the end of the test, similar to the issues in the adjacent commits. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit cd7df97 ] Previously this code was mismatched: the BeforeEach would attempt to re-create the graceful termination app before each of the It statements, but the cleanup would only occur once. While preparing PR 18448, I hit an issue (GH-18634) where, if we didn't wait sufficiently long between individual tests in the graceful termination test Context, the test would fail during the following BeforeEach because the previous test's client was not yet cleaned up. Fix this by ensuring that the cleanup and the wait for pod termination occur in an AfterEach for each It statement in this Context, to match the behaviour of the overall test. The next commit will then shift the cleanup logic out of a Describe-level AfterEach and into the individual Contexts. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
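A rough sketch of the resulting structure, again with Ginkgo v2 syntax and hypothetical helpers standing in for the suite's real deployment and cleanup code:
```go
package gterm_test

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	"github.com/onsi/gomega"
)

func TestGracefulTermination(t *testing.T) {
	gomega.RegisterFailHandler(Fail)
	RunSpecs(t, "graceful termination sketch")
}

// Hypothetical stand-ins for the suite's deployment helpers.
func deployGracefulTerminationApp() {}
func deleteClient()                 {}
func waitForClientTerminated()      {}

var _ = Describe("K8sServicesTest", func() {
	Context("graceful termination", func() {
		// Runs before every It, re-creating the client/server app.
		BeforeEach(func() {
			deployGracefulTerminationApp()
		})

		// Fix from the commit: clean up and wait for termination after
		// every It as well, so the next BeforeEach never races against a
		// client that is still shutting down.
		AfterEach(func() {
			deleteClient()
			waitForClientTerminated()
		})

		It("terminates connections gracefully", func() {})
		It("handles a second scenario", func() {})
	})
})
```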
[ upstream commit 589c215 ] The library in the tree that we use to interact with kubectl doesn't automatically wait for all pods to be deleted when we delete them. Depending on which test runs next, this can mean that we occasionally hit a race condition like the one in Issue 18447:
* Delete Pods
* Delete Cilium
* Pods can't be fully deleted because Cilium is down
* BeforeAll for next test waits to make sure everything is ready so that we can install Cilium
* The next test times out while attempting this process
This commit goes through all of the existing suites, removes cases where we are relying on incorrect scheduling between AfterEach/AfterAll statements, and adds in waits after deleting pods anywhere in the suite where we delete pods. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
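The "wait after deleting pods" step could look roughly like the following poll loop. This sketch uses client-go directly, whereas the suite drives kubectl through its own wrapper; names and parameters here are illustrative:
```go
package k8stest

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPodsGone polls until no pods matching the selector remain in the
// namespace. It mirrors the "delete, then wait" step the commit adds to the
// suites, so the next test never starts while pods are still Terminating.
func waitForPodsGone(ctx context.Context, c kubernetes.Interface, ns, selector string, timeout time.Duration) error {
	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
		pods, err := c.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
		if err != nil {
			return false, err
		}
		if n := len(pods.Items); n > 0 {
			fmt.Printf("still waiting for %d pod(s) to terminate\n", n)
			return false, nil
		}
		return true, nil
	})
}
```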
[ upstream commit 323cb87 ] When the test fails because it is waiting for containers to terminate, add this information into the error that is reported so that developers can more easily debug the issue that's occurring. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 216bcb1 ] Previously the skip was on the Context which meant that the BeforeAll/AfterAll for the K8sBandwidthTest Describe would be run, which could deploy pods and fail to clean them up properly. Skip these unnecessary steps as they can cause problems in test runs where this test is skipped, and because it's unnecessary work. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 119653b ] Previously the skip was on the Context which meant that the BeforeAll/AfterAll for the K8sCustomCalls Describe would be run, which could deploy pods and fail to clean them up properly. Skip these unnecessary steps as they can cause problems in test runs where this test is skipped, and because it's unnecessary work. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
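The skip-placement change described in the two commits above (moving the skip from the Context up to the enclosing Describe) can be sketched as follows. `skipDescribeIf`, the condition, and the pod helpers are illustrative stand-ins, not the suite's actual ginkgo-ext API; the example uses Ginkgo v2.2+ syntax:
```go
package skip_test

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	"github.com/onsi/gomega"
)

func TestSkip(t *testing.T) {
	gomega.RegisterFailHandler(Fail)
	RunSpecs(t, "skip placement sketch")
}

// Hypothetical stand-ins for the suite's condition and pod helpers.
func featureUnsupported() bool { return true }
func deployPods()              {}
func deletePods()              {}

// skipDescribeIf registers the container only when the condition is false,
// so none of its BeforeAll/AfterAll setup runs on skipped environments.
// Cilium's ginkgo-ext provides a similar helper; this one is illustrative.
func skipDescribeIf(cond func() bool, text string, body func()) bool {
	if cond() {
		return false
	}
	return Describe(text, Ordered, body)
}

var _ = skipDescribeIf(featureUnsupported, "K8sBandwidthTest", func() {
	// With the skip at the Describe level, this expensive setup/teardown
	// never runs when the test is skipped, instead of deploying pods and
	// then possibly failing to clean them up.
	BeforeAll(func() { deployPods() })
	AfterAll(func() { deletePods() })

	// Previously only this inner Context carried the skip condition.
	Context("bandwidth manager", func() {
		It("enforces egress limits", func() {})
	})
})
```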
[ upstream commit fb2d11c ] Deleting all pods after deleting Cilium is a sure-fire way to introduce flakiness where pods hang in Terminating state during deletion, because the Cilium pods have been cleaned up and kubelet can't call into the CNI to properly clean up the pods. Delete the pods first and wait for termination to finish. CC: Quentin Monnet <quentin@isovalent.com> Fixes: 9d4e99d ("tests: rework custom calls's AfterEach/AfterAll blocks to skip if needed") Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 95a46c6 ] Use the spec.replicas field in the kube-dns deployment to reconfigure the number of DNS pods in the cluster during pod restart. This should restart the kube-dns pods correctly, similar to just deleting the pods. A subsequent patch intends to split this usage out depending on the status of Cilium deployments, in order to mitigate issues where kube-dns pods are deployed differently between test runs due to Cilium configuration (like endpoint routes mode). Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 8302f0e ] As of the writing of this patch, Cilium does not support transitioning between enable-endpoint-routes=true and enable-endpoint-routes=false. Commit c18cfc8 ("test: Redeploy DNS pods in AfterAll for datapath tests") attempted to ensure that DNS pods are properly deleted from nodes prior to completing a particular test file. This was to ensure that when making changes to Cilium configuration like toggling endpoint routes mode, the pods are properly switched over to the new mode. Previously, the order of the test cleanup could cause problems with terminating the kube-dns pods because Cilium was first deleted from the cluster, and the kube-dns pods would get stuck in "Terminating" state. This is believed to be because the nodes would still attempt to call the cilium-cni binary in order to remove the DNS pods, and that would attempt to reach out to the cilium-agent on the node, which is no longer present. The motivation behind this patch, then, is to first scale the pods down to zero so that no kube-dns pods are present in the cluster during test file cleanup, then shut down Cilium. Finally, we restore the number of replicas back to the original value, so that when the next test sets Cilium back up, the kube-dns pods will be ready to be deployed afresh. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
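A rough client-go equivalent of the scale-down/restore flow described above, assuming the kube-dns deployment lives in the kube-system namespace; the actual suite manipulates spec.replicas via kubectl, so the function name and packaging here are illustrative only:
```go
package k8stest

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// scaleKubeDNS sets spec.replicas on the kube-dns deployment via the Scale
// subresource and returns the previous replica count so the caller can
// restore it later.
func scaleKubeDNS(ctx context.Context, c kubernetes.Interface, replicas int32) (int32, error) {
	dns := c.AppsV1().Deployments("kube-system")
	scale, err := dns.GetScale(ctx, "kube-dns", metav1.GetOptions{})
	if err != nil {
		return 0, err
	}
	previous := scale.Spec.Replicas
	scale.Spec.Replicas = replicas
	_, err = dns.UpdateScale(ctx, "kube-dns", scale, metav1.UpdateOptions{})
	return previous, err
}
```
Cleanup would call this with 0 replicas, tear down Cilium, and then call it again with the saved count so the next test starts with DNS pods ready to be deployed afresh.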
[ upstream commit 6f9e1df ] Currently, we determine whether an XDP program is attached to a device by checking if `link.Attrs().Xdp == nil`. We need to check for `Xdp.Attached` as well; otherwise, the XDP unloading will be executed on innocent devices every time the agent restarts, which might further cause network interruption for some NIC drivers, e.g. Mellanox mlx5. Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com> Signed-off-by: Glib Smaga <code@gsmaga.com>
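A minimal sketch of the corrected check using the vishvananda/netlink package; the function name and packaging here are illustrative, not the agent's actual loader code:
```go
package xdpcheck

import "github.com/vishvananda/netlink"

// hasAttachedXDP reports whether an XDP program is actually attached to the
// named device. Checking link.Attrs().Xdp != nil alone is not sufficient:
// the kernel can report an Xdp struct with Attached == false, and unloading
// in that case touches devices that never had a Cilium XDP program.
func hasAttachedXDP(name string) (bool, error) {
	link, err := netlink.LinkByName(name)
	if err != nil {
		return false, err
	}
	xdp := link.Attrs().Xdp
	return xdp != nil && xdp.Attached, nil
}
```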
[ upstream commit 3cd68af ] Similarly to other guides, we can put incompatibilities between different features into a Limitations section, at the end. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 411da57 ] Reported-by: Joe Stringer <joe@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 622eb86 ] Currently, inherit_identity_from_host cannot inherit the identity from encrypted packets (packets with MARK_MAGIC_ENCRYPT coming from the host stack). Extend it to handle this case. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 49b31e7 ] Currently, the return value of inherit_identity_from_host indicates whether the packet is coming from the proxy or not. However, in some cases we want to know whether the packet was encrypted (and it is impossible to know this after the call, since inherit_identity_from_host wipes the mark). To cover both cases, we can simply extend it to return the magic value extracted from the mark. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 8420939 ] Currently, when packets are passed from from-container (bpf_lxc) to the host stack, src_id is not encoded into the mark. This is inconvenient for the rest of the datapath, especially when we generate the trace. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 144e9e0 ] When IPsec packets come from the host (xfrm) stack to the from-host program (bpf_host), they bypass send_trace_notify (<- stack). As a result, we only see the trace for the inner packet (ICMP Request/Reply with <- endpoint and -> stack). Fix the bpf_host program not to bypass the trace. Note that since we moved do_netdev_encrypt after inherit_identity_from_host, we cannot extract the identity from ctx->mark anymore, because inherit_identity_from_host wipes the mark. That's why we need to pass the inherited identity to do_netdev_encrypt; otherwise, in tunneling mode, get_identity doesn't work correctly.
Trace on the source node before the fix:
```
<- endpoint ... EchoRequest
-> stack ... EchoRequest
```
Trace on the source node after the fix:
```
<- endpoint ... EchoRequest
-> stack ... EchoRequest
<- stack encrypted ...
```
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com>
/test-backport-1.11
Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed and has not been observed before, so may be related to your PR.
If it is a flake, comment
/test-upstream-k8s
k8s-upstream still hitting the download issue. I'll merge this and then open new backports that will include the fix for that issue.
Once this PR is merged, you can update the PR labels via: