Pr/v1.12 backports #23260
Conversation
[ upstream commit 1db1156 ] With cilium/cilium-cli#962 in place in cilium-cli v0.12.0 and the CI update to use that version in #20617, the connectivity tests cover all functionality tested by the tests in l7_demos.go. Moreover, the cilium-cli connectivity tests can be run against arbitrary clusters with Cilium deployed, while this test is specific for the test setup based on vagrant VMs. Thus, drop this test. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit bc14b15 ] [ Backporter's notes: ProxyRequestContext did not have the DataSource field in 1.12 so had to be dropped. ] A few changes are made here:
- `cilium_fqdn_semaphore_rejected_total` wasn't being updated correctly, due to an incorrect error check. This is fixed now.
- Based on the discussion [here](403790a#r936034590), the field `scope:datapathTime` in `cilium_proxy_upstream_reply_seconds` was split into two different scopes: `policyGenerationTime` (for updating the DNS caches and policy caches) and `datapathTime` (which includes the async policy map updates and the identity cache updates).
Signed-off-by: Rahul Joshi <rkjoshi@google.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit 7a6b4d9 ] Update the EKS helm install guide to patch the aws-node daemonset instead of deleting it prior to installation. That aligns with how the cilium cli install works and also the EKS tests. Also a minor style fix - remove the `` around aws-node because they don't get rendered properly in that section (i.e. they're written as-is). Suggested-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit 82b9a8e ] Add a note about aws-node flushing Linux routing tables which could cause connectivity issues if Cilium is uninstalled through the cli. Also suggest that deleting aws-node daemonset prior to Cilium installation is recommended. Suggested-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit 4bc5629 ] [ Backporter's notes: strange patch without any functional changes, and of course didn't apply cleanly. Not sure if this will work. ] Signed-off-by: Patrice Chalin <chalin@cncf.io> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit ae9dbd8 ] [ Backporter's notes: rev_nodeport_lb6 changed quite a bit, hope this change still makes sense. ] Pass the `monitor` feedback from the CT lookup to __encap_with_nodeid(). A previous commit already added this for rev_nodeport_lb4(), so aim for commonality here. Fixes: 428abc9 ("bpf: pipe forwarding reason into traces for TO_OVERLAY") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit 1aa45b3 ] Agent monitor supports both consumers and listeners. Unlike consumers, a listener expects an event to be first JSON encoded, then binary encoded using Gob. Since the whole encoding operation may be expensive, we avoid that if there are no active listeners at the moment. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit a9abe31 ] Currently, the Log() method holds the LogRecord lock for the entire record operation. The record operation can be broken up into the following parts:
- add metadata to the record
- check if the notifier is set
- call NewProxyLogRecord on the notifier if not nil
The lock is needed both to copy metadata and to get the currently set notifier. Instead, the call to NewProxyLogRecord can be executed without holding the lock if the concrete type that satisfies the LogRecordNotifier interface supports concurrent notifications. Currently there is only one concrete implementation of that interface: the agent monitor. The agent monitor supports two kinds of clients: listeners and consumers. Since listeners expect the notifications to be first JSON marshaled and then Gob encoded, the log record operation is not negligible in terms of either time taken or computational resources. So, in case of:
- a very high rate of FQDN-related events
- low computational resources available for the pod
the queue of goroutines waiting on that lock may grow significantly. In that scenario, the total memory used by all the enqueued goroutines may grow unacceptably. Since the agent monitor already supports concurrent event handling, this commit shortens the critical section, allowing for concurrent execution of the actual log entry recording. Doing so lowers the number of goroutines waiting for the LogRecord lock in high-load/low-resource scenarios. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit 5c4776b ] The benchmarks evaluate the impact of the previous two commits on the events log notification performance. Specifically, the benchmarks mimic what happens in case of a high rate of DNS-related notifications and the consequent logging of those events, as done in the `notifyOnDNSMsg` callback. To better understand the scalability improvements, the benchmarks run with various numbers of concurrent notifications: from a single one up to 1000. The first benchmark, called LogNotifierWithNoListeners, shows the improvement that comes from commit `monitor: Do not marshal events without active listeners`, which avoids encoding notifications altogether if no listeners are currently active. As expected, the improvement is remarkable, since no (useless) encoding takes place at all. The second one, called LogNotifierWithListeners, shows the improvement that comes from commit `logger: Avoid holding the lock for the entire log operation`, where we release the lock earlier to enable concurrent encoding of log notifications. As expected, the performance improvement is more evident as the load increases, thanks to the higher level of concurrency. Below are the results of running these benchmarks without the two commits (shown as `old`) and with them (shown as `new`), compared with benchstat.
```
$ benchstat old.txt new.txt
name                                          old time/op    new time/op    delta
LogNotifierWithNoListeners/OneRecord-8        27.8µs ± 1%     2.6µs ± 3%    -90.49%  (p=0.008 n=5+5)
LogNotifierWithNoListeners/TenRecords-8        284µs ± 1%      34µs ±12%    -88.03%  (p=0.008 n=5+5)
LogNotifierWithNoListeners/HundredRecords-8   2.80ms ± 1%    0.21ms ± 4%    -92.65%  (p=0.008 n=5+5)
LogNotifierWithNoListeners/ThousandRecords-8  28.2ms ± 1%     1.8ms ± 1%    -93.63%  (p=0.016 n=5+4)
LogNotifierWithListeners/OneRecord-8          33.5µs ± 6%    48.6µs ± 2%    +45.13%  (p=0.008 n=5+5)
LogNotifierWithListeners/TenRecords-8          352µs ± 1%     330µs ± 1%     -6.30%  (p=0.016 n=5+4)
LogNotifierWithListeners/HundredRecords-8     3.47ms ± 2%    2.92ms ± 3%    -16.06%  (p=0.008 n=5+5)
LogNotifierWithListeners/ThousandRecords-8    35.3ms ± 1%    31.5ms ± 2%    -10.83%  (p=0.008 n=5+5)

name                                          old alloc/op   new alloc/op   delta
LogNotifierWithNoListeners/OneRecord-8        10.2kB ± 0%     0.9kB ± 0%         ~   (p=0.079 n=4+5)
LogNotifierWithNoListeners/TenRecords-8        102kB ± 0%       9kB ± 0%    -91.19%  (p=0.008 n=5+5)
LogNotifierWithNoListeners/HundredRecords-8   1.02MB ± 0%    0.09MB ± 0%    -91.20%  (p=0.008 n=5+5)
LogNotifierWithNoListeners/ThousandRecords-8  10.2MB ± 0%     0.9MB ± 0%    -91.25%  (p=0.008 n=5+5)
LogNotifierWithListeners/OneRecord-8          14.4kB ± 0%    14.4kB ± 0%     +0.05%  (p=0.024 n=5+5)
LogNotifierWithListeners/TenRecords-8          143kB ± 0%     123kB ± 0%    -13.92%  (p=0.008 n=5+5)
LogNotifierWithListeners/HundredRecords-8     1.43MB ± 0%    1.17MB ± 1%    -18.51%  (p=0.008 n=5+5)
LogNotifierWithListeners/ThousandRecords-8    14.3MB ± 0%    11.1MB ± 1%    -22.37%  (p=0.008 n=5+5)

name                                          old allocs/op  new allocs/op  delta
LogNotifierWithNoListeners/OneRecord-8           169 ± 0%        9 ± 0%     -94.67%  (p=0.008 n=5+5)
LogNotifierWithNoListeners/TenRecords-8        1.69k ± 0%     0.08k ± 0%    -95.20%  (p=0.008 n=5+5)
LogNotifierWithNoListeners/HundredRecords-8    16.9k ± 0%      0.8k ± 0%    -95.26%  (p=0.008 n=5+5)
LogNotifierWithNoListeners/ThousandRecords-8    169k ± 0%        8k ± 0%    -95.28%  (p=0.008 n=5+5)
LogNotifierWithListeners/OneRecord-8             196 ± 0%      196 ± 0%          ~   (all equal)
LogNotifierWithListeners/TenRecords-8          1.95k ± 0%     1.83k ± 0%     -6.53%  (p=0.016 n=4+5)
LogNotifierWithListeners/HundredRecords-8      19.5k ± 0%     17.8k ± 0%     -8.68%  (p=0.008 n=5+5)
LogNotifierWithListeners/ThousandRecords-8      195k ± 0%      175k ± 0%    -10.57%  (p=0.008 n=5+5)
```
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit 9f6fc7e ] We currently attempt to suppress tracing when handle_nat_fwd() returns, assuming that the inner function (ie. handle_nat_fwd_ipv*()) was inlined and already raised a trace message. But handle_nat_fwd() can return CTX_ACT_OK without even going down the IPv4 / IPv6 path, in which case we end up missing the TRACE_TO_NETWORK. So just clean up things for good by wiring up an alternative function for the inline config. Don't write trace messages from that function. Note that the tracing in to-overlay still looks completely confused, but that's a separate issue. Fixes: 322510d ("bpf: Add missing packet tracing for handle_nat_fwd") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit baf7f34 ] Fredrik Björkman has reported that BPF L7 Ingress (and thus Gateway API) does not work when Cilium's XDP is enabled. A quick glimpse into nodeport_lb4/6 revealed that in the case of an L7 service, ctx_redirect_to_proxy_hairpin_ipv4/6() is called. The latter invokes ctx_redirect() to the cilium_host. Unfortunately, the redirect does not work when called from bpf_xdp, as cilium_host doesn't have any XDP prog attached. Fix this by offloading the L7 handling to bpf_host at the TC BPF ingress. Reported-by: Fredrik Björkman <fredrik.bjorkman1@ingka.ikea.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit 91d52e4 ] Avoid operator crashing from trying to access a field of a nil CESTracker. This can happen rarely with two consecutive CEP deletions that belong to the same CES. While the first CES update is being processed, the second CEP deletion causes the CES in cache to become empty, and CES update gets enqueued anyway, before the first CES update is finished. The first CES update then deletes CES tracker as the CES has 0 CEPs in it. Finally, the second CES update runs into the error. Signed-off-by: Dorde Lapcevic <dordel@google.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit a0a04db ] When listing a table or chain with '-L', iptables will by default try to perform a reverse IP lookup for each IP referenced in the ruleset that is being listed. This adds a useless dependency on DNS, which can lead to an increased initialization time for the agent in case the DNS server is slow to respond. As the reverse lookup is not needed, switch to the '-S' command, which does not perform any reverse lookup. Signed-off-by: Gilberto Bertin <jibi@cilium.io> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit 2c9c8c1 ] When a service backend endpoint connects to itself via its service cluster IP address, the traffic is hairpin'd using a loopback IP address. We skip policy enforcement on egress, as a pod is allowed to connect to itself, and users don't have to specify additional policies to allow the hairpin'd traffic. We have a similar expectation on the ingress side. The ingress side was broken because of a regression due to which replies were dropped on the ingress policy enforcement path. The patch that introduced the regression split per-packet load-balancing logic into a separate tail call, where (limited) LB state is stored in packet context and restored later while handling the rest of the packet processing, including conntrack and policy enforcement. When a service pod connects to itself via its clusterIP, post the forward service translation, the conntrack state update is done on the conntrack entry with the loopback address (the restored state). As a result, when a reply is reverse SNAT'd and DNAT'd, the policy verdict for the `CT_NEW` entry is denied, and the reply packet is dropped. Prior to the regression, the original conntrack entry would have the updated state and loopback flag set. Fix: When hairpin'd traffic is reverse translated and sent for local delivery in the case of bpf_lxc, we should directly redirect it to the endpoint, thereby skipping the tail call with ingress policy enforcement altogether. Here is the packet flow:
Request path: podA -> clusterIP --> svc xlate (dAddr=podA) --> SNAT using loopback address (saddr=loopback IP) --> conntrack entry created (loopback IP, backend IP) with loopback flag state
Reply path: cluster IP -> podA --> svc rev-xlate (saddr=podA address) --> SNAT using loopback (daddr=loopback IP) --> CT_REPLY match --> local_delivery
Fixes: 7575ba0 ("datapath: Reduce from LXC complexity") Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit a064973 ] The test is meant to check that a TCPv4 packet addressed to FRONTEND_IP:FRONTEND_PORT gets DNATed to BACKEND_IP:BACKEND_PORT and TXed back out. But right now we actually assert that the destination port *doesn't* get DNATed, and the test still passes! The reason is that the .frag_off field in the IPv4 header isn't built correctly (it has the wrong endianness), so the LB code in lb4_xlate() believes that the packet doesn't have an L4 header and consequently also doesn't apply L4 DNAT on the packet. Fix both aspects. Fixes: df1bc96 ("bpf: ported existing unit tests to new testing framework") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit 5c522b2 ] DSR NAT entries don't have the .host_local flag set, see eg. snat_v4_create_dsr(). This was presumably just copy&pasted from the other NAT entries in the file. It currently doesn't make a difference for the test, but let's still fix it. Fixes: 7904650 ("ctmap: add support for GC of DSR orphaned entries") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit b006a84 ] When clustermesh-apiserver Pod is deployed on the IPv6 single-stack cluster, etcd fails to startup with the error like this. ``` invalid value "https://127.0.0.1:2379,https://a:1:0:3::fc75:2379" for flag -listen-client-urls: URL address does not have the form "host:port": https://a:1:0:3::fc75:2379 ``` This happens because we don't put brackets around the IPv6 address. Fix Helm template to correctly handle that. Fixes: #22952 Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
This requirement was removed in 7435adb, but new test files are now trickling down into 1.12 through backports. Skip checking tags here as well. Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit 6554eff ] Don't leave dangling IP routes behind, they can impact subsequent tests. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 87a916c ] This most likely just papers over bugs in other tests that fail to tear down their custom routes. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: André Martins <andre@cilium.io>
The BPF unit/integration tests had some trouble running on older kernels (<=5.8). That is because the test runner always attempted to load all programs in the ELF. This fix makes it so we only attempt to load XDP and TC programs, both of which have `BPF_PROG_RUN` support. This commit also removes the rlimit memory lock which is required on pre-5.11 kernels. Fixes: #22779 Fixes: #22780 Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
This commit bumps cilium/ebpf to v0.10.0, as this version accommodates a change to the minimum eBPF program input size newer kernels will accept. Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
GitHub recently rolled out Docker buildx version v0.10.0 on their builders, which transparently changed the MediaType of docker images to OCI v1 and added provenance attestations. Unfortunately, various tools we use in CI like SBOM tooling and docker manifest inspect do not properly support some aspect of the new image formats. This resulted in breaking CI, with some messages like this: level=fatal msg="generating doc: creating SPDX document: generating SPDX package from image ref quay.io/cilium/docker-plugin-ci:XXX: generating image package" This could also lead CI to fail while waiting for image builds to complete, because the command we use to test whether the image is available did not support the image types. This commit attempts to revert buildx back to v0.9.1 to prevent it from generating the images in a format that other tooling doesn't expect. Over time we can work on migrating to buildx v0.10, testing various parts of our CI as we do so. This is a quick-and-dirty hack to stabilize CI for the short term, then we can figure out over time how to properly resolve the conflict between these systems. Signed-off-by: Joe Stringer <joe@cilium.io>
re-opened per: #23190 (comment)
My changes are dropped, and in a separate PR, so LGTM.
/test-backport-1.12
Job 'Cilium-PR-K8s-1.17-kernel-4.9' failed.
If it is a flake and a GitHub issue doesn't already exist to track it, comment
I guess we need to backport #18306?
oh wait, the failure is from
Okay yeah, a whole slew of v1.13 tests tried to run and failed, so we'll ignore those.
/mlh new-flake Cilium-PR-K8s-1.17-kernel-4.9 👍 created #23264 |
/test-1.17-4.9 |
Force-pushed from eb9f6ce to 2dd8d82.
All tests have passed. This PR has a commit on top to allow us to use the GHA workflow during this PR. I'm going to remove this commit now and then immediately merge this PR, since we know tests are OK.
@ldelossa awesome, thanks! Please be sure to follow up on running the command under
One more hint: For each PR that was not backported, I would suggest adding the
-> please double-check if this makes sense, rev_nodeport_lb6 changed quite a bit
Once this PR is merged, you can update the PR labels via:
PRs dropped from this round:
-> dropped, maintenance branches have different CODEOWNERS
-> relies on missing node.GetCiliumEndpointNodeIP()
-> generated quite a large diff I wasn't sure what to do with
-> can probably be made to fit with some work from the author; do we really want/need to backport this?
Manual fix added to skip check-missing-tags-in-tests.sh in the v1.12 branch due to 7435adb.
Manually modified cherry-pick of #22980 to avoid eBPF unit test failures.
Manual fix added to avoid eBPF unit test failures: #23092 (comment)