v1.12 backports 2023-01-13 #23092
Conversation
/test-backport-1.12

Job 'Cilium-PR-K8s-1.22-kernel-4.9' failed.
Job 'Cilium-PR-K8s-1.20-kernel-4.9' failed.
Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed.

If it is a flake and a GitHub issue doesn't already exist to track it, comment.
Looks good for my change, thanks.
Backport changes for my PR look good. Thanks!
LGTM for PR #22745, thanks! 💯
- bpf: nodeport: wire up trace aggregation for rev_nodeport_lb6() #22794 (@julianwiedmann)
-> please double-check if this makes sense, rev_nodeport_lb6 changed quite a bit
Looks good, thanks!
My change looks good. Thanks!
[ upstream commit 1db1156 ] With cilium/cilium-cli#962 in place in cilium-cli v0.12.0 and the CI update to use that version in cilium#20617, the connectivity tests cover all functionality tested by the tests in l7_demos.go. Moreover, the cilium-cli connectivity tests can be run against arbitrary clusters with Cilium deployed, while this test is specific to the test setup based on Vagrant VMs. Thus, drop this test. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit baf7f34 ] Fredrik Björkman has reported that the BPF L7 Ingress (and thus Gateway API) does not work when Cilium's XDP is enabled. A quick glimpse into nodeport_lb4/6 revealed that in the case of an L7 service, ctx_redirect_to_proxy_hairpin_ipv4/6() is called. The latter invokes ctx_redirect() to the cilium_host. Unfortunately, the redirect does not work when called from bpf_xdp, as cilium_host doesn't have any XDP prog attached. Fix this by offloading the L7 handling to bpf_host at the TC BPF ingress. Reported-by: Fredrik Björkman <fredrik.bjorkman1@ingka.ikea.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Timo Beckers <timo@isovalent.com>
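The fix above can be sketched as a decision in the nodeport load balancer: a redirect to cilium_host only works from TC BPF, so the XDP program must pass L7-service traffic up the stack instead of redirecting. This is a hypothetical Python model, not Cilium's actual BPF C code; all names here are made up for illustration.

```python
# Toy model of the decision described in the commit message above.
XDP_PASS = "XDP_PASS"            # hand the packet up to the kernel stack / TC layer
REDIRECT_TO_PROXY = "REDIRECT"   # hairpin via cilium_host to the L7 proxy

def nodeport_lb(attach_point, is_l7_service):
    if is_l7_service:
        if attach_point == "xdp":
            # cilium_host has no XDP program attached, so a redirect from
            # bpf_xdp would fail; defer L7 handling to bpf_host at TC ingress.
            return XDP_PASS
        return REDIRECT_TO_PROXY
    return "CONTINUE_LB"  # regular nodeport load balancing
```

A packet for an L7 service seen at XDP is passed up so that bpf_host's TC program handles the proxy redirect; the same packet seen at TC is redirected directly.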
Force-pushed from ca91ae1 to 132a953.
/test-backport-1.12

Job 'Cilium-PR-K8s-1.22-kernel-4.9' failed.
Job 'Cilium-PR-K8s-1.20-kernel-4.9' failed.

If it is a flake and a GitHub issue doesn't already exist to track it, comment.
[ upstream commit 91d52e4 ] Avoid operator crashing from trying to access a field of a nil CESTracker. This can happen rarely with two consecutive CEP deletions that belong to the same CES. While the first CES update is being processed, the second CEP deletion causes the CES in cache to become empty, and CES update gets enqueued anyway, before the first CES update is finished. The first CES update then deletes CES tracker as the CES has 0 CEPs in it. Finally, the second CES update runs into the error. Signed-off-by: Dorde Lapcevic <dordel@google.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
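The race above can be sketched with a minimal Python model (not the actual operator code; class and method names are hypothetical): a stale CES update can be dequeued after its tracker was already removed, so the handler must treat a missing tracker as "nothing to do" instead of dereferencing nil.

```python
# Toy model of the CES tracker race described in the commit message above.
class CESController:
    def __init__(self):
        self.trackers = {}  # CES name -> set of CEP names it tracks

    def delete_cep(self, ces_name, cep_name):
        # A CEP deletion empties the cached CES and enqueues a CES update.
        self.trackers.get(ces_name, set()).discard(cep_name)

    def process_update(self, ces_name):
        tracker = self.trackers.get(ces_name)
        if tracker is None:
            # Guard added by the fix: the tracker was already deleted by an
            # earlier update, so drop the stale event instead of crashing.
            return "skipped"
        if not tracker:
            del self.trackers[ces_name]  # CES has 0 CEPs; drop the tracker
            return "deleted"
        return "synced"
```

Two consecutive CEP deletions enqueue two updates for the same CES; the first update deletes the tracker, and the guard lets the second update exit cleanly.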
[ upstream commit a0a04db ] When listing a table or chain with '-L', iptables will by default try to perform a reverse IP lookup for each IP referenced in the ruleset that is being listed. This adds a useless dependency on DNS, which can lead to an increased initialization time for the agent in case the DNS server is slow to respond. As the reverse lookup is not needed, switch to the '-S' command, which does not perform any reverse lookup. Signed-off-by: Gilberto Bertin <jibi@cilium.io> Signed-off-by: Timo Beckers <timo@isovalent.com>
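The difference can be illustrated with a small sketch of the command lines involved (helper functions are hypothetical, not Cilium's code): `iptables -S` prints rules in iptables-save syntax and never performs reverse DNS lookups, whereas `-L` resolves every referenced IP unless `-n` is also given.

```python
# Illustrative command builders for the '-L' vs '-S' change described above.
def list_rules_args(table="filter"):
    # '-S' emits rule specs (iptables-save syntax) with no DNS dependency.
    return ["iptables", "-t", table, "-S"]

def legacy_list_args(table="filter"):
    # '-L' alone would reverse-resolve every IP in the ruleset; '-n'
    # suppresses that, but '-S' output is also easier to parse.
    return ["iptables", "-t", table, "-L", "-n"]
```

With a slow DNS server, the `-L` variant without `-n` can stall agent initialization on lookups that the caller never needed.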
[ upstream commit 2c9c8c1 ] When a service backend endpoint connects to itself via its service cluster IP address, the traffic is hairpin'd using a loopback IP address. We skip policy enforcement on egress, as a pod is allowed to connect to itself, and users don't have to specify additional policies to allow the hairpin'd traffic. We have a similar expectation on the ingress side.

The ingress side was broken because of a regression due to which replies were dropped on the ingress policy enforcement path. The patch that introduced the regression split per-packet load-balancing logic into a separate tail call, where (limited) LB state is stored in packet context and restored later while handling the rest of the packet processing, including conntrack and policy enforcement. When a service pod connects to itself via its clusterIP, after the forward service translation, the conntrack state update is done on the conntrack entry with the loopback address (the restored state). As a result, when a reply is reverse SNAT'd and DNAT'd, the policy verdict for the `CT_NEW` entry is denied, and the reply packet is dropped. Prior to the regression, the original conntrack entry would have the updated state and the loopback flag set.

Fix: when hairpin'd traffic is reverse translated and sent for local delivery in the case of bpf_lxc, we should directly redirect it to the endpoint, thereby skipping the tail call with ingress policy enforcement altogether.

Here is the packet flow:

Request path: podA -> clusterIP --> svc xlate (daddr=podA) --> SNAT using loopback address (saddr=loopback IP) --> conntrack entry created (loopback IP, backend IP) with loopback flag state

Reply path: clusterIP -> podA --> svc rev-xlate (saddr=podA address) --> SNAT using loopback (daddr=loopback IP) --> CT_REPLY match --> local_delivery

Fixes: 7575ba0 ("datapath: Reduce from LXC complexity") Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Timo Beckers <timo@isovalent.com>
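The reply-path fix can be modeled with a tiny conntrack-lookup sketch (not the real BPF code; the loopback address and helper names are made up): a reply that matches a conntrack entry carrying the loopback flag is delivered straight to the endpoint, skipping the tail call that would enforce ingress policy.

```python
# Toy model of the hairpin reply handling described in the fix above.
LOOPBACK_IP = "169.254.42.1"  # hypothetical hairpin source address

def handle_reply(ct, saddr, daddr):
    # CT_REPLY lookup uses the reversed tuple of the original request.
    entry = ct.get((daddr, saddr))
    if entry and entry.get("loopback"):
        # Hairpin'd traffic: redirect directly to the endpoint and skip
        # the ingress-policy tail call entirely.
        return "local_delivery"
    return "policy_tail_call"

# Entry created on the request path: (loopback IP, backend IP), loopback flag set.
ct = {(LOOPBACK_IP, "10.0.0.5"): {"loopback": True}}
```

A reply from the backend to the loopback address matches the flagged entry and is delivered locally; any other reply still goes through the policy tail call.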
[ upstream commit a064973 ] The test is meant to check that a TCPv4 packet addressed to FRONTEND_IP:FRONTEND_PORT gets DNATed to BACKEND_IP:BACKEND_PORT and TXed back out. But right now we actually assert that the destination port *doesn't* get DNATed, and the test still works! The reason is that the .frag_off field in the IPv4 header isn't built correctly (it has the wrong endianness), so the LB code in lb4_xlate() believes that the packet doesn't have a L4 header and consequently also doesn't apply L4 DNAT on the packet. Fix both aspects. Fixes: df1bc96 ("bpf: ported existing unit tests to new testing framework") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
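The endianness pitfall can be shown in a few lines (an illustration, not the actual test code): the IPv4 frag_off field is 16 bits in network byte order, and writing the DF flag (0x4000) without converting it produces bytes that a network-order parser reads as a non-zero fragment offset, i.e. a fragment with no L4 header to translate.

```python
import struct

DF = 0x4000  # IPv4 "don't fragment" flag as a host-order constant

correct = struct.pack("!H", DF)   # network (big-endian) byte order
wrong   = struct.pack("<H", DF)   # little-endian, as a buggy packet builder emits

# Parsed back in network order: the correct bytes carry the DF flag with
# offset 0, while the wrong bytes decode as fragment offset 0x40, which
# makes the LB code skip L4 translation.
parsed_correct = struct.unpack("!H", correct)[0]
parsed_wrong = struct.unpack("!H", wrong)[0]
```

This is why the broken test passed: the mis-built frag_off made lb4_xlate() treat the packet as a non-first fragment, so no L4 DNAT was attempted and the "not DNATed" assertion held.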
[ upstream commit 5c522b2 ] DSR NAT entries don't have the .host_local flag set, see e.g. snat_v4_create_dsr(). This was presumably just copy-and-pasted from the other NAT entries in the file. It currently doesn't make a difference for the test, but let's still fix it. Fixes: 7904650 ("ctmap: add support for GC of DSR orphaned entries") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit b006a84 ] When the clustermesh-apiserver Pod is deployed on an IPv6 single-stack cluster, etcd fails to start up with an error like this:

```
invalid value "https://127.0.0.1:2379,https://a:1:0:3::fc75:2379" for flag -listen-client-urls: URL address does not have the form "host:port": https://a:1:0:3::fc75:2379
```

This happens because we don't put brackets around the IPv6 address. Fix the Helm template to correctly handle that. Fixes: cilium#22952 Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com>
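The required bracketing can be sketched in Python (the helper name is hypothetical; the actual fix lives in the Helm template): an IPv6 literal must be wrapped in brackets before `:port` is appended, otherwise the colons make the URL ambiguous.

```python
import ipaddress

def listen_client_url(host, port=2379, scheme="https"):
    """Build a host:port URL, bracketing IPv6 literals as URLs require."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_v6 = False  # a hostname, not an IP literal
    hostpart = f"[{host}]" if is_v6 else host
    return f"{scheme}://{hostpart}:{port}"
```

With the brackets in place, etcd can unambiguously split the host from the port, which is exactly what the `-listen-client-urls` parser was complaining about.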
This requirement was removed in 7435adb, but new test files are now trickling down into 1.12 through backports. Skip checking tags here as well. Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit 6554eff ] Don't leave dangling IP routes behind, they can impact subsequent tests. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 87a916c ] This most likely just papers over bugs in other tests that fail to tear down their custom routes. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: André Martins <andre@cilium.io>
Force-pushed from 132a953 to 51a0e7c.
/test-backport-1.12

Job 'Cilium-PR-K8s-1.21-kernel-4.9' failed.
Job 'Cilium-PR-K8s-1.19-kernel-4.9' failed.

If it is a flake and a GitHub issue doesn't already exist to track it, comment.
ConformanceGKE (ci-gke-1.12) — Connectivity test failed. Unrelated to changes:
- k8s-1.19-kernel-4.9 (test-1.19-4.9) — Build finished. #22578
- k8s-1.21-kernel-4.9 (test-1.21-4.9) — Build finished. #22578
@ldelossa Are the underlying errors the same in the failure vs the flake? The failure reports a level=error log msg, which is typically something we should scrutinize more closely.
BPF checks / build datapath (pull_request) failing after 6m - looks legitimate. The Go-driven BPF tests are complaining about a minimum kernel version. That test is using an Ubuntu 22.04 image: https://github.com/cilium/cilium/actions/runs/3947994288/workflow#L75 The error message
@christarazi Those tests fail consistently with:
Which appears to be a known flake.
@ldelossa where did you get the comment immediately above from? Are you sure that's the failure and not an intermediate report for a specific
@ldelossa That was just a transient error that happened earlier. If you scroll down further, you should see
Edit: @joestringer beat me to it. He's pointing out the same thing.
Aha, I see. In that case: k8s-1.21-kernel-4.9 (test-1.21-4.9) — Build finished. - Existing flake: #21279 Will re-run for the 1.19 error, since I do not see any issue for that Kafka-related failure.
/test-1.19-4.9
Flake, I'll track a ticket with that previous failure.
This backport needed some sugar on top: #23190 The original PR was opened on Timo's fork, which I can't push to, so we'll use the new PR above to merge in the backports.
@ti-mo yup, now I know, my bad. I should have just given it a try, but wasn't sure if you wanted pushes directly to your fork.
-> please double-check if this makes sense, rev_nodeport_lb6 changed quite a bit
Once this PR is merged, you can update the PR labels via:
PRs dropped from this round:
-> dropped, maintenance branches have different CODEOWNERS
-> relies on missing node.GetCiliumEndpointNodeIP()
-> generated quite a large diff I wasn't sure what to do with
-> can probably be made to fit with some work from the author; do we really want/need to backport this?
Manual fix added to skip `check-missing-tags-in-tests.sh` in the v1.12 branch due to 7435adb.