Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: nodeport: provide L4 ports for SNAT in LB egress path #26550

Merged

Conversation

julianwiedmann
Copy link
Member

@julianwiedmann julianwiedmann commented Jun 29, 2023

For LB traffic that gets forwarded to a remote backend in non-DSR mode, tail_nodeport_nat_egress_ipv*() calls snat_v*_nat() to perform SNAT on the packet. Under the covers, this extracts a fresh CT tuple to look up / build a SNAT entry.

But for LB traffic we don't require any of the ICMP handling in that code path, and we already extract a CT tuple for building tunnel headers in XDP mode. So we can optimize this code path, and provide a fully populated CT tuple to the SNAT helper.

One additional benefit is that we fix handling for fragmented IPv4 packets, as lb4_extract_tuple() knows how to extract their L4 ports while snat_v4_nat() doesn't.

Fix SNAT by the N/S load-balancer for fragmented IPv4 requests.

For LB traffic that gets forwarded to a remote backend in non-DSR mode,
tail_nodeport_nat_egress_ipv*() calls snat_v*_nat() to perform SNAT on the
packet. Under the covers, this extracts a fresh CT tuple to look up / build
a SNAT entry.

But for LB traffic we don't require any of the ICMP handling in that code
path, and we already extract a CT tuple for building tunnel headers in XDP
mode. So we can optimize this code path, and provide a fully populated CT
tuple to the SNAT helper.

One additional benefit is that we fix handling for fragmented IPv4 packets,
as lb4_extract_tuple() knows how to extract their L4 ports while
snat_v4_nat() doesn't.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 29, 2023
@julianwiedmann julianwiedmann added kind/bug This is a bug in the Cilium logic. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. area/loadbalancing Impacts load-balancing and Kubernetes service implementations and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jun 29, 2023
@julianwiedmann
Copy link
Member Author

/test

@julianwiedmann
Copy link
Member Author

Related: #11180

@julianwiedmann julianwiedmann marked this pull request as ready for review June 29, 2023 08:50
@julianwiedmann julianwiedmann requested a review from a team as a code owner June 29, 2023 08:50
@julianwiedmann julianwiedmann added the needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch label Jun 29, 2023
Copy link
Contributor

@gentoo-root gentoo-root left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but I suspect it may suffer from the same fragment double accounting issue as my #25340, because you also add a call of ipv4_handle_fragmentation. However, I stopped understanding how ipv4_handle_fragmentation leads to ct_lookup4. I'll take one more look tomorrow, maybe something changed in its implementation.

@julianwiedmann
Copy link
Member Author

Looks good to me, but I suspect it may suffer from the same fragment double accounting issue as my #25340, because you also add a call of ipv4_handle_fragmentation. However, I stopped understanding how ipv4_handle_fragmentation leads to ct_lookup4. I'll take one more look tomorrow, maybe something changed in its implementation.

Ah, is that blocking you in #25340 ? It's most likely

update_metrics(ctx_full_len(ctx), dir, REASON_FRAG_PACKET);
then. That part is a bit silly, I wouldn't expect it to produce accurate metrics at the moment ... we can hardly expect that every packet needs just a single L4 port lookup.

It's probably fine to just push this metrics update into __ct_lookup4() instead - there we should have more confidence that a given CT tuple only gets looked up (== accounted) once. Let's try that in a follow-on PR?

@gentoo-root
Copy link
Contributor

is that blocking you in #25340 ?

Yes.

It's most likely

Ah right, it's called twice, because ct_lookup4 also calls ct_extract_ports4.

I wouldn't expect it to produce accurate metrics at the moment

We have a test for that, which breaks after my change.

we can hardly expect that every packet needs just a single L4 port lookup.

Right, so I tried to write a test that would break even before my change, so far without luck, though.

It's probably fine to just push this metrics update into __ct_lookup4() instead - there we should have more confidence that a given CT tuple only gets looked up (== accounted) once.

I believe there is a scenario with host firewall where ct_lookup4 is called more than once. I couldn't trigger it yet for some reason, though.

@julianwiedmann
Copy link
Member Author

It's probably fine to just push this metrics update into __ct_lookup4() instead - there we should have more confidence that a given CT tuple only gets looked up (== accounted) once.

I believe there is a scenario with host firewall where ct_lookup4 is called more than once.

Yep, I would fully expect that we still have cases where that's the case. But those we should fix over time :)

@gentoo-root
Copy link
Contributor

But those we should fix over time :)

OK, if I don't manage to reproduce it with host firewall, I'll try to move update_metrics in my pull request to see if it unblocks the failing test. Thanks for the suggestion! I was looking to fix it once and for all, but probably you are right.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 30, 2023
@julianwiedmann
Copy link
Member Author

But those we should fix over time :)

OK, if I don't manage to reproduce it with host firewall, I'll try to move update_metrics in my pull request to see if it unblocks the failing test. Thanks for the suggestion! I was looking to fix it once and for all, but probably you are right.

Just FYI, I suspect that this might even produce slightly better metrics (more accurate direction variable, basically). I didn't have time to follow-up on #25880 yet.

@julianwiedmann julianwiedmann merged commit e8fa91d into cilium:main Jun 30, 2023
66 checks passed
@julianwiedmann julianwiedmann deleted the 1.14-bpf-nodeport-nat-l4-ports branch June 30, 2023 11:44
@joamaki joamaki mentioned this pull request Jul 5, 2023
23 tasks
@joamaki joamaki added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Jul 5, 2023
@jibi jibi added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/loadbalancing Impacts load-balancing and Kubernetes service implementations backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants