bpf: nodeport: provide L4 ports for SNAT in LB egress path #26550
For LB traffic that gets forwarded to a remote backend in non-DSR mode, tail_nodeport_nat_egress_ipv*() calls snat_v*_nat() to perform SNAT on the packet. Under the covers, this extracts a fresh CT tuple to look up / build a SNAT entry.
But for LB traffic we don't require any of the ICMP handling in that code path, and we already extract a CT tuple for building tunnel headers in XDP mode. So we can optimize this code path and provide a fully populated CT tuple to the SNAT helper.
One additional benefit is that we fix handling for fragmented IPv4 packets, as lb4_extract_tuple() knows how to extract their L4 ports while snat_v4_nat() doesn't.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
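To illustrate the idea in the description, here is a minimal, self-contained sketch of "extract the tuple once, then hand it to the SNAT helper". It is not Cilium's code: `struct tuple`, `struct pkt`, `frag_cache`, `extract_tuple()` and `snat_with_tuple()` are hypothetical names, and the single-entry fragment cache is an assumed toy model of how a fragment-aware extractor could recover L4 ports for non-first fragments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a CT tuple (not Cilium's struct). */
struct tuple {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* Toy IPv4 packet model. Only the first fragment (frag_off == 0)
 * carries an L4 header, so only there are sport/dport meaningful. */
struct pkt {
    uint32_t saddr, daddr;
    uint8_t  proto;
    uint16_t frag_id;
    uint16_t frag_off;
    bool     more_frags;
    uint16_t sport, dport;
};

/* Assumed single-entry fragment cache, standing in for a real map. */
struct frag_entry {
    bool     valid;
    uint32_t saddr, daddr;
    uint16_t id;
    uint8_t  proto;
    uint16_t sport, dport;
};
static struct frag_entry frag_cache;

/* Fragment-aware extraction: fills in L4 ports for every fragment,
 * using the ports remembered from the first fragment when needed. */
static bool extract_tuple(const struct pkt *p, struct tuple *t)
{
    t->saddr = p->saddr;
    t->daddr = p->daddr;
    t->proto = p->proto;

    if (p->frag_off == 0) {
        t->sport = p->sport;
        t->dport = p->dport;
        if (p->more_frags) {
            /* First fragment: remember its ports for later fragments. */
            frag_cache = (struct frag_entry){
                .valid = true, .saddr = p->saddr, .daddr = p->daddr,
                .id = p->frag_id, .proto = p->proto,
                .sport = p->sport, .dport = p->dport,
            };
        }
        return true;
    }

    /* Later fragment: no L4 header, so look the ports up. */
    if (frag_cache.valid && frag_cache.saddr == p->saddr &&
        frag_cache.daddr == p->daddr && frag_cache.id == p->frag_id &&
        frag_cache.proto == p->proto) {
        t->sport = frag_cache.sport;
        t->dport = frag_cache.dport;
        return true;
    }
    return false;
}

/* SNAT helper that takes an already populated tuple instead of
 * re-parsing the packet. It only rewrites the source address and port. */
static struct tuple snat_with_tuple(const struct tuple *t,
                                    uint32_t nat_addr, uint16_t nat_port)
{
    struct tuple out = *t;

    out.saddr = nat_addr;
    out.sport = nat_port;
    return out;
}
```

In this sketch the caller runs `extract_tuple()` once and reuses the result for SNAT, so a later fragment is translated with the same ports as the first one. A helper that re-parsed the packet on its own would find no L4 header in that fragment.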
/test
Related: #11180
Looks good to me, but I suspect it may suffer from the same fragment double accounting issue as my #25340, because you also add a call of ipv4_handle_fragmentation. However, I stopped understanding how ipv4_handle_fragmentation leads to ct_lookup4. I'll take one more look tomorrow, maybe something changed in its implementation.
Ah, is that blocking you in #25340 ? It's most likely Line 148 in 28de75d
It's probably fine to just push this metrics update into |
Yes.
Ah right, it's called twice, because ct_lookup4 also calls ct_extract_ports4.
We have a test for that, which breaks after my change.
Right, so I tried to write a test that would break even before my change, so far without luck, though.
I believe there is a scenario with host firewall where ct_lookup4 is called more than once. I couldn't trigger it yet for some reason, though.
Yep, I would fully expect that we still have cases where that's the case. But those we should fix over time :)
OK, if I don't manage to reproduce it with host firewall, I'll try to move update_metrics in my pull request to see if it unblocks the failing test. Thanks for the suggestion! I was looking to fix it once and for all, but probably you are right.
Just FYI, I suspect that this might even produce slightly better metrics (more accurate |