bpf: nat: reduce CT lookup scope #25917
/test
/* CT expects a tuple with the source and destination ports reversed,
 * while NAT uses normal tuples that match packet headers.
 */
ipv4_ct_tuple_swap_ports(&tuple_revsnat);
Nice saving on one tuple copy! It was bugging me a bit back when I worked on my revSNAT fix.
It might also be worth dropping `ipv4_ct_tuple_swap_ports` and just assigning sport and dport to the right places directly. What do you think?
I bet we can simplify this quite a bit more (especially in combination with your suggestion to pull the tuple swap out of the CT lookup). But let's do that in a follow-on PR, I still have nightmares from too much CT tuple wrangling 😬 .
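For readers following this thread, the two options under discussion can be sketched as below. This is a minimal illustration, not the PR's code: `struct tuple4` is a simplified stand-in for the real CT tuple, and both helper names are hypothetical. The first helper copies the packet tuple and then swaps the ports (the shape of the current code); the second assigns `sport` and `dport` to their final places in one step, as the review suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a CT/NAT tuple (assumption, not Cilium's struct). */
struct tuple4 {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
};

/* Option 1: copy the packet tuple, then swap the ports in place. */
static struct tuple4 ct_tuple_via_swap(const struct tuple4 *pkt)
{
	struct tuple4 t;
	uint16_t tmp;

	assert(pkt != NULL);
	t = *pkt;
	tmp = t.sport;
	t.sport = t.dport;
	t.dport = tmp;
	return t;
}

/* Option 2: write each field to its final place directly, no swap step. */
static struct tuple4 ct_tuple_direct(const struct tuple4 *pkt)
{
	assert(pkt != NULL);
	struct tuple4 t = {
		.saddr = pkt->saddr,
		.daddr = pkt->daddr,
		.sport = pkt->dport, /* ports reversed relative to the packet */
		.dport = pkt->sport,
	};
	return t;
}
```

Both helpers produce the same tuple; the second only avoids the intermediate copy-then-swap.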
The NAT code's CT helper currently doesn't differentiate whether it's called from SNAT or RevSNAT context. If the CT lookup doesn't return an entry, it unconditionally creates a new CT entry.

But RevSNAT should only be applied to replies of outbound connections. The expected behaviour is that when such a connection's first packet is handled by the SNAT path, we create (1) a NAT entry, (2) a RevNAT entry, and (3) a CT entry that keeps the NAT entries from being GCed. Thus we should never encounter a situation where the RevSNAT path finds a matching RevNAT entry for a packet, but then doesn't find the corresponding CT entry. And the CT entry we *would* create in this case does not match what we create from the SNAT path at all (for a start, the type is CT_INGRESS when it should be CT_EGRESS).

So skip the ct_create*() call from the RevSNAT path. This gives us more robust behaviour, and reduces code size.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
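The invariant described in this commit message can be illustrated with a toy model. This is a sketch under stated assumptions: a single `struct conn_state` stands in for the NAT and CT maps, and every name here (`snat_first_packet`, `rev_snat_reply`, the `verdict` values) is hypothetical. How the real RevSNAT path reports a CT miss is not shown in the source; the only point made is that the reply path never creates state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one connection slot instead of BPF maps (assumption). */
enum ct_dir { CT_DIR_NONE, CT_DIR_EGRESS, CT_DIR_INGRESS };

struct conn_state {
	bool nat_entry;       /* (1) forward NAT mapping */
	bool revnat_entry;    /* (2) reverse NAT mapping */
	enum ct_dir ct_entry; /* (3) CT entry that keeps (1) and (2) alive */
};

enum verdict { VERDICT_OK, VERDICT_NO_MAPPING, VERDICT_NO_CT };

/* SNAT path, first packet of an outbound connection: creates all three. */
static enum verdict snat_first_packet(struct conn_state *c)
{
	assert(c != NULL);
	c->nat_entry = true;
	c->revnat_entry = true;
	c->ct_entry = CT_DIR_EGRESS;
	return VERDICT_OK;
}

/* RevSNAT path, reply packet: lookups only. The const pointer makes it
 * explicit that a CT miss does not create an entry.
 */
static enum verdict rev_snat_reply(const struct conn_state *c)
{
	assert(c != NULL);
	if (!c->revnat_entry)
		return VERDICT_NO_MAPPING;
	if (c->ct_entry == CT_DIR_NONE)
		return VERDICT_NO_CT; /* unexpected state; report, don't create */
	return VERDICT_OK;
}
```

In this model, a reply that finds a RevNAT mapping without a CT entry is reported as an error instead of being papered over with a freshly created ingress entry.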
The behaviour for SNAT and RevSNAT is sufficiently different that it makes little sense to squeeze everything into a shared helper. Just open-code the CT lookup in the callers. For the RevSNAT case this even allows us to remove one of the temporary CT tuples.

Also remove the relevant BPF unit tests - we still have coverage for these paths through `bpf_nat_tests.c`.

(Note that this doesn't completely remove the unused `ext_err` from the RevSNAT path. I expect we will be using it pretty soon for other errors, so let's avoid the churn for now).

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
SNAT should only care about outbound connections, while RevSNAT only wants the replies for such connections. Apply the corresponding scope to their CT lookups.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
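The idea of scoping a CT lookup can be sketched as follows. This is an illustrative model only: `struct flow`, `enum lookup_scope` and `ct_lookup_scoped` are hypothetical names, and a single recorded entry stands in for the CT map. A forward-scoped lookup matches only packets in the connection's original direction, a reverse-scoped lookup matches only replies, and a bidirectional lookup accepts either.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scope selector for a CT lookup. */
enum lookup_scope { LOOKUP_FORWARD, LOOKUP_REVERSE, LOOKUP_BIDIR };

/* Simplified flow key (assumption, not the real CT tuple layout). */
struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

static bool flow_eq(const struct flow *a, const struct flow *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Build the flow as seen on the reply path. */
static struct flow flow_reverse(const struct flow *f)
{
	struct flow r = {
		.saddr = f->daddr, .daddr = f->saddr,
		.sport = f->dport, .dport = f->sport,
	};
	return r;
}

/* 'entry' is the flow recorded for the original (outbound) direction. */
static bool ct_lookup_scoped(const struct flow *entry, const struct flow *pkt,
			     enum lookup_scope scope)
{
	assert(entry != NULL && pkt != NULL);
	struct flow rev = flow_reverse(entry);
	bool fwd_hit = flow_eq(entry, pkt);
	bool rev_hit = flow_eq(&rev, pkt);

	switch (scope) {
	case LOOKUP_FORWARD:
		return fwd_hit;            /* SNAT: outbound packets only */
	case LOOKUP_REVERSE:
		return rev_hit;            /* RevSNAT: replies only */
	default:
		return fwd_hit || rev_hit; /* unscoped lookup */
	}
}
```

With a single scope per caller, a reply packet can no longer match in the SNAT path and an outbound packet can no longer match in the RevSNAT path.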
620e422 to b2ed3a3
/test
Continuing from #25826, this PR applies the concept of "single-scope" CT lookup to the SNAT / RevSNAT paths. This makes the code paths simpler and more robust, while reducing complexity.