Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move ct_lookup in bpf_host.c to a separate tailcall #23831

Merged
merged 8 commits into from Apr 27, 2023

Conversation

gentoo-root
Copy link
Contributor

Move ct_lookup in bpf_host.c to a separate tailcall to reduce verifier complexity.

Fixes: #17142

@gentoo-root gentoo-root added kind/enhancement This would improve or streamline existing functionality. release-note/misc This PR makes changes that have no direct user impact. labels Feb 16, 2023
@gentoo-root gentoo-root force-pushed the split-ct-lookup-host-v2 branch 3 times, most recently from 9ba08ee to a8ad172 Compare February 20, 2023 18:53
@gentoo-root
Copy link
Contributor Author

/test

@gentoo-root gentoo-root force-pushed the split-ct-lookup-host-v2 branch 2 times, most recently from b275b20 to dc000df Compare March 6, 2023 19:41
@gentoo-root gentoo-root marked this pull request as ready for review March 6, 2023 20:42
@gentoo-root gentoo-root requested a review from a team as a code owner March 6, 2023 20:42
@pchaigno pchaigno requested review from pchaigno and removed request for NikAleksandrov March 6, 2023 23:10
@pchaigno pchaigno added kind/complexity-issue Relates to BPF complexity or program size issues area/host-firewall Impacts the host firewall or the host endpoint. labels Mar 6, 2023
Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me and the commit history is nice! 🚀

I have a couple comments below. Most notably, I'd like to reduce the code movements in the last two commits. That will be painful during future backports and git blaming sessions.

bpf/lib/host_firewall.h Outdated Show resolved Hide resolved
bpf/lib/host_firewall.h Outdated Show resolved Hide resolved
bpf/bpf_host.c Outdated Show resolved Hide resolved
bpf/lib/host_firewall.h Outdated Show resolved Hide resolved
bpf/bpf_host.c Outdated Show resolved Hide resolved
@julianwiedmann julianwiedmann self-requested a review March 7, 2023 07:25
bpf/bpf_host.c Outdated Show resolved Hide resolved
@gentoo-root gentoo-root force-pushed the split-ct-lookup-host-v2 branch 2 times, most recently from 7f261aa to 6be447e Compare March 11, 2023 01:06
@gentoo-root
Copy link
Contributor Author

Pushed a new version. Julian's comment is not addressed yet.

Did a cleanup and simplification of code, fixed a dozen minor bugs added in the previous changeset, deduplicated ct_lookup4 in whitelist_snated_egress_connections, reduced the number of revalidate_data calls, organized the code to avoid some flags like fill_trace, reordered handle_ipv{4,6}_cont after handle_ipv{4,6}.

@pchaigno
Copy link
Member

/test

@pchaigno
Copy link
Member

@gentoo-root I'm guessing, given recent assignments, you may be a bit low on cycles. If that's the case, I think it's fine to address Julian's comment in a follow up. If this PR stays open too long, it's likely to accumulate conflicts 😞

@brb
Copy link
Member

brb commented Mar 16, 2023

OT: is this going to fix the latest 4.19 complexity issues for bpf_host?

@gentoo-root gentoo-root force-pushed the split-ct-lookup-host-v2 branch 3 times, most recently from f92ae82 to 0f9a087 Compare March 16, 2023 19:07
@gentoo-root
Copy link
Contributor Author

Ungenerated deepcopy source code:
diff --git a/pkg/k8s/apis/cilium.io/client/crds/v2/ciliumclusterwidenetworkpolicies.yaml b/pkg/k8s/apis/cilium.io/client/crds/v2/ciliumclusterwidenetworkpolicies.yaml
index 031e6e98..bc2036dc 100644
--- a/pkg/k8s/apis/cilium.io/client/crds/v2/ciliumclusterwidenetworkpolicies.yaml
+++ b/pkg/k8s/apis/cilium.io/client/crds/v2/ciliumclusterwidenetworkpolicies.yaml
@@ -937,7 +937,7 @@ spec:
                 type: array
               egressDeny:
                 description: EgressDeny is a list of EgressDenyRule which are enforced
-                  at egress. Any rule inserted here will by denied regardless of the
+                  at egress. Any rule inserted here will be denied regardless of the
                   allowed egress rules in the 'egress' field. If omitted or empty,
                   this rule does not apply at egress.
                 items:

Hmm, that's interesting. I didn't touch this file, and that line is pretty old (2020). How come CI started complaining only now?

@gentoo-root
Copy link
Contributor Author

OT: is this going to fix the latest 4.19 complexity issues for bpf_host?

It reduces complexity of bpf_host, but I need to check specifically whether it's enough to fix the issues.

@gentoo-root
Copy link
Contributor Author

/test

@gentoo-root
Copy link
Contributor Author

gentoo-root commented Apr 1, 2023

/test-1.26-net-next

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Expected

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

@gentoo-root
Copy link
Contributor Author

k8s kernel tests seem to fail because I added new tail calls, and a "Missed tail call" appears when migrating to Cilium that doesn't have new tail calls. #20691

Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, took way too long to review this 😞. Had a closer look, please check if my comments make sense ...

bpf/lib/host_firewall.h Outdated Show resolved Hide resolved
bpf/lib/host_firewall.h Outdated Show resolved Hide resolved
bpf/bpf_host.c Show resolved Hide resolved
bpf/bpf_host.c Outdated
Comment on lines 771 to 868
if (ret == CTX_ACT_OK) {
if (from_host)
invoke_tailcall_if(is_defined(ENABLE_HOST_FIREWALL),
CILIUM_CALL_IPV4_CONT_FROM_HOST,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably just a matter of taste. But I would prefer to have the ctx_store_meta(ctx, CB_SRC_LABEL, secctx) here, and not hidden inside handle_ipv4(). It's already difficult enough to keep track of all the games we play with CB_SRC_LABEL ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about CB_FROM_HOST? Shall I move it out of handle_ipv4 too? If I do, it will be cumbersome to return 2 booleans in addition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CB_FROM_HOST should probably stay in it current place - as that's where we create its input data.

I'm just very sensitive to CB_SRC_LABEL, burned too many times already 😄.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I moved it and rebased.

(Please note that there is one more assignment to CB_SRC_LABEL in handle_ipv4 in the NAT46x64 shortcut that existed before my changes, but this should be fine, we do an unconditional tailcall right away.)

bpf/bpf_host.c Show resolved Hide resolved
Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't spot anything else, and you nicely shot down all my earlier concerns 😅.

Feel free to resolve my nits in a follow-on PR if it's easier. But as this needs a rebase anyway ...

@gentoo-root
Copy link
Contributor Author

OMG, rebased 1 hour ago, and tons of conflicts again =/

@gentoo-root gentoo-root force-pushed the split-ct-lookup-host-v2 branch 2 times, most recently from d3ad1d4 to 468f40d Compare April 26, 2023 17:14
@pchaigno
Copy link
Member

/test

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a second, cursory look. One comment below, but I think it can be fixed up after the tests haven't passed (or after they are triaged) without needing to then rerun the full test suite (assuming you don't rebase in the process).

/* Only enforce host policies for packets from host IPs. */
/* Only enforce host policies for packets from host IPs. */
if (src_sec_identity != HOST_ID &&
(is_defined(ENABLE_MASQUERADE) || ipcache_srcid != HOST_ID))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment doesn't match the condition anymore. Maybe we need to move the comment in whitelist_snated_egress_connections here? We should probably also rename whitelist_snated_egress_connections to something like track_snated_egress_connections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need to move the comment in whitelist_snated_egress_connections here?

I'll update this comment to make it more meaningful.

We should probably also rename whitelist_snated_egress_connections to something like track_snated_egress_connections.

Why? I believe the meaning of the old name still holds. By creating a CT entry (the only thing this function does at the moment) we are whitelisting these connections from applying host policies to reply packets.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but that intent is hard to understand if the code comment is moved to where the condition is.

When handle_ipv4 returns an error, send_drop_notify_error_ext is called
by tail_handle_ipv4, hence no need to call send_drop_notify_error from
handle_ipv4 itself.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
To prepare for splitting ipv4_host_policy_ingress, move the
initialization of the is_untracked_fragment variable closer to its only
usage.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Split the conntrack lookup part of the host policy ingress handlers into
a separate function to prepare for making it a separate tailcall. Use
struct ct_buffer{4,6} for communication between two parts of the old
function to prepare for passing the lookup result via a BPF map, similar
to how it's done in bpf_lxc.c.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Currently, ipv{4,6}_host_policy_egress calls revalidate_data to get the
pointer to the IP header. However, if we analyze from which contexts
it's called, we realize that in most cases this pointer is already
available, and the call to revalidate_data can be optimized out:

1. handle_ipv{4,6} already have the pointer.

2. handle_to_netdev_ipv{4,6} also already have the pointer, moreover,
   handle_to_netdev_ipv4 calls revalidate_data and never uses its
   results - until this commit.

3. from_host_to_lxc doesn't have the pointer, but it always calls
   ipv{4,6}_host_policy_egress with src_id == HOST_ID, so this function
   will never return early before it would call revalidate_data. That
   means that the call to revalidate_data can be moved from
   ipv{4,6}_host_policy_egress to from_host_to_lxc without any
   performance penalty.

This commit essentially drops a few revalidate_data calls, no strings
attached.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
ct_lookup4 is called by both ipv4_host_policy_egress_lookup and
whitelist_snated_egress_connections. Both lookups are the same and can
be replaced with ipv4_host_policy_egress_lookup. Remove the lookup part
from whitelist_snated_egress_connections and reuse the lookup that is
already present in ipv4_host_policy_egress.

While at it, also replace the error check after ct_create4 from IS_ERR
to ret < 0. ct_create4 only returns 0 or DROP_CT_CREATE_FAILED, and
having a mix of different error checks confuses the verifier when it
can't prove that CTX_ACT_DROP is not a possible outcome.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Split the conntrack lookup part of the host policy egress handlers into
a separate function to prepare for making it a separate tailcall. Use
struct ct_buffer{4,6} for communication between two parts of the old
function to prepare for passing the lookup result via a BPF map, similar
to how it's done in bpf_lxc.c.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
In order to reduce verifier complexity, split handle_ipv4 into two
functions chained by a tailcall. The first part will stop after doing
ct_lookup4 of the host firewall, and the second part will pick up the
results from a BPF map and go on.

The tailcall is saved, and the map is not used if host firewall is
disabled, so the overhead is minimal in this case.

Also move the "verifier workaround" call to revalidate_data in
handle_ipv4 to the ENABLE_HOST_FIREWALL section, as its presence at the
previous location now makes clang miscompile code and generate 32-bit
zero extension on pkt and pkt_end pointers, which is not accepted by the
verifier. Removing the workaround altogether leads to another verifier
error: "R5 invalid mem access 'scalar'".

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
In order to reduce verifier complexity, split handle_ipv6 into two
functions chained by a tailcall. The first part will stop after doing
ct_lookup6 of the host firewall, and the second part will pick up the
results from a BPF map and go on.

The tailcall is saved, and the map is not used if host firewall is
disabled, so the overhead is minimal in this case.

Similar to the previous commit, also move the "verifier workaround"
call to revalidate_data in handle_ipv6 to the ENABLE_HOST_FIREWALL
section.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
@gentoo-root
Copy link
Contributor Author

gentoo-root commented Apr 27, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' hit: #24697 (90.61% similarity)

@gentoo-root
Copy link
Contributor Author

/test-1.26-net-next

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 27, 2023
@pchaigno pchaigno merged commit 1085ae2 into cilium:main Apr 27, 2023
56 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/host-firewall Impacts the host firewall or the host endpoint. dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. kind/complexity-issue Relates to BPF complexity or program size issues kind/enhancement This would improve or streamline existing functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move conntrack logic to separate BPF program to reduce complexity
4 participants