Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.15] bpf: improvements for tailcall error reporting #32332

Merged
merged 13 commits into from
May 22, 2024

Conversation

julianwiedmann
Copy link
Member

@julianwiedmann julianwiedmann commented May 3, 2024

To improve the error reporting for missed tailcalls in the datapath, backport a bunch of recent upstream improvements. This will make it easier to diagnose drops for such errors, and hopefully also help to eliminate CI flakes in the upgrade/downgrade workflows.

The manual backport is needed due to some trivial contextual conflicts.

Once this PR is merged, a GitHub action will update the labels of these PRs:

 30001 30023 30288 31990 32129 32151 32299

@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels May 3, 2024
julianwiedmann and others added 13 commits May 9, 2024 08:48
[ upstream commit 4135f5b ]

The naming of ep_tail_call() is very misleading - it *doesn't* call to
an endpoint's policy tail-call (in POLICY_CALL_MAP), and it's also used
by programs that are *not* associated with an endpoint (eg. bpf_xdp and
bpf_overlay). Instead it's used for a tail-call in the program's internal
tail-call map.

So start off with introducing a new helper with improved naming. Then let
this helper return DROP_MISSED_TAIL_CALL (instead of every caller
open-coding the same value), and also setting the index of the missed
tail-call in `ext_err` where available.

Finally steal a bit of compiler magic from the kernel, and enforce that
callers check for returned errors. We've had too many cases in the past
where we forgot to return the DROP_MISSED_TAIL_CALL.

Then convert a few initial callers to demonstrate the usage.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit a5c091c ]

When invoke_tailcall_if() expands into a tail-call, provide it with an
`ext_err` parameter so that the index of a missed tail-call can be
reported.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 7b5195e ]

Allow tail_call_internal() to return the index of a missed tail-call.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 4746609 ]

Switch the internal tail-calls in bpf_host to the new helper.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 174a338 ]

Switch the internal tail-calls in bpf_overlay to the new helper.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 6d69016 ]

Switch the internal tail-calls in nodeport.h to the new helper.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit d22b60f ]

Switch the internal tail-calls in bpf_lxc to the new helper.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 24963eb ]

Switch over some multicast code that was recently introduced.

Then remove the deprecated wrapper, and its coccinelle infrastructure.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit b52df67 ]

Add __must_check wrappers to ensure that callers handle the error.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 33f01d1 ]

Fix mismatched brackets in multicast code and add ENABLE_MULTICAST in
MAX_OVERLAY_OPTIONS to check such mismatches during compile time in
future.

Signed-off-by: harsimran pabla <hpabla@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
…call

[ upstream commit 8b0c05c ]

Make it easier to differentiate between
(1) a missed program-internal tailcall (as reported by tail_call_internal())
that indicates a bug in how the agent loads the BPF program, and
(2) a missed tailcall to some endpoint's policy program, that can also
    occur due to inherent race conditions for packets that are in-flight
    while an endpoint is being terminated.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit c12a80f ]

Ensure that the callers handle errors, and specify a more fine-grained
drop reason.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit de55fd8 ]

Whether the tail-call is executed as dynamic or static is an implementation
detail. Hide it in a generic tail_call_policy() helper.

Suggested-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
@julianwiedmann
Copy link
Member Author

/test-backport-1.15

@julianwiedmann julianwiedmann changed the title V1.15 tail calls [v1.15] bpf: improvements for tailcall error reporting May 9, 2024
@julianwiedmann julianwiedmann marked this pull request as ready for review May 9, 2024 05:56
@julianwiedmann julianwiedmann requested review from a team as code owners May 9, 2024 05:56
Copy link
Contributor

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thank you!

@julianwiedmann
Copy link
Member Author

@kaworu ping, I think CODEOWNERS wants you to sanity-check the new drop reason.

Copy link
Member

@kaworu kaworu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protobuf changes LGTM

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 22, 2024
@julianwiedmann julianwiedmann requested a review from a team May 22, 2024 16:16
@julianwiedmann julianwiedmann merged commit 90acfca into cilium:v1.15 May 22, 2024
59 checks passed
@julianwiedmann julianwiedmann deleted the v1.15-tail-calls branch May 22, 2024 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants