datapath: report distinct drop reason for missed endpoint policy tailcall #32151

Merged (2 commits) on Apr 26, 2024

Member

@julianwiedmann julianwiedmann commented Apr 24, 2024

Improve the reporting of missed tailcalls for policy programs. Any missed tailcall in the program-internal flow (i.e. what's driven by tail_call_internal()) points at a genuine loader bug, whereas the policy tailcalls need to deal with race conditions due to e.g. terminating endpoints. Having distinct drop reasons makes it easier to diagnose a specific drop, and potentially to allow-list it.
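
As a rough illustration of the distinction, here is a plain-C simulation rather than the actual datapath code. The function-pointer arrays stand in for BPF program arrays, the helper names (`try_tail_call`, `call_internal`, `call_policy`, `install_policy`, `remove_policy`) are hypothetical, and the numeric drop codes are placeholders chosen for this sketch. Unlike a real BPF tail call, which does not return to the caller on success, the simulation simply returns the callee's verdict.

```c
#include <stddef.h>

/* Placeholder values for this sketch, not necessarily the codes Cilium uses. */
#define DROP_MISSED_TAIL_CALL (-140)
#define DROP_EP_NOT_READY     (-203)

#define MAP_SLOTS 4

typedef int (*prog_fn)(void);

/* Stand-ins for BPF program arrays: a NULL slot models a missing program. */
static prog_fn internal_calls[MAP_SLOTS];
static prog_fn policy_calls[MAP_SLOTS];

/* Hypothetical policy program that accepts the packet. */
int policy_allow(void)
{
	return 0;
}

/* Run the program in the given slot, or report the caller-chosen reason
 * when the slot is empty or out of range. */
static int try_tail_call(prog_fn *map, size_t slot, int miss_reason)
{
	if (slot < MAP_SLOTS && map[slot] != NULL)
		return map[slot]();
	return miss_reason;
}

/* Program-internal flow: a miss here would indicate a loader bug. */
int call_internal(size_t slot)
{
	return try_tail_call(internal_calls, slot, DROP_MISSED_TAIL_CALL);
}

/* Policy flow: a miss can also happen while an endpoint is being torn down. */
int call_policy(size_t ep_id)
{
	return try_tail_call(policy_calls, ep_id, DROP_EP_NOT_READY);
}

void install_policy(size_t ep_id, prog_fn fn)
{
	if (ep_id < MAP_SLOTS)
		policy_calls[ep_id] = fn;
}

void remove_policy(size_t ep_id)
{
	install_policy(ep_id, NULL);
}
```

A packet that races with `remove_policy()` for its endpoint gets `DROP_EP_NOT_READY`, while an empty slot in the internal map still surfaces as `DROP_MISSED_TAIL_CALL`.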

@julianwiedmann julianwiedmann added sig/loader Impacts the loading of BPF programs into the kernel. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/monitor Impacts monitoring, access logging, flow logging, visibility of datapath traffic. release-note/misc This PR makes changes that have no direct user impact. labels Apr 24, 2024
@julianwiedmann
Member Author

/test

@julianwiedmann
Member Author

The broader idea here is to isolate what kind of missed tailcall we're still seeing in CI, and make it possible to allow-list only DROP_EP_NOT_READY.

This will probably need a backport to v1.15 as well, to get reasonable CI feedback from the upgrade workflows.
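
The allow-listing idea can be sketched as a small filter over observed drop reasons. This is only an illustration under assumptions: the helper names (`drop_is_allow_listed`, `count_unexpected_drops`) are hypothetical, the numeric codes are placeholders, and the real CI checks are not implemented this way.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder values for this sketch, not necessarily the codes Cilium uses. */
#define DROP_MISSED_TAIL_CALL (-140)
#define DROP_EP_NOT_READY     (-203)

/* Only the race-prone policy miss is tolerated. */
static const int allowed_drops[] = { DROP_EP_NOT_READY };

bool drop_is_allow_listed(int reason)
{
	size_t n = sizeof(allowed_drops) / sizeof(allowed_drops[0]);

	for (size_t i = 0; i < n; i++) {
		if (allowed_drops[i] == reason)
			return true;
	}
	return false;
}

/* Count the drops that should still fail a CI run. */
size_t count_unexpected_drops(const int *reasons, size_t n)
{
	size_t unexpected = 0;

	for (size_t i = 0; i < n; i++) {
		if (!drop_is_allow_listed(reasons[i]))
			unexpected++;
	}
	return unexpected;
}
```

With a single shared drop reason, such a filter would have to either tolerate or flag every missed tailcall; the split is what lets it tolerate only the endpoint race.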

@julianwiedmann
Member Author

/test

Contributor

@ti-mo ti-mo left a comment

Perfect, thank you!

bpf/lib/maps.h (review thread resolved)
Contributor

@learnitall learnitall left a comment

API changes are good to go, thanks!

Member

@kaworu kaworu left a comment

Hubble bits LGTM

…call

Make it easier to differentiate between
(1) a missed program-internal tailcall (as reported by tail_call_internal()),
    which indicates a bug in how the agent loads the BPF program, and
(2) a missed tailcall to some endpoint's policy program, which can also
    occur due to inherent race conditions for packets that are in-flight
    while an endpoint is being terminated.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>

Ensure that the callers handle errors, and specify a more fine-grained
drop reason.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
@julianwiedmann
Member Author

(rebase to resolve conflict in generated api)

@julianwiedmann
Member Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 26, 2024
@kaworu kaworu enabled auto-merge April 26, 2024 12:02
@kaworu kaworu added this pull request to the merge queue Apr 26, 2024
Merged via the queue into cilium:main with commit c12a80f Apr 26, 2024
64 checks passed
@julianwiedmann julianwiedmann deleted the 1.16-bpf-policy-tailcall branch May 2, 2024 06:28
@julianwiedmann julianwiedmann added the backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. label May 9, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Backport pending to v1.15 in 1.15.5 May 9, 2024
@nebril nebril added this to Backport pending to v1.15 in 1.15.6 May 10, 2024
@nebril nebril removed this from Backport pending to v1.15 in 1.15.5 May 10, 2024
@github-actions github-actions bot added backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. and removed backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. labels May 22, 2024
Labels
area/monitor Impacts monitoring, access logging, flow logging, visibility of datapath traffic.
backport-done/1.15 The backport for Cilium 1.15.x for this PR is done.
ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
release-note/misc This PR makes changes that have no direct user impact.
sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
sig/loader Impacts the loading of BPF programs into the kernel.
Projects
1.15.6
Backport pending to v1.15

5 participants