Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cilium: Fix 16bit ifindex limitation #27622

Merged
merged 2 commits into from
Aug 22, 2023
Merged

cilium: Fix 16bit ifindex limitation #27622

merged 2 commits into from
Aug 22, 2023

Conversation

borkmann
Copy link
Member

@borkmann borkmann commented Aug 22, 2023

See details in commit desc.

Fixes: #16260

@borkmann borkmann added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/misc This PR makes changes that have no direct user impact. needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Aug 22, 2023
@borkmann borkmann requested review from a team as code owners August 22, 2023 09:29
@borkmann borkmann removed the needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch label Aug 22, 2023
@borkmann
Copy link
Member Author

/test

@julianwiedmann
Copy link
Member

note that you will need a rebase for picking up #27592 to fix CI

@borkmann
Copy link
Member Author

/test

Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thank you! Needs just one small change I think (see review below).

As follow-on, let's also add HAVE_FIB_IFINDEX to the relevant complexity tests to reflect the real world.

Apart from fixing the bleed, this can be further refined by not relying
on the asm.FnRedirectPeer helper presence but by actually doing a runtime
BPF program probe so that stable kernels can even be covered.

We can further also check for ENABLE_SKIP_FIB.

bpf/lib/conntrack.h Outdated Show resolved Hide resolved
@julianwiedmann
Copy link
Member

We can further also check for ENABLE_SKIP_FIB.

ah, maybe not 🤔 . That's a runtime thing - so if one changes their config to remove ENABLE_SKIP_FIB, they suddenly require the ifindex of existing CT entries to be populated.

@borkmann
Copy link
Member Author

As follow-on, let's also add HAVE_FIB_IFINDEX to the relevant complexity tests to reflect the real world.

Ah yes, I'll add it.

@borkmann
Copy link
Member Author

We can further also check for ENABLE_SKIP_FIB.

ah, maybe not 🤔 . That's a runtime thing - so if one changes their config to remove ENABLE_SKIP_FIB, they suddenly require the ifindex of existing CT entries to be populated.

Yeah, I'd avoid combining the two.

The limitation exists mainly on old kernels where the fib lookup helper
does not populate the outgoing ifindex. Only for this case we rely on
the CT lookup stored ifindex which back then was added as a 16bit field
due to limited padding space available. Nowadays this can be lifted
after the big rework in #23884. We've seen users with high netdevice
churn run into this limitation where the agent bails out.

Apart from fixing the bleed, this can be further refined by not relying
on the asm.FnRedirectPeer helper presence but by actually doing a runtime
BPF program probe so that stable kernels can even be covered.

Fixes: #16260
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Commit d1c362e1dd68 ("bpf: Always return target ifindex in bpf_fib_lookup")
which HAVE_FIB_IFINDEX reflects is part of is 5.10+ kernels. Add the define
to the complexity tests for 5.10 and net-next to better reflect real world.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann
Copy link
Member Author

/test

@borkmann
Copy link
Member Author

(checkpatch complaint unrelated given this refers to kernel commit not cilium one)

Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, ty!

Copy link
Contributor

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@borkmann borkmann merged commit 323b4cb into main Aug 22, 2023
208 of 209 checks passed
@borkmann borkmann deleted the pr/fix-ifindex branch August 22, 2023 12:28
@tklauser
Copy link
Member

tklauser commented Aug 22, 2023

@borkmann I've tried to backport this PR to v1.14 as part of the backporter rotation but hit some non-trivial merge conflicts in bpf/. Would you be ok to backport this yourself given you're more familiar with the code in question? Thanks

@tklauser tklauser mentioned this pull request Aug 22, 2023
22 tasks
@tklauser tklauser added the backport/author The backport will be carried out by the author of the PR. label Aug 22, 2023
@borkmann
Copy link
Member Author

Yep will try to get it done this week.

@joestringer
Copy link
Member

v1.14 backport makes sense, does this also affect older releases?

@borkmann
Copy link
Member Author

borkmann commented Aug 22, 2023

v1.14 backport makes sense, does this also affect older releases?

It does, there it would have to be even more custom patch for older stable releases given they don't have #23884 (only 1.14). If I find some cycles, I could take a look.

@joestringer joestringer added affects/v1.12 This issue affects v1.12 branch affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch labels Aug 22, 2023
@julianwiedmann
Copy link
Member

FYI, I opened #27642 to cover the fine-tuning of the feature probe.

@borkmann borkmann added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Sep 1, 2023
@borkmann borkmann added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Sep 1, 2023
@julianwiedmann julianwiedmann removed the affects/v1.14 This issue affects v1.14 branch label May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.12 This issue affects v1.12 branch affects/v1.13 This issue affects v1.13 branch backport/author The backport will be carried out by the author of the PR. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. release-note/misc This PR makes changes that have no direct user impact. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

datapath: Support devices for bpf_host which ifindex > max(u16)
5 participants