loader: attach programs using tcx #30103
Connectivity tests fail because the TCX API in cilium/ebpf currently doesn't do a feature test, so on kernels without TCX support creating links fails and we don't fall back to the old behavior. Opened cilium/ebpf#1294 to add a feature test in ebpf-go. The build-every-commit action will continue to fail because I have separate commits for vendoring the lib changes and fixing the query programs API breakage. I'll keep them separate so the fix can easily be applied to the Renovate PR that updates the ebpf-go dependency.
/test |
/test |
/test |
Some minor comments/questions about the code, otherwise LGTM.
API changes LGTM, thanks!
This commit adds the necessary infrastructure to attach bpf programs operating on sk_buff using the kernel's new tcx hook. Enabling the functionality in the agent's endpoint attachment path happens in a follow-up commit.

Signed-off-by: Robin Gögge <r.goegge@isovalent.com>
Co-authored-by: Timo Beckers <timo@isovalent.com>
This commit puts the tcx logic in the endpoint attachment path and gates it behind a new --enable-tcx agent flag. A follow-up commit will use the flag in the Helm charts' configmap.

attachSKBProgram() now takes a bool to indicate if the user has requested tcx attachments and seamlessly migrates programs between tcx and legacy attachment modes in both directions. Of course, this process is contingent on no other tcx programs being attached to the interface, as that disables legacy tc execution.

Signed-off-by: Timo Beckers <timo@isovalent.com>
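The two-way migration can be sketched roughly as follows. This is an in-memory model, not the agent's code: the `iface` type and `migrate` function are hypothetical, and the ordering (install the requested mode first, then remove the other) is an assumption chosen so the interface is never left without a program.

```go
package main

import "fmt"

// iface is a hypothetical in-memory model of one device's attachment state.
type iface struct {
	tcx    bool // a tcx program is attached
	legacy bool // a legacy tc filter is attached
}

// migrate moves the interface to the requested mode and returns the steps
// taken. The new attachment is installed before the old one is removed
// (an assumption of this sketch), so some program is attached at all times.
func migrate(i *iface, enableTCX bool) []string {
	var steps []string
	if enableTCX {
		if !i.tcx {
			i.tcx = true
			steps = append(steps, "attach tcx")
		}
		if i.legacy {
			i.legacy = false
			steps = append(steps, "remove legacy tc")
		}
		return steps
	}
	if !i.legacy {
		i.legacy = true
		steps = append(steps, "attach legacy tc")
	}
	if i.tcx {
		i.tcx = false
		steps = append(steps, "remove tcx")
	}
	return steps
}

func main() {
	dev := &iface{legacy: true} // node currently using legacy tc

	fmt.Println(migrate(dev, true))  // upgrade to tcx
	fmt.Println(migrate(dev, false)) // downgrade to legacy tc
	fmt.Println(migrate(dev, false)) // already in the requested mode: no steps
}
```

The model deliberately ignores tcx programs attached by other tools, which the commit message names as a precondition for migration.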
This commit adds the 'bpf.enableTCX' Helm value to allow disabling tcx attachments if external tooling integrating with Cilium hasn't caught up yet, as attaching a tcx program to an interface disables the legacy tc ingress/egress pipelines.

The agent upgrades and downgrades interfaces seamlessly based on tcx being enabled or not, so any existing workloads are migrated automatically at runtime if the config flag is changed and the agent restarted. Rebooting the node is not necessary.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Extend the agent API to indicate whether Cilium is actually using tcx or relying on legacy tc so that this can be displayed in `cilium status`.

Status when tcx is active:

  # kubectl exec cilium-4m7nq -- cilium-dbg status
  [...]
  BandwidthManager:   Disabled
  Routing:            Network: Tunnel [geneve]   Host: Legacy
  Attach Mode:        TCX
  Masquerading:       IPTables [IPv4: Enabled, IPv6: Enabled]
  [...]

Status when inactive:

  # kubectl exec cilium-4m7nq -- cilium-dbg status
  [...]
  BandwidthManager:   Disabled
  Routing:            Network: Tunnel [geneve]   Host: Legacy
  Attach Mode:        Legacy TC
  Masquerading:       IPTables [IPv4: Enabled, IPv6: Enabled]
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a note on tcx for the 1.16 upgrade guide.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
/test |
I wonder if this is something we should elevate to |
👋 looking at https://github.com/cilium/cilium/actions/runs/8931592322. What's the current upgrade/downgrade impact of this feature - in particular without #32228 backported? Do we need to switch some of the |
I just backported the mentioned PR in #32337. That said, I've seen the ci-ipsec-upgrade test fail sporadically on various kernels, not only bpf-next.
Without the backport there is a tiny race on downgrade: tcx gets removed and then legacy tc is installed, so for a short window nothing is attached. With the backport it's atomic. The upgrade is also atomic, that is, tcx is installed and then legacy tc is removed (if tcx is installed successfully, it takes precedence and legacy tc is not run). Let's see if the backport helps.
(fyi, merged to v1.15 branch now.) |
I'm at attempt 7 for the Can we please flip the default to |
Ah yes, I thought the downgrade tests would run against the latest v1.15 branch, not the patch release. Cc @ti-mo let's default to false until then, wdyt?
(PR is here for checking #32342 ) |
@julianwiedmann I've retriggered the job you referenced previously and it is now passing: https://github.com/cilium/cilium/actions/runs/8927670014/job/24559664765. Your 7 runs prior to that pulled a |
🤷 I give up on understanding the downgrade tests. So the minor downgrade is running against the previous minor's branch tip, and not the previous minor's current release. And from a quick git archaeology, that was always the case? Then we should be good indeed. Sorry for the noise. Thank you @rgo3 for keeping me honest! |
I feel the same 😅. But yes, I believe and have seen in the CI logs that we install previous minor's branch tip, in this case |
For more detailed descriptions, please refer to the individual commits.
On a high level, this PR:
Closes #27632