Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Best effort for xdp #28666

Merged
merged 1 commit into from Nov 16, 2023
Merged

Best effort for xdp #28666

merged 1 commit into from Nov 16, 2023

Conversation

poblahblahblah
Copy link
Contributor

@poblahblahblah poblahblahblah commented Oct 18, 2023

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • [?] All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • [?] Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

This allows a user to specify a new mode, best-effort when enabling XDP. We need this ability since we are running physical hosts where the primary NICs do support XDP, but have additional vlan devices that do not have XDP support in the driver.

More background can be found in #24768, but to summarize we have 1 physical device (ens785np0) per host and due to network design we need an additional vlan type interface (vlan3). Regrettably the vlan driver in the kernel does not support XDP. When enabling XDP without this change Cilium enters a crashloop with this error:

level=fatal msg="Failed to compile XDP program" error="program cil_xdp_entry: attaching XDP program to interface vlan3: operation not supported" subsys=datapath-loader

With this change applied and by setting loadBalancer.acceleration to best-effort in our values.yaml file we get the following log output with XDP enabled only on devices that support it:

level=info msg="attaching XDP program to interface vlan3 failed. Ignoring" subsys=datapath-loader
$ sudo bpftool net
xdp:
ens785np0(2) driver id 28652

tc:
ens785np0(2) clsact/ingress cil_from_netdev-ens785np0 id 28699
ens785np0(2) clsact/egress cil_to_netdev-ens785np0 id 28721
vlan3(4) clsact/ingress cil_from_netdev-vlan3 id 28731
vlan3(4) clsact/egress cil_to_netdev-vlan3 id 28739
cilium_net(5) clsact/ingress cil_to_host-cilium_net id 28692
cilium_host(6) clsact/ingress cil_to_host-cilium_host id 28674
cilium_host(6) clsact/egress cil_from_host-cilium_host id 28686
cilium_vxlan(7) clsact/ingress cil_from_overlay-cilium_vxlan id 28663
cilium_vxlan(7) clsact/egress cil_to_overlay-cilium_vxlan id 28664
lxc_health(11) clsact/ingress cil_from_container-lxc_health id 28708

flow_dissector:

I initially created #25870 and it was indicated in this comment thread that swallowing these errors if set to "best-effort" would be an acceptable solution.

Any and all feedback would be very welcome!

Fixes: #24768

Adds "best-effort" mode for XDP to skip interfaces without driver support

@poblahblahblah poblahblahblah requested review from a team as code owners October 18, 2023 04:07
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 18, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Oct 18, 2023
@mhofstetter mhofstetter added kind/enhancement This would improve or streamline existing functionality. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Oct 18, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 18, 2023
Copy link
Member

@mhofstetter mhofstetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @poblahblahblah

Added some config-related comments.

@borkmann Adding you as reviewer because it seems that you already have the context from the issue and the previous PR

pkg/option/config.go Show resolved Hide resolved
pkg/option/config.go Show resolved Hide resolved
pkg/datapath/loader/link.go Outdated Show resolved Hide resolved
Copy link
Contributor

@rgo3 rgo3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, just a couple nits.

pkg/datapath/loader/xdp.go Outdated Show resolved Hide resolved
pkg/datapath/loader/netlink.go Outdated Show resolved Hide resolved
@maintainer-s-little-helper
Copy link

Commits ec4e546, 454ce4b, 50367e6 do not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Oct 19, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Oct 19, 2023
@poblahblahblah
Copy link
Contributor Author

I'll squash commits into a single commit once this is looking like it will be ready for merge.

pkg/datapath/loader/netlink.go Outdated Show resolved Hide resolved
@qmonnet
Copy link
Member

qmonnet commented Nov 8, 2023

/test

@qmonnet
Copy link
Member

qmonnet commented Nov 9, 2023

/test

Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks better, thanks! But now I don't understand where the new files install/kubernetes/cilium/Chart.yaml-e, ...-ee, and ...-eee come from. Were they committed by mistake?

install/kubernetes/cilium/values.yaml.tmpl Outdated Show resolved Hide resolved
@ti-mo ti-mo removed the request for review from rgo3 November 10, 2023 13:23
@ti-mo
Copy link
Contributor

ti-mo commented Nov 10, 2023

@poblahblahblah I've merged #28308 in the meantime, so I gave this a rebase and moved the error check to reinitializeXDPLocked(). I'll kick off tests as well.

@ti-mo
Copy link
Contributor

ti-mo commented Nov 10, 2023

/test

@poblahblahblah
Copy link
Contributor Author

I'm not super familiar with the failures we're seeing. Can/should these be run again?

@qmonnet you've been very generous with your time and I appreciate that. It looks like there's another helm failure. Do you have any insight into this?

Copy link
Member

@nathanjsweet nathanjsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approval for API.

@qmonnet
Copy link
Member

qmonnet commented Nov 14, 2023

It looks like there's another helm failure. Do you have any insight into this?

Just as the workflow output says 🙂:

HINT: to fix this, run 'make -C Documentation update-helm-values'

So please try running this command and committing the changes in Documentation/helm-reference.rst (This is a different command from make -C install/kubernetes).

The default behavior for enabling XDP support is to enable XDP on all
interfaces. If an interface's driver does not support XDP then Cilium
will bail. This change introduces a new mode, `best-effort`, which will
swallows and logs the error when a device is encountered that does not
support XDP.

Signed-off-by: Patrick O’Brien <patrick.obrien@thetradedesk.com>
@poblahblahblah
Copy link
Contributor Author

poblahblahblah commented Nov 14, 2023

Just as the workflow output says 🙂

Can't believe I missed that. Apologies!

Should be good now

@qmonnet
Copy link
Member

qmonnet commented Nov 15, 2023

/test

@ti-mo ti-mo removed the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Nov 16, 2023
Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@qmonnet
Copy link
Member

qmonnet commented Nov 16, 2023

Jussi, Daniel, Nate and I are all part of sig-datapath, so I think we can skip Louis' review.

@qmonnet qmonnet removed the request for review from ldelossa November 16, 2023 17:30
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 16, 2023
@qmonnet qmonnet merged commit 548fdd7 into cilium:main Nov 16, 2023
61 checks passed
@poblahblahblah
Copy link
Contributor Author

Thank you for the help, everyone!

Great experience all around!

@poblahblahblah poblahblahblah deleted the best-effort-for-xdp branch November 16, 2023 23:19
@julianwiedmann
Copy link
Member

👋 should the new bpf-lb-acceleration value ("best-effort") also be mentioned in the agent documentation?
https://docs.cilium.io/en/v1.15/cmdref/cilium-agent/#options

  --bpf-lb-acceleration string                                BPF load balancing acceleration via XDP ("native", "disabled") (default "disabled")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/community-contribution This was a contribution made by a community member. kind/enhancement This would improve or streamline existing functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

XDP: Failing to attach XDP program to a tagged VLAN interface
10 participants