Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable eswitch mode setting on SmartNICs #253

Merged
merged 10 commits into from
Feb 9, 2022
Merged

Enable eswitch mode setting on SmartNICs #253

merged 10 commits into from
Feb 9, 2022

Conversation

slyon
Copy link
Collaborator

@slyon slyon commented Jan 10, 2022

Description

Certain types of network cards/SmartNICs allow toggling their low-level (devlink) embedded switch mode PCI setting in order to utilize the full hardware offloading capabilities. With this PR we're exposing this functionality to the netplan YAML schema:

  • embedded-switch-mode: [switchdev|legacy]
    • choose the devlink setting for the embedded switch (eswitch)
    • sticks to the hardware's preset by default
  • delay-virtual-functions-rebind: [false|true]
    • False by default: SR-IOV virtual functions are re-bind to their driver right after setting the eswitch mode
    • Can be delayed to a later stage during the boot process; useful when OVS/OVN bonding (VF LAG) is to be used

Example:

network:
  version: 2
  ethernets:
    engreen:
      embedded-switch-mode: "switchdev"
      delay-virtual-functions-rebind: true
    enblue:
      match:
        driver: "mlx5_core"
      set-name: enblue
      embedded-switch-mode: "legacy"
      virtual-function-count: 4
    sriov_vf0:
      link: engreen

This PR builds upon the new libnetplan YAML parser (#250) for accessing the new settings.

Checklist

  • Runs make check successfully.
  • Retains 100% code coverage (make check-coverage).
  • New/changed keys in YAML format are documented.
  • (Optional) Adds example YAML for new feature.
  • (Optional) Closes an open bug in Launchpad.

@slyon slyon added the schema change This PR includes a change to netplan's YAML schema, which needs sign-off label Jan 10, 2022
@slyon slyon force-pushed the slyon/mlnx-switchdev branch 2 times, most recently from 33c1468 to 9d69bb0 Compare January 12, 2022 10:30
@codecov-commenter
Copy link

codecov-commenter commented Jan 12, 2022

Codecov Report

Merging #253 (79a58d7) into main (a42eb78) will increase coverage by 0.02%.
The diff coverage is 100.00%.

❗ Current head 79a58d7 differs from pull request most recent head 53db8a4. Consider uploading reports for the commit 53db8a4 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main     #253      +/-   ##
==========================================
+ Coverage   99.14%   99.16%   +0.02%     
==========================================
  Files          60       61       +1     
  Lines       10490    10803     +313     
==========================================
+ Hits        10400    10713     +313     
  Misses         90       90              
Impacted Files Coverage Δ
src/netplan.c 100.00% <ø> (ø)
netplan/cli/commands/__init__.py 100.00% <100.00%> (ø)
netplan/cli/commands/sriov_rebind.py 100.00% <100.00%> (ø)
netplan/cli/sriov.py 100.00% <100.00%> (ø)
netplan/libnetplan.py 100.00% <100.00%> (ø)
src/generate.c 100.00% <100.00%> (ø)
src/parse.c 99.82% <100.00%> (+<0.01%) ⬆️
src/sriov.c 100.00% <100.00%> (ø)
src/types.c 100.00% <100.00%> (ø)
src/validation.c 99.61% <100.00%> (+0.03%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a42eb78...53db8a4. Read the comment docs.

@slyon slyon force-pushed the slyon/mlnx-switchdev branch 3 times, most recently from e836b9c to 46ab0ba Compare January 13, 2022 10:36
@slyon slyon force-pushed the slyon/mlnx-switchdev branch 2 times, most recently from 13d9f33 to 1ba1e1c Compare January 13, 2022 11:17
@slyon slyon force-pushed the slyon/mlnx-switchdev branch 6 times, most recently from bbb49e2 to 3e73abb Compare January 27, 2022 15:04
doc/netplan.md Show resolved Hide resolved
doc/netplan.md Outdated Show resolved Hide resolved
@slyon slyon added the schema ok label Feb 2, 2022
@slyon slyon force-pushed the slyon/mlnx-switchdev branch 6 times, most recently from 9f1cdb6 to 8db3301 Compare February 2, 2022 11:41
@slyon slyon changed the title Enable eswitch mode setting for certain devices Enable eswitch mode setting on SmartNICs Feb 2, 2022
@slyon slyon marked this pull request as ready for review February 2, 2022 12:05
@slyon
Copy link
Collaborator Author

slyon commented Feb 7, 2022

We got some positive feedback from Canonical's OpenStack team about the new functionality, especially that the VF LAG (bonding of SR-IOV physical functions) is working as expected.

@slyon
Copy link
Collaborator Author

slyon commented Feb 7, 2022

autopkgtest [14:41:28]: @@@@@@@@@@@@@@@@@@@@ summary
ovs                  PASS
ethernets            PASS
bridges              PASS
bonds                PASS
routing              PASS
vlans                PASS
wifi                 PASS
tunnels              PASS
scenarios            PASS
regressions          PASS
autostart            PASS
cloud-init           PASS

Copy link
Contributor

@schopin-pro schopin-pro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is in great shape, and the code looks overall solid 🙂

I have a few items, most of them non-blocking, but there's one in particular regarding potential ABI break that needs answering before merging.

tests/test_sriov.py Show resolved Hide resolved
tests/test_sriov.py Outdated Show resolved Hide resolved
@@ -707,6 +707,9 @@ _serialize_yaml(
if (def->sriov_link)
YAML_STRING(def, event, emitter, "link", def->sriov_link->id);
YAML_UINT_DEFAULT(def, event, emitter, "virtual-function-count", def->sriov_explicit_vf_count, G_MAXUINT);
YAML_STRING(def, event, emitter, "embedded-switch-mode", def->embedded_switch_mode);
YAML_BOOL_TRUE(def, event, emitter, "delay-virtual-functions-rebind",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought (later-pr): the YAML_BOOL_TRUE name has the opposite semantics to YAML_UINT_0. I should fix that, I guess?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.

src/parse.c Show resolved Hide resolved
netplan/cli/sriov.py Outdated Show resolved Hide resolved
src/sriov.h Outdated Show resolved Hide resolved
src/generate.c Outdated Show resolved Hide resolved
tests/generator/base.py Outdated Show resolved Hide resolved
netplan/cli/sriov.py Show resolved Hide resolved
@slyon
Copy link
Collaborator Author

slyon commented Feb 9, 2022

Integration tests are still good, even if the CI is doing funny things by skipping most (all?) tests, due to some package dependency issue. I re-ran it locally:

autopkgtest [16:07:51]: @@@@@@@@@@@@@@@@@@@@ summary
ovs                  PASS
ethernets            PASS
bridges              PASS
bonds                PASS
routing              PASS
vlans                PASS
wifi                 PASS
tunnels              PASS
scenarios            PASS
regressions          PASS
autostart            PASS
cloud-init           PASS

src/generate.c Outdated Show resolved Hide resolved
netplan/cli/sriov.py Show resolved Hide resolved
Copy link
Contributor

@schopin-pro schopin-pro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
I have a small suggestion for the new API, but feel free to merge if you don't want to pick it up.

V2:
* simplify self.assert_sriov() in generator/base.py
* add new netplan_state_finish_sriov_write() and netplan_sriov_cleanup() API
* move SR-IOV rebind service generation into netplan_state_finish_sriov_write()

V3:
* move 'any_sriov' udev rules logic into netplan_state_finish_sriov_write()
@slyon
Copy link
Collaborator Author

slyon commented Feb 9, 2022

Thanks for the re-review! Turned out the write_sriov_conf_finish() and cleanup_sriov_conf() functions have been part of the generate binary itself, so it didn't break ABI in the first place. But still it's nicer to have all this handling of SR-IOV logic in the sriov module and call into the library in future version.

I will be merging this.

@slyon slyon merged commit ef56b3a into main Feb 9, 2022
@slyon slyon deleted the slyon/mlnx-switchdev branch February 9, 2022 17:22
@dshcherb
Copy link

dshcherb commented Feb 9, 2022

Thanks a lot for this, this is a much needed feature to utilize SR-IOV 👍

In case setting num_vfs needs to be done via systemd in the future, there was some recent work on that too:

systemd/systemd#21865
systemd/systemd@41ce9d7

@slyon
Copy link
Collaborator Author

slyon commented Feb 9, 2022

In case setting num_vfs needs to be done via systemd in the future, there was some recent work on that too:

Nice! That's good to know. We should probably switch to that upstream solution in the future, once it lands in stable series (maybe in parallel to our manual way of doing it).

@slyon slyon mentioned this pull request Jan 25, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
schema change This PR includes a change to netplan's YAML schema, which needs sign-off schema ok
Projects
None yet
5 participants