Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tristate type for offload options (LP: #1956264) #270

Merged
merged 14 commits into from
May 12, 2022

Conversation

nicolasbock
Copy link
Contributor

@nicolasbock nicolasbock commented Mar 22, 2022

Signed-off-by: Nicolas Bock nicolas.bock@canonical.com

Description

Checklist

  • Runs make check successfully.
  • Retains 100% code coverage (make check-coverage).
  • New/changed keys in YAML format are documented.
  • (Optional) Adds example YAML for new feature.
  • (Optional) Closes an open bug in Launchpad. LP#1956264

@nicolasbock
Copy link
Contributor Author

At this point it seems that some tests are not run. I will check and update this PR.

@nicolasbock nicolasbock marked this pull request as ready for review March 23, 2022 18:09
@nicolasbock nicolasbock force-pushed the tristate branch 3 times, most recently from 76586de to 199e060 Compare March 23, 2022 19:53
@codecov-commenter
Copy link

codecov-commenter commented Mar 23, 2022

Codecov Report

Merging #270 (350cc19) into main (6b9a3b7) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #270   +/-   ##
=======================================
  Coverage   99.06%   99.07%           
=======================================
  Files          61       61           
  Lines       11160    11196   +36     
=======================================
+ Hits        11056    11092   +36     
  Misses        104      104           
Impacted Files Coverage Δ
netplan/cli/commands/apply.py 100.00% <ø> (ø)
src/netplan.c 100.00% <100.00%> (ø)
src/networkd.c 100.00% <100.00%> (ø)
src/parse.c 99.84% <100.00%> (+<0.01%) ⬆️
src/types.c 100.00% <100.00%> (ø)
tests/generator/test_ethernets.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b9a3b7...350cc19. Read the comment docs.

doc/netplan.md Outdated Show resolved Hide resolved
@nicolasbock
Copy link
Contributor Author

@rlaager Could I get another review? Thanks!

@rlaager
Copy link
Contributor

rlaager commented Apr 12, 2022

@rlaager Could I get another review? Thanks!

It looks good to me upon visual inspection. I haven't run the code. I probably should try to make time to do that, since I'm the one that wants this feature. :)

@nicolasbock
Copy link
Contributor Author

Thanks fort he feedback @rlaager ! 😄

@nicolasbock
Copy link
Contributor Author

Would it help you if I made a package available for you?

@rlaager
Copy link
Contributor

rlaager commented Apr 12, 2022

Would it help you if I made a package available for you?

That would speed it up a bit, yes, as that saves me from patching the package and also guilts me into doing the testing right away. ;) Ideally, I'm looking for whatever is in 20.04 + this patch. But if it's a git snapshot or something that's probably fine too. I've got spare systems in the right state for quick testing, so it's pretty easy to run the test.

@rlaager
Copy link
Contributor

rlaager commented Apr 15, 2022

TL;DR: The changes work as intended. There is a pre-existing issue applying link settings that is only documented for wakeonlan that applies here too.

I tested today. I built the netplan from focal with this patch applied. Maybe that's not ideal from an upstream perspective (since I'm not building the commit on master), but that is what was easiest for me.

My netplan config has this (among other things):

  ethernets:
    enp4s0:
      dhcp4: no
      dhcp6: no
      generic-receive-offload: no
      large-receive-offload: no
    enp6s0:
      dhcp4: no
      dhcp6: no
      #generic-receive-offload: no
      #large-receive-offload: no

That seems to build the correct output:

# cat /run/systemd/network/10-netplan-enp4s0.link
[Match]
OriginalName=enp4s0

[Link]
WakeOnLan=off
GenericReceiveOffload=off
LargeReceiveOffload=off

It doesn't actually work:

# ethtool -k enp4s0 | grep receive-offload
generic-receive-offload: on
large-receive-offload: off
# ethtool -k enp6s0 | grep receive-offload
generic-receive-offload: on
large-receive-offload: off

It's not an issue of off vs on, as set large-receive-offload to yes doesn't change anything.

Here we can see why:

# SYSTEMD_LOG_LEVEL=debug udevadm test-builtin net_setup_link /sys/class/net/enp4s0
Trying to open "/etc/systemd/hwdb/hwdb.bin"...
Trying to open "/etc/udev/hwdb.bin"...
Trying to open "/usr/lib/systemd/hwdb/hwdb.bin"...
Trying to open "/lib/systemd/hwdb/hwdb.bin"...
Trying to open "/lib/udev/hwdb.bin"...
=== trie on-disk ===
tool version:          245
file size:         9954959 bytes
header size             80 bytes
strings            2164111 bytes
nodes              7790768 bytes
Load module index
Found container virtualization none.
timestamp of '/etc/systemd/network' changed
timestamp of '/run/systemd/network' changed
Parsed configuration file /usr/lib/systemd/network/99-default.link
Parsed configuration file /usr/lib/systemd/network/73-usb-net-by-mac.link
Parsed configuration file /run/systemd/network/10-netplan-enp4s0.link
Created link configuration context.
ID_NET_DRIVER=ixgbe
enp4s0: Device has name_assign_type=4
enp4s0: Config file /run/systemd/network/10-netplan-enp4s0.link matches device based on renamed interface name, ignoring
enp4s0: Config file /usr/lib/systemd/network/99-default.link is applied
enp4s0: Failed to get ACTION= property: No such file or directory
ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
enp4s0: Device has name_assign_type=4
Using default interface naming scheme 'v245'.
enp4s0: Policy *keep*: keeping existing userspace name
enp4s0: Device has addr_assign_type=3
enp4s0: MAC on the device already set by userspace
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
Unload module index
Unloaded link configuration context.

networkd refuses to apply 10-netplan-enp4s0.link because it is matching the renamed name. This is the same problem as documented in netplan for wakeonlan:

Note: This will not work reliably for devices matched by name
only and rendered by networkd, due to interactions with device
renaming in udev. Match devices by MAC when setting wake on LAN.

That issue is not specific to wakeonlan but applies to anything that gets written to the .link files. It might be a good idea to document that differently. (The easy way would be to duplicate that into every one of these settings. There are probably other, better, ways.)

@slyon slyon changed the title Add tristate type for offload options Add tristate type for offload options (LP: #1956264) Apr 22, 2022
Copy link
Collaborator

@slyon slyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this improvement @nicolasbock and sorry for the late reply (I was out of office for a while).

Thanks for the testing @rlaager If I understand your comment correctly the code works as intended, but is affected by an udev/link matching issue, which can be worked around by matching on MAC address?!

The PR looks good to me overall! There is just one small spelling change (on/off -> true/false) that I'd like to push for; see inline comment. Furthermore, I'm wondering if this offload config is something that could be tested in an emulated (qemu/kvm) test environment and if so if it would make sense to add an integration test for it in tests/integration/ethernets.py? But it might need special offloading hardware (which is not available in qemu), right?

Also, running make check locally seems to fail for me, due to coverage issues, could you please rebase this branch to the latest main branch and double check that?

src/networkd.c Outdated Show resolved Hide resolved
src/types.h Outdated Show resolved Hide resolved
@rlaager
Copy link
Contributor

rlaager commented Apr 23, 2022

Thanks for the testing @rlaager If I understand your comment correctly the code works as intended

Correct.

but is affected by an udev/link matching issue

Correct. The .link files that netplan writes out are completely ignored (in general, not specific to this setting inside it) because the interface has already been renamed by the point that .link file is read by networkd. networkd refuses to use the .link file because it is named by the renamed (new) name of the interface.

Specifically, in my case, these are Intel NICs using the ixgbe driver. On early boot, they are named eth0 and eth1 and then renamed to enp4s0 and enp6s0 (in this particular machine, which has two single-port NICs; other machines have single two-port NICs and the same problem exists with slightly different new names).

/run/systemd/network/10-netplan-enp4s0.network is not honored by networkd at all because it contains:

[Match]
Name=enp4s0

Now, I would probably argue that networkd should apply .link files like this based on the new name. But this is not happening by accident. The networkd code contains a message about exactly what is happening here. So it's clearly intentional on their part.

I'm pretty sure I tried matching on eth1, but that doesn't work either, because there is no eth1 at that point. That left matching by MAC.

which can be worked around by matching on MAC address?!

Correct. networkd will apply the .link rules that match on the MAC address even after the interface is renamed, because the MAC address matches.

Now, in my particular case, the interfaces are being put into a bridge, so the MAC addresses get changed once they are added into the bridge. So in that particular configuration, even with the MAC address matching, the .link files are only applied on boot. I can set settings (like generic-receive-offload) that show up in the .link file and have them apply on boot, but I can't change them (via netplan) on a running system without rebooting.

That's all suboptimal, but: 1) Setting this at boot is good enough for me. 2) This issue is about .link settings in general, not this specific new setting, so there's no need to hold up this PR on it.

This is clearly a known issue, as the netplan documentation for wakeonlan describes this exact issue, saying, "Note: This will not work reliably for devices matched by name only and rendered by networkd, due to interactions with device renaming in udev. Match devices by MAC when setting wake on LAN."

So my suggestion was to either: A) duplicate that not to these settings, or B) make it more general somehow.

I'm wondering if this offload config is something that could be tested in an emulated (qemu/kvm) test environment and if so if it would make sense to add an integration test for it in tests/integration/ethernets.py?

Yes, this can be tested in a qemu/kvm VM. With either a virtio or e1000e NIC, sudo ethtool -k INTERFACE_NAME | grep -v fixed shows generic-receive-offload (among a few other things, but not large-receive-offload). And it can be successfully set with sudo ethtool -K INTERFACE_NAME gro off.

@nicolasbock
Copy link
Contributor Author

@slyon I addressed your inline comments. I am still working on the other issues.

src/types.h Outdated Show resolved Hide resolved
@nicolasbock nicolasbock force-pushed the tristate branch 2 times, most recently from 4148b08 to 5e7d22e Compare May 10, 2022 14:50
@nicolasbock
Copy link
Contributor Author

@slyon When trying to update the focal abi definitions (following the instructions in https://github.com/canonical/netplan/blob/main/abi-compat/README.md#howto-create-a-abi-reference) I am getting the following error:

ubuntu@trusty-tamarin:~/pwd$ meson setup build --prefix=/usr
The Meson build system
Version: 0.53.2
Source dir: /home/ubuntu/pwd
Build dir: /home/ubuntu/pwd/build
Build type: native build

meson.build:1:0: ERROR: Meson version is 0.53.2 but project requires >= 0.61.0

A full log can be found at /home/ubuntu/pwd/build/meson-logs/meson-log.txt

Do I need to install meson from somewhere else?

@slyon
Copy link
Collaborator

slyon commented May 10, 2022

@slyon When trying to update the focal abi definitions (following the instructions in https://github.com/canonical/netplan/blob/main/abi-compat/README.md#howto-create-a-abi-reference) I am getting the following error:

Don't worry about that too much. I can deal with the ABI definitions at the end. Focal packages are too old, so we're still using make on Focal (and the github CI). Basically what you'd do is:

make clean && make
abidw libnetplan.so.0.0 --headers-dir include/ --header-file src/abi.h > abi-compat/focal_0.104.xml

But you also need a newer version of abigail-tools from https://launchpad.net/~slyon/+archive/ubuntu/ci-netplan – So don't worry about that. You can just leave out the changes to abi-compat/* if you want, and I can update it before we merge the PR.

src/abi.h Outdated Show resolved Hide resolved
doc/netplan.md Outdated Show resolved Hide resolved
Closes: https://bugs.launchpad.net/netplan/+bug/1956264
Signed-off-by: Nicolas Bock <nicolas.bock@canonical.com>
Signed-off-by: Nicolas Bock <nicolas.bock@canonical.com>
Signed-off-by: Nicolas Bock <nicolas.bock@canonical.com>
Signed-off-by: Nicolas Bock <nicolas.bock@canonical.com>
Signed-off-by: Nicolas Bock <nicolas.bock@canonical.com>
Signed-off-by: Nicolas Bock <nicolas.bock@canonical.com>
@nicolasbock
Copy link
Contributor Author

@slyon Would you like me to squash the commits in this PR?

@eli-schwartz
Copy link

Do I need to install meson from somewhere else?

https://mesonbuild.com/Getting-meson.html

Despite that Ubuntu 20.04 is frozen at an old version of Meson, you can manually install the latest version (because its only dependency is python 3.7) and use that for personal development albeit obviously not to build Ubuntu packages.

@nicolasbock
Copy link
Contributor Author

Do I need to install meson from somewhere else?

https://mesonbuild.com/Getting-meson.html

Despite that Ubuntu 20.04 is frozen at an old version of Meson, you can manually install the latest version (because its only dependency is python 3.7) and use that for personal development albeit obviously not to build Ubuntu packages.

Thanks for the information @eli-schwartz . I was just wondering within the context of the abi build instructions. Using make as @slyon suggested works great.

src/abi.h Outdated
@@ -327,4 +348,13 @@ struct netplan_net_definition {
/* netplan-feature: eswitch-mode */
char* embedded_switch_mode;
gboolean sriov_delay_virtual_functions_rebind;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is the ABI concern that's leading to adding all these _tristate fields? Specifically, is it size or contents that you're worried about?

If size, they seem to be the same size to me. Granted, I'm using a test program and not netplan's source itself. But if size is the only concern, we could check that with the real netplan code (on all supported archs if necessary).

If contents: an old consumer looking at these will interpret UNSET as if it was TRUE. But is that actually a problem in practice? First off, are there likely any external consumers of those fields? Second, if there are and they interpret UNSET as TRUE, is that actually a problem, given that these fields default to true in the kernel? Third, we don't seem to be concerned about how old consumers will interpret these, as the deprecated fields are no longer being set moving forward anyway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! You're right. This is a special case where both size and content make sense using the new types as well as the old, and it does not shift the memory layout. In this case I guess we can just change the type of the previous fields.

@slyon slyon added the ABI-compatible ABI changed in a compatible way label May 11, 2022
@nicolasbock
Copy link
Contributor Author

I am confused @slyon . Why revert the change to WakeOnLan? As you commented the true/false usage is more common.

slyon added 4 commits May 11, 2022 17:04
Size:
Both enum and gboolean reduce to integer, so they are of same size.
Content:
An old consumer looking at these will interpret UNSET as if it was TRUE,
which is the kernel's default (=UNSET) value.
Copy link
Contributor Author

@nicolasbock nicolasbock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@slyon slyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I took the liberty to push to your PR branch, in order to clean up a few things.

We do not need to squash the commits, as we'll do a squash-merge afterwards. Also, I've reverted the WakeOnLan= setting, because it is unrelated to this offloading PR and also the off value is special/explicit in this case and not your usual boolean (.c.f. https://systemd.network/systemd.link.html#WakeOnLan=)

As suggested by @rlaager the ABI compatibility is not actually broken by this change of gboolean to NetplanTristate as both use the size of an integer and are even value/content compatible, as NETPLAN_TRISTATE_UNSET defaults to the kernel value (TRUE). So we can avoid deprecating the old offloading fields. Thanks for this investigation!

Also, thanks for your example of using ethtool to get/set those values. I've used it to implement an integration tests, which will be run during autopkgtest inside a qemu/kvm VM and/or LXD container. I needed to add a small quirt to the CI workflow, in order to add ethtool as a test dependency. Truns out that those values (all but large-receive-offload) can be used on a veth device as well, which is what we're using inside the integration tests:

# ethtool -k eth42 | grep -v fixed
Features for eth42:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ip-generic: on
        tx-checksum-sctp: on
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: on
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: on
        tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on

Finally, I've added a udevadm test /sys/class/net/IFACE command inside the netplan apply CLI, to make it actually apply those changes at runtime without a reboot. I think this migh solve the issue described above, at least in some cases (it works inside the integration test).

@nicolasbock Let me know what you think! From my POV this PR is now ready to be merged!
Thanks again for your continued contributions to the netplan project! And also thanks to all the other people involved in reviewing this PR!

@rlaager rlaager mentioned this pull request May 12, 2022
5 tasks
@rlaager
Copy link
Contributor

rlaager commented May 12, 2022

I backported this to 20.04 and retested. It works the same as before, as expected. So I'm happy with this.

@slyon I ended up figuring out the remaining piece for applying the changes in my scenario. I've taken that to a new PR #278.

@slyon slyon merged commit 310bfed into canonical:main May 12, 2022
@nicolasbock nicolasbock deleted the tristate branch May 12, 2022 12:23
slyon added a commit that referenced this pull request Feb 6, 2023
Using bond, bridge, and VLAN devices results in sharing MAC addresses across interfaces. Matching by MACAddress fails in two ways. It will overmatch by matching all the interfaces (physical and virtual) that share that MAC address. It will undermatch by no longer matching the physical interface once its MAC address has been changed.

Matching by PermanentMACAddress fixes both of those issues.

This replaces the old "workaround" of setting Type=!vlan bond bridge.

That code had a comment saying, "something outputs netplan config that includes using the MAC of the first phy member of a bond as default value for the MAC of the bond device itself." I think this was a misunderstanding of what was going on there. In both examples, the configuration was attempting to match a physical interface by MAC address, not provide a MAC address to set on the virtual device.

These are the two Launchpad bugs referenced by the code:
https://bugs.launchpad.net/netplan/+bug/1804861
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1888726

The comment continued, "This is evil, it's an optional field and networkd knows what to do if the MAC isn't specified". This configuration is perfectly reasonable. It is in fact required, as explained in the netplan documentation, if one wants to use settings that apply in the .link file (wakeonlan or the offload settings). This is because networkd refuses to apply a .link file that matches by name if the interface has already been renamed. At a minimum, this means that matching by MAC address is required if one wants to be able to change the settings after boot (on interfaces that are renamed). If the interfaces are renamed at early boot, before the netplan configuration applies, matching by MAC address is required for those settings to apply at all.

For my example, see:
#270 (comment)
#270 (comment)

Arguably, netplan should always use PermanentMACAddress, at least for physical interfaces. But that change might have unintended consequences, so I opted for the more minimal change of only using it in these complex scenarios (which are already special-cased).

COMMITS:

* Match by PermanentMACAddress in complex scenarios

Using bond, bridge, and VLAN devices results in sharing MAC addresses
across interfaces.  Matching by MACAddress fails in two ways.  It will
overmatch by matching all the interfaces (physical and virtual) that
share that MAC address.  It will undermatch by no longer matching the
physical interface once its MAC address has been changed.

Matching by PermanentMACAddress fixes both of those issues.

This replaces the old "workaround" of setting "Type=!vlan bond bridge".

That code had a comment saying, "something outputs netplan config that
includes using the MAC of the first phy member of a bond as default
value for the MAC of the bond device itself."  I think this was a
misunderstanding of what was going on there.  In both examples, the
configuration was attempting to match a physical interface by MAC
address, not provide a MAC address to set on the virtual device.

The comment continued, "This is evil, it's an optional field and
networkd knows what to do if the MAC isn't specified".  This
configuration is perfectly reasonable.  It is in fact required,
as explained in the netplan documentation, if one wants to use settings
that apply in the .link file (wakeonlan or the offload settings).
This is because networkd refuses to apply a .link file that matches by
name if the interface has already been renamed.  At a minimum, this
means that matching by MAC address is required if one wants to be able
to change the settings after boot (on interfaces that are renamed).  If
the interfaces are renamed at early boot, before the netplan
configuration applies, matching by MAC address is required for those
settings to apply at all.

Arguably, netplan should always use PermanentMACAddress, at least for
physical interfaces.  But that change might have unintended
consequences, so I opted for the more minimal change of only using it
in these complex scenarios (which are already special-cased).

* doc: update match.mac wrt permanent vs virtual MAC

* networkd: apply PermanentMACAddress= to all MAC matches

* test:integration:base: add veth MAC matching quirk

---------

Co-authored-by: Lukas Märdian <slyon@ubuntu.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ABI-compatible ABI changed in a compatible way
Projects
None yet
5 participants