failed to generate config when interface was renamed #4005

Closed
ubuntu-server-builder opened this issue May 12, 2023 · 8 comments

Labels
launchpad Migrated from Launchpad priority Fix soon

Comments

@ubuntu-server-builder
Collaborator

This bug was originally filed in Launchpad as LP: #1983516

Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = 2022-08-19T16:37:36.642634+00:00
date_created = 2022-08-03T21:21:01.030130+00:00
date_fix_committed = 2022-08-17T13:51:00.957484+00:00
date_fix_released = 2022-08-19T16:37:36.642634+00:00
id = 1983516
importance = high
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1983516
milestone = None
owner = cjp256
owner_name = Chris Patterson
private = False
status = fix_released
submitter = cjp256
submitter_name = Chris Patterson
tags = []
duplicates = []

Launchpad user Chris Patterson(cjp256) wrote on 2022-08-03T21:21:01.030130+00:00

2022-08-03 18:42:31,598 - util.py[DEBUG]: Writing to /etc/netplan/50-cloud-init.yaml - wb: [644] 1359 bytes
2022-08-03 18:42:31,598 - subp.py[DEBUG]: Running command ['netplan', 'generate'] with allowed return codes [0] (shell=False, capture=True)
2022-08-03 18:42:31,875 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/eth2'] with allowed return codes [0] (shell=False, capture=True)
2022-08-03 18:42:31,880 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/eth0'] with allowed return codes [0] (shell=False, capture=True)
2022-08-03 18:42:31,956 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/eth7'] with allowed return codes [0] (shell=False, capture=True)
2022-08-03 18:42:31,959 - util.py[WARNING]: failed stage init-local
2022-08-03 18:42:31,959 - util.py[DEBUG]: failed stage init-local
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 740, in status_wrapper
    ret = functor(name, args)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 410, in main_init
    init.apply_network_config(bring_up=bring_up_interfaces)
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 937, in apply_network_config
    return self.distro.apply_network_config(
  File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 233, in apply_network_config
    self._write_network_state(network_state)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/debian.py", line 142, in _write_network_state
    return super()._write_network_state(network_state)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 129, in _write_network_state
    renderer.render_network_state(network_state)
  File "/usr/lib/python3/dist-packages/cloudinit/net/netplan.py", line 260, in render_network_state
    self._net_setup_link(run=self._postcmds)
  File "/usr/lib/python3/dist-packages/cloudinit/net/netplan.py", line 282, in _net_setup_link
    subp.subp(cmd, capture=True)
  File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 335, in subp
    raise ProcessExecutionError(
cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
Command: ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/eth7']
Exit code: 1
Reason: -
Stdout:
Stderr: Load module index
Parsed configuration file /usr/lib/systemd/network/99-default.link
Parsed configuration file /usr/lib/systemd/network/73-usb-net-by-mac.link
Parsed configuration file /run/systemd/network/10-netplan-eth3.link
Parsed configuration file /run/systemd/network/10-netplan-eth2.link
Parsed configuration file /run/systemd/network/10-netplan-eth1.link
Parsed configuration file /run/systemd/network/10-netplan-eth0.link
Created link configuration context.
Failed to open device '/sys/class/net/eth7': No such device
Unload module index
Unloaded link configuration context.

@ubuntu-server-builder ubuntu-server-builder added launchpad Migrated from Launchpad priority Fix soon labels May 12, 2023
@ubuntu-server-builder
Collaborator Author

Launchpad user Chad Smith(chad.smith) wrote on 2022-08-08T21:30:17.122129+00:00

Thanks @chrispatterson for continuing to help us out here on big systems.

Looks like a case where the network rename by the kernel is colliding with cloud-init.

I'm thinking the failure symptom is the following:

  • cloud-init calls get_devicelist and starts looping through the devices found [1]
  • kernel renames some nic and sysfs gets updated
  • cloud-init is unable to finish the loop of calls to 'udevadm', 'test-builtin', 'net_setup_link', <PREVIOUS/STALE_DEVICE_NAME>

We need to better handle this potential race condition in cloud-init and vet whether a rename happened out from under us, or block the renames in the kernel temporarily if we can.
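One way to "vet whether a rename happened out from under us" can be sketched as follows. This is a minimal illustration, not the fix that landed in cloud-init: the function name `net_setup_link_tolerant` and the injectable `run` and `exists` parameters are hypothetical, introduced only so the logic can be exercised without touching real devices. The idea is that a failed command is fatal only if the device path is still present afterwards.

```python
import os
import subprocess
from typing import Callable, Iterable, List

SYS_CLASS_NET = "/sys/class/net/"


def net_setup_link_tolerant(
    devices: Iterable[str],
    run: Callable[[List[str]], None],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Run net_setup_link for each device; return names that vanished mid-loop.

    `run` and `exists` are hypothetical injection points for this sketch.
    """
    vanished = []
    for name in devices:
        path = SYS_CLASS_NET + name
        cmd = ["udevadm", "test-builtin", "net_setup_link", path]
        try:
            run(cmd)
        except subprocess.CalledProcessError:
            # Only treat the failure as fatal if the device is still there;
            # a missing path suggests a rename or removal raced with the loop.
            if exists(path):
                raise
            vanished.append(name)
    return vanished
```

A caller could pass `run=lambda cmd: subprocess.run(cmd, check=True, capture_output=True)`. The check is still racy (the device could reappear or vanish between the failure and the `exists` call), so it narrows the window rather than closing it.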

References:

[1] https://github.com/canonical/cloud-init/blob/main/cloudinit/net/netplan.py#L279-L284

@ubuntu-server-builder
Collaborator Author

Launchpad user Chad Smith(chad.smith) wrote on 2022-08-08T21:31:22.246252+00:00

I think I'll mark this High, and we can discuss mitigation steps here tomorrow.

@ubuntu-server-builder
Collaborator Author

Launchpad user Frode Nordahl(fnordahl) wrote on 2022-08-16T10:51:33.989141+00:00

fwiw, this issue is affecting me as well. I only see it on real hardware, but apparently it helps to add a lot of bridge interfaces to trigger the issue, particularly OVS bridges.

The traceback I see refers to a real interface name, so I think this may also occur under circumstances other than an interface rename:

2022-08-16 10:23:30,009 - __init__.py[DEBUG]: Selected renderer 'netplan' from priority list: ['netplan', 'eni', 'sysconfig']
2022-08-16 10:23:30,009 - netplan.py[DEBUG]: V2 to V2 passthrough
2022-08-16 10:23:30,014 - util.py[DEBUG]: Writing to /etc/netplan/50-cloud-init.yaml - wb: [644] 4180 bytes
2022-08-16 10:23:30,014 - subp.py[DEBUG]: Running command ['netplan', 'generate'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,188 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/ovs-system'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,191 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/lo'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,195 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/bondM'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,200 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/enp129s0f0'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,204 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/eno1'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,207 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/br-bond0.2808'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,212 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/br-bond0.2806'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,215 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/br-bond0'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,220 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/br-bond0.2804'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,225 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/bond0'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,229 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/enp129s0f1'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,234 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/eno2'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,239 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/br-bond0.2807'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,243 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/br-bond0.2805'] with allowed return codes [0] (shell=False, capture=True)
2022-08-16 10:23:30,247 - util.py[WARNING]: failed stage init
2022-08-16 10:23:30,248 - util.py[DEBUG]: failed stage init
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 740, in status_wrapper
    ret = functor(name, args)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 410, in main_init
    init.apply_network_config(bring_up=bring_up_interfaces)
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 937, in apply_network_config
    return self.distro.apply_network_config(
  File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 233, in apply_network_config
    self._write_network_state(network_state)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/debian.py", line 142, in _write_network_state
    return super()._write_network_state(network_state)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 129, in _write_network_state
    renderer.render_network_state(network_state)
  File "/usr/lib/python3/dist-packages/cloudinit/net/netplan.py", line 260, in render_network_state
    self._net_setup_link(run=self._postcmds)
  File "/usr/lib/python3/dist-packages/cloudinit/net/netplan.py", line 282, in _net_setup_link
    subp.subp(cmd, capture=True)
  File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 335, in subp
    raise ProcessExecutionError(
cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
Command: ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/br-bond0.2805']
Exit code: 1
Reason: -
Stdout:
Stderr: Trying to open "/etc/systemd/hwdb/hwdb.bin"...
Trying to open "/etc/udev/hwdb.bin"...
Trying to open "/usr/lib/systemd/hwdb/hwdb.bin"...
Trying to open "/lib/systemd/hwdb/hwdb.bin"...
Trying to open "/lib/udev/hwdb.bin"...
=== trie on-disk ===
tool version: 249
file size: 11124932 bytes
header size 80 bytes
strings 2374708 bytes
nodes 8750144 bytes
Load module index
Found cgroup2 on /sys/fs/cgroup/, full unified hierarchy
Found container virtualization none.
Loaded timestamp for '/etc/systemd/network'.
Loaded timestamp for '/run/systemd/network'.
Parsed configuration file /usr/lib/systemd/network/99-default.link
Parsed configuration file /usr/lib/systemd/network/73-usb-net-by-mac.link
Parsed configuration file /run/systemd/network/10-netplan-enp129s0f1.link
Parsed configuration file /run/systemd/network/10-netplan-enp129s0f0.link
Parsed configuration file /run/systemd/network/10-netplan-eno2.link
Parsed configuration file /run/systemd/network/10-netplan-eno1.link
Created link configuration context.
Failed to open device '/sys/class/net/br-bond0.2805': No such device
Unload module index
Unloaded link configuration context.

@ubuntu-server-builder
Collaborator Author

Launchpad user Frode Nordahl(fnordahl) wrote on 2022-08-17T09:28:03.619222+00:00

@ubuntu-server-builder
Collaborator Author

Launchpad user Frode Nordahl(fnordahl) wrote on 2022-08-17T09:31:22.895839+00:00

This rudimentary patch [0] works around the issue for me. For anyone stuck on this issue I put it in this PPA [1], which can be used by the MAAS Package repos feature to slip it into a deployment.

It does not help in the situation where udevadm test-builtin net_setup_link is called on an interface that genuinely no longer exists, which is what the OP reported, but the two variants of the issue appear closely connected to me.

Should we expand the bug to cover both cases, or do you want a separate bug for attempting to call udevadm test-builtin net_setup_link on an interface that apparently is not completely initialized yet?
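For the second variant (an interface that exists but is apparently not fully initialized yet), a bounded retry that re-enumerates the devices on each pass is one possible mitigation. The sketch below is illustrative only and is not taken from the patch or PPA mentioned above: `setup_links_with_retry`, `list_devices`, `setup_one`, and the default `attempts` and `delay` values are all hypothetical, and `OSError` stands in for whatever error the real per-device call raises.

```python
import time
from typing import Callable, Iterable, Optional


def setup_links_with_retry(
    list_devices: Callable[[], Iterable[str]],
    setup_one: Callable[[str], None],
    attempts: int = 5,
    delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-enumerate and retry the whole pass; return the attempt that succeeded."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            # Take a fresh device list each time so stale names drop out.
            for name in list_devices():
                setup_one(name)
            return attempt
        except OSError as err:
            last_error = err
            if attempt < attempts:
                sleep(delay)
    raise RuntimeError(f"link setup failed after {attempts} attempts") from last_error
```

Re-enumerating on every attempt means a renamed device is picked up under its new name, while a device that is merely slow to appear gets a few more chances before the stage fails.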

0: https://pastebin.ubuntu.com/p/pHqbwJwVPh/
1: https://launchpad.net/~fnordahl/+archive/ubuntu/lp1983516

@ubuntu-server-builder
Collaborator Author

Launchpad user James Falcon(falcojr) wrote on 2022-08-17T13:53:54.493033+00:00

Hey Frode, thanks for the patch, but we recently committed a (similar) fix: #1655

@ubuntu-server-builder
Collaborator Author

Launchpad user Brett Holman(holmanb) wrote on 2022-08-19T16:37:37.329339+00:00

This bug is believed to be fixed in cloud-init in version 22.3. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

@ubuntu-server-builder
Collaborator Author

Launchpad user Frode Nordahl(fnordahl) wrote on 2022-09-08T06:26:40.937710+00:00

The 22.3 package does indeed appear to fix the issue, thank you for the quick turnaround!
