failed to generate config when interface was renamed #4005
Comments
Launchpad user Chad Smith(chad.smith) wrote on 2022-08-08T21:30:17.122129+00:00 Thanks @chrispatterson for continuing to help us out here on big systems. Looks like a case where the network rename by the kernel is colliding with cloud-init. I'm thinking the failure symptom is the following:
We need to better handle this potential race condition in cloud-init and vet whether a rename happened out from under us, or block the renames in the kernel temporarily if we can. References: [1] https://github.com/canonical/cloud-init/blob/main/cloudinit/net/netplan.py#L279-L284
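As an illustration of the "vet whether a rename happened out from under us" idea, here is a minimal sketch. It is not the cloud-init implementation and not the fix that was eventually merged. The function name `net_setup_link`, its parameters (`run`, `exists`, `retries`, `delay`) and the skip-on-missing policy are all assumptions made for this example; only the `udevadm test-builtin net_setup_link` command and the `/sys/class/net/` path come from the log in this report.

```python
import os
import subprocess
import time

SYS_CLASS_NET = "/sys/class/net/"


def net_setup_link(ifaces, run=subprocess.run, exists=os.path.exists,
                   retries=3, delay=0.1):
    """Hypothetical helper: run net_setup_link for each interface.

    If the command fails and the sysfs entry is gone, assume the device
    was renamed or removed underneath us and skip it. Otherwise retry,
    and re-raise the error once the retries are exhausted.
    Returns the list of skipped interface names.
    """
    skipped = []
    for iface in ifaces:
        path = SYS_CLASS_NET + iface
        cmd = ["udevadm", "test-builtin", "net_setup_link", path]
        for attempt in range(retries):
            try:
                run(cmd, check=True, capture_output=True)
                break  # success, move on to the next interface
            except subprocess.CalledProcessError:
                if not exists(path):
                    # Device vanished (for example eth7 was renamed).
                    skipped.append(iface)
                    break
                if attempt == retries - 1:
                    raise  # device still exists but keeps failing
                time.sleep(delay)
    return skipped
```

With the failure above, a call such as `net_setup_link(["eth0", "eth2", "eth7"])` would return `["eth7"]` instead of aborting the init-local stage, provided `/sys/class/net/eth7` no longer exists when the check runs. Whether silently skipping is the right policy, as opposed to re-reading the device list, is a design question this sketch does not settle.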
Launchpad user Chad Smith(chad.smith) wrote on 2022-08-08T21:31:22.246252+00:00 I think I'll mark this High and we can discuss mitigation steps here tomorrow.
Launchpad user Frode Nordahl(fnordahl) wrote on 2022-08-16T10:51:33.989141+00:00 fwiw, this issue is affecting me as well. I only see it on real hardware, but apparently it helps to add a lot of bridge interfaces to trigger the issue, particularly OVS bridges. The Traceback I see refers to a real interface name, so I think this may occur under other circumstances than interface rename: 2022-08-16 10:23:30,009 - __init__.py[DEBUG]: Selected renderer 'netplan' from priority list: ['netplan', 'eni', 'sysconfig']
Launchpad user Frode Nordahl(fnordahl) wrote on 2022-08-17T09:28:03.619222+00:00 |
Launchpad user Frode Nordahl(fnordahl) wrote on 2022-08-17T09:31:22.895839+00:00 This rudimentary patch [0] works around the issue for me. For anyone stuck on this issue I put it in this PPA [1], which can be used by the MAAS Package repos feature to slip it into a deployment. It does not help for the situation where Should we expand the bug to cover both cases, or do you want a separate bug for attempting to call 0: https://pastebin.ubuntu.com/p/pHqbwJwVPh/ |
Launchpad user James Falcon(falcojr) wrote on 2022-08-17T13:53:54.493033+00:00 Hey Frode, thanks for the patch, but we recently committed a (similar) fix: #1655
Launchpad user Brett Holman(holmanb) wrote on 2022-08-19T16:37:37.329339+00:00 This bug is believed to be fixed in cloud-init in version 22.3. If this is still a problem for you, please make a comment and set the state back to New. Thank you.
Launchpad user Frode Nordahl(fnordahl) wrote on 2022-09-08T06:26:40.937710+00:00 The 22.3 package does indeed appear to fix the issue, thank you for the quick turnaround!
This bug was originally filed in Launchpad as LP: #1983516
Launchpad details
Launchpad user Chris Patterson(cjp256) wrote on 2022-08-03T21:21:01.030130+00:00
2022-08-03 18:42:31,598 - util.py[DEBUG]: Writing to /etc/netplan/50-cloud-init.yaml - wb: [644] 1359 bytes
2022-08-03 18:42:31,598 - subp.py[DEBUG]: Running command ['netplan', 'generate'] with allowed return codes [0] (shell=False, capture=True)
2022-08-03 18:42:31,875 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/eth2'] with allowed return codes [0] (shell=False, capture=True)
2022-08-03 18:42:31,880 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/eth0'] with allowed return codes [0] (shell=False, capture=True)
2022-08-03 18:42:31,956 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/eth7'] with allowed return codes [0] (shell=False, capture=True)
2022-08-03 18:42:31,959 - util.py[WARNING]: failed stage init-local
2022-08-03 18:42:31,959 - util.py[DEBUG]: failed stage init-local
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 740, in status_wrapper
    ret = functor(name, args)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 410, in main_init
    init.apply_network_config(bring_up=bring_up_interfaces)
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 937, in apply_network_config
    return self.distro.apply_network_config(
  File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 233, in apply_network_config
    self._write_network_state(network_state)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/debian.py", line 142, in _write_network_state
    return super()._write_network_state(network_state)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 129, in _write_network_state
    renderer.render_network_state(network_state)
  File "/usr/lib/python3/dist-packages/cloudinit/net/netplan.py", line 260, in render_network_state
    self._net_setup_link(run=self._postcmds)
  File "/usr/lib/python3/dist-packages/cloudinit/net/netplan.py", line 282, in _net_setup_link
    subp.subp(cmd, capture=True)
  File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 335, in subp
    raise ProcessExecutionError(
cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
Command: ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/eth7']
Exit code: 1
Reason: -
Stdout:
Stderr: Load module index
Parsed configuration file /usr/lib/systemd/network/99-default.link
Parsed configuration file /usr/lib/systemd/network/73-usb-net-by-mac.link
Parsed configuration file /run/systemd/network/10-netplan-eth3.link
Parsed configuration file /run/systemd/network/10-netplan-eth2.link
Parsed configuration file /run/systemd/network/10-netplan-eth1.link
Parsed configuration file /run/systemd/network/10-netplan-eth0.link
Created link configuration context.
Failed to open device '/sys/class/net/eth7': No such device
Unload module index
Unloaded link configuration context.