Unable to find a system nic while running cloud-init #4125
Comments
I added "ExecStartPre=sleep 30" to the Service section of cloud-init-local.service. This avoids the problem, but I don't think it is a fundamental solution.
I see a similar problem on 22.1-5.el8.0.1, which is the version that ships with RHEL 8.6:
I do see the interface with the expected MAC address later, so my guess is that the NIC with the expected MAC address is not ready, or does not exist yet, at the time cloud-init tries to configure it. This is likely to happen when you have SR-IOV NICs with a large number of virtual functions.
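One way to check that guess is to look at which MAC addresses are visible in sysfs at a given moment. The following is only a rough sketch, not cloud-init's own code: the helper names `interfaces_by_mac` and `nic_is_visible` are made up for illustration, and it assumes the usual Linux layout where each interface exposes its MAC in `/sys/class/net/<name>/address`.

```python
from pathlib import Path


def interfaces_by_mac(sys_class_net="/sys/class/net"):
    """Map MAC address -> interface name for devices currently visible in sysfs.

    Hypothetical helper for illustration; not part of cloud-init.
    """
    found = {}
    root = Path(sys_class_net)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            mac = (dev / "address").read_text().strip().lower()
        except OSError:
            # Entry is not a device directory, has no address, or vanished.
            continue
        # Skip empty addresses and the all-zero address used by loopback.
        if mac and mac != "00:00:00:00:00:00":
            found.setdefault(mac, dev.name)
    return found


def nic_is_visible(mac, sys_class_net="/sys/class/net"):
    """Return True if a device with this MAC is visible right now."""
    return mac.lower() in interfaces_by_mac(sys_class_net)
```

Calling `nic_is_visible("fa:16:3e:7c:49:9f")` early in boot and again after login would show whether the device simply appeared late.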
Thanks @xiaoge1001 for reporting this issue. I am not able to reproduce this exact behavior. Could you please provide steps to reproduce and attach the logs? I am marking this issue as incomplete in the meantime.
Sorry, this is an intermittent problem and I cannot reproduce it on demand. At the time, I adopted the suggestion from this link: https://askubuntu.com/questions/1400527/unable-to-find-a-system-nic-while-running-cloud-init
This is a duplicate of #3523.
Cloud-init doesn't hit this codepath until after systemd-networkd-wait-online.service (and the NetworkManager equivalent). While this guarantees that an interface will be ready in userspace, it does not guarantee that all expected interfaces will be ready in userspace. Interfaces may be loaded late by the kernel when the driver is not loaded by the initramfs. This is what causes this error.

This exception apparently exists to validate that invalid network configuration isn't passed to cloud-init. It gets thrown when rendering a configuration for openstack that contains a network device which hasn't been loaded yet.

When an interface is missing, cloud-init has a few options:

a) assume (incorrectly) that if the device is not yet visible in userspace, writing out a configuration with this device will break network backends; traceback without configuring network at all
Pros: cloud-init can log warnings when openstack passes invalid configuration

b) warn about the device not existing and drop it from the configuration (previously proposed)
Pros: breaks perfectly valid configurations slightly less than the current state by rendering network configuration for all interfaces that are currently up

c) block until the interface that openstack told us about is available
Pros: will always "work" when valid configuration is passed

d) trust that network daemons can handle configurations which reference interfaces which might load late[1]: don't throw an exception, don't log an error, but do log an info/debug about the missing interface in case openstack did pass invalid network configuration
Pros: will always "work" when valid configuration is passed

There is no clear "best" choice; this is an issue with engineering tradeoffs to consider. Cloud-init currently does a). The tradeoffs are between behaving correctly, affecting boot speed, failure path behavior, and failure path ease of debugging.
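To make the difference between a), b), and d) concrete, here is a minimal sketch. It is not the actual `convert_net_json` code: the function `assign_names` and its `policy` argument are hypothetical, and `known_macs` stands in for whatever MAC-to-name mapping is visible at the time of the call.

```python
import logging

LOG = logging.getLogger(__name__)


def assign_names(devices, known_macs, policy="raise"):
    """Attach a 'name' to each device dict by MAC address.

    Hypothetical illustration of the options discussed above:
      "raise": option a), fail on the first device that is not visible
      "drop":  option b), warn and leave the device out
      "trust": option d), keep the device unnamed and log at debug level
    """
    if policy not in ("raise", "drop", "trust"):
        raise ValueError("unknown policy: %s" % policy)
    result = []
    for dev in devices:
        mac = dev.get("mac_address", "").lower()
        if mac in known_macs:
            result.append({**dev, "name": known_macs[mac]})
        elif policy == "raise":
            raise ValueError("Unable to find a system nic for %s" % dev)
        elif policy == "drop":
            LOG.warning("Dropping device not visible yet: %s", mac)
        else:
            LOG.debug("Device not visible yet, keeping config: %s", mac)
            result.append(dict(dev))
    return result
```

With "trust", the missing device stays in the result without a name, so anything rendered from it would have to match on MAC address rather than on interface name.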
If the priority is to work correctly when correct configuration is passed, then c) or d) are probably superior. Preference goes to d) for better failure path behavior and no negative impact on boot speed.

If the priority is logging noisily when invalid network configuration is passed (which would then require users to modify the kernel with built-in drivers or modify the initrd to load drivers), then a) or b) are probably superior.

I haven't investigated when openstack might pass invalid configuration. Can users provide free-form network configuration, or is this a machine-generated configuration that is generated either dynamically or via structured user input forms? If users can't pass free-form configuration, then a) and b) would seem significantly less practical. Even if they can, I think that I would prefer to do the right thing if possible rather than maintain the ability to warn but break on false positives.

[1] They must: interfaces load late all the time on many platforms. Common offenders include virtio and high speed network adapters.
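For comparison, option c) amounts to polling with a deadline. The sketch below is an assumption about how that could look, not something cloud-init does: `wait_for_macs` is a made-up name, `get_known_macs` is any callable returning the MACs currently visible, and the 30 second default only mirrors the "sleep 30" workaround mentioned earlier in this thread.

```python
import time


def wait_for_macs(expected, get_known_macs, timeout=30.0, interval=0.5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until every MAC in `expected` is visible or `timeout` elapses.

    Hypothetical helper. Returns the set of MACs still missing, which is
    empty on success.
    """
    wanted = {mac.lower() for mac in expected}
    deadline = clock() + timeout
    while True:
        missing = wanted - set(get_known_macs())
        if not missing or clock() >= deadline:
            return missing
        sleep(interval)
```

Unlike a fixed sleep, this returns as soon as the expected devices show up, so the boot speed cost is only paid when an interface really is late. The caller still has to decide what to do with a non-empty result, which leads back to a), b), or d).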
Bug report
When I ran "cloud-init status", I found that cloud-init had failed to execute.
Steps to reproduce the problem
Reboot the compute nodes.
Environment details
cloud-init logs
2023-05-18 02:48:15,451 - util.py[DEBUG]: failed stage init-local
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/cloudinit/cmd/main.py", line 689, in status_wrapper
    ret = functor(name, args)
  File "/usr/lib/python3.9/site-packages/cloudinit/cmd/main.py", line 398, in main_init
    init.apply_network_config(bring_up=bring_up_interfaces)
  File "/usr/lib/python3.9/site-packages/cloudinit/stages.py", line 789, in apply_network_config
    netcfg, src = self._find_networking_config()
  File "/usr/lib/python3.9/site-packages/cloudinit/stages.py", line 740, in _find_networking_config
    if self.datasource and hasattr(self.datasource, 'network_config'):
  File "/usr/lib/python3.9/site-packages/cloudinit/sources/DataSourceConfigDrive.py", line 158, in network_config
    self._network_config = openstack.convert_net_json(
  File "/usr/lib/python3.9/site-packages/cloudinit/sources/helpers/openstack.py", line 698, in convert_net_json
    raise ValueError("Unable to find a system nic for %s" % d)
ValueError: Unable to find a system nic for {'mtu': 1500, 'type': 'physical', 'subnets': [{'type': 'static', 'netmask': '255.255.255.0', 'routes': [{'netmask': '0.0.0.0', 'network': '0.0.0.0', 'gateway': '192.168.50.1'}], 'address': '192.168.50.19', 'ipv4': True}], 'mac_address': 'fa:16:3e:7c:49:9f'}