After cloud-init init --local is executed, ephemeral IP do not cleanup #4100

Closed
ubuntu-server-builder opened this issue May 12, 2023 · 3 comments
Labels
invalid This doesn't seem right launchpad Migrated from Launchpad won't fix This doesn't fit the project plans.

Comments

@ubuntu-server-builder

This bug was originally filed in Launchpad as LP: #2015946

Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = None
date_created = 2023-04-12T03:30:34.214924+00:00
date_fix_committed = 2023-04-12T17:11:41.026101+00:00
date_fix_released = 2023-04-12T17:11:41.026101+00:00
id = 2015946
importance = undecided
is_complete = False
lp_url = https://bugs.launchpad.net/cloud-init/+bug/2015946
milestone = None
owner = sxt1001
owner_name = shixuantong
private = False
status = incomplete
submitter = sxt1001
submitter_name = shixuantong
tags = []
duplicates = []

Launchpad user shixuantong(sxt1001) wrote on 2023-04-12T03:30:34.214924+00:00

Reproduction Procedure:
1. Configure the dhclient.conf file:
send host-name = gethostname();

request subnet-mask, broadcast-address, time-offset, routers,
        domain-name, domain-name-servers, domain-search, host-name,
        dhcp6.name-servers, dhcp6.domain-search,
        netbios-name-servers, netbios-scope, interface-mtu,
        ntp-servers, classless-static-routes;

supersede domain-name-servers 8.8.8.8;
supersede domain-search "mydomain.example.com";
supersede classless-static-routes 0 192.168.0.1;

timeout 60;
retry 300;
script "/sbin/dhclient-script";

2. Bond the NIC to bond0:
nmcli con add type bond con-name bond0 ifname bond0 mode active-backup
nmcli con add type bond-slave ifname enp3s0 master bond0
nmcli con up bond-slave-enp3s0

3. Run cloud-init init --local
4. The ephemeral IP is not cleaned up (enp3s0):
... ...
2: enp3s0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond0 state UP group default qlen 1000
    link/ether 52:54:00:e9:cb:d9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.172.208/16 brd 192.168.255.255 scope global dynamic enp3s0
       valid_lft 33689sec preferred_lft 33689sec
... ...
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:e9:cb:d9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.172.208/16 brd 192.168.255.255 scope global dynamic noprefixroute bond0
       valid_lft 33675sec preferred_lft 33675sec
    inet6 fe80::6e8f:d5d9:fa69:b1ba/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
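
The symptom in the output above is that the same IPv4 address appears on both the bond member and the bond itself. A minimal sketch of checking for that condition follows. It parses text in the style of `ip addr` output; the helper names `ipv4_by_interface` and `duplicated_addresses` are hypothetical and are not part of cloud-init.

```python
import re
from collections import defaultdict

# Excerpt of the `ip addr` output quoted in this report.
SAMPLE = """\
2: enp3s0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond0 state UP group default qlen 1000
    link/ether 52:54:00:e9:cb:d9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.172.208/16 brd 192.168.255.255 scope global dynamic enp3s0
       valid_lft 33689sec preferred_lft 33689sec
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:e9:cb:d9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.172.208/16 brd 192.168.255.255 scope global dynamic noprefixroute bond0
       valid_lft 33675sec preferred_lft 33675sec
    inet6 fe80::6e8f:d5d9:fa69:b1ba/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
"""


def ipv4_by_interface(ip_addr_output):
    """Map interface name -> list of IPv4 addresses (CIDR strings)."""
    result = defaultdict(list)
    current = None
    for line in ip_addr_output.splitlines():
        # Interface header lines look like "2: enp3s0: <FLAGS> ..."
        header = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if header:
            current = header.group(1)
            continue
        # "inet" followed by whitespace matches IPv4 only, not "inet6".
        addr = re.match(r"^\s+inet\s+(\S+)", line)
        if addr and current:
            result[current].append(addr.group(1))
    return dict(result)


def duplicated_addresses(ip_addr_output):
    """Return addresses that are configured on more than one interface."""
    owners = defaultdict(list)
    for iface, addrs in ipv4_by_interface(ip_addr_output).items():
        for address in addrs:
            owners[address].append(iface)
    return {a: ifaces for a, ifaces in owners.items() if len(ifaces) > 1}


if __name__ == "__main__":
    print(duplicated_addresses(SAMPLE))
```

On a live system the same functions could be fed the captured output of `ip addr` instead of the embedded sample.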

@ubuntu-server-builder ubuntu-server-builder added incomplete Action required by submitter launchpad Migrated from Launchpad labels May 12, 2023
@ubuntu-server-builder

Launchpad user shixuantong(sxt1001) wrote on 2023-04-12T03:33:34.015260+00:00

The key logs are as follows:

2023-04-12 02:46:41,022 - dhcp.py[DEBUG]: Performing a dhcp discovery on enp3s0
2023-04-12 02:46:41,022 - util.py[DEBUG]: Copying /usr/sbin/dhclient to /var/tmp/cloud-init/cloud-init-dhcp-ui6j1klv/dhclient
2023-04-12 02:46:41,040 - subp.py[DEBUG]: Running command ['ip', 'link', 'set', 'dev', 'enp3s0', 'up'] with allowed return codes [0] (shell=False, capture=True)
2023-04-12 02:46:41,055 - subp.py[DEBUG]: Running command ['/var/tmp/cloud-init/cloud-init-dhcp-ui6j1klv/dhclient', '-1', '-v', '-lf', '/var/tmp/cloud-init/cloud-init-dhcp-ui6j1klv/dhcp.leases', '-pf', '/var/tmp/cloud-init/cloud-init-dhcp-ui6j1klv/dhclient.pid', 'enp3s0', '-sf', '/bin/true'] with allowed return codes [0] (shell=False, capture=True)
2023-04-12 02:46:47,894 - util.py[DEBUG]: All files appeared after 0 seconds: ['/var/tmp/cloud-init/cloud-init-dhcp-ui6j1klv/dhclient.pid', '/var/tmp/cloud-init/cloud-init-dhcp-ui6j1klv/dhcp.leases']
2023-04-12 02:46:47,895 - util.py[DEBUG]: Reading from /var/tmp/cloud-init/cloud-init-dhcp-ui6j1klv/dhclient.pid (quiet=False)
2023-04-12 02:46:47,896 - util.py[DEBUG]: Read 6 bytes from /var/tmp/cloud-init/cloud-init-dhcp-ui6j1klv/dhclient.pid
2023-04-12 02:46:47,896 - util.py[DEBUG]: Reading from /proc/22923/stat (quiet=True)
2023-04-12 02:46:47,897 - util.py[DEBUG]: Read 317 bytes from /proc/22923/stat
2023-04-12 02:46:47,898 - dhcp.py[DEBUG]: killing dhclient with pid=22923
2023-04-12 02:46:47,911 - util.py[DEBUG]: Reading from /var/tmp/cloud-init/cloud-init-dhcp-ui6j1klv/dhcp.leases (quiet=False)
2023-04-12 02:46:47,912 - util.py[DEBUG]: Read 544 bytes from /var/tmp/cloud-init/cloud-init-dhcp-ui6j1klv/dhcp.leases
2023-04-12 02:46:47,913 - dhcp.py[DEBUG]: Received dhcp lease on enp3s0 for 192.168.172.208/255.255.0.0
2023-04-12 02:46:47,915 - __init__.py[DEBUG]: Attempting setup of ephemeral network on enp3s0 with 192.168.172.208/16 brd 192.168.255.255
2023-04-12 02:46:47,916 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'add', '192.168.172.208/16', 'broadcast', '192.168.255.255', 'dev', 'enp3s0'] with allowed return codes [0] (shell=False, capture=True)
2023-04-12 02:46:47,925 - __init__.py[DEBUG]: Skip ephemeral network setup, enp3s0 already has address 192.168.172.208
2023-04-12 02:46:47,926 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'append', '0.0.0.0/0', 'via', '192.168.0.1', 'dev', 'enp3s0'] with allowed return codes [0] (shell=False, capture=True)
2023-04-12 02:46:47,932 - __init__.py[ERROR]: Error bringing up EphemeralIPv4Network. Datasource setup cannot continue
2023-04-12 02:46:47,933 - handlers.py[DEBUG]: finish: init-local/search-OpenStackLocal: FAIL: no local data found from DataSourceOpenStackLocal
2023-04-12 02:46:47,933 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceOpenStack.DataSourceOpenStackLocal'> failed
2023-04-12 02:46:47,934 - util.py[DEBUG]: Getting data from <class 'cloudinit.sources.DataSourceOpenStack.DataSourceOpenStackLocal'> failed
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/cloudinit/sources/__init__.py", line 844, in find_source
    if s.update_metadata_if_supported(
  File "/usr/lib/python3.9/site-packages/cloudinit/sources/__init__.py", line 733, in update_metadata_if_supported
    result = self.get_data()
  File "/usr/lib/python3.9/site-packages/cloudinit/sources/__init__.py", line 326, in get_data
    return_value = self._get_data()
  File "/usr/lib/python3.9/site-packages/cloudinit/sources/DataSourceOpenStack.py", line 143, in _get_data
    with EphemeralDHCPv4(self.fallback_interface):
  File "/usr/lib/python3.9/site-packages/cloudinit/net/dhcp.py", line 63, in __enter__
    return self.obtain_lease()
  File "/usr/lib/python3.9/site-packages/cloudinit/net/dhcp.py", line 116, in obtain_lease
    ephipv4.__enter__()
  File "/usr/lib/python3.9/site-packages/cloudinit/net/__init__.py", line 1109, in __enter__
    self._bringup_static_routes()
  File "/usr/lib/python3.9/site-packages/cloudinit/net/__init__.py", line 1168, in _bringup_static_routes
    subp.subp(
  File "/usr/lib/python3.9/site-packages/cloudinit/subp.py", line 293, in subp
    raise ProcessExecutionError(stdout=out, stderr=err,
cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
Command: ['ip', '-4', 'route', 'append', '0.0.0.0/0', 'via', '192.168.0.1', 'dev', 'enp3s0']
Exit code: 2
Reason: -
Stdout:
Stderr: RTNETLINK answers: File exists
2023-04-12 02:46:47,939 - main.py[DEBUG]: No local datasource found
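
One possible reading of the "Skip ephemeral network setup, enp3s0 already has address" line is that teardown only undoes what the setup step itself added. The toy model below illustrates that pattern. It is a sketch under that assumption and not cloud-init's implementation: `FakeHost` and `EphemeralAddress` are hypothetical names, and the address table is an in-memory stand-in for the kernel's.

```python
from collections import defaultdict


class FakeHost:
    """In-memory stand-in for a host's per-interface address table."""

    def __init__(self):
        self.addrs = defaultdict(set)


class EphemeralAddress:
    """Adds an address on enter and removes it on exit, but only if
    this object was the one that added it (hypothetical model)."""

    def __init__(self, host, iface, addr):
        self.host = host
        self.iface = iface
        self.addr = addr
        self.cleanup = []  # undo callbacks registered by a successful setup

    def __enter__(self):
        if self.addr in self.host.addrs[self.iface]:
            # Address already present: skip setup and register no cleanup.
            return self
        self.host.addrs[self.iface].add(self.addr)
        self.cleanup.append(
            lambda: self.host.addrs[self.iface].discard(self.addr)
        )
        return self

    def __exit__(self, exc_type, exc, tb):
        # Undo in reverse order only what this object set up itself.
        for undo in reversed(self.cleanup):
            undo()
        self.cleanup.clear()
        return False


def run(preexisting):
    """Return the addresses left on enp3s0 after the context exits."""
    host = FakeHost()
    if preexisting:
        # Simulate something else having configured the address first.
        host.addrs["enp3s0"].add("192.168.172.208/16")
    with EphemeralAddress(host, "enp3s0", "192.168.172.208/16"):
        pass
    return sorted(host.addrs["enp3s0"])


if __name__ == "__main__":
    print(run(preexisting=False))
    print(run(preexisting=True))
```

In this model, an address that was present before the ephemeral setup started is left in place afterwards. Whether that is what happened in this report is not established by the log alone.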

@ubuntu-server-builder

Launchpad user Chad Smith(chad.smith) wrote on 2023-04-12T17:11:25.593349+00:00

Thank you for filing this bug and helping to make cloud-init better.

Could you please help us understand the problem you are trying to solve that leads you to set up networking on a live system before cloud-init has run?

The reason I ask is that calling cloud-init boot stages directly is not a usage pattern we generally expect to support.

I think I'll mark this bug 'Incomplete' for the time being as we await more context from you, but please feel free to set it back to 'New' when you have time to respond.

Cloud-init should be baked into base images and has 4 separate services designated to run in early system boot to discover a cloud-init datasource, and render networking at the right time in boot without collisions with other network config being previously brought up.

Also, calling any cloud-init stage from the command line after first boot should be inert, because semaphores and cached data tell cloud-init there is nothing left to do. It should already have detected a viable datasource and network config and applied them to the system in early boot, so I don't expect to see dhclient run the second time cloud-init init --local is called on a machine unless the instance-id changes and all of cloud-init needs to rerun.
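
The "inert after first boot" behaviour described here can be pictured as a marker-file guard. The following is a minimal sketch of that general idea only: `run_once`, the `.done` suffix and the directory layout are hypothetical and do not reflect cloud-init's actual semaphore paths or API.

```python
import tempfile
from pathlib import Path


def run_once(sem_dir: Path, stage: str, action) -> bool:
    """Run `action` only if no marker file exists for `stage`.

    Returns True if the action ran, False if it was skipped.
    """
    sem_dir.mkdir(parents=True, exist_ok=True)
    marker = sem_dir / f"{stage}.done"  # hypothetical marker name
    if marker.exists():
        # A previous run already completed this stage: do nothing.
        return False
    action()
    marker.touch()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sem = Path(tmp) / "sem"
        print(run_once(sem, "init-local", lambda: print("stage ran")))
        print(run_once(sem, "init-local", lambda: print("stage ran")))
```

A guard like this only helps when the stage has completed once before. In the reproducer the stage appears to be running for the first time, so nothing short-circuits it.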

Specific to this reproducer, it looks like it brings up the network manually and then invokes cloud-init's local boot stage[1] from the command line on a machine where, for some reason, cloud-init has not already run during first boot.

We don't expect cloud-init init --local to run for the first time this late in the boot of a machine. Our systemd unit cloud-init-local.service is ordered 'Before=network-pre.target', which expects no network devices to be up. Here is a quote from Freedesktop.org's documentation[2] on network-pre.target:
"""
network-pre.target is a target that may be used to order services before any network interface is configured.
"""

So, I'm wondering the following:

  1. What problem needs a solution here? If supplemental network config needs to be provided to cloud-init that a given datasource doesn't already support, it could be provided either via files in /etc/cloud/cloud.cfg.d containing network config or from the various datasources which support customized network-config[3].
  2. Why wasn't cloud-init init --local already run on this instance in early first boot?

Much thanks,
chad

REFERENCES:
[1] cloud-init local boot stage: https://cloudinit.readthedocs.io/en/latest/explanation/boot.html#local
[2] systemd docs on network-pre https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/#conceptsinsystemd
[3] cloud-init network config docs https://cloudinit.readthedocs.io/en/latest/reference/network-config.html#network-configuration-sources

each boot stage active in systemd boot targets launched
Calling cloud-init is typically run at first boot --local after a system has booted where is not a typical support path

I'm inclined to mark this bug as invalid because typically c

You are exposing an interesting use-case here that I think is not intended for cloud-init.

cloud-init --local invokes cloud-init's local boot stage
Generally, invoking cloud-init from the comman

Typically cloud-init-local.service is one that is scheduled by systemd to happen before any network is present

@holmanb holmanb added invalid This doesn't seem right won't fix This doesn't fit the project plans. and removed incomplete Action required by submitter labels May 28, 2024
@holmanb

holmanb commented May 28, 2024

Cloud-init's init command isn't intended to be invoked via the cli like this. Cloud-init expects to run via the init system, during early boot. This isn't a use case that we plan on supporting. Closing.

@holmanb holmanb closed this as not planned Won't fix, can't repro, duplicate, stale May 28, 2024