Cloudinit resets network scripts to default configuration DHCP once Config Drive is removed after some time #3777
Comments
Launchpad user Ryan Harper(raharper) wrote on 2020-09-01T14:51:12.678196+00:00 Hello, The config-drive provided to an instance includes metadata that provides an instance-id. If the config-drive is removed, cloud-init should no longer be active during subsequent boots. It sounds like cloud-init is not installed correctly. Are you using official cloud-images from RHEL and SUSE? Have you manually enabled the cloud-init service files? Cloud-init will disable itself if no datasources are detected. Can you run cloud-init collect-logs and attach the generated tarball?
Launchpad user Ryan Harper(raharper) wrote on 2020-09-01T14:51:20.381799+00:00
Launchpad user Divya K Konoor(dikonoor) wrote on 2020-09-09T16:10:11.990832+00:00 The cloud-init used here is not from RHEL or SUSE. As a next step, we will try to validate whether this behavior can be reproduced with the cloud-init shipped with RHEL or SLES.
Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-09-10T07:56:53.869899+00:00 We have tested the described behavior with cloud-init 19.4 (the RHEL-provided cloud-init) on a RHEL 8.2 guest VM and observe the same behavior. I have attached the cloud-init log for your reference. Please let me know if any other details are required. Thanks.
Launchpad user Scott Moser(smoser) wrote on 2020-09-10T12:32:07.941924+00:00 I'm pretty sure cloud-init is working as designed here.
for more information.
Launchpad user Divya K Konoor(dikonoor) wrote on 2020-09-11T10:02:38.919104+00:00 While we look at the links provided by Scott above at def apply_network_config(self, bring_up):
Launchpad user Eduardo Otubo(otubo) wrote on 2020-09-15T10:11:29.912740+00:00 From the RHEL and OpenStack point of view, we never remove the config-drive, and that's the reason we never faced this issue. Is keeping the config-drive attached an option for you, Divya? Also, if you could paste diff-style code it would be much easier to review; even better would be a GitHub link with the colored diff.
Launchpad user Eduardo Otubo(otubo) wrote on 2020-09-17T12:19:10.603790+00:00 I just found other bugs with similar behavior, and this is fixed downstream only. The bug wasn't exactly in cloud-init; it looks like we were not including ds-identify in the rpm, which caused the issue. Please give it a try with any RHEL-shipped rpm >= cloud-init-18.5-5.*
Launchpad user Divya K Konoor(dikonoor) wrote on 2020-09-17T12:38:19.570764+00:00 Eduardo, OK. I think we have tried it with the RHEL cloud-init 18.5 that was shipped with RHEL 8 and could reproduce it, but we will try again and get back to you. We did look at why we need to remove the datasource, and for reasons specific to our platform/environment, we do need to remove it. I believe cloud-init should accommodate both cases, with and without a datasource. When there is no datasource, cloud-init should not go and reset the network to DHCP.
Launchpad user Ryan Harper(raharper) wrote on 2020-09-17T14:50:27.833432+00:00
If there is no datasource, cloud-init does not run. If you've removed your datasource, cloud-init should no longer activate during boot. In your case, you've removed your datasource, so cloud-init attempts to look for one. On the Power architecture, cloud-init cannot detect whether it's running on OpenStack without trying to contact the metadata service, due to limitations in Nova/PowerVM. https://cloudinit.readthedocs.io/en/latest/topics/datasources/openstack.html Cloud-init then attempts to contact OpenStack over the network. It does this by making a best guess at which network interface to bring up, running DHCP on it, and attempting to contact the metadata service.
Command: ['/var/tmp/cloud-init/cloud-init-dhcp-2zup00qk/dhclient', '-1', '-v', '-lf', '/var/tmp/cloud-init/cloud-init-dhcp-2zup00qk/dhcp.leases', '-pf', '/var/tmp/cloud-init/cloud-init-dhcp-2zup00qk/dhclient.pid', 'env32', '-sf', '/bin/true']
Since cloud-init cannot reach the OpenStack metadata service, it does not have an instance-id, so it assumes this boot is a first boot. On first boot, without network configuration from OpenStack, cloud-init renders a fallback network config (DHCP on the best-guess interface), which is why you see a change in network configuration. The core issues are:
As Scott mentioned, to address (2), you can look at configuring the manual_cache_clean option, which configures cloud-init to keep existing metadata changes in place until the user manually cleans out the cloud-init metadata.
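The sequence Ryan describes (no reachable datasource, no matching instance-id, treated as a first boot, DHCP fallback) can be sketched as a small Python model. This is a hypothetical simplification, not cloud-init's own code: `BootState`, `is_new_instance`, `fallback_config` and `network_config_to_render` are invented names, the placeholder id `iid-datasource-none` is assumed to be what the None datasource reports, `abc-123` is a made-up instance-id, and the fallback dict only approximates the shape of a real network config. It also shows how `manual_cache_clean` changes the outcome by trusting the cached instance.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the "None" datasource reports this placeholder instance-id.
NONE_DS_IID = "iid-datasource-none"


@dataclass
class BootState:
    cached_iid: Optional[str]  # instance-id cached from a previous boot (None on first boot)
    current_iid: str  # instance-id of the datasource selected on this boot
    manual_cache_clean: bool = False  # models the manual_cache_clean setting


def is_new_instance(state: BootState) -> bool:
    """Return True when this boot would be handled as a first boot."""
    if state.cached_iid is None:
        return True  # nothing cached yet: genuine first boot
    if state.manual_cache_clean:
        return False  # trust the cached instance until the user cleans it
    return state.current_iid != state.cached_iid


def fallback_config(iface: str) -> dict:
    # Hypothetical, simplified shape: DHCP on the best-guess interface.
    return {"version": 1,
            "config": [{"type": "physical", "name": iface,
                        "subnets": [{"type": "dhcp"}]}]}


def network_config_to_render(state: BootState, ds_netcfg: Optional[dict],
                             best_guess_iface: str) -> Optional[dict]:
    """Return the config that would be written, or None to keep ifcfg-* as is."""
    if not is_new_instance(state):
        return None  # same instance: existing network scripts are left alone
    if ds_netcfg is not None:
        return ds_netcfg  # e.g. the static config from the ConfigDrive
    return fallback_config(best_guess_iface)


if __name__ == "__main__":
    static = {"version": 1, "config": []}  # stand-in for the ConfigDrive config
    # First boot, ConfigDrive attached: its static config is applied.
    print(network_config_to_render(BootState(None, "abc-123"), static, "env32"))
    # ISO removed, only the None datasource remains: DHCP fallback is rendered.
    print(network_config_to_render(BootState("abc-123", NONE_DS_IID), None, "env32"))
    # Same situation with manual_cache_clean: nothing is re-rendered.
    print(network_config_to_render(BootState("abc-123", NONE_DS_IID, True), None, "env32"))
```

The interface name `env32` is taken from the dhclient command above; everything else is illustrative.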
Launchpad user Eduardo Otubo(otubo) wrote on 2020-10-01T14:18:23.795388+00:00 @ryan, I have a question about:
If cloud-init doesn't run, it shouldn't look for datasources, is that correct?
Launchpad user Ryan Harper(raharper) wrote on 2020-10-01T15:00:22.726801+00:00
cloud-init won't run if ds-identify does not detect a datasource. On each boot cloud-init's systemd generator calls ds-identify. This program runs to determine whether cloud-init should run or not. The ds-identify default policy is to search, and report that cloud-init should run if:
For (1); cloud-init examines specific values on the system, files in a
For (2); in some cases, ds-identify cannot be 100% sure the datasource isn't
For POWER systems which do not export DMI values, the OpenStack Datasource
Contrast this with x86 platform where OpenStack VMs export values in the DMI
ds-identify then writes out its conclusions in /run/cloud-init/cloud.cfg
For this bug, on first boot, /dev/sr1 included an OpenStack ConfigDrive, so
cat /run/cloud-init/cloud.cfg
datasource_list: [ ConfigDrive, None ]
The configdrive payload includes a network configuration which was applied.
cat /run/cloud-init/cloud.cfg
datasource_list: [ OpenStack, None ]
Now, cloud-init OpenStack datasource starts, and because the system currently
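The ds-identify decision described above can be approximated in Python. This is a hedged sketch, not ds-identify itself (which is a shell script): `Check`, `openstack_check`, `identify` and `power_checks` are hypothetical names, and only the default search behaviour (`found=all`, `maybe=all`, `notfound=disabled`) is modelled. It reproduces the two `datasource_list` values shown above: one for the first boot with the ConfigDrive attached, and one after the ISO is removed on a platform without DMI values.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class Check(Enum):
    FOUND = "found"
    MAYBE = "maybe"
    NOT_FOUND = "not found"


def openstack_check(dmi_says_openstack: Optional[bool]) -> Check:
    """dmi_says_openstack is None when the platform exports no DMI values (e.g. POWER)."""
    if dmi_says_openstack is None:
        return Check.MAYBE  # cannot be ruled out without contacting the metadata service
    return Check.FOUND if dmi_says_openstack else Check.NOT_FOUND


def identify(dslist: List[str], checks: Dict[str, Callable[[], Check]]) -> Optional[List[str]]:
    """Model the search policy found=all, maybe=all, notfound=disabled.

    Returns the datasource_list to write to /run/cloud-init/cloud.cfg,
    or None when cloud-init would be disabled for this boot.
    """
    candidates = [ds for ds in dslist if ds != "None"]
    results = {ds: checks[ds]() for ds in candidates}
    found = [ds for ds in candidates if results[ds] is Check.FOUND]
    maybe = [ds for ds in candidates if results[ds] is Check.MAYBE]
    if found:
        return found + ["None"]  # found=all: every positive match
    if maybe:
        return maybe + ["None"]  # maybe=all: everything not ruled out
    return None  # notfound=disabled


def power_checks(iso_attached: bool) -> Dict[str, Callable[[], Check]]:
    """Hypothetical checks for a POWER guest with or without the ConfigDrive ISO."""
    return {
        "ConfigDrive": lambda: Check.FOUND if iso_attached else Check.NOT_FOUND,
        "OpenStack": lambda: openstack_check(None),  # no DMI values on POWER
    }


if __name__ == "__main__":
    dslist = ["ConfigDrive", "OpenStack", "None"]
    print(identify(dslist, power_checks(True)))   # ['ConfigDrive', 'None']
    print(identify(dslist, power_checks(False)))  # ['OpenStack', 'None']
```

The key point the model captures is that a `MAYBE` result is enough to keep cloud-init enabled, so on POWER the removal of the ISO hands control to the OpenStack datasource instead of disabling cloud-init.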
Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-10-09T09:27:14.768702+00:00 I have explored the suggested manual_cache_clean: true option a bit. Here are some of the findings.
Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-10-12T13:32:10.136458+00:00 In the above explanation, if I am not mistaken, when we remove the datasource (/dev/sr0) and reboot, we should end up with the cloud.cfg configuration below, with OpenStack set, right? That doesn't seem to be the case here: I am seeing cloud.cfg intact with [ ConfigDrive, None ]. With this, in my opinion, dscheck_OpenStack() shouldn't be called at all. When there is no datasource, cloud-init should disable itself, so why do we see the DHCP fallback configuration here?
cat /run/cloud-init/cloud.cfg
datasource_list: [ OpenStack, None ]
Launchpad user Ryan Harper(raharper) wrote on 2020-10-12T14:49:40.155053+00:00
You're saying you've observed the above where you remove the ISO, reboot the
Can you provide the contents of /run/cloud-init/* and /var/log/cloud-init.log
Could it be either the ISO was not removed, or this may have come from one of the
ds-identify is called every boot. The config files in /run/cloud-init are
Recall that since you're not on x86, the OpenStack check cannot return False
Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-10-13T04:17:44.438351+00:00 Yes, the above is when we explicitly specify ConfigDrive as the datasource in /etc/cloud/cloud.cfg. Attached are the cloud-init collect-logs.
Launchpad user Ryan Harper(raharper) wrote on 2020-10-13T15:52:12.702667+00:00
Yes; when you remove the ConfigDrive cloud-init no longer knows it has a
When booting without the iso, cloud-init tries to find it, but fails and
From your logs (thanks!), all of the boots where /dev/sr0 has the Config drive, we see cloud-init search
2020-05-07 06:41:11,105 - util.py[DEBUG]: Cloud-init v. 19.1 running 'init-local' at Thu, 07 May 2020 06:41:11 +0000. Up 3356.48 seconds.
On boots where /dev/sr0 does not contain a ConfigDrive ISO, cloud-init does
2020-10-13 03:30:14,074 - util.py[DEBUG]: Cloud-init v. 19.4 running 'init-local' at Tue, 13 Oct 2020 03:30:13 +0000. Up 27.25 seconds.
As I suggested in your PR; I think re-submitting and addressing the feedback
Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-10-14T12:26:45.330647+00:00 Although I still need to look at your suggestion about PR 229, here are a few more findings with the ds-identify tool. Let me know whether this could be accepted if I raise a PR against the ds-identify tool itself. Currently, ds-identify returns DS_FOUND on subsequent boots even though the config drive (/dev/sr0) has been removed. I believe this shouldn't happen (let me know if you think otherwise). Here I am trying to fix this particular behavior for Power hardware only. Let's say we do something like this in ds-identify
With this, I hope cloud-init will not configure the fallback (DHCP) on PowerVM. Thanks
Launchpad user Ryan Harper(raharper) wrote on 2020-10-14T14:49:16.425838+00:00
ds-identify will already do this. If you remove your hardcoded
Note that by default (on Ubuntu at least), the datasource_list
root@g1:
to update this file, run dpkg-reconfigure cloud-init
datasource_list: [ NoCloud, ConfigDrive, OpenNebula, DigitalOcean, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, SmartOS, Bigstep, Scaleway, AliYun, Ec2, CloudStack, Hetzner, IBMCloud, Oracle, Exoscale, RbxCloud, None ]
When ds-identify runs, it reads /etc/cloud/cloud.cfg and
By default, this is, as you see, a long list. For each of these
In your case where you've provided a datasource_list value
Look at your ds-identify.log file you provided:
[up 17.85s] ds-identify
The current behavior for single-datasource could be changed to
I think this would address your case. As a quick test for you
I think such a PR is worth discussing.
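The single-datasource case and the suggestion to list a second datasource can be sketched like this. It is a simplified, hypothetical model (the function name and the boolean check callables are invented for illustration); it assumes, as the ds-identify.log reading above suggests, that a lone configured datasource is used without running its check, and it assumes the default `notfound=disabled` policy.

```python
from typing import Callable, Dict, List, Optional


def identify(dslist: List[str], checks: Dict[str, Callable[[], bool]]) -> Optional[List[str]]:
    """Hypothetical model: single-entry short-circuit, then search with notfound=disabled."""
    candidates = [ds for ds in dslist if ds != "None"]
    if len(candidates) == 1:
        # Only one datasource configured: it is used without running its check,
        # so a removed ConfigDrive ISO is never noticed at this stage.
        return candidates + ["None"]
    found = [ds for ds in candidates if checks[ds]()]
    if found:
        return found + ["None"]
    return None  # nothing found: cloud-init is disabled for this boot


if __name__ == "__main__":
    iso_removed = {"NoCloud": lambda: False, "ConfigDrive": lambda: False}
    print(identify(["ConfigDrive", "None"], iso_removed))             # ['ConfigDrive', 'None']
    print(identify(["NoCloud", "ConfigDrive", "None"], iso_removed))  # None
```

With a second entry in the list, the checks actually run, and with nothing found the model disables cloud-init for that boot, which is the outcome the suggested test is meant to demonstrate.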
Launchpad user Scott Moser(smoser) wrote on 2020-10-14T15:10:59.321734+00:00
It would address this use case, but would break per-boot behavior,
Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-10-14T17:08:00.965069+00:00 @ryan, I tried your suggestion of adding one more datasource, NoCloud, along with ConfigDrive, but I still ended up with cloud-init resetting to the fallback (DHCP). JFYI, I am running in a RHEL environment. Launchpad attachments: logs attached
Launchpad user Ryan Harper(raharper) wrote on 2020-10-14T18:08:46.906002+00:00
Yes; you're quite right. Thanks for pointing that part out.
Launchpad user Ryan Harper(raharper) wrote on 2020-10-14T18:14:29.148260+00:00 @vijayendra The reason you're getting that behavior is that the ds-identify policy is to enable cloud-init if it does not find anything:
[up 12.54s] ds-identify
The default policy is notfound=disabled. Is something changing the default ds-identify policy?
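The effect of the `notfound` setting can be sketched as follows. The policy string format `search,found=all,maybe=all,notfound=disabled` mirrors ds-identify's default policy; the parser and the `cloud_init_runs` decision function are hypothetical simplifications (only the `enabled`, `disabled` and `search` modes are modelled), and `observed` is an assumed reconstruction of a policy that enables cloud-init when nothing is found, not a quote from the attached log. With the default, finding nothing disables cloud-init and the existing network scripts are left alone; with `notfound=enabled`, cloud-init still runs and can render the DHCP fallback.

```python
from typing import Dict


def parse_policy(policy: str) -> Dict[str, str]:
    """Split e.g. 'search,found=all,maybe=all,notfound=disabled' into its parts."""
    mode, *settings = policy.split(",")
    parsed = {"mode": mode}
    for item in settings:
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


def cloud_init_runs(policy: str, found_any: bool, maybe_any: bool) -> bool:
    """Simplified decision; only the enabled/disabled/search modes are modelled."""
    p = parse_policy(policy)
    if p["mode"] == "disabled":
        return False
    if p["mode"] == "enabled":
        return True
    if found_any:
        return True
    if maybe_any and p.get("maybe", "all") != "none":
        return True
    # Nothing usable found: the notfound setting decides.
    return p.get("notfound", "disabled") == "enabled"


if __name__ == "__main__":
    default = "search,found=all,maybe=all,notfound=disabled"
    observed = "search,found=all,maybe=all,notfound=enabled"  # hypothetical reconstruction
    print(cloud_init_runs(default, found_any=False, maybe_any=False))   # False
    print(cloud_init_runs(observed, found_any=False, maybe_any=False))  # True
```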
Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-10-19T13:02:10.740625+00:00 Following Ryan Harper's suggestion, with some config changes as below, we may avoid the fallback (DHCP) network reset once the config drive is removed. Change 1: Change 2: The above is only a short-term fix, since it will also break per-boot behavior once the ConfigDrive is removed. I am currently reworking PR #229 and testing it in our Power environment.
Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-11-03T04:44:38.634408+00:00 Attached the cloud-init log for PR #647.
Launchpad user Launchpad Janitor(janitor) wrote on 2021-01-03T04:17:16.453269+00:00 [Expired for cloud-init because there has been no activity for 60 days.]
This bug was originally filed in Launchpad as LP: #1893770
Launchpad details
Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-09-01T11:18:55.806587+00:00
Cloudinit version: 19.1
Platform: OpenStack based.
OS: RHEL, SUSE
We use the config drive (/dev/sr0) as a datasource to configure network interfaces in the guest, but the config drive is not always available and may be removed from the hypervisor after a couple of hours.
On first boot, cloud-init uses the data provided in the config drive, updates the system-level network scripts /etc/sysconfig/ifcfg-* (static network configuration), and also configures the interface in the guest.
As long as the config drive is available, reboots rely on the system scripts to configure the network. But once the config drive is removed, the datasource becomes None, meaning neither the system scripts nor the config drive are used,
which makes cloud-init configure the default network configuration, which is DHCP.