Cloudinit resets network scripts to default configuration DHCP once Config Drive is removed after sometime #3777

Open
ubuntu-server-builder opened this issue May 12, 2023 · 27 comments
Labels
bug Something isn't working correctly launchpad Migrated from Launchpad

Comments

@ubuntu-server-builder
Collaborator

This bug was originally filed in Launchpad as LP: #1893770

Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = None
date_created = 2020-09-01T11:18:55.806587+00:00
date_fix_committed = None
date_fix_released = None
id = 1893770
importance = medium
is_complete = False
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1893770
milestone = None
owner = vradhakrishna
owner_name = vijayendra radhakrishna
private = False
status = in_progress
submitter = vradhakrishna
submitter_name = vijayendra radhakrishna
tags = []
duplicates = []

Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-09-01T11:18:55.806587+00:00

Cloudinit version: 19.1
Platform: OpenStack based.
OS: RHEL, SUSE

We use a config drive (/dev/sr0) as the datasource to configure network interfaces in the guest, but the config drive is not always available and may be removed by the hypervisor after a couple of hours.

On first boot, cloud-init uses the data provided on the config drive to write the system-level network scripts /etc/sysconfig/ifcfg-* (static network configuration) and also configures the interfaces in the guest.

As long as the config drive is available, reboots rely on the system scripts to configure the network. Once the config drive is removed, the datasource becomes None, meaning neither the system scripts nor the config drive are used, which makes cloud-init configure the default network, which is DHCP.

@ubuntu-server-builder ubuntu-server-builder added bug Something isn't working correctly launchpad Migrated from Launchpad labels May 12, 2023
@ubuntu-server-builder

Launchpad user Ryan Harper(raharper) wrote on 2020-09-01T14:51:12.678196+00:00

Hello,
Thanks for filing a bug.

The config-drive provided to an instance includes metadata that provides an instance-id. If the config-drive is removed, cloud-init should no longer be active during subsequent boots. It sounds like cloud-init is not installed correctly.

Are you using official cloud-images from RHEL and SUSE? Have you manually enabled the cloud-init service files? Cloud-init will disable itself if no datasources are detected.

Can you run cloud-init collect-logs and attach the generated tarball?

@ubuntu-server-builder

Launchpad user Ryan Harper(raharper) wrote on 2020-09-01T14:51:20.381799+00:00

@ubuntu-server-builder

Launchpad user Divya K Konoor(dikonoor) wrote on 2020-09-09T16:10:11.990832+00:00

The cloud-init used here is not from RHEL or SUSE. As a next step, we will check whether this behavior can be reproduced with the cloud-init shipped with RHEL or SLES.

@ubuntu-server-builder

Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-09-10T07:56:53.869899+00:00

We have tested the mentioned behavior with cloud-init 19.4 (the RHEL-provided cloud-init) on a RHEL 8.2 guest VM.

We observe the same behavior as described. The cloud-init log is attached for your reference.

Please let me know if any other details are required.

Thanks.
Launchpad attachments: cloudinit log

@ubuntu-server-builder

Launchpad user Scott Moser(smoser) wrote on 2020-09-10T12:32:07.941924+00:00

I'm pretty sure cloud-init is working as designed here.
Please see

for more information.

@ubuntu-server-builder

Launchpad user Divya K Konoor(dikonoor) wrote on 2020-09-11T10:02:38.919104+00:00

While we look at the links provided by Scott above
(https://bugs.launchpad.net/cloud-init/+bug/1712680, referenced from one of those links, has a lot of data), the downstream patch we currently use to get past this issue returns early from the function below if the datasource is found to be None. This has worked for us so far.

at
https://github.com/canonical/cloud-init/blob/stable-19.4/cloudinit/stages.py#L678

def apply_network_config(self, bring_up):
    # Downstream patch: return early when no real datasource was found,
    # so the existing network configuration is left untouched.
    if self.datasource is NULL_DATA_SOURCE or self.datasource is None:
        LOG.info("Data source is None. Skipping network config")
        return

    if self.datasource:
        try:
            # Compare dsname by value; "is" against a string literal only
            # tests object identity and is not a reliable equality check.
            if self.datasource.dsname in ("None", None):
                LOG.info(
                    "Data source is an instance of DataSourceNone. "
                    "Skipping network config")
                return
        except AttributeError:
            # The datasource has no dsname attribute: fall back to a
            # type check against the DataSourceNone class.
            if isinstance(self.datasource, dsnone.DataSourceNone):
                LOG.info(
                    "Data source is an instance of DataSourceNone. "
                    "Skipping network config")
                return

@ubuntu-server-builder

Launchpad user Eduardo Otubo(otubo) wrote on 2020-09-15T10:11:29.912740+00:00

From the RHEL and OpenStack point of view, we never remove the config-drive, and that's the reason we never faced this issue. Is keeping the config-drive attached an option for you, Divya?

Also, if you could paste the code as a diff it would be much easier to review, or even better a GitHub link with the colored diff.

@ubuntu-server-builder

Launchpad user Eduardo Otubo(otubo) wrote on 2020-09-17T12:19:10.603790+00:00

I just found other similar bugs related to this behavior, and this is fixed downstream only. The bug wasn't exactly in cloud-init; it looks like we were not including ds-identify in the rpm, which caused the issue. Please give it a try on any RHEL-shipped rpm >= cloud-init-18.5-5.*

@ubuntu-server-builder

Launchpad user Divya K Konoor(dikonoor) wrote on 2020-09-17T12:38:19.570764+00:00

Eduardo, ok. I think we have tried it with the RHEL cloud-init 8.5 that was shipped with RHEL8 and could reproduce it, but we will try again and get back.

We did take a look at why we need to remove the datasource, and for reasons specific to our platform/environment we do need to remove it. I believe cloud-init should accommodate both cases, with and without a datasource. In a case there is no datasource, cloud-init should not go and reset NW to dhcp.

@ubuntu-server-builder

Launchpad user Ryan Harper(raharper) wrote on 2020-09-17T14:50:27.833432+00:00

> In a case there is no datasource, cloud-init should not go and reset NW to dhcp.

If there is no datasource, cloud-init does not run. If you've removed your datasource, cloud-init should no longer activate during boot. In your case, you've removed your datasource so cloud-init attempts to look for one. On Power architecture, cloud-init cannot detect if it's running on Openstack without trying to contact the metadata service due to limitations in the Nova/PowerVM.

https://cloudinit.readthedocs.io/en/latest/topics/datasources/openstack.html

cloud-init then attempts to contact OpenStack over the network. It does this by making a best guess at which network interface to bring up, running DHCP on it, and attempting to contact the metadata service.

Command: ['/var/tmp/cloud-init/cloud-init-dhcp-2zup00qk/dhclient', '-1', '-v', '-lf', '/var/tmp/cloud-init/cloud-init-dhcp-2zup00qk/dhcp.leases', '-pf', '/var/tmp/cloud-init/cloud-init-dhcp-2zup00qk/dhclient.pid', 'env32', '-sf', '/bin/true']
Exit code: 2
Reason: -
Stdout:
Stderr: Internet Systems Consortium DHCP Client 4.3.6
Copyright 2004-2017 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

    Listening on LPF/env32/fa:c9:e6:d2:01:20
    Sending on   LPF/env32/fa:c9:e6:d2:01:20
    Sending on   Socket/fallback
    Created duid "\000\004\220\007\237\263\032[K\023\261\364}\206\256\275Jc".
    DHCPDISCOVER on env32 to 255.255.255.255 port 67 interval 4 (xid=0xde2eb608)
    DHCPDISCOVER on env32 to 255.255.255.255 port 67 interval 9 (xid=0xde2eb608)
    DHCPDISCOVER on env32 to 255.255.255.255 port 67 interval 18 (xid=0xde2eb608)
    DHCPDISCOVER on env32 to 255.255.255.255 port 67 interval 19 (xid=0xde2eb608)
    DHCPDISCOVER on env32 to 255.255.255.255 port 67 interval 11 (xid=0xde2eb608)
    No DHCPOFFERS received.
    Unable to obtain a lease on first try.  Exiting.

Since cloud-init cannot reach the OpenStack metadata service, it does not have an instance-id, so it assumes this boot is a first boot. On first boot, without network configuration from OpenStack, cloud-init renders a fallback network config (DHCP on a best-guess interface), which is why you see a change in network configuration.

The core issues are:

  1. OpenStack Nova on Power does not have a way to indicate to the guest that it's running on Openstack; on x86 and arm, this is done via smbios/dmi tables; there is not yet an implementation for Power

  2. When you remove your provided datasource (/dev/srX) you've removed the metadata that indicated cloud-init has already been booted.

As Scott mentioned, to address (2) you can look at configuring the manual_cache_clean option, which configures cloud-init to keep existing metadata in place until the user manually cleans out cloud-init metadata.
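
To make the difference between the default behavior and manual_cache_clean concrete, here is a minimal Python sketch of the first-boot decision as described in this thread. It is a simplified model, not cloud-init's actual code: the function name `is_first_boot` and its parameters are hypothetical and exist only for illustration.

```python
from typing import Optional


def is_first_boot(cached_instance_id: Optional[str],
                  current_instance_id: Optional[str],
                  manual_cache_clean: bool = False) -> bool:
    """Simplified model of how a boot is classified (hypothetical helper)."""
    if cached_instance_id is None:
        # Nothing was cached by an earlier boot: genuinely a first boot.
        return True
    if manual_cache_clean:
        # Cached metadata is kept until the user cleans it out manually.
        return False
    # Default: a missing or different instance-id looks like a new instance.
    return current_instance_id != cached_instance_id
```

With the config drive removed, the current instance-id is unavailable (`None`), so the default path reports a first boot, while the manual_cache_clean path does not.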

@ubuntu-server-builder

Launchpad user Eduardo Otubo(otubo) wrote on 2020-10-01T14:18:23.795388+00:00

@ryan, I have a question about:

> If there is no datasource, cloud-init does not run. If you've removed your datasource, cloud-init

If cloud-init doesn't run, it shouldn't look for datasources, is that correct?

> should no longer activate during boot. In your case, you've removed your datasource so cloud-init
> attempts to look for one. On Power architecture, cloud-init cannot detect if it's running on
> Openstack without trying to contact the metadata service due to limitations in the Nova/PowerVM.

@ubuntu-server-builder

Launchpad user Ryan Harper(raharper) wrote on 2020-10-01T15:00:22.726801+00:00

@eduardo

> @ryan, I have a question about:
>
> > If there is no datasource, cloud-init does not run. If you've removed your datasource, cloud-init
>
> If cloud-init doesn't run, it shouldn't look for datasources, is that correct?

cloud-init won't run if ds-identify does not detect a datasource.

On each boot cloud-init's systemd generator calls ds-identify. This program runs to determine whether cloud-init should run or not. The ds-identify default policy is to search, and report that cloud-init should run if:

  1. it finds a datasource
  2. if there might be a datasource

For (1); cloud-init examines specific values on the system, files in a
directory, values in system UUID, etc... these types of checks are binary;
we either have the correct value or we don't.

For (2); in some cases, ds-identify cannot be 100% sure the datasource isn't
present because the platform on which we're running does not provide us with
the needed data.

For POWER systems which do not export DMI values, the OpenStack Datasource
will always return maybe as the only way to know is to bring up networking and
query the OpenStack metadata service.

Contrast this with x86 platform where OpenStack VMs export values in the DMI
table. ds-identify can check the value of the dmi product name and know for
sure whether it's running on OpenStack or not.

ds-identify then writes out its conclusions in /run/cloud-init/cloud.cfg
and enables the cloud-init.target which will run the 4 stages of cloud-init.

For this bug, on first boot, /dev/sr1 included an OpenStack ConfigDrive, so
ds-identify reports:

cat /run/cloud-init/cloud.cfg

datasource_list: [ ConfigDrive, None ]

The ConfigDrive payload includes a network configuration, which was applied.
After some time, the ISO in /dev/sr1 was removed and the node in question
was rebooted. On the next boot, ds-identify runs and does not find
ConfigDrive in /dev/sr1. When checking for OpenStack datasources, ds-identify
reports 'maybe' since the arch is not x86. This results in the
following cloud.cfg

cat /run/cloud-init/cloud.cfg

datasource_list: [ OpenStack, None ]

Now the cloud-init OpenStack datasource starts, and because the system currently
lacks the original ConfigDrive datasource, including the instance-id, this
looks like a brand new boot. cloud-init then brings up ephemeral
DHCP networking and attempts to see if there is an OpenStack metadata server
on the network (there is not, see the logs). It then proceeds to use
DataSourceNone, which is really a fallback datasource that tries its best
to be useful but ultimately isn't what folks really need.
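
The found/maybe logic described above can be modeled in a few lines of Python. This is only an illustrative sketch under stated assumptions: the real ds-identify is a shell script with many more checks, and the names `Check`, `check_config_drive`, `check_openstack` and `identify` are hypothetical. It models search mode with the default notfound=disabled policy.

```python
from enum import Enum


class Check(Enum):
    FOUND = "found"
    MAYBE = "maybe"
    NOT_FOUND = "notfound"


def check_config_drive(iso_attached: bool) -> Check:
    # A config drive is a local device: it is either present or it is not.
    return Check.FOUND if iso_attached else Check.NOT_FOUND


def check_openstack(has_dmi: bool, dmi_says_openstack: bool) -> Check:
    # With DMI data the answer is definite. Without it (e.g. on POWER) the
    # only way to know is to query the metadata service, hence "maybe".
    if not has_dmi:
        return Check.MAYBE
    return Check.FOUND if dmi_says_openstack else Check.NOT_FOUND


def identify(iso_attached: bool, has_dmi: bool,
             dmi_says_openstack: bool = False):
    """Return the reported datasource_list, or None if cloud-init stays off."""
    results = {
        "ConfigDrive": check_config_drive(iso_attached),
        "OpenStack": check_openstack(has_dmi, dmi_says_openstack),
    }
    found = [name for name, r in results.items() if r is Check.FOUND]
    maybe = [name for name, r in results.items() if r is Check.MAYBE]
    chosen = found or maybe  # definite hits take precedence over "maybe"
    return chosen + ["None"] if chosen else None
```

In this model, `identify(True, False)` reproduces the first-boot list [ConfigDrive, None] on POWER, and `identify(False, False)` reproduces [OpenStack, None] after the ISO is removed. On a platform with DMI data that does not indicate OpenStack, the result is None and cloud-init stays disabled.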

@ubuntu-server-builder

Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-10-09T09:27:14.768702+00:00

I have explored the suggested manual_cache_clean: true option a bit. Here are some of the findings.

  1. Some capture use cases break, because this requires manual cleanup on the VMs, as per the documentation here:
    https://cloudinit.readthedocs.io/en/latest/topics/boot.html#not-present

  2. With a very large number of VMs, doing the manual cleanup for every capture use case becomes an overhead.

  3. It also raises security concerns: SSH key rotation doesn't happen for captures where the cleanup has not been performed.

@ubuntu-server-builder

Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-10-12T13:32:10.136458+00:00

If I understand the explanation above correctly, removing the datasource (/dev/sr0) and rebooting should leave us with the cloud.cfg below, with OpenStack set, right? That doesn't seem to be the case here.

I am seeing that cloud.cfg is intact with [ ConfigDrive, None ]. With this, in my opinion, dscheck_OpenStack() shouldn't be called at all. When there is no datasource, cloud-init should disable itself, so why do we see the fallback DHCP configuration here?

cat /run/cloud-init/cloud.cfg

datasource_list: [ OpenStack, None ]

@ubuntu-server-builder

Launchpad user Ryan Harper(raharper) wrote on 2020-10-12T14:49:40.155053+00:00

> If I understand the explanation above correctly, removing the datasource
> (/dev/sr0) and rebooting should leave us with the cloud.cfg below, with
> OpenStack set, right? That doesn't seem to be the case here.
>
> I am seeing that cloud.cfg is intact with [ ConfigDrive, None ].

You're saying you've observed the above where you remove the ISO, reboot the
node, and after the reboot, the contents of /run/cloud-init/cloud.cfg show
ConfigDrive?

Can you provide the contents of /run/cloud-init/* and /var/log/cloud-init.log
for this scenario?

Could it be either the ISO was not removed, or this may have come from one of the
instances where you were testing manual_cache_clean?

> With this, in my opinion, dscheck_OpenStack() shouldn't be called at all.

ds-identify is called every boot. The config files in /run/cloud-init are
ephemeral and are thrown away each boot (/run is a tmpfs mount).

> When there is no datasource, cloud-init should disable itself, so why do we
> see the fallback DHCP configuration here?

Recall that since you're not on x86, the OpenStack check cannot return False
as the only way to know for sure that there isn't an OpenStack metadata
service on the network is to try it (or have the image configured to not
check OpenStack)

@ubuntu-server-builder

Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-10-13T04:17:44.438351+00:00

@ryan,

Yes, the above is when we explicitly specify ConfigDrive as the datasource in /etc/cloud/cloud.cfg,
datasource_list: [ ConfigDrive, None ]. In this case we also observe cloud-init setting the fallback network in /etc/sysconfig/network-scripts/ifcfg-* . Any reason for this?

Attached are the cloud-init collect-logs
Launchpad attachments: cloudinit-log when we explicitly specify ConfigDrive is the datasource

@ubuntu-server-builder

Launchpad user Ryan Harper(raharper) wrote on 2020-10-13T15:52:12.702667+00:00

> Yes, the above is when we explicitly specify ConfigDrive as the datasource in
> /etc/cloud/cloud.cfg, datasource_list: [ ConfigDrive, None ]. In this
> case we also observe cloud-init setting the fallback network in
> /etc/sysconfig/network-scripts/ifcfg-* . Any reason for this?

Yes; when you remove the ConfigDrive, cloud-init no longer knows it has a
datasource, even though you told cloud-init via the hard-coded
/etc/cloud/cloud.cfg file that there would be a ConfigDrive. Without the
ISO attached to the instance, cloud-init cannot tell it is booting into the
same instance as it was last time.

When booting without the ISO, cloud-init tries to find it but fails, and
without the datasource it continues to boot, trying to do something useful for
the later stages of boot. In the absence of a datasource to provide
cloud-init with the network configuration, cloud-init uses its fallback
network config, which is to DHCP on an interface.

From your logs (thanks!),

All of the boots where /dev/sr0 has the Config drive, we see cloud-init search
for a ConfigDrive and find it on /dev/sr0, like so

2020-05-07 06:41:11,105 - util.py[DEBUG]: Cloud-init v. 19.1 running 'init-local' at Thu, 07 May 2020 06:41:11 +0000. Up 3356.48 seconds.
...
2020-05-07 06:41:11,159 - init.py[DEBUG]: Looking for data source in: ['ConfigDrive', 'None'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM']
2020-05-07 06:41:11,162 - init.py[DEBUG]: Searching for local data source in: ['DataSourceConfigDrive']
2020-05-07 06:41:11,163 - handlers.py[DEBUG]: start: init-local/search-ConfigDrive: searching for local data from DataSourceConfigDrive
2020-05-07 06:41:11,163 - init.py[DEBUG]: Seeing if we can get any data from <class 'cloudinit.sources.DataSourceConfigDrive.DataSourceConfigDrive'>
2020-05-07 06:41:11,163 - init.py[DEBUG]: Update datasource metadata and network config due to events: New instance first boot
2020-05-07 06:41:11,163 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/sr0'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-05-07 06:41:11,174 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/sr1'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-05-07 06:41:11,177 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/cd0'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-05-07 06:41:11,180 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/cd1'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-05-07 06:41:11,183 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=vfat', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-05-07 06:41:11,199 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=iso9660', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-05-07 06:41:11,208 - util.py[DEBUG]: Running command ['blkid', '-tLABEL=config-2', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-05-07 06:41:11,217 - util.py[DEBUG]: Running command ['blkid', '-tLABEL=CONFIG-2', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-05-07 06:41:11,225 - DataSourceConfigDrive.py[DEBUG]: devices=['/dev/sr0'] dslist=['ConfigDrive', 'None']
...
2020-05-07 06:41:11,231 - util.py[DEBUG]: Running command ['mount', '-o', 'ro', '-t', 'auto', '/dev/sr0', '/run/cloud-init/tmp/tmpqpukn1me'] with allowed return codes [0] (shell=False, capture=True)
2020-05-07 06:41:11,252 - openstack.py[DEBUG]: Selected version '2018-08-27' from ['2012-08-10', '2013-04-04', '2013-10-17', '2015-10-15', '2016-06-30', '2016-10-06', '2017-02-22', '2018-08-27', 'content', 'latest']
2020-05-07 06:41:11,342 - handlers.py[DEBUG]: finish: init-local/search-ConfigDrive: SUCCESS: found local data from DataSourceConfigDrive

On boots where /dev/sr0 does not contain a ConfigDrive ISO, cloud-init does
not find a ConfigDrive datasource since the input is not present.
cloud-init then proceeds to see if there may be a network datasource:

2020-10-13 03:30:14,074 - util.py[DEBUG]: Cloud-init v. 19.4 running 'init-local' at Tue, 13 Oct 2020 03:30:13 +0000. Up 27.25 seconds.
2020-10-13 03:30:14,261 - init.py[DEBUG]: Looking for data source in: ['ConfigDrive', 'None'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM']
2020-10-13 03:30:14,266 - init.py[DEBUG]: Searching for local data source in: ['DataSourceConfigDrive']
2020-10-13 03:30:14,266 - handlers.py[DEBUG]: start: init-local/search-ConfigDrive: searching for local data from DataSourceConfigDrive
2020-10-13 03:30:14,267 - init.py[DEBUG]: Seeing if we can get any data from <class 'cloudinit.sources.DataSourceConfigDrive.DataSourceConfigDrive'>
2020-10-13 03:30:14,267 - init.py[DEBUG]: Update datasource metadata and network config due to events: New instance first boot
2020-10-13 03:30:14,268 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/sr0'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-10-13 03:30:14,280 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/sr1'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-10-13 03:30:14,287 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/cd0'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-10-13 03:30:14,295 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/cd1'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-10-13 03:30:14,302 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=vfat', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-10-13 03:30:14,330 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=iso9660', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-10-13 03:30:14,361 - util.py[DEBUG]: Running command ['blkid', '-tLABEL=config-2', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-10-13 03:30:14,384 - util.py[DEBUG]: Running command ['blkid', '-tLABEL=CONFIG-2', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-10-13 03:30:14,403 - DataSourceConfigDrive.py[DEBUG]: devices=[] dslist=['ConfigDrive', 'None']
2020-10-13 03:30:14,403 - init.py[DEBUG]: Datasource DataSourceConfigDrive [net,ver=None][source=None] not updated for events: New instance first boot
2020-10-13 03:30:14,404 - handlers.py[DEBUG]: finish: init-local/search-ConfigDrive: SUCCESS: no local data found from DataSourceConfigDrive
2020-10-13 03:30:14,405 - main.py[DEBUG]: No local datasource found
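
When comparing boots like the two above, the quickest signal is the DataSourceConfigDrive summary line: `devices=['/dev/sr0']` when the ISO is attached, `devices=[]` when it is not. The following Python sketch pulls those lists out of a log. It assumes the log line format shown in this thread; the helper name `config_drive_devices` is hypothetical.

```python
import ast
import re

# Matches the summary line DataSourceConfigDrive logs after probing devices,
# in the format shown in the excerpts above.
_DEVICES_RE = re.compile(
    r"DataSourceConfigDrive\.py\[DEBUG\]: devices=(\[.*?\]) dslist="
)


def config_drive_devices(log_text: str) -> list:
    """Return one device list per matching log line, in log order."""
    return [ast.literal_eval(m.group(1))
            for m in _DEVICES_RE.finditer(log_text)]
```

An empty inner list means that boot probed for a config drive and found no candidate device.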

As I suggested in your PR, I think it is worth re-submitting it and addressing the feedback
on the idea of a FallbackDatasource; i.e., if you don't find the configured
datasource and there's a previous instance-id present on the system, use
that instead.

#229
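
The FallbackDatasource idea can be sketched roughly as follows. This is not the code from the PR, only an illustration of the condition described above. It assumes the previous instance-id is recorded in a `data/instance-id` file under the cloud-init state directory (typically /var/lib/cloud); the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def previous_instance_id(cloud_dir: Path) -> Optional[str]:
    """Read the instance-id recorded by an earlier boot, if any."""
    marker = cloud_dir / "data" / "instance-id"  # assumed location
    try:
        value = marker.read_text().strip()
    except FileNotFoundError:
        return None
    return value or None


def should_reuse_previous_config(datasource_found: bool,
                                 cloud_dir: Path) -> bool:
    # Fall back only when the configured datasource is gone *and* an
    # earlier boot left an instance-id behind.
    return (not datasource_found
            and previous_instance_id(cloud_dir) is not None)
```

A genuinely new instance with no recorded instance-id would still go through the normal first-boot path in this model.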

@ubuntu-server-builder

Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-10-14T12:26:45.330647+00:00

@ryan,

Although I still need to take a look at your suggestion about PR 229, here are a few more findings with the ds-identify tool. Let me know if a PR against the ds-identify tool itself could be accepted.

Currently ds-identify returns DS_FOUND on subsequent boots even though the config drive (/dev/sr0) is removed. I believe this shouldn't happen (let me know if you think otherwise). Here I am trying to fix this particular behavior for Power hardware only. Let's say we do something like this in ds-identify:

  1. Detect that it is Power hardware and the hypervisor is PowerVM
  2. Check whether /dev/sr0 (config drive) is mountable; if not, return DS_NOTFOUND

With this, I hope cloud-init will not configure the fallback (DHCP) on PowerVM.

Thanks

@ubuntu-server-builder

Launchpad user Ryan Harper(raharper) wrote on 2020-10-14T14:49:16.425838+00:00

> Although I need to take a look at your suggestion about PR
> 229: Here is few more findings with ds-identify tool, Let me know if
> this can be accepted if I raise a PR against ds-identify tool
> itself.
>
> Currently ds-identify returns DS_FOUND on subsequent boot even
> though config drive (/dev/sr0) is removed. I believe this
> shouldn't happen (let me know if you think otherwise), Here I

ds-identify will already do this if you remove your hardcoded
datasource list from /etc/cloud/cloud.cfg.

Note that by default (on Ubuntu at least), the datasource_list
is populated with all potential datasources:

root@g1:# grep datasource_list /etc/cloud/cloud.cfg
root@g1:# cat /etc/cloud/cloud.cfg.d/90_dpkg.cfg
# to update this file, run dpkg-reconfigure cloud-init
datasource_list: [ NoCloud, ConfigDrive, OpenNebula, DigitalOcean, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, SmartOS, Bigstep, Scaleway, AliYun, Ec2, CloudStack, Hetzner, IBMCloud, Oracle, Exoscale, RbxCloud, None ]

When ds-identify runs, it reads /etc/cloud/cloud.cfg and
/etc/cloud/cloud.cfg.d/*.cfg looking for the value of
datasource_list.

By default, this is, as you see, a long list. For each of these
potential datasources, ds-identify attempts to determine if
the datasource is present.

In your case where you've provided a datasource_list value
with one datasource (ds-identify ignores None) then
it does not do any detection at all; the image has "told"
ds-identify which datasource to use.

Look at your ds-identify.log file you provided:

[up 17.85s] ds-identify
policy loaded: mode=search report=false found=all maybe=all notfound=enabled
/etc/cloud/cloud.cfg set datasource_list: [ ConfigDrive, None ]
WARN: No dmidecode program. Cannot read sys_vendor.
WARN: No dmidecode program. Cannot read chassis_asset_tag.
WARN: No dmidecode program. Cannot read product_name.
WARN: No dmidecode program. Cannot read product_serial.
WARN: No dmidecode program. Cannot read product_uuid.
DMI_PRODUCT_NAME=error
DMI_SYS_VENDOR=error
DMI_PRODUCT_SERIAL=error
DMI_PRODUCT_UUID=error
PID_1_PRODUCT_NAME=unavailable
DMI_CHASSIS_ASSET_TAG=error
FS_LABELS=
ISO9660_DEVS=
KERNEL_CMDLINE=BOOT_IMAGE=/vmlinuz-4.18.0-193.el8.ppc64le root=/dev/mapper/rhel_p8--9--vios1-root ro crashkernel=auto rd.lvm.lv=rhel_p8-9-vios1/root rd.lvm.lv=rhel_p8-9-vios1/swap biosdevname=0
VIRT=none
UNAME_KERNEL_NAME=Linux
UNAME_KERNEL_RELEASE=4.18.0-193.el8.ppc64le
UNAME_KERNEL_VERSION=#1 SMP Fri Mar 27 14:40:12 UTC 2020
UNAME_MACHINE=ppc64le
UNAME_NODENAME=vijayendra10.pok.stglabs.ibm.com
UNAME_OPERATING_SYSTEM=GNU/Linux
DSNAME=
DSLIST=ConfigDrive None
MODE=search
ON_FOUND=all
ON_MAYBE=all
ON_NOTFOUND=enabled
pid=810 ppid=787
is_container=false
single entry in datasource_list (ConfigDrive None) use that.
[up 18.17s] returning 0

> am trying to fix this particular behavior for power hardware
> only. Lets say if do something like this in ds-identify
>
>   1. Detect its power hardware and hypervisor is powerVM
>   2. check if /dev/sr0(configdrive) is mountable or not if not
>     return DS_NOTFOUND
>
> with this I hope cloudinit will not configure fallback(dhcp) on
> powerVM

The current behavior for single-datasource could be changed to
check if the single datasource is present.

I think this would address your case. As a quick test,
if you update your datasource_list to include one more datasource
that you know isn't present, like NoCloud, then ds-identify won't
exit early and will attempt to see if ConfigDrive or NoCloud are
present; they won't be, and cloud-init would stay disabled.

I think such a PR is worth discussing.
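
The single-entry shortcut discussed above can be sketched as follows. This is a simplified Python model of the behavior described in this comment and visible in the ds-identify.log excerpt ("single entry in datasource_list ... use that"), not the actual shell implementation; the function name is hypothetical.

```python
def datasources_to_probe(datasource_list):
    """Return (probe_needed, candidates) for a configured datasource_list."""
    # "None" is ignored when counting configured datasources.
    candidates = [ds for ds in datasource_list if ds != "None"]
    if len(candidates) == 1:
        # A single configured datasource is taken at face value:
        # no detection is performed.
        return False, candidates
    return True, candidates
```

With `["ConfigDrive", "None"]` no probing happens, which is why the removed ISO goes unnoticed at this stage; adding a second entry such as NoCloud makes each candidate get checked.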

@ubuntu-server-builder

Launchpad user Scott Moser(smoser) wrote on 2020-10-14T15:10:59.321734+00:00

> The current behavior for single-datasource could be changed to
> check if the single datasource is present.
>
> I think this would address your case. As a quick test,
> if you update your datasource_list to include one more datasource
> that you know isn't present, like NoCloud, then ds-identify won't
> exit early and will attempt to see if ConfigDrive or NoCloud are
> present; they won't be, and cloud-init would stay disabled.

It would address this use case, but would break per-boot behavior,
as cloud-init would be disabled.... it would not run any per-boot
functionality.

@ubuntu-server-builder

Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-10-14T17:08:00.965069+00:00

@ryan, I tried your suggestion on adding one more datasource NoCloud along with ConfigDrive but I still ended up with cloudinit resetting to fallback(dhcp).

JFYI, I am running on RHEL env.

Launchpad attachments: logs attached

@ubuntu-server-builder

Launchpad user Ryan Harper(raharper) wrote on 2020-10-14T18:08:46.906002+00:00

> > The current behavior for single-datasource could be changed to
> > check if the single datasource is present.
> >
> > I think this would address your case. As a quick test,
> > if you update your datasource_list to include one more datasource
> > that you know isn't present, like NoCloud, then ds-identify won't
> > exit early and will attempt to see if ConfigDrive or NoCloud are
> > present; they won't be, and cloud-init would stay disabled.
>
> It would address this use case, but would break per-boot behavior,
> as cloud-init would be disabled.... it would not run any per-boot
> functionality.

Yes; you're quite right. Thanks for pointing that part out.

@ubuntu-server-builder

Launchpad user Ryan Harper(raharper) wrote on 2020-10-14T18:14:29.148260+00:00

@vijayendra

The reason you're getting that behavior is that the ds-identify policy is to enable cloud-init if it does not find anything:

[up 12.54s] ds-identify
policy loaded: mode=search report=false found=all maybe=all notfound=enable
...
No ds found [mode=search, notfound=enabled]. Enabled cloud-init [0]
[up 12.78s] returning 0

The default policy is notfound=disabled. Is something changing the default ds-identify policy?
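
For checking what a given policy string means, a small Python sketch like the one below may help. It only splits the comma-separated `key=value` format seen in the log lines and config in this thread; it is not how ds-identify itself parses the policy, and both function names are hypothetical.

```python
def parse_policy(policy: str) -> dict:
    """Split a ds-identify style policy string into its parts."""
    mode, *pairs = [part.strip() for part in policy.split(",")]
    parsed = {"mode": mode}
    for pair in pairs:
        key, _, value = pair.partition("=")
        parsed[key] = value
    return parsed


def runs_when_nothing_found(policy: str) -> bool:
    # With notfound=enabled, cloud-init is enabled even if no datasource
    # was detected, which leads to the fallback behavior in this bug.
    return parse_policy(policy).get("notfound") == "enabled"
```

For example, `runs_when_nothing_found("search,found=all,maybe=all,notfound=disabled")` is false, so in that configuration cloud-init would stay disabled when nothing is detected.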

@ubuntu-server-builder

Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-10-19T13:02:10.740625+00:00

As per Ryan Harper's suggestion, with the config changes below we may not hit the fallback (DHCP) network reset once the config drive is removed.

Change 1:
Disable cloud-init when no datasource is found:
config file: /etc/cloud/ds-identify.cfg
policy: search,found=all,maybe=all,notfound=disabled

Change 2:
config file: /etc/cloud/cloud.cfg
We may also have to add one more non-existent datasource, like NoCloud, to avoid the early exit:
datasource_list: [ ConfigDrive, NoCloud, None ]

The above is just a short-term workaround, as it will also break per-boot functionality once the config drive is removed.

I am currently reworking PR #229 and testing it in our Power environment.

@ubuntu-server-builder

Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-11-02T05:53:48.931585+00:00

@ryan Harper

Created below PR as suggested.
#647

@ubuntu-server-builder

Launchpad user vijayendra radhakrishna(vradhakrishna) wrote on 2020-11-03T04:44:38.634408+00:00

attached cloud-init log for PR: #647
Launchpad attachments: attached cloud-init log for PR: #647

@ubuntu-server-builder

Launchpad user Launchpad Janitor(janitor) wrote on 2021-01-03T04:17:16.453269+00:00

[Expired for cloud-init because there has been no activity for 60 days.]
