On FreeBSD 13 password is not set after initial reboot with "set-passwords, always" on CloudStack Datasource #4231

Closed
loth opened this issue Jul 7, 2023 · 11 comments
Labels
bug Something isn't working correctly

Comments

@loth

loth commented Jul 7, 2023

Bug report

If "set-passwords, always" is configured, it is expected that cloud-init will retrieve the new password from the metadata server on boot and set it. However, it seems to set the same password again without retrieving the new one.

Steps to reproduce the problem

  1. Start a FreeBSD 13 instance with cloud-init
  2. Configure cloud-init to use the "root" user and " - [ set-passwords, always ]"
  3. After the initial boot, test and save the password
  4. Stop the instance in CloudStack
  5. Reset the password in CloudStack
  6. Start the instance
  7. The new password given by CloudStack will not work

Environment details

  • Cloud-init version: 23.1.2
  • Operating System Distribution: FreeBSD 13.2
  • Cloud provider, platform or installer type: CloudStack

cloud-init logs

cloud-init.tar.gz

@loth loth added the bug and new labels Jul 7, 2023
@TheRealFalcon
Member

it is expected that cloud-init will retrieve the new password from the metadata server on boot and set it

This is not how the module is designed to work, so I'm not sure where this expectation came from. This module only sets passwords statically and doesn't involve the IMDS. Documentation for the module is available at https://cloudinit.readthedocs.io/en/latest/reference/modules.html#set-passwords .

Given this, I'm going to close this, but feel free to re-open or comment if I have misunderstood something.

@TheRealFalcon TheRealFalcon closed this as not planned Jul 10, 2023
@TheRealFalcon TheRealFalcon added the invalid label and removed the bug and new labels Jul 10, 2023
@loth
Author

loth commented Jul 10, 2023

This is a bit confusing for me, since the CloudStack documentation directly refers to this module setting the passwords on every boot: https://docs.cloudstack.apache.org/en/latest/adminguide/templates/_cloud_init.html#linux-with-cloud-init

Also, it clearly sets the password on the first boot, so something in the CloudStack data source is taking the password from metadata.

@TheRealFalcon TheRealFalcon reopened this Jul 10, 2023
@TheRealFalcon
Member

@loth, sorry for the misunderstanding and thanks for the additional info. It looks like this is a feature specific to CloudStack and implemented entirely within the CloudStack datasource.

Does it work if you change set-passwords to set_passwords in /etc/cloud/cloud.cfg? There was a recent change to make all modules use _ rather than -.

If that doesn't work, please attach /var/log/cloud-init.log and/or the output from cloud-init collect-logs to help debugging.
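
The rename suggested above can be illustrated with a minimal sketch. It assumes a module entry is either a bare name or a [name, frequency] list, as in "[ set-passwords, always ]". The helper name normalize_module_entry is hypothetical and is not cloud-init's own parser.

```python
from typing import List, Optional, Tuple, Union

ModuleEntry = Union[str, List[str]]


def normalize_module_entry(entry: ModuleEntry) -> Tuple[str, Optional[str]]:
    """Return (module_name, frequency) with dashes replaced by underscores.

    Hypothetical helper for illustration only; not part of cloud-init.
    """
    if isinstance(entry, str):
        name, freq = entry, None
    else:
        name = entry[0]
        freq = entry[1] if len(entry) > 1 else None
    return name.strip().replace("-", "_"), freq


# Both spellings map to the same module name.
print(normalize_module_entry(["set-passwords", "always"]))
print(normalize_module_entry(["set_passwords", "always"]))
```

Under this assumption, the old and new spellings resolve to the same name, so the rename alone would not be expected to change which password is applied.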

@loth
Author

loth commented Jul 10, 2023

Logs are already in the OP, but I will try your suggestion

@loth
Author

loth commented Jul 10, 2023

Booted into my existing template with [ set-passwords, always ], logged in with the generated password, and changed the entry to [ set_passwords, always ]. Shut down the VM, issued a password reset, and started the VM. It still had the original root password. Log attached.
cloud-init.log

@blackboxsw
Collaborator

@loth in the updated cloud-init logs I see:
2023-07-10 16:32:34,094 - cc_set_passwords.py[DEBUG]: Changing password for ['root']: every boot
But I only see one external request to the send_my_password URL, on first boot:
2023-07-10 16:30:10,213 - subp.py[DEBUG]: Running command ['wget', '--quiet', '--tries', '3', '--timeout', '20', '--output-document', '-', '--header', 'DomU_Request: send_my_password', '10.135.100.1:8080'] with allowed return codes [0] (shell=False, capture=True)
So yes, I presume that cloud-init is just using the cached original password on each boot and keeps resetting it to the same original value.
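
For reference, the logged password request can be reconstructed as an argument list. This is a minimal sketch: the function name build_password_request_cmd and the default port are illustrative assumptions copied from the log line above, not taken from cloud-init's source, and the command is only built here, not executed.

```python
from typing import List


def build_password_request_cmd(server: str, port: int = 8080) -> List[str]:
    """Rebuild the wget argument list seen in the log (hypothetical helper)."""
    return [
        "wget",
        "--quiet",
        "--tries", "3",
        "--timeout", "20",
        "--output-document", "-",  # write the response body to stdout
        "--header", "DomU_Request: send_my_password",
        f"{server}:{port}",
    ]


cmd = build_password_request_cmd("10.135.100.1")
print(" ".join(cmd))
```

If this request is only issued while the datasource fetches fresh data, a boot that restores the datasource from cache would never send it, which matches the single request seen in the log.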

@TheRealFalcon TheRealFalcon added the bug label and removed the invalid label Jul 11, 2023
@loth
Author

loth commented Jul 26, 2023

I've been digging into this for a couple days now and found a few things.

  1. FreeBSD is actually reading the cache data on every reboot after the first as we can see from the following in the log:
2023-07-24 23:56:11,539 - util.py[DEBUG]: Reading from /var/lib/cloud/instance/obj.pkl (quiet=False)
2023-07-24 23:56:11,539 - util.py[DEBUG]: Read 6944 bytes from /var/lib/cloud/instance/obj.pkl
2023-07-24 23:56:11,558 - util.py[DEBUG]: Reading from /run/cloud-init/.instance-id (quiet=False)
2023-07-24 23:56:11,558 - util.py[DEBUG]: Read 37 bytes from /run/cloud-init/.instance-id
2023-07-24 23:56:11,558 - stages.py[DEBUG]: restored from cache with run check: DataSourceCloudStack
2023-07-24 23:56:11,558 - handlers.py[DEBUG]: finish: init-network/check-cache: SUCCESS: restored from cache with run check: DataSourceCloudStack
  2. Ubuntu (my test case for password resets working) is actually not reading the cache properly, and is running firstboot executions every reboot as we can see in the log:
2023-07-25 00:13:33,646 - util.py[DEBUG]: Reading from /var/lib/cloud/instance/obj.pkl (quiet=False)
2023-07-25 00:13:33,647 - util.py[DEBUG]: Read 6094 bytes from /var/lib/cloud/instance/obj.pkl
2023-07-25 00:13:33,675 - stages.py[DEBUG]: cache invalid in datasource: DataSourceCloudStack
2023-07-25 00:13:33,676 - handlers.py[DEBUG]: finish: init-local/check-cache: SUCCESS: cache invalid in datasource: DataSourceCloudStack
2023-07-25 00:13:33,676 - util.py[DEBUG]: Attempting to remove /var/lib/cloud/instance
2023-07-25 00:13:33,677 - stages.py[DEBUG]: Using distro class <class 'cloudinit.distros.ubuntu.Distro'>

I have found the reason for (2) is because of two things:
1. /run/cloud-init/.instance-id doesn't actually exist at the time of the conditional on Ubuntu, and it does on BSD for some reason. (I checked this by adding a simple directory listing at the time of execution)
2. The CloudStack datasource doesn't implement check_instance_id, so it inherits the one from __init__.py and just returns false and thus the conditional if hasattr(ds, "check_instance_id") and ds.check_instance_id( self.cfg ): always returns with invalid cache

So basically, the password reset function that has been working for me is actually due to the firstboot scripts running every time because it cannot read the existence of the cache. FreeBSD can read the cache and it works properly but no longer runs the password get functions.
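
The cache decision described above can be modelled with a short sketch. It is a simplified illustration built from the log messages and the conditional quoted in this thread, not cloud-init's actual code: the class and function names are hypothetical, check_instance_id takes no arguments here, and the "restored from checked cache" message is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CachedDatasource:
    """Stand-in for the datasource object unpickled from obj.pkl."""

    instance_id: str

    def check_instance_id(self) -> bool:
        # Default behaviour described in the thread: always False.
        return False


def restore_from_cache(
    cached: Optional[CachedDatasource], run_instance_id: Optional[str]
) -> Tuple[Optional[CachedDatasource], str]:
    """Decide whether the cached datasource can be reused on this boot."""
    if cached is None:
        return None, "no cache found"
    # A matching .instance-id in the run directory means "same boot session".
    if run_instance_id == cached.instance_id:
        return cached, "restored from cache with run check"
    if cached.check_instance_id():
        return cached, "restored from checked cache"  # assumed message
    return None, "cache invalid in datasource"


ds = CachedDatasource("f753f513-e21b-4f2d-a4cd-b22c94d96aac")
# FreeBSD case: the run-directory file survived the reboot.
print(restore_from_cache(ds, ds.instance_id)[1])
# Ubuntu case: the run directory was wiped, so no .instance-id exists.
print(restore_from_cache(ds, None)[1])
```

In this model, a stale .instance-id file is enough to make every reboot look like a continuation of the same boot, so the datasource is never asked for fresh data.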

@igalic
Collaborator

igalic commented Jul 26, 2023

So this is partially related to #4180

@loth
Author

loth commented Jul 26, 2023

So this is partially related to #4180

I'd say so, changing to the ephemeral run directory would make it more aligned to the Linux way.

I've confirmed that setting the path to /var/run/cloud-init in helpers.py makes it behave as Linux does. This is the log after changing the path and rebooting:

2023-07-26 21:11:03,616 - handlers.py[DEBUG]: start: init-network/check-cache: attempting to read from cache [trust]
2023-07-26 21:11:03,617 - util.py[DEBUG]: Reading from /var/lib/cloud/instance/obj.pkl (quiet=False)
2023-07-26 21:11:03,617 - stages.py[DEBUG]: no cache found
2023-07-26 21:11:03,617 - handlers.py[DEBUG]: finish: init-network/check-cache: SUCCESS: no cache found
2023-07-26 21:11:03,617 - util.py[DEBUG]: Attempting to remove /var/lib/cloud/instance
2023-07-26 21:11:03,619 - stages.py[DEBUG]: Using distro class <class 'cloudinit.distros.freebsd.Distro'>
2023-07-26 21:11:03,620 - subp.py[DEBUG]: Running command ['ifconfig', '-a'] with allowed return codes [0] (shell=False, capture=True)
2023-07-26 21:11:03,626 - __init__.py[DEBUG]: Looking for data source in: ['CloudStack'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM', 'NETWORK']
2023-07-26 21:11:03,630 - __init__.py[DEBUG]: Searching for network data source in: ['DataSourceCloudStack']
2023-07-26 21:11:03,630 - handlers.py[DEBUG]: start: init-network/search-CloudStack: searching for network data from DataSourceCloudStack
2023-07-26 21:11:03,630 - __init__.py[DEBUG]: Seeing if we can get any data from <class 'cloudinit.sources.DataSourceCloudStack.DataSourceCloudStack'>
2023-07-26 21:11:03,635 - DataSourceCloudStack.py[DEBUG]: Found metadata server '10.135.100.1' via data-server DNS entry
2023-07-26 21:11:03,635 - __init__.py[DEBUG]: Update datasource metadata and network config due to events: boot-new-instance
2023-07-26 21:11:03,636 - __init__.py[DEBUG]: Machine is configured to run on single datasource DataSourceCloudStack.
2023-07-26 21:11:03,637 - url_helper.py[DEBUG]: [0/1] open 'http://10.135.100.1/latest/meta-data/instance-id' with {'url': 'http://10.135.100.1/latest/meta-data/instance-id', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 30.0, 'headers': {'User-Agent': 'Cloud-Init/23.1.2'}} configuration
2023-07-26 21:11:03,654 - url_helper.py[DEBUG]: Read from http://10.135.100.1/latest/meta-data/instance-id (200, 36b) after 1 attempts
2023-07-26 21:11:03,654 - DataSourceCloudStack.py[DEBUG]: Using metadata source: 'http://10.135.100.1/latest/meta-data/instance-id'
2023-07-26 21:11:03,654 - url_helper.py[DEBUG]: [0/6] open 'http://10.135.100.1/latest/user-data' with {'url': 'http://10.135.100.1/latest/user-data', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'User-Agent': 'Cloud-Init/23.1.2'}} configuration
2023-07-26 21:11:03,658 - url_helper.py[DEBUG]: Read from http://10.135.100.1/latest/user-data (200, 0b) after 1 attempts
2023-07-26 21:11:03,658 - url_helper.py[DEBUG]: [0/6] open 'http://10.135.100.1/latest/meta-data/' with {'url': 'http://10.135.100.1/latest/meta-data/', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 5.0, 'headers': {'User-Agent': 'Cloud-Init/23.1.2'}} configuration
2023-07-26 21:11:03,662 - url_helper.py[DEBUG]: Read from http://10.135.100.1/latest/meta-data/ (200, 157b) after 1 attempts

Now, whether we should be calling boot-new-instance every time with the CloudStack data source is another question, I guess, but this does fix password resets for BSD. I also don't see a problem with changing cloud-init to use /var/run instead of /run, since /var/run is linked to /run on Linux anyway.

@yaroslav-gwit

yaroslav-gwit commented Sep 2, 2023

Can confirm that the above works on FreeBSD 13.2.

With this variable as /run/, if you change the meta-data -> instance-id and reboot, nothing happens:

# /usr/local/lib/python3.9/site-packages/cloud_init-23.3-py3.9.egg/cloudinit/helpers.py
self.run_dir: str = path_cfgs.get("run_dir", "/run/cloud-init")

But once the python helper file is patched with /var/run/ instead of just /run, like so:

# /usr/local/lib/python3.9/site-packages/cloud_init-23.3-py3.9.egg/cloudinit/helpers.py
self.run_dir: str = path_cfgs.get("run_dir", "/var/run/cloud-init")

it works perfectly fine.
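
Rather than hand-patching the installed file, the default could in principle be chosen per platform. The following is a minimal sketch of that idea, not the change that was eventually merged: default_run_dir and resolve_run_dir are hypothetical names, and only the path_cfgs.get("run_dir", ...) lookup is taken from the snippet above.

```python
import platform
from typing import Mapping, Optional


def default_run_dir(system: Optional[str] = None) -> str:
    """Pick a run directory that is cleared on reboot for the platform."""
    system = system or platform.system()
    # Assumption: on the BSDs, /var/run is the directory cleared at boot.
    if system.endswith("BSD") or system == "DragonFly":
        return "/var/run/cloud-init"
    return "/run/cloud-init"


def resolve_run_dir(
    path_cfgs: Mapping[str, str], system: Optional[str] = None
) -> str:
    """An explicit run_dir setting wins; otherwise use the platform default."""
    return path_cfgs.get("run_dir", default_run_dir(system))


print(resolve_run_dir({}, "FreeBSD"))
print(resolve_run_dir({}, "Linux"))
```

Because the lookup honours an explicit run_dir value, the same effect may be achievable through configuration instead of editing helpers.py, provided the installed version reads that setting.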

Unfortunately, updating the IP address takes 2 reboots in a row, or logging in over the serial console/VNC and restarting the networking service. I guess this has more to do with how static IPs and ifconfig function under the hood in FreeBSD than with cloud-init itself (unless I am missing something important), but the SSH keys and user passwords are finally being reset when instance-id changes.

@blackboxsw
Collaborator

I've been digging into this for a couple days now and found a few things.

  1. FreeBSD is actually reading the cache data on every reboot after the first as we can see from the following in the log:
2023-07-24 23:56:11,539 - util.py[DEBUG]: Reading from /var/lib/cloud/instance/obj.pkl (quiet=False)
2023-07-24 23:56:11,539 - util.py[DEBUG]: Read 6944 bytes from /var/lib/cloud/instance/obj.pkl
2023-07-24 23:56:11,558 - util.py[DEBUG]: Reading from /run/cloud-init/.instance-id (quiet=False)
2023-07-24 23:56:11,558 - util.py[DEBUG]: Read 37 bytes from /run/cloud-init/.instance-id
2023-07-24 23:56:11,558 - stages.py[DEBUG]: restored from cache with run check: DataSourceCloudStack

By default, on Linux, we expect most datasources to return check_instance_id == False, which generates the log message cache invalid in datasource: XXX. This tells the datasource it needs to re-run _get_data() in order to compare the new meta-data['instance-id'] to the previous instance id stored in /var/lib/cloud/data/previous-instance-id, and so determine whether all of the cloud-init config modules need to re-run as if this were a fresh install. That said, the extra CloudStack._get_data call is what refreshes the meta-data on your system and also performs the wget handshake to fetch new passwords.

I think awaiting a fix for BSD's ephemeral treatment of /var/run is the ideal here, so that BSD and Linux are aligned and cloud-init isn't seeing stale files in /run/cloud-init that it thinks are fresh representations of metadata.

  2. Ubuntu (my test case for password resets working) is actually not reading the cache properly, and is running firstboot executions every reboot as we can see in the log:

I think we want the "cache invalid" log behavior, which triggers _get_data again to refresh your passwords from metadata on each boot. But I'm not sure what you mean by running firstboot executions every boot. Do you have an updated cloud-init.log indicating that? Let's look in cloud-init.log for "previous iid" messages. I expect to see only one previous iid found to be NO_PREVIOUS_INSTANCE_ID, and all subsequent boots, when no passwords were changed, should repeat the same UUID: previous iid found to be f753f513-e21b-4f2d-a4cd-b22c94d96aac

I have found the reason for (2) is because of two things:
1. /run/cloud-init/.instance-id doesn't actually exist at the time of the conditional on Ubuntu, and it does on BSD for some reason. (I checked this by adding a simple directory listing at the time of execution)
2. The CloudStack datasource doesn't implement check_instance_id, so it inherits the one from __init__.py and just returns false and thus the conditional if hasattr(ds, "check_instance_id") and ds.check_instance_id( self.cfg ): always returns with invalid cache

check_instance_id == False is expected as the default. It triggers the client VM to refresh the datasource metadata and obtain the updated metadata["instance-id"] value, which determines whether cloud-init needs to treat this boot as a first-boot scenario because the instance-id changed.
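
The instance-id comparison can be sketched as follows. This is a simplified model under stated assumptions, not cloud-init's implementation: the function names are hypothetical, a temporary directory stands in for /var/lib/cloud/data, and the NO_PREVIOUS_INSTANCE_ID sentinel is taken from the "previous iid" log message mentioned in this thread.

```python
import tempfile
from pathlib import Path

NO_PREVIOUS_INSTANCE_ID = "NO_PREVIOUS_INSTANCE_ID"


def read_previous_iid(path: Path) -> str:
    """Return the stored previous instance id, or the sentinel if absent."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return NO_PREVIOUS_INSTANCE_ID


def is_new_instance(current_iid: str, previous_iid: str) -> bool:
    """A changed (or never recorded) instance id means a first-boot run."""
    return current_iid != previous_iid


with tempfile.TemporaryDirectory() as tmp:
    iid_file = Path(tmp) / "previous-instance-id"
    current = "f753f513-e21b-4f2d-a4cd-b22c94d96aac"

    first = read_previous_iid(iid_file)   # nothing recorded yet
    iid_file.write_text(current + "\n")   # record the id after first boot
    second = read_previous_iid(iid_file)  # later boot, same instance

    print(first, is_new_instance(current, first))
    print(second, is_new_instance(current, second))
```

In this model, refreshing metadata on every boot does not by itself cause a first-boot run; that happens only when the freshly fetched instance id differs from the recorded one.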

So basically, the password reset function that has been working for me is actually due to the firstboot scripts running every time because it cannot read the existence of the cache. FreeBSD can read the cache and it works properly but no longer runs the password get functions.

My understanding from the log snippets you provided is that the _get_data function is called to update metadata on every boot because of the "Invalid cache" value, which tells cloud-init "don't trust your local metadata cache, you should _get_data and compare the latest metadata to obtain instance-id information".

It seems like our fix here is to await a fix for #4180 on BSD, which should resolve the non-ephemeral nature of /run on BSD by targeting an ephemeral /var/run path instead.

igalic added a commit to igalic/cloud-init that referenced this issue Dec 8, 2023
On *BSD, `/run` is not ephemeral, but the way cloud-init behaves, it
expects it to be.
This is hack is partial fix for canonicalGH-4180 / canonicalGH-4231.
But to be good Unix citizens, we should still make /run relocatable in
the code and in the installer.

Sponsored by: The FreeBSD Foundation
Fixes canonicalGH-4180
Fixes canonicalGH-4231
igalic added a commit to igalic/cloud-init that referenced this issue Dec 14, 2023
on BSD, /run is not ephemeral.
relocate BSDs config to /var/run

Sponsored by: The FreeBSD Foundation
Fixes canonicalGH-4180
Fixes canonicalGH-4231
igalic added a commit to igalic/cloud-init that referenced this issue Dec 15, 2023
on BSD, /run is not ephemeral.
relocate BSDs config to /var/run

Sponsored by: The FreeBSD Foundation
Fixes canonicalGH-4180
Fixes canonicalGH-4231
Co-authored-by: Brett Holman <brett.holman@canonical.com>
igalic added a commit to igalic/cloud-init that referenced this issue Dec 19, 2023
on BSD, /run is not ephemeral.
relocate BSDs config to /var/run

Sponsored by: The FreeBSD Foundation
Fixes canonicalGH-4180
Fixes canonicalGH-4231
Co-authored-by: Brett Holman <brett.holman@canonical.com>
igalic added a commit to igalic/cloud-init that referenced this issue Jan 4, 2024
on BSD, /run is not ephemeral.
relocate BSDs config to /var/run

Sponsored by: The FreeBSD Foundation
Fixes canonicalGH-4180
Fixes canonicalGH-4231
@holmanb holmanb closed this as completed in 8937b5e Jan 5, 2024