Failure to launch 22.04 LXD container with cloud-init 23.3.3 #4676

Closed
skatsaounis opened this issue Dec 8, 2023 · 3 comments
Labels
bug Something isn't working correctly

Comments

skatsaounis commented Dec 8, 2023

Bug report

During the MAAS daily system tests, we launch a 22.04 LXD container, install MAAS in it, and run our tests. Since midnight yesterday, the container has been failing to launch because of an error in this function: https://github.com/canonical/cloud-init/blob/main/cloudinit/cmd/status.py#L288-L295

Steps to reproduce the problem

Environment details

  • Cloud-init version: 23.3.3-0ubuntu0~22.04.1
  • Operating System Distribution: Ubuntu 22.04
  • Cloud provider, platform or installer type: LXD container

cloud-init logs

maas_container: ┌ lxc launch -q ubuntu:22.04 -e -c 'user.user-data=#cloud-config\n"package_upgrade": !!bool |-\n  true\n"packages":\n- |-\n  dhcp-probe\n"runcmd":\n- - |-\n    netplan\n  - |-\n    apply\n"snap":\n  "commands":\n  - |-\n    snap set system proxy.http="http://xxx" proxy.https="http://xxx"\n  - |-\n    snap refresh snapd\n  - |-\n    snap install maas-test-db --channel=3.3\n  - |-\n    snap install maas --channel=3.3\n"write_files":\n- "content": |-\n    response_wait_time 10\n  "path": |-\n    /etc/dhcp_probe.cf\n- "content": |\n    network:\n      ethernets:\n        maas-ss-pxe:\n          addresses:\n          - xxx.xxx.xxx.xxx/xx\n      version: 2\n  "path": |-\n    /etc/netplan/99-maas-systemtests.yaml\n' -p maas-system-maas maas-system-maas
maas_container: └ ✔
maas_container: Container maas-system-maas created.
maas_container: Waiting for boot to finish...
maas_container: ┌ timeout 2000 cloud-init status --wait --long
maas_container: |................................................................................................................................................................................................................................................................................................................................................................................................................................…
maas_container: |ETraceback (most recent call last):
maas_container: |E  File "/usr/bin/cloud-init", line 33, in <module>
maas_container: |E    sys.exit(load_entry_point('cloud-init==23.3.3', 'console_scripts', 'cloud-init')())
maas_container: |E  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 1080, in main
maas_container: |E    retval = util.log_time(
maas_container: |E  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2833, in log_time
maas_container: |E    ret = func(*args, **kwargs)
maas_container: |E  File "/usr/lib/python3/dist-packages/cloudinit/cmd/status.py", line 126, in handle_status_args
maas_container: |E    details = get_status_details(paths)
maas_container: |E  File "/usr/lib/python3/dist-packages/cloudinit/cmd/status.py", line 310, in get_status_details
maas_container: |E    systemd_status = _get_systemd_status()
maas_container: |E  File "/usr/lib/python3/dist-packages/cloudinit/cmd/status.py", line 212, in _get_systemd_status
maas_container: |E    stdout = subp.subp(
maas_container: |E  File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 335, in subp
maas_container: |E    raise ProcessExecutionError(
maas_container: |Ecloudinit.subp.ProcessExecutionError: Unexpected error while running command.
maas_container: |ECommand: ['systemctl', 'show', '--property=ActiveState,UnitFileState,SubState,MainPID', 'cloud-final.service']
maas_container: |EExit code: 1
maas_container: |EReason: -
maas_container: |EStdout: 
maas_container: |EStderr: Failed to get properties: Message recipient disconnected from message bus without replying

If we open a shell in the container after the error and run the same command manually, it succeeds:

root@maas-system-maas:~# systemctl show --property=ActiveState,UnitFileState,SubState,MainPID cloud-final.service
MainPID=0
ActiveState=active
SubState=exited
UnitFileState=enabled
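
The manual run above returns plain KEY=VALUE lines. As a minimal sketch of how such output could be consumed by a script, the hypothetical helper below (`parse_systemctl_show` is an illustrative name, not a cloud-init function) turns it into a dictionary:

```python
def parse_systemctl_show(stdout: str) -> dict:
    """Turn the KEY=VALUE lines printed by `systemctl show` into a dict.

    Hypothetical helper for illustration; not part of cloud-init.
    """
    props = {}
    for line in stdout.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank lines and anything without an '='
            props[key.strip()] = value.strip()
    return props


# Output copied from the manual run in this report.
sample = """MainPID=0
ActiveState=active
SubState=exited
UnitFileState=enabled
"""
props = parse_systemctl_show(sample)
```

An empty stdout, as in the failing traceback, simply yields an empty dictionary, so a caller would still need to check the exit code separately.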
@skatsaounis skatsaounis added bug Something isn't working correctly new An issue that still needs triage labels Dec 8, 2023

tsokorai commented Dec 8, 2023

This also happens with cloud-init 23.3.1.
From a Terraform-provisioned OpenStack 22.04 instance:

null_resource.main[0] (remote-exec): Traceback (most recent call last):
null_resource.main[0] (remote-exec):   File "/usr/bin/cloud-init", line 33, in <module>
null_resource.main[0] (remote-exec):     sys.exit(load_entry_point('cloud-init==23.3.1', 'console_scripts', 'cloud-init')())
null_resource.main[0] (remote-exec):   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 1080, in main
null_resource.main[0] (remote-exec):     retval = util.log_time(
null_resource.main[0] (remote-exec):   File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2833, in log_time
null_resource.main[0] (remote-exec):     ret = func(*args, **kwargs)
null_resource.main[0] (remote-exec):   File "/usr/lib/python3/dist-packages/cloudinit/cmd/status.py", line 126, in handle_status_args
null_resource.main[0] (remote-exec):     details = get_status_details(paths)
null_resource.main[0] (remote-exec):   File "/usr/lib/python3/dist-packages/cloudinit/cmd/status.py", line 310, in get_status_details
null_resource.main[0] (remote-exec):     systemd_status = _get_systemd_status()
null_resource.main[0] (remote-exec):   File "/usr/lib/python3/dist-packages/cloudinit/cmd/status.py", line 212, in _get_systemd_status
null_resource.main[0] (remote-exec):     stdout = subp.subp(
null_resource.main[0] (remote-exec):   File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 335, in subp
null_resource.main[0] (remote-exec):     raise ProcessExecutionError(
null_resource.main[0] (remote-exec): cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
null_resource.main[0] (remote-exec): Command: ['systemctl', 'show', '--property=ActiveState,UnitFileState,SubState,MainPID', 'cloud-final.service']
null_resource.main[0] (remote-exec): Exit code: 1
null_resource.main[0] (remote-exec): Reason: -
null_resource.main[0] (remote-exec): Stdout:
null_resource.main[0] (remote-exec): Stderr: Failed to get properties: Message recipient disconnected from message bus without replying

Oddly, the same script was working perfectly a few days ago.

tsokorai commented Dec 8, 2023

It seems to be a timing issue:
cloud-init status --wait
runs before systemctl is able to respond, because just adding a 30s delay before cloud-init status solves it.
That is also why the same systemctl command works fine when you log in and run it manually, as the bug report mentions.
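
As an alternative to a fixed 30s sleep, a caller could poll until systemctl answers and only then run cloud-init status. The following is a rough sketch of that workaround, not something taken from cloud-init: the function names, the 60s timeout, and the 1s polling interval are all assumptions, and the `run`, `sleep`, and `clock` parameters exist only so the loop can be exercised without a real systemd.

```python
import subprocess
import time


def wait_for_systemctl(timeout=60.0, interval=1.0,
                       run=subprocess.run, sleep=time.sleep,
                       clock=time.monotonic):
    """Poll `systemctl show` until it exits 0 or `timeout` seconds pass.

    Hypothetical workaround helper; returns True once systemctl answers.
    """
    cmd = ["systemctl", "show", "--property=ActiveState",
           "cloud-final.service"]
    deadline = clock() + timeout
    while True:
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def wait_then_status():
    """Run `cloud-init status --wait --long` once systemctl is reachable."""
    if not wait_for_systemctl():
        raise RuntimeError("systemctl did not become reachable in time")
    return subprocess.run(["cloud-init", "status", "--wait", "--long"],
                          check=True)
```

Calling `wait_then_status()` in place of the bare `cloud-init status --wait --long` step keeps the delay as short as the boot allows instead of always paying 30 seconds.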

Member

holmanb commented Dec 9, 2023

Running cloud-init status before dbus is up is pretty early in boot. That command won't work that early because of cf474da. Please run it later, until we can find a proper resolution to this issue.

@holmanb holmanb removed the new An issue that still needs triage label Dec 9, 2023
blackboxsw added a commit to blackboxsw/cloud-init that referenced this issue Dec 9, 2023
TheRealFalcon added a commit to TheRealFalcon/cloud-init that referenced this issue Dec 11, 2023
During `cloud-init status`, we check systemctl to ensure the status
we're reporting is accurate. However, we can get an error from systemctl
if dbus isn't ready yet. This commit will either ignore the error if we
can assume that cloud-init is still running, or retry for a few seconds
before giving up and keeping cloud-init's originally detected status.

Fixes canonicalGH-4676
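
The behaviour described in this commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: `reconcile_status`, `query_cloud_final`, the retry count, and the delay are all hypothetical, and `subprocess.CalledProcessError` stands in for cloud-init's own `ProcessExecutionError`.

```python
import subprocess
import time


def reconcile_status(detected_status, still_running, query,
                     retries=5, delay=1.0, sleep=time.sleep):
    """Return what `query()` reports, or `detected_status` if it keeps failing.

    Hypothetical sketch: if cloud-init is still running, a systemctl error
    is ignored at once; otherwise the query is retried a few times.
    """
    for attempt in range(retries):
        try:
            return query()
        except subprocess.CalledProcessError:
            if still_running:
                # dbus may simply not be up yet; trust the detected status.
                return detected_status
            if attempt + 1 < retries:
                sleep(delay)
    # Gave up: keep the status that was detected originally.
    return detected_status


def query_cloud_final():
    """Ask systemd for the state of cloud-final.service (raises on failure)."""
    result = subprocess.run(
        ["systemctl", "show", "--property=ActiveState",
         "cloud-final.service"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

A caller would pass its own status value and a flag for whether cloud-init is still running, for example `reconcile_status("done", False, query_cloud_final)`; the string values here are placeholders.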
aciba90 pushed a commit to aciba90/cloud-init that referenced this issue Dec 14, 2023
fix: Handle systemctl commands when dbus not ready

During `cloud-init status`, we check systemctl to ensure the status
we're reporting is accurate. However, we can get an error from systemctl
if dbus isn't ready yet. This commit will either ignore the error if we
can assume that cloud-init is still running, or retry until we get a
proper response from systemctl.

Fixes canonicalGH-4676
TheRealFalcon added a commit that referenced this issue Dec 14, 2023
fix: Handle systemctl commands when dbus not ready

During `cloud-init status`, we check systemctl to ensure the status
we're reporting is accurate. However, we can get an error from systemctl
if dbus isn't ready yet. This commit will either ignore the error if we
can assume that cloud-init is still running, or retry until we get a
proper response from systemctl.

Fixes GH-4676