
[2.8] Juju fails to connect to instance after juju-clean-shutdown.service timeout in cloud-init #3681

Closed
ubuntu-server-builder opened this issue May 12, 2023 · 14 comments
Labels
incomplete (Action required by submitter), launchpad (Migrated from Launchpad)

Comments

@ubuntu-server-builder
Collaborator

This bug was originally filed in Launchpad as LP: #1878639

Launchpad details
affected_projects = ['juju']
assignee = None
assignee_name = None
date_closed = None
date_created = 2020-05-14T15:47:43.214957+00:00
date_fix_committed = 2020-06-15T10:50:15.003668+00:00
date_fix_released = 2020-06-15T10:50:15.003668+00:00
id = 1878639
importance = undecided
is_complete = False
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1878639
milestone = None
owner = jason-hobbs
owner_name = Jason Hobbs
private = False
status = incomplete
submitter = genet022
submitter_name = Joshua Genet
tags = ['cdo-qa', 'foundations-engine']
duplicates = []

Launchpad user Joshua Genet(genet022) wrote on 2020-05-14T15:47:43.214957+00:00

AWS does spin up an instance and assigns an IP, but Juju stays stuck in Pending.
There are a number of EC2RoleRequest/EC2Metadata errors in the controller logs.

Here's a link to the logs/artifacts:
https://oil-jenkins.canonical.com/artifacts/5e61db53-50f0-4b82-9bb1-957bd0085d46/index.html

@ubuntu-server-builder added the incomplete (Action required by submitter) and launchpad (Migrated from Launchpad) labels on May 12, 2023

Launchpad user Heather Lanigan(hmlanigan) wrote on 2020-05-14T17:06:22.269991+00:00

I spun up a happy aws controller and deployed a unit. Nothing like a k8s config, but I see the same EC2 errors in the /var/log/amazon/ssm files.


Launchpad user Heather Lanigan(hmlanigan) wrote on 2020-05-14T17:22:01.592041+00:00

Machine 18 is the one stuck in pending.


Launchpad user Pen Gale(pengale) wrote on 2020-05-14T17:51:43.028877+00:00

@genet022: can you get us logs from the machine that Juju was having a hard time talking to? Its logs didn't make it into the crash dump, and that's the most interesting machine, from a troubleshooting standpoint.


Launchpad user Joshua Genet(genet022) wrote on 2020-05-14T18:08:51.833369+00:00

@petevg Unfortunately, because this was an automated run in our CI, the crash dump is all we have. And like you said, the machine 18 logs are empty.


Launchpad user Tim Penhey(thumper) wrote on 2020-05-14T21:24:33.171312+00:00

Is this a one-off, or is it happening every time?


Launchpad user Tim Penhey(thumper) wrote on 2020-05-14T21:38:11.134968+00:00

Grabbed the logs from the crash dumps. As mentioned by @petevg, there is nothing we can use here for diagnosis.

The problem is on the machine that we have no information for. The controller logs show that machine-18 in the Kubernetes model never tried to connect. This normally indicates a networking or cloud-init issue on the started instance.

Without access to the instance that has had the problem, there is nothing we can do.


Launchpad user John George(jog) wrote on 2020-05-15T15:50:03.737691+00:00

We hit something similar on vsphere, and were able to get cloud-init-output.log.
It's available in the artifacts of this run:
https://solutions.qa.canonical.com/#/qa/testRun/0a3705fe-3357-486e-a61f-01abfffe3c58

There is a failure from juju-clean-shutdown.service:

  • /bin/systemctl enable /etc/systemd/system/juju-clean-shutdown.service
    Failed to enable unit: Connection timed out
    Cloud-init v. 19.4-33-gbb4131a2-0ubuntu1~18.04.1 running 'modules:final' at Thu, 14 May 2020 16:30:15 +0000. Up 226.09 seconds.
    2020-05-14 16:43:15,256 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/runcmd [1]
    2020-05-14 16:43:18,676 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
    2020-05-14 16:43:18,677 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed
    Cloud-init v. 19.4-33-gbb4131a2-0ubuntu1~18.04.1 finished at Thu, 14 May 2020 16:43:18 +0000. Datasource DataSourceOVF [seed=iso]. Up 1009.50 seconds


Launchpad user Heather Lanigan(hmlanigan) wrote on 2020-05-15T17:28:37.136713+00:00

With the vsphere config machine-4:

May 14 16:43:18 juju-60990d-4 cloud-init[1562]: + /bin/systemctl enable /etc/systemd/system/juju-clean-shutdown.service
May 14 16:43:18 juju-60990d-4 cloud-init[1562]: Failed to enable unit: Connection timed out

The systemctl command failed, causing cloud-init to fail and exit before jujud-machine-4.service could be enabled on that machine.


Launchpad user Pen Gale(pengale) wrote on 2020-05-18T14:04:00.347873+00:00

Per conversation with the Juju team, this is likely a systemd bug. Juju can't cleanly do much about it: if a machine fails during cloud-init, it is never going to get to the point where it can talk to Juju. The "correct" next steps would involve further investigation on the failed machine, and a bug filed against systemd.

That said, there is some investigation that we might do from the Juju end of things. For example, we could queue up the service to be started, rather than blocking on start. This might cause other issues later on in the unit's life cycle, however.
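One shape such a mitigation could take (purely a sketch, not what Juju implemented) is to retry the transiently failing `systemctl` call a few times instead of letting a single D-Bus timeout abort the whole provisioning script; `retry` below is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical mitigation sketch: retry a transiently failing command
# rather than failing the whole script on the first timeout.
retry() {
  attempts=$1; shift
  i=1
  until "$@"; do
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep 1  # brief pause between attempts
  done
}

# Intended use on a real machine (not executed here):
#   retry 5 /bin/systemctl enable /etc/systemd/system/juju-clean-shutdown.service
```

Retrying only papers over the symptom, of course; the underlying timeout would still warrant the systemd-side investigation mentioned above.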


Launchpad user Tim Penhey(thumper) wrote on 2020-05-19T21:18:39.089024+00:00

Isn't this a cloud-init issue rather than a Juju issue?


Launchpad user Pen Gale(pengale) wrote on 2020-05-19T21:43:30.670938+00:00

This is not a regression, and isn't a bug with the Juju service being started. There might be some longer term work to make Juju behave better when a piece of the pipeline fails like this. But this doesn't make sense as a release blocker -- any fixes we did in the release window would be partial, and wouldn't address the underlying bug in cloud-init.


Launchpad user Paride Legovini(paride) wrote on 2020-06-15T10:50:09.173879+00:00

Hi,

I think this is unlikely to be a bug in cloud-init: as noted already, the cloud-init failure is a consequence of the failure to start the juju-clean-shutdown service. We could get a better understanding of what happens on the cloud-init side from the logs tarball generated by running

cloud-init collect-logs

on the failed machine. For the moment I'm marking the cloud-init task as Incomplete.


Launchpad user Joseph Phillips(manadart) wrote on 2020-06-17T11:42:42.107093+00:00

This service is no longer created on machines using systemd.
juju/juju#11717

@holmanb
Member

holmanb commented Apr 27, 2024

No longer relevant, and looks like this wasn't a cloud-init issue either. Closing.

@holmanb holmanb closed this as completed Apr 27, 2024