Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When deploying custom-ubuntu.tar.gz, the machine stays with "Deploying" even after "Installation complete" #232

Closed
kojiwell opened this issue May 22, 2024 · 6 comments
Labels
bug Something isn't working triaged Triaged to be addressed in a given cycle

Comments

@kojiwell
Copy link

When I make and then deploy a custom-ubuntu.tar.gz to a baremetal machine on our MAAS, the machine somehow stays "Deploying" even after "Installation complete - Node disabled netboot" is logged in the Event log. The machine doesn't respond to ping. And 30 minutes later, the machine will be marked as "Failed deployment."

I checked the machine's status via a remote console; the OS was up; I could log in. However, the /etc/netplan/50-cloud-init.yaml doesn't exist, which is supposed to be generated during the deployment. So, there seems to be an issue on finalizing the deployment process after the installed OS is booted.

Here's the snippet of the Event log around the end of deployment.


Mon, 20 May. 2024 22:36:56 | Node changed status - From 'Deploying' to 'Failed deployment'
Mon, 20 May. 2024 22:36:56 | Marking node failed - Node operation 'Deploying' timed out after 30 minutes.
Mon, 20 May. 2024 22:06:12 | Script result - /tmp/install.log changed status from 'Running' to 'Passed'
Mon, 20 May. 2024 22:06:12 | Rebooting
Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-power_state_change with frequency once-per-instance
Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-final_message with frequency always
Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-ssh_authkey_fingerprints with frequency once-per-instance
Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-install_hotplug with frequency once-per-instance
Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-keys_to_console with frequency once-per-instance
Mon, 20 May. 2024 22:06:10 | Installation complete - Node disabled netboot


Here's the bottom lines in the Installation output. Curtin seems to have finished installation without an issue.

Saving to: ‘/dev/null’

     0K                                                       100%  142K=0s

2024-05-14 05:39:23 (142 KB/s) - ‘/dev/null’ saved [2/2]

curtin: Installation finished.

Here's the versions of related apps.

~# packer --version
Packer v1.10.0

~# snap info maas|grep installed
installed:          3.5.0~rc4-16292-g.18b753d78              (35434) 196MB -

packer-maas# git branch
* main

Here's how I have registered the ubuntu-custom image.

git clone https://github.com/canonical/packer-maas
cd packer-maas/ubuntu
make custom-ubuntu.tar.gz
maas $MAAS_USER boot-resources create name='custom/ubuntu2204' title='custom/ubuntu2204' architecture='amd64/generic' filetype='tgz' base_image='ubuntu/jammy' content@=custom-ubuntu.tar.gz

Thank you for your attention. I will appreciate any comments.

@alexsander-souza
Copy link
Contributor

can you run the following commands in the deployed machine?

  • cloud-init status --format=yaml
  • cloud-init query userdata

Also get the content of /var/log/cloud-init.log and /var/log/cloud-init-output.log

@kojiwell
Copy link
Author

kojiwell commented May 22, 2024

Thank you, @alexsander-souza

Here're the snippets of the outputs you asked, all of which indicate that the machine is unable to communicate with the MAAS.

  • cloud-init status --format=yaml
---
_schema_version: '1'
boot_status_code: enabled-by-generator
datasource: none
detail: DataSourceNone
errors: []
extended_status: degraded done
init:
    errors: []
    finished: 1716416103.0340788
    recoverable_errors: {}
    start: 1716416102.9233449
init-local:
    errors: []
    finished: 1716416099.5163546
    recoverable_errors:
        WARNING:
        - 'Failed posting event: {"name": "init-local/check-cache", "description":
            "attempting to read from cache [trust]", "event_type": "start", "origin":
            "cloudinit", "timestamp": 1716416099.4795318}. This was caused by: HTTPConnectionPool(host=''172.16.40.251'',
            port=5248): Max retries exceeded with url: /MAAS/metadata/status/ea3yyb
            (Caused by NewConnectionError(''<urllib3.connection.HTTPConnection object
            at 0x7f4b7c4835b0>: Failed to establish a new connection: [Errno 101]
            Network is unreachable''))'
        - 'Failed posting event: {"name": "init-local/check-cache", "description":
            "restored from cache: DataSourceNone", "event_type": "finish", "origin":
            "cloudinit", "timestamp": 1716416099.482275, "result": "SUCCESS"}. This
            was caused by: HTTPConnectionPool(host=''172.16.40.251'', port=5248):
            Max retries exceeded with url: /MAAS/metadata/status/ea3yyb (Caused by
            NewConnectionError(''<urllib3.connection.HTTPConnection object at 0x7f4b7c4813f0>:
            Failed to establish a new connection: [Errno 101] Network is unreachable''))'
    start: 1716416099.4651775
last_update: Wed, 22 May 2024 22:15:06 +0000
 :
 :
  • cloud-init query userdata
#cloud-config
growpart:
  mode: 'off'
locale: en_US.UTF-8
preserve_hostname: true
resize_rootfs: false
ssh_pwauth: true
users:
- gecos: ubuntu
  groups: adm,cdrom,dip,lxd,plugdev,sudo
  lock_passwd: false
  name: ubuntu
  passwd: $6$canonical.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  shell: /bin/bash
  • /var/log/cloud-init.log
 :
 :
2024-05-22 22:15:06,302 - handlers.py[DEBUG]: Queuing POST to http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb, data: {'name': 'modules-final', 'description': 'running modules for final', 'event_type': 'finish', 'origin': 'cloudinit', 'timestamp': 1716416106.3022237, 'result': 'SUCCESS'}
2024-05-22 22:15:06,302 - handlers.py[DEBUG]: WebHookHandler flushing remaining events
2024-05-22 22:15:06,308 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,312 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,317 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,320 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,325 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,329 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,334 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,339 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,342 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,347 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,351 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,355 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,360 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,363 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,368 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,373 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,377 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,380 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,385 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,389 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
2024-05-22 22:15:06,410 - url_helper.py[DEBUG]: Read from http://172.16.40.251:5248/MAAS/metadata/status/ea3yyb (204, 0b) after 1 attempts
  • /var/log/cloud-init-output.log
 :
 :
2024-05-22 22:14:59,498 - handlers.py[WARNING]: Failed posting event: {"name": "init-local/check-cache", "description": "attempting to read from cache [trust]", "event_type": "start", "origin": "cloudinit", "timestamp": 1716416099.4795318}. This was caused by: HTTPConnectionPool(host='172.16.40.251', port=5248): Max retries exceeded with url: /MAAS/metadata/status/ea3yyb (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4b7c4835b0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
2024-05-22 22:14:59,498 - handlers.py[WARNING]: Failed posting event: {"name": "init-local/check-cache", "description": "restored from cache: DataSourceNone", "event_type": "finish", "origin": "cloudinit", "timestamp": 1716416099.482275, "result": "SUCCESS"}. This was caused by: HTTPConnectionPool(host='172.16.40.251', port=5248): Max retries exceeded with url: /MAAS/metadata/status/ea3yyb (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4b7c4813f0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
2024-05-22 22:14:59,517 - handlers.py[WARNING]: Failed posting event: {"name": "init-local", "description": "searching for local datasources", "event_type": "finish", "origin": "cloudinit", "timestamp": 1716416099.516621, "result": "SUCCESS"}. This was caused by: HTTPConnectionPool(host='172.16.40.251', port=5248): Max retries exceeded with url: /MAAS/metadata/status/ea3yyb (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4b7c3a98d0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
 :
 :

Please note that the host '172.16.40.251' is actually reachable because I added the following lines in packer-maas/ubuntu/scripts/install-custom-packages and created the custom image again to become able to ssh to the deployed machine.

cat << EOF > /etc/netplan/00-installer-config.yaml
network:
  ethernets:
    enp7s0f0:
      dhcp4: true
  version: 2
EOF

Here's the ping result.

ubuntu@ubuntu:~$ ping -c 3 172.16.40.251
PING 172.16.40.251 (172.16.40.251) 56(84) bytes of data.
64 bytes from 172.16.40.251: icmp_seq=1 ttl=64 time=0.764 ms
64 bytes from 172.16.40.251: icmp_seq=2 ttl=64 time=0.764 ms
64 bytes from 172.16.40.251: icmp_seq=3 ttl=64 time=0.672 ms

--- 172.16.40.251 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2048ms
rtt min/avg/max/mdev = 0.672/0.733/0.764/0.043 ms

This problem doesn't happen with the official Ubuntu image on MAAS, so there are some differences between the official and custom images (e.g. missing required apt packages, etc.)?

Thank you!

@kojiwell
Copy link
Author

kojiwell commented May 23, 2024

By the way, I've also created and registered the custom rocky8 and 9 tar.gz images in the same way, which get deployed without a problem. So Rocky is fine.

This problem seems unique to the custom ubuntu (and maybe debian as well).

Copy link

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Jun 23, 2024
@alexsander-souza
Copy link
Contributor

this was probably fixed by #242, please try again

@github-actions github-actions bot removed the stale label Jun 26, 2024
@alexsander-souza alexsander-souza added bug Something isn't working triaged Triaged to be addressed in a given cycle labels Jul 1, 2024
@kojiwell
Copy link
Author

@alexsander-souza I just tried it again, and it looks like the problem is resolved. Thank you so much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triaged Triaged to be addressed in a given cycle
Projects
None yet
Development

No branches or pull requests

2 participants