Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multipass times out to start a VM #3003

Open
gmarik opened this issue Mar 31, 2023 · 38 comments
Open

multipass times out to start a VM #3003

gmarik opened this issue Mar 31, 2023 · 38 comments
Labels

Comments

@gmarik
Copy link

gmarik commented Mar 31, 2023

Describe the bug

$ multipass start -vvv dev
[2023-03-30T21:14:56.327] [debug] [dev] process working dir ''                  
[2023-03-30T21:14:56.327] [info] [dev] process program 'qemu-system-aarch64'    
[2023-03-30T21:14:56.327] [info] [dev] process arguments '-machine, virt,highmem=off, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, cortex-a72, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:e7:bf:4e, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/cloud-init-config.iso, -virtfs, local,security_model=passthrough,uid_map=501:1000,gid_map=20:1000,path=/Users/mh/src,mount_tag=m2c7cd19d5b7139e8807a37a5c92e59'
[2023-03-30T21:14:56.336] [debug] [qemu-system-aarch64] [2791] started: qemu-system-aarch64 -machine virt,highmem=off -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.MQYyKQ
[2023-03-30T21:14:56.396] [info] [dev] process state changed to Starting        
[2023-03-30T21:14:56.398] [info] [dev] process state changed to Running         
[2023-03-30T21:14:56.399] [debug] [qemu-system-aarch64] [2792] started: qemu-system-aarch64 -machine virt,highmem=off -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu cortex-a72 -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:e7:bf:4e -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/cloud-init-config.iso -virtfs local,security_model=passthrough,uid_map=501:1000,gid_map=20:1000,path=/Users/mh/src,mount_tag=m2c7cd19d5b7139e8807a37a5c92e59
[2023-03-30T21:14:56.399] [info] [dev] process started                          
start failed: The following errors occurred:                                    
dev: timed out waiting for response

To Reproduce
See the log above

Expected behavior
VM starts

Logs

[2023-03-30T21:12:55.106] [debug] [update] Latest Multipass release available is version 1.11.1
[2023-03-30T21:12:55.138] [info] [VMImageHost] Did not find any supported products in "appliance"
[2023-03-30T21:12:55.143] [info] [rpc] gRPC listening on unix:/var/run/multipass_socket
[2023-03-30T21:12:55.145] [debug] [qemu-img] [2755] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/ml/ubuntu-22.0
4-server-cloudimg-arm64.img
[2023-03-30T21:12:55.151] [debug] [qemu-img] [2756] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/test/ubuntu-22
.04-server-cloudimg-arm64.img
[2023-03-30T21:12:55.157] [debug] [qemu-img] [2757] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/ubuntu-22.
04-server-cloudimg-arm64.img
[2023-03-30T21:12:55.163] [debug] [qemu-img] [2758] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/bitto/ubuntu-2
2.04-server-cloudimg-arm64.img

[2023-03-30T21:12:55.167] [info] [daemon] Starting Multipass 1.11.1+mac
[2023-03-30T21:12:55.167] [info] [daemon] Daemon arguments: /Library/Application Support/com.canonical.multipass/bin/multipassd --verbosity debug
[2023-03-30T21:14:56.326] [debug] [dev] process working dir ''
[2023-03-30T21:14:56.327] [info] [dev] process program 'qemu-system-aarch64'
[2023-03-30T21:14:56.327] [info] [dev] process arguments '-machine, virt,highmem=off, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/b
in/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, cortex-a72, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:e7:bf:4e, -device, 
virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format
=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cd
rom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/cloud-init-config.iso, -virtfs, local,security_model=passthrough,uid_map=501:1000,gid_m
ap=20:1000,path=/Users/mh/src,mount_tag=m2c7cd19d5b7139e8807a37a5c92e59'
[2023-03-30T21:14:56.336] [debug] [qemu-system-aarch64] [2791] started: qemu-system-aarch64 -machine virt,highmem=off -nographic -dump-vmstate /private/var/folders/zz
/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.MQYyKQ
[2023-03-30T21:14:56.396] [info] [dev] process state changed to Starting
[2023-03-30T21:14:56.398] [info] [dev] process state changed to Running
[2023-03-30T21:14:56.398] [debug] [qemu-system-aarch64] [2792] started: qemu-system-aarch64 -machine virt,highmem=off -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu cortex-a72 -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:e7:bf:4e -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/dev/cloud-init-config.iso -virtfs local,security_model=passthrough,uid_map=501:1000,gid_map=20:1000,path=/Users/mh/src,mount_tag=m2c7cd19d5b7139e8807a37a5c92e59
[2023-03-30T21:14:56.399] [info] [dev] process started
[2023-03-30T21:14:56.399] [debug] [dev] Waiting for SSH to be up
[2023-03-30T21:14:56.739] [debug] [dev] QMP: {"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 7}, "package": ""}, "capabilities": ["oob"]}}

[2023-03-30T21:14:56.774] [debug] [dev] QMP: {"return": {}}

Additional info

  • OS: macOS 13.2.1 (22D68)

  • $ multipass version

      multipass   1.11.1+mac
      multipassd  1.11.1+mac
    
  • multipass info --all

      Name:           dev
      State:          Unknown
      IPv4:           --
      Release:        --
      Image hash:     8593ce1c6bbd (Ubuntu 22.04 LTS)
      CPU(s):         --
      Load:           --
      Disk usage:     --
      Memory usage:   --
      Mounts:         --
    
      Name:           test
      State:          Running
      IPv4:           192.168.64.12
      Release:        Ubuntu 22.04.2 LTS
      Image hash:     f6b5b3a980f2 (Ubuntu 22.04 LTS)
      CPU(s):         1
      Load:           0.02 0.01 0.00
      Disk usage:     1.6GiB out of 4.7GiB
      Memory usage:   150.7MiB out of 962.5MiB
      Mounts:         --
    
  • multipass get local.driver: qemu

Additional context

  • issue started after upgrading from 1.11.0 to 1.11.1

  • The test vm was created after upgrade and it's working withuot issues.

  • Ran repairs as suggested in the issue

      $ qemu-img check -r all /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/dev/ubuntu-22.04-server-cloudimg-arm64.img 
      Leaked cluster 36 refcount=2 reference=1
      ERROR cluster 53431 refcount=0 reference=1
      ERROR cluster 53432 refcount=0 reference=1
      ERROR cluster 53433 refcount=0 reference=1
      Leaked cluster 117161 refcount=1 reference=0
      Rebuilding refcount structure
      Repairing cluster 1 refcount=1 reference=0
      Repairing cluster 2 refcount=1 reference=0
      Repairing cluster 32771 refcount=1 reference=0
      Repairing cluster 65553 refcount=1 reference=0
      Repairing cluster 98317 refcount=1 reference=0
      Repairing OFLAG_COPIED data cluster: l2_entry=d0b70000 refcount=1
      Repairing OFLAG_COPIED data cluster: l2_entry=d0b80000 refcount=1
      Repairing OFLAG_COPIED data cluster: l2_entry=d10e0000 refcount=1
      Repairing OFLAG_COPIED data cluster: l2_entry=d10f0000 refcount=1
      Repairing OFLAG_COPIED L2 cluster: l1_index=120 l1_entry=240000 refcount=1
      Repairing OFLAG_COPIED L2 cluster: l1_index=121 l1_entry=15bf20000 refcount=1
      …
      …
      The following inconsistencies were found and repaired:
    
          18 leaked clusters
          156 corruptions
    
      Double checking the fixed image now...
      No errors were found on the image.
      125127/983040 = 12.73% allocated, 17.86% fragmented, 8.78% compressed clusters
    
@gmarik gmarik added the bug label Mar 31, 2023
@holta
Copy link

holta commented Mar 31, 2023

Similar problematic behavior is reasonably common, even with the most mainline starting of Ubuntu 22.04 instances (VM's) on an Ubuntu 22.04 host PC.

Repeatedly rebooting the host PC is the only solution I've found so far.

Painful.

Context: this again occurred with the Edge channel version of Multipass, in the last week.

@holta
Copy link

holta commented Mar 31, 2023

Similar problematic behavior is reasonably common, even with the most mainline starting of Ubuntu 22.04 instances (VM's) on an Ubuntu 22.04 host PC.

CLARIF:

The guest VM (Ubuntu 22.04 instance) was severely stuck in my case, having partially but not fully booted.

multipass info showed the insurance's internal IP address but no others. It was impossible to shell into the instance using multipass shell <instance name>

(In the end it took 2 full reboots of the host PC to finally get past the problem...)

@EinBexiii
Copy link

Have the same issue some solutions?

@holta
Copy link

holta commented Apr 14, 2023

If you're using LXD with Multipass, the problems would appear to remain serious:

@fansch
Copy link

fansch commented Aug 29, 2023

Hi,
I think I have the same problem.
Firewall is not active.

Is there one more idea?

Thanks :-).

$ sw_vers
ProductName:		macOS
ProductVersion:		13.5.1
BuildVersion:		22G90
$ multipass version
multipass   1.12.2+mac
multipassd  1.12.2+mac
$ multipass list
Name                    State             IPv4             Image
primary                 Stopped           --               Ubuntu 22.04 LTS
$ multipass info --all
Name:           primary
State:          Stopped
IPv4:           --
Release:        --
Image hash:     9c59c6097711 (Ubuntu 22.04 LTS)
CPU(s):         --
Load:           --
Disk usage:     --
Memory usage:   --
Mounts:         --
$ sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img check -r all /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/primary/Ubuntu-22.04-server-cloudimg-arm64.img
No errors were found on the image.
24470/81920 = 29.87% allocated, 84.56% fragmented, 82.13% compressed clusters
Image end offset: 732495872
$ multipass start -vvvv
[2023-08-29T22:38:13.598] [debug] [primary] process working dir ''
[2023-08-29T22:38:13.598] [info] [primary] process program 'qemu-system-aarch64'
[2023-08-29T22:38:13.598] [info] [primary] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:d3:87:76, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso'
[2023-08-29T22:38:13.604] [debug] [qemu-system-aarch64] [5889] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.DrjiOg
[2023-08-29T22:38:13.665] [info] [primary] process state changed to Starting
[2023-08-29T22:38:13.667] [info] [primary] process state changed to Running
[2023-08-29T22:38:13.667] [debug] [qemu-system-aarch64] [5890] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:d3:87:76 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso
[2023-08-29T22:38:13.667] [info] [primary] process started
start failed: The following errors occurred:
primary: timed out waiting for response

@ricab
Copy link
Collaborator

ricab commented Aug 30, 2023

Hi @fansch, it looks like the VM is not getting an IP. Have you seen our troubleshooting docs? There might be something helpful for you there.

Otherwise, one thing that I have found sometimes helps in such cases is to reset DHCP leases:

  1. If there is nothing you don't mind loosing in the VM, just delete it with multipass delete -p primary
  2. Otherwise, we need to at least stop it. You can try multipass stop.
    1. That sometimes hangs when the VM is in unknown state. If that happens and the VM doesn't stop, you need to kill the QEMU process (pid 5890 in your case): kill -SIGKILL 5890. Unfortunately that may cause corruption to your VM, so deleting right away is preferable if you can.
  3. Follow this. If you have any leases that you don't want to loose, you can instead edit /var/db/dhcpd_leases and remove any entries pertaining to primary.
  4. restart the multipass daemon, for good measure:
    1. sudo launchctl unload /Library/LaunchDaemons/com.canonical.multipassd.plist
    2. sudo launchctl load /Library/LaunchDaemons/com.canonical.multipassd.plist
  5. Try to launch/start the VM again (depending on whether you deleted or stopped it)

Let us know how it goes for you!

@ricab
Copy link
Collaborator

ricab commented Aug 30, 2023

We did some changes to improve this part of the starting sequence, in PR #3114, but they were not yet released.

@fansch
Copy link

fansch commented Sep 3, 2023

Thanks for your message.
I took a look ont the troubleshooting docs: firewall is disabled, there is no VPN, Little Snitch is not installed, Internet Sharing is disabled.
About the bootpd DHCP server:

$ sudo lsof -iUDP:67 -n -P
COMMAND PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
launchd   1 root   20u  IPv4 0xff412dbbfaed0889      0t0  UDP *:67
launchd   1 root   33u  IPv4 0xff412dbbfaed0889      0t0  UDP *:67

Otherwise, one thing that I have found sometimes helps in such cases is to reset DHCP leases:

  1. If there is nothing you don't mind loosing in the VM, just delete it with multipass delete -p primary

I have deleted all my VM several times, even reinstalled multipass… It is installed via brew, but I have tested via the downloaded installer, and it is the same.

Otherwise, we need to at least stop it. You can try multipass stop.

  1. That sometimes hangs when the VM is in unknown state. If that happens and the VM doesn't stop, you need to kill the QEMU process (pid 5890 in your case): kill -SIGKILL 5890. Unfortunately that may cause corruption to your VM, so deleting right away is preferable if you can.

Effectively it helps stopping the VM… VM deleted after doing it.

  1. Follow this. If you have any leases that you don't want to loose, you can instead edit /var/db/dhcpd_leases and remove any entries pertaining to primary.

File emptied

  1. restart the multipass daemon, for good measure:

    1. sudo launchctl unload /Library/LaunchDaemons/com.canonical.multipassd.plist
    2. sudo launchctl load /Library/LaunchDaemons/com.canonical.multipassd.plist

Done

  1. Try to launch/start the VM again (depending on whether you deleted or stopped it)
    Let us know how it goes for you!
$ multipass list
No instances found.
$ multipass start -vvvvvv
[2023-09-04T00:29:20.998] [trace] [url downloader] Found https://codeload.github.com/canonical/multipass-blueprints/zip/refs/heads/main in cache: true
[2023-09-04T00:29:21.003] [debug] [blueprint provider] Loading "anbox-cloud-appliance" v1
[2023-09-04T00:29:21.006] [debug] [blueprint provider] Loading "charm-dev" v1
[2023-09-04T00:29:21.007] [debug] [blueprint provider] Loading "docker" v1
[2023-09-04T00:29:21.008] [debug] [blueprint provider] Loading "jellyfin" v1
[2023-09-04T00:29:21.010] [debug] [blueprint provider] Loading "minikube" v1
[2023-09-04T00:29:21.010] [debug] [blueprint provider] Loading "ros-noetic" v1
[2023-09-04T00:29:21.014] [debug] [blueprint provider] Loading "ros2-humble" v1
[2023-09-04T00:29:21.144] [trace] [url downloader] Found https://cloud-images.ubuntu.com/releases/streams/v1/index.json in cache: false
[2023-09-04T00:29:21.187] [trace] [url downloader] Found https://cloud-images.ubuntu.com/releases/streams/v1/com.ubuntu.cloud:released:download.json in cache: true
[2023-09-04T00:29:21.329] [trace] [url downloader] Found https://cloud-images.ubuntu.com/buildd/daily/streams/v1/index.json in cache: false
[2023-09-04T00:29:21.331] [trace] [url downloader] Found https://cloud-images.ubuntu.com/buildd/daily/streams/v1/com.ubuntu.cloud:daily:download.json in cache: true
[2023-09-04T00:29:21.333] [trace] [url downloader] Found https://cdimage.ubuntu.com/ubuntu-core/appliances/streams/v1/index.json in cache: true
[2023-09-04T00:29:21.334] [trace] [url downloader] Found https://cdimage.ubuntu.com/ubuntu-core/appliances/streams/v1/com.ubuntu.appliances:released:download.json in cache: true
[2023-09-04T00:29:21.334] [info] [VMImageHost] Did not find any supported products in "appliance"
[2023-09-04T00:29:21.337] [debug] [qemu-system-aarch64] [10680] started: qemu-system-aarch64 --version
[2023-09-04T00:29:21.420] [debug] [qemu-img] [10681] started: qemu-img info /var/root/Library/Caches/multipassd/qemu/vault/images/jammy-20230828/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-04T00:29:21.434] [debug] [qemu-img] [10682] started: qemu-img resize /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img 5368709120
[2023-09-04T00:29:21.442] [debug] [qemu-img] [10683] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-04T00:29:21.446] [debug] [primary] process working dir ''
[2023-09-04T00:29:21.446] [info] [primary] process program 'qemu-system-aarch64'
[2023-09-04T00:29:21.446] [info] [primary] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:00:47:ae, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso'
[2023-09-04T00:29:21.449] [debug] [qemu-system-aarch64] [10684] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.jnBXjP
[2023-09-04T00:29:21.486] [info] [primary] process state changed to Starting
[2023-09-04T00:29:21.488] [info] [primary] process state changed to Running
[2023-09-04T00:29:21.489] [debug] [qemu-system-aarch64] [10685] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:00:47:ae -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso
[2023-09-04T00:29:21.489] [info] [primary] process started
Starting primary |

And stops with the message: primary: timed out waiting for response

The log:

[2023-09-04T00:29:21.003] [debug] [blueprint provider] Loading "anbox-cloud-appliance" v1
[2023-09-04T00:29:21.005] [debug] [blueprint provider] Loading "charm-dev" v1
[2023-09-04T00:29:21.007] [debug] [blueprint provider] Loading "docker" v1
[2023-09-04T00:29:21.007] [debug] [blueprint provider] Loading "jellyfin" v1
[2023-09-04T00:29:21.010] [debug] [blueprint provider] Loading "minikube" v1
[2023-09-04T00:29:21.010] [debug] [blueprint provider] Loading "ros-noetic" v1
[2023-09-04T00:29:21.014] [debug] [blueprint provider] Loading "ros2-humble" v1
[2023-09-04T00:29:21.334] [info] [VMImageHost] Did not find any supported products in "appliance"
[2023-09-04T00:29:21.337] [debug] [qemu-system-aarch64] [10680] started: qemu-system-aarch64 --version
[2023-09-04T00:29:21.420] [debug] [qemu-img] [10681] started: qemu-img info /var/root/Library/Caches/multipassd/qemu/vault/images/jammy-20230828/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-04T00:29:21.434] [debug] [qemu-img] [10682] started: qemu-img resize /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img 5368709120
[2023-09-04T00:29:21.442] [debug] [qemu-img] [10683] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-04T00:29:21.446] [debug] [primary] process working dir ''
[2023-09-04T00:29:21.446] [info] [primary] process program 'qemu-system-aarch64'
[2023-09-04T00:29:21.446] [info] [primary] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:00:47:ae, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso'
[2023-09-04T00:29:21.449] [debug] [qemu-system-aarch64] [10684] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.jnBXjP
[2023-09-04T00:29:21.486] [info] [primary] process state changed to Starting
[2023-09-04T00:29:21.488] [info] [primary] process state changed to Running
[2023-09-04T00:29:21.488] [debug] [qemu-system-aarch64] [10685] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:00:47:ae -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso
[2023-09-04T00:29:21.489] [info] [primary] process started
[2023-09-04T00:29:21.489] [debug] [primary] Waiting for SSH to be up
[2023-09-04T00:29:21.636] [debug] [primary] QMP: {"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 8}, "package": ""}, "capabilities": ["oob"]}}

[2023-09-04T00:29:21.741] [debug] [primary] QMP: {"return": {}}

[2023-09-04T00:29:37.367] [debug] [primary] QMP: {"timestamp": {"seconds": 1693780177, "microseconds": 367878}, "event": "NIC_RX_FILTER_CHANGED", "data": {"path": "/machine/unattached/device[6]/virtio-backend"}}

[2023-09-04T00:30:21.840] [debug] [primary] QMP: {"timestamp": {"seconds": 1693780221, "microseconds": 840743}, "event": "RTC_CHANGE", "data": {"offset": 1, "qom-path": "/machine/unattached/device[4]"}}

@ricab
Copy link
Collaborator

ricab commented Sep 4, 2023

Hmm, someone reports the same issue in here. I wonder if there is something to that security update after all.

@fansch would you be able to try with this package and paste the output of multipass launch -vvvv?

@fansch
Copy link

fansch commented Sep 4, 2023

Sorry, but the link opens a 404.

@ricab
Copy link
Collaborator

ricab commented Sep 4, 2023

Sorry, my bad. I have updated the link.

@fansch
Copy link

fansch commented Sep 5, 2023

Hi,
Here is the result.
If you need more information, tell me.

$ multipass version
multipass   1.13.0-dev.370.ci5835+ge20ef2f1.mac
multipassd  1.13.0-dev.370.ci5835+ge20ef2f1.mac
$ multipass list
No instances found.
$ multipass start -vvvvvv
[2023-09-05T08:09:31.024] [debug] [qemu-system-aarch64] [61764] started: qemu-system-aarch64 --version
[2023-09-05T08:09:31.046] [debug] [qemu-img] [61765] started: qemu-img info /var/root/Library/Caches/multipassd/qemu/vault/images/jammy-20230828/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-05T08:09:31.061] [debug] [qemu-img] [61766] started: qemu-img resize /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img 5368709120
[2023-09-05T08:09:31.071] [debug] [qemu-img] [61767] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-05T08:09:31.076] [debug] [primary] process working dir ''
[2023-09-05T08:09:31.076] [info] [primary] process program 'qemu-system-aarch64'
[2023-09-05T08:09:31.076] [info] [primary] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:0f:5d:ec, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso'
[2023-09-05T08:09:31.079] [debug] [qemu-system-aarch64] [61768] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.abMOMe
[2023-09-05T08:09:31.117] [info] [primary] process state changed to Starting
[2023-09-05T08:09:31.119] [info] [primary] process state changed to Running
[2023-09-05T08:09:31.119] [debug] [qemu-system-aarch64] [61769] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:0f:5d:ec -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso
[2023-09-05T08:09:31.119] [info] [primary] process started
[2023-09-05T08:09:31.120] [debug] [primary] Waiting for SSH to be up
launch failed: The following errors occurred:
primary: timed out waiting for response
sudo tail -f multipassd.log
[2023-09-05T08:09:31.119] [info] [primary] process state changed to Running
[2023-09-05T08:09:31.119] [debug] [qemu-system-aarch64] [61769] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:0f:5d:ec -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso
[2023-09-05T08:09:31.119] [info] [primary] process started
[2023-09-05T08:09:31.120] [debug] [primary] Waiting for SSH to be up
[2023-09-05T08:09:31.158] [debug] [primary] QMP: {"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 8}, "package": ""}, "capabilities": ["oob"]}}

[2023-09-05T08:09:31.175] [debug] [primary] QMP: {"return": {}}

[2023-09-05T08:09:44.141] [debug] [primary] QMP: {"timestamp": {"seconds": 1693894184, "microseconds": 141376}, "event": "NIC_RX_FILTER_CHANGED", "data": {"path": "/machine/unattached/device[6]/virtio-backend"}}

@ricab
Copy link
Collaborator

ricab commented Sep 5, 2023

Interesting, no trace after that "waiting for SSH" message. That excludes some possible scenarios 🤔 I guess we still need to add more trace logging around and within this portion of code to try to figure this out once and for all.

In the meantime, @fansch, do you see an entry for primary in /var/db/dhcpd_leases after that timeout? Are you able to launch another instance, with a random name?

@fansch
Copy link

fansch commented Sep 5, 2023

$ multipass launch -vvvv
[2023-09-05T13:51:05.286] [trace] [url downloader] Found https://codeload.github.com/canonical/multipass-blueprints/zip/refs/heads/main in cache: true
[2023-09-05T13:51:05.295] [debug] [blueprint provider] Loading "anbox-cloud-appliance" v1
[2023-09-05T13:51:05.296] [debug] [blueprint provider] Loading "charm-dev" v1
[2023-09-05T13:51:05.297] [debug] [blueprint provider] Loading "docker" v1
[2023-09-05T13:51:05.297] [debug] [blueprint provider] Loading "jellyfin" v1
[2023-09-05T13:51:05.298] [debug] [blueprint provider] Loading "minikube" v1
[2023-09-05T13:51:05.299] [debug] [blueprint provider] Loading "ros-noetic" v1
[2023-09-05T13:51:05.299] [debug] [blueprint provider] Loading "ros2-humble" v1
[2023-09-05T13:51:05.546] [trace] [url downloader] Found https://cloud-images.ubuntu.com/releases/streams/v1/index.json in cache: false
[2023-09-05T13:51:05.576] [trace] [url downloader] Found https://cloud-images.ubuntu.com/releases/streams/v1/com.ubuntu.cloud:released:download.json in cache: true
[2023-09-05T13:51:05.754] [trace] [url downloader] Found https://cloud-images.ubuntu.com/buildd/daily/streams/v1/index.json in cache: false
[2023-09-05T13:51:05.910] [trace] [url downloader] Found https://cloud-images.ubuntu.com/buildd/daily/streams/v1/com.ubuntu.cloud:daily:download.json in cache: false
[2023-09-05T13:51:05.924] [trace] [url downloader] Found https://cdimage.ubuntu.com/ubuntu-core/appliances/streams/v1/index.json in cache: true
[2023-09-05T13:51:05.925] [trace] [url downloader] Found https://cdimage.ubuntu.com/ubuntu-core/appliances/streams/v1/com.ubuntu.appliances:released:download.json in cache: true
[2023-09-05T13:51:05.926] [info] [VMImageHost] Did not find any supported products in "appliance"
[2023-09-05T13:51:05.934] [debug] [qemu-system-aarch64] [63541] started: qemu-system-aarch64 --version
[2023-09-05T13:51:06.114] [debug] [qemu-img] [63542] started: qemu-img info /var/root/Library/Caches/multipassd/qemu/vault/images/jammy-20230828/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-05T13:51:06.136] [debug] [qemu-img] [63543] started: qemu-img resize /var/root/Library/Application Support/multipassd/qemu/vault/instances/dazzled-cub/ubuntu-22.04-server-cloudimg-arm64.img 5368709120
[2023-09-05T13:51:06.155] [debug] [qemu-img] [63544] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/dazzled-cub/ubuntu-22.04-server-cloudimg-arm64.img
[2023-09-05T13:51:06.161] [debug] [dazzled-cub] process working dir ''
[2023-09-05T13:51:06.161] [info] [dazzled-cub] process program 'qemu-system-aarch64'
[2023-09-05T13:51:06.161] [info] [dazzled-cub] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:bf:db:60, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/dazzled-cub/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/dazzled-cub/cloud-init-config.iso'
[2023-09-05T13:51:06.164] [debug] [qemu-system-aarch64] [63545] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.dSqTxg
[2023-09-05T13:51:06.201] [info] [dazzled-cub] process state changed to Starting
[2023-09-05T13:51:06.204] [info] [dazzled-cub] process state changed to Running
[2023-09-05T13:51:06.204] [debug] [qemu-system-aarch64] [63546] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:bf:db:60 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/dazzled-cub/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/dazzled-cub/cloud-init-config.iso
[2023-09-05T13:51:06.204] [info] [dazzled-cub] process started
launch failed: The following errors occurred:
dazzled-cub: timed out waiting for response

About the file /var/db/dhcpd_leases it does not exist.

So I went back to the troubleshoot page. And here are the DHCP things:

$ sudo lsof -iUDP:67 -n -P
COMMAND PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
launchd   1 root   20u  IPv4 0xff412dbbfaed7089      0t0  UDP *:67
launchd   1 root   33u  IPv4 0xff412dbbfaed7089      0t0  UDP *:67
$ sudo launchctl load -w /System/Library/LaunchDaemons/bootps.plist
/System/Library/LaunchDaemons/bootps.plist: service already loaded
Load failed: 17: File exists
$ sudo launchctl unload /System/Library/LaunchDaemons/bootps.plist
$ sudo launchctl load -w /System/Library/LaunchDaemons/bootps.plist
$ sudo ls -ald /var/db/dhcp*
drwx------  6 root  wheel  192 Sep  4 11:36 dhcpclient

And then again multipass launch -vvvv
but no dhcpd_leases file created.

@ricab
Copy link
Collaborator

ricab commented Sep 5, 2023

I don't know, for some reason it doesn't want to hand out IPs. Since you don't have the firewall active, I don't know what could be messing it up. Have you tried just rebooting? How about toggling internet sharing? No VPN either? No Little Snitch?

@fansch
Copy link

fansch commented Sep 5, 2023

Many many thanks for all the time taken.
In between, I have a good news: it works, event with firewall activated.
I cannot tell you what exactly helped, but here are some infos.
Simple rebooting did not help. Not really surprized, as I am used to shut down the laptop while going myself to sleep :-).
I have then made a full cleanup with the free software OnyX: it cleans all the system caches, cleans the ARP caches, and so on. A well known tool.
And since everything is OK.
I have much more lines with this:

$ sudo lsof -iUDP:67 -n -P
COMMAND  PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
launchd    1 root   48u  IPv4 0x85ae9608287143c1      0t0  UDP *:67
launchd    1 root   51u  IPv4 0x85ae9608287143c1      0t0  UDP *:67
launchd    1 root   52u  IPv4 0x85ae9608287143c1      0t0  UDP *:67
launchd    1 root   53u  IPv4 0x85ae9608287143c1      0t0  UDP *:67
bootpd  6341 root    0u  IPv4 0x85ae9608287143c1      0t0  UDP *:67
bootpd  6341 root    1u  IPv4 0x85ae9608287143c1      0t0  UDP *:67
bootpd  6341 root    2u  IPv4 0x85ae9608287143c1      0t0  UDP *:67

If you need still more info in the before/after.
Again many thanks.

@ricab
Copy link
Collaborator

ricab commented Sep 5, 2023

Hey, great to know you got it working @fansch. Thanks for the tip regarding Onyx.

@percygrunwald
Copy link

percygrunwald commented Sep 26, 2023

Wow, surprisingly OnyX worked for me... I was having the same problem with instances hanging on starting in the CLI and showing Unknown.

OnyX > Maintenance, with the following options (the defaults for me when I opened the OnyX appfor the first time).

Verifying
Structure of the file system [x]
Delete APFS snapshots [x]

Rebuilding
Launch Services database [x]

Cleaning
System [x]
Applications [x]
Internet [x]
Other [x]

OnyX automatically reboots the system. After rebooting the system following the OnyX run, I was able to launch instances successfully (random name bionic, random name focal, static name bionic with --cpus and --mount).

OnyX: 4.4.4 for macOS Ventura 13
Multipass(d): 1.12.2+mac (installed from downloaded .dmg)
Mac:
  Model Name:	MacBook Pro
  Model Identifier:	Mac14,10
  Model Number:	MNW83LL/A
  Chip:	Apple M2 Pro
  System Version:	macOS 13.6 (22G120)
  Kernel Version:	Darwin 22.6.0
  Boot Volume:	Macintosh HD

Not exactly sure what OnyX changed/removed that got this working. Hope this helps users struggling with this issue or the devs to find what exactly is blocking the start.

@nnarhinen
Copy link

Not sure what happeed afer upgrading but now I'm experiencing this on all my old instances..

current version:

multipass   1.12.2+mac
multipassd  1.12.2+mac
% multipass -vvvv start codesense
[2023-10-05T14:40:43.565] [debug] [codesense] process working dir ''
[2023-10-05T14:40:43.566] [info] [codesense] process program 'qemu-system-aarch64'
[2023-10-05T14:40:43.566] [info] [codesense] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:cc:4e:25, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/codesense/ubuntu-20.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 8192M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/codesense/cloud-init-config.iso'
[2023-10-05T14:40:43.594] [debug] [qemu-system-aarch64] [13670] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.XUiGUT
[2023-10-05T14:40:44.273] [info] [codesense] process state changed to Starting
[2023-10-05T14:40:44.276] [info] [codesense] process state changed to Running
[2023-10-05T14:40:44.276] [debug] [qemu-system-aarch64] [13671] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:cc:4e:25 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/codesense/ubuntu-20.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 8192M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/codesense/cloud-init-config.iso
[2023-10-05T14:40:44.276] [info] [codesense] process started
start failed: The following errors occurred:
codesense: timed out waiting for response

@upngo
Copy link

upngo commented Oct 16, 2023

Am getting this issue on sonoma (and had on ventura as well) on multipass 1.13.0-dev.298 (has the increased ssh time outs) and on 1.12.2

I periodically get this issue and go through countless variations of deleting vms, reloading daemons, reinstalling multipass, deleting dhcpd_leases and restarting my computer before it decides it's happy again...

Even with firewall disabled it's always the same issue where it's waiting for ssh to be up and the last line has
NIC_RX_FILTER_CHANGED then eventually times out..

Not properly closing your VM before turning your computer off (or running out of power) is a good way to break all your VMs (and can't create new ones either).

Here's one of my current outputs at trying to launch a new box

[2023-10-16T14:44:21.028] [debug] [blueprint provider] Loading "anbox-cloud-appliance" v1
[2023-10-16T14:44:21.029] [debug] [blueprint provider] Loading "charm-dev" v1
[2023-10-16T14:44:21.030] [debug] [blueprint provider] Loading "docker" v1
[2023-10-16T14:44:21.031] [debug] [blueprint provider] Loading "jellyfin" v1
[2023-10-16T14:44:21.031] [debug] [blueprint provider] Loading "minikube" v1
[2023-10-16T14:44:21.032] [debug] [blueprint provider] Loading "ros-noetic" v1
[2023-10-16T14:44:21.032] [debug] [blueprint provider] Loading "ros2-humble" v1
[2023-10-16T14:44:23.728] [info] [VMImageHost] Did not find any supported products in "appliance"
[2023-10-16T14:44:23.734] [debug] [qemu-system-aarch64] [1570] started: qemu-system-aarch64 --version
[2023-10-16T14:44:23.762] [debug] [qemu-img] [1571] started: qemu-img info /var/root/Library/Caches/multipassd/qemu/vault/images/jammy-20231010/ubuntu-22.04-server-cloudimg-arm64.img
[2023-10-16T14:44:23.777] [debug] [qemu-img] [1572] started: qemu-img resize /var/root/Library/Application Support/multipassd/qemu/vault/instances/promoting-bedbug/ubuntu-22.04-server-cloudimg-arm64.img 5368709120
[2023-10-16T14:44:23.786] [debug] [qemu-img] [1573] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/promoting-bedbug/ubuntu-22.04-server-cloudimg-arm64.img
[2023-10-16T14:44:23.791] [debug] [promoting-bedbug] process working dir ''
[2023-10-16T14:44:23.791] [info] [promoting-bedbug] process program 'qemu-system-aarch64'
[2023-10-16T14:44:23.791] [info] [promoting-bedbug] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:96:b6:b3, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/promoting-bedbug/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 1, -m, 1024M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/promoting-bedbug/cloud-init-config.iso'
[2023-10-16T14:44:23.792] [debug] [qemu-system-aarch64] [1574] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.IDUJOh
[2023-10-16T14:44:23.823] [info] [promoting-bedbug] process state changed to Starting
[2023-10-16T14:44:23.824] [info] [promoting-bedbug] process state changed to Running
[2023-10-16T14:44:23.824] [debug] [qemu-system-aarch64] [1575] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:96:b6:b3 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/promoting-bedbug/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/promoting-bedbug/cloud-init-config.iso
[2023-10-16T14:44:23.824] [info] [promoting-bedbug] process started
[2023-10-16T14:44:23.825] [debug] [promoting-bedbug] Waiting for SSH to be up
[2023-10-16T14:44:23.854] [debug] [promoting-bedbug] QMP: {"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 8}, "package": ""}, "capabilities": ["oob"]}}

[2023-10-16T14:44:23.870] [debug] [promoting-bedbug] QMP: {"return": {}}

[2023-10-16T14:44:37.065] [debug] [promoting-bedbug] QMP: {"timestamp": {"seconds": 1697420677, "microseconds": 65787}, "event": "NIC_RX_FILTER_CHANGED", "data": {"path": "/machine/unattached/device[6]/virtio-backend"}}

@upngo
Copy link

upngo commented Oct 16, 2023

After a number of reinstalls and restarts I got a new setup working.

@percygrunwald
Copy link

percygrunwald commented Oct 25, 2023

My VM got into this same stuck state again and this process worked for me this time (might have been luck too):

  1. Stop and delete/purge all Multipass VMs
  2. rm /var/db/dhcpd_leases
  3. Reboot

Following the reboot the VM came up fine.

@bschick
Copy link

bschick commented Jan 11, 2024

Think I'm seeing a similar issue. I have a number of VMs that were created > 2 months ago and none will fully start. Symtoms are similar to previous comments...

% multipass start -vvvvvv angular
[2024-01-10T20:53:29.681] [debug] [angular] process working dir ''
[2024-01-10T20:53:29.682] [info] [angular] process program 'qemu-system-aarch64'
[2024-01-10T20:53:29.682] [info] [angular] process arguments '-machine, virt,gic-version=3, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, host, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:88:16:d9, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/angular/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 2, -m, 2048M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/angular/cloud-init-config.iso'
[2024-01-10T20:53:29.693] [debug] [qemu-system-aarch64] [8377] started: qemu-system-aarch64 -machine virt,gic-version=3 -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.wzWGig
[2024-01-10T20:53:29.752] [info] [angular] process state changed to Starting
[2024-01-10T20:53:29.754] [info] [angular] process state changed to Running
[2024-01-10T20:53:29.755] [debug] [qemu-system-aarch64] [8378] started: qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:88:16:d9 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/angular/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 2 -m 2048M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/angular/cloud-init-config.iso
[2024-01-10T20:53:29.755] [info] [angular] process started
Starting angular
start failed: The following errors occurred:
angular: timed out waiting for response

The qemu process starts, but an IP address is never assigned and the command times out waiting for ssh. Multipass log ends with:

[2024-01-10T21:00:35.152] [debug] [fervent-euglena] Waiting for SSH to be up
[2024-01-10T21:00:35.199] [debug] [fervent-euglena] QMP: {"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 8}, "package": ""}, "capabilities": ["oob"]}}^M

[2024-01-10T21:00:35.220] [debug] [fervent-euglena] QMP: {"return": {}}^M

I noticed that machine_type for all was virt-7.1 until I tried to start running them today. Now they have virt-8.0.

I've tried to uninstall multipass and reinstall from the package and and then from homebrew. Tried Onyx maintenance and deleting /var/db/dhcpd_leases and restarting (no IP address shows up in the lease file on a failed start btw).

Also I am able to launch new VM just fine. Its just starting VMs that have existed for > 2 months that doesn't start.

% multipass version
multipass 1.13.0+mac
multipassd 1.13.0+mac

Apple M1 Pro
macOS 14.2.1 (23C71

@ricab
Copy link
Collaborator

ricab commented Jan 11, 2024

Hi @bschick, do you see an even with NIC_RX_FILTER_CHANGED in the logs? I don't see it in the lines you pasted, so your VMs may not even be getting to the point of requesting an IP.

Were they in a suspended state before you tried to start them? If so, does it help if you run sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img snapshot -d suspend "/var/root/Library/Application Support/multipassd/qemu/vault/instances/angular/ubuntu-22.04-server-cloudimg-arm64.img" and try to start again?

@bschick
Copy link

bschick commented Jan 12, 2024

No NIC_RX_FILTER_CHANGED present in the logs and /var/db/dhcpd_leases is never created.

Important detail I just noticed (a bit late). I had 3 VMs defined on my system. Two of them were left running from > 2 months back. From then until now I've applied OS patches, suspended and restarted my laptop etc. When I looked at their state this week the 'angular' VM state was "stopped" and the other one was "suspended". Neither of them will start. The 3rd VM, which was gracefully stopped >2 months ago now starts just fine. So the problem could be caused by some non-graceful termination or OS patch with running VMs

I tried the "-d suspend" command you provide on the VM that was previously listed as suspended and the output was "qemu-img: Could not delete snapshot 'suspend': snapshot not found". Likely since I stopped the VM and killed the qemu processes already.

@ricab
Copy link
Collaborator

ricab commented Jan 12, 2024

Hi @bschick, OK, so you got rid of the suspension state but the VM didn't start. Could you try to run the following command after making sure the corresponding VM is stopped in Multipass?

sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application\ Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:88:16:d9 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application\ Support/multipassd/qemu/vault/instances/angular/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 2 -m 2048M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -cdrom /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/angular/cloud-init-config.iso

Do you see a window popping up? Does that window eventually get to a login prompt?

@bschick
Copy link

bschick commented Jan 14, 2024

Thanks for continuing to followup. Verified VM was in stopped state. Your command launched qemu-system-aarch64 with a terminal windows that contains one line that says "parallel0 console" and a flashing cursor. Nothing else

The window where I ran the command has:

{"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 8}, "package": ""}, "capabilities": ["oob"]}} 2024-01-13 21:27:01.784 qemu-system-aarch64[49110:2414358] This process won't be able to run remote lookup service

Waited 30 minutes and both seem to be stuck. No login prompt.

@ricab
Copy link
Collaborator

ricab commented Jan 15, 2024

Hi @bschick, I was afraid of something like that. It looks like the image is messed up and the VM can no longer boot ☹️ Did you by any chance upgrade directly from 1.11 to 1.13? Or were those VMs last used in 1.11 when you tried them in 1.13?

Where did say that you saw virt-8.0 instead of virt-7.1 again? Does it help if you run the same command but with -machine virt-7.1?

QEMU was updated to 8.0 in Multipass 1.12. We found then that some users had trouble resuming from suspension snapshots created earlier. Some people were able to get past that by deleting the suspension snapshot, which is what I suggested. Unfortunately, that wasn't enough for your case. The way things are today, when suspending VMs, Multipass discards the running state of VMs after capturing the suspension snapshot, which may leave them in an corrupted or inconsistent state if they are not resumed from the snapshot. This could be what's happening to you. We're now considering a different mechanism, that would attempt to shutdown the VM in the background without making the user wait for that.

I am not very hopeful, but one other thing you could try is sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img check -r all "/var/root/Library/Application Support/multipassd/qemu/vault/instances/angular/ubuntu-22.04-server-cloudimg-arm64.img", then try the command above again. Let me know what you find!

@bschick
Copy link

bschick commented Jan 16, 2024

Hi @bschick, I was afraid of something like that. It looks like the image is messed up and the VM can no longer boot ☹️ Did you by any chance upgrade directly from 1.11 to 1.13? Or were those VMs last used in 1.11 when you tried them in 1.13?

I hit the problems while running 1.12 and upgrade to 1.13 hoping it might solve the problem.

Where did say that you saw virt-8.0 instead of virt-7.1 again? Does it help if you run the same command but with -machine virt-7.1?

In the value for "machine_type" in /var/root/Library/Application\ Support/multipassd/qemu/multipassd-vm-instances.json. I noticed that changed from 7.1 to 8.0 when I tried to start them with 1.12 Running the qemu-system-aarch64 command with that flag didn't change anything.

QEMU was updated to 8.0 in Multipass 1.12. We found then that some users had trouble resuming from suspension snapshots created earlier. Some people were able to get past that by deleting the suspension snapshot, which is what I suggested. Unfortunately, that wasn't enough for your case. The way things are today, when suspending VMs, Multipass discards the running state of VMs after capturing the suspension snapshot, which may leave them in an corrupted or inconsistent state if they are not resumed from the snapshot. This could be what's happening to you. We're now considering a different mechanism, that would attempt to shutdown the VM in the background without making the user wait for that.

Keep in mind that I have multiple VMs with this problem. I only saw one of then reported to be suspended, but its possible they were both suspended at one point and that multipass reset state during a failed start (and I just never noticed the suspended state).

I am not very hopeful, but one other thing you could try is sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img check -r all "/var/root/Library/Application Support/multipassd/qemu/vault/instances/angular/ubuntu-22.04-server-cloudimg-arm64.img", then try the command above again. Let me know what you find!

I did an image check while troubleshooting this on my own. It did find many things to correct but exited without an error. When I run it now, all looks fine. Trying the qemu-system-aarch64 command again results in the same hang.

No errors were found on the image.
117316/163840 = 71.60% allocated, 17.82% fragmented, 5.24% compressed clusters
Image end offset: 8930787328

@ricab
Copy link
Collaborator

ricab commented Jan 17, 2024

Hmm, I see @bschick. Were you ever able to run the VMs that now fail in 1.12, or was it after upgrading from 1.11 to 1.12 that those problems started?

@ricab
Copy link
Collaborator

ricab commented Jan 17, 2024

Hi @bschick, it looks like that parallel0 thing is a red herring. On my x86 mac, the analogous to the command line I gave you brings up a VGA screen that eventually gets to the login console. However, someone with an M1 tells me they don't see that, so it either doesn't open or isn't in focus by default with qemu-system-aarch64 for some reason. Could you try the following command instead?

sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application\ Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:88:16:d9 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application\ Support/multipassd/qemu/vault/instances/angular/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 2 -m 2048M -qmp stdio -cdrom /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/angular/cloud-init-config.iso

Do you now see a boot sequence that eventually gets to a login prompt? Otherwise, please try to select "serial0" from the "View" menu. Let me know what you find!

@bschick
Copy link

bschick commented Jan 18, 2024

Hmm, I see @bschick. Were you ever able to run the VMs that now fail in 1.12, or was it after upgrading from 1.11 to 1.12 that those problems started?

Were working then tried under 1.12 months later and they never started successfully again. Have not tried to revert to 1.11

Do you now see a boot sequence that eventually gets to a login prompt? Otherwise, please try to select "serial0" from the "View" menu. Let me know what you find!

Yes, looking at serial0 shows the boot sequence. Looks like the idea of disk corruption is correct. Haven't yet figure out how to dump the output text so took screenshots.

Stage 1: Something flashed by that is too fast to capture

Stage 2: See attached "stage2" image
stage2

Stage 3: See attached "stage3" image (doesn't get past Btrfs loaded)
stage3

@ricab
Copy link
Collaborator

ricab commented Jan 18, 2024

Hmm, OK. Looking at a successful instance, the ext4 filesystem in sda1 is mounted shortly after that last btrfs line. What about the other VMs, do you see them failing in the exact same way @bschick?

QEMU might be unable to mount the filesystem because it is corrupted, but I wouldn't expect multiple images to get corrupted the same way. They could be corrupted in different ways that all prevent the filesystem from being mounted, but still... It feels like a systemic problem. Have you tried using virt-7.1 now that you have the correct command line? If that doesn't work, can you please try reverting to 1.11?

@bschick
Copy link

bschick commented Jan 23, 2024

Yes, I tried with -machine virt-7.1,gic-version=3 and it did not make a difference. The other VM now causes multipassd to hang so I've removed it.

But I now have a newer VM created in the past week that is in the same state. It was working fine, then this morning I started getting EINVAL disk read errors, now when I try to restart the VM is hangs right after the "Btrfs loaded" line as the previous VM.

Anything you'd like me to check before I revert to 1.11?

@bschick
Copy link

bschick commented Jan 24, 2024

I reverted to 1.11.1 and it cannot start any of the broken VMs either. My suspicious is that the VMs end up in a bad state when left running and my laptop sleeps. I don't remember that happening in the past so perhaps 1.12 or 1.13 changed something to make that more likely or perhaps something has changed on my laptop.

I will create yet another new VM and see if it ends up corrupted while running in version 1.11.1.

@ricab
Copy link
Collaborator

ricab commented Jan 24, 2024

Hi @bschick, thanks for sticking with us.

But I now have a newer VM created in the past week that is in the same state. It was working fine, then this morning I started getting EINVAL disk read errors, now when I try to restart the VM is hangs right after the "Btrfs loaded" line as the previous VM.

Intriguing. I don't suppose you're running out of disk space?

I will create yet another new VM and see if it ends up corrupted while running in version 1.11.1.

OK, that would be useful to know. If you could identify when a VM stops working and send us the log, that would be great too.

@bschick
Copy link

bschick commented Jan 26, 2024

Intriguing. I don't suppose you're running out of disk space?

Not out of disk space.

I will create yet another new VM and see if it ends up corrupted while running in version 1.11.1.

OK, that would be useful to know. If you could identify when a VM stops working and send us the log, that would be great too.

No problems so far under 1.11, but last time it took a week. If another VM does get into a bad state, which logs do you want? /Library/Logs/Multipass didn't have much info last time.

Would it help you to have the most recent corrupted VM to test locally? The only confidential data on it is a private key used for git signing that I could replace on github.

@ricab
Copy link
Collaborator

ricab commented Jan 30, 2024

Hi @bschick

No problems so far under 1.11, but last time it took a week. If another VM does get into a bad state, which logs do you want? /Library/Logs/Multipass didn't have much info last time.

There should be a multipassd.log in there. That's the one I'm after.

Would it help you to have the most recent corrupted VM to test locally? The only confidential data on it is a private key used for git signing that I could replace on github.

No need for the time being. Let us have a look at logs first. If that is still inconclusive, then we may ask for an image. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants