Existing instances suddenly stopped working #2931

OleksandrKvl · 2023-02-19T13:00:52Z

Describe the bug
I have two Ubuntu 22.04 instances (primary and work), and both of them suddenly stopped working. I did several things beforehand but unfortunately can't remember the order, and I'm not sure they are related. First, I ran a mount command on my primary instance; the command itself executed without a problem, but I didn't see the folder actually being mapped (so I unmounted it later). Second, I updated Multipass to 1.11.1. When I ran the installer, Multipass was running those two instances; the update went OK.
Then strange things began. While the VM was still running, I couldn't run htop on it; it failed with the error cannot execute binary file: Exec format error. After a while, my macOS crashed. Now I can't start any of my VMs: in Activity Monitor I see that the process named qemu-system-aarch64 is using 100% CPU, but the VM is definitely not up, because "Open shell" only shows "Starting primary". Interestingly, I can still create and use a new instance (test), but my old ones don't work.

What I've tried so far without success:

installing the previous Multipass version
restarting macOS
updating macOS from Monterey to Ventura
switching the driver to VirtualBox, which failed:

alex@alexs-MacBook-Pro ~ % sudo multipass set local.driver=virtualbox
set failed: Invalid setting 'local.driver=virtualbox': Invalid driver

running uninstall.sh (without removing the VMs) and then reinstalling

To summarize, here's the current behavior:

start primary
qemu process eats 100% CPU
start fails after timeout
qemu process still exists but doesn't eat CPU now
in the menu bar, VM state is Unknown
try to stop primary
nothing happens; the qemu process never dies (you'll see a crash in the log because I terminated it manually after a while)

Logs

[2023-02-19T13:57:29.963] [debug] [update] Latest Multipass release available is version 1.11.1
[2023-02-19T13:57:31.332] [info] [VMImageHost] Did not find any supported products in "appliance"
[2023-02-19T13:57:31.356] [info] [rpc] gRPC listening on unix:/var/run/multipass_socket
[2023-02-19T13:57:31.369] [debug] [qemu-img] [7142] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/work/ubuntu-22.04-server-cloudimg-arm64.img
[2023-02-19T13:57:31.861] [info] [sshfs-mount-handler] initializing mount /Users/alex/Downloads => /home/ubuntu/host_downloads in 'work'
[2023-02-19T13:57:31.865] [debug] [qemu-img] [7144] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/test/ubuntu-22.04-server-cloudimg-arm64.img
[2023-02-19T13:57:31.879] [debug] [qemu-img] [7145] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img
[2023-02-19T13:57:31.884] [warning] [Qt] QIODevice::write (QFile, "/var/root/Library/Caches/multipassd/qemu/vault/multipassd-image-records.json"): device not open
[2023-02-19T13:57:31.884] [info] [daemon] Starting Multipass 1.11.1+mac
[2023-02-19T13:57:31.884] [info] [daemon] Daemon arguments: /Library/Application Support/com.canonical.multipass/bin/multipassd --verbosity debug
[2023-02-19T13:57:52.983] [debug] [primary] process working dir ''
[2023-02-19T13:57:52.983] [info] [primary] process program 'qemu-system-aarch64'
[2023-02-19T13:57:52.983] [info] [primary] process arguments '-machine, virt,highmem=off, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, cortex-a72, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:31:66:61, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 8, -m, 8192M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso'
[2023-02-19T13:57:53.006] [debug] [qemu-system-aarch64] [7171] started: qemu-system-aarch64 -machine virt,highmem=off -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.PUycAu
[2023-02-19T13:57:53.806] [info] [primary] process state changed to Starting
[2023-02-19T13:57:53.808] [info] [primary] process state changed to Running
[2023-02-19T13:57:53.808] [debug] [qemu-system-aarch64] [7173] started: qemu-system-aarch64 -machine virt,highmem=off -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu cortex-a72 -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:31:66:61 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 8 -m 8192M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso
[2023-02-19T13:57:53.808] [info] [primary] process started
[2023-02-19T13:57:53.809] [debug] [primary] Waiting for SSH to be up
[2023-02-19T13:57:54.227] [debug] [primary] QMP: {"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 7}, "package": ""}, "capabilities": ["oob"]}}

[2023-02-19T13:57:54.314] [debug] [primary] QMP: {"return": {}}

[2023-02-19T14:11:00.991] [info] [daemon] Cannot open ssh session on "primary" shutdown: ssh connection failed: 'Timeout connecting to 192.168.64.3'
[2023-02-19T14:11:00.993] [debug] [primary] QMP: {"return": {}}

[2023-02-19T14:11:00.993] [debug] [primary] QMP: {"timestamp": {"seconds": 1676808660, "microseconds": 993520}, "event": "POWERDOWN"}

[2023-02-19T14:11:00.993] [info] [primary] VM powering down
[2023-02-19T14:28:42.735] [error] [primary] process error occurred Crashed program: qemu-system-aarch64; error: Process crashed
[2023-02-19T14:28:42.737] [info] [primary] process state changed to NotRunning
[2023-02-19T14:28:42.737] [error] [primary] error: program: qemu-system-aarch64; error: Process crashed

Additional info

OS: macOS Ventura 13.2.1
multipass version: 1.11.1+mac
multipass info --all

Name:           primary
State:          Stopped
IPv4:           --
Release:        --
Image hash:     9620f479bd5a (Ubuntu 22.04 LTS)
CPU(s):         --
Load:           --
Disk usage:     --
Memory usage:   --
Mounts:         --

Name:           test
State:          Stopped
IPv4:           --
Release:        --
Image hash:     d044311b6e2d (Ubuntu 22.04 LTS)
CPU(s):         --
Load:           --
Disk usage:     --
Memory usage:   --
Mounts:         --

Name:           work
State:          Stopped
IPv4:           --
Release:        --
Image hash:     61b29e585d5b (Ubuntu 22.04 LTS)
CPU(s):         --
Load:           --
Disk usage:     --
Memory usage:   --
Mounts:         /Users/alex/Downloads => /home/ubuntu/host_downloads
                    UID map: 501:default
                    GID map: 20:default

multipass get local.driver: qemu

Is there anything I can do to at least access the data from my VMs?


OleksandrKvl · 2023-02-19T15:35:41Z

I also tried manually replacing the test image file with the primary one and got the same behavior, so I think there's something wrong with the image itself. However, I don't understand how two images could become broken at the same time. Can you suggest a tool I can use to extract or convert that image file into something readable?

luis4a0 · 2023-02-20T12:15:02Z

Hi @OleksandrKvl! Let us understand your problem first. You are running on a Mac with an Apple Silicon processor, right? The VirtualBox driver works only on Intel Macs.

I don't think this is an issue of image corruption. I'd first try unloading the Multipass daemon, killing any running QEMU processes, and starting it again:

sudo launchctl unload /Library/LaunchDaemons/com.canonical.multipassd.plist
sudo killall -9 qemu-system-aarch64
sudo launchctl load /Library/LaunchDaemons/com.canonical.multipassd.plist

Do you have the same issue after this?
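The three commands above can be wrapped into a small helper that also waits until the killed QEMU processes have actually exited before launchd reloads the daemon. This is a minimal sketch: the function name restart_multipassd and the 10-second limit are arbitrary choices, not part of Multipass.

```shell
#!/bin/sh
# Sketch: unload multipassd, kill leftover QEMU processes, wait briefly for
# them to disappear, then load the daemon again.
# restart_multipassd and the 10-try limit are hypothetical/arbitrary.
PLIST=/Library/LaunchDaemons/com.canonical.multipassd.plist

restart_multipassd() {
  sudo launchctl unload "$PLIST"
  sudo killall -9 qemu-system-aarch64   # returns non-zero if none are running; we carry on
  tries=0
  # Poll once per second, at most 10 times, while a QEMU process is still listed.
  while pgrep -x qemu-system-aarch64 >/dev/null && [ "$tries" -lt 10 ]; do
    sleep 1
    tries=$((tries + 1))
  done
  sudo launchctl load "$PLIST"
}
```

Call it as restart_multipassd, then try multipass start primary again.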

OleksandrKvl · 2023-02-20T14:35:37Z

Yes, I'm on Apple Silicon; thanks for letting me know about VirtualBox. Maybe it makes sense to update the documentation, because it presents VirtualBox as an alternative without mentioning a specific architecture:

By default, Multipass on macOS uses hyperkit driver for the Intel macOS and the qemu driver for the M1 macOS. However, an alternative option is to use VirtualBox.

Yes, I got the same result after those commands. If the problem were Multipass itself, I'd expect all instances to become unusable. In my case I can create and use a new instance, but not the old ones.

luis4a0 · 2023-02-20T14:46:10Z

Maybe it makes sense to update documentation because it says that VirtualBox is an alternative without mentioning specific architecture

Absolutely, we must update the documentation.
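Until the documentation mentions this, a quick way to see which case applies is to ask macOS whether it is running on Apple Silicon. This is a minimal sketch under the assumption that the hw.optional.arm64 sysctl reads 1 on Apple Silicon (including from Rosetta-translated shells) and is 0 or absent on Intel Macs; the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether the VirtualBox driver is an option on this Mac.
# Assumption: hw.optional.arm64 is 1 on Apple Silicon and 0/absent on Intel.
is_apple_silicon() {
  [ "$(sysctl -n hw.optional.arm64 2>/dev/null)" = "1" ]
}

driver_advice() {
  if is_apple_silicon; then
    echo "Apple Silicon: keep local.driver=qemu; VirtualBox is not supported"
  else
    echo "Intel: VirtualBox can be used as an alternative driver"
  fi
}
```

Running driver_advice before multipass set local.driver=virtualbox would have flagged the "Invalid driver" error from the original report in advance.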

About the bug: maybe the image was indeed corrupted while Multipass was updating. Can you please first run

sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img check /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img

and send us the output?

Thanks!

OleksandrKvl · 2023-02-20T14:59:05Z

The output is quite big; neither GitHub nor pastebin accepts it as is, so I put it on godbolt: https://godbolt.org/z/K3M1ecGcM

luis4a0 · 2023-02-20T15:21:58Z

Ok, so maybe the image is corrupted, but that doesn't mean you lost data. You can try to fix it by using -r all:

sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img check -r all /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img

and, following that,

echo $?

to see the exit code of the run. Can you please do it and share both outputs with us?

Thanks!
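The exit code matters because qemu-img check reports the image state through it. The sketch below maps the exit statuses documented for qemu-img check to text and runs the check for a named instance. The paths and image file name are the ones in the daemon log above; the helper names describe_check_status and check_instance are hypothetical.

```shell
#!/bin/sh
# Sketch: run `qemu-img check` on a Multipass instance image and explain the
# exit status. Paths come from the daemon log in this issue.
MP_BIN="/Library/Application Support/com.canonical.multipass/bin"
VAULT="/var/root/Library/Application Support/multipassd/qemu/vault/instances"
IMG="ubuntu-22.04-server-cloudimg-arm64.img"

# Exit statuses as documented for `qemu-img check`; with -r they describe
# the state after the repair attempt.
describe_check_status() {
  case "$1" in
    0)  echo "no errors: the image is consistent" ;;
    1)  echo "check not completed because of internal errors" ;;
    2)  echo "check completed: the image is corrupted" ;;
    3)  echo "check completed: leaked clusters, but the image is not corrupted" ;;
    63) echo "checks are not supported by this image format" ;;
    *)  echo "unexpected exit status $1" ;;
  esac
}

# check_instance NAME [extra qemu-img check args, e.g. -r all]
check_instance() {
  ci_name="$1"; shift
  sudo "$MP_BIN/qemu-img" check "$@" "$VAULT/$ci_name/$IMG"
  ci_rc=$?
  echo "$ci_name: $(describe_check_status "$ci_rc")"
  return "$ci_rc"
}
```

For example, check_instance primary runs a read-only check, and check_instance primary -r all attempts the repair. Before repairing, make sure no qemu-system-aarch64 process still has the image open.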

OleksandrKvl · 2023-02-20T15:46:37Z

Here it is: https://godbolt.org/z/s9bE8G65T. It looks like everything was repaired, but the primary and work instances still don't work.

luis4a0 · 2023-02-20T17:13:34Z

Ok, let's see if there is some snapshot in the image:

sudo /Library/Application\ Support/com.canonical.multipass/bin/qemu-img snapshot -l /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img

OleksandrKvl · 2023-02-20T18:56:56Z

Nope, both VMs have no snapshots
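To check both instances the same way, you can loop over their image files. This is a minimal sketch that assumes every instance uses the image file name shown in the daemon log. The helper name list_snapshots is hypothetical. qemu-img snapshot -l takes only the image path.

```shell
#!/bin/sh
# Sketch: list internal qcow2 snapshots for several Multipass instances.
# Paths and image name are taken from the daemon log in this issue.
MP_BIN="/Library/Application Support/com.canonical.multipass/bin"
VAULT="/var/root/Library/Application Support/multipassd/qemu/vault/instances"
IMG="ubuntu-22.04-server-cloudimg-arm64.img"

list_snapshots() {
  for name in "$@"; do
    echo "== $name"
    # The vault lives under /var/root, so reading it needs root.
    sudo "$MP_BIN/qemu-img" snapshot -l "$VAULT/$name/$IMG"
  done
}
```

Usage: list_snapshots primary work.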

luis4a0 · 2023-02-21T11:00:00Z

Ok, that's unfortunate. If the instances don't work, the last option is to recover the data. For this, the image files need to be mounted in another instance (you said new instances work, so you can launch a new one for this or use one that is already running). The steps would be:

copy the image file (for instance, /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img) with sudo to a folder you own, and change its ownership so you can access the file;
mount that folder in the new instance;
shell into the new instance and run the following:

sudo apt install qemu-utils
sudo modprobe nbd max_part=8
sudo qemu-nbd --connect=/dev/nbd0 image.qcow2
sudo mount /dev/nbd0p1 /some_new_mount_folder

(Of course, replace the paths in the example with yours.) You'll then have the contents of the corrupted image file in /some_new_mount_folder. Good luck!

Let us know how it goes. Thanks!
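These recovery steps can be sketched as two helpers, one for the macOS host and one for inside the rescue instance, plus a cleanup helper. The function names, the rescue instance name, the host folder and the guest mount point are all hypothetical placeholders. The vault path and image name come from the log above. The --format=qcow2 flag reflects the format=qcow2 shown in the QEMU arguments, and partition 1 is the one from the steps above.

```shell
#!/bin/sh
# Sketch of the data-recovery steps, split by where each part runs.
VAULT="/var/root/Library/Application Support/multipassd/qemu/vault/instances"
IMG="ubuntu-22.04-server-cloudimg-arm64.img"

# --- On the macOS host ---------------------------------------------------
# prepare_image_on_host SOURCE_INSTANCE RESCUE_INSTANCE HOST_DIR
prepare_image_on_host() {
  mkdir -p "$3" || return
  sudo cp "$VAULT/$1/$IMG" "$3/" || return          # the vault is under /var/root, so root is needed
  sudo chown "$(id -un)" "$3/$IMG" || return        # make the copy yours
  multipass mount "$3" "$2:/home/ubuntu/img_mount"  # expose the folder in the rescue instance
}

# --- Inside the rescue instance ------------------------------------------
# mount_image_in_guest IMAGE_PATH MOUNT_POINT
mount_image_in_guest() {
  sudo apt-get install -y qemu-utils || return
  sudo modprobe nbd max_part=8 || return            # max_part lets the kernel create nbd0pN devices
  sudo qemu-nbd --format=qcow2 --connect=/dev/nbd0 "$1" || return
  sudo mkdir -p "$2" || return
  sudo mount /dev/nbd0p1 "$2"                       # partition 1, as in the steps above
}

# unmount_image_in_guest MOUNT_POINT: undo the above when you are done.
unmount_image_in_guest() {
  sudo umount "$1"
  sudo qemu-nbd --disconnect /dev/nbd0
}
```

For example, on the host: prepare_image_on_host primary rescue ~/img_mount. Then inside the instance: mount_image_in_guest ~/img_mount/ubuntu-22.04-server-cloudimg-arm64.img ~/vm_image. Because the guest works on a copy, the original file in the vault is never modified.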

OleksandrKvl · 2023-02-21T13:26:33Z

@luis4a0 thanks for your help! I tried those commands and here's what I got:

# is this correct? you used the .qcow2 extension
sudo qemu-nbd --connect=/dev/nbd0 ~/img_mount/ubuntu-22.04-server-cloudimg-arm64.img

# got an error with nbd0p1
sudo mount /dev/nbd0p1 /home/ubuntu/vm_image
mount: /home/ubuntu/vm_image: special device /dev/nbd0p1 does not exist.

# and with nbd0 used in qemu-nbd command
sudo mount /dev/nbd0 /home/ubuntu/vm_image
mount: /home/ubuntu/vm_image: wrong fs type, bad option, bad superblock on /dev/nbd0, missing codepage or helper program, or other error.

luis4a0 · 2023-02-21T13:28:24Z

Ok, can you try on the instance sudo fdisk -l /dev/nbd0? This will display the partitions on the image.
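Besides fdisk, a couple of standard tools can show what qemu-nbd actually exposes. This is a minimal sketch. The helper name inspect_nbd is hypothetical, and lsblk and file -s are extra diagnostics that were not part of the original suggestion.

```shell
#!/bin/sh
# Sketch: inspect the NBD device before trying to mount anything on it.
inspect_nbd() {
  dev="${1:-/dev/nbd0}"
  sudo fdisk -l "$dev"                      # partition table, as asked above
  lsblk -o NAME,SIZE,FSTYPE,LABEL "$dev"    # partition devices the kernel created, with fs types
  sudo file -s "$dev"                       # identify the data at the start of the device
}
```

Run it inside the rescue instance while the image is still connected. With no arguments, it inspects /dev/nbd0.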

luis4a0 · 2023-02-21T13:29:01Z

And yes, it was my mistake: using .img extension is correct.

OleksandrKvl · 2023-02-21T13:32:53Z

$ sudo fdisk -l /dev/nbd0
Disk /dev/nbd0: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Looks like there are no partitions?

luis4a0 · 2023-02-21T13:44:41Z

Indeed, there are no more partitions 😢

The only thing left to do is use a rescue program like testdisk (install it with sudo apt install testdisk). I'm afraid I can't help you further in your particular case.

On the other hand, we can't tell when the image file was corrupted, or whether the upgrade or something else caused it.

Good luck with recovering!
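TestDisk can be pointed directly at the NBD device from the earlier steps. This is a minimal sketch that assumes the image copy is still connected as /dev/nbd0. The helper name is hypothetical. TestDisk is interactive, and its /log option makes it write testdisk.log to the current directory.

```shell
#!/bin/sh
# Sketch: install TestDisk and start it on the connected image copy.
rescue_with_testdisk() {
  dev="${1:-/dev/nbd0}"
  sudo apt-get install -y testdisk || return
  sudo testdisk /log "$dev"   # interactive: analyse the device and search for lost partitions
}
```

Because /dev/nbd0 is backed by the copy in the shared folder, anything TestDisk writes (for example, a rebuilt partition table) leaves the original image in the Multipass vault untouched.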

OleksandrKvl · 2023-02-21T13:52:14Z

@luis4a0 thank you again, I'll try my luck with testdisk. I think the issue can be closed, as there's nothing in particular to investigate; feel free to reopen it if you want.

OleksandrKvl added the bug label Feb 19, 2023

OleksandrKvl closed this as completed Feb 21, 2023

ricab mentioned this issue Mar 1, 2024

Mac M1 machine cannot start the container. timed out waiting for response #3411

Open
