Existing instances suddenly stopped working #2931

Closed
OleksandrKvl opened this issue Feb 19, 2023 · 16 comments

@OleksandrKvl

OleksandrKvl commented Feb 19, 2023

Describe the bug
I have two Ubuntu 22.04 instances (primary and work), and both of them just stopped working. I did several things before that, but unfortunately I can't remember the order and I'm not sure they are related. One was a mount command on my primary instance: the command itself ran without a problem, but I didn't see the folder actually being mapped (so I unmounted it later). The other was updating Multipass to 1.11.1. Multipass was running those two instances when I ran the installer, and the update itself went OK.
Then strange things began. While the VM was still working, I couldn't run htop on it; it failed with the error cannot execute binary file: Exec format error. After a while, macOS crashed. Now I can't start either of my old VMs. In Activity Monitor I see the process named qemu-system-aarch64 using 100% CPU, but the VM is definitely not up, because "Open shell" shows only Starting primary. Interestingly, I can still create and use a new instance (test), but my old ones don't work.

What I've tried so far without success:

  • install previous Multipass version
  • macOS restart
  • update macOS from Monterey to Ventura
  • switching driver to VirtualBox failed:
alex@alexs-MacBook-Pro ~ % sudo multipass set local.driver=virtualbox
set failed: Invalid setting 'local.driver=virtualbox': Invalid driver
  • running uninstall.sh (without removing the VMs) and then reinstalling

To summarize, here's the current behavior:

  • start primary
  • qemu process eats 100% CPU
  • start fails after timeout
  • qemu process still exists but doesn't eat CPU now
  • in the menu bar, VM state is Unknown
  • try to stop primary
  • nothing happens, qemu process never dies (you'll see a crash in the log because I terminated it manually after a while)

Logs

[2023-02-19T13:57:29.963] [debug] [update] Latest Multipass release available is version 1.11.1
[2023-02-19T13:57:31.332] [info] [VMImageHost] Did not find any supported products in "appliance"
[2023-02-19T13:57:31.356] [info] [rpc] gRPC listening on unix:/var/run/multipass_socket
[2023-02-19T13:57:31.369] [debug] [qemu-img] [7142] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/work/ubuntu-22.04-server-cloudimg-arm64.img
[2023-02-19T13:57:31.861] [info] [sshfs-mount-handler] initializing mount /Users/alex/Downloads => /home/ubuntu/host_downloads in 'work'
[2023-02-19T13:57:31.865] [debug] [qemu-img] [7144] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/test/ubuntu-22.04-server-cloudimg-arm64.img
[2023-02-19T13:57:31.879] [debug] [qemu-img] [7145] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img
[2023-02-19T13:57:31.884] [warning] [Qt] QIODevice::write (QFile, "/var/root/Library/Caches/multipassd/qemu/vault/multipassd-image-records.json"): device not open
[2023-02-19T13:57:31.884] [info] [daemon] Starting Multipass 1.11.1+mac
[2023-02-19T13:57:31.884] [info] [daemon] Daemon arguments: /Library/Application Support/com.canonical.multipass/bin/multipassd --verbosity debug
[2023-02-19T13:57:52.983] [debug] [primary] process working dir ''
[2023-02-19T13:57:52.983] [info] [primary] process program 'qemu-system-aarch64'
[2023-02-19T13:57:52.983] [info] [primary] process arguments '-machine, virt,highmem=off, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -cpu, cortex-a72, -nic, vmnet-shared,model=virtio-net-pci,mac=52:54:00:31:66:61, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 8, -m, 8192M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso'
[2023-02-19T13:57:53.006] [debug] [qemu-system-aarch64] [7171] started: qemu-system-aarch64 -machine virt,highmem=off -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.PUycAu
[2023-02-19T13:57:53.806] [info] [primary] process state changed to Starting
[2023-02-19T13:57:53.808] [info] [primary] process state changed to Running
[2023-02-19T13:57:53.808] [debug] [qemu-system-aarch64] [7173] started: qemu-system-aarch64 -machine virt,highmem=off -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu cortex-a72 -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:31:66:61 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 8 -m 8192M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso
[2023-02-19T13:57:53.808] [info] [primary] process started
[2023-02-19T13:57:53.809] [debug] [primary] Waiting for SSH to be up
[2023-02-19T13:57:54.227] [debug] [primary] QMP: {"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 7}, "package": ""}, "capabilities": ["oob"]}}

[2023-02-19T13:57:54.314] [debug] [primary] QMP: {"return": {}}

[2023-02-19T14:11:00.991] [info] [daemon] Cannot open ssh session on "primary" shutdown: ssh connection failed: 'Timeout connecting to 192.168.64.3'
[2023-02-19T14:11:00.993] [debug] [primary] QMP: {"return": {}}

[2023-02-19T14:11:00.993] [debug] [primary] QMP: {"timestamp": {"seconds": 1676808660, "microseconds": 993520}, "event": "POWERDOWN"}

[2023-02-19T14:11:00.993] [info] [primary] VM powering down
[2023-02-19T14:28:42.735] [error] [primary] process error occurred Crashed program: qemu-system-aarch64; error: Process crashed
[2023-02-19T14:28:42.737] [info] [primary] process state changed to NotRunning
[2023-02-19T14:28:42.737] [error] [primary] error: program: qemu-system-aarch64; error: Process crashed

Additional info

  • OS: macOS Ventura 13.2.1
  • multipass version: 1.11.1+mac
  • multipass info --all
Name:           primary
State:          Stopped
IPv4:           --
Release:        --
Image hash:     9620f479bd5a (Ubuntu 22.04 LTS)
CPU(s):         --
Load:           --
Disk usage:     --
Memory usage:   --
Mounts:         --

Name:           test
State:          Stopped
IPv4:           --
Release:        --
Image hash:     d044311b6e2d (Ubuntu 22.04 LTS)
CPU(s):         --
Load:           --
Disk usage:     --
Memory usage:   --
Mounts:         --

Name:           work
State:          Stopped
IPv4:           --
Release:        --
Image hash:     61b29e585d5b (Ubuntu 22.04 LTS)
CPU(s):         --
Load:           --
Disk usage:     --
Memory usage:   --
Mounts:         /Users/alex/Downloads => /home/ubuntu/host_downloads
                    UID map: 501:default
                    GID map: 20:default
  • multipass get local.driver: qemu

Is there anything I can do to at least access the data from my VMs?

@OleksandrKvl
Author

I also tried manually replacing the test image file with the primary one and got the same behavior, so I think there's something wrong with the image itself. However, I don't understand how two images can become broken at the same time. Can you suggest a tool I can use to extract or convert that image file to something readable?

@luis4a0
Contributor

luis4a0 commented Feb 20, 2023

Hi @OleksandrKvl! Let us understand your problem first. You are running on a Mac with an Apple Silicon processor, right? The VirtualBox driver works only on Intel.

I don't think this is an image corruption issue. I'd first try unloading Multipass, killing the running QEMU processes, and loading it again:

sudo launchctl unload /Library/LaunchDaemons/com.canonical.multipassd.plist
sudo killall -9 qemu-system-aarch64
sudo launchctl load /Library/LaunchDaemons/com.canonical.multipassd.plist

Do you have the same issue after this?

@OleksandrKvl
Author

Yes, I'm on Apple Silicon, thanks for letting me know about VirtualBox. Maybe it makes sense to update the documentation, because it says that VirtualBox is an alternative without mentioning a specific architecture:

By default, Multipass on macOS uses hyperkit driver for the Intel macOS and the qemu driver for the M1 macOS. However, an alternative option is to use VirtualBox.

Yes, I got the same result after those commands. If the problem were Multipass itself, I'd expect all instances to become unusable. In my case I can create and use a new instance, but not the old ones.

@luis4a0
Contributor

luis4a0 commented Feb 20, 2023

Maybe it makes sense to update documentation because it says that VirtualBox is an alternative without mentioning specific architecture

Absolutely, we must update the documentation.

About the bug, maybe the image was indeed corrupted while Multipass was updating. Can you please first run

/Library/Application\ Support/com.canonical.multipass/bin/qemu-img check /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img

and send us the output?

Thanks!

@OleksandrKvl
Author

The output is quite big, neither github nor pastebin accepts it as is so I put it on godbolt: https://godbolt.org/z/K3M1ecGcM

@luis4a0
Contributor

luis4a0 commented Feb 20, 2023

Ok, so maybe the image is corrupted but that doesn't imply you lost data. You can try to fix it by using -r all:

/Library/Application\ Support/com.canonical.multipass/bin/qemu-img check -r all /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img

and, following that,

echo $?

to see the exit code of the run. Can you please do it and share both outputs with us?
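
For reading that exit code, here is a small sketch that maps it to the meanings listed in the qemu-img man page. The function name describe_qemu_img_check is made up for illustration. Note that with -r, the code refers to the image state after the repair attempt, so a successful -r all yields 0.

```shell
# Map a `qemu-img check` exit status to a short description.
# Meanings follow the qemu-img man page; the function name is hypothetical.
describe_qemu_img_check() {
  case "$1" in
    0)  echo "check completed, image is consistent" ;;
    1)  echo "check not completed because of internal errors" ;;
    2)  echo "check completed, image is corrupted" ;;
    3)  echo "check completed, image has leaked clusters but is not corrupted" ;;
    63) echo "checks are not supported by the image format" ;;
    *)  echo "unknown exit status: $1" ;;
  esac
}

# Usage (IMG is a placeholder for the image path):
#   qemu-img check -r all "$IMG"; describe_qemu_img_check "$?"
```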

Thanks!

@OleksandrKvl
Author

Here it is: https://godbolt.org/z/s9bE8G65T. Looks like everything is repaired but still primary and work instances don't work.

@luis4a0
Contributor

luis4a0 commented Feb 20, 2023

Ok, let's see if there is some snapshot in the image:

/Library/Application\ Support/com.canonical.multipass/bin/qemu-img snapshot -l /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img

@OleksandrKvl
Author

Nope, both VMs have no snapshots

@luis4a0
Contributor

luis4a0 commented Feb 21, 2023

Ok, that's unfortunate. If the instances don't work, the last option would be to recover the data. For this, the image files need to be mounted on another instance (you told me the new instances are working, so you can launch a new one for this, or use an already running one). The steps for doing this would be:

  • copy the image file (for instance, /var/root/Library/Application\ Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img) with sudo to a folder you own, and change the ownership so you can access the file;
  • mount that folder in the new instance;
  • shell into the new instance, and run the following:
sudo apt install qemu-utils
sudo modprobe nbd max_part=8
sudo qemu-nbd --connect=/dev/nbd0 image.qcow2
sudo mount /dev/nbd0p1 /some_new_mount_folder

(Of course, replace the paths in the example with yours.) You'll then have the contents of the corrupted image file in /some_new_mount_folder. Good luck!
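
A slightly more defensive variant of the steps above, as a sketch only: it attaches the copy read-only, lists the partitions before mounting, and detaches the device at the end. The paths in IMG, NBD_DEV and MNT, the run wrapper and the recover_steps function are placeholders made up for illustration, and mounting p1 assumes the partition table is still readable. By default the script only prints the commands.

```shell
# IMG, NBD_DEV and MNT are hypothetical placeholders; adjust them to your setup.
IMG="${IMG:-$HOME/img_mount/image-copy.img}"
NBD_DEV="${NBD_DEV:-/dev/nbd0}"
MNT="${MNT:-/mnt/recovered}"

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

recover_steps() {
  run sudo modprobe nbd max_part=8
  # Attach read-only so nothing is written to the image copy.
  run sudo qemu-nbd --read-only --format=qcow2 --connect="$NBD_DEV" "$IMG"
  # Check which partitions exist before trying to mount one.
  run sudo fdisk -l "$NBD_DEV"
  run sudo mkdir -p "$MNT"
  run sudo mount -o ro "${NBD_DEV}p1" "$MNT"
  # ... copy the data you need out of $MNT here ...
  run sudo umount "$MNT"
  # Detach the device when done.
  run sudo qemu-nbd --disconnect "$NBD_DEV"
}

recover_steps
```

If fdisk shows no partitions, the mount step will fail and there is nothing to copy this way.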

Let us know how it goes. Thanks!

@OleksandrKvl
Author

@luis4a0 thanks for your help! I tried those commands and here's what I got:

# is it correct? you used .qcow2 extension
sudo qemu-nbd --connect=/dev/nbd0 ~/img_mount/ubuntu-22.04-server-cloudimg-arm64.img

# got an error with nbd0p1
sudo mount /dev/nbd0p1 /home/ubuntu/vm_image
mount: /home/ubuntu/vm_image: special device /dev/nbd0p1 does not exist.

# and with nbd0 used in qemu-nbd command
sudo mount /dev/nbd0 /home/ubuntu/vm_image
mount: /home/ubuntu/vm_image: wrong fs type, bad option, bad superblock on /dev/nbd0, missing codepage or helper program, or other error.

@luis4a0
Contributor

luis4a0 commented Feb 21, 2023

Ok, can you run sudo fdisk -l /dev/nbd0 on the instance? This will display the partitions on the image.

@luis4a0
Contributor

luis4a0 commented Feb 21, 2023

And yes, it was my mistake: using .img extension is correct.

@OleksandrKvl
Author

$ sudo fdisk -l /dev/nbd0
Disk /dev/nbd0: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Looks like there are no partitions?

@luis4a0
Contributor

luis4a0 commented Feb 21, 2023

Indeed, there are no more partitions 😢

The only thing left to do is to use a rescue program like testdisk (install with sudo apt install testdisk). I'm afraid I can't help you more in your particular case.
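
One possible way to give testdisk something to work on without touching the original is to convert a copy of the qcow2 image to a raw file first. This is only a sketch: the paths in SRC and RAW, the run wrapper and the function name are made up for illustration, and it assumes qemu-utils and testdisk are installed in the working instance. By default it only prints the commands.

```shell
# SRC and RAW are hypothetical placeholder paths; adjust them.
SRC="${SRC:-$HOME/img_mount/image-copy.img}"
RAW="${RAW:-$HOME/img_mount/image-copy.raw}"

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

raw_copy_for_testdisk() {
  # Convert the qcow2 copy to a raw image (may need free space up to the virtual disk size).
  run qemu-img convert -f qcow2 -O raw "$SRC" "$RAW"
  # testdisk accepts an image file as its argument and scans it interactively.
  run testdisk "$RAW"
}

raw_copy_for_testdisk
```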

OTOH, we can't tell when the image file was corrupted, or whether it was caused by the upgrade or by something else.

Good luck with recovering!

@OleksandrKvl
Author

@luis4a0 thank you again, will try my luck with testdisk. I think the issue can be closed as there's nothing particular to investigate, feel free to reopen it if you want.
