Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Big Sur] VMs turn to "Starting" state and won't budge #1839

Closed
flying-sausages opened this issue Nov 16, 2020 · 14 comments
Closed

[Big Sur] VMs turn to "Starting" state and won't budge #1839

flying-sausages opened this issue Nov 16, 2020 · 14 comments
Labels

Comments

@flying-sausages
Copy link

flying-sausages commented Nov 16, 2020

Describe the bug
When I come back to my machine in the morning, my VMs are in a "Starting" state and I cannot get them to shut down, start up, suspend or anything. Deleting and purging works but i'd prefer not to have to wait for my environment to setup every time this happens.

To Reproduce
How, and what happened?

  1. Start some VMs with multipass
  2. Let some time pass while not using them, maybe put host to sleep
  3. Come back, see all previously running machines are now in "Starting" state

Expected behavior
Not having to kill/delete+purge instances

Logs
Please provide logs from the daemon, see accessing logs on where to find them on your platform.

[2020-11-16T02:53:29.033] [info] [update] Failed to fetch update info: failed to download from 'https://multipass.run/static/latest-release.json': Host multipass.run not found
[2020-11-16T02:53:30.352] [info] [rpc] gRPC listening on unix:/var/run/multipass_socket, SSL:on
[2020-11-16T02:53:30.357] [info] [daemon] primary needs starting. Starting now...
[2020-11-16T02:53:30.362] [info] [daemon] Starting Multipass 1.5.0+mac
[2020-11-16T02:53:30.362] [info] [daemon] Daemon arguments: /Library/Application Support/com.canonical.multipass/bin/multipassd --verbosity debug
[2020-11-16T02:53:30.366] [warning] [url downloader] Cannot retrieve last modified date for https://cloud-images.ubuntu.com/releases/streams/v1/index.json: Host cloud-images.ubuntu.com not found. Using cached data instead.
[2020-11-16T02:53:30.366] [debug] [primary] process working dir ''
[2020-11-16T02:53:30.366] [info] [primary] process program '/Library/Application Support/com.canonical.multipass/bin/hyperkit'
[2020-11-16T02:53:30.366] [info] [primary] process arguments '-c, 1, -m, 1024M, -u, -A, -H, -U, 386bba5a-5dc4-3ac2-95c9-cf0b9a29b352, -s, 0:0,hostbridge, -s, 2:0,virtio-net, -s, 5,virtio-rnd, -s, 31,lpc, -l, com1,autopty=/var/root/Library/Application Support/multipassd/vault/instances/primary/pty,log=/Library/Logs/Multipass/primary-hyperkit.log, -s, 1:0,virtio-blk,file:///var/root/Library/Application Support/multipassd/vault/instances/primary/ubuntu-20.04-server-cloudimg-amd64.img?sync=os&buffered=1,format=qcow,qcow-config=discard=true;compact_after_unmaps=262144;keep_erased=262144;runtime_asserts=false, -s, 1:1,ahci-cd,/var/root/Library/Application Support/multipassd/vault/instances/primary/cloud-init-config.iso, -f, kexec,/var/root/Library/Application Support/multipassd/vault/instances/primary/ubuntu-20.04-server-cloudimg-amd64-vmlinuz-generic,/var/root/Library/Application Support/multipassd/vault/instances/primary/ubuntu-20.04-server-cloudimg-amd64-initrd-generic,earlyprintk=serial console=ttyS0 root=/dev/vda1 rw panic=1 no_timer_check'
[2020-11-16T02:53:30.367] [info] [primary] process state changed to Starting
[2020-11-16T02:53:30.368] [debug] [primary] Trying SSH on 192.168.118.5:22
[2020-11-16T02:53:30.371] [warning] [url downloader] Cannot retrieve last modified date for https://cloud-images.ubuntu.com/releases/streams/v1/com.ubuntu.cloud:released:download.json: Host cloud-images.ubuntu.com not found. Using cached data instead.
[2020-11-16T02:53:30.383] [info] [primary] process state changed to Running
[2020-11-16T02:53:30.479] [error] [primary] Using fd 5 for I/O notifications
[2020-11-16T02:53:30.481] [error] [primary] hv_vm_create unknown error -85377023
[2020-11-16T02:53:30.481] [error] [primary] process error occurred Crashed
[2020-11-16T02:53:30.481] [info] [primary] process state changed to NotRunning
[2020-11-16T02:53:30.481] [info] [primary] process finished with exit code 6
[2020-11-16T02:53:30.505] [warning] [url downloader] Cannot retrieve last modified date for https://cloud-images.ubuntu.com/daily/streams/v1/index.json: Host cloud-images.ubuntu.com not found. Using cached data instead.
[2020-11-16T02:53:30.507] [warning] [url downloader] Cannot retrieve last modified date for https://cloud-images.ubuntu.com/daily/streams/v1/com.ubuntu.cloud:daily:download.json: Host cloud-images.ubuntu.com not found. Using cached data instead.
[2020-11-16T02:53:30.546] [warning] [url downloader] Cannot retrieve last modified date for http://cdimage.ubuntu.com/ubuntu-core/appliances/streams/v1/index.json: Host cdimage.ubuntu.com not found. Using cached data instead.
[2020-11-16T02:53:30.548] [warning] [url downloader] Cannot retrieve last modified date for http://cdimage.ubuntu.com/ubuntu-core/appliances/streams/v1/com.ubuntu.appliances:released:download.json: Host cdimage.ubuntu.com not found. Using cached data instead.
[2020-11-16T02:53:30.550] [warning] [VMImageHost] Could not update manifest: failed to download from 'http://cdimage.ubuntu.com/ubuntu-core/16/stable/current/ubuntu-core-16-amd64.img.xz': Host cdimage.ubuntu.com not found
[2020-11-16T02:53:30.552] [warning] [VMImageHost] Could not update manifest: failed to download from 'https://cloud-images.ubuntu.com/buildd/releases/bionic/release/bionic-server-cloudimg-amd64-disk.img': Host cloud-images.ubuntu.com not found
[2020-11-16T09:58:47.474] [debug] [primary] Trying SSH on 192.168.118.5:22
[2020-11-16T10:14:31.111] [debug] [primary] Trying SSH on 192.168.118.5:22
[2020-11-16T10:16:00.232] [info] [daemon] Received signal 15 (Terminated: 15)
[2020-11-16T10:16:00.233] [info] [daemon] Goodbye!
[2020-11-16T10:16:00.338] [info] [rpc] gRPC listening on unix:/var/run/multipass_socket, SSL:on
[2020-11-16T10:16:00.344] [info] [daemon] Starting Multipass 1.5.0+mac
[2020-11-16T10:16:00.344] [info] [daemon] Daemon arguments: /Library/Application Support/com.canonical.multipass/bin/multipassd --verbosity debug
[2020-11-16T10:16:01.716] [debug] [update] Latest Multipass release available is version 1.5.0
[2020-11-16T10:16:13.834] [debug] [primary] process working dir ''
[2020-11-16T10:16:13.834] [info] [primary] process program '/Library/Application Support/com.canonical.multipass/bin/hyperkit'
[2020-11-16T10:16:13.835] [info] [primary] process arguments '-c, 1, -m, 1024M, -u, -A, -H, -U, 386bba5a-5dc4-3ac2-95c9-cf0b9a29b352, -s, 0:0,hostbridge, -s, 2:0,virtio-net, -s, 5,virtio-rnd, -s, 31,lpc, -l, com1,autopty=/var/root/Library/Application Support/multipassd/vault/instances/primary/pty,log=/Library/Logs/Multipass/primary-hyperkit.log, -s, 1:0,virtio-blk,file:///var/root/Library/Application Support/multipassd/vault/instances/primary/ubuntu-20.04-server-cloudimg-amd64.img?sync=os&buffered=1,format=qcow,qcow-config=discard=true;compact_after_unmaps=262144;keep_erased=262144;runtime_asserts=false, -s, 1:1,ahci-cd,/var/root/Library/Application Support/multipassd/vault/instances/primary/cloud-init-config.iso, -f, kexec,/var/root/Library/Application Support/multipassd/vault/instances/primary/ubuntu-20.04-server-cloudimg-amd64-vmlinuz-generic,/var/root/Library/Application Support/multipassd/vault/instances/primary/ubuntu-20.04-server-cloudimg-amd64-initrd-generic,earlyprintk=serial console=ttyS0 root=/dev/vda1 rw panic=1 no_timer_check'
[2020-11-16T10:16:13.835] [debug] [primary] Trying SSH on 192.168.118.5:22
[2020-11-16T10:16:13.835] [info] [primary] process state changed to Starting
[2020-11-16T10:16:13.861] [info] [primary] process state changed to Running
[2020-11-16T10:16:13.947] [error] [primary] Using fd 5 for I/O notifications
[2020-11-16T10:16:13.957] [info] [primary] mirage_block_open: block_config = file:///var/root/Library/Application Support/multipassd/vault/instances/primary/ubuntu-20.04-server-cloudimg-amd64.img?sync=os&buffered=1 and qcow_config = discard=true;compact_after_unmaps=262144;keep_erased=262144;runtime_asserts=false and stats_config = None

[2020-11-16T10:16:14.036] [info] [primary] hyperkit: [INFO] Resized file to 45661 clusters (5844608 sectors)
[2020-11-16T10:16:14.036] [info] [primary] hyperkit: [INFO] block recycler starting with keep_erased = 2048
[2020-11-16T10:16:14.037] [info] [primary] hyperkit: [INFO] image has 0 free sectors and 45657 used sectors
[2020-11-16T10:16:14.037] [info] [primary] mirage_block_open: block_config = file:///var/root/Library/Application Support/multipassd/vault/instances/primary/ubuntu-20.04-server-cloudimg-amd64.img?sync=os&buffered=1 and qcow_config = discard=true;compact_after_unmaps=262144;keep_erased=262144;runtime_asserts=false and stats_config = None returning 0
mirage_block_stat

[2020-11-16T10:16:14.037] [info] [primary] /var/root/Library/Application Support/multipassd/vault/instances/primary/cloud-init-config.iso: fcntl(F_PUNCHHOLE) Operation not permitted: block device will not support TRIM/DISCARD
[2020-11-16T10:16:14.540] [error] [primary] vmx_set_ctlreg: cap_field: 4 bit: 12 unspecified don't care: bit is 0
[2020-11-16T10:16:14.540] [error] [primary] vmx_set_ctlreg: cap_field: 4 bit: 20 unspecified don't care: bit is 0
[2020-11-16T10:16:14.540] [error] [primary] vmx_set_ctlreg: cap_field: 3 bit: 13 unspecified don't care: bit is 0
[2020-11-16T10:16:15.043] [info] [primary] rdmsr to register 0x140 on vcpu 0
[2020-11-16T10:16:16.366] [info] [primary] rdmsr to register 0x34 on vcpu 0
[2020-11-16T10:16:19.112] [info] [primary] hyperkit: [INFO] block recycler: triggering background flush: 45657 used; 0 junk; 0 erased; 0 available; 0 copies; 0 roots; 0 Copying; 0 Copied; 0 Flushed; 0 Referenced; max_cluster = 45660
[2020-11-16T10:16:30.329] [info] [primary] hyperkit: [INFO] Allocator: 45657 used; 0 junk; 0 erased; 0 available; 0 copies; 0 roots; 0 Copying; 0 Copied; 0 Flushed; 0 Referenced; max_cluster = 45660
[2020-11-16T10:16:30.330] [info] [primary] hyperkit: [INFO] Allocator: file contains cluster 0 .. 45660 will enlarge file to 0 .. 46172
[2020-11-16T10:16:30.330] [info] [primary] hyperkit: [INFO] resize: adding available clusters (Node ((x 45661) (y 46172) (l Empty) (r Empty) (h 1) (cardinal 512)))
[2020-11-16T10:16:37.216] [info] [sshfs-mounts] mounting /Users/flying-sausages/Git/Fun/swizzin => /Users/flying-sausages/Git/Fun/swizzin in primary
[2020-11-16T10:16:37.216] [info] [sshfs-mounts] process program '/Library/Application Support/com.canonical.multipass/bin/sshfs_server'
[2020-11-16T10:16:37.216] [info] [sshfs-mounts] process arguments '192.168.118.5, 22, ubuntu, /Users/flying-sausages/Git/Fun/swizzin, /Users/flying-sausages/Git/Fun/swizzin, 501:-1,, 20:-1,'
[2020-11-16T10:16:37.216] [debug] [basic_process] starting: /Library/Application Support/com.canonical.multipass/bin/sshfs_server 192.168.118.5 22 ubuntu /Users/flying-sausages/Git/Fun/swizzin /Users/flying-sausages/Git/Fun/swizzin 501:-1, 20:-1,
[2020-11-16T10:17:40.452] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:117 read_stream(type = 0, timeout = -1): 
[2020-11-16T10:17:40.452] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:135 read_stream(): num_bytes = 15
[2020-11-16T10:17:40.452] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:135 read_stream(): num_bytes = 0
[2020-11-16T10:17:40.465] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:117 read_stream(type = 0, timeout = -1): 
[2020-11-16T10:17:40.466] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:135 read_stream(): num_bytes = 9
[2020-11-16T10:17:40.466] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:135 read_stream(): num_bytes = 0
[2020-11-16T10:17:40.474] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:117 read_stream(type = 0, timeout = -1): 
[2020-11-16T10:17:40.475] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:135 read_stream(): num_bytes = 10
[2020-11-16T10:17:40.475] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:135 read_stream(): num_bytes = 0
[2020-11-16T10:17:40.484] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:117 read_stream(type = 0, timeout = -1): 
[2020-11-16T10:17:40.484] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:135 read_stream(): num_bytes = 11
[2020-11-16T10:17:40.484] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:135 read_stream(): num_bytes = 0
[2020-11-16T10:17:40.493] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:117 read_stream(type = 0, timeout = -1): 
[2020-11-16T10:17:40.493] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:135 read_stream(): num_bytes = 11
[2020-11-16T10:17:40.493] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:135 read_stream(): num_bytes = 0
[2020-11-16T10:17:40.539] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:117 read_stream(type = 0, timeout = -1): 
[2020-11-16T10:17:40.539] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:135 read_stream(): num_bytes = 19
[2020-11-16T10:17:40.539] [debug] [ssh process] /Users/cibot/actions-runner/_work/multipass-private/multipass-private/src/ssh/ssh_process.cpp:135 read_stream(): num_bytes = 0

Additional info

  • OS: macOS 11 Big Sur
  • multipass version: 1.5.0+mac
  • multipass info --all: Will update once I am able to reproduce this again, seeing as I binned everything and started again.

Additional context
Current workaround is finding the multipassd process through sudo htop and then killing it. There is no trace of multipass in launchctl

@Saviq
Copy link
Collaborator

Saviq commented Nov 16, 2020

Hi @flying-sausages, when this happens, what do you see in /Library/Logs/Multipass/<instance>-hyperkit.log?

Can you comment on when did you leave / get back to the machine?

This suggests the VM crashed (when the host was going to sleep?):

[2020-11-16T02:53:30.481] [error] [primary] hv_vm_create unknown error -85377023
[2020-11-16T02:53:30.481] [error] [primary] process error occurred Crashed

And then Multipass failed to realize?

[2020-11-16T09:58:47.474] [debug] [primary] Trying SSH on 192.168.118.5:22

Would you please enable trace logging, as that may give us more of a clue?

$ sudo launchctl unload /Library/LaunchDaemons/com.canonical.multipassd.plist
$ sudo /usr/libexec/PlistBuddy -c "Add :ProgramArguments: string --verbosity" -c "Add :ProgramArguments: string trace" /Library/LaunchDaemons/com.canonical.multipassd.plist
$ sudo launchctl load /Library/LaunchDaemons/com.canonical.multipassd.plist

@flying-sausages
Copy link
Author

Thanks for the fast response! I'm going to wait until this reproduces again and will let you know. I've restarted my instances since so I don't have those old logs.

As for the timing, I've left it when I went to bed, and picked it up in the morning. I'll see if this reproduces again over longe periods of sleep.

@ricab
Copy link
Collaborator

ricab commented Nov 16, 2020

Hi @flying-sausages, does this happen only when powering the machine on? If so, this could be a variation of #1652. Multipass comes up and tries to start the VMs that were previously running, but the OS doesn't seem to be ready yet. We could try a 5s or 10s delay in the daemon when on macOS, but that may always fall short. Otherwise, I don't see an infallible way to handle this. There might be a way to interrogate the OS for a fine-grained status or something like that... it would need investigating.

However, I would expect the crashed instance to be stopped (as in the linked issue). You mentioned you weren't sure if you saw a time-out on the command line, right? Let us know if you see one.

Also, you said you could not restart or stop the VM when it's in that state, correct? But after restarting the daemon (as you mention in your workaround) you can recover it, IIUC? If so, I think that's the best option for you in such situations, to avoid dropping the VMs.

@naturaverl
Copy link

naturaverl commented Dec 13, 2020

Hello, I believe I have the same or at least similar problem. It occurs after I reboot my host (macOS Mojave 10.14.6).

If the VMs were stopped before rebooting, I can restart them just fine.
But if any VMs were running when the host reboot, they end up "stuck".

studio:~ jay$ sw_vers 
ProductName:	Mac OS X
ProductVersion:	10.14.6
BuildVersion:	18G6032

studio:~ jay$ multipass version
multipass  1.5.0+mac
multipassd 1.5.0+mac

studio:~ jay$ multipass start -vvvv microstack-vm                                
start failed: The following errors occurred:                                    
Instance stopped while starting

studio:~ jay$ multipass list
Name                    State             IPv4             Image
microstack-vm           Starting          --               Ubuntu 20.04 LTS

studio:~ jay$ multipass stop -vvvv microstack-vm
Stopping microstack-vm /[2020-12-12T19:49:00.640] [info] [daemon] Cannot open ssh session on "microstack-vm" shutdown: ssh connection failed: 'Failed to connect: Host is down'

[2020-12-12T19:49:00.641] [debug] [sshfs-mounts] No mounts to stop for instance "microstack-vm"

studio:~ jay$ multipass restart -vvvv microstack-vm                               
restart failed: instance "microstack-vm" is not running                         

Regardless of the order I do the list, start, stop, restart commands, I always get the same result.

Workarounds:
1- Avoid "stuck" state: Before rebooting, stop the VM. After rebooting, start the VM.
2- Recover from "stuck" state: run command to stop the vm, then reboot again, and then start the VM.

@ricab
Copy link
Collaborator

ricab commented Dec 14, 2020

Hi @naturaverl, in the session you pasted, your stop command could be succeeding (despite the info message about the ssh failure). Then restart would complain if the VM was stopped. You'd need start instead.

If that still doesn't work, you could try multipass delete <vm> + multipass recover <vm>? Hopefully that would force it to stop and provide a workaround. Unloading and reloading the multipass daemon could also be a workaround without a full reboot — the verbose logs would be interesting too 🙂

@flying-sausages
Copy link
Author

I'm going to wait until this reproduces again and will let you know

Sorry I haven’t forgotten about this but my MacBook Pro has some hardware issues so until spare parts come through I can’t verify this.

@rodmaz
Copy link

rodmaz commented Jan 17, 2021

I have had the same problem twice already in my macOS Big Sur

% multipass version
multipass  1.5.0+mac
multipassd 1.5.0+mac

This sucks as the images get into an corrupted state.

@ricab No, multipass delete <vm> + multipass recover <vm> does not fix it. Yes, restarting the daemon does "fix" the image.

@flying-sausages
Copy link
Author

I'm going to wait until this reproduces again and will let you know

Sorry I haven’t forgotten about this but my MacBook Pro has some hardware issues so until spare parts come through I can’t verify this.

So I got an M1-equipped machine in the meantime, guess I can't come back to you on this one.

@Saviq
Copy link
Collaborator

Saviq commented Oct 26, 2021

@flying-sausages v1.8.0 supports the M1 now, too.

All, please reopen if you encounter this again.

@Saviq Saviq closed this as completed Oct 26, 2021
@mn-dimension
Copy link

I am seeing this on BigSur. Restart and the image becomes corrupt.

multipass 1.8.1+mac
multipassd 1.8.1+mac

As above ... Reestarting the daemon does "fix" the image.

@yyaremenko
Copy link

yyaremenko commented Nov 29, 2021

same here

Big Sur v 11.6

multipass 1.8.1+mac
multipassd 1.8.1+mac

@Saviq apparently only the reporter can reopen the issue;
if the project collaborator closed the issue, nobody can reopen it (perhaps except for the collaborator)
https://stackoverflow.com/questions/21333654/how-to-re-open-an-issue-in-github/21333938
so probably it would make sense if you could reopen the issue, as the problem still exists

@ammmir
Copy link

ammmir commented Jan 23, 2022

still persists in 1.8.1 on macOS 12.1

@philrawlings
Copy link

Same problem on Windows 10, multipass 1.8.1. Instances are stuck after a reboot - just says "starting" indefinitely when running multipass shell (primary and another explicitly created VM)

@Saviq
Copy link
Collaborator

Saviq commented Feb 2, 2022

@philrawlings can you please file a new issue with your details? 1.8.1 was only released for macOS, btw :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants