Qubes doesn't start VMs on 20% of boots #3626
Comments
marmarek
Feb 24, 2018
Member
Attempting to run qvm-run -p vm 'echo bla' returns
gntshr: error: ioctl failed: No space left on device
Failed to start data vchan server
This looks related. How many VMs do you have running? Check also how many qrexec-client processes are in dom0 (should be about 1 per VM).
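The two counts asked about above can be gathered from a dom0 terminal, for example as follows. This is only a sketch: `qvm-ls --running --raw-list` is the Qubes 4.0 CLI as I understand it, and the `count_lines` helper is made up for this snippet.

```shell
# Count the non-empty lines a command prints (robust to empty output).
count_lines() { "$@" | grep -c . ; }

# In dom0 (commented out here, since these need a running Qubes system):
#   count_lines qvm-ls --running --raw-list   # number of running VMs
#   pgrep -cx qrexec-client                   # should be about one per running VM
```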
andrewdavidwong added the bug and C: core labels on Feb 25, 2018
andrewdavidwong added this to the Release 4.0 milestone on Feb 25, 2018
3hhh
Mar 8, 2018
Yesterday I ran into this one again.
This looks related. How many VMs do you have running?
It's directly after boot, so just 4.
Check also how many qrexec-client processes are in dom0 (should be about 1 per VM).
Yes, there was exactly one qrexec-daemon per VM; same for client (before I started trying qvm-run commands).
Another interesting fact:
xl console [vm]
Unable to attach console.
3hhh
Mar 8, 2018
qvm-ls, however, reported the [vm] as running. No custom init scripts ran, though, e.g. no network manager applet for sys-net etc.
marmarek
Mar 8, 2018
Member
Another interesting fact:
xl console [vm]
Unable to attach console.
Use sudo xl console -t pv [vm]
What virt_mode does this VM have? (qvm-prefs [vm] virt_mode)
3hhh
Mar 8, 2018
Use sudo xl console -t pv [vm]
I think I also tried with sudo, but OK, I'll try again the next time it happens.
What virt_mode this VM have? (qvm-prefs [vm] virt_mode)
3x pvh (of these, 1 disposable), 1x hvm (sys-usb as a disposable).
I just sent you the entire journalctl output of the original issue description via e-mail as well - maybe you see more than I do (I didn't want to put it all online).
3hhh
Mar 14, 2018
sudo xl console -t pv [vm]
That was the right command, thanks!
All of my VMs seemed up and running fine according to their console, but the GUI didn't show.
Thus I believe I identified the root cause:
As part of #3149 I was starting VMs via the WM autostart (awesome, in my case), which apparently caused a race condition between the start of the Qubes GUI daemon (or whatever else is started in /etc/xdg/autostart) and my autostarted VMs.
I just ran into this 7 times in a row; once I removed the VM autostart, waited a few seconds after my X showed up, and then started the VMs manually, it worked 2 times in a row. Time will tell more.
I don't really understand this, though: in the default Qubes config the VMs are started before the xdg autostart anyway, so why would I have to wait for it?
For reference some suspicious entries in the journal of one of the VMs (might indicate another unrelated issue though):
...
Mar 14 20:38:47 d-sys-usb systemd[1]: Started Session c27 of user user.
Mar 14 20:38:57 d-sys-usb qrexec-agent[633]: pid 1298 exited with -1
Mar 14 20:38:57 d-sys-usb qrexec-agent[633]: eintr
Mar 14 20:39:11 d-sys-usb qrexec-agent[633]: executed user:QUBESRPC qubes.WaitFo
Mar 14 20:39:11 d-sys-usb qrexec-agent[1317]: pam_unix(qrexec:session): session
Mar 14 20:39:11 d-sys-usb systemd[1]: Started Session c28 of user user.
Mar 14 20:39:20 d-sys-usb qrexec-agent[633]: pid 1317 exited with -1
Mar 14 20:39:20 d-sys-usb qrexec-agent[633]: eintr
Mar 14 20:39:34 d-sys-usb qrexec-agent[633]: executed user:QUBESRPC qubes.WaitFo
Mar 14 20:39:34 d-sys-usb qrexec-agent[1338]: pam_unix(qrexec:session): session
Mar 14 20:39:34 d-sys-usb systemd[1]: Started Session c29 of user user.
Mar 14 20:39:44 d-sys-usb qrexec-agent[633]: pid 1338 exited with -1
Mar 14 20:39:44 d-sys-usb qrexec-agent[633]: eintr
Mar 14 20:39:58 d-sys-usb qrexec-agent[633]: executed user:QUBESRPC qubes.WaitFo
Mar 14 20:39:58 d-sys-usb qrexec-agent[1392]: pam_unix(qrexec:session): session
Mar 14 20:39:58 d-sys-usb systemd[1]: Started Session c30 of user user.
...
Mar 14 20:47:06 d-sys-usb kernel: xenbus: xen store gave: unknown error E2BIG
Mar 14 20:47:06 d-sys-usb qrexec-agent[633]: Data vchan connection failed
Mar 14 20:47:06 d-sys-usb qrexec-agent[633]: eintr
Mar 14 20:47:06 d-sys-usb qrexec-agent[633]: Cannot setup watch on /local/domain
Mar 14 20:47:06 d-sys-usb qrexec-agent[633]: Data vchan connection failed
Mar 14 20:47:06 d-sys-usb kernel: xenbus: xen store gave: unknown error E2BIG
...
3hhh
Mar 27, 2018
As part of #3149 I was starting VMs via the WM autostart (awesome, in my case), which apparently caused a race condition between the start of the Qubes GUI daemon (or whatever else is started in /etc/xdg/autostart) and my autostarted VMs.
I just ran into this 7 times in a row; once I removed the VM autostart, waited a few seconds after my X showed up, and then started the VMs manually, it worked 2 times in a row. Time will tell more.
Nowadays I call this script from the awesome WM autostart:
#!/bin/bash
#start all the XFCE autostart stuff
#for some stupid reason this apparently must happen BEFORE VMs are started -_- (as of 2018-03 Qubes otherwise crashes in 20% of the boots, cf. https://github.com/QubesOS/qubes-issues/issues/3626)
#dex doesn't block so we need to wait (it also screws with pipes, so we don't give it any)
pgrep -f -a mqui.tray | grep -v $$ || ( dex-autostart -a -e XFCE < /dev/null &> /dev/null ; sleep 10 )
#start VMs
#make sure they are not started by Qubes (qvm-prefs)
#qubes will automatically start all VMs further downstream
qvm-start --skip-if-running sys-usb &
pid=$!
qvm-start --skip-if-running sys-firewall
wait $pid
So far this fixed the issue.
It would be interesting to understand the root cause though.
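For what it's worth, the fixed sleep 10 in the script above could be replaced by polling for the autostarted tray process, so the script continues as soon as the tray is up but still gives up after a bound. A minimal sketch, assuming the same mqui.tray process name as in the script; the wait_for_proc helper name is made up here:

```shell
# Poll until a process matching $1 appears in the process table,
# checking once per second for at most $2 seconds (default 30).
wait_for_proc() {
    local pattern="$1" timeout="${2:-30}" i
    for i in $(seq 1 "$timeout"); do
        pgrep -f "$pattern" > /dev/null && return 0
        sleep 1
    done
    return 1
}

# Hypothetical usage in the autostart script, replacing "sleep 10":
#   wait_for_proc mqui.tray 30 && qvm-start --skip-if-running sys-usb
```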
3hhh commented Feb 24, 2018
Qubes OS version:
Since 4.0rc3, currently with 4.0rc4.
Affected component(s):
VMs, dom0 starts fine
Steps to reproduce the behavior:
Boot and get unlucky (for me this happens in about 20% of cases).
Expected behavior:
VMs can be started
Actual behavior:
VMs seem to start (notification appears), but apparently they hang at some point; all are affected equally
Attempting to run qvm-run -p vm 'echo bla' returns
gntshr: error: ioctl failed: No space left on device
Failed to start data vchan server
Some suspicious entries from a dom0 sudo journalctl -b0, among them multiple occurrences of
Feb 24 12:50:27 dom0 qmemman.daemon.reqhandler[2067]: EOF
qubesd seemed to be up and running. A qubesd restart didn't change anything.
General notes:
debian-8 and debian-9 templates were tested
Related issues:
#3616 sounds similar wrt the outcome, but I think it's a different issue.