Kernel 4.9 no longer works in PVGRUB2 VMs #2762

Closed
rtiangha opened this Issue Apr 19, 2017 · 18 comments

Qubes OS version (e.g., R3.2):

R3.2

Affected TemplateVMs (e.g., fedora-23, if applicable):

Debian 8 & 9, haven't tested on Fedora.


Expected behavior:

A PVGRUB2 based VM starts up with the green status indicator in Qubes VM Manager.

Actual behavior:

VM starts up but only gets to yellow. Sometimes (like on boot-up) the indicator will be green, but it quickly changes to yellow as soon as you try to launch any app (e.g., xterm). You can still connect to the VM via virsh. There are no noticeable error messages in dmesg; the only meaningful error message is in guid.log:

ErrorHandler: BadAccess (attempt to access private resource denied)
Major opcode: 130 (MIT-SHM)
Minor opcode: 1 (X_ShmAttach)
ResourceID: 0x43
Failed serial number: 101
Current serial number: 102
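
For anyone chasing the same symptom: guid.log is read from dom0. Assuming the standard R3.2 log layout (a sketch; substitute your VM's name):

tail -f /var/log/qubes/guid.<vmname>.log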

Steps to reproduce the behavior:

Install a 4.9 kernel (either a 4.9 coldkernel or the stock Debian 4.9 kernel from jessie-backports) into an up-to-date Debian 8 VM on the Qubes stable track. Switch the VM's kernel to PVGRUB2 and boot it.
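
In command form, roughly (the qvm-prefs invocation reflects how I understand PVGRUB2 is selected on R3.2; treat it as a sketch):

# Inside the Debian 8 template
sudo apt-get install -t jessie-backports linux-image-amd64 linux-headers-amd64
# In dom0, for the VM under test
qvm-prefs -s <vmname> kernel pvgrub2
qvm-start <vmname>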

General notes:

The issue does not occur with the stock Debian 3.16 kernel under PVGRUB2. I have not tested kernels other than those two. The issue did not occur with a 4.9-based coldkernel until the latest round of stable Qubes VM updates; 4.9 coldkernels had been running fine in PVGRUB2 mode for months.


Related issues:

Message Thread:
https://groups.google.com/forum/#!topic/qubes-users/2X8wi5XebJc

rtiangha commented Apr 19, 2017

I should also add that I'm currently running everything that's available in current-testing in dom0. My testing Debian 8 template was a clean, updated stock Debian 8 template with backports enabled to get the stock Debian 4.9 kernel.

marmarek (Member) commented Apr 19, 2017

Anything interesting in dmesg? Maybe a problem with the recent GUI agent (#2617)? Although that one was about the X server in Debian 9 or Fedora 25.

rtiangha commented Apr 19, 2017

That's the problem; there really is nothing out of the ordinary in dmesg. The only thing different is that message from guid.log, although I can check whether some other diagnostic offers greater verbosity.

The machines I have based on Debian 9 simply won't start: when monitoring with virsh, grub loads, it tries to decompress the kernel, and then the machine just dies and shuts down. There is nothing obvious in the console log, and the other logs have nothing useful except "domain dead". But I haven't looked into that machine too much. It does boot fine with a default dom0 VM kernel, though.

On a whim, I tried downgrading qubes-gui-agent to 3.2.8 on that Debian 8 template to see if that would help (I had to delete the two new xserver packages to get it to downgrade), and while the light turned green, windows still wouldn't launch. I did notice that qubes-set-window-layout was eating CPU and was still running when I woke up in the morning. I don't know if that's helpful or not. Is there anything I should downgrade in order to test for a regression?
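
For reference, the virsh monitoring above is done from dom0; a sketch, assuming the xen connection URI that libvirt uses on Qubes:

sudo virsh -c xen:/// console <vmname>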


rtiangha commented Apr 19, 2017

Edit: I was wrong. In stretch (I didn't downgrade anything on this one), it dies right after printing:

No such device: /boot/xen/pvboot-x86_64.elf
Reading (xen/xvda/boot/grub/grub.cfg)
error: file '/boot/grub/fonts/unicode.pf2' not found.
error: no suitable video mode found

I don't know if that's normal or not, though; I've never really watched that part of the boot sequence before when everything was working normally.


rtiangha commented Apr 19, 2017

Back on the Debian 8 template, I was running journalctl -f via virsh, and when trying to launch an xterm window I saw a flood of messages like these (only a copy/paste excerpt; I'd attach a log, but I don't know where these are stored):

Apr 19 09:09:21 Multimedia su[7853]: pam_unix(su:session): session opened for user root by (uid=0)
Apr 19 09:09:21 Multimedia systemd[1]: Starting Session c169 of user root.
Apr 19 09:09:21 Multimedia systemd-logind[773]: New session c169 of user root.
Apr 19 09:09:21 Multimedia systemd[1]: Started Session c169 of user root.
Apr 19 09:09:22 Multimedia su[7868]: Successful su for user by root
Apr 19 09:09:22 Multimedia su[7868]: + ??? root:user
Apr 19 09:09:22 Multimedia su[7868]: pam_unix(su:session): session opened for user user by (uid=0)
Apr 19 09:09:22 Multimedia su[7868]: pam_unix(su:session): session closed for user user
Apr 19 09:09:22 Multimedia su[7853]: pam_unix(su:session): session closed for user root
Apr 19 09:09:22 Multimedia qrexec-agent[819]: send exit code 1
Apr 19 09:09:22 Multimedia qrexec-agent[819]: pid 7853 exited with 1
Apr 19 09:09:22 Multimedia qrexec-agent[819]: eintr
Apr 19 09:09:22 Multimedia systemd-logind[773]: Removed session c169.
Apr 19 09:09:22 Multimedia qrexec-agent[819]: executed root:QUBESRPC qubes.WaitForSession none pid 7894
Apr 19 09:09:22 Multimedia su[7894]: Successful su for root by root
Apr 19 09:09:22 Multimedia su[7894]: + ??? root:root
Apr 19 09:09:22 Multimedia su[7894]: pam_unix(su:session): session opened for user root by (uid=0)
Apr 19 09:09:22 Multimedia systemd[1]: Starting Session c170 of user root.
Apr 19 09:09:22 Multimedia systemd-logind[773]: New session c170 of user root.
Apr 19 09:09:22 Multimedia systemd[1]: Started Session c170 of user root.
Apr 19 09:09:22 Multimedia su[7908]: Successful su for user by root
Apr 19 09:09:22 Multimedia su[7908]: + ??? root:user
Apr 19 09:09:22 Multimedia su[7908]: pam_unix(su:session): session opened for user user by (uid=0)
Apr 19 09:09:22 Multimedia su[7908]: pam_unix(su:session): session closed for user user
Apr 19 09:09:22 Multimedia su[7894]: pam_unix(su:session): session closed for user root
Apr 19 09:09:22 Multimedia qrexec-agent[819]: send exit code 1
Apr 19 09:09:22 Multimedia qrexec-agent[819]: pid 7894 exited with 1
Apr 19 09:09:22 Multimedia qrexec-agent[819]: eintr
Apr 19 09:09:22 Multimedia systemd-logind[773]: Removed session c170.
Apr 19 09:09:22 Multimedia qrexec-agent[819]: executed root:QUBESRPC qubes.WaitForSession none pid 7932
Apr 19 09:09:22 Multimedia su[7932]: Successful su for root by root
Apr 19 09:09:22 Multimedia su[7932]: + ??? root:root
Apr 19 09:09:22 Multimedia su[7932]: pam_unix(su:session): session opened for user root by (uid=0)
Apr 19 09:09:22 Multimedia systemd[1]: Starting Session c171 of user root.
Apr 19 09:09:22 Multimedia systemd-logind[773]: New session c171 of user root.
Apr 19 09:09:22 Multimedia systemd[1]: Started Session c171 of user root.
Apr 19 09:09:23 Multimedia su[7947]: Successful su for user by root
Apr 19 09:09:23 Multimedia su[7947]: + ??? root:user
Apr 19 09:09:23 Multimedia su[7947]: pam_unix(su:session): session opened for user user by (uid=0)
Apr 19 09:09:23 Multimedia su[7947]: pam_unix(su:session): session closed for user user
Apr 19 09:09:23 Multimedia su[7932]: pam_unix(su:session): session closed for user root
Apr 19 09:09:23 Multimedia qrexec-agent[819]: send exit code 1
Apr 19 09:09:23 Multimedia qrexec-agent[819]: pid 7932 exited with 1
Apr 19 09:09:23 Multimedia qrexec-agent[819]: eintr
Apr 19 09:09:23 Multimedia systemd-logind[773]: Removed session c171.
Apr 19 09:09:23 Multimedia qrexec-agent[819]: executed root:QUBESRPC qubes.WaitForSession none pid 7975
Apr 19 09:09:23 Multimedia su[7975]: Successful su for root by root
Apr 19 09:09:23 Multimedia su[7975]: + ??? root:root
Apr 19 09:09:23 Multimedia su[7975]: pam_unix(su:session): session opened for user root by (uid=0)
Apr 19 09:09:23 Multimedia systemd[1]: Starting Session c172 of user root.
Apr 19 09:09:23 Multimedia systemd-logind[773]: New session c172 of user root.
Apr 19 09:09:23 Multimedia systemd[1]: Started Session c172 of user root.
Apr 19 09:09:23 Multimedia su[7990]: Successful su for user by root
Apr 19 09:09:23 Multimedia su[7990]: + ??? root:user
Apr 19 09:09:23 Multimedia su[7990]: pam_unix(su:session): session opened for user user by (uid=0)
Apr 19 09:09:23 Multimedia su[7990]: pam_unix(su:session): session closed for user user
Apr 19 09:09:23 Multimedia su[7975]: pam_unix(su:session): session closed for user root
Apr 19 09:09:23 Multimedia qrexec-agent[819]: send exit code 1
Apr 19 09:09:23 Multimedia qrexec-agent[819]: pid 7975 exited with 1
Apr 19 09:09:23 Multimedia qrexec-agent[819]: eintr
Apr 19 09:09:23 Multimedia systemd-logind[773]: Removed session c172.
Apr 19 09:09:23 Multimedia qrexec-agent[819]: executed root:QUBESRPC qubes.WaitForSession none pid 8013
Apr 19 09:09:23 Multimedia su[8013]: Successful su for root by root
Apr 19 09:09:23 Multimedia su[8013]: + ??? root:root
Apr 19 09:09:23 Multimedia su[8013]: pam_unix(su:session): session opened for user root by (uid=0)
Apr 19 09:09:23 Multimedia systemd[1]: Starting Session c173 of user root.
Apr 19 09:09:23 Multimedia systemd-logind[773]: New session c173 of user root.
Apr 19 09:09:23 Multimedia systemd[1]: Started Session c173 of user root.
Apr 19 09:09:23 Multimedia su[8028]: Successful su for user by root
Apr 19 09:09:23 Multimedia su[8028]: + ??? root:user
Apr 19 09:09:23 Multimedia su[8028]: pam_unix(su:session): session opened for user user by (uid=0)
Apr 19 09:09:24 Multimedia su[8028]: pam_unix(su:session): session closed for user user
Apr 19 09:09:24 Multimedia su[8013]: pam_unix(su:session): session closed for user root
Apr 19 09:09:24 Multimedia qrexec-agent[819]: send exit code 1
Apr 19 09:09:24 Multimedia qrexec-agent[819]: pid 8013 exited with 1
Apr 19 09:09:24 Multimedia qrexec-agent[819]: eintr
Apr 19 09:09:24 Multimedia systemd-logind[773]: Removed session c173.


rtiangha commented Apr 19, 2017

Hmm, I think I may need to do more testing because I've only tested one machine. But in full disclosure, I do run the "Replacing password-less root access with Dom0 user prompt" configuration described here:

https://www.qubes-os.org/doc/vm-sudo/

And reverting it made a window launch normally. I'm going to switch back all of my VMs, try again, and will report the results soon.


rtiangha commented Apr 19, 2017

I'm still in the midst of creating a new baseline on my laptop without the sudo changes, but on my desktop, where I haven't made the vm-sudo modifications yet, one of my VMs is exhibiting the same behaviour where the light stays yellow. I've been running journalctl -f in the background on that VM as I work, and it just now started spitting out a bunch of messages similar to the above without me doing anything. So I'm not sure whether those errors are related, or whether the vm-sudo changes truly have anything to do with this issue. I'll try to find out once I get to a new baseline and reboot the laptop.


rtiangha commented Apr 19, 2017

OK, after A LOT more testing, I now believe this is not a vm-sudo issue, but rather the typical dkms-did-not-compile-u2mfn-correctly issue.

This is what I think happened:

First, in order to get a 4.9 coldkernel running in a Qubes Debian 8/9/Fedora XX template, I had to modify u2mfn.c with those three extra lines to make it compile against kernels newer than 4.8. So technically, I was still running u2mfn 3.2.3. I also run a 4.10 kernel in dom0 that I compiled myself, along with the corresponding qubes-kernel-vm version of it that I use for everything else (e.g., TemplateVMs).

When the update to the official qubes-kernel-vm-support package came out, the upgrade cycle uninstalled the 3.2.3 u2mfn module, and then I noticed that it (dkms?) threw an error saying it couldn't find the headers or source for the currently running kernel (the 4.10 dom0 VM kernel). That part is normal; it always happens, because I don't have that stuff installed.

What I think happened is that instead of carrying on and rebuilding modules for every locally installed kernel in the VM, it exited at that point while the upgrade continued.

I believe this because after reinstalling the coldkernel and headers in those VMs and rebooting the AppVMs, the green light came back; I assume that on install, dkms successfully recompiled the module.

As for what made me think the vm-sudo modifications were the issue: I think I actually did try to reinstall the kernel in that AppVM's TemplateVM to see if forcing it to recompile the module would fix things, and then I just forgot about it. So when I had the idea to revert those vm-sudo changes, I inadvertently did it on the same VM I had meant to run a different test on, and then assumed the vm-sudo changes were what made it work, since they were the last thing I had modified on that VM.

As for what made me think that 4.9 in general was the issue: when I did the second test (on the same VM) with the stock Debian 4.9 kernel from backports, the header files did not get automatically pulled in (they did with the 3.16 kernel), and thus I assume dkms again failed to compile the u2mfn module. Manually ensuring that the matching kernel headers were installed and then invoking

ls /var/lib/initramfs-tools |
sudo xargs -n1 /usr/lib/dkms/dkms_autoinstaller start

worked. Running apt-get install -t jessie-backports linux-headers-amd64 linux-image-amd64 in a fresh template also worked.

So yes, this ticket is a variant of dkms simply not compiling the u2mfn module correctly. I don't know what could be done to prevent this in the future, since there are so many different ways to manually install a local kernel in a template. But if the qubes-kernel-vm-support script has any control over how dkms is invoked, making it compile dkms modules for every kernel installed in a VM, rather than just the currently running one, might be a change that would make upgrades easier in the future.
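
If dkms gets left in that half-built state, this is how I'd force a rebuild for every kernel that has headers present, using the module name and version from this thread (a sketch; check the exact u2mfn version on your system):

# Build and install u2mfn for each kernel tree that has headers available
for kdir in /lib/modules/*/; do
    kver="$(basename "$kdir")"
    [ -e "/lib/modules/$kver/build" ] || continue  # no headers; dkms would fail
    sudo dkms install u2mfn/3.2.4 -k "$kver"
done
# Verify which kernels the module is now built for
dkms status u2mfn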


marmarek (Member) commented Apr 19, 2017

Thanks @rtiangha for testing all those combinations! So, just to be sure: all those issues were caused by a broken installation of u2mfn, but after fixing it, 4.9 (whether the coldkernel one or not) does work correctly?


rtiangha commented Apr 19, 2017

Yeah. I had to do it manually, however, and a user may not know they need to do that. If Qubes has any control over that part of the upgrade process (particularly ensuring dkms compiles modules for all locally installed kernels in a VM rather than just the currently running one), that might be something to look at to help prevent some of these problems in the future (but maybe not all, since there are many different ways to manually install a kernel in a VM). A sketch of what that could look like is below.
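
Something like this in the package's maintainer script, assuming dkms's autoinstall interface (hypothetical packaging code, not the actual Qubes script):

#!/bin/sh
# Hypothetical postinst fragment: build DKMS modules for every kernel
# installed in the VM, not just the currently running one.
for kdir in /lib/modules/*/; do
    kver="$(basename "$kdir")"
    [ -e "/lib/modules/$kver/build" ] || continue  # skip kernels without headers
    dkms autoinstall -k "$kver"
done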


marmarek (Member) commented Apr 19, 2017

Hmm, that's interesting. I've just tried installing qubes-kernel-vm-support on Debian 9 and got this:

Setting up qubes-kernel-vm-support (3.2.4+deb9u1) ...
Loading new u2mfn-3.2.4 DKMS files...
It is likely that 4.4.55-11.pvops.qubes.x86_64 belongs to a chroot's host
Building for 4.4.31-11.pvops.qubes.x86_64, 4.4.55-11.pvops.qubes.x86_64 and 4.9.0-2-amd64
Module build for kernel 4.4.31-11.pvops.qubes.x86_64 was skipped since the
kernel headers for this kernel does not seem to be installed.
Module build for kernel 4.4.55-11.pvops.qubes.x86_64 was skipped since the
kernel headers for this kernel does not seem to be installed.
Building initial module for 4.9.0-2-amd64
Done.

u2mfn:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.9.0-2-amd64/updates/dkms/

depmod...

DKMS: install completed.

So apparently it tries to compile the module for all installed kernels, not only the running one. This VM is running on 4.4.55-11.pvops.qubes.x86_64.

Is it any different on Debian 8? Or maybe initial install vs. update makes a difference here?


rtiangha commented Apr 19, 2017

Maybe? I'm not sure anymore. All I know is that I had a 4.9.22 grsecurity kernel already installed, ran the latest batch of Qubes updates (which included qubes-kernel-vm-support), and when I tried to restart the AppVM, I got the yellow light. Unfortunately, I wasn't paying close attention to any error messages.

What was your upgrade path on Debian 9? Was the Debian 4.9 kernel already installed before upgrading qubes-kernel-vm-support? I think I have one generic Whonix VM left that hasn't been updated yet, so I could test what happens when a 4.9 kernel is pre-installed before upgrading to qubes-kernel-vm-support 3.2.4. I think that would most closely mimic the setup I had before.


marmarek (Member) commented Apr 19, 2017

This was a Debian 9 template without any kernel installed in it, and also without qubes-kernel-vm-support. First I installed qubes-kernel-vm-support (which pulled in kernel headers), and only then linux-image.
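
In command form, that order was roughly (a reconstruction of the steps described, not a captured transcript):

sudo apt-get install qubes-kernel-vm-support  # pulls in kernel headers
sudo apt-get install linux-image-amd64        # kernel arrives after the DKMS module is registered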


rtiangha commented Apr 19, 2017

OK; let me try it the other way, with a 4.9 kernel pre-installed before upgrading qubes-kernel-vm-support. In my head it shouldn't make a difference, but maybe it does.


rtiangha commented Apr 20, 2017

Alright, this is what I did on a Debian 8 template:

1. Install linux-headers-amd64 and linux-image-amd64 from both jessie and jessie-backports (so that I'd get both a 3.16 and a 4.9 kernel).
2. Install qubes-kernel-vm-support.

I did it this way to try to simulate what happens when qubes-kernel-vm-support gets upgraded, or installed for the first time, with kernels already present.

This was the output:

user@host:~$ sudo apt-get install qubes-kernel-vm-support
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
qubes-kernel-vm-support
0 upgraded, 1 newly installed, 0 to remove and 6 not upgraded.
Need to get 10.2 kB of archives.
After this operation, 65.5 kB of additional disk space will be used.
Get:1 http://deb.qubesos4rrrrz6n4.onion/r3.2/vm/ jessie/main qubes-kernel-vm-support amd64 3.2.4+deb8u1 [10.2 kB]
Fetched 10.2 kB in 2s (4,879 B/s)
Selecting previously unselected package qubes-kernel-vm-support.
(Reading database ... 124591 files and directories currently installed.)
Preparing to unpack .../qubes-kernel-vm-support_3.2.4+deb8u1_amd64.deb ...
Unpacking qubes-kernel-vm-support (3.2.4+deb8u1) ...
Setting up qubes-kernel-vm-support (3.2.4+deb8u1) ...
Loading new u2mfn-3.2.4 DKMS files...
First Installation: checking all kernels...
dpkg: warning: version '4.10.11-12.pvops.qubes.x86_64' has bad syntax: invalid character in revision number
dpkg: warning: version '4.10.11-12.pvops.qubes.x86_64' has bad syntax: invalid character in revision number
It is likely that 4.10.11-12.pvops.qubes.x86_64 belongs to a chroot's host
Module build for the currently running kernel was skipped since the
kernel source for this kernel does not seem to be installed.

And that's it. It never printed the "Building initial module" message for any of the pre-installed kernels, the way the Debian 9 run did.

There is no updates/dkms directory in /lib/modules/4.9.0-0.bpo.2-amd64:

root@host:/lib/modules# ls /lib/modules/4.9.0-0.bpo.2-amd64/
build modules.builtin modules.devname modules.symbols.bin
kernel modules.builtin.bin modules.order source
modules.alias modules.dep modules.softdep
modules.alias.bin modules.dep.bin modules.symbols

And there are no kernel entries at all in /var/lib/dkms/u2mfn/3.2.4:

root@host:/lib/modules# ls /var/lib/dkms/u2mfn/3.2.4
build source

So I still think dkms stopped after trying to compile a u2mfn module for the dom0 VM kernel (4.10 in my case). But I haven't tried the same procedure on a Debian 9 template yet. If it behaves the same, then it's an issue with the upgrade or first install of qubes-kernel-vm-support. If it doesn't, then it would appear to be a Debian 8-specific thing.


rtiangha commented Apr 20, 2017

And I just did it the same way on a Debian 9 template and got your result. So yeah, I'm a bit more confident in declaring this a Debian 8 specific issue.

That said, doing it your way on a Debian 8 template works fine; it happens that way each time I update my coldkernel, so I'm confident that mechanism works the way it should. But from what I can tell, there might be an issue whenever qubes-kernel-vm-support gets upgraded in the future while local kernels already exist in the template.


marmarek (Member) commented Apr 20, 2017

marmarek (Member) commented Apr 2, 2018

Closing as fixed upstream - in Debian 9.


marmarek closed this Apr 2, 2018
