All VMs fail to start after resuming from suspend #2153

Open
andrewdavidwong opened this Issue Jul 5, 2016 · 5 comments

andrewdavidwong commented Jul 5, 2016

Qubes OS version (e.g., R3.1):

R3.2-rc1

Problem Description:

@evadogstar wrote:

Failed to start any VM after suspend (tested on xfce)

Qubes Manager shows an icon with the following error:
Error starting vm internal error: libxenlight failed to create new domain

Log file: libvchan_is_eof

grote commented Sep 17, 2016

I am seeing the same problem now as well after coming back from suspend. Is there any workaround besides restarting the whole machine?

marmarek commented Sep 17, 2016

Can you check /var/log/libvirt/libxl/libxl-driver.log for details?

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

grote commented Sep 17, 2016

This seems to be the log directly related to the problem:

xc: error: panic: xc_dom_core.c:676: xc_dom_find_loader: no loader found: Invalid kernel
libxl: error: libxl_dom.c:713:libxl__build_pv: xc_dom_parse_image failed: No such file or directory
libxl: error: libxl_create.c:1145:domcreate_rebuild_done: cannot (re-)build domain: -3
xc: error: panic: xc_dom_core.c:386: xc_dom_do_gunzip: inflate failed (rc=-3): Internal error
xc: error: panic: xc_dom_bzimageloader.c:711: xc_dom_probe_bzimage_kernel: unable to gzip decompress kernel: Invalid kernel
xc: error: panic: xc_dom_core.c:676: xc_dom_find_loader: no loader found: Invalid kernel
libxl: error: libxl_dom.c:713:libxl__build_pv: xc_dom_parse_image failed: No such file or directory
libxl: error: libxl_create.c:1145:domcreate_rebuild_done: cannot (re-)build domain: -3

This is probably unrelated, but also doesn't look good:

libxl: error: libxl_device.c:1215:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/1/0 not ready
libxl: error: libxl_pci.c:1306:do_pci_remove: xc_physdev_unmap_pirq irq=72: Invalid argument
libxl: error: libxl_pci.c:1310:do_pci_remove: xc_domain_irq_permission irq=72: Operation not permitted
libxl: error: libxl_device.c:1215:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/1/0 not ready
libxl: error: libxl_pci.c:1306:do_pci_remove: xc_physdev_unmap_pirq irq=74: Invalid argument
libxl: error: libxl_pci.c:1310:do_pci_remove: xc_domain_irq_permission irq=74: Operation not permitted
libxl: error: libxl_device.c:1215:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/1/0 not ready
libxl: error: libxl_pci.c:1041:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0
libxl: error: libxl_device.c:1215:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/3/0 not ready
libxl: error: libxl_pci.c:1306:do_pci_remove: xc_physdev_unmap_pirq irq=16: Invalid argument
libxl: error: libxl_device.c:1215:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/3/0 not ready
libxl: error: libxl_device.c:1215:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/3/0 not ready
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [27328] exited with error status 1
libxl: error: libxl_device.c:1084:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [27668] exited with error status 1
libxl: error: libxl_device.c:1084:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [27467] exited with error status 1
libxl: error: libxl_device.c:1084:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.

The latter log may come from a VM that starts automatically at boot and tries to attach a block device to itself via its rc.local script, which has recently started failing. Maybe that should be its own ticket, or it may be a user error (a race condition?).
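To separate the two groups of messages in a longer log, a small filter can help. The following is only a rough sketch: the marker strings are taken from the excerpt above and are an assumption, not an official list of libxl error sources, and the function names are made up for illustration.

```python
import sys
from collections import Counter

# Assumption: substrings picked from the log excerpt in this thread.
# Lines mentioning these functions relate to building the domain from its kernel.
BUILD_MARKERS = ("xc_dom_", "libxl__build_pv", "domcreate_rebuild_done")
# Lines mentioning these relate to tearing devices down (note "remove").
CLEANUP_MARKERS = ("do_pci_remove", "/etc/xen/scripts/block remove")


def classify(line):
    """Return 'build', 'cleanup' or 'other' for one log line."""
    if any(marker in line for marker in BUILD_MARKERS):
        return "build"
    if any(marker in line for marker in CLEANUP_MARKERS):
        return "cleanup"
    return "other"


def summarize(lines):
    """Count how many non-empty lines fall into each category."""
    return Counter(classify(line) for line in lines if line.strip())


if __name__ == "__main__":
    # Pass the log path as an argument, e.g.
    # /var/log/libvirt/libxl/libxl-driver.log
    for log_path in sys.argv[1:]:
        with open(log_path, errors="replace") as log_file:
            log_lines = log_file.readlines()
        print(log_path, dict(summarize(log_lines)))
        for line in log_lines:
            if classify(line) == "build":
                print("  " + line.rstrip())
```

Lines classified as "build" are the ones most likely tied to the VM start failure; the rest is printed only as a count.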

marmarek commented Sep 17, 2016

This is interesting. It looks like something is wrong with the kernel image.
Check what you have as the "kernel" property for that VM, then check /var/lib/qubes/vm-kernels/<VERSION_FROM_KERNEL_PROPERTY>. There should be a vmlinuz file with non-zero size. Running file on it should give something like:

[marmarek@dom0 ~]$ file /var/lib/qubes/vm-kernels/4.4.14-11/vmlinuz 
/var/lib/qubes/vm-kernels/4.4.14-11/vmlinuz: Linux kernel x86 boot executable bzImage, version 4.4.14-11.pvops.qubes.x86_64 (user@release) #1 SMP Tue Jul 19 0, RO-rootFS, swap_dev 0x5, Normal VGA

All of those /etc/xen/scripts/block errors are about cleanup (note the remove parameter). This is still a bug, but probably a harmless one.
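The kernel-image check described above can also be scripted, so it is easy to repeat right after a failed resume. This is a minimal sketch, not a Qubes tool: the function names are hypothetical. It only looks at the file size and the x86 boot-sector header (the 0xAA55 boot flag at offset 0x1FE and the "HdrS" magic at offset 0x202), so it would not catch corruption limited to the compressed payload. The SHA-256 digest is there for comparing the file before and after suspend.

```python
import hashlib
import os
import sys

BOOT_FLAG_OFFSET = 0x1FE  # 2-byte boot flag, stored as bytes 55 AA
HDRS_OFFSET = 0x202       # 4-byte "HdrS" setup-header magic


def check_kernel_image(path):
    """Return a list of problems found with the kernel image at path."""
    if not os.path.isfile(path):
        return ["missing or not a regular file"]
    if os.path.getsize(path) == 0:
        return ["file is empty"]
    with open(path, "rb") as image:
        header = image.read(HDRS_OFFSET + 4)
    problems = []
    # A file shorter than the header yields short slices, which fail both checks.
    if header[BOOT_FLAG_OFFSET:BOOT_FLAG_OFFSET + 2] != b"\x55\xaa":
        problems.append("boot flag 0xAA55 not found at offset 0x1FE")
    if header[HDRS_OFFSET:HDRS_OFFSET + 4] != b"HdrS":
        problems.append("'HdrS' magic not found at offset 0x202")
    return problems


def sha256_of(path):
    """Return the hex SHA-256 digest of the file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Pass one or more paths, e.g. /var/lib/qubes/vm-kernels/4.4.14-11/vmlinuz
    for kernel_path in sys.argv[1:]:
        found = check_kernel_image(kernel_path)
        if found:
            print(f"{kernel_path}: " + "; ".join(found))
        else:
            print(f"{kernel_path}: header looks like a bzImage, "
                  f"sha256={sha256_of(kernel_path)}")
```

If the digest recorded on a healthy boot differs from the one taken after a failed resume, the file itself changed. If it is identical, the problem is more likely in how the image is read at VM start.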

grote commented Sep 18, 2016

There should be a vmlinuz file with non-zero size. Running file on it should give something like

Now, after a restart, all of this holds and file gives exactly the same output. If I encounter the problem again (and I use suspend a lot), I will check again, but I expect the same result: why would this file suddenly change in dom0?

@andrewdavidwong andrewdavidwong added this to the Release 3.2 updates milestone Dec 16, 2017
