fail to assign pci device at start of usbvm #1544

Open
Rudd-O opened this Issue Dec 26, 2015 · 41 comments

Rudd-O commented Dec 26, 2015

I'm experiencing a weird error starting a usbvm:

[user@dom0 ~]$ qvm-start usbvm
--> Creating volatile image: /var/lib/qubes/appvms/usbvm/volatile.img...
--> Loading the VM (type = AppVM)...
Traceback (most recent call last):
  File "/usr/bin/qvm-start", line 125, in <module>
    main()
  File "/usr/bin/qvm-start", line 109, in main
    xid = vm.start(verbose=options.verbose, preparing_dvm=options.preparing_dvm, start_guid=not options.noguid, notify_function=tray_notify_generic if options.tray else None)
  File "/usr/lib64/python2.7/site-packages/qubes/modules/000QubesVm.py", line 1849, in start
    nd.dettach()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 5249, in dettach
    if ret == -1: raise libvirtError ('virNodeDeviceDettach() failed')
libvirt.libvirtError: Requested operation is not valid: PCI device 0000:00:1a.0 is in use by driver xenlight, domain usbvm

Restarting libvirtd only aggravates the issue:

[user@dom0 ~]$ qvm-start usbvm
--> Creating volatile image: /var/lib/qubes/appvms/usbvm/volatile.img...
--> Loading the VM (type = AppVM)...
Traceback (most recent call last):
  File "/usr/bin/qvm-start", line 125, in <module>
    main()
  File "/usr/bin/qvm-start", line 109, in main
    xid = vm.start(verbose=options.verbose, preparing_dvm=options.preparing_dvm, start_guid=not options.noguid, notify_function=tray_notify_generic if options.tray else None)
  File "/usr/lib64/python2.7/site-packages/qubes/modules/000QubesVm.py", line 1857, in start
    self.libvirt_domain.createWithFlags(libvirt.VIR_DOMAIN_START_PAUSED)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1059, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirt.libvirtError: internal error: libxenlight failed to create new domain 'usbvm'

Weird errors in libxl log:

2015-12-26 x:31:47 TZ libxl: error: libxl_pci.c:1000:do_pci_add: xc_assign_device failed: Operation not permitted
2015-12-26 x:31:47 TZ libxl: error: libxl_create.c:1422:domcreate_attach_pci: libxl_device_pci_add failed: -3

The hypervisor log says:

(XEN) [VT-D] It's disallowed to assign 0000:00:1a.0 with shared RMRR at dbe9a000 for Dom19.
(XEN) XEN_DOMCTL_assign_device: assign 0000:00:1a.0 to dom19 failed (-1)

Rudd-O commented Dec 26, 2015

What's that about the RMRR?

Member

marmarek commented Dec 26, 2015

> The VM has never been started.

Not even using autostart at boot?

> What's that about the RMRR?

Appears to be some new shit:
http://www.gossamer-threads.com/lists/xen/devel/391684

We have a way to set rdm_policy=relaxed, bundled with pci_strictreset=false. It should be set by the default salt formula for sys-usb, exactly for this reason.
My understanding is that those devices do in fact share some resources, so they can't be safely isolated from each other. And Xen doesn't support group assignment (at least for now), so it doesn't know that you are going to assign all such devices to the same VM (which should be safe).

Rudd-O commented Dec 26, 2015

Yes, the VM had autostart at boot and the systemd service had failed for this reason.

How do I determine which devices share the RMRR? I couldn't find anything in my logs.

Rudd-O commented Dec 26, 2015

Holy shit, setting pci_strictreset to False actually let me start that stupid VM!

marmarek commented Dec 26, 2015

I don't know, but I guess it is the other USB controller (or USB 2.0/USB 3.0). If you assign both/all of them to the same VM, you'll see the same address in the Xen log (assuming you set pci_strictreset=False first; otherwise VM start will fail at the first device...). Yes, kind of an ugly way to determine that...
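
Editor's note: the log comparison described above can be done mechanically. The sketch below is only an illustration. It assumes the hypervisor keeps the two message wordings quoted in this thread ("It's disallowed to assign ..." and "It's risky to assign ..."). The helper names `group_by_rmrr` and `read_xen_log` are made up for this example and are not part of Qubes or Xen.

```python
import re
import subprocess
from collections import defaultdict

# Matches the hypervisor messages quoted in this thread, for example:
# (XEN) [VT-D] It's disallowed to assign 0000:00:1a.0 with shared RMRR at dbe9a000 for Dom19.
RMRR_LINE = re.compile(
    r"\[VT-D\] It's (?:disallowed|risky) to assign "
    r"(?P<bdf>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"with shared RMRR at (?P<addr>[0-9a-fA-F]+)"
)


def group_by_rmrr(log_text):
    """Map each RMRR address to the set of PCI devices reported against it."""
    groups = defaultdict(set)
    for line in log_text.splitlines():
        match = RMRR_LINE.search(line)
        if match:
            groups[match.group("addr")].add(match.group("bdf"))
    return dict(groups)


def read_xen_log():
    """Return the hypervisor log as text (assumes dom0 and sufficient privileges)."""
    return subprocess.check_output(["xl", "dmesg"], universal_newlines=True)
```

Calling `group_by_rmrr(read_xen_log())` in dom0 returns a dictionary keyed by RMRR address. Devices that land under the same key are the ones reported as sharing that region. Only devices that Xen has already tried to assign show up in the log, so this does not remove the need to attempt the assignment first.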

Rudd-O commented Dec 26, 2015

Wait, spoke too soon. The VM never ran qrexec-daemon.

Rudd-O commented Dec 26, 2015

It now says in the hypervisor log "It's risky to assign blah blah".

Rudd-O commented Dec 26, 2015

libxl log:

<date> libxl: error: libxl_device.c:1215:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/24/0 not ready

marmarek commented Dec 26, 2015

Did it crash just after startup (state "c" in xl list)? If so, there is probably not enough contiguous memory available (take a look at #1038 for details). You can try to free some with xl mem-set 0 <some-number-in-MB> to reduce dom0 memory drastically, for example down to 1500. Sometimes it helps. Otherwise, reboot...

Rudd-O commented Dec 26, 2015

Yes. Its state is crashed. I just checked.

Assigning all USB devices to the same VM worked to fix the problem.

This sucks. Now I don't have my mouse.

Rudd-O commented Dec 26, 2015

Note that assigning all USB PCI devices did NOT help start the VM. Even with pci_strictreset set to false. It just killed my mouse.

Rudd-O commented Dec 26, 2015

I will try rebooting now. BRB.

marmarek commented Dec 26, 2015

The VM crash at startup is a general problem with starting a VM with PCI devices after some system uptime, because memory is heavily fragmented by then. It is independent of the previous problem (which is solved with pci_strictreset).

Rudd-O commented Dec 26, 2015

Alright, excellent. My USB VM has now received the assignment of the two USB PCI devices I intended to isolate (the Bluetooth and camera devices). I still keep the ability to use my mouse. This is GREAT.

Thanks for the pci_strictreset trick.

Improvement: it really should be somehow autodetected whether it is necessary or not.

marmarek commented Dec 26, 2015

It is set for the USB VM by the salt formula by default.

Rudd-O commented Dec 26, 2015

Yes, that's true, but it would not be the default for a manually created USB VM, which was my case, and I bet it is the case for many users. A smart default lower in the stack would reduce the support load.

marmarek commented Dec 26, 2015

The proper solution would be to have PCI group assignment supported by Xen. This way it would detect whether it is really risky to assign particular devices to the VM.

Rudd-O commented Dec 26, 2015

Agreed.

@marmarek marmarek added this to the Far in the future milestone Jan 6, 2016

marmarek commented Jan 6, 2016

Summary:

  • Xen is missing the feature of PCI group device assignment
  • libvirt bug in tracking which PCI device is used where (libvirt.libvirtError: Requested operation is not valid: PCI device 0000:00:1a.0 is in use by driver xenlight, domain usbvm while starting usbvm)

Rudd-O commented Jan 16, 2016

Quick update: I assigned two of my USB PCI devices (out of three) to a USB VM. That caused hangs and reboots to happen around once each day. They stopped happening as soon as I decided never to power on that VM again. I have yet to try adding all three USB PCI devices to the USB VM (doing so would disable all USB ports on this machine). We'll see if that causes hangs.

andrewclausen commented Mar 14, 2016

The pci_strictreset option didn't have any effect for me. (Exactly the same error messages, etc.)

Rudd-O commented Mar 15, 2016

I found my problem. It was a mouse whose receiver stopped working properly and started causing lockups and hard reboots whenever it was plugged in, irrespective of which VM it was assigned to. The mouse is now in the trash. Setting pci_strictreset to false did work for starting the VM, though.

marmarek commented Mar 15, 2016

There is a strange issue related to some Logitech receivers: #1689. I can confirm it does happen, but I have no idea why. I'd rather blame some kernel driver than the device itself.

nothingmuch commented Oct 23, 2016

I am getting this error too, with pci_strictreset set to false, on a clean install of R3.2 on a Lenovo Yoga 12 which previously had R3.1 working with a usbvm. Disabling USB3 in the BIOS seemed to work. Upgrading the BIOS, as mentioned in this thread https://groups.google.com/forum/#!msg/qubes-users/Z6bEMZTjiz4/FbV6T-l_AQAJ, did not seem to make a difference.

Rudd-O commented Oct 23, 2016

The Logitech device issue is no longer a problem in modern kernels.

nothingmuch commented Oct 24, 2016

I'm seeing this with no external USB devices connected.

Rudd-O commented Oct 27, 2016

Since today, I cannot boot the VMs that have USB devices connected to them. The VM doesn't even show an XL console.

Rudd-O commented Oct 27, 2016

I have pci_strictreset set to false.

marmarek commented Oct 27, 2016

Is that after upgrading Xen to 4.6.3?
What is the exact error? If you start it from the command line, does it show up in xl list (while qvm-start is still running)?

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Rudd-O commented Oct 28, 2016

No xl list entry, cannot even open the xen console to it — it just hangs for a little while and then exits.

This single line appears in the xl dmesg:

(XEN) [VT-D] It's risky to assign <PCI device ID> with shared RMRR at <address> for DomNNN

Note that this worked fine before.

Rudd-O commented Oct 28, 2016

Correction, the entry appears briefly:

usbvm            437   200   1   ---sc-     0.0
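
Editor's note: the state column in that row can be decoded programmatically. `xl list` prints six flag positions (running, blocked, paused, shutdown, crashed, dying), so `---sc-` means shut down and crashed. The sketch below assumes the six-column row layout shown above and a domain name without spaces. The function names are hypothetical.

```python
# Flag letters printed by `xl list`, in column order:
# r = running, b = blocked, p = paused, s = shutdown, c = crashed, d = dying
STATE_FLAGS = "rbpscd"


def parse_xl_list_row(row):
    """Split one `xl list` data row (not the header line) into a dict."""
    name, domid, mem, vcpus, state, seconds = row.split()
    return {
        "name": name,
        "id": int(domid),
        "mem_mb": int(mem),
        "vcpus": int(vcpus),
        "state": state,
        "time_s": float(seconds),
    }


def has_crashed(row):
    """True if the crashed flag is set in the row's state column."""
    return "c" in parse_xl_list_row(row)["state"]
```

Polling `xl list` while `qvm-start` is still running and passing the matching row to `has_crashed` is one way to catch an entry that only appears briefly.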

Rudd-O commented Oct 28, 2016

qrexec log for that VM says

domain dead
cannot connect to qrexec agent: No such process

Rudd-O commented Oct 28, 2016

libxl-driver.log says:

<time and date> libxl: error: libxl_device.c:1215:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/<domid>0 not ready

Rudd-O commented Oct 28, 2016

Xen on the dom0 is 4.6.1-20.fc23

marmarek commented Oct 28, 2016

Probably the well-known memory fragmentation issue: a PV VM with a PCI device needs a few megabytes of physically contiguous memory for DMA purposes. You can free some by taking it away from dom0: xl mem-set 0 <some-memory-size-in-MB>, where the size is smaller than the current one (for example 500MB smaller). If that does not help, try shutting down some VMs. If still nothing, reboot...

Rudd-O commented Oct 28, 2016

Let me try. But if this works, this really should be documented somewhere!

Rudd-O commented Oct 28, 2016

Nope, it did not work at all.

marmarek commented Oct 28, 2016

OK, let's try harder: touch /var/run/qubes/do-not-membalance. Then try xl mem-set and qvm-start again. And if it doesn't work, repeat (just one more time).
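
Editor's note: the sequence above can be written down as a small dry-run planner. This is a sketch under stated assumptions, not an official procedure. The three commands are the ones named in this thread. The function name `plan_retry`, the 500 MB step, and the 1500 MB floor are illustrative choices taken from the numbers mentioned in earlier comments. The floor exists because shrinking dom0 too far can leave the system unresponsive.

```python
MEMBALANCE_FLAG = "/var/run/qubes/do-not-membalance"


def plan_retry(vm_name, dom0_mb, step_mb=500, floor_mb=1500, attempts=2):
    """Return the commands to try, in order, as argv lists. Nothing is executed."""
    # Create the flag file first, as suggested above, before resizing dom0.
    commands = [["touch", MEMBALANCE_FLAG]]
    target = dom0_mb
    for _ in range(attempts):
        target -= step_mb
        if target < floor_mb:
            break  # stop shrinking dom0 before it becomes unresponsive
        commands.append(["xl", "mem-set", "0", str(target)])
        commands.append(["qvm-start", vm_name])
    return commands
```

For example, `plan_retry("usbvm", 4000)` yields the touch command followed by two rounds of `xl mem-set 0 <MB>` and `qvm-start usbvm`, at 3500 and then 3000. When running the commands by hand, stop as soon as `qvm-start` succeeds, and reboot if the second round also fails.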

Rudd-O commented Oct 30, 2016

On 10/28/2016 10:40 PM, Marek Marczykowski-Górecki wrote:

> Ok, lets try harder: touch /var/run/qubes/do-not-membalance. Then
> try again xl mem-set and qvm-start. And if it doesn't work, repeat
> (just one more time).

A reboot fixed it.

Rudd-O
http://rudd-o.com/

xloem commented Jan 1, 2017

Same experience. I needed to touch /var/run/qubes/do-not-membalance to get xl mem-set to do anything at all. I kept dropping the dom0 RAM in 512MB increments, and qvm-start kept failing, until the system stopped responding. Then things worked after a reboot.

Maybe some file to review to determine memory fragmentation, and where the VM memory is getting allocated, for next time? Or some way to determine what made the VM crash?
