NVIDIA GPUs not listed when configured for PCI-Passthrough #5968

Closed
gavin-cudo opened this issue Sep 8, 2022 · 9 comments


gavin-cudo commented Sep 8, 2022

Description
NVIDIA GPUs are not listed among the PCI devices on a host configured for PCI-Passthrough.

To Reproduce
Configure a host with NVIDIA GPUs for PCI-Passthrough as per the documentation at https://docs.opennebula.io/6.4/open_cluster_deployment/kvm_node/pci_passthrough.html

Set the filter under /var/lib/one/remotes/etc/im/kvm-probes.d/pci.conf on the frontend to be:

:filter:
  - '*:*'
:short_address: []
:device_name: []

Expected behavior
All PCI devices, including the NVIDIA GPUs, are listed by onehost show <host_id>.

Actual behavior
All PCI devices are listed except the NVIDIA GPUs.

Details

  • Hypervisor: KVM
  • Version: 6.4.0 CE and 6.4.0 Enterprise

Additional context
GPUs are listed fine on the host with:

lspci -nn -d 10de:*
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
02:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
81:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
81:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
82:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
82:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
83:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
83:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
84:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
84:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
c1:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
c1:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
c2:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
c2:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
c3:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
c3:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
c4:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
c4:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)

The vfio-pci driver is confirmed working, as seen below:

lspci -vs 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2231 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation Device 147e
	Flags: fast devsel, IRQ 255, NUMA node 0
	Memory at f4000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 38060000000 (64-bit, prefetchable) [size=256M]
	Memory at 38070000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 3000 [size=128]
	Expansion ROM at f5000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] Secondary PCI Express
	Capabilities: [bb0] Resizable BAR <?>
	Capabilities: [c1c] Physical Layer 16.0 GT/s <?>
	Capabilities: [d00] Lane Margining at the Receiver <?>
	Capabilities: [e00] Data Link Feature <?>
	Kernel driver in use: vfio-pci
	Kernel modules: nvidiafb, nouveau
cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-5.4.0-125-generic root=UUID=c01382e3-cd50-4103-a06c-576e6bafe9ce ro iommu=pt amd_iommu=on amdgpu.runpm=0 kvm_amd.sev=1 modprobe.blacklist=nouveau nouveau.modeset=0 nouveau.runpm=0 nvidia-drm.modeset=1 pcie_aspm=off radeon.modeset=0 radeon.runpm=0 vfio-pci vfio_iommu_type1.allow_unsafe_interrupts=1

The above configuration was known to be working on version 6.2.0.

gavin-cudo (Author) commented

After further investigation, commenting out lines 114 to 118 in https://github.com/OpenNebula/one/blob/master/src/im_mad/remotes/node-probes.d/pci.rb#L114 allows the GPUs to be listed (albeit with only IDs showing, not names).

The offending lines are:

# The main device cannot be used, skip it
if CONF[:nvidia_vendors].include?(dev[:vendor]) &&
   `ls /sys/class/mdev_bus | grep #{dev[:short_address]}`.empty?
  next
end

So it looks like a bug introduced in 6.4, when vGPU support was added in commit 7f71959.

JungleCatSW (Contributor) commented Sep 28, 2022

@xorel can you review the changes on this branch? It fixes a further issue where VMs fail because the UUID is still set in the host XML, meaning that vGPU is always enabled even on hosts that only support passthrough.

#5982

cirquit commented Sep 30, 2022

@JungleCatSW @gavin-cudo Do you have a workaround to actually get PCI passthrough to work on 6.4? We found ourselves in the same boat with an invisible GPU until we commented out the filtering in pci.rb.

We face a similar problem to this ON forum post, with the full error message (04:00.0 is our host PCI address):

Fri Sep 30 17:45:23 2022: DEPLOY: Directory '/sys/class/mdev_bus/0000:04:00.0' does not exist error: Failed to create domain from /var/lib/one/datastores/110/1814/deployment.0 error: device not found: mediated device 'f7cdd2bc-e0bc-51f5-bdf3-62261edc310c' not found Could not create domain from /var/lib/one/datastores/110/1814/deployment.0 ExitCode: 255

JungleCatSW (Contributor) commented

@cirquit see #5982

You just need to replace one line each in pci.rb and pci.conf.

cirquit commented Oct 4, 2022

@JungleCatSW Unfortunately, applying this change only fixes the invisibility of the PCI device, not the passthrough error when booting a new VM with a passthrough GPU (not a vGPU). It looks to me like #5968 and the passthrough problem are related, as ON currently tries to attach a mediated device (vGPU) instead of a PCI passthrough device.

Did I maybe miss a setting in the official PCI passthrough documentation that enables vGPUs by default?

(formatted for clarity, taken from the GUI when spawning a new VM)

Driver Error
Tue Oct 4 10:08:29 2022: DEPLOY:
Directory '/sys/class/mdev_bus/0000:04:00.0' does not exist error:
Failed to create domain from /var/lib/one/datastores/110/1821/deployment.0 error: device not found:
mediated device 'f7cdd2bc-e0bc-51f5-bdf3-62261edc310c' not found
Could not create domain from /var/lib/one/datastores/110/1821/deployment.0
ExitCode: 255

JungleCatSW (Contributor) commented Oct 4, 2022

@cirquit We had the same issue. Once you have added a host with the old pci.rb and pci.conf, the UUID gets stored, so even when you correct them the PCI data just gets merged with the old, incorrect data.

try running:
$ onehost show -x <hostid>

Scroll up and look for the PCI section to see if there is a UUID field:

      <PCI>
        ...
        <DEVICE><![CDATA[228b]]></DEVICE>
        <DEVICE_NAME><![CDATA[Device]]></DEVICE_NAME>
        <UUID><![CDATA[              Is this here ?????  ]]></UUID>
        <DOMAIN><![CDATA[0000]]></DOMAIN>
        ...
      </PCI>

The way ONE knows whether it is using passthrough or vGPU is whether the UUID field exists in the PCI section of the host.

If you enroll a new host it should work, but to clear an existing host you have to:

  • delete it
  • set the PCI filter in pci.conf to match no devices: :filter: '0:0' # no devices
  • add it
  • delete it
  • set the PCI filter in pci.conf back to all devices (or only NVIDIA devices): :filter: '*:*' # all devices
  • add it again

You can use $ onehost show -x <hostid> each time you add it to check that the XML is correctly being removed and then refreshed.
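
If you prefer not to eyeball the XML by hand, a throwaway Ruby check along these lines can list any leftover UUIDs (my own sketch, not part of OpenNebula; it assumes onehost is in the PATH and the PCI layout shown above):

require 'rexml/document'

# Print every PCI entry on the host that still carries a UUID, i.e. is still
# registered as a vGPU rather than a plain passthrough device.
host_id = ARGV[0] || abort('usage: check_pci_uuid.rb <hostid>')
doc = REXML::Document.new(`onehost show -x #{host_id}`)

doc.elements.each('//PCI') do |pci|
  uuid = pci.elements['UUID']
  next unless uuid

  puts "device #{pci.elements['DEVICE']&.text}: UUID #{uuid.text}"
end

If it prints nothing, the host no longer advertises any vGPU devices.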

let me know if that works for you

@cirquit
Copy link

cirquit commented Oct 6, 2022

@JungleCatSW Thanks for the detailed explanation!

It worked out exactly as you said. One interesting detail: when the PCI device was added as a vGPU, it did not follow the natural ordering of the device addresses (04:00.0) in the PCI tab or in onehost show <id> and was always at the end of the list. When added correctly, it follows the ordering.

For other people who find this issue and have problems with GPU PCI passthrough on KVM: make sure the owner and group of /dev/vfio/* match the user and group defined in /etc/libvirt/qemu.conf on the host, otherwise you will get a permission-denied error from the ON frontend when accessing /dev/vfio. In my case, it was a chown oneadmin:oneadmin -R /dev/vfio.

Also, I needed to reduce the memory size of the VM by ~2 GB compared to a no-PCI-passthrough VM, as qemu would otherwise hit an OOM. When that happened, the host and VM became unresponsive via SSH and only came back after a few hours when (I presume) the qemu process was terminated by the OS.

rsmontero pushed a commit that referenced this issue Oct 6, 2022

vickmp (Member) commented Oct 6, 2022

The problem should be solved by this patch: 3f300f3

The source of the issue is that the use of vGPU was forced, preventing the physical GPU from being used for PCI-Passthrough. As @gavin-cudo commented, one of the problems resided here:

# The main device cannot be used, skip it
if CONF[:nvidia_vendors].include?(dev[:vendor]) &&
   `ls /sys/class/mdev_bus | grep #{dev[:short_address]}`.empty?
      next
end

However, removing those lines means that both GPUs and vGPUs can be used at the same time, which is not correct.

On the other hand, the configuration change that @JungleCatSW mentioned avoids adding the UUID to the GPU device when it works as a physical GPU for PCI-Passthrough, but it does not properly handle vGPUs since, as he indicated, OpenNebula uses this field in order to use the vGPU.

# The uuid is based on the address to get always the same
if CONF[:nvidia_vendors].include?(dev[:vendor])

With the patch I propose, GPUs and vGPUs should be listed correctly depending on whether GPU virtualization is enabled with the NVIDIA drivers (as indicated in the official documentation). The patch also ensures that the UUID is added only to vGPUs, leaving GPUs configured as ordinary PCI devices.
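
Roughly, the intended behaviour can be sketched like this (illustrative only, not the literal patch code; it assumes the probe exposes the device vendor and short address as in the snippets above, assumes PCI domain 0000, and the real probe may derive the UUID differently):

require 'digest'

# A device is treated as a vGPU only when the NVIDIA driver has registered it
# under /sys/class/mdev_bus; passthrough-only GPUs get no UUID and are
# reported as ordinary PCI devices.
def vgpu_uuid(vendor, short_address, nvidia_vendors)
  return nil unless nvidia_vendors.include?(vendor)
  return nil unless Dir.exist?("/sys/class/mdev_bus/0000:#{short_address}")

  # Stable, address-derived UUID so every monitoring run reports the same value.
  h = Digest::MD5.hexdigest(short_address)
  "#{h[0, 8]}-#{h[8, 4]}-#{h[12, 4]}-#{h[16, 4]}-#{h[20, 12]}"
end

With logic along these lines, the UUID field only shows up on hosts where GPU virtualization is actually enabled, which matches the behaviour described above.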

thereiam commented Oct 9, 2022

Hello,
I had the same issue and wanted to add that applying the patch for pci.rb corrected the VM error for me. As I could not afford to delete and re-add the host, I managed to remove the UUID attribute using the onedb update-body host --id 0 command.

vickmp added a commit to OpenNebula/docs that referenced this issue Oct 13, 2022
rsmontero pushed a commit to OpenNebula/docs that referenced this issue Oct 13, 2022
tinova modified the milestones: Release 6.4.2, Release 6.4.3 (Oct 17, 2022)