NVIDIA GPUs not listed when configured for PCI-Passthrough #5968

Closed
gavin-cudo opened this issue Sep 8, 2022 · 9 comments


gavin-cudo commented Sep 8, 2022

Description
NVIDIA GPUs are not listed among the PCI devices on a host configured for PCI-Passthrough.

To Reproduce
Configure a host with NVIDIA GPUs for PCI-Passthrough as per the documentation at https://docs.opennebula.io/6.4/open_cluster_deployment/kvm_node/pci_passthrough.html

Set the filter under /var/lib/one/remotes/etc/im/kvm-probes.d/pci.conf on the frontend to be:

:filter:
  - '*:*'
:short_address: []
:device_name: []

Expected behavior
All PCI devices, including the NVIDIA GPUs, are listed by onehost show <host_id>.

Actual behavior
All PCI devices are listed except the NVIDIA GPUs.

Details

  • Hypervisor: KVM
  • Version: 6.4.0 CE and 6.4.0 Enterprise

Additional context
GPUs are listed fine on the host with:

lspci -nn -d 10de:*
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
02:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
81:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
81:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
82:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
82:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
83:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
83:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
84:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
84:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
c1:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
c1:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
c2:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
c2:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
c3:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
c3:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
c4:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
c4:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)

The vfio-pci driver is confirmed working, as seen below:

lspci -vs 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2231 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation Device 147e
	Flags: fast devsel, IRQ 255, NUMA node 0
	Memory at f4000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 38060000000 (64-bit, prefetchable) [size=256M]
	Memory at 38070000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 3000 [size=128]
	Expansion ROM at f5000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] Secondary PCI Express
	Capabilities: [bb0] Resizable BAR <?>
	Capabilities: [c1c] Physical Layer 16.0 GT/s <?>
	Capabilities: [d00] Lane Margining at the Receiver <?>
	Capabilities: [e00] Data Link Feature <?>
	Kernel driver in use: vfio-pci
	Kernel modules: nvidiafb, nouveau
cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-5.4.0-125-generic root=UUID=c01382e3-cd50-4103-a06c-576e6bafe9ce ro iommu=pt amd_iommu=on amdgpu.runpm=0 kvm_amd.sev=1 modprobe.blacklist=nouveau nouveau.modeset=0 nouveau.runpm=0 nvidia-drm.modeset=1 pcie_aspm=off radeon.modeset=0 radeon.runpm=0 vfio-pci vfio_iommu_type1.allow_unsafe_interrupts=1

The above configuration was known to be working on version 6.2.0.

gavin-cudo (Author) commented

After further investigation, commenting out lines 114 to 118 in https://github.com/OpenNebula/one/blob/master/src/im_mad/remotes/node-probes.d/pci.rb#L114 allows the GPUs to be listed (albeit with only IDs showing, not names).

The offending lines are:

# The main device cannot be used, skip it
if CONF[:nvidia_vendors].include?(dev[:vendor]) &&
   `ls /sys/class/mdev_bus | grep #{dev[:short_address]}`.empty?
  next
end

So it looks like a bug introduced in 6.4, when vGPU support was added in commit 7f71959.

JungleCatSW (Contributor) commented Sep 28, 2022

@xorel can you review the changes on this branch? It fixes a further issue where VMs fail because the UUID is still set in the host XML, meaning that vGPU is always enabled even on hosts that only support passthrough.

#5982

cirquit commented Sep 30, 2022

@JungleCatSW @gavin-cudo Do you have a workaround to actually get PCI passthrough to work on 6.4? We found ourselves in the same boat with an invisible GPU until we commented out the filtering in pci.rb.

We face a similar problem to this ON forum post, with the full error message (04:00.0 is our host PCI address):

Fri Sep 30 17:45:23 2022: DEPLOY: Directory '/sys/class/mdev_bus/0000:04:00.0' does not exist error: Failed to create domain from /var/lib/one/datastores/110/1814/deployment.0 error: device not found: mediated device 'f7cdd2bc-e0bc-51f5-bdf3-62261edc310c' not found Could not create domain from /var/lib/one/datastores/110/1814/deployment.0 ExitCode: 255

JungleCatSW (Contributor) commented

@cirquit see #5982

You just need to replace one line each in pci.rb and pci.conf.

cirquit commented Oct 4, 2022

@JungleCatSW Unfortunately, applying this change only fixes the invisibility of the PCI device, not the passthrough error when booting a new VM with a passthrough GPU (not a vGPU). It looks to me like #5968 and the passthrough problem are related, as ON currently tries to attach a mediated device (vGPU) instead of a PCI passthrough device.

Did I maybe miss a setting in the official PCI passthrough documentation that enables vGPUs by default?

(formatted for clarity, taken from the GUI when spawning a new VM)

Driver Error
Tue Oct 4 10:08:29 2022: DEPLOY:
Directory '/sys/class/mdev_bus/0000:04:00.0' does not exist error:
Failed to create domain from /var/lib/one/datastores/110/1821/deployment.0 error: device not found:
mediated device 'f7cdd2bc-e0bc-51f5-bdf3-62261edc310c' not found
Could not create domain from /var/lib/one/datastores/110/1821/deployment.0
ExitCode: 255

JungleCatSW (Contributor) commented Oct 4, 2022

@cirquit We had the same issue. Once you have added a host with the old pci.rb and pci.conf, the UUID gets stored, so even when you correct them the PCI data just gets merged with the old, incorrect data.

try running:
$ onehost show -x <hostid>

Scroll up and look for the PCI section to see if there is a UUID field:

      <PCI>
        ...
        <DEVICE><![CDATA[228b]]></DEVICE>
        <DEVICE_NAME><![CDATA[Device]]></DEVICE_NAME>
        <UUID><![CDATA[              Is this here ?????  ]]></UUID>
        <DOMAIN><![CDATA[0000]]></DOMAIN>
        ...
      </PCI>

The way ONE knows whether it is using passthrough or vGPU is whether the UUID field exists in the PCI section of the host.

If you enroll a new host it should work, but to clear an existing host you have to:

  • delete it
  • set the PCI filter in pci.conf to match no devices: :filter: '0:0' # no devices
  • add it
  • delete it
  • set the PCI filter in pci.conf back to all devices (or only NVIDIA devices): :filter: '*:*' # all devices
  • add it again

You can use $ onehost show -x <hostid> each time you add it to check that the XML is correctly being removed and then refreshed.
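
If you prefer not to eyeball the XML by hand, a throwaway Ruby check along these lines can list any leftover UUIDs (my own sketch, not part of OpenNebula; it assumes onehost is in the PATH and the PCI layout shown above):

require 'rexml/document'

# Print every PCI entry on the host that still carries a UUID, i.e. is still
# registered as a vGPU rather than a plain passthrough device.
host_id = ARGV[0] || abort('usage: check_pci_uuid.rb <hostid>')
doc = REXML::Document.new(`onehost show -x #{host_id}`)

doc.elements.each('//PCI') do |pci|
  uuid = pci.elements['UUID']
  next unless uuid

  puts "device #{pci.elements['DEVICE']&.text}: UUID #{uuid.text}"
end

If it prints nothing, the host no longer advertises any vGPU devices.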

let me know if that works for you

@cirquit
Copy link

cirquit commented Oct 6, 2022

@JungleCatSW Thanks for the detailed explanation!

It worked out exactly as you said. One interesting detail: when the PCI device was added as a vGPU, it did not follow the natural ordering of the device addresses (04:00.0) in the PCI tab or in onehost show <id> and was always at the end of the list. When added correctly, it follows the ordering.

For other people who find this issue and have problems with GPU PCI passthrough on KVM: make sure the owner and group of /dev/vfio/* match the user and group defined in /etc/libvirt/qemu.conf on the host, otherwise you will get a permission-denied error from the ON frontend when accessing /dev/vfio. In my case, it was a chown oneadmin:oneadmin -R /dev/vfio.

Also, I needed to reduce the memory size of the VM by ~2 GB compared to a no-PCI-passthrough VM, as qemu would otherwise hit an OOM. When that happened, the host and VM became unresponsive via SSH and only came back after a few hours when (I presume) the qemu process was terminated by the OS.

rsmontero pushed a commit that referenced this issue Oct 6, 2022

vickmp (Member) commented Oct 6, 2022

The problem should be solved by this patch: 3f300f3

The source of the issue is that the use of vGPU was forced, preventing the physical GPU from being used for PCI-Passthrough. As @gavin-cudo commented, one of the problems resided here:

# The main device cannot be used, skip it
if CONF[:nvidia_vendors].include?(dev[:vendor]) &&
   `ls /sys/class/mdev_bus | grep #{dev[:short_address]}`.empty?
      next
end

However, removing those lines means that both GPUs and vGPUs can be used at the same time, which is not correct.

On the other hand, the configuration change that @JungleCatSW mentioned avoids adding the UUID to the GPU device when it works as a physical GPU for PCI-Passthrough, but it does not properly handle vGPUs since, as he indicated, OpenNebula uses this field in order to use the vGPU.

# The uuid is based on the address to get always the same
if CONF[:nvidia_vendors].include?(dev[:vendor])

With the patch I propose, GPUs and vGPUs should be listed correctly depending on whether GPU virtualization is enabled with the NVIDIA drivers (as indicated in the official documentation). The patch also ensures that the UUID is added only to vGPUs, leaving GPUs configured as ordinary PCI devices.
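
Roughly, the intended behaviour can be sketched like this (illustrative only, not the literal patch code; it assumes the probe exposes the device vendor and short address as in the snippets above, assumes PCI domain 0000, and the real probe may derive the UUID differently):

require 'digest'

# A device is treated as a vGPU only when the NVIDIA driver has registered it
# under /sys/class/mdev_bus; passthrough-only GPUs get no UUID and are
# reported as ordinary PCI devices.
def vgpu_uuid(vendor, short_address, nvidia_vendors)
  return nil unless nvidia_vendors.include?(vendor)
  return nil unless Dir.exist?("/sys/class/mdev_bus/0000:#{short_address}")

  # Stable, address-derived UUID so every monitoring run reports the same value.
  h = Digest::MD5.hexdigest(short_address)
  "#{h[0, 8]}-#{h[8, 4]}-#{h[12, 4]}-#{h[16, 4]}-#{h[20, 12]}"
end

With logic along these lines, the UUID field only shows up on hosts where GPU virtualization is actually enabled, which matches the behaviour described above.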

thereiam commented Oct 9, 2022

Hello,
I had the same issue and wanted to add that applying the patch for pci.rb corrected the VM error for me. As I could not afford to delete and re-add the host, I managed to remove the UUID attribute using the onedb update-body host --id 0 command.

vickmp added a commit to OpenNebula/docs that referenced this issue Oct 13, 2022
rsmontero pushed a commit to OpenNebula/docs that referenced this issue Oct 13, 2022
tinova modified the milestones: Release 6.4.2, Release 6.4.3 (Oct 17, 2022)