nvidia-bug-report.sh requested for further information regarding why the script failed. #1

KrutavShah · 2021-02-05T05:43:01Z

Hello, the README says that the script is not currently working. Could you run nvidia-bug-report.sh to generate a report of all the messages and errors emitted by the vGPU manager and vgpud? That would help in figuring out what went wrong. Thank you.

DualCoder · 2021-02-13T14:59:02Z

Here is the requested file: nvidia-bug-report.log.gz

The interesting section is this:

Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: notice: vmiop_env_log: (0x0): Received start call from nvidia-vgpu-vfio module: mdev uuid 38512783-4893-47f7-9179-b0594167e86b GPU PCI id 00:01:00.0 config params vgpu_type_id=50
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=50
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: notice: vmiop_env_log: Successfully updated env symbols!
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: NVOS status 0x56
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: Assertion Failed at 0xf69873bf:293
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: 11 frames returned by backtrace
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv005021vgpu+0x18) [0x7ff3f69cc3c8]
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0xa3e3b) [0x7ff3f6982e3b]
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0xa83bf) [0x7ff3f69873bf]
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0xa98c7) [0x7ff3f69888c7]
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: vgpu() [0x413e72]
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: vgpu() [0x4140e9]
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: vgpu() [0x40e9d7]
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: vgpu() [0x40c2c9]
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: vgpu() [0x40bc7c]
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7ff3f6e7109b]
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: vgpu() [0x4033ba]
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: (0x0): init_device_instance failed for inst 0 with error 1 (error setting vGPU configuration information from RM)
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: (0x0): Initialization: init_device_instance failed error 1
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_log: display_init failed for inst: 0
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_env_log: (0x0): vmiope_process_configuration: plugin registration error
Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: error: vmiop_env_log: (0x0): vmiope_process_configuration failed with 0x1f
Feb 13 15:20:08 Debian-dom0 kernel: [nvidia-vgpu-vfio] 38512783-4893-47f7-9179-b0594167e86b: start failed. status: 0x1
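
One detail worth noting in the trace above: the address in the "Assertion Failed at 0xf69873bf:293" line appears to be the low 32 bits of one of the libnvidia-vgpu.so backtrace frames. The following is a minimal sketch, not part of the original report, that checks this using only the frame lines copied from the log. The helper name `parse_frame` and the regular expression are illustrative assumptions, and the computed "load base" is simply address minus offset for this one run.

```python
import re

# Matches frames of the form "libnvidia-vgpu.so(+0xOFFSET) [0xADDRESS]".
# The symbolised first frame (_nv005021vgpu+0x18) is deliberately left out,
# because its offset is relative to a symbol, not to the library.
FRAME_RE = re.compile(r"libnvidia-vgpu\.so\(\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\]")

# Frame text copied verbatim from the log excerpt above.
frames = [
    "/lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0xa3e3b) [0x7ff3f6982e3b]",
    "/lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0xa83bf) [0x7ff3f69873bf]",
    "/lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0xa98c7) [0x7ff3f69888c7]",
]

# Address from "Assertion Failed at 0xf69873bf:293".
assert_addr = 0xF69873BF


def parse_frame(text):
    """Return (offset, address) for one backtrace frame line."""
    m = FRAME_RE.search(text)
    if m is None:
        raise ValueError(f"not a library frame: {text!r}")
    return int(m.group(1), 16), int(m.group(2), 16)


parsed = [parse_frame(f) for f in frames]

# Every frame should imply the same load base (address - offset).
bases = {addr - off for off, addr in parsed}

# Offsets of frames whose low 32 address bits equal the assertion address.
matching = [hex(off) for off, addr in parsed if addr & 0xFFFFFFFF == assert_addr]

print("load base(s):", sorted(hex(b) for b in bases))
print("frame offsets matching the assertion address:", matching)
```

If the match holds, the failing assertion sits at offset 0xa83bf inside libnvidia-vgpu.so, which narrows down where to look when disassembling the library.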

According to nvstatuscodes.h the status code 0x56 is NV_STATUS_CODE(NV_ERR_NOT_SUPPORTED, 0x00000056, "Call not supported"). As far as I can tell this is returned by the code inside the nv-kernel.o_binary file and I have not been able to figure out why.
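
As a small illustration of the lookup described above, here is a hedged sketch that pulls the "NVOS status" value out of a log line and maps it to a name. The table contains only the single entry quoted in this thread; the real nvstatuscodes.h defines many more codes, and the function name `decode_nvos_status` is a hypothetical helper, not anything shipped by NVIDIA.

```python
import re

# Only the one code quoted in this thread is listed here.
# The real nvstatuscodes.h header defines many more.
KNOWN_STATUS = {
    0x56: ("NV_ERR_NOT_SUPPORTED", "Call not supported"),
}

STATUS_RE = re.compile(r"NVOS status (0x[0-9a-fA-F]+)")


def decode_nvos_status(line):
    """Return (code, name, description) for a log line, or None if no status is present."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    code = int(m.group(1), 16)
    name, text = KNOWN_STATUS.get(code, ("UNKNOWN", "not in this table"))
    return code, name, text


# Log line copied from the excerpt above.
line = ("Feb 13 15:20:08 Debian-dom0 nvidia-vgpu-mgr[1562]: "
        "error: vmiop_log: NVOS status 0x56")
print(decode_nvos_status(line))
```

This only names the error; it does not explain why the closed-source code in nv-kernel.o_binary returns it.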

KrutavShah · 2021-02-13T16:43:58Z

error 1 (error setting vGPU configuration information from RM)
As far as I know, this is a pretty typical error when you're using the wrong graphics card. Because Nvidia only officially supports Red Hat Linux, I used that for my tests and ran the Red Hat hypervisor nested on top of a KVM hypervisor. The level 1 hypervisor spoofed the PCI ID so that Red Hat would detect a "Tesla P4," and instead of getting loads of errors, I usually got this same error 1. I haven't looked at your whole bug report yet, but I will compare it to some of my previous testing and try to dig up a few details. Right now I have a feeling that it has to do with ECC memory, a feature that has to be disabled for vGPU to work at all. However, on GeForce you can't turn ECC on or off, so some additional modifications are needed. I'll let you know when more information surfaces.

DualCoder · 2021-02-22T22:08:06Z

I am closing this since the bug report has been provided and the new README explains what causes the failure.

DualCoder closed this as completed Feb 22, 2021

dgyulaid mentioned this issue Jul 4, 2021

GTX970 on Proxmox #57

Closed

ArsBinarii mentioned this issue Aug 21, 2021

can't unlock GP104 gtx1080 #70

Closed
