
eGPU kernel modules failure - Chipset Setup Function Error! #568

Open
KernelPryanic opened this issue Oct 31, 2023 · 3 comments
Labels
bug Something isn't working

KernelPryanic commented Oct 31, 2023

NVIDIA Open GPU Kernel Modules Version

535.113.01

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Fedora release 38 (Thirty Eight)

Kernel Release

Linux fedora 6.5.8-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct 20 15:53:48 UTC 2023 x86_64 GNU/Linux

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

NVIDIA Corporation GA104 [GeForce RTX 3070 Ti] (rev a1)

Describe the bug

I'm trying to use the open version of the NVIDIA driver because the proprietary one fails with an RmInitAdapter failed! error, but I'm getting errors from the kernel with this one too. In the attached log, the errors appear around line 7000.

Oct 31 17:53:41 fedora kernel: NVRM objClInitPcieChipset: *** Chipset Setup Function Error!
Oct 31 17:53:44 fedora kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:52:00.0 on minor 1
Oct 31 17:53:44 fedora systemd[1]: nvidia-fallback.service - Fallback to nouveau as nvidia did not load was skipped because of an unmet condition check (ConditionPathExists=!/sys/module/nvidia).
Oct 31 17:54:06 fedora kernel: NVRM unixCallVideoBIOS: int10h(4f02, 0000) vesa call failed! (4f02, 0000)
Oct 31 17:54:06 fedora kernel: NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from pRmApi->Control(pRmApi, nv->rmapi.hClient, nv->rmapi.hSubDevice, NV2080_CTRL_CMD_INTERNAL_DISPLAY_POST_RESTORE, &restoreParams, sizeof(restoreParams)) @ unix_console.c:197

Laptop specs:

  • Manufacturer: LENOVO
  • Product Name: 21CBCTO1WW
  • Version: ThinkPad X1 Carbon Gen 10
  • BIOS version: N3AET77W (1.42)

GRUB params:

GRUB_CMDLINE_LINUX="resume=/dev/mapper/vg--main-swap rd.luks.uuid=luks-dbbd65e4-65f3-4956-85f0-8d9e919e733c rd.lvm.lv=vg-main/root rd.lvm.lv=vg-main/swap rhgb quiet nvidia.NVreg_OpenRmEnableUnsupportedGpus=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1"

Related NVIDIA forum thread: https://forums.developer.nvidia.com/t/driver-cant-detect-egpu/271201

To Reproduce

Boot the latest Fedora kernel with the open kernel modules installed and the eGPU attached.

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report-535-open.log.gz

More Info

The eGPU works on Windows and with the nouveau driver.

KernelPryanic added the bug label Oct 31, 2023

ttabi commented Nov 1, 2023

"Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver."

According to the forum you linked, this problem also occurs with the proprietary driver, correct?

KernelPryanic (author) commented Nov 1, 2023

@ttabi I cannot confirm that this issue also occurs with the proprietary driver, because the proprietary driver fails with a different error: RmInitAdapter failed! (0x26:0x56:1482).


FlyGoat commented Dec 17, 2023

I'm running into a similar problem on my eGPU setup and tried to debug it a little.

NVRM objClInitPcieChipset: *** Chipset Setup Function Error!

This message is not fatal at all. The real problem is:

NVRM unixCallVideoBIOS: int10h(4f02, 0000) vesa call failed! (4f02, 0000)
Oct 31 17:54:06 fedora kernel: NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from pRmApi->Control(pRmApi, nv->rmapi.hClient, nv->rmapi.hSubDevice, NV2080_CTRL_CMD_INTERNAL_DISPLAY_POST_RESTORE, &restoreParams, sizeof(restoreParams)) @ unix_console.c:197

This means the emulated x86 call into the VGA BIOS failed.
I enabled #define IO_LOG(port, val) in vbioscall.c, and the output suggests the call is trying to access legacy VGA I/O and memory resources, which are broken on my eGPU platform.

My workaround is simply to comment out the primary_vga detection logic:

diff --git a/src/nvidia/arch/nvalloc/unix/src/dynamic-power.c b/src/nvidia/arch/nvalloc/unix/src/dynamic-power.c
index 934bff1..c6e97c2 100644
--- a/src/nvidia/arch/nvalloc/unix/src/dynamic-power.c
+++ b/src/nvidia/arch/nvalloc/unix/src/dynamic-power.c
@@ -951,11 +951,13 @@ void NV_API_CALL rm_init_dynamic_power_management(
     // Legacy case: check if device is primary and driven by VBIOS or fb driver.
     nv->primary_vga = NV_FALSE;
 
+#if 0
     //
     // Below function always return NV_OK and depends upon kernel flags
     // IORESOURCE_ROM_SHADOW & PCI_ROM_RESOURCE for Primary VGA detection.
     //
     nv_set_primary_vga_status(nv);
+#endif
 
     // UEFI case: where console is driven by GOP driver.
     bUefiConsole = rm_get_uefi_console_status(nv);
