6.8 rc1: Module is failing to build #594
Comments
Okay, it appears that the GPL error is also present on 545, if patching the
It would be cool if you fixed this in the upcoming release and included 4070 Super support, so that I can finally use the 545.xx drivers :)
I'm sorry, but we don't accept bug reports for -rc kernels. Unfortunately, the template for reporting build bugs is missing this clause and checkmark: "Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels." Please close this bug and try again when 6.8 is released. Nvidia already does testing with -rc kernels and we have fixes for these already in the queue.
Tabi, don't get me wrong, but multiple people will run into this issue when they do kernel testing. It is good that NVIDIA is doing internal testing with the RC kernels, but we don't have access to your roadmap or issue tracker, so it would be good to have this publicly visible. Also, I have reported issues against 4 different kernel versions in the past, and they were simply kept open, which was fine. Feel free to close it if you want to. Edit: Edit2:
The reason we don't want bug reports for -rc kernels is that we already know about the issues. The problem exists with every kernel, so it's just noise to file these reports. This bug tracker should be used only for bugs that Nvidia doesn't already know about. Also, we are in the process of fixing the template.
There is so much to this that I barely know where to begin. So Nvidia's official take is that kernel development is not supported on Nvidia devices? You either wait until Nvidia releases a driver 3 months down the road, or hunt through obscure Reddit posts for user workarounds to get it running. "Bugs Nvidia doesn't already know about": well, wow, if only you didn't keep such things internal and actually disclosed your known bugs, or even responded to posts about them in your own dev forum. This would solve itself in many cases if your bugs weren't internal, so we actually knew what you know...
If they already have a patch internally, they could share it with the community. Changes to the precompiled utility binaries have almost never been required. @NVIDIA might just consider a separate repo where you would provide the patches.
The reason for the policy against reporting bugs on -rc kernels is to avoid pointless arguments like this one. Reporting bugs like these just wastes everyone's time. Nvidia already aggressively tests -rc kernels and even linux-next, so we sometimes update our code for changes that haven't made it into an -rc kernel yet. However, we can't catch everything, and we do have a development process that we stick to. So you'll just have to be patient.
Is this patch for the kernel or the driver?
The following patch needs to be applied to the kernel; it is required for 545 as well as 550.
This one seems to help with the 545 driver and 6.8: We haven't checked yet how they fixed it in the 550 series, but the above workaround works.
Since I can't compile the kernel yet (which is dangerous for a noob like me), I'd ask you to submit this workaround to the official kernel, or to patch the nvidia driver itself to fix the problem.
Correct: the The Are you still seeing problems with
Hi @aritger, yes, I compiled 6.8 rc2 today, and the rcu_read_unlock and rcu_read_lock GPL issue still affects the 550 drivers. DRM_UNLOCKED got fixed in NVIDIA 550, and the BUG_func issue also got fixed in rc2.
Thanks, @ptr1337. Can you tell if the __rcu_read_unlock/__rcu_read_lock problem is new with 6.8 rc's versus 6.7? Are you using the same kconfig between 6.7 and 6.8-rcN? I vaguely recall some scenarios in the past where some debug kconfig knobs (maybe CONFIG_DEBUG_KMEMLEAK?) cause common utility macros to call EXPORT_SYMBOL_GPL __rcu_read_unlock/__rcu_read_lock functions. If you're hitting the problem that I'm remembering, then the options were: (a) disable whatever kconfig knob causes the indirect calls to __rcu_read_unlock/__rcu_read_lock. (b) use the open kernel modules, rather than the closed kernel modules. If what you're seeing is new with 6.8-rcN, then I'd like to investigate further.
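For anyone wanting to answer the "same kconfig?" question above, here is a minimal sketch (not an official tool) that compares two kernel .config files and lists the options that differ. The function names are made up for illustration, and it assumes that an option missing from a file is equivalent to "not set".

```python
def parse_kconfig(text):
    """Return {option: value} for the text of a kernel .config file.

    Lines of the form '# CONFIG_FOO is not set' are recorded as 'n'.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            name = line[2:-len(" is not set")]
            options[name] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kconfig(old_text, new_text):
    """Return {option: (old_value, new_value)} for options that differ.

    Assumption: an option absent from one file is treated as 'n'.
    """
    old, new = parse_kconfig(old_text), parse_kconfig(new_text)
    changed = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name, "n"), new.get(name, "n")
        if before != after:
            changed[name] = (before, after)
    return changed


if __name__ == "__main__":
    # Illustrative inputs only; in practice read the two real .config files.
    config_67 = "CONFIG_SPARSEMEM_VMEMMAP=y\n# CONFIG_DEBUG_KMEMLEAK is not set\n"
    config_68 = "CONFIG_SPARSEMEM_VMEMMAP=y\nCONFIG_DEBUG_KMEMLEAK=y\n"
    for name, (before, after) in diff_kconfig(config_67, config_68).items():
        print(f"{name}: {before} -> {after}")
```

If the diff is empty, the kconfig can be ruled out and the regression is more likely in the kernel source itself.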
Oh, sorry: we actually do have a new problem in the 6.8-rcs with __rcu_read_unlock/__rcu_read_lock, due to use of the macro pfn_valid(), which in turn calls those EXPORT_SYMBOL_GPL symbols. It required a bit of detangling, but our next 550 release should remove the pfn_valid() use. Thanks for your patience.
Hi! Great to hear that it will be fixed in the upcoming 550 release (likely with the 4080 Super launch?). Currently we patch for the RC kernel. Looking forward to the next release. 6.7 is completely fine, btw.
Any ETA for the release with the pfn_valid() fix?
Now it happens with Linux 6.7.3 too.
Now same problem with Linux 6.1.76 as well.
This commit now breaks binary NVIDIA drivers on Linux 6.7.3, 6.6.15 and 6.1.76:
A new beta couldn't come earlier, please.
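A rough sketch for checking whether a given kernel release falls into the range reported above. The version table comes straight from this comment (6.7.3, 6.6.15, 6.1.76); the function names are hypothetical, and it assumes that later point releases in the same stable series stay affected. It does not cover the 6.8 release candidates discussed earlier in the thread.

```python
import re

# First affected point release per stable series, as reported in this thread.
FIRST_AFFECTED = {(6, 7): 3, (6, 6): 15, (6, 1): 76}


def parse_release(release):
    """Turn a string like '6.6.15-1-lts' into (6, 6, 15).

    A missing patch level counts as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"not a kernel release string: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def reported_affected(release):
    """True if the release is at or past the first reported affected version.

    Assumption: every later point release in the series is affected as well.
    """
    major, minor, patch = parse_release(release)
    first = FIRST_AFFECTED.get((major, minor))
    return first is not None and patch >= first


if __name__ == "__main__":
    for release in ("6.7.2", "6.7.3", "6.6.15-1-lts", "6.1.75"):
        print(release, reported_affected(release))
```

On a running system the release string could come from `platform.release()`.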
I can confirm this is reproducible with
You can "workaround" it with the above patch
Thank you! I saw the patch. Just wanted to raise the issue for 6.6.x as well since the initial discussion was about not accepting bug reports for -rc kernels.
Kinda off-topic for the open source variant (not affected that I can see), but for those that need a patch for the GPL __rcu_read_lock issue applied to the drivers themselves rather than kernel using the blob variant, see: https://forums.developer.nvidia.com/t/280908/19 Unsure how NVIDIA is planning to fix this, but that's the simplest patch I could come up with as a quick fix.
A Gentoo developer wrote a patch which doesn't involve patching the kernel (which many can't do and which Fedora/RedHat have outright refused to do): https://bugs.gentoo.org/923456 Courtesy of Ionen Wolkens, Gentoo:
550.54.14 drivers have just been released: https://www.nvidia.com/Download/driverResults.aspx/218826/en-us/ I wonder if the issue has been addressed. There's nothing in the release notes. Indeed, the issue has seemingly been addressed! Hooray! Thanks a lot! NVIDIA has also updated the 470 branch for older GPUs.
Yes, it appears to be fixed. Will close this now.
This fixes compilation issues with Nvidia kernel modules as introduced by commit 3a01daace71b521563c38bbbf874e14c3e58adb7. Technically the Nvidia modules need to be updated for this, but this should get them building for now. References: - https://forums.developer.nvidia.com/t/linux-6-7-3-545-29-06-550-40-07-error-modpost-gpl-incompatible-module-nvidia-ko-uses-gpl-only-symbol-rcu-read-lock/280908 - NVIDIA/open-gpu-kernel-modules#594
Patch out b448de2 "mm/sparsemem: fix race in accessing memory_section->usage" This commit causes build failures for NVIDIA drivers. These errors manifest as "ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'rcu_read_unlock_strict'" Note that this behavior is behind CONFIG_SPARSEMEM_VMEMMAP which is enabled in CBL-Mariner 2.0 Tracking github issue: NVIDIA/open-gpu-kernel-modules#594
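To check whether a failed module build hit this particular error, a small sketch like the following can pull the offending symbols out of a build log. The regular expression is modelled on the error message quoted above; the function name and the sample log lines are illustrative assumptions, not real build output.

```python
import re

# Pattern modelled on the modpost message quoted in this thread.
MODPOST_GPL_ERROR = re.compile(
    r"ERROR: modpost: GPL-incompatible module (?P<module>\S+) "
    r"uses GPL-only symbol '(?P<symbol>[^']+)'"
)


def gpl_symbol_errors(log_text):
    """Return {module: set of GPL-only symbols} found in a build log."""
    found = {}
    for match in MODPOST_GPL_ERROR.finditer(log_text):
        found.setdefault(match.group("module"), set()).add(match.group("symbol"))
    return found


if __name__ == "__main__":
    # Illustrative log excerpt; a real log would be read from a file.
    sample_log = (
        "ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol "
        "'rcu_read_unlock_strict'\n"
        "ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol "
        "'__rcu_read_lock'\n"
    )
    for module, symbols in gpl_symbol_errors(sample_log).items():
        print(module, sorted(symbols))
```

Any RCU symbol showing up here for nvidia.ko points at the pfn_valid() change discussed in the tracking issue rather than at an unrelated build failure.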
NVIDIA Open GPU Kernel Modules Version
545.29.06
Operating System and Version
CachyOS (ArchLinux based)
Kernel Release
6.8.0rc1
Build Command
Terminal output/Build Log
More Info
Error here:
Just for Info:
535 also seems affected and fails due to a GPL symbol error:
Can be worked around with the following kernel patch, but this is a license violation: