dkms: fix building nvidia open kernel modules against clang and thin/full lto compiled kernel #417
|
Massive thanks for the fix. Can I trouble you for a simple test? Say, for any/all of the Archlinux targets:
|
|
Thank you @evelikov. All the external modules I have compile and install normally on my system. It also builds and installs dkms_test normally; the only difference is that it prints the command used to build the module. All seems fine. |
|
Something is causing the exact build command to be printed, instead of the pretty message. This should only happen when the verbose flag is set. Either by:
I'm leaning towards there being a config file enabling this. Alternatively... does the test suite work on your end, with a normal (non clang/lto) kernel? |
This comment was marked as outdated.
|
First, sorry about that. Right now the Arch Linux package is sitting on version 3.0.12, so I had to build 3.0.13 myself. Now the tests pass, but with some caveats.
Tests output: dkms_tests_output.txt
Here are the changes I made to run_test.sh (this isn't a suggestion to change it to exactly this, as I had to hardcode some things I couldn't get around):
@@ -496,9 +496,9 @@
echo 'Checking modinfo'
run_with_expected_output sh -c "modinfo /lib/modules/${KERNEL_VER}/${expected_dest_loc}/dkms_test.ko${mod_compression_ext} | head -n 4" << EOF
filename: /lib/modules/${KERNEL_VER}/${expected_dest_loc}/dkms_test.ko${mod_compression_ext}
-version: 1.0
-description: A Simple dkms test module
license: GPL
+description: A Simple dkms test module
+version: 1.0
EOF
if (( NO_SIGNING_TOOL == 0 )); then
@@ -621,9 +621,9 @@
echo 'Checking modinfo'
run_with_expected_output sh -c "modinfo /lib/modules/${KERNEL_VER}/${expected_dest_loc}/dkms_test.ko${mod_compression_ext} | head -n 4" << EOF
filename: /lib/modules/${KERNEL_VER}/${expected_dest_loc}/dkms_test.ko${mod_compression_ext}
-version: 1.0
-description: A Simple dkms test module
license: GPL
+description: A Simple dkms test module
+version: 1.0
EOF
if (( NO_SIGNING_TOOL == 0 )); then
@@ -1419,7 +1419,7 @@
Cleaning build area...
Building module(s)...(bad exit status: 2)
Failed command:
-make -j1 KERNELRELEASE=${KERNEL_VER} all
+make -j1 KERNELRELEASE=${KERNEL_VER} all CC=clang OBJCOPY=llvm-objcopy LD=ld.lld
Error! Bad return status for module build on kernel: ${KERNEL_VER} (${KERNEL_ARCH})
Consult /var/lib/dkms/dkms_failing_test/1.0/build/make.log for more information.
dkms autoinstall on ${KERNEL_VER}/${KERNEL_ARCH} failed for dkms_failing_test(10)
@@ -1438,7 +1438,7 @@
Cleaning build area...
Building module(s)...(bad exit status: 2)
Failed command:
-make -j1 KERNELRELEASE=${KERNEL_VER} all
+make -j1 KERNELRELEASE=${KERNEL_VER} all CC=clang OBJCOPY=llvm-objcopy LD=ld.lld
Error! Bad return status for module build on kernel: ${KERNEL_VER} (${KERNEL_ARCH})
Consult /var/lib/dkms/dkms_failing_test/1.0/build/make.log for more information.
dkms autoinstall on ${KERNEL_VER}/${KERNEL_ARCH} failed for dkms_failing_test(10)
@@ -1587,7 +1587,7 @@
Cleaning build area...
Building module(s)...(bad exit status: 2)
Failed command:
-make -j1 KERNELRELEASE=${KERNEL_VER} all
+make -j1 KERNELRELEASE=${KERNEL_VER} all CC=clang OBJCOPY=llvm-objcopy LD=ld.lld
Error! Bad return status for module build on kernel: ${KERNEL_VER} (${KERNEL_ARCH})
Consult /var/lib/dkms/dkms_failing_test/1.0/build/make.log for more information.
The explanation: my modinfo shows the license first, then the description, and only then the version:
As for the CC, LD and OBJCOPY variables, the tests currently don't expect to encounter them. This isn't caused by my PR, as we were already exposing CC and LD before (so the tests will fail on a clang-compiled kernel even without my patch), so we may have to fix that. I even tried, but said variables aren't exposed, so we can't do something like
One suggestion would be to use in run_test.sh a mechanism similar to what dkms.in does right now to tell whether the kernel was compiled using clang. Or we could change the dkms.in make_commands to include CC, LD, etc. even when compiling with gcc, which would keep the changes needed in run_test.sh minimal. Other than that the tests pass normally. |
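The first suggestion (detecting a clang-built kernel from run_test.sh) could be sketched roughly as below. This is only an illustration under stated assumptions: the helper name expected_make_suffix is hypothetical, and it assumes the kernel config file marks clang builds with CONFIG_CC_IS_CLANG=y and LLD-linked builds with CONFIG_LD_IS_LLD=y. It is not necessarily how dkms.in performs the check.

```shell
# Hypothetical helper for run_test.sh; not part of dkms.
# $1: path to a kernel config file
# Prints the extra make variables the tests would expect on a clang-built
# kernel, or nothing for a gcc-built one.
# Assumption: CONFIG_CC_IS_CLANG=y / CONFIG_LD_IS_LLD=y are present in the
# config when clang / ld.lld were used to build the kernel.
expected_make_suffix() {
    suffix=""
    if [ -r "$1" ] && grep -q '^CONFIG_CC_IS_CLANG=y' "$1"; then
        suffix="$suffix CC=clang OBJCOPY=llvm-objcopy"
    fi
    if [ -r "$1" ] && grep -q '^CONFIG_LD_IS_LLD=y' "$1"; then
        suffix="$suffix LD=ld.lld"
    fi
    printf '%s' "$suffix"
}

# Possible use when building the expected "Failed command:" line
# (the config path is an assumption about the local layout):
#   echo "make -j1 KERNELRELEASE=${KERNEL_VER} all$(expected_make_suffix "/lib/modules/${KERNEL_VER}/build/.config")"
```

With something like this, the expected output in the tests would not need to hardcode the clang variables, at the cost of duplicating the detection logic in the test script.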
|
Don't recall personally seeing varying order of the modinfo output... even though it was reported before. Let me apply a quick hack for dkms and will send a fix for kmod upstream in a bit. Looking at the make messages, I think we can strip the extra CC/LD/OBJDUMP entries in Let me prep a PR with the above two fixes, then you can rebase (+ hopefully add some tests) your work on top. |
|
PR #422 should address the issues - please give it a test. Thanks in advance. Looking at the kernel documentation - I wonder if we shouldn't just use the |
|
Yes, #422 fixes the failing tests for clang.
Yes, that would indeed be better (also, no change in the tests would be needed, unless you want to remove the CC, LD bits). Would you like me to reflect these changes in this PR? We would still have to check for clang to use the correct strip though (even though LLVM=1 sets the STRIP envar). |
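To make the two options concrete, here is a rough sketch of choosing between LLVM=1 and the individual tool variables. The function name toolchain_make_args and its two arguments are hypothetical and not taken from dkms.in; the only outside fact relied on is that Kbuild expands LLVM=1 to the LLVM tool set (CC=clang, LD=ld.lld, OBJCOPY=llvm-objcopy, STRIP=llvm-strip, and so on).

```shell
# Hypothetical sketch; toolchain_make_args is an illustrative name,
# not a dkms function.
# $1: "yes" if the kernel was built with clang
# $2: "yes" to pass LLVM=1 instead of naming each tool
toolchain_make_args() {
    if [ "$1" != "yes" ]; then
        return 0    # gcc-built kernel: add nothing
    fi
    if [ "$2" = "yes" ]; then
        # Kbuild expands LLVM=1 to CC=clang, LD=ld.lld, OBJCOPY=llvm-objcopy,
        # STRIP=llvm-strip and the other LLVM tools.
        printf '%s\n' "LLVM=1"
    else
        printf '%s\n' "CC=clang LD=ld.lld OBJCOPY=llvm-objcopy"
    fi
}
```

Either way, the STRIP that LLVM=1 sets lives inside the make invocation, so a strip call made by the script itself afterwards would still need its own clang check.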
Whichever you're comfortable with, really.
We do check the STRIP variable (array really) from the module's dkms.conf and make it an in-dkms variable. Even so we can technically honour the STRIP set by |
da36723 to 69fb5e4
|
Sorry about that, I was trying not to create a bunch of commits while rebasing my local branch lol 😅
I didn't understand this part. For example, without these changes:
- [[ ${strip[$count]} != no ]] && strip -g "$built_module"
+ if [[ ${strip[$count]} != no ]] && [[ ${CC} == "clang" ]]; then
+     llvm-strip -g "$built_module"
+ elif [[ ${strip[$count]} != no ]]; then
+     strip -g "$built_module"
+ fi
the wrong strip command will be picked, even when using LLVM=1. That's what I meant by still having to check for clang to use the right strip command.
If I understood the above incorrectly or you have any suggestions, I'll be more than happy to implement them and update this PR 😄 |
|
OK, nvm, I understand now:
case ${STRIP[$index]} in
    [nN]*)
        strip[$index]="no"
        ;;
    [yY]*)
        strip[$index]="yes"
        ;;
    '')
        strip[$index]=${strip[0]:-yes}
        ;;
esac
The STRIP envar and the dkms.conf STRIP array are two different things (I got a little confused there). I get it now. Well, if you don't like the approach of using the same logic as before (checking whether the module should be stripped):
if [[ ${strip[$count]} != no ]] && [[ ${CC} == "clang" ]]; then
    llvm-strip -g "$built_module"
elif [[ ${strip[$count]} != no ]]; then
    strip -g "$built_module"
fi
I can think of something else, but I really don't think it's necessary. The result would be the same: if STRIP in dkms.conf is set to "yes", the modules will be stripped; if set to "no", there is no stripping; and if empty, the modules will be stripped. All paths lead to the right strip command being used and the LLVM=1 STRIP being honored no matter what. Now, if I understood it all wrong and by "honoring the STRIP set by LLVM=1" you meant that if clang is being used we will automatically use LLVM=1, and inherently STRIP=llvm-strip will be set, meaning that the modules should be stripped by llvm-strip no matter what, I don't think that's a good idea. Having control over when to strip modules in dkms.conf is better (basically the way it is right now). |
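The same decision can also be written as a small helper that picks the strip binary once, which avoids repeating the "!= no" check. This is only a sketch: pick_strip_cmd is a hypothetical name, and it assumes the per-module setting has already been normalized to "yes" or "no" as in the case statement quoted in this comment.

```shell
# Hypothetical helper; the name pick_strip_cmd is not from dkms.
# $1: normalized per-module strip setting ("yes" or "no")
# $2: compiler used for the module build (e.g. "clang" or "gcc")
# Prints the strip binary to use, or nothing if stripping is disabled.
pick_strip_cmd() {
    if [ "$1" = "no" ]; then
        return 0    # dkms.conf asked for no stripping
    fi
    if [ "$2" = "clang" ]; then
        printf '%s\n' "llvm-strip"
    else
        printf '%s\n' "strip"
    fi
}

# Possible use inside the build loop (variable names as in the diff):
#   strip_cmd=$(pick_strip_cmd "${strip[$count]}" "$CC")
#   [ -n "$strip_cmd" ] && "$strip_cmd" -g "$built_module"
```

The behaviour is meant to match the if/elif version: dkms.conf still decides whether a module is stripped, and the compiler only decides which binary does it.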
Right now there are some issues when trying to compile open-gpu-kernel-modules against a kernel compiled using clang+Thin/Full LTO. The script also picks the wrong version of the strip command, and it is also necessary to set the OBJCOPY environment variable. Fixes dkms-project#416.
|
Sorry for the radio silence - I've been offline for the last few weeks. Will have a look later today/tomorrow and if there's only minor nits, will amend and push. |
|
@evelikov Would be cool, if you could review this in the near future. |
|
@evelikov friendly ping 😅 |
|
We're using this now too on CachyOS by default, mainly due to the nvidia-open-dkms issue. It has been in use for a month with no issues reported. |
|
Thanks for the poke, I was taking a short holiday from dkms. Checking through now, will merge later tonight/early tomorrow.... I'm thinking how we can easily test this. If anyone has patches that would be welcome. |
|
This patch certainly works for me. |
Right now there are some issues when trying to compile open-gpu-kernel-modules against a kernel compiled using Clang and Thin/Full LTO.
Also, the script will pick the wrong version of the strip command; the correct one is needed, otherwise the module will fail to compile successfully:
It is also necessary to set the OBJCOPY environment variable; otherwise, even though the modules will compile, the system will refuse to boot with them.
Fixes #416.