CUDA version number truncated #3893
Confirmed.
The problem, as confirmed here, lies in the server's comparison of the reported (truncated) version number with the minimum version required by the project.
Unfortunately, it's not so easy to fix:
Yes, I see (from sched_request...) that I'm now reporting
Surely we should have designed in some form of consistency?
Hm, I believe I found the reason. I can make a fix to set the proper version in the request, but the log will still show an incorrect version (two digits after the point instead of three).
Yes, I have the hardware, and I'm currently running v7.17.0 from the @LocutusOfBorg PPA. I can also compile the client (client-only) from source. The mismatched driver versions are only causing a problem at Asteroids. My other GPU project, GPUGrid, doesn't have a problem with work fetch, presumably because it doesn't have such a stringent minimum driver version test to satisfy. But we can't easily see what the plan_class requirements are without the help of a project administrator.
I wonder: since we have both the CUDA and the OpenCL version number of the driver, and both must be the same because you cannot have two different driver versions on your system, can't we do a sanity check with them and, if we find that CUDA is truncated again, use the OpenCL version number?
@Ageless93, if they differ, which one should we believe?
@AenBleidd, I think the higher number, especially since CUDA truncation has happened before. But the sanity check could go either way.
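The sanity check proposed here could look roughly like the following C++ sketch. It assumes both driver versions arrive as dotted strings (e.g. "440.100" from CUDA, "440.33.01" from OpenCL), that an empty string means the runtime was not detected, and that the higher version wins. `split_version`, `compare_versions` and `reconcile_driver_version` are hypothetical names, not BOINC functions, and `std::stoi` throws on non-numeric fields.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split "440.33.01" into {440, 33, 1}.
static std::vector<int> split_version(const std::string& s) {
    std::vector<int> parts;
    std::stringstream ss(s);
    std::string field;
    while (std::getline(ss, field, '.')) {
        assert(!field.empty() && "empty version component");
        parts.push_back(std::stoi(field));
    }
    return parts;
}

// strcmp-style result; missing trailing components count as 0,
// so "440.33" compares equal to "440.33.0".
int compare_versions(const std::string& a, const std::string& b) {
    const std::vector<int> va = split_version(a);
    const std::vector<int> vb = split_version(b);
    const size_t n = std::max(va.size(), vb.size());
    for (size_t i = 0; i < n; ++i) {
        const int x = i < va.size() ? va[i] : 0;
        const int y = i < vb.size() ? vb[i] : 0;
        if (x != y) return x < y ? -1 : 1;
    }
    return 0;
}

// Hypothetical sanity check: trust the higher of the two reported
// versions; an empty string means that runtime was not detected.
std::string reconcile_driver_version(const std::string& cuda,
                                     const std::string& opencl) {
    if (cuda.empty()) return opencl;
    if (opencl.empty()) return cuda;
    return compare_versions(cuda, opencl) >= 0 ? cuda : opencl;
}
```

Comparing component by component (rather than as one number) is what keeps "440.100" above "440.10".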
The basic problem is that BOINC encodes the driver version into an int of the form MMmm: The solution is to store the version as a string everywhere. We should do this for all version numbers, not just video drivers.
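To illustrate why a fixed-width MMmm integer breaks, here is a hypothetical packing of major * 100 + minor (an assumption for illustration, not BOINC's actual code). A three-digit minor version carries into the major field, so 440.100 and 441.00 pack to the same integer; truncating the minor to two digits instead, as the 440.10 report suggests, would make 440.100 collide with 440.10. Either way, two decimal digits cannot hold 100.

```cpp
// Hypothetical MMmm-style packing: major * 100 + minor.
// Only unambiguous while 0 <= minor <= 99.
constexpr int pack_mmmm(int major, int minor) { return major * 100 + minor; }
constexpr int unpack_major(int packed) { return packed / 100; }
constexpr int unpack_minor(int packed) { return packed % 100; }

// A three-digit minor carries into the major field:
// 440.100 packs to 44100, which is also what 441.00 packs to.
static_assert(pack_mmmm(440, 100) == pack_mmmm(441, 0), "collision");
static_assert(unpack_major(pack_mmmm(440, 100)) == 441, "decodes as 441");
```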
The client handles it properly when reporting its own version number to a server: separate fields for major, minor, revision. We could use that, or we could use a single string with dividers between the fields. But there's far too much scope for errors over time with assumed fixed-width numeric fields. https://boinc.berkeley.edu/trac/wiki/AppPlanSpec#GPUapps is not as you say: we already have
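A minimal sketch of the separate-fields approach, assuming a driver version has at most three numeric components (major.minor.release, as in 440.33.01). `DriverVersion` and `parse_driver_version` are hypothetical names, not existing BOINC types. Comparing the fields as a tuple orders 440.10 below 440.100, which no two-digit minor field can do.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <tuple>

// Hypothetical record with one field per component, in the spirit of
// the client's separate major/minor/revision fields for its own version.
struct DriverVersion {
    int major = 0;
    int minor = 0;
    int revision = 0;  // e.g. the "01" in 440.33.01; 0 when absent
};

// Parse "440.100" or "440.33.01"; components that are missing stay 0.
DriverVersion parse_driver_version(const std::string& s) {
    DriverVersion v;
    const int n = std::sscanf(s.c_str(), "%d.%d.%d",
                              &v.major, &v.minor, &v.revision);
    assert(n >= 1 && "version string must start with a number");
    (void)n;  // silence unused-variable warnings when NDEBUG is set
    return v;
}

// Lexicographic comparison: major first, then minor, then revision.
bool operator<(const DriverVersion& a, const DriverVersion& b) {
    return std::tie(a.major, a.minor, a.revision) <
           std::tie(b.major, b.minor, b.revision);
}
```

Keeping the original string alongside the parsed fields would also let logs show the version exactly as the driver reported it.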
And although not documented, I'm also already reporting
Possible suggestion for a temporary workaround while we design a comprehensive solution: cap the client's reporting of the CUDA minor version at 99. That would prevent a future project from requiring a three-digit minor version (but that's simply an incentive to finish doing the proper job in a timely fashion). Reporting the current NVidia driver 440.100 as 44099 in RPCs would at least allow Asteroids@Home to issue work again.
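The capping workaround can be expressed as a small hypothetical helper, again assuming the major * 100 + minor packing rather than the client's exact code. The `static_assert`s check the trade-off: 440.100 sorts above 440.10 and below 441.00, but becomes indistinguishable from a genuine 440.99.

```cpp
// Hypothetical capped encoding: clamp the minor version to 99
// before packing it as major * 100 + minor.
constexpr int pack_capped(int major, int minor) {
    return major * 100 + (minor > 99 ? 99 : minor);
}

// 440.100 is reported as 44099: above 440.10 and below 441.00,
// so minimum-version checks order it sensibly.
static_assert(pack_capped(440, 100) == 44099, "capped");
static_assert(pack_capped(440, 10) < pack_capped(440, 100), "order kept");
static_assert(pack_capped(440, 100) < pack_capped(441, 0), "no carry");
```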
Change minor version to 99 if actual minor version is > 99

This fixes BOINC#3893

Signed-off-by: Vitalii Koshura <lestat.de.lionkur@gmail.com>
@RichardHaselgrove, @davidpanderson, the fix is ready for review and testing.
Hi, I just want to let you know that version numbers of Nvidia Linux drivers can also have a "Releaseversion". I stumbled upon this recently when I had to configure the correct drivers for a TensorFlow project. Here's a sample output from one of these boxes:
Please also see the "CUDA Toolkit and Compatible Driver Versions" table here: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions (you have to scroll down a little because this table has no HTML anchor).
@cwallbaum, could you please verify how BOINC reports this version on this particular machine? |
It's not the latest, but at least not a very old build from Gianfranco's PPA:
I believe that's good enough. @cwallbaum, thanks for the quick test.
What strikes me, though, is that here too OpenCL reports the full driver version: 440.33.01 vs. 440.33 for CUDA.
@Ageless93, the OpenCL driver might not be installed on the system.
As reported via the BOINC forums, it looks as if the client (7.6.16, Linux version) is truncating the CUDA driver version:
440.10 is a different driver version than 440.100
According to an earlier thread, it's been doing this for a while: there, BOINC 7.9.3 turned 390.132 into 390.13.