CUDA failed to initialize on Ubuntu #68
CUDA 11.5 should work fine. Most testing is done on Windows under CUDA 11.5, and Linux under 11.4, but Linux under 11.5 should work. Please indicate the exact version of Ceres being used, the GPU model, and the specific error message.
Initially I tried 0.94 from the provided Releases, but I am
reproducing the same error with the latest 0.95-rc8 as of the time of
writing this response. I am pulling the latest code and compiling it in
Debug mode.
nvidia-smi gives me this information:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01 Driver Version: 510.39.01 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| 50% 49C P5 25W / 250W | 529MiB / 4096MiB | 5% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1263 G /usr/lib/xorg/Xorg 173MiB |
| 0 N/A N/A 1680 G /usr/bin/gnome-shell 59MiB |
....
+-----------------------------------------------------------------------------+
(So I have upgraded to CUDA 11.6 since initially reporting the problem.)
Nibbler with LC0 works fine:
![image](https://user-images.githubusercontent.com/97351553/149645798-9a1ce27c-b6f6-488b-b2c5-93cb106ed384.png)
I ran ./Ceres SYSBENCH and got this output:
|=========================================================|
| Ceres - A Monte Carlo Tree Search Chess Engine |
| |
| (c) 2020- David Elliott and the Ceres Authors |
| With network backend code from Leela Chess Zero. |
| Use help to list available commands. |
| |
| Version 0.95-RC8 with PGO: NA |
| Runtime .NET 5.0.13 and Cuda 11.60 |
|=========================================================|
Ceres user settings loaded from file
/home/alex/src/Ceres/artifacts/debug/net5.0/Ceres.json
-----------------------------------------------------------------------------------
CPU BENCHMARK
418,801 ops/second, 0 bytes alloc/op : MGPosition.FromPosition
94,264 ops/second, 1,127 bytes alloc/op : MGChessPositionFromFEN
15,142,286 ops/second, 0 bytes alloc/op : MGChessMoveToLZPositionMove
6,848,370 ops/second, 0 bytes alloc/op : ZobristHash
CERES CPU BENCHMARK SCORE: 14
-----------------------------------------------------------------------------------
GPU BENCHMARK (benchmark net: LC0:/home/alex/Programs/lc0/weights_611245)
ID Name Ver SMClk GPU% Mem% Temp Throttle Reasons NPS 1 NPS Batch
-- ----------------------- --- ----- ---- ---- ---- ---------------- ----- ---------
CUDA device 0: NVIDIA GeForce GTX 970 SMs: 13 Mem: 3gb
Error when initializing CUDA. Did you install NVidia's CUDA?
https://developer.nvidia.com/cuda-zone
ErrorInvalidPtx
at ManagedCuda.CudaContext.LoadModulePTX(Byte[] moduleImage, CUJITOption[] options, Object[] values)
at ManagedCuda.CudaContext.LoadKernelPTX(Stream moduleImage, String kernelName)
at Ceres.Base.CUDA.CUDADevice.DoLoadKernel(Assembly assembly, CudaContext context, String resource, String kernelName) in /home/alex/src/Ceres/src/Ceres.Base/CUDA/CUDADevice.cs:line 115
at Ceres.Base.CUDA.CUDADevice.GetKernel(Assembly assembly, String resource, String kernelName) in /home/alex/src/Ceres/src/Ceres.Base/CUDA/CUDADevice.cs:line 94
at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers.InitKernels(NNBackendExecContext context) in /home/alex/src/Ceres/src/Ceres.Chess/NNBackends/CUDA/NNBackendCUDALayers.cs:line 121
at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers..ctor(NNBackendExecContext context, Net net, LC0LegacyWeights weights, Boolean saveActivations, NNBackendCUDALayers referenceLayers) in /home/alex/src/Ceres/src/Ceres.Chess/NNBackends/CUDA/NNBackendCUDALayers.cs:line 111
at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA.InitNetwork(Net net) in /home/alex/src/Ceres/src/Ceres.Chess/NNBackends/CUDA/NNBackendLC0_CUDA.cs:line 352
at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA..ctor(Int32 gpuID, Net net, Boolean saveActivations, Int32 maxBatchSize, Boolean dumpTiming, Boolean enableCUDAGraphs, Int32 graphBatchSizeDivisor, NNBackendLC0_CUDA referenceBackend) in /home/alex/src/Ceres/src/Ceres.Chess/NNBackends/CUDA/NNBackendLC0_CUDA.cs:line 274
Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.PrepareInputPositions(IEncodedPositionBatchFlat batch) in /home/alex/src/Ceres/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 309
at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.StartEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Int32 numPositions, Boolean retrieveSupplementalResults) in /home/alex/src/Ceres/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 214
at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.DoEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in /home/alex/src/Ceres/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 225
at Ceres.Chess.NNEvaluators.NNEvaluator.EvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in /home/alex/src/Ceres/src/Ceres.Chess/NNEvaluators/NNEvaluator.cs:line 149
at Ceres.Chess.NNEvaluators.NNEvaluatorBenchmark.EstNPS(NNEvaluator evaluator, Boolean computeBreaks, Int32 bigBatchSize, Boolean estimateSingletons, Int32 numWarmups) in /home/alex/src/Ceres/src/Ceres.Chess/NNEvaluators/NNEvaluatorBenchmark.cs:line 120
at Ceres.Commands.FeatureBenchmark.DumpGPUBenchmark() in /home/alex/src/Ceres/src/Ceres/Commands/FeatureBenchmark.cs:line 118
at Ceres.Commands.FeatureBenchmark.DumpBenchmark() in /home/alex/src/Ceres/src/Ceres/Commands/FeatureBenchmark.cs:line 46
at Ceres.Commands.DispatchCommands.ProcessCommand(String cmd) in /home/alex/src/Ceres/src/Ceres/Commands/DispatchCommands.cs:line 212
at Ceres.Program.Main(String[] args) in /home/alex/src/Ceres/src/Ceres/Program.cs:line 105
Aborted (core dumped)
And here's the error log from Nibbler, running the same version:
![image](https://user-images.githubusercontent.com/97351553/149645821-d0d5e6e4-000c-4156-af1d-aaace3a240df.png)
Sincerely,
Alex Tarra
New version of CUDA and Ceres, same old issue. I had to manually convert all projects to .NET 6.0 to get it running.
|=========================================================|
Ceres user settings loaded from file /home/alex/temp/Ceres-0.97RC3/artifacts/release/net6.0/Ceres.json
Network evaluation configured to use:
Entering UCI command processing mode.
Loaded network weights: 0 10x128 WDL MLH from ./Networks/weights_run2_703810.pb.gz
CUDA device 0: NVIDIA GeForce GTX 970 Compute: 5.2 SMs: 13 Mem: 3gb
Aborted
I have a working LC0 chess engine with the CUDA backend. Ceres starts, but fails to initialize the CUDA library.
| NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5 |
Could it just be a missing variable setting?
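One thing that might be worth checking, given that the failure is ErrorInvalidPtx inside LoadModulePTX: a PTX module generally fails to load when its `.target` architecture is newer than the GPU, and the GTX 970 reports compute capability 5.2. Below is a minimal Python sketch of that check, not anything from the Ceres code base. The function names and the sample PTX header are hypothetical, and nothing in this thread shows what the Ceres kernels actually target, so treat it only as a way to test the hypothesis against the real PTX text.

```python
import re

def ptx_header(ptx_text):
    """Parse the .version and .target directives from PTX source text.

    Returns ((ptx_major, ptx_minor), target_sm), e.g. ((7, 5), 70).
    """
    version = re.search(r"^\s*\.version\s+(\d+)\.(\d+)", ptx_text, re.MULTILINE)
    target = re.search(r"^\s*\.target\s+sm_(\d+)", ptx_text, re.MULTILINE)
    if not version or not target:
        raise ValueError("missing .version or .target directive")
    return (int(version.group(1)), int(version.group(2))), int(target.group(1))

def target_fits_device(ptx_text, compute_major, compute_minor):
    """True if the PTX .target is not newer than the device's compute capability."""
    _, target_sm = ptx_header(ptx_text)
    return target_sm <= compute_major * 10 + compute_minor

# Hypothetical header for illustration only; NOT taken from the Ceres kernels.
SAMPLE_PTX = """
.version 7.5
.target sm_70
.address_size 64
"""

if __name__ == "__main__":
    # Compute capability 5.2 is what the log above reports for the GTX 970.
    print(ptx_header(SAMPLE_PTX))
    print(target_fits_device(SAMPLE_PTX, 5, 2))
```

If the real PTX text declares a target above sm_52, that would be consistent with the error on this card. If it does not, the cause is probably elsewhere, for example a PTX `.version` newer than the installed driver accepts.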