CUDA failed to initialize on Ubuntu #68

Open
alextarra opened this issue Jan 8, 2022 · 4 comments

@alextarra

I have a working LC0 chess engine with the CUDA backend. Ceres starts, but fails to initialize the CUDA library.

| NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5 |

Could it just be that some variable setting is missing?

@dje-dev
Owner

dje-dev commented Jan 15, 2022

CUDA 11.5 should work fine. Most testing is done on Windows under CUDA 11.5, and Linux under 11.4, but Linux under 11.5 should work. Please indicate the exact version of Ceres being used, the GPU model, and the specific error message.

@alextarra
Author

alextarra commented Jan 16, 2022 via email

@alextarra
Author

Tried version 0.96, compiled locally from source. Same issue.

@alextarra
Author

New versions of CUDA and Ceres, same old issue. I had to manually convert all projects to .NET 6.0 to get it running.

|=========================================================|
| Ceres - A Monte Carlo Tree Search Chess Engine |
| |
| (c) 2020- David Elliott and the Ceres Authors |
| With network backend code from Leela Chess Zero. |
| Use help to list available commands. |
| |
| Version 0.97RC3 with PGO: NA |
| Runtime .NET 6.0.10 and Cuda 11.80 |
|=========================================================|

Ceres user settings loaded from file /home/alex/temp/Ceres-0.97RC3/artifacts/release/net6.0/Ceres.json

Network evaluation configured to use:

Entering UCI command processing mode.
go

Loaded network weights: 0 10x128 WDL MLH from ./Networks/weights_run2_703810.pb.gz

CUDA device 0: NVIDIA GeForce GTX 970 Compute: 5.2 SMs: 13 Mem: 3gb
Error when initializing CUDA. Did you install NVidia's CUDA? https://developer.nvidia.com/cuda-zone
ErrorInvalidPtx
at ManagedCuda.CudaContext.LoadModulePTX(Byte[] moduleImage, CUJITOption[] options, Object[] values)
at ManagedCuda.CudaContext.LoadKernelPTX(Stream moduleImage, String kernelName)
at Ceres.Base.CUDA.CUDADevice.DoLoadKernel(Assembly assembly, CudaContext context, String resource, String kernelName) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Base/CUDA/CUDADevice.cs:line 114
at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers.InitKernels(NNBackendExecContext context) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendCUDALayers.cs:line 152
at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers..ctor(NNBackendExecContext context, Int32 deviceComputeCapabilityMajor, Net net, LC0LegacyWeights weights, Boolean saveActivations, NNBackendCUDALayers referenceLayers) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendCUDALayers.cs:line 141
at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA.InitNetwork(Net net) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendLC0_CUDA.cs:line 357
at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA..ctor(Int32 gpuID, Net net, Boolean saveActivations, Int32 maxBatchSize, Boolean dumpTiming, Boolean enableCUDAGraphs, Int32 graphBatchSizeDivisor, NNBackendLC0_CUDA referenceBackend) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendLC0_CUDA.cs:line 276
CUDA device 0: NVIDIA GeForce GTX 970 Compute: 5.2 SMs: 13 Mem: 3gb
Error when initializing CUDA. Did you install NVidia's CUDA? https://developer.nvidia.com/cuda-zone
ErrorInvalidPtx
at ManagedCuda.CudaContext.LoadModulePTX(Byte[] moduleImage, CUJITOption[] options, Object[] values)
at ManagedCuda.CudaContext.LoadKernelPTX(Stream moduleImage, String kernelName)
at Ceres.Base.CUDA.CUDADevice.DoLoadKernel(Assembly assembly, CudaContext context, String resource, String kernelName) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Base/CUDA/CUDADevice.cs:line 114
at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers.InitKernels(NNBackendExecContext context) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendCUDALayers.cs:line 152
at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers..ctor(NNBackendExecContext context, Int32 deviceComputeCapabilityMajor, Net net, LC0LegacyWeights weights, Boolean saveActivations, NNBackendCUDALayers referenceLayers) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendCUDALayers.cs:line 141
at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA.InitNetwork(Net net) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendLC0_CUDA.cs:line 357
at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA..ctor(Int32 gpuID, Net net, Boolean saveActivations, Int32 maxBatchSize, Boolean dumpTiming, Boolean enableCUDAGraphs, Int32 graphBatchSizeDivisor, NNBackendLC0_CUDA referenceBackend) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendLC0_CUDA.cs:line 276
Unhandled exception. System.AggregateException: One or more errors occurred. (Object reference not set to an instance of an object.) (Object reference not set to an instance of an object.)
---> System.NullReferenceException: Object reference not set to an instance of an object.
at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.PrepareInputPositions(IEncodedPositionBatchFlat batch) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 308
at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.StartEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Int32 numPositions, Boolean retrieveSupplementalResults) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 214
at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.DoEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 225
at Ceres.Chess.NNEvaluators.NNEvaluator.EvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/NNEvaluator.cs:line 149
at Ceres.MCTS.Params.NNEvaluatorSet.b__18_3() in /home/alex/temp/Ceres-0.97RC3/src/Ceres.MCTS/Iteration/Params/NNEvaluatorSet.cs:line 145
at System.Threading.Tasks.Task.InnerInvoke()
at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.WaitAllCore(Task[] tasks, Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Task.WaitAll(Task[] tasks)
at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
--- End of stack trace from previous location ---
at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection exceptions, CancellationToken cancelToken, Exception otherException)
at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
at Ceres.Features.UCI.UCIManager.InitializeEngineIfNeeded() in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Features/UCI/UCIManager.cs:line 616
at Ceres.Features.UCI.UCIManager.PlayUCI() in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Features/UCI/UCIManager.cs:line 337
at Ceres.Commands.DispatchCommands.ProcessCommand(String cmd) in /home/alex/temp/Ceres-0.97RC3/src/Ceres/Commands/DispatchCommands.cs:line 74
at Ceres.Program.Main(String[] args) in /home/alex/temp/Ceres-0.97RC3/src/Ceres/Program.cs:line 103
---> (Inner Exception #1) System.NullReferenceException: Object reference not set to an instance of an object.
at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.PrepareInputPositions(IEncodedPositionBatchFlat batch) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 308
at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.StartEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Int32 numPositions, Boolean retrieveSupplementalResults) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 214
at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.DoEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 225
at Ceres.Chess.NNEvaluators.NNEvaluator.EvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/NNEvaluator.cs:line 149
at Ceres.MCTS.Params.NNEvaluatorSet.b__18_4() in /home/alex/temp/Ceres-0.97RC3/src/Ceres.MCTS/Iteration/Params/NNEvaluatorSet.cs:line 146
at System.Threading.Tasks.Task.InnerInvoke()
at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)<---

Aborted
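
The log above fails in `LoadModulePTX` with `ErrorInvalidPtx` on a GPU reporting Compute 5.2. One possible cause, not confirmed anywhere in this thread, is that the embedded PTX targets a newer architecture than the device: PTX built for a given `sm_XX` target can only be JIT-compiled for devices of that compute capability or higher. The sketch below reads the standard `.version` and `.target` directives from a PTX text and compares the target with a device's compute capability. The sample PTX header and the function names are hypothetical illustrations, not taken from the Ceres kernels.

```python
import re

def parse_ptx_header(ptx_text):
    """Return ((isa_major, isa_minor), target_sm) from the .version and .target directives."""
    version = re.search(r"^\s*\.version\s+(\d+)\.(\d+)", ptx_text, re.MULTILINE)
    target = re.search(r"^\s*\.target\s+sm_(\d+)", ptx_text, re.MULTILINE)
    if not version or not target:
        raise ValueError("missing .version or .target directive")
    isa = (int(version.group(1)), int(version.group(2)))
    return isa, int(target.group(1))

def target_exceeds_device(ptx_text, device_major, device_minor):
    """True if the PTX target architecture is newer than the device's compute capability."""
    _, target_sm = parse_ptx_header(ptx_text)
    return target_sm > device_major * 10 + device_minor

# Hypothetical PTX header, for illustration only (not the actual Ceres kernel).
SAMPLE_PTX = ".version 7.5\n.target sm_70\n.address_size 64\n"

if __name__ == "__main__":
    # The GTX 970 in the log reports Compute 5.2.
    print(parse_ptx_header(SAMPLE_PTX))             # ((7, 5), 70)
    print(target_exceeds_device(SAMPLE_PTX, 5, 2))  # True
```

If a check like this returned True for the real kernel resource, it would suggest rebuilding the PTX for a lower target. If it returned False, the target architecture is not the problem and the cause lies elsewhere, for example in the PTX ISA version the installed driver accepts.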
