Error when training a classification image model: error == cudaSuccess (48 vs. 0) #2069

Closed
PabloDIGITS opened this issue Jun 21, 2018 · 2 comments

@PabloDIGITS

Hi, I'm trying to train an image classification model using DIGITS from a Docker container pulled from NVIDIA GPU Cloud. The error log that DIGITS shows is:

Test net output #2: loss = 0 (* 1 = 0 loss)
Test net output #3: loss1/accuracy = 0
Test net output #4: loss1/accuracy-top5 = 0
Test net output #5: loss1/loss = 0 (* 0.3 = 0 loss)
Test net output #6: loss2/accuracy = 0
Test net output #7: loss2/accuracy-top5 = 0
Test net output #8: loss2/loss = 0 (* 0.3 = 0 loss)
Initial Test completed in 0.0554059s
Restarting 4 internal thread(s) on device 0
Starting 1 internal thread(s) on device 0
Data Reader threads: 3, out queues: 12, depth: 2
Starting 3 internal thread(s) on device 0
Opened lmdb /workspace/jobs/20180619-094203-ee39/train_db
Opened lmdb /workspace/jobs/20180619-094203-ee39/train_db
Output data size: 2, 3, 224, 224
Parser threads: 3 (auto)
Transformer threads: 4 (auto)
NVML succeeded to set CPU affinity on device 0, thread 88
Opened lmdb /workspace/jobs/20180619-094203-ee39/train_db
Check failed: error == cudaSuccess (48 vs. 0) no kernel image is available for execution on the device

I'm following this tutorial: https://github.com/dusty-nv/jetson-inference. The error shows up in the chapter "Creating Image Classification Model with DIGITS".
I'm using Ubuntu 16.04 LTS and a GeForce GTX 860M.
What does the error mean, and what am I doing wrong? Thanks!

@lwohlhart

I had the same issue.
For me it turned out to be an incompatibility between my Quadro K2200 and the CUDA 9 drivers that come with the Docker image.
I resolved it by running an earlier release (5.0) of the image, which is packaged with CUDA 8:

docker run --runtime=nvidia --name digits5 -d -p 5000:5000 -v digits-jobs:/jobs nvidia/digits:5.0

If you want to try releases prior to 6.0 (the release published on NGC), go back to the deprecated project:
https://github.com/NVIDIA/nvidia-docker/wiki/DIGITS
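
For anyone wondering what the message means: `no kernel image is available for execution on the device` is what the CUDA runtime reports when the binary contains no code built for the GPU's compute capability. A binary built for `sm_XY` runs only on devices with the same major version and an equal or higher minor version, while embedded PTX for `compute_XY` can be JIT-compiled on any device with an equal or higher compute capability. The sketch below only illustrates that rule. The function name `has_kernel_image`, the device value `50`, and the target list are all hypothetical; the thread does not say which architectures the container was actually built for, so look up your card's compute capability and the real build flags before drawing conclusions.

```shell
#!/bin/sh
# has_kernel_image DEVICE_CC TARGET...
#   DEVICE_CC: compute capability without the dot (e.g. 50 for 5.0)
#   TARGET:    sm_XY (compiled binary) or compute_XY (embedded PTX)
# Returns 0 if at least one target can run on the device, 1 otherwise.
# Hypothetical helper for illustration; not part of DIGITS or CUDA.
has_kernel_image() {
    device_cc=$1
    shift
    device_major=$((device_cc / 10))
    for target in "$@"; do
        case $target in
            sm_*)
                # Binary code: same major version, minor not newer than the device.
                cc=${target#sm_}
                if [ $((cc / 10)) -eq "$device_major" ] && [ "$cc" -le "$device_cc" ]; then
                    return 0
                fi
                ;;
            compute_*)
                # PTX: can be JIT-compiled for any equal or newer device.
                cc=${target#compute_}
                if [ "$cc" -le "$device_cc" ]; then
                    return 0
                fi
                ;;
        esac
    done
    return 1
}

# Assumed values: a compute capability 5.0 device and a made-up build list.
if has_kernel_image 50 sm_52 sm_60 sm_61 sm_70 compute_70; then
    echo "compatible"
else
    echo "no kernel image available"
fi
```

With these assumed inputs none of the targets match a 5.0 device, which is the situation the error describes. Switching to an image built with an older CUDA release, as above, is one way to get a build that still includes the card's architecture.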

@rmoralesd

Hello,
I have the same issue with the same GPU, but on Ubuntu 18. Did you solve it?
Thanks.
