This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

DIGITS running from nvidia-docker gives "ERROR: Check failed:" for AlexNet model with CUDA 8, cuDNN 5.1 #221

Closed
xhuvom opened this issue Oct 18, 2016 · 5 comments

Comments

@xhuvom

xhuvom commented Oct 18, 2016

I'm using a GTX 1080 on Ubuntu 16.04 with CUDA 8.0 and cuDNN 5.1. DIGITS is running from nvidia-docker on my localhost. While training on my dataset with AlexNet/GoogLeNet, DIGITS gives the following error:

ERROR: Check failed: status == CURAND_STATUS_SUCCESS (201 vs. 0) CURAND_STATUS_LAUNCH_FAILURE

This network produces output loss
This network produces output loss1/accuracy
This network produces output loss1/loss
This network produces output loss2/accuracy
This network produces output loss2/loss
Network initialization done.
Solver scaffolding done.
Starting Optimization
Solving
Learning Rate Policy: step
Iteration 0, Testing net (#0)
Ignoring source layer train-data
Ignoring source layer label_train-data_1_split
Test net output #0: accuracy = 0.291667
Test net output #1: loss = 1.20791 (* 1 = 1.20791 loss)
Test net output #2: loss1/accuracy = 0.395833
Test net output #3: loss1/loss = 3.7042 (* 0.3 = 1.11126 loss)
Test net output #4: loss2/accuracy = 0.291667
Test net output #5: loss2/loss = 1.29856 (* 0.3 = 0.389567 loss)
Check failed: status == CURAND_STATUS_

What could be the issue? I'm also facing the same problem with DetectNet, which returns "IO error". Running nvidia-smi returns the following status:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 0000:01:00.0      On |                  N/A |
|  0%   45C    P2    52W / 270W |    488MiB /  8110MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1030    G   /usr/lib/xorg/Xorg                             165MiB |
|    0      1817    G   compiz                                          42MiB |
|    0      2077    G   /usr/lib/firefox/firefox                         2MiB |
|    0      2444    C   /usr/bin/python2                               111MiB |
|    0      3554    C   /usr/bin/caffe                                 163MiB |
+-----------------------------------------------------------------------------+
@xhuvom
Author

xhuvom commented Oct 18, 2016

Training runs fine with the LeNet (28x28) greyscale model, but fails with AlexNet or GoogLeNet. Is that strange? Is the problem with nvidia-docker and CUDA 8.0?

@flx42
Member

flx42 commented Oct 18, 2016

The DIGITS/caffe images we provide are based on CUDA 7.5, so they can't run on Pascal boards like the 1080. Hopefully we will have deb packages for caffe and DIGITS on CUDA 8.0 soon.
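
To illustrate the mismatch described above, here is a minimal sketch that compares a GPU's compute capability with the highest architecture a given CUDA toolkit can build native code for. The table and the helper name `has_native_support` are assumptions made for this illustration and are not part of nvidia-docker, DIGITS, or caffe. The values used are that CUDA 7.5 targets up to compute capability 5.3, CUDA 8.0 adds Pascal (6.x), and the GTX 1080 is a compute capability 6.1 device.

```python
# Hypothetical lookup table: highest compute capability each CUDA toolkit
# can generate native GPU code for (only the two versions discussed here).
MAX_NATIVE_CC = {
    "7.5": (5, 3),  # up to Maxwell
    "8.0": (6, 2),  # adds Pascal
}


def has_native_support(cuda_version, compute_capability):
    """Return True if the toolkit can target the given compute capability."""
    return compute_capability <= MAX_NATIVE_CC[cuda_version]


GTX_1080_CC = (6, 1)  # Pascal

print(has_native_support("7.5", GTX_1080_CC))  # False
print(has_native_support("8.0", GTX_1080_CC))  # True
```

This only models native code generation. Whether a particular CUDA 7.5 binary can still start on a newer GPU also depends on how it was built, which this sketch does not cover.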

@xhuvom
Author

xhuvom commented Oct 18, 2016

How strange!! Anyways, please provide me a solid link on installing caffe for CUDA 8.0 without docker. Thanks!!

@flx42
Member

flx42 commented Oct 18, 2016

Look at the DIGITS instructions for caffe

@lukeyeager
Member

Looks like you're done with Docker and have moved on to building from source: NVIDIA/DIGITS#1186.
