Can not learn with CUDA #29

Closed
vankhoa21991 opened this issue May 29, 2019 · 8 comments

@vankhoa21991

vankhoa21991 commented May 29, 2019

Hello, when I ran cmake, I had to remove everything in the exec folder except n2d2.cpp; otherwise it fails with this error:
"add_executable cannot create target "n2d2" because another target with the same name already exists."

Then I ran make as usual. But when I test the model, I get the error below with CUDA, and if I switch to Frame only, the model does not learn. Do you know what I did wrong? Thanks.

sudo ./build/bin/n2d2 models/mnist24_16c4s2_24c5s2_150_10.ini -learn 40000000 -log 100000
Option -log: number of steps between logs [100000]
Option -learn: number of backprop learning steps [40000000]
Loading network configuration file models/mnist24_16c4s2_24c5s2_150_10.ini
Layer: conv1 [Conv(Frame_CUDA)]
Notice: Could not open configuration file: conv1.cfg

Shared synapses: 256

Virtual synapses: 30976

Inputs dims: 24 24 1

Outputs dims: 11 11 16

Warning: No monitor could be added to Cell: conv1
Layer: conv2 [Conv(Frame_CUDA)]
Notice: Could not open configuration file: conv2.cfg

Shared synapses: 2250

Virtual synapses: 36000

Inputs dims: 11 11 16

Outputs dims: 4 4 24

Warning: No monitor could be added to Cell: conv2
Layer: fc1 [Fc(Frame_CUDA)]
Notice: Could not open configuration file: fc1.cfg

Synapses: 57600

Inputs dims: 4 4 24

Outputs dims: 1 1 150

Warning: No monitor could be added to Cell: fc1
Layer: fc1.drop [Dropout(Frame_CUDA)]
Notice: Could not open configuration file: fc1.drop.cfg

Inputs dims: 1 1 150

Outputs dims: 1 1 150

Warning: No monitor could be added to Cell: fc1.drop
Layer: fc2 [Fc(Frame_CUDA)]
Notice: Could not open configuration file: fc2.cfg

Synapses: 1500

Inputs dims: 1 1 150

Outputs dims: 1 1 10

Warning: No monitor could be added to Cell: fc2
Layer: softmax [Softmax(Frame_CUDA)]
Notice: Could not open configuration file: softmax.cfg

Inputs dims: 1 1 10

Outputs dims: 1 1 10

Target: softmax (target value: 1 / default value: 0 / top-n value: 1)
Warning: No monitor could be added to Cell: softmax
Total number of neurons: 2640
Total number of nodes: 2640
Total number of synapses: 61606
Total number of virtual synapses: 126076
Total number of connections: 126076
Notice: Unused section softmax.Target in INI file
CUDNN failure: CUDNN_STATUS_NOT_INITIALIZED (1) in /home/kevin/IMRA_le/3_Program/SNN/N2D2/include/CudaContext.hpp:58
Time elapsed: 1.79893 s
Error: CUDNN failure: CUDNN_STATUS_NOT_INITIALIZED (1) in /home/kevin/IMRA_le/3_Program/SNN/N2D2/include/CudaContext.hpp:58

@vankhoa21991
Author

When I run nvidia-smi, I get this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:03:00.0  On |                  N/A |
| 22%   51C    P0    75W / 250W |    526MiB / 12211MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

@olivierbichler-cea
Contributor

Hello,
Do you have CuDNN properly installed? What is your CuDNN version?
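
One way to answer both questions outside N2D2 is to load the cuDNN shared library directly and ask it for its version and for a handle. The following is only a minimal sketch and is not part of N2D2: the function names `decode_cudnn_version` and `probe_cudnn` are hypothetical, the default library name `libcudnn.so` is an assumption to adjust for your install, and the version decoding assumes the cuDNN 7.x scheme (major*1000 + minor*100 + patch). The only cuDNN C API calls used are `cudnnGetVersion`, `cudnnCreate`, `cudnnDestroy` and `cudnnGetErrorString`.

```python
import ctypes


def decode_cudnn_version(v):
    """Split a cuDNN 7.x version integer (e.g. 7401) into (major, minor, patch)."""
    return v // 1000, (v % 1000) // 100, v % 100


def probe_cudnn(libname="libcudnn.so"):
    """Return (version, status). version is None if the library cannot be loaded.

    libname is an assumption; pass the full path to libcudnn if it is not
    on the loader search path.
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError as exc:
        return None, "cannot load {}: {}".format(libname, exc)

    # Declare the C signatures so ctypes converts arguments and results correctly.
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    lib.cudnnGetErrorString.argtypes = [ctypes.c_int]
    lib.cudnnGetErrorString.restype = ctypes.c_char_p
    lib.cudnnCreate.argtypes = [ctypes.POINTER(ctypes.c_void_p)]
    lib.cudnnCreate.restype = ctypes.c_int
    lib.cudnnDestroy.argtypes = [ctypes.c_void_p]
    lib.cudnnDestroy.restype = ctypes.c_int

    version = decode_cudnn_version(lib.cudnnGetVersion())

    # Creating a handle is the step that needs a working driver and runtime.
    handle = ctypes.c_void_p()
    status = lib.cudnnCreate(ctypes.byref(handle))
    status_name = lib.cudnnGetErrorString(status).decode()
    if status == 0:  # CUDNN_STATUS_SUCCESS
        lib.cudnnDestroy(handle)
    return version, status_name
```

Calling `probe_cudnn()` reports the library version together with the status of handle creation. If the status is anything other than `CUDNN_STATUS_SUCCESS`, the problem most likely sits below N2D2, in the cuDNN, CUDA, or driver installation.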

@vankhoa21991
Author

This is the output from cmake:
sudo cmake -DCMAKE_C_COMPILER=gcc-6 -DCMAKE_CXX_COMPILER=g++-6 ..
-- cotire 1.8.0 loaded.
-- No PugiXML found
-- MongoDB not found.
-- CuDNN library status:
-- version: 7.4.1
-- include path: /usr/local/cuda/include
-- libraries: /usr/local/cuda/lib64/libcudnn.so
-- Configuring done
-- Generating done

@olivierbichler-cea
Contributor

It looks like your driver version is not compatible with your CuDNN version, according to the CuDNN support matrix: https://docs.nvidia.com/deeplearning/sdk/cudnn-support-matrix/index.html

@vankhoa21991
Author

Thank you. I currently have cuDNN 7.4.1, CUDA 9.2, and driver 390.116. Should I downgrade the driver to 384.11, or downgrade CUDA to 9.0? My driver does not seem to be listed in this table.

@olivierbichler-cea
Contributor

olivierbichler-cea commented Jun 7, 2019

According to the table, you should upgrade your driver to r396.26. I recommend upgrading it if you can, rather than downgrading other components.
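
The comparison behind that recommendation can be sketched as a small script. This is an illustration only, under stated assumptions: the helper names (`parse_version`, `driver_from_smi`, `driver_ok`) and the `MIN_DRIVER` table are hypothetical, and the table holds only the single CUDA 9.2 entry quoted in this thread; other rows would have to be filled in from NVIDIA's tables.

```python
import re

# Minimum Linux driver per CUDA toolkit version.
# Only the 9.2 entry is taken from this thread; extend from NVIDIA's tables.
MIN_DRIVER = {"9.2": "396.26"}


def parse_version(text):
    """Turn '390.116' into (390, 116) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def driver_from_smi(smi_output):
    """Extract the 'Driver Version' field from nvidia-smi text, or None."""
    match = re.search(r"Driver Version:\s*([\d.]+)", smi_output)
    return match.group(1) if match else None


def driver_ok(driver, cuda, table=MIN_DRIVER):
    """True if the driver meets the minimum listed for the CUDA version."""
    return parse_version(driver) >= parse_version(table[cuda])
```

With the versions reported above, `driver_ok("390.116", "9.2")` evaluates to `False`, while `driver_ok("396.26", "9.2")` evaluates to `True`. Comparing integer tuples rather than raw strings avoids ordering mistakes between values such as 390.46 and 390.116.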

@olivierbichler-cea
Contributor

Learning in Frame-only mode should work with the latest version of N2D2. There was a bug that has since been fixed.

@olivierbichler-cea
Contributor

Closing the issue, as this is a driver problem. Please feel free to re-open it if necessary.
