syncedmem.cpp:78 Check failed: error == cudaSuccess (2 vs. 0) out of memory #5353

Closed
GaryKT opened this issue Mar 4, 2017 · 8 comments
GaryKT commented Mar 4, 2017

Issue summary

GPU CUDA out-of-memory crash when trying a basic setup with train.py.
The crash occurs within a few seconds of launching the Python script.
The machine has an NVIDIA GPU with 8 GB of VRAM and 64 GB of system RAM.

python train.py --solver test_solver.prototxt
....
I0303 20:08:31.785871 42944 layer_factory.cpp:58] Creating layer data
I0303 20:08:32.127293 42944 db_lmdb.cpp:40] Opened lmdb test_lmbd
I0303 20:08:32.148350 42944 net.cpp:86] Creating Layer data
I0303 20:08:32.148350 42944 net.cpp:382] data -> data
I0303 20:08:32.149355 42944 net.cpp:382] data -> label
I0303 20:08:32.149355 42944 data_transformer.cpp:25] Loading mean file from: test_mean_image.binaryproto
I0303 20:08:32.176491 42944 common.cpp:36] System entropy source not available, using fallback algorithm to generate seed instead.
I0303 20:08:32.177494 42944 data_layer.cpp:45] output data size: 5000,3,200,200
F0303 20:08:36.003052 42944 syncedmem.cpp:78] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***

Steps to reproduce

python train.py --solver test_solver.prototxt

Your system configuration

Operating system: Windows 10
Compiler: VS 2015
CUDA version (if applicable): 8.0
CUDNN version (if applicable): 5.1
BLAS: ?
Python or MATLAB version (for pycaffe and matcaffe respectively): Python 3.5.2

willyd (Contributor) commented Mar 4, 2017

You should lower your batch size. 5000 x 3 x 200 x 200 x (4 bytes) ≈ 2.2 GiB. That is only your data layer's output blob; it does not take into account the memory already used by other processes, or the other blobs and parameters of your network.
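
For anyone following along, a rough back-of-the-envelope check of that figure, sketched in Python (the shape comes from the "output data size: 5000,3,200,200" log line above; float32 storage is assumed):

    # Approximate size of the data layer's output blob alone (float32 assumed).
    batch_size, channels, height, width = 5000, 3, 200, 200
    bytes_per_value = 4  # float32
    blob_bytes = batch_size * channels * height * width * bytes_per_value
    print(blob_bytes / 1024**3)  # ~2.2 GiB, before any other blobs or parameters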

GaryKT (Author) commented Mar 4, 2017

OK, so we cannot run a batch of 5000 images at 200x200 even with close to 8 GB of GPU memory free and 64 GB of RAM (55 GB free)? Is there a way to change the settings for different memory handling, e.g. dynamically offloading some of the data to disk or main memory?

Is this happening during a malloc up-front?

I was planning on processing much larger image datasets ultimately.
Any suggestions?

GaryKT (Author) commented Mar 4, 2017

We put the batch size down to 50; now the error is:
I0303 23:23:01.826825 30044 net.cpp:257] Network initialization done.
I0303 23:23:01.826825 30044 solver.cpp:56] Solver scaffolding done.
F0303 23:23:01.868011 30044 parallel.cpp:135] Check failed: result == ncclSuccess (2 vs. 0) system error
*** Check failure stack trace: ***

Any idea what this error means?

willyd (Contributor) commented Mar 4, 2017

Nope. I don't have the resources or time to debug this right now. If you only have one GPU, it would be wiser to disable NCCL in your build and avoid using the train.py script; use caffe.exe or a caffe.SGDSolver instance instead. Otherwise, you can build Caffe in debug mode and see if you get a little more information.
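
For what it's worth, a minimal single-GPU pycaffe sketch along those lines (assuming the test_solver.prototxt from the report; caffe.SGDSolver drives the whole training loop without going through train.py or NCCL):

    # Minimal single-GPU training via pycaffe, bypassing train.py and NCCL.
    import caffe

    caffe.set_device(0)   # GPU id
    caffe.set_mode_gpu()
    solver = caffe.SGDSolver('test_solver.prototxt')
    solver.solve()        # runs the training loop defined by the solver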

willyd added the windows label Mar 4, 2017
GaryKT (Author) commented Mar 5, 2017

OK, I was just trying to follow the provided tutorials. I will try the other tools you mention; if there are any instructions, do let me know.

But we should be able to train on more images, right? (Just trying to understand how this is structured.)

Thanks again.

GaryKT (Author) commented Mar 5, 2017

With caffe.exe:

  1. With a batch size of 5000, same error as with Python (F0303 20:08:36.003052 42944 syncedmem.cpp:78] Check failed: error == cudaSuccess (2 vs. 0) out of memory).
  2. With a batch size of 50, no error; it's still training. Will have to test more with different settings.

Will try debug mode and without NCCL as well.
If there is any trick to training larger batches, let me know.
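
For reference, the caffe.exe invocation used in this test would presumably look something like the following (solver file name taken from the original report):

    caffe.exe train --solver=test_solver.prototxt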

willyd (Contributor) commented Mar 8, 2017

You can use iter_size and keep 50 as the batch_size. iter_size acts as a multiplier on the batch size by effectively doing multiple forward/backward passes and accumulating the gradients before each weight update.
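
A sketch of how that might look across the two prototxt files (the value 100 is only an example; iter_size belongs in the solver prototxt, batch_size in the data layer of the net prototxt):

    # In the net prototxt, keep the data layer's batch_size at 50:
    #   batch_size: 50
    # In test_solver.prototxt, accumulate gradients across iterations:
    iter_size: 100   # effective batch size = 50 x 100 = 5000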

willyd (Contributor) commented Mar 14, 2017

Closing as the only remaining issue is now tracked by #5401.
