syncedmem.cpp:78 Check failed: error == cudaSuccess (2 vs. 0) out of memory #5353
Comments
You should lower your batch size. 5000 x 3 x 200 x 200 x (4 bytes) ≈ 2.2 GiB. This is only your data layer output blob. It does not take into account the memory already used by other processes, or the other blobs and parameters of your network.
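As a quick sanity check of the arithmetic above, a minimal sketch (the `blob_bytes` helper is hypothetical, not part of Caffe) for estimating a single float32 blob's footprint:

```python
# Hypothetical helper: raw memory footprint of one float32 blob of shape
# (batch, channels, height, width), mirroring the 5000 x 3 x 200 x 200
# data blob reported in the log.
def blob_bytes(batch, channels, height, width, dtype_bytes=4):
    """Return the memory needed for one blob, in bytes."""
    return batch * channels * height * width * dtype_bytes

gib = blob_bytes(5000, 3, 200, 200) / 2**30
print(f"{gib:.1f} GiB")  # → 2.2 GiB, for the data blob alone
```

Note this counts only the data blob; gradients, intermediate activations, and layer parameters add further GPU memory on top.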
Ok, so we cannot run 5000 images at 200x200 even with close to 8 GB of GPU memory free and 64 GB RAM (55 GB free)? Is there a way to change the settings for different memory handling, e.g. offloading some of the data to disk or main memory dynamically? Is this happening during an up-front malloc? I was planning to process much larger image datasets eventually.
We put the batch size down to 50; now the error is: Any idea what this error means?
Nope. I don't have the resources or time to debug this right now. If you only have one GPU, it would be wiser to disable NCCL in your build and avoid using the train.py script. Use caffe.exe or a caffe.SGDSolver instance. Otherwise, you can build Caffe in debug mode and see if you get a little more information.
Ok, just trying to follow the provided tutorials. Will try the other tools you mention; if there are any instructions, do let me know. But we should be able to train on more images, or not? (Just to understand how this is structured.) Thanks again.
With caffe.exe:
Will try debug mode and a build without NCCL as well.
You can use iter_size and keep batch_size at 50. iter_size acts as a multiplier on the batch size by doing multiple forward/backward passes and accumulating the gradients before each weight update.
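As a sketch, a solver prototxt that keeps the per-pass batch at 50 while reaching an effective batch of 5000 might look like the fragment below (the file names and learning-rate values are placeholders, not from this issue):

```
# solver.prototxt (names and values are illustrative)
net: "train_val.prototxt"   # data layer there uses batch_size: 50
iter_size: 100              # accumulate gradients over 100 forward/backward passes
                            # effective batch = 50 x 100 = 5000
base_lr: 0.01
max_iter: 10000
```

Only one mini-batch of 50 is resident on the GPU at a time, so peak memory stays at the small-batch level while the gradient statistics approximate the large batch.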
Closing as the only remaining issue is now tracked by #5401. |
Issue summary
GPU Cuda crash when trying basic setup with train.py.
Crash occurs within a few seconds of launching python script
Machine has Nvidia with 8GB VRAM and 64GB RAM.
python train.py --solver test_solver.prototxt
....
I0303 20:08:31.785871 42944 layer_factory.cpp:58] Creating layer data
I0303 20:08:32.127293 42944 db_lmdb.cpp:40] Opened lmdb test_lmbd
I0303 20:08:32.148350 42944 net.cpp:86] Creating Layer data
I0303 20:08:32.148350 42944 net.cpp:382] data -> data
I0303 20:08:32.149355 42944 net.cpp:382] data -> label
I0303 20:08:32.149355 42944 data_transformer.cpp:25] Loading mean file from: test_mean_image.binaryproto
I0303 20:08:32.176491 42944 common.cpp:36] System entropy source not available, using fallback algorithm to generate seed instead.
I0303 20:08:32.177494 42944 data_layer.cpp:45] output data size: 5000,3,200,200
F0303 20:08:36.003052 42944 syncedmem.cpp:78] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Steps to reproduce
python train.py --solver test_solver.prototxt
Your system configuration
Operating system: Windows 10
Compiler: VS 2015
CUDA version (if applicable): 8.0
CUDNN version (if applicable): 5.1
BLAS: ?
Python or MATLAB version (for pycaffe and matcaffe respectively): Python 3.5.2