syncedmem.cpp:78 Check failed: error == cudaSuccess (2 vs. 0) out of memory #5353

Closed
GaryKT opened this issue Mar 4, 2017 · 8 comments
GaryKT commented Mar 4, 2017

Issue summary

GPU CUDA out-of-memory crash when trying a basic setup with train.py.
The crash occurs within a few seconds of launching the Python script.
The machine has an NVIDIA GPU with 8 GB of VRAM and 64 GB of system RAM.

python train.py --solver test_solver.prototxt
....
I0303 20:08:31.785871 42944 layer_factory.cpp:58] Creating layer data
I0303 20:08:32.127293 42944 db_lmdb.cpp:40] Opened lmdb test_lmbd
I0303 20:08:32.148350 42944 net.cpp:86] Creating Layer data
I0303 20:08:32.148350 42944 net.cpp:382] data -> data
I0303 20:08:32.149355 42944 net.cpp:382] data -> label
I0303 20:08:32.149355 42944 data_transformer.cpp:25] Loading mean file from: test_mean_image.binaryproto
I0303 20:08:32.176491 42944 common.cpp:36] System entropy source not available, using fallback algorithm to generate seed instead.
I0303 20:08:32.177494 42944 data_layer.cpp:45] output data size: 5000,3,200,200
F0303 20:08:36.003052 42944 syncedmem.cpp:78] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***

Steps to reproduce

python train.py --solver test_solver.prototxt

Your system configuration

Operating system: Windows 10
Compiler: VS 2015
CUDA version (if applicable): 8.0
CUDNN version (if applicable): 5.1
BLAS: ?
Python or MATLAB version (for pycaffe and matcaffe respectively): Python 3.5.2

willyd (Contributor) commented Mar 4, 2017

You should lower your batch size. 5000 x 3 x 200 x 200 x (4 bytes) ≈ 2.2 GiB. That is only your data layer's output blob; it does not take into account the memory already used by other processes, or the other blobs and parameters of your network.
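
For anyone following along, a rough back-of-the-envelope check of that figure, sketched in Python (the shape comes from the "output data size: 5000,3,200,200" log line above; float32 storage is assumed):

    # Approximate size of the data layer's output blob alone (float32 assumed).
    batch_size, channels, height, width = 5000, 3, 200, 200
    bytes_per_value = 4  # float32
    blob_bytes = batch_size * channels * height * width * bytes_per_value
    print(blob_bytes / 1024**3)  # ~2.2 GiB, before any other blobs or parameters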

GaryKT (Author) commented Mar 4, 2017

OK, so we cannot run a batch of 5000 images at 200x200 even with close to 8 GB of GPU memory free and 64 GB of RAM (55 GB free)? Is there a way to change the settings for different memory handling, e.g. dynamically offloading some of the data to disk or main memory?

Is this happening during a malloc up-front?

I was planning on processing much larger image datasets ultimately.
Any suggestions?

GaryKT (Author) commented Mar 4, 2017

We put the batch size down to 50; now the error is:
I0303 23:23:01.826825 30044 net.cpp:257] Network initialization done.
I0303 23:23:01.826825 30044 solver.cpp:56] Solver scaffolding done.
F0303 23:23:01.868011 30044 parallel.cpp:135] Check failed: result == ncclSuccess (2 vs. 0) system error
*** Check failure stack trace: ***

Any idea what this error means?

willyd (Contributor) commented Mar 4, 2017

Nope. I don't have the resources or time to debug this right now. If you only have one GPU, it would be wiser to disable NCCL in your build and avoid using the train.py script; use caffe.exe or a caffe.SGDSolver instance instead. Otherwise, you can build Caffe in debug mode and see if you get a little more information.
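
For what it's worth, a minimal single-GPU pycaffe sketch along those lines (assuming the test_solver.prototxt from the report; caffe.SGDSolver drives the whole training loop without going through train.py or NCCL):

    # Minimal single-GPU training via pycaffe, bypassing train.py and NCCL.
    import caffe

    caffe.set_device(0)   # GPU id
    caffe.set_mode_gpu()
    solver = caffe.SGDSolver('test_solver.prototxt')
    solver.solve()        # runs the training loop defined by the solver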

willyd added the windows label Mar 4, 2017
GaryKT (Author) commented Mar 5, 2017

OK, I was just trying to follow the provided tutorials. I will try the other tools you mention; if there are any instructions, do let me know.

But we should be able to train on more images, right? (Just trying to understand how this is structured.)

Thanks again.

GaryKT (Author) commented Mar 5, 2017

With caffe.exe:

  1. With a batch size of 5000, same error as with Python (F0303 20:08:36.003052 42944 syncedmem.cpp:78] Check failed: error == cudaSuccess (2 vs. 0) out of memory).
  2. With a batch size of 50, no error; it's still training. Will have to test more with different settings.

Will try debug mode and without NCCL as well.
If there is any trick to training larger batches, let me know.
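
For reference, the caffe.exe invocation used in this test would presumably look something like the following (solver file name taken from the original report):

    caffe.exe train --solver=test_solver.prototxt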

willyd (Contributor) commented Mar 8, 2017

You can use iter_size and keep 50 as the batch_size. iter_size acts as a multiplier on the batch size by effectively doing multiple forward/backward passes and accumulating the gradients before each weight update.
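
A sketch of how that might look across the two prototxt files (the value 100 is only an example; iter_size belongs in the solver prototxt, batch_size in the data layer of the net prototxt):

    # In the net prototxt, keep the data layer's batch_size at 50:
    #   batch_size: 50
    # In test_solver.prototxt, accumulate gradients across iterations:
    iter_size: 100   # effective batch size = 50 x 100 = 5000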

willyd (Contributor) commented Mar 14, 2017

Closing as the only remaining issue is now tracked by #5401.
