
LevelDB memory consumption problem (out of files) #13

Closed · reedscot opened this issue Dec 10, 2013 · 2 comments

@reedscot

When running Caffe on the ImageNet data, I observed that memory usage (as reported by top) inexorably climbs to almost 100%. With a batch size of 256 this happens within around 2500 iterations. With a batch size of 100, training was faster, but by around 5000 iterations memory consumption again reached almost 100%. At that point training slows down dramatically and the loss stops changing entirely; I suspect the slowdown is due to thrashing. I am wondering whether there is a memory leak, or something in Caffe that unintentionally allocates more memory at each iteration.

The same issue occurs on MNIST, although that dataset is small enough that training can still run to completion.

I ran the MNIST training through valgrind with --leak-check=full, and some memory leaks were indeed reported. These could be benign if the amount of leaked memory is constant, but if it scales with the number of batches, that would explain the ever-increasing memory consumption.

Any idea what could be the problem?

Update (12/13/2013): The problem may be in LevelDB. I was able to work around it by modifying src/caffe/layers/data_layer.cpp to set options.max_open_files = 100. I believe the default is 1000, which simply held too much memory on the machine I was using. It might also help to set ReadOptions::fill_cache = false, since Caffe scans over the whole training set and gains little from caching blocks it will not revisit soon.

@shelhamer
Member

This is a symptom of the same leveldb number-of-open-files issue as #38.

@shelhamer
Member

Thanks for the update and suggested fix.

shelhamer added a commit that referenced this issue Feb 25, 2014
    Set leveldb options.max_open_files = 100 and Fix #13 and #38
shelhamer pushed a commit to shelhamer/caffe that referenced this issue Feb 26, 2014
shelhamer added a commit to shelhamer/caffe that referenced this issue Feb 26, 2014
    Set leveldb options.max_open_files = 100 and Fix BVLC#13 and BVLC#38
shelhamer added a commit that referenced this issue Feb 26, 2014
    Set leveldb options.max_open_files = 100 and Fix #13 and #38
shelhamer added a commit that referenced this issue Feb 26, 2014
    Set leveldb options.max_open_files = 100 and Fix #13 and #38
mitmul pushed a commit to mitmul/caffe that referenced this issue Sep 30, 2014
happynear pushed a commit to happynear/caffe that referenced this issue Feb 26, 2016
    Merge bvlc_win branch with master branch
andpol5 pushed a commit to andpol5/caffe that referenced this issue Aug 24, 2016
mbassov pushed a commit to mbassov/caffe that referenced this issue Nov 10, 2017
    DEV-XXXXX: Update readme to track layer additions