Is it normal that MXNet consumes much more system memory than Caffe during training in GPU mode? #2111

diPDew · 2016-05-11T18:58:12Z

I just switched from Caffe to MXNet. When training a GoogleNet model using the provided python scripts, I observe that MXNet always consumes much more system memory than Caffe in GPU mode. For instance, MXNet can easily eat 10GB RAM during training, while Caffe only takes less than 1GB.

I'm not sure if I didn't compile the MXNet code correctly. But before compilation, the only change I made in the config.mk is to enable CUDA.

Anyone could comment on that? Is there anything I need to set properly for MXNet in order to reduce the memory usage?

The text was updated successfully, but these errors were encountered:

antinucleon · 2016-05-11T19:39:23Z

Memory cost is related to Batch size. It is not normal, because in same batch size, MXNet will use much fewer memory.

diPDew · 2016-05-11T19:40:34Z

Thanks for the reply @antinucleon. Forgot to mention that I set the batch_size as 20 for both MXNet and Caffe.

tqchen · 2016-05-11T19:44:41Z

I think @EasonD3 means the CPU memory consumption. This could due to the memory needed for the recordIO pipeline due to current setting of caching queues.

We tuned the queue size for faster prefetching and decoding speed, maybe the setting was a bit large to eat up a bit more RAM

antinucleon · 2016-05-11T19:46:11Z

@tqchen Yes. I just checked GPU feature memory in Inception BN is 861 MB when batch size is 20.

diPDew · 2016-05-11T19:48:26Z

@antinucleon Thanks for the number. GPU-wise, the memory consumption is roughly the same as my side. But my issue is with the system memory. I should've mentioned more clearly in my post.

@tqchen Thanks. If I'd like to tune the RAM usage, can you advise how to do that?

diPDew · 2016-05-11T19:50:20Z

@tqchen Speaking of the queue size as you pointed out, I also observe that the latest MXNet code consumes about 50% more RAM compared to an older version of 2~3 months ago.

antinucleon · 2016-05-11T19:51:53Z

Thanks for pointing out it. Recently there is a refactor of IO. I will check it after I finish my job today,

tqchen · 2016-05-11T19:54:41Z

Some quick things to try

preprocess_threads
https://github.com/dmlc/mxnet/blob/master/src/io/iter_image_recordio.cc#L325
prefetch_buffer

diPDew · 2016-05-11T21:29:01Z

@tqchen Thanks for the hints. I just tested by training 15000 color images of size 224x224 with the Inception-BN model. I set preprocess_threads=1 and prefetch_threads=1 in ImageRecordIter. By modifying the line of code in iter_image_recordio.cc to i) iter_.set_max_capacity(1); and ii) iter_.set_max_capacity(2);, when comparing with the default which is iter_.set_max_capacity(4);, the RAM consumption decreased from >15GB down to i) ~7.5GB and ii) ~10.5GB for the above two cases, respectively. So still, the RAM consumption is quite a lot.

KeyKy · 2017-08-12T23:43:45Z

I get the same problem when training ssd or imagenet using .rec file!

szha · 2017-11-12T00:26:27Z

This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!
Also, do please check out our forum (and Chinese version) for general "how-to" questions.

diPDew changed the title ~~Is it normal that MXNet consumes much more system memory than Caffe in GPU mode?~~ Is it normal that MXNet consumes much more system memory than Caffe during training in GPU mode? May 11, 2016

Godricly mentioned this issue Aug 10, 2016

Why mxnet need so much memory? #2969

Closed

KeyKy mentioned this issue Aug 14, 2017

out of memory when training imagenet with .rec file. #7448

Closed

szha closed this as completed Nov 12, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is it normal that MXNet consumes much more system memory than Caffe during training in GPU mode? #2111

Is it normal that MXNet consumes much more system memory than Caffe during training in GPU mode? #2111

diPDew commented May 11, 2016 •

edited

Loading

antinucleon commented May 11, 2016

diPDew commented May 11, 2016 •

edited

Loading

tqchen commented May 11, 2016

antinucleon commented May 11, 2016 •

edited

Loading

diPDew commented May 11, 2016

diPDew commented May 11, 2016

antinucleon commented May 11, 2016

tqchen commented May 11, 2016

diPDew commented May 11, 2016

KeyKy commented Aug 12, 2017

szha commented Nov 12, 2017

Is it normal that MXNet consumes much more system memory than Caffe during training in GPU mode? #2111

Is it normal that MXNet consumes much more system memory than Caffe during training in GPU mode? #2111

Comments

diPDew commented May 11, 2016 • edited Loading

antinucleon commented May 11, 2016

diPDew commented May 11, 2016 • edited Loading

tqchen commented May 11, 2016

antinucleon commented May 11, 2016 • edited Loading

diPDew commented May 11, 2016

diPDew commented May 11, 2016

antinucleon commented May 11, 2016

tqchen commented May 11, 2016

diPDew commented May 11, 2016

KeyKy commented Aug 12, 2017

szha commented Nov 12, 2017

diPDew commented May 11, 2016 •

edited

Loading

diPDew commented May 11, 2016 •

edited

Loading

antinucleon commented May 11, 2016 •

edited

Loading