This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Memory consumption of RNNs in version v0.9 #4795

Closed
tdomhan opened this issue Jan 24, 2017 · 9 comments

Comments

@tdomhan
Contributor

tdomhan commented Jan 24, 2017

Hi,
I noticed that memory consumption is considerably higher in v0.9 compared to v0.8. This is probably due to the move to nnvm, where the same memory optimizations are not applied. It can easily be reproduced with the RNN example in mxnet (example/rnn/lstm_bucketing.py) and a larger hidden size. Namely, I set the following parameters:

    num_hidden = 1024
    num_embed = 512
    num_lstm_layer = 4
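
For context, a minimal sketch of how these parameters would plug into a bucketing LSTM model. This follows the mx.rnn cell API of later MXNet versions rather than the exact lstm_bucketing.py script of the time, and the vocabulary size is a placeholder:

    import mxnet as mx

    num_hidden = 1024
    num_embed = 512
    num_lstm_layer = 4
    vocab_size = 10000  # placeholder; the real value comes from the PTB vocabulary

    def sym_gen(seq_len):
        """Build the unrolled LSTM symbol for one bucket (sequence length)."""
        data = mx.sym.Variable('data')
        label = mx.sym.Variable('softmax_label')
        embed = mx.sym.Embedding(data=data, input_dim=vocab_size,
                                 output_dim=num_embed, name='embed')

        # Stack num_lstm_layer LSTM cells and unroll them over the sequence.
        stack = mx.rnn.SequentialRNNCell()
        for i in range(num_lstm_layer):
            stack.add(mx.rnn.LSTMCell(num_hidden=num_hidden, prefix='lstm_l%d_' % i))
        outputs, _ = stack.unroll(seq_len, inputs=embed, merge_outputs=True)

        # Project every time step onto the vocabulary and apply a softmax loss.
        pred = mx.sym.Reshape(outputs, shape=(-1, num_hidden))
        pred = mx.sym.FullyConnected(data=pred, num_hidden=vocab_size, name='pred')
        pred = mx.sym.SoftmaxOutput(data=pred, label=mx.sym.Reshape(label, shape=(-1,)),
                                    name='softmax')
        return pred, ('data',), ('softmax_label',)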

With this I get a GPU memory consumption of:

v0.9: 3.6 GB
v0.8: 2.3 GB

v0.9 was e1cafff
v0.8 was 67bee19

Tobi

@piiswrong
Contributor

piiswrong commented Jan 24, 2017

Could you try again with latest master?

@tqchen

@tdomhan
Contributor Author

tdomhan commented Jan 26, 2017

I'll try with master and let you know.
Are there any recent commits that affect memory allocation? Quickly skimming through the commits, I couldn't see any.

@tdomhan
Contributor Author

tdomhan commented Feb 10, 2017

I can confirm that the regression is still present with the latest commit (9ebd906).

@eric-haibin-lin
Member

Hi @tdomhan

What dataset are you using? I cannot reproduce your problem. Here is my setup:

Hardware: AWS p2.xlarge

Dataset: ptb dataset

Configuration:

buckets = [10, 20, 30, 40, 50, 60]
num_hidden = 1024                                                                   
num_embed = 512                                                                     
num_lstm_layer = 4

Memory consumption:

v0.9: 1179 MB
v0.8: 1540 MB

v0.9 was e1cafff
v0.8 was 67bee19
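
For reference, one hedged way to sample GPU memory figures like the ones above while the script runs is to shell out to the standard nvidia-smi query (not part of the example itself):

    import subprocess

    def gpu_memory_used_mib():
        """Return the per-GPU memory usage in MiB as reported by nvidia-smi."""
        out = subprocess.check_output(
            ['nvidia-smi', '--query-gpu=memory.used', '--format=csv,noheader,nounits'])
        return [int(x) for x in out.decode().strip().splitlines()]

    print(gpu_memory_used_mib())  # e.g. [1179] on a single-GPU p2.xlarge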

@tdomhan
Contributor Author

tdomhan commented Feb 17, 2017

Hi Haibin,
Thanks for looking into this. I was using the default PTB dataset that you download with the script in the RNN example. As for the code, I was using the 0.8 version of the example, and I didn't change the buckets. Which code/sample did you use to get these numbers?

Tobi

@tdomhan
Contributor Author

tdomhan commented Feb 20, 2017

Alright, I was able to reproduce the issue. I took the example code from 0.8 (not that it should matter, but just to be consistent). To reproduce this, it's important to set the buckets correctly. I had:

buckets = []

This leads the code to generate the buckets automatically (sketched below) and is equivalent to setting:

buckets = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
           21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
           39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 53, 55, 58, 63, 78, 82]
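
A rough sketch of that automatic bucket generation (illustrative only, not the example's exact helper): keep every sentence length that occurs often enough to fill at least one batch.

    from collections import Counter

    def gen_buckets(sentences, batch_size):
        # Keep every sentence length that occurs often enough to fill a batch.
        length_counts = Counter(len(sentence) for sentence in sentences)
        return sorted(length for length, count in length_counts.items()
                      if count >= batch_size)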

With this I get:

  • v0.9: 1840 MiB initially, 3744 MiB after a while (then stops increasing)
  • v0.8: 2310 MiB initially, 2454 MiB eventually (then stops increasing)

I think the growth of memory over time is a result of not seeing all bucket sizes initially. Once each bucket size has been observed, the memory no longer grows (this behavior also has to do with the way memory sharing is implemented, which relates this issue to #5035). I'm not entirely sure whether this issue is exactly the same as #5035, with the different behavior in 0.8 vs. 0.9 caused by the graph allocator no longer having access to the data_pool_ in 0.9, or whether these are two separate issues. I suspect the former, so it might make sense to look at #5035 first.
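
As a conceptual sketch (a simplification, not MXNet's actual internals) of why memory keeps growing until every bucket size has been seen: a bucketing module binds and caches one executor per bucket key, and each first-time bind may allocate additional GPU memory that the already-bound executors cannot fully share.

    class TinyBucketingModule:
        """Toy illustration: one bound executor is kept per bucket key."""
        def __init__(self, sym_gen, context):
            self.sym_gen = sym_gen
            self.context = context
            self._execs = {}  # bucket_key -> bound executor, kept for reuse

        def switch_bucket(self, bucket_key, **input_shapes):
            if bucket_key not in self._execs:
                sym, _, _ = self.sym_gen(bucket_key)
                # First time this bucket is seen: bind a new executor,
                # which may allocate additional GPU memory.
                self._execs[bucket_key] = sym.simple_bind(ctx=self.context,
                                                          **input_shapes)
            return self._execs[bucket_key]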

Anyway, I hope with this you will be able to reproduce the issue.

@eric-haibin-lin
Member

@tdomhan Yes, I was able to reproduce it. Sorry for the late update. You're right, the main problem is that the graph allocator doesn't have access to the free memory pool in the current version. I'm working on integrating the free-pool information into memory planning.

@tdomhan
Contributor Author

tdomhan commented Feb 20, 2017 via email

@phunterlau
Contributor

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!
