This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Memory consumption of RNNs in version v0.9 #4795

Closed
tdomhan opened this issue Jan 24, 2017 · 9 comments

Comments

@tdomhan
Contributor

tdomhan commented Jan 24, 2017

Hi,
I noticed that memory consumption is considerably higher in v0.9 compared to v0.8. This is probably due to the move to nnvm, where the same memory optimizations are not applied. It can easily be reproduced with the RNN example in mxnet (example/rnn/lstm_bucketing.py) and a larger hidden size. Namely, I set the following parameters:

    num_hidden = 1024
    num_embed = 512
    num_lstm_layer = 4
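
For context, a minimal sketch of how these parameters would plug into a bucketing LSTM model. This follows the mx.rnn cell API of later MXNet versions rather than the exact lstm_bucketing.py script of the time, and the vocabulary size is a placeholder:

    import mxnet as mx

    num_hidden = 1024
    num_embed = 512
    num_lstm_layer = 4
    vocab_size = 10000  # placeholder; the real value comes from the PTB vocabulary

    def sym_gen(seq_len):
        """Build the unrolled LSTM symbol for one bucket (sequence length)."""
        data = mx.sym.Variable('data')
        label = mx.sym.Variable('softmax_label')
        embed = mx.sym.Embedding(data=data, input_dim=vocab_size,
                                 output_dim=num_embed, name='embed')

        # Stack num_lstm_layer LSTM cells and unroll them over the sequence.
        stack = mx.rnn.SequentialRNNCell()
        for i in range(num_lstm_layer):
            stack.add(mx.rnn.LSTMCell(num_hidden=num_hidden, prefix='lstm_l%d_' % i))
        outputs, _ = stack.unroll(seq_len, inputs=embed, merge_outputs=True)

        # Project every time step onto the vocabulary and apply a softmax loss.
        pred = mx.sym.Reshape(outputs, shape=(-1, num_hidden))
        pred = mx.sym.FullyConnected(data=pred, num_hidden=vocab_size, name='pred')
        pred = mx.sym.SoftmaxOutput(data=pred, label=mx.sym.Reshape(label, shape=(-1,)),
                                    name='softmax')
        return pred, ('data',), ('softmax_label',)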

With this I get a GPU memory consumption of:

v0.9: 3.6 GB
v0.8: 2.3 GB

v0.9 was e1cafff
v0.8 was 67bee19

Tobi

@piiswrong
Contributor

piiswrong commented Jan 24, 2017

Could you try again with latest master?

@tqchen

@tdomhan
Contributor Author

tdomhan commented Jan 26, 2017

I'll try with master and let you know.
Are there any recent commits that affect memory allocation? Quickly skimming through the commits, I couldn't see any.

@tdomhan
Contributor Author

tdomhan commented Feb 10, 2017

I can confirm that the regression is still present with the latest commit (9ebd906).

@eric-haibin-lin
Member

Hi @tdomhan

What dataset are you using? I cannot reproduce your problem. Here is my setup:

Hardware: AWS p2.xlarge

Dataset: ptb dataset

Configuration:

buckets = [10, 20, 30, 40, 50, 60]
num_hidden = 1024                                                                   
num_embed = 512                                                                     
num_lstm_layer = 4

Memory consumption:

v0.9: 1179 MB
v0.8: 1540 MB

v0.9 was e1cafff
v0.8 was 67bee19
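
For reference, one hedged way to sample GPU memory figures like the ones above while the script runs is to shell out to the standard nvidia-smi query (not part of the example itself):

    import subprocess

    def gpu_memory_used_mib():
        """Return the per-GPU memory usage in MiB as reported by nvidia-smi."""
        out = subprocess.check_output(
            ['nvidia-smi', '--query-gpu=memory.used', '--format=csv,noheader,nounits'])
        return [int(x) for x in out.decode().strip().splitlines()]

    print(gpu_memory_used_mib())  # e.g. [1179] on a single-GPU p2.xlarge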

@tdomhan
Contributor Author

tdomhan commented Feb 17, 2017

Hi Haibin,
Thanks for looking into this. I was using the default PTB dataset that you download with the script in the RNN example. As for the code, I was using the 0.8 version of the example, and I didn't change the buckets. Which code/sample did you use to get these numbers?

Tobi

@tdomhan
Contributor Author

tdomhan commented Feb 20, 2017

Alright, I was able to reproduce the issue. I took the example code from 0.8 (not that it should matter, but just to be consistent). To reproduce this, it's important to set the buckets correctly. I had:

buckets = []

This leads the code to generate the buckets automatically (sketched below) and is equivalent to setting:

buckets = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
           21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
           39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 53, 55, 58, 63, 78, 82]
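
A rough sketch of that automatic bucket generation (illustrative only, not the example's exact helper): keep every sentence length that occurs often enough to fill at least one batch.

    from collections import Counter

    def gen_buckets(sentences, batch_size):
        # Keep every sentence length that occurs often enough to fill a batch.
        length_counts = Counter(len(sentence) for sentence in sentences)
        return sorted(length for length, count in length_counts.items()
                      if count >= batch_size)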

With this I get:

  • v0.9: 1840 MiB initially, 3744 MiB after a while (then stops increasing)
  • v0.8: 2310 MiB initially, 2454 MiB eventually (then stops increasing)

I think the growth of memory over time is a result of not seeing all bucket sizes initially. Once each bucket size has been observed, the memory no longer grows (this behavior also has to do with the way memory sharing is implemented, which relates this issue to #5035). I'm not entirely sure whether this issue is exactly the same as #5035, with the different behavior in 0.8 vs. 0.9 caused by the graph allocator no longer having access to the data_pool_ in 0.9, or whether these are two separate issues. I suspect the former, so it might make sense to look at #5035 first.
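
As a conceptual sketch (a simplification, not MXNet's actual internals) of why memory keeps growing until every bucket size has been seen: a bucketing module binds and caches one executor per bucket key, and each first-time bind may allocate additional GPU memory that the already-bound executors cannot fully share.

    class TinyBucketingModule:
        """Toy illustration: one bound executor is kept per bucket key."""
        def __init__(self, sym_gen, context):
            self.sym_gen = sym_gen
            self.context = context
            self._execs = {}  # bucket_key -> bound executor, kept for reuse

        def switch_bucket(self, bucket_key, **input_shapes):
            if bucket_key not in self._execs:
                sym, _, _ = self.sym_gen(bucket_key)
                # First time this bucket is seen: bind a new executor,
                # which may allocate additional GPU memory.
                self._execs[bucket_key] = sym.simple_bind(ctx=self.context,
                                                          **input_shapes)
            return self._execs[bucket_key]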

Anyway, I hope with this you will be able to reproduce the issue.

@eric-haibin-lin
Member

@tdomhan Yes, I was able to reproduce it. Sorry for the late update. You're right, the main problem is that the graph allocator doesn't have access to the free memory pool in the current version. I'm working on integrating the free-pool information into memory planning.

@tdomhan
Contributor Author

tdomhan commented Feb 20, 2017 via email

@phunterlau
Contributor

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!
