Extreme RAM consumption by multilayer models #159

Closed
funbotan opened this issue Apr 19, 2022 · 7 comments

Comments

@funbotan
Contributor

Hi Benedek! First of all, thank you for this project, I hope it has a bright future ahead.
My issue is with building deep/multi-layer models out of Recurrent Graph Convolutional Layers. There are no examples of it in the repo (or anywhere else on the internet as far as I searched), so I might be doing something wrong. Here is my project. Now, what I observe when running this code is that RAM utilization goes up nearly exponentially with the number of RGC layers. Of course, most of it ends up in swap, making the training process too slow to be viable. This does not appear to be a memory leak, since most of the memory is not used by Python objects, but rather internally by PyTorch. Have you encountered this issue and is there a way to fix it?

@SherylHYX
Collaborator

Based on your code, it seems like you are training in a "cumulative" manner for backpropagation, which has to keep the computation graph for every snapshot in memory before backpropagation runs. To alleviate this issue, one option is to switch to the "incremental" manner, which does backpropagation for each snapshot during training.
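
Roughly, the two regimes look like this (a sketch, assuming the usual torch_geometric_temporal snapshot iterator; `model`, `optimizer`, and `train_dataset` are placeholders, not your exact code):

```python
import torch

def train_cumulative(model, optimizer, train_dataset):
    # One backward pass after accumulating the loss over all snapshots.
    # Every snapshot's autograd graph stays in memory until backward(),
    # which is what blows up RAM with deep models.
    model.train()
    cost = 0
    for t, snapshot in enumerate(train_dataset):
        y_hat = model(snapshot.x, snapshot.edge_index, snapshot.edge_attr)
        cost = cost + torch.mean((y_hat - snapshot.y) ** 2)
    cost = cost / (t + 1)
    cost.backward()
    optimizer.step()
    optimizer.zero_grad()

def train_incremental(model, optimizer, train_dataset):
    # Backward after every snapshot: each snapshot's graph is freed
    # immediately, so memory stays roughly constant in sequence length.
    model.train()
    for snapshot in train_dataset:
        optimizer.zero_grad()
        y_hat = model(snapshot.x, snapshot.edge_index, snapshot.edge_attr)
        loss = torch.mean((y_hat - snapshot.y) ** 2)
        loss.backward()
        optimizer.step()
```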

@SherylHYX
Collaborator

For another, during evaluation you could use `with torch.no_grad()`. See for example here.
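
For example, an evaluation loop along these lines (a sketch, same placeholder names as above):

```python
model.eval()
with torch.no_grad():  # no autograd graph is built, so activations are freed right away
    cost = 0
    for t, snapshot in enumerate(test_dataset):
        y_hat = model(snapshot.x, snapshot.edge_index, snapshot.edge_attr)
        cost = cost + torch.mean((y_hat - snapshot.y) ** 2)
    cost = cost / (t + 1)
print("Test MSE: {:.4f}".format(cost.item()))
```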

@funbotan
Contributor Author

> Based on your code, it seems like you are training in a "cumulative" manner for backpropagation, which has to keep the computation graph for every snapshot in memory before backpropagation runs. To alleviate this issue, one option is to switch to the "incremental" manner, which does backpropagation for each snapshot during training.

Thank you, I see how that would help. However, won't that also discard the temporal data correlations?

@SherylHYX
Collaborator

SherylHYX commented Apr 19, 2022

> Thank you, I see how that would help. However, won't that also discard the temporal data correlations?

You could see from our paper that for some inputs, e.g. Wikipedia Math, the incremental backprop regime actually leads to better performance than the cumulative one. This is similar to using mini-batches for SGD rather than the full batch.

@funbotan
Contributor Author

Well, this did solve the memory consumption problem. Training time has only marginally improved, though, because of the frequent backward passes; I'll try to find a compromise later (sketched below). I can't really compare model performance, because the previous configuration never even finished training.
Do you think this problem is inherent to GCNs, or is it just an implementation issue?
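
The compromise I have in mind is to backpropagate every few snapshots instead of after every single one, something like this (a hypothetical sketch; `window` would need tuning):

```python
import torch

def train_windowed(model, optimizer, train_dataset, window=20):
    # Accumulate the loss over a small window of snapshots, then backprop.
    # Memory grows only with the window size, not the full sequence length.
    model.train()
    optimizer.zero_grad()
    cost, steps = 0, 0
    for snapshot in train_dataset:
        y_hat = model(snapshot.x, snapshot.edge_index, snapshot.edge_attr)
        cost = cost + torch.mean((y_hat - snapshot.y) ** 2)
        steps += 1
        if steps == window:
            (cost / steps).backward()
            optimizer.step()
            optimizer.zero_grad()
            cost, steps = 0, 0
    if steps:  # flush the remaining partial window
        (cost / steps).backward()
        optimizer.step()
        optimizer.zero_grad()
```

If the hidden state were carried across snapshots, it would also have to be detached at each window boundary so the old graphs can be freed.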

@SherylHYX
Collaborator

SherylHYX commented Apr 19, 2022 via email

@funbotan
Contributor Author

Alright, I'll close the issue for the time being, but very much hope that someone comes up with a better solution down the line. Thank you very much @SherylHYX
