
slow system memory leak of RMC model #1

Closed

L0SG opened this issue Sep 6, 2018 · 3 comments

Comments

@L0SG
Owner

L0SG commented Sep 6, 2018

Issue: while running WikiText-103 benchmarks, the system (CPU) memory usage of the RMC model increases slowly over time. GPU VRAM usage remains stable.

Setup: anaconda, python 3.6, conda binary of PyTorch 0.4.1. Both CUDA 9.0 + CuDNN 7.1.2 and CUDA 9.2 + CuDNN 7.1.4 show the same issue.

Problem: the nn.CrossEntropyLoss wrapped inside the forward() of RelationalMemory may be the suspect. Removing it and calculating the loss in the training loop appears to remove the memory leak. But then VRAM usage becomes imbalanced across GPUs when using DataParallel for multi-GPU training.
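For illustration, a minimal sketch of the "loss outside the model" pattern described above. ToyModel here is a hypothetical stand-in, not the actual RelationalMemory implementation; the point is only where the loss module lives:

```python
import torch
import torch.nn as nn

# Hypothetical minimal module standing in for RelationalMemory: it returns
# raw logits and holds no loss module inside forward().
class ToyModel(nn.Module):
    def __init__(self, vocab_size=10, hidden=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x):
        # Only logits come out of the model; no loss computation here.
        return self.out(self.embed(x))

model = ToyModel()
criterion = nn.CrossEntropyLoss()  # instantiated once, in training-loop scope

inputs = torch.randint(0, 10, (4,))
targets = torch.randint(0, 10, (4,))

logits = model(inputs)
loss = criterion(logits, targets)  # loss computed outside the model class
loss.backward()
```

Note the trade-off mentioned above: with DataParallel, computing the loss outside the model means all logits are gathered onto the default GPU first, which is what skews VRAM usage across devices.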

Possibly related to the reference cycle issue link, but that was fixed long ago, and some toy examples (nn.CrossEntropyLoss inside the module class) showed no leak.

Things tried: gc.collect() and del gc.garbage[:] here and there. Neither helped.

Possible solution: just give up on the VRAM usage optimization. Adaptive softmax is needed anyway for large-vocabulary datasets.

@L0SG
Owner Author

L0SG commented Sep 11, 2018

Edit: the issue still persists even when the loss is calculated outside the model class. Still investigating.

@L0SG
Owner Author

L0SG commented Sep 11, 2018

After brute-force memory footprint logging of the forward pass, there are occasional (random?) ~0.3 MB jumps during dropout, linear, and attend_over_memory() calls. Closing this issue since it looks like a bug in PyTorch itself (reference cycles? allocator fragmentation? not sure) and there's not much I can do.
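The original logging method isn't shown; one stdlib way to do this kind of brute-force per-call footprint logging is Python's tracemalloc (a sketch under that assumption — the log_memory helper is hypothetical):

```python
import tracemalloc

def log_memory(tag, step_fn, *args):
    """Run step_fn and print the traced Python-heap delta (hypothetical helper)."""
    before, _ = tracemalloc.get_traced_memory()
    result = step_fn(*args)
    after, _ = tracemalloc.get_traced_memory()
    print(f"{tag}: {(after - before) / 1024:.1f} KiB delta")
    return result

tracemalloc.start()
# Wrap any suspect call this way, e.g. a dropout or linear forward;
# here a plain list allocation stands in for the layer call.
data = log_memory("alloc", lambda n: [0] * n, 100_000)
tracemalloc.stop()
```

Caveat: tracemalloc only sees allocations made through the Python allocator, so leaks inside PyTorch's C++/CUDA allocators would need process-level RSS tracking (e.g. psutil) instead.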

@L0SG L0SG closed this as completed Sep 11, 2018
@L0SG
Owner Author

L0SG commented Dec 9, 2018

Update: the leak seems to have gone with the latest 1.0.0 release.
