
slow system memory leak of RMC model #1

Closed

L0SG opened this issue Sep 6, 2018 · 3 comments

Comments

@L0SG
Owner

L0SG commented Sep 6, 2018

Issue: while running WikiText-103 benchmarks, the system (CPU) memory usage of the RMC model increases slowly over time. GPU VRAM usage remains stable.

Setup: anaconda, python 3.6, conda binary of PyTorch 0.4.1. Both CUDA 9.0 + CuDNN 7.1.2 and CUDA 9.2 + CuDNN 7.1.4 show the same issue.

Problem: the nn.CrossEntropyLoss wrapped inside the forward() of RelationalMemory may be the suspect. Removing it and calculating the loss in the training loop appears to remove the memory leak. But then VRAM usage becomes imbalanced across GPUs when using DataParallel for multi-GPU training.
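For illustration, a minimal sketch of the "loss outside the model" pattern described above. ToyModel here is a hypothetical stand-in, not the actual RelationalMemory implementation; the point is only where the loss module lives:

```python
import torch
import torch.nn as nn

# Hypothetical minimal module standing in for RelationalMemory: it returns
# raw logits and holds no loss module inside forward().
class ToyModel(nn.Module):
    def __init__(self, vocab_size=10, hidden=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x):
        # Only logits come out of the model; no loss computation here.
        return self.out(self.embed(x))

model = ToyModel()
criterion = nn.CrossEntropyLoss()  # instantiated once, in training-loop scope

inputs = torch.randint(0, 10, (4,))
targets = torch.randint(0, 10, (4,))

logits = model(inputs)
loss = criterion(logits, targets)  # loss computed outside the model class
loss.backward()
```

Note the trade-off mentioned above: with DataParallel, computing the loss outside the model means all logits are gathered onto the default GPU first, which is what skews VRAM usage across devices.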

Possibly related to the reference cycle issue link, but that was fixed long ago, and some toy examples (nn.CrossEntropyLoss inside the module class) showed no leak.

Things tried: gc.collect() and del gc.garbage[:] here and there. Neither helped.

Possible solution: just give up on the VRAM usage optimization. Adaptive softmax is needed anyway for large-vocabulary datasets.

@L0SG
Owner Author

L0SG commented Sep 11, 2018

Edit: the issue still persists even when the loss is calculated outside the model class. Still investigating.

@L0SG
Owner Author

L0SG commented Sep 11, 2018

After brute-force memory footprint logging of the forward pass, there are occasional (random?) ~0.3 MB jumps during dropout, linear, and attend_over_memory() calls. Closing this issue since it looks like a bug in PyTorch itself (reference cycles? allocator fragmentation? not sure) and there's not much I can do.
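The original logging method isn't shown; one stdlib way to do this kind of brute-force per-call footprint logging is Python's tracemalloc (a sketch under that assumption — the log_memory helper is hypothetical):

```python
import tracemalloc

def log_memory(tag, step_fn, *args):
    """Run step_fn and print the traced Python-heap delta (hypothetical helper)."""
    before, _ = tracemalloc.get_traced_memory()
    result = step_fn(*args)
    after, _ = tracemalloc.get_traced_memory()
    print(f"{tag}: {(after - before) / 1024:.1f} KiB delta")
    return result

tracemalloc.start()
# Wrap any suspect call this way, e.g. a dropout or linear forward;
# here a plain list allocation stands in for the layer call.
data = log_memory("alloc", lambda n: [0] * n, 100_000)
tracemalloc.stop()
```

Caveat: tracemalloc only sees allocations made through the Python allocator, so leaks inside PyTorch's C++/CUDA allocators would need process-level RSS tracking (e.g. psutil) instead.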

@L0SG L0SG closed this as completed Sep 11, 2018
@L0SG
Owner Author

L0SG commented Dec 9, 2018

Update: the leak seems to have gone with the latest 1.0.0 release.
