Issue: while running WikiText-103 benchmarks, the system (CPU) memory usage of RMC increases slowly but steadily. GPU VRAM remains stable.
Setup: Anaconda, Python 3.6, conda binary of PyTorch 0.4.1. Both CUDA 9.0 + cuDNN 7.1.2 and CUDA 9.2 + cuDNN 7.1.4 show the same issue.
Problem: `nn.CrossEntropyLoss` wrapped inside the `forward()` of `RelationalMemory` may be the culprit. Removing it and computing the loss in the training loop appears to eliminate the memory leak, but then VRAM usage becomes imbalanced when using multiple GPUs with `DataParallel` (see the sketch below).
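For reference, a minimal sketch of the pattern that avoided the growth, i.e. the criterion living in the training script instead of the model's `forward()`. The stand-in model, shapes, and variable names below are illustrative, not the actual RMC code:

```python
import torch
import torch.nn as nn

# Stand-in model (not the actual RelationalMemory): embedding + LSTM + linear head,
# just to show the loss being computed outside the model's forward().
vocab_size, batch, seq_len, hidden = 100, 4, 16, 32
embed = nn.Embedding(vocab_size, hidden)
rnn = nn.LSTM(hidden, hidden, batch_first=True)
head = nn.Linear(hidden, vocab_size)
criterion = nn.CrossEntropyLoss()          # loss now lives in the training loop
optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters()))

inputs = torch.randint(vocab_size, (batch, seq_len))
targets = torch.randint(vocab_size, (batch, seq_len))

output, _ = rnn(embed(inputs))             # (batch, seq_len, hidden)
logits = head(output)                      # (batch, seq_len, vocab_size)
loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The trade-off mentioned above: with `DataParallel`, the full `(batch, seq_len, vocab_size)` logits are gathered onto the default device before the loss is computed, which is where the imbalanced VRAM usage comes from; keeping the loss inside `forward()` lets each replica reduce its own logits to a scalar first.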
Maybe related to the reference cycle issue link, but that was fixed long ago, and some toy examples (`nn.CrossEntropyLoss` inside the class) showed no leak.
Things tried: `gc.collect()` and `del gc.garbage[:]` here and there (see the snippet below). Didn't help.
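Roughly what that looked like; the placement inside the training loop is hypothetical and it made no measurable difference:

```python
import gc

# Force a full collection and clear the uncollectable list, to rule out reference cycles.
collected = gc.collect()
print(f"gc: collected {collected} objects, {len(gc.garbage)} uncollectable")
del gc.garbage[:]
```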
Possible solution: just give up on the VRAM usage optimization. Adaptive softmax has to be used anyway with a large-vocabulary dataset.
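For a WikiText-103-sized vocabulary, PyTorch's built-in `nn.AdaptiveLogSoftmaxWithLoss` can replace the plain `nn.CrossEntropyLoss`. A rough sketch; the cutoffs and sizes here are illustrative, not tuned:

```python
import torch
import torch.nn as nn

vocab_size, hidden_size, n_tokens = 267735, 512, 64   # WikiText-103 has a ~267k vocab
adaptive = nn.AdaptiveLogSoftmaxWithLoss(
    hidden_size, vocab_size, cutoffs=[20000, 60000, 180000])

hidden_states = torch.randn(n_tokens, hidden_size)    # flattened (batch*seq, hidden) RNN outputs
targets = torch.randint(vocab_size, (n_tokens,))
out = adaptive(hidden_states, targets)                # namedtuple with .output and .loss
out.loss.backward()
```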
After brute-force memory footprint logging of the forward pass, there are occasional (random?) ~0.3 MB jumps during dropout, linear, and attend_over_memory() calls. Closing this issue since it seems to be a PyTorch bug itself (reference cycle? fragmentation? not sure) and there's not much I can do.
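For anyone who wants to reproduce the measurement, a sketch of one way to do that kind of per-call RSS logging; `psutil` and the helper name below are assumptions, not the code that was actually run:

```python
import os
import psutil  # assumption: any process-RSS probe works here

_proc = psutil.Process(os.getpid())

def log_rss(tag):
    """Print the current resident set size of the process in MB."""
    print(f"{tag}: {_proc.memory_info().rss / 1024 ** 2:.2f} MB")

# Hypothetical placement inside forward():
# log_rss("before attend_over_memory")
# memory = self.attend_over_memory(memory)
# log_rss("after attend_over_memory")
```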