fairseq multihead_attention, torch.cat cause RuntimeError: CUDA out of memory #83

Fizzbb · 2022-01-25T15:28:56Z

In Transformer training (validation phase),
NVIDIA's fairseq multihead_attention file differs from Facebook's latest version.
Line 354 of the NVIDIA file causes OOM errors at the end of an epoch, when the BLEU score is calculated for validation.
https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/Translation/Transformer/fairseq/modules/multihead_attention.py

https://github.com/pytorch/fairseq/blob/main/fairseq/modules/multihead_attention.py (line 418)

File "/workspace/translation/fairseq/modules/multihead_attention, in forward
v = torch.cat((saved_state['prev_value'], v), dim=0)
RuntimeError: CUDA out of memory. Tried to allocate 314.00 MiB (GP total capacity; 8.63 GiB already allocated; 11.00 MiB free; 11.19in total by PyTorch)
Killing subprocess 558
Killing subprocess 559
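
For context, here is a rough sketch of why the `torch.cat` on the cached keys and values can run out of memory during generation. It assumes the usual incremental-decoding layout, where each decoder layer keeps a key tensor and a value tensor that grow by one time step per decoded token. The function names and the example sizes (batch, beam, embedding dimension, layer count) are hypothetical and are not taken from the run above.

```python
def kv_cache_bytes(steps, batch, beam, embed_dim, layers, bytes_per_elem=2):
    """Approximate bytes held by the self-attention key/value cache.

    Assumes one key and one value tensor per decoder layer, each holding
    steps * (batch * beam) * embed_dim elements (heads * head_dim == embed_dim).
    """
    tensors = 2 * layers  # key + value for every layer
    return tensors * steps * batch * beam * embed_dim * bytes_per_elem


def cat_alloc_bytes(step, batch, beam, embed_dim, bytes_per_elem=2):
    """Bytes torch.cat has to allocate for ONE cache tensor at a given step.

    torch.cat returns a new tensor, so the old cache and the new, longer
    cache briefly exist at the same time.
    """
    return step * batch * beam * embed_dim * bytes_per_elem


if __name__ == "__main__":
    MiB = 2 ** 20
    # Hypothetical validation setting: 128 sentences, beam 4, fp16, 6 layers.
    total = kv_cache_bytes(steps=200, batch=128, beam=4, embed_dim=1024, layers=6)
    single = cat_alloc_bytes(step=200, batch=128, beam=4, embed_dim=1024)
    print(f"cache after 200 steps: {total / MiB:.0f} MiB")
    print(f"one torch.cat at step 200 allocates: {single / MiB:.0f} MiB")
```

Under these assumptions, the cache grows linearly with the number of decoded tokens, and every step needs a fresh contiguous allocation on top of what is already held. If this is what happens here, a smaller validation batch or a shorter maximum generation length would reduce the peak.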
