
CUDA Out of Memory Error Even with small batch size and embedding size. #4

Closed
DennisLiu94 opened this issue Apr 19, 2018 · 5 comments

@DennisLiu94

I'm running the code on a machine with Python 3.6, PyTorch 0.3.1, a K80 GPU, and CUDA 8.0, as described in the README.txt.

CUDA_VISIBLE_DEVICES=1 python3 train.py --src ~/IWT15/mono/euro.tc.en --trg ~/IWT15/mono/euro.tc.de --src_embeddings vecmap/data/euro.tc40.en.map --trg_embeddings vecmap/data/euro.tc40.de.map --save eurotc40_en2de --cuda

I get the following fatal error:

RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1522182087074/work/torch/lib/THC/generic/THCStorage.cu:58

I'm sure mine is the only process on that GPU, and it requires more than 12GB of memory. I tried very small bilingual word embeddings (17MB for the source language and 40MB for the target language) and batch_size=2, but the error still occurs.

Do you have any solutions or insights?

Thank you.

PS: the program runs smoothly on CPU.

@DennisLiu94
Author

After I capped the sentence length in the training data, the program runs smoothly. With the default settings and 300-dimensional word embeddings, it requires 7GB of GPU memory for a training corpus with a maximum length of 10 words.

Is there anything wrong with my setup, or is cutting the sentences truly necessary?
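
For reference, the length cut itself is just a preprocessing step over each monolingual training file. Below is a minimal sketch of what I did; the file names and the 10-word limit are only examples, not the exact paths from my command:

```python
# Keep only sentences with at most MAX_LEN whitespace-separated tokens.
# Applied to each monolingual training file separately (example paths).
MAX_LEN = 10

def filter_by_length(in_path, out_path, max_len=MAX_LEN):
    with open(in_path, encoding='utf-8') as fin, \
         open(out_path, 'w', encoding='utf-8') as fout:
        for line in fin:
            if len(line.split()) <= max_len:
                fout.write(line)

filter_by_length('euro.tc.en', 'euro.tc.max10.en')
filter_by_length('euro.tc.de', 'euro.tc.max10.de')
```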

@stefan-it

stefan-it commented Apr 23, 2018

I ran into out-of-memory problems with a maximum sentence length of 5, even with batch size 1. Sentence lengths of 1 and 2 worked. I also used an embedding size of 300. Tested on a P100 (with 16GB) using PyTorch 0.3.1 and CUDA 9.

@DennisLiu94
Author

I found that the cause of my problem is the size of the vocabulary. Since I train on the English <-> German language pair, the German vocabulary is 90k, because I did not use BPE.

Cutting the vocabulary size solves the problem.
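
In case it is useful to anyone else: if the training vocabulary is taken from the embedding files (as it seems to be in my setup), one way to cap it is to trim the mapped embeddings to their most frequent entries before training. A rough sketch is below; it assumes the files are in word2vec text format (a "count dim" header followed by one vector per line, sorted by frequency), and the paths are just placeholders:

```python
# Trim a word2vec-style text embedding file to its first K entries,
# assuming entries are sorted by corpus frequency (placeholder paths).
K = 50000

def trim_embeddings(in_path, out_path, k=K):
    with open(in_path, encoding='utf-8') as fin, \
         open(out_path, 'w', encoding='utf-8') as fout:
        count, dim = fin.readline().split()
        kept = min(k, int(count))
        fout.write('{} {}\n'.format(kept, dim))
        for i, line in enumerate(fin):
            if i >= kept:
                break
            fout.write(line)

trim_embeddings('euro.tc40.de.map', 'euro.tc40.de.50k.map')
```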

@artetxem
Owner

That makes sense; if you use a large vocabulary, you are likely to run into out-of-memory problems. For the sake of clarity, we worked with a vocabulary size of 50k.
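
For intuition, a rough back-of-the-envelope sketch (my own illustrative numbers, not taken from the code): the decoder's output logits alone form a batch_size × sentence_length × vocabulary_size tensor of float32, so vocabulary size and sentence length multiply each other, and gradients, softmax activations and optimizer state come on top of that.

```python
# Rough size of the decoder output logits tensor in GB (float32 only,
# ignoring gradients, softmax activations and optimizer state).
def logits_memory_gb(batch_size, sent_len, vocab_size, bytes_per_float=4):
    return batch_size * sent_len * vocab_size * bytes_per_float / 1024 ** 3

print(logits_memory_gb(50, 50, 90000))  # ~0.84 GB with a 90k vocabulary
print(logits_memory_gb(50, 50, 50000))  # ~0.47 GB with a 50k vocabulary
```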

@stefan-it

Thanks @DennisLiu94 and @artetxem: training is now working :)
