
CUDA Out of Memory Error Even with small batch size and embedding size. #4

Closed
DennisLiu94 opened this issue Apr 19, 2018 · 5 comments

@DennisLiu94

I'm running the code on a machine with Python 3.6, PyTorch 0.3.1, a K80 GPU, and CUDA 8.0, as described in the README.txt.

CUDA_VISIBLE_DEVICES=1 python3 train.py --src ~/IWT15/mono/euro.tc.en --trg ~/IWT15/mono/euro.tc.de --src_embeddings vecmap/data/euro.tc40.en.map --trg_embeddings vecmap/data/euro.tc40.de.map --save eurotc40_en2de --cuda

I get the following fatal error:

RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1522182087074/work/torch/lib/THC/generic/THCStorage.cu:58

I'm sure mine is the only process on that GPU, and it requires more than 12GB of memory. I tried very small bilingual word embeddings (17MB for the source language and 40MB for the target language) and batch_size=2, but the error still occurs.

Do you have any solutions or insights?

Thank you.

PS: the program runs smoothly on CPU.

@DennisLiu94
Author

After I capped the sentence length in the training data, the program runs smoothly. With the default settings and 300-dimensional word embeddings, it requires 7GB of GPU memory for a training corpus with a maximum length of 10 words.

Is there anything wrong with my setup, or is cutting the sentences truly necessary?
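
For reference, the length cut itself is just a preprocessing step over each monolingual training file. Below is a minimal sketch of what I did; the file names and the 10-word limit are only examples, not the exact paths from my command:

```python
# Keep only sentences with at most MAX_LEN whitespace-separated tokens.
# Applied to each monolingual training file separately (example paths).
MAX_LEN = 10

def filter_by_length(in_path, out_path, max_len=MAX_LEN):
    with open(in_path, encoding='utf-8') as fin, \
         open(out_path, 'w', encoding='utf-8') as fout:
        for line in fin:
            if len(line.split()) <= max_len:
                fout.write(line)

filter_by_length('euro.tc.en', 'euro.tc.max10.en')
filter_by_length('euro.tc.de', 'euro.tc.max10.de')
```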

@stefan-it

stefan-it commented Apr 23, 2018

I ran into out-of-memory problems with a maximum sentence length of 5, even with batch size 1. Sentence lengths of 1 and 2 worked. I also used an embedding size of 300. Tested on a P100 (with 16GB) using PyTorch 0.3.1 and CUDA 9.

@DennisLiu94
Author

I found that the cause of my problem is the size of the vocabulary. Since I train on the English <-> German language pair, the German vocabulary is 90k, because I did not use BPE.

Cutting the vocabulary size solves the problem.
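
In case it is useful to anyone else: if the training vocabulary is taken from the embedding files (as it seems to be in my setup), one way to cap it is to trim the mapped embeddings to their most frequent entries before training. A rough sketch is below; it assumes the files are in word2vec text format (a "count dim" header followed by one vector per line, sorted by frequency), and the paths are just placeholders:

```python
# Trim a word2vec-style text embedding file to its first K entries,
# assuming entries are sorted by corpus frequency (placeholder paths).
K = 50000

def trim_embeddings(in_path, out_path, k=K):
    with open(in_path, encoding='utf-8') as fin, \
         open(out_path, 'w', encoding='utf-8') as fout:
        count, dim = fin.readline().split()
        kept = min(k, int(count))
        fout.write('{} {}\n'.format(kept, dim))
        for i, line in enumerate(fin):
            if i >= kept:
                break
            fout.write(line)

trim_embeddings('euro.tc40.de.map', 'euro.tc40.de.50k.map')
```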

@artetxem
Owner

That makes sense; if you use a large vocabulary, you are likely to run into out-of-memory problems. For the sake of clarity, we worked with a vocabulary size of 50k.
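
For intuition, a rough back-of-the-envelope sketch (my own illustrative numbers, not taken from the code): the decoder's output logits alone form a batch_size × sentence_length × vocabulary_size tensor of float32, so vocabulary size and sentence length multiply each other, and gradients, softmax activations and optimizer state come on top of that.

```python
# Rough size of the decoder output logits tensor in GB (float32 only,
# ignoring gradients, softmax activations and optimizer state).
def logits_memory_gb(batch_size, sent_len, vocab_size, bytes_per_float=4):
    return batch_size * sent_len * vocab_size * bytes_per_float / 1024 ** 3

print(logits_memory_gb(50, 50, 90000))  # ~0.84 GB with a 90k vocabulary
print(logits_memory_gb(50, 50, 50000))  # ~0.47 GB with a 50k vocabulary
```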

@stefan-it

Thanks @DennisLiu94 and @artetxem: training is now working :)
