
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle) (createCublasHandle at /pytorch/aten/src/ATen/cuda/CublasHandlePool.cpp:8) #5

Open
lhuang9703 opened this issue Jul 5, 2020 · 2 comments

@lhuang9703

Hi, when I run your code with python cbert_finetune.py, I get the following error:

Traceback (most recent call last):
File "cbert_finetune.py", line 168, in
main()
File "cbert_finetune.py", line 151, in main
loss.backward()
File "/home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/autograd/init.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle) (createCublasHandle at /pytorch/aten/src/ATen/cuda/CublasHandlePool.cpp:8)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7f409fe5f536 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: + 0xf67ee5 (0x7f40a1222ee5 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #2: at::cuda::getCurrentCUDABlasHandle() + 0x94c (0x7f40a1223ccc in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #3: + 0xf5d5e1 (0x7f40a12185e1 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0x14079bd (0x7f40a16c29bd in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #5: THCudaTensor_addmm + 0x5c (0x7f40a16cc56c in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0x1053a08 (0x7f40a130ea08 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xf76dc8 (0x7f40a1231dc8 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #8: + 0x10c3ec0 (0x7f40dd807ec0 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #9: + 0x2c9b6fe (0x7f40df3df6fe in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #10: + 0x10c3ec0 (0x7f40dd807ec0 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #11: at::Tensor::mm(at::Tensor const&) const + 0xf0 (0x7f40dd3cbb70 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #12: + 0x28e6b6c (0x7f40df02ab6c in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::generated::MmBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x151 (0x7f40df02b971 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #14: + 0x2d89c05 (0x7f40df4cdc05 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x7f40df4caf03 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7f40df4cbce2 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::Engine::thread_init(int) + 0x39 (0x7f40df4c4359 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #18: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7f40ebc034d8 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #19: + 0xb8408 (0x7f40ecab7408 in /home1/wxzuo/anaconda3/lib/libstdc++.so.6)
frame #20: + 0x7e25 (0x7f41212c8e25 in /lib64/libpthread.so.0)
frame #21: clone + 0x6d (0x7f41206e1bad in /lib64/libc.so.6)

Here is my environment:
Package Version


certifi 2020.4.5.2
chardet 3.0.4
click 7.1.2
ConfigArgParse 1.2.3
cycler 0.10.0
Cython 3.0a5
dataclasses 0.7
decorator 4.1.2
dgl 0.4.3.post2
filelock 3.0.12
future 0.18.2
idna 2.9
joblib 0.15.1
kiwisolver 1.2.0
matplotlib 3.2.2
networkx 2.1
nltk 3.5
numpy 1.13.3
packaging 20.4
pandas 1.0.4
Pillow 7.1.2
pip 20.1.1
psutil 5.7.0
pycocotools 2.0
pyparsing 2.4.7
python-dateutil 2.8.1
pytz 2020.1
regex 2020.6.8
requests 2.23.0
sacremoses 0.0.43
scikit-learn 0.23.1
scipy 1.4.1
sentencepiece 0.1.91
setuptools 36.4.0
six 1.15.0
sklearn 0.0
stanfordcorenlp 3.9.1.1
threadpoolctl 2.1.0
tokenizers 0.7.0
torch 1.5.0
torchtext 0.6.0
torchvision 0.6.0
tqdm 4.46.1
transformers 2.11.0
urllib3 1.25.9
wheel 0.29.0

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

Could you please tell me how to solve this problem? Thanks.

@smolPixel

It took a lot of attempts, but you need to use transformers==2.1.1 for it to work.
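
(For reference, a quick way to confirm which transformers version is actually active in the environment; this check is just an illustrative sketch, not part of the repo.)

```python
# Illustrative check: confirm the installed transformers version matches the
# 2.1.1 pin reported to work in this thread (pip install transformers==2.1.1).
import transformers

print(transformers.__version__)  # expect "2.1.1"
```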

@wangcongcong123

If you want to use the latest transformers, just change original_masked_lm_labels = [-1] * max_seq_length on line 200 of cbert_utils.py to original_masked_lm_labels = [-100] * max_seq_length, as sketched below. Then you're good to go.
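
For clarity, a minimal sketch of that change (the line number and variable name come from cbert_utils.py as cited above; the max_seq_length value below is only a placeholder, since in the repo it comes from the script's configuration):

```python
# Minimal sketch of the fix described above. max_seq_length is a placeholder
# here; in cbert_utils.py it is the sequence length configured for fine-tuning.
max_seq_length = 128

# Before (works with transformers==2.1.1, where -1 marked ignored label positions):
# original_masked_lm_labels = [-1] * max_seq_length

# After (newer transformers feed the labels to torch.nn.CrossEntropyLoss,
# whose default ignore_index is -100):
original_masked_lm_labels = [-100] * max_seq_length
```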
