cuda pypi : RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_INITIALIZED #311

Closed
PAOPAO6 opened this issue Nov 11, 2020 · 10 comments · Fixed by #335

Comments

@PAOPAO6

PAOPAO6 commented Nov 11, 2020

Running translator.translate_batch(...) raises this error: RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_INITIALIZED
Environment:
Driver Version: 418.87.01, CUDA Version: 10.1

@guillaumekln
Collaborator

I can't reproduce this error. Maybe your GPU is running out of memory because another process is using it?

If not, you need to give more information: OS version, GPU model, exact code that you executed, etc.
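For anyone collecting this information, here is a minimal sketch of one way to capture the GPU model, memory usage, and driver version in one go. It assumes `nvidia-smi` is on the PATH; the helper names `parse_gpu_rows` and `query_gpus` are hypothetical and not part of CTranslate2.

```python
import subprocess

# Fields requested from nvidia-smi, in the order they appear in each CSV row.
QUERY_FIELDS = ["index", "name", "memory.used", "memory.total", "driver_version"]


def parse_gpu_rows(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        values = [field.strip() for field in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the expected layout
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi and return one dict per visible GPU (memory in MiB)."""
    output = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_gpu_rows(output)
```

Calling `query_gpus()` inside the container and pasting the result into the issue shows whether another process is holding memory and which driver version is actually in use.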

@PAOPAO6
Author

PAOPAO6 commented Nov 12, 2020

> I can't reproduce this error. Maybe your GPU is running out of memory because another process is using it?
>
> If not, you need to give more information: OS version, GPU model, exact code that you executed, etc.

Thank you very much...

OS version: Docker container running CentOS Linux release 7.2.1511.
The container has 3 GPUs (V100) with no other tasks running, and there is enough memory.

The GPU model was generated with this command:
ct2-opennmt-py-converter --model_path model_transformer_zh_en_step_1000000.pt --model_spec TransformerBaseRelative --output_dir transformer_zh_en_step_1000000_ct2_f16 --force --quantization float16

The model was tested successfully on CPU.

code:

import ctranslate2

# SpmSegment and elapsed_timer are the author's own helpers (not shown in the issue).
sm = SpmSegment()
sm.load('spm_model_en_zh_32768.model')
translator = ctranslate2.Translator("models/transformer_zh_en_step_1000000_ct2_f16/", device='cuda')
print('Input En:')
while True:
    x = input()
    if x:
        with elapsed_timer() as elapsed:
            # Encode the input into SentencePiece ids and pass them as string tokens.
            text_bytes = list(map(str, sm.sp.encode(x, out_type=int)))
            # The next call raises: cuBLAS failed with status CUBLAS_STATUS_NOT_INITIALIZED
            res = translator.translate_batch([text_bytes, text_bytes])
            print(sm.sp.decode(list(map(int, res[0][0]['tokens']))))
            print('Elapsed: {}s'.format('%.6f' % elapsed()))

@guillaumekln
Collaborator

Do other GPU applications work on your cards?

@PAOPAO6
Author

PAOPAO6 commented Nov 12, 2020

> Do other GPU applications work on your cards?

None are running. GPU status:
[image: GPU status]

@guillaumekln
Collaborator

I mean can you test another GPU application, such as TensorFlow, and verify that it works?

@PAOPAO6
Author

PAOPAO6 commented Nov 12, 2020

> I mean can you test another GPU application, such as TensorFlow, and verify that it works?

tensor2tensor code runs fine.

@guillaumekln
Collaborator

Can you check whether the error occurs or not in the following cases:

  • When running outside the Docker container.
  • When using one of the CTranslate2 Docker images.
  • When using the pretrained Transformer model (see download and conversion instructions in the Quickstart).

@guillaumekln
Collaborator

The cuBLAS library included in the published Python wheels was incorrect. It targeted CUDA 10.2 instead of CUDA 10.1. So I believe the error happened when the driver version is older than 440.33.

This is fixed and released in version 1.16.2.
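To check whether a given setup falls into the affected range, here is a rough sketch that compares a driver version against 440.33 and a package version against 1.16.2, the two thresholds mentioned in this comment. The function names are hypothetical, and the parser assumes purely numeric dotted versions (a suffix such as `.post1` would need extra handling).

```python
def parse_version(text):
    """Turn a dotted numeric version such as '418.87.01' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Thresholds taken from the comment above.
MIN_DRIVER_FOR_BUNDLED_CUBLAS = parse_version("440.33")
FIXED_CTRANSLATE2_VERSION = parse_version("1.16.2")


def driver_is_too_old(driver_version):
    """True if the driver predates 440.33, i.e. the range believed to be affected."""
    return parse_version(driver_version) < MIN_DRIVER_FOR_BUNDLED_CUBLAS


def has_fix(package_version):
    """True if the installed ctranslate2 version is 1.16.2 or newer."""
    return parse_version(package_version) >= FIXED_CTRANSLATE2_VERSION
```

With the driver from the original report, `driver_is_too_old("418.87.01")` is true, which matches the diagnosis. On Python 3.8 or newer, the installed package version can be read with `importlib.metadata.version("ctranslate2")` and passed to `has_fix`.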

@PAOPAO6
Author

PAOPAO6 commented Nov 30, 2020

> The cuBLAS library included in the published Python wheels was incorrect. It targeted CUDA 10.2 instead of CUDA 10.1. So I believe the error happened when the driver version is older than 440.33.
>
> This is fixed and released in version 1.16.2.

Thank you very much...

@PAOPAO6
Author

PAOPAO6 commented Nov 30, 2020

> The cuBLAS library included in the published Python wheels was incorrect. It targeted CUDA 10.2 instead of CUDA 10.1. So I believe the error happened when the driver version is older than 440.33.
>
> This is fixed and released in version 1.16.2.

I tried version 1.16.2, and the problem is fixed.
