Thanks for your reply. The GPT model does not hit the error on the latest main branch or on the nccl_dependent_refine branch, but with fp16 on nccl_dependent_refine, XGLM still raises the error. "FT has supported XGLM" is great news! Where is the FT XGLM model code? I can't find it, and I would appreciate it if you could point me to it so I can compare it with the GPT model.
Branch/Tag/Commit
nccl_dependent_refine
Docker Image Version
.
GPU name
V100
CUDA Driver
CUDA 10.1
Reproduced Steps
For fp16, the error is raised at this line of code: https://github.com/NVIDIA/FasterTransformer/blob/nccl_dependent_refine/fastertransformer/open_decoder.h#L735; for fp32, there is no problem.
The error is the same as in #181 (CUDA runtime error: CUBLAS_STATUS_EXECUTION_FAILED for fp16), but my model weights are pretrained, so I cannot modify them.
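In case it helps localize the failure: a common first debugging step is to wrap each cuBLAS call in an explicit status check so the failing GEMM and its location are reported immediately rather than surfacing later as a generic CUDA runtime error. Below is only a sketch of that pattern, not FasterTransformer's actual code; the `CUBLAS_CHECK` macro and the `gemm_fp16` wrapper are names I made up for illustration.

```cpp
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: print the cuBLAS status code and the exact
// file/line of the failing call, then abort.
#define CUBLAS_CHECK(call)                                          \
  do {                                                              \
    cublasStatus_t s_ = (call);                                     \
    if (s_ != CUBLAS_STATUS_SUCCESS) {                              \
      fprintf(stderr, "cuBLAS error %d at %s:%d\n", (int)s_,        \
              __FILE__, __LINE__);                                  \
      exit(EXIT_FAILURE);                                           \
    }                                                               \
  } while (0)

// Example fp16 GEMM via cublasGemmEx (column-major, no transpose).
// A, B, C are device pointers; dimensions: A is m x k, B is k x n.
void gemm_fp16(cublasHandle_t handle, int m, int n, int k,
               const __half* A, const __half* B, __half* C) {
  const __half alpha = __float2half(1.0f);
  const __half beta  = __float2half(0.0f);
  CUBLAS_CHECK(cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                            m, n, k,
                            &alpha,
                            A, CUDA_R_16F, m,
                            B, CUDA_R_16F, k,
                            &beta,
                            C, CUDA_R_16F, m,
                            CUDA_R_16F, CUBLAS_GEMM_DEFAULT_TENSOR_OP));
}
```

With a check like this at the call site, the first failing GEMM (and its m/n/k and leading dimensions) becomes visible, which makes it easier to tell whether the fp16 path is hitting an unsupported shape/alignment for the Tensor Core algorithm on V100 or some other problem.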
Thank you very much for helping me.