CUDA runtime error: CUBLAS_STATUS_EXECUTION_FAILED for fp16 #181

@bharatv007

Description

While benchmarking a GPT-3-style 6-billion-parameter model for sampling, I get the following error when running with fp16:
python ./pytorch/gpt_sample.py --output_len=100 --time --max_batch_size=1 --max_seq_len=2048 --layer_num=28 --head_num=32 --size_per_head=128 --fp16

Traceback (most recent call last):
File "./pytorch/gpt_sample.py", line 167, in <module>
main()
File "./pytorch/gpt_sample.py", line 138, in main
tokens_batch = gpt(start_ids, start_lengths, attn_mask)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
result = self.forward(*input, **kwargs)
File "/workspace/FasterTransformer/build/pytorch/utils/gpt.py", line 210, in forward
output_ids, = self.model.forward(start_ids, start_lengths, attn_mask, self.output_len)
RuntimeError: [FT][ERROR] CUDA runtime error: CUBLAS_STATUS_EXECUTION_FAILED /workspace/FasterTransformer/fastertransformer/utils/functions.h:674

The error does not occur when I run without the fp16 flag (the printed arguments below show fp16: False):
python ./pytorch/gpt_sample.py --output_len=1900 --time --max_batch_size=1 --sample_input_file=" " --max_seq_len=2048 --layer_num=28 --head_num=32 --size_per_head=128

=============== Arguments ===============
layer_num: 28
output_len: 128
head_num: 32
size_per_head: 128
vocab_size: 50304
top_k: 1
top_p: 0.0
temperature: 1.0
tensor_para_size: 1
layer_para_size: 1
layer_para_batch_size: 1
ckpt_path: ../models/megatron-models/c-model/345m/1-gpu
lib_path: ./lib/libpyt_fastertransformer.so
vocab_file: ../models/gpt2-vocab.json
merges_file: ../models/gpt2-merges.txt
start_id: 50256
end_id: 50256
max_batch_size: 1
max_seq_len: 1024
fp16: False
time: True
sample_input_file:
sample_output_file: None
=========================================
[INFO] batch size: 1
[WARNING] decoding_gemm_config.in is not found (repeated 10 times)
[INFO] GPT time costs: 2432.03 ms
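The repeated warning above suggests the GEMM algorithm-selection config was never generated, so cuBLAS falls back to default algorithm choices. FasterTransformer ships a GEMM tuning binary in the build directory that writes decoding_gemm_config.in; a sketch of running it is below. Note that the binary name and the argument order are assumptions here (they vary across FasterTransformer releases, e.g. decoding_gemm vs. gpt_gemm), so check the docs for the version you built.

```shell
# Run from the FasterTransformer build directory so that the generated
# decoding_gemm_config.in lands where gpt_sample.py looks for it.
# Hypothetical argument order (verify against your release's README):
# batch_size beam_width head_num size_per_head vocab_size seq_len is_fp16
cd /workspace/FasterTransformer/build
./bin/decoding_gemm 1 1 32 128 50304 2048 1   # 1 = fp16, 0 = fp32
```

Regenerating the config for the fp16 case is also a useful diagnostic: if the tuning run itself fails on an fp16 GEMM, that points at the GPU/driver/cuBLAS combination in the container rather than at gpt_sample.py.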

System Configuration

I am using the PyTorch NGC container nvcr.io/nvidia/pytorch:20.12-py3.
