Description
When trying to benchmark a 6-billion-parameter GPT-3 model for sampling, I get the following error when using fp16:
python ./pytorch/gpt_sample.py --output_len=100 --time --max_batch_size=1 --max_seq_len=2048 --layer_num=28 --head_num=32 --size_per_head=128 --fp16
Traceback (most recent call last):
File "./pytorch/gpt_sample.py", line 167, in
main()
File "./pytorch/gpt_sample.py", line 138, in main
tokens_batch = gpt(start_ids, start_lengths, attn_mask)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
result = self.forward(*input, **kwargs)
File "/workspace/FasterTransformer/build/pytorch/utils/gpt.py", line 210, in forward
output_ids, = self.model.forward(start_ids, start_lengths, attn_mask, self.output_len)
RuntimeError: [FT][ERROR] CUDA runtime error: CUBLAS_STATUS_EXECUTION_FAILED /workspace/FasterTransformer/fastertransformer/utils/functions.h:674
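For context (this is an illustration, not part of the original report): CUBLAS_STATUS_EXECUTION_FAILED on half-precision GEMMs is often an environment or GPU-architecture issue, but fp16's narrow numeric range is also worth keeping in mind when a model only fails under the fp16 path. A minimal numpy sketch of float16 limits (hypothetical diagnostic code, unrelated to FasterTransformer internals):

```python
import numpy as np

# float16 can represent magnitudes only up to 65504;
# larger values overflow to inf.
fp16_max = np.finfo(np.float16).max   # 65504.0
overflowed = np.float16(70000.0)      # rounds to inf

# A reduction that is fine in float32 overflows when
# naively accumulated in float16: 8 * (100 * 100) = 80000.
v = np.full(8, 100.0, dtype=np.float16)
acc32 = np.dot(v.astype(np.float32), v.astype(np.float32))  # 80000.0

acc16 = np.float16(0)
for x in v:
    acc16 = np.float16(acc16 + x * x)  # overflows to inf partway through

print(fp16_max, overflowed, acc32, acc16)
```

This is why mixed-precision GEMM implementations typically accumulate in float32 even when inputs are float16; it does not by itself explain a cuBLAS execution failure, which more commonly points at the GPU/toolkit combination in the container.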
The error does not occur when I run without the fp16 flag:
python ./pytorch/gpt_sample.py --output_len=1900 --time --max_batch_size=1 --sample_input_file=" " --max_seq_len=2048 --layer_num=28 --head_num=32 --size_per_head=128 --fp16
=============== Arguments ===============
layer_num: 28
output_len: 128
head_num: 32
size_per_head: 128
vocab_size: 50304
top_k: 1
top_p: 0.0
temperature: 1.0
tensor_para_size: 1
layer_para_size: 1
layer_para_batch_size: 1
ckpt_path: ../models/megatron-models/c-model/345m/1-gpu
lib_path: ./lib/libpyt_fastertransformer.so
vocab_file: ../models/gpt2-vocab.json
merges_file: ../models/gpt2-merges.txt
start_id: 50256
end_id: 50256
max_batch_size: 1
max_seq_len: 1024
fp16: False
time: True
sample_input_file:
sample_output_file: None
=========================================
[INFO] batch size: 1
[WARNING] decoding_gemm_config.in is not found
(the warning above is repeated 10 times)
[INFO] GPT time costs: 2432.03 ms
System Configuration
I am using the PyTorch NGC container: nvcr.io/nvidia/pytorch:20.12-py3