Thanks for your reply. The GPT model does not hit the error on the latest main branch or on the nccl_dependent_refine branch, but with fp16 on nccl_dependent_refine, XGLM still raises the error. "FT has supported XGLM" is great news! Where is the FT XGLM model code? I can't find it, and I would appreciate it if you could point me to it so I can compare it with the GPT model.
Branch/Tag/Commit
nccl_dependent_refine
Docker Image Version
.
GPU name
V100
CUDA Driver
CUDA 10.1
Reproduced Steps
For fp16, the error is raised at this line of code: https://github.com/NVIDIA/FasterTransformer/blob/nccl_dependent_refine/fastertransformer/open_decoder.h#L735; for fp32, there is no problem.
The error is the same as in #181 (CUDA runtime error: CUBLAS_STATUS_EXECUTION_FAILED for fp16), but my model weights are pretrained, so I cannot modify them.
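In case it helps localize the failure: a common first debugging step is to wrap each cuBLAS call in an explicit status check so the failing GEMM and its location are reported immediately rather than surfacing later as a generic CUDA runtime error. Below is only a sketch of that pattern, not FasterTransformer's actual code; the `CUBLAS_CHECK` macro and the `gemm_fp16` wrapper are names I made up for illustration.

```cpp
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: print the cuBLAS status code and the exact
// file/line of the failing call, then abort.
#define CUBLAS_CHECK(call)                                          \
  do {                                                              \
    cublasStatus_t s_ = (call);                                     \
    if (s_ != CUBLAS_STATUS_SUCCESS) {                              \
      fprintf(stderr, "cuBLAS error %d at %s:%d\n", (int)s_,        \
              __FILE__, __LINE__);                                  \
      exit(EXIT_FAILURE);                                           \
    }                                                               \
  } while (0)

// Example fp16 GEMM via cublasGemmEx (column-major, no transpose).
// A, B, C are device pointers; dimensions: A is m x k, B is k x n.
void gemm_fp16(cublasHandle_t handle, int m, int n, int k,
               const __half* A, const __half* B, __half* C) {
  const __half alpha = __float2half(1.0f);
  const __half beta  = __float2half(0.0f);
  CUBLAS_CHECK(cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                            m, n, k,
                            &alpha,
                            A, CUDA_R_16F, m,
                            B, CUDA_R_16F, k,
                            &beta,
                            C, CUDA_R_16F, m,
                            CUDA_R_16F, CUBLAS_GEMM_DEFAULT_TENSOR_OP));
}
```

With a check like this at the call site, the first failing GEMM (and its m/n/k and leading dimensions) becomes visible, which makes it easier to tell whether the fp16 path is hitting an unsupported shape/alignment for the Tensor Core algorithm on V100 or some other problem.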
Thank you very much for helping me.