Performance Degradation when using FP16 #175
Comments
Do you get the correct result under FP32? I tried to reproduce your problem but failed.
I don't use the FP16 checkpoint because the converter will convert it back to FP32.
@byshiue Thank you for your quick response!
I am trying to test the accuracy using FP32, but my objective is to use the FP16 checkpoint because …
Yes, I am aware that the tensors are converted to FP32 before saving.
You mean that FasterTransformer converts the FP32 parameters to FP16 dynamically in FP16 inference mode?
Yes. FT assumes that the checkpoint is always under FP32. If you set the inference to FP16, FT converts the weights to FP16 when loading the model.
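A minimal sketch of that load-time cast in PyTorch terms (FT presumably does this in its own weight loader, so this is only an illustration of the idea, reusing the file path from the reproduction steps below):

```python
import torch

# Illustration only: the checkpoint stays in FP32 on disk, and each
# floating-point tensor is cast to half precision when FP16 inference is used.
state_dict = torch.load("gpt-j/gpt-j.pt", map_location="cpu")  # FP32 weights
fp16_weights = {
    name: t.half() if torch.is_tensor(t) and t.is_floating_point() else t
    for name, t in state_dict.items()
}
```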
@byshiue Sure. I'll try to modify …
Which weights are needed to generate the sentences you generated?
What do you mean by "failed"? That you cannot convert the model, or that it cannot generate the correct result?
@byshiue "failed" means that the generated sentences are not correct, like here.
Oh, I see.
Sorry, I misunderstood something. As you say, FT's converter does not support the Hugging Face checkpoint, so if you want to load the Hugging Face model, you need to modify the converter. Thus, I think it is not a precision problem but a conversion problem.
GPT-J also has a different architecture from the vanilla decoder transformer. Does FasterTransformer support different architectures too?
FT supports GPT-J, the standard encoder-decoder, BERT, Longformer, and T5.
The difference between the Hugging Face and the original GPT-J-6B checkpoints is just the layer names, right?
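If the difference really is only the parameter names, the converter change could be little more than a key-renaming pass. A sketch, where every entry in the mapping is hypothetical and would have to be read off the two checkpoints:

```python
import torch

# Hypothetical mapping from Hugging Face GPT-J parameter names to the names
# the FT converter expects; the real table must be built by comparing the
# two checkpoints key by key.
RENAME = {
    "transformer.wte.weight": "wte.weight",  # placeholder target name
    # "transformer.h.0.attn.q_proj.weight": ..., and so on per layer
}

src = torch.load("pytorch_model.bin", map_location="cpu")
dst = {RENAME.get(name, name): tensor for name, tensor in src.items()}
torch.save(dst, "gpt-j/gpt-j.pt")
```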
You mean the following conversion script from …?
Hi, can you try the tag dev/v5.0_beta_2021.09_tag?
@byshiue |
Sorry, it seems there are some bugs in the latest code; I will fix them as soon as possible.
Information
I want to run the GPT-J model in FP16 precision (https://huggingface.co/EleutherAI/gpt-j-6B/tree/float16) on FasterTransformer + Triton, but I am having trouble with the accuracy.
For example, the following sentences are generated when running the sample scripts below with FasterTransformer.
sample scripts
https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/GPT-J-6B/Inference_with_GPT_J_6B.ipynb#scrollTo=RdOynYcY8jb1
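For context, the generation step in that reference notebook amounts to roughly the following (a sketch assuming the Hugging Face `transformers` GPT-J API; the prompt and generation length are placeholders, not taken from the notebook):

```python
import torch
from transformers import GPTJForCausalLM, AutoTokenizer

# Load the float16 revision of GPT-J-6B and sample with temperature 0.9,
# matching the setting used in the Triton config in the steps below.
model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids.to("cuda")
gen = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=64)
print(tokenizer.batch_decode(gen)[0])
```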
generated sentences
Environment
To reproduce
1. Download the GPT-J-6B model in float16 (`pytorch_model.bin`) from https://huggingface.co/EleutherAI/gpt-j-6B/tree/float16, rename it to `gpt-j.pt`, and store it as `gpt-j/gpt-j.pt` (see the sketch after this list).
2. Convert the PyTorch model to the FasterTransformer format via the following scripts on the docker image `nvcr.io/nvidia/pytorch:21.07-py3`.
3. Copy `config.pbtxt` to `triton-model-store/fastertransformer/config.pbtxt`. I note that I have changed `temperature` to `0.9` from the sample GPT-J config.
4. Launch the Triton server (with the `triton-model-store` directory).
5. Run `chat.py` via `python3 chat.py`.
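Since the thread above notes that FT's converter assumes FP32 checkpoints, step 1 can include upcasting the float16 weights before running the converter. A minimal sketch, assuming PyTorch and the file names used in the steps above:

```python
import torch

# The discussion above says the FT converter assumes FP32 checkpoints, so the
# float16 Hugging Face weights are upcast to FP32 before running the converter.
sd = torch.load("pytorch_model.bin", map_location="cpu")
sd = {
    k: (v.float() if torch.is_tensor(v) and v.dtype == torch.float16 else v)
    for k, v in sd.items()
}
torch.save(sd, "gpt-j/gpt-j.pt")
```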
Expected Behavior
The reference below generates accurate sentences.
Ref: https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/GPT-J-6B/Inference_with_GPT_J_6B.ipynb#scrollTo=RdOynYcY8jb1
output:
Related Issue
#172