Recent pull 23 generating jammed sentence output with new quantized neox20b 4bit model #26
Comments
If this is true, it's very strange. I've coded the result so that it doesn't change.
@GenTxt can you share your quantization code and model with us so that we can try to reproduce this and figure out what went wrong? You may also try the up-to-date commit on the main branch; it may solve your problem.
https://huggingface.co/kz919/gpt-neox-20b-8k-longtuning/tree/main
Converted the above to safetensors with the text generation webui script, then quantized with:
CUDA_VISIBLE_DEVICES="0" python quant_with_alpaca.py --pretrained_model_dir models/neox20b_8192_safe --quantized_model_dir 4bit_converted --bits 4 --group_size 128 --fast_tokenizer --save_and_reload
Old models were deleted, as the current triton kernel can cause errors on a refurbished 6000. For the specific code above, this error:
Quantized with the latest cuda main and not encountering the error. False alarm. Closing here and carefully testing each update. Thanks
Hi, I've also tried neox20b quantization. The inference speed I got is 16 tokens/s, which isn't fast enough. May I have your results?
#23
neox-20b 4bit models quantized with the above generate jammed sentences, as in the example below.
The smell of tobacco smoke in theseemingly ceaseless breeze which swept through during these conversationswas unmistakable evidence of his presence to anyone who heard that faintpunctual pungency overrode any other possible olfactory suggestion; butRobert could sense without really seeing more than once how he etc.
The previous main version, using the same seed, generates the above correctly as:
The smell of tobacco smoke in the seemingly ceaseless breeze which swept through during these conversations was unmistakable evidence of his presence to anyone who heard that faint punctual pungency overrode any other possible olfactory suggestion; but Robert could sense without really seeing more than once how he etc.
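To confirm that the two outputs differ only by dropped spaces (and not by different tokens), something like the following could be used. This is only a rough sketch and not part of the repo: `find_merged_tokens` is a hypothetical helper name, and it assumes both strings come from the same prompt and seed.

```python
def find_merged_tokens(expected: str, actual: str) -> list[tuple[str, list[str]]]:
    """List tokens in `actual` that are several words of `expected` run together.

    Hypothetical helper for illustration. Raises ValueError if the two
    texts differ by anything other than dropped spaces.
    """
    expected_words = expected.split()
    merged = []
    i = 0  # index of the next unconsumed word in expected_words
    for token in actual.split():
        parts: list[str] = []
        # Consume expected words until their concatenation equals the token.
        while "".join(parts) != token:
            if i >= len(expected_words) or len("".join(parts)) > len(token):
                raise ValueError(f"outputs differ beyond whitespace at {token!r}")
            parts.append(expected_words[i])
            i += 1
        if len(parts) > 1:
            merged.append((token, parts))
    if i != len(expected_words):
        raise ValueError("actual output is shorter than expected output")
    return merged


# Short excerpt from the two outputs quoted in this issue.
good = "in the seemingly ceaseless breeze"
jammed = "in theseemingly ceaseless breeze"
print(find_merged_tokens(good, jammed))
```

If every difference shows up as a merged token and no `ValueError` is raised, the generated tokens are likely the same and the problem is probably in how spaces are decoded rather than in the quantized weights, though that is only a guess from the symptom.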