nan when converting neox and opt models with AutoGPTQ-triton #8
Confirming similar nan results to the above on the main CUDA branch, using the same models plus gpt-neox-20b.
I saw you were using … Please let me know if the same problem still occurs when using …
Tested the `quant_with_alpaca.py` script above with the latest 0.3 version. I needed to change the following because of this error:

```
ValueError: Tokenizer class GPTNeoXTokenizer does not exist or is not currently imported.
```

Changed:

```python
parser.add_argument("--fast_tokenizer", action="store_true")
```

to:

```python
parser.add_argument("--fast_tokenizer", action="store_false")
```

and ran:

```shell
CUDA_VISIBLE_DEVICES="0" python quant_with_alpaca.py --pretrained_model_dir models/gpt-neox-20b --quantized_model_dir 4bit_converted/neox20b-4bit.safetensor
```

After the change, quantization proceeded without error and completed with the final examples from the script printed to the terminal. Unfortunately, the quantized model isn't saved to `--quantized_model_dir 4bit_converted/neox20b-4bit.safetensor`:

```
2023-04-24 15:03:20 INFO [auto_gptq.modeling._utils] Model packed.
The model 'GPTNeoXGPTQForCausalLM' is not supported for . Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
```

Tested 3x. The `4bit_converted` folder exists at the same level as the scripts and models. Am I missing a command to save the model to a local folder, or has it been saved to another default location? Thanks
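For reference, flipping `store_true` to `store_false` inverts the flag's default, so `--fast_tokenizer` is effectively always on without passing the flag. A minimal stdlib sketch of the difference (the flag name matches the snippet above; everything else is illustrative):

```python
import argparse

# With action="store_true", the flag defaults to False and passing it sets True.
p_true = argparse.ArgumentParser()
p_true.add_argument("--fast_tokenizer", action="store_true")
assert p_true.parse_args([]).fast_tokenizer is False
assert p_true.parse_args(["--fast_tokenizer"]).fast_tokenizer is True

# With action="store_false", the default becomes True, so the fast tokenizer
# is selected even when the flag is never passed -- which is why the change
# above works around the missing slow GPTNeoXTokenizer class.
p_false = argparse.ArgumentParser()
p_false.add_argument("--fast_tokenizer", action="store_false")
assert p_false.parse_args([]).fast_tokenizer is True
```

An alternative with the same effect would be leaving `store_true` in place and passing `--fast_tokenizer` on the command line.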
There are two things you should be aware of; maybe it's my bad for not making them clear in the example's README:
Thanks for the update. The model saved in `4bit_converted` in `.bin` format. The warning `The model 'GPTNeoXGPTQForCausalLM' is not supported for .` is still generated, but that's not a big deal. How do I save as safetensors? Will run again using:

```python
model.save_quantized(args.quantized_model_dir, use_safetensors=True)
```

Also, is there a simple inference script to use with the generated model above? Cheers
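As a quick way to confirm where the quantized artifact actually landed, one could scan the output directory for weight files. This is a hypothetical helper, not part of AutoGPTQ:

```python
from pathlib import Path

# Extensions AutoGPTQ may write depending on use_safetensors.
WEIGHT_SUFFIXES = (".bin", ".safetensors")

def is_weight_file(name):
    """True if the filename looks like a saved model weight file."""
    return Path(name).suffix in WEIGHT_SUFFIXES

def find_quantized_artifacts(output_dir):
    """Return sorted weight files (.bin / .safetensors) directly under output_dir."""
    root = Path(output_dir)
    return sorted(p for p in root.iterdir() if p.is_file() and is_weight_file(p.name))
```

Running `find_quantized_artifacts("4bit_converted")` after quantization would show whether the model was written as `.bin` or `.safetensors`.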
This is a warning thrown by `transformers`; it can safely be ignored.

I will consider writing one as soon as possible. In the meantime, you can use something like this:

```python
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer, TextGenerationPipeline

tokenizer_dir = "models/gpt-neox-20b"   # path to the original model's tokenizer
quantized_model_dir = "4bit_converted"  # directory containing the quantized model

text = "Hello, World!"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0")
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer, device="cuda:0")
generated_text = pipeline(text, return_full_text=False, num_beams=1, max_new_tokens=128)[0]['generated_text']
print(generated_text)
```
Thanks. Will work with the above and close the issue. Looking forward to the script. Cheers
Encountering nan for all conversions using AutoGPTQ-triton; will test the default CUDA version next.

Using Ubuntu 22.04, Python 3.10, transformers 4.28 (dev), 64 GB RAM, 2x RTX cards with 24 GB each. Installed successfully with all dependencies. Am I missing a particular package version?
Ran:

```shell
python basic_usage.py
```

with:

```python
pretrained_model_dir = "models/gpt-neox-20b"
quantized_model_dir = "4bit_converted"
```

Log output:

```
2023-04-22 13:29:42 INFO [auto_gptq.modeling._base] Quantizing attention.query_key_value in layer 1/44...
2023-04-22 13:29:47 INFO [auto_gptq.quantization.gptq] duration: 5.032277584075928
2023-04-22 13:29:47 INFO [auto_gptq.quantization.gptq] avg loss: 17.77143669128418
2023-04-22 13:29:47 INFO [auto_gptq.modeling._base] Quantizing attention.dense in layer 1/44...
2023-04-22 13:29:48 INFO [auto_gptq.quantization.gptq] duration: 1.7948594093322754
2023-04-22 13:29:48 INFO [auto_gptq.quantization.gptq] avg loss: 1.888306736946106
2023-04-22 13:29:49 INFO [auto_gptq.modeling._base] Quantizing mlp.dense_h_to_4h in layer 1/44...
2023-04-22 13:29:50 INFO [auto_gptq.quantization.gptq] duration: 1.8883254528045654
2023-04-22 13:29:50 INFO [auto_gptq.quantization.gptq] avg loss: 28.566619873046875
2023-04-22 13:29:50 INFO [auto_gptq.modeling._base] Quantizing mlp.dense_4h_to_h in layer 1/44...
2023-04-22 13:30:02 INFO [auto_gptq.quantization.gptq] duration: 11.343331575393677
2023-04-22 13:30:02 INFO [auto_gptq.quantization.gptq] avg loss: nan
2023-04-22 13:30:02 INFO [auto_gptq.modeling._base] Start quantizing layer 2/44
2023-04-22 13:30:02 INFO [auto_gptq.modeling._base] Quantizing attention.query_key_value in layer 2/44...
2023-04-22 13:30:04 INFO [auto_gptq.quantization.gptq] duration: 1.9044442176818848
2023-04-22 13:30:04 INFO [auto_gptq.quantization.gptq] avg loss: nan
```

etc.; stopped.
Similar results with opt-30b:
```python
pretrained_model_dir = "models/opt-30b"
quantized_model_dir = "4bit_converted"
```

```
Loading checkpoint shards: 100%|██████████████| 267/267 [13:52<00:00, 3.12s/it]
2023-04-22 13:55:41 INFO [auto_gptq.modeling._base] Start quantizing layer 1/44
2023-04-22 13:55:46 INFO [auto_gptq.modeling._base] Quantizing attention.query_key_value in layer 1/44...
2023-04-22 13:55:51 INFO [auto_gptq.quantization.gptq] duration: 4.748894453048706
2023-04-22 13:55:51 INFO [auto_gptq.quantization.gptq] avg loss: 17.623794555664062
2023-04-22 13:55:51 INFO [auto_gptq.modeling._base] Quantizing attention.dense in layer 1/44...
2023-04-22 13:55:53 INFO [auto_gptq.quantization.gptq] duration: 1.8472576141357422
2023-04-22 13:55:53 INFO [auto_gptq.quantization.gptq] avg loss: 1.9249645471572876
2023-04-22 13:55:53 INFO [auto_gptq.modeling._base] Quantizing mlp.dense_h_to_4h in layer 1/44...
2023-04-22 13:55:55 INFO [auto_gptq.quantization.gptq] duration: 1.9470229148864746
2023-04-22 13:55:55 INFO [auto_gptq.quantization.gptq] avg loss: 28.64271354675293
2023-04-22 13:55:55 INFO [auto_gptq.modeling._base] Quantizing mlp.dense_4h_to_h in layer 1/44...
2023-04-22 13:56:06 INFO [auto_gptq.quantization.gptq] duration: 11.630852222442627
2023-04-22 13:56:06 INFO [auto_gptq.quantization.gptq] avg loss: nan
2023-04-22 13:56:07 INFO [auto_gptq.modeling._base] Start quantizing layer 2/44
2023-04-22 13:56:07 INFO [auto_gptq.modeling._base] Quantizing attention.query_key_value in layer 2/44...
2023-04-22 13:56:09 INFO [auto_gptq.quantization.gptq] duration: 1.975161075592041
2023-04-22 13:56:09 INFO [auto_gptq.quantization.gptq] avg loss: nan
```

etc.; stopped.
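To avoid letting a long quantization run continue past the point where it has already gone bad, one could scan the log output for the first `avg loss: nan` line. This is a hypothetical helper that parses the log format shown above, not part of AutoGPTQ:

```python
import math
import re

# Matches the "avg loss: <value>" field in the AutoGPTQ log lines above.
AVG_LOSS_RE = re.compile(r"avg loss: (\S+)")

def first_nan_line(log_lines):
    """Return the index of the first line whose 'avg loss' is nan, or None."""
    for i, line in enumerate(log_lines):
        m = AVG_LOSS_RE.search(line)
        if m and math.isnan(float(m.group(1))):
            return i
    return None
```

For both logs above this would flag the `mlp.dense_4h_to_h` step of layer 1, i.e. the nan appears on the very first layer and then propagates to every subsequent one.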