Release v0.2.0 · AutoGPTQ/AutoGPTQ

Happy International Children's Day! 🎈 In the age of LLMs and at the dawn of AGI, may we always stay curious like children, with vigorous energy and the courage to explore a bright future.

Features Summary

A number of new features have been added in this version:

Optimized modules for faster inference: fused attention for llama and gptj, fused MLP for llama
Full CPU offloading
Multi-GPU inference with the triton backend
Three newly supported models: codegen, gpt_bigcode and falcon
Support for downloading quantized models from, and uploading them to, the HF Hub
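
As a rough illustration of the HF Hub download feature, the sketch below wraps `AutoGPTQForCausalLM.from_quantized` in a small helper. The helper name `load_quantized_from_hub` and its defaults are hypothetical, and it assumes that `from_quantized` accepts a Hub repo id in place of a local directory, along with the `device`, `use_triton` and `use_safetensors` keyword arguments; check the project README for the exact arguments in your installed version.

```python
def load_quantized_from_hub(repo_id, device="cuda:0", use_triton=False,
                            use_safetensors=False):
    """Hypothetical helper: load a quantized model from the HF Hub.

    Assumes auto_gptq is installed and that `repo_id` points at a Hub
    repository containing a GPTQ-quantized checkpoint.
    """
    # Imported lazily so that defining this helper does not require
    # auto_gptq (or a GPU) to be present.
    from auto_gptq import AutoGPTQForCausalLM

    return AutoGPTQForCausalLM.from_quantized(
        repo_id,                      # Hub repo id or local directory
        device=device,                # target device for the model
        use_triton=use_triton,        # select the triton backend if True
        use_safetensors=use_safetensors,
    )
```

Calling the helper needs the package, a suitable device, and a real repository id, so none is filled in here.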

Change Log

Below is the detailed change log:

Fix bug cuda by @qwopqwop200 in #44
Fix bug caused by 'groupsize' vs 'group_size' and change all code to use 'group_size' consistently by @TheBloke in #58
Setup conda by @Sciumo in #59
fix incorrect pack while using cuda, desc_act and grouping by @lszxb in #62
Faster llama by @qwopqwop200 in #43
Gptj fused attention by @PanQiWei in #76
Look for .pt files by @oobabooga in #79
Support users customize device_map by @PanQiWei in #80
Update example script to include desc_act by @Ph0rk0z in #82
Forward position args to allow model(tokens) syntax by @TheBloke in #84
Rename 'quant_cuda' to 'autogptq_cuda' to avoid conflicts with existing GPTQ-for-LLaMa installations. by @TheBloke in #93
fix ImportError when triton is not installed by @PanQiWei in #92
Fix CUDA out of memory error in qlinear_old.py by @LexSong in #66
Improve CPU offload by @PanQiWei in #100
triton float32 support by @qwopqwop200 in #104
Add support for CodeGen/2 by @LaaZa in #65
Add support for GPTBigCode(starcoder) by @LaaZa in #63
Minor syntax fix for auto.py by @billcai in #112
Falcon support by @qwopqwop200 in #111
Add support for HF Hub download, and push_to_hub by @TheBloke in #91
Add build wheels workflow by @PanQiWei in #120
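
To make the positional-argument change in #84 concrete, here is a minimal plain-Python sketch of the general pattern: a wrapper whose `forward` passes both positional and keyword arguments through to the inner model, so that `model(tokens)` and `model(input_ids=tokens)` behave the same. The names `ModelWrapper` and `toy_model` are hypothetical stand-ins, not the library's actual classes.

```python
class ModelWrapper:
    """Hypothetical wrapper that delegates calls to an inner model."""

    def __init__(self, model):
        self.model = model

    def forward(self, *args, **kwargs):
        # Forward positional as well as keyword arguments unchanged.
        return self.model(*args, **kwargs)

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


def toy_model(input_ids, attention_mask=None):
    """Stand-in for a real model: keeps the tokens that are not masked."""
    if attention_mask is None:
        attention_mask = [1] * len(input_ids)
    return [tok for tok, keep in zip(input_ids, attention_mask) if keep]


wrapped = ModelWrapper(toy_model)
tokens = [101, 2023, 102]

positional = wrapped(tokens)            # model(tokens) syntax
keyword = wrapped(input_ids=tokens)     # keyword-only syntax
masked = wrapped(tokens, attention_mask=[1, 0, 1])
```

A wrapper that accepted only `**kwargs` would raise a `TypeError` on the first call, which is the kind of failure the positional forwarding avoids.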

New Contributors

The following are new contributors and their first PRs. Thank you very much for your love of auto_gptq and your contributions! ❤️

@Sciumo made their first contribution in #59
@lszxb made their first contribution in #62
@oobabooga made their first contribution in #79
@Ph0rk0z made their first contribution in #82
@LexSong made their first contribution in #66
@LaaZa made their first contribution in #65
@billcai made their first contribution in #112

Full Changelog: v0.1.0...v0.2.0
