v0.2.0
Happy International Children's Day! 🎈 In the age of LLMs and at the dawn of AGI, may we always be curious like children, with vigorous energy and the courage to explore the bright future.
Features Summary
This version adds a number of new features:
- Optimized modules for faster inference speed: fused attention for llama and gptj, fused mlp for llama
- Full CPU offloading
- Multi-GPU inference with the triton backend
- Three new models are supported: codegen, gpt_bigcode and falcon
- Support for downloading quantized models from, and uploading them to, the HF Hub
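As a rough illustration of how the offloading and multi-GPU features might be used together, the sketch below assembles loading options and passes them to `AutoGPTQForCausalLM.from_quantized`. The helper names `build_load_kwargs` and `load_quantized` are hypothetical, and the `max_memory` layout (GPU indices plus a `"cpu"` key) and the `use_triton` flag are assumptions about the loader's keyword arguments; check them against the version you have installed.

```python
from typing import Any, Dict, Optional


def build_load_kwargs(
    gpu_memory: Dict[int, str],
    cpu_memory: Optional[str] = None,
    use_triton: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments for loading a quantized model.

    Hypothetical helper: `gpu_memory` maps GPU index to a memory budget,
    and `cpu_memory`, if given, allows layers to spill over to the CPU.
    """
    max_memory: Dict[Any, str] = dict(gpu_memory)
    if cpu_memory is not None:
        # A "cpu" entry is what permits offloading in this sketch.
        max_memory["cpu"] = cpu_memory
    return {"max_memory": max_memory, "use_triton": use_triton}


def load_quantized(model_dir: str, **load_kwargs: Any):
    """Load a quantized model; requires auto_gptq to be installed."""
    # Imported lazily so the helper above works without the library.
    from auto_gptq import AutoGPTQForCausalLM

    # Assumption: from_quantized accepts max_memory and use_triton.
    return AutoGPTQForCausalLM.from_quantized(model_dir, **load_kwargs)
```

For example, `load_quantized("path/to/model", **build_load_kwargs({0: "10GiB", 1: "10GiB"}, cpu_memory="30GiB", use_triton=True))` would request two GPUs with CPU spill-over, where the path and the memory budgets are placeholders.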
Change Log
Below is the detailed change log:
- Fix bug cuda by @qwopqwop200 in #44
- Fix bug caused by 'groupsize' vs 'group_size' and change all code to use 'group_size' consistently by @TheBloke in #58
- Setup conda by @Sciumo in #59
- fix incorrect pack while using cuda, desc_act and grouping by @lszxb in #62
- Faster llama by @qwopqwop200 in #43
- Gptj fused attention by @PanQiWei in #76
- Look for .pt files by @oobabooga in #79
- Support users customize device_map by @PanQiWei in #80
- Update example script to include desc_act by @Ph0rk0z in #82
- Forward position args to allow model(tokens) syntax by @TheBloke in #84
- Rename 'quant_cuda' to 'autogptq_cuda' to avoid conflicts with existing GPTQ-for-LLaMa installations. by @TheBloke in #93
- fix ImportError when triton is not installed by @PanQiWei in #92
- Fix CUDA out of memory error in qlinear_old.py by @LexSong in #66
- Improve CPU offload by @PanQiWei in #100
- triton float32 support by @qwopqwop200 in #104
- Add support for CodeGen/2 by @LaaZa in #65
- Add support for GPTBigCode(starcoder) by @LaaZa in #63
- Minor syntax fix for auto.py by @billcai in #112
- Falcon support by @qwopqwop200 in #111
- Add support for HF Hub download, and push_to_hub by @TheBloke in #91
- Add build wheels workflow by @PanQiWei in #120
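Because #58 standardizes on 'group_size', configuration dictionaries written with the older 'groupsize' spelling may need converting. The snippet below is a minimal sketch of such a conversion; `normalize_quantize_config` is a hypothetical helper written for illustration, not code from the library.

```python
from typing import Any, Dict


def normalize_quantize_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `raw` that uses 'group_size' instead of 'groupsize'.

    Hypothetical helper: if both spellings are present, the value already
    stored under 'group_size' is kept and the legacy key is dropped.
    """
    cfg = dict(raw)  # copy so the caller's dictionary is not modified
    if "groupsize" in cfg:
        legacy_value = cfg.pop("groupsize")
        cfg.setdefault("group_size", legacy_value)
    return cfg
```

Calling `normalize_quantize_config({"bits": 4, "groupsize": 128})` yields a dictionary with 'bits' and 'group_size' only.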
New Contributors
The following are new contributors and their first PRs. Thank you very much for your love of auto_gptq and for your contributions! ❤️
- @Sciumo made their first contribution in #59
- @lszxb made their first contribution in #62
- @oobabooga made their first contribution in #79
- @Ph0rk0z made their first contribution in #82
- @LexSong made their first contribution in #66
- @LaaZa made their first contribution in #65
- @billcai made their first contribution in #112
Full Changelog: v0.1.0...v0.2.0