Releases: AutoGPTQ/AutoGPTQ
v0.2.2: Patch Release
- Fix `autogptq_cuda` dir missing from the distribution file
v0.2.1: Patch Release
Fix installation from PyPI failing when the environment variable `CUDA_VERSION` is set.
v0.2.0
Happy International Children's Day! 🎈 At the age of LLMs and the dawn of AGI, may we always be curious like children, with vigorous energy and courage to explore the bright future.
Features Summary
A bunch of new features have been added in this version:
- Optimized modules for faster inference: fused attention for `llama` and `gptj`, fused MLP for `llama`
- Full CPU offloading
- Multi-GPU inference with the Triton backend
- Three new models are supported: `codegen`, `gpt_bigcode` and `falcon`
- Support downloading/uploading quantized models from/to the HF Hub
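The CPU offloading and multi-GPU features above both boil down to mapping the model's modules onto devices. As a rough illustration only (the function name and layer layout here are hypothetical, not AutoGPTQ internals; real `device_map` handling is delegated to the `accelerate` library), a device map is just a dict from module names to device strings:

```python
def split_layers(num_layers: int, devices: list) -> dict:
    """Hypothetical sketch: spread transformer layers evenly across devices.

    Not AutoGPTQ code -- shown only to illustrate the device_map shape,
    where offloaded modules can simply be assigned to "cpu".
    """
    per_device = -(-num_layers // len(devices))  # ceil division
    return {
        f"model.layers.{i}": devices[i // per_device]
        for i in range(num_layers)
    }
```

In practice a user passes `device_map` (e.g. `"auto"`, or an explicit dict like the one above) when loading the model, and any module mapped to `"cpu"` is offloaded from GPU memory.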
Change Log
Below is the detailed change log:
- Fix bug cuda by @qwopqwop200 in #44
- Fix bug caused by 'groupsize' vs 'group_size' and change all code to use 'group_size' consistently by @TheBloke in #58
- Setup conda by @Sciumo in #59
- fix incorrect pack while using cuda, desc_act and grouping by @lszxb in #62
- Faster llama by @qwopqwop200 in #43
- Gptj fused attention by @PanQiWei in #76
- Look for .pt files by @oobabooga in #79
- Support users customize `device_map` by @PanQiWei in #80
- Update example script to include desc_act by @Ph0rk0z in #82
- Forward position args to allow `model(tokens)` syntax by @TheBloke in #84
- Rename 'quant_cuda' to 'autogptq_cuda' to avoid conflicts with existing GPTQ-for-LLaMa installations by @TheBloke in #93
- fix ImportError when triton is not installed by @PanQiWei in #92
- Fix CUDA out of memory error in qlinear_old.py by @LexSong in #66
- Improve CPU offload by @PanQiWei in #100
- triton float32 support by @qwopqwop200 in #104
- Add support for CodeGen/2 by @LaaZa in #65
- Add support for GPTBigCode(starcoder) by @LaaZa in #63
- Minor syntax fix for auto.py by @billcai in #112
- Falcon support by @qwopqwop200 in #111
- Add support for HF Hub download, and `push_to_hub` by @TheBloke in #91
- Add build wheels workflow by @PanQiWei in #120
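The `'groupsize'` → `'group_size'` rename in #58 means quantize configs written by older versions may still carry the legacy key. A minimal sketch of a compatibility shim (a hypothetical helper for illustration, not part of AutoGPTQ):

```python
def normalize_quantize_config(config: dict) -> dict:
    """Map the legacy 'groupsize' key onto 'group_size' (hypothetical shim)."""
    config = dict(config)  # copy so the caller's dict is not mutated
    if "groupsize" in config and "group_size" not in config:
        config["group_size"] = config.pop("groupsize")
    return config
```

A shim like this lets code standardized on `group_size` still accept configs produced before the rename.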
New Contributors
Below are new contributors and their first PRs. Thank you very much for your love of `auto_gptq` and your contributions! ❤️
- @Sciumo made their first contribution in #59
- @lszxb made their first contribution in #62
- @oobabooga made their first contribution in #79
- @Ph0rk0z made their first contribution in #82
- @LexSong made their first contribution in #66
- @LaaZa made their first contribution in #65
- @billcai made their first contribution in #112
Full Changelog: v0.1.0...v0.2.0
v0.1.0
What's Changed
- add option by @qwopqwop200 in #23
- Add gpt2 by @qwopqwop200 in #30
- Fix bug speedup quant and support gpt2 by @qwopqwop200 in #29
- Offloading and Multiple devices quantization/inference by @PanQiWei in #24
- Add raise exception and gpt2 xl example add by @qwopqwop200 in #31
- Allow to load arbitrary models by @z80maniac in #33
- Change save name by @qwopqwop200 in #34
- Fix typo: 'hole' -> 'whole' by @TheBloke in #40
- bug fix quantization demo by @qwopqwop200 in #37
- Check that model_save_name exists before trying to load it, to avoid confusing checkpoint error by @TheBloke in #39
- Faster cuda no actorder by @qwopqwop200 in #38
New Contributors
- @z80maniac made their first contribution in #33
- @TheBloke made their first contribution in #40
Full Changelog: v0.0.5...v0.1.0
v0.0.5
What's Changed
- add simple demo ppl test with wikitext2 by @qwopqwop200 in #17
- push_to_hub integration by @PanQiWei in #18
New Contributors
- @qwopqwop200 made their first contribution in #17
Full Changelog: v0.0.4...v0.0.5
v0.0.4
v0.0.3
What's Changed
- fix typo in README.md
- fix problem that can't get some models' max sequence length
- fix problem that some models have more required positional arguments when forward in transformer layers
- fix mismatch of GPTNeoxForCausalLM's lm_head
New Contributors
- @eltociear made their first contribution in #10