Releases: AutoGPTQ/AutoGPTQ
v0.2.2: Patch Release
- Fix `autogptq_cuda` dir missing from the distribution file
v0.2.1: Patch Release
Fix installation from PyPI failing when the environment variable `CUDA_VERSION` is set.
v0.2.0
Happy International Children's Day! 🎈 At the age of LLMs and the dawn of AGI, may we always be curious like children, with vigorous energy and courage to explore the bright future.
Features Summary
A bunch of new features have been added in this version:
- Optimized modules for faster inference: fused attention for `llama` and `gptj`, fused MLP for `llama`
- Full CPU offloading
- Multi-GPU inference with the Triton backend
- Three new models are supported: `codegen`, `gpt_bigcode` and `falcon`
- Support downloading/uploading quantized models from/to the HF Hub
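The CPU offloading and multi-GPU features above both boil down to mapping the model's modules onto devices. As a rough illustration only (the function name and layer layout here are hypothetical, not AutoGPTQ internals; real `device_map` handling is delegated to the `accelerate` library), a device map is just a dict from module names to device strings:

```python
def split_layers(num_layers: int, devices: list) -> dict:
    """Hypothetical sketch: spread transformer layers evenly across devices.

    Not AutoGPTQ code -- shown only to illustrate the device_map shape,
    where offloaded modules can simply be assigned to "cpu".
    """
    per_device = -(-num_layers // len(devices))  # ceil division
    return {
        f"model.layers.{i}": devices[i // per_device]
        for i in range(num_layers)
    }
```

In practice a user passes `device_map` (e.g. `"auto"`, or an explicit dict like the one above) when loading the model, and any module mapped to `"cpu"` is offloaded from GPU memory.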
Change Log
Below is the detailed change log:
- Fix bug cuda by @qwopqwop200 in #44
- Fix bug caused by 'groupsize' vs 'group_size' and change all code to use 'group_size' consistently by @TheBloke in #58
- Setup conda by @Sciumo in #59
- fix incorrect pack while using cuda, desc_act and grouping by @lszxb in #62
- Faster llama by @qwopqwop200 in #43
- Gptj fused attention by @PanQiWei in #76
- Look for .pt files by @oobabooga in #79
- Support users customize `device_map` by @PanQiWei in #80
- Update example script to include desc_act by @Ph0rk0z in #82
- Forward position args to allow `model(tokens)` syntax by @TheBloke in #84
- Rename 'quant_cuda' to 'autogptq_cuda' to avoid conflicts with existing GPTQ-for-LLaMa installations by @TheBloke in #93
- fix ImportError when triton is not installed by @PanQiWei in #92
- Fix CUDA out of memory error in qlinear_old.py by @LexSong in #66
- Improve CPU offload by @PanQiWei in #100
- triton float32 support by @qwopqwop200 in #104
- Add support for CodeGen/2 by @LaaZa in #65
- Add support for GPTBigCode(starcoder) by @LaaZa in #63
- Minor syntax fix for auto.py by @billcai in #112
- Falcon support by @qwopqwop200 in #111
- Add support for HF Hub download, and `push_to_hub` by @TheBloke in #91
- Add build wheels workflow by @PanQiWei in #120
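The `'groupsize'` → `'group_size'` rename in #58 means quantize configs written by older versions may still carry the legacy key. A minimal sketch of a compatibility shim (a hypothetical helper for illustration, not part of AutoGPTQ):

```python
def normalize_quantize_config(config: dict) -> dict:
    """Map the legacy 'groupsize' key onto 'group_size' (hypothetical shim)."""
    config = dict(config)  # copy so the caller's dict is not mutated
    if "groupsize" in config and "group_size" not in config:
        config["group_size"] = config.pop("groupsize")
    return config
```

A shim like this lets code standardized on `group_size` still accept configs produced before the rename.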
New Contributors
Below are new contributors and their first PRs. Thank you very much for your love of `auto_gptq` and your contributions! ❤️
- @Sciumo made their first contribution in #59
- @lszxb made their first contribution in #62
- @oobabooga made their first contribution in #79
- @Ph0rk0z made their first contribution in #82
- @LexSong made their first contribution in #66
- @LaaZa made their first contribution in #65
- @billcai made their first contribution in #112
Full Changelog: v0.1.0...v0.2.0
v0.1.0
What's Changed
- add option by @qwopqwop200 in #23
- Add gpt2 by @qwopqwop200 in #30
- Fix bug speedup quant and support gpt2 by @qwopqwop200 in #29
- Offloading and Multiple devices quantization/inference by @PanQiWei in #24
- Add raise exception and gpt2 xl example add by @qwopqwop200 in #31
- Allow to load arbitrary models by @z80maniac in #33
- Change save name by @qwopqwop200 in #34
- Fix typo: 'hole' -> 'whole' by @TheBloke in #40
- bug fix quantization demo by @qwopqwop200 in #37
- Check that model_save_name exists before trying to load it, to avoid confusing checkpoint error by @TheBloke in #39
- Faster cuda no actorder by @qwopqwop200 in #38
New Contributors
- @z80maniac made their first contribution in #33
- @TheBloke made their first contribution in #40
Full Changelog: v0.0.5...v0.1.0
v0.0.5
What's Changed
- add simple demo ppl test with wikitext2 by @qwopqwop200 in #17
- push_to_hub integration by @PanQiWei in #18
New Contributors
- @qwopqwop200 made their first contribution in #17
Full Changelog: v0.0.4...v0.0.5
v0.0.4
v0.0.3
What's Changed
- fix typo in README.md
- fix problem that can't get some models' max sequence length
- fix problem that some models have more required positional arguments when forward in transformer layers
- fix mismatch of GPTNeoxForCausalLM's lm_head
New Contributors
- @eltociear made their first contribution in #10