Releases: AutoGPTQ/AutoGPTQ

v0.2.2: Patch Release

08 Jun 06:10
  • fix the autogptq_cuda dir missing from the distribution files

v0.2.1: Patch Release

02 Jun 11:50

Fixed a problem where installation from PyPI failed when the environment variable CUDA_VERSION is set.
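For context, a common packaging convention is to read a CUDA version from the environment and embed it in the wheel's local version tag; a minimal sketch of that convention (the `wheel_version` helper is hypothetical and is not AutoGPTQ's actual setup.py code):

```python
import os

def wheel_version(base_version):
    """Append a local version tag like "+cu117" when CUDA_VERSION is set.

    A sketch of the "+cuXXX" local-version convention used by many
    CUDA-extension packages, not AutoGPTQ's real build logic.
    """
    cuda = os.environ.get("CUDA_VERSION")
    if cuda:
        return f"{base_version}+cu{cuda.replace('.', '')}"
    return base_version
```

A build that mishandles this variable can produce a version string PyPI installs cannot resolve, which is the class of failure this patch addresses.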

v0.2.0

01 Jun 16:05

Happy International Children's Day! 🎈 In the age of LLMs and at the dawn of AGI, may we always be curious like children, with vigorous energy and the courage to explore a bright future.

Features Summary

A bunch of new features have been added in this version:

  • Optimized modules for faster inference: fused attention for llama and gptj, fused MLP for llama
  • Full CPU offloading
  • Multi-GPU inference with the triton backend
  • Three new models supported: codegen, gpt_bigcode and falcon
  • Support for downloading/uploading quantized models from/to the HF Hub
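To make "quantized model" concrete: GPTQ-style checkpoints typically store 4-bit weights packed into 32-bit integers. A minimal pure-Python sketch of that packing convention (the `pack_int4`/`unpack_int4` names are hypothetical illustrations, not part of the auto_gptq API):

```python
def pack_int4(values):
    """Pack eight 4-bit integers (0..15) into one 32-bit word,
    least-significant nibble first."""
    assert len(values) == 8
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xF) << (4 * i)
    return word

def unpack_int4(word):
    """Inverse of pack_int4: recover the eight 4-bit values."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]
```

Eight 4-bit weights per 32-bit word is where the roughly 4x memory saving over fp16 comes from, and it is what the fused CUDA/triton kernels unpack at inference time.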

Change Log

Below is the detailed change log:

New Contributors

Following are the new contributors and their first PRs. Thank you very much for your love of auto_gptq and your contributions! ❤️

Full Changelog: v0.1.0...v0.2.0

v0.1.0

04 May 16:19

What's Changed

New Contributors

Full Changelog: v0.0.5...v0.1.0

v0.0.5

26 Apr 10:03

What's Changed

New Contributors

Full Changelog: v0.0.4...v0.0.5

v0.0.4

26 Apr 06:06

Big News

  • triton is officially supported starting from this version!
  • quick installation from PyPI using pip install auto-gptq is supported starting from this version!

What's Changed

Full Changelog: v0.0.3...v0.0.4

v0.0.3

25 Apr 03:31

What's Changed

  • fix a typo in README.md
  • fix a problem where some models' max sequence length could not be determined
  • fix a problem where some models require extra positional arguments when forwarding through transformer layers
  • fix a mismatch in GPTNeoXForCausalLM's lm_head

New Contributors

v0.0.2

23 Apr 12:06
  • added an eval_tasks module to support evaluating a model's performance on predefined downstream tasks before and after quantization
  • fixed some bugs when using the LLaMa model
  • fixed some bugs when using models that require position_ids
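The before/after-quantization evaluation idea can be sketched in a few lines of plain Python; `evaluate` and the `(prompt, answer)` pair format here are hypothetical stand-ins for illustration, not the actual eval_tasks API:

```python
def evaluate(model_fn, examples):
    """Return the accuracy of model_fn over (prompt, answer) pairs.

    model_fn is any callable mapping a prompt string to a predicted
    answer string -- a stand-in for a real model before or after
    quantization. Running the same examples through both versions
    of a model quantifies the accuracy cost of quantization.
    """
    correct = sum(1 for prompt, answer in examples
                  if model_fn(prompt) == answer)
    return correct / len(examples)

# Toy usage: a dummy "model" that always answers "pong".
examples = [("ping", "pong"), ("hello", "world")]
score = evaluate(lambda prompt: "pong", examples)  # -> 0.5
```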