v0.2.0

@PanQiWei PanQiWei released this 01 Jun 16:05

Happy International Children's Day! 🎈 In the age of LLMs and at the dawn of AGI, may we always stay as curious as children, with the energy and courage to explore a bright future.

Features Summary

A number of new features have been added in this version:

  • Optimized modules for faster inference: fused attention for llama and gptj, fused mlp for llama
  • Full CPU offloading
  • Multi-GPU inference with the triton backend
  • Three new models are supported: codegen, gpt_bigcode and falcon
  • Support for downloading/uploading quantized models from/to the HF Hub
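
As a rough illustration of how the options above might be combined when loading a quantized model, here is a minimal sketch. It assumes that `AutoGPTQForCausalLM.from_quantized` accepts the keyword arguments `use_triton`, `inject_fused_attention`, `inject_fused_mlp`, `device` and `max_memory` in this release; please verify them against the version you have installed. The helper names `build_load_kwargs` and `load_quantized` are hypothetical and not part of auto_gptq.

```python
from typing import Any, Dict, Optional


def build_load_kwargs(
    use_triton: bool = False,
    fused_attention: bool = True,
    fused_mlp: bool = True,
    max_memory: Optional[Dict[Any, str]] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for loading a quantized model.

    The key names are assumptions about what from_quantized accepts;
    this helper itself is hypothetical and only builds a plain dict.
    """
    kwargs: Dict[str, Any] = {
        "use_triton": use_triton,
        "inject_fused_attention": fused_attention,
        "inject_fused_mlp": fused_mlp,
    }
    if max_memory is not None:
        # Per-device memory budget; a "cpu" entry is what would allow
        # layers to be offloaded to CPU RAM.
        kwargs["max_memory"] = max_memory
    else:
        # No budget given: place the whole model on the first GPU.
        kwargs["device"] = "cuda:0"
    return kwargs


def load_quantized(model_dir: str, **options: Any):
    """Load a quantized model from model_dir (needs auto_gptq installed)."""
    # Imported lazily so the kwargs helper can be used without auto_gptq.
    from auto_gptq import AutoGPTQForCausalLM

    return AutoGPTQForCausalLM.from_quantized(
        model_dir, **build_load_kwargs(**options)
    )
```

For example, `load_quantized("path/to/quantized-model", use_triton=True, max_memory={0: "4GIB", "cpu": "20GIB"})` would request the triton backend with part of the model kept in CPU memory; the path and the memory sizes here are placeholders.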

Change Log

Below is the detailed change log:

New Contributors

The following are new contributors and their first PRs. Thank you very much for your love of auto_gptq and your contributions! ❤️

Full Changelog: v0.1.0...v0.2.0