v0.4.2: Patch release
## Major bugfix: exllama backend with arbitrary input length
This patch release includes a major bugfix that allows the exllama backend to work with input lengths larger than 2048 tokens, through a reconfigurable buffer size:
```python
from auto_gptq import exllama_set_max_input_length

...
model = exllama_set_max_input_length(model, 4096)
```
## Exllama kernels support in Windows wheels
This patch release tentatively includes the exllama kernels in the Windows wheels.
## What's Changed
- Build wheels on Ubuntu 20.04 by @fxmarty in #272
- Free disk space for ROCm build by @fxmarty in #273
- Use focal for ROCm build by @fxmarty in #274
- Use conda incubator for ROCm build by @fxmarty in #276
- Update install instructions by @fxmarty in #275
- Use --extra-index-url to resolve dependencies by @fxmarty in #277
- Fix Python version for ROCm build by @fxmarty in #278
- Fix PowerShell in workflow by @fxmarty in #284
Full Changelog: v0.4.1...v0.4.2