v0.4.2: Patch release

@fxmarty released this 24 Aug 19:05
· 147 commits to main since this release

Major bugfix: exllama backend with arbitrary input length

This patch release includes a major bugfix: the exllama backend now works with input lengths greater than 2048, because its buffer size can be reconfigured:

from auto_gptq import exllama_set_max_input_length

...
model = exllama_set_max_input_length(model, 4096)

  • Expose a function to update exllama max input length by @fxmarty in #281
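
As a rough illustration of when to call the function above, here is a minimal sketch that resizes the buffer only if the expected input length exceeds the 2048 limit mentioned in these notes. The helpers `pick_max_input_length` and `ensure_exllama_capacity` are hypothetical names introduced for this example and are not part of auto_gptq. Rounding up by doubling is an arbitrary choice made for the sketch. Only `exllama_set_max_input_length(model, length)` comes from this release.

```python
# Assumption: 2048 is the default limit, as implied by the note above.
DEFAULT_MAX_INPUT_LENGTH = 2048


def pick_max_input_length(input_length: int,
                          default: int = DEFAULT_MAX_INPUT_LENGTH) -> int:
    """Hypothetical helper: return the smallest size >= input_length,
    obtained by doubling the default (never smaller than the default)."""
    size = default
    while size < input_length:
        size *= 2
    return size


def ensure_exllama_capacity(model, input_length: int):
    """Hypothetical helper: enlarge the exllama buffer only when needed."""
    target = pick_max_input_length(input_length)
    if target > DEFAULT_MAX_INPUT_LENGTH:
        # Imported lazily so the sizing logic can be used without auto_gptq.
        from auto_gptq import exllama_set_max_input_length
        model = exllama_set_max_input_length(model, target)
    return model
```

With this sketch, `ensure_exllama_capacity(model, 3000)` would request a buffer of 4096, while an input length of 1000 leaves the model untouched.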

Exllama kernels support in Windows wheels

This patch tentatively includes the exllama kernels in the wheels for Windows.

  • Add PyPI build workflow, tentatively fix exllama on windows by @fxmarty in #282

What's Changed

Full Changelog: v0.4.1...v0.4.2