Expose a function to update exllama max input length #281

fxmarty · 2023-08-24T11:26:30Z

In the act-order case, exllama needs a buffer on the C++ side to reorder the activations. This buffer has a default length of 2048; this PR exposes an API to change its size:

from auto_gptq import exllama_set_max_input_length
model = exllama_set_max_input_length(model, 4096)
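
To illustrate why the buffer size matters, here is a toy sketch of the behavior, assuming the buffer simply rejects inputs longer than its configured length. `ToyActOrderBuffer` and its methods are hypothetical names made up for this example; they are not part of AutoGPTQ or exllama, and the real check happens in the C++/CUDA extension.

```python
# Illustrative toy model only: NOT AutoGPTQ's or exllama's actual code.
# ToyActOrderBuffer, forward and set_max_input_length are hypothetical names.
class ToyActOrderBuffer:
    """Stand-in for a fixed-size scratch buffer used to reorder activations."""

    def __init__(self, max_input_length: int = 2048):
        # 2048 mirrors the default length mentioned in the description.
        self.max_input_length = max_input_length

    def forward(self, num_tokens: int) -> int:
        # A real kernel would write reordered activations into the buffer;
        # this sketch only checks that the input fits.
        if num_tokens > self.max_input_length:
            raise RuntimeError("temp_state buffer is too small")
        return num_tokens

    def set_max_input_length(self, max_input_length: int) -> "ToyActOrderBuffer":
        # Assumed analogue of resizing the buffer to accept longer inputs.
        self.max_input_length = max_input_length
        return self


buf = ToyActOrderBuffer()  # default length of 2048

try:
    buf.forward(3000)  # longer than the buffer
    overflowed = False
except RuntimeError:
    overflowed = True

buf = buf.set_max_input_length(4096)  # enlarge the buffer
processed = buf.forward(3000)  # the same input now fits
```

In this sketch the first call fails with the error reported in #253, and the same input goes through once the length is raised to 4096.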

Fixes #253

Test with: CUDA_VISIBLE_DEVICES=0 pytest tests/test_q4.py -k "test_exllama_buffer_size" -s

@PanQiWei I think it would be worth a patch release, what do you think?

PanQiWei

Thank you very much for this PR, @fxmarty!

expose api to set exllama max length

04730ac

fxmarty requested a review from PanQiWei August 24, 2023 11:41

fxmarty mentioned this pull request Aug 24, 2023

[BUG] RuntimeError: temp_state buffer is too small #253

Closed

PanQiWei approved these changes Aug 24, 2023


PanQiWei merged commit 8bb4d60 into AutoGPTQ:main Aug 24, 2023
