
FIX: Fix multiple generations for new HF cache format #444

Merged
merged 1 commit into main from younesbelkada-patch-2 on May 1, 2024

Conversation

younesbelkada
Collaborator

What does this PR do?

Currently, with transformers main and the latest AutoAWQ, users hit a failure when combining fused modules with multiple calls to generate:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig

model_id = "hf-internal-testing/Mixtral-tiny-AWQ"

quantization_config = AwqConfig(bits=4, fuse_max_seq_len=128, do_fuse=True)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config).to(0)

dummy_input = torch.LongTensor([[0, 1, 0, 1]]).to(0)

_ = model.generate(dummy_input, use_cache=True)

# second generate fails
_ = model.generate(dummy_input, use_cache=True)

This PR addresses that by checking the correct attributes on the fused attention modules.
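
For context, here is a minimal sketch of the kind of attribute check involved, assuming a simplified fused-attention module that keeps a rolling start_pos offset into a pre-allocated cache. The class and attribute handling below are illustrative assumptions, not the exact code changed in this PR:

import torch
import torch.nn as nn

class FusedAttentionSketch(nn.Module):
    """Illustrative only -- not the actual AutoAWQ fused attention class."""

    def __init__(self, max_seq_len: int):
        super().__init__()
        self.max_seq_len = max_seq_len
        self.start_pos = 0  # rolling write offset into the fused module's own cache

    def forward(self, hidden_states: torch.Tensor, past_key_value=None):
        seq_len = hidden_states.shape[1]
        # With the new HF cache format, past_key_value is a Cache object that
        # exposes get_seq_length(), not the legacy tuple of tensors, so the
        # module has to check the attribute rather than assume the old layout.
        if past_key_value is not None and hasattr(past_key_value, "get_seq_length"):
            cached_len = past_key_value.get_seq_length()
        else:
            cached_len = 0
        # A fresh generate() call arrives with no cached tokens, so the rolling
        # offset must be rewound instead of carrying over from the previous run.
        if cached_len == 0:
            self.start_pos = 0
        self.start_pos += seq_len
        return hidden_states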

cc @casper-hansen

@casper-hansen
Owner

Hi @younesbelkada, these are some interesting edge cases. Would it help to use the attention module the same way it is used in AutoAWQ? Currently, transformers just resets start_pos, which seems to require multiple fixes, and there are probably more bugs that we are not aware of.
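
For reference, "resetting start_pos" between calls amounts to something like the sketch below. The helper is hypothetical, not the actual transformers integration code; only the start_pos attribute on the fused modules is taken from the discussion above:

def reset_fused_cache(model):
    """Hypothetical helper: rewind every fused module's cache offset to 0."""
    for module in model.modules():
        if hasattr(module, "start_pos"):
            module.start_pos = 0

# usage (hypothetical): call before each fresh generation
# reset_fused_cache(model)
# _ = model.generate(dummy_input, use_cache=True)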

@casper-hansen casper-hansen merged commit 33af761 into main May 1, 2024
@younesbelkada younesbelkada deleted the younesbelkada-patch-2 branch May 2, 2024 08:09