Support for LLMLingua #1065

Open
TechnotechGit opened this issue Jan 5, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@TechnotechGit

microsoft/LLMLingua looks like an interesting project. It's essentially (lossy) prompt compression, and it currently works with any HF model, including GPTQ-quantized ones. I think it would be worth supporting llama.cpp via llama-cpp-python, since prompt compression would benefit both CPU and GPU users, especially when run alongside llama.cpp itself.

I was trying to wire up llama-cpp-python for inference, but got stuck on LLMLingua needing an attention mask (perhaps I missed something). Any ideas on how to go about this?
microsoft/LLMLingua#41

@abetlen abetlen added the enhancement New feature or request label Jan 5, 2024
@abetlen
Owner

abetlen commented Jan 5, 2024

Hey @TechnotechGit, yes, I'd be happy to help. Do you have an outline of some existing code and the requirements of the method? Currently you should be able to extract all of the token logits from the transformer for a given prompt, but I'm not sure if anything else is required for LLMLingua.
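
A minimal sketch of that logit extraction with llama-cpp-python (an illustrative example, not code from this thread; the GGUF path is a placeholder and it assumes the model is loaded with logits_all=True so per-token logits are kept):

import numpy as np
from llama_cpp import Llama

# Keep logits for every prompt position, not just the last one.
llm = Llama(model_path="./model.gguf", logits_all=True)

tokens = llm.tokenize(b"The quick brown fox")
llm.reset()
llm.eval(tokens)

# One row of vocabulary logits per evaluated token.
logits = np.array(llm.scores[: len(tokens)])
print(logits.shape)  # (len(tokens), n_vocab)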

@TechnotechGit
Author

On the model side, it seems that only the attention mask is needed:

# HF version: incremental forward pass over the new token slice,
# reusing cached keys/values from previously processed tokens
response = self.model(
    input_ids[:, past_length:end],
    attention_mask=attention_mask[:, :end],
    past_key_values=past_key_values,
    use_cache=True,
)
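
For comparison, a rough llama-cpp-python counterpart might look like the sketch below (llm, input_ids, past_length, and end are assumed to mirror the HF variables; there is no attention_mask parameter, since llama.cpp evaluates the unpadded prompt directly and keeps its own KV cache):

# Sketch of an assumed llama.cpp counterpart: the internal KV cache plays the
# role of past_key_values, so only the new token slice is fed in.
llm.eval(input_ids[0, past_length:end].tolist())
logits = llm.scores[past_length:end]  # needs logits_all=True at load time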

I think everything else in that call is fine.
On the tokeniser side, I can mimic the HF calls, so that's not a problem, but it again seems to require an attention mask:

attention_mask = tokenized_text["attention_mask"].to(self.device)

Unfortunately I don't know what attention masks look like at the low level, so I'm not sure whether this would be a big change. I might be able to look into it.
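
One way to mimic the HF tokenizer output could be a small wrapper like the sketch below (the class name and the all-ones mask are assumptions; the mask can be all ones only because the prompt here is a single unpadded sequence):

import torch
from llama_cpp import Llama

class LlamaCppTokenizerWrapper:
    # Sketch of an HF-style tokenizer facade over llama-cpp-python.
    def __init__(self, llm: Llama):
        self.llm = llm

    def __call__(self, text: str, return_tensors: str = "pt"):
        ids = self.llm.tokenize(text.encode("utf-8"))
        input_ids = torch.tensor([ids], dtype=torch.long)
        # Single unpadded sequence, so every position is attended to.
        attention_mask = torch.ones_like(input_ids)
        return {"input_ids": input_ids, "attention_mask": attention_mask}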

@TechnotechGit
Author

@abetlen I've been having some trouble retrieving logits; if you have any experience with transformers, do you know whether the logprobs returned by llama-cpp-python are the same as those from transformers? (I'm trying to figure out whether it's an issue on my end, since I am getting differently shaped tensors.)
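
For reference, one likely source of the shape mismatch: transformers returns logits shaped (batch, seq_len, vocab_size), while llama-cpp-python's scores array is two-dimensional, (n_tokens, n_vocab), and is only populated per token when the model is loaded with logits_all=True. A rough sketch of the comparison (model name and GGUF path are placeholders):

import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from llama_cpp import Llama

# HF side: logits carry a leading batch dimension.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
enc = tok("hello world", return_tensors="pt")
with torch.no_grad():
    out = model(**enc)
print(out.logits.shape)  # (batch, seq_len, vocab_size)

# llama-cpp-python side: a flat (n_tokens, n_vocab) array, no batch dimension.
llm = Llama(model_path="./model.gguf", logits_all=True)
toks = llm.tokenize(b"hello world")
llm.eval(toks)
print(np.array(llm.scores[: len(toks)]).shape)  # (len(toks), n_vocab)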
