Support for LLMLingua #1065
Hey @TechnotechGit, yes, I'd be happy to help. Do you have an outline of some existing code and the requirements of the method? Currently you should be able to extract all of the token logits from the transformer for a given prompt, but I'm not sure if anything else is required for LLMLingua.
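For reference, here is a minimal sketch of pulling per-token logits out of llama-cpp-python; it assumes `logits_all=True` (so logits are kept for every prompt position, not just the last one), and the model path is a placeholder:

```python
# A minimal sketch of extracting per-token logits with llama-cpp-python.
# Assumes logits_all=True; the model path is a placeholder.
import numpy as np
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf", logits_all=True, n_ctx=2048)

tokens = llm.tokenize(b"Compress this prompt with LLMLingua.")
llm.reset()
llm.eval(tokens)

# llm.scores holds one vocab-sized row of logits per evaluated position.
logits = np.asarray(llm.scores)[: len(tokens)]
print(logits.shape)  # (n_tokens, n_vocab)
```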
On the model side, it seems that only attention masks are needed:

```python
# HF version
response = self.model(
    input_ids[:, past_length:end],
    attention_mask=attention_mask[:, :end],
    past_key_values=past_key_values,
    use_cache=True,
)
```

I think everything else there is fine. The mask itself just comes from the tokenizer:

```python
attention_mask = tokenized_text["attention_mask"].to(self.device)
```

Unfortunately I don't know what the low-level handling of attention masks looks like, so I don't know whether this would be a big change or not. I might be able to look into it.
@abetlen I've been having some trouble retrieving logits; if you have any experience with transformers, do you know whether the logprobs returned by llama-cpp-python are the same as those from transformers? (I'm trying to figure out whether it's an issue on my end, since I'm getting differently shaped tensors.)
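One way to narrow this down might be a side-by-side comparison, assuming the same weights on both sides (model names and paths below are placeholders). HF returns logits of shape `(batch, seq_len, vocab)`, while llama-cpp-python's scores are `(seq_len, vocab)` with no batch axis, which alone would explain differently shaped tensors; values will also differ somewhat under quantization:

```python
# Sketch: compare HF logprobs against llama-cpp-python for the same text.
import numpy as np
import torch
from llama_cpp import Llama
from transformers import AutoModelForCausalLM, AutoTokenizer

text = "The quick brown fox"

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
hf = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
with torch.no_grad():
    hf_logits = hf(**tok(text, return_tensors="pt")).logits  # (1, seq, vocab)
hf_logprobs = torch.log_softmax(hf_logits[0], dim=-1)

llm = Llama(model_path="./llama-2-7b.gguf", logits_all=True)
tokens = llm.tokenize(text.encode())  # check BOS handling matches the HF tokenizer
llm.eval(tokens)
cpp_logits = np.asarray(llm.scores)[: len(tokens)]  # (seq, vocab)

# Numerically stable log-softmax over the vocab axis.
m = cpp_logits.max(-1, keepdims=True)
cpp_logprobs = cpp_logits - m - np.log(np.exp(cpp_logits - m).sum(-1, keepdims=True))

print(hf_logprobs.shape, cpp_logprobs.shape)
```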
microsoft/LLMLingua seems like an interesting project. It is essentially (lossy) prompt compression, and you can currently use any HF model with it, including GPTQ. I think it would be useful to support llama.cpp via llama-cpp-python, since prompt compression would benefit both CPU and GPU users, especially alongside llama.cpp itself.
I was trying to use llama-cpp-python for inference, but got stuck on needing an attention mask (perhaps I missed something). Any ideas on how to go about this?
microsoft/LLMLingua#41
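For context, this is roughly what LLMLingua usage looks like today with an HF backend, per the project README; the model name, texts, and token budget are example values:

```python
# Sketch of LLMLingua's HF-backed compression; all values are examples.
from llmlingua import PromptCompressor

compressor = PromptCompressor(model_name="NousResearch/Llama-2-7b-hf")
result = compressor.compress_prompt(
    ["A long context passage to be compressed ..."],
    instruction="Answer the question using the context.",
    question="What does the passage say?",
    target_token=200,  # compress to roughly this many tokens
)
print(result["compressed_prompt"])
```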