BUG: Zero shap values w/ llama-2 models #3217
Comments
@codybum Probably a different issue than this bug report, but I ran into something similar. Do you have return_all_scores=True in your pipeline, or top_k=None?
@derekelewis Thanks for the response! I am doing text generation, not classification, so that could be part of my issue. I tried to follow the SHAP docs for text generation, but there was nothing specific to llama. I had previously attempted to set return_all_scores=True based on your code, but it caused an error beginning "ValueError: The following ...". This is what I am running:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import shap

tokenizer = AutoTokenizer.from_pretrained("models/llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("models/llama-2-7b-hf")

# Named "pipe" to avoid shadowing the imported pipeline helper
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    # return_all_scores=True,
    top_k=None,
    device_map="auto",
)

explainer = shap.Explainer(pipe)
shap_values = explainer(["hello, world!"])
print(shap_values)
Does anyone have a working transformer example?
After some debugging, I modified the GPT-2 text-generation example a bit and got it to work with GPT-J:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import shap

tokenizer = AutoTokenizer.from_pretrained("nlpcloud/instruct-gpt-j-fp16", use_fast=True)
model = AutoModelForCausalLM.from_pretrained("nlpcloud/instruct-gpt-j-fp16", torch_dtype=torch.float16).cuda()
model.config.is_decoder = True

# Generation settings, exposed through the model config
gen_dict = dict(
    max_new_tokens=10,
    num_beams=5,
    renormalize_logits=True,
    no_repeat_ngram_size=8,
)
model.config.task_specific_params = dict()
model.config.task_specific_params["text-generation"] = gen_dict

shap_model = shap.models.TeacherForcing(model, tokenizer)
masker = shap.maskers.Text(tokenizer, mask_token="...", collapse_mask_token=True)
explainer = shap.Explainer(shap_model, masker)  # pass the custom masker so collapse_mask_token takes effect

prompt = "What is the capital of France?"  # placeholder; any test prompt works
shap_values = explainer([prompt])
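To inspect the result, shap's text plot should work here; a small usage sketch:

shap.plots.text(shap_values)  # renders per-token attributions as interactive HTML in a notebook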
Yet when I swap GPT-J out for OpenLlama2 in the above, I get weird results.
Has there been any progress in this area? Is the computational cost too high for this to be practical on large models?
I am also curious to see if anyone has successfully written code with recent models such as ...
Your issue is probably connected to shap not handling llama as a huggingface transformers model. I got it to work with Mistral; I haven't gotten around to doing it with Llama 2 yet, but I had the same error and would think the fix is the same too. However, afterwards you will run into another error / weird plots (as mentioned by @LittleDijkstraZ).
I'm not sure there is a fundamental reason why it shouldn't work based on model size. The computational cost certainly is high, though, and it runs a long time: my Partition explainer with Mistral takes about 4 minutes for 50 new tokens with very small input. And the values, of course, must be wrong; currently my output looks like this: [plot omitted]. I think it's a problem of implementation. I'd be willing to do a PR on this and the other changes, plus update the example notebook... but I'm not sure where to start. Maybe one of the maintainers can help? (@connortann?) I'm not sure how this all works, so I'll just tag you here; sorry if that's annoying / not the correct approach.
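For what it's worth, that runtime can be capped through the evaluation budget of the explainer call; a minimal sketch, assuming the explainer and prompt from the snippets above and an arbitrary budget of 128 model evaluations:

# Lower budgets run faster but give coarser attributions
shap_values = explainer([prompt], max_evals=128)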
PRs to investigate this issue and get it fixed would be most welcome! I think it's been quite a while since the text explanation functionality had significant active development, but hopefully together we can figure out a fix. It would be great to get this sorted. There are some existing tests that use a tiny test model here:

shap/tests/explainers/conftest.py, lines 8 to 20 in 63223e1
shap/tests/explainers/test_partition.py, lines 11 to 13 in 63223e1
A good place to start would probably be to get a small, fast, minimal reproducible example, ideally with a synthetic dataset and a dummy tiny language model. Hopefully we can pin down the root issue without using a full llama-2 model.
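A sketch of what such a repro could look like, assuming the tiny random Llama checkpoint from hf-internal-testing on the Hub exhibits the same behavior as the full model:

from transformers import AutoTokenizer, AutoModelForCausalLM
import shap

checkpoint = "hf-internal-testing/tiny-random-LlamaForCausalLM"  # a few MB, random weights
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
model.config.is_decoder = True

wrapped = shap.models.TeacherForcing(model, tokenizer)
masker = shap.maskers.Text(tokenizer, mask_token="...", collapse_mask_token=True)
explainer = shap.Explainer(wrapped, masker)

shap_values = explainer(["hello world"])
print(shap_values.values)  # all-zero values here would reproduce the bug cheaply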
Thank you for the reply @connortann! I will open a PR once I am closer to the solution. It's not a general transformer error: I'm successfully running it with GPT-2, DialoGPT, and GODEL, and the results look good to me. Therefore, in my mind, it must be an issue with the models and/or masking and not the calculation of SHAP values, as the Explainer handles a model wrapped by the TeacherForcing class in all these cases. This leads me to think it's a problem with the transformers implementation of those particular models (Mistral/Llama 2) or the model architecture. I'll update this with some more details and logs when I have the time to try it out.
Hello again. I have had some more time for debugging and was able to drill down into the issue. I think this is because of the full_masking call. Running the log-odds calculation separately with the teacher-forcing model seems fine to me. Any insights on this full_masking call would be appreciated.
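The separate log-odds check mentioned above can be done by calling the TeacherForcing wrapper directly; a minimal sketch, assuming the shap_model from the earlier GPT-J snippet and using "..." to mimic what the Text masker substitutes:

import numpy as np

X = np.array(["The weather today is"])  # source prompt
Y = np.array(["sunny and warm"])        # target continuation to score

# TeacherForcing is callable as wrapper(X, Y) and returns the log-odds
# of each target token in Y given the source X
log_odds_full = shap_model(X, Y)

# Manually "masked" prompt, mimicking the masker's substitution
X_masked = np.array(["... today is"])
log_odds_masked = shap_model(X_masked, Y)

print(log_odds_full, log_odds_masked)  # should differ noticeably if masking is wired up correctly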
FYI, Captum now includes KernelSHAP for explaining LLMs. Read more here: https://captum.ai/tutorials/Llama2_LLM_Attribution
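A rough sketch of that Captum API based on the linked tutorial; the class names (KernelShap, LLMAttribution, TextTokenInput) follow the tutorial, but treat the exact signatures and the target string as assumptions:

from captum.attr import KernelShap, LLMAttribution, TextTokenInput

# Wrap the raw HF model in Captum's KernelSHAP, then in the LLM attribution helper
ks = KernelShap(model)
llm_attr = LLMAttribution(ks, tokenizer)

inp = TextTokenInput("hello, world!", tokenizer)
# Attribute a target continuation back to the input tokens
attr_result = llm_attr.attribute(inp, target="a friendly greeting")
print(attr_result.token_attr)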
Yeah @amir-rahnama, this is what I used for my thesis project in the end; the performance compared to PartitionSHAP is not great, though. Using a GPU you can achieve good speed, but it's quite resource-consuming. However, it seems the underlying issue has already been fixed, which is great @costrau!
Issue Description
Hello,
I am trying to use shap with a simple HuggingFace text-classification pipeline using one of the llama-2 models; however, the output values from the explainer are all zero. My environment works fine with the multi-class example in the docs w/ BERT, so I believe it is llama-2 specific.
Minimal Reproducible Example
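A representative sketch matching the description above; the model path, label count, and example text are placeholders:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import shap

# Placeholder path: any llama-2 checkpoint with a sequence-classification head
checkpoint = "models/llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, top_k=None)
explainer = shap.Explainer(pipe)

shap_values = explainer(["I really enjoyed this movie!"])
print(shap_values.values)  # comes back all zeros, which is the bug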
Traceback
No response
Expected Behavior
No response
Installed Versions
Currently using master branch