
what is the meaning of contribution? #1266

Open
Dongximing opened this issue Mar 28, 2024 · 1 comment

Comments

@Dongximing
Hi everyone,

I have a question about LLM contribution, shown in the picture below. This is the perturbation-based attribution method. The basic idea is to replace the tokens one position at a time. For example, "I love you" after tokenization becomes

10 20 30

Then a baseline token "0" (you can change the "0" to something else) replaces each position in turn, and we observe how log_softmax(target_id) changes. The score is the unperturbed input's log_softmax(target_id) minus the perturbed input's log_softmax(target_id):

[0, 20, 30]
[10, 0, 30]
[10, 20, 0]
[Screenshot: Screen Shot 2024-03-28 at 9:20 AM]
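To make the setup concrete, here is a minimal self-contained sketch of this perturbation idea, using a hypothetical `toy_model` in place of a real LLM forward pass (`toy_model`, `log_softmax`, and `perturbation_attribution` are illustrative names, not Captum APIs):

```python
import math

def log_softmax(logits, index):
    """Log-probability of the token at `index` under a softmax over `logits`."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_z

def toy_model(tokens):
    """Hypothetical stand-in for an LLM forward pass: maps a token-id
    sequence to logits over a vocabulary of 40 ids."""
    return [sum(1.0 for t in tokens if abs(t - v) <= 10) for v in range(40)]

def perturbation_attribution(tokens, target_id, baseline_token=0):
    """Score each position by how much replacing it with the baseline token
    changes log_softmax(target_id): original minus perturbed."""
    original = log_softmax(toy_model(tokens), target_id)
    scores = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + [baseline_token] + tokens[i + 1:]
        scores.append(original - log_softmax(toy_model(perturbed), target_id))
    return scores

# One score per position for "I love you" -> [10, 20, 30]
print(perturbation_attribution([10, 20, 30], target_id=30))
```

A positive score for a position means that ablating that token lowered log_softmax(target_id), i.e. the token was helping produce the target.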

So, in my opinion, should we use the absolute value to evaluate the importance of tokens?
For example, if the contributions are [-3.5, 3.6, 1], the most important token would be token_1 (3.6), the second token_0 (-3.5), and the third token_2 (1).
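As a quick illustration of that ranking, sorting positions by the absolute value of the scores in the example above gives exactly that order (plain Python, just for illustration):

```python
# Contributions from the example above: one score per token position
contributions = [-3.5, 3.6, 1.0]

# Rank positions by magnitude (absolute value), largest first
ranked = sorted(range(len(contributions)),
                key=lambda i: abs(contributions[i]), reverse=True)
print(ranked)  # [1, 0, 2] -> token_1 (3.6), token_0 (-3.5), token_2 (1.0)
```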

Also, in the LLMGradientAttribution method, the final step, https://github.com/pytorch/captum/blob/master/captum/attr/_core/llm_attr.py#L570 , sums the gradients over the last dimension. I have a question about how to evaluate the importance of tokens there: does a bigger value mean more important?
For example, if after summing the contributions are [-3.5, 3.6, 1], does that mean the most important token is token_1 (3.6), the second token_2 (1), and the third token_0 (-3.5)?
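For reference, the summation step being asked about can be sketched like this, with a hypothetical per-token gradient matrix whose numbers are made up to reproduce the example contributions (this is not Captum's actual code):

```python
# Hypothetical gradients: 3 token positions x 4 embedding dimensions
grads = [
    [-1.0, -1.5, -0.5, -0.5],  # token_0
    [ 1.0,  1.2,  0.9,  0.5],  # token_1
    [ 0.3,  0.4,  0.2,  0.1],  # token_2
]

# Sum over the last dimension to get one contribution per token,
# as the linked line in llm_attr.py does for gradient attribution
contributions = [sum(row) for row in grads]
print([round(c, 6) for c in contributions])  # [-3.5, 3.6, 1.0]
```

Ranking by the raw signed values would give token_1, token_2, token_0, while ranking by absolute value gives token_1, token_0, token_2, which is exactly the distinction the question is about.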

Thanks

@vivekmig
Contributor

vivekmig commented Apr 9, 2024

Hi @Dongximing, generally yes: if you want the magnitude of importance, it is reasonable to consider the absolute value of the attribution scores and compare those values. But the sign does provide information about the direction of the feature or token's contribution. In particular, for feature ablation, a negative attribution score implies that the output score is higher when the token is replaced with the baseline than with the original. Hope this helps!
