Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is the meaning of "CD(x, candidateGroup;model)-CD(x,selectedGroup; model)"? #7

Closed
LinkToPast1990 opened this issue Sep 7, 2020 · 5 comments

Comments

@LinkToPast1990
Copy link

LinkToPast1990 commented Sep 7, 2020

Hi, @csinva

Referring to the CD paper, we have p=Softmax(Wβ+Wγ), and Wβ provides a quantitative score for the given phrase.
But for different phrases, the Wβ terms may have very different values while the softmax results remain similar, is it reasonable to do subtraction between two Wβ terms?

Thanks!

@csinva
Copy link
Owner

csinva commented Sep 7, 2020

Hello!

A good question! Indeed for the final layer of a classification network, we have p=Softmax(Wβ+Wγ). A reasonable way to think of the CD-score for a group of features (i.e. the group corresponding to β) is like the input * a logistic regression coefficient, in a 2-term logistic regression. You can take the CD score either with or without the softmax.

What you suggest, with the subtraction does, in fact, make sense: it is like taking the importance to be one coefficient - another coefficient. In some cases, this may make sense, but in general just taking the original CD score for a group is probably more intuitive.

@LinkToPast1990
Copy link
Author

LinkToPast1990 commented Sep 8, 2020

Hi, @csinva
Thanks!
The logistic regression is
image

Could you kindly further explain how could we "think of the CD-score for a group of features (i.e. the group corresponding to β) is like the input * a logistic regression coefficient"?

CD score is Wβ, right? Which one is the coefficient, W or β?

@csinva
Copy link
Owner

csinva commented Sep 8, 2020

Hey again - yes CD score is Wβ. Thus Wβ would be the coefficient for the group of features we are considering. Wγ would be the coefficient corresponding to the rest of the features.

@LinkToPast1990
Copy link
Author

LinkToPast1990 commented Sep 8, 2020

Thanks again.
In CD, we could find that f(x) = SoftMax(g(x)) = SoftMax(Wβ+Wγ).
I thought the β vector is representing the group of features we are considering previously.
But based on your comments, it seems that the logistic regression should be:
SoftMax(logits) = SoftMax(bias + Wβ * group_considering + Wγ * group_rest), right?
In where we model this regression?

@csinva
Copy link
Owner

csinva commented Sep 8, 2020

Yes sorry your insights are correct and my description is a little confusing.

CD computes Wβ, and this represents the coefficient * the group - both are already multiplied and we can't disentangle them and say one is the coefficient and one is the input.

When we write, SoftMax(logits) = SoftMax(Wβ * group_considering + Wγ * group_rest), I like to think of group_considering and group_rest as binary 0/1 variables. Alternatively, you can just think of SoftMax(Wβ+Wγ) without the need for the binary terms.

Hope that helps!

@csinva csinva closed this as completed Sep 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants