This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Compute attributions w.r.t the predicted logit, not the predicted loss #4882

Open
wants to merge 1 commit into
base: main

Conversation

sarahwie

Compute gradient attributions with respect to the predicted class's logit, rather than the loss, to avoid making the gradient depend on how close the loss is to 0: when the loss is near 0, the gradient is near 0 as well.

See for justification:
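The vanishing-gradient issue can be seen in a toy example (a sketch with an illustrative two-class linear model, not the PR's code): when the prediction is confident and correct, the cross-entropy loss is already near 0, so its gradient with respect to the input vanishes, while the gradient of the predicted logit stays informative.

```python
import torch
import torch.nn.functional as F

# Toy 2-class linear model; the prediction for class 0 is very confident.
x = torch.tensor([[3.0, -3.0]], requires_grad=True)
W = torch.tensor([[4.0, -4.0], [-4.0, 4.0]])
logits = x @ W                                  # tensor([[24., -24.]])
label = torch.tensor([0])

# Gradient of the loss: the loss is ~0, so the gradient is ~0 too.
loss = F.cross_entropy(logits, label)
loss.backward(retain_graph=True)
loss_grad = x.grad.clone()

# Gradient of the predicted class's logit: non-zero and informative.
x.grad = None
predicted_logit = logits[0, int(torch.argmax(logits))]
predicted_logit.backward()
logit_grad = x.grad.clone()
```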

@sarahwie
Author

Realized this breaks Hotflip, since that relies on the loss. Also, I am not sure how the input-reduction method is intended to be calculated (the paper just says "outputs"), but it will change with this too.

@schmmd schmmd changed the base branch from master to main December 23, 2020 18:47
```diff
@@ -113,13 +113,13 @@ def get_gradients(self, instances: List[Instance]) -> Tuple[Dict[str, Any], Dict
             self._model.forward(**dataset_tensor_dict)  # type: ignore
         )

-        loss = outputs["loss"]
+        predicted_logit = outputs["logits"].squeeze(0)[int(torch.argmax(outputs["probs"]))]
```
Contributor

The trouble with doing it this way is that it hard-codes assumptions about the model's outputs which may not be true. The test failure you're getting is because of this. This method has to be generic enough to work for any model. This is ok when we query the loss key, because that key is already required by the Trainer. Nothing else is guaranteed to be in the output, so we can't hard-code anything else.

Maybe a better way of accomplishing what you want is to allow the caller to specify the output key, with a default value of "loss". Then it would be the model's responsibility to make sure that the value under that key is a single number on which we can call .backward(). E.g., you could imagine adding a target_logit key in your model class, and then using that key when calling get_gradients().
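The suggestion above might look roughly like this (a minimal sketch; `backward_from_key` and the `target_logit` entry are hypothetical names for illustration, not AllenNLP's actual API):

```python
from typing import Any, Dict

import torch


def backward_from_key(outputs: Dict[str, Any], key: str = "loss") -> None:
    # Backpropagate from a caller-chosen output key, defaulting to "loss".
    # The model is responsible for ensuring outputs[key] is a scalar tensor;
    # e.g. a model's forward() could add a "target_logit" entry, and the
    # caller would then pass key="target_logit".
    value = outputs[key]
    if value.dim() != 0:
        raise ValueError(f"outputs[{key!r}] must be a scalar to call .backward()")
    value.backward()


x = torch.tensor([2.0, -1.0], requires_grad=True)
outputs = {"loss": (x ** 2).sum(), "target_logit": x.max()}
backward_from_key(outputs, key="target_logit")  # gradient of the max logit
```

Keeping "loss" as the default preserves the current behavior for every existing model, so only models that opt in need to populate the extra key.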

We could get by with less model modification if we add a second flag that says whether to take an argmax of the values in that key, but that gets a bit messy, because then you're always getting gradients of the model's prediction, completely ignoring whatever label was given in the input instance. This breaks a lot of assumptions in other methods in the code (which I think is what you were referring to when you said this breaks hotflip), so I don't really like this option.

Author

Thanks for the feedback! I agree that using a key is straightforward. I'll refactor.

@dirkgr
Member

dirkgr commented Jan 14, 2021

Is this still an active project? Can we help in any way?
