Core: Second Order Gradients #159
Hi @chr5tphr, I will provide my feedback here instead of in the issue itself. I didn't check the attribution maps in detail, but your code example (here) seems to work fine. There are some points I would like to mention:
**Code Example 1**

```python
canonizers = None
composite = EpsilonGammaBox(low=-3., high=3., canonizers=canonizers)

def explain_LRP(model: torch.nn.Module, input: torch.Tensor, target: torch.Tensor):
    with composite.context(model) as modified_model:
        outputs = modified_model(input)
        relevance, = torch.autograd.grad(outputs, input, target, create_graph=True)
    return outputs, relevance

outputs, relevance = explain_LRP(model, input, target)

# create a target heatmap, rolled 12 pixels south east
target_heat = torch.roll(relevance.detach(), (12, 12), (2, 3))
loss = ((relevance - target_heat) ** 2).mean()

# deactivate the rule hooks in order to leave the second order gradient untouched
# version 1
with composite.inactive():
    adv_grad, = torch.autograd.grad(loss, input)  # <<-- error because `hook.active` is still True

# version 2
with composite.context(model):
    with composite.inactive():
        adv_grad, = torch.autograd.grad(loss, input)  # <<-- error because `hook.active` is still True
```
**Code Example 2**

```python
canonizers = None
composite = EpsilonGammaBox(low=-3., high=3., canonizers=canonizers)

with Gradient(model=model, composite=composite) as attributor:
    outputs, relevance = attributor(inputs, torch.eye(1000)[targets])

# create a target heatmap, rolled 12 pixels south east
target_heat = torch.roll(relevance.detach(), (12, 12), (2, 3))
loss = ((relevance - target_heat) ** 2).mean()

# deactivate the rule hooks in order to leave the second order gradient untouched
with attributor.composite.inactive():
    adv_grad, = torch.autograd.grad(loss, inputs)  # <<-- Error
    # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
**Code Example 3**

```python
def explain_LRP(model: torch.nn.Module, input: torch.Tensor, target: torch.Tensor):
    canonizers = None
    composite = EpsilonGammaBox(low=-3., high=3., canonizers=canonizers)
    with Gradient(model=model, composite=composite) as attributor:
        outputs, attributions = attributor(input, target)
    return outputs, attributions

outputs, relevance = explain_LRP(model, input, target)

# create a target heatmap, rolled 12 pixels south east
target_heat = torch.roll(relevance.detach(), (12, 12), (2, 3))
loss = ((relevance - target_heat) ** 2).mean()

# now the gradient calculation should be possible by default without any further deactivation etc.
adv_grad, = torch.autograd.grad(loss, input)  # <<-- this should work by default
```

**Code Example 4**

```python
def explain_LRP(model: torch.nn.Module, input: torch.Tensor, target: torch.Tensor):
    canonizers = None
    composite = EpsilonGammaBox(low=-3., high=3., canonizers=canonizers)
    with composite.context(model) as modified_model:
        outputs = modified_model(input)
        relevance, = torch.autograd.grad(outputs, input, target, create_graph=True)
    return outputs, relevance

outputs, relevance = explain_LRP(model, input, target)

# create a target heatmap, rolled 12 pixels south east
target_heat = torch.roll(relevance.detach(), (12, 12), (2, 3))
loss = ((relevance - target_heat) ** 2).mean()

# now the gradient calculation should be possible by default without any further deactivation etc.
adv_grad, = torch.autograd.grad(loss, input)  # <<-- this should work by default
```

Edit:

**Code Example 5**

```python
def explain_LRP(model: torch.nn.Module, input: torch.Tensor, target: torch.Tensor):
    canonizers = None
    composite = EpsilonGammaBox(low=-3., high=3., canonizers=canonizers)
    with composite.context(model) as modified_model:
        outputs = modified_model(input)
        relevance, = torch.autograd.grad(outputs, input, target, create_graph=True)
        for hook in composite.hook_refs:
            hook.active = False
    return outputs, relevance

outputs, relevance = explain_LRP(model, input, target)

# create a target heatmap, rolled 12 pixels south east
target_heat = torch.roll(relevance.detach(), (12, 12), (2, 3))
loss = ((relevance - target_heat) ** 2).mean()

# now the gradient calculation should be possible by default without any further deactivation etc.
adv_grad, = torch.autograd.grad(loss, input)  # <<-- OK
```
Hey @HeinrichAD, thanks a lot for your feedback!

For clarification: hooks are only ever active after the composite has been registered, i.e. while inside the composite's context; leaving the context removes the hooks.

To summarize, the second order gradient should be possible to compute after destroying the hooks (by leaving the context), or, while the hooks still exist, within `composite.inactive()`.

Edit: Actually, for the intended behaviour, you do not need to loop over the hooks in Code Example 5; it should work without it, since you leave the context and destroy the hooks.
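A minimal sketch of the two intended usages described above, assuming `model`, `input` (with `requires_grad=True`) and `target` are defined; `composite.inactive()` is the context manager introduced by this PR:

```python
import torch
from zennit.composites import EpsilonGammaBox

composite = EpsilonGammaBox(low=-3., high=3.)

with composite.context(model) as modified_model:
    outputs = modified_model(input)
    # keep the graph of the modified gradient so it can be differentiated again
    relevance, = torch.autograd.grad(outputs, input, target, create_graph=True)

    target_heat = torch.roll(relevance.detach(), (12, 12), (2, 3))
    loss = ((relevance - target_heat) ** 2).mean()

    # variant 1: temporarily deactivate the hooks while inside the context
    with composite.inactive():
        adv_grad, = torch.autograd.grad(loss, input, retain_graph=True)

# variant 2: leaving the context removes the hooks, so out here the second
# order gradient needs no deactivation at all
adv_grad, = torch.autograd.grad(loss, input)
```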
Hi @chr5tphr, it makes much more sense now. Thank you for the detailed clarification.
Hey @HeinrichAD, I'm currently working on the rest of the documentation for this, but the functionality and tests are now finished (unless I find a bug or something missing). If you would like to try it out, you can either do so now, or wait a little until I have also finished the documentation, at which point I will mark this PR ready and merge it in the following days. Everything should work as expected, and as a bonus, Attributors now also have an `inactive` context.
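For reference, a rough sketch of how the attributor-based workflow might then look, again assuming `model`, `input` and `targets` are defined; the `create_graph` argument and `attributor.inactive` are the additions described in the change list below:

```python
import torch
from zennit.attribution import Gradient
from zennit.composites import EpsilonGammaBox

composite = EpsilonGammaBox(low=-3., high=3.)

# create_graph=True so the relevance itself remains differentiable
with Gradient(model=model, composite=composite, create_graph=True) as attributor:
    outputs, relevance = attributor(input, torch.eye(1000)[targets])

    target_heat = torch.roll(relevance.detach(), (12, 12), (2, 3))
    loss = ((relevance - target_heat) ** 2).mean()

    # deactivate the hooks only for the second order gradient pass
    with attributor.inactive():
        adv_grad, = torch.autograd.grad(loss, input)
```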
- support computing the gradient of the modified gradient via second order gradients
  - for this, the second gradient pass must be done without hooks
  - this is supported by either removing the hooks before computing the second order gradients, or setting `hook.active = False` for each hook, e.g. using the contextmanager `composite.inactive()` before computing the second order gradients
- make SmoothGrad and IntegratedGradients inherit from Gradient
- add `create_graph` and `retain_graph` arguments for Gradient-Attributors
- add `.grad` function to Gradient, which is used by its subclasses to compute the gradient
- fix attributor docstrings
- recognize in BasicHook.backward whether to use `create_graph=True` for the backward pass in order to compute the relevance, by checking whether `grad_output` requires a gradient
- add the ReLUBetaSmooth rule, which transforms the gradient of ReLU to the gradient of softplus (i.e. sigmoid); this is used as a surrogate to compute meaningful gradients of ReLU
- add the BetaSmooth Composite, which applies the ReLUBetaSmooth rule to ReLUs
- add test to check the effect of `hook.active`
- add test to check whether the second order gradient of Hook is computed as expected
- add second order gradient tests for gradient attributors
- add test for `attributor.inactive`
- add test for `Composite.inactive`
- add test for ReLUBetaSmooth
- add How-To for computing second order gradients
  - explain the issue with second order gradients of ReLU networks
  - show how to compute second order gradients with composites outside of the composite context
  - show how to compute second order gradients with composites using `Composite.inactive`
  - show how to compute second order gradients with attributors
  - show how to compute second order gradients using only hooks

Implements #142 and fixes #125
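To illustrate the idea behind ReLUBetaSmooth: the gradient of ReLU is a step function, so its second derivative is zero almost everywhere and second order optimization through a heatmap stalls. Replacing only the backward pass with the gradient of softplus, `sigmoid(beta * x)`, keeps the forward pass intact while making the gradient itself differentiable. A minimal self-contained illustration of this kind of surrogate (not zennit's actual implementation):

```python
import torch

class BetaSmoothReLU(torch.autograd.Function):
    """ReLU in the forward pass; the backward pass uses the softplus
    gradient sigmoid(beta * x) as a smooth surrogate."""
    beta = 10.

    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0.)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        # gradient of softplus_beta(x) = log(1 + exp(beta * x)) / beta
        return grad_output * torch.sigmoid(BetaSmoothReLU.beta * input)

x = torch.linspace(-1., 1., steps=5, requires_grad=True)
grad, = torch.autograd.grad(BetaSmoothReLU.apply(x).sum(), x, create_graph=True)
# the surrogate gradient is differentiable, so the second order gradient is non-zero
grad2, = torch.autograd.grad(grad.sum(), x)
```

The larger `beta`, the closer `sigmoid(beta * x)` is to the ReLU step gradient, so `beta` trades off faithfulness to ReLU against smoothness.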
Hey @HeinrichAD, I am done with the PR and would like to merge.
Hi @chr5tphr,

First, thank you for your effort!

**Code**

In general, the code looks good. It also works as I expected. The same code now generates this output: NOTE: I do not know which output is correct.

**Typos**

Since my IDE already points these out for me, here is a list of typos:

Also, sometimes it's "layer-wise relevance propagation" and sometimes "layerwise relevance propagation".
Hey @HeinrichAD, thanks a lot again for your feedback.
The first version was actually wrong. The problem was that
I will add a quick follow-up PR to fix these, since many of the affected files were not touched in this PR, and I prefer not to touch files only for typos etc. if there was no other change in that file.
Thank you for the explanation. In this case, the PR gets a ready-to-go from my side 😄.