*Accompanying code examples of the book "Introduction to Artificial Neural Networks and Deep Learning: A Practical Guide with Applications in Python" by [Sebastian Raschka](https://sebastianraschka.com). All code examples are released under the [MIT license](https://github.com/rasbt/deep-learning-book/blob/master/LICENSE). If you find this content useful, please consider supporting the work by buying a [copy of the book](https://leanpub.com/ann-and-deeplearning).*
  
Other code examples and content are available on [GitHub](https://github.com/rasbt/deep-learning-book). The PDF and ebook versions of the book are available through [Leanpub](https://leanpub.com/ann-and-deeplearning).

In [1]:
%load_ext watermark
%watermark -a 'Sebastian Raschka' -v -p torch

Sebastian Raschka 

CPython 3.6.3
IPython 6.2.1

torch 0.3.0.post4


# Model Zoo -- Getting Gradients of an Intermediate Variable in PyTorch

This notebook illustrates how to fetch the intermediate gradients of a function that is composed of multiple inputs and multiple computation steps in PyTorch. Note that a gradient is simply a vector listing the partial derivatives of a function with respect to each of its arguments. So, strictly speaking, we are discussing how to obtain partial derivatives here.
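
As a brief reminder of the notation, the gradient of a generic function can be written as follows. The function name $f$ and the arguments $x_1, \dots, x_n$ are placeholders chosen for illustration only; they are not part of the toy graph used in this notebook:

$$
\nabla f(x_1, \dots, x_n) = \left[ \frac{\partial f}{\partial x_1}, \; \dots, \; \frac{\partial f}{\partial x_n} \right]^\top
$$

Each entry of this vector is one partial derivative, which is the quantity the rest of this notebook retrieves node by node.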

Assume we have this simple toy graph:
    
![](images/manual-gradients/graph_1.png)

Now, we provide the following values to b, x, and w; the red numbers indicate the intermediate values of the computation and the end result:

![](images/manual-gradients/graph_2.png)

The next image shows the partial derivatives of the output node, a, with respect to the input nodes (b, x, and w), as well as all the intermediate partial derivatives:


![](images/manual-gradients/graph_3.png)

(The images were taken from my PyData talk in August 2017. For more information on how to arrive at these derivatives, please see the talk/slides at https://github.com/rasbt/pydata-annarbor2017-dl-tutorial. I also put together a short calculus and differentiation primer, in case it is helpful: https://sebastianraschka.com/pdf/books/dlb/appendix_d_calculus.pdf)
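
The derivatives for this toy graph can also be reproduced by hand with the chain rule. The following is a minimal sketch in plain Python, assuming the graph is a = relu(x * w + b) with x = 3, w = 2, and b = 1, as in the code cells of this notebook. The helper names `relu`, `relu_derivative`, and `manual_grads` are made up for this illustration:

```python
def relu(z):
    return max(0.0, z)


def relu_derivative(z):
    # Derivative of ReLU; the value at z == 0 is set to 0 here by convention
    return 1.0 if z > 0 else 0.0


x, w, b = 3.0, 2.0, 1.0

# Forward pass through the graph
u = x * w
v = u + b
a = relu(v)

# Backward pass: apply the chain rule, starting at the output node a
d_a_v = relu_derivative(v)
d_a_u = d_a_v * 1.0  # dv/du = 1
d_a_b = d_a_v * 1.0  # dv/db = 1
d_a_w = d_a_u * x    # du/dw = x
d_a_x = d_a_u * w    # du/dx = w

manual_grads = {'d_a_b': d_a_b, 'd_a_u': d_a_u, 'd_a_v': d_a_v,
                'd_a_w': d_a_w, 'd_a_x': d_a_x}
print(manual_grads)
```

These hand-computed values serve as a reference for the framework results in the remainder of the notebook.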



For instance, if we are interested in obtaining the partial derivative of the output a with respect to each of the input and intermediate nodes, we could do the following in TensorFlow:

In [2]:
import tensorflow as tf

g = tf.Graph()
with g.as_default() as g:
    
    x = tf.placeholder(dtype=tf.float32, shape=None, name='x')
    w = tf.Variable(initial_value=2, dtype=tf.float32, name='w')
    b = tf.Variable(initial_value=1, dtype=tf.float32, name='b')
    
    u = x * w
    v = u + b
    a = tf.nn.relu(v)
    
    d_a_b = tf.gradients(a, b)
    d_a_u = tf.gradients(a, u)
    d_a_v = tf.gradients(a, v)
    d_a_w = tf.gradients(a, w)
    d_a_x = tf.gradients(a, x)
    
    
with tf.Session(graph=g) as sess:
    sess.run(tf.global_variables_initializer())
    grads = sess.run([d_a_b, d_a_u, d_a_v, d_a_w, d_a_x], feed_dict={'x:0': 3})

print(grads)


[[1.0], [1.0], [1.0], [3.0], [2.0]]


Here, `d_a_b` denotes "partial derivative of a with respect to b" and so forth.

In PyTorch, obtaining these intermediate gradients is a bit trickier (or, let's say, the functionality is a bit more "hidden"), but it is not inconvenient. Based on a suggestion by Adam Paszke (https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/7?u=rasbt), we can use "hooks": a small helper function, `save_grad`, returns a `hook` closure that writes each gradient to a global dictionary, `grads`.

> The hook will be called every time a gradient with respect to the variable is computed.  (http://pytorch.org/docs/master/autograd.html#torch.autograd.Variable.register_hook)

So, if we invoke the `backward` method on the output node `a`, all the intermediate gradients will be collected in `grads`, as illustrated below:

In [3]:
import torch
from torch.autograd import Variable
import torch.nn.functional as F

In [4]:

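# Dictionary that collects the gradients captured by the hooks
grads = {}

def save_grad(name):
    # Returns a hook that stores the incoming gradient under `name`
    def hook(grad):
        grads[name] = grad
    return hook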


x = Variable(torch.Tensor([3]).view(1, 1), requires_grad=True)
w = Variable(torch.Tensor([2]).view(1, 1), requires_grad=True)
b = Variable(torch.Tensor([1]).view(1, 1), requires_grad=True)

u = x * w
v = u + b

x.register_hook(save_grad('d_a_x'))
w.register_hook(save_grad('d_a_w'))
b.register_hook(save_grad('d_a_b'))
u.register_hook(save_grad('d_a_u'))
v.register_hook(save_grad('d_a_v'))

a = F.relu(v)

a.backward()

grads

{'d_a_b': Variable containing:
  1
 [torch.FloatTensor of size 1x1], 'd_a_u': Variable containing:
  1
 [torch.FloatTensor of size 1x1], 'd_a_v': Variable containing:
  1
 [torch.FloatTensor of size 1x1], 'd_a_w': Variable containing:
  3
 [torch.FloatTensor of size 1x1], 'd_a_x': Variable containing:
  2
 [torch.FloatTensor of size 1x1]}
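
As an alternative that mirrors the `tf.gradients` calls more closely, the same values can likely be obtained with `torch.autograd.grad`. This is only a sketch and not part of the original notebook: it assumes a PyTorch version (0.4 or newer) in which plain tensors accept `requires_grad=True`, so it will not run on the 0.3.0 release shown in the watermark above. The dictionary name `autograd_grads` is chosen for illustration:

```python
import torch
import torch.nn.functional as F

x = torch.tensor(3.0, requires_grad=True)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

u = x * w
v = u + b
a = F.relu(v)

# Returns one gradient per listed tensor, in the same order as the list;
# intermediate tensors such as u and v can be listed alongside the inputs
d_a_b, d_a_u, d_a_v, d_a_w, d_a_x = torch.autograd.grad(a, [b, u, v, w, x])

autograd_grads = {'d_a_b': d_a_b.item(), 'd_a_u': d_a_u.item(),
                  'd_a_v': d_a_v.item(), 'd_a_w': d_a_w.item(),
                  'd_a_x': d_a_x.item()}
print(autograd_grads)
```

Unlike `backward`, this call returns the gradients directly instead of accumulating them in the `.grad` attributes of the leaf tensors.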

While this may look like a workaround at first, note that PyTorch deliberately does not retain the gradients of intermediate (non-leaf) variables during a regular `backward` call without hooks, as Soumith Chintala pointed out:

> By default, gradients are only retained for leaf variables. non-leaf variables’ gradients are not retained to be inspected later. This was done by design, to save memory. (https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/2)
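
The leaf versus non-leaf distinction from the quote can be made concrete with the `is_leaf` attribute and the `retain_grad` method. The following is a hedged sketch rather than the approach used in this notebook: it assumes a PyTorch version newer than the 0.3.0 release shown in the watermark, and the dictionary name `retained` is made up for this example:

```python
import torch
import torch.nn.functional as F

x = torch.tensor(3.0, requires_grad=True)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

u = x * w
v = u + b

# x, w, and b were created directly (leaf tensors);
# u and v are results of operations (non-leaf tensors)
print(x.is_leaf, u.is_leaf)

# Ask autograd to keep the gradients of the non-leaf tensors as well
u.retain_grad()
v.retain_grad()

a = F.relu(v)
a.backward()

retained = {'d_a_b': b.grad.item(), 'd_a_u': u.grad.item(),
            'd_a_v': v.grad.item(), 'd_a_w': w.grad.item(),
            'd_a_x': x.grad.item()}
print(retained)
```

Retaining gradients this way trades the memory savings mentioned in the quote for convenience, so it is probably best reserved for the few intermediate tensors that actually need to be inspected.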