
### Autograd: Automatic Differentiation

---

The `autograd` package is central to neural networks in PyTorch: it provides automatic differentiation for operations on tensors. This notebook briefly goes over it.

More details are available in the [official PyTorch autograd tutorial](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#sphx-glr-beginner-blitz-autograd-tutorial-py).

### 1. Tensor and Functions

* `.detach()`: returns a new tensor that is detached from the computation history, so future computation on it is not tracked.

* `.requires_grad = True` and `.backward()`: setting `.requires_grad = True` makes autograd track all operations on the tensor; calling `.backward()` on a result then computes all gradients automatically. The gradient for this tensor is accumulated into its `.grad` attribute.

  **Note**: If the tensor is a scalar, `.backward()` needs no arguments. If it has more than one element, we need to pass a `gradient` argument whose shape matches that of the tensor.


* `.grad_fn`: each tensor has a `.grad_fn` attribute that references the `Function` that created it (**except** for tensors created directly by the user, whose `.grad_fn` is `None`).
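The accumulation into `.grad` mentioned above is easy to miss, so here is a minimal sketch of it. The tensor `w` and its values are made up for illustration; the point is only that a second `.backward()` adds to the stored gradient rather than replacing it.

```python
import torch

# A leaf tensor created by the user; autograd tracks operations on it.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# First backward pass: d(sum(2 * w)) / dw = 2 for every element.
(2 * w).sum().backward()
first = w.grad.clone()

# Second backward pass on a fresh graph: the new gradient is added to w.grad.
(2 * w).sum().backward()
second = w.grad.clone()

# Reset the accumulated gradient in place before the next pass.
w.grad.zero_()
after_reset = w.grad.clone()

print(first)        # tensor([2., 2., 2.])
print(second)       # tensor([4., 4., 4.])
print(after_reset)  # tensor([0., 0., 0.])
```

This is why training loops usually zero the gradients between iterations.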

In [21]:
import torch

In [22]:
# Create a tensor and set requires_grad = True
x = torch.ones(2, 2, requires_grad = True)
print(x)
print(x.grad_fn)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
None


In [23]:
# Do an operation
y = x + 2
print(y)
print(y.grad_fn)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x7f978f61c0b8>


In [24]:
# Do more operations
z = y * y * 3
out = z.mean()
print(z, '\n', out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) 
 tensor(27., grad_fn=<MeanBackward0>)


In [25]:
# `requires_grad` defaults to False, so operations are not tracked
# and `grad_fn` stays None.
a = torch.randn(2, 2)
print(a, a.requires_grad, a.grad_fn)

a = ((a * 3) / (a - 1))
print(a, a.requires_grad, a.grad_fn)

b = (a * a).sum()
print(b.grad_fn)

# After requires_grad_(True) flips the flag in place, results get a grad_fn.
a.requires_grad_(True)
print(a.requires_grad)

b = (a * a).sum()
print(b.grad_fn)

tensor([[-0.1634, -0.6950],
        [ 0.8751,  1.1640]]) False None
tensor([[  0.4213,   1.2301],
        [-21.0248,  21.2918]]) False None
None
True
<SumBackward0 object at 0x7f978f608198>


### 2. Gradients and Vector-Jacobian Product
Gradients are the main ingredient for training neural networks.

* Call `.backward()` first, then read the gradient from `.grad`.

  **Note**: By default `.backward()` can be called only once on a given graph, because the graph's buffers are freed after the call (pass `retain_graph=True` to call it again). After the call, `.grad` is populated only for leaf tensors with `.requires_grad = True`. For a non-leaf tensor, call `.retain_grad()` before `.backward()`.

* `torch.autograd` is an engine for computing
vector-Jacobian products. That is, given any vector
$v=\left(\begin{array}{cccc} v_{1} & v_{2} & \cdots & v_{m}\end{array}\right)^{T}$,
it computes the product $v^{T}\cdot J$.
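The two points in the note can be sketched as follows. This is an illustrative example rather than part of the original notebook: it reuses the same `x`, `y`, `out` construction, keeps the graph alive with `retain_graph=True`, and calls `.retain_grad()` on the non-leaf tensor `y`.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)  # leaf tensor
y = x + 2                                 # non-leaf tensor
y.retain_grad()                           # ask autograd to keep y.grad
out = (y * y * 3).mean()

# retain_graph=True keeps the graph buffers so backward() can run again.
out.backward(retain_graph=True)
x_grad_first = x.grad.clone()
y_grad_first = y.grad.clone()

# The second call works because the graph was retained; gradients accumulate.
out.backward()
x_grad_second = x.grad.clone()

# The graph has now been freed, so a third call raises a RuntimeError.
try:
    out.backward()
    third_call_failed = False
except RuntimeError:
    third_call_failed = True

print(x_grad_first)       # every entry is 4.5
print(y_grad_first)       # every entry is 4.5
print(x_grad_second)      # every entry is 9.0 (accumulated)
print(third_call_failed)  # True
```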

In [26]:
# Backward.
# Have the requires_grad = True first.
x = torch.ones(2, 2, requires_grad = True)
y = x + 2
z = y * y * 3
out = z.mean()
print(out)

tensor(27., grad_fn=<MeanBackward0>)


For the tensor `out`, we have $out = \frac{1}{4}\sum_i z_i$, where $z_i = 3 (x_i + 2)^2$. Therefore, for all $i = 1, 2, 3, 4$,
$$\frac{\partial out}{\partial x_i}\Big|_{x_i = 1} = \frac{3}{2} (x_i + 2)\Big|_{x_i = 1} = 4.5$$
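As a quick sanity check of this derivation, the closed-form derivative can be compared with what autograd returns. The sketch below uses `torch.autograd.grad`, which returns the gradient directly instead of writing it to `x.grad`; the variable names are chosen for illustration.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()

# torch.autograd.grad returns a tuple with one gradient per input.
(autograd_result,) = torch.autograd.grad(out, x)

# Closed-form derivative from the formula above: (3/2) * (x_i + 2).
analytic = 1.5 * (x.detach() + 2)

print(autograd_result)
print(torch.allclose(autograd_result, analytic))  # True
```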


In [27]:
# Gradients d(out)/dx. `out` is a scalar, so backward() needs no argument.
# retain_grad() must be called before backward() to keep the gradient
# of the non-leaf tensor y.
y.retain_grad()
out = z.mean()
out.backward()
print(x.grad)
print(y.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


Suppose $\vec{y}=f(\vec{x})$. The gradient of $\vec{y}$ with respect to $\vec{x}$
is then the Jacobian matrix:

\begin{align}J=\left(\begin{array}{ccc}
   \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\
   \vdots & \ddots & \vdots\\
   \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
   \end{array}\right)\end{align}


Let $v=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}$ be the gradient of a scalar function $l=g\left(\vec{y}\right)$ with respect to $\vec{y}$.
By **the chain rule**, the vector-Jacobian product (written here as the column vector $J^{T}\cdot v = (v^{T}\cdot J)^{T}$) is the
gradient of $l$ with respect to $\vec{x}$:

\begin{align}J^{T}\cdot v=\left(\begin{array}{ccc}
   \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
   \vdots & \ddots & \vdots\\
   \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
   \end{array}\right)\left(\begin{array}{c}
   \frac{\partial l}{\partial y_{1}}\\
   \vdots\\
   \frac{\partial l}{\partial y_{m}}
   \end{array}\right)=\left(\begin{array}{c}
   \frac{\partial l}{\partial x_{1}}\\
   \vdots\\
   \frac{\partial l}{\partial x_{n}}
   \end{array}\right)\end{align}
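To make the identity concrete, one can build the full Jacobian explicitly and compare $J^{T}\cdot v$ with what `.backward(v)` stores in `x.grad`. The function `f` and the vector `v` below are hypothetical choices for illustration; `torch.autograd.functional.jacobian` is used only to materialize $J$ for the comparison, which autograd itself never needs to do.

```python
import torch
from torch.autograd.functional import jacobian


def f(x):
    # Hypothetical map from R^3 to R^2, chosen only for illustration.
    return torch.stack([x[0] * x[1], x[1] + x[2] ** 2])


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.5, -1.0])

# Vector-Jacobian product via autograd: the result lands in x.grad.
y = f(x)
y.backward(v)

# Explicit Jacobian with shape (2, 3), then J^T v by matrix multiplication.
J = jacobian(f, x.detach())
explicit = J.T @ v

print(J)         # rows: [2., 1., 0.] and [0., 1., 6.]
print(explicit)  # tensor([ 1.0000, -0.5000, -6.0000])
print(torch.allclose(x.grad, explicit))  # True
```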

In [28]:
# An example of vector-Jacobian product:
x = torch.randn(3, requires_grad = True)
y = x * 2
while y.data.norm() < 1000:
  y = y * 2

print(y)

# Since y is not a scalar, backward() needs a vector v and computes the vector-Jacobian product
v = torch.tensor([0.1, 1.0, 0.0001], dtype = torch.float)
y.backward(v)
print(x.grad)

tensor([-1408.8143,   136.5251,   675.6753], grad_fn=<MulBackward0>)
tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])


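To stop autograd from tracking history on tensors with `.requires_grad = True`, either:

* Wrap the code in a `with torch.no_grad():` block.
  
  **Note**: The block does not change `.requires_grad` on existing tensors such as the leaf `x`; it only affects the results of operations performed inside it, which come out with `requires_grad = False`.

* Use `.detach()` to get a new tensor with the same content that does not require gradients.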

In [29]:
# with torch.no_grad() block.
print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
  print(x.requires_grad)
  print((x ** 2).requires_grad)

True
True
True
False
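A common place where `torch.no_grad()` shows up is a manual parameter update, where the update itself should not be recorded in the graph. The sketch below assumes a made-up parameter `w`, a toy loss, and a hypothetical learning rate `lr`; it is only meant to show the pattern.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
loss = (w ** 2).sum()
loss.backward()  # w.grad = 2 * w = [2., -4.]

lr = 0.1  # hypothetical learning rate, chosen for illustration

# Inside no_grad the in-place update is not tracked, so w stays a leaf.
with torch.no_grad():
    w -= lr * w.grad

# Clear the accumulated gradient before the next backward pass.
w.grad.zero_()

print(w)          # tensor([ 0.8000, -1.6000], requires_grad=True)
print(w.is_leaf)  # True
```

Without the `no_grad` block, the in-place update of a leaf tensor that requires gradients would raise an error.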


In [30]:
# .detach()
# The detached tensor has requires_grad = False.
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)

# Test whether the content is still the same.
print(x.eq(y).all())

True
False
tensor(True)
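One detail worth knowing is that the tensor returned by `.detach()` shares its underlying data with the original, so an in-place change to one is visible in the other. The following sketch illustrates this with made-up values and shows `.detach().clone()` as one way to get an independent copy.

```python
import torch

x = torch.ones(3, requires_grad=True)

shared = x.detach()        # shares the same underlying data as x
copy = x.detach().clone()  # independent copy of the data

# An in-place write through the detached tensor is visible in x.
shared[0] = 5.0

print(x)       # tensor([5., 1., 1.], requires_grad=True)
print(shared)  # tensor([5., 1., 1.])
print(copy)    # tensor([1., 1., 1.])
```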
