# **Automatic Differentiation**

**Create a variable and attach a gradient**

In [1]:
import torch

# torch.autograd.Variable is deprecated; mark the tensor itself as requiring gradients
x = torch.arange(4, dtype=torch.float32).reshape((4, 1)).requires_grad_(True)
print(x)

tensor([[0.],
        [1.],
        [2.],
        [3.]], requires_grad=True)
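As a quick check (a small sketch, not part of the original notebook), a tensor created this way is a leaf of the computation graph and has no gradient stored until a backward pass runs:

```python
import torch

# A user-created tensor that PyTorch will track for gradients
x = torch.arange(4, dtype=torch.float32).reshape((4, 1)).requires_grad_(True)

print(x.is_leaf)        # created directly, not produced by a recorded operation
print(x.requires_grad)  # gradient tracking is on
print(x.grad)           # no backward pass has run yet, so this is None
```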


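Now compute $y=2\mathbf{x}^T\mathbf{x}$. PyTorch records operations on tensors with *requires_grad=True* by default, so the *with torch.enable_grad()* block below is optional here; it only changes behavior inside a *torch.no_grad()* region.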

In [2]:
with torch.enable_grad():
    y = 2 * torch.matmul(x.T, x)

print(y)

tensor([[28.]], grad_fn=<MulBackward0>)
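A minimal sketch of that recording behavior, assuming a recent PyTorch release; the names *y_untracked* and *y_tracked* are illustrative only:

```python
import torch

x = torch.arange(4, dtype=torch.float32).reshape((4, 1)).requires_grad_(True)

# Recording is on by default: no enable_grad() block is needed
y = 2 * torch.matmul(x.T, x)
print(y.requires_grad, y.grad_fn is not None)

# Inside no_grad(), operations are not recorded
with torch.no_grad():
    y_untracked = 2 * torch.matmul(x.T, x)
print(y_untracked.requires_grad, y_untracked.grad_fn)

# enable_grad() turns recording back on inside a no_grad() region
with torch.no_grad():
    with torch.enable_grad():
        y_tracked = 2 * torch.matmul(x.T, x)
print(y_tracked.requires_grad)
```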


**Backward**

In [3]:
y.backward()

**Get the gradient**

Given $y=2\mathbf{x}^T\mathbf{x}$, we know $\frac{\partial y}{\partial \mathbf{x}}=4\mathbf{x}$
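Written out component by component, the derivation is:

$$
y = 2\mathbf{x}^T\mathbf{x} = 2\sum_{i} x_i^2
\quad\Longrightarrow\quad
\frac{\partial y}{\partial x_i} = 4x_i
\quad\Longrightarrow\quad
\frac{\partial y}{\partial \mathbf{x}} = 4\mathbf{x}.
$$

For $\mathbf{x} = [0, 1, 2, 3]^T$ this gives $[0, 4, 8, 12]^T$.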

In [4]:
# Check that each gradient value in x.grad equals 4*x
print((x.grad - 4 * x).norm().item() == 0)
print(x.grad)

True
tensor([[ 0.],
        [ 4.],
        [ 8.],
        [12.]])
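PyTorch accumulates gradients into *x.grad* instead of overwriting them. The sketch below (illustrative, reusing the same $y$) shows a second backward pass doubling the stored gradient and *x.grad.zero_()* resetting it:

```python
import torch

x = torch.arange(4, dtype=torch.float32).reshape((4, 1)).requires_grad_(True)

# First backward pass: x.grad holds 4 * x
y = 2 * torch.matmul(x.T, x)
y.backward()
first = x.grad.clone()

# Second pass on a freshly built graph: the new gradient is added, giving 8 * x
y = 2 * torch.matmul(x.T, x)
y.backward()
second = x.grad.clone()

# Reset the stored gradient in place, then recompute: back to 4 * x
x.grad.zero_()
y = 2 * torch.matmul(x.T, x)
y.backward()

print(first.flatten(), second.flatten(), x.grad.flatten())
```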


**Backward on non-scalar**

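Unlike in MXNet, in PyTorch the gradient can be created implicitly only for scalar outputs: calling *backward()* with no arguments on a non-scalar tensor raises an error. Either reduce the output to a scalar first (for example with *sum()*) or pass an explicit *gradient* argument with the same shape as the output.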
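A minimal sketch of both workarounds on the non-scalar output $\mathbf{y} = \mathbf{x} \odot \mathbf{x}$ (an example chosen here for illustration), where both should yield $2\mathbf{x}$:

```python
import torch

x = torch.arange(4, dtype=torch.float32).requires_grad_(True)
y = x * x  # non-scalar output of shape (4,)

# Calling backward() without arguments on a non-scalar output raises a RuntimeError
try:
    y.backward()
except RuntimeError as err:
    print("RuntimeError:", err)

# Option 1: reduce to a scalar first; d(sum(x * x))/dx = 2x
y.sum().backward()
grad_via_sum = x.grad.clone()

# Option 2: pass an explicit gradient of ones with the same shape as y
x.grad = None  # clear the previous result, since gradients accumulate
y = x * x
y.backward(torch.ones_like(y))

print(grad_via_sum, x.grad)
```

Passing a vector of ones as the *gradient* argument is equivalent to differentiating *y.sum()*.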

**Computing the gradient of Python control flow**

Autograd also works through Python functions and control flow (loops and conditionals): the graph records whichever operations actually run for a given input.

In [5]:
def f(a):
    b = a * 2
    while b.norm().item() < 1000:
        b = b * 2
    if b.sum().item() > 0:
        c = b
    else:
        c = 100 * b
    return c

**Function behavior depends on inputs**

In [6]:
a = torch.randn((1,), requires_grad=True)
with torch.enable_grad():
    d = f(a)
d.backward()

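**Verify the results**

$f$ is piecewise linear in its input $a$: for any $a$ there is a scalar $g$ such that $f(a) = ga$, and hence $\frac{\partial f}{\partial a}=g$. Here $g$ is either $2^k$ or $100 \cdot 2^k$, where both the total number of doublings $k$ and the branch taken depend on $a$. Verify the result: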

In [7]:
# Compare with a tolerance: d / a can differ from g by floating-point rounding
print(torch.isclose(a.grad, d / a))

tensor([True])
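To see that the gradient really depends on the input, the sketch below (with two inputs picked for illustration) repeats *f* and differentiates it at $a = 0.5$ and $a = -0.5$. Both inputs need the same number of doublings, so $g = 2^{11} = 2048$ for the positive input, while the negative one goes through the *else* branch and gets $g = 100 \cdot 2048 = 204800$:

```python
import torch

def f(a):
    b = a * 2
    while b.norm().item() < 1000:
        b = b * 2
    if b.sum().item() > 0:
        c = b
    else:
        c = 100 * b
    return c

grads = {}
for value in (0.5, -0.5):
    a = torch.tensor([value], requires_grad=True)
    d = f(a)
    d.backward()
    grads[value] = a.grad.item()  # the slope g of the piece that a falls in
    print(value, grads[value])
```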


**Head gradients and the chain rule**

We can also split the chain rule into pieces and apply it manually. By the chain rule, $\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} \frac{\partial y}{\partial x}$. Calling *y.backward()* only propagates from $y$, so it only accounts for $\frac{\partial y}{\partial x}$. To get $\frac{\partial z}{\partial x}$, we can first compute $\frac{\partial z}{\partial y}$ and then pass it as the head gradient to *y.backward* through its *gradient* argument.
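A minimal sketch of passing a head gradient, assuming the illustrative choices $\mathbf{y} = 2\mathbf{x}$ and $z = \sum_i y_i^2$, so that $\frac{\partial z}{\partial \mathbf{y}} = 2\mathbf{y}$ and the expected result is $\frac{\partial z}{\partial \mathbf{x}} = 2\mathbf{y} \cdot 2 = 8\mathbf{x}$. Detaching $y$ keeps the computation of $z$ out of $x$'s graph:

```python
import torch

x = torch.arange(4, dtype=torch.float32).requires_grad_(True)
y = 2 * x  # dy/dx = 2 elementwise

# Compute dz/dy on a detached copy of y, so z's graph stops at y
y_detached = y.detach().requires_grad_(True)
z = (y_detached ** 2).sum()  # dz/dy = 2y
z.backward()
head_grad = y_detached.grad  # dz/dy

# Pass dz/dy as the head gradient; autograd multiplies it by dy/dx
y.backward(gradient=head_grad)
print(x.grad)  # dz/dx = 8x
```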