# Autograd: automatic differentiation

The ``autograd`` package provides automatic differentiation for all operations
on Tensors. It is a define-by-run framework, which means that your backprop is
defined by how your code is run, and that every single iteration can be
different.
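
As a rough illustration of what define-by-run means, the sketch below records a different graph depending on the runtime value of the input. The function `f` and the tensors `x_pos` and `x_neg` are hypothetical names made up for this example.

```python
import torch

def f(x):
    # The branch taken depends on the runtime value of x,
    # so the graph recorded by autograd can differ between calls.
    if x.sum() > 0:
        return (x ** 2).sum()
    return (x * 3).sum()

x_pos = torch.tensor([1.0, 2.0], requires_grad=True)
x_neg = torch.tensor([-1.0, -2.0], requires_grad=True)

f(x_pos).backward()  # graph of sum(x ** 2), gradient is 2 * x
f(x_neg).backward()  # graph of sum(3 * x), gradient is 3 everywhere

print(x_pos.grad)
print(x_neg.grad)
```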

In [None]:
import torch

Create a tensor:

In [None]:
# Create a 2x2 tensor with gradient-accumulation capabilities
x = torch.tensor([[1, 2], [3, 4]], requires_grad=True, dtype=torch.float32)
print(x)

Do an operation on the tensor:

In [None]:
# Subtract 2 from every element
y = x - 2
print(y)

``y`` was created as a result of an operation, so it has a ``grad_fn``.



In [None]:
print(y.grad_fn)

In [None]:
# What's happening here?
print(x.grad_fn)

In [None]:
# Let's dig further...
y.grad_fn

In [None]:
y.grad_fn.next_functions[0][0]

In [None]:
y.grad_fn.next_functions[0][0].variable
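
The cells above step through the graph by hand. A minimal sketch of the same idea as a recursive walk follows; the helper `walk` is a hypothetical name, not part of PyTorch. It relies only on `grad_fn`, `next_functions`, and the `variable` attribute already used above.

```python
import torch

x = torch.tensor([[1, 2], [3, 4]], requires_grad=True, dtype=torch.float32)
y = x - 2

def walk(fn, depth=0):
    """Print a grad_fn node, then recurse into the nodes it passes gradients to."""
    if fn is None:  # inputs that do not require gradients (the constant 2) appear as None
        return
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1)

walk(y.grad_fn)

# x was created by the user, not by an operation, so it is a leaf without a grad_fn.
print(x.is_leaf, x.grad_fn)

# The node at the end of the chain accumulates gradients into x.
leaf_node = y.grad_fn.next_functions[0][0]
print(leaf_node.variable.data_ptr() == x.data_ptr())
```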

In [None]:
# Do more operations on y
z = y * y * 3
a = z.mean()  # average

print(z)
print(a)

In [None]:
# Let's visualise the computational graph! (thanks @szagoruyko)
# Requires the separate torchviz package
from torchviz import make_dot

In [None]:
make_dot(a)

## Gradients

Let's backprop now. Because `a` is a scalar, `a.backward()` is equivalent to `a.backward(torch.tensor(1.0))`.

In [None]:
# Backprop
a.backward()
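
To make the implicit seed gradient concrete, here is a small sketch that runs the backward pass both ways and compares the results. The helper `scalar_output` is a hypothetical name that rebuilds the same computation as above, so that each call to `backward()` gets a fresh graph.

```python
import torch

x = torch.tensor([[1, 2], [3, 4]], requires_grad=True, dtype=torch.float32)

def scalar_output(t):
    # Same computation as in the notebook: a = mean(3 * (t - 2)^2)
    y = t - 2
    return (y * y * 3).mean()

# Implicit seed: backward() on a scalar uses a gradient of 1
scalar_output(x).backward()
grad_implicit = x.grad.clone()

# Explicit seed: pass the 0-dimensional tensor 1.0 ourselves
x.grad = None  # reset so the two runs do not accumulate
scalar_output(x).backward(torch.tensor(1.0))
grad_explicit = x.grad.clone()

print(torch.equal(grad_implicit, grad_explicit))
```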

Print gradients $\frac{\text{d}a}{\text{d}x}$.




In [None]:
# Compute it by hand BEFORE executing this
print(x.grad)
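
For checking your answer: with $y = x - 2$, $z = 3y^2$ and $a$ the mean over the four elements, $a = \frac{1}{4}\sum_i 3(x_i - 2)^2$, so $\frac{\partial a}{\partial x_i} = \frac{3}{2}(x_i - 2)$. The sketch below compares this hand-derived formula with what autograd returns; `by_hand` is a name introduced here for illustration.

```python
import torch

x = torch.tensor([[1, 2], [3, 4]], requires_grad=True, dtype=torch.float32)
y = x - 2
z = y * y * 3
a = z.mean()
a.backward()

# Hand-derived gradient: da/dx_i = 1.5 * (x_i - 2)
by_hand = 1.5 * (x.detach() - 2)

print(by_hand)
print(torch.allclose(x.grad, by_hand))
```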

You can do many crazy things with autograd!
> With Great *Flexibility* Comes Great Responsibility

In [None]:
# Dynamic graphs!
x = torch.randn(3, requires_grad=True)

y = x * 2
i = 0
while y.detach().norm() < 1000:  # detach() keeps the loop test out of the graph
    y = y * 2
    i += 1
print(y)

In [None]:
# If we don't run backward on a scalar, we need to specify the grad_output
# (a tensor with the same shape as y)
gradients = torch.tensor([0.1, 1.0, 0.0001])
y.backward(gradients)

print(x.grad)
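
One way to read the result: `y.backward(gradients)` computes a vector-Jacobian product. Here `y` is just `x` doubled some number of times, so the Jacobian is a power of two times the identity and `x.grad` should be `gradients` scaled by that power. The sketch below checks this under an arbitrary, assumed seed; `doublings` and `v` are names introduced for the example.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability
x = torch.randn(3, requires_grad=True)

y = x * 2
doublings = 1  # counts every multiplication by 2, including the first one
while y.detach().norm() < 1000:
    y = y * 2
    doublings += 1

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)  # computes v^T J with J = 2**doublings * I

expected = v * 2.0 ** doublings
print(torch.allclose(x.grad, expected))
```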

In [None]:
# BEFORE executing this, can you tell what you would expect it to print?
print(i)

## Inference

In [None]:
# n sets the length of the tensors below
n = 3

In [None]:
# Both x and w allow gradient accumulation
x = torch.arange(1., n + 1, requires_grad=True)
w = torch.ones(n, requires_grad=True)
z = w @ x
z.backward()
print(x.grad, w.grad, sep='\n')

In [None]:
# Only w allows gradient accumulation (x.grad stays None)
x = torch.arange(1., n + 1)
w = torch.ones(n, requires_grad=True)
z = w @ x
z.backward()
print(x.grad, w.grad, sep='\n')
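
The word "accumulation" is meant literally: each backward pass adds to the tensor's `.grad` rather than overwriting it. A minimal sketch, reusing the `x` and `w` setup from the cell above (`first` and `second` are names introduced here):

```python
import torch

n = 3
x = torch.arange(1., n + 1)            # does not require gradients
w = torch.ones(n, requires_grad=True)  # requires gradients

(w @ x).backward()
first = w.grad.clone()    # d(w @ x)/dw = x

(w @ x).backward()        # a second pass adds to the stored gradient
second = w.grad.clone()   # now 2 * x

w.grad.zero_()            # reset in place before the next pass

print(first, second, w.grad, sep='\n')
print(x.grad)             # None: nothing is accumulated for x
```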

In [None]:
x = torch.arange(1., n + 1)
w = torch.ones(n, requires_grad=True)

# Operations run inside this context are not tracked, so their results do not require gradients
with torch.no_grad():
    z = w @ x

try:
    z.backward()  # PyTorch raises an error here, since z has no grad_fn
except RuntimeError as e:
    print('RuntimeError!!! >:[')
    print(e)
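
To see why the call fails, it can help to inspect the result directly. The sketch below, which assumes the same `x` and `w` as above, compares a product computed inside `torch.no_grad()` with one computed outside it; `z_inference` and `z_tracked` are names introduced for the example.

```python
import torch

n = 3
x = torch.arange(1., n + 1)
w = torch.ones(n, requires_grad=True)

with torch.no_grad():
    z_inference = w @ x   # computed without recording a graph

z_tracked = w @ x         # computed with graph recording enabled

print(z_inference.requires_grad, z_inference.grad_fn)
print(z_tracked.requires_grad, z_tracked.grad_fn)
print(w.requires_grad)    # w itself is unchanged by the context manager
```

The values are the same in both cases; only the bookkeeping needed for `backward()` is skipped, which is usually what you want at inference time.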

## More stuff

Documentation of the automatic differentiation package is at
http://pytorch.org/docs/autograd.