# Torch autodiff introduction
## Autodiff
Load the required libraries.
$\newcommand\p[1]{{\left(#1\right)}}$
$\newcommand\code[1]{\texttt{#1}}$

In [1]:
import numpy as np

In [2]:
import torch
import torch.optim as optim

Here is a simple example of how to find the minimum of the function
$x\mapsto\p{x-3}^2$ using the autodiff functionality of PyTorch.

First initialize a tensor `x` and pass `requires_grad=True`, so that
autograd tracks the operations applied to it and stores its gradient
in `x.grad`.

In [3]:
x = torch.tensor([1.0], requires_grad=True)
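
As a quick check (a standalone sketch, not part of the original notebook), the state of such a tensor can be inspected right after creation. It is a leaf of the graph, it is tracked by autograd, and it holds no gradient until a backward pass has run.

```python
import torch

x = torch.tensor([1.0], requires_grad=True)

# The tensor is tracked by autograd and is a leaf of the graph.
print(x.requires_grad)
print(x.is_leaf)

# No backward pass has run yet, so no gradient is stored.
print(x.grad)
```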

Create an optimizer over the parameters to update. Here we optimize
with respect to `x` only, using SGD with a learning rate of 0.01:

In [4]:
optimizer = optim.SGD([x], lr=0.01)

Build a computational graph from the parameters (here only `x`) and,
potentially, other tensors.

Here we only want to compute $\p{x-3}^2$, so we define:

In [5]:
y = (x - 3) ** 2
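
The graph is recorded on the result itself. The following standalone sketch (an illustration, not from the original notebook) shows that `y` carries a `grad_fn` linking it back to `x`, and that, unlike `x`, it is not a leaf.

```python
import torch

x = torch.tensor([1.0], requires_grad=True)
y = (x - 3) ** 2

# y holds the value (1 - 3)^2 = 4 and remembers how it was computed.
print(y.item())
print(y.requires_grad)

# grad_fn is the graph node that produced y; leaves such as x have none.
print(y.grad_fn)
print(x.grad_fn)
print(y.is_leaf)
```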

Back-propagate the gradients from `y` down to `x`. Reset the gradients
first, because `backward()` adds to whatever is already stored in
`x.grad`.

In [6]:
optimizer.zero_grad()

In [7]:
y.backward()
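
To see why the reset matters, here is a small standalone sketch (not part of the original notebook; the names `first`, `accumulated` and `after_reset` are illustrative). It calls `backward()` twice without resetting, then once more after clearing the stored gradient in place.

```python
import torch

x = torch.tensor([1.0], requires_grad=True)

# First backward pass: x.grad = 2 * (1 - 3) = -4
((x - 3) ** 2).backward()
first = x.grad.clone()

# Second backward pass without a reset: the new gradient is added to the old one.
((x - 3) ** 2).backward()
accumulated = x.grad.clone()

# Clear the stored gradient in place, then run backward again.
x.grad.zero_()
((x - 3) ** 2).backward()
after_reset = x.grad.clone()

print(first, accumulated, after_reset)
```

Calling `optimizer.zero_grad()` plays the same role for every parameter the optimizer manages.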

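Use the gradient stored on `x` to apply one step of gradient descent.
Before the step, the gradient is $2\p{x-3}=-4$ at $x=1$ and `x` is
still unchanged; after the step, `x` has moved and the gradient is
kept as is.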

In [8]:
x.grad

tensor([-4.])

In [9]:
x

tensor([1.], requires_grad=True)

In [10]:
optimizer.step()

In [11]:
x.grad

tensor([-4.])

In [12]:
x

tensor([1.0400], requires_grad=True)
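
The new value 1.04 can be reproduced by hand. The sketch below is standalone and assumes the default `optim.SGD` settings used above (no momentum, no weight decay), for which the update is simply `x - lr * x.grad`. The name `x_new` is illustrative.

```python
import torch

x = torch.tensor([1.0], requires_grad=True)
lr = 0.01

y = (x - 3) ** 2
y.backward()  # x.grad now holds dy/dx = 2 * (x - 3) = -4

# Plain gradient-descent update done by hand; no_grad keeps it out of the graph.
with torch.no_grad():
    x_new = x - lr * x.grad

print(x.grad)
print(x_new)  # 1 - 0.01 * (-4) = 1.04
```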

Finally, we iterate the whole process:

In [13]:
x = torch.tensor([1.0], requires_grad=True)
optimizer = optim.SGD([x], lr=0.01)
for it in range(1000):
    loss = (x - 3) ** 2
    # reset the gradients, otherwise backward() accumulates them
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if it % 20 == 0:
        print('Iteration: %d, x: %f, loss: %f' % (it, x.item(), loss.item()))

Iteration: 0, x: 1.040000, loss: 4.000000
Iteration: 20, x: 1.691488, loss: 1.782802
Iteration: 40, x: 2.126427, loss: 0.794596
Iteration: 60, x: 2.416796, loss: 0.354152
Iteration: 80, x: 2.610648, loss: 0.157846
Iteration: 100, x: 2.740066, loss: 0.070352
Iteration: 120, x: 2.826466, loss: 0.031356
Iteration: 140, x: 2.884147, loss: 0.013975
Iteration: 160, x: 2.922656, loss: 0.006229
Iteration: 180, x: 2.948364, loss: 0.002776
Iteration: 200, x: 2.965527, loss: 0.001237
Iteration: 220, x: 2.976985, loss: 0.000552
Iteration: 240, x: 2.984635, loss: 0.000246
Iteration: 260, x: 2.989742, loss: 0.000110
Iteration: 280, x: 2.993152, loss: 0.000049
Iteration: 300, x: 2.995428, loss: 0.000022
Iteration: 320, x: 2.996948, loss: 0.000010
Iteration: 340, x: 2.997962, loss: 0.000004
Iteration: 360, x: 2.998639, loss: 0.000002
Iteration: 380, x: 2.999091, loss: 0.000001
Iteration: 400, x: 2.999393, loss: 0.000000
Iteration: 420, x: 2.999595, loss: 0.000000
Iteration: 440, x: 2.999730, loss: 0.0
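
The printed values can be checked against a closed form. Writing $\eta$ for the learning rate and $x_t$ for the value of `x` after $t$ updates (notation introduced here, not in the original), one plain gradient-descent step on $\p{x-3}^2$ reads
\\[x_{t+1}=x_t-2\eta\p{x_t-3}\quad\Longrightarrow\quad x_{t+1}-3=\p{1-2\eta}\p{x_t-3}\\]
so the distance to the minimizer shrinks by a constant factor at every step. With $x_0=1$ and $\eta=0.01$ this gives
\\[x_t=3-2\cdot 0.98^{t}\\]
The line printed at iteration 20 shows `x` after 21 updates, and $3-2\cdot 0.98^{21}\approx 1.6915$, in agreement with the log. The loss on that line is computed before the update, $4\cdot 0.98^{40}\approx 1.7828$.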

## Differentiate the exponential
The exponential function can be approximated by the first $N$ terms of
its Taylor expansion:
\\[\exp\p{z}\approx\sum_{k=0}^{N-1}\frac{z^k}{k!}\\]

First define the "parameter" `z`, then build a computational graph
from it that computes the truncated series.

In [14]:
z = torch.tensor([1.0], requires_grad=True)
N = 10
fk = 1    # running factorial k!
zk = 1    # running power z**k
expz = 0  # partial sum of the series
for k in range(N):
    expz = expz + zk / fk
    zk = zk * z
    fk = fk * (k + 1)

Compute the gradient and check it: since $\exp$ is its own derivative,
the gradient at $z=1$ should be close to $e$.

In [15]:
expz.backward()
z.grad

tensor([2.7183])

In [16]:
np.exp(1)

2.718281828459045
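
The comparison can also be made programmatically. The following standalone sketch (an illustration, not the notebook's own code) rebuilds the same truncated series with `math.factorial`, differentiates it, and compares the result with the gradient of the built-in `torch.exp`. The tolerance `1e-4` is an arbitrary choice that leaves room for the truncation error; `z_ref` is an illustrative name.

```python
import math
import torch

z = torch.tensor([1.0], requires_grad=True)
N = 10

# Truncated series: sum of z**k / k! for k = 0 .. N-1
expz = sum(z ** k / math.factorial(k) for k in range(N))
expz.backward()

# Reference: gradient of the built-in exponential at the same point
z_ref = torch.tensor([1.0], requires_grad=True)
torch.exp(z_ref).backward()

print(z.grad, z_ref.grad)
print(torch.allclose(z.grad, z_ref.grad, atol=1e-4))
```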

## Solving equations with PyTorch
Suppose we want to solve the following system of two equations:
\\[e^{-e^{-(x_1 + x_2)}} = x_2 (1 + x_1^2)\\]
\\[x_1 \cos(x_2) + x_2 \sin(x_1) = 1/2\\]

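Find a loss whose minimization leads to a solution of the system of
equations above. Moving every term to the left-hand side gives two
residuals, `f1` and `f2`; the sum of their squares is nonnegative and
vanishes exactly at a solution of the system.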

In [17]:
def f1(x1, x2):
    return torch.exp(-torch.exp(-(x1 + x2))) - x2 * (1 + x1 ** 2)

def f2(x1, x2):
    return x1 * torch.cos(x2) + x2 * torch.sin(x1) - 0.5

x1 = torch.tensor([0.0], requires_grad=True)
x2 = torch.tensor([0.0], requires_grad=True)

loss = f1(x1, x2)**2 + f2(x1, x2)**2
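
As a sanity check on this loss, the residuals can be evaluated at the starting point $(0, 0)$. The sketch below is standalone and repeats the two functions so that it runs on its own; `r1` and `r2` are illustrative names. At the origin the first residual is $e^{-1}$ and the second is $-1/2$, so the loss is $e^{-2}+1/4\approx 0.3853$.

```python
import torch

def f1(x1, x2):
    return torch.exp(-torch.exp(-(x1 + x2))) - x2 * (1 + x1 ** 2)

def f2(x1, x2):
    return x1 * torch.cos(x2) + x2 * torch.sin(x1) - 0.5

# Starting point; no gradient is needed just to evaluate the residuals.
x1 = torch.tensor([0.0])
x2 = torch.tensor([0.0])

r1, r2 = f1(x1, x2), f2(x1, x2)
loss = r1 ** 2 + r2 ** 2
print(r1.item(), r2.item(), loss.item())
```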

Use PyTorch autodiff to minimize this loss and thereby solve the system
of equations:

In [18]:
optimizer = optim.SGD([x1, x2], lr=0.01)
for it in range(1000):
    loss = f1(x1, x2)**2 + f2(x1, x2)**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if it % 20 == 0:
        print('Iteration: %d, x1: %f, x2: %f, loss: %f'
              % (it, x1.item(), x2.item(), loss.item()))

Iteration: 0, x1: 0.007293, x2: 0.004651, loss: 0.385335
Iteration: 20, x1: 0.134238, x2: 0.107664, loss: 0.250012
Iteration: 40, x1: 0.229074, x2: 0.214341, loss: 0.146439
Iteration: 60, x1: 0.294770, x2: 0.309024, loss: 0.078318
Iteration: 80, x1: 0.335219, x2: 0.384967, loss: 0.040020
Iteration: 100, x1: 0.356608, x2: 0.442270, loss: 0.020582
Iteration: 120, x1: 0.365716, x2: 0.484327, loss: 0.010968
Iteration: 140, x1: 0.368040, x2: 0.515040, loss: 0.006060
Iteration: 160, x1: 0.367179, x2: 0.537610, loss: 0.003429
Iteration: 180, x1: 0.365141, x2: 0.554350, loss: 0.001966
Iteration: 200, x1: 0.362903, x2: 0.566874, loss: 0.001134
Iteration: 220, x1: 0.360872, x2: 0.576303, loss: 0.000656
Iteration: 240, x1: 0.359175, x2: 0.583435, loss: 0.000380
Iteration: 260, x1: 0.357814, x2: 0.588844, loss: 0.000221
Iteration: 280, x1: 0.356749, x2: 0.592954, loss: 0.000128
Iteration: 300, x1: 0.355924, x2: 0.596081, loss: 0.000074
Iteration: 320, x1: 0.355290, x2: 0.598461, loss: 0.000043
Ite