# Gradients with PyTorch

PyTorch provides two functions for computing gradients:

`torch.autograd.backward()`

`torch.autograd.grad()`
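
A minimal sketch contrasting the two, assuming a toy function $y = x^2$ that is not part of the original: `torch.autograd.grad` returns the gradients directly as a tuple, while `torch.autograd.backward` stores them in the `.grad` attribute of the leaf tensors.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# torch.autograd.grad returns a tuple of gradients and leaves x.grad untouched
y = x ** 2                          # dy/dx = 2x = 6 at x = 3
(dy_dx,) = torch.autograd.grad(y, x)
print(dy_dx)                        # tensor(6.)
print(x.grad)                       # None

# torch.autograd.backward writes the gradient into x.grad instead.
# The graph is rebuilt because autograd.grad freed the previous one.
y = x ** 2
torch.autograd.backward(y)
print(x.grad)                       # tensor(6.)
```

`torch.autograd.grad` is handy when you only need a gradient value; `backward` is the usual choice when training, since optimizers read `.grad`.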

**Autograd** provides automatic differentiation for all operations on Tensors. When a Tensor's `requires_grad` attribute is `True`, autograd records every operation performed on it in a computation graph, and each resulting tensor keeps a reference to the function that created it in its `grad_fn` attribute. Once the computation is finished, you can call `.backward()` to have all the gradients computed automatically. The gradients for a tensor are accumulated into its `.grad` attribute.
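
A short sketch of the two behaviours described above, using illustrative values: operations are only recorded when `requires_grad=True`, and repeated `.backward()` calls add into `.grad` rather than overwriting it.

```python
import torch

a = torch.tensor(2.0)                      # requires_grad defaults to False
b = torch.tensor(2.0, requires_grad=True)

print((3 * a).grad_fn)                     # None: nothing was recorded
print((3 * b).grad_fn is not None)         # True: the multiplication was recorded

# Each backward() call adds d(3b)/db = 3 into b.grad
for _ in range(2):
    (3 * b).backward()
print(b.grad)                              # tensor(6.)

b.grad.zero_()                             # reset the accumulated gradient in place
print(b.grad)                              # tensor(0.)
```

This accumulation is why training loops typically zero the gradients before each backward pass.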

## Brief introduction to back-propagation in one step

We apply a single polynomial function $y=f(x)$ to a tensor $x$. Then we back-propagate and print the gradient $\frac{dy}{dx}$.

$$
\begin{aligned}
\text{Function: } y &= 2x^4 + x^3 + 3x^2 + 5x + 1 \\
\text{Derivative: } y' &= 8x^3 + 3x^2 + 6x + 5
\end{aligned}
$$
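
As a quick sanity check on the derivative, independent of PyTorch, one can compare $y'$ with a central finite difference $\frac{f(x+h)-f(x-h)}{2h}$. The step size `h` and the sample points are arbitrary choices for this sketch.

```python
def f(x):
    return 2 * x**4 + x**3 + 3 * x**2 + 5 * x + 1

def df(x):
    # Hand-derived derivative from the formula above
    return 8 * x**3 + 3 * x**2 + 6 * x + 5

h = 1e-5
for x in [-1.5, 0.0, 2.0]:
    numeric = (f(x + h) - f(x - h)) / (2 * h)
    print(x, df(x), round(numeric, 4))
    assert abs(numeric - df(x)) < 1e-4
```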

```python
# First step: imports
import torch
```

### Creation of the Tensor

```python
# Create a scalar tensor with requires_grad=True so autograd tracks it
x = torch.tensor(2.0, requires_grad=True)
x
```

```
tensor(2., requires_grad=True)
```

### Defining the function

```python
# The same function as in the description
y = 2*x**4 + x**3 + 3*x**2 + 5*x + 1

print(y)
```

```
tensor(63., grad_fn=<AddBackward0>)
```


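Because $y$ was produced by operations on $x$ rather than created directly, it has an associated gradient function, accessible as `y.grad_fn`. The last operation applied was an addition, hence `AddBackward0`.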
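
A small sketch of how to inspect the recorded graph. `next_functions` links each node to the nodes that produced its inputs; the exact objects printed in the loop depend on the PyTorch version, so no output is claimed for it.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = 2*x**4 + x**3 + 3*x**2 + 5*x + 1

print(y.grad_fn)             # <AddBackward0 object at ...>: the final "+ 1"
print(x.grad_fn)             # None: x is a leaf tensor created by the user
print(x.is_leaf, y.is_leaf)  # True False

# Walk one level down the recorded graph
for fn, _ in y.grad_fn.next_functions:
    print(fn)
```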

Substituting $x = 2$ confirms the value of 63:

$$
\begin{aligned}
x &= 2.0 \\
y &= 2(2)^4 + (2)^3 + 3(2)^2 + 5(2) + 1 = 32 + 8 + 12 + 10 + 1 = 63
\end{aligned}
$$

### Running the backward pass

```python
y.backward()
y
```

```
tensor(63., grad_fn=<AddBackward0>)
```
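
By default `backward()` frees the tensors the graph saved for differentiation, so calling it a second time on the same `y` raises a `RuntimeError`. A minimal sketch using `retain_graph=True` to keep the graph, which also shows the second pass accumulating into `x.grad`:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = 2*x**4 + x**3 + 3*x**2 + 5*x + 1

# Keep the graph's saved tensors so backward() can run again
y.backward(retain_graph=True)
print(x.grad)   # tensor(93.)
y.backward()
print(x.grad)   # tensor(186.): the second pass was added to the first
```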

### Check the resulting gradient of x

```python
x.grad
```

```
tensor(93.)
```

Because `x.grad` is an attribute of $x$ rather than a method, we access it without parentheses. Its value is the derivative $y'$ evaluated at $x = 2$:

$$
y' = 8(2)^3 + 3(2)^2 + 6(2) + 5 = 64 + 12 + 12 + 5 = 93
$$
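
To check that autograd agrees with the hand-derived $y'$ at more than one point, a sketch with a few arbitrary sample values. Summing the outputs gives a scalar to call `backward()` on; since each element of the result depends only on its own input, each entry of `xs.grad` is $y'$ at that point.

```python
import torch

def f(x):
    return 2*x**4 + x**3 + 3*x**2 + 5*x + 1

xs = torch.tensor([-1.0, 0.0, 2.0, 3.0], requires_grad=True)
f(xs).sum().backward()

v = xs.detach()
expected = 8*v**3 + 3*v**2 + 6*v + 5   # -6, 5, 93, 266
print(xs.grad)
assert torch.allclose(xs.grad, expected)
```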

## Back-propagation over multiple steps

We now include layers `y` and `z` between `x` and our output layer.

```python
# Creating a new 2x3 tensor
x = torch.tensor([[1., 2, 3], [3, 2, 1]], requires_grad=True)
x
```

```
tensor([[1., 2., 3.],
        [3., 2., 1.]], requires_grad=True)
```

First layer:

$$
y = 3x + 2
$$

```python
y = 3*x + 2
y
```

```
tensor([[ 5.,  8., 11.],
        [11.,  8.,  5.]], grad_fn=<AddBackward0>)
```

Second layer:

$$
z = 2y^2
$$

```python
# z layer
z = 2 * y**2
print(z)
```

```
tensor([[ 50., 128., 242.],
        [242., 128.,  50.]], grad_fn=<MulBackward0>)
```


Let the output be the mean of $z$:

```python
output = z.mean()
output
```

```
tensor(140., grad_fn=<MeanBackward0>)
```
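
`backward()` without arguments only works on a scalar output, which is one reason to reduce $z$ with `mean()`. A sketch of the alternative: calling `z.backward()` on the 6-element tensor raises a `RuntimeError`, while passing a `gradient` tensor equal to $\partial o / \partial z_i = 1/6$ reproduces what `output.backward()` computes.

```python
import torch

x = torch.tensor([[1., 2, 3], [3, 2, 1]], requires_grad=True)
z = 2 * (3*x + 2)**2

try:
    z.backward()                    # z has 6 elements, not 1
except RuntimeError as err:
    print("RuntimeError:", err)

# Supplying d(output)/dz explicitly: the mean weights every element by 1/6
z.backward(gradient=torch.full_like(z, 1 / 6))
print(x.grad)                       # approximately [[10, 16, 22], [22, 16, 10]]
```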

## Performing back-propagation

Find the gradient of the output with respect to $x$:

```python
output.backward()

# x.grad now holds d(output)/dx, one entry per element of x
x.grad
```

```
tensor([[10., 16., 22.],
        [22., 16., 10.]])
```

Autograd computes the partial derivative of the output $o$ with respect to each element $x_i$. With $n = 6$ elements:

$$
\begin{aligned}
o &= \frac{1}{n} \sum_{i=1}^{n} z_i \\
z_i &= 2y_i^2 = 2(3x_i + 2)^2
\end{aligned}
$$

To differentiate $z_i$ we use the chain rule: the derivative of $f(g(x))$ is $f'(g(x))\,g'(x)$.

In this case:

$$
\begin{aligned}
f(g(x)) &= 2(g(x))^2, & f'(g(x)) &= 4g(x) \\
g(x) &= 3x + 2, & g'(x) &= 3 \\
\frac{dz}{dx} &= 4g(x) \times 3 = 12(3x + 2)
\end{aligned}
$$
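
The individual chain-rule factors can be inspected with `retain_grad()`, which asks autograd to keep `.grad` on intermediate (non-leaf) tensors such as `y` and `z`. A sketch that rebuilds the same layers and checks each factor against the formulas above:

```python
import torch

x = torch.tensor([[1., 2, 3], [3, 2, 1]], requires_grad=True)
y = 3*x + 2
z = 2 * y**2
output = z.mean()

# Intermediate tensors do not keep .grad unless asked to
y.retain_grad()
z.retain_grad()
output.backward()

n = z.numel()                                               # 6 elements
assert torch.allclose(z.grad, torch.full_like(z, 1 / n))    # do/dz_i = 1/n
assert torch.allclose(y.grad, z.grad * 4 * y.detach())      # dz_i/dy_i = 4 y_i
assert torch.allclose(x.grad, y.grad * 3)                   # dy_i/dx_i = 3
assert torch.allclose(x.grad, 2 * (3 * x.detach() + 2))     # closed form 2(3x_i + 2)
print(x.grad)
```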

Because only $z_i$ depends on $x_i$, every other term of the sum vanishes, so:

$\frac{\partial o}{\partial x_i} = \frac{1}{6} \cdot 12(3x_i + 2) = 2(3x_i + 2)$

$\frac{\partial o}{\partial x_i} \bigr\rvert_{x_i=1} = 2(3(1) + 2) = 10$

$\frac{\partial o}{\partial x_i} \bigr\rvert_{x_i=2} = 2(3(2) + 2) = 16$

$\frac{\partial o}{\partial x_i} \bigr\rvert_{x_i=3} = 2(3(3) + 2) = 22$

