# PyTorch Autograd

References:

[Autograd API Doc](https://pytorch.org/docs/stable/autograd.html)

[Backpropagation](https://medium.com/ai-academy-taiwan/bacn-propagation-3946e8ed8c55)

During neural network training, gradient descent is a common way of determining how the weights within a neural network should be adjusted. The partial derivative of the loss function with respect to a given weight tells us which direction to adjust that weight: the derivative points toward increasing loss, so the weight is moved in the opposite direction. When the partial derivative reaches 0, the weight is at a stationary point, which may (but need not) be its optimal value.
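As a minimal sketch of this idea (the one-weight model `w * x`, the data and the learning rate are all made up for illustration), autograd can supply $\frac{\partial L}{\partial w}$ and a plain gradient-descent loop moves the weight against it until the derivative is close to 0:

```python
import torch

# Hypothetical one-weight model: prediction = w * x, loss L = 0.5 * (w * x - target)^2
x = torch.tensor(2.0)
target = torch.tensor(6.0)
w = torch.tensor(0.0, requires_grad=True)   # dL/dw = (w * x - target) * x is 0 at w = 3
lr = 0.1                                     # learning rate, chosen for illustration

for step in range(50):
    loss = 0.5 * (w * x - target) ** 2
    loss.backward()              # autograd stores dL/dw in w.grad
    with torch.no_grad():
        w -= lr * w.grad         # move against the gradient (downhill)
    w.grad.zero_()               # gradients accumulate, so reset before the next step

print(w.item())                  # approaches 3.0
```

The update is done under `torch.no_grad()` so that the step itself is not recorded in the graph.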

Consider the following simple neural network with 2 input features and 2 layers:

<img src="img/SimpleNeuralNetwork.jpg" width="900" height="400">


To find the impact of $w^{1}_{11}$ on the loss function, calculate the partial derivative $\frac{\partial L}{\partial w^{1}_{11}}$:

\begin{align}
&\hat{y}: \text{ground truth}, \quad y: \text{network output}\\
&L = \frac{1}{2n} \sum_{k=1}^{n} (y_k - \hat{y}_k)^2, \quad n: \text{batch size}\\
&\text{Assume batch size } n = 1 \text{, then } L = \frac{1}{2}(y - \hat{y})^2
\end{align}
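The same loss can be written directly in PyTorch. This sketch uses made-up values for a batch of $n = 3$ and checks that autograd gives $\frac{\partial L}{\partial y_k} = \frac{y_k - \hat{y}_k}{n}$, which reduces to $y - \hat{y}$ when $n = 1$. It also compares against `torch.nn.functional.mse_loss`, whose default `reduction='mean'` divides by $n$, so half of it equals $L$:

```python
import torch
import torch.nn.functional as F

# Made-up batch of n = 3: y is the network output, y_hat the ground truth
y = torch.tensor([0.8, 0.2, 0.5], requires_grad=True)
y_hat = torch.tensor([1.0, 0.0, 0.5])
n = y.numel()

# L = 1/(2n) * sum_k (y_k - y_hat_k)^2
L = ((y - y_hat) ** 2).sum() / (2 * n)
L.backward()

# mse_loss with the default reduction='mean' divides by n, so half of it is the same L
L_builtin = 0.5 * F.mse_loss(y, y_hat)

print(L.item(), L_builtin.item())
print(y.grad)                      # autograd result: (y - y_hat) / n
print((y - y_hat).detach() / n)    # analytic gradient for comparison
```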

\begin{align}
\frac{\partial L}{\partial w^{1}_{11}} &= \frac{\partial L}{\partial z^3_1} \frac{\partial z^3_1}{\partial w^{1}_{11}}\\
&= \frac{\partial L}{\partial z^3_1} \left(\frac{\partial z^3_1}{\partial a^2_1} \frac{\partial a^2_1}{\partial z^2_1} \frac{\partial z^2_1}{\partial a^1_1} \frac{\partial a^1_1}{\partial z^1_1} \frac{\partial z^1_1}{\partial w^1_{11}} + \frac{\partial z^3_1}{\partial a^2_2} \frac{\partial a^2_2}{\partial z^2_2} \frac{\partial z^2_2}{\partial a^1_1} \frac{\partial a^1_1}{\partial z^1_1} \frac{\partial z^1_1}{\partial w^1_{11}}\right)\\
&= \frac{\partial L}{\partial z^3_1} \left(\frac{\partial z^3_1}{\partial a^2_1} \frac{\partial a^2_1}{\partial z^2_1} \frac{\partial z^2_1}{\partial a^1_1} + \frac{\partial z^3_1}{\partial a^2_2} \frac{\partial a^2_2}{\partial z^2_2} \frac{\partial z^2_2}{\partial a^1_1}\right) \left(\frac{\partial a^1_1}{\partial z^1_1} \frac{\partial z^1_1}{\partial w^1_{11}}\right)\\
&= \frac{\partial L}{\partial z^3_1} \left(\sum_{j=1}^{2}\frac{\partial z^3_1}{\partial a^2_j} \frac{\partial a^2_j}{\partial z^2_j} \frac{\partial z^2_j}{\partial a^1_1}\right) \left(\frac{\partial a^1_1}{\partial z^1_1} \frac{\partial z^1_1}{\partial w^1_{11}}\right)\\
&= \frac{\partial L}{\partial y} \left(\sum_{j=1}^{2}\frac{\partial z^3_1}{\partial a^2_j} \frac{\partial a^2_j}{\partial z^2_j} \frac{\partial z^2_j}{\partial a^1_1}\right) \left(\frac{\partial a^1_1}{\partial z^1_1} \frac{\partial z^1_1}{\partial w^1_{11}}\right) && \text{since } y = z^3_1
\end{align}
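The sum in the expansion reflects the multivariable chain rule: when one value feeds several downstream nodes, its derivative is the sum of the products along every path. Autograd does this accumulation automatically. A tiny scalar sketch (the functions and constants are arbitrary, chosen for easy arithmetic):

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b1 = 3.0 * a                 # path 1: db1/da = 3
b2 = a ** 2                  # path 2: db2/da = 2a = 4
out = 5.0 * b1 + 7.0 * b2    # dout/db1 = 5, dout/db2 = 7
out.backward()

# Sum over both paths: 5 * 3 + 7 * 4 = 43
print(a.grad)                # tensor(43.)
```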

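\begin{align}
&\text{Substitute, assuming no bias terms (as in } z^3_1\text{):}\\
&z^1_1 = w^1_{11} x_1 + w^1_{12} x_2, \quad z^2_j = w^2_{j1} a^1_1 + w^2_{j2} a^1_2\\
&\frac{\partial L}{\partial y} = \frac{\partial}{\partial y}\left[\frac{1}{2}(y-\hat{y})^2\right] = y - \hat{y}\\
&\frac{\partial z^3_1}{\partial a^2_j} = \frac{\partial}{\partial a^2_j}\left(w^3_{11} a^2_1 + w^3_{12} a^2_2\right) = w^3_{1j}\\
&\frac{\partial a^l_j}{\partial z^l_j} = \frac{\partial f(z^l_j)}{\partial z^l_j} = f'(z^l_j)\\
&\frac{\partial z^2_j}{\partial a^1_1} = w^2_{j1}, \quad \frac{\partial z^1_1}{\partial w^1_{11}} = x_1
\end{align}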
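The activation $f$ is left unspecified here. Assuming, for illustration only, that $f$ is the sigmoid, its derivative has the closed form $f'(z) = f(z)\,(1 - f(z))$, which can be checked against autograd element by element:

```python
import torch

z = torch.linspace(-3.0, 3.0, steps=7, requires_grad=True)
a = torch.sigmoid(z)                 # f assumed to be the sigmoid
# Each a_i depends only on z_i, so backpropagating ones gives z.grad[i] = f'(z_i)
a.backward(torch.ones_like(a))
analytic = (a * (1 - a)).detach()    # sigmoid'(z) = f(z) * (1 - f(z))

print(z.grad)
print(analytic)
```

Writing $f'$ in terms of the already computed activation $a = f(z)$ is what makes the backward pass cheap for sigmoid.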

\begin{align}
\frac{\partial L}{\partial w^{1}_{11}} = (y - \hat{y}) \left(\sum_{j=1}^{2} w^3_{1j}\, f'(z^2_j)\, w^2_{j1}\right) f'(z^1_1)\, x_1
\end{align}
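The hand-derived gradient $\frac{\partial L}{\partial w^{1}_{11}} = (y - \hat{y}) \left(\sum_{j} w^3_{1j} f'(z^2_j) w^2_{j1}\right) f'(z^1_1)\, x_1$ can be checked numerically against autograd. This sketch assumes a 2 → 2 → 2 → 1 network with no biases, a linear output $y = z^3_1$, a sigmoid activation, and `W[l][j, i]` standing for $w^l_{ji}$; the weights, inputs and target are random or made up:

```python
import torch

torch.manual_seed(0)

def f(z):
    return torch.sigmoid(z)        # activation f, assumed to be the sigmoid

def f_prime_from_a(a):
    return a * (1 - a)             # sigmoid'(z) expressed through a = f(z)

# Assumed layout: 2 inputs -> 2 neurons -> 2 neurons -> 1 linear output, no biases.
# W[j, i] plays the role of w^l_{ji} (weight from unit i into unit j).
x = torch.randn(2)                          # x_1, x_2
y_hat = torch.tensor(1.0)                   # ground truth (hypothetical value)
W1 = torch.randn(2, 2, requires_grad=True)
W2 = torch.randn(2, 2, requires_grad=True)
W3 = torch.randn(1, 2, requires_grad=True)

# Forward pass
z1 = W1 @ x
a1 = f(z1)
z2 = W2 @ a1
a2 = f(z2)
y = (W3 @ a2).squeeze()                     # y = z^3_1, no activation on the output
L = 0.5 * (y - y_hat) ** 2

# Autograd fills W1.grad, W2.grad and W3.grad
L.backward()

# Hand-derived dL/dw^1_{11}; index 0 in code corresponds to index 1 in the math
with torch.no_grad():
    s = sum(W3[0, j] * f_prime_from_a(a2[j]) * W2[j, 0] for j in range(2))
    manual = (y - y_hat) * s * f_prime_from_a(a1[0]) * x[0]

print(W1.grad[0, 0].item(), manual.item())
```

The two printed numbers should agree up to floating-point rounding.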

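```python
import torch

# A leaf tensor that autograd tracks because requires_grad=True
x = torch.arange(0, 10, 1, dtype=torch.float32, requires_grad=True)
print(x)
# Every operation on a tracked tensor records a node (grad_fn) in the graph
x = x + 1   # x now names a non-leaf result, recorded as AddBackward0
print(x)
x **= 2     # in-place power on the non-leaf tensor, recorded as PowBackward0
print(x)
```

```
tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], requires_grad=True)
tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
       grad_fn=<AddBackward0>)
tensor([  1.,   4.,   9.,  16.,  25.,  36.,  49.,  64.,  81., 100.],
       grad_fn=<PowBackward0>)
```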
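Because the cell rebinds `x`, the original leaf tensor no longer has a name, so its `.grad` cannot easily be inspected after `backward()`. A small variant (the names `x0` and `out` are illustrative) keeps the leaf, reduces the result to a scalar and backpropagates. By default `.grad` is only populated for leaf tensors, so `retain_grad()` is used to keep the gradient of the intermediate result too:

```python
import torch

x0 = torch.arange(0, 10, 1, dtype=torch.float32, requires_grad=True)  # leaf, kept by name
x = x0 + 1          # non-leaf, grad_fn=<AddBackward0>
x = x ** 2          # non-leaf, grad_fn=<PowBackward0>
x.retain_grad()     # also keep d(out)/dx for this non-leaf tensor

out = x.sum()       # backward() without arguments needs a scalar output
out.backward()

print(x0.grad)      # d(out)/dx0_i = 2 * (x0_i + 1)
print(x.grad)       # d(out)/dx_i = 1
```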
