# Lecture 2


## Backpropagation
 We mentioned in lecture 1 that PyTorch can compute the gradient of a function built from its differentiable operations. Here the output depends on the inputs indirectly, through intermediate variables.
 To handle this, PyTorch builds a computation graph and applies the chain rule.
 We introduce the idea with an example.

 Consider the following computations:

 $a=2$
 
 $b=4$
 
 $c=a+b$
 
 $d=\log(a)*\log(b)$

 $e=c*d$

 When a gradient is needed, PyTorch records these operations in a **computational graph**, as shown in the figure below.


![Comp-graph-1](comp-graph1.png)


 Suppose that we need to compute the gradient of $e$ with respect to $a$ and $b$. Using the chain rule we know that
 $$\frac{\partial e}{\partial a}=\frac{\partial e}{\partial c}\frac{\partial c}{\partial a}+\frac{\partial e}{\partial d}\frac{\partial d}{\partial a}$$
 $$\frac{\partial e}{\partial b}=\frac{\partial e}{\partial c}\frac{\partial c}{\partial b}+\frac{\partial e}{\partial d}\frac{\partial d}{\partial b}$$
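
 As a worked instance of the chain rule above, and assuming $\log$ denotes the natural logarithm (as ```torch.log``` does), the individual factors and the resulting gradients at $a=2$, $b=4$ are:
 $$\frac{\partial e}{\partial c}=d,\quad \frac{\partial c}{\partial a}=\frac{\partial c}{\partial b}=1,\quad \frac{\partial e}{\partial d}=c,\quad \frac{\partial d}{\partial a}=\frac{\log b}{a},\quad \frac{\partial d}{\partial b}=\frac{\log a}{b}$$
 $$\frac{\partial e}{\partial a}=d+c\,\frac{\log b}{a}\approx 0.961+6\times 0.693\approx 5.12$$
 $$\frac{\partial e}{\partial b}=d+c\,\frac{\log a}{b}\approx 0.961+6\times 0.173\approx 2.00$$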

When the gradient is needed, PyTorch creates auxiliary nodes in the computation graph to help with the gradient computation. Because PyTorch already knows the derivative of every primitive operation it performs, it can build the extended graph shown in the figure below.


![Comp-graph-2](comp-graph2.png)
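
The auxiliary nodes can be inspected from Python. The following is a minimal sketch, assuming a recent PyTorch version; the node names it prints (for example names containing `Add` or `Mul`) are internal identifiers and may differ between versions.

```python
import torch

a = torch.tensor(2., requires_grad=True)
b = torch.tensor(4., requires_grad=True)
c = a + b
d = torch.log(a) * torch.log(b)
e = c * d

# Leaf tensors created by the user have no grad_fn
print(a.is_leaf, a.grad_fn)

# Each non-leaf tensor points to the backward function of the op that produced it
for name, t in [("c", c), ("d", d), ("e", e)]:
    print(name, t.grad_fn.name())

# next_functions links a backward node to the backward nodes of its inputs
parents = [fn.name() for fn, _ in e.grad_fn.next_functions]
print(parents)
```

Here ```e.grad_fn``` plays the role of the auxiliary node attached to $e$, and its ```next_functions``` point toward the nodes for $c$ and $d$.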

Once the graph is built with all the necessary nodes, PyTorch "walks backward" through the computational graph to compute the gradients, as shown in the figure below.

![backprop](backprop.png)
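
To see the intermediate values produced during the backward walk, one option is to ask PyTorch to keep the gradients of the non-leaf tensors $c$ and $d$. This is a small sketch using ```retain_grad()```; by default only leaf tensors such as $a$ and $b$ keep their ```.grad```.

```python
import torch

a = torch.tensor(2., requires_grad=True)
b = torch.tensor(4., requires_grad=True)
c = a + b
d = torch.log(a) * torch.log(b)
e = c * d

# Keep the gradients of the intermediate tensors as well
c.retain_grad()
d.retain_grad()

e.backward()

print(c.grad)  # de/dc, equal to the value of d
print(d.grad)  # de/dd, equal to the value of c
print(a.grad, b.grad)
```

The gradients stored on $c$ and $d$ are the factors $\frac{\partial e}{\partial c}$ and $\frac{\partial e}{\partial d}$ that appear in the chain rule before they are propagated further back to $a$ and $b$.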

Below is the code corresponding to the above computations. Let us consider it step by step.
1. After defining $a$ and $b$, with ```requires_grad=True``` we declare that we will need to compute the gradient at some point.
1. ```c=a+b```: a node is created for $c$ where the operation is ```+``` and the inputs are $a$ and $b$. PyTorch automatically creates two auxiliary nodes (actually they are functions, but it is easier to visualize them as nodes) that compute $\frac{\partial c}{\partial a}$ and $\frac{\partial c}{\partial b}$.
1. $d=\log a*\log b$: as before, PyTorch creates a node for $d$ where the operations are ```*``` and ```log```. (It actually creates a subtree with one node per primitive operation, but for the sake of simplicity we visualize it as a single node.) Crucially, PyTorch already knows the derivative of every primitive operation.
1. ```e=c*d```: this is similar to the second step ```c=a+b```, but the operation is now ```*```.
1. Finally, ```e.backward()``` is called. It starts from node ```e``` and recursively computes the gradients using the already created nodes.
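1. For example, two paths lead from $e$ back to $a$. The path through $c$ contributes $1\times d\times 1$ and the path through $d$ contributes $1\times c\times\frac{\partial d}{\partial a}$. With the natural logarithm used by ```torch.log```, these are approximately $1\times 0.96\times 1$ and $1\times 6\times 0.69$, so ```a.grad``` is about $5.12$. (The values 1x2x1 and 1x6x1.44 arise if $\log$ is read as base 2.)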
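
The contributions of the two paths from $e$ to $a$ can also be checked numerically. The sketch below computes each path by hand, using the natural logarithm as ```torch.log``` does, and compares the sum with what autograd stores in ```a.grad```. The names ```path_via_c``` and ```path_via_d``` are introduced here only for illustration.

```python
import torch

a = torch.tensor(2., requires_grad=True)
b = torch.tensor(4., requires_grad=True)
c = a + b
d = torch.log(a) * torch.log(b)
e = c * d
e.backward()

with torch.no_grad():
    # Path e -> c -> a: (de/de) * (de/dc) * (dc/da)
    path_via_c = 1.0 * d * 1.0
    # Path e -> d -> a: (de/de) * (de/dd) * (dd/da), with dd/da = log(b) / a
    path_via_d = 1.0 * c * (torch.log(b) / a)

print(path_via_c.item(), path_via_d.item())
print((path_via_c + path_via_d).item(), a.grad.item())
```

The two numbers on the last line should agree up to floating-point rounding.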

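```python
import torch

# Leaf tensors; requires_grad=True asks PyTorch to track operations on them
a = torch.tensor(2., requires_grad=True)
b = torch.tensor(4., requires_grad=True)

c = a + b                        # c = 6
d = torch.log(a) * torch.log(b)  # natural logarithms
e = c * d

e.backward()                     # fills a.grad and b.grad
print(a.grad, b.grad)            # approximately 5.1198 and 2.0006
```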
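
As a sanity check on the result, the gradients from ```backward()``` can be compared with a central finite-difference approximation. This is only a sketch: the helper ```f``` and the step size ```h``` are choices made for this illustration, and double precision is used to keep the rounding error small.

```python
import torch

def f(a, b):
    # Same computation as above, written as a function of a and b
    return (a + b) * torch.log(a) * torch.log(b)

a = torch.tensor(2., dtype=torch.float64, requires_grad=True)
b = torch.tensor(4., dtype=torch.float64, requires_grad=True)
f(a, b).backward()

h = 1e-6  # hypothetical step size chosen for this sketch
with torch.no_grad():
    # Central differences: (f(x + h) - f(x - h)) / (2h)
    fd_a = (f(a + h, b) - f(a - h, b)) / (2 * h)
    fd_b = (f(a, b + h) - f(a, b - h)) / (2 * h)

print(a.grad.item(), fd_a.item())
print(b.grad.item(), fd_b.item())
```

Each line prints the autograd value next to its finite-difference estimate; the pairs should match to several decimal places.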