#### ❇️ Automatic differentiation in PyTorch

Backpropagation is widely used for training neural nets. It computes the gradient of the loss function with respect to each model<br> parameter (weights & biases), and the parameters are then adjusted based on these gradients:<br> ∂loss/∂w (gradient w.r.t. weight w) and ∂loss/∂b (gradient w.r.t. bias b).
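To make the "adjust parameters based on the gradient" step concrete, here is a minimal sketch of one plain gradient-descent update on a single scalar parameter. The quadratic loss `(w - 1)**2` and the learning rate `lr = 0.1` are hypothetical choices for illustration only; real training loops usually delegate the update to an optimizer such as `torch.optim.SGD`.

```python
import torch

# One trainable parameter and a toy loss (hypothetical example, minimised at w = 1)
w = torch.tensor(3.0, requires_grad=True)
lr = 0.1  # learning rate, chosen arbitrarily for illustration

loss = (w - 1.0) ** 2   # d(loss)/dw = 2 * (w - 1) = 4.0 at w = 3
loss.backward()         # autograd stores d(loss)/dw in w.grad

with torch.no_grad():   # the update itself must not be recorded in the graph
    w -= lr * w.grad    # w <- w - lr * d(loss)/dw, i.e. 3.0 - 0.1 * 4.0
w.grad.zero_()          # gradients accumulate, so clear them before the next backward()

print(w.item())
```

The update is wrapped in `torch.no_grad()` because an in-place change to a leaf tensor that requires grad is not allowed while autograd is recording.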

##### 🔴 Let's consider the simplest neural net, with 3 inputs and 2 outputs 👇 

![](./resources/basic_neural_net.png)

##### 🟡 Let's code ⬆️ this in PyTorch ⬇️ 

```python
import torch

x = torch.ones(3)   # input tensor
y = torch.zeros(2)  # expected output
# Notice requires_grad=True ⬇️ : autograd tracks every operation on w and b
w = torch.randn(3, 2, requires_grad=True)  # weights
b = torch.randn(2, requires_grad=True)     # biases
z = torch.matmul(x, w) + b                 # output (logits)
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
```
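To see what the loss line computes, the sketch below rebuilds the same forward pass and compares `binary_cross_entropy_with_logits` against its textbook form with the default `reduction="mean"`: loss = mean over outputs of −[y·log σ(z) + (1 − y)·log(1 − σ(z))], where σ is the sigmoid. The fixed seed is an assumption added only to make the run reproducible; the built-in function is preferred in practice because it is more numerically stable than applying `sigmoid` and `log` separately.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # hypothetical seed, only for reproducibility

x = torch.ones(3)                          # 3 inputs
y = torch.zeros(2)                         # 2 expected outputs
w = torch.randn(3, 2, requires_grad=True)  # weights, shape (3, 2)
b = torch.randn(2, requires_grad=True)     # biases, shape (2,)
z = torch.matmul(x, w) + b                 # (3,) @ (3, 2) -> (2,), plus bias

# Textbook binary cross-entropy applied to sigmoid(z), averaged over the 2 outputs
p = torch.sigmoid(z)
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

# Built-in version used in the cell above (default reduction="mean")
builtin = F.binary_cross_entropy_with_logits(z, y)

print(z.shape, manual.item(), builtin.item())
```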

The code above ⬆️ builds the following ⬇️ computational graph.<br>
In this graph, w and b are the parameters we need to optimize.<br>
Therefore, we need to be able to compute the gradients of the loss function with respect to those tensors.<br>
To do that, we set the <b>requires_grad</b> property of those tensors to True.<br>

![](./resources/computational_graph.png)
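The graph's leaves can be inspected directly. In this sketch, tensors created by the user (`x`, `w`, `b`) are leaves, while `z` is produced by an operation and is therefore not a leaf. Only `w` and `b` have `requires_grad=True`, so only they will receive a `.grad` after `backward()`.

```python
import torch

x = torch.ones(3)                          # created by the user, no grad tracking
w = torch.randn(3, 2, requires_grad=True)  # trainable leaf
b = torch.randn(2, requires_grad=True)     # trainable leaf
z = torch.matmul(x, w) + b                 # result of operations: an interior node

for name, t in [("x", x), ("w", w), ("b", b), ("z", z)]:
    print(f"{name}: is_leaf={t.is_leaf}, requires_grad={t.requires_grad}")
```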

##### 🟢 grad_fn:
Every function we apply to tensors to build the computational graph ⬆️ is in fact an object of class Function.<br> This object knows how to compute the function in the forward direction, and also how to compute its<br> derivative during the backward propagation step.<br> A reference to this backward function is stored in the tensor's grad_fn property. Check this out 👇 

```python
print(f"grad_function for z = {z.grad_fn}")
print(f"grad_function for loss = {loss.grad_fn}")
```

```
grad_function for z = <AddBackward0 object at 0x7fd9d92fe748>
grad_function for loss = <BinaryCrossEntropyWithLogitsBackward object at 0x7fd9d92fe5c0>
```
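Each grad_fn node links to the nodes that produced its inputs through `next_functions`, so the whole backward graph can be walked starting from `loss.grad_fn`. The recursive helper `walk` below is a hypothetical illustration, not part of PyTorch. Inputs that do not require grad (`x`, `y`) show up as `None`, and the leaves `w` and `b` end in `AccumulateGrad` nodes, which write into `.grad`. Exact node names depend on the PyTorch version (newer releases print `BinaryCrossEntropyWithLogitsBackward0`, for example).

```python
import torch
import torch.nn.functional as F

x = torch.ones(3)
y = torch.zeros(2)
w = torch.randn(3, 2, requires_grad=True)
b = torch.randn(2, requires_grad=True)
z = torch.matmul(x, w) + b
loss = F.binary_cross_entropy_with_logits(z, y)

def walk(fn, depth=0, names=None):
    """Hypothetical helper: depth-first print of the backward graph from node `fn`."""
    if names is None:
        names = []
    if fn is None:  # an input that does not require grad has no backward node
        return names
    names.append(type(fn).__name__)
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:  # (node, output index) pairs
        walk(next_fn, depth + 1, names)
    return names

names = walk(loss.grad_fn)
```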


##### 🔵 Computing Gradients
To optimize the weights (w) and biases (b) of our neural network, we need ∂loss/∂w (gradient of the loss w.r.t. weight w) and<br> ∂loss/∂b (gradient of the loss w.r.t. bias b) for some fixed values of x and y.<br>
This is how we do it 👇 

```python
# loss.backward() computes the gradient of loss w.r.t. the leaves of the graph.
# .grad is populated only for leaves with requires_grad=True, i.e. w and b
loss.backward()
print(w.grad)
print(b.grad)
```

```
tensor([[0.2980, 0.4528],
        [0.2980, 0.4528],
        [0.2980, 0.4528]])
tensor([0.2980, 0.4528])
```
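Why does every row of `w.grad` equal `b.grad`? For the mean-reduced loss over n = 2 outputs, ∂loss/∂z_j = (σ(z_j) − y_j) / n. Since z_j = Σ_i x_i·w_ij + b_j, the chain rule gives ∂loss/∂b_j = ∂loss/∂z_j and ∂loss/∂w_ij = x_i·∂loss/∂z_j. With x = ones, each row of the weight gradient is therefore identical to the bias gradient. The sketch below checks this derivation against autograd; the seed is an assumption for reproducibility, so the numbers will differ from the output above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # hypothetical seed, only for reproducibility

x = torch.ones(3)
y = torch.zeros(2)
w = torch.randn(3, 2, requires_grad=True)
b = torch.randn(2, requires_grad=True)
z = torch.matmul(x, w) + b
loss = F.binary_cross_entropy_with_logits(z, y)  # reduction="mean" over 2 outputs
loss.backward()

with torch.no_grad():
    dz = (torch.sigmoid(z) - y) / z.numel()  # d(loss)/dz for mean-reduced BCE with logits
    expected_b = dz                          # dz_j/db_j = 1
    expected_w = torch.outer(x, dz)          # dz_j/dw_ij = x_i

print(torch.allclose(b.grad, expected_b), torch.allclose(w.grad, expected_w))
```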
