<a href="https://colab.research.google.com/github/ashishpatel26/Pytorch-Learning/blob/main/Pytorch_Variable_and_Autograd.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Variable
* PyTorch differs from NumPy in that it provides automatic differentiation: it can compute the gradients of the parameters you care about for you. This capability is exposed through another basic element, the Variable.
![](https://raw.githubusercontent.com/pytorch/pytorch/master/docs/source/_static/img/dynamic_graph.gif)

* A Variable wraps a Tensor and supports nearly all of the APIs defined by a Tensor.
* A Variable also provides a `backward` method to perform backpropagation. For example, to train a model parameter `x`, we store the value computed by the loss function in a variable `loss`.
* Then we call `loss.backward()`, which computes the gradient ∂loss/∂x for every trainable parameter. PyTorch stores each result in the `.grad` attribute of the corresponding variable `x`.
* A Variable in torch builds a computational graph, but this graph is dynamic, unlike the static graphs of TensorFlow or Theano. So torch has no placeholders; you can pass a variable directly into the computational graph.
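
The dynamic graph described in the last bullet can be sketched as follows. This is a minimal illustration, not part of the original notebook: the helper `repeated_double` and its `n_steps` argument are hypothetical names, and the tensor is created with `requires_grad=True` directly instead of going through `Variable`.

```python
import torch

def repeated_double(x, n_steps):
    # Plain Python control flow: the graph is recorded while the loop runs,
    # so its depth can differ from one call to the next.
    y = x
    for _ in range(n_steps):
        y = y * 2
    return y.sum()

x = torch.tensor([1.0, 2.0], requires_grad=True)

out = repeated_double(x, n_steps=3)   # out = sum(8 * x)
out.backward()
grad_three_steps = x.grad.clone()     # d(out)/dx is 8 for each element

x.grad.zero_()                        # gradients accumulate, so reset first
out = repeated_double(x, n_steps=1)   # out = sum(2 * x)
out.backward()
grad_one_step = x.grad.clone()        # d(out)/dx is 2 for each element

print(grad_three_steps, grad_one_step)
```

No placeholder is declared anywhere: each call to `repeated_double` records a fresh graph for exactly the operations that ran.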

In [1]:
import torch
from torch.autograd import Variable

* Build a **tensor**
* Build a **Variable**, usually to **compute gradients**

In [2]:
tensor = torch.FloatTensor([[5,2],[6, 8]])
variable = Variable(tensor, requires_grad=True)

In [3]:
print(tensor)       # plain tensor of size 2x2
print(variable)     # same values, with requires_grad=True

tensor([[5., 2.],
        [6., 8.]])
tensor([[5., 2.],
        [6., 8.]], requires_grad=True)
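
Both objects print as `tensor(...)` because, since PyTorch 0.4, `Variable` and `Tensor` have been merged: wrapping a tensor in `Variable` returns an ordinary tensor with gradient tracking switched on. The sketch below assumes PyTorch 0.4 or later; the name `direct` is introduced here only for illustration.

```python
import torch
from torch.autograd import Variable

tensor = torch.FloatTensor([[5, 2], [6, 8]])
variable = Variable(tensor, requires_grad=True)

# The same data built directly, without the Variable wrapper
direct = torch.tensor([[5.0, 2.0], [6.0, 8.0]], requires_grad=True)

print(isinstance(variable, torch.Tensor))            # the wrapper returns a Tensor
print(variable.requires_grad, direct.requires_grad)  # both track gradients
print(torch.equal(variable, direct))                 # both hold the same values
```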


* So far the **tensor and variable** look the same. However, the **variable is part of the graph**, which is what **automatic differentiation** works on.

In [4]:
t_out = torch.mean(tensor * tensor)
v_out = torch.mean(variable * variable)

In [5]:
print(t_out)
print(v_out)

tensor(32.2500)
tensor(32.2500, grad_fn=<MeanBackward0>)
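
The `grad_fn` in the second output is what links `v_out` back to the graph. A short sketch of the difference, assuming a recent PyTorch release (the flag `backward_failed` is a name introduced here):

```python
import torch

tensor = torch.tensor([[5.0, 2.0], [6.0, 8.0]])
variable = tensor.clone().requires_grad_(True)

t_out = torch.mean(tensor * tensor)
v_out = torch.mean(variable * variable)

print(t_out.requires_grad, t_out.grad_fn)   # not tracked, no graph node
print(v_out.requires_grad, v_out.grad_fn)   # tracked, points at the mean op

# Without a graph behind it, the plain result cannot be backpropagated.
try:
    t_out.backward()
    backward_failed = False
except RuntimeError:
    backward_failed = True

print(backward_failed)
```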


* Backpropagate from `v_out`, which is computed as:
```
v_out = 1 / 4 * sum(variable * variable)
```
* The gradient with respect to the variable is therefore:
```
d(v_out)/d(variable) = 1/4*2*variable = variable/2
```

In [6]:
v_out.backward()
print(variable.grad)

tensor([[2.5000, 1.0000],
        [3.0000, 4.0000]])
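
These values match the analytic result `variable / 2`. One way to check this in code, as a small sketch that rebuilds the same tensor so it can run on its own (`expected` is a name introduced here):

```python
import torch

variable = torch.tensor([[5.0, 2.0], [6.0, 8.0]], requires_grad=True)
v_out = torch.mean(variable * variable)
v_out.backward()

# Analytic gradient: d(v_out)/d(variable) = variable / 2
expected = variable.detach() / 2

print(variable.grad)
print(torch.allclose(variable.grad, expected))
```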


* This is data in **variable format**.

In [7]:
print(variable) # the variable itself, with requires_grad=True

tensor([[5., 2.],
        [6., 8.]], requires_grad=True)


* This is data in **tensor format**.


In [8]:
print(variable.data) # the underlying tensor, without gradient tracking

tensor([[5., 2.],
        [6., 8.]])


* This is data in **NumPy format**.

In [9]:
print(variable.data.numpy())

[[5. 2.]
 [6. 8.]]
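
Going through `.data` is needed because a tensor that tracks gradients cannot be converted to NumPy directly. In current PyTorch, `.detach()` is the usual way to get an untracked tensor. A sketch assuming a recent release (`direct_conversion_ok` and `array` are names introduced here):

```python
import torch

variable = torch.tensor([[5.0, 2.0], [6.0, 8.0]], requires_grad=True)

# A tensor that requires grad refuses direct conversion to NumPy.
try:
    variable.numpy()
    direct_conversion_ok = True
except RuntimeError:
    direct_conversion_ok = False

# detach() returns a tensor with the same data that is outside the graph.
array = variable.detach().numpy()

print(direct_conversion_ok)
print(array)
```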


### torch.autograd
* `torch.autograd` is PyTorch’s automatic differentiation engine that powers neural network training. In this section, you will get a conceptual understanding of how autograd helps a neural network train.

### Background
Neural networks (NNs) are a collection of nested functions that are executed on some input data. These functions are defined by parameters (consisting of weights and biases), which in PyTorch are stored in tensors.

Training a NN happens in two steps:

**Forward Propagation**: In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.

**Backward Propagation**: In backprop, the NN adjusts its parameters proportionate to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (gradients), and optimizing the parameters using gradient descent. For a more detailed walkthrough of backprop, check out this

In [10]:
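import torch, torchvision
# Load ResNet-18 with pretrained weights (torchvision >= 0.13; older releases use pretrained=True)
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
data = torch.rand(1, 3, 64, 64)    # one random 3-channel 64x64 image
labels = torch.rand(1, 1000)       # one random target value per output class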

* Next, run the input data through the model, passing it through each of its layers to make a prediction. This is the **forward pass**.

In [11]:
prediction = model(data) # forward pass

* Now use the model's prediction and the corresponding label to calculate the error (`loss`).
* The next step is to backpropagate this error through the network.

In [12]:
loss = (prediction - labels).sum()
loss.backward() # backward pass
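
To see what the backward pass leaves behind, the sketch below repeats the same steps on a tiny `nn.Linear` layer, used here as a hypothetical stand-in for `resnet18` so that nothing has to be downloaded. The sizes and the name `grad_before` are assumptions made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 3)           # hypothetical stand-in for resnet18
data = torch.rand(1, 4)
labels = torch.rand(1, 3)

prediction = model(data)          # forward pass
loss = (prediction - labels).sum()

grad_before = model.weight.grad   # None: no gradient has been computed yet
loss.backward()                   # backward pass

# After backward(), every parameter holds a gradient of its own shape.
for name, param in model.named_parameters():
    print(name, tuple(param.shape), tuple(param.grad.shape))
```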

* Next, load an optimizer. In this case we use SGD with a learning rate of 0.01 and a momentum of 0.9.

In [13]:
optim = torch.optim.SGD(model.parameters(), lr = 1e-02, momentum=0.9)

* Finally, we call `.step()` to initiate gradient descent. The optimizer adjusts each parameter by its gradient stored in `.grad`.

In [15]:
optim.step() #gradient descent
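
Put together, one training step might look like the sketch below. It again uses a small `nn.Linear` layer as a hypothetical stand-in for `resnet18`, and it adds a call to `optim.zero_grad()`, which is not in the cells above but is commonly placed before each backward pass because gradients accumulate in `.grad`.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 3)                      # hypothetical stand-in for resnet18
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

data = torch.rand(1, 4)
labels = torch.rand(1, 3)

bias_before = model.bias.detach().clone()

optim.zero_grad()                            # clear gradients from earlier steps
prediction = model(data)                     # forward pass
loss = (prediction - labels).sum()
loss.backward()                              # backward pass fills .grad
optim.step()                                 # update each parameter from its .grad

bias_after = model.bias.detach().clone()
print(bias_before - bias_after)              # how far each bias entry moved
```

With this loss the gradient of every bias entry is 1, so on the first step each entry moves by the learning rate times that gradient.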

* _At this point, you have everything you need to train your neural network. The sections below detail the workings of autograd; feel free to skip them._