# PyTorch Fundamentals
## 2. Variables and Gradients
## 2.1 Variables (**Deprecated** since PyTorch 0.4)
* A `Variable` wraps a Tensor
* Allows accumulation of gradients

Import `Variable` from `torch.autograd`:

In [1]:
import torch
from torch.autograd import Variable

In [2]:
var = Variable(torch.ones(2, 2), requires_grad=True)
var

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

___
# Current PyTorch Versions Don't Need `Variable`
**Tensors and Variables have merged**

In earlier versions of PyTorch, the `torch.autograd.Variable` class was used to wrap tensors so that operations on them were recorded and gradients could be computed. As of PyTorch v0.4.0, `Variable` is deprecated: `torch.Tensor` and `torch.autograd.Variable` are now the same class. More precisely, `torch.Tensor` can track history and behaves like the old `Variable`; wrapping a tensor in `Variable(...)` still works, but it returns a plain `torch.Tensor`.
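A minimal sketch comparing the old and new styles, assuming a PyTorch version (0.4 or later) where `torch.autograd.Variable` is still importable for backward compatibility. The names `old_style` and `new_style` are illustrative only:

```python
import torch
from torch.autograd import Variable  # still importable; kept for backward compatibility

# Old style (pre-0.4): wrap a tensor in a Variable
old_style = Variable(torch.ones(2, 2), requires_grad=True)

# New style (0.4+): ask the tensor itself to track gradients
new_style = torch.ones(2, 2, requires_grad=True)

print(isinstance(old_style, torch.Tensor))               # True: Variable(...) returns a Tensor
print(old_style.requires_grad, new_style.requires_grad)  # True True
print(torch.equal(old_style, new_style))                 # True
```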

Source:
* https://pytorch.org/blog/pytorch-0_4_0-migration-guide/

## Create a Tensor with Gradients
**Pass `requires_grad=True`**

In [3]:
tensor_grad = torch.ones((2, 2), requires_grad = True)
tensor_grad

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

**Tensor without gradient**

By default, tensors created in PyTorch have `requires_grad=False`.

In [4]:
tensor_nograd = torch.ones((2, 2))
tensor_nograd

tensor([[1., 1.],
        [1., 1.]])

**To check whether a tensor requires gradients:**

Use the attribute `tensor.requires_grad`:

In [5]:
tensor_grad.requires_grad

True

In [6]:
tensor_nograd.requires_grad

False
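The same `requires_grad` keyword is accepted by other tensor factory functions such as `torch.zeros` and `torch.randn`. A small sketch (variable names are illustrative) also shows that `torch.zeros_like` does not copy the flag from its input; it defaults to `requires_grad=False`:

```python
import torch

a = torch.zeros(3, requires_grad=True)  # opt in explicitly
b = torch.randn(2, 3)                   # default: requires_grad=False
c = torch.zeros_like(a)                 # *_like factories also default to requires_grad=False

print(a.requires_grad, b.requires_grad, c.requires_grad)  # True False False
```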

___
## 2.2 Gradients
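### What is a gradient?
A gradient is the vector of partial derivatives of a scalar output with respect to each element of an input tensor; PyTorch's autograd computes it by recording the operations applied to tensors that require gradients. `requires_grad` indicates whether autograd should record operations on a tensor so that gradients can be computed with respect to it (in practice, whether the tensor is trainable). By default, `requires_grad` is `False` when a tensor is created. If any input to an operation requires gradients, its output, and everything computed from that output, will also require gradients. To fine-tune only part of a pretrained model, we can set `requires_grad=False` on the parameters of the layers we want to freeze and keep it `True` on the layers we want to retrain.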
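A sketch of both ideas, assuming a small hypothetical two-layer `nn.Sequential` model stands in for a pretrained network. The first part shows `requires_grad` propagating through an operation; the second freezes the first layer so only the last layer receives gradients:

```python
import torch
import torch.nn as nn

# Propagation: one input requires gradients, so the output does too
a = torch.ones(2, requires_grad=True)
b = torch.ones(2)  # requires_grad=False by default
c = a * b
print(c.requires_grad)  # True

# Freezing: hypothetical two-layer model where only the last layer is retrained
model = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 1))
for param in model[0].parameters():
    param.requires_grad_(False)  # freeze the first layer in place

loss = model(torch.randn(5, 4)).sum()
loss.backward()
print(model[0].weight.grad)        # None: frozen, so no gradient is computed
print(model[1].weight.grad.shape)  # torch.Size([1, 3])
```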

**Tensors that require gradients behave like ordinary tensors.**
* First, create two tensors:

In [7]:
# Method 1
torch_twos = torch.tensor([[2., 2.],[2., 2.]], requires_grad = True)

# NOTE: for torch_twos we use `2.` (float) instead of `2` (integer)
# because only tensors of floating-point (or complex) dtype can require gradients

torch_twos

tensor([[2., 2.],
        [2., 2.]], requires_grad=True)

In [8]:
# Method 2
torch_ones = torch.ones((2, 2))
torch_ones.requires_grad_(True)

# NOTE: `requires_grad_()` (with a trailing underscore) modifies the tensor in place,
# as seen in the previous lecture; a trailing underscore marks in-place methods in PyTorch

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
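To see the NOTE from Method 1 in action, here is a short sketch: creating an integer tensor with `requires_grad=True` raises a `RuntimeError`, while the same values with a floating-point dtype work. The exact error message may differ between PyTorch versions:

```python
import torch

raised = False
try:
    torch.tensor([[2, 2], [2, 2]], requires_grad=True)  # integer (int64) dtype
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Float literals (2.) or an explicit floating-point dtype work
ok = torch.tensor([[2, 2], [2, 2]], dtype=torch.float32, requires_grad=True)
print(ok.dtype, ok.requires_grad)  # torch.float32 True
```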

### Standard Math Operations
Operations behave just as they do on ordinary tensors, but autograd now records each operation: the result's `grad_fn` attribute refers to the backward function of the operation that created it.

In [9]:
torch_add = torch_ones + torch_twos
torch_add

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

In [10]:
torch_mul = torch.mul(torch_ones, torch_twos)
torch_mul

tensor([[2., 2.],
        [2., 2.]], grad_fn=<MulBackward0>)

In [11]:
torch_matmul = torch.matmul(torch_ones, torch_twos)
torch_matmul

tensor([[4., 4.],
        [4., 4.]], grad_fn=<MmBackward>)
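The two products above differ: `torch.mul` is element-wise, while `torch.matmul` is a matrix product, so each entry of the latter is a row-by-column sum (1·2 + 1·2 = 4). A sketch using the equivalent `*` and `@` operators, with illustrative variable names:

```python
import torch

ones = torch.ones((2, 2))
twos = torch.full((2, 2), 2.0, requires_grad=True)

elementwise = ones * twos  # same as torch.mul: 1 * 2 in every position
matrix = ones @ twos       # same as torch.matmul: each entry is 1*2 + 1*2

print(elementwise)
print(matrix)
```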

**A tensor produced by an operation on a tensor that requires gradients has a `grad_fn`:**

In [12]:
torch_add.grad_fn

<AddBackward0 at 0x19a4ad3ab00>

In [13]:
torch_mul.grad_fn

<MulBackward0 at 0x19a4ad3ae48>

In [14]:
torch_matmul.grad_fn

<MmBackward at 0x19a4ad421d0>
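By contrast, tensors created directly by the user ("leaf" tensors) have no `grad_fn`, and neither do results computed only from tensors that do not require gradients. A sketch using the `is_leaf` attribute, with illustrative names:

```python
import torch

a = torch.ones((2, 2), requires_grad=True)  # created directly by the user
b = torch.ones((2, 2))                      # does not require gradients

c = a + b  # result of an operation involving a
d = b * 2  # result of an operation on b only

print(a.grad_fn, a.is_leaf)              # None True
print(c.grad_fn is not None, c.is_leaf)  # True False
print(d.grad_fn, d.requires_grad)        # None False
```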

___
## 2.3 Manually and Automatically Calculating Gradients
**`requires_grad=True` allows gradients to be calculated with respect to the tensor**

_Example:_
$$y_i = 5(x_i+1)^2$$

In [15]:
x = torch.ones(2, requires_grad=True)
x

tensor([1., 1.], requires_grad=True)

**Check whether `x` has a gradient yet (`x.grad` stays `None` until `backward()` has been called)**

In [16]:
print(x.grad)

None


Now, let's evaluate the equation:

_If_
$$x_i=1$$
_then_
$$y_i\bigr\rvert_{x_i=1} = 5(1 + 1)^2 = 5(2)^2 = 5(4) = 20$$

In [17]:
y = 5 * (x+1) ** 2
y

tensor([20., 20.], grad_fn=<MulBackward0>)

**Because `y` is a multi-element tensor, calling `y.backward()` without arguments raises an error: PyTorch can only create the starting gradient implicitly for scalar outputs.**

In [18]:
# Raises an error because y is not a scalar
y.backward()

RuntimeError: grad can be implicitly created only for scalar outputs

**`backward()` should be called only on a scalar (i.e. a 1-element tensor), or with an explicit `gradient` argument that has the same shape as the tensor.**
- Let's reduce `y` to a scalar by taking the **mean** of `y`:

$$Output = \frac{1}{len(y)}\sum_i y_i$$

$$Output = \frac{1}{2}\sum_i y_i$$

In [19]:
output = (1/len(y)) * torch.sum(y)
output

tensor(20., grad_fn=<MulBackward0>)
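The manual formula above is exactly the mean, so PyTorch's built-in `Tensor.mean()` gives the same scalar. A quick sketch comparing the two (`manual` and `builtin` are illustrative names):

```python
import torch

x = torch.ones(2, requires_grad=True)
y = 5 * (x + 1) ** 2

manual = (1 / len(y)) * torch.sum(y)  # the formula used above
builtin = y.mean()                    # built-in reduction

print(manual.item(), builtin.item())  # 20.0 20.0
```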

**Recap the `y` equation:**
<center> $y_i = 5(x_i+1)^2$ </center>

**Recap the `output` equation (the `mean` of `y`):**
<center>  $output = \frac{1}{len(y)}\sum_i y_i$ </center>

**Substitute `y` into the `output` equation and differentiate (writing $o$ for `output`):**
<center>  $output = \frac{1}{len(y)} \sum_i 5(x_i+1)^2$ </center>
<br>
$$\frac{\partial o}{\partial x_i} = \frac{1}{len(y)}[10(x_i+1)]$$
<br>
$$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{1}{len(y)}[10(1 + 1)] = \frac{10}{2}(2) = 10$$
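The hand-derived gradient can be sanity-checked without PyTorch using central finite differences, $\frac{f(x+h)-f(x-h)}{2h}$. This is a plain-Python sketch; the helpers `mean_of_y`, `analytic_grad`, and `numeric_grad` are hypothetical names introduced only for this check:

```python
def mean_of_y(xs):
    """o = (1/len(y)) * sum_i 5 * (x_i + 1)^2"""
    ys = [5 * (xi + 1) ** 2 for xi in xs]
    return sum(ys) / len(ys)


def analytic_grad(xs):
    """do/dx_i = (1/len(y)) * 10 * (x_i + 1), as derived above"""
    return [10 * (xi + 1) / len(xs) for xi in xs]


def numeric_grad(f, xs, h=1e-6):
    """Central finite difference for each coordinate of xs."""
    grads = []
    for i in range(len(xs)):
        plus = list(xs)
        plus[i] += h
        minus = list(xs)
        minus[i] -= h
        grads.append((f(plus) - f(minus)) / (2 * h))
    return grads


xs = [1.0, 1.0]
print(analytic_grad(xs))            # [10.0, 10.0]
print(numeric_grad(mean_of_y, xs))  # approximately [10.0, 10.0]
```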

In [20]:
output.backward()

**Now check whether `x` has a gradient**

In [21]:
print(x.grad)

tensor([10., 10.])
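Two related behaviours, sketched below under the same equation: instead of reducing `y` first, `backward()` can be called on the non-scalar `y` with a `gradient` argument holding the mean's weights (`1/len(y)` per element), which yields the same `x.grad`. Gradients also accumulate across backward passes, so they are usually reset (here with the in-place `zero_()`) before the next computation:

```python
import torch

x = torch.ones(2, requires_grad=True)

# Alternative to reducing first: pass the weights of the mean as `gradient`
y = 5 * (x + 1) ** 2
y.backward(gradient=torch.full_like(y, 1 / len(y)))
first = x.grad.clone()
print(first)  # tensor([10., 10.])

# Gradients accumulate: a second backward pass adds to x.grad
y = 5 * (x + 1) ** 2  # rebuild the graph; backward() freed the previous one
y.mean().backward()
second = x.grad.clone()
print(second)  # tensor([20., 20.])

# Reset before the next computation
x.grad.zero_()
print(x.grad)  # tensor([0., 0.])
```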


---
# Summary
- Tensors with gradients
    - Created with `requires_grad=True` (the old `Variable` wrapper is no longer needed)
    - Gradients accumulate in the tensor's `.grad` attribute
- Gradients
    - Define the original equation
    - Substitute `x` values into the equation
    - Reduce to a scalar output `o` through `mean`
    - Calculate gradients with `o.backward()`
    - Access the gradients of `x` through `x.grad`