## Autograd

In [1]:
import torch

### Creation of a tensor
> When creating a tensor, set `requires_grad=True` so that autograd tracks every operation performed on it.

In [2]:
x = torch.randn(4, requires_grad=True)
x

tensor([-0.2457,  0.7481,  0.1907, -1.7052], requires_grad=True)

> Now we have a tensor with `requires_grad=True`, which means autograd will track every operation on `x`.

In [4]:
y = x+2
y

tensor([1.7543, 2.7481, 2.1907, 0.2948], grad_fn=<AddBackward0>)

In [5]:
y.grad_fn

<AddBackward0 at 0x160f0892640>

In [6]:
z = y.mean()
z

tensor(1.7470, grad_fn=<MeanBackward0>)

### Gradient Calculation
> First, we call `.backward()` on the scalar output `z`.

In [7]:
z.backward()  # backpropagation: computes dz/dx and stores it in x.grad

> Once `.backward()` has run, the gradient `dz/dx` is stored in `x.grad`. Reading `x.grad` does not compute anything itself.

> `.backward()` computes the partial derivatives by applying the chain rule.

> Generally speaking, `torch.autograd` is an engine for computing vector-Jacobian products.

In [11]:
x.grad

tensor([0.2500, 0.2500, 0.2500, 0.2500])
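
> The value `0.25` follows from the chain rule: `z` is the mean of the four elements `x_i + 2`, so every `dz/dx_i` equals `1/4`. A minimal sketch that checks this on a fresh tensor (the name `expected` is introduced here only for illustration):

```python
import torch

x = torch.randn(4, requires_grad=True)
z = (x + 2).mean()   # z = (1/4) * sum(x_i + 2)
z.backward()         # fills x.grad with dz/dx

# Each partial derivative dz/dx_i is 1/4, whatever values x holds.
expected = torch.full((4,), 0.25)
print(torch.allclose(x.grad, expected))
```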

### Non-Scalar output
* In the previous example we could call `z.backward()` with no arguments because `z` is a scalar value, which was the mean.
* When an operation on the tensor does not return a scalar value, `.backward()` must be given a `vector, matrix or tensor` with the same shape as the `output` itself.


In [23]:
x = torch.randn(4, requires_grad=True)
y = x**2
y, y.shape

(tensor([2.5166, 0.5973, 0.5338, 1.5008], grad_fn=<PowBackward0>),
 torch.Size([4]))

> As we can see, `y` is not a scalar but a vector of size `4`, so to compute the gradient with respect to `x`, `.backward()` must be given a vector of size `4`. Here we fill it with random numbers.

In [25]:
v = torch.rand(4)
y.backward(v)

In [26]:
x.grad

tensor([-1.8745,  0.0334, -1.1328, -0.8105])
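
> For `y = x**2` the Jacobian is diagonal with entries `2 * x_i`, so `y.backward(v)` should store the vector-Jacobian product `v * 2 * x` in `x.grad`. A short check under that assumption (the name `manual` is hypothetical, and `v` is random only for illustration):

```python
import torch

x = torch.randn(4, requires_grad=True)
y = x ** 2
v = torch.rand(4)        # the "vector" in the vector-Jacobian product
y.backward(v)

# Hand-computed product: v_i * d(x_i^2)/dx_i = v_i * 2 * x_i
manual = v * 2 * x.detach()
print(torch.allclose(x.grad, manual))
```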

### Last Example
> Let's look at a case where the output of our `operation` is a matrix.

In [27]:
x = torch.randn((2, 5), requires_grad=True)
x

tensor([[ 0.8216,  1.0936,  0.4323, -0.7128,  0.2093],
        [ 1.6032, -1.8102, -2.8202, -0.9106,  2.0954]], requires_grad=True)

In [30]:
y = x + 8
y, y.shape

(tensor([[ 8.8216,  9.0936,  8.4323,  7.2872,  8.2093],
         [ 9.6032,  6.1898,  5.1798,  7.0894, 10.0954]], grad_fn=<AddBackward0>),
 torch.Size([2, 5]))

> Create a matrix of shape `[2, 5]` to pass to `y.backward`

In [32]:
m = torch.rand([2,5])
y.backward(m)

In [33]:
x.grad

tensor([[0.5457, 0.2883, 0.3765, 0.0206, 0.0860],
        [0.1082, 0.7485, 0.8342, 0.2105, 0.4323]])
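
> Because `y = x + 8` passes each element through unchanged apart from a constant offset, its Jacobian is the identity, and the gradient that ends up in `x.grad` is just the matrix handed to `backward`. A small sketch that confirms this on fresh tensors:

```python
import torch

x = torch.randn((2, 5), requires_grad=True)
y = x + 8
m = torch.rand(2, 5)     # same shape as y
y.backward(m)

# d(x + 8)/dx is the identity, so the gradient equals m element-wise.
print(torch.allclose(x.grad, m))
```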

### Stopping tensors from tracking History
> For example, when we update our ``weights`` during the training loop, the update operation itself should not be part of the ``gradient`` computation

There are three ways to stop PyTorch from tracking the history of a tensor:

1. ``x.requires_grad_(False)`` - sets `requires_grad` to `False` in place
2. `x.detach()` - returns a new tensor that shares the same data but is detached from the graph
3. ``with torch.no_grad()`` - wraps our operations in a `with` statement
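
> To make the weight-update case concrete, here is a minimal sketch of a single gradient-descent step. The names `w` and `lr` and the toy loss are assumptions made for illustration, not part of the original notebook.

```python
import torch

w = torch.ones(3, requires_grad=True)   # hypothetical weights
lr = 0.1                                # hypothetical learning rate

loss = (w * 3).sum()    # toy loss; d(loss)/dw is 3 for every element
loss.backward()

with torch.no_grad():
    w -= lr * w.grad    # in-place update, not recorded in the graph

w.grad.zero_()          # gradients accumulate, so reset them before the next step

print(w, w.requires_grad)
```

> The update runs inside `torch.no_grad()`, so `w` stays a leaf tensor with `requires_grad=True` and is ready for the next forward pass.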

In [34]:
a = torch.randn(2, requires_grad=True)
b = torch.randn(2, requires_grad=True)
c = torch.randn(2, requires_grad=True)
a, b, c

(tensor([ 0.8408, -0.6605], requires_grad=True),
 tensor([ 0.1434, -1.3224], requires_grad=True),
 tensor([0.2439, 0.7129], requires_grad=True))

### ``x.requires_grad_(False)``

In [35]:
a.requires_grad_(False)
a

tensor([ 0.8408, -0.6605])
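
> A short sketch of what this changes in practice: once the flag is off, results computed from the tensor no longer carry a `grad_fn`. The name `doubled` is introduced only for this illustration.

```python
import torch

a = torch.randn(2, requires_grad=True)
a.requires_grad_(False)      # modifies a itself; no copy is made

doubled = a * 2              # no longer recorded by autograd
print(a.requires_grad)
print(doubled.grad_fn)
```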

### `x.detach()`

In [36]:
b.detach()

tensor([ 0.1434, -1.3224])
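
> Note that `detach()` does not modify the tensor it is called on; it returns a new tensor that shares the same storage. A minimal sketch, where `b_detached` is a name introduced for illustration:

```python
import torch

b = torch.randn(2, requires_grad=True)
b_detached = b.detach()      # new tensor, same underlying data

print(b.requires_grad)                          # the original still tracks history
print(b_detached.requires_grad)                 # the detached tensor does not
print(b_detached.data_ptr() == b.data_ptr())    # both point at the same memory
```

> To keep working with the untracked version, assign the result, for example `b = b.detach()`.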

### ``with torch.no_grad()``

In [37]:
with torch.no_grad():
    c = c+0
c

tensor([0.2439, 0.7129])
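
> In the cell above, `c = c+0` rebinds the name `c` to a new, untracked tensor. The sketch below stores the result under a separate (hypothetical) name `d` to show that the original tensor is left untouched:

```python
import torch

c = torch.randn(2, requires_grad=True)

with torch.no_grad():
    d = c + 0                # computed without recording history

print(d.requires_grad, d.grad_fn)   # the result is not tracked
print(c.requires_grad)              # the original tensor is unchanged
```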