# AUTOGRAD: Automatic Differentiation

PyTorch's `autograd` package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.

## Tensors

`torch.Tensor` is the central class of the package. If you set its `.requires_grad` attribute to `True`, it starts to track all operations on it. When you finish your computation, you can call `.backward()` and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its `.grad` attribute.
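
The word "accumulated" matters: each call to `.backward()` adds to `.grad` rather than overwriting it. The following is a minimal sketch of that behaviour; the tensor `w` and its values are made up for illustration.

```python
import torch

# A user-created tensor that autograd will track.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# First backward pass: d(sum(w * w))/dw = 2 * w
(w * w).sum().backward()
first = w.grad.clone()

# Second backward pass on a fresh graph: the new gradient is ADDED to .grad
(w * w).sum().backward()
second = w.grad.clone()

print(first.tolist())   # [2.0, 4.0, 6.0]
print(second.tolist())  # [4.0, 8.0, 12.0]

# Reset the gradient before the next iteration so values do not pile up.
w.grad.zero_()
print(w.grad.tolist())  # [0.0, 0.0, 0.0]
```

This is why training loops usually zero the gradients once per iteration.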

To stop a tensor from tracking history, you can call `.detach()`. This detaches it from the computation history and prevents future computation on it from being tracked.

To prevent tracking history (and using memory), you can also wrap the code block in `with torch.no_grad():`. This can be particularly helpful when evaluating a model because the model may have trainable parameters with `requires_grad=True`, but for which we don’t need the gradients.
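
As a rough sketch of the evaluation case described above, the snippet below runs a forward pass under `torch.no_grad()`. The `torch.nn.Linear` layer and the random inputs are hypothetical stand-ins for a real model and dataset.

```python
import torch

model = torch.nn.Linear(4, 2)   # hypothetical stand-in for a trained model
inputs = torch.randn(8, 4)      # hypothetical batch of 8 samples

model.eval()
with torch.no_grad():
    # No graph is built inside this block, which saves memory.
    preds = model(inputs)

print(preds.requires_grad)  # False
print(preds.grad_fn)        # None

# The parameters themselves are untouched and still trainable.
print(all(p.requires_grad for p in model.parameters()))  # True
```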

There’s one more class that is very important for the autograd implementation: `Function`.

`Tensor` and `Function` are interconnected and build up an acyclic graph that encodes a complete history of computation. Each tensor has a `.grad_fn` attribute that references the `Function` that created it (except for Tensors created by the user, whose `grad_fn` is `None`).
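
A small sketch of the distinction between user-created tensors and tensors produced by an operation; the names `a` and `b` are illustrative only.

```python
import torch

a = torch.ones(2, 2, requires_grad=True)  # created by the user
b = a * 3                                  # created by an operation

print(a.grad_fn)   # None
print(a.is_leaf)   # True

print(b.grad_fn is not None)  # True
print(b.is_leaf)              # False

# The class name of grad_fn reflects the operation that produced the tensor.
print(type(b.grad_fn).__name__)
```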

If you want to compute the derivatives, you can call `.backward()` on a `Tensor`. If the `Tensor` is a scalar (i.e. it holds a single element), you don’t need to specify any arguments to `backward()`. If it has more elements, you need to specify a `gradient` argument that is a tensor of matching shape.

In [0]:
import torch 
import numpy as np

### What is `.data`?

Read this [answer](https://stackoverflow.com/a/51744091/6644968) on Stack Overflow for more insight into the `.data` attribute and PyTorch's `Variable` wrapper, which is now deprecated.

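Furthermore, the `.data` attribute can be used to update a tensor whose `requires_grad` attribute is set to `True` in the parameter update step that follows backpropagation, because changes made through `.data` are not tracked by autograd.

For example: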

We cannot update `f` directly as follows:

```
learning_rate = 0.01
for f in net.parameters():
    f.sub_(f.grad.data * learning_rate)
```

It will throw the following error:

```
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
```

So instead we should be updating its `data` attribute as follows:

```
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
```
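
An alternative that avoids `.data` altogether is to perform the in-place update inside `torch.no_grad()`. The sketch below assumes a tiny `torch.nn.Linear` layer as a hypothetical stand-in for `net` and a made-up loss; it is meant only to show that the update runs without the error above.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Linear(3, 1)          # hypothetical stand-in for `net`
loss = net(torch.ones(4, 3)).sum()   # made-up loss for illustration
loss.backward()

learning_rate = 0.01
# Snapshots taken only so the update can be checked afterwards.
before = [p.detach().clone() for p in net.parameters()]
grads = [p.grad.clone() for p in net.parameters()]

with torch.no_grad():
    # In-place updates in this block are not recorded by autograd,
    # so modifying a leaf tensor that requires grad is allowed.
    for f in net.parameters():
        f.sub_(f.grad * learning_rate)

for p, b, g in zip(net.parameters(), before, grads):
    print(torch.allclose(p, b - learning_rate * g))  # True
```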

In [2]:
test = torch.randn(5, 3)
print(test.data)
print(test.requires_grad)
test.requires_grad_() # This sets the requires_grad attribute to True for the tensor test in place.
print(test.requires_grad)

tensor([[-0.3807, -0.2505, -0.3975],
        [ 0.2366,  0.2425, -0.7856],
        [ 0.5951, -0.0046,  0.3324],
        [-0.9676, -0.4793, -0.4148],
        [-0.0694,  2.3514, -1.2304]])
False
True


### Difference between `torch.Tensor` and `torch.cuda.Tensor`

Before we continue with the `autograd` package let us look at some of the differences between the above two.
This [answer](https://stackoverflow.com/a/53630326/6644968) clears quite a few doubts regarding the differences.

In [3]:
# device will be 'cuda' if a GPU is available
# Try enabling GPU before running this cell

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# creating a CPU tensor
cpu_tensor = torch.rand(10)
# moving same tensor to GPU
gpu_tensor = cpu_tensor.to(device)

print(cpu_tensor)
print(gpu_tensor)
print("*" * 65)

print(cpu_tensor.dtype, type(cpu_tensor), cpu_tensor.type(), cpu_tensor.device)
print(gpu_tensor.dtype, type(gpu_tensor), gpu_tensor.type(), gpu_tensor.device)

# print(cpu_tensor * gpu_tensor)
# The above line throws the following error
# RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'other'

tensor([0.1740, 0.6659, 0.0572, 0.4530, 0.3958, 0.5776, 0.3269, 0.5295, 0.0776,
        0.0674])
tensor([0.1740, 0.6659, 0.0572, 0.4530, 0.3958, 0.5776, 0.3269, 0.5295, 0.0776,
        0.0674], device='cuda:0')
*****************************************************************
torch.float32 <class 'torch.Tensor'> torch.FloatTensor cpu
torch.float32 <class 'torch.Tensor'> torch.cuda.FloatTensor cuda:0


There are two ways to push `cpu_tensor` (which currently resides on the CPU) to the GPU.

1st Method: We can just change the `Tensor` type as follows:

It will automatically change the `device` attribute too.

In [4]:
dtype = torch.cuda.FloatTensor
cpu_tensor = cpu_tensor.type(dtype)

print(cpu_tensor.dtype, type(cpu_tensor), cpu_tensor.type(), cpu_tensor.device)
print(gpu_tensor.dtype, type(gpu_tensor), gpu_tensor.type(), gpu_tensor.device)

torch.float32 <class 'torch.Tensor'> torch.cuda.FloatTensor cuda:0
torch.float32 <class 'torch.Tensor'> torch.cuda.FloatTensor cuda:0


2nd Method: We can directly change the `device` attribute as `"cuda"`.
This will automatically change the `Tensor` type.

In [5]:
dtype = torch.FloatTensor
cpu_tensor = cpu_tensor.type(dtype) # Pushing back to cpu from gpu

cpu_tensor = cpu_tensor.to(torch.device("cuda:0"))
print(cpu_tensor.dtype, type(cpu_tensor), cpu_tensor.type(), cpu_tensor.device)
print(gpu_tensor.dtype, type(gpu_tensor), gpu_tensor.type(), gpu_tensor.device)

torch.float32 <class 'torch.Tensor'> torch.cuda.FloatTensor cuda:0
torch.float32 <class 'torch.Tensor'> torch.cuda.FloatTensor cuda:0


You can read more about the `Tensor` types and `Tensor` attributes here.

[`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor)

[Tensor Attributes](https://pytorch.org/docs/stable/tensor_attributes.html)

[Get datatype of Tensor](https://stackoverflow.com/questions/53374499/get-the-data-type-of-a-pytorch-tensor)

The bottom line is that there are two neat ways to switch from CPU to GPU.

**Method 1:**
You can set the Tensor's type.

In [6]:
dtype = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
print(torch.zeros(2, 2).type(dtype))

tensor([[0., 0.],
        [0., 0.]], device='cuda:0')


**Method 2:** You can create a `torch.device` object and pass it to the `to()` method to switch between CPU and GPU.



In [7]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(torch.zeros(2, 2).to(device))

tensor([[0., 0.],
        [0., 0.]], device='cuda:0')
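
A related pattern, sketched here under the same `device` selection as above, is to pass `device=` to the factory function so the tensor is allocated on the target device directly. The variable names are illustrative.

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Allocate directly on the target device instead of creating on the CPU and copying.
zeros = torch.zeros(2, 2, device=device)
print(zeros.device)

# Operands must live on the same device, so move the CPU tensor before combining.
cpu_tensor = torch.rand(2, 2)
result = zeros + cpu_tensor.to(device)
print(result.device)
```

On a machine without a GPU both tensors simply stay on the CPU, so the same code runs in either environment.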


### Autograd

Without deviating too much, let's get back to PyTorch's `autograd` package, which is the main objective of this notebook.

Create a tensor and set `requires_grad=True` to track computation with it:

In [8]:
x = torch.ones(2, 2, requires_grad = True)
print(x)
print(x.requires_grad)
print(x.data)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
True
tensor([[1., 1.],
        [1., 1.]])


Do a tensor operation:

In [9]:
y = x + 2
print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


`y` was created as a result of an operation, so it has a `grad_fn`.

In [10]:
print(y.grad_fn)

<AddBackward0 object at 0x7f13c0f47a58>


Do more operations on `y`

In [11]:
z = y * y * 3
out = z.mean()

print(z, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)


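`.requires_grad_( ... )` changes an existing Tensor’s `requires_grad` flag in-place. A tensor's `requires_grad` flag is `False` unless it is set at creation; the method's own argument defaults to `True`, so calling `.requires_grad_()` with no argument enables tracking.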

In [12]:
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x7f13bf3182b0>


## Gradients

Let’s backprop now. Because `out` contains a single scalar, `out.backward()` is equivalent to `out.backward(torch.tensor(1.))`.

In [0]:
out.backward()

Print gradients `d(out)/dx`

In [14]:
print(x.grad)
print(x.grad.data)
x.grad.data.zero_()
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
tensor([[0., 0.],
        [0., 0.]])


The result obtained can be verified as follows:

![forward_prop](https://drive.google.com/uc?id=1DMHZ_fvfuH8k65ED-t6WSZ44aYc_yArA)

![backprop](https://drive.google.com/uc?id=1GMwUkT7vdXRpVYmWX7nVqm3aLLOi17To)

![alt text](https://drive.google.com/uc?id=1-EDfqeFC4Rg9cQYSyWunlhrEKMn10RoJ)
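
For readers who cannot load the images, the same check can be written out by hand. This is a reconstruction from the code cells above (`y = x + 2`, `z = y * y * 3`, `out = z.mean()`), not a transcription of the figures:

$$
out = \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3\,y_i^2 = 3\,(x_i + 2)^2
$$

$$
\frac{\partial\, out}{\partial x_i} = \frac{1}{4} \cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2), \qquad \left.\frac{\partial\, out}{\partial x_i}\right|_{x_i = 1} = \frac{9}{2} = 4.5
$$

This matches the value `4.5000` printed for every entry of `x.grad`.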

Now let’s take a look at an example of vector-Jacobian product:

In [15]:
x = torch.randn(3, requires_grad = True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

tensor([ -40.3429, -435.6156, 1240.3335], grad_fn=<MulBackward0>)


Now in this case `y` is no longer a scalar. Calling `backward()` does not materialize the full Jacobian, but if we just want the vector-Jacobian product, we can simply pass the vector to `backward` as an argument:

In [16]:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)

tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])
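
To make the vector-Jacobian product concrete, the sketch below compares `x.grad` with an explicit product of `v` and the full Jacobian. The function `f` is a hypothetical example chosen because its Jacobian is not diagonal; `torch.autograd.functional.jacobian` is used only as a cross-check.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Hypothetical example function with a non-diagonal Jacobian.
    return torch.stack([x[0] * x[1], x[1] ** 2, x[0] + x[2]])

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

y = f(x)
y.backward(v)   # accumulates the product v^T J into x.grad

# Full 3x3 Jacobian, with J[i, j] = d y_i / d x_j.
J = jacobian(f, x.detach())

print(x.grad)
print(v @ J)
print(torch.allclose(x.grad, v @ J))  # True
```

Here `backward(v)` never builds `J`; it only propagates `v` backwards through the graph, which is far cheaper for large inputs.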


You can also stop autograd from tracking history on Tensors with `.requires_grad=True` either by wrapping the code block in `with torch.no_grad():`

In [17]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False


Or by using `.detach()` to get a new Tensor with the same content but that does not require gradients:

In [18]:
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.requires_grad)
print(x.eq(y).all())

True
False
True
tensor(True)
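
One detail worth keeping in mind is that the tensor returned by `.detach()` shares storage with the original, so in-place changes to one are visible in the other. A minimal sketch, with illustrative names and values:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.detach()

# Both tensors point at the same underlying memory.
print(y.data_ptr() == x.data_ptr())  # True

y[0] = 5.0            # in-place change through the detached tensor
print(x[0].item())    # 5.0

# Use clone() when an independent copy is needed.
z = x.detach().clone()
z[1] = -1.0
print(x[1].item())    # 1.0
```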


**Read Later:**

Documentation of `autograd.Function` can be found at https://pytorch.org/docs/stable/autograd.html#function

# References

1. [Deep Learning with PyTorch: A 60 minute blitz, Soumith Chintala](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)

2. [Automatic Differentiation Package - TORCH.AUTOGRAD](https://pytorch.org/docs/stable/autograd.html)

3. [TORCH](https://pytorch.org/docs/stable/torch.html)

4. [Stefan Otte: Deep Neural Networks with PyTorch | PyData Berlin 2018](https://www.youtube.com/watch?v=_H3aw6wkCv0&t=821s)

5. [CS231n: Convolutional Neural Networks for Visual Recognition](http://cs231n.stanford.edu/)

6. [Tensor Attributes](https://pytorch.org/docs/stable/tensor_attributes.html)

7. [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor)