<a href="https://colab.research.google.com/github/AvishekRoy16/DeepLearning/blob/master/6-Pytorch/Pytorch-Intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Outline
* PyTorch
* What are tensors
* Initialising, slicing, reshaping tensors
* Numpy and PyTorch interfacing
* GPU support for PyTorch + Enabling GPUs on Google Colab
* Speed comparisons, Numpy -- PyTorch -- PyTorch on GPU
* Autodiff concepts and application
* Writing a basic learning loop using autograd
* Exercises

In [2]:
import torch
import numpy as np
import matplotlib.pyplot as plt

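A tensor is a data structure that generalises vectors and matrices (which we can think of as lists and 2D lists/dataframes) to a higher order, i.e. to any number of dimensions.  
PyTorch tensors can also keep track of how they are related to (computed from) other tensors.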
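To make "higher order" concrete, here is a minimal sketch comparing the number of dimensions (the order, returned by `dim()`) of a vector, a matrix and a third-order tensor. The shapes are arbitrary choices for illustration.

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0])  # order 1: a vector of 3 numbers
m = torch.ones(2, 3)               # order 2: a 2x3 matrix
t3 = torch.zeros(4, 2, 3)          # order 3: four stacked 2x3 matrices

print(v.dim(), m.dim(), t3.dim())  # 1 2 3
print(t3.shape)                    # torch.Size([4, 2, 3])
```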

## Initialise Tensors

In [74]:
# Create a tensor of the specified shape filled with ones
x = torch.ones(3,2)
print(x)

# Create a tensor of the specified shape filled with zeros
x = torch.zeros(3, 2)
print(x)

# Create a tensor of the specified shape filled with random numbers drawn uniformly from [0, 1)
x = torch.rand(3, 2)
print(x)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
tensor([[0.9788, 0.2605],
        [0.8890, 0.1032],
        [0.2772, 0.7433]])
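Because `torch.rand` draws new numbers on every call, the output above will differ from run to run. A small sketch, assuming we want reproducible results: `torch.manual_seed` resets the random number generator, and `torch.randn` is the normal-distribution counterpart of `torch.rand`. The variable names are illustrative.

```python
import torch

torch.manual_seed(0)             # fix the random number generator's state
first = torch.rand(3, 2)         # uniform samples in [0, 1)
torch.manual_seed(0)             # reset to the same state...
second = torch.rand(3, 2)        # ...and the same numbers come out again
print(torch.equal(first, second))  # True

normal = torch.randn(3, 2)       # samples from a standard normal distribution
print(normal.shape)              # torch.Size([3, 2])
```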


In [75]:
# Allocate a tensor of the specified shape without initialising its values:
# it contains whatever happened to be in that memory
x = torch.empty(3, 2)
print(x)

# zeros_like creates a tensor of zeros with the same shape (and dtype) as another tensor
y = torch.zeros_like(x)
print(y)

tensor([[6.5650e+28, 1.7788e+25],
        [1.3425e+13, 1.1168e+33],
        [2.5348e-09, 1.0765e+21]])
tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
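`zeros_like` is one of a family of `*_like` constructors. The sketch below shows a few others that exist in PyTorch (`ones_like`, `rand_like`, `full_like`) alongside `torch.full`; the fill value 7.0 and the variable names are arbitrary.

```python
import torch

template = torch.empty(3, 2)
print(torch.ones_like(template))        # 3x2 tensor of ones
print(torch.rand_like(template))        # 3x2 tensor of uniform random numbers
print(torch.full((3, 2), 7.0))          # 3x2 tensor filled with 7.0
print(torch.full_like(template, 7.0))   # same, but the shape is taken from template

int_template = torch.tensor([[1, 2], [3, 4]])
print(torch.zeros_like(int_template).dtype)  # torch.int64: the dtype is copied too
```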


In [76]:
# linspace(start, end, steps) creates `steps` evenly spaced values; start and end are both included
x = torch.linspace(0, 1, steps=5)
print(x)

tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
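A closely related constructor is `torch.arange`, which takes a step size instead of a number of points and, unlike `linspace`, excludes the end value. A quick comparison sketch:

```python
import torch

points = torch.linspace(0, 1, steps=5)  # 5 points, end value included
stepped = torch.arange(0, 1, 0.25)      # step of 0.25, end value excluded
counted = torch.arange(5)               # integers 0..4

print(points)   # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
print(stepped)  # tensor([0.0000, 0.2500, 0.5000, 0.7500])
print(counted)  # tensor([0, 1, 2, 3, 4])
```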


In [77]:
# Create a tensor directly from nested Python lists
x = torch.tensor([[1, 2], 
                 [3, 4], 
                 [5, 6]])
print(x)

tensor([[1, 2],
        [3, 4],
        [5, 6]])
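PyTorch infers the data type from the values: whole numbers give an integer tensor and decimals give a float tensor (float32 unless the default dtype has been changed). The sketch below checks this and shows how to request or convert a dtype explicitly.

```python
import torch

ints = torch.tensor([[1, 2], [3, 4]])
floats = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
doubles = torch.tensor([1, 2], dtype=torch.float64)  # dtype requested explicitly

print(ints.dtype)          # torch.int64
print(floats.dtype)        # torch.float32 (the default float type)
print(doubles.dtype)       # torch.float64
print(ints.float().dtype)  # torch.float32 after conversion
```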


## Slicing tensors

In [78]:
# x.size() returns the shape of the tensor as a torch.Size
print(x.size())

# Indexing syntax is x[rows, columns]
# All rows, column at index 1
print(x[:, 1]) 
# Row 0, all columns
print(x[0, :])

# Most Python/NumPy slicing rules (start:stop:step, negative indices) also apply to tensors,
# except that PyTorch does not support negative steps such as x[::-1]

torch.Size([3, 2])
tensor([2, 4, 6])
tensor([1, 2])
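A few more slicing patterns on a copy of the same 3x2 tensor, as a sketch (it is named `m` so the notebook's `x` is left alone). Since negative steps are not supported, `torch.flip` is used here to reverse the row order.

```python
import torch

m = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]])

print(m[-1])                     # last row: tensor([5, 6])
print(m[1:, 0])                  # rows 1 onwards, column 0: tensor([3, 5])
print(m[::2])                    # every second row: rows 0 and 2
print(torch.flip(m, dims=[0]))   # rows in reverse order, instead of m[::-1]
```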


In [79]:
# x[1, 1] is the element at row index 1 and column index 1 (the second row and second column,
# since indexing starts at 0). The result is still a tensor, with zero dimensions
y = x[1, 1]
print(y)
# .item() converts a one-element tensor into a plain Python number
print(y.item())

tensor(4)
4
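Continuing with a copy of the same values, this sketch checks the Python type returned by `.item()`, shows `.tolist()` for converting a whole tensor, and shows that `.item()` only works on tensors with exactly one element (PyTorch raises a `RuntimeError` otherwise).

```python
import torch

m = torch.tensor([[1, 2], [3, 4], [5, 6]])

value = m[1, 1].item()
print(value, type(value))   # 4 <class 'int'>

print(m.tolist())           # nested Python lists: [[1, 2], [3, 4], [5, 6]]

try:
    m[0].item()             # two elements cannot become a single number
except RuntimeError as err:
    print("item() failed:", err)
```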


## Reshaping tensors

Shapes (dimensions) play a very important role in machine learning: we have to keep track of what we are multiplying with what, because matrix multiplication and many other operations require the shapes of the tensors to be compatible.
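For matrix multiplication the rule is that the inner dimensions must match: a (3, 2) tensor can multiply a (2, 4) tensor, giving a (3, 4) result. A minimal sketch of both the valid and the invalid case (the shapes are arbitrary):

```python
import torch

p = torch.ones(3, 2)
q = torch.ones(2, 4)
print(torch.matmul(p, q).shape)    # torch.Size([3, 4]): inner sizes 2 and 2 match

try:
    torch.matmul(p, p)             # (3, 2) @ (3, 2): inner sizes 2 and 3 differ
except RuntimeError as err:
    print("shape mismatch:", err)

print(torch.matmul(p, p.T).shape)  # transposing fixes it: torch.Size([3, 3])
```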

In [80]:
# view(rows, columns) returns a tensor with the same data arranged in a new shape;
# the total number of elements must not change (3 x 2 = 2 x 3 = 6)
print(x)
y = x.view(2, 3)
print(y)

tensor([[1, 2],
        [3, 4],
        [5, 6]])
tensor([[1, 2, 3],
        [4, 5, 6]])
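Two details about `view` are worth knowing, shown in the sketch below (using a fresh tensor `m` so that `x` is not modified): the result shares memory with the original tensor, so writing to one changes the other, and `view` only works when the memory layout allows it. After a transpose the layout is no longer contiguous, and `reshape`, which copies when necessary, is the usual alternative.

```python
import torch

m = torch.tensor([[1, 2], [3, 4], [5, 6]])
mv = m.view(2, 3)
mv[0, 0] = 100            # writing through the view...
print(m[0, 0])            # ...changes m too: tensor(100)

mt = m.t()                # transpose: same data, different strides
print(mt.is_contiguous()) # False
try:
    mt.view(6)            # view needs a compatible memory layout
except RuntimeError as err:
    print("view failed:", err)
print(mt.reshape(6))      # tensor([100, 3, 5, 2, 4, 6])
```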


In [81]:
# If we know only one of the dimensions, we can pass -1 for the other one
# and PyTorch infers it from the total number of elements (here 6 / 6 = 1)
y = x.view(6,-1) 
print(y)

tensor([[1],
        [2],
        [3],
        [4],
        [5],
        [6]])
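The `-1` placeholder can go in any position, but at most once, and the known sizes must divide the number of elements. A short sketch:

```python
import torch

m = torch.tensor([[1, 2], [3, 4], [5, 6]])
print(m.view(-1).shape)                       # torch.Size([6]): flatten
print(m.view(-1, 3).shape)                    # torch.Size([2, 3])
print(torch.arange(24).view(2, -1, 4).shape)  # torch.Size([2, 3, 4])

try:
    m.view(4, -1)       # 6 elements cannot be split into rows of 4
except RuntimeError as err:
    print("view failed:", err)
```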


## Simple Tensor Operations

In [82]:
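# Element-wise arithmetic on two tensors of the same shape
# (note that * is element-wise multiplication, not matrix multiplication)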
x = torch.ones([3, 2])
y = torch.ones([3, 2])
z = x + y
print(z)
z = x - y
print(z)
z = x * y
print(z)

tensor([[2., 2.],
        [2., 2.],
        [2., 2.]])
tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
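To contrast `*` with matrix multiplication, and to show broadcasting (a smaller operand being stretched to match a larger one), here is a short sketch with illustrative names:

```python
import torch

p = torch.ones(3, 2)
q = torch.ones(3, 2)

print(p * q)            # element-wise product, shape (3, 2)
print(p @ q.T)          # matrix product (3, 2) @ (2, 3): shape (3, 3), every entry 2
print(p / 2, p ** 2)    # other element-wise operators
print(p + 10)           # a scalar is broadcast to every element

row = torch.tensor([1.0, 2.0])
print(p * row)          # the row is broadcast across all 3 rows: [[1, 2], [1, 2], [1, 2]]
```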


In [83]:
# y.add(x) returns a new tensor equal to y + x; y itself is not modified
z = y.add(x)
print(z)
print(y)

tensor([[2., 2.],
        [2., 2.],
        [2., 2.]])
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])


In [84]:
# In-place addition: methods ending in an underscore modify the tensor itself.
# y.add_(x) adds x to y, stores the result in y and also returns y
z = y.add_(x)
print(z)
print(y)

tensor([[2., 2.],
        [2., 2.],
        [2., 2.]])
tensor([[2., 2.],
        [2., 2.],
        [2., 2.]])
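Because `add_` returns the tensor it modified, `z` and `y` above refer to the same data. A sketch with fresh names, also showing two other in-place methods, `mul_` and `zero_`:

```python
import torch

p = torch.ones(3, 2)
q = torch.ones(3, 2)
out = q.add_(p)                          # q becomes 2s; out refers to the same storage
print(out.data_ptr() == q.data_ptr())    # True

q.mul_(3)                                # in-place multiply: q (and out) become 6s
print(out)
q.zero_()                                # in-place reset to zeros
print(out)
```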


## Numpy <> PyTorch

In [85]:
# Interfacing NumPy and PyTorch

# Convert a tensor into a NumPy array (for CPU tensors, both share the same memory)
x_np = x.numpy()
print(type(x), type(x_np))
print(x_np)

<class 'torch.Tensor'> <class 'numpy.ndarray'>
[[1. 1.]
 [1. 1.]
 [1. 1.]]


In [86]:
# Convert a NumPy array into a tensor
a = np.random.randn(5)
print(a)
a_pt = torch.from_numpy(a)
print(type(a), type(a_pt))
print(a_pt)
# torch.from_numpy does not copy the data: it builds a bridge to the same memory,
# so in-place changes to the NumPy array are reflected in the tensor (and vice versa).
# The tensor also inherits NumPy's float64 dtype

[ 0.40097206 -0.37719441 -0.30400602  0.85176317 -2.5799248 ]
<class 'numpy.ndarray'> <class 'torch.Tensor'>
tensor([ 0.4010, -0.3772, -0.3040,  0.8518, -2.5799], dtype=torch.float64)


In [87]:
np.add(a, 1, out=a)
print(a)
print(a_pt) 

[ 1.40097206  0.62280559  0.69599398  1.85176317 -1.5799248 ]
tensor([ 1.4010,  0.6228,  0.6960,  1.8518, -1.5799], dtype=torch.float64)
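If an independent tensor is wanted instead, `torch.tensor(array)` copies the data. A sketch contrasting the two (variable names are illustrative):

```python
import numpy as np
import torch

arr = np.zeros(3)
shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # independent copy

arr += 1                        # modify the NumPy array in place
print(shared)                   # tensor([1., 1., 1.], dtype=torch.float64)
print(copied)                   # tensor([0., 0., 0.], dtype=torch.float64)

as_float32 = shared.float()     # dtype conversion produces a new float32 tensor
print(as_float32.dtype)         # torch.float32
```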


In [88]:
%%time
# Time 100 rounds of generating two random matrices and multiplying them with NumPy
for i in range(100):
  a = np.random.randn(100,100) # (100,100) is the matrix size
  b = np.random.randn(100,100)
  c = np.matmul(a, b)

Wall time: 429 ms


In [89]:
%%time
# The same benchmark (random generation + matrix multiplication) using PyTorch tensors
for i in range(100):
  a = torch.randn([100, 100])
  b = torch.randn([100, 100])
  c = torch.matmul(a, b)

# Note we are still not using the GPU.

Wall time: 501 ms


In [90]:
%%time
for i in range(10):
  a = np.random.randn(10000,10000)
  b = np.random.randn(10000,10000)
  c = a + b

Wall time: 1min 19s


In [91]:
%%time
for i in range(10):
  a = torch.randn([10000, 10000])
  b = torch.randn([10000, 10000])
  c = a + b

Wall time: 23 s
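The loops above time the random number generation together with the operation itself, and they also compare different precisions: `np.random.randn` returns float64 arrays while `torch.randn` returns float32 tensors by default. A rough sketch of a more like-for-like comparison, assuming a hypothetical helper `time_op` and an arbitrary 1000x1000 size, generates the data once and gives both libraries the same float64 values:

```python
import time

import numpy as np
import torch

def time_op(fn, repeats=10):
    """Hypothetical helper: average wall-clock seconds per call of fn()."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

n = 1000
m1_np = np.random.randn(n, n)      # float64
m2_np = np.random.randn(n, n)
m1_pt = torch.from_numpy(m1_np)    # float64 as well, sharing memory with m1_np
m2_pt = torch.from_numpy(m2_np)

numpy_add = time_op(lambda: m1_np + m2_np)
torch_add = time_op(lambda: m1_pt + m2_pt)
print(f"NumPy add: {numpy_add:.5f} s, PyTorch add: {torch_add:.5f} s")
```

Absolute timings depend heavily on the machine and library build, so treat the printed numbers as indicative only.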


## CUDA support

In [92]:
print(torch.cuda.device_count())

1


In [93]:
print(torch.cuda.device(0))
print(torch.cuda.get_device_name(0))

<torch.cuda.device object at 0x000002377BD370A0>
NVIDIA GeForce MX150
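The cells below assume a CUDA GPU is present and fail with an error otherwise. A common pattern, sketched here, is to pick the device at runtime and fall back to the CPU. Note that `.numpy()` only works on CPU tensors, so results on the GPU must be moved back with `.cpu()` first.

```python
import torch

# Use the first GPU if one is visible, otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

g1 = torch.ones(3, 2, device=device)  # created directly on the chosen device
g2 = torch.ones(3, 2).to(device)      # created on the CPU, then moved
g_sum = g1 + g2                       # both operands must live on the same device

g_np = g_sum.cpu().numpy()            # .numpy() requires a CPU tensor
print(device)
print(g_np)
```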


In [94]:
cuda0 = torch.device('cuda:0')

In [95]:
a = torch.ones(3, 2, device=cuda0)
b = torch.ones(3, 2, device=cuda0)
c = a + b
print(c)

tensor([[2., 2.],
        [2., 2.],
        [2., 2.]], device='cuda:0')


In [96]:
print(a)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], device='cuda:0')


In [97]:
%%time
# Time comparison between numpy, cpu and gpu performance for matrix addition
for i in range(10):
  a = np.random.randn(10000,10000)
  b = np.random.randn(10000,10000)
  np.add(b, a)

Wall time: 1min 17s


In [98]:
%%time
for i in range(10):
  a_cpu = torch.randn([10000, 10000])
  b_cpu = torch.randn([10000, 10000])
  b_cpu.add_(a_cpu)

Wall time: 21.8 s


In [99]:
%%time
for i in range(10):
  a = torch.randn([10000, 10000], device=cuda0)
  b = torch.randn([10000, 10000], device=cuda0)
  b.add_(a)

Wall time: 2.66 s


In [100]:
%%time
# Time comparison between numpy, cpu and gpu performance for matrix multiplication
for i in range(10):
  a = np.random.randn(10000,10000)
  b = np.random.randn(10000,10000)
  np.matmul(b, a)

Wall time: 7min 14s


In [101]:
%%time
for i in range(10):
  a_cpu = torch.randn([10000, 10000])
  b_cpu = torch.randn([10000, 10000])
  torch.matmul(a_cpu, b_cpu)

Wall time: 3min 11s


In [102]:
%%time
for i in range(10):
  a = torch.randn([10000, 10000], device=cuda0)
  b = torch.randn([10000, 10000], device=cuda0)
  torch.matmul(a, b)

Wall time: 14.7 s
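One caveat for the GPU timings: CUDA operations are launched asynchronously, so Python can reach the end of a loop before the GPU has finished the work, and the first CUDA call also pays one-off start-up costs. The sketch below (sizes and repeat counts are arbitrary, and the helper names are hypothetical) does a warm-up call and calls `torch.cuda.synchronize()` before reading the clock; it falls back to the CPU when no GPU is available.

```python
import time

import torch

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

def sync():
    # CUDA kernels run asynchronously: wait until all queued work is done
    if device.type == 'cuda':
        torch.cuda.synchronize()

def time_matmul(n=1000, repeats=5):
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)          # warm-up: the first call pays one-off setup costs
    sync()
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    sync()                      # make sure every matmul has actually finished
    return (time.perf_counter() - start) / repeats

seconds = time_matmul()
print(f"{device}: {seconds:.5f} s per matmul")
```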


## Autodiff

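Automatic differentiation (autodiff) lets PyTorch calculate gradients for us: it records the operations performed on tensors and then applies the chain rule backwards through them.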

In [3]:
# requires_grad=True tells PyTorch to track every operation on x,
# so that we can later differentiate results with respect to x
x = torch.ones([3,2], requires_grad = True)
print(x)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], requires_grad=True)


At the data-structure level, a tensor stores a multidimensional array. At a structural level, PyTorch tensors also record how they relate to each other, i.e. which tensors were computed from which.  
Being able to represent high-dimensional arrays together with these relationships is what makes tensors what they are.

In [4]:
# Because x requires gradients, y records that it is a function of x (note grad_fn=<AddBackward0> in the output)
y = x + 5
print(y)

tensor([[6., 6.],
        [6., 6.],
        [6., 6.]], grad_fn=<AddBackward0>)


In [5]:
# Stack another function on top of y
z = y*y + 1
print(z)

tensor([[37., 37.],
        [37., 37.],
        [37., 37.]], grad_fn=<AddBackward0>)
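PyTorch distinguishes leaf tensors, which we create ourselves, from tensors produced by operations, which carry a `grad_fn` pointing back into the graph. A small sketch, using new variable names so the notebook's x, y and z are left untouched:

```python
import torch

leaf = torch.ones(3, 2, requires_grad=True)
shifted = leaf + 5
squared = shifted * shifted + 1

print(leaf.is_leaf, leaf.grad_fn)        # True None: created by us, not by an operation
print(shifted.is_leaf, shifted.grad_fn)  # False <AddBackward0 ...>
print(squared.requires_grad)             # True: inherited from leaf

plain = torch.ones(3, 2) + 5             # no input requires grad
print(plain.requires_grad, plain.grad_fn)  # False None
```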


In [6]:
# torch.sum adds up all the elements of the tensor
t = torch.sum(z)
print(t)
# Everything so far can be thought of as a forward pass.
# PyTorch does the book-keeping: grad_fn=<SumBackward0> records that the last operation applied was a sum

tensor(222., grad_fn=<SumBackward0>)


In [7]:
# At this point we are ready to do a backward pass
t.backward()
# Nothing is printed: PyTorch computes the gradients internally and stores them in x.grad

In [8]:
# x.grad holds the derivative of t with respect to x:
# we called backward() on t, so t is the function being differentiated,
# and x is the variable we differentiate it against
print(x.grad)

tensor([[12., 12.],
        [12., 12.],
        [12., 12.]])
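One behaviour to be aware of before writing a training loop: each call to `backward()` adds to `.grad` instead of overwriting it. In this sketch (the names `w` and `loss` are illustrative) the same function is differentiated twice, so the gradient doubles until it is reset with `zero_()`. The graph is rebuilt on each iteration because, by default, calling `backward()` a second time on the same graph raises an error.

```python
import torch

w = torch.ones(3, 2, requires_grad=True)

for _ in range(2):
    loss = torch.sum((w + 5) ** 2 + 1)  # rebuild the graph each time
    loss.backward()

accumulated = w.grad.clone()
print(accumulated)   # all 24s: 12 from each backward pass, added together

w.grad.zero_()       # reset before the next computation
print(w.grad)        # all zeros
```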


Why the derivative of $t$ with respect to $x$ is 12:  
$t = \sum_i z_i, \quad z_i = y_i^2 + 1, \quad y_i = x_i + 5$

Only $z_i$ depends on $x_i$, so the sum collapses to a single term:

$\frac{\partial t}{\partial x_i} = \frac{\partial z_i}{\partial x_i} = \frac{\partial z_i}{\partial y_i} \frac{\partial y_i}{\partial x_i} = 2y_i \times 1$

At $x_i = 1$ we have $y_i = 6$, so $\frac{\partial t}{\partial x_i} = 2 \times 6 = 12$.
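As a sanity check on the hand derivation, here is a sketch comparing the autograd result against a central finite-difference approximation, $(f(x + \epsilon) - f(x - \epsilon)) / 2\epsilon$, one element at a time. The step size `eps` is an arbitrary choice, and float64 is used to keep rounding error small. PyTorch also ships `torch.autograd.gradcheck`, which performs a similar comparison.

```python
import torch

def f(x):
    # t = sum_i ((x_i + 5)^2 + 1), the same function as in the cells above
    return torch.sum((x + 5) ** 2 + 1)

x0 = torch.ones(3, 2, dtype=torch.float64, requires_grad=True)
f(x0).backward()

eps = 1e-6
fd = torch.zeros(3, 2, dtype=torch.float64)
with torch.no_grad():
    for i in range(3):
        for j in range(2):
            step = torch.zeros(3, 2, dtype=torch.float64)
            step[i, j] = eps  # nudge a single element
            fd[i, j] = (f(x0 + step) - f(x0 - step)) / (2 * eps)

print(fd)                                 # approximately 12 in every entry
print(torch.allclose(x0.grad, fd))        # True
passed = torch.autograd.gradcheck(f, (x0,))
print(passed)                             # True
```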

So we get the partial derivative of t with respect to every element of x.  
We can now write any cascade of functions on a given set of inputs: trigonometric functions, tanh, log, even things like the standard deviation and the mean, and then compute the derivative with respect to the inputs.

The derivative we get is a numeric value evaluated at the point where we initialised the inputs, not a symbolic formula. In this particular example, x is initialised to all ones.
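To illustrate both points, this sketch differentiates a cascade of `sin` and `log` (the function and the helper name `grad_at` are arbitrary examples) at two different points. The analytic gradient is $\cos(x) + 1/x$, so the result changes with the point at which x is initialised.

```python
import torch

def grad_at(value):
    # Hypothetical helper: gradient of sum(sin(x) + log(x)) evaluated at x = value
    x = torch.full((3, 2), value, requires_grad=True)
    torch.sum(torch.sin(x) + torch.log(x)).backward()
    return x.grad

g1 = grad_at(1.0)
g2 = grad_at(2.0)
print(g1)  # cos(1) + 1/1, about 1.5403 in every entry
print(g2)  # cos(2) + 1/2, about 0.0839 in every entry
```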

In [9]:
# Another example: the same x and y, but z is replaced by r, the sigmoid of y
x = torch.ones([3, 2], requires_grad=True)
y = x + 5
r = 1/(1 + torch.exp(-y))
print(r)
s = torch.sum(r)
s.backward()
print(x.grad)

tensor([[0.9975, 0.9975],
        [0.9975, 0.9975],
        [0.9975, 0.9975]], grad_fn=<MulBackward0>)
tensor([[0.0025, 0.0025],
        [0.0025, 0.0025],
        [0.0025, 0.0025]])
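The gradient printed above can be checked against the known derivative of the sigmoid, $\sigma'(y) = \sigma(y)(1 - \sigma(y))$, multiplied by $\partial y / \partial x = 1$. A sketch using the built-in `torch.sigmoid`, which computes the same function as `1/(1 + torch.exp(-y))`:

```python
import torch

inp = torch.ones(3, 2, requires_grad=True)
shifted = inp + 5
out = torch.sigmoid(shifted)     # same values as 1/(1 + torch.exp(-shifted))
torch.sum(out).backward()

sig = torch.sigmoid(shifted.detach())
expected = sig * (1 - sig)       # sigma'(y) * dy/dx, with dy/dx = 1
print(expected)                  # about 0.0025 in every entry
print(torch.allclose(inp.grad, expected))  # True
```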


Earlier we wrote the forward and backward passes ourselves, hard-coding our knowledge of the derivatives of sigmoid, tanh and so on. Now PyTorch computes them for us automatically, which is what makes autograd so powerful.

In [10]:
# The same gradient can be computed without the sum: instead of reducing r to the scalar s,
# we pass backward() a tensor 'a' of ones with the same shape as r (and x).
# This avoids calling torch.sum explicitly
x = torch.ones([3, 2], requires_grad=True)
y = x + 5
r = 1/(1 + torch.exp(-y))
a = torch.ones([3, 2])
r.backward(a)
print(x.grad)

tensor([[0.0025, 0.0025],
        [0.0025, 0.0025],
        [0.0025, 0.0025]])


`r.backward(a)` takes the derivative of r with respect to x and multiplies it point-wise by the tensor `a` passed as its argument. In general PyTorch computes $\text{x.grad}_j = \sum_i a_i \frac{\partial r_i}{\partial x_j}$; here each $r_i$ depends only on $x_i$, so this reduces to a point-wise product.

This feature is there so that we can cascade the chain rule through multiple functions:

$\frac{\partial{s}}{\partial{x}} = \frac{\partial{s}}{\partial{r}} \cdot \frac{\partial{r}}{\partial{x}}$

To go from $s$ back to $x$ we first move from $s$ to $r$ and then from $r$ to $x$. The step from $r$ to $x$ is what `r.backward` computes. If we have already computed $\frac{\partial{s}}{\partial{r}}$ and stored it in `a`, then multiplying `a` point-wise with $\frac{\partial{r}}{\partial{x}}$ gives $\frac{\partial{s}}{\partial{x}}$ directly in `x.grad`.

In the code above there is no separate $s$ to consider: we want the same result as the earlier example, where $s$ was the sum of the elements of $r$. For a sum, $\frac{\partial s}{\partial r_i} = 1$ for every $i$, which is why `a` is simply `torch.ones`.
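A sketch of the chain rule done in two steps, assuming an example function $s = \sum_i r_i^2$ (chosen only so that $\partial s / \partial r = 2r$ is not all ones): first let autograd go from s all the way back to x, then supply $\partial s / \partial r$ by hand through `r.backward(a)`. Both routes should give the same gradient. New names (`x2`, `r2`, `s2`, `a2`) are used to leave the notebook's own variables untouched.

```python
import torch

x2 = torch.ones(3, 2, requires_grad=True)

# Route 1: autograd goes all the way from s back to x
r2 = torch.sigmoid(x2 + 5)
s2 = torch.sum(r2 ** 2)
s2.backward()
direct = x2.grad.clone()

# Route 2: we supply ds/dr ourselves, autograd goes from r back to x
x2.grad = None                 # clear the gradient accumulated by route 1
r2 = torch.sigmoid(x2 + 5)     # rebuild the graph (backward freed the first one)
a2 = 2 * r2.detach()           # ds/dr for s = sum(r**2), worked out by hand
r2.backward(a2)
chained = x2.grad.clone()

print(torch.allclose(direct, chained))  # True
```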