# PyTorch: Intro

### What is PyTorch?

-> [PyTorch](https://pytorch.org/) is an *open source* machine learning library for Python, based on Torch and developed by Facebook (see the [Wiki definition](https://en.wikipedia.org/wiki/PyTorch)).

It provides:

1.   Tensor computation, like NumPy, that can also run on a GPU.

2.   Deep neural networks built on a tape-based autodiff system, i.e. automatic differentiation (also called autograd, for automatic gradients), which lets you write relations between tensors functionally and differentiate through them.

### What is a tensor?

-> A tensor generalizes a matrix to higher dimensions. Why is it called a tensor rather than a matrix? A matrix is simply an array of numbers, whereas a tensor also represents a relation between the numbers it contains. For example, the [Cauchy stress tensor](https://en.wikipedia.org/wiki/Cauchy_stress_tensor) represents the state of stress at each point in a deformed material.
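
To make the idea of dimensions concrete, here is a minimal sketch that builds tensors with zero to three dimensions and prints the number of dimensions and the shape of each. The variable names are illustrative only and do not come from the notebook, and the sketch assumes PyTorch is already installed.

```python
import torch

# Illustrative tensors with 0, 1, 2 and 3 dimensions (names are hypothetical).
scalar = torch.tensor(3.0)              # 0-D: a single number
vector = torch.tensor([1.0, 2.0, 3.0])  # 1-D: a list of numbers
matrix = torch.ones(3, 2)               # 2-D: rows and columns
cube = torch.zeros(2, 3, 4)             # 3-D: a stack of two 3x4 matrices

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # dim() is the number of dimensions, shape is the size along each one
    print(name, t.dim(), tuple(t.shape))
```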

-> Let's first install PyTorch.
-> Google Colab doesn't have PyTorch installed by default, so you need to install it.
-> At the time of writing, the most stable version is 1.0.0.
-> Run this command in a cell: ***!pip install torch==1.0.0***

In [0]:
#!pip install torch==1.0.0

In [0]:
import torch
import numpy as np
import matplotlib.pyplot as plt

# Initializing Tensors

This section introduces different ways of initializing a tensor.

Just as in NumPy, you can initialize a tensor as follows:

In [58]:
x = torch.ones(3,2) # 3 rows and 2 columns, filled with ones
print(x)
x = torch.zeros(3,2) # same shape, filled with zeros
print(x)
x = torch.rand(3,2) # uniform random values in [0, 1)
print(x)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
tensor([[0.8094, 0.8136],
        [0.1963, 0.5464],
        [0.2261, 0.9688]])


In [59]:
x = torch.empty(3,2) # Allocates memory for a (3,2) tensor without initializing it, so the values are arbitrary.
print(x)

# zeros_like creates a new zero-filled tensor with the same shape and dtype as x (x itself is unchanged):
y = torch.zeros_like(x)
print(y)

tensor([[2.6699e-35, 0.0000e+00],
        [0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00]])
tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])


In [60]:
x = torch.linspace(0, 1, steps=5) # linspace stands for linear space: x contains 5 equally spaced numbers from 0 to 1, inclusive
print(x)

tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])


In [61]:
x = torch.tensor([[1,2], [3,4], [5,6]]) # hard-coded initialization with the desired values
print(x)

tensor([[1, 2],
        [3, 4],
        [5, 6]])
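
A few other constructors follow the same pattern. The sketch below is not part of the original notebook; it uses `torch.arange`, `torch.full`, `torch.eye`, and the `dtype` argument of `torch.tensor` to show how the element type is chosen. Variable names are illustrative.

```python
import torch

# Evenly spaced integers, like Python's range (the end value is excluded).
r = torch.arange(0, 6)
# A (2, 3) tensor filled with one constant value.
f = torch.full((2, 3), 7.0)
# A 3x3 identity matrix.
e = torch.eye(3)
# Integer input gives an integer tensor; pass dtype to get floats instead.
i = torch.tensor([[1, 2], [3, 4]])
fl = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)

print(r, f, e, sep="\n")
print(i.dtype, fl.dtype)
```

This is why the hard-coded tensor above prints without decimal points: it was built from Python integers.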


# Reshaping

In [62]:
print(x)
y = x.view(3, 2) # view returns a tensor with the requested shape; x already has shape (3, 2), so y looks identical here
print(y)

tensor([[1, 2],
        [3, 4],
        [5, 6]])
tensor([[1, 2],
        [3, 4],
        [5, 6]])


In [63]:
x = y.view(6, -1) # -1 lets PyTorch infer that dimension from the number of elements (here 1)
print(x)

tensor([[1],
        [2],
        [3],
        [4],
        [5],
        [6]])
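
As a small sketch of what `view` does beyond the cells above, the example below reshapes a (3, 2) tensor to (2, 3), flattens it with `-1`, and shows that a view shares its data with the original tensor. The values and names are illustrative.

```python
import torch

x = torch.tensor([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

y = x.view(2, 3)   # same six values, now laid out as 2 rows and 3 columns
z = x.view(-1)     # -1 on its own flattens to one dimension of size 6

# A view shares storage with the original, so writing through y changes x too.
y[0, 0] = 100
print(x)
print(y.shape, z.shape)
```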


# Simple PyTorch operations

In [64]:
x = torch.ones(3,2)
y = torch.ones(3,2)

z = x + y # Element-wise addition; the result has the same shape, i.e. (3, 2)
print(z)

z = x - y # Element-wise subtraction
print(z)

z = x * y # Element-wise multiplication
print(z)

tensor([[2., 2.],
        [2., 2.],
        [2., 2.]])
tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])


In [65]:
z = y.add(x) # Tensors are objects with an add method. It returns the sum as a new tensor z; both x and y remain unchanged.

print(z)
print(y)

tensor([[2., 2.],
        [2., 2.],
        [2., 2.]])
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])


In [66]:
z = y.add_(x) # The trailing underscore means in-place: y is overwritten with the sum, and z refers to that same data

print(z)
print(y)

tensor([[2., 2.],
        [2., 2.],
        [2., 2.]])
tensor([[2., 2.],
        [2., 2.],
        [2., 2.]])
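
The difference between `add` and `add_` can be checked directly. This is a minimal sketch that compares memory addresses with `data_ptr()`; the variable names `out_of_place` and `in_place` are illustrative.

```python
import torch

x = torch.ones(3, 2)
y = torch.ones(3, 2)

out_of_place = y.add(x)   # new tensor, y is untouched
in_place = y.add_(x)      # y itself is modified and returned

# data_ptr() is the address of the first element of the underlying storage.
print(out_of_place.data_ptr() == y.data_ptr())  # False: separate memory
print(in_place.data_ptr() == y.data_ptr())      # True: same memory as y
```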


# NumPy <> PyTorch

In [67]:
x_np = x.numpy() # converting a PyTorch tensor to a NumPy array
print(type(x_np), type(x))
print(x_np)
print(x)

<class 'numpy.ndarray'> <class 'torch.Tensor'>
[[1. 1.]
 [1. 1.]
 [1. 1.]]
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])


In [68]:
a = np.random.randn(3,2)
a_pt = torch.from_numpy(a)
print(type(a_pt), type(a))
print(a_pt)
print(a)

<class 'torch.Tensor'> <class 'numpy.ndarray'>
tensor([[-0.4486, -0.8917],
        [ 1.2347,  1.1371],
        [-0.1117, -0.9747]], dtype=torch.float64)
[[-0.44856405 -0.89170255]
 [ 1.23473346  1.13712243]
 [-0.1116943  -0.97466147]]


In [69]:
# from_numpy shares memory with the source array, so adding a value to a in place also changes a_pt. See the results.

np.add(a, 1, out=a)
print(a)
print(a_pt)

[[0.55143595 0.10829745]
 [2.23473346 2.13712243]
 [0.8883057  0.02533853]]
tensor([[0.5514, 0.1083],
        [2.2347, 2.1371],
        [0.8883, 0.0253]], dtype=torch.float64)
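
If the sharing is not wanted, `torch.tensor` makes a copy instead. The sketch below contrasts the two on a small array of zeros; the names `shared` and `copied` are illustrative.

```python
import numpy as np
import torch

a = np.zeros((2, 2))

shared = torch.from_numpy(a)  # shares memory with a
copied = torch.tensor(a)      # copies the data

np.add(a, 1, out=a)           # modify a in place

print(shared)  # follows the change made to a
print(copied)  # still all zeros
```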


In [70]:
%%time
# execution time of NumPy (the %%time cell magic must be the first line of the cell)

for i in range(100):
  a = np.random.randn(100,100)
  b = np.random.randn(100,100)
  c = a+b

CPU times: user 88.8 ms, sys: 1.8 ms, total: 90.6 ms
Wall time: 95.1 ms


In [71]:
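%%time
# execution time of PyTorch

for i in range(100):
  a = torch.randn([100,100])
  b = torch.randn([100,100])
  c = a+b
  
# In this run PyTorch is about 4x faster than NumPy (22.6 ms vs 95.1 ms wall time).
# Most of the time goes to random number generation, and NumPy generates float64 values while PyTorch defaults to float32.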

CPU times: user 19.7 ms, sys: 2.1 ms, total: 21.8 ms
Wall time: 22.6 ms
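
To compare only the addition itself, one option is to create the inputs once and time the `+` with Python's `timeit` module. This is a rough sketch rather than a rigorous benchmark: the array size, the repeat count of 1000, and the choice of float32 on both sides are assumptions, and the numbers depend on the machine.

```python
import timeit

import numpy as np
import torch

# Create the inputs once, outside the timed code, with the same dtype on both sides.
a_np = np.random.randn(100, 100).astype(np.float32)
b_np = np.random.randn(100, 100).astype(np.float32)
a_pt = torch.from_numpy(a_np)
b_pt = torch.from_numpy(b_np)

# Time only the addition, repeated many times to reduce noise.
t_np = timeit.timeit(lambda: a_np + b_np, number=1000)
t_pt = timeit.timeit(lambda: a_pt + b_pt, number=1000)

print(f"NumPy:   {t_np * 1e3:.2f} ms for 1000 additions")
print(f"PyTorch: {t_pt * 1e3:.2f} ms for 1000 additions")
```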


# CUDA Support

In [72]:
# First check the number of CUDA-capable devices.
# The first time, this shows a count of zero because no GPU is connected.
# To use one, go to Edit -> Notebook settings and select GPU from the drop-down.
# The notebook then reconnects to a new runtime, so run all the cells again
# in order, and then this cell. The count is now 1 because a CUDA device is available.
print(torch.cuda.device_count())

1

In [73]:
print("Reference to the object within torch:  ", torch.cuda.device(0))
print("\nDevice name: ", torch.cuda.get_device_name(0))

Reference to the object within torch:   <torch.cuda.device object at 0x7f94b4495588>

Device name:  Tesla T4


In [0]:
cuda0 = torch.device('cuda:0') # get device

In [75]:
'''
Here we pass an additional argument, device=cuda0. 
It tells PyTorch to create a and b on the device cuda0.
'''

a = torch.randn(3, 2, device=cuda0)
b = torch.randn(3, 2, device=cuda0)

c = a+b

'''
When we add a and b, the addition runs on the GPU device cuda0 and 
the result c is also stored on cuda0. 
In the printed result c you can see device='cuda:0'.
'''
print(c,"\n\n")


tensor([[ 0.8998, -1.3132],
        [ 3.2904,  1.6229],
        [ 0.9953,  1.2194]], device='cuda:0') 
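
The cell above only runs when a GPU is attached. A common way to write code that works either way is sketched below: it picks `cuda:0` when available and falls back to the CPU otherwise. The name `device` and the use of `.to()` and `.cpu()` are additions for illustration, not part of the original notebook.

```python
import torch

# Use the first GPU if there is one, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

a = torch.randn(3, 2, device=device)  # created directly on the chosen device
b = torch.randn(3, 2).to(device)      # created on the CPU, then moved

c = a + b                             # both operands must be on the same device
c_np = c.cpu().numpy()                # move back to the CPU before converting to NumPy

print(c.device, c_np.shape)
```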




In [76]:
%%time
# execution time of NumPy

for i in range(10): 
  a = np.random.randn(10000,10000)
  b = np.random.randn(10000,10000)
  c = a+b

CPU times: user 1min 29s, sys: 547 ms, total: 1min 29s
Wall time: 1min 29s


In [77]:
%%time
# execution time of PyTorch

for i in range(10): 
  a = torch.randn(10000,10000)
  b = torch.randn(10000,10000)
  c = a+b

CPU times: user 18.8 s, sys: 117 ms, total: 18.9 s
Wall time: 18.9 s


In [78]:
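%%time
# execution time of PyTorch on GPU

for i in range(100): # 100 iterations here; the CPU cells above use 10
  a = torch.randn(10000,10000, device=cuda0)
  b = torch.randn(10000,10000, device=cuda0)
  c = a+b
 
'''
The wall time below is about 1000x smaller than on the CPU, but CUDA kernels are launched asynchronously.
Without torch.cuda.synchronize() the timer mostly measures how long it takes to queue the work,
so this figure overstates the real speedup.
'''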

CPU times: user 6.22 ms, sys: 19 ms, total: 25.3 ms
Wall time: 24.6 ms
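
A fairer GPU timing waits for the queued work to finish before reading the clock. The helper below is a minimal sketch of that idea using `torch.cuda.synchronize()`; the function name `time_add`, the matrix size, and the repeat count are hypothetical choices, and the measured numbers will vary by hardware.

```python
import time

import torch

def time_add(device, n=1000, repeats=10):
    """Return the average seconds per (n x n) addition on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device.type == "cuda":
        torch.cuda.synchronize()  # finish setup work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        c = a + b
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
    return (time.perf_counter() - start) / repeats

print("cpu :", time_add(torch.device("cpu")))
if torch.cuda.is_available():
    print("cuda:", time_add(torch.device("cuda:0")))
```

Unlike the cells above, this times only the addition and leaves random number generation out of the measurement.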


# Autograd

Automatic computation of gradients.

In [79]:
x = torch.ones(3, 2, requires_grad=True) # requires_grad=True tells PyTorch to track operations on x so that gradients with respect to x can be computed automatically, as we will see next
print(x)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], requires_grad=True)


In [80]:
# Here y is a linear function of x

y = x + 5
print(y)

# Because y is computed from x (which requires gradients), PyTorch records the operation that produced it, shown as grad_fn=<AddBackward0>, so that the derivative of y with respect to x can be computed later. 

tensor([[6., 6.],
        [6., 6.],
        [6., 6.]], grad_fn=<AddBackward0>)


In [81]:
# Define z as another function of y
z = y*y + 1
print(z)

tensor([[37., 37.],
        [37., 37.],
        [37., 37.]], grad_fn=<AddBackward0>)


In [82]:
t = torch.sum(z) 
print(t)

# The book-keeping continues: t is a scalar, and grad_fn=<SumBackward0> records that it was produced by torch's built-in sum function.

tensor(222., grad_fn=<SumBackward0>)
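
The book-keeping can be inspected through a few tensor attributes. The sketch below rebuilds the same chain and prints `is_leaf`, `requires_grad`, and `grad_fn` along the way; the extra tensor `u` is a hypothetical addition to show that nothing is recorded when no input requires gradients.

```python
import torch

x = torch.ones(3, 2, requires_grad=True)
y = x + 5
z = y * y + 1
t = torch.sum(z)

# x was created by the user, so it is a leaf with no grad_fn.
print(x.is_leaf, x.grad_fn)
# Results of operations on x inherit requires_grad and remember how they were made.
print(y.requires_grad, y.grad_fn)
print(t.requires_grad, t.grad_fn)

# Operations on tensors that do not require gradients are not recorded.
u = torch.ones(3, 2) + 5
print(u.requires_grad, u.grad_fn)
```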


In [0]:
# The operations from x to t can be seen as the forward pass of a neural network,
# where each operation plays the role of a layer.

# Now let's do the backward pass, or backpropagation. 
# We can do this using the backward() function

t.backward()

# This prints nothing; it computes the gradients in the background and stores them in x.grad

In [84]:
# x.grad now holds the derivative of t w.r.t. x. 
# z is a function of y and y is a function of x, hence z (and so t) is a function of x

print(x.grad)

tensor([[12., 12.],
        [12., 12.],
        [12., 12.]])


$ t = \sum_i z_i $

$ z_i = y_i^2 + 1 $

$ y_i = x_i + 5 $

$\frac{\partial t}{\partial x_i} = \frac{\partial z_i}{\partial x_i} = \frac{\partial z_i}{\partial y_i} \frac{\partial y_i}{\partial x_i} $

$\frac{\partial z_i}{\partial y_i} = 2y_i \quad \text{and} \quad \frac{\partial y_i}{\partial x_i} = 1 $

$ \text{When } x_i = 1: \quad y_i = 6 \quad \text{and} \quad \frac{\partial t}{\partial x_i} = \frac{\partial z_i}{\partial x_i} = 2y_i = 12 $

Thus we can see that the backward() function applies the chain rule.
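
The derivation can be checked in code by comparing `x.grad` with the hand-derived expression $2(x_i + 5)$. This is a small sketch; the name `manual` is illustrative.

```python
import torch

x = torch.ones(3, 2, requires_grad=True)
t = torch.sum((x + 5) ** 2 + 1)
t.backward()

# Hand-derived gradient from the chain rule: dt/dx_i = 2 * y_i = 2 * (x_i + 5)
manual = 2 * (x.detach() + 5)

print(x.grad)
print(torch.allclose(x.grad, manual))
```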

In [85]:
# let's see another example
x = torch.ones(3, 2, requires_grad=True)

y = x + 1

z = 1/(1+torch.exp(-y)) # sigmoid function 

print(z)

r = torch.sum(z)

r.backward()

tensor([[0.8808, 0.8808],
        [0.8808, 0.8808],
        [0.8808, 0.8808]], grad_fn=<MulBackward0>)


$ r = \sum_i z_i$

$ z_i = \frac{1}{1 + e^{-y_i}}$


$ y_i = x_i + 1$

$\frac{\partial r}{\partial x_i} = \frac{\partial z_i}{\partial x_i} = \frac{\partial z_i}{\partial y_i} \frac{\partial y_i}{\partial x_i} $

$\frac{\partial z_i}{\partial y_i} = z_i(1-z_i)$ 
[See the derivative of sigmoid here](http://www.ai.mit.edu/courses/6.892/lecture8-html/sld015.htm)

$ \frac{\partial y_i}{\partial x_i} = 1$

$ \frac{\partial r}{\partial x_i} = \frac{\partial z_i}{\partial x_i} = \frac{\partial z_i}{\partial y_i} = z_i(1-z_i) = 0.8808 \times (1 - 0.8808) \approx 0.1050 $






In [86]:
print(x.grad)

tensor([[0.1050, 0.1050],
        [0.1050, 0.1050],
        [0.1050, 0.1050]])
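
The same value can be reproduced without autograd. The sketch below uses plain Python and a central finite difference to approximate the derivative at $x_i = 1$ and compares it with the closed form $z_i(1-z_i)$. The helper names and the step size `h` are illustrative choices.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def r_single(x):
    """One term of r: z = sigmoid(y) with y = x + 1."""
    return sigmoid(x + 1.0)

x0 = 1.0
h = 1e-6

# Central finite difference: (f(x + h) - f(x - h)) / (2h)
numeric = (r_single(x0 + h) - r_single(x0 - h)) / (2 * h)

# Closed form from the derivation: z * (1 - z)
z = sigmoid(x0 + 1.0)
analytic = z * (1.0 - z)

print(round(numeric, 4), round(analytic, 4))
```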


# Autograd Example

In [0]:
x = torch.randn([20, 1], requires_grad=True)

y = 3*x - 2

In [0]:
w = torch.tensor([1.], requires_grad=True)
b = torch.tensor([1.], requires_grad=True)

y_hat = w*x + b  # a linear function

loss = torch.sum((y_hat - y)**2) # loss function

In [89]:
print(loss)

tensor(192.9307, grad_fn=<SumBackward0>)


In [0]:
loss.backward()

In [91]:
print(w.grad, b.grad) # w.grad is negative, so w should increase; b.grad is positive, so b should decrease. In both cases we move opposite to the gradient

tensor([-20.7903]) tensor([114.7603])
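
For this loss the gradients also have a simple closed form: with $e_i = \hat{y}_i - y_i$, $\frac{\partial L}{\partial w} = 2\sum_i e_i x_i$ and $\frac{\partial L}{\partial b} = 2\sum_i e_i$. The sketch below compares these with what autograd returns. The seed is a hypothetical choice for repeatability, so the numbers differ from the output above, and `x` is created without `requires_grad` because it is only data here.

```python
import torch

torch.manual_seed(0)  # hypothetical seed, only to make the run repeatable

x = torch.randn(20, 1)
y = 3 * x - 2

w = torch.tensor([1.], requires_grad=True)
b = torch.tensor([1.], requires_grad=True)

y_hat = w * x + b
loss = torch.sum((y_hat - y) ** 2)
loss.backward()

# Hand-derived gradients of the summed squared error
err = (y_hat - y).detach()
grad_w = 2 * torch.sum(err * x)
grad_b = 2 * torch.sum(err)

print(w.grad.item(), grad_w.item())
print(b.grad.item(), grad_b.item())
```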


# In loops

-> Put the autograd steps above in a loop to turn them into a learning algorithm

In [92]:
learning_rate = 0.01

w = torch.tensor([1.], requires_grad=True)
b = torch.tensor([1.], requires_grad=True)

print(w.item(), b.item())

for i in range(10):
  
  x = torch.randn(3, 2)
  
  y = 3*x -2 
  
  y_hat = w*x + b
  
  loss = torch.sum((y_hat - y)**2)
  
  loss.backward()
  
  '''
    We have now done the forward pass as well as the backward pass. 
    Next we need to update the weights, but PyTorch would treat the update as just another operation in the forward pass and track it for gradient calculation, which gradient descent should not do.
    To avoid this, we perform the update inside torch.no_grad().
  '''
  
  # updating weights
  with torch.no_grad():
    w -= learning_rate * w.grad
    b -= learning_rate * b.grad
    
    '''
      backward() accumulates gradients, so we reset the gradients of w and b to zero before the next iteration
    '''
    w.grad.zero_()
    b.grad.zero_()
    
  print(w.item(), b.item()) # the last line printed holds the final w and b; they move toward w = 3 and b = -2 from y = 3*x - 2

1.0 1.0
1.3342381715774536 0.6081753969192505
1.5825026035308838 0.24793660640716553
1.660771369934082 0.04386614263057709
1.8801357746124268 -0.2932453751564026
1.9151567220687866 -0.4846447706222534
2.1496729850769043 -0.7035585641860962
2.180433511734009 -0.8516274690628052
2.224364995956421 -1.0012438297271729
2.2460215091705322 -1.107055902481079
2.275125503540039 -1.1307240724563599
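
The manual update and gradient reset can also be delegated to an optimizer. The sketch below swaps in `torch.optim.SGD`, which is not used in the original notebook, and differs from the loop above in three assumed ways: the data is drawn once with a fixed seed, the loss is a mean rather than a sum, and it runs for 200 steps with a learning rate of 0.1.

```python
import torch

torch.manual_seed(0)  # hypothetical seed, only to make the run repeatable

# Fixed training data drawn once from the same line, y = 3x - 2.
x = torch.randn(100, 1)
y = 3 * x - 2

w = torch.tensor([1.], requires_grad=True)
b = torch.tensor([1.], requires_grad=True)

# Plain SGD performs "param -= lr * param.grad" for every parameter it is given.
optimizer = torch.optim.SGD([w, b], lr=0.1)

for step in range(200):
    optimizer.zero_grad()                    # clear gradients from the previous step
    loss = torch.mean((w * x + b - y) ** 2)  # mean keeps the step size independent of the data size
    loss.backward()                          # compute w.grad and b.grad
    optimizer.step()                         # apply the update without tracking it

print(w.item(), b.item())
```

With more steps than the 10 used above, `w` and `b` should end up close to 3 and -2.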


# For large problems

In [101]:
%%time

learning_rate = 0.001

N = 1000000

epochs = 200

w = torch.randn([N], requires_grad=True)
b = torch.randn([1], requires_grad=True)

#print(torch.mean(w).item(), b.item())

x = torch.randn([N])

#print(x.shape)


for i in range(epochs):
  
  x = torch.randn([N])
  
  y = torch.dot(3*torch.ones([N]), x) - 2
  
  y_hat = torch.dot(w,x) + b
  
  loss = torch.sum((y_hat - y)**2)
  
  loss.backward()
  
  
  # updating weights
  with torch.no_grad():
    w -= learning_rate * w.grad
    b -= learning_rate * b.grad
    
    
    w.grad.zero_()
    b.grad.zero_()
    
  

print("Final Values of w and b: ", torch.mean(w).item(), b.item(), "\n\n") # prints the mean of w and the value of b after the last iteration; nan means the updates diverged, which happens here because the learning rate is too large for N = 1000000 inputs


Final Values of w and b:  nan nan 


CPU times: user 2.8 s, sys: 1.06 s, total: 3.87 s
Wall time: 3.88 s


In [102]:
%%time

learning_rate = 0.001

N = 1000000

epochs = 200

w = torch.randn([N], requires_grad=True, device=cuda0)
b = torch.randn([1], requires_grad=True, device=cuda0)

#print(torch.mean(w).item(), b.item())

x = torch.randn([N])

#print(x.shape)


for i in range(epochs):
  
  x = torch.randn([N], device=cuda0)
  
  y = torch.dot(3*torch.ones([N], device=cuda0), x) - 2
  
  y_hat = torch.dot(w,x) + b
  
  loss = torch.sum((y_hat - y)**2)
  
  loss.backward()
  
  
  # updating weights
  with torch.no_grad():
    w -= learning_rate * w.grad
    b -= learning_rate * b.grad
    
    
    w.grad.zero_()
    b.grad.zero_()
    
  

print("Final Values of w and b: ", torch.mean(w).item(), b.item(), "\n\n") # prints the mean of w and the value of b after the last iteration; nan means the updates diverged, on the GPU just as on the CPU


Final Values of w and b:  nan nan 


CPU times: user 148 ms, sys: 47.5 ms, total: 196 ms
Wall time: 202 ms
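
The `nan` results in both runs are consistent with a step size that is too large for this many inputs: each update overshoots, the error grows, and the values eventually overflow. One possible remedy, sketched below as an assumption rather than the notebook's method, is to scale the learning rate by `1 / N`. The sketch uses a smaller `N` so it runs quickly, a hypothetical seed, and the helper name `target_w`; it only checks that the weights stay finite and move closer to 3, not that they converge in 200 steps.

```python
import torch

torch.manual_seed(0)  # hypothetical seed, only to make the run repeatable

N = 1000                 # smaller than the N used above so the sketch runs quickly
epochs = 200
learning_rate = 0.1 / N  # assumption: shrink the step as the number of inputs grows

w = torch.randn([N], requires_grad=True)
b = torch.randn([1], requires_grad=True)
target_w = 3 * torch.ones([N])

start_error = torch.norm(w.detach() - target_w).item()

for i in range(epochs):
    x = torch.randn([N])
    y = torch.dot(target_w, x) - 2
    y_hat = torch.dot(w, x) + b
    loss = torch.sum((y_hat - y) ** 2)
    loss.backward()

    # update the weights without tracking the update, then reset the gradients
    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
        w.grad.zero_()
        b.grad.zero_()

end_error = torch.norm(w.detach() - target_w).item()
print(torch.isfinite(w).all().item(), start_error, end_error)
```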
