<h1 style="color:red;">What is PyTorch?</h1>

    
It’s a Python-based scientific computing package aimed at two audiences:

1) Those who want a tensor library that uses the power of GPUs
2) Those who want a deep learning research platform that provides maximum flexibility and speed

A tensor generalizes scalars, vectors, and matrices to any number of dimensions. A scalar (a single number) has rank 0, a vector (a one-dimensional list of numbers, such as a displacement of 5 meters to the left) has rank 1, a matrix has rank 2, and so on; the rank is the number of indices needed to pick out one element. If that is confusing, just think of tensors as matrices that may have more than two dimensions. Matrix multiplication comes up constantly in machine learning, so a library that stores matrices and multiplies them quickly is very useful, and PyTorch is such a library.
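
To make the idea of rank concrete, here is a minimal sketch that builds tensors of rank 0 to 3 and inspects them. The variable names are chosen for illustration only; `ndim` reports the rank and `shape` the size along each dimension.

```python
import torch

scalar = torch.tensor(5.0)                # rank 0: a single number
vector = torch.tensor([1.0, 2.0, 3.0])    # rank 1: one index
matrix = torch.ones(2, 3)                 # rank 2: rows and columns
cube = torch.zeros(2, 3, 4)               # rank 3: a stack of two 3x4 matrices

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # ndim is the number of dimensions, shape lists the size of each one
    print(name, t.ndim, tuple(t.shape))
```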

In [173]:
import torch
x = torch.empty(2, 3)  # uninitialized memory: the values shown are arbitrary
x

tensor([[1.6269e+19, 7.0065e-45, 0.0000e+00],
        [1.4013e-45, 1.3434e-21, 3.0837e-41]])

In [174]:
y=torch.rand(2,3)
y

tensor([[0.6926, 0.8677, 0.6430],
        [0.0356, 0.5421, 0.4181]])

In [175]:
z=torch.rand(2,3)
z

tensor([[0.4311, 0.3839, 0.5122],
        [0.6417, 0.3807, 0.6953]])

In [176]:
#We can also specify the data type
m=torch.ones(2,3, dtype=torch.float64)
m.dtype

torch.float64

In [177]:
y+z

tensor([[1.1237, 1.2516, 1.1552],
        [0.6773, 0.9228, 1.1134]])

In [178]:
torch.add(y,z)

tensor([[1.1237, 1.2516, 1.1552],
        [0.6773, 0.9228, 1.1134]])

In [62]:
y.add_(z)

tensor([[2.3744, 2.3891, 2.4122],
        [2.0069, 2.3241, 1.5685]])
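
The difference between `y + z` and `y.add_(z)` is easy to miss: the first returns a new tensor, the second overwrites `y`, so every later cell that uses `y` sees the updated values. The sketch below uses small hand-picked tensors (not the random ones above) to show this.

```python
import torch

y = torch.tensor([1.0, 2.0, 3.0])
z = torch.tensor([10.0, 20.0, 30.0])

ptr_before = y.data_ptr()      # address of y's underlying storage
out = y + z                    # out-of-place: creates a new tensor
y_unchanged = y.clone()        # snapshot showing y was not touched by y + z

y.add_(z)                      # in-place: y itself now holds y + z
print(out)
print(y)
print(y.data_ptr() == ptr_before)   # same storage, only the contents changed
```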

In [67]:
y-z

tensor([[1.5649, 0.9584, 1.4722],
        [1.9860, 0.6574, 1.0572]])

In [66]:
torch.sub(y,z)

tensor([[1.5649, 0.9584, 1.4722],
        [1.9860, 0.6574, 1.0572]])

In [64]:
torch.mul(y,z)

tensor([[0.9610, 1.7090, 1.1337],
        [0.0210, 1.9368, 0.4010]])
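
Note that `torch.mul` multiplies element by element; it is not the matrix multiplication mentioned in the introduction. As a small sketch with hand-picked values, the two operations compare as follows (`torch.matmul`, or the `@` operator, is the matrix product and needs compatible inner dimensions, hence the transpose).

```python
import torch

a = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])       # shape (2, 3)
b = torch.tensor([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 3.0]])       # shape (2, 3)

elementwise = torch.mul(a, b)             # same as a * b, shape (2, 3)
matrix_product = torch.matmul(a, b.T)     # same as a @ b.T, shape (2, 2)

print(elementwise)
print(matrix_product)
```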

In [172]:
# Print the full tensor, then all rows of columns 1 onward, then all rows of column 0 only
k=torch.rand(5,3)
print(k)
print(k[:,1:])
print(k[:,[0]])

tensor([[0.5620, 0.2868, 0.8868],
        [0.6812, 0.0260, 0.0513],
        [0.3546, 0.8236, 0.3618],
        [0.6191, 0.8358, 0.4818],
        [0.4800, 0.9501, 0.8236]])
tensor([[0.2868, 0.8868],
        [0.0260, 0.0513],
        [0.8236, 0.3618],
        [0.8358, 0.4818],
        [0.9501, 0.8236]])
tensor([[0.5620],
        [0.6812],
        [0.3546],
        [0.6191],
        [0.4800]])
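
The way a column is selected changes the shape of the result. The following sketch uses a tensor filled with 0..14 (an assumption made here so the values are predictable) to compare an integer index, a list index, and a slice.

```python
import torch

k = torch.arange(15, dtype=torch.float32).reshape(5, 3)  # 5 rows, 3 columns

col_1d = k[:, 0]        # integer index drops the column dimension: shape (5,)
col_2d = k[:, [0]]      # list index keeps it: shape (5, 1)
col_slice = k[:, 0:1]   # a length-1 slice also keeps it: shape (5, 1)
row_1d = k[1, :]        # one row, all columns: shape (3,)

print(col_1d.shape, col_2d.shape, col_slice.shape, row_1d.shape)
```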


In [73]:
# To get the value at a particular row and column as a plain Python number, use .item()
k[0,0].item()

0.0445365309715271

In [75]:
#convert a tensor into numpy array
import numpy as np
a= torch.ones(1,5)
b= a.numpy()
a


tensor([[1., 1., 1., 1., 1.]])

In [76]:
b

array([[1., 1., 1., 1., 1.]], dtype=float32)

In [83]:
#Convert a numpy array to a tensor
c=np.ones(5)
d=torch.from_numpy(c)
d

tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
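
One detail worth knowing about these conversions: for CPU tensors, `.numpy()` and `torch.from_numpy()` share memory with the original object, so an in-place change on one side shows up on the other. A short sketch, with `torch.tensor(...)` added as the copying alternative:

```python
import numpy as np
import torch

a = torch.ones(5)
b = a.numpy()            # b views the same memory as a (CPU tensors only)
a.add_(1)                # in-place change to the tensor...
print(b)                 # ...is visible through the NumPy array

c = np.ones(5)
d = torch.from_numpy(c)  # d shares memory with c
c += 1
print(d)

e = torch.tensor(c)      # torch.tensor copies the data, so e is independent
c += 1
print(d, e)
```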

In [84]:
# Passing requires_grad=True (the default is False) tells PyTorch to track operations on this tensor so that gradients can be computed later during optimization
e=torch.ones(3,4, requires_grad=True)
e

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], requires_grad=True)

<span style="color:#FF5733">This section covers autograd and how to use it to compute gradients. Gradients are essential for optimization.</span>

In [129]:
x1=torch.randn(3, requires_grad=True)
x1

tensor([ 1.6793, -0.3925, -0.0202], requires_grad=True)

<span style="color:blue">The requires_grad=True argument tells PyTorch that we want to compute gradients with respect to this tensor during backpropagation.</span>

In [130]:
y1=x1+2
y1

tensor([3.6793, 1.6075, 1.9798], grad_fn=<AddBackward0>)

<span style="color:blue">grad_fn tells you that y1 was created by an addition operation. PyTorch has recorded this operation in the computation graph, which is what lets it compute gradients by backpropagation.</span>

In [131]:
y2=x1*x1*2
y2

tensor([5.6404e+00, 3.0815e-01, 8.1969e-04], grad_fn=<MulBackward0>)

In [132]:
y3= y2.mean()
y3

tensor(1.9831, grad_fn=<MeanBackward0>)

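<span style="color:blue">To calculate the gradient, all we need to do now is call y3.backward(), which computes dy3/dx1 and stores it in x1.grad. For a scalar output no argument is needed, but for a vector output such as y2 we must pass a tensor with the same shape as that output; PyTorch then computes the vector-Jacobian product with it.</span>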

In [133]:
#y3.backward() # for scalar


In [134]:
v= torch.tensor([0.1,0.02,0.003], dtype=torch.float32)
y2.backward(v) #for vector

In [135]:
x1.grad

tensor([ 6.7174e-01, -3.1402e-02, -2.4293e-04])
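
The gradients returned by autograd can be checked against the calculus. For y = 2·x², the derivative is 4·x, so the mean over n elements gives 4·x/n, and a vector backward pass with weights v gives v·4·x. The sketch below uses a fixed input instead of the random x1 so that the comparison is reproducible.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar output: backward() needs no argument.
y_scalar = (2 * x * x).mean()
y_scalar.backward()
grad_scalar = x.grad.clone()          # analytic value: 4 * x / 3

# Vector output: pass a weight vector v with the same shape as the output.
x.grad.zero_()                        # clear the accumulated gradient first
y_vector = 2 * x * x
v = torch.tensor([0.1, 0.02, 0.003])
y_vector.backward(v)
grad_vector = x.grad.clone()          # analytic value: v * 4 * x

print(grad_scalar)
print(grad_vector)
```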

<span style="color:blue">Now suppose we no longer want requires_grad=True, so that PyTorch stops tracking the history in the computational graph.
We essentially have three options:
1) x1.requires_grad_(False): a trailing underscore _ means the method modifies the tensor in place
2) x1.detach(): returns a new tensor with the same values that does not require gradients
3) with torch.no_grad(): operations inside the block are not tracked</span>

In [140]:
x1.requires_grad_(False)
x1

tensor([ 1.6793, -0.3925, -0.0202])

In [141]:
y2.detach()


tensor([5.6404e+00, 3.0815e-01, 8.1969e-04])
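
The third option, `torch.no_grad()`, is not run in the cells above. As a minimal sketch on a fresh tensor, the three approaches behave like this:

```python
import torch

x = torch.randn(3, requires_grad=True)

# detach() returns a new tensor that shares data but is cut from the graph.
x_detached = x.detach()
print(x_detached.requires_grad)   # False

# Operations inside a no_grad block are not recorded.
with torch.no_grad():
    y_untracked = x + 2
print(y_untracked.requires_grad)  # False

# Outside the block, tracking is active again.
y_tracked = x + 2
print(y_tracked.requires_grad)    # True

# requires_grad_(False) switches tracking off on the tensor itself, in place.
x.requires_grad_(False)
print(x.requires_grad)            # False
```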

<span style="color:blue">Now let's look at a training loop. Each call to backward() adds the new gradient to weights.grad instead of overwriting it, so the gradients accumulate across iterations (first cell below). To get the correct gradient on every iteration, reset it to zero after each one (second cell).</span>

In [145]:
weights=torch.ones(4, requires_grad=True)
for epoch in range(3):
    model_output=(weights*3).sum()
    model_output.backward()
    print(weights.grad)

tensor([3., 3., 3., 3.])
tensor([6., 6., 6., 6.])
tensor([9., 9., 9., 9.])


In [146]:
weights=torch.ones(4, requires_grad=True)
for epoch in range(3):
    model_output=(weights*3).sum()
    model_output.backward()
    print(weights.grad)
    weights.grad.zero_()

tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])


<span style="color:red"> Backpropagation</span>

In [147]:
import torch
x=torch.tensor(1.0)
y=torch.tensor(2.0)

w=torch.tensor(1.0, requires_grad=True)

#Forward pass and  compute the loss
y_hat=w*x
loss=(y_hat-y)**2

print(loss)

#Backward pass

loss.backward()
print(w.grad)

## update weights



tensor(1., grad_fn=<PowBackward0>)
tensor(-2.)
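
The cell above stops at the "update weights" comment. One possible way to complete the step is a plain gradient-descent update; the learning rate of 0.1 is an assumption chosen for illustration, not a value from the original notebook.

```python
import torch

x = torch.tensor(1.0)
y = torch.tensor(2.0)
w = torch.tensor(1.0, requires_grad=True)

learning_rate = 0.1                  # assumed value, for illustration only

loss = (w * x - y) ** 2
loss.backward()                      # w.grad = 2 * x * (w * x - y) = -2

with torch.no_grad():                # the update itself must not be tracked
    w -= learning_rate * w.grad      # 1.0 - 0.1 * (-2.0) = 1.2
w.grad.zero_()                       # reset before the next backward pass

new_loss = (w * x - y) ** 2          # (1.2 - 2.0) ** 2 = 0.64
print(w, new_loss)
```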


<span style="color:blue"> In this part we do everything manually using only NumPy arrays, and then translate the idea to PyTorch. In each iteration we compute the prediction, then the loss, then the gradient, and finally update the parameter.</span>


In [157]:
import numpy as np 

# f=w*x
X=np.array([1,2,3,4], dtype=np.float32)
Y=np.array([2,4,6,8], dtype=np.float32)

w=0.0

#model prediction
def forward(x): 
    return w*x

#loss =MSE

def loss(y,y_predicted):
    return((y_predicted-y)**2).mean()

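#gradient of the MSE with respect to w is mean(2*x*(y_predicted-y))
#note: np.dot already sums over the samples, so .mean() has no effect here and
#this returns the sum, which is len(x) (=4) times the MSE gradient
def gradient(x,y,y_predicted):
    return np.dot(2*x,y_predicted-y).mean()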

print(f'Prediction before training: f(5)={forward(5):.3f}')

#Training
learning_rate = 0.01
n_iters=30

for epoch in range(n_iters):
    #prediction=forward pass 
    y_pred=forward(X)

    #loss
    l=loss(Y,y_pred)

    #gradient
    dw=gradient(X,Y,y_pred)

    #update weights
    w-=learning_rate*dw
    if epoch %2==0:
        print(f'epoch{epoch+1}: w={w:.3f}, loss= {l:.8f}')

print(f'Prediction after training : f(5) = {forward(5):.3f}')

Prediction before training: f(5)=0.000
epoch1: w=1.200, loss= 30.00000000
epoch3: w=1.872, loss= 0.76800019
epoch5: w=1.980, loss= 0.01966083
epoch7: w=1.997, loss= 0.00050331
epoch9: w=1.999, loss= 0.00001288
epoch11: w=2.000, loss= 0.00000033
epoch13: w=2.000, loss= 0.00000001
epoch15: w=2.000, loss= 0.00000000
epoch17: w=2.000, loss= 0.00000000
epoch19: w=2.000, loss= 0.00000000
epoch21: w=2.000, loss= 0.00000000
epoch23: w=2.000, loss= 0.00000000
epoch25: w=2.000, loss= 0.00000000
epoch27: w=2.000, loss= 0.00000000
epoch29: w=2.000, loss= 0.00000000
Prediction after training : f(5) = 10.000
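
The first update in the output above, from w=0 to w=1.200 with a learning rate of 0.01, corresponds to a gradient of -120. The sketch below reproduces that number and compares it with the gradient of the mean squared error, which averages over the samples instead of summing them.

```python
import numpy as np

X = np.array([1, 2, 3, 4], dtype=np.float32)
Y = np.array([2, 4, 6, 8], dtype=np.float32)
w = 0.0
y_pred = w * X

# np.dot already sums over the samples, so the trailing .mean() acts on a
# single number and changes nothing.
grad_sum = np.dot(2 * X, y_pred - Y).mean()

# Gradient of the mean squared error: average of 2 * x_i * (y_pred_i - y_i).
grad_mean = (2 * X * (y_pred - Y)).mean()

print(grad_sum, grad_mean)
print(w - 0.01 * grad_sum, w - 0.01 * grad_mean)   # first update with each gradient
```

With the averaged gradient the first step lands near w=0.3 instead of w=1.2, so training needs more iterations at the same learning rate.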


<span style="color:blue"> Let's see how this translates to PyTorch: the gradient is now computed by autograd with backward(), while the forward pass and the loss are still written manually for now.</span>

In [160]:
import torch

# f=w*x
X=torch.tensor([1,2,3,4], dtype=torch.float32)
Y=torch.tensor([2,4,6,8], dtype=torch.float32)

w=torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

#model prediction
def forward(x): 
    return w*x

#loss =MSE

def loss(y,y_predicted):
    return((y_predicted-y)**2).mean()

#gradient: computed by autograd through l.backward(), so no manual gradient function is needed

print(f'Prediction before training: f(5)={forward(5):.3f}')

#Training
learning_rate = 0.01
n_iters=100

for epoch in range(n_iters):
    #prediction=forward pass 
    y_pred=forward(X)

    #loss
    l=loss(Y,y_pred)

    #gradient= Backward pass
    l.backward() #dl/dw

    #update weights
    with torch.no_grad():
        w-=learning_rate*w.grad

    #zero gradient
    w.grad.zero_()
    
    if epoch %2==0:
        print(f'epoch{epoch+1}: w={w:.3f}, loss= {l:.8f}')

print(f'Prediction after training : f(5) = {forward(5):.3f}')

Prediction before training: f(5)=0.000
epoch1: w=0.300, loss= 30.00000000
epoch3: w=0.772, loss= 15.66018772
epoch5: w=1.113, loss= 8.17471695
epoch7: w=1.359, loss= 4.26725292
epoch9: w=1.537, loss= 2.22753215
epoch11: w=1.665, loss= 1.16278565
epoch13: w=1.758, loss= 0.60698116
epoch15: w=1.825, loss= 0.31684780
epoch17: w=1.874, loss= 0.16539653
epoch19: w=1.909, loss= 0.08633806
epoch21: w=1.934, loss= 0.04506890
epoch23: w=1.952, loss= 0.02352631
epoch25: w=1.966, loss= 0.01228084
epoch27: w=1.975, loss= 0.00641066
epoch29: w=1.982, loss= 0.00334642
epoch31: w=1.987, loss= 0.00174685
epoch33: w=1.991, loss= 0.00091188
epoch35: w=1.993, loss= 0.00047601
epoch37: w=1.995, loss= 0.00024848
epoch39: w=1.996, loss= 0.00012971
epoch41: w=1.997, loss= 0.00006770
epoch43: w=1.998, loss= 0.00003534
epoch45: w=1.999, loss= 0.00001845
epoch47: w=1.999, loss= 0.00000963
epoch49: w=1.999, loss= 0.00000503
epoch51: w=1.999, loss= 0.00000262
epoch53: w=2.000, loss= 0.00000137
epoch55: w=2.000, l

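<span style="color:blue">This version converges more slowly than the manual one, but not because autograd is less efficient. The manual gradient function uses np.dot, which sums over the four samples, so each manual step was 4 times larger than the true MSE gradient step that autograd computes here.</span>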
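
The difference in convergence speed between the two runs comes from the size of the gradient, which can be checked directly. As a small sketch at the starting point w=0, autograd returns the gradient of the mean squared error, which matches the averaged analytic expression.

```python
import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, requires_grad=True)

loss = ((w * X - Y) ** 2).mean()
loss.backward()                                       # fills w.grad with d(loss)/dw

# Analytic gradient of the MSE: mean of 2 * x_i * (w * x_i - y_i)
manual_grad = (2 * X * (w.detach() * X - Y)).mean()

print(w.grad, manual_grad)
```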

<span style="color:blue"> Now let's incorporate the loss and the forward pass using PyTorch's nn module, and let an optimizer perform the update.</span>

In [164]:
import torch
import torch.nn as nn # neural network module
# f=w*x
X=torch.tensor([[1],[2],[3],[4]], dtype=torch.float32)
Y=torch.tensor([[2],[4],[6],[8]], dtype=torch.float32)

X_test=torch.tensor([5], dtype=torch.float32)
n_samples, n_features = X.shape
print(n_samples, n_features)

input_size=n_features
output_size=n_features
model=nn.Linear(input_size, output_size)

#loss =MSE
#gradient: computed by autograd through l.backward(), so no manual gradient function is needed

print(f'Prediction before training: f(5)={model(X_test).item():.3f}')

#Training
learning_rate = 0.01
n_iters=100

loss = nn.MSELoss()
optimizer= torch.optim.SGD(model.parameters(), lr=learning_rate)
for epoch in range(n_iters):
    #prediction=forward pass 
    y_pred=model(X)

    #loss
    l=loss(y_pred,Y)  # nn.MSELoss expects (prediction, target); the value is the same for MSE

    #gradient= Backward pass
    l.backward() #dl/dw

    #update weights
    optimizer.step()
    #zero gradient
    optimizer.zero_grad()
    
    if epoch %2==0:
        [w,b]=model.parameters()
        print(f'epoch{epoch+1}: w={w[0][0]:.3f}, loss= {l:.8f}')

print(f'Prediction after training : f(5) = {model(X_test).item():.3f}')

4 1
Prediction before training: f(5)=-0.995
epoch1: w=-0.042, loss= 33.00695801
epoch3: w=0.439, loss= 16.06250000
epoch5: w=0.773, loss= 7.90224981
epoch7: w=1.006, loss= 3.97134089
epoch9: w=1.168, loss= 2.07674885
epoch11: w=1.282, loss= 1.16259944
epoch13: w=1.361, loss= 0.72052658
epoch15: w=1.417, loss= 0.50576591
epoch17: w=1.457, loss= 0.40047130
epoch19: w=1.485, loss= 0.34790421
epoch21: w=1.506, loss= 0.32074594
epoch23: w=1.521, loss= 0.30584341
epoch25: w=1.532, loss= 0.29686293
epoch27: w=1.541, loss= 0.29075563
epoch29: w=1.548, loss= 0.28605267
epoch31: w=1.553, loss= 0.28204709
epoch33: w=1.558, loss= 0.27839777
epoch35: w=1.562, loss= 0.27494073
epoch37: w=1.565, loss= 0.27159661
epoch39: w=1.569, loss= 0.26832667
epoch41: w=1.572, loss= 0.26511210
epoch43: w=1.575, loss= 0.26194423
epoch45: w=1.577, loss= 0.25881782
epoch47: w=1.580, loss= 0.25573036
epoch49: w=1.583, loss= 0.25268072
epoch51: w=1.585, loss= 0.24966781
epoch53: w=1.588, loss= 0.24669106
epoch55: w=1.
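
Two things differ from the manual version here: `nn.Linear` starts from randomly initialized parameters, and by default it also learns a bias, so the model is f(x) = w*x + b and w does not have to head straight to 2. A sketch that removes both effects, assuming a fixed seed and `bias=False` to match f = w*x exactly:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)   # fixed seed so repeated runs start from the same weight

X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
Y = torch.tensor([[2], [4], [6], [8]], dtype=torch.float32)

model = nn.Linear(1, 1, bias=False)   # f(x) = w * x, no bias term
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):
    loss = criterion(model(X), Y)     # (prediction, target)
    loss.backward()                   # compute gradients
    optimizer.step()                  # update the weight
    optimizer.zero_grad()             # reset gradients for the next iteration

w = model.weight.item()
with torch.no_grad():                 # no graph is needed for inference
    prediction = model(torch.tensor([5.0])).item()

print(f'w={w:.3f}, f(5)={prediction:.3f}')
```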

<span style="color:blue"> Here we used the ready-made nn.Linear model, but we can also define our own model class as follows:</span>

In [169]:
import torch
import torch.nn as nn # neural network module
# f=w*x
X=torch.tensor([[1],[2],[3],[4]], dtype=torch.float32)
Y=torch.tensor([[2],[4],[6],[8]], dtype=torch.float32)

X_test=torch.tensor([5], dtype=torch.float32)
n_samples, n_features = X.shape
print(n_samples, n_features)

input_size=n_features
output_size=n_features

class LinearRegression(nn.Module):

    def __init__(self, input_dim, output_dim):
        super(LinearRegression,self).__init__()
        self.lin = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.lin(x)
    
#model=nn.Linear(input_size, output_size)
model=LinearRegression(input_size, output_size)
#loss =MSE
#gradient: computed by autograd through l.backward(), so no manual gradient function is needed

print(f'Prediction before training: f(5)={model(X_test).item():.3f}')

#Training
learning_rate = 0.01
n_iters=100

loss = nn.MSELoss()
optimizer= torch.optim.SGD(model.parameters(), lr=learning_rate)
for epoch in range(n_iters):
    #prediction=forward pass 
    y_pred=model(X)

    #loss
    l=loss(y_pred,Y)  # nn.MSELoss expects (prediction, target); the value is the same for MSE

    #gradient= Backward pass
    l.backward() #dl/dw

    #update weights
    optimizer.step()
    #zero gradient
    optimizer.zero_grad()
    
    if epoch %2==0:
        [w,b]=model.parameters()
        print(f'epoch{epoch+1}: w={w[0][0]:.3f}, loss= {l:.8f}')

print(f'Prediction after training : f(5) = {model(X_test).item():.3f}')

4 1
Prediction before training: f(5)=1.315
epoch1: w=0.687, loss= 25.37613678
epoch3: w=1.108, loss= 12.22059631
epoch5: w=1.399, loss= 5.88659477
epoch7: w=1.601, loss= 2.83694196
epoch9: w=1.742, loss= 1.36860013
epoch11: w=1.839, loss= 0.66160798
epoch13: w=1.906, loss= 0.32118303
epoch15: w=1.953, loss= 0.15724745
epoch17: w=1.985, loss= 0.07828671
epoch19: w=2.007, loss= 0.04023900
epoch21: w=2.023, loss= 0.02188968
epoch23: w=2.033, loss= 0.01302525
epoch25: w=2.040, loss= 0.00872762
epoch27: w=2.045, loss= 0.00662920
epoch29: w=2.049, loss= 0.00558994
epoch31: w=2.051, loss= 0.00506097
epoch33: w=2.052, loss= 0.00477805
epoch35: w=2.053, loss= 0.00461391
epoch37: w=2.053, loss= 0.00450730
epoch39: w=2.054, loss= 0.00442873
epoch41: w=2.054, loss= 0.00436397
epoch43: w=2.054, loss= 0.00430619
epoch45: w=2.054, loss= 0.00425208
epoch47: w=2.053, loss= 0.00420005
epoch49: w=2.053, loss= 0.00414933
epoch51: w=2.053, loss= 0.00409954
epoch53: w=2.053, loss= 0.00405053
epoch55: w=2.05
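
A custom `nn.Module` works with the optimizer because layers assigned as attributes in `__init__` have their parameters registered automatically. The sketch below redefines the same class (using the shorter `super().__init__()` form) and inspects what `model.parameters()` hands to the optimizer.

```python
import torch
import torch.nn as nn

class LinearRegression(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.lin = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.lin(x)

model = LinearRegression(1, 1)

# Parameters of submodules are registered under the attribute name.
names = [name for name, _ in model.named_parameters()]
print(names)

# Total number of trainable values: one weight and one bias.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)

out = model(torch.tensor([[5.0]]))    # calling the model runs forward()
print(out.shape)
```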