This is my attempt at a beginner's tutorial on PyTorch. Unlike the tutorials found elsewhere on the internet or on the PyTorch website, it starts with the very basic idea of a tensor (the primary datatype) and works up to designing simple feed-forward networks with PyTorch. It requires only a basic understanding of feed-forward networks. When I was learning PyTorch, I found that almost all the tutorials started with RNNs or CNNs, which was really difficult to follow for someone who had just started with deep learning. I think a tutorial should start from the very basics: getting familiar with the different modules in PyTorch, and then taking the first step towards designing very simple feed-forward neural networks. That is my motivation for creating this tutorial.

# Pytorch installation

In [3]:
import torch

In [6]:
import numpy as np

# Tensors

#1. The only datatype the model understands is the tensor (a multi-dimensional array)
#2. You will have to convert the feature vector to a tensor
#3. The entries in a tensor can be float, int, ...
#4. https://pytorch.org/docs/stable/tensors.html (lists all the available types)

In [4]:
# general way of creating tensor - torch.tensor(<list/numpy array>, dtype=<dtype>)
x = torch.tensor([1,2,3,4])
y = torch.tensor([1,2,3,4],dtype=torch.float32)
z = torch.FloatTensor([1,2,3,4])

In [5]:
z

tensor([1., 2., 3., 4.])

In [7]:
x = np.array([1,2,3,4])
x_t = torch.tensor(x)

In [8]:
x_t

tensor([1, 2, 3, 4], dtype=torch.int32)

In [9]:
# you can get a numpy array back from a tensor
x_n = x_t.numpy()

In [10]:
x_n

array([1, 2, 3, 4])
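One detail worth knowing when moving between NumPy and tensors is whether the data is copied or shared. The sketch below is a small illustration, not part of the original notebook: `torch.tensor` makes a copy, while `torch.from_numpy` and `.numpy()` (on a CPU tensor) share memory with the other side. The variable names are made up for the example.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3, 4])

copied = torch.tensor(arr)      # torch.tensor copies the data
shared = torch.from_numpy(arr)  # torch.from_numpy shares memory with arr

arr[0] = 100                    # modify the NumPy array in place

print(copied[0].item())         # the copy is unaffected by the change
print(shared[0].item())         # the shared tensor sees the change

back = shared.numpy()           # .numpy() on a CPU tensor also shares memory
back[1] = 200
print(shared[1].item())         # the tensor sees the change made through the array
```

If you need an independent array or tensor, make the copy explicit (for example with `.clone()` on the tensor or `.copy()` on the array).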

In [11]:
# similar to numpy there are different ways of creating tensors
x = torch.ones((1,8))
y = torch.zeros((1,2))

In [12]:
# similar way of accessing elements as numpy
y[0][0] = 1.0

In [13]:
y[0,0] # this also works

tensor(1.)

In [14]:
# reshaping matrix
# We want to reshape the matrix of form 3x3 to 1x9
z = torch.tensor([[1,2,3],[4,5,6],[7,8,9]])
z.shape

torch.Size([3, 3])

In [15]:
w = z.reshape(1,9)
w.shape

torch.Size([1, 9])

In [16]:
# in many cases that we will encounter, we won't know the shape beforehand
# In the previous example, let's say we do not know if the matrix is 3x3 or 4x4...
# But we want to 'flatten' the tensor
# So the output would be 1xn, with n depending on the shape of the input tensor
q = torch.ones((4,4))
t = torch.ones((8,8))

In [17]:
w = q.reshape(1,-1)

In [18]:
w.shape

torch.Size([1, 16])

In [19]:
w = t.reshape(2,-1)

In [20]:
w.shape

torch.Size([2, 32])

In [21]:
# Check these out yourself....
# Other useful functions: torch.squeeze(), torch.unsqueeze(), Tensor.detach()....
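As a starting point for exploring those functions, here is a minimal sketch of what each one does. The tensors and names are made up for illustration; only the shapes and the `requires_grad` flag matter.

```python
import torch

a = torch.ones((1, 8))

squeezed = a.squeeze(0)            # drop the size-1 dimension at index 0 -> shape [8]
restored = squeezed.unsqueeze(0)   # add a size-1 dimension back at index 0 -> shape [1, 8]

p = torch.ones(3, requires_grad=True)
d = (p * 2).detach()               # same values, but cut off from the autograd graph

print(squeezed.shape, restored.shape, d.requires_grad)
```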

# Calculating gradients

In [1]:
# Now that we know how to create and manipulate tensors, we move on to the second part, which is calculating gradients
# You do not need to calculate gradients from scratch..
# The autograd module in Pytorch does it for you

In [24]:
x.requires_grad

False

In [25]:
x = torch.ones((1,7),requires_grad=True)

In [26]:
y = torch.sum(x*x)

In [27]:
y.backward()

In [28]:
print(x.grad)

tensor([[2., 2., 2., 2., 2., 2., 2.]])


In [29]:
w = torch.ones((1,7))

In [30]:
t = w*w

In [31]:
t.backward() # we did not set requires_grad to True for w

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
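For comparison, a version of the failing example that does run might look like the sketch below. Two things are needed: the tensor must be created with `requires_grad=True`, and `backward()` called without arguments needs a scalar, so the elementwise product is summed first (as in the earlier `torch.sum(x*x)` cell).

```python
import torch

w = torch.ones((1, 7), requires_grad=True)  # track operations on w
t = w * w                                   # shape [1, 7], not a scalar

# backward() without arguments needs a scalar, so reduce t first
t.sum().backward()

print(w.grad)  # gradient of sum(w*w) with respect to w, i.e. 2*w
```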

# Creating an architecture

<img src="Images/neural_net.png">

<img src="Images/linear_matrix.png">

In [105]:
import torch.nn as nn
import torch.nn.functional as F

In [53]:
class Classifier(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Classifier, self).__init__()
        self.W_1 = nn.Parameter(torch.randn(input_size, hidden_size))
        self.b_1 = nn.Parameter(torch.randn(1, hidden_size))
        self.W_2 = nn.Parameter(torch.randn(hidden_size, output_size))
        self.b_2 = nn.Parameter(torch.randn(1, output_size))
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, inp):
        x = inp @ self.W_1 + self.b_1 # inp @ self.W_1 can be replaced by torch.matmul(inp, self.W_1)
        h = self.relu(x)
        y = h @ self.W_2 + self.b_2
        y = self.sigmoid(y)
        return y

In [54]:
clf = Classifier(4,6,2)

In [55]:
inp = torch.randn(1,4)

In [56]:
inp.shape

torch.Size([1, 4])

In [57]:
clf(inp) # forward pass for one datapoint in your dataset

tensor([[0.0657, 0.5474]], grad_fn=<SigmoidBackward>)

In [49]:
## You can design any deep learning architecture from scratch by first initializing the parameters in __init__ and then
## specifying the sequence of operations of the forward pass in the forward function

In [50]:
## But of course Pytorch provides you higher level APIs so that you do not need to implement everything from scratch
## However, it makes sense to know these, as in future you might come up with your own architecture for which
# a ready-made implementation is not available

In [51]:
# Let's look back at the architecture we have already created and see how we can make life easier by using the
# higher level APIs

In [52]:
# We start with the linear module which packs the whole operation Wx + b

In [63]:
class Classifier_1(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Classifier_1, self).__init__()
        self.i2h = nn.Linear(input_size, hidden_size) # computes W1x + b1 (input to hidden)
        self.h2o = nn.Linear(hidden_size, output_size) # computes W2h + b2 (hidden to output)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, inp):
        x = self.i2h(inp) # replaces inp @ self.W_1 + self.b_1 from the earlier Classifier
        h = self.relu(x)
        y = self.h2o(h)
        y = self.sigmoid(y)
        return y

In [64]:
clf = Classifier_1(4,6,2)

In [77]:
inp = torch.randn(1,4)

In [66]:
clf(inp)

tensor([[0.4420, 0.4314]], grad_fn=<SigmoidBackward>)
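To make the link between the two classifiers concrete, the sketch below checks that an `nn.Linear` layer produces the same result as the explicit matrix form. One thing to note is that `nn.Linear` stores its weight with shape `[out_features, in_features]`, so the manual version multiplies by the transpose. The layer sizes mirror the example above; the seed is only there for repeatability.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 6)
inp = torch.randn(1, 4)

# layer.weight has shape [out_features, in_features] = [6, 4]
manual = inp @ layer.weight.t() + layer.bias

print(layer.weight.shape)
print(torch.allclose(layer(inp), manual, atol=1e-6))
```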

# Back propagation and training

# Loss

Loss functions are already implemented in the neural network module: https://pytorch.org/docs/stable/nn.html#loss-functions

Have a look at the others listed there.

We will see an example with cross-entropy loss (this is the one we are going to use predominantly).

<img src="Images/cross_entropy_loss.png">

In [69]:
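## Takes as input the unnormalized scores (logits), one per class, and the true/ground-truth class
## Since the softmax is already performed inside the loss function, you need not perform softmax yourself
## (strictly speaking, clf above ends in a sigmoid, so its outputs are not raw scores; with this loss the final sigmoid is usually dropped)
## For some other loss functions, you might have to do it explicitly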

In [70]:
criterion = nn.CrossEntropyLoss() # create an instance of the loss function

In [81]:
# we consider one input instance and assume the ground-truth label as 1 (2 class so either 0 or 1)
label = torch.LongTensor([1]) ## recall pytorch only understands tensor
out = clf(inp)

In [82]:
loss = criterion(out, label)

In [83]:
loss

tensor(0.7143, grad_fn=<NllLossBackward>)
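To see what the loss is doing internally, the following sketch recomputes it by hand for a single example: take the log-softmax of the scores, pick the entry for the true class, and negate it. The score values here are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

scores = torch.tensor([[0.2, 1.5]])   # hypothetical unnormalized scores for 2 classes
label = torch.tensor([1])             # ground-truth class index

criterion = nn.CrossEntropyLoss()
loss = criterion(scores, label)

# Same quantity by hand: log-softmax, then pick out the true class and negate
log_probs = F.log_softmax(scores, dim=1)
manual = -log_probs[0, label[0]]

print(loss.item(), manual.item())
```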

# Optimization through backpropagation/training

In [71]:
# we will use the optimization module in pytorch
import torch.optim as optim

In [85]:
optimizer = optim.SGD(clf.parameters(),lr=0.001) ## we will use Stochastic gradient descent SGD, lr - learning rate 

In [86]:
optimizer.zero_grad() # flushes out any previously calculated gradient

In [87]:
loss.backward() # calculates gradient with respect to the parameters dL/dx

In [88]:
optimizer.step() # updates the parameters based on the optimization strategy: x = x - lr*dL/dx

In [89]:
# the three calls above (zero_grad, backward, step), together with creating the optimizer, constitute one full training step
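The update rule in the comment, x = x - lr*dL/dx, can be checked directly. The sketch below runs one step on a small stand-in layer (not the classifier from the notebook) and compares the result of `optimizer.step()` with the same update computed by hand. It assumes plain SGD, with no momentum or weight decay.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
layer = nn.Linear(4, 2)               # small stand-in model
optimizer = optim.SGD(layer.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

inp = torch.randn(1, 4)
label = torch.tensor([1])

optimizer.zero_grad()                 # clear old gradients
loss = criterion(layer(inp), label)
loss.backward()                       # fill in .grad for every parameter

# What plain SGD (no momentum, no weight decay) should produce for the weight
expected = layer.weight.detach() - 0.1 * layer.weight.grad

optimizer.step()                      # in-place parameter update
print(torch.allclose(layer.weight.detach(), expected))
```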

# Batching

In [90]:
# So far we have only been backpropagating with a single example...
# In practice we optimize over a batch of input datapoints
# We will now see how to deal with batches
# Actually, not much modification is needed

In [91]:
inp = torch.randn(6,4) # we have 6 data points each with 4 features

In [92]:
label = torch.LongTensor([1,0,0,1,1,0]) # labels for these random inputs

In [93]:
out = clf(inp)

In [94]:
out.shape # output corresponding to the 6 data points

torch.Size([6, 2])

In [95]:
loss = criterion(out, label) # computed for each of the 6 data points in the batch and then averaged

In [96]:
loss

tensor(0.6819, grad_fn=<NllLossBackward>)

In [97]:
# the training is again similar
optimizer = optim.SGD(clf.parameters(),lr=0.001)
optimizer.zero_grad()
loss.backward()
optimizer.step()

In [98]:
# It is pretty easy to extend from a single data point to a batch...
# When designing an architecture, I often find it useful to first design it for a single data point and then
# extend it to a batch
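The claim that the batch loss is the mean over the individual examples can be verified with the `reduction` argument of `nn.CrossEntropyLoss`. In this sketch the scores are random stand-ins for model outputs; `reduction='none'` returns one loss per data point, and their mean should match the default batch loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
scores = torch.randn(6, 2)                    # stand-in for model output on 6 data points
labels = torch.tensor([1, 0, 0, 1, 1, 0])

batch_loss = nn.CrossEntropyLoss()(scores, labels)                    # default reduction='mean'
per_example = nn.CrossEntropyLoss(reduction='none')(scores, labels)  # one loss per data point

print(per_example.shape)
print(torch.allclose(batch_loss, per_example.mean()))
```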

# Evaluation

In [100]:
# Once you have trained the model, you will want to check how well it does on the test set
# We will just need the forward pass
# Optimization is not needed

In [102]:
inp_test = torch.randn(1,4)

In [103]:
clf.eval() # We want to use the model in evaluation mode.. 

Classifier_1(
  (i2h): Linear(in_features=4, out_features=6, bias=True)
  (h2o): Linear(in_features=6, out_features=2, bias=True)
  (relu): ReLU()
  (sigmoid): Sigmoid()
)

In [104]:
out = clf(inp_test) # Perform the forward pass on the example

In [106]:
out_prob = F.softmax(out, dim=1) 
# remember cross-entropy loss was doing the softmax for us.. For evaluation we have to do it explicitly

In [109]:
out_prob

tensor([[0.5154, 0.4846]], grad_fn=<SoftmaxBackward>)

In [108]:
torch.argmax(out_prob) # the class with the highest probability is the inferred class

tensor(0)

In [136]:
torch.argmax(out_prob).item() # get back value from pytorch tensor

0

In [111]:
# Given that you know the true/ground-truth labels, you can compute the accuracy: the fraction of cases where the
# inferred class matches the true class
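A minimal sketch of that accuracy computation for a small batch is shown below. The scores and labels are hypothetical. Note that with more than one data point, `torch.argmax` needs `dim=1` so that it returns one prediction per row rather than a single index into the flattened tensor; `torch.no_grad()` is used because gradients are not needed at evaluation time.

```python
import torch

# hypothetical scores for 4 test points and 2 classes
scores = torch.tensor([[2.0, 0.5],
                       [0.1, 1.2],
                       [0.3, 0.9],
                       [1.5, 0.2]])
true_labels = torch.tensor([0, 1, 0, 0])

with torch.no_grad():                        # no gradients needed at evaluation time
    predicted = torch.argmax(scores, dim=1)  # one prediction per row
    accuracy = (predicted == true_labels).float().mean().item()

print(predicted.tolist(), accuracy)  # [0, 1, 1, 0] 0.75
```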

In [101]:
# Let's now work with a real-world dataset...
# We will try the classic MNIST dataset...
# It is a handwritten digit recognition dataset, with the labels being the digits (0-9)

<img src="Images/mnist.png">

# Data preprocessing

In [110]:
# We would like to train the model in batches...
# Ideally, the examples in a batch should be picked at random...
# Of course we can code it ourselves, but pytorch provides the Dataset and DataLoader classes to do just that...

In [112]:
# original data available at http://yann.lecun.com/exdb/mnist/ but is somewhat difficult to use
# I have done some preprocessing so that it is easier to use
# Each image is 28x28 which I have flattened to 1x784
# There are two files, train and test which should be used for training and testing respectively
# Each line in the files is a datapoint
# the first element in each line is the label; the rest are the feature values

In [114]:
data_points = []
class_labels = []

with open('Data/mnist_train_file.txt') as fs:
    for line in fs:
        data = list(map(int, line.strip().split(','))) 
        label = data[0]
        datapoint = data[1:]
        data_points.append(datapoint)
        class_labels.append(label)

In [115]:
len(class_labels)

60000

In [116]:
len(data_points[0])

784

In [113]:
from torch.utils.data import Dataset, DataLoader

In [120]:
class Mnist_dataset(Dataset):
    def __init__(self, data_points, class_labels):
        super(Mnist_dataset, self).__init__()
        self.data = data_points
        self.labels = class_labels
    
    def __len__(self):
        # returns length of the dataset
        return len(self.labels)
    
    def __getitem__(self, index):
        # retrieves an item of a given index
        d = torch.FloatTensor(self.data[index])
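        l = torch.tensor(self.labels[index], dtype=torch.long) # scalar label, so a batch of labels has shape [batch_size], as nn.CrossEntropyLoss expects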
        return d,l

In [124]:
mnist_data = Mnist_dataset(data_points, class_labels)
dataloader = DataLoader(mnist_data, batch_size=32, shuffle=True)
# iterating over it will return batches of 32 examples, which you can pass directly as input to the model clf()

torch.Size([784])


In [131]:
clf = Classifier_1(784, 1056, 10)

In [135]:
for data, label in dataloader:
    out = clf(data)
    break
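The pieces covered so far (a `DataLoader`, a loss, an optimizer) fit together into a loop over batches. The sketch below shows only the skeleton, and makes several assumptions so that it runs without the data file: the data is random rather than MNIST, `TensorDataset` stands in for the custom dataset class, and the model is a single `nn.Linear` layer rather than a designed classifier.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
# Synthetic stand-in for the MNIST file: 64 random "images" of 784 values, labels 0-9
features = torch.randn(64, 784)
labels = torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

model = nn.Linear(784, 10)                 # placeholder model, not a tuned architecture
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

losses = []
for data, label in loader:                 # data: [32, 784], label: [32]
    optimizer.zero_grad()                  # clear gradients from the previous batch
    loss = criterion(model(data), label)   # forward pass and loss for this batch
    loss.backward()                        # gradients for all parameters
    optimizer.step()                       # parameter update
    losses.append(loss.item())

print(len(losses))                         # one loss value per batch
```

One pass over the loader like this is one epoch; training normally repeats it several times.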

In [137]:
# Now design a classifier for MNIST putting together all the elements that we have learnt....