# Getting started with PyTorch

________________________________________________________

<font color='red'>**ATTENTION!**</font>

This material was gathered from the Medium article **[An easy introduction to Pytorch for Neural Networks](https://towardsdatascience.com/an-easy-introduction-to-pytorch-for-neural-networks-3ea08516bff2)** and is for replication purposes only. I do not claim the rights to this Medium article nor do I take responsibility for the author's work. This Jupyter example has been created for educational purposes only. Credit goes to the original author, **[George Seif](https://medium.com/@george.seif94)**.

________________________________________________________

At its core, PyTorch was designed to be as similar to Python’s NumPy as possible. This makes it easy to mix regular Python code, NumPy, and PyTorch, which allows for faster and easier coding.
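Here is a small sketch of that interoperability using `torch.from_numpy()` and `Tensor.numpy()`. On the CPU, both conversions share memory with their source instead of copying it, so an in-place change on one side is visible on the other. The array contents are arbitrary.

```python
import numpy as np
import torch

a = np.arange(6, dtype=np.float32).reshape(2, 3)  # an ordinary NumPy array
t = torch.from_numpy(a)   # a tensor that shares a's memory (no copy)
t.mul_(2)                 # in-place multiply on the tensor...
print(a)                  # ...shows up in the NumPy array: values are now doubled

back = t.numpy()          # back to NumPy, again without copying (CPU tensors only)
print(type(back), back.dtype)
```

Because memory is shared, use `t.clone()` or `a.copy()` when you need an independent copy.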

To get started, we can install PyTorch and torchvision (which provides the MNIST dataset used later) via pip:

In [None]:
#!pip install torch torchvision

If you’re interested in specific features, the [PyTorch docs](https://pytorch.org/docs/stable/index.html) are an excellent reference.

Let’s load the packages needed for this exercise:

In [12]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F

### Tensors

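In PyTorch, tensors are created with factory functions such as `torch.zeros()`. The code below creates a tensor of size (3, 3), i.e. 3 rows and 3 columns, filled with floating point zeros. (The older `torch.Tensor(3, 3)` constructor also allocates a 3×3 tensor, but it leaves the memory uninitialized, so its values are not guaranteed to be zero.)

In [2]:
x = torch.zeros(3, 3)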
print(x)
print("")
print(x.shape)


tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

torch.Size([3, 3])


We can also create tensors filled with random floating point values, drawn uniformly from the interval [0, 1):

In [3]:
x = torch.rand(3, 3)
print(x)

tensor([[0.5344, 0.8970, 0.2451],
        [0.6515, 0.4329, 0.3200],
        [0.0344, 0.1011, 0.1795]])
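`torch.rand()` draws fresh values on every call, so your numbers will differ from the output above. When you need reproducible results, seed PyTorch’s random number generator first with `torch.manual_seed()`. A minimal sketch (the seed value 0 is arbitrary):

```python
import torch

torch.manual_seed(0)
a = torch.rand(3, 3)   # uniform samples in [0, 1)

torch.manual_seed(0)
b = torch.rand(3, 3)   # same seed, so the same values are drawn again

print(torch.equal(a, b))                  # True
print(bool(((a >= 0) & (a < 1)).all()))   # True
```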


Basic math, such as adding and multiplying tensors, is just as easy; the arithmetic operators work element-wise, as they do on NumPy arrays:

In [4]:
x = torch.ones(3,3)
y = torch.ones(3,3) * 4
z = x + y
print(z)

tensor([[5., 5., 5.],
        [5., 5., 5.],
        [5., 5., 5.]])
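Note that `*` multiplies tensors element-wise; matrix multiplication uses the `@` operator (or `torch.matmul()`). This sketch contrasts the two on the same inputs as above, and also shows NumPy-style broadcasting of a row vector (the vector `row` is an arbitrary example):

```python
import torch

x = torch.ones(3, 3)
y = torch.ones(3, 3) * 4

elementwise = x * y      # each entry: 1 * 4 = 4
matmul = x @ y           # each entry: sum of three products 1 * 4 = 12
row = torch.tensor([1.0, 2.0, 3.0])
broadcast = x + row      # row is added to every row of x

print(elementwise[0, 0].item(), matmul[0, 0].item())  # 4.0 12.0
print(broadcast)         # every row is [2., 3., 4.]
```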


NumPy-style slicing works on PyTorch tensors too. Here `x[:, :3]` selects every row and the first three columns, which for a 3×3 tensor is the whole tensor:

In [5]:
x = torch.ones(3,3) * 5
y = x[:, :3]
print(y)

tensor([[5., 5., 5.],
        [5., 5., 5.],
        [5., 5., 5.]])
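Slicing is easier to see on a tensor with distinct values. This sketch uses `torch.arange()` to build a 3×3 tensor of the integers 0 to 8 and then picks out columns, rows, and a boolean mask; as in NumPy, basic slices return views that share memory with the original tensor.

```python
import torch

x = torch.arange(9).reshape(3, 3)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

first_two_cols = x[:, :2]   # every row, columns 0 and 1 -> shape (3, 2)
middle_row = x[1]           # second row -> [3, 4, 5]
last_col = x[:, -1]         # last column -> [2, 5, 8]
big = x[x > 4]              # boolean mask, flattened -> [5, 6, 7, 8]

print(first_two_cols.shape, middle_row, last_col, big)
```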


So PyTorch tensors can be created, combined, and indexed in much the same way as NumPy arrays. Now we’ll look at how to build deep networks with these tensors as our building blocks!

# Building Neural Networks with PyTorch

In PyTorch, neural networks are defined as Python classes that extend `torch.nn.Module`. Let’s create a class for a Convolutional Neural Network (CNN), which we’ll apply to the MNIST dataset of 28×28 grayscale handwritten digits.
Check out the code below, which defines our network!

In [6]:
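class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers with learnable weights are created (and registered) here
        self.conv1 = nn.Conv2d(1, 64, kernel_size=(3, 3), padding=1)   # 1 input channel (grayscale)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=(3, 3), padding=1)  # re-used several times in forward()
        self.max_pool = nn.MaxPool2d(2, 2)     # halves height and width
        self.global_pool = nn.AvgPool2d(7)     # averages each 7x7 feature map down to 1x1
        self.fc1 = nn.Linear(64, 64)
        self.fc2 = nn.Linear(64, 10)           # 10 output classes (digits 0-9)

    def forward(self, x):
        # x: (batch, 1, 28, 28)
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.max_pool(x)                   # -> (batch, 64, 14, 14)

        x = F.relu(self.conv2(x))
        x = F.relu(self.conv2(x))
        x = self.max_pool(x)                   # -> (batch, 64, 7, 7)

        x = F.relu(self.conv2(x))
        x = F.relu(self.conv2(x))
        x = self.global_pool(x)                # -> (batch, 64, 1, 1)

        x = x.view(-1, 64)                     # flatten -> (batch, 64)

        x = F.relu(self.fc1(x))
        x = self.fc2(x)                        # -> (batch, 10)

        # Log-probabilities over the 10 classes (dim=1 is the class dimension)
        x = F.log_softmax(x, dim=1)

        return x


model = Net()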

The two most important methods in a PyTorch network class are `__init__()` and `forward()`. `__init__()` defines the layers your model will use, and `forward()` defines how an input flows through those layers to produce the output. You normally don’t call `forward()` directly: calling `model(x)` runs it for you.

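For our model, we’ve defined 2 convolutional layers in `__init__()`, one of which (`conv2`) we re-use several times; since every use refers to the same layer object, they all share the same weights. We also have a max-pooling layer, applied after each of the first two pairs of convolutions, and a global average pooling layer applied near the end. Finally, we have our fully connected (FC) layers and a log-softmax, which turns the final scores into log-probabilities for the 10 digit classes.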
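To check that re-using `conv2` adds no extra weights, we can count the learnable parameters of the layers `Net` registers. This sketch rebuilds the same four weighted layers in a hypothetical `nn.ModuleDict` (instead of importing `Net`) so that it runs on its own; the pooling layers have no parameters.

```python
import torch.nn as nn

# The four weighted layers from Net.__init__; conv2 exists once, however often forward() uses it
layers = nn.ModuleDict({
    "conv1": nn.Conv2d(1, 64, kernel_size=3, padding=1),   # 1*64*3*3 + 64 = 640
    "conv2": nn.Conv2d(64, 64, kernel_size=3, padding=1),  # 64*64*3*3 + 64 = 36928
    "fc1": nn.Linear(64, 64),                              # 64*64 + 64 = 4160
    "fc2": nn.Linear(64, 10),                              # 64*10 + 10 = 650
})

for name, layer in layers.items():
    print(name, sum(p.numel() for p in layer.parameters()))

total = sum(p.numel() for p in layers.parameters())
print("total", total)  # 42378
```

Since parameters are counted per layer object rather than per call, `sum(p.numel() for p in model.parameters())` on the notebook’s `model` should report the same total.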

In `forward()`, we define exactly how the layers stack up to form the full model: a standard stack of convolution, pooling, and FC layers. The beauty of PyTorch is that `forward()` runs as ordinary Python code, so you can print the shape or value of any intermediate tensor just by adding a `print` statement anywhere inside it!
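As an illustration of that debugging style, here is a hypothetical `ShapeTracingNet`: it has the same layers as `Net`, with `print` calls added to `forward()`. Feeding it a fake batch of two random 28×28 images shows how each stage shrinks the spatial size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShapeTracingNet(nn.Module):
    """Hypothetical copy of Net's layers, with print statements added in forward()."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.max_pool = nn.MaxPool2d(2, 2)
        self.global_pool = nn.AvgPool2d(7)
        self.fc1 = nn.Linear(64, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        print("input:            ", tuple(x.shape))   # (2, 1, 28, 28)
        x = F.relu(self.conv2(F.relu(self.conv1(x))))
        x = self.max_pool(x)
        print("after max_pool #1:", tuple(x.shape))   # (2, 64, 14, 14)
        x = F.relu(self.conv2(F.relu(self.conv2(x))))
        x = self.max_pool(x)
        print("after max_pool #2:", tuple(x.shape))   # (2, 64, 7, 7)
        x = F.relu(self.conv2(F.relu(self.conv2(x))))
        x = self.global_pool(x)
        print("after global_pool:", tuple(x.shape))   # (2, 64, 1, 1)
        x = x.view(-1, 64)
        x = self.fc2(F.relu(self.fc1(x)))
        print("class scores:     ", tuple(x.shape))   # (2, 10)
        return F.log_softmax(x, dim=1)


# Run a fake batch of two random 28x28 grayscale "images" through the network
out = ShapeTracingNet()(torch.rand(2, 1, 28, 28))
```

Padding of 1 with a 3×3 kernel keeps the size unchanged, which is why only the pooling layers change the height and width.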

# Training, Testing, and Saving

### Loading up data

Time to get our data ready for training! We’ll start by setting the training parameters (number of epochs and batch size) and making sure PyTorch is set up to use the GPU. The `torch.device()` line below selects the GPU if `torch.cuda.is_available()` reports that a CUDA-capable GPU can be used, and falls back to the CPU otherwise.

In [17]:
num_epochs = 10
batch_size = 32
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

We can retrieve the MNIST dataset straight from `torchvision`. We’ll download the data and load the train and test sets as separate datasets; `transforms.ToTensor()` converts each image into a float tensor of shape (1, 28, 28) with values scaled to [0, 1]. Once the data is loaded, we pass each dataset to a *DataLoader*, which groups the samples into batches of the given size and can optionally shuffle them.

In [9]:
# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='data',
                                           train=True, 
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='data',
                                          train=False,
                                          transform=transforms.ToTensor())

# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
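To see what a *DataLoader* produces without downloading anything, this sketch wraps hypothetical random data (100 fake "images" of shape (1, 28, 28) with labels 0 to 9) in a `TensorDataset` and batches it with the same `batch_size` of 32:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 100 random images with random labels 0-9
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

batch_sizes = [batch_images.shape[0] for batch_images, batch_labels in loader]
print(len(loader), batch_sizes)  # 4 [32, 32, 32, 4]

first_images, first_labels = next(iter(loader))
print(first_images.shape, first_labels.shape)  # torch.Size([32, 1, 28, 28]) torch.Size([32])
```

The last batch is simply smaller. With the 60,000 MNIST training images and a batch size of 32, the same arithmetic gives 60,000 / 32 = 1,875 batches per epoch, which is the `total_step` in the training log.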

### Training

It’s training time!

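The optimizer (we’ll use Adam) and the loss function are defined quite similarly to other deep learning libraries like TensorFlow, Keras, and MXNet. Because our network already ends in `log_softmax`, we pair it with `nn.NLLLoss` (negative log-likelihood); together they compute the cross-entropy loss. (`nn.CrossEntropyLoss` applies `log_softmax` itself and expects raw, unnormalized scores.)

In [10]:
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_function = nn.NLLLoss()  # expects log-probabilities, which Net outputs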
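`nn.CrossEntropyLoss` is equivalent to applying `log_softmax` followed by `nn.NLLLoss`. A quick numerical check on random scores (the batch size, labels, and seed are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores: batch of 4 samples, 10 classes
labels = torch.tensor([3, 0, 9, 1])   # arbitrary target classes

ce = nn.CrossEntropyLoss()(logits, labels)                 # takes raw scores
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)   # takes log-probabilities

print(ce.item(), nll.item())          # the two values match
print(torch.allclose(ce, nll))        # True
```

This is why a network whose `forward()` ends in `log_softmax` is normally paired with `nn.NLLLoss`, while a network that returns raw scores is paired with `nn.CrossEntropyLoss`.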

In PyTorch, the model and the data it processes must be on the same device, and moving them from the CPU to the GPU is explicit. We do this by calling the `.to()` method on our model below. Later, we’ll do the same for our image data.

In [None]:
model.to(device)
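`.to()` behaves slightly differently on modules and tensors: `Module.to()` modifies the module in place (and returns it), while `Tensor.to()` returns a tensor on the target device and leaves the original where it was. That is why the training loop reassigns `images = images.to(device)`. A sketch with a hypothetical small `nn.Linear` model and random inputs:

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 2)          # hypothetical small model
returned = model.to(device)      # moves the parameters in place...
print(returned is model)         # True: ...and returns the same module

images = torch.rand(8, 4)
moved = images.to(device)        # returns a tensor on `device`; `images` itself is unchanged
outputs = model(moved)           # model and input are now on the same device
print(outputs.device)
```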

Finally, we can write out our training loop. Check out the code below to see how it works!

1. A PyTorch training loop iterates over the epochs and, within each epoch, over every batch in the training data loader.
2. On each iteration, the image data and labels are transferred to the selected device (the GPU, if available).
3. Each iteration explicitly applies the forward pass, backward pass, and optimisation steps.
4. The model is applied to the images in the batch, and then the loss for that batch is calculated.
5. `optimizer.zero_grad()` clears the gradients left over from the previous batch, `loss.backward()` back-propagates the loss through the network to compute new gradients, and `optimizer.step()` updates the weights.

In [13]:
# Train the model
model.train()  # training mode (matters for dropout/batch norm; Net uses neither)
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Move tensors to the configured device
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = loss_function(outputs, labels)

        # Backward and optimize: clear old gradients, back-propagate, update weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch+1, num_epochs, i+1, total_step, loss.item()))



Epoch [1/10], Step [100/1875], Loss: 2.2637
Epoch [1/10], Step [200/1875], Loss: 1.4102
Epoch [1/10], Step [300/1875], Loss: 0.8213
Epoch [1/10], Step [400/1875], Loss: 0.4231
Epoch [1/10], Step [500/1875], Loss: 0.4085
Epoch [1/10], Step [600/1875], Loss: 0.4400
Epoch [1/10], Step [700/1875], Loss: 0.3668
Epoch [1/10], Step [800/1875], Loss: 0.5550
Epoch [1/10], Step [900/1875], Loss: 0.1462
Epoch [1/10], Step [1000/1875], Loss: 0.2231
Epoch [1/10], Step [1100/1875], Loss: 0.2215
Epoch [1/10], Step [1200/1875], Loss: 0.2195
Epoch [1/10], Step [1300/1875], Loss: 0.6095
Epoch [1/10], Step [1400/1875], Loss: 0.2616
Epoch [1/10], Step [1500/1875], Loss: 0.2854
Epoch [1/10], Step [1600/1875], Loss: 0.1278
Epoch [1/10], Step [1700/1875], Loss: 0.0089
Epoch [1/10], Step [1800/1875], Loss: 0.0954
Epoch [2/10], Step [100/1875], Loss: 0.1180
Epoch [2/10], Step [200/1875], Loss: 0.1079
Epoch [2/10], Step [300/1875], Loss: 0.2683
Epoch [2/10], Step [400/1875], Loss: 0.0686
Epoch [2/10], Step [500
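The `optimizer.zero_grad()` call in the loop matters because PyTorch *accumulates* gradients: each `backward()` call adds to whatever is already stored in a parameter’s `.grad`. A tiny sketch with a single hypothetical weight `w`:

```python
import torch

w = torch.tensor([2.0], requires_grad=True)  # a single hypothetical weight

(3 * w).sum().backward()   # d/dw (3w) = 3
print(w.grad)              # tensor([3.])

(3 * w).sum().backward()   # not cleared: the new gradient is added to the old one
print(w.grad)              # tensor([6.])

w.grad.zero_()             # clear it manually; optimizer.zero_grad() clears every parameter's gradient
(3 * w).sum().backward()
print(w.grad)              # tensor([3.])
```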

### Testing and Saving

Testing a network’s performance in PyTorch uses a loop similar to the one in the training phase. The main difference is that we don’t back-propagate gradients, so we wrap the loop in `torch.no_grad()` to skip gradient tracking. We still do the forward pass, and take the label with the highest output score (here, the highest log-probability) as the network’s prediction.

In this case, after 10 epochs our network got an accuracy of 99.16% on the test set!

In [14]:
# Test the model
model.eval()  # evaluation mode (matters for dropout/batch norm; Net uses neither)
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # index of the highest score in each row
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the MNIST test images: {} %'.format(100 * correct / total))



Accuracy of the network on the MNIST test images: 99.16 %
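To make the prediction and accuracy steps concrete, this sketch runs them on hypothetical outputs for 4 samples and 3 classes. `torch.max(outputs, 1)` returns both the maximum values and their indices along the class dimension; the indices are the predicted labels.

```python
import torch

# Hypothetical network outputs for 4 samples and 3 classes (higher = more likely)
outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [3.0, 0.5, 0.2],
                        [0.0, 0.1, 0.9],
                        [1.0, 0.2, 0.3]])
labels = torch.tensor([1, 0, 1, 0])

values, predicted = torch.max(outputs, 1)  # max score and its index in each row
print(predicted)                           # tensor([1, 0, 2, 0])

correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)
print(accuracy)                            # 75.0
```

`outputs.argmax(dim=1)` gives the same indices when you don’t need the values.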


To save the trained model to disk for later use, pass its `state_dict()` (a dictionary mapping each parameter and buffer name to its tensor) to `torch.save()`, and voilà!

In [15]:
torch.save(model.state_dict(), 'model.ckpt')
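The saved file holds only the weights, so to use them later you rebuild the same architecture and load the state dict into it. A minimal sketch, assuming a small hypothetical `nn.Linear` in place of `Net` and a temporary directory in place of the notebook’s `model.ckpt` path:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # hypothetical stand-in; in the notebook this would be Net()
path = os.path.join(tempfile.mkdtemp(), "model.ckpt")
torch.save(model.state_dict(), path)

# Later: rebuild the same architecture, then load the saved weights into it
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()           # evaluation mode before making predictions

same = all(torch.equal(a, b) for a, b in zip(model.state_dict().values(),
                                                 restored.state_dict().values()))
print(same)               # True
```

`map_location="cpu"` lets a checkpoint saved on a GPU machine be loaded on a CPU-only one.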