# Day III

In today's lesson, we will go over how to train and test deep learning models.



First, install PyTorch and torchvision.

In [None]:
!pip3 install torch torchvision


## Initial Example

Next, we import the PyTorch library along with the submodules we will use.

In [None]:
# Import libraries
import torch
import torchvision
from torchvision import transforms, datasets
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

Now we are going to use a well-known dataset called MNIST, a collection of 28x28 grayscale images of handwritten digits. The `datasets.MNIST` class downloads and exposes the dataset. We call it twice: once to load the training set and once to load the test set.

In [None]:
# Create test and training sets
train = datasets.MNIST('', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))

test = datasets.MNIST('', train=False, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))

Once downloaded, the dataset needs to be placed into the PyTorch data pipeline. This is a fancy way of saying that the dataset needs to be configured properly so that it can be loaded while we are training. In this case, we wrap the training and test sets in data loaders that serve the data in batches of 10.

In [None]:
# Shuffle the training data so the network never sees the samples in a fixed, patterned order.
# The data is also served in batches: for each batch, the network runs one
# backpropagation pass and updates its weights to try to decrease the loss.
# Neural networks are prone to overfitting, and training on shuffled batches is a good habit.
trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=False)
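
To see what a data loader yields, here is a minimal sketch. It uses random tensors in place of MNIST so that it runs without a download; the names `fake_images`, `fake_labels`, and `fake_train` are made up for illustration. Each iteration returns one batch of images together with its labels.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with MNIST-like shapes (assumption: 100 samples is enough to illustrate).
fake_images = torch.rand(100, 1, 28, 28)      # 100 images, 1 channel, 28x28 pixels
fake_labels = torch.randint(0, 10, (100,))    # 100 digit labels in 0..9
fake_train = TensorDataset(fake_images, fake_labels)

loader = DataLoader(fake_train, batch_size=10, shuffle=True)

X, y = next(iter(loader))   # fetch the first batch
print(X.shape)              # shape of one batch of images
print(y.shape)              # shape of one batch of labels
print(len(loader))          # number of batches per epoch
```

With 100 samples and a batch size of 10, one epoch consists of 10 batches.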

Now, our next goal is to actually create the neural network model. It is standard practice in PyTorch to create the neural network as a Python class. Within the class, the `__init__` function initializes the layers we need.

Notice that within the `__init__` function, we are currently using only one type of layer, known as linear. As we learned previously in the slides, linear layers are also known as fully connected layers; the two terms are synonymous.

The `forward` function determines the order in which the layers initialized in `__init__` are applied. The neural network is constructed like this: input(28x28 = 784) -> fc1 -> output(64) -> fc2 -> output(64) -> fc3 -> output(64) -> fc4 -> output(10) -> log-softmax layer -> final output(10)

In [None]:
# Initialize our neural net
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return F.log_softmax(x, dim=1)
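
Note that `forward` above applies no activation function between the layers, so the four linear layers together are equivalent to a single linear layer with a bias. As a hedged sketch (this variant is not part of the original lesson, and the class name `NetWithReLU` is hypothetical), adding ReLU activations between the layers could look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetWithReLU(nn.Module):  # hypothetical variant, for illustration only
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))   # nonlinearity after each hidden layer
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)           # no ReLU on the output layer
        return F.log_softmax(x, dim=1)

model = NetWithReLU()
batch = torch.rand(10, 28 * 28)        # random stand-in for 10 flattened images
log_probs = model(batch)
print(log_probs.shape)                 # one row of 10 log-probabilities per image
print(log_probs.exp().sum(dim=1))      # each row of probabilities sums to about 1
```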

Now our goal is to initialize the neural network and create our loss function and optimizer. For a classification problem, it is standard to begin with a cross-entropy loss function and the Adam optimizer.

`lr` is short for learning rate. This is a value that we choose and can tweak as needed. A typical starting value is 0.001 (the default for `optim.Adam`) or 0.0001.

In [None]:

# Create our neural network, loss function, and optimizer
net = Net() 
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

print(net)
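
One detail worth knowing: `nn.CrossEntropyLoss` applies log-softmax internally before computing the negative log-likelihood. Because `Net.forward` already ends with `F.log_softmax`, the log-softmax is applied twice here. This does not change the loss, since log-softmax leaves values that are already log-probabilities unchanged. The sketch below checks this on random values; `logits` and `targets` are made-up stand-ins for network outputs and labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(10, 10)              # made-up raw scores: 10 samples, 10 classes
targets = torch.randint(0, 10, (10,))     # made-up labels

log_probs = F.log_softmax(logits, dim=1)

ce_on_logits = nn.CrossEntropyLoss()(logits, targets)        # the usual way
ce_on_log_probs = nn.CrossEntropyLoss()(log_probs, targets)  # what this lesson's network does
nll_on_log_probs = nn.NLLLoss()(log_probs, targets)          # alternative for log-softmax outputs

print(ce_on_logits.item(), ce_on_log_probs.item(), nll_on_log_probs.item())
```

All three values agree up to floating-point error, so the lesson's combination works; pairing a log-softmax output with `nn.NLLLoss` is simply the non-redundant way to write the same thing.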

NOW IT IS TIME TO BEGIN TRAINING!

The first for loop sets how many epochs, or full passes over the training data, we perform. The second loop iterates over the batches of one epoch. For each batch, we do the following:

1. Collect the inputs and labels from the batch.
2. Set the gradients to 0 before calculating the loss. PyTorch accumulates gradients across backward passes, so this step keeps gradients from earlier batches out of the current update.
3. Pass the input through the neural network.
4. Calculate the loss.
5. Send the loss backwards through the network to compute the gradient for each parameter.
6. Step the optimizer to update the weights.
7. Repeat with the next batch.



In [None]:


# Begin training.
for epoch in range(5): # we use 5 epochs
    for data in trainset:  # `data` is a batch of data
        X, y = data  # X is the batch of features, y is the batch of targets.

        net.zero_grad()  # clear gradients left over from the previous batch

        output = net(X.view(-1, 784))  # flatten each 28x28 image to 784 values; -1 lets PyTorch infer the batch size

        loss = criterion(output, y)  # calc and grab the loss value

        loss.backward()  # backpropagate: compute the gradient of the loss for each parameter
        optimizer.step()  # update the weights using those gradients
    print(loss)  # loss of the last batch in this epoch

### Output:
### tensor(0.6039, grad_fn=)
### tensor(0.1082, grad_fn=)
### tensor(0.0194, grad_fn=)
### tensor(0.4282, grad_fn=)
### tensor(0.0063, grad_fn=)
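
The values above are the loss of the last batch in each epoch only, which is why they jump around (for example, from 0.0194 up to 0.4282). Averaging the loss over all batches of an epoch gives a steadier picture. The following is a minimal sketch on synthetic data; the tiny `model` and the random dataset are assumptions made so the example runs on its own, not the lesson's network or MNIST.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for MNIST: 200 flattened "images" with random labels.
data = TensorDataset(torch.rand(200, 784), torch.randint(0, 10, (200,)))
loader = DataLoader(data, batch_size=10, shuffle=True)

model = nn.Linear(784, 10)   # hypothetical tiny model, for illustration only
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

epoch_losses = []
for epoch in range(3):
    running_loss = 0.0
    for X, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()   # .item() extracts a plain Python float
    epoch_losses.append(running_loss / len(loader))   # mean loss over all batches
    print(f"epoch {epoch + 1}: average loss {epoch_losses[-1]:.4f}")
```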


Calculate the accuracy. The process is similar to training, but steps 2 and 4-6 are not needed because no weights are updated. Instead, we send the input through the neural network and compare its predictions with the labels.

In [None]:
# Get the Accuracy
correct = 0
total = 0

with torch.no_grad():
    for data in testset:
        X, y = data
        output = net(X.view(-1,784))
        #print(output)
        for idx, i in enumerate(output):
            #print(torch.argmax(i), y[idx])
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1

print("Accuracy: ", round(correct/total, 3))

### Output: 
### Accuracy:  0.915
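
The per-sample loop in the accuracy cell can also be written as one tensor comparison per batch, which is shorter and usually faster. Here is a small sketch on made-up values (`output` and `y` below are hand-written stand-ins, not real network output):

```python
import torch

# Made-up log-probabilities for 4 samples and 3 classes, plus their true labels.
output = torch.log(torch.tensor([[0.7, 0.2, 0.1],
                                 [0.1, 0.8, 0.1],
                                 [0.3, 0.3, 0.4],
                                 [0.6, 0.3, 0.1]]))
y = torch.tensor([0, 1, 2, 1])

predicted = output.argmax(dim=1)          # predicted class for every row at once
correct = (predicted == y).sum().item()   # count of matching predictions
total = y.size(0)

print(predicted)                                  # tensor([0, 1, 2, 0])
print("Accuracy: ", round(correct / total, 3))    # Accuracy:  0.75
```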

## Convolutional Network Example

First, we need to load the necessary PyTorch libraries.

In [None]:
import torch
import torchvision
import torchvision.transforms as transforms

Now we will begin loading the dataset and necessary variables for the dataset.

In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 4

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
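
`transforms.Normalize(mean, std)` computes `(x - mean) / std` for each channel. With a mean and standard deviation of 0.5 for all three color channels, pixel values in [0, 1] (as produced by `ToTensor`) are mapped to [-1, 1]. The sketch below reproduces that arithmetic with plain tensor operations on a made-up 3x2x2 image, and then undoes it the same way the `imshow` helper in this lesson does.

```python
import torch

# One mean and one std per color channel, shaped to broadcast over height and width.
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

# Made-up image: channel 0 is all 0.0, channel 1 is all 0.5, channel 2 is all 1.0.
img = torch.stack([torch.zeros(2, 2), torch.full((2, 2), 0.5), torch.ones(2, 2)])

normalized = (img - mean) / std       # same arithmetic as Normalize
print(normalized.min().item(), normalized.max().item())   # -1.0 1.0

restored = normalized / 2 + 0.5       # unnormalize
print(torch.allclose(restored, img))  # True
```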

Let's review some of the training images.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# functions to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)  # built-in next(); the iterator's .next() method is not available in recent PyTorch versions

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(batch_size)))

Now we need to define a convolutional neural network.

In [None]:
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
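
The input size `16 * 5 * 5` of `fc1` follows from the image size. A CIFAR-10 image is 3x32x32; each 5x5 convolution without padding shrinks the height and width by 4, and each 2x2 max-pool halves them, so the side length goes 32 -> 28 -> 14 -> 10 -> 5. The sketch below traces these shapes on a random input (the `shapes` list is only a helper for this illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.rand(4, 3, 32, 32)   # random stand-in for a batch of 4 CIFAR-10 images
shapes = [tuple(x.shape)]

x = F.relu(conv1(x))
shapes.append(tuple(x.shape))  # after conv1
x = pool(x)
shapes.append(tuple(x.shape))  # after first pool
x = F.relu(conv2(x))
shapes.append(tuple(x.shape))  # after conv2
x = pool(x)
shapes.append(tuple(x.shape))  # after second pool
x = torch.flatten(x, 1)
shapes.append(tuple(x.shape))  # flattened: 16 * 5 * 5 = 400 features per image

for s in shapes:
    print(s)
```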

Now it is time to send the network to the GPU, if one is available, so that it trains faster.

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assuming that we are on a CUDA machine, this should print a CUDA device:

print(device)

net.to(device)

Now we will need to define the loss function criterion and the optimizer.

In [None]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
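
To make the `momentum=0.9` argument concrete: with momentum, SGD keeps a running "velocity" for each parameter, `v = momentum * v + grad`, and then updates `w = w - lr * v`. The sketch below compares a hand-written copy of that rule with `optim.SGD` on a toy parameter. It assumes the optimizer's default settings (no dampening, no Nesterov momentum, no weight decay); `w_manual` and `velocity` are names made up for this illustration.

```python
import torch
import torch.optim as optim

lr, momentum = 0.001, 0.9

w = torch.tensor([1.0, -2.0], requires_grad=True)   # toy parameter
optimizer = optim.SGD([w], lr=lr, momentum=momentum)

# Hand-written copy of the same update, for comparison.
w_manual = w.detach().clone()
velocity = torch.zeros_like(w_manual)

for _ in range(3):
    optimizer.zero_grad()
    loss = (w ** 2).sum()            # toy loss with gradient 2 * w
    loss.backward()
    grad = w.grad.detach().clone()   # gradient at the current weights
    optimizer.step()

    velocity = momentum * velocity + grad
    w_manual = w_manual - lr * velocity

print(w.detach())
print(w_manual)
```

Both tensors should match up to floating-point error.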


Let's begin to train the network. This is when things start to get interesting. We simply have to loop over our data iterator, and feed the inputs to the network and optimize.



In [None]:
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        # move both to the same device as the network (the GPU if available)
        inputs, labels = data[0].to(device), data[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

Before we continue, it would be great to save our model. 

In [None]:
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
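
A `state_dict` is a dictionary that maps each parameter name to its tensor, so saving it stores the learned weights but not the class definition. As a hedged sketch, here is a full save-and-restore round trip on a small stand-in model; the `nn.Linear(4, 2)` model and the file name `tiny_net.pth` are made up for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # hypothetical small model, for illustration only

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tiny_net.pth")   # made-up file name
    torch.save(model.state_dict(), path)

    restored = nn.Linear(4, 2)   # same architecture, fresh random weights
    # map_location="cpu" lets a file saved from a GPU model load on a CPU-only machine
    restored.load_state_dict(torch.load(path, map_location="cpu"))

print(list(model.state_dict().keys()))   # ['weight', 'bias']

same = all(torch.equal(a, b)
           for a, b in zip(model.state_dict().values(),
                           restored.state_dict().values()))
print(same)   # True
```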

Let's test the network on our test data. We have trained the network for 2 passes over the training dataset. But we need to check if the network has learnt anything at all.

We will check this by predicting the class label that the neural network outputs, and checking it against the ground-truth. If the prediction is correct, we add the sample to the list of correct predictions.

Okay, first step. Let us display an image from the test set to get familiar.

In [None]:
dataiter = iter(testloader)
images, labels = next(dataiter)  # built-in next(); the iterator's .next() method is not available in recent PyTorch versions

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

Next, let’s load back in our saved model (note: saving and re-loading the model wasn’t necessary here, we only did it to illustrate how to do so):



In [None]:
net = Net()
net.load_state_dict(torch.load(PATH))

Okay, now let us see what the neural network thinks these examples above are:



In [None]:
outputs = net(images)


The outputs are energies for the 10 classes. The higher the energy for a class, the more the network thinks that the image is of the particular class. So, let’s get the index of the highest energy:



In [None]:
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))
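
`torch.max(outputs, 1)` returns two tensors: the highest value in each row and the index where it occurs. If probabilities are preferred over raw energies, a softmax over the class dimension converts them without changing which class ranks highest. A small sketch on made-up scores (the numbers below are not real network output):

```python
import torch
import torch.nn.functional as F

# Made-up energies for 2 images and 3 classes.
outputs = torch.tensor([[2.0, 0.5, -1.0],
                        [0.1, 0.2, 3.0]])

values, predicted = torch.max(outputs, 1)   # highest energy and its index, per row
probs = F.softmax(outputs, dim=1)           # energies -> probabilities

print(predicted)              # tensor([0, 2])
print(probs.sum(dim=1))       # each row sums to about 1
print(probs.argmax(dim=1))    # tensor([0, 2]), same ranking as the raw energies
```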

The results seem pretty good.

Let us look at how the network performs on the whole dataset.

In [None]:
correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

That looks way better than chance, which is 10% accuracy (randomly picking a class out of 10 classes). Seems like the network learnt something.

Hmmm, which classes performed well, and which did not?

In [None]:
# prepare to count predictions for each class
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

# again no gradients needed
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1


# print accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print("Accuracy for class {:5s} is: {:.1f} %".format(classname,
                                                   accuracy))
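
The per-class counts can also be collected without a Python loop over individual samples. The sketch below uses `torch.bincount` on made-up labels and predictions; the shortened class list and the two tensors are assumptions for illustration only.

```python
import torch

classes = ('cat', 'dog', 'frog')                  # shortened, made-up class list
labels = torch.tensor([0, 0, 1, 1, 1, 2])         # made-up ground truth
predictions = torch.tensor([0, 1, 1, 1, 0, 2])    # made-up predictions

# Count how often each class occurs, and how often it was predicted correctly.
total_per_class = torch.bincount(labels, minlength=len(classes))
correct_per_class = torch.bincount(labels[labels == predictions],
                                   minlength=len(classes))

for name, n_correct, n_total in zip(classes, correct_per_class, total_per_class):
    accuracy = 100 * n_correct.item() / n_total.item()
    print("Accuracy for class {:5s} is: {:.1f} %".format(name, accuracy))
```

Note that a class with no samples in the data would give a total of zero, so a real implementation would need to guard that division.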