## **PyTorch on the MNIST Handwritten Digit Recognition Dataset**

MNIST contains 70,000 images of handwritten digits: 60,000 for training and 10,000 for testing. The images are grayscale, 28x28 pixels, and centered, which reduces preprocessing and lets us get started quickly.

In [None]:
import torch
import torchvision
from typing import Optional

Next, we'll define the hyperparameters for the experiment. `n_epochs` sets how many times we loop over the complete training dataset, while `learning_rate` and `momentum` are hyperparameters for the optimizer we'll use later on. `log_interval` controls how often (in batches) the training loss is printed and recorded.

In [None]:
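n_epochs = 3              # passes over the full training set
batch_size_train = 64
batch_size_test = 1000
learning_rate = 0.01      # SGD step size
momentum = 0.5            # SGD momentum factor
log_interval = 10         # print/record the training loss every 10 batches

random_seed = 1
torch.backends.cudnn.enabled = False  # avoid nondeterministic cuDNN kernels
torch.manual_seed(random_seed)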
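To make the roles of `learning_rate` and `momentum` concrete, here is a minimal sketch of the update rule `torch.optim.SGD` applies when `momentum` is set (with its defaults `dampening=0` and `nesterov=False`): a velocity buffer accumulates gradients, and the parameter moves by the learning rate times that buffer. The quadratic toy loss and the names `loss_fn`, `step_size`, `beta`, `w_manual`, and `velocity` are illustrative assumptions, not part of the notebook.

```python
import torch

# Illustrative values matching the notebook's learning_rate and momentum
step_size, beta = 0.01, 0.5

def loss_fn(w: torch.Tensor) -> torch.Tensor:
    # Hypothetical toy loss with its minimum at w = 3
    return (w - 3.0) ** 2

# Reference: PyTorch's SGD with momentum (default dampening=0, nesterov=False)
w_ref = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.SGD([w_ref], lr=step_size, momentum=beta)

# Manual version of the same rule:
#   velocity <- beta * velocity + grad
#   w        <- w - step_size * velocity
w_manual = torch.tensor(0.0)
velocity = torch.tensor(0.0)

for _ in range(5):
    opt.zero_grad()
    loss_fn(w_ref).backward()
    opt.step()

    grad = 2.0 * (w_manual - 3.0)  # analytic gradient of (w - 3)^2
    velocity = beta * velocity + grad
    w_manual = w_manual - step_size * velocity

print(w_ref.item(), w_manual.item())  # the two trajectories agree
```

With `momentum=0` the velocity is just the current gradient, and the rule reduces to plain SGD.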

Basic variables created by Zayd Hammoudeh for use in the tool

In [None]:
data_dir = "~/.data/"
num_classes = 10

We'll also need DataLoaders for the dataset. This is where TorchVision comes in: it lets us load MNIST conveniently. We'll use a batch size of 64 for training and 1000 for testing. The values 0.1307 and 0.3081 used for the `Normalize()` transformation below are the global mean and standard deviation of the MNIST dataset; we'll take them as given here.

In [None]:
# ToTensor() scales pixels to [0, 1]; Normalize() then standardizes them
# using the MNIST mean and standard deviation
mnist_transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.1307,), (0.3081,)),
])

train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST(data_dir, train=True, download=True,
                               transform=mnist_transform),
    batch_size=batch_size_train, shuffle=True)

test_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST(data_dir, train=False, download=True,
                               transform=mnist_transform),
    batch_size=batch_size_test, shuffle=True)

Now let's take a look at some examples. We'll use the test_loader for this.

In [None]:
examples = enumerate(test_loader)
batch_idx, (example_data, example_targets) = next(examples)

In [None]:
example_data.shape

So one test data batch is a tensor of shape `torch.Size([1000, 1, 28, 28])`. This means we have 1000 examples of 28x28 pixels in grayscale (i.e., no RGB channels, hence the one). We can plot some of them using matplotlib.

In [None]:
import matplotlib.pyplot as plt

fig = plt.figure()
for i in range(6):
    plt.subplot(2,3,i+1)
    plt.tight_layout()
    plt.imshow(example_data[i][0], cmap='gray', interpolation='none')
    plt.title("Ground Truth: {}".format(example_targets[i]))
    plt.xticks([])
    plt.yticks([])
fig
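The batch shape comes from how the DataLoader stacks samples: each MNIST sample is a `1x28x28` tensor after `ToTensor()`, and the loader stacks `batch_size` of them along a new first dimension. This sketch reproduces that with a synthetic `TensorDataset` of zeros, an assumption standing in for MNIST so nothing needs to be downloaded.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the MNIST test set: 10,000 blank 1x28x28 images
fake_images = torch.zeros(10_000, 1, 28, 28)
fake_labels = torch.zeros(10_000, dtype=torch.long)
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=1000)

data, targets = next(iter(loader))
print(data.shape)     # torch.Size([1000, 1, 28, 28])
print(targets.shape)  # torch.Size([1000])
print(len(loader))    # 10 batches of 1000
```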

In [None]:
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

Here's an example model: two 2-D convolutional layers followed by two fully connected (linear) layers. For the activation function we'll use rectified linear units (ReLUs), and for regularization we'll use two dropout layers. In PyTorch, a clean way to build a network is to create a new class for it.

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)   # 1x28x28 -> 10x24x24
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)  # 10x12x12 -> 20x8x8
        self.conv2_drop = nn.Dropout2d()               # zeroes entire channels
        self.fc1 = nn.Linear(320, 50)                  # 320 = 20 * 4 * 4
        self.fc2 = nn.Linear(50, num_classes)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))                   # -> 10x12x12
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))  # -> 20x4x4
        x = x.view(-1, 320)                                          # flatten
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)  # log-probabilities, as F.nll_loss expects
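To see where the 320 in `nn.Linear(320, 50)` comes from, the sketch below pushes a dummy batch through layers with the same sizes as `conv1` and `conv2` in the `Net` class above. A 5x5 convolution without padding shrinks each side by 4, and 2x2 max pooling halves it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Layers with the same sizes as conv1 and conv2 in Net
conv1 = nn.Conv2d(1, 10, kernel_size=5)
conv2 = nn.Conv2d(10, 20, kernel_size=5)

x = torch.zeros(64, 1, 28, 28)  # dummy batch of 64 single-channel 28x28 images
with torch.no_grad():
    x = conv1(x)                # 28 -> 24, shape (64, 10, 24, 24)
    x = F.max_pool2d(x, 2)      # 24 -> 12, shape (64, 10, 12, 12)
    x = conv2(x)                # 12 -> 8,  shape (64, 20, 8, 8)
    x = F.max_pool2d(x, 2)      # 8 -> 4,   shape (64, 20, 4, 4)

flat = x.view(x.size(0), -1)    # flatten everything except the batch dimension
print(flat.shape)               # torch.Size([64, 320]) since 20 * 4 * 4 = 320
```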

Next, we initialize the network and the optimizer.

In [None]:
network = Net()
optimizer = optim.SGD(network.parameters(), lr=learning_rate, momentum=momentum)

Time to build our training loop. First, we make sure the network is in training mode. Then we iterate over all of the training data once per epoch, with the DataLoader handling batch loading. For each batch, we manually zero the gradients with `optimizer.zero_grad()`, since PyTorch accumulates gradients by default. We then produce the network's output (the forward pass) and compute the negative log-likelihood loss between the output and the ground-truth labels. The `backward()` call computes a fresh set of gradients for each of the network's parameters, and `optimizer.step()` uses those gradients to update the parameters. For more detail on the inner workings of PyTorch's automatic differentiation system, see the official autograd docs.

In [None]:
# Containers that record how training progresses, used for the plots later on
train_losses = []
train_counter = []
test_losses = []
test_counter = [i*len(train_loader.dataset) for i in range(n_epochs + 1)]

In [None]:
def train(epoch):
    network.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = network(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}\n'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
            train_losses.append(loss.item())
            train_counter.append((batch_idx*batch_size_train) + ((epoch-1)*len(train_loader.dataset)))
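The claim that PyTorch accumulates gradients by default is easy to check on a single scalar parameter. The parameter `w` and the loss `2 * w` (whose gradient is always 2) are illustrative assumptions chosen so the numbers are easy to follow.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

(2 * w).backward()
first = w.grad.item()        # 2.0

(2 * w).backward()           # no zero_grad(): the new gradient is added
accumulated = w.grad.item()  # 4.0

opt.zero_grad()              # clear the stored gradient
(2 * w).backward()
after_reset = w.grad.item()  # 2.0 again

opt.step()                   # w <- w - lr * grad = 1.0 - 0.1 * 2.0
print(first, accumulated, after_reset, w.item())
```

Skipping `zero_grad()` in the training loop would therefore make each step use the sum of all previous batches' gradients.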

In [None]:
def test():
    network.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = network(data)
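            # Sum (not average) each batch's losses so we can divide by the dataset size below
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)  # index of the highest log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()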
    test_loss /= len(test_loader.dataset)
    test_losses.append(test_loss)
    print('\nTest set: Avg. loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

Time to run the training! We'll manually add a test() call before we loop over n_epochs to evaluate our model with randomly initialized parameters.

In [None]:
test()
for epoch in range(1, n_epochs + 1):
    train(epoch)
    test()
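The first `test()` call gives a useful baseline. As a rough rule of thumb (an approximation, since randomly initialized weights do not produce exactly uniform outputs), a classifier that spreads probability evenly over the 10 classes has a negative log-likelihood of ln(10), about 2.30, and an accuracy of about 10%. The sketch below computes that reference value with uniform predictions on random labels.

```python
import math
import torch
import torch.nn.functional as F

n_classes, n_examples = 10, 1000

# Identical logits -> uniform probabilities of 1/10 for every class
logits = torch.zeros(n_examples, n_classes)
log_probs = F.log_softmax(logits, dim=1)
targets = torch.randint(0, n_classes, (n_examples,))

loss = F.nll_loss(log_probs, targets)
print(loss.item(), math.log(n_classes))  # both about 2.3026
```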

In [None]:
fig = plt.figure()
plt.plot(train_counter, train_losses, color='blue')
plt.scatter(test_counter, test_losses, color='red')
plt.legend(['Train Loss', 'Test Loss'], loc='upper right')
plt.xlabel('number of training examples seen')
plt.ylabel('negative log likelihood loss')
fig

Class `BaseFF` is used as the base for models 3.1-3.4.

In [None]:
from typing import Type

class BaseFF(nn.Module):
    def __init__(self, hidden_dim: int, act: Type[nn.Module],
                 p_dropout: Optional[float] = None):
        super().__init__()
        # Flattened 28x28 image -> hidden layer -> activation
        self._module = nn.Sequential(nn.Linear(28*28, hidden_dim),
                                     act())
        if p_dropout is not None:
            self._module.add_module("Dropout", nn.Dropout(p_dropout))
        self._module.add_module("Out_Lin", nn.Linear(hidden_dim, num_classes))
        # LogSoftmax (not Softmax), because train() and test() use F.nll_loss,
        # which expects log-probabilities
        self._module.add_module("LogSoftmax", nn.LogSoftmax(dim=1))

    def forward(self, x):
        y = x.view(-1, 28 * 28)  # flatten each image into a 784-vector
        return self._module(y)
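`F.nll_loss` picks out `-input[i, target[i]]` and averages it, so it only computes the negative log-likelihood when its input is already log-probabilities. The sketch below, using hypothetical random logits, checks that `LogSoftmax` followed by `nll_loss` matches `F.cross_entropy`, while feeding plain `Softmax` probabilities yields `-p(target)`, a different quantity that always lies between -1 and 0.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)            # hypothetical outputs of the last Linear layer
targets = torch.randint(0, 10, (8,))

with_log_softmax = F.nll_loss(nn.LogSoftmax(dim=1)(logits), targets)
cross_entropy = F.cross_entropy(logits, targets)  # log_softmax + nll_loss in one call

# Plain probabilities give the mean of -p(target), which is not the NLL
with_softmax = F.nll_loss(nn.Softmax(dim=1)(logits), targets)

print(with_log_softmax.item(), cross_entropy.item(), with_softmax.item())
```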

`train_test_report` is used to standardize training, testing, and reporting of the results for each of the seven networks.

In [None]:
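def train_test_report():
    # train() and test() read the module-level names, so rebind the optimizer
    # globally; a plain assignment would only create a local variable
    global optimizer
    optimizer = optim.SGD(network.parameters(), lr=learning_rate, momentum=momentum)
    # Reset the loss history so each network gets its own plot
    train_losses.clear()
    train_counter.clear()
    test_losses.clear()
    test()
    for epoch in range(1, n_epochs + 1):
        train(epoch)
        test()
    fig = plt.figure()
    plt.plot(train_counter, train_losses, color='blue')
    plt.scatter(test_counter, test_losses, color='red')
    plt.legend(['Train Loss', 'Test Loss'], loc='upper right')
    plt.xlabel('number of training examples seen')
    plt.ylabel('negative log likelihood loss')
    plt.show()  # a bare `fig` inside a function does not display anything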
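`train()` reads the module-level `optimizer` name. A helper that builds a fresh optimizer for each new network therefore has to rebind that name with `global`; a plain assignment inside the function only creates a local variable, and `train()` keeps stepping the previous optimizer (and the previous network's parameters). The sketch below shows this Python scoping rule in isolation, with hypothetical names (`active_opt`, `read_active_opt`) so it does not overwrite the notebook's own variables.

```python
# Hypothetical names chosen so this cell does not overwrite the notebook's own
active_opt = "old optimizer"

def read_active_opt():
    # Like train(), this only reads the module-level name
    return active_opt

def rebuild_without_global():
    active_opt = "new optimizer"  # creates a local variable; the global is untouched
    return read_active_opt()

def rebuild_with_global():
    global active_opt
    active_opt = "new optimizer"  # rebinds the module-level name
    return read_active_opt()

before = rebuild_without_global()
after = rebuild_with_global()
print(before)  # old optimizer
print(after)   # new optimizer
```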

Class `Prob3p1` is the network used for part 3.1 of the homework.

In [None]:
class Prob3p1(BaseFF):
    def __init__(self):
        super().__init__(16, nn.Sigmoid)

network = Prob3p1()
train_test_report()

Class `Prob3p2` is the network used for part 3.2 of the homework.

In [None]:
class Prob3p2(BaseFF):
    def __init__(self):
        super().__init__(128, nn.Sigmoid)
        
        
network = Prob3p2()
train_test_report()

Class `Prob3p3` is the network used for part 3.3 of the homework.

In [None]:
class Prob3p3(BaseFF):
    def __init__(self):
        super().__init__(128, nn.ReLU)


network = Prob3p3()
train_test_report()

Class `Prob3p4` is the network used for part 3.4 of the homework.

In [None]:
class Prob3p4(BaseFF):
    def __init__(self):
        super().__init__(128, nn.ReLU, 0.5)


network = Prob3p4()
train_test_report()
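A quick way to compare the four feed-forward models is to count their trainable parameters. Activation functions and dropout have no parameters, so only the hidden width matters: 3.1 (16 hidden units) versus 3.2-3.4 (128 hidden units). The helpers `count_params` and `ff` are illustrative; `ff` rebuilds only the layer sizes of `BaseFF`, with 10 output classes hard-coded.

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    # Total number of trainable scalars (weights and biases)
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def ff(hidden_dim: int) -> nn.Module:
    # Hypothetical stand-in with BaseFF's layer sizes; the activation choice and
    # dropout do not change the count because they have no parameters
    return nn.Sequential(nn.Linear(28 * 28, hidden_dim), nn.ReLU(),
                         nn.Dropout(0.5), nn.Linear(hidden_dim, 10))

print(count_params(ff(16)))   # 784*16 + 16 + 16*10 + 10 = 12730
print(count_params(ff(128)))  # 784*128 + 128 + 128*10 + 10 = 101770
```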

Class `BaseConv` is used as the base for models 3.5-3.7.

In [None]:
class BaseConv(nn.Module):
    def __init__(self, inc_pool: bool = False, inc_sec_conv: bool = False):
        super().__init__()
        # First convolution: 1x28x28 -> 10x24x24
        self._conv = nn.Sequential(nn.Conv2d(1, 10, kernel_size=5),
                                   nn.ReLU())

        if inc_sec_conv:
            # Pool (10x24x24 -> 10x12x12), then a second convolution (-> 20x8x8)
            self._conv.add_module("Pool1", nn.MaxPool2d(kernel_size=2))
            self._conv.add_module("Conv2", nn.Conv2d(10, 20, kernel_size=5))
            self._conv.add_module("Conv2ReLU", nn.ReLU())
        if inc_pool:
            self._conv.add_module("Dropout", nn.Dropout2d())
            self._conv.add_module("Pool2", nn.MaxPool2d(kernel_size=2))

        # Infer the flattened feature size (all channels) with a dummy forward
        # pass instead of hard-coding it for each combination of options
        with torch.no_grad():
            self._conv_out_dim = self._conv(torch.zeros(1, 1, 28, 28)).numel()

        # All three networks have the same output structure of 128 ReLUs with
        # 50% dropout
        hidden_dim, p_dropout = 128, 0.5
        self._module = nn.Sequential(nn.Linear(self._conv_out_dim, hidden_dim),
                                     nn.ReLU(),
                                     nn.Dropout(p_dropout),
                                     nn.Linear(hidden_dim, num_classes),
                                     nn.LogSoftmax(dim=1))  # F.nll_loss expects log-probabilities

    def forward(self, x):
        y = self._conv(x)
        y = y.view(-1, self._conv_out_dim)
        return self._module(y)
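Because the fully connected head needs the exact flattened size, it helps to work it out by hand for each layer stack `BaseConv` can build. A "valid" 5x5 convolution shrinks each side by 4, and 2x2 max pooling (stride 2, PyTorch's default) halves it; the flattened size then multiplies the side squared by the number of output channels. The helper names `conv_out` and `pool_out` are illustrative, and one case is cross-checked against real PyTorch layers.

```python
import torch
import torch.nn as nn

def conv_out(n: int, k: int = 5) -> int:
    # "Valid" convolution (stride 1, no padding): each side shrinks by k - 1
    return n - k + 1

def pool_out(n: int, k: int = 2) -> int:
    # Max pooling with stride equal to the kernel size (PyTorch's default)
    return n // k

side = 28
conv_only = 10 * conv_out(side) ** 2                                     # 10 * 24 * 24
conv_pool = 10 * pool_out(conv_out(side)) ** 2                           # 10 * 12 * 12
two_convs = 20 * conv_out(pool_out(conv_out(side))) ** 2                 # 20 * 8 * 8
two_convs_pool = 20 * pool_out(conv_out(pool_out(conv_out(side)))) ** 2  # 20 * 4 * 4

# Cross-check one case against actual PyTorch layers
stack = nn.Sequential(nn.Conv2d(1, 10, kernel_size=5), nn.ReLU(),
                      nn.MaxPool2d(kernel_size=2))
with torch.no_grad():
    measured = stack(torch.zeros(1, 1, side, side)).numel()

print(conv_only, conv_pool, two_convs, two_convs_pool, measured)
# 5760 1440 1280 320 1440
```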

Class `Prob3p5` is the network used for part 3.5 of the homework.

In [None]:
class Prob3p5(BaseConv):
    def __init__(self):
        super().__init__()


network = Prob3p5()
train_test_report()

Class `Prob3p6` is the network used for part 3.6 of the homework.

In [None]:
class Prob3p6(BaseConv):
    def __init__(self):
        super().__init__(inc_pool=True)


network = Prob3p6()
train_test_report()

Class `Prob3p7` is the network used for part 3.7 of the homework.

In [None]:
class Prob3p7(BaseConv):
    def __init__(self):
        super().__init__(inc_sec_conv=True)


network = Prob3p7()
train_test_report()