In [8]:
%matplotlib inline

Convolutional Neural Network
=====================

In this exercise, you will practice implementing a convolutional neural network (CNN) for image classification with PyTorch. Specifically, you will implement one of the most famous CNNs, LeNet, and apply it to MNIST, a dataset of handwritten digits. After building the network, you will run the training algorithm and compare the performance of LeNet with that of a multi-layer perceptron (which we have already implemented for you). You can also tune the hyperparameters or modify the model to see how these changes affect classification performance.






Training an image classifier
----------------------------

Normally, training an image classifier involves the following steps:

1. Load and normalize the training and test datasets using ``torchvision``
2. Define a neural network model
3. Define a loss function and optimizer
4. Train the network on the training data
5. Validate the network on the validation data
6. Test the network on the test data

In [9]:
import torch
import torchvision
import torchvision.transforms as transforms

# Hyperparameters

After you finish building the neural network model, try different hyperparameter values and check how they affect the performance of your model, e.g., increase or decrease ``batch_size`` and ``learning_rate``, or increase the width of the convolutional layers.

In [10]:
# TODO: try different hyperparameter values and check how they affect classification performance.

batch_size=128
learning_rate=0.0001
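
To compare several settings systematically, one option is to list the combinations up front and rerun the training cell once per combination. The sketch below only builds that list; the candidate values and the helper name `hyperparameter_grid` are illustrative assumptions, not part of the exercise.

```python
from itertools import product

# Hypothetical candidate values; these are not prescribed by the exercise.
batch_sizes = [32, 128]
learning_rates = [1e-4, 1e-3, 1e-2]


def hyperparameter_grid(batch_sizes, learning_rates):
    """Return every (batch_size, learning_rate) combination as a dict."""
    return [
        {"batch_size": b, "learning_rate": lr}
        for b, lr in product(batch_sizes, learning_rates)
    ]


configs = hyperparameter_grid(batch_sizes, learning_rates)
for config in configs:
    # In the notebook, rebuild the data loaders, model and optimizer here
    # and rerun the training cell with these values.
    print(config)
```

Compare the runs on validation accuracy, and keep the test set for the final check only.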

Torchvision and datasets
----------------

PyTorch has a package called ``torchvision``, which provides common datasets
such as ImageNet, CIFAR10, and MNIST through ``torchvision.datasets``,
along with image transforms in ``torchvision.transforms``.
Batching and shuffling are handled by ``torch.utils.data.DataLoader``.

This is a huge convenience and avoids writing boilerplate code. For this exercise, we will use the MNIST dataset, a large database of handwritten digits.

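The torchvision datasets return PIL images. ``ToTensor`` converts them to tensors with values in the range [0, 1].
``Normalize`` then standardizes each pixel with the MNIST mean (0.1307) and standard deviation (0.3081), so the result has roughly zero mean and unit variance rather than being confined to [-1, 1].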

In [11]:
# We normalize the data by its mean and standard deviation.
transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
    ])

trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                download=True, transform=transform)


# training validation split 
train_set, val_set = torch.utils.data.random_split(trainset, [50000, 10000])


trainloader = torch.utils.data.DataLoader(train_set, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

valloader = torch.utils.data.DataLoader(val_set, batch_size=batch_size,
                                          shuffle=False, num_workers=2)

testset = torchvision.datasets.MNIST(root='./data', train=False,
                                download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)
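
As a quick check of what the ``Normalize`` step does to pixel values, the arithmetic can be reproduced in plain Python. This is a minimal sketch of the per-channel formula `(x - mean) / std`, using the mean and standard deviation from the cell above; the helper name `normalize` is an assumption made for illustration.

```python
MEAN, STD = 0.1307, 0.3081  # values passed to transforms.Normalize above


def normalize(pixel, mean=MEAN, std=STD):
    """Apply the per-channel formula used by Normalize: (x - mean) / std."""
    return (pixel - mean) / std


low = normalize(0.0)   # darkest pixel after ToTensor
high = normalize(1.0)  # brightest pixel after ToTensor
print(f"normalized range: [{low:.4f}, {high:.4f}]")
```

The resulting range is not symmetric, because most MNIST pixels are dark background and the mean therefore sits well below 0.5.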



<div class="alert alert-info"><h4>Note</h4><p>If you are running on Windows and get a BrokenPipeError, try setting
    the num_workers argument of torch.utils.data.DataLoader() to 0.</p></div>

Build the LeNet
----------------
Build the network according to the instructions.
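
It helps to trace the spatial size of the feature maps through the layers, because this determines the input size of the first fully connected layer. The sketch below assumes the classic LeNet layout used in the cell that follows (5x5 convolutions, 2x2 max-pooling with stride 2); `conv_output_size` is a hypothetical helper, not a PyTorch function.

```python
def conv_output_size(size, kernel_size, padding=0, stride=1):
    """Spatial size after a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 28                                                # MNIST images are 28x28
size = conv_output_size(size, kernel_size=5, padding=2)  # conv1 -> 28
size = conv_output_size(size, kernel_size=2, stride=2)   # pool2 -> 14
size = conv_output_size(size, kernel_size=5)             # conv3 -> 10
size = conv_output_size(size, kernel_size=2, stride=2)   # pool4 -> 5

flattened = 16 * size * size  # 16 output channels from the second convolution
print(size, flattened)
```

This is why the first linear layer takes 400 inputs: 16 channels of 5x5 feature maps.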

In [12]:
import torch.nn as nn
import torch.nn.functional as F


# TODO: Implement the LeNet according to the description.
class LeNet(nn.Module):

    def __init__(self):
        super(LeNet, self).__init__()
        # Here is an example of a convolutional layer with
        # input channels=1, output channels=6, kernel size=5, padding=2.
        # For this layer (only) we set padding=2 because LeNet
        # expects 32x32 images, while MNIST images are 28x28.
        # Implement the other layers yourself.
        self.conv1 = nn.Conv2d(1, 6, 5, padding=2)
        print(self.conv1)
        self.pool2 = nn.MaxPool2d(2)
        self.conv3 = nn.Conv2d(6, 16, 5)
        self.pool4 = nn.MaxPool2d(2)
        self.linear5 = nn.Linear(400 , 120)
        self.linear6 = nn.Linear(120, 84)
        self.linear7 = nn.Linear(84, 10)

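    def forward(self, x):
        # conv -> ReLU -> max-pool, twice
        convolution1 = self.pool2(F.relu(self.conv1(x)))
        # the second convolution is registered as conv3 in __init__
        convolution2 = self.pool4(F.relu(self.conv3(convolution1)))
        # flatten the 16 feature maps of size 5x5 into a vector of 400 values
        convolution2 = convolution2.view(-1, 16 * 5 * 5)
        out = F.relu(self.linear5(convolution2))
        out = F.relu(self.linear6(out))
        # return raw logits; nn.CrossEntropyLoss applies log-softmax itself
        out = self.linear7(out)
        return out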
# We've implemented a multi-layer perceptron model so that you can try to run the training algorithm
# and compare it with LeNet in terms of the classification performance.
class MLP(nn.Module):

    def __init__(self):
        super(MLP, self).__init__()
        self.input = nn.Linear(28 * 28, 512)
        self.hidden = nn.Linear(512, 256)
        self.output = nn.Linear(256, 10)
    
    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = torch.sigmoid(self.input(x))
        x = torch.sigmoid(self.hidden(x))
        x = self.output(x)
        return x

net = MLP()

# Use LeNet once you have implemented it (comment this line out to train the MLP instead)
net = LeNet()

Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))


Loss Function and Optimizer
----------------
Let's use a cross-entropy classification loss and SGD with momentum.
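
To make the loss concrete: for a single sample, cross-entropy is the negative log of the softmax probability assigned to the correct class. The plain-Python sketch below computes that quantity directly; the function name and the example logits are illustrative assumptions. ``nn.CrossEntropyLoss`` computes the same quantity from raw logits and, by default, averages it over the batch.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    # Subtract the max logit for numerical stability before exponentiating.
    shifted = [z - max(logits) for z in logits]
    log_sum_exp = math.log(sum(math.exp(z) for z in shifted))
    return log_sum_exp - shifted[target]


logits = [2.0, 1.0, 0.1]  # hypothetical raw scores for three classes
print(cross_entropy(logits, target=0))  # smaller loss: class 0 has the top score
print(cross_entropy(logits, target=2))  # larger loss: class 2 has the lowest score
```

Because the loss expects raw logits, the network's last layer should not apply a softmax of its own.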



In [13]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9)

Training the network
----------------

This is when things start to get interesting.
We simply have to loop over our data iterator, and feed the inputs to the
network and optimize. After each epoch, we print the statistics.
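
The training cell reports validation accuracy after each epoch but does not itself keep the best-performing weights. One possible way to do that is sketched below with a hypothetical `BestModelTracker` helper. A plain dict and made-up accuracies stand in for `net.state_dict()` and real results so that the example runs on its own.

```python
import copy


class BestModelTracker:
    """Hypothetical helper that remembers the state with the best validation accuracy."""

    def __init__(self):
        self.best_acc = float("-inf")
        self.best_epoch = None
        self.best_state = None

    def update(self, epoch, val_acc, state):
        """Store a deep copy of `state` if `val_acc` improves on the best so far."""
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.best_epoch = epoch
            self.best_state = copy.deepcopy(state)
            return True
        return False


tracker = BestModelTracker()
# Made-up accuracies and a plain dict standing in for net.state_dict().
for epoch, val_acc in enumerate([0.91, 0.95, 0.94]):
    tracker.update(epoch, val_acc, {"epoch_saved": epoch})
print(tracker.best_epoch, tracker.best_acc)
```

In the notebook, you would call `tracker.update(epoch, val_acc / len(val_set), net.state_dict())` at the end of each epoch and restore the weights afterwards with `net.load_state_dict(tracker.best_state)`.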



In [14]:
for epoch in range(10):  # loop over the dataset multiple times
    
    train_loss = 0.0
    train_acc = 0.0
    val_loss = 0.0
    val_acc = 0.0
    test_loss = 0.0
    test_acc = 0.0
    
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        
        
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        
        
        
        # statistics
        train_loss += loss.item()
        pred = torch.max(outputs, 1)[1]
        train_correct = (pred == labels).sum()
        train_acc += train_correct.item()

        
    # To pick the best model, we track some statistics on held-out data.
    # After training, we choose the model with the best validation accuracy.
    with torch.no_grad():
        net.eval()

        for inputs, labels in valloader:

            predicts = net(inputs)

            loss = criterion(predicts, labels)
            val_loss += loss.item()
            pred = torch.max(predicts, 1)[1]
            val_correct = (pred == labels).sum()
            val_acc += val_correct.item()

        for inputs, labels in testloader:

            predicts = net(inputs)
            pred = torch.max(predicts, 1)[1]
            test_correct = (pred == labels).sum()
            test_acc += test_correct.item()

        net.train()
    print("Epoch %d" % epoch )

    # loss.item() is already a per-batch mean, so average it over the number of batches
    print('Training Loss: {:.6f}, Training Acc: {:.6f}, Validation Acc: {:.6f}, Test Acc: {:.6f}'.format(
        train_loss / len(trainloader), train_acc / len(train_set),
        val_acc / len(val_set), test_acc / len(testset)))

print('Finished Training')
