# Convolutional Neural Networks

In the [previous notebook](./pytorchIntro.ipynb) we saw how to train a neural network with PyTorch. Next we will learn about the torchvision package and how to use it to classify images. As the challenge for this notebook, we will use the [Dogs vs. Cats](https://www.kaggle.com/c/dogs-vs-cats/data) Kaggle challenge. (You can ask a course instructor for the data set.)

In [1]:
import torch
import torch.nn as nn
import torchvision.transforms as tf
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

To handle loading and sampling the training data, we use the torchvision ``ImageFolder`` class. It expects the image data to be arranged as follows: the root folder contains one subfolder per category, named after the category and containing all images of that category.
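The layout can be sketched with the standard library alone. The snippet below builds a throwaway tree with empty placeholder files (the file names `cat.0.jpg` and `dog.0.jpg` and the helper functions are hypothetical, chosen only for illustration) and derives integer labels from the sorted folder names, which mirrors how ``ImageFolder`` fills its ``class_to_idx`` attribute.

```python
import tempfile
from pathlib import Path

def make_example_tree(root):
    """Create an empty folder tree in the layout ImageFolder expects."""
    for split in ("train", "test"):
        for category in ("cat", "dog"):
            folder = Path(root) / split / category
            folder.mkdir(parents=True)
            # Empty placeholder file; real data would be actual JPEG images.
            (folder / f"{category}.0.jpg").touch()
    return Path(root)

def class_to_index(split_folder):
    """Map each category folder to an integer label, in sorted order."""
    classes = sorted(p.name for p in Path(split_folder).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(classes)}

with tempfile.TemporaryDirectory() as tmp:
    root = make_example_tree(tmp)
    layout = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.jpg"))
    labels = class_to_index(root / "train")

print(layout)  # one placeholder image per split and category
print(labels)  # {'cat': 0, 'dog': 1}
```

With this layout, label 0 stands for ``cat`` and label 1 for ``dog``.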

In our case you will need two root folders (one for training and one for testing), each containing a folder ``dog`` and a folder ``cat`` with the corresponding images. Enter the paths of your folders here:

In [2]:
# edit me
train_folder = '../dogs_vs_cats/train'
test_folder = '../dogs_vs_cats/test'

Next we define a class for our convolutional network. It has 3 convolutional layers and a linear output layer. We also make use of the nn.Sequential module this time.

In [3]:
class DogCatNet(nn.Module):
    def __init__(self):
        super(DogCatNet, self).__init__()
        
        conv_layers = [
            # 64 x 64 x 3
            nn.Conv2d(in_channels = 3, out_channels = 32, kernel_size = 3, padding = 1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 2),
            # 32 x 32 x 32
            nn.Conv2d(in_channels = 32, out_channels = 64, kernel_size = 3, padding = 1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 2),
            # 16 x 16 x 64
            nn.Conv2d(in_channels = 64, out_channels = 128, kernel_size = 3, padding = 1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 2)
            # 8 x 8 x 128 = 8192
        ]
        self.add_module('convolutions', nn.Sequential(*conv_layers))
        self.add_module('output', nn.Linear(8192,2))
    
    def forward(self, x):
        x = self.convolutions(x)
        x = self.output(x.view((x.size()[0], -1)))
        return x

For more information on the layers (and additional hyperparameters), see the [documentation](https://pytorch.org/docs/stable/nn.html). Also note that we need to flatten the tensor from shape (batch, 128, 8, 8) to (batch, 8192) when we switch from the convolutional layers to the fully connected layer.
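The shape comments in the network can be checked with a little arithmetic. The sketch below uses the standard output-size formula for convolution and pooling layers; the helper names `conv_out` and `pool_out` are illustrative and not part of PyTorch.

```python
def conv_out(size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1

def pool_out(size, kernel_size):
    """Spatial output size of max pooling; the stride defaults to the kernel size."""
    return (size - kernel_size) // kernel_size + 1

size = 64      # input images are 64 x 64
channels = 3   # RGB
shapes = []
for out_channels in (32, 64, 128):
    size = conv_out(size, kernel_size=3, padding=1)  # padding=1 keeps the size
    size = pool_out(size, kernel_size=2)             # pooling halves the size
    channels = out_channels
    shapes.append((channels, size, size))

flat_features = channels * size * size
print(shapes)         # [(32, 32, 32), (64, 16, 16), (128, 8, 8)]
print(flat_features)  # 8192
```

The final value is the input size of the linear output layer, so it has to be recomputed if the image size or the number of pooling layers changes.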

Let us now define the ``ImageFolder`` datasets we use for training and testing. The ``transform`` option allows us to attach a function that is executed for each image upon loading. For training, we crop a random region covering 50% to 100% of the image area as a form of data augmentation, resize it to 64x64, convert it to a tensor and normalize it. (The mean and std values are the per-channel statistics of the ImageNet dataset.) For testing, we only resize the image without cropping.

In [4]:
train_transformation = tf.Compose([
    tf.RandomResizedCrop(64, scale = (0.5, 1.0)),
    tf.ToTensor(),
    tf.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
train_set = ImageFolder(train_folder, transform = train_transformation)

test_transformation = tf.Compose([
    tf.Resize([64,64]),
    tf.ToTensor(),
    tf.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
test_set = ImageFolder(test_folder, transform = test_transformation)
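To make the normalization step concrete, here is a minimal sketch that reproduces it by hand: each channel has its mean subtracted and is divided by its standard deviation. The random tensor stands in for the output of ``ToTensor()`` and is an assumption for illustration only.

```python
import torch

# Per-channel ImageNet statistics, reshaped so they broadcast over height and width
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Stand-in for a loaded image: 3 x 64 x 64 with values in [0, 1]
image = torch.rand(3, 64, 64)

# Same arithmetic as the Normalize transform: (x - mean) / std per channel
normalized = (image - mean) / std

# The operation is invertible, which is handy for displaying images again
restored = normalized * std + mean

print(normalized.shape)
print(torch.allclose(restored, image, atol=1e-6))
```

A pixel whose value equals the channel mean ends up at 0 after normalization.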

For both sets we also define a ``DataLoader``, which is essentially an iterable over the dataset that also takes care of shuffling, batching and some other things:

In [5]:
train_loader = DataLoader(train_set, shuffle = True, batch_size = 32, num_workers = 3)
test_loader = DataLoader(test_set, shuffle = False, batch_size = 32, num_workers = 3)

The parameter ``num_workers`` sets how many worker processes the DataLoader uses to load the data in parallel. The right number depends on your hardware. If you are training on a GPU and want to keep it busy, the number of CPU cores on your machine is a reasonable starting point.
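As a rough sketch of how this could look in practice, the snippet below queries the number of CPU cores and iterates over a small loader. The random tensors are hypothetical stand-in data, and the loader uses ``num_workers = 0`` (loading in the main process) so that the example runs anywhere; on a real machine you might try `cpu_cores` instead.

```python
import os
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 100 random "images" with random binary labels
images = torch.rand(100, 3, 64, 64)
labels = torch.randint(0, 2, (100,))
dataset = TensorDataset(images, labels)

# os.cpu_count() may return None, so fall back to a single core
cpu_cores = os.cpu_count() or 1
print("CPU cores available:", cpu_cores)

# num_workers=0 loads the data in the main process without extra workers
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=0)

batch_sizes = [x.shape[0] for x, _ in loader]
print(batch_sizes)  # [32, 32, 32, 4]
```

The last batch is smaller because 100 is not a multiple of the batch size.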

And this is what a training epoch and a test run would look like:

In [11]:
learning_rate = 1e-4
model = DogCatNet()
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)

def train_epoch():
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    loss_sum = 0
    steps = 0
    for x, label in train_loader:
        optimizer.zero_grad()
        y = model(x)
        loss = loss_fn(y, label)
        loss.backward()
        optimizer.step()
        loss_sum += float(loss)
        steps += 1
        print("Current loss: {}, running average loss: {}".format(float(loss), loss_sum / steps))
    average_loss = loss_sum / steps
    return average_loss

def test():
    loss_fn = nn.CrossEntropyLoss()
    model.eval()
    num_errors = 0
    loss_sum = 0
    steps = 0
    # No gradients are needed for evaluation
    with torch.no_grad():
        for x, label in test_loader:
            y = model(x)
            loss = loss_fn(y, label)
            # Count predictions (index of the largest output) that differ from the label
            num_errors += int((y.max(1)[1] != label).sum())
            loss_sum += float(loss)
            steps += 1
    average_loss = loss_sum / steps
    return average_loss, num_errors
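The prediction step can be tried out on a tiny example. The network outputs one score per class, and the predicted class is the index of the largest score in each row. The scores and labels below are made up for illustration.

```python
import torch

# Made-up network outputs for 4 images (column 0: cat, column 1: dog)
logits = torch.tensor([[ 2.0, 0.5],
                       [ 0.1, 1.2],
                       [ 0.3, 0.2],
                       [-1.0, 0.0]])
labels = torch.tensor([0, 1, 1, 1])

# max(1) returns (values, indices); the indices are the predicted classes
predictions = logits.max(1)[1]  # equivalent to logits.argmax(dim=1)

num_correct = int((predictions == labels).sum())
num_errors = int((predictions != labels).sum())

print(predictions.tolist())  # [0, 1, 0, 1]
print(num_correct, num_errors)  # 3 1
```

Counting matches gives the number of correct predictions, while counting mismatches gives the number of errors; the two always add up to the number of images.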

Now you should be ready to classify some images of dogs and cats. Have fun! :-)

In [1]:
train_result = train_epoch()
print("Average loss after training epoch: {}".format(train_result))
test_loss, errors = test()
print("Average test loss: {}".format(test_loss))
print("Number of incorrectly classified images: {}".format(errors))