# Dogs-vs-cats classification with CNNs

In this notebook, we'll train a convolutional neural network (CNN, ConvNet) to distinguish images of dogs from images of cats using PyTorch. This notebook is largely based on the blog post [Building powerful image classification models using very little data](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html) by François Chollet.

**Note that using a GPU with this notebook is highly recommended.**

First, the needed imports.

In [None]:
%matplotlib inline

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader

from torchvision import datasets, transforms, models

# distutils was removed in Python 3.12; packaging offers an equivalent version parser
from packaging.version import Version as LV

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
    
print('Using PyTorch version:', torch.__version__, ' Device:', device)
assert(LV(torch.__version__) >= LV("1.0.0"))

TensorBoard is a tool for visualizing progress during training. Although it was created for TensorFlow, it can also be used with PyTorch, most easily through the `tensorboardX` module.


In [None]:
try:
    import tensorboardX
    import os, datetime
    logdir = os.path.join(os.getcwd(), "logs",
                          "dvc-"+datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    print('TensorBoard log directory:', logdir)
    os.makedirs(logdir)
    log = tensorboardX.SummaryWriter(logdir)
except ImportError as e:
    log = None

## Data

The training dataset consists of 2000 images, split evenly between dogs and cats. In addition, the validation set consists of 1000 images, and the test set of 22000 images. Here are some random training images:

![title](imgs/dvc.png)

### Downloading the data

In [None]:
datapath = "/media/data/dogs-vs-cats/train-2000"
(nimages_train, nimages_validation, nimages_test) = (2000, 1000, 22000)

### Data augmentation

First, we'll resize all training and validation images to a fixed size. 

Then, to make the most of our limited number of training examples, we'll apply random transformations to them each time we loop over them. This way, we "augment" our training dataset so that the network rarely sees exactly the same image twice. There are various transformations available in `torchvision`; see [torchvision.transforms](https://pytorch.org/docs/stable/torchvision/transforms.html) for more information.

In [None]:
input_image_size = (150, 150)

data_transform = transforms.Compose([
        transforms.Resize(input_image_size),
        transforms.RandomAffine(degrees=0, translate=None,
                                scale=(0.8, 1.2), shear=0.2),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor()
    ])

noop_transform = transforms.Compose([
        transforms.Resize(input_image_size),
        transforms.ToTensor()
    ])

Let's see a couple of training images with and without the augmentation.

In [None]:
orig_dataset = datasets.ImageFolder(root=datapath+'/train',
                                     transform=noop_transform)
orig_loader = DataLoader(orig_dataset, batch_size=9,
                         shuffle=False, num_workers=0)

batch, _ = next(iter(orig_loader))
batch = batch.numpy().transpose((0, 2, 3, 1))

plt.figure(figsize=(10,10))
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.imshow(batch[i,:,:,:])
# Set the figure title once, after all subplots are drawn
plt.suptitle('only resized training images', fontsize=16, y=0.93)
    
augm_dataset = datasets.ImageFolder(root=datapath+'/train',
                                     transform=data_transform)
augm_loader = DataLoader(augm_dataset, batch_size=9,
                         shuffle=False, num_workers=0)

batch, _ = next(iter(augm_loader))
batch = batch.numpy().transpose((0, 2, 3, 1))

plt.figure(figsize=(10,10))
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.imshow(batch[i,:,:,:])
# Set the figure title once, after all subplots are drawn
plt.suptitle('augmented training images', fontsize=16, y=0.93)

Let's also write the augmented images to a TensorBoard event file.

In [None]:
if log is not None:
    log.add_images('augmented', batch, dataformats='NHWC')


### Data loaders

Let's now define our real data loaders for training, validation, and test data.

In [None]:
batch_size = 25

print('Train: ', end="")
train_dataset = datasets.ImageFolder(root=datapath+'/train',
                                     transform=data_transform)
train_loader = DataLoader(train_dataset, batch_size=batch_size,
                         shuffle=True, num_workers=4)
print('Found', len(train_dataset), 'images belonging to',
     len(train_dataset.classes), 'classes')

print('Validation: ', end="")
validation_dataset = datasets.ImageFolder(root=datapath+'/validation',
                                     transform=noop_transform)
validation_loader = DataLoader(validation_dataset, batch_size=batch_size,
                         shuffle=False, num_workers=4)
print('Found', len(validation_dataset), 'images belonging to',
     len(validation_dataset.classes), 'classes')

print('Test: ', end="")
test_dataset = datasets.ImageFolder(root=datapath+'/test',
                                     transform=noop_transform)
test_loader = DataLoader(test_dataset, batch_size=batch_size,
                         shuffle=False, num_workers=4)
print('Found', len(test_dataset), 'images belonging to',
     len(test_dataset.classes), 'classes')
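
With `batch_size = 25`, each loader yields a fixed number of batches per epoch, and `len(loader)` returns that batch count rather than the number of images. The sketch below works through the arithmetic for the dataset sizes stated above, assuming the default `drop_last=False`; the helper name `batches_per_epoch` is ours and not part of PyTorch.

```python
import math

def batches_per_epoch(n_images, batch_size):
    """Number of batches a DataLoader yields when drop_last=False."""
    return math.ceil(n_images / batch_size)

batch_size = 25
for name, n in [("train", 2000), ("validation", 1000), ("test", 22000)]:
    print(name, batches_per_epoch(n, batch_size))
```

This batch count is the right divisor when averaging per-batch losses over a whole pass through a loader.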

## Option 1: Train a small CNN from scratch

As with the MNIST digits, we can start from scratch and train a CNN for the classification task. However, because the number of training images is small, a large network will easily overfit, even with data augmentation.

### Initialization

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, (3, 3))
        self.pool1 = nn.MaxPool2d((2, 2))
        self.conv2 = nn.Conv2d(32, 32, (3, 3))
        self.pool2 = nn.MaxPool2d((2, 2))
        self.conv3 = nn.Conv2d(32, 64, (3, 3))
        self.pool3 = nn.MaxPool2d((2, 2))
        self.fc1 = nn.Linear(17*17*64, 64)
        self.fc1_drop = nn.Dropout(0.5)
        self.fc2 = nn.Linear(64, 1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool2(x)
        x = F.relu(self.conv3(x))
        x = self.pool3(x)

        # "flatten" 2D to 1D
        x = x.view(-1, 17*17*64)
        x = F.relu(self.fc1(x))
        x = self.fc1_drop(x)
        return torch.sigmoid(self.fc2(x))

model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.03)
criterion = nn.BCELoss()

print(model)
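
The `17*17*64` input size of `fc1` follows from how each layer shrinks the 150x150 input. As a minimal sketch, the helper below (`conv_out`, a name we introduce here) applies the standard output-size formula for unpadded convolutions and pooling to the three conv + pool stages of `Net`.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

size = 150                      # input images are 150x150
for _ in range(3):              # three conv + pool stages, as in Net
    size = conv_out(size, kernel=3)            # 3x3 convolution, no padding
    size = conv_out(size, kernel=2, stride=2)  # 2x2 max pooling
print(size, size * size * 64)   # spatial size and flattened feature count
```

If `input_image_size` is changed, the first dimension of `fc1` has to be recomputed in the same way.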

### Learning

In [None]:
def train(epoch, log_interval=100):
    # Set model to training mode
    model.train()
    epoch_loss = 0
    
    # Loop over each batch from the training set
    for batch_idx, (data, target) in enumerate(train_loader):
        # Copy data to GPU if needed
        data = data.to(device)
        target = target.to(device)
    
        # Zero gradient buffers
        optimizer.zero_grad() 
        
        # Pass data through the network
        output = model(data)
        output = torch.squeeze(output)

        # Calculate loss
        loss = criterion(output, target.to(torch.float32))
        epoch_loss += loss.data.item()

        # Backpropagate
        loss.backward()
        
        # Update weights
        optimizer.step()
        
    # Report the mean loss over all batches of the epoch, not just the last batch
    print('Train Epoch: {}, Loss: {:.4f}'.format(epoch, epoch_loss / len(train_loader)))

In [None]:
def evaluate(loader, loss_vector=None, accuracy_vector=None):
    model.eval()
    loss, correct = 0, 0
    for data, target in loader:
        data = data.to(device)
        target = target.to(device)

        output = torch.squeeze(model(data))

        loss += criterion(output, target.to(torch.float32)).data.item()

        pred = output>0.5
        pred = pred.to(torch.int64)
        correct += pred.eq(target.data).cpu().sum()

    # Average over the batches of the loader actually being evaluated
    loss /= len(loader)
    if loss_vector is not None:
        loss_vector.append(loss)

    accuracy = 100. * correct.to(torch.float32) / len(loader.dataset)
    if accuracy_vector is not None:
        accuracy_vector.append(accuracy)
    
    print('Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        loss, correct, len(loader.dataset), accuracy))
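
The `evaluate` function combines two quantities: the binary cross-entropy of the sigmoid outputs and the accuracy after thresholding those outputs at 0.5. The following sketch reproduces both in plain Python on a tiny batch; the probabilities and labels are made up for illustration, and `bce` is our own helper rather than a PyTorch function.

```python
import math

def bce(probs, targets):
    """Mean binary cross-entropy over a batch of predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

probs = [0.9, 0.2, 0.6, 0.4]   # made-up sigmoid outputs
targets = [1, 0, 0, 1]         # made-up 0/1 class labels

# Threshold at 0.5 to get hard predictions, then count the matches
preds = [int(p > 0.5) for p in probs]
correct = sum(int(a == b) for a, b in zip(preds, targets))
print(bce(probs, targets), correct / len(targets))
```

Confident wrong predictions are penalized much more heavily by the loss than by the accuracy, which is why the two validation curves can move differently.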

In [None]:
%%time

epochs = 20

lossv, accv = [], []
for epoch in range(1, epochs + 1):
    train(epoch)
    with torch.no_grad():
        print('\nValidation set:')
        evaluate(validation_loader, lossv, accv)

In [None]:
plt.figure(figsize=(5,3))
plt.plot(np.arange(1,epochs+1), lossv)
plt.title('validation loss')

plt.figure(figsize=(5,3))
plt.plot(np.arange(1,epochs+1), accv)
plt.title('validation accuracy');

### Inference

In [None]:
%%time
with torch.no_grad():
    evaluate(test_loader)

## Option 2: Reuse a pre-trained CNN

Another option is to reuse a pre-trained network. Here we'll use the [VGG16](https://pytorch.org/docs/stable/torchvision/models.html#torchvision.models.vgg16) network architecture with weights learned using ImageNet. We drop VGG16's fully connected classifier layers, freeze the pre-trained convolutional weights, and then stack our own randomly initialized layers on top.


### Learning 1: New layers

In [None]:
class PretrainedNet(nn.Module):
    def __init__(self):
        super(PretrainedNet, self).__init__()
        self.vgg_features = models.vgg16(pretrained=True).features

        # Freeze the VGG16 layers
        for param in self.vgg_features.parameters():
            param.requires_grad = False

        self.fc1 = nn.Linear(512*4*4, 64)
        self.fc2 = nn.Linear(64, 1)

    def forward(self, x):
        x = self.vgg_features(x)

        # "flatten" 2D to 1D
        x = x.view(-1, 512*4*4)

        x = F.relu(self.fc1(x))
        return torch.sigmoid(self.fc2(x))

model = PretrainedNet().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01)
#optimizer = optim.RMSprop(model.parameters())
criterion = nn.BCELoss()

print(model)
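
The `512*4*4` input size of `fc1` can be checked by hand, assuming VGG16's standard layout: its 3x3 convolutions use padding 1 and preserve the spatial size, while each of the five 2x2 max-pooling layers halves it, rounding down. For 150x150 inputs and 512 channels in the last block:

$$150 \to 75 \to 37 \to 18 \to 9 \to 4, \qquad 512 \cdot 4 \cdot 4 = 8192.$$

The new head therefore has $8192 \cdot 64 + 64 = 524{,}352$ parameters in `fc1` and $64 \cdot 1 + 1 = 65$ in `fc2`, and these are the only weights updated while the VGG16 layers are frozen.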


In [None]:
%%time

epochs = 10

lossv, accv = [], []
for epoch in range(1, epochs + 1):
    train(epoch)
    with torch.no_grad():
        print('\nValidation set:')
        evaluate(validation_loader, lossv, accv)

# torch.save(model, "dvc-vgg16-reuse.pt")

In [None]:
plt.figure(figsize=(5,3))
plt.plot(np.arange(1,epochs+1), lossv)
plt.title('validation loss')

plt.figure(figsize=(5,3))
plt.plot(np.arange(1,epochs+1), accv)
plt.title('validation accuracy');


### Learning 2: Fine-tuning

Once the top layers have learned some reasonable weights, we can continue training by unfreezing the last convolutional block of VGG16 so that it can adapt to our data. The learning rate should be smaller than usual, so that the updates do not destroy the pre-trained features.

In [None]:
# Print all the layers of VGG16
#for name, child in model.vgg_features.named_children():
#    print(name, child)

In [None]:
# Unfreeze the layers of the last convolutional block (indices 24 and up)
for name, layer in model.vgg_features.named_children():
    for param in layer.parameters():
        if int(name) >= 24:
            param.requires_grad = True
        print(name, layer, len(param), param.requires_grad)
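
To see why the threshold is 24, it helps to lay out the layer indices of `vgg_features`. The sketch below rebuilds the index layout in plain Python from the VGG16 configuration, assuming the standard variant without batch normalization, in which every convolution is followed by a ReLU; the variable names are ours.

```python
# VGG16 configuration: numbers are conv output channels,
# "M" marks a 2x2 max-pooling layer that ends a block.
cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]

layers = []   # (index, block, kind), in the order they appear in .features
block = 1
for v in cfg:
    if v == "M":
        layers.append((len(layers), block, "MaxPool2d"))
        block += 1
    else:
        layers.append((len(layers), block, "Conv2d"))
        layers.append((len(layers), block, "ReLU"))

first_block5 = min(i for i, b, _ in layers if b == 5)
print(len(layers), first_block5)
```

Under this assumption, index 24 is the first convolution of the fifth block, so `int(name) >= 24` selects exactly that block's three convolutions (the ReLU and pooling layers have no parameters).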


In [None]:
params = filter(lambda p: p.requires_grad, model.parameters())
#optimizer = optim.SGD(model.parameters(), lr=1e-3)
optimizer = optim.RMSprop(params, lr=1e-5)
criterion = nn.BCELoss()


In [None]:
%%time

epochs = 20

lossv, accv = [], []
for epoch in range(1, epochs + 1):
    train(epoch)
    with torch.no_grad():
        print('\nValidation set:')
        evaluate(validation_loader, lossv, accv)

#torch.save(model, "dvc-vgg16-finetune.pt")


Let's plot the validation loss and accuracy for the fine-tuning phase:

In [None]:
plt.figure(figsize=(5,3))
plt.plot(np.arange(1,epochs+1), lossv)
plt.title('validation loss')

plt.figure(figsize=(5,3))
plt.plot(np.arange(1,epochs+1), accv)
plt.title('validation accuracy');


### Inference

In [None]:
%%time
with torch.no_grad():
    evaluate(test_loader)