# Transfer Learning to classify the Dogs vs Cats images

In this notebook, I will use a pre-trained network to solved challenging problems in computer vision. Specifically, I will use networks **trained** on [ImageNet](http://www.image-net.org/) [available from torchvision](http://pytorch.org/docs/0.3.0/torchvision/models.html). 

ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It's used to train deep neural networks using convolutional layers. Once trained, these models work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we'll use transfer learning to train a network that can classify our cat and dog photos with near perfect accuracy.

With `torchvision.models`, I can download these pre-trained networks and use them in your applications, hence I'll include `models` in the imports now.

In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

In [2]:
# Allow the Google Colab file browser to access my Google Drive files
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Most of the pretrained models require the input to be 224x224 images. Also, I'll need to match the normalization used when the models were trained. Each color channel was normalized separately, the means are `[0.485, 0.456, 0.406]` and the standard deviations are `[0.229, 0.224, 0.225]`.

In [3]:
data_dir = '/content/drive/MyDrive/Colab Notebooks/Dogs vs Cats Classifier/Cat_Dog_data'

# TODO: Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])

# Pass transforms in here, then run the next cell to see how the transforms look
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64)

We can load in a model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5). Let's print out the model architecture so we can see what's going on.

In [4]:
model = models.densenet121(pretrained=True)
model

Downloading: "https://download.pytorch.org/models/densenet121-a639ec97.pth" to /root/.cache/torch/hub/checkpoints/densenet121-a639ec97.pth
100%|██████████| 30.8M/30.8M [00:00<00:00, 274MB/s]


DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu

This model is built out of two main parts, the features and the classifier. The features part is a stack of convolutional layers and overall works as a feature detector that can be fed into a classifier. The classifier part is a single fully-connected layer `(classifier): Linear(in_features=1024, out_features=1000)`. This layer was trained on the ImageNet dataset, so it won't work for my specific problem. That means I need to replace the classifier, but the features will work perfectly on their own. In general, I think about pre-trained networks as amazingly good feature detectors that can be used as the input for simple feed-forward classifiers.

In [5]:
# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False

from collections import OrderedDict
classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(1024, 500)),
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(500, 2)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))
    
model.classifier = classifier

With the model built, I now need to train the classifier. However, now we're using a **really deep** neural network. If I tried to train this on a CPU like normal, it will take a long, long time. Instead, I am going to use the GPU available on Google Colab to do the calculations. The linear algebra computations are done in parallel on the GPU leading to 100x increased training speeds. It's also possible to train on multiple GPUs, further decreasing training time.

PyTorch, along with pretty much every other deep learning framework, uses [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backwards passes on the GPU. In PyTorch, you move your model parameters and other tensors to the GPU memory using `model.to('cuda')`. You can move them back from the GPU with `model.to('cpu')` which you'll commonly do when you need to operate on the network output outside of PyTorch.

In [11]:
# Agnostically set the device that will be used for computation
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

criterion = nn.NLLLoss()
optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)

epochs = 2
steps = 0  # Initialise a step number in order to keep track of how many batches have been processed
print_every = 4   # During each epoch, we will be printing the results every certain number of steps

train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    # Train the model
    for images, labels in trainloader:
        
        steps += 1
        
        images, labels = images.to(device), labels.to(device)
    
        optimizer.zero_grad()
        
        log_ps = model(images)
        loss = criterion(log_ps, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        
        if steps % print_every == 0:
            # Test the model
            test_loss = 0
            accuracy = 0
            
            model.eval()  # Set the evaluation mode, so that there are no dropouts

            # Turn off gradients for validation, saves memory and computations
            with torch.no_grad():
                for images, labels in testloader:

                    images, labels = images.to(device), labels.to(device)

                    log_ps = model(images)
                    test_loss += criterion(log_ps, labels)

                    ps = torch.exp(log_ps)
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += torch.mean(equals.type(torch.FloatTensor))
                    
            print("Epoch: {}/{}.. ".format(e+1, epochs),
            "Training Loss: {:.3f}.. ".format(running_loss/print_every),
            "Test Loss: {:.3f}.. ".format(test_loss/len(testloader)),
            "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))
        
        model.train()   # Set the training mode, so that there are dropout is switched back on

Epoch: 1/2..  Training Loss: 1.078..  Test Loss: 0.492..  Test Accuracy: 0.652
Epoch: 1/2..  Training Loss: 1.712..  Test Loss: 0.381..  Test Accuracy: 0.805
Epoch: 1/2..  Training Loss: 2.101..  Test Loss: 0.174..  Test Accuracy: 0.974
Epoch: 1/2..  Training Loss: 2.371..  Test Loss: 0.151..  Test Accuracy: 0.961
Epoch: 1/2..  Training Loss: 2.575..  Test Loss: 0.090..  Test Accuracy: 0.976
Epoch: 1/2..  Training Loss: 2.739..  Test Loss: 0.074..  Test Accuracy: 0.977
Epoch: 1/2..  Training Loss: 2.909..  Test Loss: 0.104..  Test Accuracy: 0.960
Epoch: 1/2..  Training Loss: 3.284..  Test Loss: 0.129..  Test Accuracy: 0.949
Epoch: 1/2..  Training Loss: 3.476..  Test Loss: 0.088..  Test Accuracy: 0.968
Epoch: 1/2..  Training Loss: 3.679..  Test Loss: 0.055..  Test Accuracy: 0.980
Epoch: 1/2..  Training Loss: 3.885..  Test Loss: 0.055..  Test Accuracy: 0.979
Epoch: 1/2..  Training Loss: 4.137..  Test Loss: 0.086..  Test Accuracy: 0.969
Epoch: 1/2..  Training Loss: 4.386..  Test Loss: 0.0

KeyboardInterrupt: ignored

There is no need to keep training and testing, since it is already clear that this neural network is already accurate enough.