# Transfer Learning

In this notebook, we'll learn how to use pre-trained networks to solve challenging problems in computer vision. Specifically, we'll use networks trained on [ImageNet](http://www.image-net.org/) [available from torchvision](http://pytorch.org/docs/0.3.0/torchvision/models.html). 

ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It's used to train deep neural networks built from convolutional layers.

Once trained, these models work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we'll use transfer learning to train a network that can classify our cat and dog photos with near-perfect accuracy.

With `torchvision.models` we can download these pre-trained networks and use them in our applications. We'll include `models` in our imports now.

In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import time
from collections import OrderedDict
import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

Most of the pretrained models require 224x224 input images. We also need to match the normalization used when the models were trained: each color channel was normalized separately, with means `[0.485, 0.456, 0.406]` and standard deviations `[0.229, 0.224, 0.225]`.
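
To make the arithmetic concrete, here is a minimal sketch of the per-channel normalization in plain Python. It assumes pixel values have already been scaled to the range [0, 1], which is what `transforms.ToTensor()` does; the helper name `normalize_pixel` is made up for this illustration and is not part of torchvision.

```python
# Per-channel statistics quoted above.
MEANS = [0.485, 0.456, 0.406]
STDS = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Normalize one RGB pixel whose values are already scaled to [0, 1].

    Hypothetical helper: applies (value - mean) / std to each channel,
    which is the per-channel formula used by transforms.Normalize.
    """
    return [(value - mean) / std for value, mean, std in zip(rgb, MEANS, STDS)]


# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel([0.485, 0.456, 0.406]))

# A pure white pixel lands roughly 2.2 to 2.6 standard deviations above the mean.
print(normalize_pixel([1.0, 1.0, 1.0]))
```

After normalization the inputs are centered near zero with roughly unit spread, which matches what the pre-trained weights saw during training.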

In [2]:
data_dir = r"Cat_Dog_data"

train_transforms = transforms.Compose([transforms.RandomRotation(30),                # Rotate by a random angle between -30° and +30°
                                       transforms.RandomResizedCrop(224),            # Crop a random region (random size & aspect ratio),
                                                                                     # then resize the crop to 224x224
                                       transforms.RandomHorizontalFlip(),            # Flip horizontally with a given probability;
                                                                                     # default = 0.5
                                       transforms.ToTensor(),                        # Convert images to PyTorch tensors scaled to [0, 1]
                                       transforms.Normalize([0.485, 0.456, 0.406],   # Normalize each channel: (x - mean) / std
                                                            [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(255),                        # Resize so the shorter side is 255 pixels
                                      transforms.CenterCrop(224),                    # Crop the central 224x224 region
                                      transforms.ToTensor(),                         # Convert images to PyTorch tensors scaled to [0, 1]
                                      transforms.Normalize([0.485, 0.456, 0.406],    # Same normalization as the training set
                                                           [0.229, 0.224, 0.225])])

train_dataset = datasets.ImageFolder(data_dir + "/train", transform = train_transforms)
test_dataset = datasets.ImageFolder(data_dir + "/test", transform = test_transforms)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = 64, shuffle = True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = 64)
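
One detail in the test pipeline is easy to misread: when `transforms.Resize` is given a single integer, it scales the shorter side of the image to that length and keeps the aspect ratio, so the result is generally not square. The sketch below approximates that size calculation in plain Python. The function name `resized_size` is hypothetical, and torchvision's exact rounding of the longer side may differ by a pixel.

```python
def resized_size(width, height, size=255):
    """Approximate the (width, height) produced by Resize(size) for an integer size.

    Hypothetical helper: the shorter side becomes `size`, and the longer side
    is scaled by the same factor so the aspect ratio is preserved.
    """
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


print(resized_size(500, 375))  # landscape photo
print(resized_size(375, 500))  # portrait photo
```

`CenterCrop(224)` then cuts a 224x224 square from the middle of the resized image, which is what gives the network its fixed input size.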

We can write device-agnostic code that automatically uses CUDA when it's available and falls back to the CPU otherwise:

In [3]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

We can load in a model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5). Let's print out the model architecture.

In [4]:
# Note: torchvision 0.13+ deprecates `pretrained=True` in favor of the `weights` argument
model = models.densenet121(pretrained = True)
model

DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu

This model is built out of two main parts, the features and the classifier. The features part is a stack of convolutional layers that works as a feature detector whose output can be fed into a classifier. The classifier part is a single fully-connected layer `(classifier): Linear(in_features=1024, out_features=1000)`. This layer was trained on the ImageNet dataset, so it won't work for our specific problem. That means we need to replace the classifier, but the features will work perfectly on their own. In general, pre-trained networks are amazingly good feature detectors that can be used as the input for simple feed-forward classifiers.

Freeze the pre-trained parameters so we don't backprop through them:

In [5]:
for param in model.parameters():
    param.requires_grad = False

Replace the classifier with one that fits our two-class problem:

In [6]:
classifier = nn.Sequential(OrderedDict([
                            ("fc1", nn.Linear(1024,256)),
                            ("relu", nn.ReLU()),
                            ("drop_out", nn.Dropout(0.2)),
                            ("fc2", nn.Linear(256,2)),
                            ("output", nn.LogSoftmax(dim=1))]))
model.classifier = classifier
model

DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu
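
Freezing everything and then swapping in a new classifier works because newly constructed layers have `requires_grad=True` by default, so only the new head is left trainable. The sketch below checks this on a small made-up network rather than DenseNet; the `toy` model and its layer sizes are assumptions chosen only to keep the example fast.

```python
from collections import OrderedDict

import torch
from torch import nn

# Hypothetical stand-in for a pre-trained network: a small "features" stack
# plus a "classifier" head, mirroring the two-part layout described above.
toy = nn.Sequential(OrderedDict([
    ("features", nn.Sequential(nn.Linear(8, 16), nn.ReLU())),
    ("classifier", nn.Linear(16, 10)),
]))

# Freeze every existing parameter.
for param in toy.parameters():
    param.requires_grad = False

# Swap in a new head; its parameters are created with requires_grad=True.
toy.classifier = nn.Linear(16, 2)

trainable = [name for name, p in toy.named_parameters() if p.requires_grad]
frozen = [name for name, p in toy.named_parameters() if not p.requires_grad]
print("trainable:", trainable)
print("frozen:", frozen)
```

The same two list comprehensions can be run against `model` in this notebook to confirm that only the `classifier.*` parameters will be updated.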

Train the model to classify the cat and dog images

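In PyTorch, we move the model's parameters to GPU memory with `model.to('cuda')`, or with `model.to(device)` to keep the code device-agnostic.
In the same way, we move each batch of image and label tensors to the device before running it through the model.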
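
One difference is worth keeping in mind: calling `.to(device)` on a module moves its parameters in place, while calling it on a tensor returns the tensor to use from then on, so the result has to be assigned back. A minimal sketch, using a made-up `layer` and `batch` instead of the notebook's model and images:

```python
import torch
from torch import nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins for the model and a batch of inputs.
layer = nn.Linear(4, 2)
batch = torch.randn(3, 4)

# Modules are moved in place, so the return value can be ignored.
layer.to(device)

# Tensors are not moved in place: assign the result back.
batch = batch.to(device)

output = layer(batch)
print(output.device, output.shape)
```

If the model and its inputs end up on different devices, the forward pass raises a runtime error, which is why both moves are needed.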

In [8]:
criterion = nn.NLLLoss()
# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.classifier.parameters(), lr = 0.003)

# Move the model to the selected device (GPU if available, otherwise CPU)
model.to(device)

steps = 0
print_every = 10
epochs = 1
for i in range(epochs):
    running_loss = 0
    for images, labels in train_loader:
        
        # Move image and label tensors to the selected device
        images, labels = images.to(device), labels.to(device)
        logps = model(images)                                       # Prefer model(images) over model.forward(images): calling the module also runs registered hooks
        loss = criterion(logps, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        steps += 1
        
        if steps % print_every == 0:
            loss_test = 0
            accuracy = 0
            with torch.no_grad():
                model.eval()
                for images, labels in test_loader:
                    
                    # Move image and label tensors to the selected device
                    images, labels = images.to(device), labels.to(device)
                    logps_test = model(images)
                    loss_test += criterion(logps_test, labels).item()   # .item() keeps the running total a plain Python float

                    ps_test = torch.exp(logps_test)
                    top_ps, top_class = ps_test.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
                    
            model.train()
            
            print(f'Epoch: {i+1}/{epochs} ',
                  f'Train Loss: {running_loss/print_every:.3f} ',
                  f'Test Loss: {loss_test/len(test_loader):.3f} ',
                  f'Accuracy: {accuracy/len(test_loader):.3f}')
            running_loss = 0

Epoch: 1/1  Train Loss: 0.622  Test Loss: 0.141  Accuracy: 0.970
Epoch: 1/1  Train Loss: 0.235  Test Loss: 0.086  Accuracy: 0.969
Epoch: 1/1  Train Loss: 0.237  Test Loss: 0.059  Accuracy: 0.982
Epoch: 1/1  Train Loss: 0.141  Test Loss: 0.062  Accuracy: 0.975
Epoch: 1/1  Train Loss: 0.209  Test Loss: 0.049  Accuracy: 0.983
Epoch: 1/1  Train Loss: 0.180  Test Loss: 0.058  Accuracy: 0.980
Epoch: 1/1  Train Loss: 0.158  Test Loss: 0.051  Accuracy: 0.981
Epoch: 1/1  Train Loss: 0.149  Test Loss: 0.049  Accuracy: 0.981
Epoch: 1/1  Train Loss: 0.171  Test Loss: 0.055  Accuracy: 0.981
Epoch: 1/1  Train Loss: 0.179  Test Loss: 0.046  Accuracy: 0.984
Epoch: 1/1  Train Loss: 0.145  Test Loss: 0.043  Accuracy: 0.984
Epoch: 1/1  Train Loss: 0.162  Test Loss: 0.092  Accuracy: 0.967
Epoch: 1/1  Train Loss: 0.149  Test Loss: 0.045  Accuracy: 0.982
Epoch: 1/1  Train Loss: 0.177  Test Loss: 0.066  Accuracy: 0.977
Epoch: 1/1  Train Loss: 0.127  Test Loss: 0.052  Accuracy: 0.979
Epoch: 1/1  Train Loss: 0
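
The accuracy values printed above come from the `topk` comparison inside the validation loop. The sketch below walks through the same steps on a tiny batch so the shapes are easy to follow. The scores and labels are made up for illustration, and the functional calls `F.log_softmax` and `F.nll_loss` stand in for the `nn.LogSoftmax` layer and `nn.NLLLoss` criterion used in the notebook.

```python
import torch
import torch.nn.functional as F

# Made-up raw scores for a batch of 4 images and 2 classes.
logits = torch.tensor([[2.0, 0.5],
                       [0.1, 1.5],
                       [1.2, 0.3],
                       [0.4, 0.9]])
labels = torch.tensor([0, 1, 1, 1])

logps = F.log_softmax(logits, dim=1)   # log-probabilities, as produced by the "output" layer
loss = F.nll_loss(logps, labels)       # negative log-likelihood, averaged over the batch

ps = torch.exp(logps)                  # back to probabilities; each row sums to 1
top_ps, top_class = ps.topk(1, dim=1)  # most likely class per image, shape (4, 1)

# labels has shape (4,), so reshape it to (4, 1) before comparing element-wise.
equals = top_class == labels.view(*top_class.shape)
accuracy = equals.float().mean().item()

print(f"loss: {loss.item():.3f}, accuracy: {accuracy:.3f}")
```

Three of the four predictions match their labels here, so the batch accuracy is 0.75. Applying log-softmax and then the negative log-likelihood loss gives the same value as cross-entropy computed directly on the raw scores.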