# Transfer Learning

We will use pre-trained networks to solve challenging problems in computer vision. Specifically, we will use networks trained on [ImageNet](http://www.image-net.org/) [available from torchvision](http://pytorch.org/docs/stable/torchvision/models.html).

ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It is used to train deep neural networks built from convolutional layers.

Once trained, these models work astonishingly well as feature detectors for images they were not trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we will use transfer learning to train a network that can classify our cat and dog photos with near-perfect accuracy.

With `torchvision.models` you can download these pre-trained networks and use them in your application. We will include `models` in our imports now.

In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

Most of the pretrained models require 224x224 input images. We also need to match the normalization used when the models were trained: each color channel was normalized separately, with means `[0.485, 0.456, 0.406]` and standard deviations `[0.229, 0.224, 0.225]`.
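To make the per-channel normalization concrete, here is a minimal sketch of the arithmetic in plain Python. It assumes the usual pipeline where `transforms.ToTensor()` first scales pixel values to the range [0, 1] and `transforms.Normalize` then subtracts the mean and divides by the standard deviation for each channel. The helper `normalize_pixel` is a hypothetical name used only for illustration; it is not part of torchvision.

```python
# ImageNet channel statistics quoted above, in R, G, B order.
MEANS = [0.485, 0.456, 0.406]
STDS = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_uint8):
    """Scale 0-255 channel values to [0, 1], then standardize each channel.

    Hypothetical helper: mirrors what ToTensor followed by Normalize
    does to a single pixel.
    """
    scaled = [value / 255 for value in rgb_uint8]
    return [(s - m) / sd for s, m, sd in zip(scaled, MEANS, STDS)]


white = normalize_pixel([255, 255, 255])  # every channel ends up positive
black = normalize_pixel([0, 0, 0])        # every channel ends up negative

print("white:", [round(v, 3) for v in white])
print("black:", [round(v, 3) for v in black])
```

After normalization the values are no longer confined to [0, 1], which is why a normalized image looks wrong if you plot it without undoing the transform first.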

In [11]:
# Uncomment these lines to download and unzip the files into the current directory
#!wget https://s3.amazonaws.com/content.udacity-data.com/nd089/Cat_Dog_data.zip
#!unzip Cat_Dog_data.zip

In [3]:
data_dir = 'Cat_Dog_data'

# Define Transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])
test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])

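# Pass transforms in here.
# Assumes the archive unpacks into Cat_Dog_data/train and Cat_Dog_data/test,
# each holding one subfolder per class.
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)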

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, 
                                          shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64)

We can load a model such as `DenseNet`. Printing the model (uncomment `model` in the cell below) shows its architecture so we can see what is going on.

In [8]:
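# Note: newer torchvision releases (0.13+) deprecate `pretrained=True`
# in favor of `weights=models.DenseNet121_Weights.DEFAULT`.
model = models.densenet121(pretrained=True)
#model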

This model is built out of two main parts, the features and the classifier. The features part is a stack of convolutional layers that works as a feature detector whose output can be fed into a classifier. The classifier part is a single fully-connected layer `(classifier): Linear(in_features=1024, out_features=1000)`. This layer was trained on the ImageNet dataset and outputs scores for its 1000 classes, so it will not work for our two-class problem. That means we need to replace the classifier, but the features can be reused as they are. In general, pre-trained networks can be viewed as good feature detectors that can be used as the input for simple feed-forward classifiers.

In [10]:
# Freeze the pre-trained parameters so no gradients are computed for them
for param in model.parameters():
    param.requires_grad = False

from collections import OrderedDict

classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(1024, 500)),
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(500, 2)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))

model.classifier = classifier
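
The effect of freezing the parameters and then swapping in a new classifier can be checked on a small stand-in model, without downloading any weights. This is only a sketch under stated assumptions: `TinyNet` is a hypothetical module that mimics the `features` / `classifier` split of DenseNet, and the layer sizes, learning rate, and random batch are made up for illustration. It shows that layers created after the freeze are trainable by default, and that one optimizer step leaves the frozen features untouched.

```python
import torch
from torch import nn, optim


class TinyNet(nn.Module):
    """Hypothetical stand-in with the same features/classifier split as DenseNet."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(8, 1000)

    def forward(self, x):
        return self.classifier(self.features(x))


tiny = TinyNet()

# Freeze everything that exists right now.
for param in tiny.parameters():
    param.requires_grad = False

# Layers created afterwards have requires_grad=True by default.
tiny.classifier = nn.Sequential(
    nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2), nn.LogSoftmax(dim=1)
)

trainable = [name for name, p in tiny.named_parameters() if p.requires_grad]
frozen = [name for name, p in tiny.named_parameters() if not p.requires_grad]
print("trainable:", trainable)
print("frozen:", frozen)

# Only the classifier parameters are handed to the optimizer.
criterion = nn.NLLLoss()  # pairs with the LogSoftmax output
optimizer = optim.Adam(tiny.classifier.parameters(), lr=0.003)

images = torch.randn(5, 3, 16, 16)      # made-up batch
labels = torch.tensor([0, 1, 0, 1, 1])  # made-up labels

conv_before = tiny.features[0].weight.clone()
log_probs = tiny(images)
loss = criterion(log_probs, labels)
loss.backward()
optimizer.step()
conv_unchanged = torch.equal(conv_before, tiny.features[0].weight)
```

The same check, listing `named_parameters()` by `requires_grad`, can be run on the DenseNet model to confirm that only the new classifier will be trained.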

With our model built, we need to train the classifier. However, now we are using a **really deep** neural network. If you try to train this on a CPU as usual, it will take a long, long time. Instead, we are going to use the GPU to do the calculations. The linear algebra computations run in parallel on the GPU, which can make training up to roughly 100x faster, depending on the hardware and the model. It is also possible to train on multiple GPUs, further decreasing training time.
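
As a minimal sketch of what moving the computation to the GPU looks like in PyTorch, the snippet below picks a device and places both a layer and a batch on it. It assumes nothing about your hardware: when no CUDA device is available it falls back to the CPU, so it runs either way. The layer and batch sizes are made up for illustration.

```python
import torch
from torch import nn

# Use the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Both the parameters and the input data must live on the same device.
layer = nn.Linear(1024, 2).to(device)
batch = torch.randn(64, 1024).to(device)

output = layer(batch)
print(output.shape, output.device)
```

Mixing devices, for example a model on the GPU and a batch still on the CPU, raises a runtime error, so each batch needs to be moved inside the training loop as well.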