In [0]:
# ms-python.python added
import os
try:
	os.chdir(os.path.join(os.getcwd(), 'intro-to-pytorch'))
	print(os.getcwd())
except:
	pass

In [0]:
from IPython import get_ipython


 # Transfer Learning

 In this notebook, you'll learn how to use pre-trained networks to solved challenging problems in computer vision. Specifically, you'll use networks trained on [ImageNet](http://www.image-net.org/) [available from torchvision](http://pytorch.org/docs/0.3.0/torchvision/models.html).

 ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It's used to train deep neural networks using an architecture called convolutional layers. I'm not going to get into the details of convolutional networks here, but if you want to learn more about them, please [watch this](https://www.youtube.com/watch?v=2-Ol7ZB0MmU).

 Once trained, these models work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we'll use transfer learning to train a network that can classify our cat and dog photos with near perfect accuracy.

 With `torchvision.models` you can download these pre-trained networks and use them in your applications. We'll include `models` in our imports now.

In [0]:
get_ipython().run_line_magic('matplotlib', 'inline')
get_ipython().run_line_magic('config', "InlineBackend.figure_format = 'retina'")

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models
from tqdm import tqdm


Download and extract dataset.

In [4]:
%%bash
if [ -d "Cat_Dog_data" ]; then
    echo 'Dataset already exists' >&2
else
    wget -cq https://s3.amazonaws.com/content.udacity-data.com/nd089/Cat_Dog_data.zip
    unzip -qq Cat_Dog_data.zip
fi

Dataset already exists


 Most of the pretrained models require the input to be 224x224 images. Also, we'll need to match the normalization used when the models were trained. Each color channel was normalized separately, the means are `[0.485, 0.456, 0.406]` and the standard deviations are `[0.229, 0.224, 0.225]`.

In [0]:
data_dir = 'Cat_Dog_data'

# Define transforms for the training data and testing data
train_transforms = transforms.Compose([
    transforms.RandomRotation(180),
    transforms.RandomHorizontalFlip(),
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

test_transforms = transforms.Compose([
    transforms.Resize(255),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Pass transforms in here, then run the next cell to see how the transforms look
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64)


 We can load in a model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5). Let's print out the model architecture so we can see what's going on.

In [6]:
model = models.densenet121(pretrained=True)
model


DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu

 This model is built out of two main parts, the features and the classifier. The features part is a stack of convolutional layers and overall works as a feature detector that can be fed into a classifier. The classifier part is a single fully-connected layer `(classifier): Linear(in_features=1024, out_features=1000)`. This layer was trained on the ImageNet dataset, so it won't work for our specific problem. That means we need to replace the classifier, but the features will work perfectly on their own. In general, I think about pre-trained networks as amazingly good feature detectors that can be used as the input for simple feed-forward classifiers.

In [0]:
# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False

from collections import OrderedDict
classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(1024, 500)),
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(500, 2)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))
    
model.classifier = classifier


 With our model built, we need to train the classifier. However, now we're using a **really deep** neural network. If you try to train this on a CPU like normal, it will take a long, long time. Instead, we're going to use the GPU to do the calculations. The linear algebra computations are done in parallel on the GPU leading to 100x increased training speeds. It's also possible to train on multiple GPUs, further decreasing training time.

 PyTorch, along with pretty much every other deep learning framework, uses [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backwards passes on the GPU. In PyTorch, you move your model parameters and other tensors to the GPU memory using `model.to('cuda')`. You can move them back from the GPU with `model.to('cpu')` which you'll commonly do when you need to operate on the network output outside of PyTorch. As a demonstration of the increased speed, I'll compare how long it takes to perform a forward and backward pass with and without a GPU.

In [0]:
import time



In [9]:
for device in ['cpu', 'cuda']:

    criterion = nn.NLLLoss()
    # Only train the classifier parameters, feature parameters are frozen
    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)

    model.to(device)

    for ii, (inputs, labels) in enumerate(trainloader):

        # Move input and label tensors to the GPU
        inputs, labels = inputs.to(device), labels.to(device)

        start = time.time()

        outputs = model.forward(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        if ii==3:
            break
        
    print(f"Device = {device}; Time per batch: {(time.time() - start)/3:.3f} seconds")


Device = cpu; Time per batch: 3.529 seconds
Device = cuda; Time per batch: 0.008 seconds


 You can write device agnostic code which will automatically use CUDA if it's enabled like so:
 ```python
 # at beginning of the script
 device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

 ...

 # then whenever you get a new Tensor or Module
 # this won't copy if they are already on the desired device
 input = data.to(device)
 model = MyModule(...).to(device)
 ```

 From here, I'll let you finish training the model. The process is the same as before except now your model is much more powerful. You should get better than 95% accuracy easily.

 >**Exercise:** Train a pretrained models to classify the cat and dog images. Continue with the DenseNet model, or try ResNet, it's also a good model to try out first. Make sure you are only training the classifier and the parameters for the features part are frozen.

In [10]:
## Use a pretrained model to classify the cat and dog images
model = models.resnet18(pretrained=True)
model


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [0]:
for param in model.parameters():
    param.requires_grad = False


In [0]:
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(512, 2)),
    ('output', nn.LogSoftmax(dim=1))
]))
model.fc = classifier


In [13]:
if torch.cuda.is_available():
    model = model.cuda()

criterion = nn.NLLLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.003)

epochs = 30
for epoch in tqdm(range(epochs)):
    model.train()
    train_loss = 0
    for inputs, labels in trainloader:
        if torch.cuda.is_available():
            inputs = inputs.cuda()
            labels = labels.cuda()

        outputs = model.forward(inputs)
        loss = criterion(outputs, labels)
        train_loss += loss.item()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    else:
        with torch.no_grad():
            model.eval()
            val_loss = 0
            accuracy = 0
            for inputs, labels in testloader:
                if torch.cuda.is_available():
                    inputs = inputs.cuda()
                    labels = labels.cuda()

                outputs = model.forward(inputs)
                loss = criterion(outputs, labels)
                val_loss += loss.item()

                _, predictions = outputs.topk(1)
                equals = (labels == predictions.view(*labels.shape))
                accuracy += torch.mean(equals.type(torch.FloatTensor))
                
            train_loss = train_loss / len(trainloader)
            val_loss = val_loss / len(testloader)
            accuracy = accuracy / len(testloader)
            print('Train loss {}'.format(train_loss))
            print('Val loss {}'.format(val_loss))
            print('Val accuracy {}%'.format(accuracy*100))


  3%|▎         | 1/30 [02:15<1:05:18, 135.13s/it]

Train loss 0.33118793565187266
Val loss 0.08026151219382882
Val accuracy 97.03125%


  7%|▋         | 2/30 [04:29<1:03:01, 135.05s/it]

Train loss 0.2767910262866115
Val loss 0.0696010010316968
Val accuracy 97.3828125%


 10%|█         | 3/30 [06:45<1:00:46, 135.07s/it]

Train loss 0.28569397482682357
Val loss 0.06851580685470253
Val accuracy 97.109375%


 13%|█▎        | 4/30 [08:59<58:29, 134.99s/it]  

Train loss 0.28135786622508685
Val loss 0.06865878372918814
Val accuracy 97.265625%


 17%|█▋        | 5/30 [11:14<56:10, 134.82s/it]

Train loss 0.28072998141446576
Val loss 0.06710013614501804
Val accuracy 97.1484375%


 20%|██        | 6/30 [13:29<53:55, 134.81s/it]

Train loss 0.2830399453047324
Val loss 0.0696104429429397
Val accuracy 97.1875%


 23%|██▎       | 7/30 [15:43<51:40, 134.80s/it]

Train loss 0.2833266179077327
Val loss 0.07396828016499057
Val accuracy 97.1875%


 27%|██▋       | 8/30 [17:58<49:25, 134.82s/it]

Train loss 0.2764005506072532
Val loss 0.06834228369407355
Val accuracy 97.34375%


 30%|███       | 9/30 [20:12<47:06, 134.58s/it]

Train loss 0.2800450504926795
Val loss 0.09028112052474171
Val accuracy 96.484375%


 33%|███▎      | 10/30 [22:29<45:03, 135.16s/it]

Train loss 0.2852176328782331
Val loss 0.14255161419278012
Val accuracy 94.4140625%


 37%|███▋      | 11/30 [24:44<42:49, 135.25s/it]

Train loss 0.2882022143299268
Val loss 0.07821638800669461
Val accuracy 96.9921875%


 40%|████      | 12/30 [26:59<40:30, 135.01s/it]

Train loss 0.27772186434065754
Val loss 0.08434158670715988
Val accuracy 96.640625%


 43%|████▎     | 13/30 [29:13<38:12, 134.83s/it]

Train loss 0.2743874869986691
Val loss 0.08247254763264208
Val accuracy 96.796875%


 47%|████▋     | 14/30 [31:28<35:57, 134.87s/it]

Train loss 0.2766892606477169
Val loss 0.07270547600928694
Val accuracy 97.265625%


 50%|█████     | 15/30 [33:43<33:44, 134.99s/it]

Train loss 0.2938603833741085
Val loss 0.07275097200181335
Val accuracy 97.109375%


 53%|█████▎    | 16/30 [35:58<31:28, 134.86s/it]

Train loss 0.2760962632230737
Val loss 0.07386972800595686
Val accuracy 96.8359375%


 57%|█████▋    | 17/30 [38:12<29:10, 134.63s/it]

Train loss 0.2773353308015926
Val loss 0.06888059850316494
Val accuracy 97.2265625%


 60%|██████    | 18/30 [40:26<26:53, 134.43s/it]

Train loss 0.2662428885655986
Val loss 0.06682261105161161
Val accuracy 97.3828125%


 63%|██████▎   | 19/30 [42:40<24:38, 134.37s/it]

Train loss 0.2736058107111603
Val loss 0.07136550890281797
Val accuracy 97.3046875%


 67%|██████▋   | 20/30 [44:54<22:23, 134.31s/it]

Train loss 0.28117078680291097
Val loss 0.06842934670858085
Val accuracy 97.265625%


 70%|███████   | 21/30 [47:08<20:07, 134.16s/it]

Train loss 0.2713683175812052
Val loss 0.06976670091971755
Val accuracy 97.1875%


 73%|███████▎  | 22/30 [49:22<17:53, 134.19s/it]

Train loss 0.28212599005465483
Val loss 0.08747871723026038
Val accuracy 96.71875%


 77%|███████▋  | 23/30 [51:36<15:38, 134.09s/it]

Train loss 0.27563721777617256
Val loss 0.07869766046060249
Val accuracy 96.9140625%


 80%|████████  | 24/30 [53:50<13:24, 134.06s/it]

Train loss 0.2927540852688253
Val loss 0.07034993029665201
Val accuracy 96.9140625%


 83%|████████▎ | 25/30 [56:04<11:09, 133.98s/it]

Train loss 0.27066829976287077
Val loss 0.08681103194830939
Val accuracy 96.40625%


 87%|████████▋ | 26/30 [58:18<08:56, 134.01s/it]

Train loss 0.2843304980249906
Val loss 0.09671325564850122
Val accuracy 96.1328125%


 90%|█████████ | 27/30 [1:00:32<06:42, 134.07s/it]

Train loss 0.28756981108083646
Val loss 0.06794032659381628
Val accuracy 97.109375%


 93%|█████████▎| 28/30 [1:02:46<04:28, 134.08s/it]

Train loss 0.28967483788305387
Val loss 0.06590971373952925
Val accuracy 97.3828125%


 97%|█████████▋| 29/30 [1:05:00<02:14, 134.03s/it]

Train loss 0.27740253241394053
Val loss 0.06915921666659415
Val accuracy 97.1875%


100%|██████████| 30/30 [1:07:14<00:00, 133.90s/it]

Train loss 0.268968932161277
Val loss 0.07073029701132327
Val accuracy 97.34375%



