<a href="https://colab.research.google.com/github/aljebraschool/Deep_Learning_Fundamental_Projects/blob/main/Part_8_Transfer_Learning_(Exercises).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transfer Learning

In this notebook, you'll learn how to use pre-trained networks to solve challenging problems in computer vision. Specifically, you'll use networks trained on [ImageNet](http://www.image-net.org/) [available from torchvision](http://pytorch.org/docs/0.3.0/torchvision/models.html).

ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It's used to train deep neural networks built from convolutional layers. I'm not going to get into the details of convolutional networks here, but if you want to learn more about them, please [watch this](https://www.youtube.com/watch?v=2-Ol7ZB0MmU).

Once trained, these models work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we'll use transfer learning to train a network that can classify our cat and dog photos with near-perfect accuracy.

With `torchvision.models` you can download these pre-trained networks and use them in your applications. We'll include `models` in our imports now.

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!unzip /content/drive/MyDrive/cat_and_dog_data.zip -d /content/

Streaming output truncated to the last 5000 lines.
  inflating: /content/Cat_Dog_data/train/cat/cat.1199.jpg  
  inflating: /content/Cat_Dog_data/train/cat/cat.3551.jpg  
  inflating: /content/Cat_Dog_data/train/cat/cat.9293.jpg  
  inflating: /content/Cat_Dog_data/train/cat/cat.7597.jpg  
  inflating: /content/Cat_Dog_data/train/cat/cat.100.jpg  
  inflating: /content/Cat_Dog_data/train/cat/cat.1831.jpg  
  inflating: /content/Cat_Dog_data/train/cat/cat.6473.jpg  
  inflating: /content/Cat_Dog_data/train/cat/cat.11946.jpg  
  inflating: /content/Cat_Dog_data/train/cat/cat.8035.jpg  
  inflating: /content/Cat_Dog_data/train/cat/cat.5093.jpg  
  inflating: /content/Cat_Dog_data/train/cat/cat.376.jpg  
  inflating: /content/Cat_Dog_data/train/cat/cat.3416.jpg  
  inflating: /content/Cat_Dog_data/train/cat/cat.9303.jpg  
  inflating: /content/Cat_Dog_data/train/cat/cat.9122.jpg  
  inflating: /content/Cat_Dog_data/train/cat/cat.3565.jpg  
  inflating: /content/Cat_Dog_data/t

Most of the pretrained models require the input to be 224x224 images. Also, we'll need to match the normalization used when the models were trained. Each color channel was normalized separately: the means are `[0.485, 0.456, 0.406]` and the standard deviations are `[0.229, 0.224, 0.225]`.
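
To make the per-channel normalization concrete, here is a minimal sketch of the arithmetic that `transforms.Normalize` performs, written out by hand. The tiny 3x2x2 `image` tensor is a made-up stand-in for what `ToTensor()` would return, not data from the cat and dog set.

```python
import torch

# Means and standard deviations quoted above, one value per RGB channel
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])

# Hypothetical 3x2x2 image with values in [0, 1], standing in for ToTensor() output
image = torch.full((3, 2, 2), 0.5)

# Reshape the statistics to (3, 1, 1) so each channel uses its own mean and std
normalized = (image - mean.view(3, 1, 1)) / std.view(3, 1, 1)

# One value per channel for the top-left pixel
print(normalized[:, 0, 0])
```

Because the three channels use different statistics, the same input value of 0.5 ends up as three different normalized values.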

In [None]:
data_dir = '/content/Cat_Dog_data'

# TODO: Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(260),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

# Pass transforms in here, then run the next cell to see how the transforms look
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64)

In [None]:
model = models.densenet121(pretrained=True)

model

Downloading: "https://download.pytorch.org/models/densenet121-a639ec97.pth" to /root/.cache/torch/hub/checkpoints/densenet121-a639ec97.pth
100%|██████████| 30.8M/30.8M [00:00<00:00, 167MB/s]


DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu

We can load in a model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5). Let's print out the model architecture so we can see what's going on.

In [None]:
images, labels = next(iter(trainloader))

# Shape of a single image in the batch: (channels, height, width)
image_shape = images[0].shape

print(image_shape)

torch.Size([3, 224, 224])


This model is built out of two main parts, the features and the classifier. The features part is a stack of convolutional layers and overall works as a feature detector whose output can be fed into a classifier. The classifier part is a single fully-connected layer `(classifier): Linear(in_features=1024, out_features=1000)`. This layer was trained on the ImageNet dataset, so it won't work for our specific problem. That means we need to replace the classifier, but the pretrained features can be reused as they are. In general, I think about pre-trained networks as amazingly good feature detectors that can be used as the input for simple feed-forward classifiers.
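
The idea of keeping the features and swapping the classifier can be sketched on a toy network. `TinyNet` below is a hypothetical stand-in that only mimics the features/classifier layout, so it runs without downloading any weights. It is not DenseNet.

```python
import torch
from torch import nn

# Hypothetical stand-in with the same two-part layout as the pretrained network
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1),
                                      nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1),
                                      nn.Flatten())
        self.classifier = nn.Linear(8, 1000)  # original 1000-class head

    def forward(self, x):
        return self.classifier(self.features(x))

net = TinyNet()

# Freeze every existing parameter, as if they were pretrained
for param in net.parameters():
    param.requires_grad = False

# Replace the head; newly created layers have requires_grad=True by default
net.classifier = nn.Linear(8, 2)

# Only the new classifier should still be trainable
trainable = [name for name, p in net.named_parameters() if p.requires_grad]
print(trainable)

# The network now produces two outputs per image
out = net(torch.randn(4, 3, 32, 32))
print(out.shape)
```

Listing the parameters with `requires_grad=True` is a quick way to check that only the new head will be updated during training.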

In [None]:
# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False

from collections import OrderedDict
classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(1024, 500)),
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(500, 2)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))

model.classifier = classifier

With our model built, we need to train the classifier. However, now we're using a **really deep** neural network. If you try to train this on a CPU like normal, it will take a long, long time. Instead, we're going to use the GPU to do the calculations. The linear algebra computations are done in parallel on the GPU, which can make training 100x faster or more. It's also possible to train on multiple GPUs, further decreasing training time.

PyTorch, along with pretty much every other deep learning framework, uses [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backward passes on the GPU. In PyTorch, you move your model parameters and other tensors to the GPU memory using `model.to('cuda')`. You can move them back from the GPU with `model.to('cpu')`, which you'll commonly do when you need to operate on the network output outside of PyTorch. As a demonstration of the increased speed, I'll compare how long it takes to perform a forward and backward pass with and without a GPU.
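
A common pattern, sketched here under the assumption that the code should also run on machines without a GPU, is to pick the device once and move both the model and the data to it. The `layer` and `batch` names are illustrative only.

```python
import torch
from torch import nn

# Fall back to the CPU when no CUDA device is visible
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

layer = nn.Linear(4, 2).to(device)      # hypothetical tiny model
batch = torch.randn(3, 4).to(device)    # inputs must live on the same device

with torch.no_grad():
    output = layer(batch)

# Move the result back to the CPU, e.g. before converting it to NumPy or plotting
cpu_output = output.to('cpu')
print(device, cpu_output.shape)
```

Calling `.to(device)` on a tensor returns a new tensor, so the result has to be assigned, while calling it on a module moves the module's parameters in place.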

In [None]:
import time

In [None]:
for device in ['cpu', 'cuda']:

    criterion = nn.NLLLoss()
    # Only train the classifier parameters, feature parameters are frozen
    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)

    model.to(device)

    n_batches = 4   # number of batches to time
    elapsed = 0.0   # accumulated compute time over the timed batches

    for ii, (inputs, labels) in enumerate(trainloader):

        # Move input and label tensors to the current device
        inputs, labels = inputs.to(device), labels.to(device)

        start = time.time()

        outputs = model(inputs)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # CUDA kernels run asynchronously, so wait for them before stopping the clock
        if device == 'cuda':
            torch.cuda.synchronize()
        elapsed += time.time() - start

        if ii == n_batches - 1:
            break

    print(f"Device = {device}; Time per batch: {elapsed/n_batches:.3f} seconds")

Device = cpu; Time per batch: 5.389 seconds
Device = cuda; Time per batch: 0.008 seconds


In [None]:
## TODO: Use a pretrained model to classify the cat and dog images

#build a new model using the pretrained DenseNet-121 model
model = models.densenet121(pretrained=True)

In [None]:
#import optim for the optimizer and its update step
import torch.optim as optim


#freeze the pretrained parameters (weights and biases) so only the new classifier is trained
for param in model.parameters():
  param.requires_grad = False

#build a new classifier part for the model
classifier = nn.Sequential(OrderedDict([
                                ('fc1', nn.Linear(1024, 256)),
                                ('relu', nn.ReLU()),
                                ('dropout', nn.Dropout(0.2)),
                                ('fc2', nn.Linear(256, 2)),
                                ('output', nn.LogSoftmax(dim = 1))]))

#replace the ImageNet classifier with the new one
model.classifier = classifier

criterion = nn.NLLLoss()
optimizer = optim.Adam(model.classifier.parameters(), lr = 0.05)

model

In [None]:

epochs = 1

#use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

for e in range(epochs):
  train_loss = 0

  #training mode: dropout is active while the classifier learns
  model.train()

  #loop through the training set
  for images, labels in trainloader:
    #move the batch to the same device as the model
    images, labels = images.to(device), labels.to(device)

    #clear any previous gradients
    optimizer.zero_grad()

    #forward pass: log-probabilities for each class
    log_ps = model(images)

    #find the loss
    loss = criterion(log_ps, labels)

    #do back propagation
    loss.backward()

    #update the classifier weights
    optimizer.step()

    #running total of the loss
    train_loss += loss.item()

  test_loss = 0
  accuracy = 0

  #evaluation mode: turns off dropout (and makes batch norm use its running statistics) during validation
  model.eval()

  #this block turns off gradient tracking
  with torch.no_grad():
    for images, labels in testloader:
      images, labels = images.to(device), labels.to(device)

      log_ps = model(images)
      loss = criterion(log_ps, labels)
      test_loss += loss.item()

      #take the exponent of the log probability
      ps = torch.exp(log_ps)

      #get the most likely class for each image in the batch
      top_p, top_class = ps.topk(1, dim = 1)

      #check whether the predicted class matches the true label
      equals = top_class == labels.view(*top_class.shape)

      #fraction of correct predictions in this batch
      accuracy += equals.float().mean().item()

  #average the losses over the number of batches
  train_loss = train_loss / len(trainloader)
  test_loss = test_loss / len(testloader)

  print("Epoch: {}/{}..".format(e+1, epochs),
        "Train Loss: {:>5.3f}..".format(train_loss),
        "Test Loss: {:>5.3f}..".format(test_loss),
        "Accuracy: {:>5.3f}..".format(accuracy/len(testloader)))








Epoch: 1/1.. Train Loss: 0.190.. Test Loss: 0.054.. Accuracy: 0.982..
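
The accuracy figure comes from comparing the top predicted class with the label for every image. Here is a small worked sketch of that calculation on made-up probabilities for four images, assuming cat is class 0 and dog is class 1.

```python
import torch

# Hypothetical log-probabilities for 4 images over 2 classes (assumed: cat=0, dog=1)
log_ps = torch.log(torch.tensor([[0.9, 0.1],
                                 [0.2, 0.8],
                                 [0.6, 0.4],
                                 [0.3, 0.7]]))
labels = torch.tensor([0, 1, 1, 1])

# Back to probabilities, then keep the single most likely class per row
ps = torch.exp(log_ps)
top_p, top_class = ps.topk(1, dim=1)   # both have shape (4, 1)

# Reshape labels to (4, 1) so the comparison is element-wise
equals = top_class == labels.view(*top_class.shape)

# Third image is misclassified, so 3 of 4 are correct -> 0.75
accuracy = equals.float().mean().item()
print(accuracy)
```

The `labels.view(*top_class.shape)` step matters: comparing a `(4, 1)` tensor with a `(4,)` tensor would broadcast to a `(4, 4)` result and give a misleading accuracy.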


In [None]:
#display the trained model (the new classifier is at the bottom of the printout)
model

In [None]:
#check the model keys: the names of the layers and their learned parameters (the classifier entries are at the bottom)
model.state_dict().keys()

In [None]:
#save the model state dictionary
model_state = model.state_dict()

#this is saved in the file "cat_dog_classifier_model.pth"
torch.save(model_state, 'cat_dog_classifier_model.pth')
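
A state dict holds only tensors, so to reuse the saved weights the same architecture has to be rebuilt first. The sketch below shows the round trip on a small stand-in classifier. `build_model` is a hypothetical helper, and an in-memory buffer stands in for the `.pth` file so the example does not touch the disk.

```python
import io
import torch
from torch import nn

# Hypothetical small classifier standing in for the full DenseNet
def build_model():
    return nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 2))

trained = build_model()

# Save only the state dict; a BytesIO buffer stands in for the .pth file
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Rebuild the same architecture first, then load the saved tensors into it
restored = build_model()
restored.load_state_dict(torch.load(buffer))
restored.eval()  # switch to evaluation mode before making predictions
```

For the model in this notebook, that would mean recreating `densenet121` with the same replacement classifier before calling `load_state_dict`, since the keys in the saved file must match the layer names.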