# Pre-trained and transfer learning with PyTorch

You've seen how to implement pre-trained networks and use transfer learning with Keras in [the previous notebook](./2.pretrained_and_transfer_learning_with_keras.ipynb). Now it's time to do the same using `pytorch`. The idea and theory are still the same; only the implementation differs.

## 1. Pre-trained model usage

In this chapter, we will learn how to import a **PyTorch** model that has been pre-trained to recognize various objects in full-color images.

### 1.1 Importing an existing model

Head over to the official [torchvision models page](https://pytorch.org/vision/stable/models.html) to see which models are available for our object recognition task.

For this exercise, **you** can choose whatever network you want. Look up which models are commonly used, or compare the evaluation metrics on that page to decide which one fits the job.

In [12]:

import torch
import torchvision.models as models
from torchsummary import summary

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


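# Make a model of your choice. Make sure you import the pre-trained model, and not just the architecture.
# Calling mobilenet_v2() without arguments gives randomly initialized weights.
# On torchvision older than 0.13, use models.mobilenet_v2(pretrained=True) instead.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
model = model.to(device)

# Take a look at the model.
input_dim = (3, 224, 224)
summary(model, input_dim)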

Layer (type:depth-idx)                   Output Shape              Param #
├─Sequential: 1-1                        [-1, 1280, 7, 7]          --
|    └─ConvBNReLU: 2-1                   [-1, 32, 112, 112]        --
|    |    └─Conv2d: 3-1                  [-1, 32, 112, 112]        864
|    |    └─BatchNorm2d: 3-2             [-1, 32, 112, 112]        64
|    |    └─ReLU6: 3-3                   [-1, 32, 112, 112]        --
|    └─InvertedResidual: 2-2             [-1, 16, 112, 112]        --
|    |    └─Sequential: 3-4              [-1, 16, 112, 112]        896
|    └─InvertedResidual: 2-3             [-1, 24, 56, 56]          --
|    |    └─Sequential: 3-5              [-1, 24, 56, 56]          5,136
|    └─InvertedResidual: 2-4             [-1, 24, 56, 56]          --
|    |    └─Sequential: 3-6              [-1, 24, 56, 56]          8,832
|    └─InvertedResidual: 2-5             [-1, 32, 28, 28]          --
|    |    └─Sequential: 3-7              [-1, 32, 28, 28]          10,000
|  

### 1.2 Preparing your images

As with Keras, pre-trained models are trained on images that went through a specific preprocessing step. To effectively use the model, we need to go through the right preprocessing steps with our images as well.

In [10]:
import torch
from torchvision import transforms
from PIL import Image

# Load your image.
# Converting to RGB guarantees three channels, even for grayscale or RGBA files.
img = Image.open("./../assets/rick.jpg").convert("RGB")

# Make a transform object based on your specific model preprocessing transformations.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# Transform your image.
img = transform(img)

# Prepare a batch for the model to accept.
batch = torch.unsqueeze(img, 0)
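
To make the `Normalize` and batching steps concrete, here is a minimal sketch of the arithmetic involved, written with plain tensor operations on a made-up 2×2 image. The variable names are illustrative and not part of the exercise; the statistics are the ones used in the transform above.

```python
import torch

# ImageNet channel statistics, reshaped so they broadcast over height and width.
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Made-up image: 3 channels, 2x2 pixels, every value 0.5
# (ToTensor scales pixel values to the range [0, 1]).
demo_img = torch.full((3, 2, 2), 0.5)

# Per-channel normalization: (x - mean) / std.
demo_normalized = (demo_img - mean) / std

# A model expects a batch, so add a leading dimension: [3, 2, 2] -> [1, 3, 2, 2].
demo_batch = demo_normalized.unsqueeze(0)

print(demo_normalized[:, 0, 0])
print(demo_batch.shape)
```

Each channel ends up with a different value even though the input pixels were identical, because each channel has its own mean and standard deviation.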

### 1.3 Predicting the class of your image

Getting a torchvision model to work is as easy as pie, but interpreting the results is a bit more involved...

In [7]:
# Put the model into evaluation mode.
model.eval()

# Predict. The batch must live on the same device as the model's weights,
# otherwise PyTorch raises a RuntimeError about mismatched tensor types.
with torch.no_grad():
    out = model(batch.to(device))

# Print the highest score and the index of the class it belongs to.
print(out.max(1))

The index printed above identifies an output node, and the value next to it is a raw score rather than a probability. To interpret the index, you need to know how the final layer of your pre-trained model is structured. For most popular pre-trained networks, you can find text files online that map the indices of the output nodes to actual labels. Older torchvision releases do not ship this mapping with the models; more recent releases expose it through the metadata of the weights object.
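
As a minimal sketch of that interpretation step, the snippet below turns raw scores into probabilities with softmax and looks up the three most likely labels. The five-entry label list and the scores are made up for illustration; a real ImageNet model has 1000 outputs, and you would use `out` from the cell above together with a full label file.

```python
import torch

# Hypothetical label list; index i is the label of output node i.
demo_labels = ["cat", "dog", "hot dog", "banana", "pizza"]

# Stand-in for `out = model(batch)`: raw scores with shape [1, num_classes].
demo_scores = torch.tensor([[0.5, 1.0, 4.0, 2.0, 0.1]])

# Softmax over the class dimension turns scores into probabilities that sum to 1.
demo_probs = torch.softmax(demo_scores, dim=1)

# Take the three most likely classes and map their indices to labels.
top_probs, top_idx = demo_probs.topk(3, dim=1)
top3 = [(demo_labels[i], p) for p, i in zip(top_probs[0].tolist(), top_idx[0].tolist())]

for name, prob in top3:
    print("{:10s} {:.3f}".format(name, prob))
```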

## 2. Transfer learning

### 2.1 Importing, preprocessing and augmenting the data

As with the Keras exercise, we want to retrain part of the CNN on **hot dogs** and **non hot dogs**. This will effectively re-purpose the feature extraction capabilities of the general models to accurately identify hot dogs.

PyTorch's data augmentation tools are also part of the `transforms` sub-library. Augment your data as part of your transform composition to preprocess and augment in one go!

If you have not already done so, get your hot dog dataset [here](https://www.kaggle.com/dansbecker/hot-dog-not-hot-dog).

In [8]:
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Make a transform object based on your specific model preprocessing transformations.
image_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
        transforms.RandomRotation(degrees=15),
        transforms.RandomHorizontalFlip(),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ]),
    'valid': transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ])
}

# Set train, valid and test directory paths.
train_directory = "./../assets/train"
valid_directory = "./../assets/valid"
test_directory = "./../assets/test"

# Batch size.
bs = 8

# Number of classes.
num_classes = 2

# Load data from folders.
data = {
    'train': datasets.ImageFolder(root=train_directory, transform=image_transforms['train']),
    'valid': datasets.ImageFolder(root=valid_directory, transform=image_transforms['valid']),
    'test': datasets.ImageFolder(root=test_directory, transform=image_transforms['test'])
}

# Size of data, to be used for calculating average loss and accuracy.
train_data_size = len(data['train'])
valid_data_size = len(data['valid'])
test_data_size = len(data['test'])

# Create iterators for the data using the DataLoader module.
# Only the training data needs shuffling. The names match the training loop further down.
train_data_loader = DataLoader(data['train'], batch_size=bs, shuffle=True)
valid_data_loader = DataLoader(data['valid'], batch_size=bs, shuffle=False)
test_data_loader = DataLoader(data['test'], batch_size=bs, shuffle=False)

### 2.2 Importing part of an existing model

As with the Keras notebook, we load in a pre-trained model, and freeze the feature extraction layers so they cannot be retrained.

In [None]:
import torchvision.models as models

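# Make a model of your choice. Make sure you import the pre-trained model, and not just the architecture.
# On torchvision older than 0.13, use models.mobilenet_v2(pretrained=True) instead.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)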
model = model.to(device)

# Freeze model parameters.
for param in model.parameters():
    param.requires_grad = False

### 2.3 Adding flattening and dense layers

Now, we'll replace the final layer with a concoction of our own!

In [None]:
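from torch import nn

# Change the final layer of model for Transfer Learning. Make sure you have only two output classes.
model_inputs = model.classifier[1].in_features

model.classifier = nn.Sequential(
    nn.Linear(model_inputs, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, 2),  # Input size must match the output size of the first Linear layer.
    nn.LogSoftmax(dim=1)  # Outputs log probabilities per class.
)

# The new layers are created on the CPU, so move the whole model to the device again.
model = model.to(device)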
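
To see what freezing followed by replacing the final layer does to the set of trainable parameters, here is a minimal sketch on a tiny stand-in network. `TinyNet` and `count_trainable` are hypothetical helpers invented for this illustration; the same counting function can be applied to your real model.

```python
from torch import nn


class TinyNet(nn.Module):
    """Toy stand-in for a pre-trained network: a feature extractor plus a classifier."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(8, 10)

    def forward(self, x):
        return self.classifier(self.features(x))


def count_trainable(module):
    # Number of scalar weights that the optimizer is allowed to update.
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


toy = TinyNet()
trainable_before = count_trainable(toy)

# Freeze every existing parameter.
for param in toy.parameters():
    param.requires_grad = False
trainable_frozen = count_trainable(toy)

# Freshly created layers have requires_grad=True by default,
# so only the new two-class head will be trained.
toy.classifier = nn.Linear(8, 2)
trainable_after = count_trainable(toy)

print(trainable_before, trainable_frozen, trainable_after)
```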

### 2.4 Training and evaluating the model

In [None]:
from torch import nn, optim

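# Create appropriate loss function and optimizer.
# The loss must be an instance (note the parentheses). The training loop calls it as loss_criterion.
loss_criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())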
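
`nn.CrossEntropyLoss` applies a log-softmax internally before computing the negative log-likelihood. The sketch below, using made-up scores and labels, checks that it matches `nn.LogSoftmax` followed by `nn.NLLLoss`, and that feeding it log probabilities gives the same value because log-softmax leaves log probabilities unchanged.

```python
import torch
from torch import nn

# Made-up raw scores for two samples and two classes, plus their true labels.
demo_logits = torch.tensor([[2.0, -1.0], [0.3, 0.8]])
demo_targets = torch.tensor([0, 1])

# Cross-entropy computed directly on the raw scores.
ce_on_logits = nn.CrossEntropyLoss()(demo_logits, demo_targets)

# The same loss, split into its two steps.
demo_log_probs = nn.LogSoftmax(dim=1)(demo_logits)
nll_on_log_probs = nn.NLLLoss()(demo_log_probs, demo_targets)

# Cross-entropy on log probabilities: the internal log-softmax changes nothing.
ce_on_log_probs = nn.CrossEntropyLoss()(demo_log_probs, demo_targets)

print(ce_on_logits.item(), nll_on_log_probs.item(), ce_on_log_probs.item())
```

Feeding plain softmax probabilities into `nn.CrossEntropyLoss` does not have this property: the probabilities get squashed a second time, which flattens the loss and tends to slow down training.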

As PyTorch is sometimes a bit involved, the training and validation loop is given to you here for free:

In [None]:
import time
epochs = 6
history = []

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

for epoch in range(epochs):
        epoch_start = time.time()
        print("Epoch: {}/{}".format(epoch+1, epochs))
        # Set to training mode.
        model.train()
        
        # Loss and accuracy within the epoch.
        train_loss = 0.0
        train_acc = 0.0
        valid_loss = 0.0
        valid_acc = 0.0
        for i, (inputs, labels) in enumerate(train_data_loader):
            inputs = inputs.to(device)
            labels = labels.to(device)
            
            # Clean existing gradients.
            optimizer.zero_grad()
            
            # Forward pass - compute outputs on input data using the model.
            outputs = model(inputs)
            
            # Compute loss.
            loss = loss_criterion(outputs, labels)

            # Backpropagate the gradients.
            loss.backward()

            # Update the parameters.
            optimizer.step()

            # Compute the total loss for the batch and add it to train_loss.
            train_loss += loss.item() * inputs.size(0)

            # Compute the accuracy.
            ret, predictions = torch.max(outputs.data, 1)
            correct_counts = predictions.eq(labels.data.view_as(predictions))
            
            # Convert correct_counts to float and then compute the mean.
            acc = torch.mean(correct_counts.type(torch.FloatTensor))

            # Compute total accuracy in the whole batch and add to train_acc.
            train_acc += acc.item() * inputs.size(0)

            print("Batch number: {:03d}, Training: Loss: {:.4f}, Accuracy: {:.4f}".format(i, loss.item(), acc.item()))
        
        # Validation - No gradient tracking needed.
        with torch.no_grad():
            # Set to evaluation mode.
            model.eval()

            # Validation loop.
            for j, (inputs, labels) in enumerate(valid_data_loader):
                inputs = inputs.to(device)
                labels = labels.to(device)

                # Forward pass - compute outputs on input data using the model.
                outputs = model(inputs)
                
                # Compute loss.
                loss = loss_criterion(outputs, labels)
                
                # Compute the total loss for the batch and add it to valid_loss.
                valid_loss += loss.item() * inputs.size(0)

                # Calculate validation accuracy.
                ret, predictions = torch.max(outputs.data, 1)
                correct_counts = predictions.eq(labels.data.view_as(predictions))

                # Convert correct_counts to float and then compute the mean.
                acc = torch.mean(correct_counts.type(torch.FloatTensor))

                # Compute total accuracy in the whole batch and add to valid_acc.
                valid_acc += acc.item() * inputs.size(0)

                print("Validation Batch number: {:03d}, Validation: Loss: {:.4f}, Accuracy: {:.4f}".format(j, loss.item(), acc.item()))

        # Find average training loss and training accuracy.
        avg_train_loss = train_loss/train_data_size
        avg_train_acc = train_acc/float(train_data_size)

        # Find average validation loss and validation accuracy.
        avg_valid_loss = valid_loss/valid_data_size
        avg_valid_acc = valid_acc/float(valid_data_size)
        history.append([avg_train_loss, avg_valid_loss, avg_train_acc, avg_valid_acc])
        epoch_end = time.time()
        print("Epoch : {:03d}, Training: Loss: {:.4f}, Accuracy: {:.4f}%, \n\t\tValidation : Loss : {:.4f}, Accuracy: {:.4f}%, Time: {:.4f}s".format(epoch+1, avg_train_loss, avg_train_acc*100, avg_valid_loss, avg_valid_acc*100, epoch_end-epoch_start))
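
The loop multiplies each batch's mean loss by the batch size before summing, and divides by the dataset size at the end. The sketch below uses made-up per-sample losses in plain Python to show why: when the last batch is smaller than the others, a simple average of batch means gives that batch too much weight.

```python
# Made-up per-sample losses for a dataset of five samples.
per_sample_losses = [0.9, 0.7, 0.5, 0.3, 0.1]
demo_batch_size = 2

# Split into batches of two; the last batch holds a single sample.
demo_batches = [per_sample_losses[i:i + demo_batch_size]
                for i in range(0, len(per_sample_losses), demo_batch_size)]

# Same bookkeeping as the training loop: batch mean times batch size, summed up.
running_loss = 0.0
for batch_losses in demo_batches:
    batch_mean = sum(batch_losses) / len(batch_losses)
    running_loss += batch_mean * len(batch_losses)
weighted_average = running_loss / len(per_sample_losses)

# Naive alternative: average the batch means without weighting.
naive_average = sum(sum(b) / len(b) for b in demo_batches) / len(demo_batches)

print(weighted_average, naive_average)
```

The weighted version recovers the true mean over all samples, while the naive version drifts toward the loss of the small final batch.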

Finally, let's test our model one last time using the test set. Here's some helper code to make the testing easier:

In [None]:
import matplotlib.pyplot as plt

def predict(model, test_image_name):
    transform = image_transforms['test']
    # Convert to RGB so grayscale or RGBA images also yield three channels.
    test_image = Image.open(test_image_name).convert("RGB")
    plt.imshow(test_image)
    # Add a batch dimension and move the tensor to the same device as the model.
    test_image_tensor = transform(test_image).unsqueeze(0).to(device)
    with torch.no_grad():
        model.eval()
        # The class with the highest output score is the prediction.
        out = model(test_image_tensor)
        topk, topclass = out.topk(1, dim=1)
        # ImageFolder numbers the classes by sorted folder name. Check
        # data['train'].class_to_idx and reorder this list if needed.
        idx_to_class = ['Not a hot dog!', 'Yum, hot dog!']

        print("Output class :  ", idx_to_class[topclass.cpu().numpy()[0][0]])
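
The helper above classifies one image at a time. To get a single accuracy figure for the whole test set, a loop like the following could be used. This is a minimal sketch: `evaluate_accuracy` is a hypothetical helper, not part of the exercise, and it is demonstrated here on a toy dataset with an identity "model" so that it runs on its own.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_accuracy(net, loader, device):
    """Return the fraction of samples in `loader` that `net` classifies correctly."""
    net.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            # The predicted class is the index of the highest output score.
            predictions = net(inputs).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / total


# Toy data: the "scores" are the inputs themselves, and the last label is deliberately wrong.
toy_inputs = torch.tensor([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [0.5, 3.0]])
toy_labels = torch.tensor([0, 1, 0, 0])
toy_loader = DataLoader(TensorDataset(toy_inputs, toy_labels), batch_size=3)

toy_accuracy = evaluate_accuracy(nn.Identity(), toy_loader, torch.device("cpu"))
print(toy_accuracy)
```

For the hot dog classifier, you would pass your trained model, the test `DataLoader` from section 2.1, and `device` instead of the toy objects.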

How did you do? How many hot dogs did you successfully identify? Aren't you glad you'll never mistake a banana for a hot dog again?