# Transfer Learning
_How can we re-use models that have already been trained?_<br>
Depending on the task, we may want to salvage existing models for parts: perhaps we're expanding functionality, or we have a problem similar to one we've solved before. Instead of starting from scratch every time, we can use a pre-trained model as a starting point and then tweak it to our specific needs.
<p>
Convolutional NNs are often used for feature extraction: the early layers pick up low-level features like edges and shapes, and later layers build up to more advanced features. We can simply tune a previously trained model's weights with a new dataset to achieve decent results much faster than retraining from scratch.
<p>
Typically we will need to train a new classifier, since in most cases we will be using the model to detect new classes, and possibly a different number of classes in total.

In [20]:
# Imports
import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader

*Note: the majority of pre-trained models from torchvision require images to be 224x224px in size, but other models may have different requirements.* We will also need to apply the same normalisation that was used when the model was trained.
<br>***Make sure to understand how each model you use actually works and what inputs it requires.***
<p>For this model specifically, each colour channel was normalised separately.
<br>Means: [0.485, 0.456, 0.406]
<br>Standard Deviations: [0.229, 0.224, 0.225]
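
To make the per-channel normalisation concrete, here is a minimal sketch in plain Python of the arithmetic applied to each channel, `(value - mean) / std`. It assumes the pixel values have already been scaled to the range [0, 1], and `normalise_pixel` is a hypothetical helper used only for illustration.

```python
means = [0.485, 0.456, 0.406]
stds = [0.229, 0.224, 0.225]

def normalise_pixel(rgb):
    """Normalise one RGB pixel whose channels are already scaled to [0, 1]."""
    return [(value - mean) / std for value, mean, std in zip(rgb, means, stds)]

# A mid-grey pixel ends up slightly different in each channel,
# because each channel has its own mean and standard deviation
mid_grey = normalise_pixel([0.5, 0.5, 0.5])

# A pixel exactly equal to the channel means maps to zero in every channel
mean_pixel = normalise_pixel(means)

print(mid_grey)
print(mean_pixel)
```

After normalisation, values are roughly centred on zero, which is what the pre-trained weights expect to see.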
<p> Let's start by loading our data. In this case we will be using Cats and Dogs.

In [21]:
data_dir = "data/Cat_Dog_data"
means = [0.485, 0.456, 0.406]
stds = [0.229, 0.224, 0.225]

# Resize the data and augment it, then normalise
train_transforms = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.RandomRotation(45),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(means, stds)
])

# Resize and normalise
test_transforms = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.ToTensor(),
    transforms.Normalize(means, stds)
])

# Build the datasets from the image folders, applying the transforms defined above
import os
train_data = datasets.ImageFolder(os.path.join(data_dir, "train"), transform=train_transforms)
test_data = datasets.ImageFolder(os.path.join(data_dir, "test"), transform=test_transforms)

trainloader = DataLoader(train_data, batch_size=128, shuffle=True)
testloader = DataLoader(test_data, batch_size=128, shuffle=False)

In [22]:
# PyTorch offers a variety of pretrained models https://pytorch.org/docs/0.3.0/torchvision/models.html
# We will be using one called "DenseNet"
# Note: torchvision 0.13+ deprecates pretrained=True in favour of
# weights=models.DenseNet169_Weights.DEFAULT
model = models.densenet169(pretrained=True)
# You can print the full architecture (a very long listing) with...
model
# But the main thing we care about is the number of features the model passes
# to its classifier, as that is the input size our new classifier must accept
model.classifier

Linear(in_features=1664, out_features=1000, bias=True)
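
Rather than copying 1664 by hand, the feature size can be read from the existing classifier layer. The sketch below uses a stand-in `nn.Linear` with the same shape as the layer printed above (an assumption, so the snippet runs without downloading the DenseNet weights); with the real model you would read `model.classifier.in_features` instead.

```python
from torch import nn

# Stand-in for model.classifier (assumption: same shape as the layer printed above)
old_classifier = nn.Linear(in_features=1664, out_features=1000)

feature_size = old_classifier.in_features       # size of the feature vector fed to the classifier
old_class_count = old_classifier.out_features   # number of classes the original classifier predicts

print(feature_size, old_class_count)
```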

The DenseNet model was trained on the [ImageNet](https://image-net.org) dataset, which contains over 14 million images. Although its classes are not the same as ours, we can reuse the feature detection part of the model and just attach a new classifier to it.
<br>_Note: For image recognition, pre-trained models like this are VERY good at detecting features, we should make use of them!_
<p>As the feature detection part of this model is already good enough, we have no need to adjust it, so we will lock/freeze it by disabling gradient calculation for its parameters.

In [23]:
for param in model.parameters():
    param.requires_grad = False
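
A quick way to confirm that freezing worked is to count how many parameters still require gradients. This is a minimal sketch on a small stand-in network (`backbone`, `head` and `count_parameters` are hypothetical names, not part of the notebook), so it runs without the real DenseNet.

```python
from torch import nn

# Small stand-in "feature extractor" so the check runs without the real model
backbone = nn.Sequential(nn.Linear(8, 4), nn.ReLU())
for param in backbone.parameters():
    param.requires_grad = False  # Freeze, exactly as in the cell above

# Newly created layers are trainable by default
head = nn.Linear(4, 2)
model_sketch = nn.Sequential(backbone, head)

def count_parameters(module):
    """Return (trainable, total) parameter counts for a module."""
    total = sum(p.numel() for p in module.parameters())
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    return trainable, total

trainable, total = count_parameters(model_sketch)
print(f"{trainable} of {total} parameters are trainable")
```

Only the parameters of the new layer should be reported as trainable; everything in the frozen part is left untouched by the optimiser.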

Now we will make our own classifier for the Cats and Dogs data we are using
<br>_Remember: The classifier needs to accept the output of the model!_

In [24]:
from collections import OrderedDict
# Input size required: 1664
cat_dog_classifier = nn.Sequential(OrderedDict([
    ("fc1", nn.Linear(in_features=1664, out_features=64)),
    ("relu", nn.ReLU()),
    ("fc2", nn.Linear(in_features=64, out_features=2)),  # One output per class: cat, dog
    ("output", nn.LogSoftmax(dim=1)),
]))

# Overwrite the old classifier
model.classifier = cat_dog_classifier
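
Before training, it can be worth pushing a fake batch through a classifier of this shape to check that the sizes line up. The sketch below builds its own copy of the head (`head`, `num_classes` and `dummy_features` are illustrative names, and the two-class output is an assumption of one output per class) and feeds it random feature vectors instead of real DenseNet features.

```python
import torch
from torch import nn

in_features = 1664   # Size of the feature vector produced by the frozen model
num_classes = 2      # Assumption: one output per class (cat, dog)

head = nn.Sequential(
    nn.Linear(in_features, 64),
    nn.ReLU(),
    nn.Linear(64, num_classes),
    nn.LogSoftmax(dim=1),
)

dummy_features = torch.randn(4, in_features)  # A fake batch of 4 feature vectors
log_probs = head(dummy_features)              # Log-probabilities, shape (4, num_classes)
probs = torch.exp(log_probs)                  # Back to probabilities; each row sums to 1

print(log_probs.shape)
print(probs.sum(dim=1))
```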

## Training the new classifier
This model is incredibly complex compared to what we are used to. We're now at the point where performance and memory usage are pretty important, so we will be using CUDA, if available, to speed up processing.
<br>_The speed-up over a CPU can be large (factors of 100 in some cases), so use a GPU whenever possible._

In [25]:
#torch.cuda.is_available()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device);

In [26]:
# Recap
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # Use cuda
model = models.densenet169(pretrained=True) # Load a pretrained model
for param in model.parameters(): # Freeze the model
    param.requires_grad = False
model.classifier = cat_dog_classifier # Replace the old classifier
criterion = nn.NLLLoss() # Choose a loss function
optimizer = optim.Adam(model.classifier.parameters(), lr=0.003) # Optimise only the classifier
model.to(device); # Move it all to the GPU
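
The recap pairs a classifier that ends in `nn.LogSoftmax` with `nn.NLLLoss`. As a small check of why that pairing makes sense, the sketch below shows on made-up scores that `NLLLoss` applied to log-softmax output gives the same value as `nn.CrossEntropyLoss` applied to the raw scores. The tensors here are illustrative and not taken from the notebook.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 2)               # Made-up raw scores: 5 samples, 2 classes
labels = torch.tensor([0, 1, 1, 0, 1])   # Made-up class labels

# LogSoftmax followed by NLLLoss, as in the classifier and criterion above
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)

# CrossEntropyLoss does both steps internally, starting from raw scores
ce = nn.CrossEntropyLoss()(logits, labels)

print(nll.item(), ce.item())
```

If the classifier did not end in `LogSoftmax`, `CrossEntropyLoss` on the raw outputs would be the equivalent choice.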

In [28]:
epochs = 3
steps = 0
running_loss = 0
print_every = 50
max_accuracy = 0
accuracy_results = []

for epoch in range(epochs):
    print(f"Epoch {epoch+1}")
    print(steps)
    for inputs, labels in trainloader:
        steps += 1
        print(f"Step: {steps}")
        # Move input and label tensors to the default device
        inputs, labels = inputs.to(device), labels.to(device)

        # Training step (the loss must be called "loss" for the logging below)
        predictions = model(inputs)            # Forward pass
        loss = criterion(predictions, labels)  # Compute the loss
        loss.backward()                        # Backpropagate
        optimizer.step()                       # Update the parameters (weights and biases)
        optimizer.zero_grad()                  # Reset gradients

        running_loss += loss.item()
        #print(f"> Running loss: {running_loss}")

        if steps % print_every == 0:
            test_loss = 0
            accuracy = 0
            print("~ Changing to evaluation mode")
            model.eval()
            # ...
            with torch.no_grad():
                for inputs, labels in testloader:
                    inputs, labels = inputs.to(device), labels.to(device)
                    logps = model.forward(inputs)
                    batch_loss = criterion(logps, labels)
                    test_loss += batch_loss.item()

                    # Calculate accuracy
                    ps = torch.exp(logps)
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += torch.mean(equals.float()).item()

            # Save the best model so far, judged on mean accuracy over the whole test set
            mean_accuracy = accuracy / len(testloader)
            if mean_accuracy >= max_accuracy:
                max_accuracy = mean_accuracy
                torch.save(model.state_dict(), 'catordog.epic')

            print(f"Epoch {epoch+1}/{epochs}.. "
                  f"Train loss: {running_loss/print_every:.3f}.. "
                  f"Test loss: {test_loss/len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy/len(testloader):.3f}")
            accuracy_results.append([steps, accuracy/len(testloader)])
            running_loss = 0
            print("~ Reverting to training mode")
            model.train()
            # REMEMBER TO REACTIVATE THE TRAIN MODE

Epoch 1
0
Step: 1
Step: 2
Step: 3
Step: 4
Step: 5
Step: 6
Step: 7
Step: 8
Step: 9
Step: 10
Step: 11
Step: 12
Step: 13
Step: 14
Step: 15
Step: 16
Step: 17
Step: 18
Step: 19
Step: 20
Step: 21
Step: 22
Step: 23
Step: 24
Step: 25
Step: 26
Step: 27
Step: 28
Step: 29
Step: 30
Step: 31
Step: 32
Step: 33
Step: 34
Step: 35
Step: 36
Step: 37
Step: 38
Step: 39
Step: 40
Step: 41
Step: 42
Step: 43
Step: 44
Step: 45
Step: 46
Step: 47
Step: 48
Step: 49
Step: 50
~ Changing to evaluation mode
Epoch 1/3.. Train loss: 0.071.. Test loss: 0.050.. Test accuracy: 0.982
~ Reverting to training mode
Step: 51
Step: 52
Step: 53
Step: 54
Step: 55
Step: 56
Step: 57
Step: 58
Step: 59
Step: 60
Step: 61
Step: 62
Step: 63
Step: 64
Step: 65
Step: 66
Step: 67
Step: 68
Step: 69
Step: 70
Step: 71
Step: 72
Step: 73
Step: 74
Step: 75
Step: 76
Step: 77
Step: 78
Step: 79
Step: 80
Step: 81
Step: 82
Step: 83
Step: 84
Step: 85
Step: 86
Step: 87
Step: 88
Step: 89
Step: 90
Step: 91
Step: 92
Step: 93
Step: 94
Step: 95
Step: 96
Step
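
The training loop saves the best weights with `torch.save(model.state_dict(), 'catordog.epic')`. A minimal sketch of reloading such a checkpoint is shown below. It uses a hypothetical `build_head` stand-in rather than the full DenseNet and writes to a temporary directory, so it runs on its own; with the real model you would rebuild the same architecture (DenseNet plus the new classifier) before calling `load_state_dict`.

```python
import os
import tempfile

import torch
from torch import nn

def build_head():
    # Hypothetical stand-in for the full model: the architecture must match the saved weights
    return nn.Sequential(nn.Linear(1664, 64), nn.ReLU(), nn.Linear(64, 2), nn.LogSoftmax(dim=1))

trained = build_head()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "catordog.epic")
    torch.save(trained.state_dict(), path)        # Save only the weights, as in the loop above

    restored = build_head()                       # Fresh model with random weights
    state = torch.load(path, map_location="cpu")  # map_location lets a GPU checkpoint load on a CPU
    restored.load_state_dict(state)               # Copy the saved weights in

restored.eval()  # Switch to evaluation mode before making predictions
```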

In [32]:
for step, result in accuracy_results:
    print(f"@ Step {step}: {result} accuracy")


@ Step 50: 0.9820772051811218 accuracy
@ Step 100: 0.9786075353622437 accuracy
@ Step 150: 0.9763097435235977 accuracy
@ Step 200: 0.98203125 accuracy
@ Step 250: 0.983984375 accuracy
@ Step 300: 0.9859834551811218 accuracy
@ Step 350: 0.973828125 accuracy
@ Step 400: 0.9684972435235977 accuracy
@ Step 450: 0.9875459551811219 accuracy
@ Step 500: 0.9875459551811219 accuracy
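
To pick out the best evaluation point from these results without reading the list by eye, `max` with a key function is enough. The sketch below copies the values printed above into a plain list so that it runs on its own; in the notebook you would use `accuracy_results` directly.

```python
# Values copied from the output above: [step, test accuracy]
accuracy_results = [
    [50, 0.9820772051811218],
    [100, 0.9786075353622437],
    [150, 0.9763097435235977],
    [200, 0.98203125],
    [250, 0.983984375],
    [300, 0.9859834551811218],
    [350, 0.973828125],
    [400, 0.9684972435235977],
    [450, 0.9875459551811219],
    [500, 0.9875459551811219],
]

# On a tie, max() returns the first entry with the highest accuracy
best_step, best_accuracy = max(accuracy_results, key=lambda result: result[1])
print(f"Best accuracy {best_accuracy:.4f} at step {best_step}")
```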
