# Transfer Learning

Transfer Learning is a machine learning method where we **reuse a pre-trained model** as the starting point for a model **on a new task**.

The [`torchvision.models`](https://pytorch.org/vision/stable/models.html) includes models which were trained for different tasks:
- image classification
- pixelwise semantic segmentation
- object detection
- instance segmentation
- person keypoint detection
- video classification
- optical flow.

----------------
Example: 
```python
import torchvision.models as models
alexnet = models.alexnet() # constructs the model with random weights
alexnet = models.alexnet(pretrained=True) 
```
Image classification models were trained on ImageNet. Thus, the models expect:
- Input images: 3-channel RGB images in range [0, 1] and with shape (3 x H x W); where H & W >= 224.
- Images should be normalized with `
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])`

An example can be found [here](https://github.com/pytorch/examples/blob/42e5b996718797e45c46a25c55b031e6768f8440/imagenet/main.py#L89-L101).

------------------

Here we'll use transfer learning to train a network that can classify our cat and dog photos with near perfect accuracy.



In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

import os

Most of the pretrained models require the input to be 224x224 images. Also, we'll need to match the normalization used when the models were trained. Each color channel was normalized separately, the means are `[0.485, 0.456, 0.406]` and the standard deviations are `[0.229, 0.224, 0.225]`.

In [2]:
data_dir = r'E:\POSTDOC\PYTHON_CODES\DATASETS\dogs-vs-cats-kaggle'
train_dir = os.path.join(data_dir, 'train')
test_dir = os.path.join(data_dir, 'test1')

batch_size = 512

train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])

train_data = datasets.ImageFolder(train_dir, transform=train_transforms)
test_data = datasets.ImageFolder(test_dir, transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=batch_size)

print(f'number of batches with size {batch_size} in the training set is {len(trainloader)}')

number of batches with size 512 in the training set is 49


## Load a pretrained model

Note: the loaded checkpoint will be saved somewhere like this `C:\Users\ashkan/.cache\torch\checkpoints\densenet121-a639ec97.pth`

In [3]:
model = models.resnet18(pretrained=True)
print(model)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Co

This model has two main parts: the feature extractor and the classifier. 

<img src='assets/transfer_learning.png' width=600px>

For the above model, we need to replace `(fc): Linear(in_features=512, out_features=1000, bias=True)`.

In [4]:
# Freeze all parameters. Thus, backpropagation won't go through them.
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier part
model.fc = nn.Linear(in_features=512, out_features=2, bias=True) 

criterion = nn.CrossEntropyLoss()

# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.fc.parameters(), lr=0.003)

In [5]:
if torch.cuda.is_available():
    my_device = torch.device('cuda')
else:
    my_device = torch.device('cpu')
print('Device: {}'.format(my_device))

model.to(my_device);

Device: cuda


In [6]:
def train_loop(trainloader, model, criterion, optimizer, testloader = None):
    
    if testloader is not None:
        steps = 0
        print_every = 5 # Evaluate the model every 5 steps within each epoch.
        running_loss = 0
    
    total_loss = 0
    
    for images, labels in trainloader:
        
        images = images.to(my_device)
        labels = labels.to(my_device)
        
        optimizer.zero_grad()

        log_ps = model(images)
        loss = criterion(log_ps, labels)
        
        total_loss += loss.item()
        running_loss += loss.item()
        
        loss.backward()
        optimizer.step()
        
        # Additional step to print out the intermediate results 
        #  before finishing one epoch.
        if testloader is not None:
            if steps % print_every == 0:
                test_loss,test_acc = test_loop(testloader, model, criterion)

                print(f"---> Train loss: {running_loss/print_every:.3f}.. "
                      f"Test loss: {test_loss:.3f}.. "
                      f"Test accuracy: {test_acc:.3f}")
                
                running_loss = 0
            
    train_loss = total_loss / len(trainloader.dataset)
    
    return train_loss

def test_loop(testloader, model, criterion):
    tot_test_loss = 0
    test_correct = 0  # Number of correct predictions on the test set

    # Turn off gradients for validation, saves memory and computations
    with torch.no_grad():
        
        # set model to evaluation mode
        model.eval()
        
        for images, labels in testloader:
            
            images = images.to(my_device)
            labels = labels.to(my_device)
            
            log_ps = model(images)
            loss = criterion(log_ps, labels)
            tot_test_loss += loss.item()

            ps = torch.exp(log_ps)
            top_p, top_class = ps.topk(1, dim=1)
            equals = top_class == labels.view(*top_class.shape)
            test_correct += equals.sum().item()
    
    test_loss = tot_test_loss / len(testloader.dataset)
    test_acc = test_correct / len(testloader.dataset)
    
    # set model back to train mode
    model.train()
    
    return test_loss, test_acc

In [None]:
epochs = 1

train_losses, test_losses = [], []
    
for e in range(epochs):
    print(f'Epoch: {e+1}/{epochs}')
    
    train_loss = train_loop(trainloader, model, criterion, optimizer, testloader)
    
    test_loss,test_acc = test_loop(testloader, model, criterion)

    # Keep track of losses at the completion of epoch
    train_losses.append(train_loss)
    test_losses.append(test_loss)

    print("End of Epoch: {}/{}.. ".format(e+1, epochs),
          "Training Loss: {:.3f}.. ".format(train_loss),
          "Test Loss: {:.3f}.. ".format(test_loss),
          "Test Accuracy: {:.3f}".format(test_acc))

Epoch: 1/1
---> Train loss: 0.183.. Test loss: 0.000.. Test accuracy: 1.000
---> Train loss: 0.022.. Test loss: 0.000.. Test accuracy: 1.000
---> Train loss: 0.004.. Test loss: 0.000.. Test accuracy: 1.000
---> Train loss: 0.001.. Test loss: 0.000.. Test accuracy: 1.000
---> Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
---> Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
---> Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
---> Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
---> Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
---> Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
---> Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
---> Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
---> Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
---> Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
---> Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
---> Train los

**Exercise**

1. Split the train set into train and validation subsets, and use the validation set during training.
2. Use features extracted by the first layer (layer 1) and apply a FC layer as a classifier. How would be the accuracy?
3. Use features extracted by the third layer (layer 3) and apply a FC layer. How much the accuracy will change?
4. Use `nn.NLLLoss()` instead of `nn.CrossEntropyLoss()` as the criterion. What changes should be made?
5. Use other models such as `densenet121`:
```python
model = models.densenet121(pretrained=True)
# freeze parameters
# add classifier
model.classifier = nn.Sequential(nn.Linear(1024, 256),
                                 nn.ReLU(),
                                 nn.Dropout(0.2),
                                 nn.Linear(256, 2),
                                 nn.LogSoftmax(dim=1))
optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)
criterion = nn.NLLLoss()
```

6. Use part of the training set for training the model.

    You can use:

```python

import torch.utils.data as data_utils
import numpy as np

indices = np.random.randint(0, len(dataset), size=(500, 1))
dataset_500 = data_utils.Subset(dataset, indices)
    
```

    or:

```python

dataset_500 = torch.utils.data.Subset(dataset, np.random.choice(len(dataset), 500, replace=False))
    
```