# Transfer Learning

Transfer Learning is a machine learning method where we **reuse a pre-trained model** as the starting point for a model **on a new task**.

The [`torchvision.models`](https://pytorch.org/vision/stable/models.html) includes models which were trained for different tasks:
- image classification
- pixelwise semantic segmentation
- object detection
- instance segmentation
- person keypoint detection
- video classification
- optical flow.

----------------
Example: 
```python
import torchvision.models as models
alexnet = models.alexnet() # constructs the model with random weights
alexnet = models.alexnet(pretrained=True) 
```
Image classification models were trained on ImageNet. Thus, the models expect:
- Input images: 3-channel RGB images in range [0, 1] and with shape (3 x H x W); where H & W >= 224.
- Images should be normalized with `
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])`

An example can be found [here](https://github.com/pytorch/examples/blob/42e5b996718797e45c46a25c55b031e6768f8440/imagenet/main.py#L89-L101).

------------------

Here we'll use transfer learning to train a network that can classify our cat and dog photos with near perfect accuracy.



In [1]:
import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

import os

  from .autonotebook import tqdm as notebook_tqdm


## Prepare Data

Most of the pretrained models require the input to be 224x224 images. Also, we'll need to match the normalization used when the models were trained. Each color channel was normalized separately, the means are `[0.485, 0.456, 0.406]` and the standard deviations are `[0.229, 0.224, 0.225]`.

In [2]:
data_dir = r'E:\POSTDOC\PYTHON_CODES\DATASETS\dogs-vs-cats-kaggle'
train_dir = os.path.join(data_dir, 'train')

In [3]:
batch_size=512

train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),              # PIL image ===> NCHW [0,1]
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])

In [4]:
train_dataset = datasets.ImageFolder(train_dir, transform=train_transforms)
valid_dataset = datasets.ImageFolder(train_dir, transform=test_transforms)

In [5]:
print(f'Train dataset object size {len(train_dataset)}')
print(f'Validation dataset object size {len(train_dataset)}')

Train dataset object size 25000
Validation dataset object size 25000


In [6]:
N_data = len(train_dataset)
N_valid = int(0.2*N_data)
N_train = int(N_data - N_valid)

print(N_train)
print(N_valid)

20000
5000


In [7]:
import random
import numpy as np

In [8]:
def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # Sets the seed for generating random numbers.
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)

In [9]:
seed_value = 12345

set_seed(seed_value)
trainset, _ = torch.utils.data.random_split(train_dataset, [N_train, N_valid])
set_seed(seed_value)
_, validset = torch.utils.data.random_split(valid_dataset, [N_train, N_valid])

In order to have a shorter training time, let's reduce the number of samples:

In [10]:
from torch.utils.data import SequentialSampler

trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, sampler=SequentialSampler(range(4096)))
validloader = torch.utils.data.DataLoader(validset, batch_size=batch_size, sampler= SequentialSampler(range(1024)))

print(f'number of batches with size {batch_size} in the training set is {len(trainloader)}')
print(f'number of batches with size {batch_size} in the validation set is {len(validloader)}')

number of batches with size 512 in the training set is 8
number of batches with size 512 in the validation set is 2


Note that in this case, the dataset attribute of `DataLoader`\'s object cannot represent the number of samples used for training.

In [11]:
print(len(trainloader.dataset))
print(len(validloader.dataset))

20000
5000


In [12]:
print(len(trainloader.sampler))
print(len(validloader.sampler))

4096
1024


In [13]:
trainloader.batch_size

512

Note: Instead of this relatively tricky procedure in order to split training data into train and validation subsets, you can separate data outside of PyTorch's environment. Then, just load them using `datasets.ImageFolder`. For example, download the dataset from http://files.fast.ai/data/dogscats.zip. 

## Load a pretrained model

Note: the loaded checkpoint will be saved somewhere like this `C:\Users\ashkan/.cache\torch\checkpoints\densenet121-a639ec97.pth`

In [14]:
model = models.resnet18(pretrained=True)
print(model)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

This model has two main parts: the feature extractor and the classifier. 

<img src='assets/transfer_learning.png' width=600px>

For the above model, we need to replace `(fc): Linear(in_features=512, out_features=1000, bias=True)`.

## Prepare the model

In [15]:
# Freeze all parameters. Thus, backpropagation won't go through them.
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier part
model.fc = nn.Linear(in_features=512, out_features=2, bias=True) 

criterion = nn.CrossEntropyLoss()

# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.fc.parameters(), lr=0.003)

In [16]:
if torch.cuda.is_available():
    my_device = torch.device('cuda')
else:
    my_device = torch.device('cpu')
print('Device: {}'.format(my_device))

model.to(my_device);

Device: cuda


## Train the model

In [17]:
def train_loop(trainloader, model, criterion, optimizer, testloader = None):
    
    if testloader is not None:
        steps = 0
        print_every = 5 # Evaluate the model every 5 steps within each epoch.
        running_loss = 0
    
    total_loss = 0
    
    for images, labels in trainloader:
        
        images = images.to(my_device)
        labels = labels.to(my_device)
        
        optimizer.zero_grad()  

        log_ps = model(images)
        loss = criterion(log_ps, labels)
        
        total_loss += loss.item()
        running_loss += loss.item()
        
        loss.backward()
        optimizer.step()
        
        # Additional step to print out the intermediate results 
        #  before finishing one epoch.
        if testloader is not None:
            if steps % print_every == 0:
                test_loss,test_acc = test_loop(testloader, model, criterion)

                print(f"---> Train loss: {running_loss/(print_every*trainloader.batch_size):.6f}.. "
                      f"Test loss: {test_loss:.6f}.. "
                      f"Test accuracy: {test_acc:.3f}")
                
                running_loss = 0
            
#     train_loss = total_loss / len(trainloader.dataset)
    train_loss = total_loss / len(trainloader.sampler)
    
    return train_loss

def test_loop(testloader, model, criterion):
    tot_test_loss = 0
    test_correct = 0  # Number of correct predictions on the test set

    # Turn off gradients for validation, saves memory and computations
    with torch.no_grad():
        
        # set model to evaluation mode
        model.eval()
        
        for images, labels in testloader:
            
            images = images.to(my_device)
            labels = labels.to(my_device)
            
            log_ps = model(images)
            loss = criterion(log_ps, labels)
            tot_test_loss += loss.item()

            ps = torch.exp(log_ps)
            top_p, top_class = ps.topk(1, dim=1)
            equals = top_class == labels.view(*top_class.shape)
            test_correct += equals.sum().item()
    
    # test_loss = tot_test_loss / len(testloader.dataset)
    # test_acc = test_correct / len(testloader.dataset)
    test_loss = tot_test_loss / len(testloader.sampler)
    test_acc = test_correct / len(testloader.sampler)
    
    # set model back to train mode
    model.train()
    
    return test_loss, test_acc

In [18]:
epochs = 2

train_losses, test_losses = [], []
    
for e in range(epochs):
    print(f'Epoch: {e+1}/{epochs}')
    
    train_loss = train_loop(trainloader, model, criterion, optimizer, validloader)
    
    valid_loss,valid_acc = test_loop(validloader, model, criterion)

    # Keep track of losses at the completion of epoch
    train_losses.append(train_loss)
    test_losses.append(valid_loss)

    print("End of Epoch: {}/{}.. ".format(e+1, epochs),
          "Training Loss: {:.6f}.. ".format(train_loss),
          "Valid Loss: {:.6f}.. ".format(valid_loss),
          "Valid Accuracy: {:.3f}".format(valid_acc))

Epoch: 1/2
---> Train loss: 0.000272.. Test loss: 0.000962.. Test accuracy: 0.820
---> Train loss: 0.000212.. Test loss: 0.000746.. Test accuracy: 0.846
---> Train loss: 0.000188.. Test loss: 0.000553.. Test accuracy: 0.936
---> Train loss: 0.000155.. Test loss: 0.000478.. Test accuracy: 0.946
---> Train loss: 0.000139.. Test loss: 0.000388.. Test accuracy: 0.950
---> Train loss: 0.000122.. Test loss: 0.000344.. Test accuracy: 0.956
---> Train loss: 0.000111.. Test loss: 0.000327.. Test accuracy: 0.954
---> Train loss: 0.000102.. Test loss: 0.000294.. Test accuracy: 0.957
End of Epoch: 1/2..  Training Loss: 0.000813..  Valid Loss: 0.000294..  Valid Accuracy: 0.957
Epoch: 2/2
---> Train loss: 0.000111.. Test loss: 0.000264.. Test accuracy: 0.957
---> Train loss: 0.000096.. Test loss: 0.000252.. Test accuracy: 0.958
---> Train loss: 0.000084.. Test loss: 0.000243.. Test accuracy: 0.958
---> Train loss: 0.000091.. Test loss: 0.000236.. Test accuracy: 0.956
---> Train loss: 0.000089.. Test

## Test your model on Kaggle's test set and create a submission file

As you know, for this dataset, you do not have a separate test set with labels. You have test data inside a folder named `test1`. For these images, labels are not provided. 

In [19]:
from torch.utils.data import Dataset, DataLoader, ConcatDataset
from PIL import Image

class DataWithNoLabels(Dataset):
    def __init__(self, file_list, dirpath , transform = None):
        self.file_list = file_list
        self.dir = dirpath
        self.transform = transform
            
    def __len__(self):
        return len(self.file_list)
    
    def __getitem__(self, idx):
        img = Image.open(os.path.join(self.dir, self.file_list[idx])) # PIL image [0, 255] - H*W*3
        if self.transform:
            img = self.transform(img)
            img = img.numpy()
            return img.astype('float32'), self.file_list[idx]

In [20]:
test_dir = os.path.join(data_dir, 'test1')
test_files = os.listdir(test_dir)

print(len(test_files))

testset = DataWithNoLabels(test_files, test_dir, transform = test_transforms)
print(len(testset))

12500
12500


In [21]:
testloader = DataLoader(testset, batch_size = 256, shuffle=False, 
                        sampler=SequentialSampler(range(1024))) # Don't use it in practice.

In [22]:
# from tqdm.notebook import tqdm
import pandas as pd

model.eval()

fn_list = []    # file names are stored in this list
pred_list = []  # outputs

for x, fn in testloader:  # A Pytorch tensor (x) contains all data in a batch and a list (fn) contains associated file names.
    model.eval()
    with torch.no_grad():
        
        x = x.to(my_device)
        
        output = model(x)
        
        pred = torch.argmax(output, dim=1)
        
        fn_list += [n[:-4] for n in fn]        # remove ".jpg" from file names
        pred_list += [p.item() for p in pred]  # pred is a PyTorch tensor which contains a list 

model.train()

submission = pd.DataFrame({"id":fn_list, "label":pred_list})
submission.to_csv('preds_resnet18.csv', index=False)

In [23]:
print(len(fn))

256


In [24]:
x.shape

torch.Size([256, 3, 224, 224])

In [25]:
with torch.no_grad():
    output = model(x)
    pred = torch.argmax(output, dim=1)
    print(pred)

tensor([0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0,
        1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1,
        0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1,
        0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
        0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0,
        1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0,
        0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0,
        0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1,
        0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1,
        0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1,
        1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1], device='cuda:0')


**Exercise**

1. Split the train set into train and validation subsets, and use the validation set during training.
2. Use features extracted by the first layer (layer 1) and apply a FC layer as a classifier. How would be the accuracy?
3. Use features extracted by the third layer (layer 3) and apply a FC layer. How much the accuracy will change?
4. Use `nn.NLLLoss()` instead of `nn.CrossEntropyLoss()` as the criterion. What changes should be made?
5. Use other models such as `densenet121`:
```python
model = models.densenet121(pretrained=True)
# freeze parameters
# add classifier
model.classifier = nn.Sequential(nn.Linear(1024, 256),
                                 nn.ReLU(),
                                 nn.Dropout(0.2),
                                 nn.Linear(256, 2),
                                 nn.LogSoftmax(dim=1))
optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)
criterion = nn.NLLLoss()
```

6. Use part of the training set for training the model.

    You can use:

```python

import torch.utils.data as data_utils
import numpy as np

indices = np.random.randint(0, len(dataset), size=(500, 1))
dataset_500 = data_utils.Subset(dataset, indices)
    
```

    or:

```python

dataset_500 = torch.utils.data.Subset(dataset, np.random.choice(len(dataset), 500, replace=False))
    
```

In [26]:
# 
# testset = datasets.ImageFolder(test_dir, transform=test_transforms)  # Note: test set is not divided into separate folders