# IE4424 Part2_Transfer_Learning


## Acknowledgment 

This lab experiment is adapted from the official PyTorch tutorial.

Acknowledgment to Sasank Chilamkurthy https://chsasank.github.io

In this part, you will learn how to train a convolutional neural network for
image classification using transfer learning. You can read more about transfer
learning in the cs231n notes: https://cs231n.github.io/transfer-learning/


In [None]:
%matplotlib inline

In [None]:
from __future__ import print_function, division

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy

plt.ion()   # interactive mode

## 1. Loading data


We will use torchvision and torch.utils.data packages for loading the
data.

The problem we're going to solve today is to train a model to classify
**ants** and **bees**. We have about 120 training images each for ants and bees.
There are 75 validation images for each class. A dataset this small is
usually not enough for a model trained from scratch to generalize well.
Because we are using transfer learning, we should still be able to
generalize reasonably well.

This dataset is a very small subset of ImageNet.
The dataset should already be included in the working directory. If it is not, download it from https://download.pytorch.org/tutorial/hymenoptera_data.zip and extract it to the current directory.
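
``datasets.ImageFolder``, used in the next cell, expects one subdirectory per class under each split, for example ``hymenoptera_data/train/ants`` and ``hymenoptera_data/train/bees``. As a quick sanity check after extracting the archive, the sketch below counts the image files per class using only the standard library. The helper name ``count_images`` and the list of file extensions are assumptions made for illustration; they are not part of the original lab.

```python
import os

# Assumed set of image extensions; adjust if your copy of the data differs.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')

def count_images(data_dir, splits=('train', 'val')):
    """Return {split: {class_name: number_of_image_files}} for an ImageFolder-style tree."""
    counts = {}
    for split in splits:
        split_dir = os.path.join(data_dir, split)
        if not os.path.isdir(split_dir):
            continue  # split is missing, e.g. the data has not been extracted yet
        counts[split] = {}
        for class_name in sorted(os.listdir(split_dir)):
            class_dir = os.path.join(split_dir, class_name)
            if not os.path.isdir(class_dir):
                continue  # ignore stray files next to the class folders
            files = [f for f in os.listdir(class_dir)
                     if f.lower().endswith(IMAGE_EXTENSIONS)]
            counts[split][class_name] = len(files)
    return counts

print(count_images('hymenoptera_data'))
```

An empty dictionary means that neither the ``train`` nor the ``val`` folder was found under the given directory.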


### 1.1 Loading data and defining PyTorch dataloaders

In [None]:
# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=0)
              for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

### 1.2 Visualizing a few training images to better understand data augmentation
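
The ``imshow`` helper below has to undo the ``transforms.Normalize`` step before plotting, because ``Normalize`` maps each channel value ``x`` to ``(x - mean) / std``. A minimal pure-Python sketch of that round trip for a single RGB pixel follows. The function names ``normalize`` and ``denormalize`` are illustrative and are not torchvision functions.

```python
MEAN = [0.485, 0.456, 0.406]  # per-channel means used in this notebook
STD = [0.229, 0.224, 0.225]   # per-channel standard deviations used in this notebook

def normalize(pixel):
    """Apply (x - mean) / std to each channel of an RGB pixel with values in [0, 1]."""
    return [(x - m) / s for x, m, s in zip(pixel, MEAN, STD)]

def denormalize(pixel):
    """Invert normalize() with std * x + mean, then clip to the displayable range [0, 1]."""
    return [min(1.0, max(0.0, s * x + m)) for x, m, s in zip(pixel, MEAN, STD)]

original = [0.2, 0.5, 0.9]
restored = denormalize(normalize(original))
print(restored)  # matches `original` up to floating-point rounding
```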


In [None]:
def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

imshow(out, title=[class_names[x] for x in classes])

## 2. Training the model


Now, let's write a general function to train a model. Here, we will
illustrate:

-  Scheduling the learning rate
-  Saving the best model

In the following, parameter ``scheduler`` is an LR scheduler object from
``torch.optim.lr_scheduler``.
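
As a rough illustration of what a step schedule does, the sketch below reproduces the rule behind ``StepLR`` in plain Python: the learning rate is multiplied by ``gamma`` once every ``step_size`` epochs. The values ``lr=0.001``, ``step_size=7`` and ``gamma=0.1`` are the ones used later in this notebook. The helper name ``step_lr`` is an assumption made for illustration and is not a PyTorch function.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in effect during `epoch` (0-indexed) under a step schedule."""
    return base_lr * gamma ** (epoch // step_size)

for epoch in (0, 6, 7, 13, 14):
    print('epoch {:2d}: lr = {:.6f}'.format(epoch, step_lr(0.001, epoch)))
```

With ``num_epochs=5``, as in the training cells of this lab, the first decay at epoch 7 is never reached, so the learning rate stays at its initial value throughout.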



### 2.1 Defining function to train the model

In [None]:
def train_model(model, criterion, optimizer, scheduler, num_epochs=24):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        t1 = time.time()

        # Use the GPU when one is available and move the model there (2024 modification)
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model = model.to(device)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        t2 = time.time()
        print('Time:' + str(t2 - t1))
        print()
        

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

### 2.2 Defining function to visualize the model prediction



In [None]:
def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images//2, 2, images_so_far)
                ax.axis('off')
                ax.set_title('predicted: {}'.format(class_names[preds[j]]))
                imshow(inputs.cpu().data[j])

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)

## 3. Convolutional Network (ConvNet) as fixed feature extractor

We need to freeze all the network parameters except those of the final layer.
To do so, set ``requires_grad = False`` on the parameters to freeze so that
their gradients are not computed in ``backward()``.

You can read more about this in the documentation
here https://pytorch.org/docs/notes/autograd.html#excluding-subgraphs-from-backward
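
A minimal sketch of the freezing pattern is given here on a tiny toy network instead of ResNet18, so that it runs without downloading any weights. The class ``ToyNet``, its layer sizes and the helper ``count_trainable`` are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class ToyNet(nn.Module):
    """Tiny stand-in for a pretrained network with a final `fc` layer."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 4)   # 8*4 + 4 = 36 parameters
        self.fc = nn.Linear(4, 3)         # 4*3 + 3 = 15 parameters

    def forward(self, x):
        return self.fc(torch.relu(self.backbone(x)))

def count_trainable(model):
    """Number of parameter elements that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

toy = ToyNet()
trainable_before = count_trainable(toy)   # 36 + 15 = 51

# Freeze every existing parameter.
for param in toy.parameters():
    param.requires_grad = False

# A newly constructed layer has requires_grad=True by default,
# so after replacing `fc` only the new layer is trainable.
toy.fc = nn.Linear(4, 2)                  # 4*2 + 2 = 10 parameters
trainable_after = count_trainable(toy)

# One backward pass: the frozen layer gets no gradient, the new layer does.
loss = toy(torch.randn(5, 8)).sum()
loss.backward()
print(trainable_before, trainable_after)
print(toy.backbone.weight.grad is None, toy.fc.weight.grad is not None)
```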


### 3.1 Loading pretrained model and defining new classifier layer

In [None]:
model_resnet18 = torchvision.models.resnet18(pretrained=True)
for param in model_resnet18.parameters():
    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_resnet18.fc.in_features
model_resnet18.fc = nn.Linear(num_ftrs, 2)

model_resnet18 = model_resnet18.to(device)

criterion = nn.CrossEntropyLoss()

# Only the parameters of the final layer are passed to the optimizer,
# unlike fine-tuning, where all parameters would be optimized.
optimizer_conv = optim.SGD(model_resnet18.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)

### 3.2 Printing the modified model

In [None]:
print(model_resnet18)

## Exercise 2.1 Visualizing network structure using torchsummary

Visualize the network structure and complete Exercise 2.1 in the answer sheet.
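
The ``Param #`` column of the summary can be checked by hand. A convolution layer has ``out_channels * in_channels * kernel_h * kernel_w`` weights, plus one bias per output channel if a bias is used. A linear layer has ``in_features * out_features`` weights plus ``out_features`` biases. The sketch below applies these formulas to two layers; the helper names are illustrative, and the layer shapes (a first convolution from 3 to 64 channels with a 7x7 kernel and no bias, and a two-class ``fc`` layer with 512 inputs) should be compared against the model printed above.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Parameter count of a 2D convolution with a square kernel."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)

def linear_params(in_features, out_features, bias=True):
    """Parameter count of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

print(conv2d_params(3, 64, 7, bias=False))  # first convolution layer
print(linear_params(512, 2))                # new two-class fc layer
```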

In [None]:
from torchsummary import summary
summary(model_resnet18, input_size=(3, 224, 224)) # Here the input size is channel x height x width

### 3.3 Training using train data and evaluating using validation data

On CPU this takes roughly half the time of fine-tuning the whole network,
because gradients do not need to be computed for most of the network.
The forward pass, however, still has to be computed.


In [None]:
model_resnet18 = train_model(model_resnet18, criterion, optimizer_conv,
                         exp_lr_scheduler, num_epochs=5)

### 3.4 Visualizing predictions

In [None]:
visualize_model(model_resnet18)

plt.ioff()
plt.show()

## Exercise 2.2 Study of using VGG as fixed feature extractor
We have shown how to use a ResNet18 network as a feature extractor. Here, you will complete the missing code that uses a VGG network as a feature extractor.

In ResNet18, the classification layer is called ``fc``. In VGG, the fully connected layers are grouped in an ``nn.Sequential`` container named ``classifier``, so you can use an index to access a specific layer inside it.
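
To illustrate the indexing idea, here is a sketch on a toy ``nn.Sequential`` (not VGG). Negative indices work as with Python lists, and assigning to an index swaps the layer in place. The variable name ``head`` and the layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# Toy classifier head; the sizes are arbitrary and unrelated to VGG.
head = nn.Sequential(
    nn.Linear(16, 8),
    nn.ReLU(),
    nn.Linear(8, 10),
)

last = head[-1]                       # index into the Sequential like a list
print(type(last).__name__, last.in_features, last.out_features)

# Replace the last layer with one that outputs 2 classes,
# reusing the input width of the layer being replaced.
head[-1] = nn.Linear(last.in_features, 2)

out = head(torch.randn(4, 16))        # batch of 4 random inputs
print(out.shape)
```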


### E2.2.1 Loading pretrained VGG11 model

In [None]:
model_vgg11 = models.vgg11(pretrained=True)
model_vgg11 = model_vgg11.to(device)  # move to the GPU when available (2024 modification)
# Print the original VGG11
print(model_vgg11)

### E2.2.2 Using torchsummary to show the original model

In [None]:
summary(model_vgg11, input_size=(3, 224, 224)) # Here the input size is channel x height x width

### E2.2.3 Modifying the VGG11 network (To do)

In [None]:
for param in model_vgg11.parameters():
    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default

num_ftrs = model_vgg11.classifier[-1].in_features
# To do
# Hint: replace the last linear (fc) layer with a new classifier layer that has the correct input and output dimensions


model_vgg11 = model_vgg11.to(device)

criterion = nn.CrossEntropyLoss()

# Only the parameters of the final layer should be optimized.
# To do
# Define the optimizer and pass it only the parameters to be trained.
# Use lr=0.001, momentum=0.9.
# Name it optimizer_conv: the scheduler and training cells below use that name.


# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)

### E2.2.4 Printing the modified network

In [None]:
# Print the modified VGG11
print(model_vgg11)

### E2.2.5 Using torchsummary to show the modified network structure

In [None]:
summary(model_vgg11, input_size=(3, 224, 224)) # Here the input size is channel x height x width

### E2.2.6 Training the modified network

In [None]:
model_vgg11 = train_model(model_vgg11, criterion, optimizer_conv,
                         exp_lr_scheduler, num_epochs=5)

### E2.2.7 Visualizing the predictions

In [None]:
visualize_model(model_vgg11)

plt.ioff()
plt.show()