# Machine Learning Assignment 4
## Applied Deep Learning
### 1 CNN for Fish Species Classification
Note: data is from the Kaggle Nature Conservancy Fisheries Monitoring challenge (https://www.kaggle.com/c/the-nature-conservancy-fisheries-monitoring)

#### 1.1 Model Training
##### 1.1.1 Tasks

In [1]:
# reference: https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
from __future__ import print_function 
from __future__ import division
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy
print("PyTorch Version: ",torch.__version__)
print("Torchvision Version: ",torchvision.__version__)

PyTorch Version:  1.4.0
Torchvision Version:  0.5.0


In [2]:
# Top level data directory. Here we assume the format of the directory conforms 
#   to the ImageFolder structure
data_dir = "Part1_images"

# Models to choose from [resnet, alexnet, vgg, squeezenet, densenet, inception]
model_name = "resnet"

# Number of classes in the dataset
num_classes = 3

(a) Training a Resnet18 model from scratch:

The code below shows how to train the ResNet18 model from scratch, using functions that are defined later in the notebook.
It is not run here, because those functions (``initialize_model`` and ``train_model``) have not been defined yet at this point.

Initialize the scratch version of the model: for training from scratch, we set the parameters ``feature_extract=False`` and ``use_pretrained=False``.

``scratch_model,_ = initialize_model(model_name, num_classes, feature_extract=False, use_pretrained=False)
scratch_model = scratch_model.to(device)
scratch_optimizer = optim.SGD(scratch_model.parameters(), lr=LR, momentum=Mom)
scratch_criterion = nn.CrossEntropyLoss()
_,scratch_hist = train_model(scratch_model, dataloaders_dict, scratch_criterion, scratch_optimizer, num_epochs=num_epochs, is_inception=(model_name=="inception"))``

(b) Fine-tuning all layers of the ImageNet pre-trained Resnet18 model:

Similarly, the sample below is not run here:

Initialize the fine-tuning version of the model: for fine-tuning the pre-trained model, we set the parameters ``feature_extract=False`` and ``use_pretrained=True``.

``finetuning_model,_ = initialize_model(model_name, num_classes, feature_extract=False, use_pretrained=True)
finetuning_model = finetuning_model.to(device)
finetuning_optimizer = optim.SGD(finetuning_model.parameters(), lr=LR, momentum=Mom)
finetuning_criterion = nn.CrossEntropyLoss()
_,finetuning_hist = train_model(finetuning_model, dataloaders_dict, finetuning_criterion, finetuning_optimizer, num_epochs=num_epochs, is_inception=(model_name=="inception"))``

(c) Using the ImageNet pre-trained Resnet18 model as a fixed feature extractor:

Similarly, the sample below is not run here:

Initialize the feature-extraction version of the model: for feature extraction, we set the parameters ``feature_extract=True`` and ``use_pretrained=True``.

``fe_model,_ = initialize_model(model_name, num_classes, feature_extract=True, use_pretrained=True)
fe_model = fe_model.to(device)
fe_optimizer = optim.SGD(fe_model.parameters(), lr=LR, momentum=Mom)
fe_criterion = nn.CrossEntropyLoss()
_,fe_hist = train_model(fe_model, dataloaders_dict, fe_criterion, fe_optimizer, num_epochs=num_epochs, is_inception=(model_name=="inception"))``

##### Before looking at the result, can you speculate how the training will be different among these three ways? Which way do you think is the most appropriate given this specific fishing dataset, and why?

As mentioned above, for a ResNet18 model trained from scratch, we do not load the pre-trained weights and set ``feature_extract`` to False; for fine-tuning, we load the pre-trained weights but keep ``feature_extract`` set to False; for feature extraction, we load the pre-trained weights and set ``feature_extract`` to True. The ``feature_extract`` flag controls the ``.requires_grad`` attribute of the model parameters, which is True by default for every parameter.

When feature extracting, we only want to update the parameters of the last layer, and we do not need to compute gradients for parameters that will not change, so for efficiency we set their ``.requires_grad`` attribute to False. In other words, if ``feature_extract=True``, we set every parameter's ``.requires_grad`` attribute to False by running the ``set_parameter_requires_grad`` function; the new final layer created afterwards has ``.requires_grad=True`` by default. When fine-tuning, we can leave every ``.requires_grad`` at the default of True.
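The effect of freezing can be checked by counting trainable parameters. This is a minimal sketch, assuming a tiny stand-in network instead of ResNet18 so that no weights need to be downloaded; the names ``toy_model`` and ``count_trainable`` are illustrative and do not appear elsewhere in this notebook.

```python
import torch.nn as nn

def count_trainable(model):
    """Number of scalar parameters that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Stand-in for a pretrained network: one "backbone" layer plus a classifier head.
toy_model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
total_before = count_trainable(toy_model)  # everything is trainable by default

# Feature extraction: freeze every existing parameter...
for param in toy_model.parameters():
    param.requires_grad = False

# ...then replace the head. A freshly created layer has requires_grad=True.
toy_model[2] = nn.Linear(4, 3)
total_after = count_trainable(toy_model)

print(total_before, total_after)
```

Only the parameters of the replaced head remain trainable after freezing, which is the same pattern ``initialize_model`` applies to ``model_ft.fc``.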

##### 1.1.2 Model Training/Fine-tuning:
Define the training and validation function:

In [3]:
# initiate num_epochs and other parameters. Will be changed later accordingly
def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False): 
    since = time.time()

    val_acc_history = []
    
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    #   mode we calculate the loss by summing the final output and the auxiliary output
                    #   but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4*loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history
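The statistics step in ``train_model`` multiplies each batch's mean loss by the batch size before dividing by the dataset size. With the default mean reduction of ``nn.CrossEntropyLoss``, this gives a per-sample average even when the last batch is smaller. A rough sketch of that arithmetic, using made-up loss values and a hypothetical helper ``epoch_average``:

```python
def epoch_average(batch_means, batch_sizes):
    """Per-sample average from per-batch mean losses, as in train_model."""
    running = 0.0
    for mean_loss, size in zip(batch_means, batch_sizes):
        running += mean_loss * size  # undo the per-batch averaging
    return running / sum(batch_sizes)

# Illustrative values only: two full batches of 8 and a final batch of 2.
batch_means = [1.0, 2.0, 4.0]
batch_sizes = [8, 8, 2]

weighted = epoch_average(batch_means, batch_sizes)
naive = sum(batch_means) / len(batch_means)  # over-weights the small batch
print(weighted, naive)
```

Averaging the batch means directly would give the two samples in the last batch as much influence as eight samples in a full batch.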

The helper below switches between feature-extraction mode and the scratch/fine-tuning modes: ``.requires_grad`` stays True for the scratch and fine-tuning models, and is set to False for the feature-extraction model.

In [4]:
def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

Initialize and Reshape the Networks:

In [5]:
def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):
    # Initialize these variables which will be set in this if statement. Each of these
    #   variables is model specific.
    model_ft = None
    input_size = 0

    if model_name == "resnet":
        """ Resnet18
        """
        model_ft = models.resnet18(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 224
        
    else:
        print("Invalid model name, exiting...")
        exit()
        
    return model_ft, input_size

# Initialize the scratch model:
feature_extract = False
scratch_model, input_size = initialize_model(model_name, num_classes, feature_extract, use_pretrained=False)

# Print the model we just instantiated
print(scratch_model)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

Detect whether a GPU is available; otherwise use the CPU:

In [6]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

cpu


(b) Appropriate data transformation and augmentation:


In [7]:
# Data augmentation and normalization for training:
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(input_size),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
print("Initializing Datasets and Dataloaders...")

Initializing Datasets and Dataloaders...
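To make the normalization step concrete: ``ToTensor`` scales 8-bit pixel values to the range 0 to 1, and ``Normalize`` then subtracts the per-channel mean and divides by the per-channel standard deviation. The sketch below reproduces that arithmetic for a single pixel in plain Python; ``normalize_pixel`` is a hypothetical helper, not a torchvision function.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb_uint8):
    """Scale 0-255 values to 0-1, then apply per-channel (x - mean) / std."""
    scaled = [v / 255.0 for v in rgb_uint8]
    return [(s - m) / sd for s, m, sd in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]

black = normalize_pixel([0, 0, 0])        # lowest value each channel can take
white = normalize_pixel([255, 255, 255])  # highest value each channel can take
print(black, white)
```

The same mean and standard deviation are used for both the training and validation pipelines, so the pre-trained weights see inputs on the scale they were trained with.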


(c) Loss function and optimizer of your choice:

The loss function is cross-entropy, set up below. The ``train_model`` function defined earlier trains for the specified number of epochs and runs a full validation step after each epoch. It also keeps track of the best-performing model (in terms of validation accuracy) and returns it at the end of training. After each epoch, the training and validation losses and accuracies are printed.

In [8]:
# Setup the loss fxn
criterion = nn.CrossEntropyLoss()
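``nn.CrossEntropyLoss`` takes raw class scores (logits) and returns the negative log-probability of the correct class under a softmax. As a sketch of what that means for a single sample, the plain-Python function below (``cross_entropy`` is an illustrative name) computes the same quantity by hand:

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]

# Three classes, as in this dataset; the logits are illustrative only.
uniform_loss = cross_entropy([0.0, 0.0, 0.0], target=1)
confident_loss = cross_entropy([5.0, 0.0, 0.0], target=0)
print(uniform_loss, confident_loss)
```

With three classes, a model that assigns equal probability to every class has a loss of log(3), roughly 1.0986, which is a useful reference point when reading the per-epoch losses.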

(d) Training policy of your choice, including number of epochs, batch size, learning
rate, and momentum:

In [9]:
# Batch size for training (change depending on how much memory you have)
batch_size = 8

# Number of epochs to train for 
num_epochs = 15

# Learning rate:
LR = 0.01

# Momentum:
Mom = 0.9
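To illustrate how the learning rate and momentum interact, here is a minimal sketch of the SGD-with-momentum update on a single scalar parameter, assuming zero dampening and no Nesterov correction (the defaults of ``optim.SGD``). The helper name ``sgd_momentum_steps`` and the constant gradient are illustrative.

```python
def sgd_momentum_steps(param, grads, lr=0.01, momentum=0.9):
    """Apply v <- momentum * v + g, then p <- p - lr * v, for each gradient."""
    velocity = 0.0
    for g in grads:
        velocity = momentum * velocity + g
        param = param - lr * velocity
    return param

# Illustrative: a constant gradient of 1.0 applied three times.
final = sgd_momentum_steps(1.0, [1.0, 1.0, 1.0])
print(final)
```

With a steady gradient the velocity grows from 1.0 to 1.9 to 2.71, so later steps are larger than the plain ``lr * gradient`` step; this is why a momentum of 0.9 can speed up training at a fixed learning rate.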


In [10]:
# Create training and validation datasets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
# Create training and validation dataloaders
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True, num_workers=4) for x in ['train', 'val']}


In [None]:
# Send the model to the selected device (CPU in this run)
scratch_model = scratch_model.to(device)
            
scratch_optimizer = optim.SGD(scratch_model.parameters(), lr=LR, momentum=Mom)
scratch_criterion = nn.CrossEntropyLoss()
_,scratch_hist = train_model(scratch_model, dataloaders_dict, scratch_criterion, scratch_optimizer, num_epochs=num_epochs, is_inception=(model_name=="inception"))

# Gather the parameters to be optimized/updated in this run. If we are
#  finetuning we will be updating all parameters. However, if we are 
#  doing feature extract method, we will only update the parameters
#  that we have just initialized, i.e. the parameters with requires_grad
#  is True.
params_to_update = scratch_model.parameters()
print("Params to learn:")
if feature_extract:
    params_to_update = []
    for name,param in scratch_model.named_parameters():
        if param.requires_grad == True:
            params_to_update.append(param)
            print("\t",name)
else:
    for name,param in scratch_model.named_parameters():
        if param.requires_grad == True:
            print("\t",name)

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(params_to_update, lr=LR, momentum=Mom)

Epoch 0/14
----------
train Loss: 1.4540 Acc: 0.3927
val Loss: 1.0243 Acc: 0.4950

Epoch 1/14
----------


#### 1.2 Result 

##### (a) Generate a plot of training and validation loss curves for each of the three training cases from Section 2.1. Mark in the plots the number of epoch where you select as the best model(s), and explain your choice.
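One way to pick the epoch to mark is to apply the same rule ``train_model`` uses for its best weights: the first epoch that reaches the highest validation accuracy. The sketch below assumes a list like the ``val_acc_history`` returned by ``train_model``; ``best_epoch`` is a hypothetical helper, and the history shown is made up for illustration, not a result from this notebook.

```python
def best_epoch(val_accuracies):
    """Index of the first epoch reaching the highest validation accuracy."""
    best_idx, best_acc = 0, 0.0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best_acc:  # strict '>' keeps the earliest best epoch
            best_idx, best_acc = epoch, acc
    return best_idx

# Made-up history for illustration only.
example_history = [0.50, 0.62, 0.71, 0.71, 0.68]
print(best_epoch(example_history))
```

Note that ``train_model`` as written records only validation accuracy; plotting loss curves would additionally require storing the per-epoch training and validation losses.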

In [None]:
# the pretrained fine-tuning model:
