## How transfer learning works:
<b>Part 1:</b> Train the Backbone
1. Train a CNN (for example, ResNet) on a large-scale dataset like ImageNet, where it learns to produce useful feature maps.
    - As the image passes through the convolutional layers, its spatial dimensions shrink and the number of channels grows.
2. Pool the final feature maps, flatten them, and feed the result into fully connected layers that predict the 1000 ImageNet categories.

<b>Part 2:</b> Use the backbone to form the final model
1. Replace the last layer, which predicts the 1000 ImageNet categories, with a new layer that has as many output neurons as your task needs.
2. Optionally freeze the weights of the convolutional layers. If you do, this is called the feature extraction method; if you leave them trainable, the whole network is fine-tuned.
3. Retrain on your data so the new fully connected layer learns your task.


In [4]:
# Imports

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
import torchvision
from torchvision import datasets, models, transforms
import time
import os
import copy

plt.ion()   # turn on matplotlib's interactive mode so figures update without blocking

device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # use gpu if available

## Data Augmentation Techniques

- transforms.RandomResizedCrop(224):
    - Crops a random region of the original image (by default 8% to 100% of its area, with an aspect ratio between 3/4 and 4/3) and resizes it to 224x224
- transforms.RandomHorizontalFlip():
    - Flips the image horizontally 50% of the time
- transforms.ToTensor():
    - Converts the image (H x W x C, values in [0, 255]) into a float tensor (C x H x W, values in [0.0, 1.0])
- transforms.Normalize():
    - Subtracts the per-channel mean and divides by the per-channel standard deviation. The values used here are the ImageNet statistics, which match what the pretrained backbone saw during training.
    - 2 main benefits:
        - Zero-centering: Subtracting the mean of the pixel values makes the mean of the data approximately zero. This helps ensure that the network doesn't learn spurious biases based on the overall brightness or color of the images in the dataset.
        - Scaling: Dividing by the standard deviation gives the pixel values a roughly consistent magnitude across channels. This can help training converge faster and be more numerically stable.




In [37]:
mean = np.array([0.485, 0.456, 0.406]) # per-channel (RGB) mean of the ImageNet dataset
std = np.array([0.229, 0.224, 0.225])  # per-channel (RGB) std of the ImageNet dataset

dataTransforms = {
    'train': 
        transforms.Compose([
        transforms.RandomResizedCrop(224),          # random crop, resized to 224x224
        transforms.RandomHorizontalFlip(),          # flip image horizontally
        transforms.ToTensor(),                      # convert image to tensor
        transforms.Normalize(mean, std)]),          # normalize image
        
    'val': 
        transforms.Compose([
        transforms.Resize(256),                     # resize shorter side to 256 (keeps aspect ratio)
        transforms.CenterCrop(224),                 # crop image to 224x224
        transforms.ToTensor(),                      # convert image to tensor
        transforms.Normalize(mean, std)]),          # normalize image
    
    'test': 
        transforms.Compose([
        transforms.Resize(256),                     # resize shorter side to 256 (keeps aspect ratio)
        transforms.CenterCrop(224),                 # crop image to 224x224
        transforms.ToTensor(),                      # convert image to tensor
        transforms.Normalize(mean, std)]),          # normalize image
}

In [39]:
data_dir = 'dataset'
sets = ['train', 'val', 'test']

Image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), dataTransforms[x]) for x in sets} # create datasets

print(Image_datasets['train']) # Check if it worked

Dataset ImageFolder
    Number of datapoints: 600
    Root location: dataset\train
    StandardTransform
Transform: Compose(
               RandomResizedCrop(size=(224, 224), scale=(0.08, 1.0), ratio=(0.75, 1.3333), interpolation=bilinear, antialias=warn)
               RandomHorizontalFlip(p=0.5)
               ToTensor()
               Normalize(mean=[0.485 0.456 0.406], std=[0.229 0.224 0.225])
           )
