# TC 5033
## Deep Learning
## Transfer Learning

<br>

#### Activity 2c: Exploring Transfer Learning with CIFAR-10
<br>

- Objective:

    In this activity, you'll study the concept of Transfer Learning, a powerful technique to improve the performance of your models by leveraging pre-trained architectures. The provided notebook offers a complete solution using a specific pre-trained model on the CIFAR-10 dataset. Your task is to extend this by trying out two other pre-trained models.
    
- Instructions:

    This activity should be submitted in the same format as previous activities. Remember to include the names of all team members in a markdown cell at the beginning of the notebook. The grade obtained in this notebook will be averaged with that of Activity 2b, for the grade of Activity 2.    

    Study the Provided Code: The provided notebook has a complete Transfer Learning solution using a particular pre-trained model. Make sure you understand the flow of the code and the role of each component.

    Select Two Other Pre-trained Models: Choose two different pre-trained models available in PyTorch's model zoo.

    Apply Transfer Learning: Add cells to implement Transfer Learning using the two models you've chosen. Train these models on the CIFAR-10 dataset.

    Evaluation: After training, evaluate your models' performance. Compare the results with the provided solution and try to interpret why there might be differences.

    Documentation: In a markdown cell, summarize your findings. Include any challenges you faced, how you overcame them, and any interesting insights you gained from comparing the different pre-trained models.

- Note:

    Although the provided code serves as a guide, you're encouraged to implement the new solutions on your own. The goal is to reinforce your understanding of Transfer Learning and how to apply it effectively.


# Lead Professor
### Dr. Jose Antonio Cantoral-Ceballos
### Assistant Professor
### Dr. Carlos Villaseñor
# Team 40
# Members:
#### Iossif Moises Palli Laura A01794457
#### Astrid Rosario Bernaga Torres A01793080
#### Cecilia Acevedo Rodríguez A01793953
#### Fredy Reyes Sánchez A01687370

In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import DataLoader
from torch.utils.data import sampler
import torchvision.datasets as datasets
import torchvision.transforms as T
from torchvision import models
import matplotlib.pyplot as plt

### Download dataset

In [None]:
# DATA_PATH = '/media/pepe/DataUbuntu/Databases/cifar-10/cifar-10-batches-py'
DATA_PATH = '/home/pepe/Documents/github_repos/datasets/cifar-10-batches-py'

# Number of training examples to be used
NUM_TRAIN = 45000
# Batch size for training
MINIBATCH_SIZE = 64
# Transformation of images using ImageNet values (scaling and normalization)
transform_imagenet = T.Compose([
                T.Resize(224),
                T.ToTensor(),
                T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
            ])

# Image transformation for CIFAR-10 (only normalization)
transform_cifar = T.Compose([
                T.ToTensor(),
                T.Normalize([0.491, 0.482, 0.447], [0.247, 0.243, 0.261])
            ])

# Training set loader
cifar10_train = datasets.CIFAR10(DATA_PATH, train=True, download=True,
                             transform=transform_imagenet)
train_loader = DataLoader(cifar10_train, batch_size=MINIBATCH_SIZE,
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

# Validation set loader
cifar10_val = datasets.CIFAR10(DATA_PATH, train=True, download=True,
                           transform=transform_imagenet)
val_loader = DataLoader(cifar10_val, batch_size=MINIBATCH_SIZE,
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, len(cifar10_val))))

# Testing set loader
cifar10_test = datasets.CIFAR10(DATA_PATH, train=False, download=True,
                            transform=transform_imagenet)
test_loader = DataLoader(cifar10_test, batch_size=MINIBATCH_SIZE)
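
The training and validation loaders above read the same CIFAR-10 training split and rely on `SubsetRandomSampler` to keep their index ranges disjoint. A minimal sketch of that idea, assuming a small synthetic `TensorDataset` as a stand-in for CIFAR-10 (the names `toy_data`, `TOY_NUM_TRAIN`, `toy_train` and `toy_val` are hypothetical):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, sampler

# Hypothetical stand-in for CIFAR-10: 100 fake images whose label is their own index
toy_data = TensorDataset(torch.zeros(100, 3, 4, 4), torch.arange(100))
TOY_NUM_TRAIN = 90  # plays the role of NUM_TRAIN

# Same pattern as the notebook: one dataset, two samplers over disjoint index ranges
toy_train = DataLoader(toy_data, batch_size=16,
                       sampler=sampler.SubsetRandomSampler(range(TOY_NUM_TRAIN)))
toy_val = DataLoader(toy_data, batch_size=16,
                     sampler=sampler.SubsetRandomSampler(range(TOY_NUM_TRAIN, len(toy_data))))

# Collect the sample indices (stored as labels) seen by each loader
train_ids = {int(y) for _, yb in toy_train for y in yb}
val_ids = {int(y) for _, yb in toy_val for y in yb}
print(len(train_ids), len(val_ids), train_ids.isdisjoint(val_ids))
```

With `NUM_TRAIN = 45000`, the same split leaves the last 5,000 of the 50,000 training images for validation.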

### Using GPUs

In [None]:
# Check if a CUDA-compatible GPU is available
if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

print(device)

### Display images

In [None]:
# List of class labels for CIFAR-10 dataset
classes = ['Plane', 'Car', 'Bird', 'Cat', 'Deer','Dog', 'Frog', 'Horse', 'Ship', 'Truck']

# Function to plot and display an image
def plot_figure(image):
    plt.imshow(image.permute(1,2,0))
    plt.axis('off')
    plt.show()

# Randomly sample an image index from the test dataset (not from the number of batches)
rnd_sample_idx = np.random.randint(len(test_loader.dataset))
print(f'The sampled image represents a: {classes[test_loader.dataset[rnd_sample_idx][1]]}')
image = test_loader.dataset[rnd_sample_idx][0]
# Min-max rescale to [0, 1] so the normalized tensor can be displayed
image = (image - image.min()) / (image.max() - image.min())
plot_figure(image)
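
The cell above rescales each image by its own minimum and maximum, which changes the colours slightly. An alternative, shown here only as a sketch, is to invert the `T.Normalize` step using the ImageNet statistics from `transform_imagenet`. The helper name `denormalize` and the random test image are hypothetical.

```python
import torch

# ImageNet statistics, the same values passed to T.Normalize in transform_imagenet
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def denormalize(image):
    """Invert T.Normalize for a (3, H, W) tensor and clamp to the displayable range [0, 1]."""
    return (image * IMAGENET_STD + IMAGENET_MEAN).clamp(0.0, 1.0)

# Round trip on a random stand-in image with values in [0, 1)
original = torch.rand(3, 8, 8)
normalized = (original - IMAGENET_MEAN) / IMAGENET_STD
restored = denormalize(normalized)
print(torch.allclose(restored, original, atol=1e-5))
```

The restored tensor could then be passed to `plot_figure` in place of the min-max rescaled one.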

### Calculate accuracy


In [None]:
def accuracy(model, loader):
    num_correct = 0
    num_total = 0
    model.eval()
    model = model.to(device=device)
    with torch.no_grad():
        for (xi, yi) in loader:
            xi = xi.to(device=device, dtype = torch.float32)
            yi = yi.to(device=device, dtype = torch.long)
            scores = model(xi) # mb_size, 10
            _, pred = scores.max(dim=1) #pred shape (mb_size )
            num_correct += (pred == yi).sum().item() # pred and yi both have shape (mb_size,)
            num_total += pred.size(0)
        return float(num_correct)/num_total
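
The core of `accuracy` is taking the arg-max over the class dimension and comparing it with the labels. A small worked example on hand-written scores (all values here are hypothetical) makes the tensor shapes concrete:

```python
import torch

# Hypothetical scores for a mini-batch of 4 images and 3 classes
scores = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.9, 0.8, 0.7],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 1, 2, 2])

_, pred = scores.max(dim=1)                   # predicted class per row, shape (4,)
num_correct = (pred == labels).sum().item()   # the third row is misclassified
toy_accuracy = num_correct / labels.size(0)
print(pred.tolist(), toy_accuracy)            # [0, 1, 0, 2] 0.75
```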

### Load the pre-trained model

In [None]:
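# ImageNet weights; torchvision 0.13+ prefers weights=models.ResNet18_Weights.DEFAULT over pretrained=True
model_resnet18 = models.resnet18(pretrained=True)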

### Explore the model

In [None]:
for i, w in enumerate(model_resnet18.parameters()):
    print(i, w.shape, w.requires_grad)

In [None]:
model_resnet18

### Adapt the model to our task

In [None]:
model_aux = nn.Sequential(*list(model_resnet18.children()))
model_aux

In [None]:
model_aux = nn.Sequential(*list(model_resnet18.children())[:-1])

In [None]:
model_aux

In [None]:
for i, parameter in enumerate(model_aux.parameters()):
    parameter.requires_grad = False

In [None]:
for i, parameter in enumerate(model_aux.parameters()):
    print(i, parameter.requires_grad)

### Training loop

In [None]:
def train(model, optimiser, epochs=100):
#     def train(model, optimiser, scheduler = None, epochs=100):
    model = model.to(device=device)
    for epoch in range(epochs):
        for i, (xi, yi) in enumerate(train_loader):
            model.train()
            xi = xi.to(device=device, dtype=torch.float32)
            yi = yi.to(device=device, dtype=torch.long)
            scores = model(xi)

            cost = F.cross_entropy(input= scores, target=yi)

            optimiser.zero_grad()
            cost.backward()
            optimiser.step()

        acc = accuracy(model, val_loader)
#         if epoch%5 == 0:
        print(f'Epoch: {epoch}, cost: {cost.item()}, accuracy: {acc}')
#         scheduler.step()

In [None]:
hidden1 = 256
hidden = 256
lr = 5e-4
epochs = 3
# model1 = nn.Sequential(nn.Flatten(),
#                        nn.Linear(in_features=32*32*3, out_features=hidden1), nn.ReLU(),
#                        nn.Linear(in_features=hidden1, out_features=hidden), nn.ReLU(),
#                        nn.Linear(in_features=hidden, out_features=10))

model1 = nn.Sequential(model_aux,
                       nn.Flatten(),
                       nn.Linear(in_features=512, out_features= 10, bias= True))
optimiser = torch.optim.Adam(model1.parameters(), lr=lr, betas=(0.9, 0.999))

# train(model1, optimiser, epochs)

In [None]:
model1

In [None]:
train(model1, optimiser, epochs)

In [None]:
accuracy(model1, test_loader)

### Comparison models

In [None]:
# Select two other pre-trained models from PyTorch's model zoo: VGG16 and SqueezeNet 1.1
# (torchvision 0.13+ prefers the weights= argument over pretrained=True)
model_vgg16 = models.vgg16(pretrained=True)
model_squeezenet = models.squeezenet1_1(pretrained=True)

### Model VGG16

In [None]:
#Apply transfer learning for VGG16
# Modify the output layer for VGG-16
model_vgg16.classifier[6] = nn.Linear(4096, 10)

In [None]:
# Feature extraction: freeze the pre-trained convolutional layers (`features`) by setting
# requires_grad to False, so that only the classifier layers are updated during training.
# This adapts the model to CIFAR-10 while keeping the ImageNet features intact.

for param in model_vgg16.features.parameters():
    param.requires_grad = False

In [None]:
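optimiser_vgg16 = torch.optim.Adam(model_vgg16.classifier.parameters(), lr=lr)  # `optimiser` only holds model1's parameters
train(model_vgg16, optimiser_vgg16, epochs)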

In [None]:
accuracy(model_vgg16, test_loader)
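
An optimiser only updates the parameters it was constructed with, so each model passed to `train` needs an optimiser built from its own parameters. The sketch below illustrates this with two toy `nn.Linear` models standing in for the real networks. The names `model_a`, `model_b` and `one_step` are hypothetical.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
model_a = nn.Linear(4, 3)   # stands in for the first model
model_b = nn.Linear(4, 3)   # stands in for a second model
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))

def one_step(model, optimiser):
    """Run a single training step and report whether the model's weights changed."""
    before = model.weight.detach().clone()
    loss = F.cross_entropy(model(x), y)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return not torch.equal(before, model.weight.detach())

optimiser_a = torch.optim.Adam(model_a.parameters(), lr=5e-4)
changed_with_wrong = one_step(model_b, optimiser_a)   # optimiser holds model_a's parameters
optimiser_b = torch.optim.Adam(model_b.parameters(), lr=5e-4)
changed_with_right = one_step(model_b, optimiser_b)   # optimiser holds model_b's parameters
print(changed_with_wrong, changed_with_right)         # False True
```

A model trained with the wrong optimiser keeps its initial weights, so a freshly replaced output layer would be expected to score near chance (about 10% on ten classes).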

### Model SqueezeNet

In [None]:
#Apply transfer learning for Squeezenet
# Modify the output layer for Squeezenet
model_squeezenet.classifier[1] = nn.Conv2d(512, 10, kernel_size=(1, 1))

In [None]:
# Feature extraction: freeze the pre-trained convolutional layers (`features`) by setting
# requires_grad to False, so that only the new classifier convolution is updated during training.
# This adapts the model to CIFAR-10 while keeping the ImageNet features intact.

for param in model_squeezenet.features.parameters():
    param.requires_grad = False

In [None]:
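optimiser_squeezenet = torch.optim.Adam(model_squeezenet.classifier.parameters(), lr=lr)  # `optimiser` only holds model1's parameters
train(model_squeezenet, optimiser_squeezenet, epochs)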

In [None]:
accuracy(model_squeezenet, test_loader)

# Comparison and comments on these models

**Model: ResNet-18**
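- Architecture: ResNet-18 is an 18-layer residual network. Its depth is comparable to VGG16 (16 weight layers), but it uses skip connections between blocks.
- Number of Parameters: ResNet-18 has about 11.7 million parameters, far fewer than VGG16 (about 138 million) and more than SqueezeNet 1.1 (about 1.2 million).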
- Pre-trained Dataset: Trained on the ImageNet dataset, which is a large and diverse dataset.
- Performance on CIFAR-10: Achieved an accuracy of around 79.7% on the CIFAR-10 dataset after three epochs.
- Training Time: Training is relatively fast because the backbone is frozen and only the final linear layer (5,130 parameters) is updated.
- Overfitting: Shows no signs of overfitting as accuracy on the validation set remains consistent.
- Fine-Tuning: This model is relatively easy to fine-tune for your specific task.

**Model: VGG16**
- Architecture: VGG16 is a deep model with 16 weight layers (13 convolutional layers and 3 fully connected layers).
- Number of Parameters: VGG16 has a large number of parameters, which may increase the risk of overfitting.
- Pre-trained Dataset: Also trained on ImageNet, offering a good feature extraction base.
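- Performance on CIFAR-10: The recorded run reported an accuracy of around 7.9%, which is at chance level for ten classes (10%). A result this low indicates that the new output layer was not actually trained, for example because the optimiser passed to `train` did not hold this model's parameters. The figure should be re-measured after training with an optimiser built from `model_vgg16`'s parameters.
- Training Time: Training a large model like VGG16 requires more time due to its size.
- Overfitting: Chance-level accuracy is not evidence of overfitting. Whether VGG16 overfits can only be judged once the model trains correctly and training and validation accuracy can be compared.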
- Fine-Tuning: Fine-tuning a model with a large number of parameters may require careful regularization.

**Model: SqueezeNet**
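- Architecture: SqueezeNet is a lightweight model built from Fire modules, which combine squeeze and expand convolutions.
- Number of Parameters: SqueezeNet 1.1 has about 1.2 million parameters, far fewer than the other two models.
- Pre-trained Dataset: Also pre-trained on ImageNet, providing good feature extraction capabilities.
- Performance on CIFAR-10: The recorded run reported an accuracy of around 10.07%, which is chance level for ten classes. As with VGG16, this indicates that the new output layer was not actually trained, and the figure should be re-measured after training with an optimiser built from `model_squeezenet`'s parameters.
- Training Time: Training SqueezeNet is fast due to its small size.
- Overfitting: Overfitting cannot be assessed from a chance-level result. Training and validation accuracy need to be compared once the model trains correctly.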
- Fine-Tuning: SqueezeNet is relatively easy to fine-tune, given its smaller size.

**Conclusion:**
- ResNet-18 offers a good balance of accuracy and training time. It's a suitable choice for various tasks and provides robust performance.
- VGG16 is powerful but large and slow to train. Its recorded accuracy is at chance level, which points to a training problem rather than to overfitting, so no conclusion about its suitability for CIFAR-10 can be drawn from this run.
- SqueezeNet is lightweight and fast to train, making it a practical choice for scenarios with limited computational resources, although its accuracy on CIFAR-10 also needs to be re-measured.