# Deep Learning for Scene Classification

## Introduction

Welcome to this Deep Learning project focused on scene classification. The primary objective is to build and evaluate Deep Learning models that can accurately classify images into one of the six predefined categories: Buildings, Forests, Glaciers, Mountains, Oceans, and Streets.

In this Jupyter Notebook, you'll find:

- An exploration of the dataset to better understand its structure and contents.
- Pre-processing steps, including image transforms and augmentations, tailored specifically for this dataset.
- Implementation of Deep Learning models from scratch to solve the classification problem.
- Evaluation of models using metrics like Accuracy, Precision, Recall, and F1 Score.
- Visualizations and logs that track model performance during training and validation phases.

This project is implemented locally, and Conda is used for package management, ensuring that all dependencies are correctly set up for anyone who wishes to reproduce this work.

Let's get started!


## Imports

Before diving into the code, it's important to understand the role of each library being imported. Below are the key libraries and their significance in the context of this project.

In [None]:
# PyTorch is an open-source machine learning library used for a variety of tasks,
# but primarily for training deep neural networks.
import torch

# nn is a sub-module in PyTorch that contains useful classes and functions to build neural networks.
import torch.nn as nn

# F is a sub-module in PyTorch that contains useful functions for building neural networks.
import torch.nn.functional as F

# DataLoader is a PyTorch utility for loading and batching data efficiently.
from torch.utils.data import DataLoader

# torchvision contains various utilities, pre-trained models, and datasets specifically
# geared towards computer vision tasks.
import torchvision

# transforms are a set of common image transformations that are often required when
# working with image data.
from torchvision import transforms

# ImageFolder is a utility for loading images directly from a directory structure where
# each sub-directory represents a different class.
from torchvision.datasets import ImageFolder

# random_split is a utility function to randomly split a dataset into non-overlapping
# new datasets of given lengths.
from torch.utils.data import random_split


# SummaryWriter is a PyTorch utility for logging information to be displayed in TensorBoard.
from torch.utils.tensorboard import SummaryWriter

# summary is a PyTorch utility for displaying the summary of a PyTorch model.
from torchinfo import summary

# tqdm is a Python library that adds a progress bar to an iterable object.
from tqdm import tqdm

# Matplotlib is a plotting library that is useful for visualizing data, plotting graphs, etc.
import matplotlib.pyplot as plt

# NumPy is a library for numerical operations and is especially useful for array and
# matrix computations.
import numpy as np


# PIL is a library for image processing.
from PIL import Image

# os is a Python module that provides a portable way of using operating system dependent functionality.
import os

# time is a module that provides various time-related functions.
import time

# random is a module that implements pseudo-random number generators for various distributions.
import random

# accuracy_score computes the accuracy classification score.
# confusion_matrix computes the confusion matrix to evaluate the accuracy of a classification.
# precision_score, recall_score, and f1_score compute the corresponding classification metrics.
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, f1_score

# itertools provides functions that work on iterators; product yields the Cartesian product of its input iterables.
from itertools import product

# math is a module that provides access to the mathematical functions.
import math

## Checking CUDA Availability

In deep learning projects, it's common to leverage the power of GPUs for computation. CUDA is a parallel computing platform that allows us to use the GPU for these intensive calculations. The following code snippet checks if CUDA is available on the machine. If CUDA is available, it sets the device to "cuda"; otherwise, it falls back to using the CPU.

In [None]:
# Check if CUDA is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Print the device being used
print(f"Using device: {device}")

## Setting the Random Seed for Reproducibility

For any machine learning experiment, reproducibility is crucial. Setting a random seed makes the pseudo-random numbers generated by our code repeat across runs, which makes the results much easier to reproduce. In this project, the seed is set for both PyTorch and Python's `random` module, and cuDNN is configured to use deterministic algorithms with benchmarking disabled.

In [None]:
SEED = 34
# Set the seed for generating random numbers
torch.manual_seed(SEED)
random.seed(SEED)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
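
The cell above seeds PyTorch and Python's `random` module. NumPy is imported in this notebook but is not seeded, so any NumPy-based randomness added later would not be covered. One possible way to seed all three in a single place is sketched below. `seed_everything` is a hypothetical helper name and is not part of the original notebook.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch (hypothetical helper, for illustration)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the PyTorch random number generators
    # Same cuDNN settings as the cell above
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# Re-seeding with the same value reproduces the same random tensor
seed_everything(34)
first = torch.rand(3)
seed_everything(34)
second = torch.rand(3)
print(torch.equal(first, second))  # True
```

Even with every seed fixed, bit-for-bit identical results are not guaranteed across different hardware or library versions.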

## Tensorboard Setup

In [None]:
%load_ext tensorboard

## Setting Constants

In [None]:
# constants

TRAIN_DIR = "data/train_set"
TEST_DIR = "data/test_set"
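# Keep only class folders, sorted so the order matches the class indices assigned by ImageFolder
CLASSES = sorted(
    d for d in os.listdir(TRAIN_DIR) if os.path.isdir(os.path.join(TRAIN_DIR, d))
)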
CLASSES_COUNT = len(CLASSES)

## Exploratory Data Analysis (EDA)

### Introduction

The Exploratory Data Analysis (EDA) phase is an essential step in any data science or machine learning project. The primary objective of EDA is to gain insights into the dataset, understand its complexity, and discover underlying patterns, while also identifying outliers and anomalies that could impact the performance of machine learning models. The information gathered during EDA informs feature engineering, data cleaning, and ultimately, model selection and tuning. 

This section of the documentation will walk you through various components of the EDA process. From understanding the basic statistics and quality of the data to employing visual techniques for more complex and informative insights, we'll cover it all. 

Given the image-based nature of the dataset, our EDA will be tailored towards visual data. Specific focus areas will include image size distributions, class distributions, and data quality. By the end of this section, you should have a comprehensive understanding of the data's characteristics and peculiarities, empowering you to make well-informed decisions for the subsequent phases of the project.

### Sections in this EDA:

1. **Data Overview**: Provides a snapshot of the dataset, including its size and dimensions.
2. **Visualization**: Includes various data visualizations to better understand the data's features and labels.
3. **Image Characteristics**: Examines the properties of images in the dataset.


### Data Overview

#### Introduction

In this section, we will take an initial look at the dataset to understand its basic characteristics such as size, dimensions, and type of data. Understanding the data's structure is essential for later stages where more specific analyses and modeling will take place.

#### Key Points

- **Total Number of Train Images**: Number of images in the train dataset.
- **Class Distribution**: The number of images per class.
- **Image Dimensions**: Common or range of dimensions (width x height) among the dataset images.

#### Total Number of Train Images

In [None]:
total_train_images = sum([len(files) for subdir, dirs, files in os.walk(TRAIN_DIR)])
total_test_images = sum([len(files) for subdir, dirs, files in os.walk(TEST_DIR)])
print(f"Total number of train images: {total_train_images}")
print(f"Total number of test images: {total_test_images}")

We have a total of 14,034 images in our dataset. Given that this is an educational project, this should be more than sufficient for training a robust model. Large datasets are generally beneficial for deep learning models, providing them with more opportunities to learn nuanced features across different classes. Therefore, data scarcity is not a concern for us in this particular project.

#### Class Distribution

In [None]:
train_labels = os.listdir(TRAIN_DIR)
label_counter = {}
for folder in train_labels:
    folder_path = os.path.join(TRAIN_DIR, folder)
    num_files = len(os.listdir(folder_path))
    label_counter[folder] = num_files

classes = list(label_counter.keys())
count = list(label_counter.values())

plt.figure(figsize=(10, 6))
plt.barh(classes, count, color="blue")

plt.xlabel("Number of Images")
plt.ylabel("Class Label")
plt.title("Class Distribution")
plt.show()

The dataset exhibits a balanced distribution across different classes, which is advantageous for the learning process. This removes the need for techniques such as class balancing during training, simplifying the model development process.
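
To put a number on "balanced", one simple option is the ratio between the largest and the smallest class. The sketch below assumes a dictionary shaped like `label_counter` from the cell above. The helper name `imbalance_ratio` and the example counts are hypothetical and are not this dataset's real numbers.

```python
def imbalance_ratio(counts):
    """Return the largest class count divided by the smallest class count."""
    sizes = list(counts.values())
    if not sizes or min(sizes) == 0:
        raise ValueError("every class needs at least one image")
    return max(sizes) / min(sizes)


# Hypothetical per-class counts, for illustration only
example_counts = {"class_a": 200, "class_b": 250, "class_c": 220}
ratio = imbalance_ratio(example_counts)
print(f"Imbalance ratio: {ratio:.2f}")  # Imbalance ratio: 1.25
```

A ratio close to 1 indicates classes of similar size. What counts as "close enough" is a judgment call that depends on the task.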

#### Image Dimensions
In this section, we aim to explore the dimensions of the images in our dataset. Understanding the size and format of the images can provide insights into the preprocessing steps needed, as well as help in designing the architecture of the neural network.

In [None]:
image_sizes = []

for class_folder in os.listdir(TRAIN_DIR):
    class_folder_path = os.path.join(TRAIN_DIR, class_folder)

    if os.path.isdir(class_folder_path):
        for image_name in os.listdir(class_folder_path):
            image_path = os.path.join(class_folder_path, image_name)

            with Image.open(image_path) as img:
                width, height = img.size
                image_sizes.append((width, height))

# Compute some statistics
avg_size = np.mean(image_sizes, axis=0)
min_size = np.min(image_sizes, axis=0)
max_size = np.max(image_sizes, axis=0)

print(f"Average Size: {avg_size}")
print(f"Minimum Size: {min_size}")
print(f"Maximum Size: {max_size}")

We found that not all images share the expected 150x150 dimensions. Therefore, when loading the images, we will apply a resize transformation to standardize their size.
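
When many images deviate, a tally of the distinct sizes can be easier to read than a per-file listing. The sketch below assumes a list of `(width, height)` tuples like `image_sizes` above. The helper name `summarize_sizes` and the example values are hypothetical.

```python
from collections import Counter


def summarize_sizes(sizes):
    """Count how often each (width, height) pair occurs, most common first."""
    return Counter(sizes).most_common()


# Hypothetical sizes, for illustration only
example_sizes = [(150, 150), (150, 150), (150, 113), (150, 150), (150, 135)]
size_counts = summarize_sizes(example_sizes)
for (width, height), n in size_counts:
    print(f"{width}x{height}: {n}")
```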

In [None]:
atypical_images = []

for class_folder in os.listdir(TRAIN_DIR):
    class_folder_path = os.path.join(TRAIN_DIR, class_folder)

    if os.path.isdir(class_folder_path):
        for image_name in os.listdir(class_folder_path):
            image_path = os.path.join(class_folder_path, image_name)

            with Image.open(image_path) as img:
                width, height = img.size
                if width != 150 or height != 150:
                    atypical_images.append((image_path, (width, height)))


# Display the atypical images
if atypical_images:
    print(f"Atypical image sizes found({len(atypical_images)}):")
    for img_path, size in atypical_images:
        print(f"{img_path}: {size}")
else:
    print("No atypical images found.")

### Visualization



In [None]:
random_classes = random.sample(CLASSES, 4)

plt.figure(figsize=(12, 12))

for idx, random_class in enumerate(random_classes):
    # Get a list of all images in a specific class
    img_list = os.listdir(os.path.join(TRAIN_DIR, random_class))

    # Pick a random image from the list
    random_img_name = random.choice(img_list)

    # Path to the image
    img_path = os.path.join(TRAIN_DIR, random_class, random_img_name)

    # Load the image
    img = Image.open(img_path)

    # Show the image
    plt.subplot(2, 2, idx + 1)
    plt.imshow(img)
    plt.axis("off")
    plt.title(f"Class: {random_class}")

plt.show()

## Data Preprocessing and Loading

In this section, we deal with the vital aspect of preparing our data. These steps can significantly influence the model's performance.

- **Data Loading**: The ImageFolder class from torchvision is used to load our dataset from the disk.
- **Transformations**: The images are resized to a common size. The training transform also applies data augmentation (random horizontal and vertical flips, random rotations of up to 10 degrees) and per-channel normalization.
- **Data Splitting**: The dataset is split into training and validation sets. 80% of the data is used for training, and the remaining 20% is used for validation.
- **DataLoader**: Finally, PyTorch's DataLoader is used to create mini-batches of data, which allows for more efficient model training.

By the end of this section, we have DataLoader instances for the training, validation, and testing sets, which can be used to train and evaluate the model.
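
The 80/20 split can be illustrated in plain Python. The notebook itself relies on PyTorch's `random_split`; the sketch below only mirrors the idea, using a hypothetical helper `split_indices` and the 14,034 image count quoted earlier as an example size.

```python
import random


def split_indices(n_items, train_fraction=0.8, seed=34):
    """Shuffle the indices 0..n_items-1 with a fixed seed and split them in two."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # local generator, global state untouched
    train_size = int(train_fraction * n_items)
    return indices[:train_size], indices[train_size:]


train_idx, val_idx = split_indices(14034)
print(len(train_idx), len(val_idx))  # 11227 2807
```

The two index lists never overlap, so no image is used for both training and validation.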

In [None]:
def get_dataloaders(image_size, batch_size):
    
    # Create transforms for data augmentation
    train_transform = transforms.Compose(
        [
            transforms.Resize((image_size, image_size)),
            transforms.RandomHorizontalFlip(),
            transforms.RandomRotation(10),
            transforms.RandomVerticalFlip(p=0.2),
            transforms.ToTensor(),  # Convert PIL Image to PyTorch tensor
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        ]
    )

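    test_transform = transforms.Compose(
        [
            transforms.Resize((image_size, image_size)),
            transforms.ToTensor(),  # Convert PIL Image to PyTorch tensor
            # Same normalization as the training transform, so that evaluation
            # inputs follow the distribution the model was trained on
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ]
    )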

    # Load the training and test datasets from disk
    train_dataset = ImageFolder(TRAIN_DIR, transform=train_transform)
    test_dataset = ImageFolder(TEST_DIR, transform=test_transform)

    # Split the dataset into training and validation sets
    train_size = int(0.8 * len(train_dataset))
    valid_size = len(train_dataset) - train_size
    train_subset, validation_subset = random_split(train_dataset, [train_size, valid_size])

    # Create DataLoader instances to load data in batches
    train_loader = DataLoader(train_subset, batch_size=batch_size, shuffle=True, pin_memory=True, num_workers=4)
    val_loader = DataLoader(validation_subset, batch_size=batch_size, shuffle=False, pin_memory=True, num_workers=4)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, pin_memory=True, num_workers=4)

    return train_loader, val_loader, test_loader
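
The `transforms.Normalize` step in the cell above computes `(x - mean) / std` independently for each channel, on values that `ToTensor` has already scaled to the range [0, 1]. The plain-Python sketch below applies that formula to a single RGB pixel, using the mean and standard deviation values from the cell. `normalize_pixel` is a hypothetical helper used only for illustration.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply (value - mean) / std to each channel of one RGB pixel."""
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]


# A pixel equal to the channel means maps to zero in every channel
print(normalize_pixel(MEAN))  # [0.0, 0.0, 0.0]

# A mid-gray pixel ends up slightly above zero in every channel
mid_gray = normalize_pixel([0.5, 0.5, 0.5])
print([round(v, 3) for v in mid_gray])
```

Because normalization shifts and rescales the inputs, the same statistics are normally applied to training, validation, and test images alike.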

## Hyperparameters Used
This project involves training neural network models with various combinations of hyperparameters to identify the most effective setup for a given task. The main script includes functions for model training, monitoring, and evaluation, and utilizes different sets of hyperparameters for extensive experimentation.

The hyperparameters under consideration include the learning rate, batch size, type of optimizer, image size for data preprocessing, and the number of epochs for training.

The loss function used for the training is Cross-Entropy Loss. It is initialized and moved to the appropriate device (CPU or GPU).


In [None]:
LEARNING_RATES = [0.01, 0.001, 0.0001]
BATCH_SIZES = [32, 64, 128]
OPTMIZERS = ['SGD', 'Adam']
IMAGES_SIZES = [32, 64, 128]
EPOCHS = [30, 100]

criterion = nn.CrossEntropyLoss().to(device)
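
It helps to know how many training runs a full grid search over these values implies. The sketch below counts the combinations with `itertools.product`, the function the notebook imports for this purpose. It uses lowercase local copies of the values above so that it does not overwrite the notebook's constants.

```python
from itertools import product

# Local copies of the search space defined in the cell above
learning_rates = [0.01, 0.001, 0.0001]
batch_sizes = [32, 64, 128]
optimizers = ["SGD", "Adam"]
image_sizes = [32, 64, 128]
epochs = [30, 100]

grid = list(product(learning_rates, batch_sizes, optimizers, image_sizes, epochs))
print(len(grid))  # 108
print(grid[0])  # (0.01, 32, 'SGD', 32, 30)
```

With 108 full training runs per architecture, the cost of the search grows quickly. Reducing any one list shrinks the grid multiplicatively.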

## Utility Functions for Model Training and Evaluation

In this section, we present a set of utility functions designed to facilitate the training, monitoring, and evaluation processes for various neural network architectures. By creating these generic functions, we aim to make the process of experimenting with different architectures more streamlined and comparable.

### Monitoring Function

This function assists in keeping track of important metrics like loss and accuracy during the training and validation phases. The function uses TensorBoard's SummaryWriter to log these metrics for easy visualization.

```python
def monitor_metrics(writer, epoch_num, loss, accuracy, phase):
```

### Training Function

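This function runs the full training loop: for each epoch it trains the model on the training set, evaluates it on the validation set, and stops early if the validation loss has not improved for `patience` consecutive epochs.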

```python
def train_model(model, train_loader, val_loader, criterion, optim, number_epochs, do_logging=False, logging_name="log", patience=10):
```
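
The `train_model` implementation in this notebook stops training once the validation loss has failed to improve for `patience` consecutive epochs. The sketch below isolates that rule so it can be tried on a plain list of losses. `epochs_until_stop` is a hypothetical helper and the loss values are made up for illustration.

```python
def epochs_until_stop(val_losses, patience=10):
    """Return the 1-based epoch at which early stopping triggers, or None."""
    best = float("inf")
    stale = 0  # epochs since the validation loss last improved
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# Made-up validation losses: the best value (0.6) is reached at epoch 3
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.6]
print(epochs_until_stop(losses, patience=3))  # 6
print(epochs_until_stop(losses, patience=10))  # None
```

Note that matching the best loss (0.6 at epoch 7) does not count as an improvement, because the comparison is strict.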

### Running an Epoch

This function runs a single epoch in either training or validation mode. It performs the forward pass for every batch and, in training mode, also the backward pass and the parameter update. It then computes the average loss and the accuracy for the epoch.

```python
def run_epoch(epoch_num, model, loader, criterion, writer, optim=None, do_logging=False, is_training=True):
```

### Hyperparameter Tuning Function

This function is responsible for finding the best model given a set of hyperparameters. It trains a model for every combination of learning rate, batch size, optimizer type, image size, and number of epochs, and returns the best-performing model, selected by validation accuracy, together with its hyperparameters.

The `model_generator` argument plays a crucial role in the hyperparameter tuning process. It is a function that creates a fresh instance of the neural network model for a given `image_size`, so every hyperparameter combination starts from newly initialized weights and the same search can be reused for different architectures.

```python
def find_best_model(model_generator):
```

### Evaluation Function

After training, this function evaluates the performance of the model on a test dataset and generates useful metrics such as precision, recall, and F1-score.

```python
def evaluate_model(model, test_loader):
```

In [None]:
def monitor_metrics(writer, epoch_num, loss, accuracy, phase):
    writer.add_scalar(f'{phase} loss', loss, epoch_num)
    writer.add_scalar(f'{phase} accuracy', accuracy, epoch_num)

def run_epoch(epoch_num, model, loader, criterion, writer, optim=None, do_logging=False, is_training=True):
    if is_training:
        model.train()
        epoch_type = "Training"
    else:
        model.eval()
        epoch_type = "Validation"

    epoch_loss = 0.0
    all_labels = []
    all_predictions = []

    with torch.set_grad_enabled(
        is_training
    ):  # set gradient calculation to True or False depending on mode
        for images, labels in loader:
            all_labels.extend(labels.numpy())

            if is_training:
                optim.zero_grad()  # reset the gradients to 0 for all learnable parameters

            predictions = model(images.to(device))  # forward pass
            all_predictions.extend(
                torch.argmax(predictions, dim=1).cpu().numpy()
            )  # get the predicted class with highest probability

            loss = criterion(predictions, labels.to(device))  # compute the loss

            if is_training:
                loss.backward()  # compute the gradients for each learnable parameter
                optim.step()  # update the weights

            epoch_loss += loss.item()  # accumulate the loss for each batch

    avg_loss = epoch_loss / len(loader)  # compute the average loss for the epoch
    accuracy = (
        accuracy_score(all_labels, all_predictions) * 100
    )  # compute the accuracy for the epoch

    if do_logging:
        monitor_metrics(writer, epoch_num, avg_loss, accuracy, epoch_type)

    return avg_loss, accuracy


def train_model(
    model, train_loader, val_loader, criterion, optim, number_epochs, do_logging = False, logging_name = "log", patience=10
):
    best_val_loss = float("inf")
    counter = 0  # epochs since the validation loss last improved
    writer = None

    if do_logging:
        writer = SummaryWriter('runs/' + model.name + "/" + logging_name)

    for epoch in tqdm(range(number_epochs)):
        # Train the model for one epoch
        run_epoch(
            epoch + 1, model, train_loader, criterion, writer, optim, do_logging, is_training=True
        )

        # Evaluate the model against the validation set (no optimizer is needed in evaluation mode)
        test_loss, _ = run_epoch(
            epoch + 1, model, val_loader, criterion, writer, optim=None, do_logging=do_logging, is_training=False
        )

        
        # Early stopping: stop training if the validation loss has not improved for `patience` consecutive epochs
        if test_loss < best_val_loss:
            best_val_loss = test_loss
            counter = 0  # Reset the counter
        else:
            counter += 1  # Increment the counter
            if counter >= patience:
                print("Early stopping")
                break


def train_model_with_hyperparameters(model_generator, lr, batch_size, optimizer_type, image_size, number_of_epochs):
    torch.manual_seed(SEED)
    random.seed(SEED)

    print(f"Training model with lr={lr}, batch_size={batch_size}, optimizer={optimizer_type}, image_size={image_size}, epoch={number_of_epochs}")

    train_loader, valid_loader, _ = get_dataloaders(
        image_size, batch_size
    )

    # Create the model
    model = model_generator(image_size).to(device)

    # Create the optimizer
    if optimizer_type == 'SGD':
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    elif optimizer_type == 'Adam':
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    else:
        raise ValueError(f"Invalid optimizer type: {optimizer_type}")
    
    logging_name = f"{model.name}-lr={lr}-ba={batch_size}-opt={optimizer_type}-img={image_size}"

    # Train the model
    train_model(model, train_loader, valid_loader, criterion, optimizer, number_of_epochs, True, logging_name)
    # Evaluate the model
    model_accuracy, _, _, _, _, _ = evaluate_model(model, valid_loader)

    return model, model_accuracy


def find_best_model(model_generator):
    best_accuracy = 0.0
    best_params = {}
    best_model = None


    # use itertools.product to get all possible combinations of hyperparameters
    all_combinations = list(product(LEARNING_RATES, BATCH_SIZES, OPTMIZERS, IMAGES_SIZES, EPOCHS))
    current_combination = 0
    total_combination_runs = len(all_combinations)

    # Iterate over all combinations
    for combination in all_combinations:
        lr, batch_size, optimizer_type, image_size, number_of_epochs = combination

        current_combination += 1
        print(f"Running combination {current_combination}/{total_combination_runs}")
        model, model_accuracy = train_model_with_hyperparameters(model_generator, lr, batch_size, optimizer_type, image_size, number_of_epochs)

        
        print(f"Accuracy: {model_accuracy:.2f}%")

        # Free up GPU memory
        if device.type == 'cuda':
            torch.cuda.empty_cache()

        # Save the best hyperparameters
        if model_accuracy > best_accuracy:
            best_accuracy = model_accuracy
            best_params = {
                "lr": lr,
                "batch_size": batch_size,
                "optimizer_type": optimizer_type,
                "image_size": image_size,
                "number_of_epochs": number_of_epochs,
            }
            best_model = model
            
    print(f"Best accuracy: {best_accuracy}")
    print(f"Best hyperparameters: {best_params}")

    return best_model, best_params


def evaluate_model(model, test_loader):
    all_labels = []
    all_predictions = []
    model.eval()
    with torch.no_grad():
        for _, (images, labels) in enumerate(test_loader):
            all_labels.extend(labels.numpy())

            predictions = model(images.to(device))  # forward pass
            all_predictions.extend(
                torch.argmax(predictions, dim=1).cpu().numpy()
            )  # get the predicted class with highest probability

    accuracy = accuracy_score(all_labels, all_predictions) * 100
    precision = precision_score(all_labels, all_predictions, average="macro")
    recall = recall_score(all_labels, all_predictions, average="macro")
    f1 = f1_score(all_labels, all_predictions, average="macro")

    return accuracy, precision, recall, f1, all_labels, all_predictions

def print_metrics(accuracy, precision, recall, f1, labels, predictions):
    print(f"Accuracy: {accuracy:.2f}%")
    print(f"Precision: {precision:.2f}")
    print(f"Recall: {recall:.2f}")
    print(f"F1: {f1:.2f}")

    # Compute confusion matrix
    cm = confusion_matrix(labels, predictions)
    print(cm)

## Project Solutions Overview

In this project, we aim to build and evaluate various neural network architectures to solve our specific classification problem. We will approach this task by employing three distinct architectures, each with its unique characteristics and advantages:

### Baseline Model
Our baseline model will be a simple Convolutional Neural Network (CNN) with a minimal number of layers. This model will consist of a few convolutional layers followed by pooling layers and fully connected layers towards the end. The purpose of establishing a baseline is to have a point of reference against which we can measure the performance of more complex models.

### ResNet Architecture
As our second model, we will use the [ResNet (Residual Network)](https://arxiv.org/pdf/1512.03385.pdf) architecture, which is well-known for its excellent performance in image classification tasks. ResNet introduces "skip connections" that bypass one or more layers, allowing for deeper networks without the problem of vanishing gradients. 

### DenseNet Architecture
Lastly, we will use [DenseNet (Densely Connected Convolutional Networks)](https://arxiv.org/pdf/1608.06993.pdf) for our third model. This architecture builds on the idea behind ResNet by connecting each layer to every subsequent layer within a dense block in a feed-forward fashion, so each layer receives the feature maps of all preceding layers as input. This encourages feature reuse, strengthens feature propagation, and reduces the number of parameters.

---

## Working Modalities

### Data Splitting
The dataset will be partitioned into three sets: training, validation, and testing sets. 

### Hyperparameter Tuning
We will carry out multiple runs with different hyperparameters, using the validation set as our benchmark. After tuning the models to satisfaction on the validation set, they will be locked, and final evaluations will be performed on the test set.

### Evaluation Metrics
To evaluate the models, we will use a range of metrics suitable for multi-class classification problems. These include:
- `accuracy_score`: Overall accuracy of the model.
- `confusion_matrix`: To understand the classification errors.
- `precision_score`: Measures the accuracy of positive predictions.
- `recall_score`: Measures the ability to find all positive instances.
- `f1_score`: Harmonic mean of precision and recall.

By employing these metrics, we aim to provide a comprehensive evaluation of each model's performance.
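
The notebook computes precision, recall, and F1 with `average="macro"`, meaning each metric is computed per class and then averaged with equal weight per class. The pure-Python sketch below reproduces that averaging on a tiny made-up example. `macro_scores` is a hypothetical helper, and treating undefined ratios as 0.0 is an assumption of this sketch.

```python
def macro_scores(labels, predictions):
    """Return macro-averaged (precision, recall, f1) for two label sequences."""
    classes = sorted(set(labels) | set(predictions))
    pairs = list(zip(labels, predictions))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for y, p in pairs if y == c and p == c)
        fp = sum(1 for y, p in pairs if y != c and p == c)
        fn = sum(1 for y, p in pairs if y == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Made-up labels and predictions for three classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
precision, recall, f1 = macro_scores(y_true, y_pred)
print(f"{precision:.3f} {recall:.3f} {f1:.3f}")  # 0.722 0.667 0.656
```

Because every class carries the same weight, macro averaging suits a dataset whose classes are of similar size.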

---

## SimpleCNN: The Baseline Model

### Introduction
SimpleCNN is designed to serve as the baseline model for our image classification problem. It is a straightforward convolutional neural network (CNN) that captures essential features from images while remaining computationally inexpensive. This makes it an ideal starting point to understand the basic performance metrics we can achieve and sets the stage for comparison with more complex architectures.

### Architecture
The architecture is uncomplicated, consisting of three main blocks:

#### Convolutional Layers
- **Conv1**: A 3x3 convolutional layer with 16 filters, followed by a ReLU activation.
- **Conv2**: Another 3x3 convolutional layer with 32 filters, followed by a ReLU activation.
- **Conv3**: A final 3x3 convolutional layer with 64 filters, also followed by a ReLU activation.

Each convolutional layer is accompanied by a max-pooling layer with a size of 2x2, which reduces the dimensions of the image while keeping the important features.

#### Fully Connected Layers
- **FC1**: A fully connected layer with a ReLU activation that outputs to 128 units.
- **FC2**: Another fully connected layer that outputs to the number of classes in the dataset.

### Forward Propagation
The forward propagation steps are intuitive and simple to follow:
1. The image first passes through the three convolutional layers (`conv1`, `conv2`, `conv3`), each followed by a ReLU activation and max-pooling.
2. The output is then flattened and passed through the first fully connected layer (`fc1`) with a ReLU activation.
3. Finally, the output goes through the last fully connected layer (`fc2`) to produce the class scores.
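
The input size of `fc1` follows from the steps above. With the kernel size 3, stride 1, and padding 1 used by the `SimpleCNN` class in this notebook, each convolution keeps the spatial size, and each 2x2 max-pooling halves it. The arithmetic can be sketched as follows. `conv_out` and `flattened_features` are hypothetical helper names.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(img_size, channels=64, num_blocks=3):
    """Number of features entering fc1 after the conv + pool blocks."""
    size = img_size
    for _ in range(num_blocks):
        size = conv_out(size)  # 3x3 conv, stride 1, padding 1: size unchanged
        size = size // 2  # 2x2 max pooling halves the size (rounding down)
    return channels * size * size


for img_size in (32, 64, 128):
    print(img_size, flattened_features(img_size))
# 32 1024
# 64 4096
# 128 16384
```

Three halvings are equivalent to an integer division by 8, which is why the class computes `fc_size = img_size // 8`.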

### Why SimpleCNN?
SimpleCNN serves as a no-frills approach to understanding what the bare minimum architecture can achieve. It sets the stage for implementing and comparing more advanced models like ResNet and DenseNet. Although SimpleCNN may not produce state-of-the-art results, it gives us a valuable point of reference for gauging the performance of subsequent architectures.



In [None]:
class SimpleCNN(nn.Module):
    def __init__(self, num_classes, img_size):
        super(SimpleCNN, self).__init__()
        self.name = "SimpleCNN"
        
        fc_size = img_size // 8

        # Convolutional layers
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)

        # Fully connected layers
        self.fc1 = nn.Linear(64 * fc_size * fc_size, 128)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = nn.functional.relu(self.conv1(x))
        x = nn.functional.max_pool2d(x, 2)

        x = nn.functional.relu(self.conv2(x))
        x = nn.functional.max_pool2d(x, 2)

        x = nn.functional.relu(self.conv3(x))
        x = nn.functional.max_pool2d(x, 2)

        x = x.view(x.size(0), -1)

        x = nn.functional.relu(self.fc1(x))
        x = self.fc2(x)

        return x

In [None]:
LEARNING_RATES = [0.01, 0.001, 0.0001]
BATCH_SIZES = [64, 128, 256]
OPTMIZERS = ['SGD', 'Adam']
IMAGES_SIZES = [32, 64, 128]
EPOCHS = [100]

# this function generates a simple CNN model with the given image size
def generate_simple_cnn_model(image_size):
    return SimpleCNN(CLASSES_COUNT, image_size)

# find the best model using different hyperparameters combination and keep the best one (based on accuracy of validation set)
best_simple_cnn_model, best_params = find_best_model(generate_simple_cnn_model)

# display the summary of the best model (architecture, parameters, etc.)
summary(best_simple_cnn_model, input_size=(best_params['batch_size'], 3, best_params['image_size'], best_params['image_size']))

# load the test dataset
_, _, test_loader = get_dataloaders(best_params['image_size'], 128)

# evaluate the best model on the test set
evaluate_model(best_simple_cnn_model, test_loader)

In [None]:
%tensorboard --logdir runs/SimpleCNN --port=6006

## ResNet: Advanced Model for Robust Feature Learning

### Introduction
The ResNet architecture is designed for tasks where network depth is crucial for capturing intricate patterns. It is built from residual blocks: instead of learning a full mapping directly, the stacked layers in each block learn a residual function that is added back to the block's input through a skip connection. This makes very deep networks much easier to train and mitigates the vanishing-gradient problem. Residual connections of this kind are now common well beyond image classification, including in natural language processing models.

### Architecture

#### Convolutional Layer and Batch Normalization
- **Conv1**: A 3x3 convolutional layer with 64 filters, with stride 1 and padding 1. Followed by Batch Normalization and ReLU activation.

#### Residual Blocks
Residual Blocks are the heart of the ResNet model. Each block contains:
- **Conv1**: A 3x3 convolutional layer.
- **BatchNorm1**: Followed by Batch Normalization.
- **Conv2**: Another 3x3 convolutional layer.
- **BatchNorm2**: Followed by Batch Normalization.
- **Shortcut**: A shortcut connection that can bypass one or more layers during the forward and backward passes.
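
Written as a formula, and assuming the ordering used by the `ResidualBlock` class in this notebook (ReLU after the first batch normalization and again after the addition), a block maps its input $\mathbf{x}$ to:

$$
\mathbf{y} = \mathrm{ReLU}\Big(\mathrm{BN}_2\big(\mathrm{Conv}_2\big(\mathrm{ReLU}(\mathrm{BN}_1(\mathrm{Conv}_1(\mathbf{x})))\big)\big) + S(\mathbf{x})\Big)
$$

Here $S$ denotes the shortcut, a symbol introduced only for this formula. It is the identity when the stride is 1 and the number of input and output channels match. Otherwise it is a 1x1 convolution followed by Batch Normalization, so that both terms of the sum have the same shape.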

ResNet uses several such residual blocks and groups them into four layers:
- **Layer 1**: Consisting of `num_blocks[0]` residual blocks with 64 filters.
- **Layer 2**: Consisting of `num_blocks[1]` residual blocks with 128 filters.
- **Layer 3**: Consisting of `num_blocks[2]` residual blocks with 256 filters.
- **Layer 4**: Consisting of `num_blocks[3]` residual blocks with 512 filters.

Each layer may change the dimensions of its input tensor, typically by down-sampling the spatial dimensions.

#### Fully Connected Layer
- **Linear**: A fully connected layer that outputs to the number of classes. The input dimension is dynamically calculated based on the image size and other architectural details.

### Forward Propagation
1. The input image first passes through an initial convolutional layer (`conv1`), followed by Batch Normalization and ReLU activation.
2. The image then goes through four layers of residual blocks (`layer1`, `layer2`, `layer3`, `layer4`), each having a different number of blocks and filters.
3. Finally, an average pooling layer condenses the feature maps.
4. The output is flattened and passed through the fully connected layer to produce the class scores.

### Why ResNet?
The ResNet architecture's main advantage is its ability to train very deep networks by leveraging residual learning. The original ResNet models won the ILSVRC 2015 image classification challenge, and the design has since proven effective across many applications. It is particularly useful for tasks that require the model to learn from complex and nuanced data.

These properties make it a strong candidate for the scene classification task in this project.


In [None]:
# Define the Residual Block
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        
        # First convolutional layer in the residual block
        # Followed by Batch Normalization
        self.conv1 = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=3,
            stride=stride,
            padding=1,
            bias=False,
        )
        self.bn1 = nn.BatchNorm2d(out_channels)
        
        # Second convolutional layer in the residual block
        # Followed by Batch Normalization
        self.conv2 = nn.Conv2d(
            out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False
        )
        self.bn2 = nn.BatchNorm2d(out_channels)
        
        # Shortcut connection to match dimensions
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(
                    in_channels, out_channels, kernel_size=1, stride=stride, bias=False
                ),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        # Forward pass
        out = F.relu(self.bn1(self.conv1(x)))  # First Conv -> BN -> ReLU
        out = self.bn2(self.conv2(out))  # Second Conv -> BN
        out += self.shortcut(x)  # Add the shortcut
        out = F.relu(out)  # Final ReLU
        return out


class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10, img_size=32):
        super(ResNet, self).__init__()
        self.name = "ResNet"
        self.in_channels = 64

        self.feature_size = img_size // 32

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)

        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)

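        # After three stride-2 layers the feature map is (img_size // 8) pixels wide.
        # Average pooling with a window of feature_size (img_size // 32) then leaves
        # a 4x4 map, so this evaluates to 512 * 16 = 8192 when img_size is a
        # multiple of 32. An img_size below 32 gives feature_size == 0 and raises
        # ZeroDivisionError here.
        last_layer_output = 512 * ((img_size * img_size * 8) // (512 * self.feature_size * self.feature_size))

        # Two fully connected layers: last_layer_output -> 512 -> num_classes
        self.linear = nn.Linear(last_layer_output, 512)
        self.linear2 = nn.Linear(512, num_classes)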

    def _make_layer(self, block, out_channels, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channels, out_channels, stride))
            self.in_channels = out_channels
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, self.feature_size)
        out = out.view(out.size(0), -1)
        out = F.relu(self.linear(out))
        out = self.linear2(out)
        return out

In [None]:
LEARNING_RATES = [0.001, 0.0001]
BATCH_SIZES = [32, 64]
OPTMIZERS = ['SGD', 'Adam']
IMAGES_SIZES = [32, 64]
EPOCHS = [100]

# this function generates a ResNet model with the given image size
def generate_resnet_model(image_size):
    return ResNet(ResidualBlock, [2, 2, 2, 2], CLASSES_COUNT, image_size)

# try the different hyperparameter combinations and keep the best model (based on validation accuracy)
best_resnet_model, best_params = find_best_model(generate_resnet_model)

# display the summary of the best model (architecture, parameters, etc.)
summary(best_resnet_model, input_size=(best_params['batch_size'], 3, best_params['image_size'], best_params['image_size']))

# load the test dataset
_, _, test_loader = get_dataloaders(
        best_params['image_size'], 128
    )

# evaluate the best model on the test set
evaluate_model(best_resnet_model, test_loader)

In [None]:
%tensorboard --logdir runs/ResNet --port=6007

## DenseNet: Highly Efficient and Compact Architecture for Deep Learning

### Introduction
DenseNet, or Densely Connected Convolutional Network, is designed to improve the flow of information and gradients between layers in deep neural networks. Within each dense block, every layer receives the concatenated feature maps of all preceding layers as its input, so earlier features stay directly available to later layers. This architecture is parameter-efficient and has shown excellent performance in tasks like image classification.

### Architecture

#### Initial Convolutional Layer
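- **init_conv**: An initial 7x7 convolutional layer with 64 filters, a stride of 2, and padding of 3. This is followed by Batch Normalization and a ReLU activation function.
- **pool**: A 3x3 max pooling layer with a stride of 2 and padding of 1, which halves the spatial dimensions again.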

#### Dense Blocks
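- **DenseBlock**: A key component of DenseNet. It comprises multiple `ConvLayer` units, each of which outputs `growth_rate` feature maps that are concatenated and used as input for all subsequent layers within the block. In this implementation every block has 4 layers with a growth rate of 32.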
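
The concatenation means each layer in a block sees a wider input than the one before it. The sketch below works out those widths for the first block, using the values from the `DenseNet` class in the code cell further down (64 input channels, growth rate 32, 4 layers). The helper `dense_block_channels` is a hypothetical name used only for this illustration.

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Input width of every layer in a dense block, plus the block's output width."""
    # Layer i sees the block input plus the outputs of the i layers before it.
    layer_inputs = [in_channels + i * growth_rate for i in range(num_layers)]
    # The block returns the input concatenated with every layer's output.
    out_channels = in_channels + num_layers * growth_rate
    return layer_inputs, out_channels


layer_inputs, out_channels = dense_block_channels(64, growth_rate=32, num_layers=4)
print(layer_inputs)  # [64, 96, 128, 160]
print(out_channels)  # 192
```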

#### Transition Layers
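- **TransitionLayer**: These are interspersed between Dense Blocks and reduce the dimensions of the feature maps, helping to control the model's complexity. Each one applies a 1x1 `ConvLayer` that halves the number of channels, followed by 2x2 average pooling that halves the height and width.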
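
A short sketch of the channel bookkeeping across the whole network, assuming the configuration used in the `DenseNet` class in the code cell further down (64 stem channels, growth rate 32, 4 layers per block, channels halved by each transition). The function `densenet_channel_plan` is a hypothetical helper, not part of the notebook.

```python
def densenet_channel_plan(stem_channels=64, growth_rate=32, num_layers=4, num_blocks=4):
    """Return (stage name, output channels) for each dense block and transition."""
    plan = []
    channels = stem_channels
    for block in range(1, num_blocks + 1):
        channels += num_layers * growth_rate  # dense block concatenates new maps
        plan.append((f"dense{block}", channels))
        if block < num_blocks:                # no transition after the last block
            channels //= 2                    # transition halves the channel count
            plan.append((f"trans{block}", channels))
    return plan


for name, channels in densenet_channel_plan():
    print(f"{name}: {channels}")
```

The last entry, 248 channels after `dense4`, is the input size of the first fully connected layer.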

#### Fully Connected Layer
- **fc**: A fully connected layer with 512 neurons and ReLU activation, which takes the 248-channel pooled feature vector as input.
- **fc2**: A fully connected layer that outputs the number of classes.

### Forward Propagation
1. The input passes through the `init_conv` layer for initial feature extraction, followed by a max pooling layer (`pool`).
2. The processed input is then fed through a sequence of Dense Blocks and Transition Layers: `dense1 -> trans1 -> dense2 -> trans2 -> dense3 -> trans3 -> dense4`.
3. Adaptive Average Pooling condenses the final feature maps into a single vector.
4. This vector is passed through `fc` with ReLU activation and then through `fc2` to output class scores.
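
The spatial side of this pipeline can be sketched in the same way. The snippet below follows the height and width of the feature map, including the max pooling step that the `DenseNet` class applies right after `init_conv`. Dense blocks are skipped because their 3x3 convolutions with padding 1 keep the spatial size. The helper names are hypothetical.

```python
def window_out(size, kernel, stride, padding=0):
    """Output size of a convolution or pooling window (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_densenet_spatial(img_size):
    """Return (stage, spatial size) for each stage that changes the size."""
    size = window_out(img_size, kernel=7, stride=2, padding=3)  # init_conv
    steps = [("init_conv", size)]
    size = window_out(size, kernel=3, stride=2, padding=1)      # max pooling
    steps.append(("pool", size))
    for i in (1, 2, 3):
        size = window_out(size, kernel=2, stride=2)             # 2x2 average pooling
        steps.append((f"trans{i}", size))
    steps.append(("adaptive_avg_pool", 1))                      # always pooled to 1x1
    return steps


for name, size in trace_densenet_spatial(64):
    print(f"{name}: {size} x {size}")
```

Because the final pooling is adaptive, the vector that reaches the fully connected layers has the same length for any input size large enough to survive the five halvings.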

### Why DenseNet?
DenseNet's unique architecture makes it very parameter-efficient, mitigates the vanishing gradient problem, and encourages feature reuse, making it a strong choice for various computer vision tasks.


In [None]:
class ConvLayer(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super(ConvLayer, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        self.bn = nn.BatchNorm2d(out_channels)
    
    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = F.relu(x)
        return x
    
class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate, num_layers):
        super(DenseBlock, self).__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(ConvLayer(in_channels + i * growth_rate, growth_rate))
    
    def forward(self, x):
        outputs = [x]
        for layer in self.layers:
            out = layer(torch.cat(outputs, dim=1))
            outputs.append(out)
        return torch.cat(outputs, dim=1)
    
class TransitionLayer(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(TransitionLayer, self).__init__()
        self.conv = ConvLayer(in_channels, out_channels, kernel_size=1, stride=1, padding=0)
        self.pool = nn.AvgPool2d(2, stride=2)
    
    def forward(self, x):
        x = self.conv(x)
        x = self.pool(x)
        return x

class DenseNet(nn.Module):
    def __init__(self, input_size, num_classes):
        super(DenseNet, self).__init__()
        self.name = "DenseNet"
        self.input_size = input_size
        self.init_conv = ConvLayer(3, 64, kernel_size=7, stride=2, padding=3)
        self.pool = nn.MaxPool2d(3, stride=2, padding=1)
        
        
        self.dense1 = DenseBlock(64, 32, 4)
        self.trans1 = TransitionLayer(192, 96)  # 64 + 4 * 32 = 192 | 192 / 2 = 96
        self.dense2 = DenseBlock(96, 32, 4)
        self.trans2 = TransitionLayer(224, 112)  # 96 + 4 * 32 = 224 | 224 / 2 = 112
        self.dense3 = DenseBlock(112, 32, 4)
        self.trans3 = TransitionLayer(240, 120)  # 112 + 4 * 32 = 240 | 240 / 2 = 120
        self.dense4 = DenseBlock(120, 32, 4)
        
        self.fc = nn.Linear(248, 512)  # 120 + 4 * 32 = 248
        self.fc2 = nn.Linear(512, num_classes)
    
    def forward(self, x):
        x = self.init_conv(x)
        x = self.pool(x)
        
        x = self.dense1(x)
        x = self.trans1(x)
        x = self.dense2(x)
        x = self.trans2(x)
        x = self.dense3(x)
        x = self.trans3(x)
        x = self.dense4(x)
        
        x = F.adaptive_avg_pool2d(x, (1, 1))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc(x))
        x = self.fc2(x)
        
        return x



In [None]:
LEARNING_RATES = [0.001, 0.0001]
BATCH_SIZES = [128, 256]
OPTMIZERS = ['SGD', 'Adam']
IMAGES_SIZES = [64, 128]
EPOCHS = [100]

# this function generates a DenseNet model with the given image size
def generate_densenet_model(image_size):
    # DenseNet's constructor takes (input_size, num_classes), in that order
    return DenseNet(image_size, CLASSES_COUNT)

# try the different hyperparameter combinations and keep the best model (based on validation accuracy)
best_densenet_model, best_params = find_best_model(generate_densenet_model)

# display the summary of the best model (architecture, parameters, etc.)
summary(best_densenet_model, input_size=(best_params['batch_size'], 3, best_params['image_size'], best_params['image_size']))

# load the test dataset
_, _, test_loader = get_dataloaders(
        best_params['image_size'], 128
    )

# evaluate the best model on the test set
evaluate_model(best_densenet_model, test_loader)

In [None]:
%tensorboard --logdir runs/DenseNet --port=6008

In [None]:
%tensorboard --logdir runs --port=6009