# Deep Learning Model Architecture Exploration and Performance Evaluation

#### See the `data_EDA_and_CML_benchmarking.ipynb` notebook for parts 1 and 2, which include the deep learning dataset preparation and CML benchmarking, respectively

## 3.1 Model Architecture Exploration: Justification

##### Overall, the four initial deep learning models implemented in the `data_EDA_and_CML_benchmarking.ipynb` notebook (FCN, CNN, ResNet, and RNN) performed poorly. The CNN achieved the highest accuracy, above 25%. Although this is still low, it makes convolutional feature extraction the most promising starting point, so we focus on the three CNN-based architectures below:

1. VGG16 with Fine-Tuning (a deep CNN)
* *Why?* VGG16 is a deep CNN with 16 weight layers (13 convolutional and 3 fully connected) that excels at deep feature extraction, capturing complex visual features through stacks of small 3x3 convolutional filters. Starting from weights pre-trained on ImageNet and fine-tuning them on the `PHIPS_CrystalHabitAI_Dataset.nc` image dataset lets VGG16 adapt to our specific classification task. This transfer learning is particularly valuable because the dataset is relatively small. VGG16's depth and fine-tuning capability should help overcome the low accuracy of the initial models by learning more intricate patterns specific to our ice crystal images.

2. InceptionV3 (a different variation of a deep CNN)
* *Why?* InceptionV3 excels at multi-scale feature learning: its Inception modules apply convolutional filters of several sizes in parallel, capturing visual information at different scales within the same layer. Despite its depth, it is computationally efficient thanks to factorized convolutions (for example, replacing a 5x5 convolution with two 3x3 convolutions, or an nxn convolution with a 1xn followed by an nx1) and 1x1 convolutions that reduce the number of channels before the more expensive filters. These techniques let it extract richer and more diverse features than simpler models without excessive computational cost, potentially leading to significant improvements in classification accuracy on the `PHIPS_CrystalHabitAI_Dataset.nc` image dataset.

3. Convolutional Recurrent Neural Network (CRNN) with Attention Mechanism (a hybrid of CNN and RNN)
* *Why?* A CRNN combines a CNN for spatial feature extraction with a recurrent network (such as an LSTM or GRU) that reads the CNN's feature map as a sequence, for example column by column, to capture dependencies across the image. Adding an attention layer lets the model weight the most relevant parts of that sequence, strengthening the features it learns and potentially improving classification. Finally, this hybrid goes beyond the standard architectures tested so far and may capture patterns and relationships in our ice crystal images that the previous models missed.
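The efficiency arguments for VGG16 (stacks of small 3x3 filters) and InceptionV3 (factorized convolutions) come down to simple arithmetic. This is a minimal sketch assuming stride-1 convolutions, no bias terms, and a hypothetical constant channel count `C = 64`; the helper names `conv_params` and `receptive_field` are illustrative, not Keras functions.

```python
def conv_params(kh, kw, c_in, c_out):
    """Weights in one 2-D convolution layer (bias terms ignored)."""
    return kh * kw * c_in * c_out


def receptive_field(kernel_sizes):
    """Receptive field (one side) of stacked stride-1 convolutions: 1 + sum(k - 1)."""
    return 1 + sum(k - 1 for k in kernel_sizes)


C = 64  # hypothetical channel count, kept the same in every layer

# VGG16 idea: two stacked 3x3 convolutions see the same 5x5 region as one 5x5
one_5x5 = conv_params(5, 5, C, C)
two_3x3 = 2 * conv_params(3, 3, C, C)

# InceptionV3 idea: replace a 7x7 convolution with a 1x7 followed by a 7x1
one_7x7 = conv_params(7, 7, C, C)
asym_pair = conv_params(1, 7, C, C) + conv_params(7, 1, C, C)

print(f"receptive field of two 3x3 layers: {receptive_field([3, 3])}")
print(f"one 5x5: {one_5x5} weights vs. two 3x3: {two_3x3}")
print(f"one 7x7: {one_7x7} weights vs. 1x7 + 7x1: {asym_pair}")
```

Two 3x3 layers need 72% of the weights of one 5x5 layer (18C² vs. 25C²) while adding an extra nonlinearity, and the 1x7 + 7x1 pair needs 2/7 of the weights of a full 7x7 layer (14C² vs. 49C²).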

#### With these DL architectures, we aim to overcome the low performance of the initial DL models by leveraging deeper networks, more advanced feature extraction techniques, and a hybrid combination of neural network types tailored to our image classification task.
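The CRNN idea in item 3 can be sketched in NumPy to show the data flow: a CNN feature map is read column by column as a sequence, and an attention step turns per-column scores into weights that sum to one and pool the sequence into a single vector. This is a minimal sketch under stated assumptions: the recurrent layer is omitted (the column vectors stand in for its hidden states), the feature-map shape and the parameters `W` and `v` are random placeholders, and the scoring is additive attention pooling, which differs from the dot-product `Attention` layer imported in Section 3.2. All function and variable names are hypothetical.

```python
import numpy as np


def feature_map_to_sequence(fmap):
    """Read a CNN feature map (H, W, C) column by column: W steps, each a vector of size H*C."""
    h, w, c = fmap.shape
    return fmap.transpose(1, 0, 2).reshape(w, h * c)


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def attention_pool(hidden, W, v):
    """Additive attention pooling over time steps.

    hidden: (T, d) sequence (stand-in for RNN hidden states)
    W: (d, a) projection matrix, v: (a,) scoring vector
    """
    scores = np.tanh(hidden @ W) @ v  # one relevance score per time step, shape (T,)
    weights = softmax(scores)          # attention weights, shape (T,), summing to 1
    context = weights @ hidden         # weighted sum of the time steps, shape (d,)
    return context, weights


demo_rng = np.random.default_rng(0)
demo_fmap = demo_rng.normal(size=(4, 6, 8))    # hypothetical feature map: H=4, W=6, C=8
demo_seq = feature_map_to_sequence(demo_fmap)  # (6, 32): six column vectors
demo_W = demo_rng.normal(size=(32, 16)) * 0.1  # hypothetical, untrained parameters
demo_v = demo_rng.normal(size=16)
demo_context, demo_weights = attention_pool(demo_seq, demo_W, demo_v)
print(demo_seq.shape, demo_context.shape, round(float(demo_weights.sum()), 6))
```

In a trained model, `W` and `v` would be learned, and the attention weights would indicate which image columns the classifier relies on most.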

## 3.2 Imports and Environment Setup

In [1]:
# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr
import time
import os

In [2]:
# TensorFlow and Keras
import tensorflow as tf
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import (Dense, Dropout, Flatten, Conv2D, MaxPooling2D, 
                                     GlobalAveragePooling2D, Input, SimpleRNN, LSTM, TimeDistributed, 
                                     Bidirectional, Attention)
from tensorflow.keras.applications import VGG16, InceptionV3
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.losses import Loss
from tensorflow.keras.preprocessing.image import smart_resize

In [3]:
# Sklearn for metrics
from sklearn.metrics import (classification_report, confusion_matrix, accuracy_score, 
                             f1_score, precision_score, recall_score, mean_squared_error, roc_curve, auc)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

In [4]:
# Set random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

## 3.3 Data Loading and Preprocessing
##### Loading, preprocessing, and splitting are organized in a `DatasetLoader` class

In [6]:
# Define a class for loading and preprocessing the dataset
class DatasetLoader:
    def __init__(self, file_path):
        self.file_path = file_path

    def load_data(self):
        # Load the dataset with xarray; the context manager closes the file handle
        # once the arrays have been copied into memory with .values
        with xr.open_dataset(self.file_path) as ds:
            images = ds['image_array'].values  # Shape: (samples, height, width)
            labels = ds['label'].values        # Shape: (samples,)
        return images, labels

    def preprocess_data(self, images, labels):
        # Encode string labels into integers
        label_encoder = LabelEncoder()
        labels_encoded = label_encoder.fit_transform(labels)
        num_classes = len(np.unique(labels_encoded))

        # One-hot encode the labels
        labels_one_hot = to_categorical(labels_encoded, num_classes)

        # Expand dimensions of images for channels (grayscale images)
        images_expanded = np.expand_dims(images, axis=-1)  # Shape: (samples, height, width, 1)

        # Normalize images to [0, 1] (assumes 8-bit pixel values in the range 0-255)
        images_normalized = images_expanded / 255.0

        return images_normalized, labels_one_hot, labels_encoded, num_classes, label_encoder

    def split_data(self, images, labels_encoded, labels_one_hot):
        # Split data into training (80%), validation (10%), and test (10%) sets, stratified by class
        X_train, X_temp, y_train_encoded, y_temp_encoded, y_train_one_hot, y_temp_one_hot = train_test_split(
            images, labels_encoded, labels_one_hot, test_size=0.2, random_state=42, stratify=labels_encoded)
        X_val, X_test, y_val_encoded, y_test_encoded, y_val_one_hot, y_test_one_hot = train_test_split(
            X_temp, y_temp_encoded, y_temp_one_hot, test_size=0.5, random_state=42, stratify=y_temp_encoded)

        return (X_train, y_train_encoded, y_train_one_hot), \
               (X_val, y_val_encoded, y_val_one_hot), \
               (X_test, y_test_encoded, y_test_one_hot)

# Instantiate the DatasetLoader and load the data (update the path to your local copy of the dataset)
data_loader = DatasetLoader('/Users/valeriagarcia/Desktop/ESS569_Snowflake_Classification/PHIPS_CrystalHabitAI_Dataset.nc')
images, labels = data_loader.load_data()
images, labels_one_hot, labels_encoded, num_classes, label_encoder = data_loader.preprocess_data(images, labels)
(X_train, y_train_encoded, y_train_one_hot), \
(X_val, y_val_encoded, y_val_one_hot), \
(X_test, y_test_encoded, y_test_one_hot) = data_loader.split_data(images, labels_encoded, labels_one_hot)
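Because `split_data` calls `train_test_split` twice (20% held out, then the held-out part split in half), the result is an 80/10/10 split with class proportions preserved in every subset. A quick check on synthetic labels (three hypothetical, balanced classes, not the PHIPS data) also shows that indexing an identity matrix reproduces the one-hot encoding from `to_categorical`. The `demo_` names are placeholders chosen so the real `images` and labels are not overwritten.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for labels_encoded and images: 3 classes x 100 samples
demo_labels = np.repeat(np.arange(3), 100)
demo_one_hot = np.eye(3)[demo_labels]  # same result as to_categorical(demo_labels, 3)
demo_images = np.zeros((demo_labels.size, 8, 8, 1))

# First split: 80% train, 20% temporary hold-out (stratified by class)
X_tr, X_tmp, y_tr, y_tmp, oh_tr, oh_tmp = train_test_split(
    demo_images, demo_labels, demo_one_hot, test_size=0.2, random_state=42,
    stratify=demo_labels)
# Second split: the hold-out is divided equally into validation and test
X_va, X_te, y_va, y_te, oh_va, oh_te = train_test_split(
    X_tmp, y_tmp, oh_tmp, test_size=0.5, random_state=42, stratify=y_tmp)

print(len(X_tr), len(X_va), len(X_te))  # 240 30 30
print(np.bincount(y_tr), np.bincount(y_va), np.bincount(y_te))
```

Passing the integer labels, the one-hot labels, and the images to the same `train_test_split` call keeps all three aligned, which is why `split_data` can return encoded and one-hot targets side by side.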

## 3.4 Data Augmentation
##### Here, we create a data augmentation generator (`train_datagen`) that applies random transformations to the training data: rotations of up to 20 degrees, horizontal and vertical shifts of up to 10% of the image size, horizontal and vertical flips, and zooms of up to 10%. These transformations increase the diversity of the training set during training; validation and test images are not augmented.
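To make the flips and shifts listed above concrete, the sketch below applies them to a single 2-D image array in NumPy. It is an illustration under stated assumptions, not how `ImageDataGenerator` is implemented: shifts are whole pixels, exposed borders are filled with the nearest edge value (mirroring the generator's default `fill_mode='nearest'`), and rotation and zoom are left out. The function names are hypothetical.

```python
import numpy as np


def shift_with_edge_fill(img, dy, dx):
    """Shift a 2-D image down by dy and right by dx pixels (negative values shift up/left),
    filling the exposed border with the nearest edge pixel."""
    h, w = img.shape
    pad = max(abs(dy), abs(dx))
    padded = np.pad(img, pad, mode="edge")
    # Reading the window that starts at (pad - dy, pad - dx) moves the content by (dy, dx)
    top, left = pad - dy, pad - dx
    return padded[top:top + h, left:left + w]


def random_flip_and_shift(img, rng, max_shift_frac=0.1):
    """Flip horizontally and vertically with probability 0.5 each, then shift by up to
    max_shift_frac of the image size along each axis."""
    h, w = img.shape
    if rng.random() < 0.5:
        img = img[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]  # vertical flip
    max_dy, max_dx = int(max_shift_frac * h), int(max_shift_frac * w)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    return shift_with_edge_fill(img, dy, dx)


aug_rng = np.random.default_rng(42)
demo_img = np.arange(16, dtype=float).reshape(4, 4)
print(shift_with_edge_fill(demo_img, 0, 1))  # content moves one column to the right
print(random_flip_and_shift(np.zeros((50, 40)), aug_rng).shape)
```

Flips suit ice crystal images because a crystal's habit does not depend on its orientation in the frame, so the label is unchanged by the transformation.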

In [7]:
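# Define data augmentation for training data
# (the TensorFlow docs no longer recommend ImageDataGenerator for new code;
# Keras preprocessing layers such as RandomFlip and RandomRotation are the suggested replacement)
train_datagen = ImageDataGenerator(
    rescale=1.0,  # No-op: images were already scaled to [0, 1] in preprocess_data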
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.1
)

# No augmentation for validation and test data (rescale=1.0 leaves pixel values unchanged)
val_datagen = ImageDataGenerator(rescale=1.0)
test_datagen = ImageDataGenerator(rescale=1.0)

# Create data generators (seed=42 fixes the training shuffle order for reproducibility)
train_generator = train_datagen.flow(X_train, y_train_one_hot, batch_size=32, seed=42)
val_generator = val_datagen.flow(X_val, y_val_one_hot, batch_size=32)
# shuffle=False keeps test batches in the same order as y_test_encoded for later evaluation
test_generator = test_datagen.flow(X_test, y_test_one_hot, batch_size=32, shuffle=False)
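`preprocess_data` produces single-channel images in [0, 1], whereas the ImageNet weights for VGG16 and InceptionV3 expect three-channel inputs (at least 32x32 for VGG16 and 75x75 for InceptionV3 when the classification head is excluded) scaled the way each network was trained. In practice, the images would be resized (for example, with the imported `smart_resize`) and passed through `tf.keras.applications.vgg16.preprocess_input` or `tf.keras.applications.inception_v3.preprocess_input` on a 0-255 scale. The NumPy sketch below only spells out what those conversions amount to, assuming inputs already scaled to [0, 1]; the function names are hypothetical.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by Keras "caffe"-style preprocessing (VGG16)
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68])


def gray_to_rgb(batch):
    """(N, H, W, 1) -> (N, H, W, 3) by repeating the single grayscale channel."""
    return np.repeat(batch, 3, axis=-1)


def vgg16_style_input(batch01):
    """Mirror vgg16.preprocess_input for [0, 1] grayscale batches."""
    x = gray_to_rgb(batch01) * 255.0  # back to the 0-255 range
    x = x[..., ::-1]                  # RGB -> BGR (channels are identical here; kept for fidelity)
    return x - IMAGENET_BGR_MEAN       # zero-center each channel, no scaling


def inception_style_input(batch01):
    """Mirror inception_v3.preprocess_input ([-1, 1] scaling) for [0, 1] grayscale batches."""
    return gray_to_rgb(batch01) * 2.0 - 1.0


demo_batch = np.full((2, 80, 80, 1), 0.5)  # hypothetical batch of mid-gray 80x80 images
print(vgg16_style_input(demo_batch).shape)  # (2, 80, 80, 3)
print(inception_style_input(demo_batch).min(), inception_style_input(demo_batch).max())
```

A common alternative to copying the channel is to keep a single input channel and let a trainable 1x1 convolution map it to three channels in front of the pretrained backbone.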


## 3.5 Custom Physics-Informed Loss Function
