# Deep Learning Model Architecture Exploration and Performance Evaluation

#### See the `data_EDA_and_CML_benchmarking.ipynb` notebook for parts 1 and 2, which include the deep learning dataset preparation and CML benchmarking, respectively

## 3.1 Model Architecture Exploration: Justification

##### Overall, the performances of the initial four deep learning models implemented in the `data_EDA_and_CML_benchmarking.ipynb` notebook, which included FCN, CNN, ResNet, and RNN, were poor. Among them, the CNN had the highest accuracy, exceeding 25%. While this value is still low, we will focus on implementing architectures that utilize CNNs, focusing on the three architectures listed below:  

1. VGG16 with Fine-Tuning (a deep CNN)
* *Why?* A VGG16 is a deep CNN with 16 layers that excels at deep feature extraction, effectively capturing complex visual features through small 3x3 convolutional filters. By using pre-trained weights on ImageNet and fine-tuning them on the `PHIPS_CrystalHabitAI_Dataset.nc` image dataset, VGG16 can adapt to our specific classification task, improving performance even with limited data, as the `PHIPS_CrystalHabitAI_Dataset.nc` image dataset is relatively small. The VGG16's depth and fine-tuning capabilities help overcome the low accuracy of initial models by learning more intricate patterns specific to our ice crystal images.

2. InceptionV3 (a different variation of a deep CNN)
* *Why?* This architecture excels at multi-scale feature learning, utilizing Inception modules to process multiple convolutional filter sizes in parallel, capturing visual information at different scales within the same layer. Despite its depth, InceptionV3 is computationally efficient due to techniques like factorized convolutions and dimension reductions, making it suitable for complex datasets without excessive computational cost. Its advanced architecture can extract richer and more diverse features than simpler models, potentially leading to significant improvements in classification accuracy on the `PHIPS_CrystalHabitAI_Dataset.nc` image dataset.

3. Convolutional Recurrent Neural Network (CRNN) with Attention Mechanism (a hyrbid of CNN and RNN)
* *Why?* CRNN integrates Convolutional Neural Networks for spatial feature extraction with Recurrent Neural Networks (like LSTM or GRU) to capture sequential or temporal dependencies in the data. Incorporating attention layers enables the model to focus on the most relevant parts of the input images, enhancing its ability to learn important features and improving classification results. Lastly, this architecture offers a novel solution that goes beyond standard models, potentially capturing complex patterns and relationships in our ice crystal images that previous models may have missed.

#### By using these DL architectures, we will address the low performance of the initial DL models by leveraging deeper networks, advanced feature extraction techniques, and innovative combinations of neural network types tailored to our image classification task. 

## 3.2 Imports and Environment Setup

In [1]:
# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr
import time
import os

In [2]:
# TensorFlow and Keras
import tensorflow as tf
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import (Dense, Dropout, Flatten, Conv2D, MaxPooling2D, 
                                     GlobalAveragePooling2D, Input, SimpleRNN, LSTM, TimeDistributed, 
                                     Bidirectional, Attention)
from tensorflow.keras.applications import VGG16, InceptionV3
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.losses import Loss
from tensorflow.keras.preprocessing.image import smart_resize
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Concatenate, Resizing, Reshape, Permute, Multiply, Activation, RepeatVector, Lambda

In [3]:
# Sklearn for metrics
from sklearn.metrics import (classification_report, confusion_matrix, accuracy_score, 
                             f1_score, precision_score, recall_score, mean_squared_error, roc_curve, auc)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

In [4]:
# Set random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

## 3.3 Data Loading and Preprocessing
##### organized using a `DatasetLoader` class

In [5]:
class DatasetLoader:
    def __init__(self, file_path):
        self.file_path = file_path

    def load_data(self):
        # Load the dataset using xarray
        ds = xr.open_dataset(self.file_path)
        images = ds['image_array'].values  # Shape: (samples, height, width)
        labels = ds['label'].values        # Shape: (samples,)
        temps = ds['temperature'].values   # Shape: (samples,)
        return images, labels, temps

    def preprocess_data(self, images, labels):
        # Encode string labels into integers
        label_encoder = LabelEncoder()
        labels_encoded = label_encoder.fit_transform(labels)
        num_classes = len(np.unique(labels_encoded))

        # One-hot encode the labels
        labels_one_hot = to_categorical(labels_encoded, num_classes)

        # Expand dimensions of images for channels (grayscale images)
        images_expanded = np.expand_dims(images, axis=-1)  # Shape: (samples, height, width, 1)

        # Normalize images to [0, 1]
        images_normalized = images_expanded / 255.0

        return images_normalized, labels_one_hot, labels_encoded, num_classes, label_encoder

    def split_data(self, images, labels_encoded, labels_one_hot, temps):
        # First split: training set and temp set
        X_train, X_temp, y_train_encoded, y_temp_encoded, y_train_one_hot, y_temp_one_hot, temp_train, temp_temp = train_test_split(
            images, labels_encoded, labels_one_hot, temps, test_size=0.2, random_state=42, stratify=labels_encoded)

        # Second split: validation set and test set
        X_val, X_test, y_val_encoded, y_test_encoded, y_val_one_hot, y_test_one_hot, temp_val, temp_test = train_test_split(
            X_temp, y_temp_encoded, y_temp_one_hot, temp_temp, test_size=0.5, random_state=42, stratify=y_temp_encoded)

        return (X_train, y_train_encoded, y_train_one_hot, temp_train), \
               (X_val, y_val_encoded, y_val_one_hot, temp_val), \
               (X_test, y_test_encoded, y_test_one_hot, temp_test)

In [6]:
# Instantiate the DatasetLoader and load the data
data_loader = DatasetLoader('/Users/valeriagarcia/Desktop/ESS569_Snowflake_Classification/PHIPS_CrystalHabitAI_Dataset.nc')
images, labels, temps = data_loader.load_data()
images, labels_one_hot, labels_encoded, num_classes, label_encoder = data_loader.preprocess_data(images, labels)

# Split data including temperatures
(X_train, y_train_encoded, y_train_one_hot, temp_train), \
(X_val, y_val_encoded, y_val_one_hot, temp_val), \
(X_test, y_test_encoded, y_test_one_hot, temp_test) = data_loader.split_data(images, labels_encoded, labels_one_hot, temps)

In [7]:
# Check Shapes and Data Types
print("X_train shape:", X_train.shape)
print("y_train_one_hot shape:", y_train_one_hot.shape)
print("temp_train shape:", temp_train.shape)
print("Data type of temp_train:", temp_train.dtype)

X_train shape: (352, 1024, 1360, 1)
y_train_one_hot shape: (352, 11)
temp_train shape: (352,)
Data type of temp_train: float64


## 3.4 Data Augmentation
##### Here, we create a data augmentation generator (`data_generator`) for the training data that applies random transformations—including rotations up to 20 degrees, horizontal and vertical shifts up to 10% of the image size, horizontal and vertical flips, zooms up to 10%—to enhance the diversity of the dataset during training.

In [50]:
def data_generator(images, labels_one_hot, temperatures, batch_size, augment=False):
    datagen = ImageDataGenerator(
        rescale=1.0,
        rotation_range=20 if augment else 0,
        width_shift_range=0.1 if augment else 0,
        height_shift_range=0.1 if augment else 0,
        horizontal_flip=augment,
        vertical_flip=augment,
        zoom_range=0.1 if augment else 0
    )
    
    images = np.array(images)
    labels_one_hot = np.array(labels_one_hot)
    temperatures = np.array(temperatures)
    num_samples = images.shape[0]
    indices = np.arange(num_samples)
    
    while True:
        if augment:
            np.random.shuffle(indices)
        
        for start_idx in range(0, num_samples, batch_size):
            end_idx = min(start_idx + batch_size, num_samples)
            batch_indices = indices[start_idx:end_idx]
            
            x_batch = images[batch_indices]
            y_batch = labels_one_hot[batch_indices]
            temp_batch = temperatures[batch_indices]
            
            x_batch_augmented = np.empty_like(x_batch)
            for i, img in enumerate(x_batch):
                x_batch_augmented[i] = datagen.random_transform(img)
            
            # Ensure correct output structure
            yield (x_batch_augmented, temp_batch), y_batch

# Create data generators
batch_size = 32

train_generator = data_generator(
    X_train, y_train_one_hot, temp_train, batch_size, augment=True
)
val_generator = data_generator(
    X_val, y_val_one_hot, temp_val, batch_size, augment=False
)
test_generator = data_generator(
    X_test, y_test_one_hot, temp_test, batch_size, augment=False
)

In [32]:
# Check if the data generators produce batches with the expected shapes, types, and values

# Fetch a batch from the train generator
(data_batch, temp_batch), label_batch = next(train_generator)

# Verify shapes
assert data_batch.shape == (32, 1024, 1360, 1), f"Data batch shape mismatch: {data_batch.shape}"
assert temp_batch.shape == (32,), f"Temperature batch shape mismatch: {temp_batch.shape}"
assert label_batch.shape == (32, 11), f"Label batch shape mismatch: {label_batch.shape}"

print("Data generator outputs verified successfully.")


Data generator outputs verified successfully.


## 3.5 Physics-Informed Loss Function with Probabilistic Class Likelihoods

##### In the cloud microphysics community, it is well-understood from laboratory studies that different ice crystal habits have a tendency to grow within a specific range of temperatures and relative humidity conditions. An example of the different temperature regimes is provided in Varcie et al. 2024:
* *polycrystalline growth layer* (growth of polycrstals) -  may occur when the ambient temperature is below -18˚C
* *dendritic growth layer* (growth of dendrites) - may occur when temperature is warmer than or equal to -18˚C and less than or equal to -12˚C
* *plate growth layer* (growth of plates) - may occur where temperature is warmer than -12˚C and less than -8˚C
* *needle growth layer* (growth of needles) - may occur where temperatures are warmer than or equal to -8˚C and less than -3˚C

##### As our dataset only contains temperature in the metadata, we will focus on leveraging temperature and the temperature regimes above to create a custom loss function. Note the above temperature ranges refer to temperature layers over which certain ice crystals *may* grow, assuming other conditions, such as high ice/water supersaturations, are met. Moreover, particles that growth at cooler temperatures may still be observed at warmer temperatures due to sedimentation of the particles. 

##### Objective: Incorporate temperature-dependent class probabilities into the loss function to guide the model based on physical principles while allowing for natural variability.

In [10]:
#### Create a mapping from class labels to their corresponding temperatures ####

# Get unique class labels
unique_classes = np.unique(labels_encoded)

# Initialize a dictionary to hold temperatures for each class
class_temperatures = {class_idx: [] for class_idx in unique_classes}

# Populate the dictionary
for idx, class_idx in enumerate(labels_encoded):
    temp = temps[idx]
    class_temperatures[class_idx].append(temp)

#### Calculate the mean and standard deviation for the temperatures in each class ####

# Initialize the dictionary to hold temperature statistics for each class
class_temperature_stats = {}

# Compute mean and standard deviation for each class, ignoring NaNs
for class_idx in unique_classes:
    temps_ = np.array(class_temperatures[class_idx])
    
    # Compute mean and standard deviation while ignoring NaNs
    mean_temp = np.nanmean(temps_)
    std_temp = np.nanstd(temps_)
    
    # Handle case where all temps are NaN
    if np.isnan(mean_temp) or np.isnan(std_temp):
        class_temperature_stats[class_idx] = {'mean': np.nan, 'std': np.nan}
    else:
        class_temperature_stats[class_idx] = {'mean': mean_temp, 'std': std_temp}

  mean_temp = np.nanmean(temps_)
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,


In [11]:
# Confirm the number of unique classes
actual_num_classes = len(np.unique(labels_encoded))
print(f"Expected num_classes: {num_classes}, Actual num_classes: {actual_num_classes}")

# Ensure consistency
if actual_num_classes != num_classes:
    raise ValueError("Mismatch between num_classes and the actual number of classes in the dataset!")
else:
    print("Number of classes matches the dataset.")

Expected num_classes: 11, Actual num_classes: 11
Number of classes matches the dataset.


In [12]:
# Print the mapping of integer-encoded labels to original class labels
print("Class Mappings (Integer to Original Label):")
for class_idx, class_label in enumerate(label_encoder.classes_):
    print(f"Class {class_idx}: {class_label}")

# Check the number of samples per class in the dataset
print("\nClass Distribution in Dataset:")
unique_classes, class_counts = np.unique(labels_encoded, return_counts=True)
for class_idx, count in zip(unique_classes, class_counts):
    print(f"Class {class_idx}: {count} samples")


Class Mappings (Integer to Original Label):
Class 0: aggregate
Class 1: bullet_rosette
Class 2: capped_column
Class 3: column
Class 4: dendrite
Class 5: graupel
Class 6: needle
Class 7: plate
Class 8: polycrystal
Class 9: side_plane
Class 10: tiny

Class Distribution in Dataset:
Class 0: 40 samples
Class 1: 40 samples
Class 2: 40 samples
Class 3: 40 samples
Class 4: 40 samples
Class 5: 40 samples
Class 6: 40 samples
Class 7: 40 samples
Class 8: 40 samples
Class 9: 40 samples
Class 10: 40 samples


In [13]:
# Verify class temperature statistics
for class_idx, stats in class_temperature_stats.items():
    if np.isnan(stats['mean']) or np.isnan(stats['std']):
        print(f"Class {class_idx}: Temperature stats are missing (mean/std are NaN).")
    else:
        print(f"Class {class_idx}: Mean Temp = {stats['mean']:.2f}, Std Temp = {stats['std']:.2f}")

# Spot-check the temperature values for a specific class
class_to_check = 0  # Replace with a class index to check
print(f"Temperature values for Class {class_to_check}: {class_temperatures[class_to_check]}")

Class 0: Mean Temp = -11.28, Std Temp = 4.97
Class 1: Temperature stats are missing (mean/std are NaN).
Class 2: Mean Temp = -11.40, Std Temp = 2.89
Class 3: Mean Temp = -13.08, Std Temp = 4.77
Class 4: Mean Temp = -14.17, Std Temp = 2.99
Class 5: Mean Temp = -11.17, Std Temp = 4.74
Class 6: Mean Temp = -3.14, Std Temp = 1.18
Class 7: Mean Temp = -9.33, Std Temp = 4.09
Class 8: Mean Temp = -13.55, Std Temp = 4.28
Class 9: Mean Temp = -9.21, Std Temp = 4.76
Class 10: Mean Temp = -8.36, Std Temp = 3.61
Temperature values for Class 0: [-15.21, -15.28, -15.25, -15.23, -15.23, -15.29, -15.34, -15.3, -15.28, -15.11, -15.01, -14.8, -12.16, -15.15, -15.0, -14.85, nan, -15.25, -17.89, -16.53, -16.33, -16.61, -7.67, -5.09, -4.64, -4.32, nan, nan, nan, -5.64, -5.77, -5.69, -5.23, -5.09, -4.9, -4.82, -4.89, -4.72, -4.79, -10.6]


In [14]:
# Check the percentage of NaN temperatures
nan_count = np.isnan(temps).sum()
total_samples = len(temps)
print(f"Number of NaN temperatures: {nan_count}/{total_samples} ({(nan_count/total_samples)*100:.2f}%)")

# Investigate classes with NaN temperatures
for class_idx, temps_ in class_temperatures.items():
    nan_temps = [temp for temp in temps_ if np.isnan(temp)]
    if nan_temps:
        print(f"Class {class_idx} has {len(nan_temps)} NaN temperature(s).")

Number of NaN temperatures: 111/440 (25.23%)
Class 0 has 4 NaN temperature(s).
Class 1 has 40 NaN temperature(s).
Class 2 has 6 NaN temperature(s).
Class 3 has 12 NaN temperature(s).
Class 5 has 1 NaN temperature(s).
Class 6 has 18 NaN temperature(s).
Class 7 has 6 NaN temperature(s).
Class 9 has 17 NaN temperature(s).
Class 10 has 7 NaN temperature(s).


##### The temperature statistics are all NaN for the bullet rosette group (Class 1)

In [15]:
# The get_expected_probs() function calculates the expected class probabilities for each sample in a batch based on its temperature, using Gaussian distributions derived from the class temperature statistics.
# This function will be called within the physics-informed loss function to obtain the expected probabilities based on temperature, which are then used to compute the physics term (e.g., KL divergence).
# Output: Normalized probabilities for each sample in a batch.

def get_expected_probs(temperature_batch, num_classes):
    # Initialize expected probabilities array
    expected_probs = np.zeros((len(temperature_batch), num_classes), dtype=np.float32)
    
    for i, temp in enumerate(temperature_batch):
        total_prob = 0.0
        probs = np.zeros(num_classes, dtype=np.float32)
        
        # Handle NaN temperatures by assigning uniform probabilities
        if np.isnan(temp):
            # Assign uniform probability if temperature is NaN
            probs[:] = 1.0 / num_classes
        else:
            for class_idx in range(num_classes):
                mean = class_temperature_stats[class_idx]['mean']
                std = class_temperature_stats[class_idx]['std']
                
                # Handle classes with NaN mean or std by assigning uniform probability
                if np.isnan(mean) or np.isnan(std):
                    prob = 1.0 / num_classes
                else:
                    # Gaussian probability density function
                    prob = np.exp(-0.5 * ((temp - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))
                probs[class_idx] = prob
                total_prob += prob
            
            # Normalize probabilities to sum to 1
            if total_prob > 0:
                probs /= total_prob
            else:
                # If total_prob is zero (unlikely), assign uniform probabilities
                probs[:] = 1.0 / num_classes
        
        expected_probs[i] = probs
    
    return expected_probs

In [62]:
# Combines the categorical cross-entropy loss with a physics-informed KL divergence term.
# This physics-informed loss function will be modified to exclude classes with missing temperature data (e.g., Class 1) from the physics term.

# Define the physics-informed loss function
def physics_informed_loss(y_true, y_pred, temperature):
    print("y_true shape:", y_true.shape)
    print("y_pred shape:", y_pred.shape)
    print("temperature shape:", temperature.shape)

    # Standard categorical cross-entropy loss
    cce = tf.keras.losses.CategoricalCrossentropy()
    loss = cce(y_true, y_pred)

    # Identify samples not belonging to classes with missing temperature stats
    class_indices = tf.argmax(y_true, axis=1)
    valid_class_mask = tf.constant([
        not np.isnan(class_temperature_stats[i]['mean']) for i in range(num_classes)
    ], dtype=tf.bool)
    sample_mask = tf.gather(valid_class_mask, class_indices)
    sample_mask = tf.cast(sample_mask, tf.float32)

    # Compute expected probabilities based on temperature
    expected_probs = tf.numpy_function(
        func=get_expected_probs,
        inp=[temperature, num_classes],
        Tout=tf.float32
    )

    print("Expected probabilities shape:", expected_probs.shape)

    # Compute the physics term (KL divergence)
    kl_divergence = tf.keras.losses.KLDivergence(reduction=tf.keras.losses.Reduction.NONE)
    physics_term = kl_divergence(expected_probs, y_pred)

    print("Physics term shape before masking:", physics_term.shape)

    # Apply the mask to exclude invalid samples
    physics_term = physics_term * sample_mask

    # Compute mean physics term over valid samples
    total_valid_samples = tf.reduce_sum(sample_mask) + 1e-7  # Avoid division by zero
    physics_term = tf.reduce_sum(physics_term) / total_valid_samples

    # Total loss with weighting factor
    lambda_weight = 0.1  # Adjust as needed
    total_loss = loss + lambda_weight * physics_term

    return total_loss


##### Test the loss function with dummy data to ensure it works without errors.

In [17]:
# Set your actual number of classes
num_classes = 11

# Dummy data for testing
y_true_test = tf.one_hot([0, 1], depth=num_classes)
y_pred_test = tf.constant([[0.1]*num_classes, [0.1]*num_classes], dtype=tf.float32)
temp_test = tf.constant([0.0, -5.0], dtype=tf.float32)

# Call the loss function
test_loss = physics_informed_loss(y_true_test, y_pred_test, temp_test)
print("Test loss:", test_loss.numpy())

2024-12-02 15:09:48.467828: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M2
2024-12-02 15:09:48.468834: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 8.00 GB
2024-12-02 15:09:48.468850: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 2.67 GB
2024-12-02 15:09:48.469503: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-12-02 15:09:48.470492: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)


Test loss: 2.4944966


##### The loss value obtained is reasonable and aligns with the expected calculations given the dummy data. The loss value is relatively high (~2.49) because the predicted probabilities are uniform and do not match the true labels. We have set the predicted probabilities to be uniform across all classes, so predicting the same probability for all classes (especially when incorrect) results in a higher loss.


In [18]:
# Verifying Temperature-Based Loss Function one more time to ensure the function works correctly with model outputs and temperature inputs, using a small batch of dummy data to simulate training and testing scenarios

# Dummy data for verification
dummy_images = np.random.rand(2, 1024, 1360, 1).astype(np.float32)  # 2 grayscale images
dummy_labels = np.array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]])  # One-hot labels
dummy_temps = np.array([0.0, -5.0], dtype=np.float32)  # 2 temperature values

# Simulate model predictions (random probabilities)
dummy_predictions = np.random.rand(2, 11).astype(np.float32)
dummy_predictions /= np.sum(dummy_predictions, axis=1, keepdims=True)  # Normalize to sum to 1

# Compute the loss
dummy_loss = physics_informed_loss(
    tf.constant(dummy_labels, dtype=tf.float32),
    tf.constant(dummy_predictions, dtype=tf.float32),
    tf.constant(dummy_temps, dtype=tf.float32)
)

print("Dummy Loss Value:", dummy_loss.numpy())

Dummy Loss Value: 2.024979


In [None]:
test_loss = physics_informed_loss(
    tf.constant(label_batch),
    tf.constant(np.ones_like(label_batch)),  # Dummy prediction
    tf.constant(temp_batch)
)


##### Lastly, to handle the passing of temperature data to the loss function, we'll use a subclassed model with a custom `train_step`.

In [60]:
class CustomModel(tf.keras.Model):
    def __init__(self, model):
        super(CustomModel, self).__init__()
        self.model = model

    def train_step(self, data):
        print("Incoming data structure in train_step:", type(data), len(data))  # Debugging
        print("Data content in train_step:", data)  # Debugging

        try:
            # Adjust unpacking to handle nested structure
            if len(data) == 3:  # If there are extra elements (e.g., None)
                (x_data, y_batch), _ = data[:2]  # Discard unnecessary parts
            else:
                (x_data, y_batch) = data

            # Further unpack x_data into images and temperature
            x_batch, temp_batch = x_data

            # Debugging: Check shapes and data types
            print("x_batch shape:", x_batch.shape)
            print("temp_batch shape:", temp_batch.shape)
            print("y_batch shape:", y_batch.shape)

            # Proceed with the rest of the train_step logic
            with tf.GradientTape() as tape:
                y_pred = self.model(x_batch, training=True)
                # Compute the loss value
                loss = physics_informed_loss(y_batch, y_pred, temp_batch)

            # Compute gradients
            trainable_vars = self.model.trainable_variables
            gradients = tape.gradient(loss, trainable_vars)

            # Update weights
            self.optimizer.apply_gradients(zip(gradients, trainable_vars))

            # Update metrics
            self.compiled_metrics.update_state(y_batch, y_pred)

            # Return a dict mapping metric names to current value
            return {m.name: m.result() for m in self.metrics}

        except Exception as e:
            print("Error in train_step:", e)
            raise

## 3.6 Model Definitions

##### Here, we will define the DL models to be used (e.g., VGG16 with fine-tuning, InceptionV3, CRNN with Attention) with necessary adjustments to accept temperature data where needed.

#### A. VGG16 with Fine-Tuning
**Implementation details:**
* Pre-trained VGG16 Model: Utilize the VGG16 model pre-trained on ImageNet.
* Input Adjustments: Convert grayscale images to RGB by repeating the single channel three times.
* Output Layer: Adjust the final dense layer to match the number of classes.
* Temperature Handling: Temperature data is not fed into the model but provided to the loss function during training.

In [20]:
class VGG16Model:
    def __init__(self, input_shape, num_classes):
        self.input_shape = input_shape  # Shape: (height, width, channels)
        self.num_classes = num_classes
        self.model = self.build_model()

    def build_model(self):
        # Input layer for images
        inputs = Input(shape=self.input_shape, name='image_input')

        # Convert grayscale to RGB by repeating channels
        x = Concatenate(axis=-1)([inputs, inputs, inputs])  # Shape: (height, width, 3)

        # Load pre-trained VGG16 model without the top layer and wiht pre-trained weights
        base_model = VGG16(weights='imagenet', include_top=False, input_tensor=x)

        # Freeze base model layers for initial training (to keep pre-trained features during training)
        for layer in base_model.layers:
            layer.trainable = False

        # Add custom layers on top (more specifically, adds GlobalAveragePooling2D, a dense layer with 256 units and ReLU activation, and an output layer matching the number of classes with softmax activation)
        x = base_model.output
        x = GlobalAveragePooling2D()(x)
        x = Dense(256, activation='relu')(x)
        outputs = Dense(self.num_classes, activation='softmax', name='output')(x)

        # Construct the model
        model = Model(inputs=inputs, outputs=outputs, name='VGG16Model')

        return model

    def compile_model(self):
        # Compilation will be handled in the training step using a custom training loop
        pass  # No action needed here

In [21]:
# Test VGG16Model

# Initialize VGG16 Model
vgg16_instance = VGG16Model(input_shape=(1024, 1360, 1), num_classes=11)
vgg16_base_model = vgg16_instance.model

# Perform a forward pass with a dummy batch
dummy_output = vgg16_base_model.predict(dummy_images)
assert dummy_output.shape == (2, 11), f"VGG16 output shape mismatch: {dummy_output.shape}"
print("VGG16 forward pass successful.")


2024-12-02 15:09:50.393766: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1s/step
VGG16 forward pass successful.


#### B. InceptionV3
**Implementation details:**
* Pre-trained InceptionV3 Model: Utilize the InceptionV3 model pre-trained on ImageNet.
* Input Adjustments: Convert grayscale images to RGB. Also resize images to the expected input size for InceptionV3 (e.g., 299x299).
* Output Layer: Adjust the final dense layer to match the number of classes.
* Temperature Handling: Temperature data is not fed into the model but provided to the loss function during training.

In [22]:
class InceptionV3Model:
    def __init__(self, input_shape, num_classes):
        self.input_shape = input_shape  # Original image shape
        self.num_classes = num_classes
        self.model = self.build_model()

    def build_model(self):
        # Input layer for images
        inputs = Input(shape=self.input_shape, name='image_input')

        # Resize images to 299x299 as expected by InceptionV3
        x = Resizing(299, 299)(inputs)

        # Convert grayscale to RGB
        x = Concatenate(axis=-1)([x, x, x])  # Shape: (299, 299, 3)

        # Load pre-trained InceptionV3 model without the top layer
        base_model = InceptionV3(weights='imagenet', include_top=False, input_tensor=x)

        # Freeze base model layers for initial training
        for layer in base_model.layers:
            layer.trainable = False

        # Add custom layers on top
        x = base_model.output
        x = GlobalAveragePooling2D()(x)
        x = Dense(256, activation='relu')(x)
        outputs = Dense(self.num_classes, activation='softmax', name='output')(x)

        # Construct the model
        model = Model(inputs=inputs, outputs=outputs, name='InceptionV3Model')

        return model

    def compile_model(self):
        # Compilation will be handled during training with the custom loss
        pass

In [23]:
# Test InceptionV3Model

# Initialize InceptionV3 Model
inception_instance = InceptionV3Model(input_shape=(1024, 1360, 1), num_classes=11)
inception_base_model = inception_instance.model

# Perform a forward pass with a dummy batch
dummy_output = inception_base_model.predict(dummy_images)
assert dummy_output.shape == (2, 11), f"InceptionV3 output shape mismatch: {dummy_output.shape}"
print("InceptionV3 forward pass successful.")

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 3s/step
InceptionV3 forward pass successful.


#### C. CRNN with Attention Mechanism
**Implementation details:**
* Convolutional Layers: Extract spatial features from images.
* Recurrent Layers (LSTM): Capture sequential dependencies in the extracted features.
* Attention Mechanism: Enhance the model's focus on relevant features.
* Input Adjustments: Use the grayscale images directly.
* Output Layer: Adjust the final dense layer to match the number of classes.
* Temperature Handling: Temperature data is not fed into the model but provided to the loss function during training.

In [24]:
class CRNNModel:
    def __init__(self, input_shape, num_classes, lstm_units=64):
        self.input_shape = input_shape  # Shape: (height, width, channels)
        self.num_classes = num_classes
        self.lstm_units = lstm_units
        self.model = self.build_model()

    def build_model(self):
        # Input layer for images
        inputs = Input(shape=self.input_shape, name='image_input')

        # Convolutional layers
        x = Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
        x = MaxPooling2D(pool_size=(2, 2))(x)

        x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)

        # Prepare data for LSTM
        shape = x.shape
        x = Reshape((shape[1] * shape[2], shape[3]))(x)  # Shape: (batch_size, timesteps, features)

        # LSTM layer
        x = LSTM(self.lstm_units, return_sequences=True)(x)

        # Attention mechanism
        attention = Dense(1, activation='tanh')(x)
        attention = Flatten()(attention)
        attention = Activation('softmax')(attention)
        attention = RepeatVector(self.lstm_units)(attention)
        attention = Permute([2, 1])(attention)
        x = Multiply()([x, attention])
        x = Lambda(lambda xin: K.sum(xin, axis=1))(x)

        # Output layer
        outputs = Dense(self.num_classes, activation='softmax', name='output')(x)

        # Construct the model
        model = Model(inputs=inputs, outputs=outputs, name='CRNNModel')

        return model

    def compile_model(self):
        # Compilation will be handled during training with the custom loss
        pass

In [25]:
# Test CRNNModel

# Initialize CRNN Model
crnn_instance = CRNNModel(input_shape=(1024, 1360, 1), num_classes=11)
crnn_base_model = crnn_instance.model

# Perform a forward pass with a dummy batch
dummy_output = crnn_base_model.predict(dummy_images)
assert dummy_output.shape == (2, 11), f"CRNN output shape mismatch: {dummy_output.shape}"
print("CRNN forward pass successful.")

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2s/step
CRNN forward pass successful.


In [61]:
# Verify CustomModel Training Logic
# Use the CustomModel subclass with one architecture to ensure the train_step and test_step execute without errors

# Wrap VGG16 model with CustomModel
vgg16_model = CustomModel(vgg16_base_model)
vgg16_model.compile(optimizer=Adam(learning_rate=1e-3), metrics=['accuracy'])

# Fetch a batch
(data_batch, temp_batch), label_batch = next(train_generator)

# Perform one training step
train_results = vgg16_model.train_on_batch(((data_batch, temp_batch), label_batch))
print("CustomModel training step successful. Results:", train_results)

Incoming data structure in train_step: <class 'tuple'> 3
Data content in train_step: (((<tf.Tensor 'data:0' shape=(32, 1024, 1360, 1) dtype=float64>, <tf.Tensor 'data_1:0' shape=(32,) dtype=float64>), <tf.Tensor 'data_2:0' shape=(32, 11) dtype=float64>), None, None)
x_batch shape: (32, 1024, 1360, 1)
temp_batch shape: (32,)
y_batch shape: (32, 11)
Error in train_step: Cannot take the length of shape with unknown rank.


ValueError: Cannot take the length of shape with unknown rank.

In [None]:
# Perform one testing step
test_results = vgg16_model.test_on_batch(((data_batch, temp_batch), label_batch))
print("CustomModel testing step successful. Results:", test_results)

Incoming data structure in train_step: <class 'tuple'> 3
Data content in train_step: (((<tf.Tensor 'data:0' shape=(32, 1024, 1360, 1) dtype=float64>, <tf.Tensor 'data_1:0' shape=(32,) dtype=float64>), <tf.Tensor 'data_2:0' shape=(32, 11) dtype=float64>), None, None)
Error unpacking data in train_step: too many values to unpack (expected 2)


ValueError: too many values to unpack (expected 2)

## 3.7 Compliling and Training the Models