# Aerosol Optical Thickness Prediction Neural Network

+ Work carried out by Bernardo Vitorino and João Condeço.

## Data Characterization

### Data loading

In [1]:
import pandas as pd

# Load the dataset
df = pd.read_csv("datasets/train.csv")

print(f'Number of features: {df.shape[1]}')
print(f'Number of instances: {df.shape[0]}')
df.head()

Number of features: 10
Number of instances: 10438


|   | id | elevation | ozone | NO2 | azimuth | zenith | incidence_azimuth | incidence_zenith | file_name_l1 | value_550 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 10 | 318 | 0.248 | 150.6 | 31.8 | 286.1 | 8.0 | AAOT_45-3139_12-5083_COPERNICUS_S2_20180807T10... | 0.277 |
| 1 | 2 | 10 | 302 | 0.279 | 161.6 | 44.2 | 243.6 | 3.9 | AAOT_45-3139_12-5083_COPERNICUS_S2_20180916T10... | 0.201 |
| 2 | 4 | 10 | 373 | 0.303 | 163.5 | 34.4 | 103.9 | 9.8 | AAOT_45-3139_12-5083_COPERNICUS_S2_20190421T10... | 0.169 |
| 3 | 5 | 10 | 342 | 0.271 | 144.7 | 25.3 | 286.2 | 7.9 | AAOT_45-3139_12-5083_COPERNICUS_S2_20190623T10... | 0.107 |
| 4 | 6 | 10 | 327 | 0.252 | 140.4 | 29.4 | 105.8 | 7.0 | AAOT_45-3139_12-5083_COPERNICUS_S2_20190720T10... | 0.188 |


As we can see, the dataset has 10 columns (including the identifier and the target) and 10438 instances, or observations.

The target is 'value_550', the value we want to predict.

The remaining columns fall into two groups: the numeric features and the image reference. The numeric features are 'elevation', 'ozone', 'NO2', 'azimuth', 'zenith', 'incidence_azimuth', and 'incidence_zenith'; they are scaled before training the neural network. The 'id' column is only an identifier and is not used for training or prediction. Finally, 'file_name_l1' holds the name of the image of the zone where the other features were measured.

## Data preprocessing 

### Data separation (numerical and images)

In [2]:
from sklearn.preprocessing import StandardScaler
import numpy as np
import os
import tifffile as tiff

# Preprocess numerical data
numerical_features = df[['elevation', 'ozone', 'NO2', 'azimuth', 'zenith', 'incidence_azimuth', 'incidence_zenith']]

# Numerical data scaling
scaler = StandardScaler()
numerical_features = scaler.fit_transform(numerical_features)

# Function to load and preprocess image data
def load_and_preprocess_image(filepath):
    img = tiff.imread(filepath)
    img_array = np.array(img)
    img_array = img_array / 65535.0   # Normalize pixel values
    return img_array

# Load image data
image_data = np.array([load_and_preprocess_image(os.path.join('./train/', filename)) for filename in df['file_name_l1']])

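In the preprocessing step we separate the numerical data from the image data. The numerical features are standardized with StandardScaler, so each one has zero mean and unit variance. We also define a function that loads an image and scales its pixel values by dividing them by 65535 (the largest 16-bit unsigned integer value); the loaded images are then collected in a NumPy array.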
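
As a minimal sketch of what the two scaling steps compute, the standard-library code below reproduces the per-feature transformation `z = (x - mean) / std` (with the population standard deviation, which is what StandardScaler uses) and the pixel scaling. The helper names `fit_standardizer`, `standardize`, and `scale_pixel` are illustrative and not part of the notebook; the sample values are the 'ozone' column of the five rows shown earlier.

```python
import statistics

def fit_standardizer(values):
    """Return (mean, std) of a feature, using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean, std

def standardize(values, mean, std):
    """Apply z = (x - mean) / std to every value."""
    return [(v - mean) / std for v in values]

def scale_pixel(raw_value, max_value=65535.0):
    """Divide a raw pixel value by the largest 16-bit unsigned value."""
    return raw_value / max_value

# 'ozone' values of the five rows displayed by df.head()
ozone = [318, 302, 373, 342, 327]
mean, std = fit_standardizer(ozone)
ozone_scaled = standardize(ozone, mean, std)

print(f"mean={mean:.1f}, std={std:.2f}")
print([round(z, 3) for z in ozone_scaled])
print(scale_pixel(0), scale_pixel(65535))
```

The fitted mean and standard deviation are kept so that the same transformation can later be applied to new data, which is what `scaler.transform` does in the prediction cell. Note that the notebook fits the scaler on all rows before splitting them; fitting it on the training rows only is a common alternative that keeps validation and test statistics out of the preprocessing.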

### Train and Validation Split

In [3]:
from sklearn.model_selection import train_test_split

# Target variable
target = df['value_550'].values

# Split data into training, validation, and testing sets
X_train_num, X_temp_num, X_train_img, X_temp_img, y_train, y_temp = train_test_split(numerical_features, image_data, target, test_size=0.3, random_state=42)
X_val_num, X_test_num, X_val_img, X_test_img, y_val, y_test = train_test_split(X_temp_num, X_temp_img, y_temp, test_size=0.5, random_state=42)

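The dataset is split into training (70%), validation (15%), and test (15%) sets: the first call holds out 30% of the data, and the second call divides that held-out part equally between validation and test.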
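
The resulting set sizes can be worked out by hand. The sketch below assumes that scikit-learn rounds the held-out fraction up to the next integer and assigns the rest to the first returned part; `split_sizes` is an illustrative helper, not a scikit-learn function.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_kept, n_held_out) for a fractional test_size, rounding the held-out part up."""
    n_held_out = math.ceil(test_size * n_samples)
    return n_samples - n_held_out, n_held_out

n_total = 10438                                  # instances reported by df.shape[0]
n_train, n_temp = split_sizes(n_total, 0.3)      # first call: 70% kept, 30% held out
n_val, n_test = split_sizes(n_temp, 0.5)         # second call: the held-out part is halved

steps_per_epoch = math.ceil(n_train / 32)        # batch_size=32 in the training cells
print(n_train, n_val, n_test, steps_per_epoch)
```

Under that assumption the split yields 7306 training, 1566 validation, and 1566 test instances, and 229 batches per epoch, which agrees with the `229/229` progress counter in the training logs.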

## Neural Network Architectures

### First concept

In [4]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Input, Concatenate, BatchNormalization, Dropout, GlobalAveragePooling2D
from tensorflow.keras.regularizers import l2

# Define the CNN and dense model
class AOTModel:
    def __init__(self, image_shape=(19, 19, 13), num_numerical_features=7):
        # Image processing Neural Network
        self.image_input = Input(shape=image_shape)
        image_processing_network = Conv2D(32, (3, 3), activation='relu', kernel_regularizer=l2(0.01))(self.image_input)
        image_processing_network = BatchNormalization()(image_processing_network)
        image_processing_network = MaxPooling2D((2, 2))(image_processing_network)
        image_processing_network = Dropout(0.25)(image_processing_network)

        image_processing_network = Conv2D(64, (3, 3), activation='relu', kernel_regularizer=l2(0.01))(image_processing_network)
        image_processing_network = BatchNormalization()(image_processing_network)
        image_processing_network = MaxPooling2D((2, 2))(image_processing_network)
        image_processing_network = Dropout(0.25)(image_processing_network)

        image_processing_network = Conv2D(128, (3, 3), activation='relu', kernel_regularizer=l2(0.01))(image_processing_network)
        image_processing_network = BatchNormalization()(image_processing_network)
        image_processing_network = GlobalAveragePooling2D()(image_processing_network)
        image_processing_network = Dropout(0.5)(image_processing_network)

        # Numerical processing Neural Network
        self.numerical_input = Input(shape=(num_numerical_features,))
        numerical_processing_network = Dense(128, activation='relu')(self.numerical_input)
        numerical_processing_network = BatchNormalization()(numerical_processing_network)
        numerical_processing_network = Dropout(0.5)(numerical_processing_network)
        
        numerical_processing_network = Dense(64, activation='relu')(numerical_processing_network)
        numerical_processing_network = BatchNormalization()(numerical_processing_network)
        numerical_processing_network = Dropout(0.5)(numerical_processing_network)

        numerical_processing_network = Dense(32, activation='relu')(numerical_processing_network)
        numerical_processing_network = BatchNormalization()(numerical_processing_network)
        numerical_processing_network = Dropout(0.5)(numerical_processing_network)
        
        # Concatenation of both networks
        aot_network = Concatenate()([image_processing_network, numerical_processing_network])
        aot_network = Dense(64, activation='relu')(aot_network)
        aot_network = Dropout(0.5)(aot_network)
        aot_network = Dense(1)(aot_network)

        self.aot_network_arquitecture = aot_network
        del image_processing_network, numerical_processing_network, aot_network

    def model(self):
        model = Model(inputs= [self.image_input, self.numerical_input], outputs=self.aot_network_arquitecture)
        # Compile the model
        model.compile(optimizer='adam', loss='mean_absolute_error', metrics=['mae'])
        return model




We divided the neural network architecture into two branches whose outputs are later concatenated to produce a single value.

This architecture is designed to handle both image data and numerical data in a single model. The image processing network consists of convolutional layers for feature extraction from images, while the numerical processing network consists of dense layers for handling numerical features. Both networks are combined and followed by additional dense layers to produce a final output, making this a versatile model suitable for tasks that require the integration of multiple data types.

Here the architecture is described in more detail:

+ Image Processing Network

    + Input Layer: The model accepts images of shape (19, 19, 13).
    + First Convolutional Block:
        + Conv2D: 32 filters, kernel size (3, 3), ReLU activation, L2 regularization.
        + BatchNormalization: Normalizes the outputs of the convolution.
        + MaxPooling2D: Pool size (2, 2) to reduce spatial dimensions.
        + Dropout: 25% to prevent overfitting.
    + Second Convolutional Block:
        + Conv2D: 64 filters, kernel size (3, 3), ReLU activation, L2 regularization.
        + BatchNormalization: Normalizes the outputs of the convolution.
        + MaxPooling2D: Pool size (2, 2) to reduce spatial dimensions.
        + Dropout: 25% to prevent overfitting.
    + Third Convolutional Block:
        + Conv2D: 128 filters, kernel size (3, 3), ReLU activation, L2 regularization.
        + BatchNormalization: Normalizes the outputs of the convolution.
        + GlobalAveragePooling2D: Reduces each feature map to a single value.
        + Dropout: 50% to prevent overfitting.

+ Numerical Processing Network

    + Input Layer: The model accepts numerical features of shape (7,).
    + First Dense Block:
        + Dense: 128 units, ReLU activation.
        + BatchNormalization: Normalizes the outputs.
        + Dropout: 50% to prevent overfitting.
    + Second Dense Block:
        + Dense: 64 units, ReLU activation.
        + BatchNormalization: Normalizes the outputs.
        + Dropout: 50% to prevent overfitting.
    + Third Dense Block:
        + Dense: 32 units, ReLU activation.
        + BatchNormalization: Normalizes the outputs.
        + Dropout: 50% to prevent overfitting.

+ Combined Network

    + Concatenation Layer: Concatenates the outputs of the image and numerical processing networks.
    + Dense Block:
        + Dense: 64 units, ReLU activation.
        + Dropout: 50% to prevent overfitting.
    + Output Layer:
        + Dense: 1 unit with no activation, i.e. a linear output for the regression target.

+ Model Compilation

    + Optimizer: Adam.
    + Loss Function: Mean Absolute Error (MAE).
    + Metrics: Mean Absolute Error (MAE).
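
To make the effect of the convolution and pooling layers concrete, the following sketch traces the tensor shape through the image branch. It assumes the Keras defaults used in the code above (no padding, stride 1 for `Conv2D`, and a pooling stride equal to the pool size); the helper functions are illustrative only.

```python
def conv_valid(size, kernel=3):
    """Output size of a stride-1 convolution without padding."""
    return size - kernel + 1

def max_pool(size, pool=2):
    """Output size of non-overlapping max pooling (incomplete windows are dropped)."""
    return size // pool

def trace_image_branch(size=19, filters=(32, 64, 128)):
    """Return (layer name, output shape) pairs for the image branch."""
    shapes = []
    for i, n_filters in enumerate(filters):
        size = conv_valid(size)
        shapes.append((f"conv{i + 1}", (size, size, n_filters)))
        # The last block uses global average pooling instead of max pooling.
        if i < len(filters) - 1:
            size = max_pool(size)
            shapes.append((f"pool{i + 1}", (size, size, n_filters)))
    shapes.append(("global_avg_pool", (filters[-1],)))
    return shapes

for name, shape in trace_image_branch():
    print(f"{name:16s} {shape}")

# Width of the concatenated vector: 128 image features + 32 numerical features.
concatenated_width = 128 + 32
print("concatenated width:", concatenated_width)
```

With a 19×19 input the spatial size shrinks as 19 → 17 → 8 → 6 → 3 → 1, so the third convolution already produces a 1×1 map and the global average pooling simply flattens it into 128 values.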

### Architecture Optimization

In [5]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Input, Concatenate, BatchNormalization, Dropout, GlobalAveragePooling2D, Add, LeakyReLU
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, LearningRateScheduler
from tensorflow.keras.initializers import HeNormal, HeUniform
import tensorflow as tf
from tensorflow.keras.optimizers import Adam, RMSprop, Nadam

# Define the CNN and dense model
class OptimizedAOTModel:
    def __init__(self, image_shape=(19, 19, 13), num_numerical_features=7):
        # Image processing Neural Network
        self.image_input = Input(shape=image_shape)
        initializer = HeUniform()
        
        image_processing_network = Conv2D(32, (3, 3), kernel_regularizer=l2(0.01), kernel_initializer=initializer)(self.image_input)
        image_processing_network = BatchNormalization()(image_processing_network)
        image_processing_network = LeakyReLU()(image_processing_network)
        image_processing_network = MaxPooling2D((2, 2))(image_processing_network)
        image_processing_network = Dropout(0.25)(image_processing_network)

        image_processing_network = Conv2D(64, (3, 3), kernel_regularizer=l2(0.01), kernel_initializer=initializer)(image_processing_network)
        image_processing_network = BatchNormalization()(image_processing_network)
        image_processing_network = LeakyReLU()(image_processing_network)
        image_processing_network = MaxPooling2D((2, 2))(image_processing_network)
        image_processing_network = Dropout(0.25)(image_processing_network)

        image_processing_network = Conv2D(128, (3, 3), kernel_regularizer=l2(0.01), kernel_initializer=initializer)(image_processing_network)
        image_processing_network = BatchNormalization()(image_processing_network)
        image_processing_network = LeakyReLU()(image_processing_network)
        image_processing_network = GlobalAveragePooling2D()(image_processing_network)
        image_processing_network = Dropout(0.5)(image_processing_network)

        # Numerical processing Neural Network
        self.numerical_input = Input(shape=(num_numerical_features,))
        numerical_processing_network = Dense(64, activation='relu', kernel_initializer=initializer)(self.numerical_input)
        numerical_processing_network = BatchNormalization()(numerical_processing_network)
        numerical_processing_network = Dropout(0.5)(numerical_processing_network)
        
        numerical_processing_network = Dense(128, activation='relu', kernel_initializer=initializer)(numerical_processing_network)
        numerical_processing_network = BatchNormalization()(numerical_processing_network)
        numerical_processing_network = Dropout(0.5)(numerical_processing_network)

        numerical_processing_network = Dense(64, activation='relu', kernel_initializer=initializer)(numerical_processing_network)
        numerical_processing_network = BatchNormalization()(numerical_processing_network)
        numerical_processing_network = Dropout(0.5)(numerical_processing_network)
        
        # Concatenation of both networks
        aot_network = Concatenate()([image_processing_network, numerical_processing_network])
        aot_network = Dense(64, activation='relu', kernel_initializer=initializer)(aot_network)
        aot_network = Dropout(0.5)(aot_network)
        aot_network = Dense(1, kernel_initializer=initializer)(aot_network)

        self.aot_network_arquitecture = aot_network
        del image_processing_network, numerical_processing_network, aot_network

    def model(self, learning_rate=0.001, optimizer_choice='adam'):
        model = Model(inputs=[self.image_input, self.numerical_input], outputs=self.aot_network_arquitecture)
        
        if optimizer_choice == 'adam':
            optimizer = Adam(learning_rate=learning_rate, clipnorm=1.0)
        elif optimizer_choice == 'rmsprop':
            optimizer = RMSprop(learning_rate=learning_rate, clipnorm=1.0)
        elif optimizer_choice == 'nadam':
            optimizer = Nadam(learning_rate=learning_rate, clipnorm=1.0)
        else:
            optimizer = Adam(learning_rate=learning_rate, clipnorm=1.0)
        
        model.compile(optimizer=optimizer, loss='mean_absolute_error', metrics=['mae'])
        return model

The optimized architecture follows the same two-branch design as the first concept and adds three changes intended to improve training stability and performance: HeUniform weight initialization, LeakyReLU activations in the convolutional blocks, and gradient clipping in the optimizer. The dense blocks of the numerical branch are also resized from 128, 64, and 32 units to 64, 128, and 64 units, and the optimizer and learning rate become arguments of the model method. As before, the image branch uses convolutional layers for feature extraction, the numerical branch uses dense layers, and the concatenated outputs pass through additional dense layers to produce the final value. Here is a more detailed description:

+ Image Processing Network

    + Input Layer: The model accepts images of shape (19, 19, 13).
    + First Convolutional Block:
        + Conv2D: 32 filters, kernel size (3, 3), L2 regularization, HeUniform initializer.
        + BatchNormalization: Normalizes the outputs of the convolution.
        + LeakyReLU: Activation function.
        + MaxPooling2D: Pool size (2, 2) to reduce spatial dimensions.
        + Dropout: 25% to prevent overfitting.
    + Second Convolutional Block:
        + Conv2D: 64 filters, kernel size (3, 3), L2 regularization, HeUniform initializer.
        + BatchNormalization: Normalizes the outputs of the convolution.
        + LeakyReLU: Activation function.
        + MaxPooling2D: Pool size (2, 2) to reduce spatial dimensions.
        + Dropout: 25% to prevent overfitting.
    + Third Convolutional Block:
        + Conv2D: 128 filters, kernel size (3, 3), L2 regularization, HeUniform initializer.
        + BatchNormalization: Normalizes the outputs of the convolution.
        + LeakyReLU: Activation function.
        + GlobalAveragePooling2D: Reduces each feature map to a single value.
        + Dropout: 50% to prevent overfitting.

+ Numerical Processing Network

    + Input Layer: The model accepts numerical features of shape (7,).
    + First Dense Block:
        + Dense: 64 units, ReLU activation, HeUniform initializer.
        + BatchNormalization: Normalizes the outputs.
        + Dropout: 50% to prevent overfitting.
    + Second Dense Block:
        + Dense: 128 units, ReLU activation, HeUniform initializer.
        + BatchNormalization: Normalizes the outputs.
        + Dropout: 50% to prevent overfitting.
    + Third Dense Block:
        + Dense: 64 units, ReLU activation, HeUniform initializer.
        + BatchNormalization: Normalizes the outputs.
        + Dropout: 50% to prevent overfitting.

+ Combined Network

    + Concatenation Layer: Concatenates the outputs of the image and numerical processing networks.
    + Dense Block:
        + Dense: 64 units, ReLU activation, HeUniform initializer.
        + Dropout: 50% to prevent overfitting.
    + Output Layer:
        + Dense: 1 unit, HeUniform initializer, no activation (linear output for the regression target).

+ Model Compilation

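    + Optimizer: Configurable (Adam, RMSprop, Nadam; any other choice falls back to Adam) with a default learning rate of 0.001 and gradient clipping (clipnorm=1.0). The training run in this notebook uses Adam with a learning rate of 0.0001.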
    + Loss Function: Mean Absolute Error (MAE).
    + Metrics: Mean Absolute Error (MAE).

+ Additional Techniques

    + HeUniform Initializer: Used for initializing the weights, which can help in training deep networks by maintaining a better distribution of weights.
    + LeakyReLU: Used instead of ReLU to allow a small gradient when the unit is not active, which can help in training deep networks by preventing dead neurons.
    + Gradient Clipping: Clips the gradients to a maximum norm of 1.0 to stabilize training.
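
The three techniques can be illustrated with a few lines of plain Python. This is a sketch of the underlying formulas, not of the Keras implementation: the negative slope of 0.3 is assumed to be the default of the `LeakyReLU` layer, and the clipping is shown on a single gradient vector.

```python
import math

def he_uniform_limit(fan_in):
    """HeUniform draws weights from U(-limit, limit) with limit = sqrt(6 / fan_in)."""
    return math.sqrt(6.0 / fan_in)

def leaky_relu(x, negative_slope=0.3):
    """Identity for positive inputs, a small linear slope for negative ones."""
    return x if x > 0 else negative_slope * x

def clip_by_norm(gradient, clipnorm=1.0):
    """Rescale a gradient vector so that its L2 norm does not exceed clipnorm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= clipnorm:
        return list(gradient)
    return [g * clipnorm / norm for g in gradient]

# Fan-in of the first convolution: 3x3 kernel over 13 input channels.
fan_in_first_conv = 3 * 3 * 13
print(he_uniform_limit(fan_in_first_conv))

# A negative input keeps a non-zero output (and gradient) with LeakyReLU.
print(leaky_relu(2.0), leaky_relu(-2.0))

# A gradient of norm 5 is rescaled to norm 1, keeping its direction.
print(clip_by_norm([3.0, 4.0]))
```

Clipping by norm preserves the direction of the update and only limits its size, which is why it is often preferred over clipping each component independently.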

## Training and evaluation

### First concept

In [6]:
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Instantiate the model
model = AOTModel()
model = model.model()

# Define callbacks
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6)

# Train the model
history = model.fit(
    [X_train_img, X_train_num], y_train,
    validation_data=([X_val_img, X_val_num], y_val),
    epochs=200,
    batch_size=32,
    callbacks=[early_stopping, reduce_lr],
    verbose=1
)

model.save('aot_model.keras')

# Evaluate the model
val_loss, val_mae = model.evaluate([X_val_img, X_val_num], y_val)
print(f'Validation MAE: {val_mae}')

# Evaluate on test set
test_loss, test_mae = model.evaluate([X_test_img, X_test_num], y_test)
print(f'Test MAE: {test_mae}')

Epoch 1/200
229/229 ━━━━━━━━━━━━━━━━━━━━ 43s 92ms/step - loss: 2.6097 - mae: 1.3182 - val_loss: 1.0330 - val_mae: 0.0952 - learning_rate: 0.0010
Epoch 2/200
229/229 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1.3100 - mae: 0.4668 - val_loss: 0.6869 - val_mae: 0.1009 - learning_rate: 0.0010
Epoch 3/200
229/229 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 0.7062 - mae: 0.1881 - val_loss: 0.4317 - val_mae: 0.0886 - learning_rate: 0.0010
Epoch 4/200
229/229 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 0.3994 - mae: 0.0991 - val_loss: 0.2822 - val_mae: 0.0875 - learning_rate: 0.0010
Epoch 5/200
229/229 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - loss: 0.2561 - mae: 0.0857 - val_loss: 0.1960 - val_mae: 0.0840 - learning_rate: 0.0010
Epoch 6/200
229/229 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 0.1798 - mae: 0.0808 - val_los

This cell instantiates and compiles an AOTModel, then trains it on both the image and the numerical data. The EarlyStopping and ReduceLROnPlateau callbacks stop training and lower the learning rate when the validation loss stalls, which limits overfitting. After training, the model is saved to a file and evaluated on the validation and test sets; the resulting mean absolute errors (MAE) are printed to assess the model's accuracy.

+ Model Initialization and Compilation

    + Model Instantiation and Compilation:
        + model = AOTModel(): Creates an instance of the AOTModel class.
        + model = model.model(): Calls the model method of the AOTModel instance to create a compiled Keras model.

+ Callbacks

    + EarlyStopping:
        + monitor='val_loss': Monitors the validation loss.
        + patience=10: Training stops if the validation loss doesn't improve for 10 consecutive epochs.
        + restore_best_weights=True: Restores the model weights from the epoch with the best validation loss.
    + ReduceLROnPlateau:
        + monitor='val_loss': Monitors the validation loss.
        + factor=0.5: Multiplies the learning rate by 0.5 when the validation loss stops improving.
        + patience=5: Waits for 5 epochs without improvement before reducing the learning rate.
        + min_lr=1e-6: Ensures that the learning rate doesn't go below $1 \times 10^{-6}$.

+ Model Training

    + Training the Model:
        + model.fit(...): Trains the model using the provided training data.
        + [X_train_img, X_train_num]: Input training data consisting of images and numerical features.
        + y_train: Training labels.
        + validation_data=([X_val_img, X_val_num], y_val): Validation data consisting of images, numerical features, and labels.
        + epochs=200: Maximum number of epochs for training.
        + batch_size=32: Number of samples per gradient update.
        + callbacks=[early_stopping, reduce_lr]: List of callbacks to use during training.
        + verbose=1: Verbosity mode for logging the progress of training.

+ Model Saving

    + Saving the Model:
        + model.save('aot_model.keras'): Saves the trained model to a file named aot_model.keras.

+ Model Evaluation

    + Evaluating the Model on Validation Data:
        + val_loss, val_mae = model.evaluate([X_val_img, X_val_num], y_val): Evaluates the model on the validation data and returns the validation loss and mean absolute error (MAE).
        + print(f'Validation MAE: {val_mae}'): Prints the validation MAE.

    + Evaluating the Model on Test Data:
        + test_loss, test_mae = model.evaluate([X_test_img, X_test_num], y_test): Evaluates the model on the test data and returns the test loss and MAE.
        + print(f'Test MAE: {test_mae}'): Prints the test MAE.
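
The interplay of the two callbacks can be replayed on a list of validation losses. The function below is a simplified sketch, not the Keras implementation: it treats any decrease as an improvement (no `min_delta`) and ignores cooldown. The loss values are hypothetical and only serve to show when each callback fires.

```python
def simulate_callbacks(val_losses, lr=1e-3, stop_patience=10,
                       lr_patience=5, factor=0.5, min_lr=1e-6):
    """Replay simplified EarlyStopping / ReduceLROnPlateau logic over val_loss values."""
    best = float("inf")
    best_epoch = None
    stop_wait = 0   # epochs since the last improvement (early stopping)
    lr_wait = 0     # epochs since the last improvement or learning-rate reduction
    stopped_epoch = None

    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember the best epoch and reset both counters.
            best, best_epoch = loss, epoch
            stop_wait = 0
            lr_wait = 0
            continue
        stop_wait += 1
        lr_wait += 1
        if lr_wait >= lr_patience and lr > min_lr:
            lr = max(lr * factor, min_lr)
            lr_wait = 0
        if stop_wait >= stop_patience:
            stopped_epoch = epoch
            break
    return {"best_epoch": best_epoch, "stopped_epoch": stopped_epoch, "lr": lr}

# Hypothetical validation losses: three improving epochs, then a plateau.
val_losses = [1.0, 0.8, 0.7] + [0.75] * 20
result = simulate_callbacks(val_losses)
print(result)
```

With zero-based epoch indices, the best epoch is 2, the learning rate is halved at epochs 7 and 12, and training stops at epoch 12. With `restore_best_weights=True`, the weights of epoch 2 would then be restored.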

### Optimization

In [7]:
# Instantiate the model with a chosen optimizer
model = OptimizedAOTModel().model(learning_rate=0.0001, optimizer_choice='adam')

early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6)

print("starting training")
# Train the model
history = model.fit(
    [X_train_img, X_train_num], y_train,
    validation_data=([X_val_img, X_val_num], y_val),
    epochs=200,
    batch_size=32,
    callbacks=[early_stopping, reduce_lr],
    verbose=1
)

model.save('optimized_aot_model.keras')

# Evaluate the model
val_loss, val_mae = model.evaluate([X_val_img, X_val_num], y_val)
print(f'Validation MAE: {val_mae}')

# Evaluate on test set
test_loss, test_mae = model.evaluate([X_test_img, X_test_num], y_test)
print(f'Test MAE: {test_mae}')

starting training
Epoch 1/200
229/229 ━━━━━━━━━━━━━━━━━━━━ 48s 100ms/step - loss: 6.2327 - mae: 1.7868 - val_loss: 4.5160 - val_mae: 0.2310 - learning_rate: 1.0000e-04
Epoch 2/200
229/229 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - loss: 5.5511 - mae: 1.3238 - val_loss: 4.3152 - val_mae: 0.2712 - learning_rate: 1.0000e-04
Epoch 3/200
229/229 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - loss: 5.1163 - mae: 1.1366 - val_loss: 4.0517 - val_mae: 0.2755 - learning_rate: 1.0000e-04
Epoch 4/200
229/229 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - loss: 4.6602 - mae: 0.9565 - val_loss: 3.7154 - val_mae: 0.2418 - learning_rate: 1.0000e-04
Epoch 5/200
229/229 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 4.2003 - mae: 0.8085 - val_loss: 3.3916 - val_mae: 0.2516 - learning_rate: 1.0000e-04
Epoch 6/200
229/229 ━━━━━━━━━━━━━━━━━━━━ 2s

This cell instantiates and compiles an OptimizedAOTModel with the Adam optimizer and a learning rate of 0.0001, then trains it on both the image and the numerical data. The EarlyStopping and ReduceLROnPlateau callbacks stop training and lower the learning rate when the validation loss stalls, which limits overfitting. After training, the model is saved to a file and evaluated on the validation and test sets; the resulting mean absolute errors (MAE) are printed to assess the model's accuracy.

+ Model Initialization and Compilation

    + Model Instantiation and Compilation:
        + model = OptimizedAOTModel().model(learning_rate=0.0001, optimizer_choice='adam'): Creates an instance of the OptimizedAOTModel class and calls its model method to build a Keras model compiled with Adam at a learning rate of 0.0001.

+ Callbacks

    + EarlyStopping:
        + monitor='val_loss': Monitors the validation loss.
        + patience=10: Training stops if the validation loss doesn't improve for 10 consecutive epochs.
        + restore_best_weights=True: Restores the model weights from the epoch with the best validation loss.
    + ReduceLROnPlateau:
        + monitor='val_loss': Monitors the validation loss.
        + factor=0.5: Multiplies the learning rate by 0.5 when the validation loss stops improving.
        + patience=5: Waits for 5 epochs without improvement before reducing the learning rate.
        + min_lr=1e-6: Ensures that the learning rate doesn't go below $1 \times 10^{-6}$.

+ Model Training

    + Training the Model:
        + model.fit(...): Trains the model using the provided training data.
        + [X_train_img, X_train_num]: Input training data consisting of images and numerical features.
        + y_train: Training labels.
        + validation_data=([X_val_img, X_val_num], y_val): Validation data consisting of images, numerical features, and labels.
        + epochs=200: Maximum number of epochs for training.
        + batch_size=32: Number of samples per gradient update.
        + callbacks=[early_stopping, reduce_lr]: List of callbacks to use during training.
        + verbose=1: Verbosity mode for logging the progress of training.

+ Model Saving

    + Saving the Model:
        + model.save('optimized_aot_model.keras'): Saves the trained model to a file named optimized_aot_model.keras.

+ Model Evaluation

    + Evaluating the Model on Validation Data:
        + val_loss, val_mae = model.evaluate([X_val_img, X_val_num], y_val): Evaluates the model on the validation data and returns the validation loss and mean absolute error (MAE).
        + print(f'Validation MAE: {val_mae}'): Prints the validation MAE.

    + Evaluating the Model on Test Data:
        + test_loss, test_mae = model.evaluate([X_test_img, X_test_num], y_test): Evaluates the model on the test data and returns the test loss and MAE.
        + print(f'Test MAE: {test_mae}'): Prints the test MAE.
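
In both training logs the `loss` column is consistently larger than the `mae` column, even though the loss function is itself the mean absolute error. The reason is that Keras adds the L2 penalties of the regularized convolution kernels to the training and validation loss, while the `mae` metric measures the prediction error alone. The sketch below shows this decomposition; the targets are taken from the first rows of the training table, whereas the predictions and kernel weights are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def l2_penalty(kernels, coefficient=0.01):
    """Sum of coefficient * sum(w^2) over every regularized kernel."""
    return sum(coefficient * sum(w * w for w in kernel) for kernel in kernels)

# Targets from the first rows of the training table; predictions are hypothetical.
y_true = [0.277, 0.201, 0.169]
y_pred = [0.250, 0.210, 0.180]

# Hypothetical flattened kernels standing in for the three Conv2D layers.
kernels = [[0.5, -1.0], [2.0], [1.5, 0.5]]

mae = mean_absolute_error(y_true, y_pred)   # what the 'mae' metric reports
penalty = l2_penalty(kernels)               # added by kernel_regularizer=l2(0.01)
total_loss = mae + penalty                  # what the 'loss' column reports

print(f"mae={mae:.4f} penalty={penalty:.4f} loss={total_loss:.4f}")
```

Because the penalty depends only on the weights, a falling `loss` with a nearly constant `val_mae` mostly reflects shrinking weights, so `val_mae` is the more direct indicator of prediction quality.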

## Predictions

In [8]:
from tensorflow.keras.models import load_model

# Load the saved model
# model = load_model('aot_model.keras')

model = load_model('optimized_aot_model.keras')

# Load the new dataset
new_df = pd.read_csv("datasets/test.csv")

# Preprocess numerical data
new_numerical_features = new_df[['elevation', 'ozone', 'NO2', 'azimuth', 'zenith', 'incidence_azimuth', 'incidence_zenith']]
new_numerical_features = scaler.transform(new_numerical_features)

# Load and preprocess new image data
new_image_data = np.array([load_and_preprocess_image(os.path.join('./test/', filename)) for filename in new_df['file_name_l1']])

# Predict values for the new data
predictions = model.predict([new_image_data, new_numerical_features])

# Save the predictions to a CSV file
results = pd.DataFrame({
    'id': new_df['id'],
    'value_550': predictions.flatten()
})
results.to_csv('predictions.csv', index=False)

print(results.head())

85/85 ━━━━━━━━━━━━━━━━━━━━ 2s 12ms/step
   id  value_550
0   3   0.126318
1  25   0.100225
2  26   0.066412
3  27   0.128408
4  29   0.073117


Finally, we predict the target for the test data and save the results to a CSV file (only the predictions made by the optimized architecture are shown here).
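
The model returns one row with a single value per sample, which is why the predictions are flattened before being paired with the ids. The following standard-library sketch mirrors that step on the first three rows of the output above; the helper names and the six-decimal formatting are illustrative choices, not what `to_csv` does by default.

```python
import csv
import io

def flatten_predictions(predictions):
    """Turn rows of shape (n, 1) into a flat list of n values."""
    return [row[0] for row in predictions]

def predictions_to_csv(ids, predictions):
    """Write id/value_550 pairs as CSV text, mirroring the columns of predictions.csv."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "value_550"])
    for sample_id, value in zip(ids, flatten_predictions(predictions)):
        writer.writerow([sample_id, f"{value:.6f}"])
    return buffer.getvalue()

# First three rows of the output shown above
ids = [3, 25, 26]
predictions = [[0.126318], [0.100225], [0.066412]]

csv_text = predictions_to_csv(ids, predictions)
print(csv_text)
```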

## Conclusion

The neural network architectures presented are specifically designed to predict aerosol optical thickness (AOT) by processing both image and numerical data. Here are the key insights and conclusions based on the provided models and their training procedures:

### Effectiveness of Multi-Input Neural Networks
The architectures effectively combine convolutional neural networks (CNNs) for image processing with dense layers for numerical data processing. This multi-input approach allows the model to leverage spatial features from satellite or other imaging data alongside additional numerical features, providing a comprehensive understanding of the factors influencing AOT.

### Optimization Techniques
Several advanced techniques are employed to enhance model performance:
- **BatchNormalization**: This helps in stabilizing and accelerating the training process by normalizing the inputs of each layer.
- **Dropout**: By introducing dropout layers, the model is better regularized, reducing the risk of overfitting.
- **HeUniform Initialization**: This weight initialization technique helps in maintaining a good distribution of weights, which is particularly useful for training deep networks.
- **LeakyReLU Activation**: This activation function helps prevent dead neurons, ensuring that all neurons can contribute to learning.

### Callbacks for Improved Training
The use of callbacks such as `EarlyStopping` and `ReduceLROnPlateau` demonstrates a robust approach to training:
- **EarlyStopping**: This stops the training process once the validation loss stops improving, saving time and computational resources while limiting overfitting.
- **ReduceLROnPlateau**: This reduces the learning rate when the model's performance plateaus, allowing for finer adjustments and potentially better convergence.

### Performance Evaluation
The training and evaluation process includes careful monitoring of validation and test mean absolute error (MAE), providing clear metrics to assess the model's predictive performance. The reported MAE on validation and test sets offers insights into how well the model generalizes to unseen data, which is crucial for real-world applications.

### Application in Aerosol Optical Thickness Prediction
The models are well-suited for predicting AOT, a critical parameter in understanding atmospheric conditions and air quality. By accurately predicting AOT, these models can aid in:
- **Environmental Monitoring**: Providing timely and accurate information on aerosol concentrations.
- **Climate Studies**: Contributing to research on the impact of aerosols on climate change.
- **Public Health**: Informing public health initiatives by tracking air quality and its potential effects on respiratory health.

### Future Directions
While the presented models show promise, future work could explore:
- **Enhanced Data Integration**: Incorporating additional data sources such as meteorological data or historical AOT measurements.
- **Model Refinement**: Experimenting with different architectures or advanced techniques such as attention mechanisms to further improve accuracy.
- **Deployment and Scalability**: Developing scalable solutions for real-time AOT prediction in various geographical regions.

In conclusion, the designed neural networks provide a powerful tool for predicting aerosol optical thickness, combining sophisticated data processing techniques with robust training strategies to achieve reliable and accurate predictions.
