:warning:**IMPORTANT NOTICE**:warning:\
*Since the initial parameterisation and gradient descent optimisation are stochastic processes, the training of a neural network is not fully reproducible.*

*Therefore, re-running this script is not recommended, as it will overwrite the original calibration of the neural network used in the work presented here.
This script solely documents the training procedure; it can be copied as a template to fit other neural networks.*

*To experiment with the models calibrated here, they can be loaded from the `saved_models` directory.*

# Systematic assessment of different hyperparameters to optimise model performance
This notebook is used to find the optimal hyperparameters for the NN-thermometer based on XMg and Ti.\
**Only a thermometer, no barometer.**

Several hyperparameters are varied to assess their impact on the model performance in a semi-quantitative way.\
The hyperparameters that are varied are:
- Number of hidden layers / neurons --> model capacity
- Initial learning rate
- Activation function
- Effect of regularization

### Model capacity

These architectures are tested:
- **very small**: 1 hidden layer with 4 neurons
- **small**: 1 hidden layer with 16 neurons
- **large**: 1 hidden layer with 32 neurons
- **small_2hl**: 2 hidden layers with 16 neurons each
- **large_2hl**: 2 hidden layers with 32 neurons each
- **huge**: 3 hidden layers with 64 neurons

### Learning rate

These learning rates are tested:
- 0.01
- 0.001
- 0.0005
- 0.0001

### Activation function
*Decided against testing!*

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf

from pathlib import Path
from keras.models import Sequential
from keras.layers import Dense, Normalization, BatchNormalization, LayerNormalization, Dropout
from keras.losses import MeanSquaredError
from keras.optimizers import Adam, schedules
from keras.metrics import MeanAbsoluteError, RootMeanSquaredError
from keras.callbacks import CSVLogger, EarlyStopping
from sklearn.model_selection import train_test_split

from ml_tb.normalisation import MinMaxScaler
from ml_tb.metrics import RMSE_denormalised_temperature_only






## Import data, pre-processing and train/test split

Load the dataset and split it into a training and a validation set.

The validation set is **20%** of the full dataset (approx. 200 samples).
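The split used in this notebook is unseeded, which is one of the reasons the training is not reproducible. As an illustration only, a seeded shuffle split could look like the sketch below. The function `shuffled_split`, the seed value and the placeholder arrays are hypothetical; the 1000 rows are chosen only so that 20% gives the roughly 200 validation samples mentioned above.

```python
import numpy as np

VALIDATION_FRACTION = 0.2


def shuffled_split(features, targets, val_fraction, seed):
    """Shuffle the row indices with a fixed seed and split off a validation set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_val = int(round(len(features) * val_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return features[train_idx], features[val_idx], targets[train_idx], targets[val_idx]


# Placeholder arrays standing in for the biotite compositions and temperatures.
x = np.arange(2000, dtype=float).reshape(1000, 2)
t = np.linspace(400.0, 900.0, 1000)

x_train, x_val, t_train, t_val = shuffled_split(x, t, VALIDATION_FRACTION, seed=42)
print(x_train.shape, x_val.shape)
```

With scikit-learn, the same effect is usually obtained by passing `random_state` to `train_test_split`.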

In [2]:
VALIDATION_FRACTION = 0.2

For the scaling of the target data (temperature only in this notebook), a MinMaxScaler with fixed bounds of 400 and 900 is used.
This scaler is defined globally and used for all datasets.

In [3]:
scaling_pt = MinMaxScaler(min=400, max=900, axis=0)
inv_scaling_pt = MinMaxScaler(min=400, max=900, axis=0, invert=True)
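The implementation of `ml_tb.normalisation.MinMaxScaler` is not shown in this notebook. Assuming it performs standard min-max scaling with the fixed bounds given above, the forward and inverse transforms can be sketched in NumPy as follows. The functions `scale` and `unscale` are hypothetical stand-ins, not the library API.

```python
import numpy as np

T_MIN, T_MAX = 400.0, 900.0  # bounds passed to MinMaxScaler in this notebook


def scale(t):
    """Map values from [T_MIN, T_MAX] onto [0, 1] (assumed behaviour)."""
    return (np.asarray(t, dtype=float) - T_MIN) / (T_MAX - T_MIN)


def unscale(t_norm):
    """Inverse transform: map values from [0, 1] back to [T_MIN, T_MAX]."""
    return np.asarray(t_norm, dtype=float) * (T_MAX - T_MIN) + T_MIN


temps = np.array([400.0, 650.0, 900.0])
print(scale(temps))           # 0, 0.5 and 1
print(unscale(scale(temps)))  # round trip back to the input values
```

Because the bounds are fixed rather than fitted, values outside 400 to 900 would map outside the unit interval under this assumption.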




### **Dataset: Ti-XMg**

In [4]:
# load excel file
data = pd.read_excel(Path("Metapelite-Database_Bt_CLEAN_2024-02-03.xlsx"))

biotite_composition = np.zeros(shape=(len(data), 2))
biotite_composition[:, 0] = data["Bt-Ti"]
biotite_composition[:, 1] = data["Bt-XMg"]


# temperature target, kept as a 1-D pandas Series (no pre-allocation needed)
temperature = data["Temperature random ordered after Ti-in-Bt"]

# check for NaN values (should be already filtered out)
print("NaN values in biotite composition: ", np.isnan(biotite_composition).any())
print("NaN values in PT: ", np.isnan(temperature).any())

NaN values in biotite composition:  False
NaN values in PT:  False


In [5]:
# test train split
biotite_composition_train, biotite_composition_val, pt_train, pt_val = train_test_split(biotite_composition, temperature, test_size=VALIDATION_FRACTION, shuffle=True)

normalisation_biotite_composition = Normalization(axis=-1)
normalisation_biotite_composition.adapt(biotite_composition_train)

print(normalisation_biotite_composition.mean.numpy())
print(np.sqrt(normalisation_biotite_composition.variance.numpy()))

pt_train_norm = scaling_pt(pt_train)
pt_val_norm = scaling_pt(pt_val)

print("After normalisation, the minimal value of P and T is: ", pt_train_norm.numpy().min(axis=0))
print("After normalisation, the maximal value of P and T is: ", pt_train_norm.numpy().max(axis=0))

[[0.12126549 0.49694127]]
[[0.04824786 0.10380844]]
After normalisation, the minimal value of P and T is:  0.00066033937
After normalisation, the maximal value of P and T is:  0.8988128


## Set-up global training parameters

Define a function to calculate the RMSE of the temperature on unscaled values to have an interpretable metric.

All models are trained for a maximum of 5000 epochs.\
Early stopping is used with a patience of 50 epochs; the callbacks defined below monitor `loss`, i.e. the training loss rather than the validation loss.\
Inverse time learning rate decay is used for all models.
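For the non-staircase case, inverse time decay follows `lr = initial_lr / (1 + decay_rate * step / decay_steps)`. The sketch below evaluates this formula in plain Python for the settings used in this notebook (initial rate 0.001, `decay_rate=1`, `decay_steps` equal to 1000 epochs). The value of `STEPS_PER_EPOCH` is a hypothetical placeholder; in the notebook it is derived from the training set size.

```python
def inverse_time_decay(step, initial_lr, decay_steps, decay_rate):
    """Learning rate after `step` optimiser steps (continuous, non-staircase form)."""
    return initial_lr / (1.0 + decay_rate * step / decay_steps)


STEPS_PER_EPOCH = 16  # hypothetical value, e.g. 800 training samples // batch size 50
DECAY_STEPS = STEPS_PER_EPOCH * 1000

for epoch in (0, 1000, 3000, 5000):
    step = epoch * STEPS_PER_EPOCH
    lr = inverse_time_decay(step, 0.001, DECAY_STEPS, 1.0)
    print(f"epoch {epoch:5d}: learning rate {lr:.6f}")
```

With these settings the learning rate halves after 1000 epochs and falls to one sixth of its initial value after the maximum of 5000 epochs.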

In [6]:
def RMSE_T(y_true, y_pred):
    return RMSE_denormalised_temperature_only(y_true, y_pred, inv_scaling_pt)
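The helper `RMSE_denormalised_temperature_only` comes from `ml_tb.metrics` and its source is not shown here. Assuming it inverts the min-max scaling and then computes a root-mean-square error, the idea can be sketched in NumPy as below. The function `rmse_denormalised` and the example values are hypothetical.

```python
import numpy as np

T_MIN, T_MAX = 400.0, 900.0  # bounds of the scaler defined earlier in this notebook


def rmse_denormalised(y_true_norm, y_pred_norm):
    """RMSE computed after mapping normalised values back to the original scale."""
    y_true = np.asarray(y_true_norm, dtype=float) * (T_MAX - T_MIN) + T_MIN
    y_pred = np.asarray(y_pred_norm, dtype=float) * (T_MAX - T_MIN) + T_MIN
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up normalised targets and predictions, for illustration only.
y_true_norm = np.array([0.2, 0.5, 0.8])
y_pred_norm = np.array([0.25, 0.45, 0.8])

rmse_norm = float(np.sqrt(np.mean((y_true_norm - y_pred_norm) ** 2)))
rmse_temp = rmse_denormalised(y_true_norm, y_pred_norm)
print(rmse_norm, rmse_temp)
```

Since the scaling is linear, the denormalised RMSE equals the normalised RMSE multiplied by the scaler range (500 here), which is why it is easier to interpret than the raw loss.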

In [7]:
BATCH_SIZE = 50
STEPS_PER_EPOCH = len(biotite_composition_train) // BATCH_SIZE
MAX_EPOCHS = 5000

lr_schedule = schedules.InverseTimeDecay(0.001, decay_steps=STEPS_PER_EPOCH*1000, decay_rate=1, staircase=False)

LOSS = MeanSquaredError()
METRICS = [MeanAbsoluteError(), RootMeanSquaredError(), RMSE_T]

## **Test 01:** Model capacity

- **very small**: 1 hidden layer with 4 neurons
- **small**: 1 hidden layer with 16 neurons
- **large**: 1 hidden layer with 32 neurons
- **small_2hl**: 2 hidden layers with 16 neurons each
- **large_2hl**: 2 hidden layers with 32 neurons each
- **huge**: 3 hidden layers with 64 neurons
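To put the capacity of these architectures into numbers, the trainable parameters of a fully connected network can be counted as weights plus biases per layer. The sketch below does this for two input features (Ti and XMg) and one output, using the hidden layer widths from the code cells that follow. The helper `dense_param_count` is a hypothetical name introduced for illustration.

```python
def dense_param_count(n_inputs, hidden_layers, n_outputs=1):
    """Trainable parameters of a fully connected network: weights + biases per layer."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(sizes[:-1], sizes[1:]))


ARCHITECTURES = {
    "very small": [4],
    "small": [16],
    "large": [32],
    "small_2hl": [16, 16],
    "large_2hl": [32, 32],
    "huge": [64, 64, 64],
}

counts = {name: dense_param_count(2, layers) for name, layers in ARCHITECTURES.items()}
for name, count in counts.items():
    print(f"{name:>10}: {count} trainable parameters")
```

These counts agree with the "Trainable params" lines in the model summaries printed in this section; the 5 non-trainable parameters reported there belong to the normalisation layer.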

In [8]:
CALLBACKS_VERYSMALL = [CSVLogger("HyperParamTest_verysmall_TiXMg.log", append=False), EarlyStopping(monitor="loss", patience=50)]
CALLBACKS_SMALL = [CSVLogger("HyperParamTest_small_TiXMg.log", append=False), EarlyStopping(monitor="loss", patience=50)]
CALLBACKS_LARGE = [CSVLogger("HyperParamTest_large_TiXMg.log", append=False), EarlyStopping(monitor="loss", patience=50)]
CALLBACKS_SMALL_2HL = [CSVLogger("HyperParamTest_small_2HL_TiXMg.log", append=False), EarlyStopping(monitor="loss", patience=50)]
CALLBACKS_LARGE_2HL = [CSVLogger("HyperParamTest_large_2HL_TiXMg.log", append=False), EarlyStopping(monitor="loss", patience=50)]
CALLBACKS_HUGE = [CSVLogger("HyperParamTest_huge_TiXMg.log", append=False), EarlyStopping(monitor="loss", patience=50)]
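The callbacks above stop a run once the monitored quantity has not improved for 50 consecutive epochs. As written, they monitor `loss` (the training loss); monitoring the validation loss would require `monitor="val_loss"`, which is the Keras default. The patience logic itself can be sketched in plain Python as below. This is a simplified illustration, not the Keras implementation, and `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(losses, patience):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            # Improvement: remember the new best value and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement: count the epoch and stop once patience is exhausted.
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up loss curve: the best value is reached at epoch 2.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]
print(early_stopping_epoch(losses, patience=3))
```

In this toy curve the run stops three epochs after the best value; in the notebook the same rule applies with a patience of 50.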

In [9]:
OPT = Adam(lr_schedule)

verysmall_model = Sequential()
verysmall_model.add(normalisation_biotite_composition)
verysmall_model.add(Dense(4, activation="relu"))
verysmall_model.add(Dense(1))

verysmall_model.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
verysmall_model.summary()

history = verysmall_model.fit(biotite_composition_train, pt_train_norm,
                          batch_size=BATCH_SIZE, epochs=MAX_EPOCHS,
                          validation_data=[biotite_composition_val, pt_val_norm],
                          callbacks=CALLBACKS_VERYSMALL, verbose=False)
verysmall_model.save(Path("saved_models", "verysmall_model_TiXMg"))

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 2)                 5         
 on)                                                             
                                                                 
 dense (Dense)               (None, 4)                 12        
                                                                 
 dense_1 (Dense)             (None, 1)                 5         
                                                                 
Total params: 22 (92.00 Byte)
Trainable params: 17 (68.00 Byte)
Non-trainable params: 5 (24.00 Byte)
_________________________________________________________________




INFO:tensorflow:Assets written to: saved_models\verysmall_model_TiXMg\assets




In [10]:
OPT = Adam(lr_schedule)

small_model = Sequential()
small_model.add(normalisation_biotite_composition)
small_model.add(Dense(16, activation="relu"))
small_model.add(Dense(1))

small_model.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
small_model.summary()

history = small_model.fit(biotite_composition_train, pt_train_norm,
                          batch_size=BATCH_SIZE, epochs=MAX_EPOCHS,
                          validation_data=[biotite_composition_val, pt_val_norm],
                          callbacks=CALLBACKS_SMALL, verbose=False)
small_model.save(Path("saved_models", "small_model_TiXMg"))

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 2)                 5         
 on)                                                             
                                                                 
 dense_2 (Dense)             (None, 16)                48        
                                                                 
 dense_3 (Dense)             (None, 1)                 17        
                                                                 
Total params: 70 (284.00 Byte)
Trainable params: 65 (260.00 Byte)
Non-trainable params: 5 (24.00 Byte)
_________________________________________________________________
INFO:tensorflow:Assets written to: saved_models\small_model_TiXMg\assets




In [11]:
OPT = Adam(lr_schedule)

large_model = Sequential()
large_model.add(normalisation_biotite_composition)
large_model.add(Dense(32, activation="relu"))
large_model.add(Dense(1))

large_model.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
large_model.summary()

history = large_model.fit(biotite_composition_train, pt_train_norm,
                          batch_size=BATCH_SIZE, epochs=MAX_EPOCHS,
                          validation_data=[biotite_composition_val, pt_val_norm],
                          callbacks=CALLBACKS_LARGE, verbose=False)
large_model.save(Path("saved_models", "large_model_TiXMg"))

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 2)                 5         
 on)                                                             
                                                                 
 dense_4 (Dense)             (None, 32)                96        
                                                                 
 dense_5 (Dense)             (None, 1)                 33        
                                                                 
Total params: 134 (540.00 Byte)
Trainable params: 129 (516.00 Byte)
Non-trainable params: 5 (24.00 Byte)
_________________________________________________________________
INFO:tensorflow:Assets written to: saved_models\large_model_TiXMg\assets




In [12]:
OPT = Adam(lr_schedule)

small_2hl_model = Sequential()
small_2hl_model.add(normalisation_biotite_composition)
small_2hl_model.add(Dense(16, activation="relu"))
small_2hl_model.add(Dense(16, activation="relu"))
small_2hl_model.add(Dense(1))

small_2hl_model.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
small_2hl_model.summary()

history = small_2hl_model.fit(biotite_composition_train, pt_train_norm,
                          batch_size=BATCH_SIZE, epochs=MAX_EPOCHS,
                          validation_data=[biotite_composition_val, pt_val_norm],
                          callbacks=CALLBACKS_SMALL_2HL, verbose=False)
small_2hl_model.save(Path("saved_models", "small_2hl_model_TiXMg"))

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 2)                 5         
 on)                                                             
                                                                 
 dense_6 (Dense)             (None, 16)                48        
                                                                 
 dense_7 (Dense)             (None, 16)                272       
                                                                 
 dense_8 (Dense)             (None, 1)                 17        
                                                                 
Total params: 342 (1.34 KB)
Trainable params: 337 (1.32 KB)
Non-trainable params: 5 (24.00 Byte)
_________________________________________________________________
INFO:tensorflow:Assets written to: saved_models\small_2hl_model_TiXMg\assets




In [13]:
OPT = Adam(lr_schedule)

large_2hl_model = Sequential()
large_2hl_model.add(normalisation_biotite_composition)
large_2hl_model.add(Dense(32, activation="relu"))
large_2hl_model.add(Dense(32, activation="relu"))
large_2hl_model.add(Dense(1))

large_2hl_model.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
large_2hl_model.summary()

history = large_2hl_model.fit(biotite_composition_train, pt_train_norm,
                          batch_size=BATCH_SIZE, epochs=MAX_EPOCHS,
                          validation_data=[biotite_composition_val, pt_val_norm],
                          callbacks=CALLBACKS_LARGE_2HL, verbose=False)
large_2hl_model.save(Path("saved_models", "large_2hl_model_TiXMg"))

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 2)                 5         
 on)                                                             
                                                                 
 dense_9 (Dense)             (None, 32)                96        
                                                                 
 dense_10 (Dense)            (None, 32)                1056      
                                                                 
 dense_11 (Dense)            (None, 1)                 33        
                                                                 
Total params: 1190 (4.65 KB)
Trainable params: 1185 (4.63 KB)
Non-trainable params: 5 (24.00 Byte)
_________________________________________________________________
INFO:tensorflow:Assets written to: saved_models\large_2hl_model_TiXMg\assets




In [14]:
OPT = Adam(lr_schedule)

huge_model = Sequential()
huge_model.add(normalisation_biotite_composition)
huge_model.add(Dense(64, activation="relu"))
huge_model.add(Dense(64, activation="relu"))
huge_model.add(Dense(64, activation="relu"))
huge_model.add(Dense(1))

huge_model.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
huge_model.summary()

history = huge_model.fit(biotite_composition_train, pt_train_norm,
                          batch_size=BATCH_SIZE, epochs=MAX_EPOCHS,
                          validation_data=[biotite_composition_val, pt_val_norm],
                          callbacks=CALLBACKS_HUGE, verbose=False)
huge_model.save(Path("saved_models", "huge_model_TiXMg"))

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 2)                 5         
 on)                                                             
                                                                 
 dense_12 (Dense)            (None, 64)                192       
                                                                 
 dense_13 (Dense)            (None, 64)                4160      
                                                                 
 dense_14 (Dense)            (None, 64)                4160      
                                                                 
 dense_15 (Dense)            (None, 1)                 65        
                                                                 
Total params: 8582 (33.53 KB)
Trainable params: 8577 (33.50 KB)
Non-trainable params: 5 (24.00 Byte)
_________________________________________________________________

INFO:tensorflow:Assets written to: saved_models\huge_model_TiXMg\assets
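Once the runs have finished, the files written by `CSVLogger` can be compared to pick an architecture. The sketch below reads each log with the standard library and reports the epoch with the lowest validation loss. It assumes the logs contain the `epoch` and `val_loss` columns that `CSVLogger` writes when validation data is supplied; `best_epoch` is a hypothetical helper and not part of this notebook.

```python
import csv
from pathlib import Path


def best_epoch(log_path, column="val_loss"):
    """Return (epoch, value) of the row with the lowest value in `column`."""
    with open(log_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    best = min(rows, key=lambda row: float(row[column]))
    return int(best["epoch"]), float(best[column])


# Run labels as used in the log file names of Test 01.
RUNS = ["verysmall", "small", "large", "small_2HL", "large_2HL", "huge"]

for run in RUNS:
    log_path = Path(f"HyperParamTest_{run}_TiXMg.log")
    if log_path.exists():  # skip runs whose log file is not present
        epoch, value = best_epoch(log_path)
        print(f"{run:>10}: lowest val_loss {value:.5f} at epoch {epoch}")
```

Passing another column name, for example the logged validation RMSE in temperature units, would give a more interpretable comparison, provided that column exists in the log.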


## **Test 02:** Learning rate

All tests are performed with the optimal model architecture from Test 01. --> *small*

These learning rates are tested:
- 0.01
- 0.001
- 0.0005
- 0.0001

In [15]:
CALLBACKS_LR1 = [CSVLogger("HyperParamTest_LR1_TiXMg.log", append=False), EarlyStopping(monitor="loss", patience=50)]
CALLBACKS_LR2 = [CSVLogger("HyperParamTest_LR2_TiXMg.log", append=False), EarlyStopping(monitor="loss", patience=50)]
CALLBACKS_LR3 = [CSVLogger("HyperParamTest_LR3_TiXMg.log", append=False), EarlyStopping(monitor="loss", patience=50)]
CALLBACKS_LR4 = [CSVLogger("HyperParamTest_LR4_TiXMg.log", append=False), EarlyStopping(monitor="loss", patience=50)]

In [16]:
lr_schedule.initial_learning_rate = 0.01

In [17]:
OPT = Adam(lr_schedule)

LR1_model = Sequential()
LR1_model.add(normalisation_biotite_composition)
LR1_model.add(Dense(16, activation="relu"))
LR1_model.add(Dense(1))

LR1_model.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
LR1_model.summary()

history = LR1_model.fit(biotite_composition_train, pt_train_norm,
                          batch_size=BATCH_SIZE, epochs=MAX_EPOCHS,
                          validation_data=[biotite_composition_val, pt_val_norm],
                          callbacks=CALLBACKS_LR1, verbose=False)
LR1_model.save(Path("saved_models", "LR1_model_TiXMg"))

Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 2)                 5         
 on)                                                             
                                                                 
 dense_16 (Dense)            (None, 16)                48        
                                                                 
 dense_17 (Dense)            (None, 1)                 17        
                                                                 
Total params: 70 (284.00 Byte)
Trainable params: 65 (260.00 Byte)
Non-trainable params: 5 (24.00 Byte)
_________________________________________________________________


INFO:tensorflow:Assets written to: saved_models\LR1_model_TiXMg\assets




In [18]:
lr_schedule.initial_learning_rate = 0.001

In [19]:
OPT = Adam(lr_schedule)

LR2_model = Sequential()
LR2_model.add(normalisation_biotite_composition)
LR2_model.add(Dense(16, activation="relu"))
LR2_model.add(Dense(1))

LR2_model.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
LR2_model.summary()

history = LR2_model.fit(biotite_composition_train, pt_train_norm,
                          batch_size=BATCH_SIZE, epochs=MAX_EPOCHS,
                          validation_data=[biotite_composition_val, pt_val_norm],
                          callbacks=CALLBACKS_LR2, verbose=False)
LR2_model.save(Path("saved_models", "LR2_model_TiXMg"))

Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 2)                 5         
 on)                                                             
                                                                 
 dense_18 (Dense)            (None, 16)                48        
                                                                 
 dense_19 (Dense)            (None, 1)                 17        
                                                                 
Total params: 70 (284.00 Byte)
Trainable params: 65 (260.00 Byte)
Non-trainable params: 5 (24.00 Byte)
_________________________________________________________________


INFO:tensorflow:Assets written to: saved_models\LR2_model_TiXMg\assets




In [20]:
lr_schedule.initial_learning_rate = 0.0005

In [21]:
OPT = Adam(lr_schedule)

LR3_model = Sequential()
LR3_model.add(normalisation_biotite_composition)
LR3_model.add(Dense(16, activation="relu"))
LR3_model.add(Dense(1))

LR3_model.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
LR3_model.summary()

history = LR3_model.fit(biotite_composition_train, pt_train_norm,
                          batch_size=BATCH_SIZE, epochs=MAX_EPOCHS,
                          validation_data=[biotite_composition_val, pt_val_norm],
                          callbacks=CALLBACKS_LR3, verbose=False)
LR3_model.save(Path("saved_models", "LR3_model_TiXMg"))

Model: "sequential_8"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 2)                 5         
 on)                                                             
                                                                 
 dense_20 (Dense)            (None, 16)                48        
                                                                 
 dense_21 (Dense)            (None, 1)                 17        
                                                                 
Total params: 70 (284.00 Byte)
Trainable params: 65 (260.00 Byte)
Non-trainable params: 5 (24.00 Byte)
_________________________________________________________________
INFO:tensorflow:Assets written to: saved_models\LR3_model_TiXMg\assets




In [22]:
lr_schedule.initial_learning_rate = 0.0001

In [23]:
OPT = Adam(lr_schedule)

LR4_model = Sequential()
LR4_model.add(normalisation_biotite_composition)
LR4_model.add(Dense(16, activation="relu"))
LR4_model.add(Dense(1))

LR4_model.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
LR4_model.summary()

history = LR4_model.fit(biotite_composition_train, pt_train_norm,
                          batch_size=BATCH_SIZE, epochs=MAX_EPOCHS,
                          validation_data=[biotite_composition_val, pt_val_norm],
                          callbacks=CALLBACKS_LR4, verbose=False)
LR4_model.save(Path("saved_models", "LR4_model_TiXMg"))

Model: "sequential_9"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 2)                 5         
 on)                                                             
                                                                 
 dense_22 (Dense)            (None, 16)                48        
                                                                 
 dense_23 (Dense)            (None, 1)                 17        
                                                                 
Total params: 70 (284.00 Byte)
Trainable params: 65 (260.00 Byte)
Non-trainable params: 5 (24.00 Byte)
_________________________________________________________________


INFO:tensorflow:Assets written to: saved_models\LR4_model_TiXMg\assets


