:warning:**IMPORTANT NOTICE**:warning:\
*Since the initial parameterisation and gradient descent optimisation are stochastic processes, the training of a neural network is not fully reproducible.*

*Therefore, re-running this script is not recommended, as it will overwrite the original calibration of the neural network used in the work presented here.
This script serves solely to document the training procedure; it can be copied as a template for fitting other neural networks.*

*To experiment with the models calibrated here, they can be loaded from the `saved_models` directory.*

# TransferLearning: "Test different transferring techniques"

Three ways to transfer prior knowledge from a pre-trained model to the thermobarometer regression on natural data are tested:
- **Feature extraction**: Use the pre-trained model to predict *P* and *T*, and use these predictions as additional features for the thermobarometer regression.
- **Fine-tuning**: Use the pre-trained model as the initial parameterisation and fine-tune it on the thermobarometer regression.

    *Different fine-tuning strategies are tested:*
    - Fine-tune all layers
    - Fine-tune only the 2 last layers
    - Fine-tune only the last layer
    - Fine-tune with L2 regularization (this pulls the weights towards zero; as an additional step, a custom regularizer should be implemented that keeps the weights close to the prior model weights instead)

(- **Injection learning**: Keep training the prior model, but "inject" the natural data.)
--> The best approach to this still has to be tested. As a starting point, I would use a 50:50 mix of the prior (simulated) data and the natural data. This could be combined with both feature extraction and fine-tuning. (Maybe test in a separate notebook!)
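
A minimal sketch of the 50:50 mixing idea, in plain Python rather than Keras. The function name `mixed_batch`, the `natural_fraction` parameter and the toy data are assumptions made for illustration; none of them is part of this notebook or of `ml_tb`.

```python
import random


def mixed_batch(prior_samples, natural_samples, batch_size, natural_fraction=0.5, rng=None):
    """Draw one training batch with a fixed share of natural samples (hypothetical helper)."""
    rng = rng or random.Random()
    n_natural = round(batch_size * natural_fraction)
    n_prior = batch_size - n_natural
    # sample without replacement from each pool, then shuffle the combined batch
    batch = rng.sample(natural_samples, n_natural) + rng.sample(prior_samples, n_prior)
    rng.shuffle(batch)
    return batch


# toy data: each sample is tagged with its origin
prior = [("simulated", i) for i in range(1000)]
natural = [("natural", i) for i in range(200)]

batch = mixed_batch(prior, natural, batch_size=50, rng=random.Random(0))
n_natural_in_batch = sum(1 for origin, _ in batch if origin == "natural")
print(n_natural_in_batch, len(batch))
```

Fixing the ratio per batch, rather than concatenating both data sets once, would keep the much smaller natural data set from being swamped by the simulated data; whether that is the best choice is exactly what remains to be tested.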

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf

from pathlib import Path
from keras.models import Model, Sequential, load_model, clone_model
from keras.layers import Dense, Normalization, BatchNormalization, LayerNormalization, Dropout, Input, concatenate
from keras.losses import MeanSquaredError
from keras.optimizers import Adam, schedules
from keras.metrics import MeanAbsoluteError, RootMeanSquaredError
from keras.callbacks import CSVLogger, EarlyStopping
from sklearn.model_selection import train_test_split

from ml_tb.normalisation import MinMaxScaler
from ml_tb.metrics import RMSE_denormalised_T, RMSE_denormalised_P
from ml_tb.plot import plot_training_curve, prediction_vs_truth






## Import data, pre-processing and train/test split

The validation set is **20%** of the full data set (approx. 200 samples).

In [2]:
VALIDATION_FRACTION = 0.2

Global scaling parameters for MinMaxScaling of the target data are hard-coded to the range of the training data.

In [3]:
scaling_pt = MinMaxScaler(min=[1500, 400], max=[10000, 900], axis=0)
inv_scaling_pt = MinMaxScaler(min=[1500, 400], max=[10000, 900], axis=0, invert=True)
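
As a rough illustration of what the two scalers above are expected to do, here is a plain-Python sketch of min-max scaling and its inverse with the same bounds. It assumes that `MinMaxScaler` from `ml_tb` implements the usual `(x - min) / (max - min)` transform; the helper names `scale` and `unscale` are hypothetical.

```python
# bounds as hard-coded in the cell above: [P in bar, T]
MINS = [1500.0, 400.0]
MAXS = [10000.0, 900.0]


def scale(pt):
    """Map a [P, T] pair to the unit range (assumed behaviour of scaling_pt)."""
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(pt, MINS, MAXS)]


def unscale(pt_norm):
    """Map a scaled [P, T] pair back to physical units (assumed behaviour of inv_scaling_pt)."""
    return [v * (hi - lo) + lo for v, lo, hi in zip(pt_norm, MINS, MAXS)]


sample = [5750.0, 650.0]      # mid-range pressure and temperature
scaled = scale(sample)
restored = unscale(scaled)
print(scaled, restored)
```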




In [4]:
# load excel file
data = pd.read_excel(Path("..","01_fit_natural_biotite","Metapelite-Database_Bt_CLEAN_2024-02-03.xlsx"))

biotite_composition = np.zeros(shape=(len(data), 6))
biotite_composition[:, 0] = data["Bt-Si"]
biotite_composition[:, 1] = data["Bt-Ti"]
biotite_composition[:, 2] = data["Bt-Al"]
biotite_composition[:, 3] = data["Bt-FeTot"]
biotite_composition[:, 4] = data["Bt-Mn"]
biotite_composition[:, 5] = data["Bt-Mg"]

pt = np.zeros(shape=(len(data), 2))
pt[:, 0] = data["Pressure estimate random uniform"] * 1000 # convert to bar
pt[:, 1] = data["Temperature random ordered after Ti-in-Bt"]

# check for NaN values (should be already filtered out)
print("NaN values in biotite composition: ", np.isnan(biotite_composition).any())
print("NaN values in PT: ", np.isnan(pt).any())

NaN values in biotite composition:  False
NaN values in PT:  False


In [5]:
# train/validation split
data_train, data_val, pt_train, pt_val = train_test_split(biotite_composition, pt, test_size=VALIDATION_FRACTION, shuffle=True)

# NORMALISATION
normalisation_biotite_composition = Normalization(axis=-1)
normalisation_biotite_composition.adapt(data_train)

print(normalisation_biotite_composition.mean.numpy())
print(np.sqrt(normalisation_biotite_composition.variance.numpy()))

# SCALING of PT
pt_train_norm = scaling_pt(pt_train)
pt_val_norm = scaling_pt(pt_val)

print("After scaling, the minimal values of P and T are: ", np.min(pt_train_norm, axis=0))
print("After scaling, the maximal values of P and T are: ", np.max(pt_train_norm, axis=0))

[[2.7068577  0.12046875 1.7131128  1.2358406  0.00848252 1.0821007 ]]
[[0.04696281 0.04720167 0.09208625 0.22648923 0.00735622 0.25816932]]
After scaling, the minimal values of P and T are:  [0.00010304 0.00158875]
After scaling, the maximal values of P and T are:  [0.940462  0.8988128]
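
The two arrays printed above are the per-feature mean and standard deviation that the `Normalization` layer learned from `data_train`. As a rough illustration of what the layer then does to each sample, here is a plain-Python sketch of the z-score transform. The helper `standardise` is hypothetical, and the small epsilon guard that Keras applies to the standard deviation is left out.

```python
# values copied from the printed output above (Si, Ti, Al, FeTot, Mn, Mg)
MEAN = [2.7068577, 0.12046875, 1.7131128, 1.2358406, 0.00848252, 1.0821007]
STD = [0.04696281, 0.04720167, 0.09208625, 0.22648923, 0.00735622, 0.25816932]


def standardise(composition):
    """Subtract the training mean and divide by the training standard deviation."""
    return [(x - m) / s for x, m, s in zip(composition, MEAN, STD)]


# a composition equal to the training mean maps to zeros,
# one standard deviation above the mean maps to ones
at_mean = standardise(MEAN)
one_sigma_above = standardise([m + s for m, s in zip(MEAN, STD)])
print(at_mean)
print(one_sigma_above)
```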


# Set global training parameters for all tests

+ Define a custom metric for RMSE_P and RMSE_T

In [6]:
def RMSE_P(y_true, y_pred):
    return RMSE_denormalised_P(y_true, y_pred, inv_scaling_pt)


def RMSE_T(y_true, y_pred):
    return RMSE_denormalised_T(y_true, y_pred, inv_scaling_pt)
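
The idea behind these metrics can be sketched in plain Python, under the assumption that `RMSE_denormalised_P` applies the inverse scaling to both arrays and then computes the RMSE of the pressure column (the actual `ml_tb` implementation is not shown here). Because min-max scaling is linear, the RMSE in bar equals the RMSE on scaled values multiplied by the pressure range.

```python
import math

P_MIN, P_MAX = 1500.0, 10000.0   # bar, as in the scaler definition
P_RANGE = P_MAX - P_MIN


def rmse(y_true, y_pred):
    """Root mean squared error of two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def unscale_p(values):
    """Inverse min-max scaling for pressure (assumed behaviour of inv_scaling_pt)."""
    return [v * P_RANGE + P_MIN for v in values]


# illustrative scaled pressures, not results from this notebook
p_true_norm = [0.20, 0.50, 0.80]
p_pred_norm = [0.25, 0.45, 0.80]

rmse_norm = rmse(p_true_norm, p_pred_norm)
rmse_bar = rmse(unscale_p(p_true_norm), unscale_p(p_pred_norm))
print(rmse_norm, rmse_bar)
```

Reporting the error in bar (and in temperature units for `RMSE_T`) makes the training logs directly comparable with uncertainties quoted for conventional thermobarometers.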

In [7]:
BATCH_SIZE = 50
STEPS_PER_EPOCH = len(data_train) // BATCH_SIZE
MAX_EPOCHS = 5000

lr_schedule = schedules.InverseTimeDecay(0.001, decay_steps=STEPS_PER_EPOCH*1000, decay_rate=1, staircase=False)

LOSS = MeanSquaredError()
METRICS = [MeanAbsoluteError(), RootMeanSquaredError(), RMSE_P, RMSE_T]
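
With `staircase=False`, `InverseTimeDecay` computes the learning rate as `initial_learning_rate / (1 + decay_rate * step / decay_steps)`. The sketch below evaluates that formula in plain Python. `STEPS_PER_EPOCH = 16` is an assumed value for illustration (roughly 800 training samples at a batch size of 50), not a number taken from this notebook.

```python
def inverse_time_decay(step, initial_lr, decay_steps, decay_rate=1.0):
    """Continuous inverse time decay of the learning rate."""
    return initial_lr / (1.0 + decay_rate * step / decay_steps)


STEPS_PER_EPOCH = 16                      # assumption for illustration
DECAY_STEPS = STEPS_PER_EPOCH * 1000      # same construction as in the cell above

for epoch in (0, 1000, 3000):
    step = epoch * STEPS_PER_EPOCH
    lr = inverse_time_decay(step, initial_lr=0.001, decay_steps=DECAY_STEPS)
    print(f"epoch {epoch:>4}: lr = {lr:.6f}")
```

With `decay_rate=1` and `decay_steps` set to 1000 epochs' worth of steps, the learning rate is roughly halved after 1000 epochs and quartered after 3000 epochs.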

Set up callbacks for each test

In [8]:
CALLBACKS_FEATURE_EXTR = [CSVLogger("Transfer_technique_feature_extraction.log"), EarlyStopping(monitor="loss", patience=50)]
CALLBACKS_FINETUNE_NOREG = [CSVLogger("Transfer_technique_finetune_noreg.log"), EarlyStopping(monitor="loss", patience=50)]
CALLBACKS_FINETUNE_ALL = [CSVLogger("Transfer_technique_finetune_all.log"), EarlyStopping(monitor="loss", patience=50)]
CALLBACKS_FINETUNE_LAST2 = [CSVLogger("Transfer_technique_finetune_last2.log"), EarlyStopping(monitor="loss", patience=50)]
CALLBACKS_FINETUNE_LAST = [CSVLogger("Transfer_technique_finetune_last.log"), EarlyStopping(monitor="loss", patience=50)]
CALLBACKS_FINETUNE_L2 = [CSVLogger("Transfer_technique_finetune_l2.log"), EarlyStopping(monitor="loss", patience=50)]
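
All callbacks above stop training once the monitored quantity has not improved for 50 consecutive epochs. Note that `monitor="loss"` tracks the training loss; the Keras default would be `"val_loss"`. The patience logic can be sketched in plain Python as follows. This is a simplified illustration with a hypothetical helper, not the Keras implementation.

```python
def early_stopping_epoch(losses, patience):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            # improvement: remember the new best value and reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# illustrative loss curve, not taken from the training logs
losses = [1.0, 0.8, 0.7, 0.75, 0.74, 0.73, 0.6]
print(early_stopping_epoch(losses, patience=3))
print(early_stopping_epoch(losses, patience=4))
```

In the toy curve, a patience of 3 stops training before the late improvement is reached, whereas a patience of 4 lets training continue; the generous patience of 50 used here guards against stopping on such plateaus.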

# Load pre-trained model

In [9]:
# load a saved model from "02_pretraining\saved_models\model_ds62White2014"
model_prior = load_model(Path("..","02_pretraining", "saved_models/model_ds62White2014"), compile=False)





## **Test 01**: Feature extraction

In [10]:
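# clone_model builds new layers with freshly initialised weights,
# so the pre-trained weights are copied over explicitly
model_prior_01 = clone_model(model_prior, input_tensors=Input(shape=(6,)))
model_prior_01.set_weights(model_prior.get_weights())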
# freeze all layers
for layer in model_prior_01.layers:
    layer.trainable = False

model_prior_01.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 6)                 13        
 on)                                                             
                                                                 
 dense (Dense)               (None, 64)                448       
                                                                 
 dense_1 (Dense)             (None, 64)                4160      
                                                                 
 dense_2 (Dense)             (None, 64)                4160      
                                                                 
 dense_3 (Dense)             (None, 64)                4160      
                                                                 
 dense_4 (Dense)             (None, 2)                 130       
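
The `Param #` column of the summary can be checked by hand: a `Dense` layer has `n_in * n_out` weights plus `n_out` biases, and the `Normalization` layer stores a mean and a variance per feature plus one sample counter. A short plain-Python check (the helper `dense_params` is hypothetical):

```python
def dense_params(n_in, n_out):
    """Number of parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


# layer widths of the prior model: 6 inputs, four hidden layers of 64, 2 outputs
layer_sizes = [6, 64, 64, 64, 64, 2]
dense_counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

normalization_params = 2 * 6 + 1   # mean and variance for 6 features, plus a counter
total_params = sum(dense_counts) + normalization_params

print(dense_counts)
print(total_params)
```

The total agrees with the 13071 parameters reported for the nested `sequential` model in the feature-extraction summary.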
                                                        

In [11]:
OPT = Adam(lr_schedule)

input_vector = Input(shape=(6,))

# predict a prior PT ("initial guess") with the prior model
prior_PT = model_prior_01(input_vector)

# normalise input and concatenate with prior_PT
normed_input = normalisation_biotite_composition(input_vector)
concatenated = concatenate([normed_input, prior_PT])

# top model
out_dense = Dense(16, activation="relu")(concatenated)
out_PT = Dense(2, activation=None)(out_dense)

model_feature_extraction = Model(inputs=input_vector, outputs=out_PT)

model_feature_extraction.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
model_feature_extraction.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_2 (InputLayer)        [(None, 6)]                  0         []                            
                                                                                                  
 normalization (Normalizati  (None, 6)                    13        ['input_2[0][0]']             
 on)                                                                                              
                                                                                                  
 sequential (Sequential)     (None, 2)                    13071     ['input_2[0][0]']             
                                                                                                  
 concatenate (Concatenate)   (None, 8)                    0         ['normalization[0][0]',   

In [12]:
history = model_feature_extraction.fit(data_train, pt_train_norm,
                    validation_data=(data_val, pt_val_norm),
                    batch_size=BATCH_SIZE,
                    epochs=MAX_EPOCHS,
                    callbacks=CALLBACKS_FEATURE_EXTR,
                    verbose=False)

model_feature_extraction.save(Path("saved_models", "feature_extraction"))



INFO:tensorflow:Assets written to: saved_models\feature_extraction\assets




## **Test 02-o**: Fine-tuning all layers without additional regularization (dropout)

The "dumb" approach, included to show that fine-tuning without regularization can easily lead to overfitting.

For fine-tuning tests the learning rate is lowered.

In [13]:
# 50% of lr for fine-tuning as initial guess (must be properly tuned later on)
lr_schedule = schedules.InverseTimeDecay(0.0005, decay_steps=STEPS_PER_EPOCH*1000, decay_rate=1, staircase=False)

### Build model
- Take layers from pre-trained model with trained weights and biases
- Add natural data normalization layer at the beginning

In [14]:
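# clone_model re-initialises the weights, so copy the pre-trained ones explicitly
model_prior_02o = clone_model(model_prior, input_tensors=Input(shape=(6,)))
model_prior_02o.set_weights(model_prior.get_weights())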

In [15]:
OPT = Adam(lr_schedule)

model_fine_tune_reg = Sequential()
model_fine_tune_reg.add(normalisation_biotite_composition)
model_fine_tune_reg.add(model_prior_02o.layers[1])
model_fine_tune_reg.add(model_prior_02o.layers[2])
model_fine_tune_reg.add(model_prior_02o.layers[3])
model_fine_tune_reg.add(model_prior_02o.layers[4])
model_fine_tune_reg.add(model_prior_02o.layers[5])

model_fine_tune_reg.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
model_fine_tune_reg.summary()

Model: "sequential"


_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 6)                 13        
 on)                                                             
                                                                 
 dense (Dense)               (None, 64)                448       
                                                                 
 dense_1 (Dense)             (None, 64)                4160      
                                                                 
 dense_2 (Dense)             (None, 64)                4160      
                                                                 
 dense_3 (Dense)             (None, 64)                4160      
                                                                 
 dense_4 (Dense)             (None, 2)                 130       
                                                                 
Total para

In [16]:
history = model_fine_tune_reg.fit(data_train, pt_train_norm,
                    validation_data=(data_val, pt_val_norm),
                    batch_size=BATCH_SIZE,
                    epochs=MAX_EPOCHS,
                    callbacks=CALLBACKS_FINETUNE_NOREG,
                    verbose=False)

model_fine_tune_reg.save(Path("saved_models", "fine_tune_no_reg"))

INFO:tensorflow:Assets written to: saved_models\fine_tune_no_reg\assets




## **Test 02a**: Fine-tuning all layers (with dropout)

In [17]:
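# clone_model re-initialises the weights, so copy the pre-trained ones explicitly
model_prior_02a = clone_model(model_prior, input_tensors=Input(shape=(6,)))
model_prior_02a.set_weights(model_prior.get_weights())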

In [18]:
OPT = Adam(lr_schedule)

model_fine_tune_all = Sequential()
model_fine_tune_all.add(normalisation_biotite_composition)
model_fine_tune_all.add(model_prior_02a.layers[1])
model_fine_tune_all.add(Dropout(0.2))
model_fine_tune_all.add(model_prior_02a.layers[2])
model_fine_tune_all.add(Dropout(0.2))
model_fine_tune_all.add(model_prior_02a.layers[3])
model_fine_tune_all.add(Dropout(0.2))
model_fine_tune_all.add(model_prior_02a.layers[4])
model_fine_tune_all.add(model_prior_02a.layers[5])

model_fine_tune_all.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
model_fine_tune_all.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 6)                 13        
 on)                                                             
                                                                 
 dense (Dense)               (None, 64)                448       
                                                                 
 dropout (Dropout)           (None, 64)                0         
                                                                 
 dense_1 (Dense)             (None, 64)                4160      
                                                                 
 dropout_1 (Dropout)         (None, 64)                0         
                                                                 
 dense_2 (Dense)             (None, 64)                4160      
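
The `Dropout(0.2)` layers shown in the summary add no parameters; during training they zero a random 20% of the activations and rescale the remaining ones by `1 / (1 - rate)`, so that the expected activation is unchanged. At inference time they pass values through untouched. A plain-Python sketch of the training-time behaviour (the helper `dropout` is hypothetical, not the Keras implementation):

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the others."""
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(42)
activations = [1.0] * 10_000
dropped = dropout(activations, rate=0.2, rng=rng)

zero_share = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(f"share of zeroed units: {zero_share:.3f}")
print(f"mean activation after dropout: {mean_after:.3f}")
```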
                                                      

In [19]:
history = model_fine_tune_all.fit(data_train, pt_train_norm,
                    validation_data=(data_val, pt_val_norm),
                    batch_size=BATCH_SIZE,
                    epochs=MAX_EPOCHS,
                    callbacks=CALLBACKS_FINETUNE_ALL,
                    verbose=False)

model_fine_tune_all.save(Path("saved_models", "fine_tune_all"))

INFO:tensorflow:Assets written to: saved_models\fine_tune_all\assets




## **Test 02b**: Fine-tuning only the 2 last hidden layers (and the output layer)

In [20]:
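# clone_model re-initialises the weights, so copy the pre-trained ones explicitly
model_prior_02b = clone_model(model_prior, input_tensors=Input(shape=(6,)))
model_prior_02b.set_weights(model_prior.get_weights())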

In [21]:
OPT = Adam(lr_schedule)

model_fine_tune_last2 = Sequential()
model_fine_tune_last2.add(normalisation_biotite_composition)
model_fine_tune_last2.add(model_prior_02b.layers[1])
model_fine_tune_last2.add(model_prior_02b.layers[2])

model_fine_tune_last2.layers[1].trainable = False
model_fine_tune_last2.layers[2].trainable = False

model_fine_tune_last2.add(model_prior_02b.layers[3])
model_fine_tune_last2.add(Dropout(0.2))
model_fine_tune_last2.add(model_prior_02b.layers[4])
model_fine_tune_last2.add(model_prior_02b.layers[5])

model_fine_tune_last2.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
model_fine_tune_last2.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 6)                 13        
 on)                                                             
                                                                 
 dense (Dense)               (None, 64)                448       
                                                                 
 dense_1 (Dense)             (None, 64)                4160      
                                                                 
 dense_2 (Dense)             (None, 64)                4160      
                                                                 
 dropout_3 (Dropout)         (None, 64)                0         
                                                                 
 dense_3 (Dense)             (None, 64)                4160      
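
Setting `trainable = False` on a layer removes its weights from the set updated by the optimizer. Using the parameter counts from the summaries, the number of trainable `Dense` parameters for each freezing pattern in this notebook can be tallied in plain Python (a bookkeeping sketch; the dictionary and the helper are hypothetical):

```python
# parameter counts per Dense layer, as listed in the model summaries
DENSE_PARAMS = {"dense": 448, "dense_1": 4160, "dense_2": 4160, "dense_3": 4160, "dense_4": 130}


def trainable_params(frozen_layers):
    """Sum the parameters of all Dense layers that are not frozen."""
    return sum(n for name, n in DENSE_PARAMS.items() if name not in frozen_layers)


# frozen layers per test, following the `trainable = False` assignments in the code cells
strategies = {
    "02a (all layers)": set(),
    "02b (dense, dense_1 frozen)": {"dense", "dense_1"},
    "02c (dense, dense_1, dense_2 frozen)": {"dense", "dense_1", "dense_2"},
}

counts = {name: trainable_params(frozen) for name, frozen in strategies.items()}
for name, n in counts.items():
    print(f"{name}: {n} trainable parameters")
```

Fewer trainable parameters leave less freedom to overfit the small natural data set, at the cost of less capacity to adapt the prior model.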
                                                      

In [22]:
history = model_fine_tune_last2.fit(data_train, pt_train_norm,
                    validation_data=(data_val, pt_val_norm),
                    batch_size=BATCH_SIZE,
                    epochs=MAX_EPOCHS,
                    callbacks=CALLBACKS_FINETUNE_LAST2,
                    verbose=False)

model_fine_tune_last2.save(Path("saved_models", "fine_tune_last2"))

INFO:tensorflow:Assets written to: saved_models\fine_tune_last2\assets




## **Test 02c**: Fine-tuning only the last hidden layer (and the output layer)

In [23]:
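# clone_model re-initialises the weights, so copy the pre-trained ones explicitly
model_prior_02c = clone_model(model_prior, input_tensors=Input(shape=(6,)))
model_prior_02c.set_weights(model_prior.get_weights())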

In [24]:
OPT = Adam(lr_schedule)

model_fine_tune_last = Sequential()
model_fine_tune_last.add(normalisation_biotite_composition)
model_fine_tune_last.add(model_prior_02c.layers[1])
model_fine_tune_last.add(model_prior_02c.layers[2])
model_fine_tune_last.add(model_prior_02c.layers[3])


model_fine_tune_last.layers[1].trainable = False
model_fine_tune_last.layers[2].trainable = False
model_fine_tune_last.layers[3].trainable = False

model_fine_tune_last.add(model_prior_02c.layers[4])
model_fine_tune_last.add(model_prior_02c.layers[5])

model_fine_tune_last.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
model_fine_tune_last.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 6)                 13        
 on)                                                             
                                                                 
 dense (Dense)               (None, 64)                448       
                                                                 
 dense_1 (Dense)             (None, 64)                4160      
                                                                 
 dense_2 (Dense)             (None, 64)                4160      
                                                                 
 dense_3 (Dense)             (None, 64)                4160      
                                                                 
 dense_4 (Dense)             (None, 2)                 130       
                                                      

In [25]:
history = model_fine_tune_last.fit(data_train, pt_train_norm,
                    validation_data=(data_val, pt_val_norm),
                    batch_size=BATCH_SIZE,
                    epochs=MAX_EPOCHS,
                    callbacks=CALLBACKS_FINETUNE_LAST,
                    verbose=False)

model_fine_tune_last.save(Path("saved_models", "fine_tune_last"))

INFO:tensorflow:Assets written to: saved_models\fine_tune_last\assets




## **Test 02d**: Fine-tuning with L2 regularization
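
The introduction notes that plain L2 regularization pulls the weights towards zero, whereas a custom regularizer anchored at the prior model weights would suit transfer learning better. The difference between the two penalties can be sketched in plain Python. The function names are illustrative assumptions, and the `strength` of 0.01 mirrors the Keras `l2` default; this is not the implementation used in this notebook.

```python
def l2_penalty(weights, strength=0.01):
    """Standard L2 penalty: grows with the distance of the weights from zero."""
    return strength * sum(w * w for w in weights)


def l2_prior_penalty(weights, prior_weights, strength=0.01):
    """L2 penalty anchored at the prior: grows with the distance from the prior weights."""
    return strength * sum((w - w0) ** 2 for w, w0 in zip(weights, prior_weights))


# illustrative weights, not taken from the pre-trained model
prior_weights = [0.5, -1.0, 2.0]
fine_tuned = [0.6, -1.0, 1.8]

print(l2_penalty(prior_weights))                     # non-zero even at the prior itself
print(l2_prior_penalty(prior_weights, prior_weights))  # zero at the prior
print(l2_prior_penalty(fine_tuned, prior_weights))
```

The standard penalty already penalises the pre-trained solution itself, so it pushes the model away from the prior knowledge, while the anchored penalty only penalises departures from it.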

In [26]:
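# clone_model re-initialises the weights, so copy the pre-trained ones explicitly
model_prior_02d = clone_model(model_prior, input_tensors=Input(shape=(6,)))
model_prior_02d.set_weights(model_prior.get_weights())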

In [27]:
OPT = Adam(lr_schedule)

model_fine_tune_l2 = Sequential()
model_fine_tune_l2.add(normalisation_biotite_composition)
model_fine_tune_l2.add(model_prior_02d.layers[1])
model_fine_tune_l2.layers[1].kernel_regularizer = tf.keras.regularizers.l2()
model_fine_tune_l2.layers[1].bias_regularizer = tf.keras.regularizers.l2()
model_fine_tune_l2.add(model_prior_02d.layers[2])
model_fine_tune_l2.layers[2].kernel_regularizer = tf.keras.regularizers.l2()
model_fine_tune_l2.layers[2].bias_regularizer = tf.keras.regularizers.l2()
model_fine_tune_l2.add(model_prior_02d.layers[3])
model_fine_tune_l2.layers[3].kernel_regularizer = tf.keras.regularizers.l2()
model_fine_tune_l2.add(model_prior_02d.layers[4])
model_fine_tune_l2.layers[4].kernel_regularizer = tf.keras.regularizers.l2()
model_fine_tune_l2.layers[4].bias_regularizer = tf.keras.regularizers.l2()
model_fine_tune_l2.add(model_prior_02d.layers[5])
model_fine_tune_l2.layers[5].kernel_regularizer = tf.keras.regularizers.l2()
model_fine_tune_l2.layers[5].bias_regularizer = tf.keras.regularizers.l2()

model_fine_tune_l2.compile(optimizer=OPT, loss=LOSS, metrics=METRICS)
model_fine_tune_l2.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizati  (None, 6)                 13        
 on)                                                             
                                                                 
 dense (Dense)               (None, 64)                448       
                                                                 
 dense_1 (Dense)             (None, 64)                4160      
                                                                 
 dense_2 (Dense)             (None, 64)                4160      
                                                                 
 dense_3 (Dense)             (None, 64)                4160      
                                                                 
 dense_4 (Dense)             (None, 2)                 130       
                                                      

In [28]:
model_fine_tune_l2.layers[1].kernel_regularizer

<keras.src.regularizers.L2 at 0x1c07b83a650>

In [29]:
history = model_fine_tune_l2.fit(data_train, pt_train_norm,
                    validation_data=(data_val, pt_val_norm),
                    batch_size=BATCH_SIZE,
                    epochs=MAX_EPOCHS,
                    callbacks=CALLBACKS_FINETUNE_L2,
                    verbose=False)

model_fine_tune_l2.save(Path("saved_models", "fine_tune_l2"))

INFO:tensorflow:Assets written to: saved_models\fine_tune_l2\assets


