# Training file

This file gathers every kind of test for model training:
- The initial file and library imports are mandatory
- The final exports are mandatory
- The model training cells depend on the chosen algorithm

Training can be automated using toggleable parameters.
Before running this notebook, make sure the correct file structure has been set up using env_setup.ipynb.

## Documentation links

## Python libraries

In [10]:
# Data Libraries
import pandas as pd
import os

# TensorFlow Part
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Input, Dense, Dropout
import keras_tuner
from sklearn.model_selection import train_test_split

## Import training DataFrame

### Set import parameters

In [11]:
IS_JSON_SUPPORTED = False
DATA_FOLDER = "versuch_f1"
VERSION_NB = 1
FILENAME="model_training.ipynb"
DIRNAME=os.path.abspath(FILENAME).replace(FILENAME,'')
DF_POINTS_RANGE = 11
DF_POINTS_LENGTH = 3000
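
As a sketch of how these parameters are used, the snippet below assembles the Feather file name with `pathlib` instead of string concatenation. The `data/feather` sub-folder and the `<folder>_<length>_<version>.feather` pattern follow the loading cell of this notebook. The helper name `feather_path` and the use of the current working directory as base folder are assumptions made for illustration.

```python
from pathlib import Path

# Same values as the import parameters above
DATA_FOLDER = "versuch_f1"
VERSION_NB = 1
DF_POINTS_LENGTH = 3000


def feather_path(base_dir: Path) -> Path:
    """Hypothetical helper: build the path of the Feather file to load."""
    file_name = f"{DATA_FOLDER}_{DF_POINTS_LENGTH}_{VERSION_NB}.feather"
    return base_dir / "data" / "feather" / file_name


# Assumption: the notebook is started from its own directory
path = feather_path(Path.cwd())
print(path.name)
```

Building the path from `Path` objects avoids depending on a trailing separator in the base directory string.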

### Import from Feather file

In [12]:
# Load DataFrame
full_df=pd.read_feather(DIRNAME+"data/feather/"+DATA_FOLDER+"_"+str(DF_POINTS_LENGTH)+"_"+str(VERSION_NB)+".feather")

# Drop unnecessary columns
full_df=full_df.drop(["t_Sec","t_nSec","Fx1","Fy1","Fz1","Tx1","Ty1","Tz1"],axis=1)

# Transform position from absolute to relative
OFFSET_XYZ=[-578.6,261.4,-375]
CARTESIAN_COLUMNS=['curCart_x','comdCart_x','curCart_y','comdCart_y','curCart_z','comdCart_z']
# Columns are ordered in (current, commanded) pairs per axis, so i//2 selects the x, y or z offset
for i, column in enumerate(CARTESIAN_COLUMNS):
    full_df[column] = full_df[column] + OFFSET_XYZ[i//2]
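
To make the index arithmetic of the offset loop explicit, here is a minimal sketch in plain Python (no DataFrame needed) that lists which offset ends up on which column. The dictionary name `offset_by_column` is introduced only for this illustration.

```python
# Same constants as in the cell above
OFFSET_XYZ = [-578.6, 261.4, -375]
CARTESIAN_COLUMNS = ['curCart_x', 'comdCart_x', 'curCart_y',
                     'comdCart_y', 'curCart_z', 'comdCart_z']

# i // 2 maps column indices 0,1 to the x offset, 2,3 to y and 4,5 to z
offset_by_column = {
    column: OFFSET_XYZ[i // 2] for i, column in enumerate(CARTESIAN_COLUMNS)
}

for column, offset in offset_by_column.items():
    print(f"{column}: {offset:+}")
```

The mapping only holds as long as `CARTESIAN_COLUMNS` keeps its two-columns-per-axis ordering.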

### Split the dataset into train/validation/test sets

In [13]:
# Dataset parameters
TEST_TRAIN_RATIO = 0.1
VALID_TRAIN_RATIO = 0.2

# Shuffle DataFrame
shuffled_df=full_df.sample(frac=1,random_state=1)

# X & Y datasets
X = shuffled_df[['exT_A1','exT_A2','exT_A3','exT_A4','exT_A5','exT_A6','exT_A7',
                'msT_A1','msT_A2','msT_A3','msT_A4','msT_A5','msT_A6','msT_A7',
                'Fx','Fy','Fz','Tx','Ty','Tz']].to_numpy()
Y = shuffled_df[['curCart_x','curCart_y','curCart_z']].to_numpy()

# Train / Test / Validation dataset
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=TEST_TRAIN_RATIO, random_state=2)
X_Train, X_Valid, Y_Train, Y_Valid = train_test_split(X_Train, Y_Train, test_size=VALID_TRAIN_RATIO, random_state=3)
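
Because the validation split is taken from what remains after the test split, `VALID_TRAIN_RATIO` is relative to the remaining data, not to the full dataset. The short sketch below works out the effective share of each set. The variable names ending in `_fraction` are illustrative, and exact sample counts may differ slightly because `train_test_split` rounds to whole samples.

```python
TEST_TRAIN_RATIO = 0.1
VALID_TRAIN_RATIO = 0.2

# The first split removes the test set from the full dataset
test_fraction = TEST_TRAIN_RATIO
# The second split is applied to the remaining (1 - TEST_TRAIN_RATIO) share
valid_fraction = (1 - TEST_TRAIN_RATIO) * VALID_TRAIN_RATIO
train_fraction = (1 - TEST_TRAIN_RATIO) * (1 - VALID_TRAIN_RATIO)

print(f"train={train_fraction:.2f} valid={valid_fraction:.2f} test={test_fraction:.2f}")
```

With the values above, roughly 72 % of the rows are used for training, 18 % for validation and 10 % for testing.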

## Model building

### Model 1 - FFNN Regression

In [14]:
def ffnn_regression(hp):
    # Create model
    model = Sequential()

    # Input layer
    model.add(Dense(20,input_shape=(20,),activation='relu'))

    # Tunable number of hidden layers
    for i in range(hp.Int("num_layers",1,5)):
        model.add(Dense(
            # Number of units
            units = hp.Int(f"units_{i}",min_value=32,max_value=512,step=32),
            # Tune activation function
            activation = hp.Choice("activation",["relu","tanh"])
            )
        )
    # Tune whether to use dropout
    if hp.Boolean("dropout"):
        model.add(Dropout(rate=0.2))

    # Output layer
    model.add(Dense(3,activation='linear'))

    # Define the learning rate as hyperparameter
    lr = hp.Float("lr",min_value=1e-4,max_value=1e-2,sampling="log")
    # Compile model
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),loss="mse",metrics=["mae"])
    return model
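
To get a feel for the size of the search space declared in `ffnn_regression`, the following back-of-the-envelope sketch counts the discrete configurations only; the continuous learning rate is left out. It assumes that the `hp.Int` bounds are inclusive and counts each distinct architecture once. The variable names are illustrative.

```python
# Discrete choices declared in ffnn_regression
units_choices = len(range(32, 512 + 1, 32))  # hp.Int(min_value=32, max_value=512, step=32)
activation_choices = 2                       # "relu" or "tanh", shared by all hidden layers
dropout_choices = 2                          # dropout on or off

# One units_{i} hyperparameter per hidden layer, for 1 to 5 hidden layers
architectures = sum(units_choices ** n for n in range(1, 6))
discrete_configs = architectures * activation_choices * dropout_choices

print(units_choices, discrete_configs)
```

Even without the learning rate, this gives several million combinations, so a handful of trials can only sample a very small part of the space.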

### Model 2 - CNN Regression

### Model 3 - FFNN Classification

### Model 4 - CNN Classification

## Hyperparameter optimization

In [15]:
hp = keras_tuner.HyperParameters()
ffnn_regression(hp)

2024-01-29 17:57:14.960460: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355

NameError: name 'input_shape' is not defined

### Algorithm 1 - RandomSearch

In [None]:
tuner = keras_tuner.RandomSearch(
    hypermodel=ffnn_regression,  # model-building function defined above
    objective="val_mae",
    max_trials=3,
    executions_per_trial=2,
    overwrite=True,
    directory="results/"+DATA_FOLDER,
    project_name="first_test",
    max_consecutive_failed_trials=3,
)

tuner.search(X_Train,Y_Train,epochs=2,
            validation_data=(X_Valid,Y_Valid),
            # TensorBoard logs are written to a sub-folder of the results directory
            callbacks=[keras.callbacks.TensorBoard("results/"+DATA_FOLDER+"/tb_mae2")])
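
For readers unfamiliar with the idea behind random search, the sketch below reproduces the principle in plain Python: draw configurations at random, score each one, and keep the best. It is not how `keras_tuner` is implemented. The function `toy_objective` is a hypothetical stand-in for the validation MAE, and `sample_config` and `random_search` are names made up for this illustration.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration from a space shaped like the one in ffnn_regression."""
    num_layers = rng.randint(1, 5)
    return {
        "num_layers": num_layers,
        "units": [rng.randrange(32, 513, 32) for _ in range(num_layers)],
        "activation": rng.choice(["relu", "tanh"]),
        "dropout": rng.choice([True, False]),
        # Log-uniform sampling between 1e-4 and 1e-2
        "lr": 10 ** rng.uniform(-4, -2),
    }


def toy_objective(config):
    """Hypothetical stand-in for val_mae: lower is better, best near lr = 1e-3."""
    return abs(math.log10(config["lr"]) + 3) + 0.01 * config["num_layers"]


def random_search(max_trials, seed=0):
    """Evaluate max_trials random configurations and return the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(max_trials):
        config = sample_config(rng)
        score = toy_objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(max_trials=3)
print(best_config, best_score)
```

Raising `max_trials` can only keep or improve the best score for a given seed, which is the trade-off between search time and result quality that the `max_trials` argument of the tuner controls.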

### Algorithm 2 - GridSearch

### Algorithm 3 - Chained GridSearch