# Stage 4 AI training: Global hyperparameters

This stage focuses on selecting the **model's optimal global hyperparameters**: activation function, optimizer and learning rate. Different rotational frames will be used for every configuration.

In [None]:
# Import packages:
import MLQDM.MLmodel as ML_MLmodel
import MLQDM.timewindows as ML_twdw
import tensorflow as tf

# Check available GPU:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

## Load original data and general parameters
Load data from files (many segments) and store information in dataframes, one for each segment. The original data is in the **Laboratory rotational frame (RF1)**.

There are two possible sets of target labels, coming from the 'linear approximation' or 'physical model' approaches to the interpolated positions. You must choose one as `interp` for the training stage:
* **'lin_approx'** : linear approximation.
* **'phys_model'** : physical model based on acceleration profile.

In [None]:
# Choose Z-position interpolation method:
interp = 'lin_approx' # 'phys_model' or 'lin_approx'

# Prepare files information:
data_path =  'Data/Final_t_BxByBz_zAut_LabFrame/' # Datafiles path
gen_pars_path = 'ML_parameters/'

# Load data, general hyperparameters and rotational frames:
data, hypers, RFs = ML_MLmodel.load_data_and_gen_pars(
    data_path,gen_pars_path,interp=interp,final_stage=False)

## Generate time windows

### Load original data

Each data segment is split into time windows, which by default have a fixed time length (or equivalently, a fixed number of points). Because the original data is stored in the Time_Wdw object, the windows can be reshaped later.

The distribution of training and testing datasets is chosen here. The validation dataset is included within the training dataset.
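
The windowing step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation of `ML_twdw.prepare_time_windows`: the helper name `make_windows`, the stride value and the toy segment are all hypothetical.

```python
def make_windows(segment, wdw_pp, stride_pp):
    """Cut a sequence of samples into fixed-length windows.

    Works on any sliceable sequence (list, NumPy array, ...).
    Trailing samples that do not fill a whole window are dropped.
    """
    if wdw_pp <= 0 or stride_pp <= 0:
        raise ValueError("wdw_pp and stride_pp must be positive")
    last_start = len(segment) - wdw_pp
    # An empty list is returned when the segment is shorter than one window.
    return [segment[s:s + wdw_pp] for s in range(0, last_start + 1, stride_pp)]


# Toy segment: 100 samples of (Bx, By, Bz); the values are placeholders.
toy_segment = [(float(i), 0.0, 0.0) for i in range(100)]
toy_windows = make_windows(toy_segment, wdw_pp=40, stride_pp=10)
print(len(toy_windows), len(toy_windows[0]))  # 7 40
```

A stride smaller than the window length produces overlapping windows, which increases the number of training instances at the cost of correlated samples.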

In [None]:
# Prepare time windows:
wdw_pp = 40
train_segm = [0,2,4] if interp == 'phys_model' else [2,3,4]
t_wdws_train, t_wdws_test = ML_twdw.prepare_time_windows(
    data,wdw_pp,train_segm=train_segm,
    plot_instances=True,instances=10,start_wdw=570,stride_pp=50)

### Global hyperparameters

#### Previous stages

From Stage 1 analysis, we've determined that only the full vector data, meaning all three magnetic components $(B_x,B_y,B_z)$, is robust against rotations and reaches about 90% accuracy for position predictions using a 1-meter threshold.

From Stage 2, we've determined that increasing the number of time window points is associated with better ML performance, up to a certain limit. Using 40 points (equal to 4 s) is a good compromise between performance and complexity of the ML model. We also found that for time windows longer than 2 s (20 points), the ML model works much better if Convolutional Neural Networks (CNN) are combined with Dense Neural Networks (DNN).

From Stage 3, we've determined the best main architecture for the ML algorithm: 

* Convolutional block: One-dimensional Convolutional layers (filter,kernel): [32,16] + [32,4]
* Pooling layers: None
* 1D-Conversion layer: Flattening
* Dense layers (neurons): [1024] + [512]
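
To get a feeling for the size of this architecture, the sketch below follows a 40-point window through the two convolutional layers and the flattening step. It assumes 'valid' padding and a stride of 1 (the Keras `Conv1D` defaults); whether `MLQDM` uses those settings is not stated here, so treat the numbers as illustrative. The helper name `conv1d_output_length` is hypothetical.

```python
def conv1d_output_length(n_in, kernel_size, stride=1):
    """Output length of a 1D convolution with 'valid' padding (no zero padding)."""
    return (n_in - kernel_size) // stride + 1


n_points = 40                       # time window length in points
conv_layers = [(32, 16), (32, 4)]   # (filters, kernel size) per layer

length = n_points
for n_filters, kernel_size in conv_layers:
    length = conv1d_output_length(length, kernel_size)

# Flattening turns (length, filters) into a single feature vector.
flat_features = length * conv_layers[-1][0]
print(length, flat_features)  # 22 704
```

Under these assumptions the first dense layer (1024 neurons) receives 704 inputs.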

#### Current stage

In this stage, we explore the ML model's global hyperparameters: learning rate, optimizer and activation function.

* Activation_Function = ['relu','elu','tanh']
* Optimizer = ['adam', 'adadelta', 'adamax']
* Learning_Rate = [1e-1,5e-2,5e-3,1e-3,5e-4,1e-4,5e-5,1e-5]
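
The options listed above define a full grid of configurations. As a minimal sketch of how such a grid can be enumerated (the internals of `ML_MLmodel.train_stage4` may differ, and the variable names here are illustrative):

```python
from itertools import product

activations = ['relu', 'elu', 'tanh']
optimizers = ['adam', 'adadelta', 'adamax']
learning_rates = [1e-1, 5e-2, 5e-3, 1e-3, 5e-4, 1e-4, 5e-5, 1e-5]

# Every (activation, optimizer, learning rate) combination, in nested-loop order.
grid = list(product(activations, optimizers, learning_rates))
print(len(grid))  # 72
```

Each of these configurations is then repeated for every seed and rotational frame, so the total number of training runs grows multiplicatively.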

In [None]:
# Define additional hyperparameters:
extra_hypers = {
    "Magnetic_Components": ['Bx','By','Bz'],
    "Time_Window_pp": wdw_pp,
    "Dropout_Fraction": 0,
    "Convolutional_Network": True,
    "Conv_Layers": [[32,16],[32,4]],
    "Pool_Layers": [None,None],
    "Dens_Layers": [1024,512],
    "Flatten_Average": True,
    "Model_Name": "S4_C16_C4_NP_Flatten_D1024_D512",
}

# Options for hyperparameters:
activ_opts = ['tanh'] #['relu','elu','tanh']
optim_opts = ['adam'] #['adam', 'adadelta', 'adamax']
lr_opts = [
    1e1,
    1e0,
    1e-1,
    1e-2,
    1e-3,
    1e-4,
    1e-5,
    1e-6,
    1e-7,
    1e-8,
    1e-9,
    1e-10,
    1e-11,
    1e-12
    ]

# Options for seeds:
seed_opts = [0,1,2]

# Combine all general hyperparameters:
gen_hyps = hypers | extra_hypers

# Prepare rotational frame options:
RF_opts = list(RFs.values())
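
The `hypers | extra_hypers` expression above uses the dictionary union operator, which requires Python 3.9 or later. When both dictionaries share a key, the right-hand value wins. A small sketch with made-up keys and values:

```python
base = {"Dropout_Fraction": 0.2, "Batch_Size": 64}      # placeholder values
extra = {"Dropout_Fraction": 0, "Model_Name": "demo"}   # placeholder values

merged = base | extra            # Python >= 3.9
fallback = {**base, **extra}     # equivalent form for older interpreters

print(merged)  # {'Dropout_Fraction': 0, 'Batch_Size': 64, 'Model_Name': 'demo'}
```

Neither form modifies the original dictionaries, so `hypers` stays available unchanged for later stages.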

In [None]:
# Prepare file path to export results:
results_path = f'Results_{interp}/Train_s1s4s5_Test_s2s3/'
# Prepare file path to check on already trained models and avoid repetitions:
check_rep_model = f'{results_path}Stage4_{interp}_all_Train_s1s4s5_Test_s2s3.csv'

# Train all models:
df_results = ML_MLmodel.train_stage4(
    activ_opts,optim_opts,lr_opts,gen_hyps,RF_opts,
    t_wdws_train,t_wdws_test,seed_opts,seed_opts,
    results_path=results_path,interpolation=interp,
    check_rep_model=check_rep_model,quick_timing_test=False
)
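
The `check_rep_model` file is used to avoid retraining configurations that already have results. How `train_stage4` does this internally is not shown here; the following is a hedged sketch of one possible approach, where the column names in `KEY_COLS` and the helpers `config_key` and `load_trained_keys` are hypothetical.

```python
import csv
import tempfile
from itertools import product
from pathlib import Path

# Hypothetical column names identifying one training run.
KEY_COLS = ["Activation_Function", "Optimizer", "Learning_Rate", "Seed"]


def config_key(activ, optim, lr, seed):
    # Normalise to strings so in-memory values compare equal to CSV text.
    return (activ, optim, repr(float(lr)), str(int(seed)))


def load_trained_keys(csv_path):
    """Return the set of configurations already present in the results file."""
    path = Path(csv_path)
    if not path.exists():
        return set()
    with path.open(newline="") as f:
        return {config_key(*(row[c] for c in KEY_COLS)) for row in csv.DictReader(f)}


# Demo with a temporary results file containing a single finished run.
with tempfile.TemporaryDirectory() as tmp:
    demo_csv = Path(tmp) / "results.csv"
    with demo_csv.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=KEY_COLS)
        writer.writeheader()
        writer.writerow(dict(zip(KEY_COLS, ["tanh", "adam", 1e-3, 0])))
    done = load_trained_keys(demo_csv)

# Keep only the configurations that still need training.
pending = [cfg for cfg in product(["tanh"], ["adam"], [1e-3, 1e-4], [0, 1])
           if config_key(*cfg) not in done]
print(len(pending))  # 3
```

Comparing normalised keys rather than raw values avoids mismatches between a float such as `1e-3` in memory and its text form in the CSV.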