## This notebook is used for per-scenario evaluation of the hybrid model.

In [1]:
from stesml.model_tools import train_and_validate_hybrid_model
from stesml.stes_model import stes_model
from stesml.model_tools import analyze_CV_results



#### Get the data directory and the XGBoost and Neural Network model parameters

In [2]:
data_dir = "../data/Sulfur_Models/heating/full_runs"
parameters_xgb = stes_model.get_parameters('XGBoost')
parameters_nn = stes_model.get_parameters('NN', truncated=True)

#### Set necessary parameters

In [3]:
hybrid_split_time = 360 # Transition time between predictions and calculations for sulfur average temperature
n_repeats = 20 # Number of repeats for five-fold CV
random_state = 7 # Seed for the CV splits. Set this to -1 to use a random seed.
features = ["flow-time", "Tw", "Ti"] # Input features
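To make the roles of `n_repeats` and `random_state` concrete, here is a minimal, standard-library sketch of repeated five-fold cross-validation. It is not the `stesml` implementation: the helper name `repeated_kfold_indices` is hypothetical, and it assumes the splits are drawn over the 87 scenario files rather than over individual time steps.

```python
import random

def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=20, random_state=7):
    """Yield (train_idx, val_idx) pairs for repeated k-fold CV (illustrative only)."""
    # Assumption: mirror the notebook's convention that -1 means "no fixed seed".
    rng = random.Random(None if random_state == -1 else random_state)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repeat
        for k in range(n_splits):
            val_idx = indices[k::n_splits]  # every n_splits-th shuffled index
            val_set = set(val_idx)
            train_idx = [i for i in indices if i not in val_set]
            yield train_idx, val_idx

splits = list(repeated_kfold_indices(n_samples=87))
print(len(splits))  # 5 folds x 20 repeats = 100 splits
```

With a fixed seed the same 100 splits are produced on every run, which is what makes per-scenario results comparable between runs.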

#### Train and validate hybrid model & return addendum for each CV split

In [4]:
addenda = train_and_validate_hybrid_model(
    data_dir=data_dir,
    parameters_xgb=parameters_xgb,
    parameters_nn=parameters_nn,
    n_repeats=n_repeats,
    random_state=random_state,
    hybrid_split_time=hybrid_split_time,
    features=features,
)

../data/Sulfur_Models/heating/full_runs
                                             filepath
0   ../data/Sulfur_Models/heating/full_runs/ML_540...
1   ../data/Sulfur_Models/heating/full_runs/ML_640...
2   ../data/Sulfur_Models/heating/full_runs/ML_640...
3   ../data/Sulfur_Models/heating/full_runs/ML_600...
4   ../data/Sulfur_Models/heating/full_runs/ML_500...
..                                                ...
82  ../data/Sulfur_Models/heating/full_runs/ML_520...
83  ../data/Sulfur_Models/heating/full_runs/ML_620...
84  ../data/Sulfur_Models/heating/full_runs/ML_660...
85  ../data/Sulfur_Models/heating/full_runs/ML_480...
86  ../data/Sulfur_Models/heating/full_runs/ML_460...

[87 rows x 1 columns]


2022-09-21 00:40:00.442164: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-21 00:40:00.520056: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)


Epoch 1/6
Epoch 2/6
Epoch 3/6
Parameters: { "num_boost_round" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.


[0]	val-rmse:31.83760
[20]	val-rmse:2.84519
[40]	val-rmse:2.23965
[60]	val-rmse:2.00784
[80]	val-rmse:1.94801
[100]	val-rmse:1.87551
[120]	val-rmse:1.83723
[140]	val-rmse:1.82046
[160]	val-rmse:1.79430
[180]	val-rmse:1.76174
[200]	val-rmse:1.74515
[220]	val-rmse:1.73772
[240]	val-rmse:1.73130
[249]	val-rmse:1.73125
RMSE: 1.7312551, R2: 0.9830370
Split #0, XGB h RMSE: 1.731255, XGB h RMSE Average: 1.731255
Split #0, XGB h R2: 0.983037, XGB h R2 Average: 0.983037
RMSE: 0.3320250, R2: 0.9999636
Split #0, NN Tavg RMSE: 0.332025, NN Tavg RMSE Average: 0.332025
Split #0, NN Tavg R2: 0.999964, NN Tavg R2 Average: 0.999964
RMSE: 54.9887868, R2: 0.0


KeyboardInterrupt



#### Break out addenda for Neural Network, XGBoost, and hybrid models
The NN model is trained to predict T (the sulfur average temperature, `Tavg`) for t <= 360, and the XGBoost model is trained to predict h for t >= 360, where 360 is `hybrid_split_time`.

The hybrid model combines the NN predictions of T with values of T calculated from the XGBoost predictions of h, giving T over the full time range.
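The switch-over at the split time can be sketched as follows. This shows only the stitching step: the calculation of T from h is not shown in this notebook, so `T_calc` is taken as given. The function name `stitch_hybrid` and the choice to take the boundary sample (t equal to the split time) from the NN are assumptions for illustration.

```python
def stitch_hybrid(times, T_nn, T_calc, split_time=360):
    """Use NN predictions up to split_time and h-derived values afterwards."""
    # Assumption: the sample at t == split_time comes from the NN.
    return [nn if t <= split_time else calc
            for t, nn, calc in zip(times, T_nn, T_calc)]

# Made-up numbers purely to show the switch-over; not results from the notebook.
times = [0, 180, 360, 540, 720]
T_nn = [300.0, 310.0, 320.0, 0.0, 0.0]    # placeholders after the split time
T_calc = [0.0, 0.0, 0.0, 330.0, 340.0]    # placeholders before the split time
T_hybrid = stitch_hybrid(times, T_nn, T_calc)
print(T_hybrid)  # [300.0, 310.0, 320.0, 330.0, 340.0]
```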

In [9]:
addenda_NN = list()
addenda_XGB = list()
addenda_Hybrid = list()
for addenda_composite in addenda:
    addenda_NN.append(addenda_composite['NN'])
    addenda_XGB.append(addenda_composite['XGBoost'])
    addenda_Hybrid.append(addenda_composite['Hybrid'])

#### Analyze CV results for Neural Network, XGBoost, and Hybrid models

`analyze_CV_results` saves a CSV file with the per-scenario evaluation results. This function is specific to the representative set and must be adapted for other training datasets.

In [10]:
analyze_CV_results(addenda_NN, t_max=hybrid_split_time, target='Tavg')

In [11]:
analyze_CV_results(addenda_XGB, t_min=hybrid_split_time, target='h')

In [12]:
analyze_CV_results(addenda_Hybrid, target='Tavg', hybrid=True)
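For readers unfamiliar with per-scenario evaluation, the sketch below computes RMSE and R2 separately for each scenario from a flat list of predictions. It is not how `analyze_CV_results` works internally: the row layout `(scenario_id, y_true, y_pred)` and both function names are hypothetical, since the structure of an addendum is not shown here.

```python
import math
from collections import defaultdict

def rmse_r2(y_true, y_pred):
    """Return (RMSE, R2) for two equal-length sequences."""
    n = len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    mean = sum(y_true) / n
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    rmse = math.sqrt(ss_res / n)
    # R2 is undefined when the true values are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return rmse, r2

def per_scenario_metrics(rows):
    """Group (scenario_id, y_true, y_pred) rows and score each scenario."""
    grouped = defaultdict(lambda: ([], []))
    for scenario_id, y_true, y_pred in rows:
        grouped[scenario_id][0].append(y_true)
        grouped[scenario_id][1].append(y_pred)
    return {sid: rmse_r2(yt, yp) for sid, (yt, yp) in grouped.items()}

# Made-up rows for two hypothetical scenarios "A" and "B".
rows = [("A", 1.0, 1.0), ("A", 2.0, 2.0), ("A", 3.0, 3.0),
        ("B", 1.0, 2.0), ("B", 2.0, 3.0), ("B", 3.0, 4.0)]
metrics = per_scenario_metrics(rows)
print(metrics)
```

Scoring each scenario separately exposes scenarios that a single pooled RMSE would hide, which is the motivation for a per-scenario CSV.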