### Demo: using the training module for a classification task

This notebook works with the processed dataset for the classification task and shows how to:

- Instantiate models
- Train models with different strategies
- Evaluate the trained models
- Export models
- Load and use models

### Loading modules

In [1]:
import pandas as pd
import sys
sys.path.insert(0, "../src/")

from predictive_models.ClassificationModels import ClassificationModels
from utils.utilsLib import *

### Preparing dataset

- Read the processed dataset
- Separate the feature values from the response column
- Split the dataset into a training set and an independent test set

In [2]:
df_data = pd.read_csv("../processed_datasets/embedding_class_dataset.csv")
df_data.head(5)

Unnamed: 0,p_1,p_2,p_3,p_4,p_5,p_6,p_7,p_8,p_9,p_10,...,p_312,p_313,p_314,p_315,p_316,p_317,p_318,p_319,p_320,hl_category
0,0.215133,0.051171,0.247454,0.012051,-0.011711,-0.060513,-0.231637,-0.205868,-0.097614,-0.136191,...,0.085986,0.267091,0.040721,-0.056852,0.00095,0.046941,0.384855,0.065653,-0.079415,Medium
1,0.094518,-0.145713,0.224664,0.146596,-0.001474,-0.078727,-0.084605,-0.004563,0.030208,-0.152848,...,0.041869,0.032089,-0.042898,0.17521,-0.079063,-0.155911,0.073967,0.077606,-0.04011,Medium
2,0.142003,0.053739,0.366839,-0.021242,-0.038941,-0.039594,-0.112769,-0.15334,-0.11487,-0.163018,...,0.217359,0.107217,-0.161089,-0.012996,-0.101987,-0.002924,0.292161,0.105018,-0.149086,Medium
3,0.115404,-0.105004,0.129293,-0.102546,0.089786,-0.087965,-0.083683,-0.04868,-0.119285,-0.175827,...,0.066691,0.031502,-0.096098,0.023415,-0.145263,-0.06125,0.234779,0.020759,-0.252812,Medium
4,0.00448,-0.140809,0.003605,-0.14972,0.044082,-0.100952,-0.041461,-0.053537,-0.162078,-0.085417,...,0.093655,-0.019685,-0.010657,-0.028408,-0.211371,-0.111117,0.21265,-0.010299,-0.228238,Medium


In [3]:
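# Features: every column except the response. Note that the saved index
# column "Unnamed: 0" is kept as a feature here.
dataset = df_data.drop(columns=["hl_category"]).values
# Response: the hl_category class labels.
response = df_data["hl_category"].values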

In [4]:
X_train, X_test, y_train, y_test = applySplit(dataset, response, random_state=42, test_size=0.1)
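
The internals of applySplit are not shown in this notebook, so the following is only a minimal sketch of what a reproducible hold-out split does. The function name holdout_split and the toy data are hypothetical; the real helper may instead wrap a library routine such as scikit-learn's train_test_split.

```python
import random


def holdout_split(features, labels, test_size=0.1, random_state=42):
    """Hypothetical stand-in for applySplit (an assumption, not the real helper).

    Shuffles row indices with a fixed seed and holds out a fraction as test data.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")

    indices = list(range(len(features)))
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(random_state).shuffle(indices)

    n_test = max(1, round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(rows, idx):
        return [rows[i] for i in idx]

    # Same return order as the call in the notebook.
    return (
        pick(features, train_idx),
        pick(features, test_idx),
        pick(labels, train_idx),
        pick(labels, test_idx),
    )


# Toy data (hypothetical): 20 rows with two features each.
toy_features = [[i, i * 2] for i in range(20)]
toy_labels = ["Medium" if i % 2 else "Low" for i in range(20)]

X_tr, X_te, y_tr, y_te = holdout_split(
    toy_features, toy_labels, test_size=0.1, random_state=42
)
```

Fixing random_state means the same rows land in the test set on every run, which is what makes the later evaluation repeatable.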

### Instantiate the model

- Instantiate the ClassificationModels class and call one of its methods to create a classification model

In [5]:
class_model = ClassificationModels(
    dataset, 
    response, 
    test_size=0.2, 
    random_state=42
)

class_model.instanceRandomForest()

### Training process

- Prepare the dataset for the input process (split into training and validation sets)
- Train the model
- Evaluate the model using the classification metrics

The model performances are saved in the performances attribute of the ClassificationModels object

In [6]:
class_model.processModel()
class_model.performances

{'Accuracy': 0.6846846846846847,
 'Precision': 0.6379549074140104,
 'Recall': 0.6846846846846847,
 'F1-score': 0.6441840223423488,
 'MCC': 0.18964179616875715,
 'Confusion Matrix': [[0.24675324675324675,
   0.7402597402597403,
   0.012987012987012988],
  [0.054838709677419356, 0.8903225806451613, 0.054838709677419356],
  [0.03508771929824561, 0.8070175438596491, 0.15789473684210525]]}
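
In the output above, each row of the confusion matrix sums to 1 and Recall equals Accuracy. Both are consistent with a row-normalized matrix and support-weighted averaging, although the notebook does not show how ClassificationModels computes them. The sketch below illustrates those assumptions on a hypothetical toy example; the class labels and helper names are made up for illustration.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, labels):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def row_normalize(matrix):
    """Divide each row by its total, so every non-empty row sums to 1."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in matrix]


def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def weighted_recall(matrix):
    """Per-class recall averaged with class support as the weight."""
    total = sum(sum(row) for row in matrix)
    score = 0.0
    for i, row in enumerate(matrix):
        support = sum(row)
        if support:
            score += (matrix[i][i] / support) * (support / total)
    return score


def multiclass_mcc(matrix):
    """Multi-class Matthews correlation coefficient from a count matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    true_counts = [sum(row) for row in matrix]
    pred_counts = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    numerator = correct * total - sum(
        p * t for p, t in zip(pred_counts, true_counts)
    )
    denominator = sqrt(
        (total**2 - sum(p * p for p in pred_counts))
        * (total**2 - sum(t * t for t in true_counts))
    )
    return numerator / denominator if denominator else 0.0


# Hypothetical labels and predictions, not taken from the dataset.
labels = ["Low", "Medium", "High"]
y_true = ["Low", "Low", "Medium", "Medium", "Medium", "Medium", "High", "High"]
y_pred = ["Low", "Medium", "Medium", "Medium", "Medium", "High", "Medium", "High"]

counts = confusion_counts(y_true, y_pred, labels)
normalized = row_normalize(counts)
acc = accuracy(counts)
rec = weighted_recall(counts)
mcc = multiclass_mcc(counts)
```

Under these assumptions the diagonal of the normalized matrix is the per-class recall, so a row whose largest value sits off the diagonal marks a class that is mostly predicted as another class.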

### Export model

In [7]:
class_model.exportModel(name_export="../demo_trained_models/rf_class_demo.joblib")
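
The .joblib extension suggests that exportModel serializes the estimator with joblib, but that is an assumption; the method body is not shown here. A minimal sketch of a joblib round trip follows, with a plain dictionary standing in for the trained model and a temporary directory standing in for the export path.

```python
import os
import tempfile

import joblib

# Hypothetical stand-in for a trained estimator; any picklable object works.
model_stand_in = {"name": "rf_class_demo", "classes": ["Low", "Medium", "High"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rf_class_demo.joblib")

    # Serialize the object to disk.
    joblib.dump(model_stand_in, path)
    file_was_written = os.path.exists(path)

    # Read it back; the restored object should equal the original.
    restored = joblib.load(path)
```

As with pickle, a joblib file should only be loaded from a trusted source, since loading can execute arbitrary code.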

### Load and use the model

In [8]:
class_model.loadModel(name_model="../demo_trained_models/rf_class_demo.joblib")
predictions_model = class_model.makePredictionsWithModel(X_test)
performances_test = class_model.evalModel(y_true=y_test, y_pred=predictions_model)
performances_test

{'Accuracy': 0.6936936936936937,
 'Precision': 0.6584084084084085,
 'Recall': 0.6936936936936937,
 'F1-score': 0.6597392847392847,
 'MCC': 0.2314682445452855,
 'Confusion Matrix': [[0.30303030303030304, 0.696969696969697, 0.0],
  [0.07096774193548387, 0.8838709677419355, 0.04516129032258064],
  [0.029411764705882353, 0.7647058823529411, 0.20588235294117646]]}