# The Test Notebook
This notebook exercises the model training and evaluation helpers in the `utils` module of the `machine_learning` package on two types of tasks: regression and classification. Its purpose is to check that the existing functions work and to provide a place to try out new implementations.



## Importing Libraries and Modules
This cell imports the libraries and modules the notebook needs: `numpy` for numerical operations and `sys` for access to the interpreter's module search path. It appends the codebase directory to `sys.path` so that modules in that directory can be imported, and then imports the `utils` module from the `machine_learning` package.


In [1]:
import numpy as np
import sys

# This should be the location where you have stored the codebase on your computer
sys.path.append('C:\\Users\\basvo\\Documents\\GitHub\\BV_codebase')

from machine_learning import utils
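
The hard-coded path above only works on one machine. As a hedged alternative, the sketch below reads the location from an environment variable and adds it to `sys.path` only once. The helper name `add_codebase_to_path` and the variable name `BV_CODEBASE` are hypothetical and not part of the codebase.

```python
import os
import sys
from pathlib import Path


def add_codebase_to_path(codebase_dir):
    """Append codebase_dir to sys.path (once) and return the resolved path."""
    resolved = str(Path(codebase_dir).expanduser().resolve())
    if not Path(resolved).is_dir():
        raise FileNotFoundError(f"Codebase directory not found: {resolved}")
    # Avoid duplicate entries when the cell is re-run
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# BV_CODEBASE is a hypothetical environment variable name (assumption)
codebase_dir = os.environ.get("BV_CODEBASE")
if codebase_dir:
    add_codebase_to_path(codebase_dir)
```

With the variable set to the folder that contains `machine_learning`, the `from machine_learning import utils` import should then work unchanged.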

## Data Generation
This cell generates the data that will be used in the notebook. It creates a 2D numpy array `X` with 50 rows and 10 columns, filled with random floats from a normal distribution with a mean of 0 and a standard deviation of 1. It also creates two 1D numpy arrays `y` and `y_cont`. `y` is filled with random integers that are either 0 or 1, and `y_cont` is filled with random floats from a normal distribution with a mean of 0 and a standard deviation of 1.


In [2]:
# Generate a 2D numpy array of 50 rows and 10 columns
# filled with random floats with a mean of 0 and an sd of 1
X = np.random.normal(0, 1, (50, 10))

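# Generate a numpy vector of length 50 with random integers that are either 0 or 1
# (classification target)
y = np.random.randint(2, size=50)

# Generate a numpy vector of length 50 filled with random floats
# with a mean of 0 and an sd of 1 (regression target)
y_cont = np.random.normal(0, 1, size=50)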
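
The cell above draws new random data on every run, so the scores further down will change between runs. If repeatable numbers are useful while testing, a seeded generator is one option. The sketch below is a minimal variant of the same data generation; the function name `make_test_data` and the seed value are illustrative assumptions.

```python
import numpy as np


def make_test_data(n_samples=50, n_features=10, seed=0):
    """Generate the same kinds of arrays as the cell above, reproducibly."""
    rng = np.random.default_rng(seed)  # seeded generator: same output on every run
    X = rng.normal(0, 1, size=(n_samples, n_features))  # features
    y = rng.integers(0, 2, size=n_samples)              # labels in {0, 1}
    y_cont = rng.normal(0, 1, size=n_samples)           # continuous target
    return X, y, y_cont


X, y, y_cont = make_test_data()
```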


## Regression Model Training and Evaluation
This cell defines the parameters for the `perform_stability_runs_regression` function from the `utils` module and calls it. The function performs stability runs for a regression task: it repeatedly splits the data, fits the model, makes predictions, and calculates scores, which the output below reports for the train, validation, and test sets. Because `imp_algo` is set to `utils.get_perm_imp` and `return_importances=True`, permutation importances are computed as well. The results are stored in a dictionary and returned. The `verbosity` parameter controls how much output is printed to the console.


In [3]:
method_params = {
    'algo': 'xtr',
    'task': 'regression',
    'imp_algo': utils.get_perm_imp,
}

results_dict = utils.perform_stability_runs_regression(
    X,
    y_cont,
    test_size=0.2,
    n_splits=5,
    method_params=method_params,
    return_importances=True,
    verbosity=2,
)

train_scores: 0.9057176763762202
val_scores: 1.029776826334084
test_scores: 0.9196891642457506

getting permutation importances


100%|██████████| 10/10 [00:00<00:00, 6654.46it/s]


shuffle 1 done
train_scores: 0.9057176763762202 +\- 0.0
val_scores: 1.029776826334084 +\- 0.0
test_scores: 0.9196891642457506 +\- 0.0
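
The regression cell passes `utils.get_perm_imp` as `imp_algo`, and the output reports "getting permutation importances". The internals of `utils.get_perm_imp` are not shown in this notebook, so the following is only a minimal sketch of the general permutation-importance idea: shuffle one feature column at a time and measure how much an error score changes. The function name `permutation_importance_sketch`, the mean-squared-error score, and the least-squares stand-in model are all assumptions made for illustration.

```python
import numpy as np


def permutation_importance_sketch(predict, X, y, score_fn, n_repeats=5, seed=0):
    """Mean change in score_fn after shuffling each column of X in turn."""
    rng = np.random.default_rng(seed)
    baseline = score_fn(y, predict(X))
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        changes = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between this feature and the target
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            changes.append(score_fn(y, predict(X_perm)) - baseline)
        importances[col] = np.mean(changes)
    return importances


def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


# Toy data in which only column 0 carries signal
rng = np.random.default_rng(0)
X_demo = rng.normal(0, 1, size=(200, 3))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(0, 0.1, size=200)

# Stand-in model (assumption): ordinary least squares instead of the notebook's 'xtr'
coef, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)


def predict(X):
    return X @ coef


importances = permutation_importance_sketch(predict, X_demo, y_demo, mean_squared_error)
```

With an error metric such as the one used here, a larger increase after shuffling indicates a more important feature, so column 0 should stand out in `importances`.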



## Classification Model Training and Evaluation
This cell defines the parameters for the `perform_stability_runs_classification` function from the `utils` module and calls it. The function performs stability runs for a classification task: it repeatedly splits the data, fits the model, makes predictions, and calculates scores for the train, validation, and test sets. The results are stored in a dictionary and returned. The `verbosity` parameter controls how much output is printed to the console. Because `y` is generated independently of `X`, the model has no real signal to learn, so the held-out scores mainly confirm that the function runs rather than measure model quality.


In [None]:
method_params = {
    'algo': 'xtr',
    'task': 'classify',
}

results_dict = utils.perform_stability_runs_classification(
    X,
    y,
    test_size=0.2,
    n_splits=5,
    method_params=method_params,
    verbosity=1,
)

shuffle 1 done
train_scores: 0.9323308270676691 +\- 0.0
val_scores: 0.7653061224489797 +\- 0.0
test_scores: 0.32 +\- 0.0

shuffle 2 done
train_scores: 0.9298245614035088 +\- 0.0025062656641603454
val_scores: 0.685374149659864 +\- 0.07993197278911568
test_scores: 0.52 +\- 0.20000000000000007

shuffle 3 done
train_scores: 0.9055973266499583 +\- 0.03432353982318149
val_scores: 0.67989417989418 +\- 0.06572270272317556
test_scores: 0.6266666666666666 +\- 0.22231109334044089

shuffle 4 done
train_scores: 0.8953634085213033 +\- 0.034608933532936016
val_scores: 0.6867913832199547 +\- 0.05815771213760402
test_scores: 0.62 +\- 0.19287301521985908

shuffle 5 done
train_scores: 0.8902255639097744 +\- 0.03261613731327806
val_scores: 0.6868480725623584 +\- 0.05201796266281966
test_scores: 0.64 +\- 0.17708754896942921
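
The output above is consistent with a running summary: after each shuffle, the mean and standard deviation are taken over all scores collected so far, which is why the spread is 0.0 after the first shuffle. The sketch below illustrates that loop under stated assumptions and is not the implementation in `utils`. The name `stability_runs_sketch`, the nearest-centroid stand-in classifier, and the accuracy score are hypothetical choices, and the validation split is left out for brevity.

```python
import numpy as np


def stability_runs_sketch(X, y, fit_predict, score_fn, test_size=0.2, n_splits=5, seed=0):
    """Repeat random train/test splits and track a running mean and sd of the score."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_size * len(y)))
    test_scores, summary = [], []
    for split in range(n_splits):
        order = rng.permutation(len(y))  # new random split on every shuffle
        test_idx, train_idx = order[:n_test], order[n_test:]
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        test_scores.append(score_fn(y[test_idx], y_pred))
        # Running summary over all shuffles completed so far
        summary.append((float(np.mean(test_scores)), float(np.std(test_scores))))
        print(f"shuffle {split + 1} done")
        print(f"test_scores: {summary[-1][0]} +/- {summary[-1][1]}")
    return {"test_scores": test_scores, "summary": summary}


def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Stand-in classifier (assumption): assign each sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    distances = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(distances, axis=1)]


def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))


rng = np.random.default_rng(0)
X_demo = rng.normal(0, 1, size=(50, 10))
y_demo = rng.integers(0, 2, size=50)
results = stability_runs_sketch(X_demo, y_demo, nearest_centroid_fit_predict, accuracy)
```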

