### L2 regularisation experiment

This notebook runs an experiment to understand the effect of L2 regularisation on the predictions of matfact.  
The risk states are labelled with integers from 1 to 4: [1: Normal, 2: LowRisk, 3: HighRisk, 4: Cancer]  
Since the data is highly imbalanced towards the Normal and LowRisk states, labels 1 and 2 make up the majority of the datasets.  
L2 regularisation on both U and V shrinks the factor entries towards zero, so it might promote lower values (labels 1 and 2) in M.  
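
The shrinkage behind this hypothesis can be illustrated with a minimal sketch. It assumes an alternating-least-squares style update in which one row of U is solved by ridge regression for a fixed V. This is a generic illustration, not matfact's actual implementation; `ridge_row`, the random `V`, and the random label vector `x` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(200, 5))                    # hypothetical fixed factor (T x rank)
x = rng.integers(1, 5, size=200).astype(float)   # hypothetical labels in 1..4

def ridge_row(V, x, lam):
    """Solve min_u ||x - V u||^2 + lam * ||u||^2 in closed form."""
    rank = V.shape[1]
    return np.linalg.solve(V.T @ V + lam * np.eye(rank), V.T @ x)

# Norm of the reconstruction V u for increasing penalties.
norms = [np.linalg.norm(V @ ridge_row(V, x, lam)) for lam in (0, 21, 189)]
print(norms)
```

Under these assumptions the norm of the reconstruction decreases as the penalty grows, which is why the predictions could drift towards the lower labels.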

The experiment uses mlflow to log the matfact results on a synthetic dataset for increasing regularisation parameters on U and V.  
Then, using the same dataset, the distribution of the labels is inverted so that the higher risks are represented by labels 1 and 2 and the lower risks by labels 3 and 4. This also inverts the imbalance, giving a majority of labels 4 and 3.  
The matfact results for the inverted dataset are also logged, so that both sets of results can later be compared with visualisations.  
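
One way to express the label inversion is the mapping 1↔4 and 2↔3. The sketch below assumes that unobserved entries are encoded as 0 and must stay 0; `invert_labels` is a hypothetical helper, not part of the experiment code.

```python
import numpy as np

def invert_labels(X, n_states=4):
    """Map label k to n_states + 1 - k, leaving 0 (assumed to mean 'missing') untouched."""
    X = np.asarray(X)
    return np.where(X > 0, n_states + 1 - X, 0)

X = np.array([[1, 2, 0],
              [4, 3, 1]])
X_inverted = invert_labels(X)
print(X_inverted)
```

Applying the helper twice returns the original matrix, so the two datasets differ only in how the risk states are encoded.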


While the experiment runs, a confusion matrix is generated for each combination of regularisation parameters and saved as an image in the results directory.  
The remaining visualisations (Matthews correlation coefficient, accuracy, recall, precision) are generated at the end and saved in the same directory.
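
As a reminder of how accuracy, recall and precision relate to the confusion matrix, here is a minimal NumPy sketch. The toy `y_true` and `y_pred` vectors and the local `confusion_matrix` helper are hypothetical; the experiment code may compute these metrics differently.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, labels=(1, 2, 3, 4)):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

y_true = [1, 1, 2, 2, 3, 4]  # toy data, for illustration only
y_pred = [1, 1, 2, 1, 2, 4]
cm = confusion_matrix(y_true, y_pred)

accuracy = np.trace(cm) / cm.sum()
# Per-class scores; the denominator is clamped to 1 to avoid division by zero.
recall = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)
precision = np.diag(cm) / np.maximum(cm.sum(axis=0), 1)
```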

In [1]:
from l2_regularization.l2_reg_experiment import run_l2_regularization_experiments
from l2_regularization.l2_reg_experiment import MATFACT_ALS, SKLEARN_NMF, SKLEARN_DL, SKLEARN_TSVD

In [None]:
# Run the matfact ALS experiments with L1 regularisation enabled on U, over increasing regularisation parameters.
lambda_values = [0, 21, 189] # [0, 18, 63, 189]  # [int(1e12)]
lambda_values_l1 = [0.005, 0.01, 0.015] # [0, 0.005, 0.01, 0.015] 
run_l2_regularization_experiments(
    lambda_values, 
    experiment_name=MATFACT_ALS+"_l1", 
    model_type=MATFACT_ALS, 
    U_l1_regularization=True, 
    lambda_values_l1=lambda_values_l1,
    N=100, T=200, rank=5, sparsity=90,
)

In [None]:
# Run experiments with increasing parameters for the U and V l2 regularisations.
lambda_values = [0, 21, 189] # [0, 18, 63, 189]  # [int(1e12)]
lambda_values_l1 = [0.005, 0.01, 0.015] # [0, 0.005, 0.01, 0.015] 
run_l2_regularization_experiments(
    lambda_values, 
    experiment_name=MATFACT_ALS+"_l2", 
    model_type=MATFACT_ALS, 
    lambda_values_l1=lambda_values_l1
)

In [None]:
# Run experiments for sklearn non-negative matrix factorization
lambda_values = [0, 9, 18, 21, 63, 126, 189]  # [0, 18, 63, 189]  # [int(1e12)]
run_l2_regularization_experiments(lambda_values, model_type=SKLEARN_NMF)

In [None]:
import mlflow
from mlflow import MlflowClient
from mlflow.entities import ViewType
print(mlflow.get_tracking_uri())

# List every experiment (active and deleted) and delete it.
# Warning: this removes all tracked experiments; skip this cell to keep them.
client = MlflowClient()
for e in client.search_experiments(ViewType.ALL):
    print(e)
    client.delete_experiment(e.experiment_id)

! rm -r mlruns/.trash/*
# ! rm -r ../results/*


In [None]:
from mlflow import MlflowClient

experiments_dict = {}
for e in MlflowClient().search_experiments():
    experiments_dict[e.experiment_id] = e.name
    print(f"{e.experiment_id}: {e.name}")

In [None]:
from mlflow import MlflowClient

# Delete the experiments listed in experiments_dict and purge them from the mlflow trash.
client = MlflowClient()
exp_ids = experiments_dict.keys()  # [""]
for exp_id in exp_ids:
    client.delete_experiment(exp_id)
    ! rm -r mlruns/.trash/*$exp_id*

In [None]:
results_dirs =  ! ls ../results/
print(results_dirs)

In [None]:
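# Names of the result directories to delete (fill in from the listing above).
rm_results_dirs = []
for results_dir in rm_results_dirs:  # avoid shadowing the built-in dir()
    ! rm -r ../results/$results_dir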