### L2 regularisation experiment

This notebook runs an experiment to understand the effect of L2 regularisation on the predictions of matfact.  
The risk states are labelled with integers from 1 to 4: [1: Normal, 2: LowRisk, 3: HighRisk, 4: Cancer]  
Since the data is highly imbalanced towards the Normal and LowRisk states, labels 1 and 2 make up the majority of the datasets.  
L2 regularisation on both U and V shrinks the factors towards zero, so it might promote lower values (labels 1 and 2) in M.  
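
The shrinkage argument can be illustrated on a toy problem. The sketch below is not matfact's solver: it assumes a plain, fully observed factorisation M = U Vᵀ fitted by alternating ridge regressions, and `als_factorise` is a hypothetical helper written only for this illustration. It fits the same imbalanced toy matrix with and without an L2 penalty and compares the mean of the reconstruction.

```python
import numpy as np


def als_factorise(X, rank, lam, n_iter=100, seed=0):
    """Minimise ||X - U V^T||^2 + lam * (||U||^2 + ||V||^2) by alternating ridge solves."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(X.shape[0], rank))
    V = rng.normal(size=(X.shape[1], rank))
    eye = np.eye(rank)
    for _ in range(n_iter):
        # Each update is a closed-form ridge regression with the other factor fixed.
        U = np.linalg.solve(V.T @ V + lam * eye, V.T @ X.T).T
        V = np.linalg.solve(U.T @ U + lam * eye, U.T @ X).T
    return U, V


# Toy data (an assumption, not the notebook's synthetic dataset):
# labels 1-4, heavily skewed towards 1 and 2.
rng = np.random.default_rng(1)
X = rng.choice([1, 2, 3, 4], size=(40, 30), p=[0.5, 0.4, 0.08, 0.02]).astype(float)

mean_by_lambda = {}
for lam in [0, 10]:
    U, V = als_factorise(X, rank=2, lam=lam)
    mean_by_lambda[lam] = float((U @ V.T).mean())
    print(f"lambda={lam}: mean of M = {mean_by_lambda[lam]:.3f}")
```

In this toy setting the penalised fit has a smaller mean than the unpenalised one. Whether matfact behaves the same way is what the experiment is meant to check.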

This experiment uses mlflow to log the matfact results for increasing values of the regularisation parameters for U and V on a synthetic dataset.  
Then, using the same dataset, the label distribution is inverted so that the higher risks are represented by labels 1 and 2 and the lower risks by labels 3 and 4. This also inverts the imbalance, giving a majority of labels 4 and 3.  
The matfact results for the inverted dataset are also logged so that both runs can be compared later with visualisations.  
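
The label inversion described above can be sketched as follows. `invert_labels` is a hypothetical helper, not the function used by `l2_regularization_experiments`, and it assumes that 0 marks an unobserved entry that must stay 0.

```python
import numpy as np


def invert_labels(X, n_states=4, missing=0):
    """Map label k to n_states + 1 - k, leaving the missing-value marker untouched."""
    X = np.asarray(X)
    return np.where(X == missing, missing, n_states + 1 - X)


# Small illustrative matrix: 0 is treated as "not observed" (an assumption).
X = np.array([[0, 1, 1, 2], [2, 1, 3, 4]])
X_inverted = invert_labels(X)
print(X_inverted)
```

Applying the mapping twice returns the original matrix, so the normal and inverted datasets carry the same information with the class imbalance mirrored.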


While the experiment runs, a confusion matrix is generated for each combination of regularisation parameters and saved as an image in the results directory. 
The remaining visualisations (Matthews correlation coefficient, accuracy, recall, precision) are generated at the end and saved in the same directory.
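
For reference, the quantities behind these plots can all be derived from a confusion matrix. The sketch below is a NumPy-only illustration with hypothetical helper names. It is not the code that produces the figures, and the label vectors are made up for the example.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, labels=(1, 2, 3, 4)):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm


def summary_metrics(cm):
    """Accuracy, per-class recall/precision, and multiclass Matthews correlation."""
    total = cm.sum()
    correct = np.trace(cm)
    true_counts = cm.sum(axis=1)
    pred_counts = cm.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        recall = np.diag(cm) / true_counts      # NaN where a class never occurs
        precision = np.diag(cm) / pred_counts   # NaN where a class is never predicted
    numerator = correct * total - true_counts @ pred_counts
    denominator = np.sqrt(
        float(total**2 - pred_counts @ pred_counts)
        * float(total**2 - true_counts @ true_counts)
    )
    mcc = numerator / denominator if denominator else 0.0
    return {"accuracy": correct / total, "recall": recall, "precision": precision, "mcc": mcc}


# Made-up labels, for illustration only.
y_true = [1, 1, 2, 2, 3, 4]
y_pred = [1, 2, 2, 2, 3, 3]
cm = confusion_matrix(y_true, y_pred)
metrics = summary_metrics(cm)
print(cm)
print(metrics)
```

With imbalanced labels such as these, accuracy alone can look good while rare classes are missed, which is why the Matthews coefficient and per-class recall and precision are worth plotting alongside it.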

In [1]:
from l2_regularization_experiments import run_l2_regularization_experiments

# Run experiments with increasing parameters for the U and V L2 regularisations.
lambda_values = [0, 10]  # e.g. [0, 3, 9, 18, 21, 63, 126, 189] for a wider sweep
run_l2_regularization_experiments(lambda_values)

INFO:root:normal dataset X histogram: [(0, 10000), (1, 506488), (2, 452938), (3, 29701), (4, 873)]
INFO:root:normaldataset M histogram: [(1, 581153), (2, 404043), (3, 14391), (4, 413)]
INFO:root:inverted dataset X histogram: [(0, 10000), (1, 873), (2, 29701), (3, 452938), (4, 506488)]
INFO:root:inverteddataset M histogram: [(1, 413), (2, 14391), (3, 404043), (4, 581153)]
 10%|█         | 210/2000 [00:04<00:34, 51.14it/s]
 10%|█         | 210/2000 [00:04<00:35, 50.87it/s]
 10%|█         | 210/2000 [00:03<00:33, 52.82it/s]
 10%|█         | 210/2000 [00:04<00:34, 51.58it/s]
 10%|█         | 210/2000 [00:04<00:36, 49.37it/s]
 10%|█         | 210/2000 [00:04<00:40, 44.26it/s]
 10%|█         | 210/2000 [00:04<00:35, 50.77it/s]
 10%|█         | 210/2000 [00:04<00:35, 49.75it/s]


In [None]:
from mlflow import MlflowClient

for e in MlflowClient().search_experiments():
    print(e.experiment_id, ":", e.name)

In [16]:
import matplotlib.pyplot as plt
from pathlib import Path

image_format = "jpg"
image_dir = Path("/Users/martaq/Develop/decipher/matfact/results/figures")
experiment_list = ["471306282903400606","311848354679767989"]
image_sub_path = f"U0_V0/confusion_.{image_format}"
image_paths = [image_dir / subdir / image_sub_path for subdir in experiment_list]

# Show the U0_V0 confusion matrices of both experiments side by side.
fig, axs = plt.subplots(1, 2, figsize=(10, 5))
for i, image_path in enumerate(image_paths):
    image = plt.imread(image_path)
    axs[i].imshow(image)
    axs[i].axis("off")
    axs[i].set_title(image_path.parent.name)
fig.savefig("/Users/martaq/Develop/decipher/matfact/results/test_2.jpg")

In [141]:
import math

import numpy as np


def fetch_probabilities(lambda_values):
    """Return sampling weights that peak in the middle of the list."""
    size = len(lambda_values)
    # Squared weights rising to the centre, then falling, e.g. size 5 -> 1, 4, 9, 4, 1.
    p = [i**2 for i in range(1, math.floor(size / 2) + 1)] + [
        i**2 for i in range(math.ceil(size / 2), 0, -1)
    ]
    total = sum(p)
    return [i / total for i in p]


def fetch_lambda_samples(lambda_values, sample_num):
    """Pick sample_num values, always keeping the first and the last one."""
    if sample_num == len(lambda_values):
        return lambda_values
    inner = lambda_values[1:-1]
    p = fetch_probabilities(inner)
    # Draw the interior values without replacement, favouring the middle of the range.
    chosen = np.random.choice(inner, sample_num - 2, p=p, replace=False)
    return [lambda_values[0], *sorted(chosen), lambda_values[-1]]

In [147]:
lambda_values = [0,9,18,21,63,126,189]
sample_num = 7
fetch_lambda_samples(lambda_values, sample_num)

[0, 9, 18, 21, 63, 126, 189]

In [163]:
import pandas as pd

df = pd.DataFrame({
    "lambda1": [0, 0, 0, 3, 3, 3, 8, 8, 8], 
    "lambda2": [0, 3, 8, 0, 3, 8, 0, 3, 8],
    "artifac": [1, 1, 1, 1, 1, 1, 1, 1, 1],
    })

# Look up the last column for the row where lambda1 == 0 and lambda2 == 8.
df[(df["lambda1"] == 0) & (df["lambda2"] == 8)].iat[0, -1]

1

In [2]:
import mlflow
from mlflow import MlflowClient
from mlflow.entities import ViewType
print(mlflow.get_tracking_uri())

# List every experiment (ViewType.ALL) and delete it. Destructive: this removes all experiments.
client = MlflowClient()
for e in client.search_experiments(ViewType.ALL):
    print(e)
    client.delete_experiment(e.experiment_id)

! rm -r mlruns/.trash/*
! rm -r ../results/figures/*

# Clean experiments?
# client = MlflowClient()
# client.delete_experiment("658405861631685322")

file:///Users/martaq/Develop/decipher/matfact/experiments/mlruns
<Experiment: artifact_location='file:///Users/martaq/Develop/decipher/matfact/experiments/mlruns/395462898655105160', creation_time=1672233095448, experiment_id='395462898655105160', last_update_time=1672233095448, lifecycle_stage='active', name='exp_inverted_221228_141135', tags={}>
<Experiment: artifact_location='file:///Users/martaq/Develop/decipher/matfact/experiments/mlruns/441819387276405595', creation_time=1672233087936, experiment_id='441819387276405595', last_update_time=1672233087936, lifecycle_stage='active', name='exp_normal_221228_141127', tags={}>
<Experiment: artifact_location='file:///Users/martaq/Develop/decipher/matfact/experiments/mlruns/738801988578485607', creation_time=1672232983874, experiment_id='738801988578485607', last_update_time=1672232983874, lifecycle_stage='active', name='exp_inverted_221228_140943', tags={}>
<Experiment: artifact_location='file:///Users/martaq/Develop/decipher/matfact/expe

In [7]:
import matplotlib.pyplot as plt
fig, axs = plt.subplots(5,2)

In [4]:
len(axs)

2