Working with the med3pa Subpackage
-----------------------------------------

This tutorial guides you through setting up and running experiments with the `med3pa` subpackage. It covers running a MED3pa experiment with `Med3paExperiment`, and running the combined MED3pa and Detectron experiment with `Med3paDetectronExperiment`.


## Running the MED3pa Experiment


### Step 1: Setting up the Datasets
First, configure the `DatasetsManager`. For a MED3pa-only experiment, you only need to set the `reference` and `testing` datasets in the `DatasetsManager`:


In [1]:
import sys
import os

sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '..')))

from MED3pa.datasets import DatasetsManager

# Initialize the DatasetsManager
datasets = DatasetsManager()

# Load the reference and testing datasets
datasets.set_from_file(dataset_type="reference", file='./data/test_data.csv', target_column_name='Outcome')
datasets.set_from_file(dataset_type="testing", file='./data/test_data_shifted_0.6.csv', target_column_name='Outcome')
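Because `set_from_file` takes a file path and a target column name, a quick pre-flight check can catch a missing file or a misspelled column before the experiment starts. The sketch below uses only the standard library. `csv_has_column` is a hypothetical helper, not part of MED3pa, and it assumes a comma-separated file whose first row is the header.

```python
import csv
from pathlib import Path


def csv_has_column(file_path, column_name):
    """Return True if the CSV file exists and its header row contains column_name."""
    path = Path(file_path)
    if not path.is_file():
        return False
    with path.open(newline="") as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle), [])
    return column_name in header


if __name__ == "__main__":
    # Paths and target column taken from the cell above.
    for csv_file in ("./data/test_data.csv", "./data/test_data_shifted_0.6.csv"):
        print(csv_file, csv_has_column(csv_file, "Outcome"))
```

Running the check on both files before calling `set_from_file` keeps path and column-name mistakes separate from errors raised later in the experiment.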


### Step 2: Configuring the Model
Next, use the `ModelFactory` to load a pre-trained model and set it as the base model for the experiment. Alternatively, you can train your own model and use it instead:


In [2]:
from MED3pa.models import BaseModelManager, ModelFactory

# Initialize the model factory and load the pre-trained model
factory = ModelFactory()
model = factory.create_model_from_pickled("./models/diabetes_xgb_model.pkl")

# Set the base model using BaseModelManager
base_model_manager = BaseModelManager()
base_model_manager.set_base_model(model=model)


### Step 3: Running the MED3pa Experiment
Run the MED3pa experiment with the configured datasets and base model. You can also set other parameters as needed; see the subpackage documentation for details about each parameter.

The experiment outputs two structures, one for the reference set and one for the testing set. Each contains files describing the profiles extracted at different declaration rates, the performance of the model on these profiles, and so on.


In [3]:
from MED3pa.med3pa import Med3paExperiment
from MED3pa.med3pa.uncertainty import AbsoluteError

# Define parameters for the experiment
ipc_params = {'n_estimators': 100}
apc_params = {'max_depth': 3}
med3pa_metrics = ['Auc', 'Accuracy', 'BalancedAccuracy']

# Execute the MED3pa experiment
results = Med3paExperiment.run(
    datasets_manager=datasets,
    base_model_manager=base_model_manager,
    uncertainty_metric="absolute_error",
    ipc_type='RandomForestRegressor',
    ipc_params=ipc_params,
    apc_params=apc_params,
    samples_ratio_min=0,
    samples_ratio_max=50,
    samples_ratio_step=5,
    med3pa_metrics=med3pa_metrics,
    evaluate_models=True,
    models_metrics=['MSE', 'RMSE']
)


Running MED3pa Experiment on the reference set:
IPC Model training completed.
APC Model training completed.
Confidence scores calculated for minimum_samples_ratio =  0
Results extracted for minimum_samples_ratio =  0
Confidence scores calculated for minimum_samples_ratio =  5
Results extracted for minimum_samples_ratio =  5
Confidence scores calculated for minimum_samples_ratio =  10
Results extracted for minimum_samples_ratio =  10
Confidence scores calculated for minimum_samples_ratio =  15
Results extracted for minimum_samples_ratio =  15
Confidence scores calculated for minimum_samples_ratio =  20
Results extracted for minimum_samples_ratio =  20
Confidence scores calculated for minimum_samples_ratio =  25
Results extracted for minimum_samples_ratio =  25
Confidence scores calculated for minimum_samples_ratio =  30
Results extracted for minimum_samples_ratio =  30
Confidence scores calculated for minimum_samples_ratio =  35
Results extracted for minimum_samples_ratio =  35
...
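The log above reports one pass per `minimum_samples_ratio` value, stepping by `samples_ratio_step` from `samples_ratio_min`. The sketch below enumerates that grid. It is an illustration only: `samples_ratio_grid` is a hypothetical helper, and treating `samples_ratio_max` as inclusive is an assumption, since the captured output is cut off after 35.

```python
def samples_ratio_grid(ratio_min, ratio_max, ratio_step):
    """List the ratios from ratio_min to ratio_max in steps of ratio_step.

    Assumption: the upper bound is inclusive. The truncated log does not confirm this.
    """
    if ratio_step <= 0:
        raise ValueError("ratio_step must be positive")
    return list(range(ratio_min, ratio_max + 1, ratio_step))


# Values passed to the experiment in the cell above.
grid = samples_ratio_grid(0, 50, 5)
print(grid)  # [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
```

Under that assumption the experiment evaluates eleven ratios, so halving `samples_ratio_step` roughly doubles the number of passes.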

### Step 4: Analyzing and Saving the Results
After running the experiment, you can analyze and save the results using the returned `Med3paResults` instance:


In [4]:
# Save the results to a specified directory
results.save(file_path='./med3pa_experiment_results')
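To see what was written, you can walk the output directory after saving. The sketch below makes no assumption about the file names or layout that `save` produces; it simply lists whatever files exist under the directory. `list_saved_files` is a hypothetical helper, not part of MED3pa.

```python
from pathlib import Path


def list_saved_files(results_dir):
    """Return the sorted relative paths of all files found under results_dir."""
    root = Path(results_dir)
    if not root.is_dir():
        return []
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    # Directory name taken from the save call above.
    for name in list_saved_files("./med3pa_experiment_results"):
        print(name)
```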


## Running the MED3pa and Detectron Experiment
You can also run an experiment that combines Detectron's covariate shift detection with MED3pa's extraction of problematic profiles, using the `Med3paDetectronExperiment` class. To run this experiment, all datasets of the `DatasetsManager` must be set, along with the `BaseModelManager`. The experiment first runs MED3pa on the `testing` and `reference` sets, then runs Detectron on the `testing` set as a whole, and finally on the **extracted profiles** from MED3pa:


In [5]:
# Add the training and validation sets to the datasets manager
datasets.set_from_file(dataset_type="training", file='./data/train_data.csv', target_column_name='Outcome')
datasets.set_from_file(dataset_type="validation", file='./data/val_data.csv', target_column_name='Outcome')
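Since the combined experiment needs every dataset to be set, it can help to track which dataset types have been configured so far. The sketch below keeps that bookkeeping in a plain dictionary. `missing_dataset_types` is a hypothetical helper, and the dictionary is an illustration only; it does not query the `DatasetsManager` itself.

```python
# The four dataset types used in this tutorial.
REQUIRED_TYPES = ("training", "validation", "reference", "testing")


def missing_dataset_types(configured):
    """Return the required dataset types that are absent from configured."""
    return [name for name in REQUIRED_TYPES if name not in configured]


# File paths as passed to set_from_file in the cells above.
configured = {
    "reference": "./data/test_data.csv",
    "testing": "./data/test_data_shifted_0.6.csv",
    "training": "./data/train_data.csv",
    "validation": "./data/val_data.csv",
}
print(missing_dataset_types(configured))  # []
```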


In [6]:
from MED3pa.med3pa import Med3paDetectronExperiment
from MED3pa.detectron.strategies import EnhancedDisagreementStrategy

# Execute the integrated MED3pa and Detectron experiment
med3pa_results, detectron_results = Med3paDetectronExperiment.run(
    datasets=datasets,
    base_model_manager=base_model_manager,
    uncertainty_metric="absolute_error",
    samples_size=20,
    ensemble_size=10,
    num_calibration_runs=100,
    patience=3,
    test_strategies="enhanced_disagreement_strategy",
    allow_margin=False,
    margin=0.05,
    ipc_params=ipc_params,
    apc_params=apc_params,
    samples_ratio_min=0,
    samples_ratio_max=50,
    samples_ratio_step=5,
    med3pa_metrics=med3pa_metrics,
    evaluate_models=True,
    models_metrics=['MSE', 'RMSE']
)


Running MED3pa Experiment on the reference set:
IPC Model training completed.
APC Model training completed.
Confidence scores calculated for minimum_samples_ratio =  0
Results extracted for minimum_samples_ratio =  0
Confidence scores calculated for minimum_samples_ratio =  5
Results extracted for minimum_samples_ratio =  5
Confidence scores calculated for minimum_samples_ratio =  10
Results extracted for minimum_samples_ratio =  10
Confidence scores calculated for minimum_samples_ratio =  15
Results extracted for minimum_samples_ratio =  15
Confidence scores calculated for minimum_samples_ratio =  20
Results extracted for minimum_samples_ratio =  20
Confidence scores calculated for minimum_samples_ratio =  25
Results extracted for minimum_samples_ratio =  25
Confidence scores calculated for minimum_samples_ratio =  30
Results extracted for minimum_samples_ratio =  30
Confidence scores calculated for minimum_samples_ratio =  35
Results extracted for minimum_samples_ratio =  35
...

running seeds: 100%|██████████| 100/100 [00:20<00:00,  4.94it/s]


Detectron execution on reference set completed.


running seeds: 100%|██████████| 100/100 [00:23<00:00,  4.33it/s]


Detectron execution on testing set completed.
Running Profiled Detectron Experiment:
Running Detectron on Profile: *, Age <= -0.8201243877410889 & BloodPressure <= 0.4599476158618927 & Glucose <= -0.06547205429524183


running seeds: 100%|██████████| 100/100 [00:22<00:00,  4.35it/s]


Detectron execution on reference set completed.


running seeds: 100%|██████████| 100/100 [00:12<00:00,  7.91it/s]


Detectron execution on testing set completed.
Running Detectron on Profile: *, Age <= -0.8201243877410889 & BloodPressure <= 0.4599476158618927
Calibration record on reference set provided, skipping Detectron execution on reference set.


running seeds: 100%|██████████| 100/100 [00:15<00:00,  6.36it/s]


Detectron execution on testing set completed.
Running Detectron on Profile: *, Age <= -0.8201243877410889
Calibration record on reference set provided, skipping Detectron execution on reference set.


running seeds: 100%|██████████| 100/100 [00:21<00:00,  4.70it/s]


Detectron execution on testing set completed.
Running Detectron on Profile: *, Age > -0.8201243877410889 & BMI <= -0.550771951675415
Calibration record on reference set provided, skipping Detectron execution on reference set.


running seeds: 100%|██████████| 100/100 [00:22<00:00,  4.40it/s]


Detectron execution on testing set completed.
Running Detectron on Profile: *, Age > -0.8201243877410889 & BMI > -0.550771951675415 & Glucose > -1.1373384594917297
Calibration record on reference set provided, skipping Detectron execution on reference set.


running seeds: 100%|██████████| 100/100 [00:28<00:00,  3.51it/s]


Detectron execution on testing set completed.
Running Detectron on Profile: *, Age > -0.8201243877410889 & BMI > -0.550771951675415
Calibration record on reference set provided, skipping Detectron execution on reference set.


running seeds: 100%|██████████| 100/100 [00:29<00:00,  3.36it/s]


Detectron execution on testing set completed.
Running Detectron on Profile: *, Age > -0.8201243877410889
Calibration record on reference set provided, skipping Detectron execution on reference set.


running seeds: 100%|██████████| 100/100 [00:27<00:00,  3.64it/s]


Detectron execution on testing set completed.
Running Detectron on Profile: *, 
Calibration record on reference set provided, skipping Detectron execution on reference set.


running seeds: 100%|██████████| 100/100 [00:25<00:00,  3.92it/s]


Detectron execution on testing set completed.
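Each profile in the log above is printed as `*` followed by a chain of threshold conditions joined with `&`; the last profile, `*, `, has no conditions and covers the whole set. The sketch below parses that printed format into tuples. It is based only on the log lines shown here: `parse_profile` is a hypothetical helper, not a MED3pa function, and the saved result files may represent profiles differently.

```python
import re

# One condition such as "Age <= -0.8201243877410889".
_CONDITION = re.compile(
    r"^\s*(?P<feature>\w+)\s*(?P<op><=|>=|<|>)\s*"
    r"(?P<threshold>-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse_profile(profile):
    """Split a printed profile into (feature, operator, threshold) tuples."""
    # Drop the leading "*" marker; everything after the first comma is the rule chain.
    _, _, rules = profile.partition(",")
    conditions = []
    for part in rules.split("&"):
        if not part.strip():
            continue  # the root profile "*, " carries no conditions
        match = _CONDITION.match(part)
        if match is None:
            raise ValueError(f"Unrecognized condition: {part!r}")
        conditions.append(
            (match["feature"], match["op"], float(match["threshold"]))
        )
    return conditions


print(parse_profile("*, Age > -0.8201243877410889 & BMI <= -0.550771951675415"))
```

Note that the thresholds in the log are negative or fractional for features such as `Age`, which suggests the features were standardized before profiling; the parser keeps the values exactly as printed.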


In [7]:
# Save the results to a specified directory
med3pa_results.save(file_path='./med3pa_detectron_experiment_results/')
detectron_results.save(file_path='./med3pa_detectron_experiment_results/detectron')
