## MIPHA proof of concept

This notebook tests the [MIPHA framework](https://github.com/snowhawkeye/mipha) on pre-extracted data. It was used to debug the framework, so it is less thoroughly documented than the other experiments.



In [None]:
import sys
from importlib import reload

import pandas as pd
from mipha.framework import MiphaPredictor
# Import the module this way: otherwise, saving to pickle does not work.
import code.models.mipha_poc_implementation as impl
from code.features.mipha_poc_datasource import Stage5CkdData
from code.datasets.mimic_dataset import MimicDataset

In [None]:
# Reload the project modules to pick up code changes without restarting the kernel
reload(sys.modules['code.models.mipha_poc_implementation'])
reload(sys.modules['code.features.mipha_poc_datasource'])
reload(sys.modules['code.datasets.mimic_dataset'])

### Framework implementation

We test a simple implementation of the framework on the prediction of stage 5 chronic kidney disease (CKD), using one year of patient history to predict up to 15 months in advance.
The data sources used in this example are:
- The evolution of creatinine over time.
- The age and gender of the patient.

The framework is implemented as follows:
- Feature extraction for the first data source is performed using the `tsfel` package.
- Aggregation is a simple concatenation of the extracted features.
- The machine learning model is a simple CNN.
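
The aggregation step listed above can be illustrated with a minimal sketch. This is not the code of `impl.SimpleAggregator`: the function name `concatenate_features` and the array sizes are hypothetical, and the sketch assumes each feature extractor returns a 2D NumPy array with one row per patient. Under that assumption, the feature blocks are joined along the feature axis, and the resulting width is the input size the downstream model has to accept.

```python
import numpy as np


def concatenate_features(feature_blocks):
    """Join per-source feature matrices along the feature axis.

    Each block has shape (n_patients, n_features_i); the result has shape
    (n_patients, sum of all n_features_i).
    """
    row_counts = {block.shape[0] for block in feature_blocks}
    if len(row_counts) != 1:
        raise ValueError("all feature blocks must describe the same patients")
    return np.concatenate(feature_blocks, axis=1)


# Hypothetical sizes: 4 patients, 5 time-series features, 2 demographic features.
creatinine_features = np.zeros((4, 5))
demographic_features = np.ones((4, 2))

aggregated = concatenate_features([creatinine_features, demographic_features])
print(aggregated.shape)  # (4, 7)
```

The row-count check is there because a plain concatenation silently assumes that row `i` of every block belongs to the same patient.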

In [None]:
# Data setup: paths to the JSON configuration files
mimic_dataset_config_path = "config/mimic_dataset.mipha.json"
poc_data_config_path = "config/mipha_poc_data.mipha.json"

In [None]:
# Uncomment to generate config files
# MimicDataset.create_config_file(mimic_dataset_config_path)
# Stage5CkdData.create_config_file(poc_data_config_path)

In [None]:
dataset = MimicDataset.from_config_file(mimic_dataset_config_path)
data = Stage5CkdData.from_config_file(dataset=dataset, config_path=poc_data_config_path)

In [None]:
data_sources_train, labels_train, data_sources_test, labels_test = data.load_stage_5_ckd(random_state=25)

In [None]:
mipha = MiphaPredictor(
    feature_extractors=[
        impl.BiologyFeatureExtractor(component_name="BiologyFeatureExtractor",
                                     managed_data_types=["Creatinine"]),
        impl.DemographicsFeatureExtractor(component_name="DemographicsFeatureExtractor",
                                          managed_data_types=["Demographics"]),
    ],
    aggregator=impl.SimpleAggregator(),
    # the model's input dimensions match the aggregator's output; the output is binary
    model=impl.SimpleCnnModel(rows=1, columns=142, output_dim=1, n_filters=3),
    evaluator=impl.SimpleEvaluator(),
)

In [None]:
mipha.fit(data_sources_train, labels_train, epochs=3)

In [None]:
mipha.evaluate(data_sources=data_sources_test, test_labels=pd.DataFrame(labels_test), threshold=0.5)

In [None]:
# Save the trained predictor (reloaded later from "out/mipha_real_data.zip")
file_path = "out/mipha_real_data"
mipha.save(file_path)
# Save the extracted features so that they can be reused without recomputation
data.save_pickle(mipha.last_computed_features, "out/mipha_computed_features.pkl")

In [None]:
# Reuse a feature extraction: reload the saved predictor and the features computed earlier
mipha_loaded = mipha.load("out/mipha_real_data.zip")
precomputed_features = data.load_pickle("out/mipha_computed_features.pkl")

In [None]:
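# Evaluate the reloaded predictor without recomputing the features
mipha_loaded.evaluate(data_sources=data_sources_test, test_labels=pd.DataFrame(labels_test),
                      precomputed_features=precomputed_features)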