# In-hospital Mortality Prediction

This notebook showcases in-hospital mortality prediction for patients with heart failure on a subset of the MIMIC-III dataset using CyclOps.

## Import Libraries

In [1]:
import copy
import inspect
import shutil

import plotly.express as px
import plotly.graph_objects as go
from datasets import Dataset
from datasets.features import ClassLabel
from kaggle.api.kaggle_api_extended import KaggleApi
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

from cyclops.data.slicer import SliceSpec
from cyclops.evaluate.fairness import FairnessConfig  # noqa: E402
from cyclops.evaluate.metrics import MetricCollection, create_metric
from cyclops.models.catalog import create_model
from cyclops.process.feature.feature import TabularFeatures
from cyclops.report import ModelCardReport
from cyclops.report.plot.classification import ClassificationPlotter
from cyclops.tasks.mortality_prediction import MortalityPredictionTask
from cyclops.utils.file import join, load_dataframe



CyclOps offers a package for documentation of the model through a model card. The `ModelCardReport` class is used to populate and generate the model card as an HTML file. The model card has the following sections:
- Model Details: This section contains descriptive metadata about the model such as the owners, version, license, etc.
- Model Parameters: This section contains the technical details of the model such as the model architecture, training parameters, etc.
- Considerations: This section contains descriptions of the considerations involved in developing and using the model such as the intended use, limitations, etc.
- Quantitative Analysis: This section contains the performance metrics of the model for different sets of the data and subpopulations.
- Explainability Analysis: This section contains the explainability metrics of the model.
- Fairness Analysis: This section contains the fairness metrics of the model.

We will use this to document the model development process as we go along and generate the model card at the end.

`The model card tool is a work in progress and is subject to change.`

In [2]:
report = ModelCardReport()

## Constants

In [3]:
DATA_DIR = "./data"
RANDOM_SEED = 85
NAN_THRESHOLD = 0.75
TRAIN_SIZE = 0.8

## Data Loading

Before starting, make sure to install the Kaggle API by running `pip install kaggle`. To use the Kaggle API, you need to sign up for a Kaggle account at https://www.kaggle.com. Then go to the 'Account' tab of your user profile (`https://www.kaggle.com/<username>/account`) and select 'Create API Token'. This will trigger the download of kaggle.json, a file containing your API credentials. Place this file in the location `~/.kaggle/kaggle.json` on your machine.
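
Before calling the API, it can help to confirm that the credentials file is in place. The helper below is a minimal sketch and not part of the Kaggle package: `kaggle_credentials_ready` is a hypothetical name, and it only assumes that `kaggle.json` is a JSON object with `username` and `key` fields.

```python
import json
import tempfile
from pathlib import Path


def kaggle_credentials_ready(path=None):
    """Return True if `path` holds a JSON object with `username` and `key`."""
    if path is None:
        path = Path.home() / ".kaggle" / "kaggle.json"
    path = Path(path)
    if not path.is_file():
        return False
    try:
        creds = json.loads(path.read_text())
    except json.JSONDecodeError:
        return False
    return isinstance(creds, dict) and {"username", "key"} <= creds.keys()


# Demonstrate on a throwaway file instead of real credentials.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "kaggle.json"
    before = kaggle_credentials_ready(demo_path)  # file does not exist yet
    demo_path.write_text(json.dumps({"username": "example", "key": "not-a-real-key"}))
    after = kaggle_credentials_ready(demo_path)  # file now has both fields
```

Calling `kaggle_credentials_ready()` with no argument checks the default location described above.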

In [4]:
api = KaggleApi()
api.authenticate()
api.dataset_download_files(
    "saurabhshahane/in-hospital-mortality-prediction", path=DATA_DIR, unzip=True
)



In [5]:
df = load_dataframe(join(DATA_DIR, "data01.csv"), file_format="csv")
df

2023-07-25 13:14:32,898 INFO cyclops.utils.file - Loading DataFrame from ./data/data01.csv

group,ID,outcome,age,gendera,BMI,hypertensive,atrialfibrillation,CHD with no MI,diabetes,deficiencyanemias,...,Blood sodium,Blood calcium,Chloride,Anion gap,Magnesium ion,PH,Bicarbonate,Lactic acid,PCO2,EF
1,125047,0.0,72,1,37.588179,0,0,0,1,1,...,138.750000,7.463636,109.166667,13.166667,2.618182,7.230,21.166667,0.5,40.0,55
1,139812,0.0,75,2,,0,0,0,0,1,...,138.888889,8.162500,98.444444,11.444444,1.887500,7.225,33.444444,0.5,78.0,55
1,109787,0.0,83,2,26.572634,0,0,0,0,1,...,140.714286,8.266667,105.857143,10.000000,2.157143,7.268,30.571429,0.5,71.5,35
1,130587,0.0,43,2,83.264629,0,0,0,0,0,...,138.500000,9.476923,92.071429,12.357143,1.942857,7.370,38.571429,0.6,75.0,55
1,138290,0.0,75,2,31.824842,1,0,0,0,1,...,136.666667,8.733333,104.500000,15.166667,1.650000,7.250,22.000000,0.6,50.0,55
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2,171130,0.0,62,1,25.516870,1,1,0,1,0,...,136.714286,10.271429,94.428571,20.142857,2.714286,,27.714286,,,40
2,101659,0.0,78,1,25.822710,0,1,0,1,0,...,135.680000,10.523529,101.720000,18.160000,2.012500,,20.480000,,,30
2,162069,0.0,85,2,23.891779,1,1,0,1,1,...,136.000000,8.466667,97.285714,14.000000,2.028571,,28.857143,,,55
2,120967,0.0,79,2,35.288554,0,0,1,1,1,...,140.000000,8.183333,104.000000,15.750000,2.090000,,24.375000,,,25


## Data Inspection and Preprocessing

#### Drop NaNs based on the `NAN_THRESHOLD`

In [6]:
null_counts = df.isnull().sum()[df.isnull().sum() > 0]
fig = go.Figure(data=[go.Bar(x=null_counts.index, y=null_counts.values)])

fig.update_layout(
    title="Number of Null Values per Column",
    xaxis_title="Columns",
    yaxis_title="Number of Null Values",
    height=600,
)

fig.show()

**Add the figure to the report**

We can use the `log_plotly_figure` method to add the figure to a section of the report. The `interactive` parameter (default `True`) controls whether the figure is embedded as an interactive plot or as a static one. This also affects the final size of the report: interactive figures make the report larger than static ones.

In [7]:
report.log_plotly_figure(
    fig=fig,
    caption="Number of Null Values per Column",
    section_name="datasets",
    interactive=True,
)

In [8]:
thresh_nan = int(NAN_THRESHOLD * len(df))
df = df.dropna(axis=1, thresh=thresh_nan)
df = df.dropna(axis=0, subset=["outcome"])

#### Gender values

In [9]:
# Female: gender = 1
# Male: gender = 0
df = df.rename(columns={"gendera": "gender"})
df["gender"] = df["gender"].replace({1: 0, 2: 1})

In [10]:
fig = px.pie(df, names="gender")

fig.update_layout(
    title="Gender Distribution",
)

fig.show()

**Add the figure to the report**

In [11]:
report.log_plotly_figure(
    fig=fig,
    caption="Gender Distribution",
    section_name="datasets",
)

#### Age distribution

In [12]:
fig = px.histogram(df, x="age")
fig.update_layout(
    title="Age Distribution",
    xaxis_title="Age",
    yaxis_title="Count",
    bargap=0.2,
)

fig.show()

**Add the figure to the report**

In [13]:
report.log_plotly_figure(
    fig=fig,
    caption="Age Distribution",
    section_name="datasets",
)

#### Outcome distribution

In [14]:
df["outcome"] = df["outcome"].astype("int")

In [15]:
fig = px.pie(df, names="outcome")
fig.update_traces(textinfo="percent+label")
fig.update_layout(title_text="Outcome Distribution")
fig.update_traces(
    hovertemplate="Outcome: %{label}<br>Count: %{value}<br>Percent: %{percent}"
)
fig.show()

**Add the figure to the report**

In [16]:
report.log_plotly_figure(
    fig=fig,
    caption="Outcome Distribution",
    section_name="datasets",
)

In [17]:
class_counts = df["outcome"].value_counts()
class_ratio = class_counts[0] / class_counts[1]
class_ratio

6.39622641509434

From all the features in the dataset, we select the 20 that were reported by [Li et al.](https://pubmed.ncbi.nlm.nih.gov/34301649/) to be the most important for this classification task.

In [18]:
features_list = [
    "Anion gap",
    "Lactic acid",
    "Blood calcium",
    "Lymphocyte",
    "Leucocyte",
    "heart rate",
    "Blood sodium",
    "Urine output",
    "Platelets",
    "Urea nitrogen",
    "age",
    "MCH",
    "RBC",
    "Creatine kinase",
    "PCO2",
    "Blood potassium",
    "Diastolic blood pressure",
    "Respiratory rate",
    "Renal failure",
    "NT-proBNP",
]
features_list = sorted(features_list)

#### Identifying feature types

The CyclOps `TabularFeatures` class helps to identify feature types, an essential step before preprocessing the data. Understanding feature types (numerical/categorical/binary) allows us to apply appropriate preprocessing steps for each type.

In [19]:
tab_features = TabularFeatures(
    data=df.reset_index(),
    features=features_list,
    by="ID",
    targets="outcome",
)
tab_features.types

{'Anion gap': 'numeric',
 'Platelets': 'numeric',
 'Respiratory rate': 'numeric',
 'MCH': 'numeric',
 'age': 'numeric',
 'Diastolic blood pressure': 'numeric',
 'NT-proBNP': 'numeric',
 'Renal failure': 'binary',
 'RBC': 'numeric',
 'Lymphocyte': 'numeric',
 'Creatine kinase': 'numeric',
 'Urine output': 'numeric',
 'heart rate': 'numeric',
 'Lactic acid': 'numeric',
 'PCO2': 'numeric',
 'Leucocyte': 'numeric',
 'Blood calcium': 'numeric',
 'Blood potassium': 'numeric',
 'Blood sodium': 'numeric',
 'outcome': 'binary',
 'Urea nitrogen': 'numeric'}
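
As a rough intuition for what type identification involves, the sketch below classifies a column by its distinct non-missing values. This is an assumed heuristic for illustration only: `infer_feature_type` is a hypothetical helper, and the actual logic inside `TabularFeatures` may differ.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def infer_feature_type(values):
    """Guess a feature type from the distinct non-missing values."""
    observed = {v for v in values if not _is_missing(v)}
    if len(observed) == 2:
        return "binary"
    if all(isinstance(v, (int, float)) for v in observed):
        return "numeric"
    return "categorical"


renal_failure = [0, 1, 1, None, 0]  # two distinct values
age = [72, 75, 83, 43]  # many distinct numbers
unit = ["a", "b", "c"]  # non-numeric labels (made-up column)
```

Under this heuristic a flag such as `Renal failure` comes out as binary and a measurement such as `age` as numeric, matching the types listed above.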

#### Creating data preprocessors

We create a data preprocessor using sklearn's ColumnTransformer. This helps in applying different preprocessing steps to different columns in the dataframe. For instance, binary features might be processed differently from numeric features.

In [20]:
numeric_transformer = Pipeline(
    steps=[("imputer", SimpleImputer(strategy="mean")), ("scaler", MinMaxScaler())]
)

binary_transformer = Pipeline(
    steps=[("imputer", SimpleImputer(strategy="most_frequent"))]
)

In [21]:
numeric_features = sorted((tab_features.features_by_type("numeric")))
numeric_indices = [
    df[features_list].columns.get_loc(column) for column in numeric_features
]
numeric_features

['Anion gap',
 'Blood calcium',
 'Blood potassium',
 'Blood sodium',
 'Creatine kinase',
 'Diastolic blood pressure',
 'Lactic acid',
 'Leucocyte',
 'Lymphocyte',
 'MCH',
 'NT-proBNP',
 'PCO2',
 'Platelets',
 'RBC',
 'Respiratory rate',
 'Urea nitrogen',
 'Urine output',
 'age',
 'heart rate']

In [22]:
binary_features = sorted(tab_features.features_by_type("binary"))
binary_features.remove("outcome")
binary_indices = [
    df[features_list].columns.get_loc(column) for column in binary_features
]
binary_features

['Renal failure']

In [23]:
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_indices),
        ("bin", binary_transformer, binary_indices),
    ],
    remainder="passthrough",
)
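
To see what this preprocessor does to the data, here is a small sketch on an invented two-column array, with column 0 treated as numeric and column 1 as binary. The array and the name `toy_preprocessor` are illustrative; the transformers mirror the ones defined above.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: column 0 is numeric, column 1 is a binary flag.
toy = np.array(
    [
        [1.0, 0.0],
        [np.nan, 1.0],
        [3.0, np.nan],
        [5.0, 1.0],
    ]
)

toy_numeric = Pipeline(
    steps=[("imputer", SimpleImputer(strategy="mean")), ("scaler", MinMaxScaler())]
)
toy_binary = Pipeline(steps=[("imputer", SimpleImputer(strategy="most_frequent"))])

toy_preprocessor = ColumnTransformer(
    transformers=[("num", toy_numeric, [0]), ("bin", toy_binary, [1])],
    remainder="passthrough",
)

# Column 0: NaN -> mean of (1, 3, 5) = 3, then scaled to [0, 1].
# Column 1: NaN -> most frequent value, which is 1.
transformed = toy_preprocessor.fit_transform(toy)
```

The output columns follow the order of the `transformers` list, so numeric columns come first and binary columns after them.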

Let's document the dataset in the model card. This can be done using the `log_dataset` method, which takes the following arguments:
- description: A description of the dataset.
- citation: The citation for the dataset.
- link: A link to a resource for the dataset.
- license_id: The SPDX license identifier for the dataset.
- version: The version of the dataset.
- features: A list of features in the dataset.
- split: The split of the dataset (train, test, validation, etc.).
- sensitive_features: A list of sensitive features used to train/evaluate the model.
- sensitive_feature_justification: A justification for the sensitive features used to train/evaluate the model.

In [24]:
report.log_dataset(
    description="MIMIC-III is a large, freely-available database comprising \
        deidentified health-related data associated with over forty thousand \
        patients who stayed in the ICU of the Beth Israel Deaconess Medical Center \
        between 2001 and 2012. The data includes vital sign measurements, medications, \
        laboratory measurements, imaging reports, and more.",
    citation=inspect.cleandoc(
        """
        @article{li2021prediction,
            title={Prediction model of in-hospital mortality in intensive care unit
             patients with heart failure: machine learning-based, retrospective
             analysis of the MIMIC-III database},
            author={Li, Fuhai and Xin, Hui and Zhang, Jidong and Fu, Mingqiang and
             Zhou, Jingmin and Lian, Zhexun},
            journal={BMJ open},
            volume={11},
            number={7},
            pages={e044779},
            year={2021},
            publisher={British Medical Journal Publishing Group}
        }
    """
    ),
    link="""
    https://www.kaggle.com/datasets/saurabhshahane/in-hospital-mortality-prediction
    """,
    license_id="CC0-1.0",
    version="Version 1",
    features=features_list,
    sensitive_features=["gender", "age"],
    sensitive_feature_justification="Demographic information like age and gender \
        often have a strong correlation with health outcomes. For example, older \
        patients are more likely to have a higher risk of mortality.",
)

## Creating Hugging Face Dataset

We convert our processed Pandas dataframe into a Hugging Face dataset, a powerful and easy-to-use data format which is also compatible with CyclOps models and evaluator modules. The dataset is then split into train and test sets.

In [25]:
dataset = Dataset.from_pandas(df)
dataset.cleanup_cache_files()
dataset

Dataset({
    features: ['ID', 'outcome', 'age', 'gender', 'BMI', 'hypertensive', 'atrialfibrillation', 'CHD with no MI', 'diabetes', 'deficiencyanemias', 'depression', 'Hyperlipemia', 'Renal failure', 'COPD', 'heart rate', 'Systolic blood pressure', 'Diastolic blood pressure', 'Respiratory rate', 'temperature', 'SP O2', 'Urine output', 'hematocrit', 'RBC', 'MCH', 'MCHC', 'MCV', 'RDW', 'Leucocyte', 'Platelets', 'Neutrophils', 'Basophils', 'Lymphocyte', 'PT', 'INR', 'NT-proBNP', 'Creatine kinase', 'Creatinine', 'Urea nitrogen', 'glucose', 'Blood potassium', 'Blood sodium', 'Blood calcium', 'Chloride', 'Anion gap', 'Magnesium ion', 'PH', 'Bicarbonate', 'Lactic acid', 'PCO2', 'EF', 'group'],
    num_rows: 1176
})

In [26]:
dataset = dataset.cast_column("outcome", ClassLabel(num_classes=2))
dataset = dataset.train_test_split(
    train_size=TRAIN_SIZE, stratify_by_column="outcome", seed=RANDOM_SEED
)
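
Stratifying by `outcome` keeps the share of positive cases roughly equal in both splits. The sketch below illustrates the idea with the standard library only; it is not how the `datasets` library implements `train_test_split`, and its per-class rounding can differ from the library's by a row. The class counts (1017 and 159) are inferred from the class ratio and row count shown earlier.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_size, seed):
    """Split indices so each class contributes ~train_size of its rows to train."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_size * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Counts inferred from the notebook output: 1176 rows, ratio ~6.396.
labels = [0] * 1017 + [1] * 159
train_idx, test_idx = stratified_split(labels, train_size=0.8, seed=85)

train_pos_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_pos_rate = sum(labels[i] for i in test_idx) / len(test_idx)
```

Both rates stay close to the overall positive rate of about 13.5%, which matters here because the positive class is small.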

## Model Creation

The CyclOps model registry allows for straightforward creation and selection of models. This registry maintains a list of pre-configured models, which can be instantiated with a single line of code. Here we use an SGD classifier to fit a logistic regression model. The model configurations can be passed to `create_model` based on the sklearn parameters for `SGDClassifier`.

In [27]:
model_name = "sgd_classifier"
model = create_model(model_name, random_state=123, verbose=0, class_weight="balanced")

2023-07-25 13:14:35,361 - verbose: 0
loss: log_loss
random_state: 123
early_stopping: true
class_weight: balanced
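
With `class_weight="balanced"`, scikit-learn weights each class by `n_samples / (n_classes * class_count)`, which offsets the imbalance measured earlier. The sketch below applies that formula to the full-dataset counts (1017 and 159, inferred from the class ratio and row count above); the training split will have slightly different counts.

```python
# Class counts inferred from the notebook output, not read from the data here.
counts = {0: 1017, 1: 159}

n_samples = sum(counts.values())
n_classes = len(counts)

# Same formula scikit-learn documents for class_weight="balanced".
weights = {
    label: n_samples / (n_classes * count) for label, count in counts.items()
}

# The minority class is up-weighted by exactly the class ratio.
weight_ratio = weights[1] / weights[0]
```

Each class ends up contributing the same total weight to the loss, so the minority class is not drowned out by the majority.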



## Task Creation

We use CyclOps tasks to define our model's task (in this case, `MortalityPredictionTask`), train the model, make predictions, and evaluate performance. CyclOps task classes encapsulate the entire ML pipeline into a single, cohesive structure, making the process smooth and easy to manage.

In [28]:
mortality_task = MortalityPredictionTask(
    {model_name: model}, task_features=features_list, task_target="outcome"
)

In [29]:
mortality_task.list_models()

['sgd_classifier']

## Training

If `best_model_params` is passed to the `train` method, a hyperparameter search is run and the best model is selected. The list-valued entries in `best_model_params` give the candidate values used to build the parameter grid, while `metric` and `method` set the scoring metric and the search strategy.

Note that the data preprocessor needs to be passed to the task's methods if the Hugging Face dataset is not already preprocessed.

In [30]:
best_model_params = {
    "alpha": [0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
    "learning_rate": ["constant", "optimal", "invscaling", "adaptive"],
    "eta0": [0.1, 0.01, 0.001, 0.0001],
    "metric": "roc_auc",
    "method": "grid",
}

mortality_task.train(
    dataset["train"],
    model_name=model_name,
    transforms=preprocessor,
    best_model_params=best_model_params,
)
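
To get a sense of the search cost, the grid can be enumerated by hand. The sketch below treats every list-valued entry as a hyperparameter and the remaining entries as search settings; that separation is an assumption about how the dictionary is interpreted, made here for illustration.

```python
from itertools import product

best_model_params = {
    "alpha": [0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
    "learning_rate": ["constant", "optimal", "invscaling", "adaptive"],
    "eta0": [0.1, 0.01, 0.001, 0.0001],
    "metric": "roc_auc",
    "method": "grid",
}

# Assumption: list-valued entries define the grid, the rest configure the search.
search_space = {
    key: values for key, values in best_model_params.items() if isinstance(values, list)
}

# One dict per candidate configuration: 7 * 4 * 4 combinations.
grid = [dict(zip(search_space, combo)) for combo in product(*search_space.values())]
n_candidates = len(grid)
```

A full grid search evaluates every one of these candidates, so adding values to any list multiplies the training time.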

2023-07-25 13:14:41,268 INFO cyclops.models.wrappers.sk_model - Best alpha: 0.01
2023-07-25 13:14:41,269 INFO cyclops.models.wrappers.sk_model - Best eta0: 0.1
2023-07-25 13:14:41,269 INFO cyclops.models.wrappers.sk_model - Best learning_rate: optimal

In [31]:
model_params = mortality_task.list_models_params()[model_name]
model_params

{'alpha': 0.01,
 'average': False,
 'class_weight': 'balanced',
 'early_stopping': True,
 'epsilon': 0.1,
 'eta0': 0.1,
 'fit_intercept': True,
 'l1_ratio': 0.15,
 'learning_rate': 'optimal',
 'loss': 'log_loss',
 'max_iter': 1000,
 'n_iter_no_change': 5,
 'n_jobs': None,
 'penalty': 'l2',
 'power_t': 0.5,
 'random_state': 123,
 'shuffle': True,
 'tol': 0.001,
 'validation_fraction': 0.1,
 'verbose': 0,
 'warm_start': False}

**Log the model parameters to the report.**

We can add model parameters to the model card using the `log_model_parameters` method.

In [32]:
report.log_model_parameters(params=model_params)

## Prediction

The prediction output can be either the whole Hugging Face dataset with the prediction columns added to it or the single column containing the predicted values.

In [33]:
y_pred = mortality_task.predict(
    dataset["test"],
    model_name=model_name,
    transforms=preprocessor,
    proba=False,
    only_predictions=True,
)
len(y_pred)

236

## Evaluation

Evaluation uses two kinds of metrics that give different perspectives on the model's predictive abilities: standard performance metrics and fairness metrics.

The standard performance metrics can be created using the `MetricCollection` object.

In [34]:
metric_names = [
    "accuracy",
    "precision",
    "recall",
    "f1_score",
    "auroc",
    "roc_curve",
    "precision_recall_curve",
]
metrics = [create_metric(metric_name, task="binary") for metric_name in metric_names]
metric_collection = MetricCollection(metrics)

In addition to overall metrics, it might be interesting to see how the model performs on certain subpopulations. We can define these subpopulations using `SliceSpec` objects. 

In [35]:
spec_list = [
    {
        "age": {
            "min_value": 30,
            "max_value": 50,
            "min_inclusive": True,
            "max_inclusive": False,
        }
    },
    {
        "age": {
            "min_value": 50,
            "max_value": 80,
            "min_inclusive": True,
            "max_inclusive": False,
        }
    },
    {"gender": {"value": 1}},
    {"gender": {"value": 0}},
]
slice_spec = SliceSpec(spec_list)
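
The inclusive flags decide which slice a boundary value belongs to. The helper below is a hypothetical stand-in for the range check, written only to illustrate the `[30, 50)` and `[50, 80)` intervals defined above; it is not the `SliceSpec` implementation.

```python
def in_range(value, *, min_value, max_value, min_inclusive, max_inclusive):
    """Check membership in an interval with configurable open/closed ends."""
    above = value >= min_value if min_inclusive else value > min_value
    below = value <= max_value if max_inclusive else value < max_value
    return above and below


# The two age slices from spec_list.
younger = {"min_value": 30, "max_value": 50, "min_inclusive": True, "max_inclusive": False}
older = {"min_value": 50, "max_value": 80, "min_inclusive": True, "max_inclusive": False}

# A 50-year-old lands in the second slice only, because 50 is excluded
# from [30, 50) and included in [50, 80).
age_50_in_younger = in_range(50, **younger)
age_50_in_older = in_range(50, **older)
```

Patients younger than 30 or aged 80 and above fall into neither age slice, although they still count toward the overall metrics.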

A `MetricCollection` can also be defined for the fairness metrics.

In [36]:
specificity = create_metric(
    metric_name="specificity",
    task="binary",
)
sensitivity = create_metric(
    metric_name="sensitivity",
    task="binary",
)

fpr = 1 - specificity
fnr = 1 - sensitivity

ber = (fpr + fnr) / 2

fairness_metric_collection = MetricCollection(
    {
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "BER": ber,
    }
)

The FairnessConfig helps in setting up and evaluating the fairness of the model predictions.

In [37]:
fairness_config = FairnessConfig(
    metrics=fairness_metric_collection,
    dataset=None,  # dataset is passed from the evaluator
    target_columns=None,  # target columns are passed from the evaluator
    groups=["gender", "age"],
    group_bins={"age": [50, 70]},
    group_base_values={"age": 20, "gender": 0},
    thresholds=[0.5],
)

The `evaluate` method outputs the evaluation results and the Hugging Face dataset with the predictions added to it.

In [38]:
results, dataset_with_preds = mortality_task.evaluate(
    dataset["test"],
    metric_collection,
    model_names=model_name,
    transforms=preprocessor,
    prediction_column_prefix="preds",
    slice_spec=slice_spec,
    batch_size=64,
    fairness_config=fairness_config,
    override_fairness_metrics=False,
)
dataset_with_preds

Dataset({
    features: ['ID', 'outcome', 'age', 'gender', 'BMI', 'hypertensive', 'atrialfibrillation', 'CHD with no MI', 'diabetes', 'deficiencyanemias', 'depression', 'Hyperlipemia', 'Renal failure', 'COPD', 'heart rate', 'Systolic blood pressure', 'Diastolic blood pressure', 'Respiratory rate', 'temperature', 'SP O2', 'Urine output', 'hematocrit', 'RBC', 'MCH', 'MCHC', 'MCV', 'RDW', 'Leucocyte', 'Platelets', 'Neutrophils', 'Basophils', 'Lymphocyte', 'PT', 'INR', 'NT-proBNP', 'Creatine kinase', 'Creatinine', 'Urea nitrogen', 'glucose', 'Blood potassium', 'Blood sodium', 'Blood calcium', 'Chloride', 'Anion gap', 'Magnesium ion', 'PH', 'Bicarbonate', 'Lactic acid', 'PCO2', 'EF', 'group', 'preds.sgd_classifier'],
    num_rows: 236
})

In [39]:
results[model_name].keys()

dict_keys(['age:[30 - 50)', 'age:[50 - 80)', 'gender:1', 'gender:0', 'overall'])

In [40]:
results[model_name]["overall"].keys()

dict_keys(['BinaryAccuracy', 'BinaryPrecision', 'BinaryRecall', 'BinaryF1Score', 'BinaryAUROC', 'BinaryROCCurve', 'BinaryPrecisionRecallCurve'])

In [41]:
results["fairness"].keys()

dict_keys(['gender:0&age:(-inf - 50.0]', 'gender:0&age:(50.0 - 70.0]', 'gender:0&age:(70.0 - inf]', 'gender:1&age:(-inf - 50.0]', 'gender:1&age:(50.0 - 70.0]', 'gender:1&age:(70.0 - inf]'])
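
The age groups in these keys follow from `group_bins={"age": [50, 70]}`: two edges produce three right-closed intervals. The sketch below reproduces the labels with the standard library; `age_bin_label` is a hypothetical helper, and the right-closed convention is read off the labels above rather than from the CyclOps source.

```python
from bisect import bisect_left


def age_bin_label(age, edges=(50.0, 70.0)):
    """Return the right-closed interval label that contains `age`."""
    bounds = [float("-inf"), *edges, float("inf")]
    # bisect_left sends a value equal to an edge into the lower bin,
    # which makes each interval closed on the right.
    position = bisect_left(edges, age)
    return f"({bounds[position]} - {bounds[position + 1]}]"


labels = [age_bin_label(age) for age in (43, 50, 62, 70, 85)]
```

Combined with the two gender values, the three age bins give the six groups listed in the fairness results.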

In [42]:
results["fairness"]["gender:0&age:(-inf - 50.0]"]

{'Group Size': 14,
 'BER@0.5': 0.175,
 'Sensitivity@0.5': 0.75,
 'Specificity@0.5': 0.9,
 'BER Parity@0.5': 1.0,
 'Sensitivity Parity@0.5': 1.0,
 'Specificity Parity@0.5': 1.0}
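
These values can be checked by hand. The confusion counts below are hypothetical: the notebook does not show them, and they were chosen only because they are consistent with a group of 14 patients, a sensitivity of 0.75, and a specificity of 0.9.

```python
def balanced_error_rate(tp, fn, tn, fp):
    """Return (sensitivity, specificity, BER) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    fnr = 1 - sensitivity
    fpr = 1 - specificity
    ber = (fpr + fnr) / 2  # same definition as in the metric collection
    return sensitivity, specificity, ber


# Assumed counts: 4 positives (3 found, 1 missed), 10 negatives (9 correct).
sensitivity, specificity, ber = balanced_error_rate(tp=3, fn=1, tn=9, fp=1)
```

BER averages the two error rates, so it is not dominated by the larger negative class the way plain accuracy is.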

**Log the performance metrics to the report.**

We can add performance metrics to the model card using the `log_quantitative_analysis` method, which is called below once per metric with the metric name, its value, the slice it was computed on, and a pass/fail threshold.

To make this easy, we first flatten the evaluation results into a dictionary whose keys have the format `slice_name/metric_name`, for instance `overall/BinaryAccuracy`.

In [43]:
# flatten the results to follow the model card schema
results_flat = {}
for name, model_results in results.items():
    results_flat[name] = {}
    for slice_name, slice_results in model_results.items():
        for metric_name, metric_value in slice_results.items():
            # remove the curve data
            if metric_name not in ["BinaryROCCurve", "BinaryPrecisionRecallCurve"]:
                results_flat[name][f"{slice_name}/{metric_name}"] = metric_value
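
The effect of the flattening is easiest to see on a tiny example. The nested dictionary below uses made-up metric values and only mimics the shape of `results`.

```python
# Made-up values in the same nested shape: model -> slice -> metric.
toy_results = {
    "sgd_classifier": {
        "overall": {
            "BinaryAccuracy": 0.8,
            "BinaryROCCurve": ([0.0, 1.0], [0.0, 1.0]),
        },
        "gender:1": {"BinaryAccuracy": 0.75},
    }
}

CURVE_METRICS = {"BinaryROCCurve", "BinaryPrecisionRecallCurve"}

toy_flat = {
    model: {
        f"{slice_name}/{metric_name}": value
        for slice_name, slice_results in model_results.items()
        for metric_name, value in slice_results.items()
        if metric_name not in CURVE_METRICS  # curves are not scalar metrics
    }
    for model, model_results in toy_results.items()
}

# Each key splits back into its slice and metric name.
slice_name, metric_name = "overall/BinaryAccuracy".split("/")
```

Curve outputs are skipped because they are arrays rather than single values; they are plotted separately.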

In [44]:
# pass/fail threshold for each metric that is logged to the report
pass_fail_thresholds = {
    "BinaryAUROC": 0.8,
    "BinaryAccuracy": 0.6,
    "BinaryPrecision": 0.6,
    "BinaryRecall": 0.6,
    "BinaryF1Score": 0.6,
}
for key, metric_value in results_flat[model_name].items():
    slice_name, metric_name = key.split("/")
    if metric_name in pass_fail_thresholds:
        report.log_quantitative_analysis(
            "performance",
            name=metric_name,
            value=metric_value,
            slice=slice_name,
            pass_fail_thresholds=pass_fail_thresholds[metric_name],
            pass_fail_threshold_fns=lambda x, threshold: x >= threshold,
        )

We can also use the `ClassificationPlotter` to plot the performance metrics and then add the figures to the model card using the `log_plotly_figure` method.

In [45]:
plotter = ClassificationPlotter(task_type="binary", class_names=["0", "1"])
plotter.set_template("plotly_white")
plotter.set_colorway(
    [
        "#006ba6",
        "#ffbc42",
        "#0496ff",
        "#d81159",
        "#8f2d56",
        "#75dddd",
        "#508991",
        "#172a3a",
        "#004346",
        "#09bc8a",
    ]
)

In [46]:
# plotting the ROC curves for all the slices
roc_curves = {
    slice_name: slice_results["BinaryROCCurve"]
    for slice_name, slice_results in results[model_name].items()
}
aurocs = {
    slice_name: slice_results["BinaryAUROC"]
    for slice_name, slice_results in results[model_name].items()
}
roc_plot = plotter.roc_curve_comparison(roc_curves, aurocs=aurocs)
report.log_plotly_figure(
    fig=roc_plot,
    caption="ROC Curves for All Slices",
    section_name="quantitative analysis",
)
roc_plot.show()
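
The AUROC shown next to each curve is the area under that curve. As a sketch of the relationship, the function below integrates a ROC curve with the trapezoidal rule; the points are invented, and the CyclOps `BinaryAUROC` metric may compute the value differently.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a curve given by points sorted by increasing fpr."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_height = (tpr[i] + tpr[i - 1]) / 2
        area += width * mean_height
    return area


# Hypothetical ROC points, from (0, 0) to (1, 1).
fpr = [0.0, 0.0, 0.5, 1.0]
tpr = [0.0, 0.5, 1.0, 1.0]
auc = trapezoid_auc(fpr, tpr)

# The diagonal corresponds to a classifier that guesses at random.
chance_auc = trapezoid_auc([0.0, 1.0], [0.0, 1.0])
```

A curve that hugs the top-left corner approaches an area of 1, while the diagonal gives 0.5.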

In [47]:
# Plotting the overall classification metric values.
overall_performance = {
    metric_name: metric_value
    for metric_name, metric_value in results[model_name]["overall"].items()
    if metric_name not in ["BinaryROCCurve", "BinaryPrecisionRecallCurve"]
}
overall_performance_plot = plotter.metrics_value(
    overall_performance, title="Overall Performance"
)
report.log_plotly_figure(
    fig=overall_performance_plot,
    caption="Overall Performance",
    section_name="quantitative analysis",
)
overall_performance_plot.show()

In [48]:
# Plotting the metric values for all the slices.
slice_metrics = {
    slice_name: {
        metric_name: metric_value
        for metric_name, metric_value in slice_results.items()
        if metric_name not in ["BinaryROCCurve", "BinaryPrecisionRecallCurve"]
    }
    for slice_name, slice_results in results[model_name].items()
}
slice_metrics_plot = plotter.metrics_comparison_bar(slice_metrics)
report.log_plotly_figure(
    fig=slice_metrics_plot,
    caption="Slice Metric Comparison",
    section_name="quantitative analysis",
)
slice_metrics_plot.show()

In [49]:
# plotting the fairness metrics
fairness_results = copy.deepcopy(results["fairness"])
fairness_metrics = {}
# remove the group size from the fairness results and add it to the slice name
for slice_name, slice_results in fairness_results.items():
    group_size = slice_results.pop("Group Size")
    fairness_metrics[f"{slice_name} (Size={group_size})"] = slice_results

fairness_plot = plotter.metrics_comparison_scatter(
    fairness_metrics, title="Fairness Metrics"
)
report.log_plotly_figure(
    fig=fairness_plot,
    caption="Fairness Metrics",
    section_name="fairness analysis",
)
fairness_plot.show()

## Report Generation

Before generating the model card, let us document some of the details of the model and some considerations involved in developing and using the model.


Let's start with populating the model details section, which includes the following fields by default:
- description: A high-level description of the model and its usage for a general audience.
- version: The version of the model.
- owners: The individuals or organizations that own the model.
- license: The license under which the model is made available.
- citation: The citation for the model.
- references: Links to resources that are relevant to the model.
- path: The path to where the model is stored.
- regulatory_requirements: The regulatory requirements that are relevant to the model.

We can add additional fields to the model details section by passing a dictionary to the `log_from_dict` method and specifying the section name as `model_details`. You can also use the `log_descriptor` method to add a new field object with a `description` attribute to any section of the model card.

In [50]:
report.log_from_dict(
    data={
        "description": "The model was trained on a subset of the MIMIC-III dataset \
            to predict in-hospital mortality for patients admitted to the ICU with \
            heart failure. In-hospital mortality is defined as vital status at \
            hospital discharge.",
    },
    section_name="model_details",
)
report.log_version(
    version_str="0.0.1",
    date="2023-06-05",
    description="Initial Release",
)
report.log_owner(name="CyclOps Team", contact="vectorinstitute.github.io/cyclops/")
report.log_license(identifier="Apache-2.0")
report.log_reference(
    link="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html"  # noqa: E501
)

Next, let's populate the considerations section, which includes the following fields by default:
- users: The intended users of the model.
- use_cases: The use cases for the model. These could be primary, downstream or out-of-scope use cases.
- fairness_assessment: A description of the benefits and harms of the model for different groups as well as the steps taken to mitigate the harms.
- ethical_considerations: The risks associated with using the model and the steps taken to mitigate them. This can be populated using the  `log_risk` method.



In [51]:
report.log_from_dict(
    data={
        "users": [
            {"description": "Hospitals"},
            {"description": "Clinicians"},
        ]
    },
    section_name="considerations",
)
report.log_user(description="ML Engineers")
report.log_use_case(
    description="Predicting mortality of patients admitted to the ICU with heart \
        failure.",
    kind="primary",
)
report.log_use_case(
    description="Predicting mortality of patients admitted to other units of a hospital \
        with different conditions.",
    kind="out-of-scope",
)
report.log_fairness_assessment(
    affected_group="gender, age",
    benefit="Improved health outcomes for patients.",
    harm="Biased predictions for patients in certain groups (e.g. older patients) \
        may lead to worse health outcomes.",
    mitigation_strategy="We will monitor the performance of the model on these groups \
        and retrain the model if the performance drops below a certain threshold.",
)
report.log_risk(
    risk="The model may be used to make decisions that affect the health of patients.",
    mitigation_strategy="The model should be continuously monitored for performance \
        and retrained if the performance drops below a certain threshold.",
)

Once the model card is populated, you can generate the report using the `export` method. The report is generated as an HTML file, and a JSON file containing the model card data is generated alongside it. By default, the files are saved in a folder named `cyclops_reports` in the current working directory. You can change the path by passing an `output_dir` argument when instantiating the `ModelCardReport` class.

In [52]:
report_path = report.export()
shutil.copy(f"{report_path}", ".")

'./model_card.html'

You can view the generated HTML [report](./model_card.html).