# Exercise 03 - Evaluating the trained readmission model

Welcome to the third hands-on exercise, where you will learn about evaluating clinical ML models using CyclOps!

We will take a trained model and evaluate it across different patient subpopulations and across various metrics. At the end of this exercise, you will be able to:

1. Run inference using a trained ML model, to generate predictions on a test dataset
2. Evaluate the model on the test set across different sub-groups

## Step 01 - Install CyclOps

CyclOps is available as a [Python package](https://pypi.org/project/pycyclops/) and can be installed using ``pip``. Note that we now install ``CyclOps`` with an extra dependency, ``xgboost``, since we will be using the [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html) library.

``Colab`` may ask you to restart the session after the installation, which is normal. Click on ``Restart Session`` and re-run the cell to install ``CyclOps``.

**NOTE**: We uninstall ``cupy`` from the Colab runtime to avoid conflicts with ``CyclOps``, which attempts to use ``cupy`` whenever it is installed. Since this runtime does not support GPUs, ``cupy`` has to be removed first.

In [None]:
!pip uninstall cupy-cuda12x -y
!pip install 'pycyclops[xgboost]'
!pip install ucimlrepo
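
After restarting the session, it can be useful to confirm which packages the runtime can actually see. The following is a minimal sketch that uses only the standard library; the helper name ``is_installed`` is introduced here for illustration and is not part of CyclOps.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# `cupy` should be reported as missing; the other three should be present
# once the install cell above has completed.
for name in ("cupy", "cyclops", "xgboost", "ucimlrepo"):
    status = "installed" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```

If ``cupy`` still shows up as installed, re-running the uninstall cell and restarting the session again is a reasonable next step.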

## Step 02 - Create a model report class and learn about the different sections of the report!

CyclOps offers a package for documenting the model through a model report. The ``ModelCardReport`` class is used to populate and generate the model report as an HTML file. The model report has the following sections:

#### Overview
Provides a high level overview of how the model is doing (a quick glance of important metrics), and how it is doing over time (performance over several metrics and subgroups over time).

#### Datasets
High level statistics of the training data, including changes in distribution over time.

#### Quantitative Analysis
This section contains additional detailed performance metrics of the model for different sets of the data and subpopulations.

#### Fairness Analysis
This section contains the fairness metrics of the model.

#### Model Details
This section contains descriptive metadata about the model such as the owners, version, license, etc.

#### Model Parameters
This section contains the technical details of the model such as the model architecture, training parameters, etc.

#### Considerations
This section contains descriptions of the considerations involved in developing and using the model such as the intended use, limitations, etc.

Let's first go over the different methods available to log information to the report! Open the [API documentation link](https://vectorinstitute.github.io/cyclops/api/reference/api/_autosummary/cyclops.report.report.ModelCardReport.html#cyclops.report.report.ModelCardReport).
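
Besides reading the API documentation, the logging methods of a class can also be listed from inside the notebook. The sketch below uses the standard ``inspect`` module and assumes that the methods of interest share the ``log_`` prefix, as ``log_dataset`` does. The helper ``list_log_methods`` and the ``StandInReport`` class are hypothetical and exist only so that the example runs without CyclOps installed.

```python
import inspect


def list_log_methods(cls: type) -> list[str]:
    """Return the sorted names of methods on `cls` that start with 'log_'."""
    return sorted(
        name
        for name, _ in inspect.getmembers(cls, predicate=inspect.isfunction)
        if name.startswith("log_")
    )


class StandInReport:
    """Hypothetical stand-in class, NOT the real CyclOps report class."""

    def log_dataset(self, description: str) -> None:
        pass

    def log_owner(self, name: str) -> None:
        pass

    def export(self) -> str:
        return ""


print(list_log_methods(StandInReport))
```

In the notebook, passing ``ModelCardReport`` (once it has been imported) instead of the stand-in class should give the corresponding list for the real report class.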

Now we can create a report object and log some information to it!

In [None]:
from cyclops.report import ModelCardReport

In [None]:
report = ModelCardReport()

Perhaps we can log some important information about the dataset for the end-user! Let's load the dataset and use the metadata to log some information to the ``dataset`` section of the report.

In [None]:
import inspect
from ucimlrepo import fetch_ucirepo

In [None]:
diabetes_130_data = fetch_ucirepo(
    id=296
)  # This ID specifically corresponds to the Diabetes 130 dataset
features = diabetes_130_data["data"]["features"]
targets = diabetes_130_data["data"]["targets"]
metadata = diabetes_130_data["metadata"]
variables = diabetes_130_data["variables"]
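
Before logging, it may help to confirm that every column we plan to mark as sensitive is actually present in the feature table. The following is a small sketch of such a check; ``missing_columns`` is a hypothetical helper, and ``example_columns`` is an illustrative stand-in for ``list(features.columns)`` so that the snippet runs without downloading the dataset.

```python
def missing_columns(columns, required):
    """Return the names in `required` that do not appear in `columns`."""
    available = set(columns)
    return [name for name in required if name not in available]


# Illustrative stand-in for list(features.columns); not the full column list.
example_columns = ["race", "gender", "age", "time_in_hospital"]
sensitive = ["gender", "age", "race"]

print(missing_columns(example_columns, sensitive))
print(missing_columns(example_columns, ["gender", "ethnicity"]))
```

An empty list means all requested names were found; any name that is returned would need to be corrected before it is passed as a sensitive feature.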

In [None]:
report.log_dataset(
    description=metadata["abstract"],
    citation=inspect.cleandoc(
        """
        @article{strack2014impact,
          title={Impact of HbA1c measurement on hospital readmission rates: analysis of 70,000 clinical database patient records},
          author={Strack, Beata and DeShazo, Jonathan P and Gennings, Chris and Olmo, Juan L and Ventura, Sebastian and Cios, Krzysztof J and Clore, John N and others},
          journal={BioMed research international},
          volume={2014},
          year={2014},
          publisher={Hindawi}
        }
    """,
    ),
    link=metadata["repository_url"],
    license_id="CC0-1.0",
    version="Version 1",
    features=list(features.columns),
    sensitive_features=["gender", "age", "race"],
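    # Adjacent string literals are joined without the stray indentation that
    # a backslash continuation inside a string would embed in the text.
    sensitive_feature_justification=(
        "Demographic information like age and gender "
        "often has a strong correlation with health outcomes. For example, older "
        "patients are more likely to have a higher risk of readmission."
    ),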
)
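
The citation above is wrapped in ``inspect.cleandoc`` so that the indentation used to keep the notebook cell readable does not end up in the logged text. The sketch below illustrates that behaviour on a shortened, made-up BibTeX entry; the entry key and title are placeholders, not a real reference.

```python
import inspect

# An indented triple-quoted string, as it would appear inside a function call.
raw = """
        @article{placeholder_key,
          title={Placeholder title}
        }
    """

cleaned = inspect.cleandoc(raw)

# The common leading indentation and the blank first and last lines are
# removed, while the relative indentation of the inner line is preserved.
print(cleaned)
```

The relative indentation inside the entry survives, so the BibTeX stays readable in the generated report.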