## Modeling Report Triage

This notebook summarizes the following aspects of the modeling experiment: 

- The predictors we created
- The temporal cross-validation setup we used to validate our models
- The models we ran
- The results we got in terms of the efficiency, effectiveness, and equity metrics
- A deeper dive into what the ML models are learning from the data to make the predictions

In [None]:
import pandas as pd
import sqlalchemy
import os

from sqlalchemy.engine.url import URL
from triage.util.db import create_engine


from triage.component.postmodeling.modeling_report_functions import *

import matplotlib.pyplot as plt

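# Use the fully qualified option name; the bare 'precision' key is ambiguous in newer pandas versions
pd.set_option('display.precision', 4)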
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

plt.rcParams.update({"figure.dpi": 120})
%matplotlib inline

## Database Connection

In [None]:
db_url = URL(
    'postgres',
    host=os.getenv('PGHOST'),
    username=os.getenv('PGUSER'),
    database=os.getenv('PGDATABASE'),
    password=os.getenv('PGPASSWORD'),
    port=5432,
)

db_engine = create_engine(db_url)

## 1. Parameters for the Report

The following values are the default parameters for the report. If you are using this interactively, you can change the parameter values.

In [None]:
# The most recent completed experiment hash
# Note that this has to be a list
experiment_hashes = [get_most_recent_experiment_hash(db_engine)]

# Model performance metric and threshold, defaulting to recall@1_pct
performance_metric = 'recall@'
threshold = '1_pct'

# Bias metric defaults to tpr_disparity, with bias metric values generated for all groups
# (if a bias audit is specified in the experiment config)
bias_metric = 'tpr_disparity'
bias_priority_groups = None
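
For readers unfamiliar with the default metric, `recall@` with threshold `1_pct` is read here as recall among the top 1% of entities ranked by score. The sketch below is a standalone illustration on made-up scores and labels. `recall_at_top_pct` is a hypothetical helper, not a triage function, and triage's own tie-breaking and rounding may differ.

```python
import numpy as np

def recall_at_top_pct(scores, labels, pct):
    """Recall among the top `pct` percent of entities ranked by score (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Size of the intervention list: at least one entity
    k = max(1, int(np.floor(len(scores) * pct / 100.0)))
    # Indices of the k highest scores (stable sort so ties keep input order)
    top_k = np.argsort(-scores, kind="stable")[:k]
    total_positives = labels.sum()
    if total_positives == 0:
        return float("nan")  # recall is undefined without positives
    return labels[top_k].sum() / total_positives

# Toy data: 200 entities with strictly decreasing scores and 10 positives,
# two of which sit at the very top of the ranking
scores = np.linspace(1.0, 0.0, 200)
labels = np.zeros(200, dtype=int)
labels[[0, 1, 50, 60, 70, 80, 90, 100, 110, 120]] = 1

example_recall = recall_at_top_pct(scores, labels, pct=1)
```

With 200 entities, the top 1% is a list of two, so the metric asks what share of all positives those two entities capture.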

### 1.1 Updating the parameters based on the config

In [None]:
params = load_report_parameters_from_config(db_engine, experiment_hashes[0])

if params['performance_metric'] is not None:
    performance_metric = params['performance_metric']

if params['threshold'] is not None:
    threshold = params['threshold']

if params['bias_metric'] is not None:
    bias_metric = params['bias_metric']

if params['priority_groups'] is not None:
    bias_priority_groups = params['priority_groups']

In [None]:
performance_metric, threshold, bias_metric, bias_priority_groups

## 2. Visualizing the Temporal Validation Splits

In [None]:
visualize_validation_splits(db_engine, experiment_hashes[0])
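
As background for the plot, temporal validation trains on information available up to a cutoff date and tests on the period that follows, then moves the cutoff forward for the next split. A minimal sketch of that idea follows. `make_temporal_splits` and its parameters are hypothetical and much simpler than triage's actual temporal configuration.

```python
import pandas as pd

def make_temporal_splits(first_cutoff, last_cutoff, step, test_span):
    """Build one train/test split per cutoff date (hypothetical helper)."""
    splits = []
    cutoff = pd.Timestamp(first_cutoff)
    last = pd.Timestamp(last_cutoff)
    while cutoff <= last:
        splits.append({
            "train_end": cutoff,             # training only uses data before this date
            "test_start": cutoff,            # testing starts where training ends
            "test_end": cutoff + test_span,  # end of the evaluation window
        })
        cutoff = cutoff + step
    return pd.DataFrame(splits)

# Assumed setup: yearly model updates, each evaluated on the following six months
splits = make_temporal_splits(
    "2015-01-01",
    "2018-01-01",
    step=pd.DateOffset(years=1),
    test_span=pd.DateOffset(months=6),
)
```

Because every test window starts at or after its training cutoff, no split evaluates a model on data that precedes what it was trained on.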

## 3. Modeling Cohorts

In [None]:
cohort_summary = summarize_cohorts(db_engine, experiment_hashes[0], generate_plots=True)

In [None]:
cohort_summary

In [None]:
cohort_summary[['cohort_size', 'baserate']].describe()
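
To make the two summarized columns concrete, the sketch below computes a cohort size and base rate per date from a toy labels table. The table layout (`entity_id`, `as_of_date`, `label`) and the data are assumptions for illustration; only the `cohort_size` and `baserate` column names come from the summary above.

```python
import pandas as pd

# Toy labels table: one row per entity and as-of date (assumed layout)
labels = pd.DataFrame({
    "entity_id": [1, 2, 3, 4, 1, 2, 3, 5, 6],
    "as_of_date": pd.to_datetime(["2016-01-01"] * 4 + ["2017-01-01"] * 5),
    "label": [1, 0, 0, 0, 1, 1, 0, 0, 0],
})

toy_summary = labels.groupby("as_of_date").agg(
    cohort_size=("entity_id", "nunique"),  # entities in the cohort on that date
    num_positives=("label", "sum"),        # entities with a positive label
)
# Base rate: share of the cohort with a positive label
toy_summary["baserate"] = toy_summary["num_positives"] / toy_summary["cohort_size"]
```

Large swings in either column across dates are worth checking before comparing model performance across splits.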

## 4. Predictors 

In [None]:
features = list_all_features(db_engine, experiment_hashes[0])
features

### 4.1 Missingness of Features 

In [None]:
feature_missingness_stats(db_engine)
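
As a rough illustration of what a missingness summary measures, the sketch below computes the fraction of missing values per feature on a toy matrix. The feature names and values are made up, and triage may track missingness differently (for example, through imputation flags), so this shows the concept only.

```python
import numpy as np
import pandas as pd

# Toy feature matrix with some missing values (made-up features)
feature_matrix = pd.DataFrame({
    "age": [34, np.nan, 51, 29],
    "prior_events_1y": [0, 2, np.nan, np.nan],
    "zip_income": [52000, 61000, 48000, 57000],
})

# Fraction of rows where each feature is missing, most affected features first
missing_fraction = feature_matrix.isna().mean().sort_values(ascending=False)
```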

## 5. Model Groups Built

In [None]:
summarize_model_groups(db_engine, experiment_hashes)

## 6. All Models Built

In [None]:
list_all_models(db_engine, experiment_hashes)

## 7. Model Performance

### 7.1 Overall Cohort

In [None]:
plot_performance_all_models(db_engine, experiment_hashes, performance_metric, threshold)

### 7.2 Cohort Subsets

In [None]:
plot_subset_performance(db_engine, experiment_hashes, threshold, performance_metric)

## 8. Model Performance vs Bias

In [None]:
plot_performance_against_bias(
    engine=db_engine,
    experiment_hashes=experiment_hashes,
    metric=performance_metric,
    parameter=threshold,
    bias_metric=bias_metric,
    groups=None  # Set this to bias_priority_groups to focus on the priority groups from the config
)
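
For context on the default bias metric, the sketch below computes a true positive rate (TPR) per group and expresses it as a ratio to a reference group, which is one common way to define a TPR disparity. The data, the `selected` column, and the choice of group `a` as the reference are assumptions for illustration and may not match how the bias audit in a given experiment is configured.

```python
import pandas as pd

# Toy predictions: `selected` marks entities on the top-k list (made-up data)
preds = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 4,
    "label": [1, 1, 0, 0, 1, 1, 1, 0],
    "selected": [1, 1, 0, 0, 1, 0, 0, 1],
})

# TPR per group: among true positives, the share that was selected
positives = preds[preds["label"] == 1]
tpr = positives.groupby("group")["selected"].mean()

# Disparity as a ratio to the reference group (assumed here to be "a")
tpr_disparity = tpr / tpr["a"]
```

A ratio of 1 means a group's positives are found at the same rate as the reference group's; values below 1 mean they are found less often.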

## 9. Precision-Recall Graphs

In [None]:
# plot_prk_curves(db_engine, experiment_hashes, step_size=0.01)

## 10. Initial Model Selection and Further Analysis on Best Models

For this report, by default, we pick the best-performing model of each model type, based on average performance, and generate additional outputs for those models. We do not assume that predictions exist at this stage, so we skip analyses that require them, such as list comparisons, crosstabs, and score distributions. Instead, we look at higher-level comparisons between the model types.

In [None]:
best_models = get_best_hp_config_for_each_model_type(db_engine, experiment_hashes, performance_metric, threshold)
best_models
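
The selection rule described above (best average performance within each model type) can be sketched with a pandas group-by. The evaluations table, its column names, and the metric values are made up for illustration, and the real function may break ties or weight splits differently.

```python
import pandas as pd

# Toy evaluations: one row per model group and validation split (made-up values)
evaluations = pd.DataFrame({
    "model_group_id": [1, 1, 2, 2, 3, 3],
    "model_type": ["RandomForest"] * 4 + ["LogisticRegression"] * 2,
    "metric_value": [0.10, 0.14, 0.15, 0.11, 0.08, 0.09],
})

# Average the metric over splits for each model group
mean_by_group = evaluations.groupby(
    ["model_type", "model_group_id"], as_index=False
)["metric_value"].mean()

# Within each model type, keep the model group with the highest average
best_idx = mean_by_group.groupby("model_type")["metric_value"].idxmax()
toy_best = mean_by_group.loc[best_idx].set_index("model_group_id")
```

Indexing the result by `model_group_id` mirrors how the report uses `best_models.index.tolist()` to pass model groups along.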

In [None]:
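import logging

try:
    rep = PostmodelingReport(
        engine=db_engine,
        experiment_hashes=experiment_hashes,
        model_groups=best_models.index.tolist()
    )
except Exception as e:
    # Leave rep unset so the cells below skip their plots
    rep = None
    logging.error('Could not build the report for the best models: %s', e)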

### 10.1 Feature Importance

In [None]:
if rep:
    rep.plot_feature_importance()

In [None]:
if rep: 
    rep.plot_feature_group_importance(n_top_groups=20)

### 10.2 Recall Curves

In [None]:
if rep:
    rep.plot_recall_curves_overlaid(n_splits=5)