# Introduction

This is the second notebook in this example of how to scan models using Certifai. If you have not already done so, please run the [first notebook](patient-readmission-train.ipynb) to train the models to be explained.

In this notebook, we will:
1. Create a Certifai scan object with the information Certifai needs to explain the models
2. Run the explanations scan and save its definition for future use
3. View the results in the Console

See the [third notebook](patient-readmission-explain-results.ipynb) for how to work with the results of the explanations scan in a notebook.

In [1]:
import numpy as np
import pandas as pd
import pickle
import pprint

from certifai.scanner.builder import (CertifaiScanBuilder, CertifaiPredictorWrapper, CertifaiModel,
                                      CertifaiDataset, CertifaiDatasetSource, CertifaiGroupingFeature,
                                      CertifaiPredictionTask, CertifaiTaskOutcomes, CertifaiOutcomeValue,
                                      CertifaiFeatureDataType, CertifaiFeatureSchema, CertifaiDataSchema)

# Creating the Certifai Scan object

In this section, we create a Certifai scan object containing the information Certifai needs to run a scan that explains the models. This information consists of:
* Metadata about the prediction task being performed
* What evaluations to run
* The models to be scanned
* The datasets to be used
* Metadata about the datasets that is needed for the scan

Create a Certifai scan object, providing metadata about the prediction task the models perform, and specify the evaluations to run. In this case, the only evaluation is 'explanation'.

In [2]:
task = CertifaiPredictionTask(CertifaiTaskOutcomes.classification(
    [
        CertifaiOutcomeValue(0, name='Not Readmitted', favorable=True),
        CertifaiOutcomeValue(1, name='Readmitted')
    ]),
    prediction_description='Determine whether a patient will be readmitted')

scan = CertifaiScanBuilder.create('readmission',
                                  prediction_task=task)
scan.add_evaluation_type('explanation')

Load the two models we saved in the first notebook, and wrap them so that Certifai can call them. Add these models to the scan object.

In [3]:
from encoder import Encoder

for model_name in ['logit', 'mlp']:
    with open(f'readmission_{model_name}.pkl', 'rb') as f:
        saved = pickle.load(f)
        model = CertifaiPredictorWrapper(saved.get('model'), encoder=Encoder())
        scan.add_model(CertifaiModel(model_name, local_predictor=model))
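
Each model above is passed to `CertifaiPredictorWrapper` together with an `Encoder` instance. Conceptually, a wrapper like this applies the encoder to the raw input rows before calling the model's `predict` method. The plain-Python sketch below illustrates that encode-then-predict flow under that assumption; `SketchPredictorWrapper`, `toy_encoder` and `ToyModel` are hypothetical names, and this is not Certifai's implementation or the first notebook's `Encoder`.

```python
from typing import Callable, List, Optional, Sequence


# Hypothetical stand-ins used only for illustration; these are NOT Certifai classes
# and NOT the Encoder from the first notebook.
def toy_encoder(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pretend preprocessing step: scale every feature by 0.1."""
    return [[0.1 * x for x in row] for row in rows]


class ToyModel:
    """Predicts 1 ('Readmitted') when the encoded features sum to more than 1.0."""

    def predict(self, rows: Sequence[Sequence[float]]) -> List[int]:
        return [1 if sum(row) > 1.0 else 0 for row in rows]


class SketchPredictorWrapper:
    """Hypothetical wrapper: encodes raw rows (if an encoder is given), then calls model.predict."""

    def __init__(self, model, encoder: Optional[Callable] = None):
        self.model = model
        self.encoder = encoder

    def predict(self, rows: Sequence[Sequence[float]]) -> List[int]:
        encoded = self.encoder(rows) if self.encoder is not None else rows
        return self.model.predict(encoded)


wrapper = SketchPredictorWrapper(ToyModel(), encoder=toy_encoder)
print(wrapper.predict([[2.0, 3.0], [10.0, 5.0]]))  # [0, 1]
```

Keeping the encoding inside the wrapper means the caller can always pass rows in the same raw format the dataset files use.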

Create a 100-row sample of the full dataset, for which explanations will be generated.

In [4]:
df = pd.read_csv('diabetic_data_processed.csv')
explain_df = df.sample(100)
explain_df.to_csv('diabetic_data_explain.csv', index=False)
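
`df.sample(100)` draws a different random sample each time the cell runs, so rerunning the notebook produces explanations for a different set of patients. pandas' `DataFrame.sample` accepts a `random_state` argument (for example, `df.sample(100, random_state=0)`) that makes the sample reproducible. The standard-library sketch below shows the same seeding idea; `sample_csv` and its default seed are hypothetical choices, not part of the original workflow.

```python
import csv
import random
from typing import List


def sample_csv(src_path: str, dst_path: str, n: int, seed: int = 0) -> List[List[str]]:
    """Write a reproducible random sample of n data rows (plus the header) to dst_path."""
    with open(src_path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    rng = random.Random(seed)      # seeded generator: same sample on every run
    sampled = rng.sample(rows, n)  # sampling without replacement, as df.sample(n) does by default
    with open(dst_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(sampled)
    return sampled
```

For example, `sample_csv('diabetic_data_processed.csv', 'diabetic_data_explain.csv', 100, seed=0)` would write the same 100 rows every time it runs.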

Add the full evaluation dataset and the explanation dataset to the scan. Certifai uses the evaluation dataset to create an initial population for the genetic algorithm used in the scan, so it needs to be a representative sample of the expected data (at least roughly 1K rows, ideally 10K to 50K rows; larger is also fine). The explanation dataset contains the points to be explained. Note that the scan's run time grows linearly with the size of the explanation dataset, so it is best to keep this dataset relatively small, at least initially.
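
The size guidance above can be turned into a quick pre-scan check. The sketch below counts data rows in the CSV files and flags sizes outside that guidance; `count_data_rows`, `check_dataset_sizes` and the 1,000-row soft limit for the explanation dataset are hypothetical choices for illustration, not Certifai settings.

```python
import csv
from typing import List

# Thresholds taken from the guidance above for the evaluation dataset.
EVAL_MIN_ROWS = 1_000
EVAL_IDEAL_MIN_ROWS = 10_000
# Hypothetical soft limit for the explanation dataset; not a Certifai setting.
EXPLAIN_SOFT_LIMIT = 1_000


def count_data_rows(path: str) -> int:
    """Count CSV records, excluding the header row."""
    with open(path, newline='') as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)


def check_dataset_sizes(n_eval: int, n_explain: int) -> List[str]:
    """Return human-readable warnings; an empty list means the sizes look reasonable."""
    warnings = []
    if n_eval < EVAL_MIN_ROWS:
        warnings.append(f'evaluation dataset has {n_eval} rows; aim for at least {EVAL_MIN_ROWS}')
    elif n_eval < EVAL_IDEAL_MIN_ROWS:
        warnings.append(f'evaluation dataset has {n_eval} rows; {EVAL_IDEAL_MIN_ROWS} or more is ideal')
    if n_explain > EXPLAIN_SOFT_LIMIT:
        # Scan time grows linearly with the number of points to explain.
        warnings.append(f'explanation dataset has {n_explain} rows; expect a proportionally longer scan')
    return warnings
```

For example, `check_dataset_sizes(count_data_rows('diabetic_data_processed.csv'), count_data_rows('diabetic_data_explain.csv'))` returns an empty list when both files fall within the guidance.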

In [5]:
eval_dataset = CertifaiDataset('evaluation',
                               CertifaiDatasetSource.csv('diabetic_data_processed.csv'))
scan.add_dataset(eval_dataset)
scan.evaluation_dataset_id = 'evaluation'

explan_dataset = CertifaiDataset('explanation', CertifaiDatasetSource.csv('diabetic_data_explain.csv'))
scan.add_dataset(explan_dataset)
scan.explanation_dataset_id = 'explanation'

Read in the one-hot encoding metadata that we saved in the first notebook, and use it to define the feature schema in the scan object. This tells Certifai how each categorical feature's values map to one-hot-encoded columns, which it needs both for the analysis and when presenting explanations.

In [6]:
with open('cat_value_mappings.pkl', 'rb') as f:
    cat_value_mappings = pickle.load(f)

cat_features = []
for feature, value_columns in cat_value_mappings.items():
    data_type = CertifaiFeatureDataType.categorical(value_columns=value_columns.items())
    feature_schema = CertifaiFeatureSchema(name=feature, data_type=data_type)
    cat_features.append(feature_schema)
schema = CertifaiDataSchema(features=cat_features)
scan.dataset_schema = schema
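
The loop above passes `value_columns.items()` as the categorical value-to-column pairs, which suggests that each entry of `cat_value_mappings` maps a category value to the name of its one-hot column. Assuming that structure, the sketch below shows the kind of lookup such a mapping enables: turning a one-hot-encoded row back into readable categorical values, as needed when presenting explanations. `decode_one_hot` and the example feature and column names are hypothetical; this is not Certifai's implementation.

```python
from typing import Dict, Mapping


def decode_one_hot(row: Mapping[str, float],
                   cat_value_mappings: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Recover each categorical feature's value from its one-hot-encoded columns."""
    decoded = {}
    for feature, value_columns in cat_value_mappings.items():
        # Assumed structure: value_columns maps category value -> one-hot column name.
        hot = [value for value, column in value_columns.items() if row.get(column, 0) == 1]
        if len(hot) != 1:
            raise ValueError(f'{feature}: expected exactly one active column, found {len(hot)}')
        decoded[feature] = hot[0]
    return decoded


# Hypothetical mapping and row, for illustration only.
example_mappings = {
    'gender': {'Female': 'gender_Female', 'Male': 'gender_Male'},
    'race': {'Caucasian': 'race_Caucasian', 'AfricanAmerican': 'race_AfricanAmerican'},
}
example_row = {'gender_Female': 1, 'gender_Male': 0,
               'race_Caucasian': 0, 'race_AfricanAmerican': 1, 'num_medications': 12}
print(decode_one_hot(example_row, example_mappings))
# {'gender': 'Female', 'race': 'AfricanAmerican'}
```

Raising an error when zero or several columns are active catches rows that do not follow the one-hot convention, rather than silently picking a value.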


Tell Certifai which column in the dataset holds the label (outcome), so that it is not passed to the models' predict calls or used in the genetic algorithm.

In [7]:
scan.dataset_schema.outcome_feature_name = 'readmitted'
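
Setting `outcome_feature_name` keeps the `readmitted` label out of the model inputs; Certifai handles this itself once the column is declared. The plain-Python sketch below only illustrates what that separation amounts to; `split_features_and_outcome` and the example rows are hypothetical and not part of Certifai.

```python
from typing import Any, Dict, List, Mapping, Sequence, Tuple


def split_features_and_outcome(rows: Sequence[Mapping[str, Any]],
                               outcome: str = 'readmitted') -> Tuple[List[Dict[str, Any]], List[Any]]:
    """Separate the label column from the feature columns of each row."""
    features = [{name: value for name, value in row.items() if name != outcome} for row in rows]
    labels = [row[outcome] for row in rows]
    return features, labels


rows = [{'age': 55, 'num_medications': 12, 'readmitted': 1},
        {'age': 70, 'num_medications': 4, 'readmitted': 0}]
features, labels = split_features_and_outcome(rows)
print(features)  # [{'age': 55, 'num_medications': 12}, {'age': 70, 'num_medications': 4}]
print(labels)    # [1, 0]
```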

# Run the Explanations Scan

Run the scan, saving the results in the `reports` folder. 

In [8]:
results = scan.run(write_reports=True)

Starting scan with model_use_case_id: 'readmission' and scan_id: '5c54861f2e8a'
[--------------------] 2020-10-04 12:20:03.419482 - 0 of 2 reports (0.0% complete) - Running explanation evaluation for model: logit
[##########----------] 2020-10-04 12:20:43.433004 - 1 of 2 reports (50.0% complete) - Running explanation evaluation for model: mlp
[####################] 2020-10-04 12:21:39.721667 - 2 of 2 reports (100.0% complete) - Completed all evaluations
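
The timestamps in this progress log give a rough way to apply the earlier note that run time grows linearly with the size of the explanation dataset: for the 100-row sample, the logit explanations took about 40 s and the mlp explanations about 56 s. The sketch below parses the log lines shown above and extrapolates linearly to a larger explanation dataset. `per_model_seconds` and `estimate_seconds` are hypothetical helpers; the estimate assumes run time is proportional to the number of explanation rows, ignores any fixed start-up cost, and will vary by machine.

```python
import re
from datetime import datetime

# Progress lines copied from the scan output above.
LOG = """\
[--------------------] 2020-10-04 12:20:03.419482 - 0 of 2 reports (0.0% complete) - Running explanation evaluation for model: logit
[##########----------] 2020-10-04 12:20:43.433004 - 1 of 2 reports (50.0% complete) - Running explanation evaluation for model: mlp
[####################] 2020-10-04 12:21:39.721667 - 2 of 2 reports (100.0% complete) - Completed all evaluations
"""

# Captures the timestamp after the progress bar and the message after the last " - ".
LINE_RE = re.compile(r'\]\s+(\S+ \S+) - .* - (.*)$')


def per_model_seconds(log: str) -> dict:
    events = []
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S.%f')
            events.append((ts, m.group(2)))
    durations = {}
    # Each model runs from its "Running ..." line until the next progress line.
    for (start, msg), (end, _) in zip(events, events[1:]):
        if 'for model: ' in msg:
            model = msg.split('for model: ', 1)[1].strip()
            durations[model] = (end - start).total_seconds()
    return durations


def estimate_seconds(seconds_for_sample: float, sample_rows: int, target_rows: int) -> float:
    """Linear extrapolation: assumes run time is proportional to the number of explanation rows."""
    return seconds_for_sample * target_rows / sample_rows


for model, secs in per_model_seconds(LOG).items():
    print(f'{model}: {secs:.1f} s for 100 rows, ~{estimate_seconds(secs, 100, 1000):.0f} s for 1,000 rows')
# logit: 40.0 s for 100 rows, ~400 s for 1,000 rows
# mlp: 56.3 s for 100 rows, ~563 s for 1,000 rows
```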


Save the scan definition as a YAML file so that the scan can be rerun in the future, either in a notebook or from the CLI. This is useful, for example, to get explanations for additional data points, for updated models, or for a model that has been deployed as a service.

In [9]:
with open('explain-scan-def.yaml', "w") as f:
    scan.save(f)

The scan definition can be loaded into a new notebook using `CertifaiScanBuilder.from_file('explain-scan-def.yaml')`.

# View the Results

To view the results in the Certifai Console, run the CLI command `certifai console` from this folder, and then go to `http://localhost:8000` in your browser.

The results can also be analyzed in this notebook, or later in a separate notebook. See the [third notebook](patient-readmission-explain-results.ipynb) for how to load and work with the results of the explanations scan in a separate notebook.