Copyright (c) 2020. Cognitive Scale Inc. All rights reserved.
Licensed under CognitiveScale Example Code [License](https://github.com/CognitiveScale/cortex-certifai-examples/blob/7998b8a481fccd467463deb1fc46d19622079b0e/LICENSE.md)

# Introduction

This is the fifth notebook in this example of how to scan models using Certifai. If you have not already done so, please run the [first notebook](patient-readmission-train.ipynb) to train the models to be explained and the [second notebook](patient-readmission-explain-scan.ipynb) to create the `explain-scan-def.yaml` scan definition. **This notebook requires at least Certifai 1.3.13.**

In this notebook, we will:
1. Create a Certifai scan object with the information Certifai needs to explain a model
2. Run explanation scans using counterfactual search
3. View the results in the Console

Counterfactual search is an alternative approach to finding counterfactuals, in which Certifai searches the evaluation dataset for counterfactuals instead of generating them with its genetic algorithm (GA). Unlike a traditional explanation scan, counterfactual search can be run whether or not Certifai can call the model for predictions. When Certifai has access to the model, it can simplify the counterfactuals found in the dataset by removing unnecessary feature changes. Without access to the model, the counterfactuals are rows taken unchanged from the evaluation dataset.
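To make the idea concrete, here is a minimal conceptual sketch on toy data. It is not Certifai's implementation: the function names, the L1 distance, the greedy simplification order, and the toy model are all assumptions chosen for illustration. The first function only needs stored predictions, while the second needs a callable model.

```python
import pandas as pd


def find_dataset_counterfactual(row, data, predictions, original_prediction):
    """Return the row of `data` closest to `row` whose prediction differs.

    Hypothetical helper: uses a plain L1 distance over numeric features.
    """
    candidates = data[predictions != original_prediction]
    if candidates.empty:
        return None
    distances = (candidates - row).abs().sum(axis=1)
    return candidates.loc[distances.idxmin()]


def simplify_counterfactual(row, counterfactual, predict, original_prediction):
    """Greedily revert changed features while the prediction stays flipped.

    Hypothetical helper: this step needs model access through `predict`.
    """
    simplified = counterfactual.copy()
    for feature in row.index:
        if simplified[feature] == row[feature]:
            continue  # feature was not changed, nothing to revert
        trial = simplified.copy()
        trial[feature] = row[feature]
        if predict(trial.to_frame().T)[0] != original_prediction:
            simplified = trial  # the change was unnecessary, keep it reverted
    return simplified


def toy_predict(frame):
    # Stand-in model (assumption): predicts 1 when feature `a` exceeds 5.
    return (frame['a'] > 5).astype(int).to_numpy()


data = pd.DataFrame({'a': [1, 2, 8, 9], 'b': [0, 1, 3, 5]})
predictions = toy_predict(data)
row = data.loc[1]
original = predictions[1]

counterfactual = find_dataset_counterfactual(row, data, predictions, original)
simplified = simplify_counterfactual(row, counterfactual, toy_predict, original)
print(counterfactual.to_dict(), simplified.to_dict())
```

In this toy case the counterfactual found in the dataset changes both `a` and `b`, and the simplification step reverts `b` because only `a` affects the toy model's prediction.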

In this notebook we will run two explanation scans using counterfactual search: one with access to the model and one without. See the [sixth notebook](patient-readmission-sampling-results.ipynb) for how to work with the results of the explanation scans in a notebook and for a comparison of the counterfactuals from each scan.

In [1]:
import numpy as np
import pandas as pd
import pickle
import pprint

from certifai.scanner.builder import (CertifaiScanBuilder, CertifaiPredictorWrapper, CertifaiModel,
                                      CertifaiDataset, CertifaiDatasetSource, CertifaiGroupingFeature,
                                      CertifaiPredictionTask, CertifaiTaskOutcomes, CertifaiOutcomeValue,
                                      CertifaiFeatureDataType, CertifaiFeatureSchema, CertifaiDataSchema)

# Loading the Certifai Scan object

In this section, we load the previously defined scan definition to use as a starting point. This is a convenience that avoids us having to recreate information about the prediction task, datasets and feature schema. 

Load the scan definition from file.

In [2]:
scan = CertifaiScanBuilder.from_file('explain-scan-def.yaml')



For this analysis we will use only a single model. Because we're running with local models in the notebook, we need to reload the model and reassociate it with the scan. If the models were running externally, we would instead update the scan definition with the correct `predict_endpoint` URL.

In [3]:
# remove existing models
for model_name in ['logit', 'mlp']:
    scan.remove_model(model_name)

# include the logistic regression model
with open('readmission_logit.pkl', 'rb') as f:
    saved = pickle.load(f)
    logit_model = saved.get('model')
    model = CertifaiPredictorWrapper(logit_model)
    scan.add_model(CertifaiModel('logit', local_predictor=model))

### Running with model access

Below we run the scan and save the results in the `reports` folder. The `sampling=True` keyword argument specifies that the explanation scan should be run using counterfactual search.

In [4]:
results = scan.run_explain(write_reports=True, sampling=True)

Starting scan with model_use_case_id: 'readmission' and scan_id: '56c3cf74279e'
[--------------------] 2021-10-12 19:06:36.293999 - 0 of 1  (0.0% complete) - Running sampling explanation evaluation for model: logit
[####################] 2021-10-12 19:08:47.069785 - 1 of 1  (100.0% complete) - Completed all evaluations


### Preparing the scan to run without model access

To perform the scan without model access we will include the model predictions for each dataset as an extra column within the dataset. This is similar to having historical predictions for your model.

In [5]:
# label the predicted outcome column
predicted_label = 'logit_predictions'
scan.dataset_schema.predicted_outcome_feature_name = predicted_label

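def add_predicted_outcome(df, model):
    # predict from the feature columns only, excluding the ground-truth outcome
    X = df.drop('readmitted', axis=1)
    # store predictions under the same column name registered in the dataset schema
    df[predicted_label] = model.predict(X)
    return df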
    

# insert the model predictions in the evaluation dataset and update the scan definition
evaluation_df = add_predicted_outcome(pd.read_csv('diabetic_data_processed.csv'), logit_model)
eval_dataset = CertifaiDataset('evaluation', CertifaiDatasetSource.dataframe(evaluation_df))
scan.remove_dataset('evaluation')
scan.add_dataset(eval_dataset)
scan.evaluation_dataset_id = 'evaluation'

# insert the model predictions in the explanation dataset and update the scan definition
explanation_df = add_predicted_outcome(pd.read_csv('diabetic_data_explain.csv'), logit_model)
explan_dataset = CertifaiDataset('explanation', CertifaiDatasetSource.dataframe(explanation_df))
scan.remove_dataset('explanation')
scan.add_dataset(explan_dataset)
scan.explanation_dataset_id = 'explanation'
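
Because a scan without model access relies entirely on the stored predictions, it can be worth sanity-checking the new column before running it. The helper below is a hypothetical sketch, not part of the Certifai API. It assumes that valid predictions are values that also occur in the outcome column, which may not hold for every dataset. It is shown on a toy frame that reuses the `readmitted` and `logit_predictions` column names from this notebook.

```python
import pandas as pd


def check_predicted_column(df, predicted_label, outcome_label):
    """Raise ValueError if the stored predictions look unusable.

    Hypothetical helper; the checks are assumptions about what a
    usable prediction column looks like.
    """
    if predicted_label not in df.columns:
        raise ValueError(f"missing column: {predicted_label}")
    if df[predicted_label].isna().any():
        raise ValueError(f"{predicted_label} contains missing values")
    unexpected = set(df[predicted_label].unique()) - set(df[outcome_label].unique())
    if unexpected:
        raise ValueError(f"unexpected predicted values: {sorted(unexpected)}")
    return True


# Toy data standing in for the evaluation dataset.
toy = pd.DataFrame({
    'age': [50, 61, 47],
    'readmitted': [0, 1, 0],
    'logit_predictions': [0, 1, 1],
})
check_passed = check_predicted_column(toy, 'logit_predictions', 'readmitted')
```

The same call could be applied to `evaluation_df` and `explanation_df` with `predicted_label` as the second argument.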

Specify that the scan does not have model access.

In [6]:
scan.no_model_access = True

Replace the model from the previous scan with an entry of the same name that has no predictor attached.

In [7]:
scan.remove_model('logit')
scan.add_model(CertifaiModel('logit'))

### Running without model access

Below we run the scan and save the results in the `reports` folder. The `sampling=True` keyword argument specifies that the explanation scan should be run using counterfactual search.

In [8]:
results = scan.run_explain(write_reports=True, sampling=True)

[--------------------] 2021-10-12 19:08:51.417783 - 0 of 1  (0.0% complete) - Starting scan with model_use_case_id: 'readmission' and scan_id: '9bdb2bc8dfd5'
[--------------------] 2021-10-12 19:08:51.417963 - 0 of 1  (0.0% complete) - Running sampling explanation evaluation for model: logit
[####################] 2021-10-12 19:11:07.835351 - 1 of 1  (100.0% complete) - Completed all evaluations


# View the Results

The results can be viewed in the Certifai console using the CLI command `certifai console`, run from this folder. 
Go to `http://localhost:8000` in your browser. 

The results can also be analyzed in this notebook or later in a separate notebook. See the [sixth notebook](patient-readmission-sampling-results.ipynb) for how to load and work with the results of the explanation scans in a separate notebook.