# Introduction
This notebook is the second in a series illustrating how to connect a model to Certifai, run a scan, and perform some simple analyses.

## Part 2 - Run a Certifai scan
In this notebook we'll set up a scan to run fairness and explanation analyses of the two models we created in Part 1.

First, we'll reload those trained models.

In [1]:
import pickle

with open('models.pkl', 'rb') as f:
    model_dict = pickle.load(f)

logistic_model = model_dict['logistic']
dtree_model = model_dict['dtree']

# Define the Certifai scan

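In this cell we define the details of the scan we want to perform: the prediction task and its possible outcomes, the evaluation types to run, and the features to assess fairness against.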

In [2]:
from certifai.scanner.builder import (CertifaiScanBuilder, CertifaiPredictorWrapper, CertifaiModel,
                                      CertifaiDataset, CertifaiGroupingFeature, CertifaiDatasetSource,
                                      CertifaiPredictionTask, CertifaiTaskOutcomes, CertifaiOutcomeValue)

task = CertifaiPredictionTask(CertifaiTaskOutcomes.classification(
    [
        CertifaiOutcomeValue(1, name='Loan granted', favorable=True),
        CertifaiOutcomeValue(2, name='Loan denied')
    ]),
    prediction_description='Determine whether a loan should be granted')

scan = CertifaiScanBuilder.create('German_credit_use_case',
                                  prediction_task=task)

# We want to get explanations
scan.add_evaluation_type('explanation')

# We also want to look at fairness with respect to a couple of features - 'age' and 'status'
scan.add_evaluation_type('fairness')
scan.add_fairness_grouping_feature(CertifaiGroupingFeature('age'))
scan.add_fairness_grouping_feature(CertifaiGroupingFeature('status'))
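
To give a feel for what a grouping feature means, the sketch below computes the share of favorable outcomes within each group of a feature. It is only an intuition aid under stated assumptions: it is not Certifai's fairness measure, and the toy records, group labels, and the `favorable_rate_by_group` helper are hypothetical. Only the feature name `age` and the outcome coding (1 = loan granted) come from the task definition above; the `outcome` column name is assumed.

```python
from collections import defaultdict

# Toy records for illustration only; the values and group labels are made up
toy_rows = [
    {'age': '<= 25 years', 'outcome': 1},
    {'age': '<= 25 years', 'outcome': 2},
    {'age': '> 25 years', 'outcome': 1},
    {'age': '> 25 years', 'outcome': 1},
]

FAVORABLE = 1  # 'Loan granted' in the task definition above


def favorable_rate_by_group(rows, group_col, outcome_col='outcome'):
    """Return {group: fraction of rows in that group with the favorable outcome}."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for row in rows:
        group = row[group_col]
        totals[group] += 1
        if row[outcome_col] == FAVORABLE:
            favorable[group] += 1
    return {group: favorable[group] / totals[group] for group in totals}


rates = favorable_rate_by_group(toy_rows, 'age')
print(rates)
```

A large gap between groups in a table like this is the kind of disparity a fairness evaluation is meant to surface, although the scan itself may quantify it differently.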

# Define the datasets to use
Here we tell Certifai which datasets to run the scans against. We can specify files or pre-loaded Pandas DataFrames. In this case we'll use the raw CSV file we already have.

In [3]:
# We'll just use the full composite of the train/test split we used earlier, but this could be any dataset
# conforming to the same schema
base_path = '..'
all_data_file = f"{base_path}/datasets/german_credit_eval.csv"

eval_dataset = CertifaiDataset('evaluation',
                               CertifaiDatasetSource.csv(all_data_file))
scan.add_dataset(eval_dataset)

# Because the dataset contains a ground truth outcome column which the model does not
# expect to receive as input we need to state that in the dataset schema (since it cannot
# be inferred from the CSV)
scan.dataset_schema.outcome_feature_name = 'outcome'

# The fairness scan uses the general 'evaluation' dataset
scan.evaluation_dataset_id = 'evaluation'

# Explanation scans use a separately specified 'explanation' dataset, because you typically don't want
# individual explanations for all your data, only for specific examples. However, here we'll just
# explain everything and use the dataset we already connected to
scan.explanation_dataset_id = 'evaluation'
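
Since the outcome column has to be named explicitly, it can help to confirm the CSV header before launching a scan. The following is a minimal sketch using only the standard library. `split_columns` is a hypothetical helper, and the inline CSV is made-up sample data standing in for `german_credit_eval.csv`, whose real columns are not listed in this notebook beyond `age`, `status`, and `outcome`.

```python
import csv
import io

# Made-up sample standing in for the real CSV file (illustration only)
sample_csv = io.StringIO(
    "age,status,outcome\n"
    "> 25 years,male : single,1\n"
    "<= 25 years,female : divorced/separated/married,2\n"
)


def split_columns(csv_file, outcome_col):
    """Read the header row and separate model inputs from the outcome column."""
    header = next(csv.reader(csv_file))
    if outcome_col not in header:
        raise ValueError(f"Column '{outcome_col}' not found in header: {header}")
    feature_cols = [col for col in header if col != outcome_col]
    return feature_cols, outcome_col


feature_cols, outcome_col = split_columns(sample_csv, 'outcome')
print(feature_cols)
```

To run the same check on the real data, open `all_data_file` with `open(all_data_file, newline='')` and pass the file object in place of `sample_csv`.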

# Attach the models to the scan
Finally, we need to define which models we are scanning, so here we attach the models we loaded earlier
to the scan.

In [4]:
scan.add_model(CertifaiModel('logistic',
                             local_predictor=CertifaiPredictorWrapper(logistic_model)))
scan.add_model(CertifaiModel('dtree',
                             local_predictor=CertifaiPredictorWrapper(dtree_model)))
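
A scan takes several minutes, so it may be worth checking beforehand that each model only returns the outcome values declared in the task (1 and 2). The sketch below assumes the models expose a scikit-learn-style `predict` method, which this notebook does not confirm. `StubModel` and `unexpected_labels` are hypothetical names used so the example runs on its own.

```python
ALLOWED_OUTCOMES = {1, 2}  # values declared in the CertifaiPredictionTask


class StubModel:
    """Hypothetical stand-in for a trained model (illustration only)."""

    def __init__(self, label):
        self.label = label

    def predict(self, rows):
        # Always returns the same label, one per input row
        return [self.label for _ in rows]


def unexpected_labels(model, rows, allowed=ALLOWED_OUTCOMES):
    """Return the set of predicted labels that the task does not declare."""
    return set(model.predict(rows)) - set(allowed)


sample_rows = [[0.1, 0.2], [0.3, 0.4]]  # made-up feature rows
good = unexpected_labels(StubModel(1), sample_rows)
bad = unexpected_labels(StubModel(0), sample_rows)
print(good, bad)
```

With the real models, `rows` would be a few feature rows from the evaluation data with the outcome column removed; an empty result suggests the model's labels match the task definition.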

# Run the scan
Now we'll run the scan and save the results in a scan report that can be read either by the Certifai Console
or by subsequent notebooks in this series.

In [5]:
_ = scan.run(write_reports=True)

Starting scan with model_use_case_id: 'German_credit_use_case' and scan_id: '3e98a7314ac6'
[--------------------] 2021-02-22 13:50:35.126748 - 0 of 4 reports (0.0% complete) - Running explanation evaluation for model: logistic
[#####---------------] 2021-02-22 13:57:38.843538 - 1 of 4 reports (25.0% complete) - Running fairness evaluation for model: logistic
[##########----------] 2021-02-22 13:59:24.914187 - 2 of 4 reports (50.0% complete) - Running explanation evaluation for model: dtree
[###############-----] 2021-02-22 14:04:25.272006 - 3 of 4 reports (75.0% complete) - Running fairness evaluation for model: dtree
[####################] 2021-02-22 14:06:07.034742 - 4 of 4 reports (100.0% complete) - Completed all evaluations
