# Tutorial: Build Trusted ML models with Certifai on Azure Notebooks

This tutorial picks up after part one of the Azure regression tutorial, in which you prepared the NYC taxi data for regression modeling. Parts 1 and 2 of that tutorial can be found under My Projects on the [Azure Notebooks Portal](https://notebooks.azure.com/).

In this tutorial, you learn:

> * How to set up a Certifai scan from scratch
> * How to run Explainability, Fairness and Robustness scans
> * How to explore their results
> * How to log Certifai results to the Azure portal

If you don't have an Azure subscription, create a [free account](https://aka.ms/AMLfree) before you begin. 

If you don't have the Certifai Toolkit, get it now from our [Certifai page](https://www.cognitivescale.com/download-certifai/).

## Usage/Preparation Steps
There are two ways to enjoy this tutorial:
1. Running on Azure Notebooks, by logging in to the [Azure Notebooks Portal](https://notebooks.azure.com/)
2. Running locally


### 1. Running on the Azure Notebooks Portal
1. In the [Azure Notebooks Portal](https://notebooks.azure.com/) go to My Projects. Use the "Clone Repo" button on the top right. Clone this project's [repo](https://github.com/mdungarov-cs/cortex_certifai_azure_notebooks_ny_taxi.git)
2. Download the toolkit from [Certifai page](https://www.cognitivescale.com/download-certifai/) and unzip
3. Upload the cat_encoder.py from certifai_toolkit/examples/notebooks
4. Upload the scanner, engine and from certifai_toolkit/packages folder
5. Upload the requirements.txt file from the certifai_toolkit folder
6. Use the terminal or your notebook to `pip install` the packages from step 4 (remember to `pip install -r` the requirements file from step 5).  
NB: If using the hosted terminal to install dependencies, ensure it is in the right environment and running the right Python version.  
NB: If using the notebook to install dependencies, you might need to restart the notebook after installation.
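
One way to avoid the environment mismatch mentioned in the notes above is to call pip through the interpreter that is running the notebook kernel. The sketch below only builds and prints the command. The helper name `pip_install_command` is made up for illustration, and `requirements.txt` is assumed to be the file uploaded in step 5.

```python
import sys

def pip_install_command(*args):
    """Build a pip command bound to the interpreter running this notebook."""
    return [sys.executable, "-m", "pip", "install", *args]

# Show which Python the kernel is using before installing anything.
print("Python", ".".join(map(str, sys.version_info[:3])), "at", sys.executable)

cmd = pip_install_command("-r", "requirements.txt")
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.check_call(cmd)
```

Because the command starts with `sys.executable`, the packages land in the same environment the notebook imports from, whichever Python the hosted terminal happens to default to.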

You are all set!

### 2. Running locally

1. Ensure you have a Python 3.x notebook server, then install the following on it.
2. Install the Azure SDK dependencies: `pip install --upgrade azureml-sdk[notebooks,automl,widgets]`
3. Follow the instructions to install the Certifai toolkit and dependencies from the [documentation page](https://cognitivescale.github.io/cortex-certifai/docs/toolkit/setup/install-certifai-cli-lib).

You are all set!

# Tutorial Contents

1. Data prep
2. Training an AutoML model
3. Model Selection for Certifai scan
4. Certifai Scan Setup
5. Review of Results and Evaluation

# Data prep

This part closely follows part 2 of the tutorial mentioned above, with minor changes to the data prep that Certifai needs in order to run properly.

We start by loading the data from part 1 and selecting the relevant columns. The result is stored as a csv file, because Certifai consumes the data directly from csv. Some of the runs can make do with a smaller dataset, so we also prepare a `_sample` dataset with only 500 rows to shorten the run time.

In [1]:
import os
import azureml.dataprep as dprep

file_path = os.path.join(os.getcwd(), "dflows.dprep")
dflow_prepared = dprep.Dataflow.open(file_path)

dflow_reduced = dflow_prepared.keep_columns(['pickup_weekday','pickup_hour', 'distance',
                                             'passengers', 'vendor', 'cost'])

df=dflow_reduced.to_pandas_dataframe()

# NOTE: DATASET CANNOT HAVE AN INDEX COLUMN FOR CERTIFAI
df.to_csv('all_data_NY_Taxi.csv',index=False)
df.sample(500,random_state=0).to_csv('all_data_NY_Taxi_sample.csv',index=False)
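
The note in the cell above says the dataset must not carry an index column. As a quick sanity check, the sketch below inspects a csv header using only the standard library. The helper name `has_index_column` and the two tiny csv strings are made up for illustration; the check relies on pandas writing an unnamed index as an empty first header field.

```python
import csv
import io

def has_index_column(csv_text):
    """Return True if the header's first field looks like a written-out pandas index."""
    header = next(csv.reader(io.StringIO(csv_text)))
    # pandas writes an unnamed index as an empty first header field;
    # reading such a file back with pandas names that column "Unnamed: 0".
    return header[0] == "" or header[0].startswith("Unnamed:")

with_index = ",distance,cost\n0,1.2,6.5\n1,3.4,12.0\n"
without_index = "distance,cost\n1.2,6.5\n3.4,12.0\n"

print(has_index_column(with_index))     # True
print(has_index_column(without_index))  # False
```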



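Here, we split the data into train and test sets and prepare the `CatEncoder` by specifying which columns are categorical. The encoder is imported from a separate file (`cat_encoder.py`) rather than defined in the notebook, which is what allows Python multi-processing to work in the context of a notebook.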

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import numpy as np
import random
from cat_encoder import CatEncoder

In [3]:

base_path = '.'
# File names must match the csv files written in the data prep cell (note the capital "T")
all_data_file = f"{base_path}/all_data_NY_Taxi.csv"
sample_data_file = f"{base_path}/all_data_NY_Taxi_sample.csv"
df = pd.read_csv(all_data_file)

label_column = 'cost'

# Separate outcome
y = df[label_column]
X = df.drop(label_column, axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

cat_columns = [
    'vendor',
    'pickup_weekday',
    'passengers'
    ]
# Note - to support python multi-processing in the context of a notebook the encoder MUST
# be in a separate file, which is why `CatEncoder` is defined outside of this notebook
encoder = CatEncoder(cat_columns, X)
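
To make concrete what an encoder of this kind does, here is a toy stand-in. It is not the toolkit's `CatEncoder`: the class name `ToyOneHotEncoder`, its constructor arguments, and the choice of one-hot encoding are all assumptions made for illustration. Like the real encoder, it is built from the categorical columns plus the data and is then called on rows to encode them.

```python
class ToyOneHotEncoder:
    """Hypothetical encoder: one-hot encodes chosen columns of row tuples."""

    def __init__(self, cat_indexes, rows):
        # Record the sorted distinct values seen in each categorical column.
        self.categories = {
            i: sorted({row[i] for row in rows}) for i in cat_indexes
        }

    def __call__(self, rows):
        encoded = []
        for row in rows:
            out = []
            for i, value in enumerate(row):
                if i in self.categories:
                    # One slot per known category, 1.0 where the value matches.
                    out.extend(1.0 if value == c else 0.0 for c in self.categories[i])
                else:
                    out.append(float(value))
            encoded.append(out)
        return encoded

rows = [("VTS", 2.5), ("CMT", 1.0)]  # (vendor, distance), made-up values
encoder_sketch = ToyOneHotEncoder(cat_indexes=[0], rows=rows)
print(encoder_sketch(rows))  # [[0.0, 1.0, 2.5], [1.0, 0.0, 1.0]]
```

If a class like this were used with multi-processing, it would live in its own importable `.py` file for the same reason `CatEncoder` does: worker processes typically need to import the class's module to rebuild the object.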


# Training AutoML

The cells below run Azure AutoML in the same way as the Azure regression tutorial, apart from the minor changes to the data and the model encoder made above.

We will:
- Log in to our Azure Workspace
- Set up and run AutoML
- Review results and select models for further review using Certifai

### Azure Workspace Login

- Type in your workspace credentials
- Use those to create a config file
- Build the workspace from the config file

Please note that you need to populate the appropriate credentials.

In [4]:
subscription_id = os.getenv("SUBSCRIPTION_ID", default="d5b574b0-fd80-4454-b38d-24730ca95b69")
resource_group = os.getenv("RESOURCE_GROUP", default="TaxiTest")
workspace_name = os.getenv("WORKSPACE_NAME", default="SteveTaxiTest")
workspace_region = os.getenv("WORKSPACE_REGION", default="eastus")

In [5]:
from azureml.core import Workspace

try:
    ws = Workspace.from_config()
except Exception:
    # Assuming you have set up a fresh resource group, and workspace, in the Azure console then deploy with the
    # following
    ws = Workspace.create(name=workspace_name,
                   subscription_id=subscription_id,
                   resource_group=resource_group,
                   create_resource_group=False,
                   location=workspace_region
                   )
    # write the workspace details to a configuration file in the notebook library
    ws.write_config()
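
For reference, the configuration file that `Workspace.from_config()` reads is a small JSON document holding the subscription ID, resource group and workspace name, by default at `.azureml/config.json`. The sketch below builds such a file by hand with the standard library. The helper name `build_workspace_config` and the placeholder values are assumptions for illustration, and the file is written to a temporary directory so that it cannot overwrite a real config.

```python
import json
import os
import tempfile

def build_workspace_config(subscription_id, resource_group, workspace_name):
    """Return the three fields an Azure ML workspace config.json holds."""
    return {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }

# Placeholder values; substitute your own or export the environment variables.
config = build_workspace_config(
    os.getenv("SUBSCRIPTION_ID", "<your-subscription-id>"),
    os.getenv("RESOURCE_GROUP", "<your-resource-group>"),
    os.getenv("WORKSPACE_NAME", "<your-workspace-name>"),
)

# Write under a temporary directory so the sketch leaves any real config untouched.
config_dir = os.path.join(tempfile.mkdtemp(), ".azureml")
os.makedirs(config_dir)
config_path = os.path.join(config_dir, "config.json")
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)

with open(config_path) as f:
    print(f.read())
```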

In [6]:
from azureml.core.experiment import Experiment
experiment = Experiment(ws, "taxi-experiment")

### Auto ML

Set up and run AutoML.

In [7]:
import logging

automl_settings = {
    "iteration_timeout_minutes": 2,
    "iterations": 20,
    "primary_metric": 'spearman_correlation',
    "featurization": 'auto',
    "verbosity": logging.INFO,
    "n_cross_validations": 5
}
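
The `primary_metric` chosen above, Spearman correlation, scores a model on how well it preserves the ordering of the targets rather than on their exact values. As a minimal illustration in pure Python (not the implementation AutoML uses), it can be computed as the Pearson correlation of the two rank vectors. The fares below are made up.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

actual_fares = [5.0, 7.5, 12.0, 30.0]
predicted = [6.0, 7.0, 15.0, 22.0]        # off in value, but same ordering
print(spearman(actual_fares, predicted))  # 1.0
```

The predictions are wrong by several dollars, yet the score is perfect because every ride is ranked in the right order.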

In [8]:
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(task='regression',
                             debug_log='automated_ml_errors.log',
                             X=encoder(np.array(X_train)),
                             y=y_train.values.flatten(),
                             **automl_settings)



In [9]:

from azureml.core.experiment import Experiment
experiment = Experiment(ws, "taxi-experiment")
local_run = experiment.submit(automl_config, show_output=True)

Running on local machine
Parent Run ID: AutoML_dff0aee0-d34f-466e-934c-f77a1bcba2a2

Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetFeaturizationCompleted. Completed fit featurizers and featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high car

In [10]:
from azureml.widgets import RunDetails
RunDetails(local_run).show()



# Model Selection for Certifai Scan

After training a range of models, we can now select a pair of the best-performing ones to evaluate with Certifai. Note that you can select any number of models to evaluate; we only pick two here for illustration.

From the above we see that the best-performing model is the Voting Ensemble, followed closely by a Stack Ensemble model. We will consider the Voting Ensemble to be our "Champion" model.

For our challenger, we will use the best-performing random-trees model (Model 12, Extreme Random Trees). This is simply to demonstrate that, in a Certifai context, model type does not make a difference: Certifai allows us to compare any selection of models.

In [11]:
champ_run, champion_model = local_run.get_output()

# Fetch the run and fitted model from iteration 12 (Extreme Random Trees)
challenger_run, challenger_model = local_run.get_output(iteration=12)


# Certifai Scan Setup

The sections below take us through the scan setup. We start with the library imports.

In [12]:
from certifai.scanner.builder import (CertifaiScanBuilder, CertifaiPredictorWrapper, CertifaiModel, CertifaiModelMetric,
                                      CertifaiDataset, CertifaiGroupingFeature, CertifaiDatasetSource,
                                      CertifaiPredictionTask, CertifaiTaskOutcomes, CertifaiOutcomeValue)

### Task definition

This is the core of what we will evaluate. Specifically, we will be running:

- A regression task, as opposed to classification. The task is defined as `CertifaiTaskOutcomes.regression`.
- The favorable outcome is a reduction of the outcome variable, i.e. we consider it favorable for the taxi ride to cost less rather than more: `increased_favorable=False`.
- A significant change is 50% of the empirical standard deviation: `change_std_deviation=0.5`. In this case, the empirical standard deviation is about $9.6, so we consider a significant change to be about $5.


We start defining a scan by assigning it this prediction task. In the remaining steps we will add further characteristics of the scan.

In [13]:

task = CertifaiPredictionTask(CertifaiTaskOutcomes.regression(
        increased_favorable=False,
        change_std_deviation=0.5),
    prediction_description='Predict taxi fare based on features of the trip')

scan = CertifaiScanBuilder.create('test_user_case',
                                  prediction_task=task)
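
The arithmetic behind the `change_std_deviation=0.5` setting described above can be checked in a few lines. This is a sketch of the calculation only, not of how Certifai computes it internally. The helper name `significant_change` and the toy fares are made up, and the use of the sample (rather than population) standard deviation is an assumption.

```python
import statistics

def significant_change(costs, change_std_deviation=0.5):
    """Threshold for a 'significant' change: a fraction of the sample std deviation."""
    return change_std_deviation * statistics.stdev(costs)

# With the tutorial's reported standard deviation of about $9.6:
print(round(0.5 * 9.6, 2))  # 4.8

# Same calculation from raw values (made-up fares, for illustration only).
toy_costs = [4.0, 6.0, 8.0, 10.0, 12.0]
print(round(significant_change(toy_costs), 3))  # 1.581
```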

### Adding Models & Data to the Scan

We add the selected models to the scan. Each model is wrapped in a `CertifaiPredictorWrapper`, together with the encoder, so that Certifai can call it as a local model.

We also add the datasets by specifying their locations, using the `all_data_file` and `sample_data_file` values we set earlier when preparing the data for training. Notice that these are simply pointers to the locations of the csv files.

In [14]:
# Wrap the model up for use by Certifai as a local model
champion_model_proxy = CertifaiPredictorWrapper(champion_model, encoder=encoder)
challenger_model_proxy = CertifaiPredictorWrapper(challenger_model, encoder=encoder)

# Add our local models
first_model = CertifaiModel('champion',
                            local_predictor=champion_model_proxy)
scan.add_model(first_model)

second_model = CertifaiModel('challenger',
                            local_predictor=challenger_model_proxy)
scan.add_model(second_model)

# Add the eval dataset
eval_dataset = CertifaiDataset('evaluation',
                               CertifaiDatasetSource.csv(all_data_file))
scan.add_dataset(eval_dataset)
# Add the (smaller) explanation dataset
explanation_dataset = CertifaiDataset('explanation',
                                      CertifaiDatasetSource.csv(sample_data_file))
scan.add_dataset(explanation_dataset)

scan.evaluation_dataset_id = 'evaluation'
# Explanations are generated on the 500-row sample dataset to shorten the run time
scan.explanation_dataset_id = 'explanation'

### Adding Evaluations

Adding evaluations is now very simple: list the ones you need, and the scan will run them and include them in the result object.

Notice that for 'explanation' and 'robustness', adding the evaluation type is sufficient to have the report run. For 'fairness', we also need to specify the sensitive feature we want to assess fairness for. In this case, that is the `passengers` feature.
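
The comments in the scan cell below describe the result object as a dictionary of dictionaries, keyed first by evaluation type and then by model id. A minimal sketch of walking such a structure follows. The helper name `iter_reports` is made up, and `fake_result` is a stand-in with empty placeholder reports, not the real report contents.

```python
def iter_reports(result):
    """Yield (evaluation_type, model_id, report) from a dict of dicts."""
    for evaluation_type, reports_by_model in result.items():
        for model_id, report in reports_by_model.items():
            yield evaluation_type, model_id, report

# Stand-in for the object returned by scan.run(); the report bodies here are
# placeholders, not the real report structure.
fake_result = {
    "robustness": {"champion": {}, "challenger": {}},
    "fairness": {"champion": {}, "challenger": {}},
}

for evaluation_type, model_id, _report in iter_reports(fake_result):
    print(evaluation_type, model_id)
```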

In [None]:
# Set up the explanation, robustness and fairness evaluations for the models and datasets added above
scan.add_evaluation_type('explanation')
scan.add_evaluation_type('robustness')
scan.add_evaluation_type('fairness')

scan.add_fairness_grouping_feature(CertifaiGroupingFeature('passengers'))


# Run the scan.
# By default this will write the results into individual report files (one per model and evaluation
# type) in the 'reports' directory relative to the Jupyter root.  This may be disabled by specifying
# `write_reports=False` as below
# The result is a dictionary of dictionaries of reports.  The top level dict key is the evaluation type
# and the second level key is model id.
# Reports saved as JSON (which `write_reports=True` will do) may be visualized in the console app
result = scan.run(write_reports=False)

2020-05-06 08:43:54,451 root   INFO     Validating license...
2020-05-06 08:43:54,452 root   INFO     License is valid - expires: n/a
2020-05-06 08:43:54,466 root   INFO     Generated unique scan id: 96f38c7aa131
2020-05-06 08:43:54,467 root   INFO     Validating input data...
2020-05-06 08:43:54,469 root   INFO     Creating dataset with id: evaluation
2020-05-06 08:43:54,484 root   INFO     Inferring dataset features and applying user overrides
2020-05-06 08:43:54,490 root   INFO     Reading configs from: /Users/sdraper/.certifai/certifai_config.ini
2020-05-06 08:43:54,492 root   INFO     Reading default config (fallback) from: /Users/sdraper/miniconda3/envs/notebooks/lib/python3.6/site-packages/certifai/common/utils/default_certifai_config.ini
2020-05-06 08:43:54,496 root   INFO     Read config marker: config['default']['marker'] = 0.1
2020-05-06 08:43:54,499 root   INFO     Integer-valued feature 'pickup_hour' inferred to be numeric (sample cardinality 24)
2020-05-06 08:43:54,500 ro

2020-05-06 08:53:43,549 root   INFO     Batch run time per generation for instances 288 to 319: 0.06200
2020-05-06 08:53:43,550 root   INFO     Processed 320 examples
2020-05-06 08:54:07,346 root   INFO     Batch run time per generation for instances 320 to 351: 0.06073
2020-05-06 08:54:07,347 root   INFO     Processed 352 examples
2020-05-06 08:54:30,942 root   INFO     Batch run time per generation for instances 352 to 383: 0.06059
2020-05-06 08:54:30,943 root   INFO     Processed 384 examples
2020-05-06 08:54:51,981 root   INFO     Batch run time per generation for instances 384 to 415: 0.06498
2020-05-06 08:54:51,982 root   INFO     Processed 416 examples
2020-05-06 08:55:15,954 root   INFO     Batch run time per generation for instances 416 to 447: 0.06378
2020-05-06 08:55:15,955 root   INFO     Processed 448 examples
2020-05-06 08:55:39,284 root   INFO     Batch run time per generation for instances 448 to 479: 0.06110
2020-05-06 08:55:39,285 root   INFO     Processed 480 example

2020-05-06 09:01:45,719 root   INFO     Current min non-exhausted protected class samples 480 (min for early stop 100)
2020-05-06 09:02:07,695 root   INFO     Batch run time per generation for instances 480 to 511: 0.06255
2020-05-06 09:02:07,696 root   INFO     Current max sampling error 0.017950496048613537 (max for early stop 0.005102040816326531)
2020-05-06 09:02:07,697 root   INFO     Current min non-exhausted protected class samples 512 (min for early stop 100)
2020-05-06 09:02:33,443 root   INFO     Batch run time per generation for instances 512 to 543: 0.06047
2020-05-06 09:02:33,444 root   INFO     Current max sampling error 0.017508137014938033 (max for early stop 0.005102040816326531)
2020-05-06 09:02:33,445 root   INFO     Current min non-exhausted protected class samples 544 (min for early stop 100)
2020-05-06 09:02:54,377 root   INFO     Batch run time per generation for instances 544 to 575: 0.06234
2020-05-06 09:02:54,378 root   INFO     Current max sampling error 0.01

2020-05-06 09:11:31,105 root   INFO     Batch run time per generation for instances 1216 to 1247: 0.06098
2020-05-06 09:11:31,106 root   INFO     Current max sampling error 0.011792712431171337 (max for early stop 0.005102040816326531)
2020-05-06 09:11:31,106 root   INFO     Current min non-exhausted protected class samples 1248 (min for early stop 100)
2020-05-06 09:11:53,709 root   INFO     Batch run time per generation for instances 1248 to 1279: 0.06005
2020-05-06 09:11:53,710 root   INFO     Current max sampling error 0.011628870232954964 (max for early stop 0.005102040816326531)
2020-05-06 09:11:53,711 root   INFO     Current min non-exhausted protected class samples 1280 (min for early stop 100)
2020-05-06 09:12:18,730 root   INFO     Batch run time per generation for instances 1280 to 1311: 0.06419
2020-05-06 09:12:18,731 root   INFO     Current max sampling error 0.01144427375921394 (max for early stop 0.005102040816326531)
2020-05-06 09:12:18,731 root   INFO     Current min n

2020-05-06 09:20:52,420 root   INFO     Current max sampling error 0.009522216268903377 (max for early stop 0.005102040816326531)
2020-05-06 09:20:52,421 root   INFO     Current min non-exhausted protected class samples 1984 (min for early stop 100)
2020-05-06 09:21:22,120 root   INFO     Batch run time per generation for instances 1984 to 2015: 0.07781
2020-05-06 09:21:22,121 root   INFO     Current max sampling error 0.00944478050955007 (max for early stop 0.005102040816326531)
2020-05-06 09:21:22,122 root   INFO     Current min non-exhausted protected class samples 2016 (min for early stop 100)
2020-05-06 09:21:48,984 root   INFO     Batch run time per generation for instances 2016 to 2047: 0.07343
2020-05-06 09:21:48,985 root   INFO     Current max sampling error 0.009369945617252124 (max for early stop 0.005102040816326531)
2020-05-06 09:21:48,986 root   INFO     Current min non-exhausted protected class samples 2048 (min for early stop 100)
2020-05-06 09:22:15,182 root   INFO    

2020-05-06 09:30:59,954 root   INFO     Current min non-exhausted protected class samples 2720 (min for early stop 100)
2020-05-06 09:31:26,400 root   INFO     Batch run time per generation for instances 2720 to 2751: 0.07392
2020-05-06 09:31:26,401 root   INFO     Current max sampling error 0.008184813751203385 (max for early stop 0.005102040816326531)
2020-05-06 09:31:26,402 root   INFO     Current min non-exhausted protected class samples 2752 (min for early stop 100)
2020-05-06 09:31:53,759 root   INFO     Batch run time per generation for instances 2752 to 2783: 0.07127
2020-05-06 09:31:53,760 root   INFO     Current max sampling error 0.008153659037083734 (max for early stop 0.005102040816326531)
2020-05-06 09:31:53,761 root   INFO     Current min non-exhausted protected class samples 2784 (min for early stop 100)
2020-05-06 09:32:19,765 root   INFO     Batch run time per generation for instances 2784 to 2815: 0.07107
2020-05-06 09:32:19,766 root   INFO     Current max sampling e

2020-05-06 09:41:17,378 root   INFO     Batch run time per generation for instances 3456 to 3487: 0.06832
2020-05-06 09:41:17,379 root   INFO     Current max sampling error 0.007261008886191098 (max for early stop 0.005102040816326531)
2020-05-06 09:41:17,380 root   INFO     Current min non-exhausted protected class samples 3488 (min for early stop 100)
2020-05-06 09:41:45,099 root   INFO     Batch run time per generation for instances 3488 to 3519: 0.07057
2020-05-06 09:41:45,100 root   INFO     Current max sampling error 0.007222794686667346 (max for early stop 0.005102040816326531)
2020-05-06 09:41:45,101 root   INFO     Current min non-exhausted protected class samples 3520 (min for early stop 100)
2020-05-06 09:42:09,273 root   INFO     Batch run time per generation for instances 3520 to 3551: 0.07072
2020-05-06 09:42:09,274 root   INFO     Current max sampling error 0.0071965750308973534 (max for early stop 0.005102040816326531)
2020-05-06 09:42:09,275 root   INFO     Current min

2020-05-06 09:51:29,795 root   INFO     Current max sampling error 0.006599716551313218 (max for early stop 0.005102040816326531)
2020-05-06 09:51:29,796 root   INFO     Current min non-exhausted protected class samples 4224 (min for early stop 100)
2020-05-06 09:51:56,181 root   INFO     Batch run time per generation for instances 4224 to 4255: 0.06689
2020-05-06 09:51:56,182 root   INFO     Current max sampling error 0.00658247431551084 (max for early stop 0.005102040816326531)
2020-05-06 09:51:56,183 root   INFO     Current min non-exhausted protected class samples 4256 (min for early stop 100)
2020-05-06 09:52:24,633 root   INFO     Batch run time per generation for instances 4256 to 4287: 0.07080
2020-05-06 09:52:24,634 root   INFO     Current max sampling error 0.0065590737800079795 (max for early stop 0.005102040816326531)
2020-05-06 09:52:24,634 root   INFO     Current min non-exhausted protected class samples 4288 (min for early stop 100)
2020-05-06 09:52:51,326 root   INFO   

2020-05-06 10:01:55,465 root   INFO     Current min non-exhausted protected class samples 4960 (min for early stop 100)
2020-05-06 10:02:22,564 root   INFO     Batch run time per generation for instances 4960 to 4991: 0.06985
2020-05-06 10:02:22,565 root   INFO     Current max sampling error 0.006056555482230735 (max for early stop 0.005102040816326531)
2020-05-06 10:02:22,566 root   INFO     Current min non-exhausted protected class samples 4992 (min for early stop 100)
2020-05-06 10:02:50,032 root   INFO     Batch run time per generation for instances 4992 to 5023: 0.07297
2020-05-06 10:02:50,033 root   INFO     Current max sampling error 0.006038769560288824 (max for early stop 0.005102040816326531)
2020-05-06 10:02:50,034 root   INFO     Current min non-exhausted protected class samples 5024 (min for early stop 100)
2020-05-06 10:03:19,124 root   INFO     Batch run time per generation for instances 5024 to 5055: 0.07310
2020-05-06 10:03:19,125 root   INFO     Current max sampling e

2020-05-06 10:13:00,213 root   INFO     Batch run time per generation for instances 5696 to 5727: 0.07396
2020-05-06 10:13:00,214 root   INFO     Current max sampling error 0.005664536556836592 (max for early stop 0.005102040816326531)
2020-05-06 10:13:00,215 root   INFO     Current min non-exhausted protected class samples 5728 (min for early stop 100)
2020-05-06 10:13:27,604 root   INFO     Batch run time per generation for instances 5728 to 5759: 0.07256
2020-05-06 10:13:27,605 root   INFO     Current max sampling error 0.005642253656934809 (max for early stop 0.005102040816326531)
2020-05-06 10:13:27,606 root   INFO     Current min non-exhausted protected class samples 5760 (min for early stop 100)
2020-05-06 10:13:57,554 root   INFO     Batch run time per generation for instances 5760 to 5791: 0.06917
2020-05-06 10:13:57,555 root   INFO     Current max sampling error 0.005625484082465779 (max for early stop 0.005102040816326531)
2020-05-06 10:13:57,556 root   INFO     Current min 

2020-05-06 10:40:34,870 root   INFO     Running analysis variant 'prediction decreased'
2020-05-06 10:40:35,779 root   INFO     Running with explanation reduction ON
2020-05-06 10:40:35,780 root   INFO     Total dataset size is 6246
2020-05-06 10:41:00,311 root   INFO     Batch run time per generation for instances 0 to 31: 0.08531
2020-05-06 10:41:00,312 root   INFO     Current max sampling error 0.44721358953711005 (max for early stop 0.005102040816326531)
2020-05-06 10:41:00,313 root   INFO     Current min non-exhausted protected class samples 5 (min for early stop 100)
2020-05-06 10:41:20,068 root   INFO     Batch run time per generation for instances 32 to 63: 0.06337
2020-05-06 10:41:20,069 root   INFO     Current max sampling error 0.41548919416105584 (max for early stop 0.005102040816326531)
2020-05-06 10:41:20,070 root   INFO     Current min non-exhausted protected class samples 10 (min for early stop 100)
2020-05-06 10:41:39,458 root   INFO     Batch run time per generation f

2020-05-06 10:48:41,432 root   INFO     Batch run time per generation for instances 704 to 735: 0.06437
2020-05-06 10:48:41,433 root   INFO     Current max sampling error 0.1039858634812004 (max for early stop 0.005102040816326531)
2020-05-06 10:48:41,434 root   INFO     Current min non-exhausted protected class samples 139 (min for early stop 100)
2020-05-06 10:49:04,954 root   INFO     Batch run time per generation for instances 736 to 767: 0.06610
2020-05-06 10:49:04,955 root   INFO     Current max sampling error 0.10511661561925002 (max for early stop 0.005102040816326531)
2020-05-06 10:49:04,956 root   INFO     Current min non-exhausted protected class samples 145 (min for early stop 100)
2020-05-06 10:49:28,345 root   INFO     Batch run time per generation for instances 768 to 799: 0.06726
2020-05-06 10:49:28,346 root   INFO     Current max sampling error 0.10322873413149175 (max for early stop 0.005102040816326531)
2020-05-06 10:49:28,347 root   INFO     Current min non-exhauste

2020-05-06 10:56:40,766 root   INFO     Current min non-exhausted protected class samples 341 (min for early stop 100)
2020-05-06 10:57:01,557 root   INFO     Batch run time per generation for instances 1440 to 1471: 0.06668
2020-05-06 10:57:01,559 root   INFO     Current max sampling error 0.06540773356461144 (max for early stop 0.005102040816326531)
2020-05-06 10:57:01,559 root   INFO     Current min non-exhausted protected class samples 351 (min for early stop 100)
2020-05-06 10:57:23,840 root   INFO     Batch run time per generation for instances 1472 to 1503: 0.06969
2020-05-06 10:57:23,841 root   INFO     Current max sampling error 0.06446796113267807 (max for early stop 0.005102040816326531)
2020-05-06 10:57:23,842 root   INFO     Current min non-exhausted protected class samples 362 (min for early stop 100)
2020-05-06 10:57:44,568 root   INFO     Batch run time per generation for instances 1504 to 1535: 0.06502
2020-05-06 10:57:44,569 root   INFO     Current max sampling error 

2020-05-06 11:05:23,449 root   INFO     Batch run time per generation for instances 2176 to 2207: 0.06615
2020-05-06 11:05:23,450 root   INFO     Current max sampling error 0.04975838419050608 (max for early stop 0.005102040816326531)
2020-05-06 11:05:23,451 root   INFO     Current min non-exhausted protected class samples 597 (min for early stop 100)
2020-05-06 11:05:45,179 root   INFO     Batch run time per generation for instances 2208 to 2239: 0.06071
2020-05-06 11:05:45,180 root   INFO     Current max sampling error 0.04916916295167297 (max for early stop 0.005102040816326531)
2020-05-06 11:05:45,180 root   INFO     Current min non-exhausted protected class samples 607 (min for early stop 100)
2020-05-06 11:06:04,370 root   INFO     Batch run time per generation for instances 2240 to 2271: 0.06909
2020-05-06 11:06:04,371 root   INFO     Current max sampling error 0.048800312632972825 (max for early stop 0.005102040816326531)
2020-05-06 11:06:04,372 root   INFO     Current min non-

2020-05-06 11:13:36,822 root   INFO     Batch run time per generation for instances 2912 to 2943: 0.06691
2020-05-06 11:13:36,823 root   INFO     Current max sampling error 0.03256308151538981 (max for early stop 0.005102040816326531)
2020-05-06 11:13:36,824 root   INFO     Current min non-exhausted protected class samples 937 (min for early stop 100)
2020-05-06 11:14:03,477 root   INFO     Batch run time per generation for instances 2944 to 2975: 0.06425
2020-05-06 11:14:03,478 root   INFO     Current max sampling error 0.03208722858710107 (max for early stop 0.005102040816326531)
2020-05-06 11:14:03,479 root   INFO     Current min non-exhausted protected class samples 953 (min for early stop 100)
2020-05-06 11:14:23,318 root   INFO     Batch run time per generation for instances 2976 to 3007: 0.06775
2020-05-06 11:14:23,319 root   INFO     Current max sampling error 0.0316687048448096 (max for early stop 0.005102040816326531)
2020-05-06 11:14:23,320 root   INFO     Current min non-ex

2020-05-06 11:22:14,142 root   INFO     Current max sampling error 0.028349792569272166 (max for early stop 0.005102040816326531)
2020-05-06 11:22:14,142 root   INFO     Current min non-exhausted protected class samples 1305 (min for early stop 100)
2020-05-06 11:22:35,085 root   INFO     Batch run time per generation for instances 3680 to 3711: 0.07177
2020-05-06 11:22:35,087 root   INFO     Current max sampling error 0.02841695719469219 (max for early stop 0.005102040816326531)
2020-05-06 11:22:35,088 root   INFO     Current min non-exhausted protected class samples 1321 (min for early stop 100)
2020-05-06 11:22:57,781 root   INFO     Batch run time per generation for instances 3712 to 3743: 0.06965
2020-05-06 11:22:57,782 root   INFO     Current max sampling error 0.028199304829438016 (max for early stop 0.005102040816326531)
2020-05-06 11:22:57,783 root   INFO     Current min non-exhausted protected class samples 1337 (min for early stop 100)
2020-05-06 11:23:23,674 root   INFO    

2020-05-06 11:30:24,144 root   INFO     Batch run time per generation for instances 4384 to 4415: 0.06806
2020-05-06 11:30:24,145 root   INFO     Current max sampling error 0.02206685114410366 (max for early stop 0.005102040816326531)
2020-05-06 11:30:24,146 root   INFO     Current min non-exhausted protected class samples 1731 (min for early stop 100)
2020-05-06 11:30:44,539 root   INFO     Batch run time per generation for instances 4416 to 4447: 0.06602
2020-05-06 11:30:44,540 root   INFO     Current max sampling error 0.021851813053330418 (max for early stop 0.005102040816326531)
2020-05-06 11:30:44,541 root   INFO     Current min non-exhausted protected class samples 1763 (min for early stop 100)
2020-05-06 11:31:04,437 root   INFO     Batch run time per generation for instances 4448 to 4479: 0.06636
2020-05-06 11:31:04,438 root   INFO     Current max sampling error 0.02155785752058456 (max for early stop 0.005102040816326531)
2020-05-06 11:31:04,439 root   INFO     Current min no

2020-05-06 11:38:27,209 root   INFO     Current max sampling error 0.01880722831376587 (max for early stop 0.005102040816326531)
2020-05-06 11:38:27,210 root   INFO     Current min non-exhausted protected class samples 2467 (min for early stop 100)
2020-05-06 11:38:47,463 root   INFO     Batch run time per generation for instances 5152 to 5183: 0.06252
2020-05-06 11:38:47,464 root   INFO     Current max sampling error 0.018704641954426084 (max for early stop 0.005102040816326531)
2020-05-06 11:38:47,465 root   INFO     Current min non-exhausted protected class samples 2499 (min for early stop 100)
2020-05-06 11:39:07,222 root   INFO     Batch run time per generation for instances 5184 to 5215: 0.06195
2020-05-06 11:39:07,223 root   INFO     Current max sampling error 0.018605002156300076 (max for early stop 0.005102040816326531)
2020-05-06 11:39:07,224 root   INFO     Current min non-exhausted protected class samples 2531 (min for early stop 100)
2020-05-06 11:39:28,497 root   INFO    

2020-05-06 11:46:16,379 root   INFO     Current min non-exhausted protected class samples 3203 (min for early stop 100)
2020-05-06 11:46:40,151 root   INFO     Batch run time per generation for instances 5888 to 5919: 0.07079
2020-05-06 11:46:40,152 root   INFO     Current max sampling error 0.018071327253873875 (max for early stop 0.005102040816326531)
2020-05-06 11:46:40,152 root   INFO     Current min non-exhausted protected class samples 3235 (min for early stop 100)
2020-05-06 11:46:57,432 root   INFO     Batch run time per generation for instances 5920 to 5951: 0.06675
2020-05-06 11:46:57,433 root   INFO     Current max sampling error 0.018011133205650544 (max for early stop 0.005102040816326531)
2020-05-06 11:46:57,434 root   INFO     Current min non-exhausted protected class samples 3267 (min for early stop 100)
2020-05-06 11:47:16,175 root   INFO     Batch run time per generation for instances 5952 to 5983: 0.06672
2020-05-06 11:47:16,176 root   INFO     Current max sampling e

2020-05-06 11:50:16,877 root   INFO     Running without scaler, using identity scaler.
2020-05-06 11:50:16,882 root   INFO     Running analysis variant 'prediction increased'
2020-05-06 11:50:17,623 root   INFO     Running with explanation reduction ON
2020-05-06 11:50:17,624 root   INFO     Total dataset size is 500
2020-05-06 11:50:34,475 root   INFO     Batch run time per generation for instances 0 to 31: 0.03697
2020-05-06 11:50:34,475 root   INFO     Processed 32 examples
2020-05-06 11:50:47,162 root   INFO     Batch run time per generation for instances 32 to 63: 0.03133
2020-05-06 11:50:47,163 root   INFO     Processed 64 examples
2020-05-06 11:50:59,337 root   INFO     Batch run time per generation for instances 64 to 95: 0.03265
2020-05-06 11:50:59,337 root   INFO     Processed 96 examples
2020-05-06 11:51:12,624 root   INFO     Batch run time per generation for instances 96 to 127: 0.03078
2020-05-06 11:51:12,625 root   INFO     Processed 128 examples
2020-05-06 11:51:25,098 

2020-05-06 11:57:15,576 root   INFO     Current max sampling error 0.05089357448306229 (max for early stop 0.005102040816326531)
2020-05-06 11:57:15,577 root   INFO     Current min non-exhausted protected class samples 64 (min for early stop 100)
2020-05-06 11:57:27,755 root   INFO     Batch run time per generation for instances 64 to 95: 0.03338
2020-05-06 11:57:27,756 root   INFO     Current max sampling error 0.043334796693234694 (max for early stop 0.005102040816326531)
2020-05-06 11:57:27,757 root   INFO     Current min non-exhausted protected class samples 96 (min for early stop 100)
2020-05-06 11:57:40,788 root   INFO     Batch run time per generation for instances 96 to 127: 0.03325
2020-05-06 11:57:40,788 root   INFO     Current max sampling error 0.03712743531463795 (max for early stop 0.005102040816326531)
2020-05-06 11:57:40,789 root   INFO     Current min non-exhausted protected class samples 128 (min for early stop 100)
2020-05-06 11:57:53,425 root   INFO     Batch run ti

2020-05-06 12:02:06,739 root   INFO     Current min non-exhausted protected class samples 800 (min for early stop 100)
2020-05-06 12:02:21,510 root   INFO     Batch run time per generation for instances 800 to 831: 0.03818
2020-05-06 12:02:21,511 root   INFO     Current max sampling error 0.014394652559826764 (max for early stop 0.005102040816326531)
2020-05-06 12:02:21,511 root   INFO     Current min non-exhausted protected class samples 832 (min for early stop 100)
2020-05-06 12:02:34,521 root   INFO     Batch run time per generation for instances 832 to 863: 0.03278
2020-05-06 12:02:34,522 root   INFO     Current max sampling error 0.014267289844870036 (max for early stop 0.005102040816326531)
2020-05-06 12:02:34,523 root   INFO     Current min non-exhausted protected class samples 864 (min for early stop 100)
2020-05-06 12:02:47,114 root   INFO     Batch run time per generation for instances 864 to 895: 0.03480
2020-05-06 12:02:47,115 root   INFO     Current max sampling error 0.01

2020-05-06 12:07:18,309 root   INFO     Batch run time per generation for instances 1536 to 1567: 0.03367
2020-05-06 12:07:18,309 root   INFO     Current max sampling error 0.010862500405076914 (max for early stop 0.005102040816326531)
2020-05-06 12:07:18,310 root   INFO     Current min non-exhausted protected class samples 1568 (min for early stop 100)
2020-05-06 12:07:30,915 root   INFO     Batch run time per generation for instances 1568 to 1599: 0.03474
2020-05-06 12:07:30,916 root   INFO     Current max sampling error 0.010752685246707897 (max for early stop 0.005102040816326531)
2020-05-06 12:07:30,917 root   INFO     Current min non-exhausted protected class samples 1600 (min for early stop 100)
2020-05-06 12:07:43,351 root   INFO     Batch run time per generation for instances 1600 to 1631: 0.03238
2020-05-06 12:07:43,352 root   INFO     Current max sampling error 0.010642438376489963 (max for early stop 0.005102040816326531)
2020-05-06 12:07:43,352 root   INFO     Current min 

2020-05-06 12:12:15,526 root   INFO     Current max sampling error 0.00906375939776098 (max for early stop 0.005102040816326531)
2020-05-06 12:12:15,526 root   INFO     Current min non-exhausted protected class samples 2304 (min for early stop 100)
2020-05-06 12:12:29,028 root   INFO     Batch run time per generation for instances 2304 to 2335: 0.03545
2020-05-06 12:12:29,029 root   INFO     Current max sampling error 0.009019522237750304 (max for early stop 0.005102040816326531)
2020-05-06 12:12:29,030 root   INFO     Current min non-exhausted protected class samples 2336 (min for early stop 100)
2020-05-06 12:12:41,219 root   INFO     Batch run time per generation for instances 2336 to 2367: 0.03183
2020-05-06 12:12:41,220 root   INFO     Current max sampling error 0.008955080692227321 (max for early stop 0.005102040816326531)
2020-05-06 12:12:41,221 root   INFO     Current min non-exhausted protected class samples 2368 (min for early stop 100)
2020-05-06 12:12:53,766 root   INFO    

2020-05-06 12:17:01,323 root   INFO     Current min non-exhausted protected class samples 3040 (min for early stop 100)
2020-05-06 12:17:12,999 root   INFO     Batch run time per generation for instances 3040 to 3071: 0.03222
2020-05-06 12:17:13,000 root   INFO     Current max sampling error 0.00798634924348838 (max for early stop 0.005102040816326531)
2020-05-06 12:17:13,001 root   INFO     Current min non-exhausted protected class samples 3072 (min for early stop 100)
2020-05-06 12:17:26,639 root   INFO     Batch run time per generation for instances 3072 to 3103: 0.03182
2020-05-06 12:17:26,640 root   INFO     Current max sampling error 0.007936658774854725 (max for early stop 0.005102040816326531)
2020-05-06 12:17:26,641 root   INFO     Current min non-exhausted protected class samples 3104 (min for early stop 100)
2020-05-06 12:17:38,831 root   INFO     Batch run time per generation for instances 3104 to 3135: 0.03400
2020-05-06 12:17:38,832 root   INFO     Current max sampling er

2020-05-06 12:22:03,989 root   INFO     Batch run time per generation for instances 3776 to 3807: 0.03315
2020-05-06 12:22:03,990 root   INFO     Current max sampling error 0.007139209392716991 (max for early stop 0.005102040816326531)
2020-05-06 12:22:03,991 root   INFO     Current min non-exhausted protected class samples 3808 (min for early stop 100)
2020-05-06 12:22:16,468 root   INFO     Batch run time per generation for instances 3808 to 3839: 0.03207
2020-05-06 12:22:16,468 root   INFO     Current max sampling error 0.007106805266777703 (max for early stop 0.005102040816326531)
2020-05-06 12:22:16,469 root   INFO     Current min non-exhausted protected class samples 3840 (min for early stop 100)
2020-05-06 12:22:28,331 root   INFO     Batch run time per generation for instances 3840 to 3871: 0.03332
2020-05-06 12:22:28,332 root   INFO     Current max sampling error 0.007082524846294007 (max for early stop 0.005102040816326531)
2020-05-06 12:22:28,333 root   INFO     Current min 

2020-05-06 12:27:01,661 root   INFO     Current max sampling error 0.006550889493647335 (max for early stop 0.005102040816326531)
2020-05-06 12:27:01,662 root   INFO     Current min non-exhausted protected class samples 4544 (min for early stop 100)
2020-05-06 12:27:14,720 root   INFO     Batch run time per generation for instances 4544 to 4575: 0.03486
2020-05-06 12:27:14,721 root   INFO     Current max sampling error 0.0065197149019744745 (max for early stop 0.005102040816326531)
2020-05-06 12:27:14,721 root   INFO     Current min non-exhausted protected class samples 4576 (min for early stop 100)
2020-05-06 12:27:28,876 root   INFO     Batch run time per generation for instances 4576 to 4607: 0.03542
2020-05-06 12:27:28,877 root   INFO     Current max sampling error 0.006492191652079503 (max for early stop 0.005102040816326531)
2020-05-06 12:27:28,878 root   INFO     Current min non-exhausted protected class samples 4608 (min for early stop 100)
2020-05-06 12:27:41,334 root   INFO  

2020-05-06 12:32:14,026 root   INFO     Current min non-exhausted protected class samples 5280 (min for early stop 100)
2020-05-06 12:32:28,304 root   INFO     Batch run time per generation for instances 5280 to 5311: 0.03552
2020-05-06 12:32:28,305 root   INFO     Current max sampling error 0.006041161755316982 (max for early stop 0.005102040816326531)
2020-05-06 12:32:28,306 root   INFO     Current min non-exhausted protected class samples 5312 (min for early stop 100)
2020-05-06 12:32:40,408 root   INFO     Batch run time per generation for instances 5312 to 5343: 0.03392
2020-05-06 12:32:40,409 root   INFO     Current max sampling error 0.006024301530618477 (max for early stop 0.005102040816326531)
2020-05-06 12:32:40,409 root   INFO     Current min non-exhausted protected class samples 5344 (min for early stop 100)
2020-05-06 12:32:53,622 root   INFO     Batch run time per generation for instances 5344 to 5375: 0.03477
2020-05-06 12:32:53,623 root   INFO     Current max sampling e

2020-05-06 12:37:44,833 root   INFO     Batch run time per generation for instances 6016 to 6047: 0.03663
2020-05-06 12:37:44,834 root   INFO     Current max sampling error 0.005676117486439008 (max for early stop 0.005102040816326531)
2020-05-06 12:37:44,834 root   INFO     Current min non-exhausted protected class samples 6048 (min for early stop 100)
2020-05-06 12:37:57,829 root   INFO     Batch run time per generation for instances 6048 to 6079: 0.03493
2020-05-06 12:37:57,830 root   INFO     Current max sampling error 0.005665926131422506 (max for early stop 0.005102040816326531)
2020-05-06 12:37:57,831 root   INFO     Current min non-exhausted protected class samples 6080 (min for early stop 100)
2020-05-06 12:38:10,356 root   INFO     Batch run time per generation for instances 6080 to 6111: 0.03578
2020-05-06 12:38:10,357 root   INFO     Current max sampling error 0.0056489235953245155 (max for early stop 0.005102040816326531)
2020-05-06 12:38:10,357 root   INFO     Current min

2020-05-06 13:01:45,205 root   INFO     Current min non-exhausted protected class samples 49 (min for early stop 100)
2020-05-06 13:01:55,750 root   INFO     Batch run time per generation for instances 288 to 319: 0.03482
2020-05-06 13:01:55,751 root   INFO     Current max sampling error 0.185043553443403 (max for early stop 0.005102040816326531)
2020-05-06 13:01:55,752 root   INFO     Current min non-exhausted protected class samples 55 (min for early stop 100)
2020-05-06 13:02:05,969 root   INFO     Batch run time per generation for instances 320 to 351: 0.04010
2020-05-06 13:02:05,970 root   INFO     Current max sampling error 0.1801967140892845 (max for early stop 0.005102040816326531)
2020-05-06 13:02:05,970 root   INFO     Current min non-exhausted protected class samples 62 (min for early stop 100)
2020-05-06 13:02:17,476 root   INFO     Batch run time per generation for instances 352 to 383: 0.03521
2020-05-06 13:02:17,477 root   INFO     Current max sampling error 0.1717701385

2020-05-06 13:05:56,882 root   INFO     Current min non-exhausted protected class samples 202 (min for early stop 100)
2020-05-06 13:06:06,690 root   INFO     Batch run time per generation for instances 1024 to 1055: 0.03542
2020-05-06 13:06:06,691 root   INFO     Current max sampling error 0.09459216997068481 (max for early stop 0.005102040816326531)
2020-05-06 13:06:06,691 root   INFO     Current min non-exhausted protected class samples 213 (min for early stop 100)
2020-05-06 13:06:16,398 root   INFO     Batch run time per generation for instances 1056 to 1087: 0.03679
2020-05-06 13:06:16,399 root   INFO     Current max sampling error 0.09186286161839087 (max for early stop 0.005102040816326531)
2020-05-06 13:06:16,400 root   INFO     Current min non-exhausted protected class samples 223 (min for early stop 100)
2020-05-06 13:06:27,842 root   INFO     Batch run time per generation for instances 1088 to 1119: 0.03791
2020-05-06 13:06:27,843 root   INFO     Current max sampling error 

2020-05-06 13:10:16,559 root   INFO     Batch run time per generation for instances 1760 to 1791: 0.03743
2020-05-06 13:10:16,559 root   INFO     Current max sampling error 0.06540821265635777 (max for early stop 0.005102040816326531)
2020-05-06 13:10:16,560 root   INFO     Current min non-exhausted protected class samples 458 (min for early stop 100)
2020-05-06 13:10:27,415 root   INFO     Batch run time per generation for instances 1792 to 1823: 0.03707
2020-05-06 13:10:27,416 root   INFO     Current max sampling error 0.06453822003416465 (max for early stop 0.005102040816326531)
2020-05-06 13:10:27,417 root   INFO     Current min non-exhausted protected class samples 469 (min for early stop 100)
2020-05-06 13:10:37,766 root   INFO     Batch run time per generation for instances 1824 to 1855: 0.03621
2020-05-06 13:10:37,767 root   INFO     Current max sampling error 0.06378241517952105 (max for early stop 0.005102040816326531)
2020-05-06 13:10:37,767 root   INFO     Current min non-e

# Review of Certifai Scan Results

The scan stores its results in the `result` variable, a large nested structure. Here we preview only the high-level outcomes.

## Explanations
We start with model-level explanations to gauge the main drivers of taxi fares.

To do this, we count how many times each feature was changed to generate a counterfactual, which gives a measure of its overall importance to the model.

The outcome is not surprising: by far the most important feature is distance traveled. Pickup hour comes second, at only about a third of the frequency of distance, which also makes sense because late evening and early morning hours are in some cases charged differently. The number of passengers and the vendor used seem to play the smallest role in determining the fare. All of this is reassuring, as it matches our expectations of how taxi fares work and supports the sensibility of the models we have in place.

Note that this level of detail is rarely available from simply training a model, and interpretability can be lost when latent or dummy features are generated along the way to make the model work. This analysis, however, is crucial for our understanding of, confidence in, and ultimately trust in the behavior of the model.
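The counting idea described above can be sketched in a few lines. This is only a minimal illustration, not the implementation behind `get_feature_frequency` or `plot_histogram`: the function name `count_changed_features`, the feature names, and the toy rows are all hypothetical, and the real scan output is structured differently.

```python
from collections import Counter


def count_changed_features(originals, counterfactuals, feature_names):
    """Count how often each feature differs between an instance and its counterfactual."""
    # Start every feature at zero so unchanged features still appear in the output
    counts = Counter({name: 0 for name in feature_names})
    for original, counterfactual in zip(originals, counterfactuals):
        for name, before, after in zip(feature_names, original, counterfactual):
            if before != after:
                counts[name] += 1
    return counts


# Hypothetical toy data, not taken from the scan
feature_names = ["distance", "pickup_hour", "passengers", "vendor"]
originals = [
    [2.5, 8, 1, "A"],
    [7.0, 23, 2, "B"],
    [1.2, 14, 1, "A"],
]
counterfactuals = [
    [3.1, 8, 1, "A"],   # only distance changed
    [6.2, 5, 2, "B"],   # distance and pickup hour changed
    [1.9, 14, 1, "A"],  # only distance changed
]

frequency = count_changed_features(originals, counterfactuals, feature_names)
for name, count in frequency.most_common():
    print(f"{name}: {count}")
```

A feature that has to change in most counterfactuals is one the model relies on heavily, which is why the resulting counts can be read as a rough importance ranking.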

In [None]:
%load_ext autoreload

In [None]:
%autoreload 2
from tutorial_utils import get_feature_frequency, plot_histogram, plot_fairness_burden

In [None]:
# Plot a histogram of frequency of occurrence of changes to each feature in counterfactuals
%matplotlib inline
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=[15,6])
fig.suptitle('Feature occurrence frequency by model', fontsize=20)

plot_histogram(ax1, 'champion', result)
plot_histogram(ax2, 'challenger', result)

# Put them both on the same scale
ylim = max(ax1.get_ylim()[1], ax2.get_ylim()[1])
ax1.set_ylim(top=ylim)
ax2.set_ylim(top=ylim)

plt.show()

## Fairness to different sized groups

Finally, we assess whether taxi fares are fair across differently sized groups of riders.

The results suggest that the burden is largely the same for all groups, with groups of 4 passengers appearing to carry a slightly higher burden than the others. However, that estimate also comes with a much wider confidence range which, when taken into account, makes it not significantly different from the other groups. Groups of 4 are also far less common in the sample: only 41 examples out of approximately 6,250.
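To see why a small group produces a wide confidence range, consider the following sketch. It uses a plain normal approximation (mean ± 1.96 standard errors), which is an assumption made for illustration and not necessarily how Certifai computes its intervals. The burden means, the standard deviation, and the larger group size are invented numbers; only the group size of 41 comes from the text above.

```python
import math


def standard_error(std_dev, n):
    """Standard error of a mean estimated from n samples."""
    return std_dev / math.sqrt(n)


def confidence_interval(mean, std_dev, n, z=1.96):
    """Approximate 95% interval under a normal approximation."""
    half_width = z * standard_error(std_dev, n)
    return (mean - half_width, mean + half_width)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    (a_low, a_high), (b_low, b_high) = a, b
    return a_low <= b_high and b_low <= a_high


# Hypothetical burden statistics: same spread, very different group sizes
small_group = confidence_interval(mean=0.08, std_dev=0.1, n=41)
large_group = confidence_interval(mean=0.06, std_dev=0.1, n=4000)

print("small group interval:", small_group)
print("large group interval:", large_group)
print("overlap:", intervals_overlap(small_group, large_group))
```

Because the standard error shrinks with the square root of the sample size, the interval for 41 examples is roughly ten times wider than the one for 4,000, so a higher point estimate alone is weak evidence of a real difference.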


In [None]:
%matplotlib inline

from certifai.scanner.report_utils import scores, construct_scores_dataframe


df_rslt = construct_scores_dataframe(scores('fairness', result, max_depth=1))
display(df_rslt)

# Passenger-count categories (1 to 5) and matching x-axis labels
group_categories = [f"({i}.0)" for i in range(1, 6)]
group_xlabels = ['single'] + [f'{i} passengers' for i in range(2, 6)]

plot_fairness_burden(df_rslt, group_categories, group_xlabels)

# Logging Results

Finally, using the workspace we already set up, we assign a dedicated experiment to these runs. Here we can log some key metrics for future reference.

In [None]:

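from azureml.core import Experiment  # only needed if not already imported earlier in the notebook

# Create (or reuse) a dedicated experiment in the existing workspace `ws`
experiment = Experiment(ws, "certifai-rslt")

# Log the fairness score of each model as a run metric
run = experiment.start_logging()
run.log('Fairness-Champion', value=result['fairness']['champion']['fairness']['score'])
run.log('Fairness-Challenger', value=result['fairness']['challenger']['fairness']['score'])
run.complete()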
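If more models or report types are added later, writing one `run.log` call per score becomes repetitive. The sketch below shows one way to flatten the scores into a name-to-value mapping first. The helper `collect_scores` and the `mock_result` values are hypothetical; the only thing taken from the notebook is the nesting `result[report][model][metric]['score']` used in the cell above, and the real `result` object may contain additional keys.

```python
def collect_scores(result, report_type, metric):
    """Return {"<Metric>-<Model>": score} for every model under result[report_type]."""
    metrics = {}
    for model_name, model_report in result.get(report_type, {}).items():
        score = model_report.get(metric, {}).get("score")
        if score is not None:  # skip models that have no score for this metric
            key = f"{metric.capitalize()}-{model_name.capitalize()}"
            metrics[key] = score
    return metrics


# Hypothetical stand-in for the scan output, with invented scores
mock_result = {
    "fairness": {
        "champion": {"fairness": {"score": 71.2}},
        "challenger": {"fairness": {"score": 68.9}},
    }
}

metrics = collect_scores(mock_result, "fairness", "fairness")
for name, value in metrics.items():
    print(name, value)
```

In the notebook, the loop body would call `run.log(name, value=value)` between `experiment.start_logging()` and `run.complete()` instead of printing.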