# Building a scan programmatically with your own models and comparing the performance of different predictive models

**In this notebook, we demonstrate how to create a scan in Certifai using your own models. We show examples of how to use models and datasets to run scans.**

### Insert the credentials into the cell below as 'credentials', using the dropdown option of any of the assets

In [1]:
credentials = {
    'IAM_SERVICE_ID': '',
    'IBM_API_KEY_ID': '',
    'ENDPOINT': '',
    'IBM_AUTH_ENDPOINT': '',
    'BUCKET': '',
    'FILE': ''
}
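
Before creating the client, it can help to confirm that no field was left blank. The following is a minimal sketch, not part of the original notebook: `missing_credentials` is a hypothetical helper, and the values in `example` are placeholders for illustration only.

```python
def missing_credentials(creds):
    """Return the sorted names of credential fields that are empty or whitespace."""
    return sorted(k for k, v in creds.items() if not str(v).strip())

# Placeholder values for illustration only; use the real values from the asset dropdown.
example = {
    'IAM_SERVICE_ID': 'example-service-id',
    'IBM_API_KEY_ID': '',
    'ENDPOINT': 'https://example.invalid',
    'IBM_AUTH_ENDPOINT': 'https://example.invalid/token',
    'BUCKET': 'example-bucket',
    'FILE': '',
}
missing = missing_credentials(example)
print(missing)  # ['FILE', 'IBM_API_KEY_ID']
```

Calling `missing_credentials(credentials)` on the real dictionary and stopping if the list is non-empty avoids a less obvious authentication error later.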


In [2]:
from ibm_botocore.client import Config
import ibm_boto3

cos = ibm_boto3.client(service_name='s3',
    ibm_api_key_id=credentials['IBM_API_KEY_ID'],
    ibm_service_instance_id=credentials['IAM_SERVICE_ID'],
    ibm_auth_endpoint=credentials['IBM_AUTH_ENDPOINT'],
    config=Config(signature_version='oauth'),
    endpoint_url=credentials['ENDPOINT'])

#### Confirm that the working directory is '/home/dsxuser/work'

In [3]:
import os
os.getcwd()

'/home/dsxuser/work'

### Download the files from object storage to current working directory

In [4]:
cos.download_file(Bucket=credentials['BUCKET'],Key='german_credit_eval.csv',Filename='/home/dsxuser/work/german_credit_eval.csv')
cos.download_file(Bucket=credentials['BUCKET'],Key='cat_encoder.py',Filename='/home/dsxuser/work/cat_encoder.py')
cos.download_file(Bucket=credentials['BUCKET'],Key='cortex-certifai-common-1.3.4-126-g06d3fae5.zip',Filename='/home/dsxuser/work/cortex-certifai-common-1.3.4-126-g06d3fae5.zip')
cos.download_file(Bucket=credentials['BUCKET'],Key='cortex-certifai-scanner-1.3.4-126-g06d3fae5.zip',Filename='/home/dsxuser/work/cortex-certifai-scanner-1.3.4-126-g06d3fae5.zip')
cos.download_file(Bucket=credentials['BUCKET'],Key='cortex-certifai-engine-1.3.4-126-g06d3fae5-py3.6.10.zip',Filename='/home/dsxuser/work/cortex-certifai-engine-1.3.4-126-g06d3fae5-py3.6.10.zip')
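
The download calls above all follow the same pattern, so the key-to-path mapping can be built once and looped over. This is a sketch under that assumption; `build_download_plan` is a hypothetical helper, and only two of the keys are listed to keep the example short.

```python
import os

def build_download_plan(keys, dest_dir):
    """Pair each object key with the local path it should be saved to."""
    return [(key, os.path.join(dest_dir, key)) for key in keys]

keys = [
    'german_credit_eval.csv',
    'cat_encoder.py',
]
plan = build_download_plan(keys, '/home/dsxuser/work')
for key, filename in plan:
    print(key, '->', filename)
    # With the client from the earlier cell, each pair would be fetched as:
    # cos.download_file(Bucket=credentials['BUCKET'], Key=key, Filename=filename)
```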

### Check for files in the directory

In [5]:
ls

cat_encoder.py
cortex-certifai-common-1.3.4-126-g06d3fae5.zip
cortex-certifai-engine-1.3.4-126-g06d3fae5-py3.6.10.zip
cortex-certifai-scanner-1.3.4-126-g06d3fae5.zip
german_credit_eval.csv
__pycache__/


**To begin, we will install the libraries required to run Certifai scans via Jupyter Notebook in Watson Studio**

In [6]:
!pip install cortex-certifai-common-1.3.4-126-g06d3fae5.zip
!pip install cortex-certifai-scanner-1.3.4-126-g06d3fae5.zip
!pip install cortex-certifai-engine-1.3.4-126-g06d3fae5-py3.6.10.zip
!pip install pandas_profiling
!pip install --upgrade matplotlib
!pip install lightgbm

Processing ./cortex-certifai-common-1.3.4-126-g06d3fae5.zip
Collecting pandas<=1.0.3,>=0.23.4 (from cortex-certifai-common==1.3.4)
  Downloading https://files.pythonhosted.org/packages/bb/71/8f53bdbcbc67c912b888b40def255767e475402e9df64050019149b1a943/pandas-1.0.3-cp36-cp36m-manylinux1_x86_64.whl (10.0MB)
     |████████████████████████████████| 10.0MB 10.3MB/s eta 0:00:01
Building wheels for collected packages: cortex-certifai-common
  Building wheel for cortex-certifai-common (setup.py) ... done
  Stored in directory: /home/dsxuser/.cache/pip/wheels/b8/17/f5/8328fdf5ca54127883b8c07dabad35cd21d7c79f3af43a721c
Successfully built cortex-certifai-common
Installing collected packages: pandas, cortex-certifai-common
  Found existing installation: pandas 1.1.3
    Uninstalling pandas-1.1.3:
      Successfully uninstalled pandas-1.1.3
  Found existing installation: cortex-certifai-common 1.3.4
    Uninstalling cortex-certifai-common-1.3.4:
      Successfully uninstalled c

In [7]:
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import numpy as np
import random
import pandas_profiling as pp
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
import lightgbm as lgb
from lightgbm import LGBMClassifier
from sklearn import metrics
from sklearn import svm
from copy import copy
import yaml
from cat_encoder import CatEncoder

from certifai.scanner.builder import (CertifaiScanBuilder, CertifaiPredictorWrapper, CertifaiModel, CertifaiModelMetric,
                                      CertifaiDataset, CertifaiGroupingFeature, CertifaiDatasetSource,
                                      CertifaiPredictionTask, CertifaiTaskOutcomes, CertifaiOutcomeValue)
from certifai.scanner.report_utils import scores, construct_scores_dataframe

**For multiprocessing to work in a notebook, the encoder (the cat_encoder.py file) must live outside the notebook. The import above brings in the encoder (used for categorical encoding) in a way that works in hosted notebooks as well as locally.**
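
The contents of cat_encoder.py are not shown in this notebook, so the following is only a sketch of what a callable categorical encoder could look like. `SimpleCatEncoder` is a hypothetical stand-in written in plain Python: it one-hot encodes the chosen columns and, unlike `CatEncoder(cat_columns, X)`, takes the column names and rows as separate arguments. Keeping such a class in its own module matters because worker processes generally need to import it by module name, which is not possible for a class defined only inside the notebook.

```python
class SimpleCatEncoder:
    """Hypothetical stand-in for CatEncoder: one-hot encodes chosen columns."""

    def __init__(self, cat_columns, column_names, rows):
        # Positions of the categorical columns within each row
        self.cat_idx = [column_names.index(c) for c in cat_columns]
        # Sorted category values seen per categorical column
        self.categories = {
            i: sorted({row[i] for row in rows}) for i in self.cat_idx
        }

    def __call__(self, rows):
        encoded = []
        for row in rows:
            out = []
            for i, value in enumerate(row):
                if i in self.categories:
                    # One indicator per known category; unseen values encode as all zeros
                    out.extend(1.0 if value == c else 0.0 for c in self.categories[i])
                else:
                    out.append(float(value))
            encoded.append(out)
        return encoded

columns = ['duration', 'housing']
rows = [[6, 'own'], [28, 'rent'], [24, 'own']]
encoder_sketch = SimpleCatEncoder(['housing'], columns, rows)
print(encoder_sketch(rows))  # [[6.0, 1.0, 0.0], [28.0, 0.0, 1.0], [24.0, 1.0, 0.0]]
```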

# STEP (1): Setting up the dataset and models to be scanned

**Task 1): Setting up the dataset**

**Load the data into a DataFrame named 'df' for use in both training and later analysing the model. In this example we use the German Credit dataset.**

In [8]:

import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage.
# The API key is read from the 'credentials' cell above rather than hard-coded,
# so it is not exposed when the notebook is shared.
client_943f0d0348cb4b5fb50f9c338e8d8cc1 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id=credentials['IBM_API_KEY_ID'],
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_943f0d0348cb4b5fb50f9c338e8d8cc1.get_object(Bucket='rhmcortexcertifai-donotdelete-pr-yfgtf2gen9cp86',Key='german_credit_eval.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df = pd.read_csv(body)
df.head()


| | checkingstatus | duration | history | purpose | amount | savings | employ | installment | status | others | ... | property | age | otherplans | housing | cards | job | liable | telephone | foreign | outcome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ... >= 200 DM / salary assignments for at leas... | 6 | critical account/ other credits existing (not ... | car (new) | 1343 | ... < 100 DM | .. >= 7 years | 1 | male : single | others - none | ... | real estate | > 25 years | none | own | 2 | skilled employee / official | 2 | phone - none | foreign - no | 1 |
| 1 | ... < 0 DM | 28 | existing credits paid back duly till now | car (new) | 4006 | ... < 100 DM | 1 <= ... < 4 years | 3 | male : single | others - none | ... | car or other, not in attribute 6 | > 25 years | none | own | 1 | unskilled - resident | 1 | phone - none | foreign - yes | 2 |
| 2 | no checking account | 24 | existing credits paid back duly till now | radio/television | 2284 | ... < 100 DM | 4 <= ... < 7 years | 4 | male : single | others - none | ... | car or other, not in attribute 6 | > 25 years | none | own | 1 | skilled employee / official | 1 | phone - yes, registered under the customers name | foreign - yes | 1 |
| 3 | no checking account | 24 | existing credits paid back duly till now | radio/television | 1533 | ... < 100 DM | ... < 1 year | 4 | female : divorced/separated/married | others - none | ... | car or other, not in attribute 6 | > 25 years | stores | own | 1 | skilled employee / official | 1 | phone - yes, registered under the customers name | foreign - yes | 1 |
| 4 | no checking account | 12 | existing credits paid back duly till now | car (new) | 1101 | ... < 100 DM | 1 <= ... < 4 years | 3 | male : married/widowed | others - none | ... | real estate | > 25 years | none | own | 2 | skilled employee / official | 1 | phone - yes, registered under the customers name | foreign - yes | 1 |


### Exploratory data analysis

In [9]:
pp.ProfileReport(df)



**Task 2): Set up the categorical columns of the dataset for the encoder (in our case this is encapsulated in the CatEncoder class, which can be found in the same directory as this notebook). We also record the column that contains the ground truth labels for training in 'label_column' (in this dataset, 'outcome').**

In [10]:
cat_columns = [
    'checkingstatus',
    'history',
    'purpose',
    'savings',
    'employ',
    'status',
    'others',
    'property',
    'age',
    'otherplans',
    'housing',
    'job',
    'telephone',
    'foreign'
    ]

label_column = 'outcome'

**In our example we use a simple logistic classifier from sklearn. This is where you can add your own model. Rather than using the one provided, you can import and set up your model to be used here.**

**Task 3) Because the outcome column won't be presented to the model at prediction time, we need to drop it from the dataset. We then split the data into training and test sets.**

In [11]:
y = df[label_column]
X = df.drop(label_column, axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)


**Task 4) Set the encoder**

In [12]:
encoder = CatEncoder(cat_columns, X)

**Task 5) Build and fit the first model (logistic regression)**

In [13]:
def build_model_lr(data, name, test=None):
    if test is None:
        test = data
        

    parameters = {'C': (0.5, 1.0, 2.0), 'solver': ['lbfgs'], 'max_iter': [1000]}
    m = LogisticRegression()
    model = GridSearchCV(m, parameters, cv=3)
    model.fit(data[0], data[1])

    # Assess on the test data
    accuracy = model.score(test[0], test[1].values)
    print(f"Model '{name}' accuracy is {accuracy}")
    return model

logistic_model = build_model_lr((encoder(X_train.values), y_train),
                        'Logistic classifier',
                        test=(encoder(X_test.values), y_test))

Model 'Logistic classifier' accuracy is 0.77


**Task 6) Wrap up the model and the encoder so that Certifai sees it as part of the model**

In [14]:
logistic_model_proxy = CertifaiPredictorWrapper(logistic_model, encoder=encoder)
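
To make the idea of wrapping concrete, the sketch below shows a wrapper that encodes raw rows before delegating to the model, so callers only ever pass unencoded data. It is a hypothetical illustration, not the `CertifaiPredictorWrapper` implementation; `PredictorWrapperSketch`, `ThresholdModel`, and `toy_encoder` are invented for this example.

```python
class PredictorWrapperSketch:
    """Hypothetical illustration only; not the CertifaiPredictorWrapper implementation."""

    def __init__(self, model, encoder=None):
        self.model = model
        self.encoder = encoder

    def predict(self, raw_rows):
        # Encode raw feature rows first, then delegate to the wrapped model
        features = self.encoder(raw_rows) if self.encoder is not None else raw_rows
        return self.model.predict(features)


class ThresholdModel:
    """Toy model: predicts 1 (granted) when the first encoded feature is below 20, else 2."""

    def predict(self, features):
        return [1 if row[0] < 20 else 2 for row in features]


def toy_encoder(rows):
    # Keep only the numeric duration column, converted to float
    return [[float(row[0])] for row in rows]


wrapped = PredictorWrapperSketch(ThresholdModel(), encoder=toy_encoder)
predictions = wrapped.predict([[6, 'own'], [28, 'rent']])
print(predictions)  # [1, 2]
```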

**Task 7) Compute the model's accuracy on the test dataset**

In [15]:
logistic_accuracy = logistic_model.score(encoder(X_test.values), y_test.values)
print(f"Logistic classifier model accuracy on test data is {logistic_accuracy}")

Logistic classifier model accuracy on test data is 0.77


In [16]:
print("Trained logistic model :: ", logistic_model)

Trained logistic model ::  GridSearchCV(cv=3, error_score='raise-deprecating',
       estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False),
       fit_params=None, iid='warn', n_jobs=None,
       param_grid={'C': (0.5, 1.0, 2.0), 'solver': ['lbfgs'], 'max_iter': [1000]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)


**Build the second model**

In [17]:
def build_model_rf(data, name, test=None):
    if test is None:
        test = data
        
    model = RandomForestClassifier()
    model.fit(data[0], data[1])

    # Assess on the test data
    accuracy = model.score(test[0], test[1].values)
    print(f"Model '{name}' accuracy is {accuracy}")
    return model

rf_model = build_model_rf((encoder(X_train.values), y_train),
                        'Random Forest Classifier',
                        test=(encoder(X_test.values), y_test))

Model 'Random Forest Classifier' accuracy is 0.765




In [18]:
rf_model_proxy = CertifaiPredictorWrapper(rf_model, encoder=encoder)

In [19]:
rf_accuracy = rf_model.score(encoder(X_test.values), y_test.values)
print(f"rf classifier model accuracy on test data is {rf_accuracy}")

rf classifier model accuracy on test data is 0.765


In [20]:
print("Trained random forest model :: ", rf_model)

Trained random forest model ::  RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)


**Build the third model**

In [21]:
def build_model_lgbm(data, name, test=None):
    if test is None:
        test = data

    model = LGBMClassifier()
    model.fit(data[0], data[1])

    # Assess on the test data
    accuracy = model.score(test[0], test[1].values)
    print(f"Model '{name}' accuracy is {accuracy}")
    return model

lgbm_model = build_model_lgbm((encoder(X_train.values), y_train),
                        'Light GBM Classifier',
                        test=(encoder(X_test.values), y_test))

Model 'Light GBM Classifier' accuracy is 0.79


In [22]:
lgbm_model_proxy = CertifaiPredictorWrapper(lgbm_model, encoder=encoder)

In [23]:
lgbm_accuracy = lgbm_model.score(encoder(X_test.values), y_test.values)
print(f"lgbm classifier model accuracy on test data is {lgbm_accuracy}")

lgbm classifier model accuracy on test data is 0.79


In [24]:
print("Trained lgbm model :: ", lgbm_model)

Trained lgbm model ::  LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
        importance_type='split', learning_rate=0.1, max_depth=-1,
        min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
        n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,
        random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
        subsample=1.0, subsample_for_bin=200000, subsample_freq=0)


**Build the fourth model**

In [25]:
def build_model_dt(data, name, test=None):
    if test is None:
        test = data
        
    model = DecisionTreeClassifier()
    model.fit(data[0], data[1])

    # Assess on the test data
    accuracy = model.score(test[0], test[1].values)
    print(f"Model '{name}' accuracy is {accuracy}")
    return model

dt_model = build_model_dt((encoder(X_train.values), y_train),
                        'Decision Tree Classifier',
                        test=(encoder(X_test.values), y_test))

Model 'Decision Tree Classifier' accuracy is 0.735


In [26]:
dt_model_proxy = CertifaiPredictorWrapper(dt_model, encoder=encoder)

In [27]:
dt_accuracy = dt_model.score(encoder(X_test.values), y_test.values)
print(f"dt classifier model accuracy on test data is {dt_accuracy}")

dt classifier model accuracy on test data is 0.735


In [28]:
print("Trained decision tree model :: ", dt_model)

Trained decision tree model ::  DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
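
With all four models trained, their test accuracies can be compared side by side. The sketch below simply copies the figures printed by the training cells in this run; because the random forest and decision tree were trained without a fixed `random_state`, rerunning the notebook may give somewhat different numbers.

```python
# Test-set accuracies as printed by the four training cells above
accuracies = {
    'logistic_regression': 0.77,
    'random_forest': 0.765,
    'light_GBM': 0.79,
    'decision_tree': 0.735,
}

# Sort from most to least accurate
ranking = sorted(accuracies.items(), key=lambda item: item[1], reverse=True)
for name, accuracy in ranking:
    print(f"{name:<20} {accuracy:.3f}")

best_model = ranking[0][0]
```

Accuracy alone does not say anything about fairness, robustness, or explainability, which is what the scan in the next step measures.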


# STEP (2): Create the scan object using the CertifaiScanBuilder class

**To make it easy to work with Certifai from notebooks or other programmatic use cases, the `CertifaiScanBuilder` class abstracts the scan definition and provides an object model to manipulate it. Building up a definition in this way allows the scan either to be run directly in the notebook or to be exported as a scan definition file, which can be run by the Certifai scanner.**

**Task 1) Define the outcomes of the classification task**

In [29]:
# Define the possible prediction outcomes
task = CertifaiPredictionTask(CertifaiTaskOutcomes.classification(
    [
        CertifaiOutcomeValue(1, name='Loan granted', favorable=True),
        CertifaiOutcomeValue(2, name='Loan denied')
    ]),
    prediction_description='Determine whether a loan should be granted')

**Task 2) Create the Certifai scan object**

In [30]:
scan = CertifaiScanBuilder.create('test_case',
                                  prediction_task=task)

**Task 3) Create the Certifai dataset from the local dataset**

In [31]:
base_path = '/home/dsxuser/work'
all_data_file = f"{base_path}/german_credit_eval.csv"
# Add the eval dataset
eval_dataset = CertifaiDataset('evaluation',
                               CertifaiDatasetSource.csv(all_data_file))

**Task 4) Create the Certifai model from the local model**


In [32]:
# Add our first model
first_model = CertifaiModel('logistic_regression',
                            local_predictor=logistic_model_proxy)
scan.add_model(first_model)

In [33]:
# Add our second model
second_model = CertifaiModel('random_forest',
                            local_predictor=rf_model_proxy)
scan.add_model(second_model)

In [34]:
# Add our third model
third_model = CertifaiModel('light_GBM',
                            local_predictor=lgbm_model_proxy)
scan.add_model(third_model)

In [35]:
# Add our fourth model
fourth_model = CertifaiModel('decision_tree',
                            local_predictor=dt_model_proxy)
scan.add_model(fourth_model)

**Task 5) Set up an evaluation for fairness, robustness, and explainability on the above dataset using the models**

**We can have one or many of the following analysis types:**
- fairness
- robustness
- explainability
- explanation
- performance

In [36]:
scan.add_dataset(eval_dataset)
scan.add_fairness_grouping_feature(CertifaiGroupingFeature('age'))
scan.add_fairness_grouping_feature(CertifaiGroupingFeature('status'))
scan.add_evaluation_type('fairness')
scan.add_evaluation_type('explainability')
scan.add_evaluation_type('robustness')
scan.evaluation_dataset_id = 'evaluation'

**Task 6) Because the dataset contains a ground truth outcome column, which the model does not expect to receive as input, we need to state that in the dataset schema (since it cannot be inferred from the CSV) so that the scan can be rerun from the definition.**

In [37]:
scan.dataset_schema.outcome_feature_name = 'outcome'

**Task 7) Run the scan. By default this writes the results into individual report files (one per model and evaluation type) in the 'reports' directory relative to the notebook. This can be disabled by specifying `write_reports=False`, as in the cell below.**

In [38]:
#Run the Scan
result = scan.run(write_reports=False)



Starting scan with model_use_case_id: 'test_case' and scan_id: '2e164cfee346'
[--------------------] 2020-10-14 11:51:16.969564 - 0 of 12 reports (0.0% complete) - Running fairness evaluation for model: logistic_regression
[#-------------------] 2020-10-14 11:57:15.770191 - 1 of 12 reports (8.33% complete) - Running explainability evaluation for model: logistic_regression
[###-----------------] 2020-10-14 12:00:54.762527 - 2 of 12 reports (16.67% complete) - Running robustness evaluation for model: logistic_regression
[#####---------------] 2020-10-14 12:04:17.961837 - 3 of 12 reports (25.0% complete) - Running fairness evaluation for model: random_forest
[######--------------] 2020-10-14 12:06:14.541279 - 4 of 12 reports (33.33% complete) - Running explainability evaluation for model: random_forest
[########------------] 2020-10-14 12:07:13.721539 - 5 of 12 reports (41.67% complete) - Running robustness evaluation for model: random_forest
[##########----------] 2020-10-14 12:08:04.986376 - 6 of 12 reports (50.0% complete) - Running fairness evaluation for model: light_GBM
[###########---------] 2020-10-14 12:17:40.783923 - 7 of 12 reports (58.33% complete) - Running explainability evaluation for model: light_GBM
[#############-------] 2020-10-14 12:22:29.214493 - 8 of 12 reports (66.67% complete) - Running robustness evaluation for model: light_GBM
[###############-----] 2020-10-14 12:27:09.204948 - 9 of 12 reports (75.0% complete) - Running fairness evaluation for model: decision_tree
[################----] 2020-10-14 12:29:24.610403 - 10 of 12 reports (83.33% complete) - Running explainability evaluation for model: decision_tree
[##################--] 2020-10-14 12:30:23.908799 - 11 of 12 reports (91.67% complete) - Running robustness evaluation for model: decision_tree
[####################] 2020-10-14 12:31:15.007343 - 12 of 12 reports (100.0% complete) - Completed all evaluations


**The result is a dictionary keyed on analysis type, containing reports keyed on model id.**
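
The nesting described above can be walked with two loops. The sketch below uses a stand-in dictionary with the same outer shape (analysis type, then model id); the inner report contents are empty placeholders, since the real report fields are not shown here, and `list_reports` is a hypothetical helper.

```python
# Stand-in with the documented shape: analysis type -> model id -> report.
# The report contents here are placeholders, not real Certifai report fields.
result_sketch = {
    'fairness': {'logistic_regression': {}, 'random_forest': {}},
    'robustness': {'logistic_regression': {}, 'random_forest': {}},
}

def list_reports(result):
    """Return (analysis, model_id) pairs for every report in the result."""
    return [(analysis, model_id)
            for analysis, reports in result.items()
            for model_id in reports]

pairs = list_reports(result_sketch)
for analysis, model_id in pairs:
    print(analysis, model_id)
```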

**We will be extracting the score information in the form of a DataFrame from the results dictionary**

In [39]:
df_f = construct_scores_dataframe(scores('fairness', result), include_confidence=False)
display(df_f)

df_r = construct_scores_dataframe(scores('robustness', result), include_confidence=False)
display(df_r)

df_e = construct_scores_dataframe(scores('explainability', result), include_confidence=False)
display(df_e)

| | context | type | overall fairness | Feature (age) | Group details (<= 25 years) | Group details (> 25 years) | Feature (status) | Group details (female : divorced/separated/married) | Group details (male : divorced/separated) | Group details (male : married/widowed) | Group details (male : single) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| logistic_regression (burden) | logistic_regression | burden | 66.986178 | 68.847741 | 0.080435 | 0.04232 | 71.373645 | 0.074194 | 0.07478 | 0.03037 | 0.036758 |
| random_forest (burden) | random_forest | burden | 77.218738 | 78.401282 | 0.060272 | 0.038762 | 83.73693 | 0.055147 | 0.049949 | 0.035993 | 0.03616 |
| light_GBM (burden) | light_GBM | burden | 80.2284 | 83.08724 | 0.056871 | 0.040358 | 84.285291 | 0.052149 | 0.054941 | 0.034899 | 0.038967 |
| decision_tree (burden) | decision_tree | burden | 82.005751 | 84.619545 | 0.056125 | 0.041311 | 85.484647 | 0.053229 | 0.057959 | 0.040076 | 0.038363 |


| | context | robustness |
|---|---|---|
| logistic_regression | logistic_regression | 86.440972 |
| random_forest | random_forest | 77.467913 |
| light_GBM | light_GBM | 52.954744 |
| decision_tree | decision_tree | 53.857724 |


| | context | explainability | Num features (1) | Num features (10) | Num features (2) | Num features (3) | Num features (4) | Num features (5) | Num features (6) | Num features (7) | Num features (8) | Num features (9) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| logistic_regression | logistic_regression | 91.015625 | 46.09375 | 0.0 | 39.0625 | 10.9375 | 3.125 | 0.0 | 0.78125 | 0.0 | 0.0 | 0.0 |
| random_forest | random_forest | 94.6875 | 53.90625 | 0.0 | 41.40625 | 4.6875 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| light_GBM | light_GBM | 92.890625 | 50.0 | 0.0 | 35.9375 | 14.0625 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| decision_tree | decision_tree | 99.765625 | 97.65625 | 0.0 | 2.34375 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
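
One way to read the three score tables together is to find the leading model for each metric. The sketch below copies the overall scores from this run, rounded to two decimals, and assumes that a higher score is better for all three metrics; `best_per_metric` is a hypothetical helper.

```python
# Overall scores copied from the three tables above (rounded to two decimals)
scores_by_model = {
    'logistic_regression': {'fairness': 66.99, 'robustness': 86.44, 'explainability': 91.02},
    'random_forest':       {'fairness': 77.22, 'robustness': 77.47, 'explainability': 94.69},
    'light_GBM':           {'fairness': 80.23, 'robustness': 52.95, 'explainability': 92.89},
    'decision_tree':       {'fairness': 82.01, 'robustness': 53.86, 'explainability': 99.77},
}

def best_per_metric(scores):
    """Map each metric to the model id with the highest score."""
    metrics = next(iter(scores.values())).keys()
    return {m: max(scores, key=lambda model: scores[model][m]) for m in metrics}

leaders = best_per_metric(scores_by_model)
print(leaders)
```

In this run no single model leads everywhere: light_GBM had the highest test accuracy (0.79) but the lowest robustness score, which is the kind of trade-off the comparison is meant to surface.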


**Merge the DataFrames on their common 'context' column, convert the result to CSV format, and upload it to the Cloud Object Storage bucket**

In [40]:
df_f_e = pd.merge(df_f, df_e, on='context')
df_complete = pd.merge(df_f_e, df_r, on='context')

In [41]:
df_complete.to_csv('/home/dsxuser/work/scan_results.csv', index=False)

In [42]:
cos.upload_file(Filename='/home/dsxuser/work/scan_results.csv',Bucket=credentials['BUCKET'],Key='scan_results.csv')

# STEP (3): Creating the exportable scan object

**Task 1) Next we'll modify the scan definition to make it suitable for running against a version of the model deployed as a web service, and export this scan definition as a YAML file.**

**The two things that need to be changed are:**
- *predict_endpoint*: Since the model will be running in a web service, we need to provide the URL for its intended predict endpoint
- *dataset url*: Similarly, since the data will be read from persistent storage rather than an already populated DataFrame, we'll need to modify the data source accordingly. If the URL is a relative file path, it will be interpreted relative to where the scan definition is stored.

In [43]:
scan.models[0].predict_endpoint = 'http://mymodel/logistic_regression/predict'
scan.models[1].predict_endpoint = 'http://mymodel/random_forest/predict'
scan.models[2].predict_endpoint = 'http://mymodel/light_GBM/predict'
scan.models[3].predict_endpoint = 'http://mymodel/decision_tree/predict'
scan.datasets[0].source = CertifaiDatasetSource.csv('newdatafile.csv')
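
The four endpoint URLs above share one pattern, so they can be generated from the model ids instead of typed by hand. This is a sketch under that assumption; `predict_endpoint` is a hypothetical helper and `http://mymodel` is the placeholder host used in the cell above.

```python
def predict_endpoint(model_id, base_url='http://mymodel'):
    """Build a predict URL following the pattern used in the cell above."""
    return f"{base_url}/{model_id}/predict"

model_ids = ['logistic_regression', 'random_forest', 'light_GBM', 'decision_tree']
endpoints = {model_id: predict_endpoint(model_id) for model_id in model_ids}
for model_id, url in endpoints.items():
    print(model_id, '->', url)
```

Note that assigning by position, as the cell above does, relies on `scan.models` keeping the order in which the models were added.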

**The scan object contains the scan definition, which consists of all of the metadata needed to rerun the scan**

**Task 2) Viewing the scan definition**

In [44]:
print(scan.extract_yaml())

dataset_schema:
  outcome_column: outcome
datasets:
- dataset_id: evaluation
  delimiter: ','
  file_type: csv
  has_header: true
  quote_character: '"'
  url: newdatafile.csv
evaluation:
  evaluation_dataset_id: evaluation
  evaluation_types:
  - fairness
  - explainability
  - robustness
  fairness_grouping_features:
  - name: age
  - name: status
  name: test_case
  prediction_description: Determine whether a loan should be granted
  prediction_favorability: explicit
  prediction_values:
  - favorable: true
    name: Loan granted
    value: 1
  - favorable: false
    name: Loan denied
    value: 2
model_use_case:
  model_use_case_id: test_case
  name: test_case
  task_type: binary-classification
models:
- model_id: logistic_regression
  name: logistic_regression
  predict_endpoint: http://mymodel/logistic_regression/predict
  prediction_value_order:
  - 1
  - 2
- model_id: random_forest
  name: random_forest
  predict_endpoint: http://mymodel/random_forest/predict
  prediction_value

**Task 3) Save the Scan Definition.**

**Save the scan definition to a file. The file path is relative to the notebook or user defined.**

In [45]:
scan_file="/home/dsxuser/work/scan_definition.yaml"
with open(scan_file, "w") as f:
    scan.save(f)
    print(f"Saved template to: {scan_file}")

Saved template to: /home/dsxuser/work/scan_definition.yaml


**Upload the scan definition file to Cloud Object Storage for future reference. The scan definition file can also be used to run the scan from the command-line interface.**

In [46]:
cos.upload_file(Filename='/home/dsxuser/work/scan_definition.yaml',Bucket=credentials['BUCKET'],Key='scan_definition.yaml')

**We have learnt how to quickly build and compare multiple models using the Certifai modules, and how to export a scan definition that can be run against deployed versions of those models. This helps in identifying bias and in assessing how explainable and robust each model is before choosing one for production.**