# Notebook Summary 


### Quickstart

  1. Import the etiq library. For installation instructions, see our docs (https://docs.etiq.ai/)

  2. Log in to the dashboard so that you can send results to your dashboard instance (the Etiq AWS instance if you use the SaaS version). To deploy on your own cloud instance, get in touch (info@etiq.ai)

  3. Create or open a project 
  
  
### Custom Metrics Scans


  4. Load Adult dataset and train a model
  
  5. Create a custom metric 
  
  6. Add your new metric to the config file
  
  7. Load your config file and create your snapshot based on the model you've just trained

  8. Scan for accuracy issues, including your custom metric scan
  
 

## Why custom metrics?

From talking to data scientists, and from being data scientists ourselves, we know that each use case and company may need a different set of metrics.

This functionality allows you to add your custom metrics to the scan suite and run your scan as you would with out-of-the-box metrics. 


# SET-UP

In [1]:
import etiq


Thanks for trying out the ETIQ.ai toolkit!

Visit our getting started documentation at https://docs.etiq.ai/

Visit our Slack channel at https://etiqcore.slack.com/ for support or feedback.



In [2]:
from etiq import login as etiq_login
etiq_login("https://dashboard.etiq.ai/", "<token>")


(Dashboard supplied updated license information)


Connection successful. Projects and pipelines will be displayed in the dashboard. 😀

In [3]:
# Can get/create a single named project
project = etiq.projects.open(name="Custom Metrics Scans")

# CUSTOM METRICS SCANS: xgboost, already trained model


To illustrate some of the library's features, we build a model that predicts whether an applicant makes over or under 50K using the Adult dataset from https://archive.ics.uci.edu/ml/datasets/adult.


First, we'll be encoding the categorical features found in this dataset.

Second, we'll log the dataset to Etiq.

In this case we encode before splitting into train/validation/test because we know in advance every category that appears in this dataset, so in production we won't encounter a category that is missing from it.

If that is not true for your use case, you should NOT encode before splitting your sample, as this might lead to LEAKAGE.
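
As a minimal sketch of the leakage-safe alternative, the category codes can be learned from the training rows only and then applied to held-out rows. The helper names `fit_codes` and `apply_codes`, the toy values, and the `-1` code for unseen categories are assumptions made for this illustration; they are not part of the notebook or of the Etiq library.

```python
# Sketch: learn category codes from the training rows only, then reuse
# them on held-out rows. Helper names and the -1 fallback are hypothetical.

def fit_codes(train_values):
    """Map each category seen in training to an integer code (sorted order)."""
    return {category: code for code, category in enumerate(sorted(set(train_values)))}

def apply_codes(values, codes, unseen=-1):
    """Encode values with the training codes; unseen categories get `unseen`."""
    return [codes.get(value, unseen) for value in values]

train_workclass = ["Private", "Private", "Local-gov"]
# "Self-emp-inc" does not appear in the toy training rows above
test_workclass = ["Private", "Self-emp-inc"]

codes = fit_codes(train_workclass)
train_encoded = apply_codes(train_workclass, codes)
test_encoded = apply_codes(test_workclass, codes)
print(codes, train_encoded, test_encoded)
```

Because the mapping never sees the held-out rows, nothing about them can influence the encoding used for training.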

Label encoding is itself problematic because it assigns a numerical ranking to categorical variables. Best practice is to use one-hot encoding. Because the free library functionality is limited to 15 features, we do not use one-hot encoding in this example.
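
To make the difference concrete, here is a small plain-Python sketch (no scikit-learn) that encodes a few `workclass` values both ways. The sorted-class ordering mirrors how `LabelEncoder` assigns codes; the variable names are chosen for this illustration only.

```python
# Sketch: label encoding vs one-hot encoding on a handful of values.
workclass = ["Private", "Private", "Local-gov", "Private", "?"]

# Label encoding: one integer per category, in sorted order.
# This implies an ordering ("?" < "Local-gov" < "Private") that has no meaning.
classes = sorted(set(workclass))
label_encoded = [classes.index(value) for value in workclass]

# One-hot encoding: one 0/1 column per category, so no ordering is implied,
# at the cost of one extra feature per category.
one_hot = [[1 if value == c else 0 for c in classes] for value in workclass]

print(classes)
print(label_encoded)
print(one_hot)
```

With three categories, one-hot encoding turns a single column into three, which is why it quickly runs into a cap on the number of features.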

## Model Build 

In [4]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from xgboost.sklearn import XGBClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings('ignore')

# Loading a dataset. We're using the adult dataset
data = etiq.utils.load_sample("adultdata")
data.head()



Unnamed: 0,age,workclass,fnlwgt,education,educational-num,marital-status,occupation,relationship,race,gender,capital-gain,capital-loss,hours-per-week,native-country,income
0,25,Private,226802,11th,7,Never-married,Machine-op-inspct,Own-child,Black,Male,0,0,40,United-States,<=50K
1,38,Private,89814,HS-grad,9,Married-civ-spouse,Farming-fishing,Husband,White,Male,0,0,50,United-States,<=50K
2,28,Local-gov,336951,Assoc-acdm,12,Married-civ-spouse,Protective-serv,Husband,White,Male,0,0,40,United-States,>50K
3,44,Private,160323,Some-college,10,Married-civ-spouse,Machine-op-inspct,Husband,Black,Male,7688,0,40,United-States,>50K
4,18,?,103497,Some-college,10,Never-married,?,Own-child,White,Female,0,0,30,United-States,<=50K


In [5]:
# use a LabelEncoder to transform categorical variables
cont_vars = ['age', 'educational-num', 'fnlwgt', 'capital-gain', 'capital-loss', 'hours-per-week']
cat_vars = list(set(data.columns.values) - set(cont_vars))

label_encoders = {}
data_encoded = pd.DataFrame()
for i in cat_vars:
    label = LabelEncoder()
    data_encoded[i] = label.fit_transform(data[i])
    label_encoders[i] = label

data_encoded.set_index(data.index, inplace=True)
data_encoded = pd.concat([data.loc[:, cont_vars], data_encoded], axis=1).copy()



In [6]:
# prepare the training/testing/validation datasets

# separate into train/validate/test datasets of sizes 80%/10%/10% as percentages of the initial data
data_remaining, test = train_test_split(data_encoded, test_size=0.1)
train, valid = train_test_split(data_remaining, test_size=0.1112)

# because we don't want to train on protected attributes or labels to be predicted, 
# let's remove these columns from the training dataset
protected_train = train['gender'].copy() # gender is a protected attribute
y_train = train['income'].copy() # labels we're going to train the model to predict
x_train = train.drop(columns=['gender','income'])
protected_valid = valid['gender'].copy() 
y_valid = valid['income'].copy() 
x_valid = valid.drop(columns=['gender','income'])
protected_test = test['gender'].copy() 
y_test = test['income'].copy()
x_test = test.drop(columns=['gender','income'])
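
The second `test_size` of 0.1112 may look odd. A short arithmetic sketch shows why it gives roughly an 80%/10%/10% split: the validation set is carved out of the 90% that remains after the test split. The helper name `split_fractions` is hypothetical.

```python
# Sketch: overall fractions produced by two consecutive splits.
def split_fractions(test_size, valid_size_of_remaining):
    """Return (train, valid, test) as fractions of the full dataset."""
    remaining = 1.0 - test_size                      # left after the test split
    valid = remaining * valid_size_of_remaining      # taken from the remainder
    train = remaining - valid
    return train, valid, test_size

train_frac, valid_frac, test_frac = split_fractions(0.1, 0.1112)
print(round(train_frac, 4), round(valid_frac, 4), round(test_frac, 4))
```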

In [7]:
# train an XGBoost model to predict 'income'

standard_model = XGBClassifier(use_label_encoder=False, eval_metric='logloss', random_state=4)    
model_fit = standard_model.fit(x_train, y_train)

In [8]:
y_train_pred = standard_model.predict(x_train)
y_valid_pred = standard_model.predict(x_valid)
print('Model accuracy on the training dataset :', 
      round(100 * accuracy_score(y_train, y_train_pred),2),'%') # round the score to 2 digits  

print('Model accuracy on the validation dataset :', 
      round(100 * accuracy_score(y_valid, y_valid_pred),2),'%')

Model accuracy on the training dataset : 90.11 %
Model accuracy on the validation dataset : 86.75 %


## Create your custom accuracy metric

At the moment the decorators you can use to build your custom metric are as follows:

- prediction_values (refers to what the model scores, should be a list)

- actual_values (refers to the actuals; if your custom metric is for production and no actuals or model are available, it will use the score as the actual if one is provided; should be a list)

- protected_values (refers to the demographic variable you want to check for bias, if you have multiple demographics please create a feature with the intersection)

- positive_outcome (directional, refers to what is considered a positive prediction or outcome, e.g. in the case of a lending model it would be a low risk score or if the customer is accepted for the loan, should be a value)

- negative_outcome (directional, refers to what is considered a negative prediction or outcome, e.g. in the case of a lending model it would be a high risk score or if the customer is rejected for the loan, should be a value)

- privileged_class (refers to the class in the demographics which is privileged, i.e. not protected by legislation; should be a value)

- unprivileged_class (refers to the class in the demographics which is not privileged and which is protected by legislation; should be a value. In future releases we will add functionality for multiple values here)

They follow the parameters available in the config file. 

Below is an example syntax of how you could use these decorators.

@etiq.metrics.accuracy_metric - logs your metric as an accuracy metric (right now you can log it as either an accuracy or a bias metric; drift is pending)

@etiq.custom_metric - specifies that this is a custom metric
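
To give an intuition for how decorators like these can map data roles onto function parameters, here is a plain-Python sketch. It is NOT Etiq's implementation: every name prefixed with `sketch_` is hypothetical, and the real library may work quite differently.

```python
# Sketch only: a hypothetical re-creation of role-mapping decorators.
# None of these names exist in the etiq library.

def sketch_role(role):
    """Build a decorator factory that records which parameter receives `role`."""
    def factory(param_name):
        def decorator(func):
            roles = dict(getattr(func, "roles", {}))
            roles[role] = param_name
            func.roles = roles  # attach the mapping to the function itself
            return func
        return decorator
    return factory

sketch_prediction_values = sketch_role("prediction_values")
sketch_actual_values = sketch_role("actual_values")

def sketch_run_metric(func, available):
    """Call `func`, passing each recorded role's data under its mapped name."""
    kwargs = {param: available[role] for role, param in func.roles.items()}
    return {func.__name__: func(**kwargs)}

@sketch_actual_values("actual")
@sketch_prediction_values("predictions")
def accuracy_sketch(predictions=None, actual=None):
    matches = sum(1 for p, a in zip(predictions, actual) if p == a)
    return matches / len(predictions)

result = sketch_run_metric(
    accuracy_sketch,
    {"prediction_values": [1, 1, 1, 0, 0, 1], "actual_values": [1, 0, 1, 0, 1, 0]},
)
print(result)
```

The point of the pattern is that the scan code only needs to know the roles; the metric author is free to name the function parameters however they like.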

In [9]:

import numpy as np

@etiq.metrics.accuracy_metric
@etiq.custom_metric
@etiq.actual_values('actual')
@etiq.prediction_values('predictions')
def accuracy_custom(predictions=None, actual=None):
    """ Accuracy = nr of correct predictions/ nr of predictions
    """
    apred = np.asarray(predictions)
    alabel = np.asarray(actual)
    return (apred == alabel).mean()



In [10]:
#check with a short example

pred = np.asarray([1, 1, 1, 0, 0, 1])
label = np.asarray([1, 0, 1, 0, 1, 0])
res = accuracy_custom(pred=pred, label=label)
print(res)

{'accuracy_custom': 0.5}


## Log config, dataset and model to Etiq

For already trained models, make sure you only use a sample you held out.

As you don't want any retraining of the model to occur, set your train_valid_test split to [0.0, 1.0, 0.0].

Don't forget to add the new metric to your config file!!!

In [11]:
etiq.load_config("./config_accuracy_custom.json")


{'dataset': {'label': 'income',
  'bias_params': {'protected': 'gender',
   'privileged': 1,
   'unprivileged': 0,
   'positive_outcome_label': 1,
   'negative_outcome_label': 0},
  'train_valid_test_splits': [0.0, 1.0, 0.0]},
 'scan_accuracy_metrics': {'thresholds': {'accuracy': [0.8, 1.0],
   'true_pos_rate': [0.6, 1.0],
   'true_neg_rate': [0.6, 1.0],
   'accuracy_custom': [0.9, 1.0]}}}
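
For reference, a config file matching the output above could be written as follows. The contents are reconstructed from the printed dictionary, so treat this as a sketch and check the Etiq docs for the full schema. The file is written to a temporary directory here so that the example does not overwrite anything.

```python
import json
import tempfile
from pathlib import Path

# Reconstructed from the printed config; the custom metric is added by
# giving its function name a threshold under scan_accuracy_metrics.
config = {
    "dataset": {
        "label": "income",
        "bias_params": {
            "protected": "gender",
            "privileged": 1,
            "unprivileged": 0,
            "positive_outcome_label": 1,
            "negative_outcome_label": 0,
        },
        # no retraining: use the whole logged sample for validation
        "train_valid_test_splits": [0.0, 1.0, 0.0],
    },
    "scan_accuracy_metrics": {
        "thresholds": {
            "accuracy": [0.8, 1.0],
            "true_pos_rate": [0.6, 1.0],
            "true_neg_rate": [0.6, 1.0],
            "accuracy_custom": [0.9, 1.0],
        }
    },
}

config_path = Path(tempfile.mkdtemp()) / "config_accuracy_custom.json"
config_path.write_text(json.dumps(config, indent=2))

# Read it back to confirm the file round-trips as valid JSON.
loaded = json.loads(config_path.read_text())
print(loaded["scan_accuracy_metrics"]["thresholds"]["accuracy_custom"])
```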

In [12]:
from etiq import Model


#log your dataset (the sample you held-out!!!)

dataset_loader = etiq.dataset(test)

#Log your already trained model

model = Model(model_fitted=model_fit)

In [13]:
snapshot = project.snapshots.create(name="New custom accuracy metric", dataset=dataset_loader.initial_dataset, model=model, bias_params=dataset_loader.bias_params)


In [14]:
(segments, issues, issue_summary) = snapshot.scan_accuracy_metrics()


INFO:etiq.pipeline.AccuracyMetricsIssuePipeline0165:Starting pipeline
INFO:etiq.pipeline.AccuracyMetricsIssuePipeline0165:Computed acurracy metrics for the dataset
INFO:etiq.pipeline.AccuracyMetricsIssuePipeline0165:Completed pipeline


In [15]:
issues

Unnamed: 0,name,feature,segment,measure,measure_value,metric,metric_value,threshold
0,accuracy_custom_below_threshold,,all,,,<function accuracy_custom at 0x7f10430bbca0>,0.882497,"[0.9, 1.0]"


In [16]:
issue_summary

Unnamed: 0,name,metric,measure,features,segments,total_issues_tested,issues_found,threshold
0,accuracy_below_threshold,<function accuracy at 0x7f1044042550>,,{},{},1,0,"[0.8, 1.0]"
1,accuracy_above_threshold,<function accuracy at 0x7f1044042550>,,{},{},1,0,"[0.8, 1.0]"
2,true_pos_rate_below_threshold,<function true_pos_rate at 0x7f10440425e0>,,{},{},1,0,"[0.6, 1.0]"
3,true_pos_rate_above_threshold,<function true_pos_rate at 0x7f10440425e0>,,{},{},1,0,"[0.6, 1.0]"
4,true_neg_rate_below_threshold,<function true_neg_rate at 0x7f1044042670>,,{},{},1,0,"[0.6, 1.0]"
5,true_neg_rate_above_threshold,<function true_neg_rate at 0x7f1044042670>,,{},{},1,0,"[0.6, 1.0]"
6,accuracy_custom_below_threshold,<function accuracy_custom at 0x7f10430bbca0>,,{},{all},1,1,"[0.9, 1.0]"
7,accuracy_custom_above_threshold,<function accuracy_custom at 0x7f10430bbca0>,,{},{},1,0,"[0.9, 1.0]"
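
The tables above suggest that each metric is tested against both ends of its `[lower, upper]` threshold. A minimal sketch of that check, under the assumption that values exactly on a bound do not raise an issue, could look like this. The helper `sketch_threshold_issues` is hypothetical and is not an Etiq function.

```python
# Sketch: turn a metric value and its [lower, upper] threshold into issue names.
# Whether the bounds are inclusive in the real scan is an assumption here.
def sketch_threshold_issues(metric_name, value, threshold):
    lower, upper = threshold
    issues = []
    if value < lower:
        issues.append(f"{metric_name}_below_threshold")
    if value > upper:
        issues.append(f"{metric_name}_above_threshold")
    return issues

# The custom accuracy of 0.882497 falls below the configured lower bound of 0.9.
found = sketch_threshold_issues("accuracy_custom", 0.882497, [0.9, 1.0])
print(found)
```

This is why `accuracy_custom_below_threshold` reports one issue while `accuracy_custom_above_threshold` reports none.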


## Create your custom bias metric

In [17]:
from collections import Counter

@etiq.metrics.bias_metric
@etiq.custom_metric
@etiq.prediction_values('predictions')
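def gini_index(predictions):
    """ Gini impurity of the predicted classes = 1 - sum of squared class shares
    """
    class_counts = Counter(predictions)
    num_values = len(predictions)
    sum_probs = 0.0
    for aclass in class_counts:
        # add the squared share of this class among all predictions
        sum_probs += (class_counts[aclass]/num_values) ** 2
    return 1.0 - sum_probs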
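
The quantity computed here, one minus the sum of squared class shares, is commonly called the Gini impurity. The following standalone sketch applies the same formula without the Etiq decorators to a few toy prediction lists; the function name `gini_impurity` and the inputs are chosen for illustration.

```python
from collections import Counter

# Sketch: the same formula as gini_index, without the Etiq decorators.
def gini_impurity(values):
    counts = Counter(values)
    total = len(values)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())

pure = gini_impurity([1, 1, 1, 1])      # a single class: no impurity
mixed = gini_impurity([1, 1, 1, 0])     # shares 0.75 and 0.25
balanced = gini_impurity([0, 0, 1, 1])  # two equal classes: maximum for binary
print(pure, mixed, balanced)
```

For binary predictions the value ranges from 0 (all predictions identical) to 0.5 (an even split), which is useful to keep in mind when choosing thresholds in the config.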

## Log config, dataset and model to Etiq

In [18]:
etiq.load_config("./config_bias_custom.json")


{'dataset': {'label': 'income',
  'bias_params': {'protected': 'gender',
   'privileged': 1,
   'unprivileged': 0,
   'positive_outcome_label': 1,
   'negative_outcome_label': 0},
  'train_valid_test_splits': [0.0, 1.0, 0.0]},
 'scan_accuracy_metrics': {'thresholds': {'accuracy': [0.7, 0.9],
   'true_pos_rate': [0.75, 1.0],
   'true_neg_rate': [0.7, 1.0]}},
 'scan_bias_metrics': {'thresholds': {'equal_opportunity': [0.0, 0.2],
   'demographic_parity': [0.0, 0.2],
   'equal_odds_tnr': [0.0, 0.2],
   'equal_odds_tpr': [0.0, 0.2],
   'individual_fairness': [0.0, 0.2],
   'gini_index': [0.3, 0.4]}}}

In [19]:
from etiq import Model


#log your dataset (the sample you held-out!!!)

dataset_loader = etiq.dataset(test)

#Log your already trained model

model = Model(model_fitted=model_fit)

In [20]:
snapshot = project.snapshots.create(name="New custom bias metric", dataset=dataset_loader.initial_dataset, model=model, bias_params=dataset_loader.bias_params)


In [21]:
(segments, issues, issue_summary) = snapshot.scan_bias_metrics()

INFO:etiq.pipeline.BiasMetricsIssuePipeline0566:Starting pipeline
INFO:etiq.pipeline.BiasMetricsIssuePipeline0566:Computed bias metrics for the dataset
INFO:etiq.pipeline.BiasMetricsIssuePipeline0566:Completed pipeline


In [22]:
issues

Unnamed: 0,name,feature,segment,measure,measure_value,metric,metric_value,threshold
0,gini_index_below_threshold,,all,,,<function gini_index at 0x7f10430bbb80>,0.229389,"[0.3, 0.4]"
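
To interpret this value, note that for binary predictions the formula reduces to 2p(1 - p), where p is the share of one class. Assuming the model only predicts two classes (as in this notebook), the reported value can be inverted to recover the share of the less frequent predicted class. The helper name is hypothetical.

```python
import math

# Sketch: for two classes, gini = 2 * p * (1 - p). Solve for the smaller root p.
# Assumes binary predictions and gini <= 0.5.
def minority_share_from_gini(gini):
    return (1.0 - math.sqrt(1.0 - 2.0 * gini)) / 2.0

share = minority_share_from_gini(0.229389)
print(round(share, 3))
```

Under that assumption, the less frequent predicted class accounts for roughly 13% of predictions, which explains why the value sits below the configured lower bound of 0.3.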


In [23]:
issue_summary

Unnamed: 0,name,metric,measure,features,segments,total_issues_tested,issues_found,threshold
0,demographic_parity_below_threshold,<function demographic_parity at 0x7f1044042700>,,{},{},1,0,"[0.0, 0.2]"
1,demographic_parity_above_threshold,<function demographic_parity at 0x7f1044042700>,,{},{},1,0,"[0.0, 0.2]"
2,equal_odds_tpr_below_threshold,<function equal_odds_tpr at 0x7f1044042790>,,{},{},1,0,"[0.0, 0.2]"
3,equal_odds_tpr_above_threshold,<function equal_odds_tpr at 0x7f1044042790>,,{},{},1,0,"[0.0, 0.2]"
4,equal_odds_tnr_below_threshold,<function equal_odds_tnr at 0x7f1044042820>,,{},{},1,0,"[0.0, 0.2]"
5,equal_odds_tnr_above_threshold,<function equal_odds_tnr at 0x7f1044042820>,,{},{},1,0,"[0.0, 0.2]"
6,equal_opportunity_below_threshold,<function equal_opportunity at 0x7f10440428b0>,,{},{},1,0,"[0.0, 0.2]"
7,equal_opportunity_above_threshold,<function equal_opportunity at 0x7f10440428b0>,,{},{},1,0,"[0.0, 0.2]"
8,individual_fairness_below_threshold,<function individual_fairness at 0x7f1044042af0>,,{},{},1,0,"[0.0, 0.2]"
9,individual_fairness_above_threshold,<function individual_fairness at 0x7f1044042af0>,,{},{},1,0,"[0.0, 0.2]"
