# Notebook Summary 


### Quickstart

  1. Import the etiq library. For installation instructions, see our docs (https://docs.etiq.ai/).

  2. Log in to the dashboard so that you can send results to your dashboard instance (the Etiq AWS instance if you use the SaaS version). To deploy on your own cloud instance, get in touch (info@etiq.ai).

  3. Create or open a project.
  
  
  
### Snapshot 1, already-trained model 


  4. Load the Adult dataset
  
  5. Train a model 
  
  6. Load your config file and create your snapshot using the test dataset only 
  
  7. Scan for Bias RCA issues 
  
 
  

## What are Bias RCA scans?

Some scans are simple tests that show whether a metric for an entire sample dataset is above or below certain thresholds set by the user. Other scans are more complex and look at whether a metric for only a part of the sample dataset is below or above the threshold. This will help you discover segments of customers or groups of records for which the model has a lower than expected accuracy or groups for which bias thresholds are not met.

##### If your accuracy metric tests have picked up an issue, this scan finds exactly which segment has it, which should help you fix it sooner.

If only a part of the data drifted or only a segment is underperforming, your overall tests might not pick up on it, but this test would. While you can pre-set segments you are interested in, if you just run the scan as is, it will discover problematic segments on its own.
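
To make this concrete, here is a minimal sketch with made-up numbers (not Etiq output) showing how a healthy overall accuracy can hide an underperforming segment. The records, the segment names and the 0.8 threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (segment, true_label, predicted_label)
records = (
    [("A", 1, 1)] * 90 + [("A", 1, 0)] * 5    # segment A: 90 of 95 correct
    + [("B", 0, 0)] * 2 + [("B", 0, 1)] * 3   # segment B: 2 of 5 correct
)

THRESHOLD = 0.8  # hypothetical minimum acceptable accuracy


def accuracy(rows):
    """Share of rows where the prediction matches the true label."""
    return sum(y == y_hat for _, y, y_hat in rows) / len(rows)


# Group the records by segment
by_segment = defaultdict(list)
for row in records:
    by_segment[row[0]].append(row)

overall = accuracy(records)
per_segment = {name: accuracy(rows) for name, rows in by_segment.items()}
flagged = [name for name, acc in per_segment.items() if acc < THRESHOLD]

print(f"overall accuracy: {overall:.2f}")
print("segments below threshold:", flagged)
```

The overall figure passes the threshold even though segment B does not, which is the situation a segment-level scan is meant to surface.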


The Bias RCA scans currently provide the following metrics out of the box:

1. Equal Opportunity: measures the difference in true positive rate between a privileged demographic group and an unprivileged demographic group.

2. Demographic Parity: measures the difference in the share of positive labels between a privileged demographic group and an unprivileged demographic group.

3. Equal Odds TNR: measures the difference in true negative rate between the privileged and unprivileged groups. The full measure in the literature looks for an optimal point where the difference in true positive rate and the difference in true negative rate between demographic groups are both minimized.

4. Individual Fairness: measures whether individuals with similar features receive the same model responses.

This test pipeline is experimental. 
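
The first three group metrics listed above can be written out as a short sketch. This is plain Python on hypothetical labels and predictions, not the Etiq implementation. The helper names, the use of predicted labels for demographic parity, and the sign convention (privileged minus unprivileged) are assumptions made for illustration.

```python
def rate(numerator, denominator):
    """Safe ratio; returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def group_rates(y_true, y_pred):
    """Positive prediction rate, TPR and TNR for one demographic group."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    positives = sum(t == 1 for t in y_true)
    negatives = sum(t == 0 for t in y_true)
    return {
        "positive_rate": rate(sum(p == 1 for p in y_pred), len(y_pred)),
        "tpr": rate(tp, positives),
        "tnr": rate(tn, negatives),
    }


# Hypothetical toy data for each group
priv = group_rates(y_true=[1, 1, 1, 0, 0, 0, 0, 1],
                   y_pred=[1, 1, 0, 0, 0, 1, 0, 1])
unpriv = group_rates(y_true=[1, 1, 0, 0, 0, 0, 1, 1],
                     y_pred=[1, 0, 0, 0, 0, 0, 0, 1])

# Differences between groups (privileged minus unprivileged, by assumption)
demographic_parity = priv["positive_rate"] - unpriv["positive_rate"]
equal_opportunity = priv["tpr"] - unpriv["tpr"]
equal_odds_tnr = priv["tnr"] - unpriv["tnr"]

print(demographic_parity, equal_opportunity, equal_odds_tnr)
```

A scan then compares each difference against the configured threshold range, either for the whole sample or for a segment of it.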

# SET-UP

In [1]:
import etiq


Thanks for trying out the ETIQ.ai toolkit!

Visit our getting started documentation at https://docs.etiq.ai/

Visit our Slack channel at https://etiqcore.slack.com/ for support or feedback.



In [2]:
from etiq import login as etiq_login
# Replace the placeholder with your own dashboard token; avoid committing real tokens to shared notebooks
etiq_login("https://dashboard.etiq.ai/", "<your-dashboard-token>")


(Dashboard supplied updated license information)


Connection successful. Projects and pipelines will be displayed in the dashboard. 😀

In [3]:
# Can get/create a single named project
project = etiq.projects.open(name="Bias RCA Scans")

# SNAPSHOT 1: xgboost, pre-configured model


To illustrate some of the library's features, we build a model that predicts whether an applicant makes over or under 50K using the Adult dataset from https://archive.ics.uci.edu/ml/datasets/adult.


First, we'll be encoding the categorical features found in this dataset.

Second, we'll log the dataset to Etiq.

In this case we encode before splitting into train/validation/test because we know in advance every category that can occur in this dataset, so production data will not contain categories that are missing from it.

If that does not hold for your use case, do NOT encode before splitting your sample, as this can lead to LEAKAGE. Label encoding is also problematic in itself, because it imposes an arbitrary numerical order on categorical variables. As a best practice, use one-hot encoding instead; label encoding is used here for example purposes only.
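
As a minimal illustration of the safer route, the sketch below learns a one-hot encoding from the training rows only and then applies it to unseen rows. It is plain Python with hypothetical values and helper names. In practice a library encoder such as scikit-learn's `OneHotEncoder(handle_unknown="ignore")` plays the same role.

```python
def fit_categories(values):
    """Learn the sorted list of categories from the training split only."""
    return sorted(set(values))


def one_hot(values, categories):
    """Encode each value as a 0/1 vector; unseen categories become all zeros."""
    return [[int(v == c) for c in categories] for v in values]


# Hypothetical column values; "Never-worked" does not appear in the training rows
train_workclass = ["Private", "Local-gov", "Private", "Self-emp-inc"]
test_workclass = ["Local-gov", "Never-worked"]

categories = fit_categories(train_workclass)    # fitted on train only
train_encoded = one_hot(train_workclass, categories)
test_encoded = one_hot(test_workclass, categories)

print(categories)
print(test_encoded)
```

Because the categories are learned from the training split alone, nothing about the test rows leaks into the encoding, and no artificial ordering is imposed on the categories.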

Once the model is built, we will:
 - Log the relevant config to etiq 
 - Log the model together with the hold-out sample to etiq
 - Run the Bias RCA scan



## Model Build 

In [4]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from xgboost.sklearn import XGBClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings('ignore')

# Loading a dataset. We're using the adult dataset
data = etiq.utils.load_sample("adultdata")
data.head()



| | age | workclass | fnlwgt | education | educational-num | marital-status | occupation | relationship | race | gender | capital-gain | capital-loss | hours-per-week | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | Private | 226802 | 11th | 7 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 1 | 38 | Private | 89814 | HS-grad | 9 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K |
| 2 | 28 | Local-gov | 336951 | Assoc-acdm | 12 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 3 | 44 | Private | 160323 | Some-college | 10 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688 | 0 | 40 | United-States | >50K |
| 4 | 18 | ? | 103497 | Some-college | 10 | Never-married | ? | Own-child | White | Female | 0 | 0 | 30 | United-States | <=50K |


In [5]:
# use a LabelEncoder to transform categorical variables
cont_vars = ['age', 'educational-num', 'fnlwgt', 'capital-gain', 'capital-loss', 'hours-per-week']
cat_vars = list(set(data.columns.values) - set(cont_vars))

label_encoders = {}
data_encoded = pd.DataFrame()
for i in cat_vars:
    label = LabelEncoder()
    data_encoded[i] = label.fit_transform(data[i])
    label_encoders[i] = label

data_encoded.set_index(data.index, inplace=True)
data_encoded = pd.concat([data.loc[:, cont_vars], data_encoded], axis=1).copy()



In [6]:
# prepare the training/testing/validation datasets

# separate into train/validate/test datasets of sizes 80%/10%/10%, as percentages of the initial data
data_remaining, test = train_test_split(data_encoded, test_size=0.1)
train, valid = train_test_split(data_remaining, test_size=0.1112)

# because we don't want to train on protected attributes or labels to be predicted, 
# let's remove these columns from the training dataset
protected_train = train['gender'].copy() # gender is a protected attribute
y_train = train['income'].copy() # labels we're going to train the model to predict
x_train = train.drop(columns=['gender','income'])
protected_valid = valid['gender'].copy() 
y_valid = valid['income'].copy() 
x_valid = valid.drop(columns=['gender','income'])
protected_test = test['gender'].copy() 
y_test = test['income'].copy()
x_test = test.drop(columns=['gender','income'])
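
The two-stage split in the cell above yields roughly 80%/10%/10% because the second split takes 11.12% of the remaining 90%. A quick check of the arithmetic follows (the variable names are illustrative; actual row counts from `train_test_split` can differ slightly because of rounding):

```python
test_frac = 0.1                   # first split: share held out as test
valid_frac_of_remaining = 0.1112  # second split: share of the remaining data

test_share = test_frac
valid_share = (1 - test_frac) * valid_frac_of_remaining
train_share = 1 - test_share - valid_share

print(f"train={train_share:.4f} valid={valid_share:.4f} test={test_share:.4f}")
```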

In [7]:
# train an XGBoost model to predict 'income'

standard_model = XGBClassifier(use_label_encoder=False, eval_metric='logloss', random_state=4)    
model_fit = standard_model.fit(x_train, y_train)

In [8]:
y_train_pred = standard_model.predict(x_train)
y_valid_pred = standard_model.predict(x_valid)
print('Model accuracy on the training dataset :', 
      round(100 * accuracy_score(y_train, y_train_pred),2),'%') # round the score to 2 digits  

print('Model accuracy on the validation dataset :', 
      round(100 * accuracy_score(y_valid, y_valid_pred),2),'%')

Model accuracy on the training dataset : 90.24 %
Model accuracy on the validation dataset : 87.3 %


## Log config, dataset and model to Etiq

For already-trained models, make sure you only use a sample you held out.

As you don't want any retraining of the model to occur, set your train_valid_test split to [0.0, 1.0, 0.0]. 

In this instance we have used a low volume for the minimum segment size to illustrate the issues, but you should probably set it higher, depending on your use case and sample size. Also we have set the threshold for bias low so as to make sure it finds at least one issue.

In [9]:
from etiq import Model

with etiq.etiq_config("./config_bias_rca.json"):
    
    # Log your dataset (the held-out sample only)
    dataset = etiq.BiasDatasetBuilder.dataset(test)

    #Log your already trained model
    model = Model(model_fitted=model_fit)

    # Create a snapshot
    snapshot = project.snapshots.create(name="Snapshot 1", 
                                        dataset=dataset, 
                                        model=model, 
                                        bias_params=etiq.biasparams.BiasParams(protected='gender', privileged=1, unprivileged=0, positive_outcome_label=1, negative_outcome_label=0)
                                       )
    
    # Bias RCA scan
    (segments_bias, issues_bias, issue_summary_bias) = snapshot.scan_bias_metrics_rca()


INFO:etiq.pipeline.IdentifyPipeline0972:Starting pipeline
INFO:etiq.pipeline.IdentifyPipeline0972:Metric = demographic_parity
INFO:etiq.pipeline.IdentifyPipeline0972:Searching for segments above the threshold 0.3
INFO:etiq.pipeline.IdentifyPipeline0972:Completed pipeline


In [10]:
issue_summary_bias

| | name | metric | measure | features | segments | total_issues_tested | issues_found | threshold |
|---|---|---|---|---|---|---|---|---|
| 0 | demographic_parity_above_threshold | `<compiled_function demographic_parity at 0x7f8...` | | {} | {8} | 9 | 1 | [0.0, 0.3] |


In [11]:
issues_bias

| | name | feature | segment | measure | measure_value | metric | metric_value | threshold |
|---|---|---|---|---|---|---|---|---|
| 0 | demographic_parity_above_threshold | | 8 | | | `<compiled_function demographic_parity at 0x7f8...` | 0.432409 | [0.0, 0.3] |


In [12]:
segments_bias

| | name | business_rule | mask |
|---|---|---|---|
| 0 | 0 | all | [True, True, True, True, True, True, True, Tru... |
| 1 | 1 | `capital-gain` <= 0.0 | [True, True, True, True, True, True, True, Tru... |
| 2 | 2 | `capital-gain` <= 0.0 and `capital-loss` <= 0.0 | [True, True, True, True, True, True, True, Tru... |
| 3 | 3 | `capital-gain` <= 0.0 and `capital-loss` <= 0.... | [False, True, False, False, False, False, True... |
| 4 | 4 | `capital-gain` <= 0.0 and `capital-loss` <= 0.... | [False, True, False, False, False, False, True... |
| 5 | 5 | `capital-gain` <= 0.0 and `capital-loss` <= 0.... | [False, False, False, False, False, False, Tru... |
| 6 | 6 | `capital-gain` <= 0.0 and `capital-loss` <= 0.... | [False, True, False, False, False, False, Fals... |
| 7 | 7 | `capital-gain` <= 0.0 and `capital-loss` <= 0.... | [False, True, False, False, False, False, Fals... |
| 8 | 8 | `capital-gain` <= 0.0 and `capital-loss` <= 0.... | [True, False, True, True, True, True, False, T... |
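
Each `business_rule` in the output above is written with backtick-quoted column names, which is also the syntax that pandas `DataFrame.query` accepts. Assuming a rule string can be passed to `query` unchanged (an assumption, not something confirmed by the output shown here), the rows of a segment can be inspected as in this sketch on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy data using column names from the Adult dataset
toy = pd.DataFrame({
    "capital-gain": [0, 7688, 0, 0],
    "capital-loss": [0, 0, 1902, 0],
    "gender": [1, 1, 0, 0],
})

# Rule text copied from segment 2 in the table above
rule = "`capital-gain` <= 0.0 and `capital-loss` <= 0.0"

# Backticks let query() refer to column names that contain hyphens
segment_rows = toy.query(rule)
print(segment_rows.index.tolist())
```

The `mask` column offers an alternative route: it is a boolean list aligned with the rows of the scanned dataset, so it can be used for boolean indexing on that dataset.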
