# Notebook Summary 


### Quickstart

  1. Import the etiq library. For installation instructions, see our docs (https://docs.etiq.ai/).

  2. Log in to the dashboard so that you can send results to your dashboard instance (the Etiq AWS instance if you use the SaaS version). To deploy on your own cloud instance, get in touch (info@etiq.ai).

  3. Create or open a project 
  
  
### Bias Metrics & Bias Sources Scans 


  4. Load the Adult dataset
  
  5. Load your config file and create your snapshot based on an Etiq-wrapped XGBoost model
  
  6. Scan for bias issues 
  
  7. Scan for bias sources in the training dataset



# What is bias, and why does it matter?

In this context, bias means algorithmic bias: unintended discrimination occurring as a result of an automated decision.

Legislation defines a series of protected features. For example, in the UK the Equality Act 2010 protects citizens against discrimination on the basis of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, and sexual orientation.
The unprivileged group within the protected feature (for example, people over 65 when age is the protected feature) tends to be discriminated against and, as a result, tends to be the one protected by legislation. The privileged group within the protected feature tends not to be discriminated against.

If you do not tackle this issue, your model is potentially unethical, unintentionally discriminatory and a compliance risk. You may also be leaving customer groups underserved, and thus leaving money on the table.



# SET-UP

In [1]:
import etiq


Thanks for trying out the ETIQ.ai toolkit!

Visit our getting started documentation at https://docs.etiq.ai/

Visit our Slack channel at https://etiqcore.slack.com/ for support or feedback.



In [2]:
from etiq import login as etiq_login
etiq_login("https://dashboard.etiq.ai/", "<token>")


(Dashboard supplied updated license information)


Connection successful. Projects and pipelines will be displayed in the dashboard. 😀

In [3]:
# Can enumerate all available projects
all_projects = etiq.projects.get_all_projects()
print(all_projects)

[<ETIQ:Project [1] "Demo Project">, <ETIQ:Project [2] "Test Project April 1">, <ETIQ:Project [3] "Test Project April 2">, <ETIQ:Project [4] "COMPAS">, <ETIQ:Project [5] "Bias Scan">, <ETIQ:Project [6] "RCA Scans">, <ETIQ:Project [7] "Doug's Data">, <ETIQ:Project [8] "Test Project April 3">, <ETIQ:Project [9] "Test Project April 4">, <ETIQ:Project [10] "Test Project April 5">, <ETIQ:Project [11] "Custom Metrics Scans">, <ETIQ:Project [12] "Accuracy Scans">, <ETIQ:Project [13] "Doug Test Project 2">]


In [4]:
# Can get/create a single named project
project = etiq.projects.open(name="Bias Scan")

# SNAPSHOT: xgboost, pre-configured model


To illustrate some of the library's features, we build a model that predicts whether an applicant makes over or under 50K using the Adult dataset from https://archive.ics.uci.edu/ml/datasets/adult.


First, we'll be encoding the categorical features found in this dataset.

Second, we'll log the dataset to Etiq.

In this case we encode prior to splitting into train/validation/test sets because we know in advance all the categories that people fall into for this dataset. This means that in production we won't run into new categories that are missing from the encoding.

However, if this does not hold for your use case, you should NOT encode prior to splitting your sample, as this might lead to LEAKAGE.
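
As a minimal sketch of the safer order (split first, then fit the encoder on the training split only), the plain-Python example below uses hypothetical helper names (`fit_label_encoder`, `transform`) and a toy column. It is not the Etiq `LabelEncoder`; it only illustrates how an unseen category can be handled explicitly instead of leaking into the encoding.

```python
def fit_label_encoder(values):
    """Map each distinct category seen in `values` to an integer code."""
    return {cat: code for code, cat in enumerate(sorted(set(values)))}


def transform(values, mapping, unknown=-1):
    """Encode values; categories not seen during fitting get `unknown`."""
    return [mapping.get(v, unknown) for v in values]


# Toy column (hypothetical data); "Without-pay" only appears in the test rows.
rows = ["Private", "Local-gov", "Private", "Private", "Local-gov",
        "Private", "Private", "Local-gov", "Private", "Without-pay"]

# Split first ...
train, test = rows[:8], rows[8:]

# ... then fit the encoder on the training split only.
mapping = fit_label_encoder(train)
train_encoded = transform(train, mapping)
test_encoded = transform(test, mapping)

print(mapping)       # {'Local-gov': 0, 'Private': 1}
print(test_encoded)  # [1, -1]  ("Without-pay" was never seen in training)
```

The `-1` code makes unseen categories visible downstream rather than silently assigning them a code learned from data the model should not have seen.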

Label encoding is itself problematic because it assigns an arbitrary numerical ranking to categorical variables. Best practice is to use one-hot encoding. As the free library functionality is limited to 15 features, we will not do one-hot encoding for the purposes of this example.

Remember: This is an example only. The use case for the majority of scans in Etiq is that you log the model to Etiq once you have the sample that you'll be training on. Usually this sample will have numeric features only, as otherwise you will not be able to use it with the training methods of most supported libraries.
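
To make the difference concrete, here is a small plain-Python sketch comparing the two encodings on a toy column. The helper names are hypothetical and are not part of the Etiq library.

```python
def label_encode(values):
    """Replace each category with an integer; the order is alphabetical, hence arbitrary."""
    categories = sorted(set(values))
    mapping = {cat: i for i, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Replace each category with one 0/1 column per category (no implied order)."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


occupation = ["Sales", "Tech-support", "Farming-fishing", "Sales"]

codes, mapping = label_encode(occupation)
print(codes)    # [1, 2, 0, 1]  -> implies Farming-fishing < Sales < Tech-support

rows, columns = one_hot_encode(occupation)
print(columns)  # ['Farming-fishing', 'Sales', 'Tech-support']
print(rows)     # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

Label encoding keeps one column per feature, whereas one-hot encoding creates one column per category, which is why it quickly exceeds a 15-feature limit.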

In [5]:
# Loading a dataset. We're using the adult dataset
data = etiq.utils.load_sample("adultdata")
data.head()


Unnamed: 0,age,workclass,fnlwgt,education,educational-num,marital-status,occupation,relationship,race,gender,capital-gain,capital-loss,hours-per-week,native-country,income
0,25,Private,226802,11th,7,Never-married,Machine-op-inspct,Own-child,Black,Male,0,0,40,United-States,<=50K
1,38,Private,89814,HS-grad,9,Married-civ-spouse,Farming-fishing,Husband,White,Male,0,0,50,United-States,<=50K
2,28,Local-gov,336951,Assoc-acdm,12,Married-civ-spouse,Protective-serv,Husband,White,Male,0,0,40,United-States,>50K
3,44,Private,160323,Some-college,10,Married-civ-spouse,Machine-op-inspct,Husband,Black,Male,7688,0,40,United-States,>50K
4,18,?,103497,Some-college,10,Never-married,?,Own-child,White,Female,0,0,30,United-States,<=50K


In [6]:
from etiq.transforms import LabelEncoder
import pandas as pd
import numpy as np 

# use a LabelEncoder to transform categorical variables
cont_vars = ['age', 'educational-num', 'fnlwgt', 'capital-gain', 'capital-loss', 'hours-per-week']
cat_vars = list(set(data.columns.values) - set(cont_vars))

label_encoders = {}
data_encoded = pd.DataFrame()
for i in cat_vars:
    label = LabelEncoder()
    data_encoded[i] = label.fit_transform(data[i])
    label_encoders[i] = label

data_encoded.set_index(data.index, inplace=True)
data_encoded = pd.concat([data.loc[:, cont_vars], data_encoded], axis=1).copy()


## Loading the config file

In [7]:
# XXX: Make per-project.
etiq.load_config("./config_bias_scans.json")


{'dataset': {'label': 'income',
  'bias_params': {'protected': 'gender',
   'privileged': 1,
   'unprivileged': 0,
   'positive_outcome_label': 1,
   'negative_outcome_label': 0},
  'train_valid_test_splits': [0.8, 0.1, 0.1],
  'remove_protected_from_features': True,
  'cat_col': ['workclass',
   'relationship',
   'occupation',
   'gender',
   'race',
   'native-country',
   'marital-status',
   'income',
   'education'],
  'cont_col': ['age',
   'educational-num',
   'fnlwgt',
   'capital-gain',
   'capital-loss',
   'hours-per-week']},
 'scan_bias_metrics': {'thresholds': {'equal_opportunity': [0.0, 0.2],
   'demographic_parity': [0.0, 0.2],
   'equal_odds_tnr': [0.0, 0.2],
   'individual_fairness': [0.0, 0.2],
   'equal_odds_tpr': [0.0, 0.2]}},
 'scan_bias_sources': {'auto': True}}
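
The JSON file itself is not shown in this notebook. The sketch below assumes that `config_bias_scans.json` simply mirrors the dictionary printed above, and builds the equivalent JSON text with the standard `json` module, plus two basic sanity checks. The exact schema Etiq accepts may differ, so treat the keys as taken from the output above rather than from a specification.

```python
import json

# Assumption: the config file has the same structure as the printed dictionary.
config = {
    "dataset": {
        "label": "income",
        "bias_params": {
            "protected": "gender",
            "privileged": 1,
            "unprivileged": 0,
            "positive_outcome_label": 1,
            "negative_outcome_label": 0,
        },
        "train_valid_test_splits": [0.8, 0.1, 0.1],
        "remove_protected_from_features": True,
        "cat_col": ["workclass", "relationship", "occupation", "gender", "race",
                    "native-country", "marital-status", "income", "education"],
        "cont_col": ["age", "educational-num", "fnlwgt", "capital-gain",
                     "capital-loss", "hours-per-week"],
    },
    "scan_bias_metrics": {
        "thresholds": {
            "equal_opportunity": [0.0, 0.2],
            "demographic_parity": [0.0, 0.2],
            "equal_odds_tnr": [0.0, 0.2],
            "individual_fairness": [0.0, 0.2],
            "equal_odds_tpr": [0.0, 0.2],
        }
    },
    "scan_bias_sources": {"auto": True},
}

# Basic sanity checks before saving the file.
splits = config["dataset"]["train_valid_test_splits"]
splits_ok = abs(sum(splits) - 1.0) < 1e-9
protected_ok = config["dataset"]["bias_params"]["protected"] in config["dataset"]["cat_col"]

config_json = json.dumps(config, indent=2)
print(config_json[:60])
```

Writing `config_json` to `./config_bias_scans.json` would then give a file with the structure shown in the output above.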

## Logging the snapshot to Etiq 

This can happen at any point in the pipeline and in a variety of ways.

In [8]:
#load your dataset

dataset_loader = etiq.dataset(data_encoded)

from etiq.model import DefaultXGBoostClassifier
# Load our model
model = DefaultXGBoostClassifier()

# Creating a snapshot
snapshot = project.snapshots.create(name="Test Bias", dataset=dataset_loader.initial_dataset, model=model, bias_params=dataset_loader.bias_params)


## Bias Metrics Scan

Some of the metrics commonly used in the algorithmic fairness literature that the Etiq library provides are:

- equal_opportunity: measures the difference in true positive rate between a privileged demographic group and an unprivileged demographic group.

- demographic_parity: measures the difference in the proportion of positive labels between a privileged demographic group and an unprivileged demographic group.

- equal_odds_tpr & equal_odds_tnr: unlike equal_opportunity, this criterion looks at both the difference in true positive rate and the difference in true negative rate between the privileged and unprivileged groups, with the aim of keeping both differences minimal.

- individual_fairness: measures whether individuals with similar features receive the same model responses.
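
The group-level metrics in the list above can be sketched in plain Python as follows. This is an illustration on toy data, not Etiq's implementation. It assumes each metric is the privileged rate minus the unprivileged rate and that demographic parity is computed on predicted labels; the library may use a different sign or normalisation. individual_fairness is omitted because it needs a similarity measure between records.

```python
def _rate(numerator, denominator):
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def group_rates(y_true, y_pred, group, value):
    """Positive prediction rate, TPR and TNR for the rows where group == value."""
    idx = [i for i, g in enumerate(group) if g == value]
    tp = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 1)
    fn = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 0)
    tn = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 0)
    fp = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 1)
    return {
        "positive_rate": _rate(tp + fp, len(idx)),
        "tpr": _rate(tp, tp + fn),
        "tnr": _rate(tn, tn + fp),
    }


def bias_metrics(y_true, y_pred, group, privileged=1, unprivileged=0):
    """Differences between the privileged and unprivileged group rates."""
    p = group_rates(y_true, y_pred, group, privileged)
    u = group_rates(y_true, y_pred, group, unprivileged)
    return {
        "demographic_parity": p["positive_rate"] - u["positive_rate"],
        "equal_opportunity": p["tpr"] - u["tpr"],
        "equal_odds_tpr": p["tpr"] - u["tpr"],
        "equal_odds_tnr": p["tnr"] - u["tnr"],
    }


# Toy data: 4 privileged rows (group == 1) followed by 4 unprivileged rows.
group = [1, 1, 1, 1, 0, 0, 0, 0]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

metrics = bias_metrics(y_true, y_pred, group)
print(metrics)
# {'demographic_parity': 0.5, 'equal_opportunity': 0.5,
#  'equal_odds_tpr': 0.5, 'equal_odds_tnr': -0.5}
```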

Our Bias Metrics scan checks the metrics above against user-defined thresholds to see whether the model meets each benchmark.

The thresholds are set by the user, but most metrics should ideally be as close to 0 as possible, meaning that the model should not behave differently (with detrimental outcomes) for the protected groups.
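
A minimal sketch of such a threshold check is shown below. It assumes that each threshold pair is a `[lower, upper]` interval and that a value outside the interval raises a "below" or "above" issue. How Etiq actually applies the thresholds is not shown in this notebook, and the metric values used here are hypothetical.

```python
def check_threshold(name, value, threshold):
    """Return below/above flags for one metric value against [lower, upper]."""
    lower, upper = threshold
    return {
        f"{name}_below_threshold": value < lower,
        f"{name}_above_threshold": value > upper,
    }


thresholds = {
    "equal_opportunity": [0.0, 0.2],
    "demographic_parity": [0.0, 0.2],
}

# Hypothetical metric values, for illustration only.
values = {"equal_opportunity": 0.05, "demographic_parity": 0.31}

issues = {}
for name, value in values.items():
    issues.update(check_threshold(name, value, thresholds[name]))

flagged = sorted(key for key, is_issue in issues.items() if is_issue)
print(flagged)  # ['demographic_parity_above_threshold']
```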

The consensus in the literature (and our view) is that algorithmic bias can be mitigated but not removed entirely.


In [9]:
(segments, issues, issue_summary) = snapshot.scan_bias_metrics()

INFO:etiq.pipeline.BiasMetricsIssuePipeline0579:Starting pipeline
INFO:etiq.pipeline.BiasMetricsIssuePipeline0579:Computed bias metrics for the dataset
INFO:etiq.pipeline.BiasMetricsIssuePipeline0579:Completed pipeline


In [10]:
issue_summary

Unnamed: 0,name,metric,measure,features,segments,total_issues_tested,issues_found,threshold
0,demographic_parity_below_threshold,<function demographic_parity at 0x7f6b5213c700>,,{},{},1,0,"[0.0, 0.2]"
1,demographic_parity_above_threshold,<function demographic_parity at 0x7f6b5213c700>,,{},{},1,0,"[0.0, 0.2]"
2,equal_odds_tpr_below_threshold,<function equal_odds_tpr at 0x7f6b5213c790>,,{},{},1,0,"[0.0, 0.2]"
3,equal_odds_tpr_above_threshold,<function equal_odds_tpr at 0x7f6b5213c790>,,{},{},1,0,"[0.0, 0.2]"
4,equal_odds_tnr_below_threshold,<function equal_odds_tnr at 0x7f6b5213c820>,,{},{},1,0,"[0.0, 0.2]"
5,equal_odds_tnr_above_threshold,<function equal_odds_tnr at 0x7f6b5213c820>,,{},{},1,0,"[0.0, 0.2]"
6,equal_opportunity_below_threshold,<function equal_opportunity at 0x7f6b5213c8b0>,,{},{},1,0,"[0.0, 0.2]"
7,equal_opportunity_above_threshold,<function equal_opportunity at 0x7f6b5213c8b0>,,{},{},1,0,"[0.0, 0.2]"
8,individual_fairness_below_threshold,<function individual_fairness at 0x7f6b5213caf0>,,{},{},1,0,"[0.0, 0.2]"
9,individual_fairness_above_threshold,<function individual_fairness at 0x7f6b5213caf0>,,{},{},1,0,"[0.0, 0.2]"


## Bias Sources 



Our Bias Sources scan identifies potential sources of bias based on a framework that includes: 

- proxies - features that identify (act as stand-ins for) the protected feature
- sample size disparity - difference in sample sizes and in the number of positive/negative labels between the protected demographic group and the majority demographic group
- segment size - whether some customer profiles are poorly represented in your sample
- limited features/correlation issue - features are less reliable for a certain demographic group, which is often linked with sampling, but more fundamentally it could be that some groups' behaviour is less well encoded by the available features
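
As an illustration of the proxy check, the sketch below computes Pearson's correlation between each feature and the protected feature and flags features whose absolute correlation exceeds 0.5. The data, the helper names and the use of the absolute value are assumptions made for this example; it is not the library's implementation.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def find_proxies(features, protected, threshold=0.5):
    """Features whose |correlation| with the protected feature exceeds the threshold."""
    proxies = {}
    for name, column in features.items():
        r = pearson(column, protected)
        if abs(r) > threshold:
            proxies[name] = round(r, 3)
    return proxies


# Toy data (hypothetical): encoded gender and two candidate features.
gender = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "relationship": [0, 0, 0, 1, 1, 1, 1, 1],
    "hours-per-week": [40, 50, 35, 45, 40, 50, 35, 45],
}

proxies = find_proxies(features, gender)
print(proxies)  # {'relationship': -0.775}
```

Here `relationship` tracks `gender` closely enough to act as a stand-in for it, while `hours-per-week` has the same distribution in both groups and is not flagged.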


It can be useful to look at these metrics globally to uncover issues across your sample. But many of the issues will only be visible for specific groups or specific records. The Bias Sources scan aims to identify which groups have the issues above.


The Bias Sources scan is run on the training dataset by default, as this is where your model learns the potentially harmful, unfairly discriminatory pattern.

You have two options for running Bias Sources scans:

1) If you don't set anything in the config, the segments will be fuzzy rather than defined by business rules.

2) If you set the option auto in the config (as in the config we are using), the segments will be based on business rules.

If you use the auto option, you will need to specify the categorical and continuous features. You can do this either from the config, as in this case, or from the notebook (see last cell).
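
The toy example below shows why a global view can hide a segment-level problem: the unprivileged group makes up 40% of the whole sample but only 20% of one segment. The data and the helper name `unprivileged_share` are hypothetical.

```python
def unprivileged_share(protected, mask=None, unprivileged=0):
    """Share of unprivileged records, optionally restricted to a boolean segment mask."""
    selected = [p for i, p in enumerate(protected) if mask is None or mask[i]]
    if not selected:
        return None  # empty segment: nothing to measure
    return sum(1 for p in selected if p == unprivileged) / len(selected)


# Encoded protected feature for 10 records (0 = unprivileged, 1 = privileged).
protected = [0, 1, 0, 1, 0, 1, 1, 1, 1, 0]

# A segment covering the last 5 records.
segment_mask = [False, False, False, False, False, True, True, True, True, True]

global_share = unprivileged_share(protected)
segment_share = unprivileged_share(protected, segment_mask)

print(global_share)   # 0.4
print(segment_share)  # 0.2
```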


At the moment we only provide Pearson's correlation which means there is an issue with calculating correlations for known categorical features. This is on our roadmap as a priority fix. 
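
To illustrate the issue, the sketch below computes Pearson's correlation between a label-encoded categorical feature and a binary outcome under two different, equally valid integer encodings. The data and encodings are hypothetical. The result changes sign purely because of how the categories were numbered.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


occupation = ["Sales", "Tech", "Farming", "Sales", "Tech", "Farming"]
income = [0, 1, 0, 0, 1, 0]

# Two arbitrary integer encodings of the same categories.
encoding_a = {"Farming": 0, "Sales": 1, "Tech": 2}
encoding_b = {"Tech": 0, "Farming": 1, "Sales": 2}

r_a = pearson([encoding_a[o] for o in occupation], income)
r_b = pearson([encoding_b[o] for o in occupation], income)

print(round(r_a, 3))  # 0.866
print(round(r_b, 3))  # -0.866
```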

In [11]:
(segments, issues, issue_summary) = snapshot.scan_bias_sources()

INFO:etiq.pipeline.DataPipeline0513:Starting pipeline
INFO:etiq.pipeline.DataPipeline0513:Computed metrics for the initial dataset
INFO:etiq.pipeline.DataPipeline0513:Completed pipeline
INFO:etiq.pipeline.DebiasPipeline0851:Starting pipeline
INFO:etiq.pipeline.DebiasPipeline0851:Start Phase IdentifyPipeline0387
INFO:etiq.pipeline.IdentifyPipeline0387:Using parent model
INFO:etiq.pipeline.IdentifyPipeline0387:Starting pipeline
INFO:etiq.pipeline.IdentifyPipeline0387:Checking proxy for feature age
INFO:etiq.pipeline.IdentifyPipeline0387:Checking correlation for feature age
INFO:etiq.pipeline.IdentifyPipeline0387:Checking proxy for feature educational-num
INFO:etiq.pipeline.IdentifyPipeline0387:Checking correlation for feature educational-num
INFO:etiq.pipeline.IdentifyPipeline0387:Checking proxy for feature fnlwgt
INFO:etiq.pipeline.IdentifyPipeline0387:Checking correlation for feature fnlwgt
INFO:etiq.pipeline.IdentifyPipeline0387:Checking proxy for feature capital-gain
INFO:etiq.pipeli


INFO:etiq.pipeline.IdentifyPipeline0387:Completed pipeline
INFO:etiq.pipeline.DebiasPipeline0851:Completed Phase IdentifyPipeline0387
INFO:etiq.pipeline.DebiasPipeline0851:Computed metrics for the initial dataset
INFO:etiq.pipeline.DebiasPipeline0851:Completed pipeline


In [12]:
issues

Unnamed: 0,name,feature,segment,measure,measure_value,metric,metric_value,threshold
0,low_unpriv_sample,,0.0,,,,,"(0.0, 0.8)"
1,low_unpriv_sample,,1.0,,,,,"(0.0, 0.8)"
2,low_unpriv_sample,,2.0,,,,,"(0.0, 0.8)"
3,low_unpriv_sample,,3.0,,,,,"(0.0, 0.8)"
4,low_unpriv_sample,,4.0,,,,,"(0.0, 0.8)"
...,...,...,...,...,...,...,...,...
197,correlation_issue,race,20.0,<function corrcoef at 0x7f6b83ff0ca0>,,,,"(0.0, 0.2)"
198,correlation_issue,relationship,20.0,<function corrcoef at 0x7f6b83ff0ca0>,,,,"(0.0, 0.2)"
199,limited_features_issue,,3.0,,,<function equal_opportunity at 0x7f6b5213c8b0>,0.300000,"(0.0, 0.2)"
200,limited_features_issue,,4.0,,,<function equal_opportunity at 0x7f6b5213c8b0>,0.234568,"(0.0, 0.2)"


In [13]:
issue_summary

Unnamed: 0,name,metric,measure,features,segments,total_issues_tested,issues_found,threshold
0,missing_sample,,,{},"{5, 6, 7, 19, 20}",21,5,"(0.0, 0.0)"
1,low_unpriv_sample,,,{},"{0, 1, 2, 3, 4, 8, 9, 10, 11, 12, 13, 14, 15, ...",16,16,"(0.0, 0.8)"
2,low_priv_sample,,,{},{},16,0,"(0.0, 0.8)"
3,skewed_priv_sample,,,{},{},14,0,"(0.0, 0.2)"
4,skewed_unpriv_sample,,,{},"{10, 4}",16,2,"(0.0, 0.2)"
5,proxy_issue,,<function corrcoef at 0x7f6b83ff0ca0>,{relationship},"{3, 4, 8, 10, 12, 16, 17, 18}",273,8,"(0.0, 0.5)"
6,correlation_issue,,<function corrcoef at 0x7f6b83ff0ca0>,"{educational-num, age, marital-status, capital...","{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...",273,168,"(0.0, 0.2)"
7,low_volume_group,,,{},{},21,0,"(1000, inf)"
8,limited_features_issue,<function equal_opportunity at 0x7f6b5213c8b0>,,{},"{10.0, 3.0, 4.0}",21,3,"(0.0, 0.2)"


In [14]:
pd.set_option('display.max_colwidth', None)

In [15]:
segments

Unnamed: 0,name,business_rule,mask
0,0,`native-country` == 39 and `occupation` == 6,"[False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, ...]"
1,1,`native-country` == 39 and `occupation` == 1 and `education` == 11,"[False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, ...]"
2,2,`native-country` == 39 and `occupation` == 14 and `workclass` == 4,"[False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, False, False, False, False, True, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, ...]"
3,3,`native-country` == 39 and `occupation` == 7 and `workclass` == 4 and `education` == 11,"[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, ...]"
4,4,`native-country` == 39 and `occupation` == 13,"[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, True, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, True, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, True, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, ...]"
5,5,`occupation` == 4 and `race` == 4 and `native-country` == 39 and `workclass` == 4 and `relationship` == 0,"[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, ...]"
6,6,`occupation` == 3 and `workclass` == 4 and `marital-status` == 2 and `native-country` == 39 and `relationship` == 0,"[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, True, False, False, False, False, ...]"
7,7,`occupation` == 10 and `native-country` == 39 and `marital-status` == 2 and `relationship` == 0,"[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, ...]"
8,8,`native-country` == 39 and `education` == 11 and `race` == 2,"[False, False, True, False, True, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, ...]"
9,9,`native-country` == 39 and `education` == 11 and `race` == 4 and `occupation` == 3,"[False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, True, False, False, False, False, ...]"
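
Each row above pairs a business rule with a boolean mask over the records. As a rough plain-Python sketch of that relationship, the example below represents a rule as a dictionary of column/value equalities and evaluates it on a few hypothetical encoded records. This is only an illustration; it is not how the library builds its segments.

```python
def apply_rule(rows, rule):
    """Return one boolean per row: True when every column == value condition holds."""
    return [all(row.get(col) == val for col, val in rule.items()) for row in rows]


# Hypothetical encoded records.
rows = [
    {"native-country": 39, "occupation": 6, "education": 11},
    {"native-country": 39, "occupation": 1, "education": 11},
    {"native-country": 5, "occupation": 6, "education": 9},
    {"native-country": 39, "occupation": 6, "education": 15},
]

# Equivalent to the rule: `native-country` == 39 and `occupation` == 6
rule = {"native-country": 39, "occupation": 6}

mask = apply_rule(rows, rule)
segment_size = sum(mask)

print(mask)          # [True, False, False, True]
print(segment_size)  # 2
```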


You can also specify the categorical and continuous features directly in the notebook, as in the example below:

In [None]:
# Load your dataset
# For bias sources you currently need either the extra syntax below or
# the categorical and continuous features set up in the config

dataset_loader = etiq.dataset(data_encoded)
dl = etiq.dataset_loader.DatasetLoader(data=data_encoded, label='income', bias_params=dataset_loader.bias_params,
                   train_valid_test_splits=[0.8, 0.1, 0.1], cat_col=cat_vars,
                   cont_col=cont_vars, names_col=data_encoded.columns.values)

from etiq.model import DefaultXGBoostClassifier
# Load our model
model = DefaultXGBoostClassifier()

# Creating a snapshot
snapshot = project.snapshots.create(name="Snapshot 2", dataset=dl.initial_dataset, model=model, bias_params=dataset_loader.bias_params)