### About
This notebook contains simple toy examples to help you get started with FairMLHealth. The same content is mirrored in the repository's main [README](../../../README.md).

### Example Setup

In [1]:
from fairmlhealth import model_comparison as fhmc, stratified_reports
from fairmlhealth.reports import flag

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.tree import DecisionTreeClassifier


# Load data
X = pd.DataFrame({'col1':[1,2,50,3,45,32], 'col2':[34,26,44,2,1,1],
                  'col3':[32,23,34,22,65,27], 
                  'gender':[0,1,0,1,1,0], 'ethnicity':[0,0,0,1,1,1]
                 })
y = pd.DataFrame({'y':[1,1,0,1,0,1]})
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.75, random_state=36)

# Train models
model_1 = BernoulliNB().fit(X_train, y_train)
model_2 = DecisionTreeClassifier().fit(X_train, y_train)

# Determine your set of protected attributes
prtc_attr = X_test['gender']

# Specify either a dict or a list of trained models to compare
model_dict = {'model_1': model_1, 'model_2': model_2}


In [2]:
display(X)

Unnamed: 0,col1,col2,col3,gender,ethnicity
0,1,34,32,0,0
1,2,26,23,1,0
2,50,44,34,0,0
3,3,2,22,1,1
4,45,1,65,1,1
5,32,1,27,0,1


### Model Measurement
The primary feature of this library is the model comparison tool. The current version supports assessment of binary prediction models through the measure_model and compare_models functions.

The measure_model function generates a report of multiple fairness metrics for a single model. Here its output is passed to the "flag" function, which highlights values that fall outside of the "fair" range.

In [3]:
# Generate a pandas dataframe of measures
fairness_measures = fhmc.measure_model(X_test, y_test, prtc_attr, model_1)
# Display and color measures that are out of range
flag(fairness_measures)

Metric,Measure,Value
Group Fairness,Statistical Parity Difference,0.0
Group Fairness,Disparate Impact Ratio,1.0
Group Fairness,Equalized Odds Difference,0.0
Group Fairness,Equalized Odds Ratio,1.0
Group Fairness,Positive Predictive Parity Difference,-0.1667
Group Fairness,Balanced Accuracy Difference,0.0
Group Fairness,Balanced Accuracy Ratio,1.0
Group Fairness,AUC Difference,0.0
Individual Fairness,Consistency Score,1.0
Individual Fairness,Between-Group Gen. Entropy Error,0.0017
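
As a sanity check on the table above, a few of the group fairness values can be reproduced by hand. The sketch below is plain Python and does not call FairMLHealth. It assumes that the test split holds the five rows with col1 values 1, 2, 50, 3, and 45, that model_1 predicts the positive class for every row, that gender == 1 is treated as the privileged group, and that differences are taken as unprivileged minus privileged. These are inferences from the outputs shown in this notebook, not documented library behavior.

```python
# Hand computation of three group fairness measures (illustrative only).
# Assumed test rows, listed in the row order of the toy data (rows 0..4).
y_true = [1, 1, 0, 1, 0]
y_pred = [1, 1, 1, 1, 1]   # assumption: model_1 predicts 1 everywhere
gender = [0, 1, 0, 1, 1]   # assumption: 1 = privileged, 0 = unprivileged


def subset(values, group, level):
    """Keep the entries of `values` whose group label equals `level`."""
    return [v for v, g in zip(values, group) if g == level]


def selection_rate(preds):
    """Share of observations predicted positive."""
    return sum(preds) / len(preds)


def ppv(trues, preds):
    """Precision: true positives over all predicted positives."""
    predicted_pos = [t for t, p in zip(trues, preds) if p == 1]
    return sum(predicted_pos) / len(predicted_pos)


unpriv_pred, priv_pred = subset(y_pred, gender, 0), subset(y_pred, gender, 1)
unpriv_true, priv_true = subset(y_true, gender, 0), subset(y_true, gender, 1)

stat_parity_diff = selection_rate(unpriv_pred) - selection_rate(priv_pred)
disparate_impact = selection_rate(unpriv_pred) / selection_rate(priv_pred)
ppv_diff = ppv(unpriv_true, unpriv_pred) - ppv(priv_true, priv_pred)

print(round(stat_parity_diff, 4), round(disparate_impact, 4), round(ppv_diff, 4))
# prints: 0.0 1.0 -0.1667
```

Under these assumptions the three values agree with the Statistical Parity Difference, Disparate Impact Ratio, and Positive Predictive Parity Difference rows reported above.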


### Evaluating

FairMLHealth now also includes stratified reporting features to aid in identifying the source of unfairness or other bias: a data_report, a classification_performance report, and a classification_fairness report. These stratified reports can evaluate multiple features at once, and there are two options for specifying which features to assess.

Note that the flag tool has not yet been updated to work with stratified reports.

#### Stratified Data Reports

The data reporter is shown below with each of the two data argument options. It evaluates basic statistics specific to each feature-value, in addition to relative statistics for the target value. Since the reporter can evaluate many features at once, it can be a useful option for identifying patterns of bias, either alone or in concert with other (e.g., visual) methods.

In [4]:
# Arguments Option 1: pass full set of data, subsetting with *features* argument
stratified_reports.data_report(X_test, y_test, features=['gender'])

Unnamed: 0,FEATURE,FEATURE VALUE,N OBS,N MISSING,FEATURE ENTROPY,VALUE PREVALENCE,Y MAX,Y MEAN,Y MEDIAN,Y MIN,Y STDV
0,ALL_FEATURES,ALL_VALUES,5.0,0,,1.0,1.0,0.6,1.0,0.0,0.5477
1,gender,0,2.0,0,0.971,0.4,1.0,0.5,0.5,0.0,0.7071
2,gender,1,3.0,0,0.971,0.6,1.0,0.6667,1.0,0.0,0.5774


In [5]:
# Arguments Option 2: pass the data subset of interest without using the *features* argument
stratified_reports.data_report(X_test[['gender']], y_test)

Unnamed: 0,FEATURE,FEATURE VALUE,N OBS,N MISSING,FEATURE ENTROPY,VALUE PREVALENCE,Y MAX,Y MEAN,Y MEDIAN,Y MIN,Y STDV
0,ALL_FEATURES,ALL_VALUES,5.0,0,,1.0,1.0,0.6,1.0,0.0,0.5477
1,gender,0,2.0,0,0.971,0.4,1.0,0.5,0.5,0.0,0.7071
2,gender,1,3.0,0,0.971,0.6,1.0,0.6667,1.0,0.0,0.5774
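
To make the columns of the data report concrete, the following sketch recomputes several of them for the gender feature using only the Python standard library. It is not the library's implementation. The test rows are assumed to be rows 0 through 4 of the toy data, and the use of base-2 Shannon entropy and the sample standard deviation is inferred from the values in the tables above.

```python
import math
import statistics
from collections import Counter

# Assumed test rows (rows 0..4 of the toy data)
gender = [0, 1, 0, 1, 1]
y_true = [1, 1, 0, 1, 0]

n = len(gender)
counts = Counter(gender)

# Shannon entropy (base 2) of the feature's value distribution
feature_entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

rows = []
for value in sorted(counts):
    y_sub = [y for y, g in zip(y_true, gender) if g == value]
    rows.append({
        "FEATURE VALUE": value,
        "N OBS": len(y_sub),
        "VALUE PREVALENCE": round(len(y_sub) / n, 4),
        "Y MEAN": round(statistics.mean(y_sub), 4),
        "Y STDV": round(statistics.stdev(y_sub), 4),  # sample standard deviation
    })

print(round(feature_entropy, 3))  # prints: 0.971
for row in rows:
    print(row)
```

Note that the entropy describes the feature as a whole, which is why the same value appears on both gender rows of the report.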


#### Stratified Performance Reports

The stratified classification_performance reporter evaluates model performance specific to each feature-value subset. If prediction probabilities (e.g., from the model's *predict_proba()* method) are available, additional ROC_AUC and PR_AUC values will be included.

In [6]:
stratified_reports.classification_performance(X_test[['gender']], y_test, model_1.predict(X_test))

Unnamed: 0,FEATURE,FEATURE VALUE,N OBS,TRUE MEAN,PRED MEAN,ACCURACY,FNR,FPR,PRECISION (PPV),TNR,TPR
0,ALL_FEATURES,ALL_VALUES,5.0,0.6,1.0,0.6,0.0,1.0,0.6,0.0,1.0
1,gender,0,2.0,0.5,1.0,0.5,0.0,1.0,0.5,0.0,1.0
2,gender,1,3.0,0.6667,1.0,0.6667,0.0,1.0,0.6667,0.0,1.0
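
The per-group rates in this report follow from each subset's confusion counts. The sketch below recomputes them for gender in plain Python, again assuming that the test split is rows 0 through 4 of the toy data and that model_1 predicts 1 for every row. The helper name `confusion_rates` is hypothetical, and returning NaN for an undefined rate is a choice made for this sketch only.

```python
# Assumed test rows (rows 0..4 of the toy data) and predictions
y_true = [1, 1, 0, 1, 0]
y_pred = [1, 1, 1, 1, 1]
gender = [0, 1, 0, 1, 1]


def confusion_rates(trues, preds):
    """Return basic binary classification rates for one subset."""
    pairs = list(zip(trues, preds))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(num, den):
        # Rates with a zero denominator are returned as NaN in this sketch
        return round(num / den, 4) if den else float("nan")

    return {
        "N OBS": len(pairs),
        "ACCURACY": ratio(tp + tn, len(pairs)),
        "TPR": ratio(tp, tp + fn),
        "FNR": ratio(fn, tp + fn),
        "FPR": ratio(fp, fp + tn),
        "TNR": ratio(tn, fp + tn),
        "PRECISION (PPV)": ratio(tp, tp + fp),
    }


by_group = {}
for value in sorted(set(gender)):
    trues = [t for t, g in zip(y_true, gender) if g == value]
    preds = [p for p, g in zip(y_pred, gender) if g == value]
    by_group[value] = confusion_rates(trues, preds)

for value, rates in by_group.items():
    print(value, rates)
```

Because every prediction is positive, TPR and FPR are both 1.0 in each group, and accuracy equals precision.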


#### Stratified Fairness Reports

The stratified classification_fairness reporter evaluates model fairness specific to each feature-value subset. It treats each feature-value as the "privileged" group relative to all other possible values for the feature. For example, row 3 in the table below displays measures for the "col1" value of "2", where 2 is considered to be the privileged group and all other values (1, 3, 45, and 50) are considered unprivileged.

To simplify the report, fairness measures have been reduced to their component parts. For example, measures of Equalized Odds can be determined by combining the True Positive Rate (TPR) Ratios & Differences with the False Positive Rate (FPR) Ratios & Differences.
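
As a rough illustration of combining component measures, the sketch below derives an Equalized Odds Difference from a TPR difference and an FPR difference. It assumes one common convention, in which the reported value is whichever of the two differences is larger in magnitude. This notebook does not state which convention FairMLHealth uses, so treat the function as hypothetical.

```python
def equalized_odds_difference(tpr_diff, fpr_diff):
    """Return whichever component difference has the larger magnitude.

    Assumption: this "greater of the two" convention is only one of several
    definitions in use; it is not confirmed to be the library's own.
    """
    return max(tpr_diff, fpr_diff, key=abs)


# Component values of 0.0 for both TPR Diff and FPR Diff (as reported for gender)
print(equalized_odds_difference(0.0, 0.0))    # prints: 0.0

# Hypothetical values, chosen only to show the selection rule
print(equalized_odds_difference(0.5, -0.2))   # prints: 0.5
print(equalized_odds_difference(0.1, -0.3))   # prints: -0.3
```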

See also: [Fairness Quick References](../docs/Fairness_Quick_References.pdf) and the [Tutorial for Evaluating Fairness in Binary Classification](./Tutorial-EvaluatingFairnessInBinaryClassification.ipynb)

In [8]:
stratified_reports.classification_fairness(X_test[['gender', 'col1']], y_test, model_1.predict(X_test))

Unnamed: 0,FEATURE,FEATURE VALUE,N OBS,FNR Diff,FNR Ratio,FPR Diff,FPR Ratio,PPV Diff,PPV Ratio,TNR Diff,TNR Ratio,TPR Diff,TPR Ratio
0,gender,0,2.0,-0.0,0.5,0.0,1.0,0.1667,0.1667,0.0,1.0,0.0,1.0
1,gender,1,3.0,0.0,2.0,0.0,1.0,-0.1667,-0.1667,0.0,1.0,-0.0,1.0
2,col1,1,1.0,-0.0,0.5,0.5,2.0,-0.5,-0.5,-0.5,0.0,0.0,1.0
3,col1,2,1.0,-0.0,0.5,0.5,2.0,-0.5,-0.5,-0.5,0.0,0.0,1.0
4,col1,3,1.0,-0.0,0.5,0.5,2.0,-0.5,-0.5,-0.5,0.0,0.0,1.0
5,col1,45,1.0,-0.5,0.0,0.0,1.0,0.75,0.75,0.0,1.0,0.5,2.0
6,col1,50,1.0,-0.5,0.0,0.0,1.0,0.75,0.75,0.0,1.0,0.5,2.0
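
The one-versus-rest comparison described above can be sketched for a single measure. The code below loops over every value of each feature, treats that value as the privileged group, and computes a PPV difference as "all other values" minus "this value". It assumes the test split is rows 0 through 4 of the toy data with all-positive predictions, and the sign convention is inferred from the PPV Diff column. Other columns are not reproduced, since the library's handling of rates with empty denominators is not described here.

```python
# Assumed test rows (rows 0..4 of the toy data) and predictions
y_true = [1, 1, 0, 1, 0]
y_pred = [1, 1, 1, 1, 1]
features = {"gender": [0, 1, 0, 1, 1], "col1": [1, 2, 50, 3, 45]}


def ppv(trues, preds):
    """Precision; NaN when nothing is predicted positive (sketch-only choice)."""
    predicted_pos = [t for t, p in zip(trues, preds) if p == 1]
    return sum(predicted_pos) / len(predicted_pos) if predicted_pos else float("nan")


report = []
for name, column in features.items():
    for value in sorted(set(column)):
        in_group = [v == value for v in column]
        grp_true = [t for t, m in zip(y_true, in_group) if m]
        grp_pred = [p for p, m in zip(y_pred, in_group) if m]
        rest_true = [t for t, m in zip(y_true, in_group) if not m]
        rest_pred = [p for p, m in zip(y_pred, in_group) if not m]
        # Unprivileged (rest) minus privileged (current value)
        ppv_diff = ppv(rest_true, rest_pred) - ppv(grp_true, grp_pred)
        report.append((name, value, sum(in_group), round(ppv_diff, 4)))

for row in report:
    print(row)
```

With a continuous feature such as col1, every value forms its own single-observation group, which is why those rows carry so little information in the table above.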


### Comparing Results for Multiple Models

The compare_models feature can be used to generate side-by-side fairness comparisons of multiple models. Model performance metrics such as accuracy and precision are also provided to facilitate comparison.   

Below is an example output comparing the two example models defined above. Metrics that require prediction probabilities appear as missing for the second model, for which probabilities could not be generated (note the warning below).

In [9]:
# Pass the data and models to the compare_models function, as above
comparison1 = fhmc.compare_models(X_test, y_test, prtc_attr, model_dict)

# Add highlights
flag(comparison1)

Probabilities could not be generated for the following models['model_2']. Please note that dependent metrics will appear as missing in the results.


Metric,Measure,model_1,model_2
Data Metrics,Prevalence of Privileged Class (%),60.0,60.0
Group Fairness,AUC Difference,0.0,
Group Fairness,Balanced Accuracy Difference,0.0,0.0
Group Fairness,Balanced Accuracy Ratio,1.0,1.0
Group Fairness,Disparate Impact Ratio,1.0,1.0
Group Fairness,Equalized Odds Difference,0.0,0.0
Group Fairness,Equalized Odds Ratio,1.0,1.0
Group Fairness,Positive Predictive Parity Difference,-0.1667,-0.1667
Group Fairness,Statistical Parity Difference,0.0,0.0
Individual Fairness,Between-Group Gen. Entropy Error,0.0017,0.0017


The compare_models function can also be used to measure two different protected attributes. Protected attributes are measured separately and cannot yet be combined with this tool.

In [10]:
comparison2 = fhmc.compare_models(X_test, y_test,
                                  [X_test['gender'], X_test['ethnicity']],
                                  {'gender': model_1, 'ethnicity': model_1})
flag(comparison2)

Metric,Measure,gender,ethnicity
Group Fairness,Statistical Parity Difference,0.0,0.0
Group Fairness,Disparate Impact Ratio,1.0,1.0
Group Fairness,Equalized Odds Difference,0.0,0.0
Group Fairness,Equalized Odds Ratio,1.0,1.0
Group Fairness,Positive Predictive Parity Difference,-0.1667,0.1667
Group Fairness,Balanced Accuracy Difference,0.0,0.0
Group Fairness,Balanced Accuracy Ratio,1.0,1.0
Group Fairness,AUC Difference,0.0,0.0
Individual Fairness,Consistency Score,1.0,1.0
Individual Fairness,Between-Group Gen. Entropy Error,0.0017,0.0017
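
To illustrate what "measured separately" means, the sketch below evaluates one measure, the Positive Predictive Parity Difference, once per protected attribute with the same predictions. It is plain Python rather than a FairMLHealth call, and it assumes the test split is rows 0 through 4 of the toy data, that all predictions are positive, that a value of 1 marks the privileged group, and that the difference is unprivileged minus privileged.

```python
# Assumed test rows (rows 0..4 of the toy data) and predictions
y_true = [1, 1, 0, 1, 0]
y_pred = [1, 1, 1, 1, 1]
protected = {
    "gender": [0, 1, 0, 1, 1],
    "ethnicity": [0, 0, 0, 1, 1],
}


def group_ppv(attr, level):
    """Precision among rows where the protected attribute equals `level`."""
    hits = [t for t, p, a in zip(y_true, y_pred, attr) if a == level and p == 1]
    return sum(hits) / len(hits)


# Each attribute is evaluated on its own; no intersection of groups is formed
ppv_parity_diff = {
    name: round(group_ppv(attr, 0) - group_ppv(attr, 1), 4)
    for name, attr in protected.items()
}

print(ppv_parity_diff)
# prints: {'gender': -0.1667, 'ethnicity': 0.1667}
```

The opposite signs show that the same model can favor the privileged group under one attribute and the unprivileged group under another, which a combined analysis would need to account for.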
