<a href="https://colab.research.google.com/github/VectorInstitute/Causal_Inference_Laboratory/blob/fk-lab/fairness_analysis_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Packages

In [None]:
!git clone https://github.com/VectorInstitute/Causal_Inference_Laboratory.git
%cd Causal_Inference_Laboratory
!pip install flaml
!pip install tensorflow_addons
!pip install keras-tuner
!pip install econml

In [None]:
import numpy as np
from IPython.display import Image
import pandas as pd
import ipywidgets as widgets

from fairness.fairness_cookbook import fairness_cookbook
from fairness_analysis import load_data, plot_confidence_intervals

# Loading Data
Select the dataset you wish to work with.

In [None]:
dataset_name = widgets.Dropdown(
    options=[('Census', 0), ('Berkeley', 1), ('Compas', 2)],
    value=1,
    description='Dataset:',
)
dataset_name

In [None]:

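# Resolve the dropdown widget to its label. The isinstance check keeps this
# cell safe to re-run after `dataset_name` has already become a string.
if not isinstance(dataset_name, str):
  dataset_name = dataset_name.label
if dataset_name == "Census":
  data_addr = "data/CFA/gov_census_numeric.csv"
elif dataset_name == "Berkeley":
  data_addr = "data/CFA/berkeley_numeric.csv"
elif dataset_name == "Compas":
  data_addr = "data/CFA/compas_numeric.csv"

data = load_data(data_addr)

# Print each column name with its index, skipping the first CSV column name.
data_header = list(pd.read_csv(data_addr).columns.values)[1:]
for i, column in enumerate(data_header):
  print("Column", i, ": ", column)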

# Analyzing Data

In [None]:
Image('notebook_images/standard_fairness_model.png', height=300)

For each of the variables X, Z, W, and Y, set the indices of the corresponding columns as a list:
- X: the protected attribute (e.g., gender, race, religion)
- Z: the set of confounding variables, which are not causally influenced by X (e.g., demographic information, zip code)
- W: the mediator variables, which are possibly causally influenced by X (e.g., educational level or other job-related information)
- Y: the outcome variable (e.g., admissions, hiring, salary)

*x0* and *x1* are the two values of the protected attribute that define the groups being compared.
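
Before running an experiment, it can help to confirm that no column is assigned to two roles and that every index exists in the printed column listing. The helper below is a minimal sketch of such a check; `check_variable_roles` is a hypothetical name and is not part of the repository.

```python
def check_variable_roles(n_columns, X, Z, W, Y):
    """Hypothetical helper: validate the X, Z, W, Y column index lists."""
    roles = {"X": X, "Z": Z, "W": W, "Y": Y}
    seen = {}  # maps a column index to the role that already claimed it
    for role, indices in roles.items():
        for idx in indices:
            if not 0 <= idx < n_columns:
                raise ValueError(f"{role} index {idx} is outside 0..{n_columns - 1}")
            if idx in seen:
                raise ValueError(f"column {idx} is assigned to both {seen[idx]} and {role}")
            seen[idx] = role
    # Return the columns that were not assigned to any role, for review.
    return sorted(set(range(n_columns)) - set(seen))


# Toy example with 6 columns (illustrative values, not one of the notebook's datasets).
unused = check_variable_roles(6, X=[0], Z=[1, 2], W=[3], Y=[5])
print(unused)  # [4]
```

In the notebook, `n_columns` would presumably be `len(data_header)`, assuming the indices refer to the column listing printed above.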


In [None]:
if dataset_name == "Census":
    X = [0]
    Y = [11]
    W = [1, 2, 3, 4, 5, 16]
    Z = [6, 7, 8, 9, 10, 12, 13, 14, 15]
    x0 = 0
    x1 = 1
elif dataset_name == "Berkeley":
    X = [0]
    Y = [11]
    W = [1, 2, 3, 4, 5, 16]
    Z = [6, 7, 8, 9, 10, 12, 13, 14, 15]
    x0 = 0
    x1 = 1
elif dataset_name == "Compas":
    X = [2]
    Y = [8]
    W = [3, 4, 5, 6, 7]
    Z = [0, 1]
    x0 = 0
    x1 = 1

# Experiment

## Initialization

Choose one of the estimators from the drop-down menu.

In [None]:
estimator = widgets.Dropdown(
    options=[('AutoML', 10), ('OLS1', 0), ('OLS2', 1), ('RF1', 2), ('RF2', 3), ('NN1', 4), ('NN2', 5), ('IPW', 6), ('DML', 7), ('TARNet', 8), ('Dragonnet', 9)],
    value=2,
    description='Estimator:',
)
estimator

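Enter the number of runs, then the number of rows to sample in each run. Rows are sampled without replacement, so the sample size cannot exceed the number of rows in the dataset.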

In [None]:
num_run_text = widgets.Text(
    value='',
    placeholder='Enter number of runs',
    description='Runs:',
    disabled=False
)
num_run_text

In [None]:
num_rows_2_sample_text = widgets.Text(
    value='',
    placeholder='Enter number of rows to sample in each run',
    description='Number of samples:',
    disabled=False
)
num_rows_2_sample_text

## Running the experiments
Running the cell below estimates the fairness metrics once per run, each time on a fresh random subsample of the data.

In [None]:
estimator_name = estimator.label
num_run = int(num_run_text.value)
num_rows_2_sample = int(num_rows_2_sample_text.value)
all_metrics = np.zeros((num_run, 4))
for i in range(num_run):
    print("-" * 15 + " Run " + str(i) + " " + "-" * 15)
    data_ck = data[np.random.choice(data.shape[0], num_rows_2_sample, replace=False)]
    metrics = fairness_cookbook(data_ck, X = X, Z = Z, Y = Y, W = W,
                                x0 = x0, x1 = x1, estimator_name = estimator_name)
    all_metrics[i][0] = metrics["tv"]
    all_metrics[i][1] = metrics["ctfde"]
    all_metrics[i][2] = metrics["ctfie"]
    all_metrics[i][3] = metrics["ctfse"]
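
The subsampling step in the loop above can be made reproducible and can fail early on an invalid sample size. The following is a minimal sketch under those assumptions; `subsample_rows` is a hypothetical helper, and the seed value is arbitrary.

```python
import numpy as np


def subsample_rows(data, n_rows, rng):
    """Hypothetical helper: draw n_rows distinct rows from a 2-D array."""
    if not 0 < n_rows <= data.shape[0]:
        raise ValueError(
            f"n_rows must be between 1 and {data.shape[0]}, got {n_rows}"
        )
    # replace=False guarantees that no row is drawn twice within a run.
    row_ids = rng.choice(data.shape[0], size=n_rows, replace=False)
    return data[row_ids]


rng = np.random.default_rng(0)      # fixed seed so repeated executions match
toy = np.arange(20).reshape(10, 2)  # toy stand-in for the loaded dataset
sample = subsample_rows(toy, 4, rng)
print(sample.shape)  # (4, 2)
```

Passing one generator through all runs keeps the runs different from each other while making the whole experiment repeatable.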

# Plots

Running the cell below plots the estimate of each fairness metric across runs, together with its 95% confidence interval.

In [None]:
plot_confidence_intervals(all_metrics)
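
The implementation of `plot_confidence_intervals` is not shown here, so the sketch below is only one common way to obtain such intervals: a normal approximation around the mean of each metric across runs. The function name `normal_confidence_intervals` and the toy values are illustrative assumptions.

```python
import numpy as np


def normal_confidence_intervals(all_metrics, z=1.96):
    """Hypothetical helper: per-column mean and normal-approximation interval.

    all_metrics has one row per run and one column per metric.
    z=1.96 corresponds to a two-sided 95% interval; at least two runs are needed.
    """
    all_metrics = np.asarray(all_metrics, dtype=float)
    n_runs = all_metrics.shape[0]
    means = all_metrics.mean(axis=0)
    # Standard error of the mean, using the sample standard deviation (ddof=1).
    sems = all_metrics.std(axis=0, ddof=1) / np.sqrt(n_runs)
    return means, means - z * sems, means + z * sems


# Toy metrics for 3 runs and 4 metrics (made-up numbers, for illustration only).
toy_metrics = np.array([
    [0.10, 0.05, 0.02, 0.03],
    [0.12, 0.07, 0.02, 0.01],
    [0.14, 0.06, 0.02, 0.02],
])
means, lower, upper = normal_confidence_intervals(toy_metrics)
for name, m, lo, hi in zip(["tv", "ctfde", "ctfie", "ctfse"], means, lower, upper):
    print(f"{name}: {m:.3f} [{lo:.3f}, {hi:.3f}]")
```

With only a handful of runs the normal approximation is rough; a Student-t quantile would give wider intervals.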

In [None]:
Image('notebook_images/causal_effects.png', height=300)

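TV: the total variation, i.e., the overall observed disparity in Y between the groups X = x0 and X = x1.

DE: the direct effect of X on Y.

IE: the indirect effect of X on Y, mediated by W.

SE: the spurious effect, i.e., the association between X and Y induced by the confounders Z.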
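
Of the four quantities, TV is the only one that can be read directly off the observed data. The sketch below computes it as the difference in the mean outcome between the two groups, assuming the sign convention "x1 minus x0"; `fairness_cookbook` may use a different convention or estimator, and `total_variation` is a hypothetical name.

```python
import numpy as np


def total_variation(x, y, x0=0, x1=1):
    """Hypothetical helper: E[Y | X = x1] - E[Y | X = x0] from observed data."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    return y[x == x1].mean() - y[x == x0].mean()


# Toy data (made up): 4 units per group, binary outcome.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([1, 0, 0, 0, 1, 1, 1, 0])
tv = total_variation(x, y)
print(tv)  # 0.5, since the group means are 0.75 and 0.25
```

The direct, indirect, and spurious effects cannot be computed this way, because they are counterfactual quantities that need the causal model and an estimator.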