## Fairness in regression

In this project, we show how to check whether a regression model discriminates against a particular subgroup.

In [None]:
import pandas as pd 
import numpy as np

### Data
We use the [Communities and Crime data](https://archive.ics.uci.edu/ml/datasets/communities+and+crime) and aim to predict the ```ViolentCrimesPerPop``` variable (total number of violent crimes per 100K population).

The protected attribute is ```racepctblack``` (the share of the population identifying as black).

In [None]:
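from urllib.request import urlopen

# Load the raw data; the file has no header row and marks missing values with "?"
data = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.data", header=None, na_values=["?"])

# Read the column names from the "@attribute <name> <type>" lines of the names file
names = urlopen("http://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.names")
columns = [line.split(b' ')[1].decode("utf-8") for line in names if line.startswith(b'@attribute')]
data.columns = columns

data = data.dropna(axis=1)  # drop every column that contains missing values
data = data.iloc[:, 3:]     # drop the first three remaining columns
X = data.drop('ViolentCrimesPerPop', axis=1)
y = data.ViolentCrimesPerPop
data.head()  # last expression in the cell, so the preview is displayed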
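
The column names are taken from the `@attribute` lines of the names file. As a minimal sketch of what the list comprehension above does, the snippet below applies the same parsing to a few sample lines. The sample lines are illustrative stand-ins written for this example, not a verbatim copy of the file.

```python
# Illustrative stand-in for the byte lines yielded by urlopen(...)
sample_lines = [
    b"@relation crimedata\n",
    b"@attribute state numeric\n",
    b"@attribute population numeric\n",
]

# Keep only "@attribute" lines and take the second space-separated token
sample_columns = [
    line.split(b" ")[1].decode("utf-8")
    for line in sample_lines
    if line.startswith(b"@attribute")
]
print(sample_columns)
```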

### Model
In this part, we first split the data into training and test sets, then train a single regression model, a **Decision Tree**, on the training data, and finally evaluate it on the test data.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [None]:
import dalex as dx

model = DecisionTreeRegressor()
model.fit(X_train, y_train)
# Wrap the fitted model together with the test data so dalex can evaluate it
prediction = dx.Explainer(model, X_test, y_test, verbose=False)
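
To make the split, train, and evaluate workflow concrete, here is a minimal sketch on synthetic data. The data, the names `X_toy`, `y_toy`, and `toy_model`, and the choice of RMSE as the test metric are assumptions made for this example; the notebook itself does not name a metric.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumption: stands in for the crime data)
rng = np.random.default_rng(0)
X_toy = rng.uniform(size=(200, 3))
y_toy = X_toy[:, 0] + 0.1 * rng.normal(size=200)

# Same steps as above: split, fit on the training part, score on the test part
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, random_state=42)
toy_model = DecisionTreeRegressor(random_state=42).fit(X_tr, y_tr)

rmse = np.sqrt(mean_squared_error(y_te, toy_model.predict(X_te)))
print(f"test RMSE: {rmse:.3f}")
```

For the `prediction` explainer created above, dalex reports comparable test metrics via `prediction.model_performance()`.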

### Fairness in the model prediction
We then assess the model's fairness. To do so, we check three non-discrimination criteria:
* independence: R⊥A
* separation: R⊥A ∣ Y
* sufficiency: Y⊥A ∣ R

Where:
* A - protected group
* Y - target
* R - model's prediction
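
As a rough illustration of the first criterion, independence asks that the distribution of predictions R be the same in every group of A. The sketch below compares only the group means of R on toy values, which is a much weaker check than comparing full distributions and is not how a fairness library computes its scores. The function name `independence_gap` and the toy arrays are hypothetical.

```python
import numpy as np

def independence_gap(r, a, privileged):
    """Difference in mean prediction between each group and the privileged one."""
    r = np.asarray(r, dtype=float)
    a = np.asarray(a)
    reference = r[a == privileged].mean()
    return {
        str(group): float(r[a == group].mean() - reference)
        for group in np.unique(a)
        if group != privileged
    }

# Toy predictions and group labels (assumption: made up for illustration)
r_toy = np.array([0.2, 0.4, 0.6, 0.8])
a_toy = np.array(["else", "else", "majority_black", "majority_black"])

gaps = independence_gap(r_toy, a_toy, privileged="else")
print(gaps)
```

A gap near zero is consistent with independence in the mean only; groups can still differ in the spread or shape of their predictions.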

In [None]:
import dalex as dx

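# Split communities into two groups by thresholding the protected attribute at 0.5
protected = np.where(X_test.racepctblack >= 0.5, 'majority_black', 'else')
privileged = 'else'  # reference group that the other group is compared against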
fobject = prediction.model_fairness(protected, privileged)
fobject.fairness_check()

### Plotting the result

In [None]:
fobject.plot()