## Exercise 8: Discrimination

### Task 1: Assessing Algorithmic Unfairness

In this task we use the German credit dataset that was introduced in the lecture. Note that this is already an adaptation of the original data, which can be found in the UCI Machine Learning Repository. Documentation of the categorical values can also be found there: https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)  
Note that we adapt the target "credit" attribute such that a '1' indicates a good credit risk, and a '0' indicates a bad credit risk.  
Recall that age-based discrimination was found to exist in this dataset.

In [15]:
import numpy as np
import pandas as pd
df = pd.read_csv ("german_credit.csv")
df.credit = df.credit.replace(2,0)
df.head(100)

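| Unnamed: 0 | status | month | credit_history | purpose | credit_amount | savings | employment | investment_as_income_percentage | sex | other_debtors | ... | property | age | installment_plans | housing | number_of_credits | skill_level | people_liable_for | telephone | foreign_worker | credit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A11 | 6 | A34 | A43 | 1169 | A65 | A75 | 4 | male | A101 | ... | A121 | 67 | A143 | A152 | 2 | A173 | 1 | A192 | A201 | 1 |
| 1 | A12 | 48 | A32 | A43 | 5951 | A61 | A73 | 2 | female | A101 | ... | A121 | 22 | A143 | A152 | 1 | A173 | 1 | A191 | A201 | 0 |
| 2 | A14 | 12 | A34 | A46 | 2096 | A61 | A74 | 2 | male | A101 | ... | A121 | 49 | A143 | A152 | 1 | A172 | 2 | A191 | A201 | 1 |
| 3 | A11 | 42 | A32 | A42 | 7882 | A61 | A74 | 2 | male | A103 | ... | A122 | 45 | A143 | A153 | 1 | A173 | 2 | A191 | A201 | 1 |
| 4 | A11 | 24 | A33 | A40 | 4870 | A61 | A73 | 3 | male | A101 | ... | A124 | 53 | A143 | A153 | 2 | A173 | 2 | A191 | A201 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | A12 | 54 | A30 | A49 | 15945 | A61 | A72 | 3 | male | A101 | ... | A124 | 58 | A143 | A151 | 1 | A173 | 1 | A192 | A201 | 0 |
| 96 | A14 | 12 | A34 | A46 | 2012 | A65 | A74 | 4 | female | A101 | ... | A123 | 61 | A143 | A152 | 1 | A173 | 1 | A191 | A201 | 1 |
| 97 | A12 | 18 | A32 | A49 | 2622 | A62 | A73 | 4 | male | A101 | ... | A123 | 34 | A143 | A152 | 1 | A173 | 1 | A191 | A201 | 1 |
| 98 | A12 | 36 | A34 | A43 | 2337 | A61 | A75 | 4 | male | A101 | ... | A121 | 36 | A143 | A152 | 1 | A173 | 1 | A191 | A201 | 1 |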


#### a) Analysing Age Disparity
We want to split all individuals in the data into two age groups: people who are older than 25 years, and people who are not. How many people are in each group, and what is the ratio of good credit scores within each group?

In [8]:
df_over25 = df[df['age'] > 25]
df_under25 = df[df['age'] <= 25]
print("The number of people over 25 =", df_over25.shape[0])
print("The number of people under 25 =",df_under25.shape[0])

num_over25_good_credit = df_over25[df_over25['credit'] == 1].shape[0]
num_under25_good_credit = df_under25[df_under25['credit'] == 1].shape[0]
print("ratio of good credit scores in people over 25 =", num_over25_good_credit/df_over25.shape[0])
print("ratio of good credit scores in people under 25 =", num_under25_good_credit/df_under25.shape[0])

The number of people over 25 = 810
The number of people under 25 = 190
ratio of good credit scores in people over 25 = 0.7283950617283951
ratio of good credit scores in people under 25 = 0.5789473684210527
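
To make the gap between the two groups easier to read, the printed ratios can be compared directly. The snippet below is only a small sketch that reuses the two values from the output above; the variable names are chosen here for illustration. Note that the second group contains everyone aged 25 or younger, since the split uses `<= 25`.

```python
# Ratios copied from the printed output above
rate_over25 = 0.7283950617283951
rate_up_to_25 = 0.5789473684210527

# Difference and quotient of the good-credit ratios in the raw labels
difference = rate_over25 - rate_up_to_25
quotient = rate_up_to_25 / rate_over25

print("difference of ratios =", round(difference, 4))
print("quotient of ratios =", round(quotient, 4))
```

Both numbers describe the labels in the data itself, before any classifier is trained.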


#### b) Predicting Credit Score
Train a logistic regression classifier to predict the class attribute using all other attributes as predictors. Remember to dummy-code and scale your data first. Train the classifier on the whole training data and keep its predictions on the training data.
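
As a reminder of what the two preprocessing steps do, here is a minimal NumPy-only sketch. The helper names `one_hot` and `standardize` are made up for this illustration; in the notebook itself `pd.get_dummies` and scikit-learn's `StandardScaler` would be the usual tools.

```python
import numpy as np

def one_hot(column):
    # Map each distinct category to its own 0/1 indicator column
    categories, codes = np.unique(column, return_inverse=True)
    return categories, np.eye(len(categories))[codes]

def standardize(X):
    # Shift every column to mean 0 and rescale it to standard deviation 1
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X - X.mean(axis=0)) / std

categories, encoded = one_hot(np.array(["A11", "A12", "A11"]))
print(categories)
print(encoded)
print(standardize([[1, 10], [3, 10]]))
```

Scaling matters here because regularized logistic regression is sensitive to the magnitude of the features, and columns such as `credit_amount` and `month` live on very different scales.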

In [30]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def mc_predict(X_train,y_train,X_test):
    Y = np.unique(y_train)
    M = len(Y)
    
    # collect class-wise confidence scores in this matrix
    y_probs = np.zeros((X_test.shape[0],M))
    # iterate over all classes
    for i in range(M):
        # fit binary model
        y_curr = np.array(y_train==Y[i]).astype(int) 
        clf = LogisticRegression()
        clf.fit(X_train, y_curr)
        y_probs[:,i] = clf.predict_proba(X_test)[:,1]

    # return classes that yielded highest confidence
    return np.array([Y[i] for i in np.argmax(y_probs, axis=1)])

data = df
data = pd.get_dummies(data)
credits = data['credit']
features = data.drop(columns = ['credit'])
X_train, X_test, y_train, y_test = train_test_split(features, credits, test_size = 0.33, random_state = 1)

y_pred = mc_predict(X_train,y_train,X_test)
print("accuracy_score =", accuracy_score(y_test,y_pred))
print(y_pred)

accuracy_score = 0.7666666666666667
[1 0 1 0 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1
 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 0 1 0 1 1 1 0 0 1 1 1 1
 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1
 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 0 1 0 0 0
 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 0 0
 0 1 1 0 1 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 0 1 1 0 1 1 1 1 1 1 0 1 0 1 0 1
 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1
 1 1 1 1 0 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1]




#### c) Measuring Algorithmic Fairness 

Write a function that computes each of the following measures:
* Disparate Impact
* Calders and Verwer's measure
* s-Accuracy
* s-TPR and s-TNR
* s-BCR

You may assume that both the class attribute and the sensitive attribute are binary, with '1' indicating the 'good class' and the privileged group, respectively.
Apply these measures on your predictions from b).
What do you observe?

In [None]:
# INPUT PARAMETERS:
# y_pred: numpy array of binary predicted classes
# y: numpy array of true classes
# s_arr: numpy array of the sensitive attribute
# s: attribute value to compute s-accuracy/tpr/tnr for
from sklearn.metrics import roc_curve


def disparate_impact(y_pred, s_arr):
    # your code here


def cv_measure(y_pred, s_arr):
    # your code here

    
def s_acc(y,y_pred, s_arr, s):
    # your code here
    
    
# return the tuple (s-TPR,s_TNR)
def s_true_rates(y,y_pred, s_arr, s):
    # your code here
    
    
def s_bcr(y,y_pred, s_arr, s):
    # your code here
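
One possible way to fill in these stubs is sketched below. The definitions are assumptions and should be checked against the lecture slides: disparate impact is taken as the positive prediction rate of the unprivileged group divided by that of the privileged group, the Calders and Verwer measure as the difference of those two rates, and s-BCR as the mean of s-TPR and s-TNR. The toy arrays at the end are invented for illustration.

```python
import numpy as np

def disparate_impact(y_pred, s_arr):
    # P(y_pred = 1 | s = 0) / P(y_pred = 1 | s = 1); assumed definition
    return y_pred[s_arr == 0].mean() / y_pred[s_arr == 1].mean()

def cv_measure(y_pred, s_arr):
    # P(y_pred = 1 | s = 1) - P(y_pred = 1 | s = 0); assumed definition
    return y_pred[s_arr == 1].mean() - y_pred[s_arr == 0].mean()

def s_acc(y, y_pred, s_arr, s):
    # Accuracy restricted to the individuals with sensitive value s
    mask = s_arr == s
    return (y[mask] == y_pred[mask]).mean()

def s_true_rates(y, y_pred, s_arr, s):
    # (s-TPR, s-TNR) restricted to the individuals with sensitive value s
    mask = s_arr == s
    y_s, pred_s = y[mask], y_pred[mask]
    tpr = (pred_s[y_s == 1] == 1).mean()
    tnr = (pred_s[y_s == 0] == 0).mean()
    return tpr, tnr

def s_bcr(y, y_pred, s_arr, s):
    # Balanced classification rate: mean of s-TPR and s-TNR
    tpr, tnr = s_true_rates(y, y_pred, s_arr, s)
    return (tpr + tnr) / 2

# Toy data, invented for illustration (1 = privileged group / good class)
y = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 0, 0])
s_arr = np.array([1, 1, 1, 1, 0, 0, 0, 0])

print(disparate_impact(y_pred, s_arr), cv_measure(y_pred, s_arr))
print(s_acc(y, y_pred, s_arr, 0), s_true_rates(y, y_pred, s_arr, 0), s_bcr(y, y_pred, s_arr, 0))
```

For the credit data, the sensitive array could be built as `(df['age'] > 25).astype(int).to_numpy()`, restricted to the same rows as the predictions. The sketch does not guard against empty groups or groups without any positive or negative examples, which would lead to a division by zero.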

#### d) Omitting the Sensitive Attribute 

Train another classifier that does not use the age attribute as a predictor, and recompute the values of all fairness measures from c). Do the values change significantly?

### Task 2: Fair Ranking

Consider the two lists in the cell below, which represent the members of a majority group and a minority group.
Every tuple in a list represents a group member: the first element identifies the group member, and the second element is the score by which we would like to rank the members of both groups against each other in a fair manner.  
Note that these lists are already sorted/ranked by their score, which you can also implicitly assume to be the case in the upcoming subtasks.

In [14]:
scores_majority = [('majority_0', 0.96807446469624236),
 ('majority_1', 0.95419384037453137),
 ('majority_2', 0.93211641632869746),
 ('majority_3', 0.9182712009369044),
 ('majority_4', 0.90938858688860358),
 ('majority_5', 0.89292152918031786),
 ('majority_6', 0.84448834214088997),
 ('majority_7', 0.84443217447581942),
 ('majority_8', 0.79877688788630141),
 ('majority_9', 0.72247591245073473),
 ('majority_10', 0.69229468159176299),
 ('majority_11', 0.65357596960054976),
 ('majority_12', 0.58813714673468209),
 ('majority_13', 0.56708226044663301),
 ('majority_14', 0.56489951776292002),
 ('majority_15', 0.52643423000490386),
 ('majority_16', 0.48127524553556911),
 ('majority_17', 0.42980570045372379),
 ('majority_18', 0.39275842622357748),
 ('majority_19', 0.33121051753313779),
 ('majority_20', 0.32231417740354318),
 ('majority_21', 0.31066018022909625),
 ('majority_22', 0.30898366593860171),
 ('majority_23', 0.25933833855712163),
 ('majority_24', 0.2504469397048189),
 ('majority_25', 0.24394092885783314),
 ('majority_26', 0.21407896534542137),
 ('majority_27', 0.20145807259008808),
 ('majority_28', 0.16928016568204085),
 ('majority_29', 0.14953531474810089),
 ('majority_30', 0.14179052094927069),
 ('majority_31', 0.13812332959151996),
 ('majority_32', 0.12545008434640159),
 ('majority_33', 0.11349156356515289),
 ('majority_34', 0.1077974496597377),
 ('majority_35', 0.10044678297249221),
 ('majority_36', 0.091348281546486554),
 ('majority_37', 0.061912827392595737),
 ('majority_38', 0.052371984692786588),
 ('majority_39', 0.026194153696491584)]

scores_minority = [('minority_0', 0.50446739782758911),
 ('minority_1', 0.34613151169850636),
 ('minority_2', 0.13144603620260447),
 ('minority_3', 0.12642371774573849),
 ('minority_4', 0.09909433134008121),
 ('minority_5', 0.0750708981671464),
 ('minority_6', 0.05135236597003786),
 ('minority_7', 0.04519179427505781),
 ('minority_8', 0.03729934154197691),
 ('minority_9', 0.035886146647779346)]

#### a) Ranking by Score
Write a function that compiles a list of tuples from a majority group and a minority group, which are ranked simply based on their score, without considering group membership. Apply your function on the lists given above.

In [None]:
# INPUT VALUES:
# majority: list of tuples indicating members in majority (non-protected) group.
# minority: list of tuples indicating members in minority (protected) group.

def rank_by_score(majority, minority):
    # your code here
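
A minimal sketch of such a function, assuming that each member is a `(name, score)` tuple as in the lists above and that higher scores should be ranked first. The toy lists are invented for illustration.

```python
def rank_by_score(majority, minority):
    # Merge both groups and sort by score, highest first, ignoring group membership
    return sorted(majority + minority, key=lambda member: member[1], reverse=True)

toy_majority = [("majority_0", 0.97), ("majority_1", 0.33)]
toy_minority = [("minority_0", 0.50)]

ranking = rank_by_score(toy_majority, toy_minority)
print(ranking)

# Binary membership vector (1 = protected group), derived from the name prefix
tau = [int(name.startswith("minority")) for name, _ in ranking]
print(tau)
```

The membership vector `tau` is the form of input that the rND function in b) expects.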

#### b) The _rND_ score
Write a function that computes the _rND_ score of a given ranking, using the signature in the cell below. Then apply this function to measure the fairness of your ranking resulting from a)!

In [None]:
# tau: list of binary values which indicate whether an element is in the protected group or not
# -> ranking is implicitly given by the order of elements in this vector
# step: integer indicating the distance between all cut-off points
def rND(tau, step = 10):
    # your code here
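
The sketch below assumes the common definition $rND(\tau) = \frac{1}{Z} \sum_{i = step, 2 \cdot step, \dots}^{N} \frac{1}{\log_2 i} \left| \frac{|S^+_{1 \dots i}|}{i} - \frac{|S^+|}{N} \right|$, where $S^+$ is the protected group. The normalizer $Z$ is approximated here by the larger value of the two extreme rankings (all protected members first, or all last); this choice and the helper `rnd_raw` are assumptions, so the lecture's normalizer may differ.

```python
import math

def rnd_raw(tau, step=10):
    # Unnormalized sum over the cut-off points step, 2*step, ...
    if step < 2:
        raise ValueError("step must be at least 2, since log2(1) = 0")
    n = len(tau)
    overall = sum(tau) / n  # share of protected members in the whole ranking
    total = 0.0
    for i in range(step, n + 1, step):
        prefix_share = sum(tau[:i]) / i  # share of protected members in the top i
        total += abs(prefix_share - overall) / math.log2(i)
    return total

def rND(tau, step=10):
    n, k = len(tau), sum(tau)
    # Assumed normalizer: worst of the two extreme rankings
    protected_last = [0] * (n - k) + [1] * k
    protected_first = [1] * k + [0] * (n - k)
    z = max(rnd_raw(protected_last, step), rnd_raw(protected_first, step))
    return rnd_raw(tau, step) / z if z > 0 else 0.0

print(rND([0] * 8 + [1] * 2, step=5))    # all protected members at the bottom
print(rND([0, 0, 0, 0, 1] * 2, step=5))  # protected share matches at every cut-off
```

With this normalization, a value of 0 means that the protected share at every cut-off equals the overall share, and larger values indicate a stronger deviation.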

#### c) Randomized Ranking

Write a function that computes a randomized ranking according to the ranking generator presented in lecture. Use the signature in the cell below, i.e. in the input we assume that the ranking has already been split into protected and non-protected group.  
Apply this function to the given scores and compute the rND score of the resulting ranking!

In [None]:
# minority, majority as in a)
# fairness_prob is the fairness probability as discussed in lecture
# -> by default it should represent the fraction of the minority group over all people from both groups considered

def rank_fair_random(majority, minority, fairness_prob = None):
    # your code here
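
A sketch of one such generator, under the assumption that it works as follows: at every rank, the best remaining minority member is placed with probability `fairness_prob`, otherwise the best remaining majority member; once one group is exhausted, the rest of the other group is appended. The extra `seed` parameter is not part of the given signature and is added here only to make runs reproducible.

```python
import random

def rank_fair_random(majority, minority, fairness_prob=None, seed=None):
    # Default: fraction of minority members among all people considered
    if fairness_prob is None:
        fairness_prob = len(minority) / (len(minority) + len(majority))
    rng = random.Random(seed)
    i = j = 0  # next unplaced member of the majority / minority list
    ranking = []
    while i < len(majority) and j < len(minority):
        if rng.random() < fairness_prob:
            ranking.append(minority[j])
            j += 1
        else:
            ranking.append(majority[i])
            i += 1
    # One group is exhausted: append whatever is left of the other
    ranking.extend(minority[j:])
    ranking.extend(majority[i:])
    return ranking

toy_majority = [("a", 0.9), ("b", 0.8)]
toy_minority = [("x", 0.5)]
print(rank_fair_random(toy_majority, toy_minority, seed=0))
```

Because both input lists are already sorted by score, the order within each group is preserved; only the interleaving of the two groups is random.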

#### d) Deterministic Ranking 
Another approach to creating a fair ranking is based on computing, for each rank, the number of protected group members required so that a statistical test does not detect significant unfairness.  
Thus, for each rank $i$, there is a minimum number $m(i)$ of members of the protected class that must appear among the first $i$ positions of the ranking.
The ranking algorithm would then, at every rank $i$:
* Check if the minimum number $m(i)$ of protected group members has already occurred in the ranking, and
    * if yes, place the best remaining candidate of all at rank $i$
    * if no, place the best remaining member of the minority group at rank $i$
    
The function in the cell below computes the minimum number of occurrences of protected class members at a fixed position with respect to a _binomial test_ with significance threshold $\alpha$ (the `threshold` argument, which defaults to 0.1).

In [None]:
# binomtest replaces binom_test, which recent SciPy releases no longer provide
from scipy.stats import binomtest

def minimum_occurrence(position, fairness_prob, threshold=0.1):
    # smallest count i whose two-sided binomial test p-value exceeds the threshold
    for i in range(position):
        if binomtest(i, position, p=fairness_prob).pvalue > threshold:
            return i
    return position

# test functionality to get minimum number required at rank 50
proportion = len(scores_minority) / (len(scores_minority) + len(scores_majority))
minimum_occurrence(50, proportion)

Apply this function to write a deterministic ranking algorithm which ensures that at each position in the ranking, the minimum number of protected class members according to the given test has occurred. Again, use the signature in the cell below, and again evaluate the ranking by computing the rND score!

In [None]:
# same signature as in c)
def rank_fair_deterministic (majority, minority, fairness_prob = None):
    # your code here
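
The procedure described above can be sketched as follows. To keep the example runnable without SciPy, the function that yields $m(i)$ is passed in through a hypothetical extra parameter `min_required`, which is not part of the given signature; in the notebook one would pass `minimum_occurrence`. The toy lists and the stand-in rule are invented for illustration.

```python
def rank_fair_deterministic(majority, minority, fairness_prob=None, min_required=None):
    if min_required is None:
        raise ValueError("pass min_required, e.g. the notebook's minimum_occurrence")
    if fairness_prob is None:
        fairness_prob = len(minority) / (len(minority) + len(majority))
    i = j = 0  # next unplaced majority / minority member; j also counts placed minority members
    ranking = []
    for position in range(1, len(majority) + len(minority) + 1):
        need_protected = j < min_required(position, fairness_prob)
        majority_left = i < len(majority)
        minority_left = j < len(minority)
        # Take a minority member if the constraint demands it, if no majority
        # member is left, or if the minority member simply has the better score
        if minority_left and (need_protected or not majority_left
                              or minority[j][1] >= majority[i][1]):
            ranking.append(minority[j])
            j += 1
        else:
            ranking.append(majority[i])
            i += 1
    return ranking

toy_majority = [("a", 0.9), ("b", 0.8), ("c", 0.7)]
toy_minority = [("x", 0.5)]

# Stand-in rule: at least one protected member from rank 2 onwards
def toy_rule(position, fairness_prob):
    return 1 if position >= 2 else 0

print(rank_fair_deterministic(toy_majority, toy_minority, min_required=toy_rule))
```

If the minority list runs out before the required count is reached, the sketch falls back to the best remaining majority member, since the constraint can no longer be satisfied.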
    