## Exercise 8: Discrimination

### Task 1: Assessing Algorithmic Unfairness

In this task we use the German credit dataset that was introduced in the lecture. Note that this is already an adaptation of the original data, which can be found in the UCI Machine Learning Repository. Documentation of the categorical values can also be found there: https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)  
Note that we adapt the target attribute "credit" such that a '1' indicates a good credit risk and a '0' indicates a bad credit risk.  
Recall that age-based discrimination was found to exist in the dataset.

In [1]:
import numpy as np
import pandas as pd
df = pd.read_csv("german_credit.csv")
# recode the target: the value 2 (bad credit risk) becomes 0
df.credit = df.credit.replace(2,0)
df.head()

Unnamed: 0,status,month,credit_history,purpose,credit_amount,savings,employment,investment_as_income_percentage,sex,other_debtors,...,property,age,installment_plans,housing,number_of_credits,skill_level,people_liable_for,telephone,foreign_worker,credit
0,A11,6,A34,A43,1169,A65,A75,4,male,A101,...,A121,67,A143,A152,2,A173,1,A192,A201,1
1,A12,48,A32,A43,5951,A61,A73,2,female,A101,...,A121,22,A143,A152,1,A173,1,A191,A201,0
2,A14,12,A34,A46,2096,A61,A74,2,male,A101,...,A121,49,A143,A152,1,A172,2,A191,A201,1
3,A11,42,A32,A42,7882,A61,A74,2,male,A103,...,A122,45,A143,A153,1,A173,2,A191,A201,1
4,A11,24,A33,A40,4870,A61,A73,3,male,A101,...,A124,53,A143,A153,2,A173,2,A191,A201,0


#### a) Analysing Age Disparity
We want to split all individuals in the data into two age groups: people who are older than 25 years and people who are not. How many people are in each group, and what is the ratio of good credit scores within each group?

In [4]:
n_old = sum(df.age>25)
n_young = sum(df.age<=25)

print(n_old)
print(n_young)

n_old_good = sum((df.age>25) & (df.credit == 1))
n_young_good = sum((df.age<=25) & (df.credit == 1))


old_ratio = n_old_good/n_old
young_ratio = n_young_good/n_young
print(old_ratio)
print(young_ratio)

810
190
0.7283950617283951
0.5789473684210527


#### b) Predicting Credit Score
Train a logistic regression classifier to predict the class attribute using all other attributes as predictors. Remember to dummy-code and scale your data first. Train the classifier on the whole training data and keep its predictions on the training data.

In [5]:
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression

In [6]:
#the vector for the class attribute
y = df['credit'].values
#y = preprocessing.LabelEncoder().fit_transform(y)

#The vector for the features
df_features = df.drop("credit", axis=1)
df_features = pd.get_dummies(df_features)
X = preprocessing.scale(df_features)
X

array([[-1.23647786, -0.74513141,  0.91847717, ...,  1.21459768,
         0.19601428, -0.19601428],
       [ 2.24819436,  0.94981679, -0.87018333, ..., -0.82331789,
         0.19601428, -0.19601428],
       [-0.73866754, -0.41656241, -0.87018333, ..., -0.82331789,
         0.19601428, -0.19601428],
       ...,
       [-0.73866754, -0.87450324,  0.91847717, ..., -0.82331789,
         0.19601428, -0.19601428],
       [ 1.9992892 , -0.50552769,  0.91847717, ...,  1.21459768,
         0.19601428, -0.19601428],
       [ 1.9992892 ,  0.46245715,  0.02414692, ..., -0.82331789,
         0.19601428, -0.19601428]])

In [7]:
clf = LogisticRegression()
clf.fit(X, y)
y_pred = clf.predict(X)



#### c) Measuring Algorithmic Fairness 

For each of the following measures, write a function that computes it:
* Disparate Impact
* Calders and Verwer's measure
* s-Accuracy
* s-TPR and s-TNR
* s-BCR

You may assume that both the class attribute and the sensitive attribute are binary, with '1' indicating the 'good class' and the privileged group, respectively.
Apply these measures on your predictions from b).
What do you observe?
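
For reference, the measures can be written as follows. This is a sketch that follows the conventions of the solution code in this exercise, which may differ in notation from the lecture: $\hat{y}$ is the predicted class, $y$ the true class, and $S$ the sensitive attribute with $S=1$ for the privileged group.

$$
\begin{aligned}
\mathrm{DI} &= \frac{P(\hat{y}=1 \mid S=0)}{P(\hat{y}=1 \mid S=1)} \\
\mathrm{CV} &= 1 - \bigl(P(\hat{y}=1 \mid S=1) - P(\hat{y}=1 \mid S=0)\bigr) \\
s\text{-Accuracy} &= P(\hat{y}=y \mid S=s) \\
s\text{-TPR} &= P(\hat{y}=1 \mid y=1,\, S=s), \qquad s\text{-TNR} = P(\hat{y}=0 \mid y=0,\, S=s) \\
s\text{-BCR} &= \tfrac{1}{2}\,\bigl(s\text{-TPR} + s\text{-TNR}\bigr)
\end{aligned}
$$

With these conventions, both DI and CV equal 1 when the two groups receive positive predictions at the same rate.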

In [8]:
# INPUT PARAMETERS:
# y_pred: numpy array of binary predicted classes
# y: numpy array of true classes
# s_arr: numpy array of the sensitive attribute
# s: attribute value to compute s-accuracy/tpr/tnr for


def disparate_impact(y_pred, s_arr):
    pos_mask = s_arr==1
    neg_mask = s_arr==0
    
    # positive rate of the unprivileged group divided by that of the privileged group
    return np.sum(y_pred[neg_mask])/np.sum(neg_mask)/np.sum(y_pred[pos_mask])*np.sum(pos_mask)


def cv_measure(y_pred, s_arr):
    pos_mask = s_arr==1
    neg_mask = s_arr==0
    
    # 1 minus the difference between the positive rates of the privileged and the unprivileged group
    return 1 - np.sum(y_pred[pos_mask])/np.sum(pos_mask) + np.sum(y_pred[neg_mask])/np.sum(neg_mask)
    

def s_acc(y,y_pred, s_arr, s):
    # accuracy restricted to the individuals with sensitive attribute value s
    mask = s_arr==s
    return np.sum((y==y_pred)[mask])/np.sum(mask)

# return the tuple (s-TPR, s-TNR)
def s_true_rates(y,y_pred, s_arr, s):
    mask_1 = (s_arr==s) & (y == 1)
    mask_0 = (s_arr==s) & (y == 0)
    return np.sum(((y_pred==1))[mask_1])/np.sum(mask_1),np.sum(((y_pred==0))[mask_0])/np.sum(mask_0)

def s_bcr(y,y_pred, s_arr, s):
    # balanced classification rate: mean of s-TPR and s-TNR
    rates = s_true_rates(y,y_pred, s_arr, s)
    return .5*(rates[0]+rates[1])


In [9]:
s_arr = (df.age>25).to_numpy().astype(int)

print(disparate_impact(y_pred,s_arr))
print(cv_measure(y_pred,s_arr))
print(s_acc(y,y_pred,s_arr,1))
print(s_acc(y,y_pred,s_arr,0))
print(s_true_rates(y,y_pred,s_arr,1))
print(s_true_rates(y,y_pred,s_arr,0))
print(s_bcr(y,y_pred,s_arr,1))
print(s_bcr(y,y_pred,s_arr,0))

0.8139362490733877
0.8532163742690059
0.808641975308642
0.6947368421052632
(0.9101694915254237, 0.5363636363636364)
(0.7909090909090909, 0.5625)
0.72326656394453
0.6767045454545455
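
To sanity-check the first two measures, they can be recomputed by hand on a tiny example. The following is a minimal sketch in plain Python; the helper `positive_rate` and the toy data are made up for illustration and are not part of the exercise.

```python
def positive_rate(y_pred, s_arr, group):
    """Share of positive predictions among the members of the given group."""
    preds = [p for p, s in zip(y_pred, s_arr) if s == group]
    return sum(preds) / len(preds)


# Toy data (made up): 4 privileged (s=1) and 8 unprivileged (s=0) individuals.
s_toy = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred_toy = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0]

rate_priv = positive_rate(y_pred_toy, s_toy, 1)    # 3 of 4 predicted positive
rate_unpriv = positive_rate(y_pred_toy, s_toy, 0)  # 3 of 8 predicted positive

di = rate_unpriv / rate_priv      # disparate impact
cv = 1 - rate_priv + rate_unpriv  # CV measure, same convention as cv_measure
print(di, cv)                     # 0.5 0.625
```

Here the unprivileged group is predicted positive half as often as the privileged group, which both measures reflect by falling below 1.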


#### d) Omitting the Sensitive Attribute 

Train another classifier without the age attribute, and recompute the values of all fairness measures from c). Do the values change significantly?

In [10]:
clf = LogisticRegression()
# drop the sensitive attribute before scaling
X1 = preprocessing.scale(df_features.drop("age",axis=1))
clf.fit(X1, y)
y_pred1 = clf.predict(X1)



In [11]:
print(disparate_impact(y_pred1,s_arr))
print(cv_measure(y_pred1,s_arr))
print(s_acc(y,y_pred1,s_arr,1))
print(s_acc(y,y_pred1,s_arr,0))
print(s_true_rates(y,y_pred1,s_arr,1))
print(s_true_rates(y,y_pred1,s_arr,0))
print(s_bcr(y,y_pred1,s_arr,1))
print(s_bcr(y,y_pred1,s_arr,0))

0.8378848063555115
0.8727095516569201
0.8074074074074075
0.7
(0.9067796610169492, 0.5409090909090909)
(0.8090909090909091, 0.55)
0.72384437596302
0.6795454545454546


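__Answer:__ The changes we observe are only marginal: disparate impact rises from about 0.81 to 0.84 and the CV measure from about 0.85 to 0.87, while the group-wise accuracies, true rates and BCR values barely move.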

### Task 2: Fair Ranking

Consider the two lists in the cell below, which represent the members of a majority group and a minority group.
Every tuple in a list represents a group member: the first element identifies the member, and the second element is the score by which we would like to rank the members of both groups against each other in a fair manner.  
Note that these lists are already sorted by score, which you can also assume to be the case in the upcoming subtasks.

In [12]:
scores_majority = [('majority_0', 0.96807446469624236),
 ('majority_1', 0.95419384037453137),
 ('majority_2', 0.93211641632869746),
 ('majority_3', 0.9182712009369044),
 ('majority_4', 0.90938858688860358),
 ('majority_5', 0.89292152918031786),
 ('majority_6', 0.84448834214088997),
 ('majority_7', 0.84443217447581942),
 ('majority_8', 0.79877688788630141),
 ('majority_9', 0.72247591245073473),
 ('majority_10', 0.69229468159176299),
 ('majority_11', 0.65357596960054976),
 ('majority_12', 0.58813714673468209),
 ('majority_13', 0.56708226044663301),
 ('majority_14', 0.56489951776292002),
 ('majority_15', 0.52643423000490386),
 ('majority_16', 0.48127524553556911),
 ('majority_17', 0.42980570045372379),
 ('majority_18', 0.39275842622357748),
 ('majority_19', 0.33121051753313779),
 ('majority_20', 0.32231417740354318),
 ('majority_21', 0.31066018022909625),
 ('majority_22', 0.30898366593860171),
 ('majority_23', 0.25933833855712163),
 ('majority_24', 0.2504469397048189),
 ('majority_25', 0.24394092885783314),
 ('majority_26', 0.21407896534542137),
 ('majority_27', 0.20145807259008808),
 ('majority_28', 0.16928016568204085),
 ('majority_29', 0.14953531474810089),
 ('majority_30', 0.14179052094927069),
 ('majority_31', 0.13812332959151996),
 ('majority_32', 0.12545008434640159),
 ('majority_33', 0.11349156356515289),
 ('majority_34', 0.1077974496597377),
 ('majority_35', 0.10044678297249221),
 ('majority_36', 0.091348281546486554),
 ('majority_37', 0.061912827392595737),
 ('majority_38', 0.052371984692786588),
 ('majority_39', 0.026194153696491584)]

scores_minority = [('minority_0', 0.50446739782758911),
 ('minority_1', 0.34613151169850636),
 ('minority_2', 0.13144603620260447),
 ('minority_3', 0.12642371774573849),
 ('minority_4', 0.09909433134008121),
 ('minority_5', 0.0750708981671464),
 ('minority_6', 0.05135236597003786),
 ('minority_7', 0.04519179427505781),
 ('minority_8', 0.03729934154197691),
 ('minority_9', 0.035886146647779346)]

#### a) Ranking by Score
Write a function that merges the tuples of a majority group and a minority group into a single list that is ranked purely by score, without considering group membership. Apply your function to the lists given above.

In [14]:
# INPUT VALUES:
# majority: list of tuples indicating members in majority (non-protected) group.
# minority: list of tuples indicating members in minority (protected) group.

def rank_by_score(majority, minority):
    majority = majority.copy()
    minority = minority.copy()
    
    # fastest solution: concat and sort lists
    result = majority+minority
    result.sort(key=lambda x: x[1], reverse=True)
    return result
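
Since both input lists are assumed to be sorted by descending score already, the combined ranking can also be produced by merging them instead of re-sorting. The following is a sketch of that alternative using `heapq.merge` from the standard library; the function name `rank_by_score_merge` and the toy scores are hypothetical.

```python
import heapq


def rank_by_score_merge(majority, minority):
    """Merge two lists that are each already sorted by descending score."""
    # reverse=True tells heapq.merge that the inputs run from largest to smallest
    return list(heapq.merge(majority, minority, key=lambda t: t[1], reverse=True))


# Toy input (made-up scores), each list sorted by descending score.
maj = [("majority_0", 0.9), ("majority_1", 0.6), ("majority_2", 0.2)]
mino = [("minority_0", 0.7), ("minority_1", 0.1)]

merged = rank_by_score_merge(maj, mino)
print([name for name, _ in merged])
# ['majority_0', 'minority_0', 'majority_1', 'majority_2', 'minority_1']
```

Unlike concatenating and sorting, this variant relies on the inputs being sorted; for unsorted inputs it would silently return a wrong ranking.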

In [15]:
r1 = rank_by_score(scores_majority, scores_minority)
r1

[('majority_0', 0.9680744646962424),
 ('majority_1', 0.9541938403745314),
 ('majority_2', 0.9321164163286975),
 ('majority_3', 0.9182712009369044),
 ('majority_4', 0.9093885868886036),
 ('majority_5', 0.8929215291803179),
 ('majority_6', 0.84448834214089),
 ('majority_7', 0.8444321744758194),
 ('majority_8', 0.7987768878863014),
 ('majority_9', 0.7224759124507347),
 ('majority_10', 0.692294681591763),
 ('majority_11', 0.6535759696005498),
 ('majority_12', 0.5881371467346821),
 ('majority_13', 0.567082260446633),
 ('majority_14', 0.56489951776292),
 ('majority_15', 0.5264342300049039),
 ('minority_0', 0.5044673978275891),
 ('majority_16', 0.4812752455355691),
 ('majority_17', 0.4298057004537238),
 ('majority_18', 0.3927584262235775),
 ('minority_1', 0.34613151169850637),
 ('majority_19', 0.3312105175331378),
 ('majority_20', 0.3223141774035432),
 ('majority_21', 0.31066018022909625),
 ('majority_22', 0.3089836659386017),
 ('majority_23', 0.25933833855712163),
 ('majority_24', 0.25044693

#### b) The _rND_ score
Write a function that computes the _rND_ score of a given ranking, using the signature in the cell below. Then apply this function to measure the fairness of your ranking resulting from a)!
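
For reference, the score can be written as follows. This is a sketch that matches the solution code in this exercise with its default step of 10 and may differ in notation from the lecture: $N$ is the length of the ranking $\tau$, $S$ is the protected group, $S_{1 \ldots i}$ is the set of protected members among the top $i$ ranks, and $Z$ is a normalizer.

$$
\mathrm{rND}(\tau) = \frac{1}{Z} \sum_{i = 10, 20, \ldots}^{N} \frac{1}{\log_2 i} \left| \frac{|S_{1 \ldots i}|}{i} - \frac{|S|}{N} \right|
$$

In the solution code, $Z$ is the value of the same sum for the ranking that places all protected members at the bottom.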

In [16]:
# tau: list of binary values which indicate whether an element is in the protected group or not
# -> ranking is implicitly given by the order of elements in this vector
# step: integer indicating the distance between all cut-off points
def rND(tau, step = 10):
    
    def rnd_sum(tau,step):
        i = step
        res = 0
        while i <= N:
            res += 1/np.log2(i)*np.abs(np.sum(tau[:i])/i - r_S)
            i += step
        return res
    
    N = len(tau)
    n_S = np.sum(tau)
    r_S = n_S/N
    
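    # compute the normalizer Z: the rND sum of the ranking that places
    # all protected members (value 1) at the bottom
    Z = rnd_sum(np.sort(tau),step)
    
    return (1/Z*rnd_sum(tau,step))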

In [17]:
tau1 = [0 if "majo" in t[0] else 1 for t in r1]
rND(tau1)

0.7622270966633158
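
To get a feeling for the scale of the score, it helps to evaluate two extreme rankings. The following is a self-contained sketch in plain Python that re-implements the same computation without NumPy; the names `rnd_sum` and `rnd` as well as the two toy rankings are chosen here for illustration.

```python
import math


def rnd_sum(tau, step=10):
    """Unnormalized rND sum over the cut-off points step, 2*step, ... up to len(tau)."""
    n = len(tau)
    protected_share = sum(tau) / n
    total = 0.0
    for i in range(step, n + 1, step):
        total += abs(sum(tau[:i]) / i - protected_share) / math.log2(i)
    return total


def rnd(tau, step=10):
    """rND, normalized by the ranking with all protected members at the bottom."""
    z = rnd_sum(sorted(tau), step)
    return rnd_sum(tau, step) / z


# 50 candidates, 10 of them protected (1 = protected), as in this task.
worst = [0] * 40 + [1] * 10          # all protected members at the bottom
proportional = [0, 0, 0, 0, 1] * 10  # every fifth position is protected

print(rnd(worst))         # 1.0
print(rnd(proportional))  # 0.0
```

A ranking whose top-10, top-20, ... prefixes all contain exactly the overall share of protected members scores 0, while the normalizing ranking scores 1 by construction.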

#### c) Randomized Ranking

Write a function that computes a randomized ranking according to the ranking generator presented in the lecture. Use the signature in the cell below, i.e. we assume that the input has already been split into the protected and the non-protected group.  
Apply this function to the given scores and compute the rND score of the resulting ranking!

In [18]:
def rank_fair_random(majority, minority, fairness_prob = None):
    
    # work on copies so that the input lists are not modified
    majority = majority.copy()
    minority = minority.copy()
    result = []
    
    if fairness_prob is None:
        # default: share of the minority group among all candidates
        fairness_prob = len(minority) / (len(minority) + len(majority))
        
    while majority or minority:
        p = np.random.uniform()
        # take the best remaining minority member with probability fairness_prob,
        # or whenever the majority group is exhausted
        if minority and (p < fairness_prob or not majority):
            result.append(minority.pop(0))
        else:
            result.append(majority.pop(0))
                
    return result

In [23]:
r2 = rank_fair_random(scores_majority, scores_minority)
r2

[('majority_0', 0.9680744646962424),
 ('majority_1', 0.9541938403745314),
 ('minority_0', 0.5044673978275891),
 ('majority_2', 0.9321164163286975),
 ('majority_3', 0.9182712009369044),
 ('minority_1', 0.34613151169850637),
 ('majority_4', 0.9093885868886036),
 ('majority_5', 0.8929215291803179),
 ('minority_2', 0.13144603620260448),
 ('majority_6', 0.84448834214089),
 ('majority_7', 0.8444321744758194),
 ('majority_8', 0.7987768878863014),
 ('minority_3', 0.1264237177457385),
 ('majority_9', 0.7224759124507347),
 ('majority_10', 0.692294681591763),
 ('majority_11', 0.6535759696005498),
 ('majority_12', 0.5881371467346821),
 ('majority_13', 0.567082260446633),
 ('majority_14', 0.56489951776292),
 ('majority_15', 0.5264342300049039),
 ('minority_4', 0.09909433134008121),
 ('minority_5', 0.0750708981671464),
 ('majority_16', 0.4812752455355691),
 ('majority_17', 0.4298057004537238),
 ('majority_18', 0.3927584262235775),
 ('majority_19', 0.3312105175331378),
 ('majority_20', 0.322314177403

In [24]:
tau2 = [0 if "majo" in t[0] else 1 for t in r2]
rND(tau2)

[0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


0.2504653421393267
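
Because the generator is randomized, a single rND value such as the one above changes from run to run. A rough way to judge the method is to average over many runs. The following is a self-contained sketch in plain Python that only simulates the group labels (not the scores); the helper names, the fixed seed and the number of runs are arbitrary choices made for illustration.

```python
import math
import random
import statistics


def rnd(tau, step=10):
    """rND of a 0/1 list (1 = protected), normalized by the all-protected-last ranking."""
    def rnd_sum(t):
        share = sum(t) / len(t)
        return sum(abs(sum(t[:i]) / i - share) / math.log2(i)
                   for i in range(step, len(t) + 1, step))
    return rnd_sum(tau) / rnd_sum(sorted(tau))


def random_tau(n_majority, n_minority, rng):
    """Group labels of one randomized ranking: a position goes to the minority
    with probability n_minority / N as long as both groups have members left."""
    p = n_minority / (n_majority + n_minority)
    tau = []
    while n_majority > 0 or n_minority > 0:
        if n_minority > 0 and (n_majority == 0 or rng.random() < p):
            tau.append(1)
            n_minority -= 1
        else:
            tau.append(0)
            n_majority -= 1
    return tau


rng = random.Random(0)  # fixed seed (arbitrary) so that the run is reproducible
scores = [rnd(random_tau(40, 10, rng)) for _ in range(1000)]
print(statistics.mean(scores), statistics.stdev(scores))
```

The mean and standard deviation give a more stable picture than one draw when the randomized ranking is compared against the other rankings in this task.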

#### d) Deterministic Ranking 
Another approach to creating a fair ranking is based on computing, for each rank, the number of protected group members required so that a statistical test does not detect significant unfairness.  
Thus, for each rank $i$, there is a minimum number $m(i)$ of members of the protected class required in the ranking up to that rank.
The ranking algorithm then, at every rank $i$:
* checks whether the minimum number $m(i)$ of protected group members has occurred in the ranking so far, and
    * if yes, places the best remaining candidate overall at rank $i$
    * if no, places the best remaining member of the minority group at rank $i$
    
The function in the cell below computes the minimum number of occurrences of protected class members at a fixed position with respect to a _binomial test_ with significance threshold $\alpha = 0.1$.

In [25]:
# binomtest replaces the older binom_test, which is no longer available in recent SciPy releases
from scipy.stats import binomtest

def minimum_occurrence(position, fairness_prob, threshold=0.1):
    # smallest count i that the two-sided binomial test does not reject at the given threshold
    for i in range(position):
        if binomtest(i, position, p=fairness_prob).pvalue > threshold:
            return i
    return position

# test functionality to get minimum number required at rank 50
proportion = len(scores_minority) / (len(scores_minority) + len(scores_majority))
minimum_occurrence(50, proportion)

6
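
The ranking algorithm needs $m(i)$ at every rank, so the values can be computed once as a table. The following is a self-contained sketch in plain Python; `binom_pvalue_two_sided` is a hypothetical stand-in for SciPy's two-sided exact binomial test (it sums the probabilities of all outcomes that are at most as likely as the observed one), and `minimum_occurrence_table` is a name chosen here for illustration.

```python
import math


def binom_pvalue_two_sided(k, n, p):
    """Two-sided exact binomial p-value: total probability of all outcomes
    that are at most as likely as the observed count k."""
    pmf = [math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


def minimum_occurrence_table(n_ranks, fairness_prob, threshold=0.1):
    """m(i) for i = 1..n_ranks: smallest count that the test does not reject."""
    table = []
    for position in range(1, n_ranks + 1):
        m = position
        for count in range(position):
            if binom_pvalue_two_sided(count, position, fairness_prob) > threshold:
                m = count
                break
        table.append(m)
    return table


# 10 protected members among 50 candidates, as in this task.
m = minimum_occurrence_table(50, 10 / 50)
print(m[0], m[-1])  # required protected members at rank 1 and at rank 50
```

Under these assumptions the last entry agrees with the value 6 obtained above for rank 50, and looking up `m[i - 1]` avoids repeating the test at every step of the ranking loop.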

Use this function to write a deterministic ranking algorithm which ensures that at each position in the ranking, the minimum number of protected class members according to the given test has occurred. Again, use the signature in the cell below, and again evaluate the ranking by computing the rND score!

In [30]:
def rank_fair_deterministic (majority, minority, fairness_prob = None):
    majority = majority.copy()
    minority = minority.copy()
    result = []
    if fairness_prob is None:
        # use the function arguments rather than the global score lists
        fairness_prob = len(minority) / (len(minority) + len(majority))
        
    minority_in_result = 0
    
    while majority or minority:
        # minimum number of protected members required up to the next rank
        minority_required = minimum_occurrence(len(result) + 1, fairness_prob)
        # take the best remaining minority member if the majority is exhausted,
        # the quota is not yet met, or this member has the higher score anyway
        if minority and (not majority
                         or minority_in_result < minority_required
                         or minority[0][1] > majority[0][1]):
            result.append(minority.pop(0))
            minority_in_result = minority_in_result + 1
        else:
            result.append(majority.pop(0))
    return result
    

In [33]:
r3 = rank_fair_deterministic(scores_majority, scores_minority)
tau3 = [0 if "majo" in t[0] else 1 for t in r3]
rND(tau3) 

0.7254717082405626

__Note:__ This score (about 0.725) is slightly better than the one resulting from the first ranking (about 0.762), but this deterministic ranking approach, which is based on a statistical test, does not reflect the premise of the rND score.