# Fairness metric

In this notebook, I'll define the fairness metric that will be used to evaluate the top 10 submissions on the private leaderboard.

Let's start by loading the job labels as well as the genders.

In [1]:
import pandas as pd

names = pd.read_csv('/kaggle/input/defi-ia-insa-toulouse/categories_string.csv')['0'].to_dict()
jobs = pd.read_csv('/kaggle/input/defi-ia-insa-toulouse/train_label.csv', index_col='Id')['Category']
jobs = jobs.map(names)
jobs = jobs.rename('job')
jobs.head()

Id
0     professor
1    accountant
2     professor
3     architect
4     architect
Name: job, dtype: object

In [2]:
genders = pd.read_json('/kaggle/input/defi-ia-insa-toulouse/train.json').set_index('Id')['gender']
genders.head()

Id
0    F
1    M
2    M
3    M
4    M
Name: gender, dtype: object

In [3]:
people = pd.concat((jobs, genders), axis='columns')
people.head()

Id,job,gender
0,professor,F
1,accountant,M
2,professor,M
3,architect,M
4,architect,M


The fairness metric is what I call the "macro disparate impact". Essentially, we look at the disparate impact of each job with respect to both genders, and then compute the unweighted average of these disparate impacts. Let's first look at the gender distribution for each job.

In [4]:
counts = people.groupby(['job', 'gender']).size().unstack('gender')
counts

job,F,M
accountant,1129,1992
architect,1314,4527
attorney,7106,11714
chiropractor,391,1015
comedian,345,1294
composer,553,2842
dentist,1895,3555
dietitian,2120,168
dj,125,706
filmmaker,1394,2730


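Now let's compute the disparate impact for each job, defined here as the count of the more frequent gender divided by the count of the less frequent one.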

In [5]:
counts['disparate_impact'] = counts[['M', 'F']].max(axis='columns') / counts[['M', 'F']].min(axis='columns')
counts.sort_values('disparate_impact', ascending=False)

job,F,M,disparate_impact
dietitian,2120,168,12.619048
rapper,64,719,11.234375
nurse,11493,1129,10.179805
surgeon,890,5726,6.433708
yoga_teacher,803,141,5.695035
dj,125,706,5.648
software_engineer,613,3447,5.623165
paralegal,814,153,5.320261
composer,553,2842,5.139241
model,3398,717,4.739191


Now we can obtain the macro disparate impact by simply computing the average of the `disparate_impact` column.

In [6]:
counts['disparate_impact'].mean()

3.898171170378378
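The computation so far can be summarized in symbols. The notation is introduced here for illustration only: $n_{j,F}$ and $n_{j,M}$ are the numbers of women and men in job $j$, and $J$ is the set of jobs.

```latex
\mathrm{DI}(j) = \frac{\max(n_{j,F},\, n_{j,M})}{\min(n_{j,F},\, n_{j,M})},
\qquad
\mathrm{MDI} = \frac{1}{|J|} \sum_{j \in J} \mathrm{DI}(j)
```

Every job counts equally in the average, whatever its size, which is what "macro" refers to.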

Let's write a function to do all of this in one step.

In [7]:
def macro_disparate_impact(people):
    counts = people.groupby(['job', 'gender']).size().unstack('gender')
    counts['disparate_impact'] = counts[['M', 'F']].max(axis='columns') / counts[['M', 'F']].min(axis='columns')
    return counts['disparate_impact'].mean()

people.head()

Id,job,gender
0,professor,F
1,accountant,M
2,professor,M
3,architect,M
4,architect,M


In [8]:
macro_disparate_impact(people)

3.898171170378378
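As a sanity check, the function can be run on a tiny hand-made table where the answer is easy to verify by hand. The `toy` and `balanced` frames below are made up for illustration, and the function is repeated so that the snippet runs on its own.

```python
import pandas as pd

def macro_disparate_impact(people):
    # Same logic as the notebook's function, repeated for self-containment.
    counts = people.groupby(['job', 'gender']).size().unstack('gender')
    ratio = counts[['M', 'F']].max(axis='columns') / counts[['M', 'F']].min(axis='columns')
    return ratio.mean()

# Made-up data: nurse has 3 women and 1 man, dj has 2 men and 1 woman.
toy = pd.DataFrame({
    'job':    ['nurse'] * 4 + ['dj'] * 3,
    'gender': ['F', 'F', 'F', 'M', 'M', 'M', 'F'],
})
toy_score = macro_disparate_impact(toy)

# Made-up data: every job has exactly one woman and one man.
balanced = pd.DataFrame({
    'job':    ['nurse', 'nurse', 'dj', 'dj'],
    'gender': ['F', 'M', 'F', 'M'],
})
balanced_score = macro_disparate_impact(balanced)

print(toy_score, balanced_score)
```

In `toy`, nurse has a ratio of 3/1 and dj a ratio of 2/1, so the unweighted mean is 2.5. The perfectly balanced table gives 1.0.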

This value is the macro disparate impact of the labels in the training data. What if we want to evaluate the fairness of a model? Let's do that.

We'll split the training data into two random halves. We'll train a model on one half and test it on the other.

In [9]:
from sklearn import model_selection

descriptions = pd.read_json('/kaggle/input/defi-ia-insa-toulouse/train.json').set_index('Id')['description']

X_train, X_test, y_train, y_test, gender_train, gender_test = model_selection.train_test_split(
    descriptions,
    jobs,
    genders,
    test_size=.5,
    random_state=42
)

We'll build a simple TF-IDF extractor followed by a multinomial logistic regression classifier.

In [10]:
from sklearn import feature_extraction
from sklearn import linear_model
from sklearn import pipeline
from sklearn import preprocessing

model = pipeline.make_pipeline(
    feature_extraction.text.TfidfVectorizer(),
    preprocessing.Normalizer(),
    linear_model.LogisticRegression(multi_class='multinomial')
)

model = model.fit(X_train, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
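The warning above means the solver stopped before converging. One common way to address it, sketched here as an assumption rather than as the setup used for the results in this notebook, is to raise `max_iter`. The value 1000 and the tiny corpus are arbitrary illustrations. The sketch also omits the `Normalizer` step, since `TfidfVectorizer` already L2-normalizes each row by default.

```python
from sklearn import feature_extraction, linear_model, pipeline

# Hypothetical variant of the pipeline with a larger iteration budget.
model_more_iter = pipeline.make_pipeline(
    feature_extraction.text.TfidfVectorizer(),       # rows are L2-normalized by default
    linear_model.LogisticRegression(max_iter=1000),  # 1000 is an arbitrary illustrative value
)

# Tiny made-up corpus, only to show that the pipeline fits and predicts.
toy_texts = [
    'she teaches physics at the university',
    'he audits company accounts',
    'he teaches history at the university',
    'she prepares tax accounts',
]
toy_jobs = ['professor', 'accountant', 'professor', 'accountant']

model_more_iter.fit(toy_texts, toy_jobs)
toy_pred = model_more_iter.predict(['she teaches at the university'])
print(toy_pred)
```

On the real data, a converged model may produce a different macro disparate impact than the one reported in this notebook.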


In [11]:
y_pred = model.predict(X_test)
y_pred = pd.Series(y_pred, name='job', index=X_test.index)
y_pred.head()

Id
83232       professor
19036       professor
35087    photographer
86945       professor
79762       professor
Name: job, dtype: object

In [12]:
test_people = pd.concat((y_pred, gender_test), axis='columns')
test_people

Id,job,gender
83232,professor,F
19036,professor,M
35087,photographer,M
86945,professor,M
79762,professor,F
...,...,...
141605,professor,F
104258,journalist,F
72072,professor,M
208591,psychologist,F


In [13]:
macro_disparate_impact(test_people)

5.112365210475933

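The model's predictions have a higher macro disparate impact (about 5.11) than the labels of the full training set (about 3.90), so the model has worsened the fairness metric! The goal of this competition is to develop a model that lowers the fairness metric. The minimum attainable value is 1, which is reached when every job has as many women as men. Good luck!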
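One caveat when scoring predictions: if a model never predicts a given job for one gender, `unstack` leaves a NaN in that cell, and `max` and `min` skip it, so that job contributes a ratio of 1 instead of being penalized. The sketch below illustrates this on made-up predictions. It also shows one possible stricter variant, `macro_disparate_impact_strict`, a hypothetical name that is not part of the competition's metric. It fills missing counts with 0, which makes the ratio infinite.

```python
import math
import pandas as pd

def macro_disparate_impact(people):
    # Same logic as the notebook's function, repeated for self-containment.
    counts = people.groupby(['job', 'gender']).size().unstack('gender')
    ratio = counts[['M', 'F']].max(axis='columns') / counts[['M', 'F']].min(axis='columns')
    return ratio.mean()

def macro_disparate_impact_strict(people):
    # Hypothetical variant: a missing (job, gender) pair counts as 0, not NaN.
    counts = people.groupby(['job', 'gender']).size().unstack('gender', fill_value=0)
    ratio = counts[['M', 'F']].max(axis='columns') / counts[['M', 'F']].min(axis='columns')
    return ratio.mean()

# Made-up predictions: 'rapper' is only ever predicted for men.
preds = pd.DataFrame({
    'job':    ['rapper', 'rapper', 'nurse', 'nurse', 'nurse'],
    'gender': ['M', 'M', 'F', 'F', 'M'],
})

lenient = macro_disparate_impact(preds)        # rapper counts as 1, nurse as 2
strict = macro_disparate_impact_strict(preds)  # rapper's ratio is 2 / 0

print(lenient, strict)
```

With the original function the all-male job looks perfectly fair, so it is worth checking the per-job counts for NaN values before trusting a low score.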