---
title: Allocative Bias Blog
author: Dean Smith
date: '2023-5-9'
image: "penguins.png"
description: "In this blog post, I use New Jersey employment data to build a machine learning model and examine the model for potential bias. "
format: html
---
# Auditing Allocative Bias
First, I import the packages I need and download the 2018 ACS data for New Jersey.
from folktables import ACSDataSource, ACSIncome, BasicProblem, adult_filter
import numpy as np

STATE = "NJ"

data_source = ACSDataSource(survey_year='2018', 
                            horizon='1-Year', 
                            survey='person')

acs_data = data_source.get_data(states=[STATE], download=True)

acs_data.head()
There is a lot of data here. I only want to use relevant features, so I select a subset of the columns.
possible_features=['AGEP', 'SCHL', 'MAR', 'RELP', 'DIS', 'ESP', 'CIT', 'MIG', 'MIL', 'ANC', 'NATIVITY', 'DEAR', 'DEYE', 'DREM', 'SEX', 'RAC1P', 'ESR']
acs_data[possible_features].head()


I don't want race (`RAC1P`) or employment status (`ESR`) among my features, so I leave those out. Employment status becomes the label (employed means `ESR == 1`), and race becomes the group.
features_to_use = [f for f in possible_features if f not in ['RAC1P', "ESR"]]

EmploymentProblem = BasicProblem(
    features=features_to_use,
    target='ESR',
    target_transform=lambda x: x == 1,
    group='RAC1P',
    preprocess=lambda x: x,
    postprocess=lambda x: np.nan_to_num(x, nan=-1),  # replace missing values with -1 (positional -1 would be `copy`)
)

features, label, group = EmploymentProblem.df_to_numpy(acs_data)
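To make the problem definition above concrete, here is a small sketch on hypothetical toy values (not rows from the NJ data). It shows what `target_transform=lambda x: x == 1` does to an `ESR` column: in the ACS PUMS codebook, `ESR == 1` is "civilian employed, at work", so other codes (for example, employed but not at work) are labeled `False`. It also shows that the second positional parameter of `np.nan_to_num` is `copy`, not the replacement value, so replacing missing values with -1 requires the `nan=` keyword.

```python
import numpy as np
import pandas as pd

# Hypothetical ESR values; NaN stands in for people with no ESR value (e.g. under 16)
esr = pd.Series([1, 2, 6, 1, np.nan])
labels = (esr == 1).to_numpy()  # same transform as target_transform above
print(labels)

# Hypothetical feature matrix with one missing value
toy_features = np.array([[25.0, np.nan], [40.0, 16.0]])

# Positional -1 is passed as `copy`, so NaN is replaced with the default 0.0
print(np.nan_to_num(toy_features, -1))
# The `nan=` keyword replaces NaN with -1.0
print(np.nan_to_num(toy_features, nan=-1))
```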
Let's check the shapes of the features, labels, and groups; the first dimension is the number of individuals in the dataset.
for obj in [features, label, group]:
    print(obj.shape)
Now let's do a train/test split.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test, group_train, group_test = train_test_split(
    features, label, group, test_size=0.2, random_state=0)
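Because `train_test_split` receives `features`, `label`, and `group` together, it shuffles them with the same permutation, so each individual's label and group stay attached to their feature row. Here is a minimal check of that property on hypothetical toy arrays, where each row's single feature is its own original index.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy arrays: the feature value is the row's original index (hypothetical data)
X = np.arange(10).reshape(-1, 1)
y = X[:, 0] % 2 == 0             # label derived from the index
g = np.where(X[:, 0] < 5, 1, 2)  # group derived from the index

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, g, test_size=0.2, random_state=0)

# 20% of the 10 rows go to the test set
print(len(X_tr), len(X_te))
# Labels and groups still match the rows they were derived from
print((y_te == (X_te[:, 0] % 2 == 0)).all(), (g_te == np.where(X_te[:, 0] < 5, 1, 2)).all())
```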
Next, we put the training features into a DataFrame and add back the group (race) and the label (employment status) so that we can group by them.
import pandas as pd
df = pd.DataFrame(X_train, columns = features_to_use)
df["group"] = group_train
df["label"] = y_train

len(df)
From the output of the code cell below, we can see that about 48% of individuals in our training data are employed.
df["label"].mean()
Now, let's see how many people in the training data are in each racial group. Race group 1 is white, race group 2 is black, and the remaining codes are other self-identified racial groups.
df.groupby("group").size()
Likewise, let's see the employment rate for each racial group.
df.groupby("group")["label"].mean()
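The two cells above can also be combined into a single table of group sizes and employment rates. Here is a sketch on a hypothetical toy frame with the same `group` and `label` columns as `df`; on the real data the call would be `df.groupby("group")["label"].agg(["size", "mean"])`.

```python
import pandas as pd

# Toy training frame (hypothetical values) with the same columns as df
toy = pd.DataFrame({"group": [1, 1, 2, 2, 2],
                    "label": [True, False, True, True, False]})

# Group size and employment rate (the mean of a boolean label) in one table
summary = toy.groupby("group")["label"].agg(["size", "mean"])
print(summary)
```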
Now let's visualize the employment rates in a bar graph, split by sex within each racial group. Sex group 1 is male and sex group 2 is female.
import seaborn as sns
counts = df.groupby(["group", "SEX"])["label"].mean().reset_index(name = "mean")
sns.barplot(data = counts, x = "group", y = "mean", hue = "SEX")
In the code cell below, we train logistic regression models on polynomial features of degree 0 through 3 and keep the degree with the highest training accuracy.
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import PolynomialFeatures
import warnings

best_model = None
best_score = -1
best_degree = 0

with warnings.catch_warnings():
    # Suppress warnings (e.g. LogisticRegression convergence warnings) while looping
    warnings.simplefilter("ignore")
    for deg in range(0, 4):
        # Degree 0 yields only a constant column, i.e. an intercept-only baseline
        polynomial_logistic = Pipeline([
            ('poly', PolynomialFeatures(degree=deg)),
            ('logistic', LogisticRegression())
        ])

        iter_model = polynomial_logistic.fit(X_train, y_train)

        # Training accuracy: measured on the same data the model was fit on
        acc = iter_model.score(X_train, y_train)

        print("Degree of " + str(deg) + " -> Training Accuracy = " + str(acc))

        # Keep the best-scoring model seen so far
        if acc > best_score:
            best_score = acc
            best_degree = deg
            best_model = iter_model

print("Best Degree for Model: " + str(best_degree))

model = best_model
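Because the loop above scores each degree on the same data it was trained on, the comparison tends to favor more complex models. A common alternative, sketched here and not what this post uses, is to choose the degree by cross-validated accuracy with `cross_val_score`. The data below is synthetic (`make_classification`), and the `StandardScaler` step and `max_iter=1000` are assumptions added to help `LogisticRegression` converge on polynomial features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for X_train / y_train (hypothetical data, not the ACS sample)
X_demo, y_demo = make_classification(n_samples=500, n_features=5, random_state=0)

cv_scores = {}
for deg in range(1, 4):
    pipe = Pipeline([
        ("poly", PolynomialFeatures(degree=deg)),
        ("scale", StandardScaler()),  # assumption: scaling is not in the original pipeline
        ("logistic", LogisticRegression(max_iter=1000)),
    ])
    # Mean accuracy over 5 folds; each fold is scored on rows the model was not fit on
    cv_scores[deg] = cross_val_score(pipe, X_demo, y_demo, cv=5).mean()

best_deg = max(cv_scores, key=cv_scores.get)
print(cv_scores, best_deg)
```

On the real data, `X_train` and `y_train` would replace `X_demo` and `y_demo`, and the chosen pipeline would then be refit on the full training set.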

Now let's get the model's predictions for the test data.
y_hat = model.predict(X_test)
The model's overall accuracy on the test data is about 82%.
(y_hat == y_test).mean()
Below, we use a confusion matrix to compute the model's positive predictive value (PPV), false negative rate (FNR), and false positive rate (FPR).
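For reference, with $TP$, $FP$, $FN$, and $TN$ denoting the counts of true positives, false positives, false negatives, and true negatives (a positive being a prediction of "employed"), the three metrics are:

$$
\text{PPV} = \frac{TP}{TP + FP}, \qquad \text{FNR} = \frac{FN}{FN + TP}, \qquad \text{FPR} = \frac{FP}{FP + TN}
$$

PPV is conditioned on the model's prediction, while FNR and FPR are conditioned on the true employment status.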
# Use raw counts: with normalize='true' each row is divided by its class size, which makes the PPV below incorrect
tn, fp, fn, tp = confusion_matrix(y_test, y_hat).ravel()
ppv = tp / (tp + fp)
fnr = fn / (fn + tp)
fpr = fp / (fp + tn)

print("PPV: " + str(ppv))
print("FNR: " + str(fnr))
print("FPR: " + str(fpr))
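Whether the confusion matrix is normalized matters for PPV. `confusion_matrix(..., normalize='true')` divides each row by the number of true instances of that class, so the cells become rates rather than counts. FNR and FPR come out the same either way, but a PPV built from row-normalized cells ignores how many positives and negatives there are. A toy example with hypothetical labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 8 rows with true label 0, 2 rows with true label 1
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])

# Raw counts: tn=6, fp=2, fn=1, tp=1
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
ppv = tp / (tp + fp)   # 1/3: of 3 positive predictions, 1 is correct
fnr = fn / (fn + tp)   # 1/2
fpr = fp / (fp + tn)   # 2/8

# Row-normalized: each row sums to 1, so cells are rates, not counts
tn_n, fp_n, fn_n, tp_n = confusion_matrix(y_true, y_pred, normalize="true").ravel()
ppv_from_rates = tp_n / (tp_n + fp_n)  # 0.5 / 0.75 = 2/3, not the true PPV
fnr_from_rates = fn_n / (fn_n + tp_n)  # still 1/2
fpr_from_rates = fp_n / (fp_n + tn_n)  # still 2/8

print(ppv, ppv_from_rates)
```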
Now let's compute the model's accuracy, PPV, FNR, and FPR separately for each racial group (codes 1 through 9), focusing on the white (group 1) and black (group 2) groups.
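for g in range(1, 10):  # RAC1P codes 1 through 9; `g` avoids overwriting the `group` array
    mask = group_test == g
    if mask.sum() == 0:
        print("Group " + str(g) + " does not have sufficient data.\n")
        continue
    test = y_test[mask]
    pred = y_hat[mask]
    overall_acc = (pred == test).mean()
    # Raw counts (not normalized) so that PPV is correct; labels=[False, True] keeps
    # the matrix 2x2 even when a group contains only one class
    tn, fp, fn, tp = confusion_matrix(test, pred, labels=[False, True]).ravel()
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) > 0 else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) > 0 else float("nan")

    print("Group " + str(g) + " overall accuracy: " + str(overall_acc) + " | PPV: " + str(ppv) + " | FNR: " + str(fnr) + " | FPR: " + str(fpr) + "\n")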
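The same per-group audit can be written as a reusable function that returns one row per group. This is a sketch with a hypothetical helper name, `group_metrics`, that counts the confusion-matrix cells directly with boolean masks; on the real data it would be called as `group_metrics(y_test, y_hat, group_test)`.

```python
import numpy as np
import pandas as pd

def group_metrics(y_true, y_pred, groups):
    """Hypothetical helper: accuracy, PPV, FNR and FPR for each group."""
    frame = pd.DataFrame({"y": np.asarray(y_true, dtype=bool),
                          "yhat": np.asarray(y_pred, dtype=bool),
                          "group": groups})
    rows = {}
    for g, sub in frame.groupby("group"):
        y, yhat = sub["y"], sub["yhat"]
        tp = int((y & yhat).sum())
        fp = int((~y & yhat).sum())
        fn = int((y & ~yhat).sum())
        tn = int((~y & ~yhat).sum())
        rows[g] = {
            "n": len(sub),
            "accuracy": (tp + tn) / len(sub),
            # NaN when a rate's denominator is zero for this group
            "PPV": tp / (tp + fp) if tp + fp else np.nan,
            "FNR": fn / (fn + tp) if fn + tp else np.nan,
            "FPR": fp / (fp + tn) if fp + tn else np.nan,
        }
    return pd.DataFrame.from_dict(rows, orient="index")

# Toy usage with hypothetical labels, predictions and groups
t = group_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0], [1, 1, 1, 1, 2, 2])
print(t)
```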



As we can see, the model performs worse for black individuals than for white individuals: its accuracy on white individuals is above the model's overall accuracy, while its accuracy on black individuals is below it. The model also has a higher PPV than its overall average for white individuals and a lower PPV for black individuals. In other words, when the model predicts that a white person is employed, that prediction is correct more often than average, but when it predicts that a black person is employed, the prediction is correct less often than average. In addition, the model's FNR for black individuals is slightly higher than average, meaning that a larger share of employed black individuals are incorrectly predicted to be unemployed, while the FNR for white individuals is about average. Lastly,
Next, we build a test DataFrame with the test features, group, true label, and model prediction, for graphing later.
import pandas as pd
df_test = pd.DataFrame(X_test, columns = features_to_use)
df_test["group"] = group_test
df_test["label"] = y_test
df_test["prediction"] = y_hat

len(df_test)
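Now let's check whether the model is calibrated across the white (group 1) and black (group 2) groups: among individuals who receive the same prediction, the share who are actually employed should be about the same in both groups.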
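# Keep only white (group 1) and black (group 2) individuals; .copy() avoids a SettingWithCopyWarning when adding a column below
df_test = df_test[df_test["group"] < 3].copy()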

means = df_test.groupby(["group", "prediction"])["label"].mean().reset_index(name = "mean")
sns.lineplot(data = means, x = "prediction", y = "mean", hue = "group")
df_test["employed"] = df_test["prediction"] == 1

means = df_test.groupby(["group", "employed"])["label"].mean().reset_index(name = "mean")

p = sns.barplot(data = means, x = "employed", y = "mean", hue = "group")
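The plots above check calibration using the model's hard 0/1 predictions, which gives only two bins per group. A finer check bins the predicted probabilities (available from `model.predict_proba(X_test)[:, 1]`) and compares the observed employment rate in each bin across groups; for a calibrated model, the rate in each bin should fall near that bin's probability range for both groups. The sketch below uses synthetic data and a randomly assigned group column (both hypothetical), and ten equal-width bins are an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (hypothetical); `grp` plays the role of group_test
X, y = make_classification(n_samples=2000, n_features=5, random_state=1)
rng = np.random.default_rng(1)
grp = rng.choice([1, 2], size=len(y))

# Predicted probability of the positive class (in the notebook: model.predict_proba(X_test)[:, 1])
scores = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

cal = pd.DataFrame({"group": grp, "label": y, "score": scores})
# Ten equal-width probability bins; include_lowest=True keeps a score of exactly 0
cal["bin"] = pd.cut(cal["score"], bins=np.linspace(0, 1, 11), include_lowest=True)

# Observed positive rate and count per (group, bin); observed=True drops empty bins
table = (cal.groupby(["group", "bin"], observed=True)["label"]
            .agg(["mean", "size"])
            .reset_index())
print(table)
```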