# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [4]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [5]:
data.head(6)

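| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |
| 5 | b | 1 | 4 | 2 | 6 | 1 | 0 | 0 | 0 | 266 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | Private |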


Let's compute the number of callbacks by race:

In [6]:
calls_by_race = data.groupby('race')['call'].sum()
calls_by_race

race
b    157.0
w    235.0
Name: call, dtype: float32

In [7]:
# Group sizes and callback counts, selected by label rather than by position
group_sizes = data.groupby('race').size()
n_black, n_white = group_sizes['b'], group_sizes['w']
calls_black = calls_by_race['b']
calls_white = calls_by_race['w']
print("Ratio of call backs for white sounding names: {:.2f}".format(calls_white/n_white))
print("Ratio of call backs for black sounding names: {:.2f}".format(calls_black/n_black))

Ratio of call backs for white sounding names: 0.10
Ratio of call backs for black sounding names: 0.06


The callback proportion appears to be higher for white-sounding names than for black-sounding names. Since both groups are large, the central limit theorem (CLT) applies, and the sample proportions can be treated as approximately normally distributed.
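
A common rule of thumb for relying on the normal approximation is that each group has at least about 10 successes and 10 failures. The sketch below checks that condition. The helper name `normal_approx_ok` and the threshold are illustrative choices, and the group size of 2435 is an assumption that is consistent with the rates printed above but is not shown directly in the notebook output.

```python
def normal_approx_ok(successes, n, threshold=10):
    """Rule of thumb: both successes and failures should be at least `threshold`."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

# Callback counts come from the cells above; the group size is an assumption.
n_per_group = 2435
for label, calls in [("white", 235), ("black", 157)]:
    print("{}-sounding names: normal approximation ok = {}".format(
        label, normal_approx_ok(calls, n_per_group)))
```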

### Two sample binomial test

Let's apply the two-sample binomial test. Two probabilities need to be estimated for this dataset: $p_b$, the probability of a callback for black-sounding names, and $p_w$, the same probability for white-sounding names. Their sample estimates are

$$ {\hat p}_w = X/n $$
$$ {\hat p}_b = Y/n$$

where $X$ is the number of callbacks for white-sounding names, $Y$ is the number for black-sounding names, and $n$ is the number of records in each group (the same for both). The test statistic is

$$ TS = \frac{{\hat p}_w - {\hat p}_b}{\sqrt{\frac{p_w(1-p_w)}{n} + \frac{p_b(1-p_b)}{n} }} $$

Since we do not know these population probabilities, we substitute the estimates ${\hat p}_w$ and ${\hat p}_b$ (i.e. the unpooled standard error that also underlies the Wald interval).

In [8]:
hat_p_w = calls_white/n_white
hat_p_b = calls_black/n_black
se = np.sqrt(hat_p_w*(1-hat_p_w)/n_white + hat_p_b*(1-hat_p_b)/n_black)
TS = ( hat_p_w - hat_p_b )/se
print("Test statistic = {:.2f}".format(TS))

Test statistic = 4.12


Our null hypothesis is that the callback probability is the same for black-sounding and white-sounding names. Hypotheses are statements about the population probabilities, not about the sample estimates:

$$ H_0: \qquad p_w = p_b $$

and the alternative is

$$ H_a: \qquad p_w > p_b $$

So, we can perform a one-sided test:

In [9]:
p_val = stats.norm.sf(TS)
print("p-value = {:.2e}".format(p_val))

p-value = 1.93e-05


The p-value is very small, so we reject the null hypothesis.
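
As a cross-check, the same test is often run with a pooled standard error: under $H_0$ both groups share one callback probability, which can be estimated from the combined data. The sketch below is one way to do that. The function name `pooled_two_proportion_z` is hypothetical, and the group size of 2435 is an assumption consistent with the rates printed earlier.

```python
import numpy as np
from scipy import stats

def pooled_two_proportion_z(x1, n1, x2, n2):
    """One-sided z-test of H0: p1 = p2 against Ha: p1 > p2 with a pooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 the best estimate of the common probability uses both groups
    p_pool = (x1 + x2) / (n1 + n2)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, stats.norm.sf(z)

# Callback counts from the cells above; 2435 per group is an assumption.
z_pooled, p_pooled = pooled_two_proportion_z(235, 2435, 157, 2435)
print("pooled z = {:.2f}, one-sided p-value = {:.2e}".format(z_pooled, p_pooled))
```

With groups of equal size and similar proportions, the pooled and unpooled statistics should be close, so the conclusion does not hinge on the choice of standard error.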

Let's also provide a 95% confidence interval for $p_w - p_b$:

In [10]:
z_crit = stats.norm.ppf(0.975)  # two-sided 95% critical value
CI_low = (hat_p_w - hat_p_b) - z_crit*se
CI_high = (hat_p_w - hat_p_b) + z_crit*se
print("95% confidence interval = [{:.3f}, {:.3f}]".format(CI_low, CI_high))

95% confidence interval = [0.017, 0.047]


### Chi square test

Let's also apply the chi-squared test:

In [11]:
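# Rows: white- and black-sounding names; columns: callback, no callback
contingency_table = [[calls_white, n_white-calls_white],[calls_black, n_black-calls_black]]
# chi2_contingency tests independence of rows and columns;
# stats.chisquare would instead test all four cells against equal frequencies
chi2_stat, chi2_pval, dof, expected = stats.chi2_contingency(contingency_table)
print("Chi square test p-value = {:.4f}".format(chi2_pval))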

Chi square test p-value = 0.0000


Again, we reject the null hypothesis.
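
To make the mechanics concrete, a chi-squared test of independence on a 2x2 table compares the observed counts with the counts expected if callbacks were independent of the name group. The sketch below computes the Pearson statistic by hand, without a continuity correction. The function name `chi2_independence` is hypothetical, and the group size of 2435 is an assumption consistent with the rates printed earlier.

```python
import numpy as np
from scipy import stats

def chi2_independence(table):
    """Pearson chi-squared test of independence (no continuity correction)."""
    observed = np.asarray(table, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    # Expected counts if row and column variables were independent
    expected = row_totals * col_totals / observed.sum()
    chi2_stat = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return chi2_stat, stats.chi2.sf(chi2_stat, dof), expected

# Rows: white, black; columns: callback, no callback (2435 per group assumed)
table = [[235, 2435 - 235], [157, 2435 - 157]]
chi2_manual, p_manual, expected_counts = chi2_independence(table)
print("chi2 = {:.2f}, p-value = {:.2e}".format(chi2_manual, p_manual))
```

The result should agree with `stats.chi2_contingency(table, correction=False)`; SciPy applies Yates' continuity correction to 2x2 tables by default, which gives a slightly smaller statistic.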

### Fisher's exact test

[Fisher's exact test](https://en.wikipedia.org/wiki/Fisher's_exact_test) conditions on the row and column totals of the contingency table. If the null hypothesis is true, the race labels are exchangeable: reshuffling them across résumés leaves the overall callback proportion unchanged, and the number of callbacks for white-sounding names follows a hypergeometric distribution. The exact p-value is the probability, under that distribution, of observing a table at least as extreme as the observed one (`stats.fisher_exact` is two-sided by default):

In [47]:
contingency_table = [[calls_white, n_white-calls_white],[calls_black, n_black-calls_black]]
oddsratio, pvalue = stats.fisher_exact(contingency_table)
print("Exact p-value = {:.2e}".format(pvalue))

Exact p-value = 4.76e-05
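
The reshuffling idea behind the exact test can also be approximated by simulation. The sketch below is a Monte Carlo permutation test rather than Fisher's exact computation: it shuffles the pooled outcomes, splits them into two groups, and counts how often the absolute difference in callback rates is at least as large as the observed one. The function name `permutation_pvalue`, the number of permutations, and the seed are illustrative choices, and the group size of 2435 is an assumption consistent with the rates printed earlier.

```python
import numpy as np

def permutation_pvalue(calls_a, n_a, calls_b, n_b, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in rates."""
    rng = np.random.default_rng(seed)
    # Pooled outcomes: 1 = callback, 0 = no callback
    outcomes = np.zeros(n_a + n_b, dtype=int)
    outcomes[:calls_a + calls_b] = 1
    observed = abs(calls_a / n_a - calls_b / n_b)
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(outcomes)
        diff = abs(shuffled[:n_a].mean() - shuffled[n_a:].mean())
        if diff >= observed - 1e-12:  # small tolerance for float comparison
            count += 1
    # Add-one correction so the estimate is never exactly zero
    return (count + 1) / (n_perm + 1)

p_perm = permutation_pvalue(235, 2435, 157, 2435)
print("Permutation p-value (approximate) = {:.1e}".format(p_perm))
```

Because the true p-value is tiny, the simulated value is bounded below by roughly `1 / (n_perm + 1)`; the exact test remains the better tool when extreme tail probabilities matter.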


The exact test also yields a very small p-value, so we again reject the null hypothesis. Because the names were randomly assigned to otherwise identical résumés, this analysis indicates that the race suggested by a name affects the callback rate.