# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

In [2]:
import pandas as pd
import numpy as np
import math
from scipy import stats

In [3]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [4]:
# number of callbacks for black-sounding names
print(sum(data[data.race=='b'].call))

157.0


In [5]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


We will work with a simplified dataset that keeps only the columns we need, 'race' and 'call'.

In [6]:
dataRaceCall = data[['race','call']]
dataRaceCall_w = dataRaceCall[dataRaceCall.race=='w']
dataRaceCall_b = dataRaceCall[dataRaceCall.race=='b']

print(len(dataRaceCall_b))
print(len(dataRaceCall_w))

2435
2435


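Each resume either receives a callback or does not, so the number of callbacks in each group can be modeled with a binomial distribution.
Let pw be the sample probability of success (being called back) for resumes with white-sounding names, and let pb be the same probability for resumes with black-sounding names.
We calculate these probabilities below.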

In [7]:
pw = len(dataRaceCall_w[dataRaceCall_w.call==1])/len(dataRaceCall_w)
pb = len(dataRaceCall_b[dataRaceCall_b.call==1])/len(dataRaceCall_b)
print(pw)
print(pb)

0.09650924024640657
0.06447638603696099
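
Because 'call' is coded as 0/1, the callback rate of a group is simply the mean of that column. A minimal sketch of this equivalent calculation, using a small hypothetical DataFrame (`toy`, invented here for illustration and not the study data):

```python
import pandas as pd

# Hypothetical toy data, NOT the study data: 4 resumes per group.
toy = pd.DataFrame({
    "race": ["w", "w", "w", "w", "b", "b", "b", "b"],
    "call": [1, 0, 0, 1, 0, 0, 1, 0],
})

# The mean of a 0/1 column is the proportion of 1s, i.e. the callback rate.
rates = toy.groupby("race")["call"].mean()
print(rates["w"], rates["b"])
```

Applied to the real `dataRaceCall` frame, the same `groupby` call should give the pw and pb values printed above.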


The binomial distribution can be approximated by a normal distribution when, with
n = sample size and p = the probability of success,

np > 5 and n(1 - p) > 5.

We will calculate these values for the white-sounding and black-sounding groups.

In [8]:
print(len(dataRaceCall_w)*pw > 5)
print(len(dataRaceCall_w)*(1-pw) > 5)

print(len(dataRaceCall_b)*pb > 5)
print(len(dataRaceCall_b)*(1-pb) > 5)

True
True
True
True


Question 1) All four conditions hold, so the normal approximation is reasonable and the CLT can be applied.
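
The rule-of-thumb check can also be wrapped in a small helper so it is reusable for other groups. This is only a sketch: the function name `normal_approx_ok` is made up for illustration, and the callback count of 235 for white-sounding names is inferred from pw × 2435 rather than printed directly in the notebook.

```python
def normal_approx_ok(n, p, threshold=5):
    """Return True when both n*p and n*(1-p) exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

# Counts from the outputs above: 2435 resumes per group,
# 157 callbacks for black-sounding names, 235 (inferred) for white-sounding names.
n_per_group = 2435
checks = {
    "w": normal_approx_ok(n_per_group, 235 / n_per_group),
    "b": normal_approx_ok(n_per_group, 157 / n_per_group),
}
print(checks)
```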

Question 2) What are the null and alternative hypotheses?

The null hypothesis is that, for this data, race has no impact on whether or not an applicant is called back.
The alternative hypothesis is the opposite, that the callback rates differ:

H0: pB = pW

H1: pB ≠ pW

We want a 95% confidence interval; at this confidence level the critical z-value is 1.96.

The margin of error in this case is given by:
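
Written as a formula, and assuming the usual unpooled standard error for the difference of two independent proportions (the symbols $n_w$ and $n_b$ for the two group sizes are notation introduced here):

$$
\text{ME} = z_{0.975}\,\sqrt{\frac{p_w(1-p_w)}{n_w} + \frac{p_b(1-p_b)}{n_b}}, \qquad z_{0.975} \approx 1.96
$$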

In [9]:
zvalue = 1.96
errmargin = zvalue*math.sqrt(pw*(1-pw)/len(dataRaceCall_w)+pb*(1-pb)/len(dataRaceCall_b))
print('Error margin is :',errmargin)

Error margin is : 0.015255406349886438


The confidence interval is:

pw - pb ± (margin of error)

In [10]:
print([pw-pb-errmargin, pw-pb+errmargin])

[0.016777447859559147, 0.047288260559332024]
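
The margin-of-error and interval steps can be combined into one function that works directly from counts. A minimal sketch, assuming the unpooled standard error used above; the name `diff_proportion_ci` is hypothetical, and the count of 235 callbacks for white-sounding names is inferred from pw × 2435.

```python
import math

def diff_proportion_ci(x1, n1, x2, n2, z=1.96):
    """Normal-approximation confidence interval for p1 - p2 (unpooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# 235 callbacks (white-sounding, inferred) and 157 (black-sounding), 2435 resumes each.
low, high = diff_proportion_ci(235, 2435, 157, 2435)
print(low, high)
```

With these counts the result should agree with the interval printed above, and the lower bound stays above zero.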


In [11]:
from statsmodels.stats.proportion import proportions_ztest as pz

callw = len(dataRaceCall_w[dataRaceCall_w.call==1])
callb = len(dataRaceCall_b[dataRaceCall_b.call==1])

print(pz(np.array([callw,callb]),np.array([len(dataRaceCall_w),len(dataRaceCall_b)]),value=0))

(4.1084121524343464, 3.9838868375850767e-05)


The second value is the p-value (about 4e-05). It is far below 0.05, so we can reject the null hypothesis.
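
To see what lies behind these two numbers, the z-statistic can be computed by hand. The sketch below assumes a pooled proportion under H0 for the standard error, which matches the statistic printed above for these counts; the function name `two_proportion_ztest` is made up for illustration, and 235 is the callback count inferred from pw × 2435.

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for H0: p1 == p2, using the pooled proportion."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p_value = two_proportion_ztest(235, 2435, 157, 2435)
print(z, p_value)
```

Note that the confidence interval earlier uses the unpooled standard error, while the test pools the two groups because H0 assumes a common callback rate.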

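Question 4 & 5) Because the p-value is small, we reject the null hypothesis: the race suggested by the name on a resume does have an impact on the rate at which applicants are called back for interviews by employers. The 95% confidence interval for pw - pb, roughly [0.017, 0.047], excludes zero, which is consistent with this conclusion.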