# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [4]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [5]:
# total number of white and black sounding names
data.groupby('race').race.count()

race
b    2435
w    2435
Name: race, dtype: int64

In [6]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit



1. Since the names are assigned randomly to resumes, we can treat the callbacks within each group as independent and identically distributed Bernoulli trials. Each group has 2,435 resumes, with well over 10 callbacks and 10 non-callbacks, so the CLT applies to the sample proportions. The appropriate test is therefore a two-sample z-test for the difference of two proportions, which is the normal approximation to the underlying binomial model and is justified here because the data are plentiful. The test statistic is the Z statistic, calculated by dividing the empirical difference of proportions by its estimated standard error.
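As a rough check on the CLT claim, the sketch below counts callbacks and non-callbacks in each group, using the totals printed in the cells above. The threshold of 10 is a common rule of thumb and an assumption here, not something the assignment specifies.

```python
# Callback counts and group sizes as reported in the notebook output above.
groups = {"b": (157, 2435), "w": (235, 2435)}

threshold = 10  # rule-of-thumb minimum count (assumption)
checks = {}
for name, (calls, n) in groups.items():
    successes, failures = calls, n - calls
    checks[name] = successes >= threshold and failures >= threshold
    print(f"{name}: callbacks={successes}, non-callbacks={failures}, ok={checks[name]}")

# True only if every group has enough callbacks and non-callbacks.
conditions_met = all(checks.values())
```
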


2. The null hypothesis is pb = pw, where pb is the proportion of resumes with black-sounding names that get callbacks and pw is the corresponding proportion for resumes with white-sounding names.  The alternative hypothesis could be either pb < pw or pb ≠ pw, depending on whether one wants to allow for the possibility of discrimination against white-sounding names.  The two-sided alternative is more conservative, but discrimination against white-sounding names doesn't seem realistic, so I'll test against the one-sided alternative pb < pw.
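Written out, the hypotheses and the pooled test statistic take the following form. The symbols $x_b, x_w$ (callback counts) and $n_b, n_w$ (group sizes) are notation introduced here for illustration; the formula itself matches the frequentist calculation in the code cells of this notebook.

$$
H_0: p_b = p_w \qquad \text{versus} \qquad H_1: p_b < p_w
$$

$$
\hat{p} = \frac{x_b + x_w}{n_b + n_w}, \qquad
\widehat{SE} = \sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_b} + \frac{1}{n_w}\right)}, \qquad
z = \frac{\hat{p}_b - \hat{p}_w}{\widehat{SE}}
$$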


In [7]:
w = data[data.race=='w']
b = data[data.race=='b']

In [8]:
np.mean(b.call), np.mean(w.call)

(0.0644763857126236, 0.09650924056768417)

In [9]:
# Bootstrap test of pb=pw against pb<pw
np.random.seed(2)
empirical_diff = np.mean(b.call) - np.mean(w.call)
nreps = 20000
diffs = np.empty(nreps)
for i in range(nreps):
    pbhat = np.mean(np.random.choice(data.call, size=len(b)))
    pwhat = np.mean(np.random.choice(data.call, size=len(w)))
    diffs[i] = pbhat - pwhat
len(diffs[diffs<empirical_diff])/nreps # one-tail p-value

5e-05
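The cell above resamples the pooled callbacks with replacement. A closely related alternative, shown here only as a sketch, is a permutation test: it shuffles the pooled outcomes without replacement and reassigns them to the two groups. The 0/1 vector is rebuilt from the counts reported earlier, and the seed and the number of repetitions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

# Counts taken from the notebook output above.
n_b, n_w = 2435, 2435
calls_b, calls_w = 157, 235

# Pooled 0/1 callback outcomes for all 4870 resumes.
total_calls = calls_b + calls_w
calls = np.concatenate([np.ones(total_calls), np.zeros(n_b + n_w - total_calls)])
observed_diff = calls_b / n_b - calls_w / n_w

nreps = 5000
diffs = np.empty(nreps)
for i in range(nreps):
    shuffled = rng.permutation(calls)  # random relabeling under the null
    diffs[i] = shuffled[:n_b].mean() - shuffled[n_b:].mean()

# One-tail p-value: share of shuffles at least as extreme as the observed difference.
p_value = np.mean(diffs <= observed_diff)
print(observed_diff, p_value)
```

With this few repetitions the estimate cannot resolve p-values much below 1/5000, so it serves only as a sanity check on the order of magnitude.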

In [10]:
# Bootstrap confidence interval
np.random.seed(3)
nreps = 20000
diffs = np.empty(nreps)
for i in range(nreps):
    pbhat = np.mean(np.random.choice(b.call, size=len(b)))
    pwhat = np.mean(np.random.choice(w.call, size=len(w)))
    diffs[i] = pbhat - pwhat
np.percentile(diffs, [2.5, 97.5]) # 95% confidence interval

array([-0.0476386 , -0.01683778])

In [11]:
# Empirical difference with 95% bootstrap margin of error
empirical_diff, np.percentile(diffs, [2.5, 97.5]) - empirical_diff

(-0.03203285485506058, array([-0.01560575,  0.01519507]))

In [12]:
# Based on the bootstrap method, the margin of error is about 1.5 percentage points,
# compared to an empirical difference of 3.2 percentage points.  (The confidence
# interval is two-sided, so it is more "conservative" than the one-tail hypothesis
# test, in the sense that the confidence interval allows that the difference might
# not be very far from zero, whereas the test strongly rejected the hypothesis that
# it was equal to zero.)
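Because the outcomes are 0/1, drawing n values with replacement from a group and counting the callbacks is equivalent to a single binomial draw with that group's observed rate. The sketch below uses this to reproduce the bootstrap interval without an explicit loop. It works from the counts reported earlier rather than from `data`, and the seed is arbitrary, so the bounds will differ slightly from the cell output above.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed

# Counts taken from the notebook output above.
n_b, n_w = 2435, 2435
calls_b, calls_w = 157, 235
nreps = 20000

# Each binomial draw is the number of callbacks in one bootstrap resample.
pb_star = rng.binomial(n_b, calls_b / n_b, size=nreps) / n_b
pw_star = rng.binomial(n_w, calls_w / n_w, size=nreps) / n_w
diffs = pb_star - pw_star

ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])  # 95% percentile interval
print(ci_low, ci_high)
```
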

In [13]:
# Frequentist test of pb=pw against pb<pw
phat = np.mean(data.call)
sehat = np.sqrt(phat * (1-phat) * (1/len(b)+1/len(w)))
z = empirical_diff / sehat
z, stats.norm.cdf(z) # p-value for one-tail significance test

(-4.108412148525734, 1.991943452497468e-05)
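As a cross-check that does not rely on the normal approximation, one could run Fisher's exact test on the 2×2 table of callbacks by name type. This is a different technique from the z-test above, included only as a sketch; the table is built from the counts reported earlier in the notebook.

```python
from scipy import stats

# Rows: black-sounding, white-sounding names.
# Columns: callback, no callback. Counts from the notebook output above.
table = [[157, 2435 - 157],
         [235, 2435 - 235]]

# alternative="less" tests whether the odds of a callback are lower in the first row.
odds_ratio, p_value = stats.fisher_exact(table, alternative="less")
print(odds_ratio, p_value)
```
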

In [14]:
# Frequentist 95% confidence interval (lower bound, upper bound).
# Note: sehat is the pooled standard error from the test above.
stats.norm.ppf([.025,.975], empirical_diff, sehat)

array([-0.04731449, -0.01675122])

In [15]:
# Empirical difference, and the lower 95% bound minus that difference
# (the margin of error, shown here with a negative sign)
empirical_diff, stats.norm.isf(.975, empirical_diff, sehat) - empirical_diff

(-0.03203285485506058, -0.015281631824705544)
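The interval above reuses the pooled standard error from the hypothesis test. A confidence interval is more commonly built from the unpooled standard error, which does not assume the two proportions are equal. The sketch below computes that variant from the counts reported earlier; with groups this large and rates this close, the result should be very similar to the interval above.

```python
import numpy as np
from scipy import stats

# Counts taken from the notebook output above.
n_b, n_w = 2435, 2435
pb_hat, pw_hat = 157 / n_b, 235 / n_w
diff = pb_hat - pw_hat

# Unpooled standard error: each group contributes its own variance.
se_unpooled = np.sqrt(pb_hat * (1 - pb_hat) / n_b + pw_hat * (1 - pw_hat) / n_w)

z_crit = stats.norm.ppf(0.975)      # about 1.96 for a 95% interval
margin = z_crit * se_unpooled       # margin of error, reported as a positive number
ci_low, ci_high = diff - margin, diff + margin
print(diff, margin, (ci_low, ci_high))
```
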


4. Racial discrimination in employment applications (or perhaps we should call it "name discrimination", that is, discrimination against people with black-sounding names) is a real problem.  At least that's what these data show.  4,870 resumes were randomly assigned black-sounding or white-sounding names.  Almost 9.7% of the ones with white-sounding names were called back, but less than 6.5% of the ones with black-sounding names.  If there were no discrimination, the chance of a gap at least this unfavorable to the black-sounding names would be only about 0.002%, or 1 in 50,000.  The raw difference in callback rates between the two groups was 3.2 percentage points, with a statistical margin of error of about 1.5 percentage points.


5. This analysis does not necessarily mean that race is the most important factor in callback success. In fact it tells us nothing directly about the *relative* importance of race in comparison to other factors, because we haven't looked at how important any of the other factors are. We could do similar tests for other individual factors and compare them with race, although there are two problems with that approach. First, the other factors weren't randomly assigned, so the statistical tests might be suspect, depending on the characteristics of the data and the method of analysis. Second, various factors might interact with one another. To allow for such interactions, we could use regression analysis.
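A minimal starting point for such a regression is sketched below: a logistic regression of callback on a single race indicator, fitted with Newton's method in plain NumPy. The outcome vector is rebuilt from the counts reported earlier, so only race enters the model. Other columns from the dataset (for example `education` or `yearsexp`) and their interaction terms would be appended to the design matrix in a fuller analysis, which is not attempted here.

```python
import numpy as np

# Counts taken from the notebook output above.
n_b, n_w = 2435, 2435
calls_b, calls_w = 157, 235

# 0/1 callback outcomes and a 0/1 indicator for black-sounding names.
y = np.concatenate([np.ones(calls_b), np.zeros(n_b - calls_b),
                    np.ones(calls_w), np.zeros(n_w - calls_w)])
black = np.concatenate([np.ones(n_b), np.zeros(n_w)])

# Design matrix: intercept plus the race indicator.
X = np.column_stack([np.ones_like(black), black])

# Newton's method (iteratively reweighted least squares) for the logistic MLE.
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    gradient = X.T @ (y - p)
    hessian = X.T @ (X * (p * (1 - p))[:, None])
    step = np.linalg.solve(hessian, gradient)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

# exp(coefficient) is the odds ratio of a callback for black- vs white-sounding names.
odds_ratio = np.exp(beta[1])
print(beta, odds_ratio)
```

With race as the only regressor, the fitted odds ratio simply reproduces the ratio of the two groups' callback odds; the regression becomes informative about relative importance only once other factors are added.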
