# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [155]:
import pandas as pd
import numpy as np
from scipy import stats

In [156]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [157]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [158]:
data.head()

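| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |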


<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
</div>

## 1) What test is appropriate for this problem? Does CLT apply?
We want to find out whether there is a statistically significant difference between the proportions of 'b' and 'w' resumes that received a call from employers. First we check the conditions under which the Central Limit Theorem (CLT) applies:
* Randomness
* Independence
* Normality

The resumes were randomly assigned black-sounding or white-sounding names, so the first condition holds. For independence, the sample must be less than 10% of the population, which it surely is given how many resumes circulate in the US labor market. For normality, when testing proportions it is enough to have at least 10 successes and 10 failures in each group, which the counts below confirm.

In [159]:
w = data[data.race=='w']
b = data[data.race=='b']

# count the resumes that got called back for both 'b' and 'w'
sw = len(w[w.call == 1])
sb = len(b[b.call == 1])

# print number of successes 
print("calls:   ", "w -", sw, " b -", sb)
# print number of failures
print("no calls:", "w -", len(w[w.call ==0]), "b -", len(b[b.call == 0]))

calls:    w - 235  b - 157
no calls: w - 2200 b - 2278


Since there are at least 10 successes and 10 failures (calls/no calls) for both 'b' and 'w' resumes, the CLT applies and the sampling distribution of the difference in proportions is approximately normal. We will therefore use a two-sample z-test for proportions to test whether the race suggested by the name affects whether a resume gets a call from an employer.
## 2) What are the null and alternate hypotheses?
**Null Hypothesis:** the proportion of 'w' called back equals the proportion of 'b' called back (p1 - p2 = 0, where p1 and p2 are the callback proportions for 'w' and 'b')

**Alternative Hypothesis:** the proportion of 'w' called back does *not* equal the proportion of 'b' called back (p1 - p2 ≠ 0)

We will select a significance level of alpha = 0.05.
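
Written out, the two-sided test can be sketched as follows. The subscripts follow the p1 ('w') and p2 ('b') labels above; the hat notation, n1, n2, and the unpooled standard error (the form used for the confidence interval in the next section) are notational assumptions added here for illustration.

$$
\begin{aligned}
H_0 &: p_1 - p_2 = 0 \\
H_A &: p_1 - p_2 \neq 0 \\
\mathrm{SE} &= \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}} \\
z &= \frac{\hat{p}_1 - \hat{p}_2}{\mathrm{SE}}
\end{aligned}
$$

We reject the null hypothesis when the two-sided p-value for z falls below alpha.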
## 3) Compute margin of error, confidence interval, and p-value. 
Try using both the bootstrapping and the frequentist statistical approaches.

In [160]:
# get the length of each sample
nb = len(b)
nw = len(w)

# get the proportion of successes in each sample
pb = sb / nb
pw = sw / nw

# calculate the difference in proportions across groups
diff = pw - pb

# calculate the standard error of the difference (unpooled)
std = np.sqrt((pb * (1 - pb) / nb) + (pw * (1 - pw) / nw))

# calculate the margin of error and find the 95% confidence interval
margin = 1.96 * std
ci = [diff - margin, diff + margin]

# calculate the two-sided p-value from the z statistic
pvalue = stats.norm.sf(np.abs(diff / std)) * 2
# pvalue = stats.ttest_ind(w.call, b.call)

print("The margin of error is: %.5f" % margin)
print("The 95% confidence interval is: ", ci)
print("The p-value is: %.10f" % pvalue)

The margin of error is: 0.01526
The 95% confidence interval is:  [0.016777447859559147, 0.047288260559332024]
The p-value is: 0.0000386257
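
The cell above uses the unpooled standard error for both the interval and the p-value. A common variant of the hypothesis test pools the two samples, since the null hypothesis assumes a single shared callback rate. The sketch below is one possible version of that variant, not part of the original analysis: it rebuilds the inputs from the call/no-call counts printed earlier, and the names `p_pool`, `se_pool`, `z_pooled`, and `p_pooled` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

# Counts taken from the calls / no-calls tally printed earlier in the notebook.
calls_w, n_w = 235, 235 + 2200
calls_b, n_b = 157, 157 + 2278

p_w = calls_w / n_w
p_b = calls_b / n_b
observed_diff = p_w - p_b

# Under H0 both groups share one callback rate, estimated by pooling the samples.
p_pool = (calls_w + calls_b) / (n_w + n_b)
se_pool = np.sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_b))

# Two-sided p-value from the pooled z statistic.
z_pooled = observed_diff / se_pool
p_pooled = 2 * stats.norm.sf(abs(z_pooled))

print("pooled z statistic: %.3f" % z_pooled)
print("pooled two-sided p-value: %.2e" % p_pooled)
```

With group sizes this large the pooled and unpooled standard errors are very close, so the conclusion should not depend on which one is used.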


In [161]:
# resampling under the null hypothesis (strictly, a permutation test)
size = 10000

# initialize empty array
bs_reps = np.empty(size)

np.random.seed(12345)

for i in range(size):
    # permute order of the data
    perm = np.random.permutation(data.call.values)
    # take the first nw values as the sample for w, the rest as the sample for b
    sample_w = perm[:nw]
    sample_b = perm[nw:]
    # store the absolute difference in sample means in bs_reps
    bs_reps[i] = np.abs(np.mean(sample_w) - np.mean(sample_b))

# fraction of replicates at least as extreme as the observed difference
bs_pvalue = np.sum(bs_reps >= np.abs(diff)) / size
print("Bootstrap p-value is: %.10f" % bs_pvalue)

Bootstrap p-value is: 0.0000000000
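
The cell above shuffles the pooled outcomes across the two groups, which gives a p-value but no interval. To get a bootstrap margin of error and confidence interval as well, one option is to resample each group with replacement and take percentiles of the resulting differences. The following is a minimal sketch under stated assumptions: the 0/1 outcome arrays are rebuilt from the printed counts rather than read from the data file, and `n_reps`, `boot_diffs`, `ci_low`, `ci_high`, and `boot_margin` are illustrative names.

```python
import numpy as np

# Rebuild 0/1 callback arrays from the counts printed earlier
# (assumption: this stands in for w.call and b.call).
calls_w, n_w = 235, 2435
calls_b, n_b = 157, 2435
call_w = np.concatenate([np.ones(calls_w), np.zeros(n_w - calls_w)])
call_b = np.concatenate([np.ones(calls_b), np.zeros(n_b - calls_b)])

rng = np.random.default_rng(12345)
n_reps = 5000
boot_diffs = np.empty(n_reps)

for i in range(n_reps):
    # Resample each group separately, with replacement, at its original size.
    resample_w = rng.choice(call_w, size=n_w, replace=True)
    resample_b = rng.choice(call_b, size=n_b, replace=True)
    boot_diffs[i] = resample_w.mean() - resample_b.mean()

# Percentile bootstrap: the middle 95% of the replicate differences.
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])
boot_margin = (ci_high - ci_low) / 2

print("Bootstrap 95%% confidence interval: [%.5f, %.5f]" % (ci_low, ci_high))
print("Bootstrap margin of error: %.5f" % boot_margin)
```

If the interval excludes zero, it agrees with the frequentist interval computed earlier in rejecting equal callback rates.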


<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>

## 4) Write a story describing the statistical significance in the context of the original problem.
With both the frequentist and the bootstrap methods, the p-value was less than our alpha of 0.05. In the resampling run, none of the 10,000 replicates produced a difference as large as the observed one, so that p-value is best read as less than 0.0001 rather than exactly zero. We therefore reject the null hypothesis that the proportion of callbacks is the same for both groups, in favor of the alternative hypothesis that the proportions are *not* the same. In other words, there is a statistically significant difference between the callback rates for resumes with white-sounding and black-sounding names. Because the names were assigned to the resumes at random, we conclude that the race suggested by a name affected the callback rate: on average, resumes with black-sounding names received callbacks at lower rates than resumes with white-sounding names.
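
To put a size on that difference, the callback rates and their ratio can be computed directly from the counts printed earlier. This is a small worked example; the names `rate_w`, `rate_b`, and `rate_ratio` are introduced here for illustration.

```python
# Counts taken from the calls / no-calls tally printed earlier in the notebook.
calls_w, n_w = 235, 235 + 2200
calls_b, n_b = 157, 157 + 2278

rate_w = calls_w / n_w
rate_b = calls_b / n_b

# How many times more often 'w' resumes were called back than 'b' resumes.
rate_ratio = rate_w / rate_b

print("white-sounding callback rate: %.2f%%" % (100 * rate_w))
print("black-sounding callback rate: %.2f%%" % (100 * rate_b))
print("ratio of callback rates (w / b): %.2f" % rate_ratio)
```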

## 5) Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

Our analysis does not mean that race was *the most important* factor in callback rates, only that it had a statistically significant effect. Because the names were assigned randomly, other variables such as education level, experience, gender, age, or geographic location should be balanced across the two groups on average, so they are unlikely to confound the estimated effect of the name. The analysis says nothing, however, about how strongly those variables themselves influence callbacks. To find out which factors had a stronger influence (or possibly *the strongest influence*), a future analysis would need to examine these factors together with race, for example by modeling the callback outcome on all of them at once and comparing their effects.