# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
</div>
****

```python
import pandas as pd
import numpy as np
from scipy import stats
```

```python
data = pd.read_stata('data/us_job_market_discrimination.dta')
```

```python
# number of callbacks for white-sounding names
data.loc[data.race == 'w', 'call'].sum()
```

```
235.0
```

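```python
data.head()
```

| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |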


## Q1 and Q2

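1. A two-sample, two-tailed hypothesis test is appropriate here: split the résumés into two groups by race and compare the proportion of each group that received a callback (a two-sample z-test for proportions). The CLT does apply. Each callback is a Bernoulli (0/1) outcome, so the number of callbacks in each group is binomial, and with 2,435 résumés per group (235 and 157 callbacks) the samples are large enough for the sample proportions to be approximately normally distributed.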

2. Hypotheses:
   - Null hypothesis: race has no effect on callbacks; résumés with black-sounding and white-sounding names have the same callback rate.
   - Alternative hypothesis: the callback rates for the two groups differ.
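In symbols, writing $p_w$ and $p_b$ for the true callback rates of résumés with white- and black-sounding names, $\hat p_w$ and $\hat p_b$ for the sample rates, $n_w = n_b = 2435$ for the group sizes, and $\hat p$ for the pooled rate, the hypotheses and the standard two-sample z statistic for proportions are as follows. The last line checks the usual rule of thumb for the CLT approximation (at least 10 callbacks and 10 non-callbacks in each group) using the callback counts in this dataset.

$$
\begin{aligned}
&H_0: p_w = p_b, \qquad H_A: p_w \neq p_b \\[4pt]
&z = \frac{\hat p_w - \hat p_b}{\sqrt{\hat p\,(1 - \hat p)\left(\frac{1}{n_w} + \frac{1}{n_b}\right)}}, \qquad \hat p = \frac{235 + 157}{2435 + 2435} \approx 0.0805 \\[4pt]
&n_w \hat p_w = 235,\quad n_w (1 - \hat p_w) = 2200,\qquad n_b \hat p_b = 157,\quad n_b (1 - \hat p_b) = 2278
\end{aligned}
$$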

## Q3

```python
w = data[data.race == 'w']
b = data[data.race == 'b']
```

```python
def draw_bs_reps(data, func, size=1):
    """Draw `size` bootstrap replicates of `func` applied to resamples of `data`."""
    bs_replicates = np.empty(size)
    for i in range(size):
        # resample with replacement, keeping the original sample size
        bs_replicates[i] = func(np.random.choice(data, size=len(data)))

    return bs_replicates
```

Here we test the null hypothesis, that race has no effect on callback rate, using bootstrap replicates. We shift the w and b arrays so that each has the combined mean of the two groups, while each keeps its own variance and standard deviation. Resampling from the shifted arrays therefore simulates a world in which both races receive callbacks at the same rate. We draw a large number of bootstrap replicates under that assumption and count how many produce a difference in callback rates at least as large as the one observed in our data.



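```python
# observed difference in callback rates (white minus black)
mean_diff = np.mean(w.call) - np.mean(b.call)

# shift both groups to the combined mean so that the null hypothesis holds
combined_mean = np.mean(np.concatenate((w.call, b.call)))
w_shifted = w.call - np.mean(w.call) + combined_mean
b_shifted = b.call - np.mean(b.call) + combined_mean

bs_replicates_w = draw_bs_reps(w_shifted, np.mean, 10000)
bs_replicates_b = draw_bs_reps(b_shifted, np.mean, 10000)

bs_diff_replicates = bs_replicates_w - bs_replicates_b

# one-tailed p-value: share of replicates at least as large as the observed difference
p = np.sum(bs_diff_replicates >= mean_diff) / len(bs_diff_replicates)
print(p)
```

```
0.0
```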


A p-value of 0 means that none of our 10,000 bootstrap replicates produced a difference as large as the one we recorded, so p < 1/10,000 = 0.0001. This does not prove the null hypothesis false, but it lets us reject it at any conventional significance level: if résumés with black- and white-sounding names really received callbacks at the same rate, a gap this large would almost never appear in our sampled data. It is therefore very likely that race does affect the likelihood of being called in for an interview. (The count above is one-tailed: it only looks at replicates in which white-sounding names did better. A two-tailed p-value would also count equally large gaps in the other direction.)
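A two-tailed version of the same shifted-mean bootstrap, which counts large gaps in either direction, can be sketched as follows. The function name `bootstrap_diff_pvalue` and its `seed` argument are illustrative choices, not part of the original analysis, and the sketch uses NumPy's `default_rng` generator instead of the global `np.random.choice` used in `draw_bs_reps`.

```python
import numpy as np


def bootstrap_diff_pvalue(x, y, n_reps=10_000, seed=None):
    """Two-tailed bootstrap p-value for H0: mean(x) == mean(y).

    Both samples are shifted to the pooled mean, so H0 holds while each keeps
    its own spread; replicates come from resampling each shifted sample with
    replacement.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    observed = x.mean() - y.mean()
    pooled_mean = np.concatenate((x, y)).mean()
    x_shifted = x - x.mean() + pooled_mean
    y_shifted = y - y.mean() + pooled_mean

    diffs = np.empty(n_reps)
    for i in range(n_reps):
        diffs[i] = (rng.choice(x_shifted, size=len(x_shifted)).mean()
                    - rng.choice(y_shifted, size=len(y_shifted)).mean())

    # share of replicates at least as far from zero as the observed gap, in either direction
    p_value = np.mean(np.abs(diffs) >= abs(observed))
    return p_value, 1 / n_reps  # second value: smallest nonzero p this run can report
```

Applied to this notebook it would be called as `p, resolution = bootstrap_diff_pvalue(w.call, b.call, seed=42)`. A returned p of 0.0 is best reported as p < `resolution` (0.0001 with the default 10,000 replicates) rather than as p = 0.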

Next, we calculate a 95% confidence interval and the margin of error by bootstrapping. This time we bootstrap the white and black call results without shifting them, and take the 2.5th and 97.5th percentiles of the replicates, which bound the middle 95% of the results.

```python
# percentile bootstrap confidence intervals from the unshifted data
bs_replicates_w_unshifted = draw_bs_reps(w.call, np.mean, 10000)
conf_int_w = np.percentile(bs_replicates_w_unshifted, [2.5, 97.5])

bs_replicates_b_unshifted = draw_bs_reps(b.call, np.mean, 10000)
conf_int_b = np.percentile(bs_replicates_b_unshifted, [2.5, 97.5])

print(conf_int_w)
print(conf_int_b)
```

```
[ 0.08501027  0.10882957]
[ 0.05462012  0.07433265]
```


So our 95% bootstrap confidence intervals for the callback rate are roughly 8.5% to 10.9% for résumés with white-sounding names and 5.5% to 7.4% for résumés with black-sounding names. Notably, these ranges do not overlap. The margin of error for each is half the width of its confidence interval, which we compute directly from the percentiles below.

```python
# margin of error = half the width of each bootstrap confidence interval
w_moe = (conf_int_w[1] - conf_int_w[0]) / 2
b_moe = (conf_int_b[1] - conf_int_b[0]) / 2

print(f"{w_moe:.2%}")
print(f"{b_moe:.2%}")
```

```
1.19%
0.99%
```
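Comparing two separate intervals is a conservative check: non-overlapping 95% intervals do imply a significant difference, but overlapping ones would not rule one out. A more direct summary is a bootstrap interval for the difference in callback rates itself. The sketch below (the name `bootstrap_diff_ci` and its arguments are illustrative, not from the original analysis) resamples each group independently and takes percentiles of the differences.

```python
import numpy as np


def bootstrap_diff_ci(x, y, n_reps=10_000, level=0.95, seed=None):
    """Percentile bootstrap CI for mean(x) - mean(y), resampling each group independently."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    diffs = np.empty(n_reps)
    for i in range(n_reps):
        diffs[i] = (rng.choice(x, size=len(x)).mean()
                    - rng.choice(y, size=len(y)).mean())

    tail = 100 * (1 - level) / 2  # 2.5 for a 95% interval
    low, high = np.percentile(diffs, [tail, 100 - tail])
    return low, high, (high - low) / 2  # interval bounds and its margin of error
```

Called as `bootstrap_diff_ci(w.call, b.call, seed=42)`, an interval that excludes 0 roughly corresponds to a significant difference at the 5% level.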


Now we compute the 95% confidence interval and margin of error using a frequentist approach: the sample mean plus or minus (critical value × sample standard deviation / √(sample size)). Each group contains 2,435 résumés, which is large enough that the t critical value is essentially the normal one, so we use z* ≈ 1.96 (`stats.norm.ppf(0.975)`) and take the sample size from the data rather than typing it in.

```python
# 95% confidence interval: sample proportion ± z* · std / sqrt(n)
z_crit = stats.norm.ppf(0.975)  # ≈ 1.96; with n = 2435 the t critical value is virtually identical

# white-sounding names
w_sample_avg = np.mean(w.call)
w_sample_moe = z_crit * np.std(w.call) / np.sqrt(len(w.call))
print(f"w: {w_sample_avg:.4f} ± {w_sample_moe:.4f}, "
      f"CI [{w_sample_avg - w_sample_moe:.4f}, {w_sample_avg + w_sample_moe:.4f}]")

# black-sounding names
b_sample_avg = np.mean(b.call)
b_sample_moe = z_crit * np.std(b.call) / np.sqrt(len(b.call))
print(f"b: {b_sample_avg:.4f} ± {b_sample_moe:.4f}, "
      f"CI [{b_sample_avg - b_sample_moe:.4f}, {b_sample_avg + b_sample_moe:.4f}]")
```

```
w: 0.0965 ± 0.0117, CI [0.0848, 0.1082]
b: 0.0645 ± 0.0098, CI [0.0547, 0.0742]
```

So our 95% confidence interval for w is a 9.65% ± 1.17% chance of receiving a callback, i.e. between 8.48% and 10.82%.
Our 95% confidence interval for b is 6.45% ± 0.98%, i.e. between 5.47% and 7.42%. Just as with our bootstrap intervals, these ranges do not overlap, and both are very close to the bootstrap results, which is an encouraging sign.
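For 0/1 data, `np.std` (which divides by $n$ by default) equals $\sqrt{\hat p(1-\hat p)}$, the Bernoulli standard deviation, so the margin of error computed above is the normal-approximation formula for a proportion. Worked through by hand with $n = 2435$ résumés per group and $z^* \approx 1.96$:

$$
\begin{aligned}
\text{MOE} &= z^{*}\sqrt{\frac{\hat p\,(1-\hat p)}{n}} \\[4pt]
\text{w:}\quad \hat p_w &= \tfrac{235}{2435} \approx 0.0965, \qquad \sqrt{\tfrac{0.0965 \times 0.9035}{2435}} \approx 0.00598, \qquad \text{MOE}_w \approx 1.96 \times 0.00598 \approx 0.0117 \\[4pt]
\text{b:}\quad \hat p_b &= \tfrac{157}{2435} \approx 0.0645, \qquad \sqrt{\tfrac{0.0645 \times 0.9355}{2435}} \approx 0.00498, \qquad \text{MOE}_b \approx 1.96 \times 0.00498 \approx 0.0098
\end{aligned}
$$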

For the p-value, the null hypothesis says both groups share a single callback rate, best estimated by the pooled rate of (235 + 157) / 4870 ≈ 8.05%. Neither 95% confidence interval contains 8.05%, which is consistent with p < 0.05 for both white and black résumés. However, checking each interval against one value is only an informal check: it examines each group separately rather than the difference between them, and the 95% level of an interval does not translate directly into a p-value. The standard frequentist test for this question is a two-sample z-test for the difference in proportions, which compares the observed gap of about 3.2 percentage points with its standard error under the null hypothesis.
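A minimal sketch of that two-sample z-test, assuming the callback counts implied by the outputs above (235 of 2,435 white-sounding and 157 of 2,435 black-sounding résumés). The helper name `two_proportion_ztest` is illustrative, and the "at least 10 successes and failures" guard is a common rule of thumb rather than a hard requirement.

```python
import math

from scipy import stats


def two_proportion_ztest(successes_1, n_1, successes_2, n_2):
    """Two-sided z-test of H0: p1 == p2, using the pooled proportion for the standard error."""
    # rule-of-thumb check that the normal (CLT) approximation is reasonable
    for s, n in ((successes_1, n_1), (successes_2, n_2)):
        if min(s, n - s) < 10:
            raise ValueError("too few successes or failures for the normal approximation")

    p_1, p_2 = successes_1 / n_1, successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)  # common rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / se
    p_value = 2 * stats.norm.sf(abs(z))  # two-tailed
    return z, p_value


# callbacks: 235 of 2435 white-sounding vs 157 of 2435 black-sounding résumés
z, p_value = two_proportion_ztest(235, 2435, 157, 2435)
print(f"z = {z:.2f}, p = {p_value:.1e}")  # prints: z = 4.11, p = 4.0e-05
```

On the DataFrame itself the same call is `two_proportion_ztest(int(w.call.sum()), len(w), int(b.call.sum()), len(b))`. As a cross-check, statsmodels provides `statsmodels.stats.proportion.proportions_ztest`, which uses the same pooled standard error by default.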

## Q4

These results are clearly significant in the context of the problem. Résumés with black-sounding names received callbacks about 6.4% of the time, compared with about 9.7% for white-sounding names. The 95% confidence intervals we calculated for the two races do not even overlap, and the bootstrap test produced no replicate as extreme as the observed gap, so it seems extremely unlikely that our sample results arose by chance. Because the names were assigned at random to otherwise comparable résumés, the evidence indicates that the race signaled by the name had a noticeable effect on the chances of being called back in our sample population.

## Q5

This most certainly does NOT mean race is the single most important factor in callback success. Discovering one feature of an application that affects the callback rate does not rule out others with equal or greater impact. To determine which feature matters most, we would need to test many other factors of these applications as well, for example by fitting a logistic regression of `call` on race together with other résumé features in the dataset such as `education`, `yearsexp`, and `honors`, and comparing their estimated effects. Only after considering these other possibilities could we determine what most strongly drives callback success.