# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
</div>
****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats
np.random.seed(17)

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [4]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


#### Question 1
If we are only interested in whether the name on a resume affects interview requests, we have several options. The first is a chi-squared test, which examines whether the callback proportions differ significantly between the two groups. A second is a two-proportion z-test on the difference in callback proportions between the two race groups; because `call` is a 0/1 variable, an independent-samples t-test on `call` gives nearly the same answer for large samples. A third is a resampling approach: a permutation test shuffles the callbacks across the two groups to build the distribution of the difference in proportions under the null hypothesis, so we can see where the observed difference falls in that distribution, and bootstrap resampling within each group gives a confidence interval for each group's callback rate.

If we wished to involve the covariates that are also present in the data set, we could describe the relationship between the covariates, the predictor, and the outcome more comprehensively. Under a more research-oriented causal framework, we would test the relationship of each covariate with both the predictor and the outcome to uncover possible confounding variables, then fit a logistic regression and, ideally, arrive at a causal understanding of the link between race and job application success. Under a more predictive approach, in which we predict who receives an interview and compare the importance of race with that of the other features, we could use a machine learning algorithm such as a random forest and extract the feature importances.
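As a rough illustration of the chi-squared option, the sketch below runs `scipy.stats.chi2_contingency` on a 2x2 table. The counts are hypothetical and made up for this example; they are not the counts from the dataset.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table of counts, made up for illustration only:
# rows = name group, columns = (callback, no callback).
table = [[30, 70],
         [10, 90]]

# chi2_contingency applies Yates' continuity correction to 2x2 tables by default.
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```

With the real data, the table would hold the callback and no-callback counts for each value of `race`.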

The central limit theorem does apply. Each resume's outcome is a Bernoulli (0/1) variable, and we can consider the callback proportion across all resumes, the proportion for white-sounding or black-sounding names separately, or the difference in proportions between the two groups. In each case the CLT says that the sampling distribution of the statistic (not the raw 0/1 data) approaches a normal distribution as the sample size grows, centered on the population value with a standard error that shrinks in proportion to 1/√n. The usual conditions for relying on this approximation are that the observations are independent and that each group contains a reasonable number of both callbacks and non-callbacks.
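One common way to check the last condition is the "success-failure" rule of thumb. The helper below is a minimal sketch of that check; the function name, the threshold of 10, and the example counts are assumptions for illustration, not part of the original analysis.

```python
def clt_condition_met(successes, n, threshold=10):
    """Return True when a sample has enough successes and failures.

    Rule of thumb (an assumption of this sketch, not a theorem): the normal
    approximation to a sample proportion is usually considered reasonable
    when there are at least `threshold` successes and `threshold` failures.
    """
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Hypothetical counts for illustration only (not taken from the dataset).
print(clt_condition_met(successes=40, n=500))  # 40 successes, 460 failures
print(clt_condition_met(successes=3, n=500))   # only 3 successes
```

In the notebook this could be applied per group, for example with `int(w.call.sum())` and `len(w)`.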


#### Question 2
Ho: The proportion of resumes with white-sounding names that receive a request for an interview is equal to the proportion of resumes with black-sounding names that receive one ( P(I|W-name) = P(I|B-name) ).

Ha: The proportion of resumes with white-sounding names that receive a request for an interview is NOT equal to the proportion of resumes with black-sounding names that receive one ( P(I|W-name) ≠ P(I|B-name) ).

In [5]:
w = data[data.race=='w']
b = data[data.race=='b']

#### Question 3
##### Part A
###### Bootstrap


In [6]:
white_callback = np.mean(w.call)
black_callback = np.mean(b.call)
dif_callback = white_callback - black_callback  # observed difference (white - black)
n_reps = 100000
replicates = np.empty(n_reps)  # permutation replicates of the difference under Ho
whites = np.empty(n_reps)      # bootstrap replicates of the white callback rate
blacks = np.empty(n_reps)      # bootstrap replicates of the black callback rate
for i in range(n_reps):
    # permutation: shuffle callbacks across both groups, as if race had no effect
    samples = np.random.permutation(data.call)
    # bootstrap: resample each group with replacement
    white = np.random.choice(w.call, size = len(w))
    black = np.random.choice(b.call, size = len(b))
    perm_white = samples[:len(w)]
    perm_black = samples[len(w):]
    replicates[i] = np.mean(perm_white) - np.mean(perm_black)
    whites[i] = np.mean(white)
    blacks[i] = np.mean(black)
# one-sided p-value: share of permutation replicates at least as large as the observed difference
pvalue = np.sum(replicates >= dif_callback)/len(replicates)
# 95% bootstrap percentile confidence intervals for each group's callback rate
black_ci = np.percentile(blacks, [2.5, 97.5])
white_ci = np.percentile(whites, [2.5, 97.5])
# margin of error: distance from the point estimate to the lower bound
white_moe = white_callback - white_ci[0]
black_moe = black_callback - black_ci[0]
print("black confidence interval & margin of error: " + str(black_ci) + " , " +  str(black_moe))
print("white confidence interval & margin of error: " + str(white_ci) + " , " +  str(white_moe))
print("pvalue for differences in proportions: " + str(pvalue))


black confidence interval & margin of error: [0.0550308  0.07433265] , 0.009445585310459137
white confidence interval & margin of error: [0.08501027 0.10841889] , 0.011498972773551941
pvalue for differences in proportions: 1e-05


Only about 1 in 100,000 permutation replicates produced a difference in proportions at least as large as the one we observed, so a difference this large would be very unlikely if the null hypothesis, that the callback proportions are the same for both racial groups, were true. Note that this p-value is one-sided (it counts only replicates in which white-sounding names do better); for the two-sided alternative in Question 2 the p-value would be roughly double, which is still far below 0.05. We therefore reject the null hypothesis and have evidence that white-sounding names receive, on average, more callbacks than black-sounding names. The 95% bootstrap confidence intervals for the two groups do not overlap, which points the same way: non-overlapping 95% intervals imply a significant difference at an alpha level of 0.05 (the converse does not hold, so this is a conservative check). Each interval comes from a procedure that captures the true proportion about 95% of the time, so the lack of overlap leads us to believe that the true proportions are not equal.
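For comparison, a two-sided version of the permutation test can be sketched as a small standalone function. The function name, the `n_perm` and `seed` parameters, and the toy arrays are assumptions for illustration; this is not the exact procedure used in the cell above.

```python
import numpy as np


def permutation_pvalue(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means.

    For 0/1 data the mean is a proportion, so this tests a difference
    in proportions.
    """
    rng = np.random.default_rng(seed)
    group_a = np.asarray(group_a, dtype=float)
    group_b = np.asarray(group_b, dtype=float)
    observed = group_a.mean() - group_b.mean()
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    diffs = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle the pooled outcomes and split them back into two groups.
        shuffled = rng.permutation(pooled)
        diffs[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()
    # Share of replicates at least as extreme as the observed difference,
    # in either direction.
    return np.mean(np.abs(diffs) >= abs(observed))


# Hypothetical 0/1 outcomes for illustration only.
a = np.array([1] * 30 + [0] * 70)
b = np.array([1] * 10 + [0] * 90)
print(permutation_pvalue(a, b, n_perm=2000))
```

In the notebook, the call would look like `permutation_pvalue(w.call, b.call)`.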

#### Question 3
##### Part B
###### Frequentist

In [7]:
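# standard error and 95% margin of error for each group's callback rate
w_se = np.sqrt((white_callback*(1-white_callback))/len(w))
w_moe = w_se * 1.96
b_se = np.sqrt((black_callback*(1-black_callback))/len(b))
b_moe = b_se* 1.96
# pooled callback rate under Ho (both groups share one proportion)
overall_prop = (np.sum(w.call) + np.sum(b.call))/(len(w)+len(b))
# z-score for the difference in proportions, using the pooled standard error
zscore = (dif_callback - 0)/np.sqrt(overall_prop*(1-overall_prop)*((1/len(w))+(1/len(b))))
print(zscore)
# one-sided p-value (upper tail); double it for the two-sided alternative in Question 2
pvalue = stats.norm.sf(abs(zscore))
print(pvalue)
# 95% confidence interval for the difference, using the unpooled standard error
upper = (white_callback - black_callback) + 1.96*(np.sqrt((white_callback*(1-white_callback)/len(w))+(black_callback*(1-black_callback))/len(b)))
lower = (white_callback - black_callback) - 1.96*(np.sqrt((white_callback*(1-white_callback)/len(w))+(black_callback*(1-black_callback))/len(b)))
print("proportion difference conf interval: " + str(lower) + " , " +  str(upper))
# margin of error for the difference in proportions
se_dif = np.sqrt(w_se**2 + b_se**2)
moe = se_dif * 1.96
print('proportion difference margin of error: ' + str(moe))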



4.108412235238472
1.991942704752209e-05
proportion difference conf interval: 0.016777448506376254 , 0.0472882612037449
proportion difference margin of error: 0.015255406348684322


With the frequentist approach, we calculate the confidence interval and margin of error for the difference in proportions rather than for the individual racial groups. If the null hypothesis were plausible, this interval would contain 0. Instead, the 95% confidence interval runs from about 0.017 to 0.047: the values are small but clearly above zero, so we estimate that the callback rate for white-sounding names is between about 1.7 and 4.7 percentage points higher. The z-score of about 4.11 gives a one-sided p-value of about 2 in 100,000 (roughly 4 in 100,000 for the two-sided alternative), so once more we have evidence to reject the null hypothesis.
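For reference, the frequentist calculation can be written out as follows. The symbols are notation introduced here, not part of the original notebook: $x_w, x_b$ are the callback counts, $n_w, n_b$ the numbers of resumes, and $\hat{p}_w = x_w/n_w$, $\hat{p}_b = x_b/n_b$ the observed callback rates.

$$
\hat{p} = \frac{x_w + x_b}{n_w + n_b}, \qquad
z = \frac{\hat{p}_w - \hat{p}_b}{\sqrt{\hat{p}\,(1-\hat{p})\left(\dfrac{1}{n_w} + \dfrac{1}{n_b}\right)}}
$$

$$
(\hat{p}_w - \hat{p}_b) \;\pm\; 1.96\,\sqrt{\frac{\hat{p}_w(1-\hat{p}_w)}{n_w} + \frac{\hat{p}_b(1-\hat{p}_b)}{n_b}}
$$

The z-score uses the pooled proportion $\hat{p}$ because the null hypothesis assumes one common callback rate, while the confidence interval uses the unpooled standard error. The two-sided p-value is $2\,\bigl(1 - \Phi(|z|)\bigr)$, where $\Phi$ is the standard normal CDF.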

In [8]:
#t-statistic - confidence interval of the proportions
#(t, p) = stats.ttest_ind(w.call, b.call, equal_var=False)
#print(t, p)
#w_upper = white_callback + w_moe
#w_lower = white_callback - w_moe
#b_upper = black_callback + b_moe
#b_lower = black_callback - b_moe
#print('white name conf interval: ' + str(w_lower) + " , " + str(w_upper))
#print('white margin of error: ' + str(w_moe))
#print('black name conf interval: ' + str(b_lower) + " , " + str(b_upper))
#print('black margin of error: ' + str(b_moe))

#### Question 4
We see, through multiple methods, that resumes with black-sounding names are less likely to receive a callback for an interview. We estimate the true callback proportion for black-sounding names to be between about 5.5% and 7.4%, compared with between about 8.5% and 10.8% for white-sounding names. We want to be careful not to point to any causal factors too early. However, the statistical significance indicates that, if the null hypothesis of no true difference in proportions were true, we would see a difference of this magnitude only on the order of 1 in 100,000 times. Since such an event is so rare, we reject the null hypothesis and conclude that there is a real difference between the two groups.

#### Question 5
To explain these differences further, we need to examine any covariates that might confound this relationship. For now, we are assuming that all covariates are randomly distributed between white-sounding and black-sounding names, which leads us to believe that the name itself informs a hiring manager's first impression and subsequent judgements. Because the names were assigned to resumes at random, that assumption is reasonable, and if the covariates really are balanced we can take our statistical significance one step further toward a causal explanation: with years of experience and educational attainment equally distributed, equal merit should produce equal callback rates, and it does not. We might suspect that this permeates all industries and locations, although we would need to verify that.

Our analysis does not mean that race or name is the most important factor in receiving a callback. Many other covariates, such as years of experience or highest level of education, could explain more of why a person receives a callback. We should also check whether the gap in proportions is the same across industries and geographies, or whether particular subgroups drive the result. We are simply saying that, with the other covariates held constant, we would expect a higher proportion of white-sounding names to receive interview callbacks, on average. To amend the analysis, we could fit a model that includes the covariates, such as a logistic regression, and compare the size of the race effect with the others.

With 95% confidence, the true difference in proportions is between about 1.7 and 4.7 percentage points. That difference is also practically significant, since it would be unfair for any individual not to be considered on their merits any percent of the time.
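One simple first step toward that check is to compare callback rates within levels of a covariate. The sketch below uses a hypothetical toy table with the dataset's column names (`race`, `call`, `education`); the values are made up for illustration, and a logistic regression would be the fuller version of this idea.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the dataset;
# the values are made up for illustration only.
toy = pd.DataFrame({
    "race":      ["w", "w", "b", "b", "w", "w", "b", "b"],
    "education": [3, 3, 3, 3, 4, 4, 4, 4],
    "call":      [1, 0, 0, 0, 1, 1, 1, 0],
})

# Callback rate for each race within each education level.
rates = toy.groupby(["education", "race"])["call"].mean().unstack("race")

# Gap (white minus black) within each stratum.
rates["gap"] = rates["w"] - rates["b"]
print(rates)
```

If the gap is similar in every stratum, the covariate is unlikely to explain the overall difference; a gap that appears only in some strata would point to subgroups worth examining.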
 