# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
****

In [36]:
import pandas as pd
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [4]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


### What test is appropriate for this problem? Does CLT apply?

To determine this we will need to compute the number of observations in our sample.

In [46]:
N = len(data)
print(N)

4870


Since we are comparing the callback rate for white-sounding names against that for black-sounding names, it is also useful to count the observations that fall into each of these two categories.

In [9]:
# We first split the data into two data frames, separated by race
b_name = data[data.race=='b']
w_name = data[data.race=='w']

# We then compute the lengths
print("Length of black sounding names: "+str(len(b_name)))
print("Length of white sounding names: "+str(len(w_name)))

Length of black sounding names: 2435
Length of white sounding names: 2435


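Each group contains 2435 resumes, far more than the usual rule of thumb of 30, and because race was assigned at random the observations can be treated as independent and identically distributed. The central limit theorem therefore applies: the sampling distribution of each group's callback proportion, and of the difference between them, is approximately normal. Since the outcome is binary (callback or no callback), a two-sample z-test for the difference in proportions is appropriate.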
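For proportions, a more specific rule of thumb than "over 30" is that each group should contain at least 10 successes and 10 failures. The following is a minimal sketch of that check. The threshold of 10 and the helper name `clt_ok` are assumptions introduced here for illustration. The count of 157 callbacks for black-sounding names is printed earlier in the notebook, and 235 for white-sounding names is inferred from the 9.65% callback rate computed later on 2435 resumes.

```python
def clt_ok(successes, n, threshold=10):
    """Return True when both successes and failures reach the threshold."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

# (callbacks, resumes) per group; 235 is inferred from the white callback rate
groups = {"b": (157, 2435), "w": (235, 2435)}
results = {race: clt_ok(s, n) for race, (s, n) in groups.items()}
print(results)
```

Both groups pass comfortably, which supports using a normal approximation here.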

### What are the null and alternate hypotheses?

We have split the original sample into two groups: resumes with white-sounding names and resumes with black-sounding names. The null hypothesis is that the callback rate is the same for both groups. The alternative hypothesis is that the callback rates of the two groups differ.
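Written symbolically, with $p_w$ and $p_b$ denoting the true callback rates for white-sounding and black-sounding names (symbols introduced here for illustration), the two-sided test is:

$$H_0: p_w - p_b = 0 \qquad \text{versus} \qquad H_A: p_w - p_b \neq 0$$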

### Compute margin of error, confidence interval, and p-value.

The formula for our margin of error is as follows:

$ME = Z \sqrt{\frac{P_1(1-P_1)}{N_1} + \frac{P_2(1-P_2)}{N_2}}$

Where Z is the critical value for the chosen confidence level (here 1.96, which corresponds to 95% confidence for a two-tailed test), P is the proportion of successes in each group, and N is the number of observations in each group.
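The critical value for a 95% two-tailed interval can be recovered from the standard normal distribution instead of being memorized. This is a small sketch using only the Python standard library; the variable names are illustrative and not part of the original notebook.

```python
from statistics import NormalDist

confidence = 0.95
alpha = 1 - confidence
# Two-tailed: alpha/2 sits in each tail, so look up the 1 - alpha/2 quantile
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z_crit, 2))
```

Since the notebook already imports `scipy.stats`, `stats.norm.ppf(0.975)` should give the same value.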

In [16]:
# We first need to calculate the proportion of success for the two groups
b_success = len(b_name[b_name.call==1])/len(b_name)
w_success = len(w_name[w_name.call==1])/len(w_name)

# We can then calculate the margin of error
ME = 1.96*((b_success*(1-b_success)/len(b_name)+w_success*(1-w_success)/len(w_name))**(1/2))
print(ME)

0.015255406349886438


The formula for our confidence interval is:

$(P_1 - P_2) \pm ME$

Where P is the proportion of success and ME is the margin of error

In [18]:
# ME already includes the 1.96 critical value, so it is added and subtracted directly
print('Lower bound of our confidence interval is '+str(round(w_success - b_success - ME, 4)))
print('Upper bound of our confidence interval is '+str(round(w_success - b_success + ME, 4)))

Lower bound of our confidence interval is 0.0168
Upper bound of our confidence interval is 0.0473


Finally, to calculate the p-value we will use the two-sample proportions z-test from statsmodels.

In [47]:
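# nobs must be each group's own size (2435 resumes each), not the full sample size N
z_stat, p_value = proportions_ztest(np.array([len(b_name[b_name.call==1]), len(w_name[w_name.call==1])]), np.array([len(b_name), len(w_name)]), value=0)
print('z = {:.3f}, p-value = {:.2e}'.format(z_stat, p_value))

z = -4.108, p-value = 3.98e-05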
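To make the test less of a black box, the same kind of statistic can be computed by hand. The sketch below assumes the pooled-proportion form of the two-sample z-test, which is my understanding of what `proportions_ztest` does by default for two samples. It uses each group's own size of 2435 as the number of observations. The function name `two_proportion_ztest` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(count1, n1, count2, n2):
    """Two-sided z-test for equal proportions, using a pooled standard error."""
    p1, p2 = count1 / n1, count2 / n2
    # Under the null hypothesis both groups share one rate, so pool the counts
    pooled = (count1 + count2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# 157 of 2435 black-sounding and 235 of 2435 white-sounding resumes got a callback
z, p_value = two_proportion_ztest(157, 2435, 235, 2435)
print(z, p_value)
```

The sign of `z` only reflects the order in which the groups are passed; the two-sided p-value does not depend on it.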

### Write a story describing the statistical significance in the context of the original problem.

The p-value we computed is far below the conventional 0.05 significance level, so we reject the null hypothesis in favour of the alternative hypothesis, which asserts that the callback rate for applicants with black-sounding names differs from that for applicants with white-sounding names.

In [44]:
print('Black success rate: '+str(b_success))
print('White success rate: '+str(w_success))

Black success rate: 0.06447638603696099
White success rate: 0.09650924024640657


The proportions make the gap concrete. The callback rate for black-sounding names is 6.45%, whereas the rate for white-sounding names is 9.65%, so an applicant with a white-sounding name is about 50% more likely to be called back.
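The "50% more likely" figure is a relative difference, which is worth keeping separate from the absolute gap in percentage points. A quick check using the rates printed above (the variable names are illustrative):

```python
b_rate = 0.06447638603696099  # black-sounding names, from the output above
w_rate = 0.09650924024640657  # white-sounding names, from the output above

absolute_gap = w_rate - b_rate                 # difference in callback probability
relative_increase = absolute_gap / b_rate      # gap relative to the black callback rate
print(f"absolute gap: {absolute_gap:.4f}")
print(f"relative increase: {relative_increase:.1%}")
```

The absolute gap is about 3.2 percentage points, while the relative increase is just under 50%.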

### Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

We have not done enough analysis to conclude this; at this point all we can say is that it is a factor. We would need to analyse the other factors in the data set (for example education and years of experience) to assess the relative importance of race.

One caveat concerns what exactly has been shown. Because the names were assigned to resumes at random, the difference in callback rates can reasonably be attributed to the name on the resume rather than to differences in qualifications. What the experiment tests, however, is how employers respond to a black-sounding or white-sounding name, which is not the same as proving that a person's race itself determines their chance of a callback. Further research would be needed to establish that, and since the disparity between white and black callback rates is so large, it is an important topic to study.