# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context or the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [9]:
import pandas as pd
import numpy as np
import scipy.stats as st

In [2]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [22]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [23]:
data.columns

Index(['id', 'ad', 'education', 'ofjobs', 'yearsexp', 'honors', 'volunteer',
       'military', 'empholes', 'occupspecific', 'occupbroad', 'workinschool',
       'email', 'computerskills', 'specialskills', 'firstname', 'sex', 'race',
       'h', 'l', 'call', 'city', 'kind', 'adid', 'fracblack', 'fracwhite',
       'lmedhhinc', 'fracdropout', 'fraccolp', 'linc', 'col', 'expminreq',
       'schoolreq', 'eoe', 'parent_sales', 'parent_emp', 'branch_sales',
       'branch_emp', 'fed', 'fracblack_empzip', 'fracwhite_empzip',
       'lmedhhinc_empzip', 'fracdropout_empzip', 'fraccolp_empzip',
       'linc_empzip', 'manager', 'supervisor', 'secretary', 'offsupport',
       'salesrep', 'retailsales', 'req', 'expreq', 'comreq', 'educreq',
       'compreq', 'orgreq', 'manuf', 'transcom', 'bankreal', 'trade',
       'busservice', 'othservice', 'missind', 'ownership'],
      dtype='object')

__Question 1__:
* The most appropriate test for this problem is to use a 2 sample z-test for proportions.  Because we are dealing with proportions, the Central Limit Theorem (CLT) does not apply to the situation.  However, we can still assume that the sampling distribution of the differences between the two proportions is reasonably normal as there are at least 10 callbacks and 10 non-callbacks in both samples.

__Question 2__:  
* The null hypothesis H_0 is that the true proportion of individuals with white-sounding names who are called back is the same as the true proportion of individuals with black-sounding names who are called back.
* The alternate hypothesis H_A is that the true proportion of individuals with white-sound names who are called back is different than the true proportion of individuals with black-sounding names who are called back.

In [4]:
w = data[data.race=='w']
b = data[data.race=='b']

__Question 3__: Computing margin of error, 99% confidence interval, and p-value using bootstrap methods:

In [5]:
#Calculating useful values from data
prop_w = np.sum(w.call)/len(w.call)
prop_b = np.sum(b.call)/len(b.call)
prop_all = np.sum(data.call)/len(data.call)
observed_difference = prop_w - prop_b

In [14]:
#Bootstrapping 2-sample confidence interval for the difference between white and black sounding names
bs_w = np.random.binomial(size=100000, n=len(w), p=prop_w) / len(w)
bs_b = np.random.binomial(size=100000, n=len(b), p=prop_b) / len(b)
bs_difference = bs_w - bs_b
conf_int = np.percentile(bs_difference, [0.5, 99.5])
moe = (conf_int[1] - conf_int[0])/2

#Bootstrapping 2-sample hypothesis test
bs_w = np.random.binomial(size=100000, n=len(data), p=prop_all) / len(data)
bs_b = np.random.binomial(size=100000, n=len(data), p=prop_all) / len(data)
bs_difference = bs_w - bs_b
p = 2*np.sum(bs_difference >= observed_difference) / len(bs_difference)
print('The margin of error for a 99% confidence interval is {}.\n'.format(moe) +
      'A 99% confidence interval for the difference between the percentage of white-sounding names that were called back and black-sounding names that were called back is {}.\n'.format(conf_int) +
      'P-value is approximately {}.'.format(p))

The margin of error for a 99% confidence interval is 0.01991786447638604.
A 99% confidence interval for the difference between the percentage of white-sounding names that were called back and black-sounding names that were called back is [0.01190965 0.05174538].
P-value is approximately 0.0.


__Question 3 (continued)__: Computing margin of error, 99% confidence interval, and p-value using frequentist statistical methods:

In [19]:
#calculating standard error for the difference in the proportions
sep = np.sqrt(prop_w*(1-prop_w)/len(w) + prop_b*(1-prop_b)/len(b))

#2-sample confidence interval
conf_int = st.norm.interval(alpha=0.99, loc=observed_difference, scale=sep)
moe = (conf_int[1] - conf_int[0])/2

#2-sample hypothesis test
z = observed_difference/np.sqrt(prop_all*(1-prop_all)*(1/len(w) + 1/len(b)))
p = 2*(1-st.norm.cdf(z))

print('The margin of error for a 99% confidence interval is {}.\n'.format(moe) +
      'A 99% confidence interval for the difference between the percentage of white-sounding names that were called back and black-sounding names that were called back is {}.\n'.format(conf_int) +
      'P-value is approximately {}.'.format(p))

The margin of error for a 99% confidence interval is 0.020048634037542576.
A 99% confidence interval for the difference between the percentage of white-sounding names that were called back and black-sounding names that were called back is (0.011984220171903006, 0.05208148824698816).
P-value is approximately 3.983886837577444e-05.


__Question 4__: Write a story describing the statistical significance in the context of the original problem.

The p-value is extremely small at 0.00003983.  This means that there is about a 1 in 25000 chance of seeing a difference between the proportions of callbacks for white and black sounding names as extreme as we saw if race does not have a significant impact.  This highly unlikely, so it is statistically significant evidence that race __does__ play a role in the rate of callbacks.  

However, it is important to remember that a hypothesis can be statistically significant, but that doesn't necessarily mean that the difference is practically significant.  For this, it is best to consider the confidence interval that we calculated earlier.  This suggests that the true difference in the callback rates is very likely between 1.2% and 5.2% in favor of white-sounding names, which can be a substantial amount, especially if you are dealing with a large population!

__Question 5__: Does your analysis mean that race/name is the most important factor in callback success?  Why or why not?  If not, how would you amend your analysis? 

While the hypothesis test and confidence interval provided statistically significant evidence that white-sounding names receive are more likely to receive a callback by about 1.2 to 5.2%, it does not guarantee that this is is the most important factor in callback success.  I would be interested in further analyzing a predictive model for the callback rate which includes race as one of those variables.  In this way, we may be able to determine the importance of that factor.