# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [2]:
import pandas as pd
import numpy as np
from scipy import stats

In [3]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [4]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [5]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
</div>




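Q1 - What test is appropriate for this problem? Does CLT apply?

Each resume either receives a callback or does not, so `call` is a Bernoulli variable. The quantity of interest is the difference between two proportions: $\hat{p}_b$, the proportion of black-sounding names getting a callback, and $\hat{p}_w$, the proportion of white-sounding names getting a callback. A two-sample z-test for proportions is therefore appropriate.

Since the numbers of successes (callbacks) and failures (no callbacks) in each group are all above 10, we can use the normal distribution to model the difference between the proportions.

The sample size is large (2,435 resumes per group), with enough successes and failures that the central limit theorem (CLT) applies.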
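
The success/failure condition above can be checked directly. This is a minimal sketch that hard-codes the counts printed by the notebook (2435 resumes per group, 157 and 235 callbacks); the helper name `clt_counts_ok` and the threshold of 10 are illustrative choices, not part of the original analysis.

```python
# Counts taken from the notebook's printed output: 2435 resumes per group,
# with 157 (black-sounding) and 235 (white-sounding) callbacks.
groups = {"b": (2435, 157), "w": (2435, 235)}


def clt_counts_ok(n, successes, threshold=10):
    """Return True when both successes and failures reach the threshold.

    Hypothetical helper: the threshold of 10 is the usual rule of thumb.
    """
    failures = n - successes
    return successes >= threshold and failures >= threshold


for race, (n, calls) in groups.items():
    print(race, "successes:", calls, "failures:", n - calls,
          "ok:", clt_counts_ok(n, calls))
```

Both groups have far more than 10 successes and 10 failures, so the normal approximation is reasonable here.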

Q2 - What are the null and alternate hypotheses?

Now that we have all the required data, we formulate the null and alternate hypotheses.

$H_0: \; p_b = p_w$

$H_A: \; p_b \neq p_w$

The Standard Error for the sample statistic is given by $\sqrt{\frac{\hat{p}_b(1-\hat{p}_b)}{n_b} + \frac{\hat{p}_w(1-\hat{p}_w)}{n_w}} $

We can use the z-statistic to place a confidence interval on this sample statistic. Hence, the margin of error is $Z_{\alpha/2} * SE$. For a 95% confidence interval, the z-value is 1.96.

The confidence interval, subsequently, is $\hat{p}_b - \hat{p}_w \pm {Z_{\alpha/2} * SE}$

The null hypothesis is that the callback rate for black-sounding names, $p_b$, equals the callback rate for white-sounding names, $p_w$. The main alternative hypothesis is two-sided, $p_b \neq p_w$; the one-sided alternative $p_b < p_w$ is also tested in the code that follows.

Null Hypothesis: Race has no impact on employment callbacks (black_callrate - white_callrate = 0)

Alternate Hypothesis: Race has an impact on employment callbacks (black_callrate - white_callrate != 0)
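
Under $H_0$ both groups share a single callback rate, so a common variant of the two-proportion z-test estimates the standard error from the pooled proportion rather than from the two separate proportions. The sketch below is one way to write that variant using only the standard library; the function name `pooled_two_proportion_ztest` is hypothetical, and the counts are hard-coded from the notebook's printed output.

```python
import math
from statistics import NormalDist


def pooled_two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2 using the pooled standard error.

    Hypothetical helper for illustration; x = number of callbacks,
    n = number of resumes in each group.
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pool
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# White-sounding names first, so a positive z means more callbacks for "w".
z, p = pooled_two_proportion_ztest(235, 2435, 157, 2435)
print("z = {:.3f}, two-sided p = {:.6f}".format(z, p))
```

With samples this large the pooled and unpooled standard errors are nearly identical, so the choice should not change the conclusion.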



In [13]:
w = data[data.race=='w']
b = data[data.race=='b']
#Number of CVs per variable:
b_size=b.race.count()
w_size=w.race.count()

#Number of callbacks per variable:
b_calls = b['call'].sum()
w_calls = w['call'].sum()


#Proportions:
b_callrate = b_calls / b_size
w_callrate = w_calls / w_size

#Pooled proportions:
pooled_callrate = (b_calls + w_calls) / (b_size + w_size)

#Output
print("black: "+str(b_size)+" resumes received "+str(b_calls)+" callbacks, for a rate of "+str(round(b_callrate,4)))
print("white: "+str(w_size)+" resumes received "+str(w_calls)+" callbacks, for a rate of "+str(round(w_callrate,4)))
print("pool: "+str(b_size+w_size)+" resumes received "+str(b_calls+w_calls)+" callbacks, for a rate of "+str(round(pooled_callrate,4)))

# Compute the callback rates for both races
n_b = len(data[data.race=='b'])
n_w = len(data[data.race=='w'])
c_b = sum(data[data.race=='b'].call)
c_w = sum(data[data.race=='w'].call)
p_b = c_b / n_b #b_callrate
p_w = c_w / n_w #w_callrate
print('Callback rates for blacks & whites are: {:1.3f}, {:1.3f}'.format(p_b,p_w))




black: 2435 resumes received 157.0 callbacks, for a rate of 0.0645
white: 2435 resumes received 235.0 callbacks, for a rate of 0.0965
pool: 4870 resumes received 392.0 callbacks, for a rate of 0.0805
Callback rates for blacks & whites are: 0.064, 0.097


In [14]:
# Your solution to Q3 here
#compute the standard error for each sample
b_std_err = np.sqrt((b_callrate * (1-b_callrate))/b_size)
w_std_err = np.sqrt((w_callrate * (1-w_callrate))/w_size)

#compute the standard error for the proportion
p_std_err = np.sqrt((b_std_err ** 2) + (w_std_err ** 2))
print('Standard error for the difference between callback rates: {:1.4f}'.format(p_std_err))

#compute the margin of error with 95% Confidence Interval
critical_value = 1.96
mar_err = critical_value * p_std_err

#compute the confidence interval
ci_min = (w_callrate - b_callrate) - mar_err
ci_max = (w_callrate - b_callrate) + mar_err

#compute the p-value z-test
test_stat = (w_callrate - b_callrate)/p_std_err
p_val = stats.norm.sf(test_stat)*2


print("margin of error: "+str('%.04f' % mar_err))
print("confidence interval between: "+str('%.04f' % ci_min)+" and "+str('%.04f' % ci_max))
print('The z_stat for the test of equality of callback rates is: {:1.5f}'.format(test_stat))
print("p-value: "+str(('%.08f' % p_val)))

alpha = 0.05
if p_val < alpha:
    print("Reject the null hypothesis: race has an impact on employment callbacks (black_callrate - white_callrate != 0)")
else:
    print("Fail to reject the null hypothesis: no evidence that race has an impact on employment callbacks (black_callrate - white_callrate = 0)")


# Alternatively

# Compute the standard error for the difference between callback rates
se = np.sqrt((p_b*(1-p_b)/n_b)+(p_w*(1-p_w)/n_w))
print('Standard error for the difference between callback rates: {:1.4f}'.format(se))

# Find 95% and 99% confidence intervals
crit = stats.norm.isf([0.025,0.005])
ci_95 = [p_b-p_w-crit[0]*se, p_b-p_w+crit[0]*se]
print("95% confidence interval: ({:2.3f}, {:2.3f})".format(ci_95[0],ci_95[1]))

ci_99 = [p_b-p_w-crit[1]*se, p_b-p_w+crit[1]*se]
print("99% confidence interval: ({:2.3f}, {:2.3f})".format(ci_99[0],ci_99[1]))


# compute the p-value of one-tail test (H_A: p_b < p_w) from the upper tail of (p_w - p_b) / se
z_stat = (p_w - p_b) / se
p_value = stats.norm.sf(z_stat)
print('The z_stat for the test of equality of callback rates is: {:1.5f}'.format(z_stat))
print('The p-value (One Tail) for the test of equality of callback rates is: {:1.5f}'.format(p_value))


# equivalently, use the lower tail of (p_b - p_w) / se
z_stat = (p_b - p_w) / se
p_value = stats.norm.cdf(z_stat)
print('The z_stat for the test of equality of callback rates is: {:1.5f}'.format(z_stat))
print('The p-value (One Tail) for the test of equality of callback rates is: {:1.5f}'.format(p_value))


# conduct hypothesis test (the p-value of one-tail test)
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: black-sounding names receive fewer callbacks (black_callrate - white_callrate < 0)")
else:
    print("Fail to reject the null hypothesis: no evidence that black-sounding names receive fewer callbacks (black_callrate - white_callrate = 0)")
# The p-value is far below 0.05, so we reject the null hypothesis.



Standard error for the difference between callback rates: 0.0078
margin of error: 0.0153
confidence interval between: 0.0168 and 0.0473
The z_stat for the test of equality of callback rates is: 4.11555
p-value: 0.00003863
Reject the null hypothesis: race has an impact on employment callbacks (black_callrate - white_callrate != 0)
Standard error for the difference between callback rates: 0.0078
95% confidence interval: (-0.047, -0.017)
99% confidence interval: (-0.052, -0.012)
The z_stat for the test of equality of callback rates is: 4.11555
The p-value (One Tail) for the test of equality of callback rates is: 0.00002
The z_stat for the test of equality of callback rates is: -4.11555
The p-value (One Tail) for the test of equality of callback rates is: 0.00002
Reject the null hypothesis: black-sounding names receive fewer callbacks (black_callrate - white_callrate < 0)
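
Q3 also asks for a bootstrapping approach, which the cell above does not cover. The following is a minimal sketch of one way to do it, assuming the counts printed earlier (2435 resumes per group, 157 and 235 callbacks). The seed, the number of replicates, and all variable names are arbitrary choices. Because `call` is binary, resampling a group with replacement is equivalent to drawing its callback count from a binomial distribution, and shuffling race labels under $H_0$ is equivalent to a hypergeometric draw, so the sketch samples counts directly instead of resampling rows.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
n_b = n_w = 2435                 # resumes per group (from the output above)
c_b, c_w = 157, 235              # callbacks per group (from the output above)
reps = 10000                     # number of replicates (arbitrary)
obs_diff = c_w / n_w - c_b / n_b

# Bootstrap CI: resampling n binary outcomes with replacement yields a
# Binomial(n, p_hat) callback count, so draw the counts directly.
boot_b = rng.binomial(n_b, c_b / n_b, size=reps) / n_b
boot_w = rng.binomial(n_w, c_w / n_w, size=reps) / n_w
boot_diff = boot_w - boot_b
ci_low, ci_high = np.percentile(boot_diff, [2.5, 97.5])

# Permutation test: under H0 the race labels are exchangeable, so the number
# of callbacks that land in the "b" group follows a hypergeometric law.
total_calls = c_b + c_w
perm_b = rng.hypergeometric(total_calls, n_b + n_w - total_calls, n_b, size=reps)
perm_diff = (total_calls - perm_b) / n_w - perm_b / n_b
p_perm = np.mean(np.abs(perm_diff) >= abs(obs_diff))

print("observed difference (w - b): {:.4f}".format(obs_diff))
print("bootstrap 95% CI: ({:.4f}, {:.4f})".format(ci_low, ci_high))
print("permutation p-value (two-sided): {:.5f}".format(p_perm))
```

The bootstrap interval should land close to the analytic one. The permutation p-value is limited in resolution by `reps`, so a value of 0 only means that no replicate was as extreme as the observed difference.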


<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>

Q4 - Discuss statistical significance.

The p-value is very small (about 0.00004 for the two-sided test), well below the 0.05 threshold, indicating that the result is highly significant.
Furthermore, the 95% confidence interval for the difference in callback rates, (0.017, 0.047), does not include zero, which is itself evidence
against the null hypothesis. The evidence of this study suggests that the bias against black applicants is extremely
unlikely to be the result of chance.

What does it practically mean to reject the null hypothesis? Our null hypothesis was that the proportion of black-sounding
names getting a callback is equal to the proportion of white-sounding names getting a callback. After analysis, we have
decided to reject it. This means that there is a statistically significant difference in callback rates, with white-sounding names receiving more callbacks (9.65% versus 6.45%).

Therefore, racial discrimination in the U.S. labor market still appears to be a continuing challenge.

Q5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

The analysis shows that whether a name sounds black or white has a significant effect on callbacks for interviews, but it does not show that race/name is the most important factor. Other factors, such as education, years of experience and computer skills, can also affect callback success, and this test says nothing about their relative importance. A logistic regression that includes these factors is needed to examine which are significant predictors and to compare their effect sizes.
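
To make the logistic-regression suggestion concrete, here is a rough sketch of fitting such a model by Newton-Raphson with NumPy. Everything in it is illustrative: `fit_logit` is a hypothetical helper, the data are synthetic (generated from made-up coefficients, not the study's data), and `is_white` and `yearsexp` merely mimic the kind of predictors the dataset contains.

```python
import numpy as np


def fit_logit(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    Hypothetical helper: X must already contain an intercept column.
    Returns the coefficients and their asymptotic standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        beta = beta + np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


# Synthetic data for illustration only (NOT the study's data).
rng = np.random.default_rng(0)
n = 5000
is_white = rng.integers(0, 2, size=n)
yearsexp = rng.integers(1, 20, size=n)
true_beta = np.array([-2.5, 0.4, 0.03])  # made-up coefficients
X = np.column_stack([np.ones(n), is_white, yearsexp])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta_hat, se_hat = fit_logit(X, y)
for name, b, s in zip(["intercept", "is_white", "yearsexp"], beta_hat, se_hat):
    print("{:>9}: coef = {:+.3f}, z = {:+.2f}".format(name, b, b / s))
```

On the real data, the design matrix would be built from the notebook's `data` frame (a race indicator plus columns such as `education` and `yearsexp`), and the z-scores and coefficient sizes would indicate which factors matter and by how much. A statistics library would normally be preferable to a hand-rolled fit.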
