# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.
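Because `call` is coded 0/1, each group's callback rate is simply the group mean of `call`. Here is a minimal pandas sketch. It runs on a small hypothetical frame (`toy`) that stands in for the real data, so the numbers below are illustrative only:

```python
import pandas as pd

# Hypothetical toy rows in the same layout as the real data:
# 'race' is 'b' or 'w'; 'call' is 1 (callback) or 0 (no callback).
toy = pd.DataFrame({
    "race": ["w", "w", "w", "w", "b", "b", "b", "b"],
    "call": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Number of callbacks, number of résumés, and callback rate per group.
# The mean of a 0/1 column is the proportion of 1s.
summary = toy.groupby("race")["call"].agg(["sum", "count", "mean"])
print(summary)
```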

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

```python
import pandas as pd
import numpy as np
from scipy import stats
```

```python
data = pd.read_stata('data/us_job_market_discrimination.dta')
```

```python
# number of callbacks for white-sounding names
data[data.race == 'w'].call.sum()
```

```
235.0
```

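```python
data.head()
```

| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |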


<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
</div>

# Q1

```python
w = data[data.race == 'w']
b = data[data.race == 'b']
# Check that the two groups really are the same size before saying so
assert len(w) == len(b)
print('sizes are equal and are: ' + str(len(w)))
print('# of call backs for w: ' + str(w.call.sum()))
print('# of call backs for b: ' + str(b.call.sum()))
```

```
sizes are equal and are: 2435
# of call backs for w: 235.0
# of call backs for b: 157.0
```


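Both groups have well over 10 successes (callbacks) and 10 failures (no callback). White-sounding names have 235 successes and 2,200 failures, and black-sounding names have 157 successes and 2,278 failures. The success-failure condition for applying the CLT is therefore satisfied.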

The observations can also be treated as independent, because 2,435 résumés per group is a small fraction (well under 10%) of the white and black population. However, I am not fully certain of this. We do not know how many people applied for each job or how many of those applications came from this study, so independence remains somewhat uncertain.

The randomness condition holds because the b and w labels were assigned to the résumés at random.

The appropriate test is a two-sample bootstrap hypothesis test for a difference of proportions. Here, each proportion is the fraction of résumés that received a callback. This test fits because we have two samples and want to know whether they share the same callback proportion, without assuming they share the same distribution. Because `call` takes only the values 0 and 1, its mean equals the proportion of successes, so the difference of proportions can be computed as a difference of means. Since the CLT conditions hold, a frequentist test on the difference of means is also valid.
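The CLT conditions and the mean/proportion equivalence above can be checked directly from the callback counts printed in the Q1 output (235 and 157 out of 2,435 per group). This is a sketch in plain Python. `success_failure_ok` is a hypothetical helper, and the threshold of 10 is the usual rule of thumb:

```python
# Callback counts and group size as printed in the Q1 output.
n = 2435
calls = {"w": 235, "b": 157}

def success_failure_ok(successes: int, n: int, threshold: int = 10) -> bool:
    """Rule of thumb for the normal approximation to a sample proportion."""
    return successes >= threshold and (n - successes) >= threshold

for race, k in calls.items():
    p_hat = k / n
    # For a 0/1 variable the sample mean equals the sample proportion:
    # (k ones + (n - k) zeros) / n == k / n.
    sample = [1] * k + [0] * (n - k)
    assert sum(sample) / len(sample) == p_hat
    print(race, round(p_hat, 4), success_failure_ok(k, n))
```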

# Q2

$H_0$: the proportions of successful callbacks for black- and white-sounding names are equal ($p_w = p_b$)

$H_A$: the proportions of successful callbacks for black- and white-sounding names are not equal ($p_w \neq p_b$)
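One standard frequentist test of these hypotheses is the pooled two-proportion z-test. The sketch below illustrates that test and is not the notebook's own method: Q3 uses an unpooled standard error with a t distribution. The helper `two_proportion_z_test` is hypothetical, and the counts are the ones printed in Q1:

```python
import math

def two_proportion_z_test(k1: int, n1: int, k2: int, n2: int):
    """Two-sided two-proportion z-test with a pooled proportion under H0: p1 == p2."""
    p1, p2 = k1 / n1, k2 / n2
    # Under H0 both groups share one callback rate, estimated by pooling.
    p_pool = (k1 + k2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts from Q1: 235 of 2435 (w) and 157 of 2435 (b) received callbacks.
z, p_value = two_proportion_z_test(235, 2435, 157, 2435)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.1e}")
```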


<div class="span5 alert alert-success">
<p> Your answer to Q3 here </p>
</div>

# Q3

```python
def draw_bs_reps(data, func, size=1):
    """Draw bootstrap replicates."""
    # Initialize array of replicates: bs_replicates
    bs_replicates = np.empty(size)
    # Generate replicates by resampling with replacement
    for i in range(size):
        bs_replicates[i] = func(np.random.choice(data, size=len(data)))
    return bs_replicates

mean_both = np.mean(data.call)
empirical_diff_means = w.call.mean() - b.call.mean()

# Shift both groups to the pooled mean so that H0 (equal proportions) holds
w_shift = w.call - np.mean(w.call) + mean_both
b_shift = b.call - np.mean(b.call) + mean_both

# Compute 100,000 bootstrap replicates from the shifted arrays
bs_replicates_w = draw_bs_reps(w_shift, np.mean, 100000)
bs_replicates_b = draw_bs_reps(b_shift, np.mean, 100000)

# Replicates of the difference of means under H0: bs_replicates
bs_replicates = bs_replicates_w - bs_replicates_b

# One-sided p-value: fraction of null replicates at least as large as the observed difference
p = float(np.sum(bs_replicates >= empirical_diff_means)) / len(bs_replicates)
print('one-sided p-value = ' + str(p))

# Central 95% of the null distribution; it is centered on 0, and its half-width
# approximates the margin of error
null_interval = np.percentile(bs_replicates, [2.5, 97.5])
print('95% interval of the difference of means under H0: ' + str(null_interval))
```

```
one-sided p-value = 1e-05
95% interval of the difference of means under H0: [-0.0151951   0.01519505]
```


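The one-sided p-value is far below 0.05. Because the null distribution is symmetric about zero, the two-sided p-value that matches $H_A$ is roughly twice as large and still far below 0.05, so we reject the null hypothesis. The observed difference of about 0.032 also lies well outside the central 95% of the null distribution (about ±0.015). It is therefore very unlikely that the difference in callback rates is due to chance alone. Repeating the analysis with the frequentist approach gives: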

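```python
# Sample means and standard deviations of the 0/1 'call' column
w_mean = w.call.mean()
b_mean = b.call.mean()
w_std = w.call.std()
b_std = b.call.std()
n = len(w.call)  # both groups have the same size

# Observed difference and its (unpooled) standard error
diff_mean = w_mean - b_mean
diff_std = np.sqrt((w_std**2)/n + (b_std**2)/n)

# Two-sided p-value, since H_A is "not equal"
t = diff_mean/diff_std
p = 2 * stats.t.sf(np.abs(t), n-1)

# 95% CI: observed difference +/- critical value * standard error
margin_of_error = stats.t.ppf(0.975, n-1) * diff_std
conf_interval = [diff_mean - margin_of_error, diff_mean + margin_of_error]
print(f'p-value is: {p:.1e}')
print(f'margin of error is: {margin_of_error:.4f}')
print(f'95% confidence interval for the difference of means is: [{conf_interval[0]:.4f}, {conf_interval[1]:.4f}]')
```

```
p-value is: 4.0e-05
margin of error is: 0.0153
95% confidence interval for the difference of means is: [0.0168, 0.0473]
```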


Again, the p-value is far below 0.05, so we reject the null hypothesis.
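Q3 also asks for a bootstrap confidence interval and margin of error. The shifted replicates in the bootstrap cell describe the null distribution, so they cannot give a confidence interval for the observed difference. For that, each group is resampled from its own unshifted data. The sketch below rebuilds 0/1 arrays from the Q1 counts so that it runs on its own. The helper `bootstrap_mean_diff`, the seed, and the 5,000 replicates are assumptions chosen for illustration. The percentile interval is one of several bootstrap CI methods:

```python
import numpy as np

def bootstrap_mean_diff(x, y, size, rng):
    """Bootstrap replicates of mean(x) - mean(y), resampling each array with replacement."""
    reps = np.empty(size)
    for i in range(size):
        reps[i] = (rng.choice(x, size=len(x)).mean()
                   - rng.choice(y, size=len(y)).mean())
    return reps

rng = np.random.default_rng(42)  # seeded only so the sketch is reproducible
n = 2435

# Rebuild the 0/1 'call' arrays from the Q1 counts (235 and 157 callbacks).
w_call = np.concatenate([np.ones(235), np.zeros(n - 235)])
b_call = np.concatenate([np.ones(157), np.zeros(n - 157)])
observed_diff = w_call.mean() - b_call.mean()

# Confidence interval: resample each group from its own, unshifted data.
bs_diff = bootstrap_mean_diff(w_call, b_call, 5000, rng)
ci = np.percentile(bs_diff, [2.5, 97.5])
margin_of_error = (ci[1] - ci[0]) / 2  # half-width of the percentile interval

# Two-sided p-value: shift both groups to the pooled mean so H0 holds, then count
# null replicates at least as far from zero as the observed difference.
pooled = np.concatenate([w_call, b_call]).mean()
null_diff = bootstrap_mean_diff(w_call - w_call.mean() + pooled,
                                b_call - b_call.mean() + pooled, 5000, rng)
p_two_sided = np.mean(np.abs(null_diff) >= abs(observed_diff))

print(f"observed difference: {observed_diff:.4f}")
print(f"95% bootstrap CI: [{ci[0]:.4f}, {ci[1]:.4f}], margin of error: {margin_of_error:.4f}")
print(f"two-sided p-value: {p_two_sided}")
```

With these counts, the percentile interval should lie entirely above zero, which is consistent with rejecting $H_0$.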

<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>

# Q4

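Racial bias is an issue that occurs in many societies and under many different circumstances. The purpose of this study was to examine whether racial bias plays a role in getting a job. Specifically, do employers prefer white-sounding or black-sounding names? To investigate this, identical résumés were randomly given white- or black-sounding names, and the proportion of résumés receiving a callback was compared between the two groups. Résumés with white-sounding names received callbacks about 9.7% of the time (235 of 2,435), compared with about 6.4% (157 of 2,435) for black-sounding names. That is a gap of roughly 3.2 percentage points. Both the bootstrap and the frequentist approaches give p-values far below 0.05. At the 5% significance level, we therefore reject the hypothesis that the two callback rates are equal: a gap this large is very unlikely to arise by chance alone. Because the test was two-sided, it establishes only that the rates differ. The observed rates favor white-sounding names, and a one-sided test could be used to confirm that direction formally.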

# Q5

The results above do not tell us how important race is compared with other variables. The only statement we can make is that callback rates differ between white- and black-sounding names. Because the names were assigned at random to otherwise identical résumés, this difference can be attributed to the name. However, we have not accounted for any other variables, so the analysis says nothing about whether race is the *most* important factor. A proper multivariate analysis is needed to compare the effect of race with that of other factors and to identify other variables that affect callbacks.

Given the dataset we have, we could extend the analysis to the other variables (for example `education`, `yearsexp`, or `honors`). This would give us a better picture of which variables play the largest role in determining whether an individual gets a callback.
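A simple first step in that direction is to check whether the gap between the groups persists within levels of another variable. This can come before fitting a full multivariate model such as a logistic regression. The sketch below uses a hypothetical toy frame. `honors` is one of the dataset's columns (see `data.head()`), but the rows are made up for illustration:

```python
import pandas as pd

# Hypothetical toy rows; 'honors' stands in for any other résumé variable.
toy = pd.DataFrame({
    "race":   ["w", "w", "b", "b", "w", "w", "b", "b"],
    "honors": [0,   0,   0,   0,   1,   1,   1,   1],
    "call":   [1,   0,   0,   0,   1,   1,   1,   0],
})

# Callback rate for each race within each level of the other variable.
rates = toy.pivot_table(index="honors", columns="race", values="call", aggfunc="mean")

# Gap between white- and black-sounding names at each level.
rates["gap"] = rates["w"] - rates["b"]
print(rates)
```

On the real data, a gap that stays similar across levels would suggest the race effect is not explained by that variable. A formal comparison across many variables still calls for a multivariate model.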