# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
</div>
****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [4]:
# number of columns (variables) in the dataset
len(data.columns)

65

In [5]:
# new dataframe with just the race and call columns
df = data[['race', 'call']]
df.head()

Unnamed: 0,race,call
0,w,0.0
1,w,0.0
2,b,0.0
3,b,0.0
4,w,0.0


In [6]:
# number of white-sounding names 
num_white_names = len(df[df.race == 'w'])

# number of black-sounding names 
num_black_names = len(df[df.race == 'b'])

print("Number of white-sounding names: ", num_white_names)
print("Number of black-sounding names: ", num_black_names)

Number of white-sounding names:  2435
Number of black-sounding names:  2435


### Question 1: What test is appropriate for this problem? Does CLT apply?

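We are comparing the rate of callbacks between names that are black-sounding versus white-sounding. For this problem, we should compare the mean rate of callbacks between samples: $\overline{X}_{whitecallbacks}$ and $\overline{X}_{blackcallbacks}$. Because `call` is binary (1 or 0), each sample mean is simply the proportion of resumes that received a callback.

The population standard deviation is not known, so we estimate it from the samples and use a 2-sample T-test, since the white-sounding and black-sounding names are 2 independent groups. With 2435 resumes in each group, the t-distribution is practically identical to the normal distribution, so a two-proportion Z-test would give nearly the same answer. The Central Limit Theorem does apply in this case: names were assigned to resumes at random, the observations are independent, and both sample sizes are far above the usual rule of thumb (n > 30).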
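As a rough check of the sample-size condition for binary data, the sketch below applies the common rule of thumb that each group should contain at least 10 callbacks and 10 non-callbacks. The helper name `normal_approx_ok` and the threshold of 10 are assumptions made for illustration. The count of 157 comes from the notebook; the count of 235 for white-sounding names is inferred from the reported difference of the means rather than printed anywhere above.

```python
def normal_approx_ok(successes, n, threshold=10):
    """Rule of thumb: require at least `threshold` successes and failures."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

# (callbacks, resumes) per group; 235 is inferred, not printed in the notebook
groups = {"b": (157, 2435), "w": (235, 2435)}

checks = {race: normal_approx_ok(k, n) for race, (k, n) in groups.items()}
print(checks)
```

Both groups pass comfortably, which supports using a normal (or large-sample t) approximation for the difference in callback rates.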

### Question 2: What are the null and alternate hypotheses?

$H_0: \mu_{whitecallbacks} = \mu_{blackcallbacks}$

The true callback rate is the same for white-sounding and black-sounding names; race has no effect on employer callbacks.

$H_A: \mu_{whitecallbacks} \neq \mu_{blackcallbacks}$

The true callback rates differ; race does have an effect on employer callbacks.

### Question 3: Compute margin of error, confidence interval, and p-value.

In order to find the margin of error, we must first find the standard error. The formulas are:

$Standard\;Error = \frac{\sigma}{\sqrt{n}}$

$Standard\;Error_p = \sqrt{\frac{p(1-p)}{n}}$

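$Standard\;Error_{variance} = \sqrt{\frac{variance}{n}}$

$Standard\;Error_{difference} = \sqrt{\frac{variance_w}{n_w} + \frac{variance_b}{n_b}}$

$Margin\;of\;Error = T_{critical} \times Standard\;Error$

$T_{critical} = \pm 1.96$

(Critical value based on a 95% CI. Strictly speaking, 1.96 is the normal (Z) critical value; with samples this large, the T-critical value is essentially the same.)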
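The formulas above can be sketched directly from the callback counts, without loading the dataset. This is a minimal illustration under stated assumptions: the function name `two_sample_se` is hypothetical, the count of 235 white-sounding callbacks is inferred from the reported difference of the means, and the variance is taken as $p(1-p)$, which differs very slightly from the pandas `.std()` estimate (that one divides by $n-1$).

```python
import math
from statistics import NormalDist

def two_sample_se(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

n = 2435                 # resumes per group (from the notebook)
p_w = 235 / n            # white-sounding callback rate (235 is inferred)
p_b = 157 / n            # black-sounding callback rate

se = two_sample_se(p_w, n, p_b, n)
z_crit = NormalDist().inv_cdf(0.975)   # two-sided 95% critical value, about 1.96
margin_of_error = z_crit * se

print(round(se, 5), round(margin_of_error, 5))
```

The result should land close to the standard error and margin of error computed from the dataframe in the next cell.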

In [7]:
# finding the standard error & the margin of error  

# create 2 separate dataframes for white-sounding and black-sounding callbacks
df_w = df[df.race == 'w']
df_b = df[df.race == 'b']

# find the standard deviations of the callbacks 
w_std = df_w.call.std()
b_std = df_b.call.std()

# find the number of resumes in each group (the sample sizes)
w_calls = len(df_w.call)
b_calls = len(df_b.call)

# calculate the squared standard error (variance / n) of each group mean
se_w = w_std**2 / (w_calls)
se_b = b_std**2 / (b_calls)

# calculate the standard error of the difference of the means
se_total = np.sqrt(se_w + se_b)

# calculate the margin of error at 95% confidence
upper = se_total * 1.96
lower = se_total * -1.96

print("The 95% margin of error around a difference of zero is: ", [lower, upper])

The 95% margin of error around a difference of zero is:  [-0.015258417562835034, 0.015258417562835034]


In [8]:
# calculate the difference of means

# get the callback means for white-sounding and black-sounding names
w_mean = df_w.call.mean()
b_mean = df_b.call.mean()

# calculate the difference of the means 
diff_mean = w_mean - b_mean
diff_mean

0.03203285485506058

In [9]:
# calculate the t-statistic

t_stat = diff_mean / se_total
t_stat

4.1147383244277469

In [10]:
# calculate the p-value 

import scipy.stats as st

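# we find the cumulative distribution function (CDF) at the t-statistic,
# using the normal distribution as a large-sample approximation
cdf_t = st.norm.cdf(t_stat)

# subtracting the CDF from 1 gives the one-tailed (upper-tail) p-value;
# the two-tailed p-value for our H_A is twice this value
pval = 1 - cdf_t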
pval

1.9380944477398465e-05
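Since the alternate hypothesis is two-sided, the matching p-value covers both tails. The sketch below uses only the standard library and the t-statistic reported above; the function name `two_sided_p_from_z` is an assumption for illustration. SciPy's `st.norm.sf` would serve the same purpose inside the notebook.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value under a standard normal: P(|Z| >= |z|)."""
    # erfc keeps precision in the far tail better than 1 - cdf
    return math.erfc(abs(z) / math.sqrt(2))

t_stat = 4.1147383244277469          # t-statistic reported by the notebook

p_two_sided = two_sided_p_from_z(t_stat)
p_one_sided = p_two_sided / 2        # upper tail only, as computed above

print(p_one_sided, p_two_sided)
```

Either way the p-value is far below 0.05, so the conclusion does not depend on the choice of tails.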

In [15]:
# Checking our t-statistic with SciPy Stats

stats.ttest_ind(df_w.call, df_b.call)[0]

4.1147052908617514

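The observed difference of the means (about 0.032) is more than twice the 95% margin of error (about 0.015), so it falls well outside the range we would expect if the true difference were zero. The difference is therefore unlikely to be due to chance and is statistically significant at the 5% level.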
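The same conclusion can be read off a confidence interval centered on the observed difference. This is a small sketch using the values reported in this notebook; the standard error is the margin of error above divided by 1.96, and the variable names are chosen for illustration.

```python
# Values reported in the notebook
diff_mean = 0.03203285485506058      # white rate minus black rate
se_total = 0.0077849069198137931     # standard error of the difference
z_crit = 1.96                        # 95% critical value used above

ci_lower = diff_mean - z_crit * se_total
ci_upper = diff_mean + z_crit * se_total

# The difference is significant at the 5% level if the interval excludes zero
excludes_zero = ci_lower > 0 or ci_upper < 0

print(round(ci_lower, 4), round(ci_upper, 4), excludes_zero)
```

An interval that stays above zero means the white-sounding callback rate is higher across the whole plausible range of differences.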

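In [22]:
# Multiplying SciPy's t-statistic by the standard error recovers the observed
# difference of the means; it is not a margin of error

t_times_se = stats.ttest_ind(df_w.call, df_b.call)[0] * se_total

print('t-statistic * standard error: ', t_times_se)

t-statistic * standard error:  0.0320325976918


0.0077849069198137931

The product above reproduces our sample difference of the means (about 0.032), which confirms that SciPy's t-statistic agrees with ours, but it is not a margin of error. The margin of error is the critical value times the standard error, 1.96 × 0.0078 ≈ 0.0153, so the 95% confidence interval for the difference in callback rates is roughly 0.032 ± 0.015, or about 0.017 to 0.047. Because this interval excludes zero, we can be confident that the gap in callback rates reflects a real difference in the population rather than sampling noise.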

### Question 4: Write a story describing the statistical significance in the context of the original problem.

The data show a statistically significant difference between the mean callback rates of white-sounding names and black-sounding names. The observed difference (about 3.2 percentage points, roughly 9.7% versus 6.4%) lies well outside the 95% margin of error, meaning that the difference is likely NOT due to chance. The t-statistic tells the same story: the gap is more than 4 standard errors away from zero. In practical terms, otherwise identical resumes received about 50% more callbacks when they carried a white-sounding name. That's quite startling.

### Question 5: Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

No. We cannot be certain that race is the most important factor because we have not examined the impact of the other factors on callbacks. For example, prior work experience, education, personal connections, and other attributes may causally influence callback success. I would account for these other factors by examining the effect of each on callback success and comparing the size of those effects with the effect of the name. Note that the experiment already approximates the ideal design: names were randomly assigned to otherwise identical resumes, which isolates the way names sound as the variable being tested. That supports the claim that names affect callbacks, but it does not show that they matter more than any other factor.
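One way to start examining other factors is to compare callback rates by race within each level of another attribute. The sketch below is illustrative only: the rows are made-up toy data, and the `education` field and its values are hypothetical stand-ins for one of the other columns in the dataset.

```python
from collections import defaultdict

# Made-up toy rows for illustration only; NOT taken from the real dataset
rows = [
    {"race": "w", "education": "college", "call": 1},
    {"race": "w", "education": "college", "call": 0},
    {"race": "b", "education": "college", "call": 0},
    {"race": "b", "education": "college", "call": 0},
    {"race": "w", "education": "highschool", "call": 0},
    {"race": "w", "education": "highschool", "call": 1},
    {"race": "b", "education": "highschool", "call": 1},
    {"race": "b", "education": "highschool", "call": 0},
]

def callback_rates(rows, factor):
    """Callback rate for each (factor level, race) combination."""
    totals = defaultdict(lambda: [0, 0])   # key -> [callbacks, resumes]
    for row in rows:
        key = (row[factor], row["race"])
        totals[key][0] += row["call"]
        totals[key][1] += 1
    return {key: calls / n for key, (calls, n) in totals.items()}

rates = callback_rates(rows, "education")
for key in sorted(rates):
    print(key, rates[key])
```

On the real data, a regression of `call` on race and the other attributes would be the more usual way to compare the relative size of each effect.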