# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [2]:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Initial data exploration

In [8]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [4]:
# Filtering dataset to only the data required
df = data[['race', 'call']]
# How many of each type of name do we have?
# Summing the boolean mask counts the matching rows; len() of the mask would return the total row count.
print("There are {} black-sounding names in this dataset".format((df.race == 'b').sum()))
print("There are {} white-sounding names in this dataset".format((df.race == 'w').sum()))

There are 2435 black-sounding names in this dataset
There are 2435 white-sounding names in this dataset


# 1. What test is appropriate for this problem? Does CLT apply?

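In this instance, we have a large sample of resumes with white-sounding names and a large sample with black-sounding names,
and each resume's outcome is a Bernoulli trial (callback or no callback). Both groups contain far more than 10 callbacks and
10 non-callbacks, so the CLT applies: the sampling distribution of the difference in callback proportions is approximately normal.

Additionally, we are comparing a proportion (the callback rate) between two independent groups. For Bernoulli data the
standard deviation is determined by the proportion itself, so a two-sample z-test for proportions is the appropriate test to use here.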
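
The large-sample condition can be checked with a common rule of thumb: each group should have at least 10 callbacks and at least 10 non-callbacks. The sketch below uses the callback counts from the race-by-call table in section 2; the helper name `clt_conditions_met` and the threshold of 10 are illustrative assumptions, not part of the original analysis.

```python
# Callback counts taken from the race-by-call table in this notebook.
counts = {
    "w": {"calls": 235, "n": 2200 + 235},
    "b": {"calls": 157, "n": 2278 + 157},
}

def clt_conditions_met(calls, n, threshold=10):
    """Return True if both successes and failures reach the threshold.

    The threshold of 10 is a rule of thumb, not a hard requirement.
    """
    return calls >= threshold and (n - calls) >= threshold

for race, c in counts.items():
    ok = clt_conditions_met(c["calls"], c["n"])
    print("Group {}: normal approximation reasonable? {}".format(race, ok))
```

Both groups pass comfortably, which supports using the normal approximation for the difference in proportions.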

# 2. Null and alternate hypotheses.

The null hypothesis is: the callback rate is the same for white-sounding names and black-sounding names.

The alternate hypothesis is: the callback rates for white-sounding names and black-sounding names differ.
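
Written symbolically, with $p_w$ and $p_b$ denoting the true callback rates for white-sounding and black-sounding names (this notation is introduced here for illustration), the two-sided test can be stated as:

$$
H_0: p_w - p_b = 0 \qquad \text{versus} \qquad H_A: p_w - p_b \neq 0
$$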

In [11]:
# Success or failures, by race
df.groupby(['race', 'call']).size().unstack()

race,call = 0.0,call = 1.0
b,2278,157
w,2200,235


In [6]:
w = df[df.race=='w']
b = df[df.race=='b']
# Callback success rate for white-sounding names (callbacks / resumes in that group)
w_callback = sum(w.call)/len(w)
# Callback success rate for black-sounding names (callbacks / resumes in that group)
b_callback = sum(b.call)/len(b)

print("The callback success rate for white-sounding names is {:.4f}".format(w_callback))
print("The callback success rate for black-sounding names is {:.4f}".format(b_callback))

The callback success rate for white-sounding names is 0.0965
The callback success rate for black-sounding names is 0.0645


At first glance, white-sounding names appear to get callbacks at a higher rate.
However, what we do not yet know is whether this difference is statistically significant.
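
To put the observed gap in perspective before testing it, the counts in the race-by-call table can be turned into an absolute difference and a ratio of rates. This is a small sketch; the variable names are illustrative.

```python
# Callback counts from the race-by-call table in this notebook.
calls_w, n_w = 235, 2200 + 235
calls_b, n_b = 157, 2278 + 157

rate_w = calls_w / n_w
rate_b = calls_b / n_b

# Absolute gap between the two rates, and the ratio of the two rates
abs_gap = rate_w - rate_b
ratio = rate_w / rate_b

print("Absolute difference in callback rates: {:.4f}".format(abs_gap))
print("Ratio of callback rates (white / black): {:.2f}".format(ratio))
```

In this sample, resumes with white-sounding names received roughly 1.5 times as many callbacks as those with black-sounding names.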

# 3. Margin of error, confidence interval, and p-values.

## Frequentist approach

In [24]:
# Group sizes (number of resumes per group, not the length of the whole dataset)
n_w, n_b = len(w), len(b)

# Difference in callback rates
diff = sum(w.call)/n_w - sum(b.call)/n_b
# Combined (pooled) proportion of callbacks under the null hypothesis
combined = (sum(w.call) + sum(b.call))/(n_w + n_b)

# Calculate standard error of sampling distribution of differences between sample proportions
std_error = np.sqrt(combined*(1-combined)/n_w + combined*(1-combined)/n_b)

# Calculate z score
z_val = diff/std_error

# Margin of error for 95% CI
margin_error = stats.norm.ppf(1-((1-0.95)/2)) * std_error

# 95% CI
ci_low, ci_upper = stats.norm.interval(0.95, loc=diff, scale=std_error)

# P-value for two-sided test
pval = stats.norm.sf(abs(z_val))*2


print("The margin of error is {:.4f}".format(margin_error))
print("The 95% CI is between {:.4f} and {:.4f}".format(ci_low, ci_upper))
print("The p value is {:.2e}".format(pval))

The margin of error is 0.0153
The 95% CI is between 0.0168 and 0.0473
The p value is 3.98e-05
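
As an independent cross-check, the same test can be written using only the standard library and the counts from the race-by-call table. This is a sketch rather than the notebook's own method: the function name `two_proportion_ztest` is hypothetical, and unlike the cell above it builds the confidence interval from the unpooled standard error (a common convention for intervals), while the z statistic still uses the pooled standard error.

```python
import math
from statistics import NormalDist

def two_proportion_ztest(calls_a, n_a, calls_b, n_b, confidence=0.95):
    """Two-sided z-test for a difference in proportions, plus a CI."""
    p_a, p_b = calls_a / n_a, calls_b / n_b
    diff = p_a - p_b

    # Pooled standard error: valid under H0, where both groups share one rate
    pooled = (calls_a + calls_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    # Two-sided p-value: 2 * P(Z > |z|) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Unpooled standard error for the interval (assumption: not used in the cell above)
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * se_unpooled

    return {"diff": diff, "z": z, "p_value": p_value,
            "margin": margin, "ci": (diff - margin, diff + margin)}

# Counts from the race-by-call table: 235 of 2435 (w) and 157 of 2435 (b)
result = two_proportion_ztest(235, 2435, 157, 2435)
for key, value in result.items():
    print(key, value)
```

The pooled and unpooled standard errors are nearly identical here because the two groups are the same size and both rates are small, so the choice has little practical effect on the interval.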


## Permutation approach

The premise of a permutation test in this context is as follows: if names truly have no effect on callback success,
then randomly reassigning the name labels (white-sounding or black-sounding) should produce differences in
callback success rates similar to the one we empirically observe. What we do here is the following:
- Reassign the labels for our data at random, and compute the difference in callback rates between the relabeled groups
- Repeat this 10,000 times
- Examine the distribution of the permutation replicates and assess how many of the replicates have a difference at least
as large as the one empirically observed.

In [9]:
# Function to calculate difference between groups
def diff_callback(data_A, data_B):
    callback_A = np.sum(data_A)/len(data_A)
    callback_B = np.sum(data_B)/len(data_B)
    difference = callback_A - callback_B
    return difference

# Function to generate a permutation sample for two groups
def perm_sample(group1, group2):
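    # np.concatenate accepts Series or arrays (Series.append was removed in pandas 2.0)
    data = np.concatenate((group1, group2))
    permuted_data = np.random.permutation(data)
    
    # Split the shuffled data back into two groups of the original sizes
    perm_sample_1 = permuted_data[:len(group1)]
    perm_sample_2 = permuted_data[len(group1):]
    return perm_sample_1, perm_sample_2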

# Function to generate permutation replicates
def perm_reps(data1, data2, func, size=1):
    #Initialize array
    perm_replicates = np.empty(size)
    
    for i in range(size):
        perm_sample_1, perm_sample_2 = perm_sample(data1, data2)
        # Store one replicate per iteration (this line must sit inside the loop)
        perm_replicates[i] = func(perm_sample_1, perm_sample_2)
    
    return perm_replicates

In [10]:
#Empirical difference in callback rates between white-sounding names and black-sounding names
actual_diff = diff_callback(w.call, b.call)

#Generating results of all 10,000 permutations
replicates = perm_reps(w.call, b.call, diff_callback, 10000)

#Calculating p-value (one-sided: fraction of replicates at least as large as the observed difference)
p = np.sum(replicates >= actual_diff)/len(replicates)
print("The p-value is {}".format(p))

The p-value is 0.0
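
The exercise also asks for a bootstrap estimate of the margin of error and confidence interval, which the permutation test does not provide. The sketch below is one way to do it, under stated assumptions: the 0/1 callback arrays are rebuilt from the counts in the race-by-call table, the seed is arbitrary, and the helper name `bootstrap_diff_of_means` is hypothetical. Each group is resampled with replacement separately, and a percentile interval is taken over the replicates.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Rebuild the 0/1 callback arrays from the counts in the race-by-call table.
w_calls = np.concatenate([np.ones(235), np.zeros(2200)])
b_calls = np.concatenate([np.ones(157), np.zeros(2278)])

def bootstrap_diff_of_means(a, b, n_reps, rng):
    """Resample each group with replacement and return the differences in means."""
    reps = np.empty(n_reps)
    for i in range(n_reps):
        resampled_a = rng.choice(a, size=len(a))  # replace=True by default
        resampled_b = rng.choice(b, size=len(b))
        reps[i] = resampled_a.mean() - resampled_b.mean()
    return reps

boot_reps = bootstrap_diff_of_means(w_calls, b_calls, 10_000, rng)

# 95% percentile interval and the corresponding half-width
boot_ci_low, boot_ci_high = np.percentile(boot_reps, [2.5, 97.5])
boot_margin = (boot_ci_high - boot_ci_low) / 2

print("Bootstrap 95% CI: {:.4f} to {:.4f}".format(boot_ci_low, boot_ci_high))
print("Bootstrap margin of error: {:.4f}".format(boot_margin))
```

If the bootstrap interval excludes zero, it supports the same conclusion as the z-test without relying on the normal approximation.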


# 4. What do the analyses suggest?

Both the frequentist and permutation-based tests yield very small p-values (a permutation p-value of 0.0 means that none of
the 10,000 replicates reached the observed difference). The difference between the callback rates for white-sounding and
black-sounding names is therefore statistically significant, and we reject the null hypothesis. In other words, the data
suggest that there is a real-world bias against black-sounding names in job applications.

# 5. Is name the most important factor in callback success?
Although there is a significant difference between white-sounding names and black-sounding names, we have only
scratched the surface. Because names were randomly assigned to resumes, the difference can be attributed to the name itself,
but this analysis says nothing about how large that effect is relative to other factors. As the original dataset shows, there
are many other variables that could help explain callback success, including sex and education. As it stands, it would be too
presumptuous to say that name is the most important factor, although it certainly plays a role.

A more appropriate approach would be to fit a classification model (for example, a logistic regression of callback on name and
the other resume variables) to understand the relative importance of each variable and its contribution to callback success.