# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions:
****

In [44]:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from IPython.core.interactiveshell import InteractiveShell
%matplotlib inline 
InteractiveShell.ast_node_interactivity = "all"

In [45]:
# Load the Stata file with the public top-level pandas reader
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [65]:
# Inspect only the two columns used in this analysis
data[['race', 'call']].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4870 entries, 0 to 4869
Data columns (total 2 columns):
race    4870 non-null object
call    4870 non-null float32
dtypes: float32(1), object(1)
memory usage: 95.1+ KB


In [66]:
data.head()

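| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |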


In [67]:
# number of black-sounding name rows
sum(data['race']=='b')

2435

In [68]:
# number of white-sounding name rows
sum(data['race']=='w')

2435

In [69]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [70]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

#  1.) What test is appropriate for this problem? Does CLT apply?

The end goal is to evaluate whether race influences the callback rate in this sample. The outcome ('call') is binary, so the number of callbacks in each group ('b' and 'w') follows a binomial distribution. Since the sample is large (2,435 resumes per group) and the population standard deviation is unknown, we use a 2-sample t-test; at this sample size it gives practically the same answer as a two-proportion z-test. Because the observations are independent, each group is far larger than 30, and each group contains well over 10 callbacks and 10 non-callbacks, the sampling distribution of each group's callback rate is approximately normal, so the central limit theorem applies to this problem.

The Central Limit Theorem says that, "given random and independent samples of N observations each, the distribution of sample means approaches normality as the size of N increases, regardless of the shape of the population distribution."
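
To see why the theorem matters for a binary outcome such as 'call', the following minimal sketch simulates many samples of 0/1 callbacks and checks that their sample proportions cluster around the true rate with the spread the theorem predicts. The true rate `p_true = 0.08`, the number of simulated samples, and the seed are illustrative assumptions, not values from the study; only the group size of 2,435 comes from the data above.

```python
import numpy as np

# Illustrative assumptions (not from the study): true rate, number of samples, seed
p_true = 0.08
n_samples = 5000
n_per_sample = 2435  # group size reported in the data above

rng = np.random.default_rng(0)

# Each binomial draw counts callbacks among n_per_sample 0/1 trials,
# so dividing by n_per_sample gives one sample proportion
sample_props = rng.binomial(n_per_sample, p_true, size=n_samples) / n_per_sample

# The CLT predicts an approximately normal distribution centered on p_true
# with standard deviation sqrt(p * (1 - p) / n)
predicted_sd = np.sqrt(p_true * (1 - p_true) / n_per_sample)

print('Mean of sample proportions:', sample_props.mean())
print('SD of sample proportions:  ', sample_props.std(ddof=1))
print('SD predicted by the CLT:   ', predicted_sd)

# Share of sample proportions within 1.96 predicted SDs of p_true
# (close to 95% if the distribution is approximately normal)
within = np.mean(np.abs(sample_props - p_true) <= 1.96 * predicted_sd)
print('Share within 1.96 SD:', within)
```

Even though each individual outcome is only 0 or 1, the simulated proportions should behave like draws from a normal distribution, which is what justifies the normal-based margin of error and p-value used in this analysis.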

#  2.) What are the null and alternate hypotheses?

The null hypothesis: White-sounding and black-sounding names receive the same callback rate.

The alternative hypothesis: White-sounding and black-sounding names receive different callback rates.
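
Written in terms of the underlying callback probabilities (the symbols $p_w$ and $p_b$ are notation introduced here for the white-sounding and black-sounding groups, not names used in the data set), the two hypotheses are:

$$H_0: p_w = p_b \qquad \text{versus} \qquad H_A: p_w \neq p_b$$

The alternative is two-sided, so differences in either direction count as evidence against the null hypothesis.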

# 3.) Compute margin of error, confidence interval, and p-value.

In [84]:
# Separate into two datasets
w = data[data.race=='w']
b = data[data.race=='b']

# Number of resumes
n_w = len(w)
n_b = len(b)

# Proportion of callbacks
prop_w = np.sum(w['call']) / n_w
prop_b = np.sum(b['call']) / n_b
print('Percentage of callbacks for white sounding names: ', prop_w)
print('Percentage of callbacks for black sounding names: ', prop_b)

# Difference in proportion of callbacks
prop_diff = prop_w - prop_b
print('Difference in percentage of callbacks: ', prop_diff)

# Welch's two-sample t-test (does not assume equal variances)
t_stat, p = stats.ttest_ind(w['call'],b['call'],equal_var=False)
print('t-statistic: ', t_stat)
print('p-value: ', p)

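# Standard error of the difference in proportions (from the sample variances)
s_error = np.sqrt(w['call'].var()/n_w + b['call'].var()/n_b)

# Margin of error = critical value x standard error of the statistic
# (1.96 is the normal critical value for a 95% confidence level)
m_error = 1.96 * s_error
print('Margin of error:', m_error)

# 95% confidence interval for the difference in proportions
c_int = prop_diff + (np.array([-1, 1]) * m_error)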
print('Confidence interval:', c_int)

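# Two-sided p-value from the normal approximation
# (abs() keeps the formula valid if t_stat is negative)
p_value = stats.norm.cdf(-abs(t_stat)) * 2
print('p-value:', p_value)

# Ratio of callback rates (white-sounding to black-sounding)
prop_w/prop_b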

Percentage of callbacks for white sounding names:  0.0965092402464
Percentage of callbacks for black sounding names:  0.064476386037
Difference in percentage of callbacks:  0.0320328542094
t-statistic:  4.11470529086
p-value:  3.94294151365e-05
Margin of error: 0.0152584173807
Confidence interval: [ 0.01677444  0.04729127]
p-value: 3.87674401894e-05


1.4968152866242037

In [82]:
# Frequentist bootstrapping, implemented here as a permutation test:
# pool both groups, shuffle without replacement, and re-split, under the null hypothesis that the two proportions do not differ.

df = data[['race','call']]

def get_prop_diff(sample1, sample2):
    
    prop_w = np.sum(sample1['call'] == 1)/len(sample1)
    prop_b = np.sum(sample2['call'] == 1)/len(sample2)
    
    return abs(prop_w-prop_b)
    
def get_bs_samples_diff(sample1, sample2, func, size):
    """Return `size` values of func computed on random re-splits of the pooled samples."""
    length1 = len(sample1)
    # Pool the two groups once; only the shuffle needs to be repeated
    combined_sample = pd.concat([sample1, sample2])
    bs_prop_diffs = np.empty(size)

    for i in range(size):
        # Shuffle all rows (sampling without replacement), then split at the original group size
        shuffled_sample = combined_sample.sample(frac=1).reset_index(drop=True)

        new_sample1 = shuffled_sample.iloc[:length1, :]
        new_sample2 = shuffled_sample.iloc[length1:, :]

        bs_prop_diffs[i] = func(new_sample1, new_sample2)

    return bs_prop_diffs

bs_samples_diff = get_bs_samples_diff(df[df.race=='w'], df[df.race=='b'], get_prop_diff, 10000)
print(bs_samples_diff[:5])

[ 0.          0.01067762  0.00246407  0.0164271   0.00574949]


In [74]:
# p-value: share of shuffled (absolute) differences larger than the observed difference
p = np.sum(bs_samples_diff > prop_diff)/len(bs_samples_diff)
print(p)

0.0001


The p-value (0.0001) is the fraction of the 10,000 shuffled samples whose difference in callback proportions is greater than the difference observed in our data. Under the assumption that names have no effect, only about 1 in 10,000 reshuffles produced a difference that extreme, which suggests that white-sounding versus black-sounding names do have an impact on callbacks.
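
The shuffle-and-split loop above can also be expressed more compactly. When the pooled 0/1 outcomes are shuffled and split into two groups, the number of callbacks landing in the first group follows a hypergeometric distribution, so it can be drawn directly. The following is a minimal sketch of that equivalent formulation, not the code used to produce the result above; the seed and the variable names are assumptions, while the counts come from the earlier cells.

```python
import numpy as np

# Counts from the cells above
n_w = n_b = 2435
calls_w, calls_b = 235, 157
total_calls = calls_w + calls_b
total_no_calls = (n_w + n_b) - total_calls

observed_diff = calls_w / n_w - calls_b / n_b

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility
n_perm = 10000

# Callbacks assigned to the "white-sounding" group in each random relabeling
perm_calls_w = rng.hypergeometric(total_calls, total_no_calls, n_w, size=n_perm)
perm_calls_b = total_calls - perm_calls_w
perm_diffs = np.abs(perm_calls_w / n_w - perm_calls_b / n_b)

# Two-sided p-value: share of relabelings at least as extreme as the observed
# difference (the small tolerance guards against floating-point ties)
p_perm = np.mean(perm_diffs >= observed_diff - 1e-12)
print('Observed difference:', observed_diff)
print('Permutation p-value:', p_perm)
```

With only 10,000 relabelings the estimate is coarse for very small p-values, so it is best read as "on the order of 1 in 10,000 or less" rather than as an exact figure.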

# 4.) Write a story describing the statistical significance in the context of the original problem.

Racial discrimination is common throughout the world. Researchers ran an experiment to determine whether resumes with black-sounding and white-sounding names receive different callback rates on job applications in the United States. Our analysis of this data set indicates that the name on a resume has a statistically significant effect on callbacks. White-sounding names received a callback rate of 9.65%, while black-sounding names received a callback rate of only 6.45%, a difference of 3.2 percentage points (95% confidence interval: roughly 1.7 to 4.7 percentage points). Put another way, for every callback a black-sounding name received, a white-sounding name received about 1.5 callbacks.

We used a 2-sample t-test and frequentist bootstrapping to check whether a difference this large could plausibly arise by chance. The p-values from both tests are very low (about 0.00004 and 0.0001, respectively), so we reject the null hypothesis that white-sounding and black-sounding names receive the same callback rate. Because the names were assigned to the resumes at random, this points to the perceived race of the applicant having a real effect on the callback rate. The data set does, however, contain other variables (education, number of previous jobs, experience level, gaps in employment history, etc.) that should also be taken into consideration in further examination.

# 5.) Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

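Whether a name sounded white or black was a statistically significant factor for callbacks in this study. That does not mean, however, that it is the most important factor in callback success: this analysis looked only at the 'race' and 'call' columns and never compared the size of the name effect with the effect of anything else. To amend the analysis, we would need to study the remaining variables in the data set (for example education, years of experience, and gaps in employment history) and compare their influence on callbacks with that of the name, for instance by modeling callbacks as a function of all of these variables together.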