# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [4]:
data.head()

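| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |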


## What test is appropriate for this problem? Does CLT apply?

This is a large data set in which we want to test the relationship between two categorical variables, race and callback. I would use a two-sample t-test on the 0/1 `call` indicator to determine statistical significance. The CLT applies because each group contains a large number of independent Bernoulli observations, so the sampling distribution of each group's callback proportion is approximately normal.
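
One common way to sanity-check the normal approximation is the rule of thumb that each group should contain at least about 10 callbacks and 10 non-callbacks. The sketch below is illustrative only: the helper name `normal_approx_ok`, the threshold of 10, and the example counts are assumptions, not part of the original analysis.

```python
def normal_approx_ok(successes, n, threshold=10):
    """Rule of thumb: need at least `threshold` successes and failures."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

# Hypothetical counts, chosen only to exercise the helper
large_group = normal_approx_ok(200, 2000)   # plenty of callbacks and non-callbacks
tiny_group = normal_approx_ok(3, 2000)      # too few callbacks for the approximation
print(large_group, tiny_group)
```

On the notebook's data this could be called as `normal_approx_ok(white['call'].sum(), len(white))`, and likewise for `black`.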


## What are the null and alternate hypotheses?

The null hypothesis is $H_0$: White-sounding and black-sounding names have the same callback rate.

The alternative hypothesis is $H_A$: White-sounding and black-sounding names have different callback rates.
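
In symbols, writing $p_w$ and $p_b$ for the true callback rates of white-sounding and black-sounding names (notation introduced here for illustration), the two-sided test is:

$$H_0: p_w = p_b \qquad \text{versus} \qquad H_A: p_w \neq p_b$$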

## Data Exploration

In [6]:
w = data[data.race=='w']
b = data[data.race=='b']

In [13]:
df = data[['race','call']]

# Separate into two datasets so the filtering isn't repeated
white = data[data['race']=='w']
black = data[data['race']=='b']

# Proportion of callbacks
prop_w = white['call'].sum() / white['call'].count()
prop_b = black['call'].sum() / black['call'].count()
print('Percentage of callbacks for whites: {:.2}'.format(prop_w))
print('Percentage of callbacks for blacks: {:.2}'.format(prop_b))

# Difference in proportion of callbacks
prop_diff = prop_w - prop_b
print('There is a {:.2} difference in percentages.'.format(prop_diff) )

Percentage of callbacks for whites: 0.097
Percentage of callbacks for blacks: 0.064
There is a 0.032 difference in percentages.


In [33]:
# 2-sample t-test
t, p = stats.ttest_ind(white['call'],black['call'],equal_var=False)
print('t-statistic: {:.2}'.format(t))
print('p-value: {:.2}'.format(p))

# Standard error
s_error = np.sqrt(white['call'].var()/white['call'].count() + black['call'].var()/black['call'].count())

# Margin of Error
m_error = 1.96 * s_error
print('Margin of error: {:.2}'.format(m_error))

# Confidence Interval
c_int = prop_diff + (np.array([-1, 1]) * m_error)
print('Confidence interval: {}'.format(c_int))


def get_diff(sample1, sample2):
    # Absolute difference in callback proportions between the two samples
    p1 = np.sum(sample1['call'] == 1)/len(sample1)
    p2 = np.sum(sample2['call'] == 1)/len(sample2)

    return abs(p1-p2)

def get_bs_samples(sample1, sample2, func, size):
    # Despite the "bs" name, this is a permutation test: the pooled rows are
    # shuffled without replacement and re-split into groups of the original sizes
    length1 = len(sample1)
    combined_sample = pd.concat([sample1,sample2])
    bs_prop_diffs = np.empty(size)

    for i in range(size):
        shuffled_sample = combined_sample.sample(frac=1).reset_index(drop=True)

        # Statistic under the null: group labels are exchangeable
        bs_prop_diffs[i] = func(shuffled_sample[:length1], shuffled_sample[length1:])

    return bs_prop_diffs

bs_samples = get_bs_samples(white, black, get_diff, 1000)

p = np.sum(bs_samples > prop_diff)/len(bs_samples)

print('p-value: {}'.format(p))

t-statistic: 4.1
p-value: 3.9e-05
Margin of error: 0.015
Confidence interval: [0.01677444 0.04729127]
p-value: 0
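
For reference, the margin of error and confidence interval printed above correspond to the usual large-sample formulas. The symbols are introduced here for illustration: $\hat{p}_w$ and $\hat{p}_b$ are the sample callback proportions, $s_w^2$ and $s_b^2$ are the sample variances of the `call` column in each group, and $n_w$ and $n_b$ are the group sizes.

$$SE = \sqrt{\frac{s_w^2}{n_w} + \frac{s_b^2}{n_b}}, \qquad (\hat{p}_w - \hat{p}_b) \pm 1.96 \cdot SE$$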


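Both p-values are less than 0.05, so the null hypothesis is rejected at the 5% significance level: white-sounding and black-sounding names have different callback rates. The 95% confidence interval for the difference, roughly 0.017 to 0.047, excludes zero. The second p-value of 0 means that none of the 1000 shuffled samples produced a difference larger than the observed one.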
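
As a cross-check on the t-test, a pooled two-proportion z-test can be run on the callback counts directly. This is a minimal sketch using only the standard library, not the method used in the cell above. The function name `two_proportion_ztest` and the example counts are hypothetical.

```python
import math

def two_proportion_ztest(success1, n1, success2, n2):
    """Two-sided z-test for H0: p1 == p2, using the pooled proportion."""
    p1 = success1 / n1
    p2 = success2 / n2
    pooled = (success1 + success2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts for illustration, not the notebook's data
z, p_value = two_proportion_ztest(60, 500, 40, 500)
print('z = {:.3f}, p = {:.4f}'.format(z, p_value))
```

On the notebook's data the call would be `two_proportion_ztest(white['call'].sum(), len(white), black['call'].sum(), len(black))`. With samples this large, the result should be close to the t-test.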

## A story describing the statistical significance in the context of the original problem.

Racial discrimination is a struggle across the globe. The Abdul Latif Jameel Poverty Action Lab investigated how it affects the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers. The data show that résumés paired with white-sounding names received an interview request 9.7% of the time, while those with black-sounding names were invited to interview only 6.4% of the time (a difference of 3.2 percentage points). Statistical analysis shows that this difference is significant at the 5% level, with a 95% confidence interval of roughly 1.7 to 4.7 percentage points.

## Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

The analysis suggests that the perceived race of a job candidate can influence whether they get a callback. No analysis was done on other factors such as experience, so we cannot conclude that race is the most important factor. To test this claim, I would examine additional factors such as experience, education, and skills.
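
A first step in that direction could be to compare callback rates by race within each level of another factor. The sketch below assumes the notebook's column names (`race`, `call`, `education`). The helper `callback_rate_by` and the toy data frame are made up for illustration and do not reflect the real data.

```python
import pandas as pd

def callback_rate_by(df, factor):
    """Callback rate per race within each level of `factor`.

    Rows are the levels of `factor`; columns are the race codes.
    """
    return df.groupby([factor, 'race'])['call'].mean().unstack('race')

# Tiny made-up frame using the notebook's column names, for illustration only
toy = pd.DataFrame({
    'race':      ['w', 'w', 'b', 'b', 'w', 'w', 'b', 'b'],
    'education': [3, 3, 3, 3, 4, 4, 4, 4],
    'call':      [1, 0, 0, 0, 1, 1, 1, 0],
})
rates = callback_rate_by(toy, 'education')
print(rates)
```

On the real data, `callback_rate_by(data, 'education')` or `callback_rate_by(data, 'yearsexp')` would show whether the gap persists within each group. A regression with several predictors would be needed to rank the factors against each other.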