# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.
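A quick way to see the layout described above is a contingency table of `race` against `call`. The sketch below uses a small made-up DataFrame (`toy` is hypothetical and only mimics the two columns); the same `pd.crosstab` call should work on the real data once it is loaded.

```python
import pandas as pd

# Hypothetical toy data with the same two columns as the real dataset.
toy = pd.DataFrame({
    "race": ["b", "b", "b", "w", "w", "w"],
    "call": [0, 1, 0, 1, 1, 0],
})

# Rows are race groups, columns are call outcomes, cells count resumes.
counts = pd.crosstab(toy["race"], toy["call"])
print(counts)
```

Each row of the table sums to the number of resumes in that group, which is the denominator needed for a per-group callback rate.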

### Analysis

Let's look at the data, shall we?

In [2]:
import pandas as pd
import numpy as np
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt

In [3]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [4]:
data.call.describe()

count    4870.000000
mean        0.080493
std         0.272079
min         0.000000
25%         0.000000
50%         0.000000
75%         0.000000
max         1.000000
Name: call, dtype: float64

In [5]:
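# Callbacks in each group divided by ALL resumes (not by the group size),
# so these are each group's callbacks as a fraction of the full sample.
black = sum(data[data.race=='b'].call)/len(data)
white = sum(data[data.race=='w'].call)/len(data)
# Overall callback rate across both groups.
white_black = sum(data.call)/len(data)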

In [6]:
# Each group's share of all callbacks, in percent (the two shares sum to 100).
b_percent = black/white_black*100
w_percent = white/white_black*100
difference = (white/white_black - black/white_black)*100
print "Share of callbacks to black-sounding resumes (%) ", b_percent
print "Share of callbacks to white-sounding resumes (%) ", w_percent
print "Difference between these two shares ", difference

Share of callbacks to black-sounding resumes (%)  40.0510204082
Share of callbacks to white-sounding resumes (%)  59.9489795918
Difference between these two shares  19.8979591837


Of all callbacks, 59.95% went to white-sounding names and 40.05% to black-sounding names, a difference of 19.90 percentage points in favor of white-sounding names.
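The figures above are shares of all callbacks. A callback rate within each group (callbacks divided by the number of resumes in that group) is a different quantity, and one way to compute it is sketched below. The helper `callback_rates` and the `toy` DataFrame are hypothetical and not part of the original notebook.

```python
import pandas as pd


def callback_rates(df, group_col="race", outcome_col="call"):
    """Return the mean of a 0/1 outcome column within each group."""
    # The mean of a 0/1 column is the fraction of rows equal to 1.
    return df.groupby(group_col)[outcome_col].mean()


# Hypothetical toy data: 1 of 4 'b' resumes and 2 of 4 'w' resumes get a call.
toy = pd.DataFrame({
    "race": ["b", "b", "b", "b", "w", "w", "w", "w"],
    "call": [0, 0, 0, 1, 0, 0, 1, 1],
})

rates = callback_rates(toy)
print(rates)
```

On the real data this would be `callback_rates(data)`, assuming the columns are named as described in the Data section.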

### Calculating the margin of error and confidence intervals

In [32]:
z_critical = stats.norm.ppf(0.975)  # two-sided 95% critical value
p = white/white_black  # share of all callbacks that went to white-sounding names
margin_of_error = np.sqrt((p*(1-p)/len(data)))*z_critical
min_CI = p - margin_of_error
max_CI = p + margin_of_error

print "White-sounding names"
print "Mean: ", p
print "Margin of error: ", margin_of_error
print "Confidence interval: ", min_CI, ',', max_CI

White-sounding names
Mean:  0.599489795918
Margin of error:  0.0137619919899
Confidence interval:  0.585727803928 , 0.613251787908


In [33]:
z_critical = stats.norm.ppf(0.975)
p = black/white_black
margin_of_error = np.sqrt((p*(1-p)/len(data)))*z_critical
min_CI = p - margin_of_error
max_CI = p + margin_of_error

print "Black-sounding names" 
print "Mean: ", p 
print "Margin of error: ", margin_of_error 
print "Confidence interval: ", min_CI, ',', max_CI 

Black-sounding names
Mean:  0.400510204082
Margin of error:  0.0137619919899
Confidence interval:  0.386748212092 , 0.414272196072
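The two cells above repeat the same normal-approximation formula, margin = z * sqrt(p(1-p)/n). A small helper makes the choice of `n` explicit. This is a sketch: `proportion_ci` is a hypothetical name, and the inputs in the usage lines are copied from the printed output above (p for black-sounding names, and n = 4870 from `describe()`, which is what `len(data)` supplies in those cells).

```python
import numpy as np
from scipy import stats


def proportion_ci(p, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion p based on n observations."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = stats.norm.ppf(1 - (1 - confidence) / 2.0)
    margin = z * np.sqrt(p * (1 - p) / n)
    return p - margin, p + margin, margin


# Inputs taken from the printed output above.
low, high, margin = proportion_ci(0.400510204082, 4870)
print("Margin of error: {:.4f}".format(margin))
print("Confidence interval: {:.4f}, {:.4f}".format(low, high))
```

Note that `n` should be the number of observations the proportion is computed over. For a within-group callback rate that would be the number of resumes in the group, so the interval width depends on which proportion is being estimated.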


### Conclusions

Black-sounding names get called back less often than white-sounding names.

Specifically, white-sounding names received 59.9 +/- 1.4% of all callbacks, while black-sounding names received 40.1 +/- 1.4%.

However, this conclusion rests solely on the callback data. Other factors, such as degrees, years of experience, and age, could also affect whether a resume receives a callback, although the random assignment of names noted in the Data section should spread them evenly across the two groups on average.
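To check whether a gap in callback rates between the two groups is larger than chance alone would produce, one common option is a pooled two-proportion z-test. The sketch below is not part of the original analysis: `two_proportion_ztest` is a hypothetical helper, and the counts in the usage lines are invented for illustration rather than taken from the dataset.

```python
import numpy as np
from scipy import stats


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Pooled two-sided z-test for the difference between two proportions."""
    p_a = successes_a / float(n_a)
    p_b = successes_b / float(n_b)
    # Under the null hypothesis both groups share one underlying rate.
    pooled = (successes_a + successes_b) / float(n_a + n_b)
    se = np.sqrt(pooled * (1 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal survival function.
    p_value = 2 * stats.norm.sf(abs(z))
    return z, p_value


# Invented counts: 30 of 100 callbacks in group A versus 20 of 100 in group B.
z, p_value = two_proportion_ztest(30, 100, 20, 100)
print("z = {:.3f}, p-value = {:.3f}".format(z, p_value))
```

With the real data, the inputs would be the number of callbacks and the number of resumes in each `race` group.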