# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
import numpy as np
from scipy import stats
%matplotlib inline
sns.set()

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

### Question 1 - 3 

In [3]:
white = data[data.race=='w']
black = data[data.race=='b']

In [4]:
print('Marked white')
print(' - number of resumes: ', white.shape[0])
print(' - number of callbacks: ', int(white['call'].sum()))
print(' - callback ratio: ', round(int(white['call'].sum())/white.shape[0], 4))

print('\nMarked black')
print(' - number of resumes: ', black.shape[0])
print(' - number of callbacks: ', int(black['call'].sum()))
print(' - callback ratio: ', round(int(black['call'].sum())/black.shape[0], 4))

Marked white
 - number of resumes:  2435
 - number of callbacks:  235
 - callback ratio:  0.0965

Marked black
 - number of resumes:  2435
 - number of callbacks:  157
 - callback ratio:  0.0645
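
The same per-group summary can be produced in one step with a `groupby`. The sketch below is only illustrative: it rebuilds a toy frame from the counts printed above rather than loading the original `.dta` file, and the names `toy` and `summary` are assumptions, not part of the notebook.

```python
import pandas as pd

# Toy frame rebuilt from the printed counts (not the original dataset)
toy = pd.DataFrame({
    'race': ['w'] * 2435 + ['b'] * 2435,
    'call': [1] * 235 + [0] * 2200 + [1] * 157 + [0] * 2278,
})

# One row per race: number of resumes, number of callbacks, callback ratio
summary = toy.groupby('race')['call'].agg(resumes='size',
                                          callbacks='sum',
                                          ratio='mean')
print(summary.round(4))
```

Because `call` is coded 0/1, its mean within each group is the callback ratio.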


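The sample size is large (2,435 resumes per group). Each callback is a binary outcome, so the number of callbacks in each group follows a binomial distribution, but with this many observations the standardized sample means are approximately normally distributed. Since the actual population **n** tends towards **infinite** (i.e. a lot of resumes in the real world), the observations can be treated as independent and the *CLT applies*.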
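
A quick way to sanity-check the normal approximation is the common rule of thumb that each group should contain at least 10 successes and 10 failures. The sketch below applies it to the counts printed above; the threshold of 10 and the helper name `normal_approx_ok` are assumptions for illustration, not something the original analysis specifies.

```python
# (number of resumes, number of callbacks) copied from the output above
groups = {'w': (2435, 235), 'b': (2435, 157)}

def normal_approx_ok(n, successes, threshold=10):
    """Rule of thumb: at least `threshold` successes and `threshold` failures."""
    return successes >= threshold and (n - successes) >= threshold

for label, (n, calls) in groups.items():
    print(label, normal_approx_ok(n, calls))
```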

There are two samples, so we could compare their means with a **2 sample t-test**. The test statistic will be the **difference in callback means**, which is the same as the difference in callback ratios because 'call' is coded 0/1 (the number of entries is also the same for the 'b' and 'w' datasets).

But first let's try a bootstrap hypothesis test: draw random instances from the pooled original data and assign them to white and black bootstrap samples. I will calculate the difference in means for each pair of bootstrap samples and check how often it is at least as large as the original difference.

**Null Hypothesis:** There is no difference in callback ratio for resumes marked white or black.

**Alternative Hypothesis:** The callback ratio differs between resumes marked white and resumes marked black.

**Test statistic:** Difference in means of callbacks for 'w' and 'b'

#### Bootstrap test


In [5]:
white = white[['race', 'call']]
black = black[['race', 'call']]

In [6]:
def bootstrap_replicate_1d(data, func):
    bs_sample = np.random.choice(data, size=len(data))
    return func(bs_sample)

def draw_bs_reps(data, func, size=1):
    bs_replicates = np.empty(size)
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data, func)
    return bs_replicates

def diff_of_means(data1, data2):
    return (np.mean(data1) - np.mean(data2))
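
As a usage sketch for this kind of helper, the block below resamples each group separately and takes percentiles of the difference in means to get a bootstrap confidence interval. It is a minimal illustration under stated assumptions: the 0/1 arrays are rebuilt from the counts printed earlier, the seed and the 5000 replicates are arbitrary choices, and `draw_bs_means` is a hypothetical stand-in written so the block runs on its own.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable

# 0/1 callback arrays rebuilt from the counts printed earlier
white_call = np.concatenate([np.ones(235), np.zeros(2435 - 235)])
black_call = np.concatenate([np.ones(157), np.zeros(2435 - 157)])

def draw_bs_means(values, size, rng):
    """Return `size` bootstrap replicates of the mean of `values`."""
    reps = np.empty(size)
    for i in range(size):
        # Resample with replacement, keeping the original sample size
        reps[i] = rng.choice(values, size=len(values)).mean()
    return reps

n_reps = 5000
diff_reps = (draw_bs_means(white_call, n_reps, rng)
             - draw_bs_means(black_call, n_reps, rng))

# Percentile interval: middle 95% of the bootstrap differences
ci_low, ci_high = np.percentile(diff_reps, [2.5, 97.5])
print('95% bootstrap CI for the difference:', ci_low, ci_high)
```

An interval that excludes zero points in the same direction as a small p-value, without relying on the normal approximation.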

In [7]:
# Test statistic difference of means

original_diff_means = diff_of_means(white.call, black.call)

# Initialize bootstrap replicates: bs_replicates
bs_replicates = np.empty(10000)

for i in range(10000):
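    # Generate bootstrap sample from the pooled callbacks (with replacement)
    bs_sample = np.random.choice(data.call, size=len(data.call))
    
    # Compute replicate: split the sample into pseudo-white and pseudo-black groups
    bs_replicates[i] = diff_of_means(bs_sample[:len(white)],
                                     bs_sample[len(white):])

# Compute one-sided p-value: share of replicates at least as large as the observed difference
p = np.sum(bs_replicates >= original_diff_means) / len(bs_replicates)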

# Variance for two samples, white and black
var_2samp = (white.call.var()/len(white)) + (black.call.var()/len(black))

# Standard error
std_dev = np.sqrt(var_2samp)

# t statistic
t_stat = (white.call.mean() - black.call.mean()) / std_dev

# Margin of error with 95% confidence
margin_error = 1.96 * std_dev

# Confidence interval
conf_interval = original_diff_means + np.array([-1, 1]) * margin_error

print('BOOTSTRAP')
print('\nDifference in call back ratio: ', original_diff_means)
print('T statistics: ', t_stat)
print('pvalue: ', p)
print('Margin of error: ', margin_error)
print('Confidence interval: ', conf_interval)

BOOTSTRAP

Difference in call back ratio:  0.032032855
T statistics:  4.114705290861751
pvalue:  0.0
Margin of error:  0.015258540060051217
Confidence interval:  [0.01677431 0.04729139]
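
A printed p-value of 0.0 means that none of the 10000 replicates reached the observed difference, not that the probability is exactly zero. A closely related alternative is a permutation test, which shuffles the pooled callbacks without replacement instead of resampling them. The sketch below is an illustration of that swapped-in technique, not the method used above; the arrays are rebuilt from the printed counts, and the seed and the add-one correction are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Pooled 0/1 callbacks rebuilt from the counts printed earlier
n_white, n_black = 2435, 2435
n_calls = 235 + 157
pooled = np.concatenate([np.ones(n_calls),
                         np.zeros(n_white + n_black - n_calls)])
observed_diff = 235 / n_white - 157 / n_black

n_perm = 10000
perm_diffs = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(pooled)  # reassign labels without replacement
    perm_diffs[i] = shuffled[:n_white].mean() - shuffled[n_white:].mean()

# Add-one correction keeps the estimate away from exactly zero
p_one_sided = (np.sum(perm_diffs >= observed_diff) + 1) / (n_perm + 1)
print('one-sided permutation p-value:', p_one_sided)
```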


#### 2 sample T test

In [8]:
# Run a T test with difference of means for white and black datasets

t_ttest, p_ttest = stats.ttest_ind(white.call, black.call)

# Difference in means for white and black datasets will be approximately normally distributed
# Its variance is the sum of the variances of the two sample means

# Variance for two samples, white and black
var_2samp = (white.call.var()/len(white)) + (black.call.var()/len(black))

# Standard error
std_dev = np.sqrt(var_2samp)
margin_ttest = 1.96 * std_dev
# 95% confidence interval
conf_interval = stats.norm.interval(0.95, loc=original_diff_means, scale=std_dev)

print('T - test')
print('\nDifference in call back ratio: ', original_diff_means)
print('T statistics: ', t_ttest)
print('pvalue: ', p_ttest)
print('Margin of error: ', margin_ttest)
print('Confidence interval: ', conf_interval)

T - test

Difference in call back ratio:  0.032032855
T statistics:  4.114705290861751
pvalue:  3.940802103128886e-05
Margin of error:  0.015258540060051217
Confidence interval:  (0.016774595174263628, 0.04729111453585753)


### Question 4 - 5 

#### Summary

Write a story describing the statistical significance in the context of the original problem.
Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

The callback ratio is higher by 3.2 percentage points for resumes marked *white*. The null hypothesis that this difference arose by chance is rejected: the p-value of 3.9e-05 is very small, so the difference is very unlikely to be due to random chance alone.

It can be concluded that race plays an important role in job prospects. However, this analysis does not take into account other factors that would influence a resume callback, nor does it weigh race against them. Education, experience, additional skills, etc. would be important in decision making as well.

I would look at all the available features in the data set and check their correlation with the call value wherever applicable. Strong correlation does not imply causation, so I would conduct hypothesis tests for the features with strong correlation. Regression analysis would come in handy to explain how changes in feature values affect callback success. Because the dependent variable, call, is binary, **logistic regression** would be a natural approach.
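
As a minimal sketch of what such a model looks like, the block below fits a logistic regression with race as the only predictor, using plain NumPy and Newton-Raphson iterations. It is an illustration under stated assumptions: the arrays are rebuilt from the counts printed earlier, `is_white` is a hypothetical indicator column, and no other features from the dataset are included. A fuller analysis would add the remaining covariates and would likely use a statistics library.

```python
import numpy as np

# 0/1 arrays rebuilt from the printed counts: white resumes first, then black
call = np.concatenate([np.ones(235), np.zeros(2200),
                       np.ones(157), np.zeros(2278)])
is_white = np.concatenate([np.ones(2435), np.zeros(2435)])

# Design matrix: intercept column plus the race indicator
X = np.column_stack([np.ones_like(is_white), is_white])

beta = np.zeros(2)
for _ in range(25):  # Newton-Raphson iterations for the log-likelihood
    p = 1.0 / (1.0 + np.exp(-X @ beta))          # predicted callback probability
    gradient = X.T @ (call - p)
    hessian = X.T @ (X * (p * (1 - p))[:, None])
    beta = beta + np.linalg.solve(hessian, gradient)

# exp(coefficient) is the odds ratio of a callback, white vs black
odds_ratio = np.exp(beta[1])
print('coefficients (intercept, is_white):', beta)
print('odds ratio:', odds_ratio)
```

With a single binary predictor, the fitted odds ratio matches the one computed directly from the 2x2 table of counts, which makes the coefficient easy to interpret before more features are added.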
