# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating a black-sounding or a white-sounding name, respectively. The 'call' column has two values, 1 and 0, indicating whether or not the resume received a callback from the employer.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [4]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0
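The two counting cells above can be collapsed into a single grouped summary that also reports each group's size and callback rate. So that the sketch runs without the `.dta` file, it rebuilds a stand-in frame named `demo` from the reported counts, assuming 2,435 résumés per group (half of the 4,870 observations) with 157 and 235 callbacks; only the `race` and `call` columns are reproduced.

```python
import numpy as np
import pandas as pd

# Stand-in for the real dataset (assumption): 2,435 résumés per group with the
# reported callback counts. Named `demo` so it does not overwrite the notebook's `data`.
n_per_group = 2435
demo = pd.DataFrame({
    "race": ["b"] * n_per_group + ["w"] * n_per_group,
    "call": np.r_[np.ones(157), np.zeros(n_per_group - 157),
                  np.ones(235), np.zeros(n_per_group - 235)],
})

# Callbacks, number of résumés, and callback rate for each race in one table
summary = demo.groupby("race")["call"].agg(callbacks="sum", resumes="count", rate="mean")
print(summary)
```

On the notebook's own frame, the last two lines with `demo` replaced by `data` are all that is needed.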

In [5]:
data.head()

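|   | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |   |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |   |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |   |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |   |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |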


# What test is appropriate for this problem? Does the CLT apply?

In [6]:
print('There are', len(data), 'observations in this sample')

There are 4870 observations in this sample


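Each group contains 2,435 résumés, with 235 and 157 callbacks (and 2,200 and 2,278 non-callbacks), so the samples are large enough for the Central Limit Theorem to apply: the sampling distribution of each group's callback rate, and of their difference, is approximately normal. Since we are comparing the mean callback rate (a proportion) of two independent samples, a two-sample test is appropriate. We use a two-sample t-test; with samples this large it gives essentially the same result as a two-proportion z-test.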
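A common rule of thumb for the normal approximation to a proportion is that each group should have at least 10 successes and 10 failures; the threshold is a convention, not a hard cutoff. The sketch below hard-codes the counts reported above (2,435 résumés per group is an assumption derived from the 4,870 total being split evenly) and checks that condition. The helper `normal_approx_ok` is a hypothetical name.

```python
# Counts from the notebook (assumption: 2,435 résumés in each group)
n = {"w": 2435, "b": 2435}    # résumés per group
calls = {"w": 235, "b": 157}  # callbacks per group

def normal_approx_ok(successes, trials, threshold=10):
    """True if both the successes and the failures reach the threshold."""
    return successes >= threshold and (trials - successes) >= threshold

checks = {race: normal_approx_ok(calls[race], n[race]) for race in n}
print(checks)  # {'w': True, 'b': True}
```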

# State the null and alternate hypotheses

The null hypothesis is that résumés with black-sounding names and résumés with white-sounding names have the same callback rate. The alternative hypothesis is that the two callback rates differ. Writing $p_b$ and $p_w$ for the callback rates of black-sounding and white-sounding names:

#### $H_0: p_b - p_w = 0$
#### $H_1: p_b - p_w \neq 0$
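Because these hypotheses are about proportions, the textbook test is a two-proportion z-test that uses the pooled callback rate under $H_0$ for the standard error:

$$z = \frac{\hat p_w - \hat p_b}{\sqrt{\hat p\,(1-\hat p)\left(\frac{1}{n_w} + \frac{1}{n_b}\right)}}$$

A minimal sketch from the reported counts (again assuming 2,435 résumés per group), using `scipy.stats.norm` for the two-sided p-value:

```python
import math
from scipy import stats

# Counts from the notebook (assumption: 2,435 résumés in each group)
x_w, n_w = 235, 2435
x_b, n_b = 157, 2435

p_w, p_b = x_w / n_w, x_b / n_b
p_pool = (x_w + x_b) / (n_w + n_b)  # common callback rate if H0 is true

# Pooled standard error of the difference under H0
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_b))
z = (p_w - p_b) / se_pool
p_value = 2 * stats.norm.sf(abs(z))  # two-sided, matching H1
print(f"z = {z:.2f}, two-sided p = {p_value:.1e}")
```

With samples this large, the z statistic lands very close to the t statistic reported by `stats.ttest_ind` further down.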

In [7]:
w = data[data.race=='w'].call #callbacks for white-sounding names
b = data[data.race=='b'].call #callbacks for black-sounding names

# Compute the margin of error, confidence interval, and p-value using both bootstrap and frequentist approaches

In [8]:
def diff_of_means(data_1, data_2):
    """Difference in means of two arrays."""

    # The difference of means of data_1, data_2: diff
    diff = np.mean(data_1) - np.mean(data_2)

    return diff

w_np = w.values
b_np = b.values

In [9]:
# bootstrap replicate function to generate replicate datasets
def bootstrap_replicate_1d(data, func, seed=1):
    np.random.seed(seed)
    return func(np.random.choice(data, size=len(data)))


def draw_bs_reps(data, func, size=1, seed=1):
    """Draw bootstrap replicates."""

    # Initialize array of replicates: bs_replicates
    bs_replicates = np.empty(size)

    # Generate replicates
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data, func, seed+i)

    return bs_replicates
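`bootstrap_replicate_1d` reseeds NumPy's global generator on every call, and `draw_bs_reps` uses the seeds 1, 2, ..., 10,000 whenever it is called with the default `seed`. Because the white and black arrays have the same length (2,435), the two groups' replicates are drawn at identical index positions rather than independently. A sketch of an alternative, assuming NumPy's `Generator` API (`np.random.default_rng`), threads one generator through every draw so that successive resamples never reuse a seed; `draw_bs_reps_rng` and the `*_demo` arrays are hypothetical names, and the arrays are rebuilt from the reported counts.

```python
import numpy as np

def draw_bs_reps_rng(data, func, size, rng):
    """Draw `size` bootstrap replicates of func(data), advancing one shared Generator."""
    data = np.asarray(data)
    return np.array([func(rng.choice(data, size=len(data))) for _ in range(size)])

# Stand-in callback arrays (assumption): 1 = callback, 0 = no callback
w_demo = np.r_[np.ones(235), np.zeros(2435 - 235)]
b_demo = np.r_[np.ones(157), np.zeros(2435 - 157)]

# One generator shared by both calls, so the two groups get independent resamples
rng = np.random.default_rng(1)
bs_w = draw_bs_reps_rng(w_demo, np.mean, 1000, rng)
bs_b = draw_bs_reps_rng(b_demo, np.mean, 1000, rng)
print(bs_w.mean(), bs_b.mean())
```

In the notebook this would replace the two `draw_bs_reps` calls, e.g. `draw_bs_reps_rng(w_np_shifted, np.mean, 10000, rng)` and the same for `b_np_shifted`; the resulting p-value and intervals would not reproduce the outputs shown here exactly.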

In [10]:
empirical_diff_of_means = diff_of_means(w_np, b_np)
race_concat = np.concatenate((w_np, b_np))
bs_replicates = np.empty(10000)

In [11]:
# Pooled callback rate across both groups
mean_callback = np.mean(race_concat)

# Shift each group to the pooled mean so that both satisfy H0 (equal callback rates)
w_np_shifted = w_np - np.mean(w_np) + mean_callback
b_np_shifted = b_np - np.mean(b_np) + mean_callback

# Compute 10,000 bootstrap replicates from shifted arrays
bs_replicates_w = draw_bs_reps(w_np_shifted, np.mean, 10000)
bs_replicates_b = draw_bs_reps(b_np_shifted, np.mean, 10000)

# Get replicates of difference of means: bs_replicates
bs_replicates = bs_replicates_w - bs_replicates_b

# One-sided p-value: fraction of null replicates at least as large as the observed difference (H1 is two-sided)
p = np.sum(bs_replicates >= empirical_diff_of_means) / len(bs_replicates)
print('p-value =', p)

p-value = 0.0
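Since $H_1$ is two-sided, the matching bootstrap p-value counts null replicates at least as extreme as the observed difference in either direction. The sketch below is self-contained: it rebuilds the callback arrays from the reported counts (an assumption) and draws all resamples from a single `np.random.Generator` instead of the notebook's helper functions. With 10,000 replicates, a count of zero means the p-value is below 1/10,000.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in callback arrays (assumption): 1 = callback, 0 = no callback
w_demo = np.r_[np.ones(235), np.zeros(2435 - 235)]
b_demo = np.r_[np.ones(157), np.zeros(2435 - 157)]
observed = w_demo.mean() - b_demo.mean()

# Shift both groups to the pooled mean so that H0 (equal rates) holds
pooled = np.concatenate((w_demo, b_demo)).mean()
w_shift = w_demo - w_demo.mean() + pooled
b_shift = b_demo - b_demo.mean() + pooled

# Null distribution of the difference in means from independent resamples
n_reps = 10_000
null_diffs = np.array([
    rng.choice(w_shift, size=w_shift.size).mean()
    - rng.choice(b_shift, size=b_shift.size).mean()
    for _ in range(n_reps)
])

# Two-sided p-value: share of null differences at least as extreme as observed
p_two_sided = np.mean(np.abs(null_diffs) >= abs(observed))
print("two-sided p-value =", p_two_sided)
```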


In [12]:
# 95% percentile intervals of each group's shifted (null) bootstrap distribution.
# Both are centered on the pooled callback rate, so they describe variability under H0,
# not confidence intervals for each group's own callback rate.
conf_int_w = np.percentile(bs_replicates_w, [2.5, 97.5])
conf_int_b = np.percentile(bs_replicates_b, [2.5, 97.5])

In [13]:
print('95% interval of the null bootstrap distribution, white-sounding names:', conf_int_w)
print('95% interval of the null bootstrap distribution, black-sounding names:', conf_int_b)

95% interval of the null bootstrap distribution, white-sounding names: [0.06899384 0.09199178]
95% interval of the null bootstrap distribution, black-sounding names: [0.07104725 0.0903491 ]
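A bootstrap confidence interval for the difference in callback rates resamples the original, unshifted data; shifting is only needed to simulate the null hypothesis. A sketch under the same assumptions as the other examples (arrays rebuilt from the reported counts, one `np.random.Generator`), using the percentile method and taking the margin of error as the interval's half-width:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in callback arrays (assumption): 1 = callback, 0 = no callback
w_demo = np.r_[np.ones(235), np.zeros(2435 - 235)]
b_demo = np.r_[np.ones(157), np.zeros(2435 - 157)]

# Resample each group independently from its own unshifted data
n_reps = 10_000
diffs = np.empty(n_reps)
for i in range(n_reps):
    w_star = rng.choice(w_demo, size=w_demo.size)
    b_star = rng.choice(b_demo, size=b_demo.size)
    diffs[i] = w_star.mean() - b_star.mean()

# Percentile 95% CI and margin of error (half-width) for p_w - p_b
ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])
margin = (ci_high - ci_low) / 2
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f}), margin of error = {margin:.3f}")
```

With groups this large, the percentile interval should come out close to the t-based interval computed further down.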


In [16]:
data_b = data[data['race'] == 'b']
data_w = data[data['race'] == 'w']

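# Frequentist approach: Student's two-sample t-test (scipy's default, pooled variance)
# and a statsmodels CompareMeans object for the confidence interval of the difference
import statsmodels.stats.weightstats as sms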
two_sample = stats.ttest_ind(data_w['call'], data_b['call'])
cm = sms.CompareMeans(sms.DescrStatsW(data_w['call']), sms.DescrStatsW(data_b['call']))

print('The t-statistic and p-value are given as', two_sample)

The t-statistic and p-value are given as Ttest_indResult(statistic=4.114705290861751, pvalue=3.940802103128885e-05)
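`stats.ttest_ind` runs a pooled-variance (Student) t-test by default (`equal_var=True`), whereas the interval in the next cell uses `usevar='unequal'` (Welch). With equal group sizes the two t statistics coincide; only the degrees of freedom, and hence the p-value, differ slightly. A quick check on arrays rebuilt from the reported counts (an assumption):

```python
import numpy as np
from scipy import stats

# Stand-in callback arrays (assumption): 1 = callback, 0 = no callback
w_demo = np.r_[np.ones(235), np.zeros(2435 - 235)]
b_demo = np.r_[np.ones(157), np.zeros(2435 - 157)]

student = stats.ttest_ind(w_demo, b_demo)                 # pooled variance (default)
welch = stats.ttest_ind(w_demo, b_demo, equal_var=False)  # Welch's unequal variances

print(student.statistic, welch.statistic)  # equal up to rounding when n_w == n_b
print(student.pvalue, welch.pvalue)        # differ only through the degrees of freedom
```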


In [17]:
ci_lower, ci_upper = cm.tconfint_diff(usevar='unequal')
print('The 95% confidence interval for the difference in callback rates (white minus black) is ({:.3f}, {:.3f}).'.format(ci_lower, ci_upper))

The 95% confidence interval for the difference in callback rates (white minus black) is (0.017, 0.047).


In [18]:
# Margin of error = half-width of the 95% CI for (white mean - black mean)
ci_lower, ci_upper = cm.tconfint_diff(usevar='unequal')
print('The margin of error is {:.3f}.'.format((ci_upper - ci_lower) / 2))

The margin of error is 0.015.
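The margin of error is the half-width of the interval: the critical t value times the standard error of the difference. The hand computation below uses the unequal-variance (Welch) standard error and the Welch-Satterthwaite degrees of freedom, which should match statsmodels' `usevar='unequal'` interval; the counts, and the 2,435 résumés per group, are taken from the notebook's outputs.

```python
import math
from scipy import stats

# Counts from the notebook (assumption: 2,435 résumés per group)
x_w, x_b, n = 235, 157, 2435
p_w, p_b = x_w / n, x_b / n

# Sample variance (ddof=1) of a 0/1 variable with mean p over n observations
var_w = p_w * (1 - p_w) * n / (n - 1)
var_b = p_b * (1 - p_b) * n / (n - 1)

# Welch standard error of the difference and Welch-Satterthwaite degrees of freedom
se2_w, se2_b = var_w / n, var_b / n
se = math.sqrt(se2_w + se2_b)
dof = (se2_w + se2_b) ** 2 / (se2_w ** 2 / (n - 1) + se2_b ** 2 / (n - 1))

t_crit = stats.t.ppf(0.975, dof)  # two-sided 95%
margin = t_crit * se
diff = p_w - p_b
print(f"diff = {diff:.3f}, margin of error = {margin:.3f}, "
      f"95% CI = ({diff - margin:.3f}, {diff + margin:.3f})")
# diff = 0.032, margin of error = 0.015, 95% CI = (0.017, 0.047)
```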


In [19]:
print(data_b['call'].mean())
print(data_w['call'].mean())

0.0644763857126236
0.09650924056768417


# Write a story describing the statistical significance in the context of the original problem.

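From the analysis, résumés with black-sounding names received callbacks at a rate of 6.4%, compared with 9.7% for résumés with white-sounding names; put differently, white-sounding names received about 1.5 times as many callbacks (235 vs. 157). This gap gives us reason to be suspicious of hiring patterns. The t-test (t = 4.11, p ≈ 0.00004) and the bootstrap test (none of the 10,000 null replicates was as large as the observed difference) both lead us to reject the null hypothesis: the difference in callback rates between black-sounding and white-sounding names is statistically significant. We are 95% confident that the callback rate for white-sounding names is between 1.7 and 4.7 percentage points higher than for black-sounding names. Because names were assigned to the résumés at random, the gap is attributable to the name and, most plausibly, to the race it signals. Thus, from a statistical standpoint, racial discrimination in the U.S. labor market remains a major obstacle.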

# Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

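Although the analysis lets us infer that race plays a role in callback success for job applicants, it is not enough to show that race is the most important factor. Because race was randomly assigned, the comparison isolates the effect of the name, but it says nothing about how large that effect is relative to other résumé characteristics, such as education, years of experience, or honors, which were not measured by this two-group comparison. To amend the analysis, we could fit a regression model (for example, a logistic regression of `call` on race plus those characteristics) to estimate the effect of race while controlling for the other factors and to compare the relative sizes of their effects.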
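A minimal sketch of that regression, assuming the statsmodels formula interface: a logistic model of `call` on race plus the résumé characteristics visible in `data.head()` (education, years of experience, honors, volunteering, military service, employment holes). The function name `fit_callback_logit` and the choice of covariates are illustrative assumptions, not a prescribed specification.

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_callback_logit(df: pd.DataFrame):
    """Logistic regression of callback (0/1) on race plus résumé covariates.

    With default treatment coding, 'b' (first alphabetically) is the reference
    level, so the C(race)[T.w] coefficient is the change in the log-odds of a
    callback for white-sounding names, holding the other covariates fixed.
    """
    formula = (
        "call ~ C(race) + education + yearsexp"
        " + honors + volunteer + military + empholes"
    )
    return smf.logit(formula, data=df).fit(disp=False)
```

On the notebook's frame, `result = fit_callback_logit(data)` followed by `print(result.summary())` would report the race coefficient alongside the others. Because the covariates are on different scales, comparing average marginal effects (`result.get_margeff().summary()`) is one way to gauge their relative importance.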