# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
We will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

We will answer the following questions. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. We will compute the margin of error, confidence interval, and p-value. We will use both bootstrapping and frequentist statistical approaches.
   4. We will write a story describing the statistical significance in the context or the original problem.
   5. Does our analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how can we amend our analysis?


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
</div>
*****

In [2]:
import pandas as pd
import numpy as np
from scipy import stats

In [3]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [5]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [12]:
data.shape

(4870, 65)

In [10]:
# number of callbacks for black-sounding names
white_calls = sum(data[data.race=='w'].call)
black_calls = sum(data[data.race=='b'].call)
print("White Calls:", white_calls)
print("Black Calls:", black_calls)

White Calls: 235.0
Black Calls: 157.0


In [18]:
w = data[data.race=='w']
b = data[data.race=='b']

print("Rows of white names:", w.shape[0])
print("Rows of black names:", b.shape[0])

Rows of white names: 2435
Rows of black names: 2435


Let n = 4870, pw be the percentage of whites who received calls for jobs, and pb be the percentage of blacks who received calls for jobs. Then the following equalities hold.


In [27]:
pw = 235/2435
pb = 157/2435
print("n(pw) =", pw*4870)
print("n(pb) =", pb*4870)
print("n(1-pw) =", (1-pw)*4870)
print("n(1-pb) =", (1-pb)*4870)

n(pw) = 470.0
n(pb) = 314.0
n(1-pw) = 4400.0
n(1-pb) = 4556.0 


<div class="span5 alert alert-success">
<p> The appropriate test for this problem is a hypothesis test comparing population proportions. The CLT (Central Limit Theorem) applies since:</p>
    <ol>
    <li>RANDOM -The data come from two groups in a randomized experiment.
    <li>10% - The total number of white people and black people applying for jobs is clearly greater than 10X the sample. 
    <li> LARGE COUNTS - n(pw), n(pb), n(1-pw) & n(1-pb) are all at least 10.
    </ol>
    <p>The null hypothesis is that there is no significant difference between pw and pb, that is, the proportion of white-sounding-names and black-sounding-names who receive calls for job interviews. The alternative hypothesis is that there is a significant difference between the proportion of white-sounding-names and black-sounding-names who receive calls for job interviews. We assume that the null hypothesis is true, that is, pw-pb=0.</p>
</div>

## Frequentist Statistics

### Confidence Interval

In [43]:
#Compute z-score for 99% Confidence Interval
import scipy.stats as stats
z_score = stats.norm.ppf(.995)

#Compute Margin of Error
import numpy as np
margin_error = z_score*np.sqrt(pw*(1-pw)/w.shape[0]+pb*(1-pb)/b.shape[0])
print("Margin of Error=", margin_error, ".")

Margin of Error= 0.02004863403754258 .


In [40]:
#Compute Difference of Proportions
p_diff = pw-pb

#Compute Confidence Interval
c_min = p_diff - margin_error
c_max = p_diff + margin_error

print('We are 99% Confident that (', c_min, ",", c_max, ") captures the true difference in proportions.")

We are 99% Confident that ( 0.011984220171903006 , 0.05208148824698816 ) captures the true difference in proportions.


### p-value

In [42]:
z_score2 = p_diff/np.sqrt(pw*(1-pw)/w.shape[0]+pb*(1-pb)/b.shape[0])
p = 1-stats.norm.cdf(z_score2)
print("The probability that the difference in proportion is due to chance is", p, ".")

The probability that the difference in proportion is due to chance is 1.9312826037620745e-05 .


## Bootstrapping Method

### Define Functions

In [56]:
# Define general bootstrap function to work with pairs
def draw_bs_pairs(x, y, func, size=1):
    """Perform pairs bootstrap for particular function."""

    # Set up array of indices to sample from: inds
    inds = np.arange(len(x))

    # Initialize replicates
    bs_replicates = np.empty(size)

    # Generate replicates
    for i in range(size):
        bs_inds = np.random.choice(inds, len(inds))
        bs_x, bs_y = x[bs_inds], y[bs_inds]
        bs_replicates = func(bs_x, bs_y)
        
    return bs_replicates

# Define difference function to compute difference in proportions
def diff_of_p(x,y):
    return abs(x-y)

### Compute p-value

In [57]:
# Acquire 10000 bootstrap replicates using difference of proprotion
x = np.random.choice(data.call, size=w.shape[0])
y = np.random.choice(data.call, size=w.shape[0])
bs_replicates = draw_bs_pairs(x, y, diff_of_p, size=10000)

results = np.sum(bs_replicates>=diff_of_p(white_calls,black_calls))/len(bs_replicates)

print("p-value =", results)

p-value = 0.0


Extraordinary! By taking random samples with replacement, in 10000 tries, not once did the difference in proportion of those who received call backs surpass the difference in proprotion between black and white-sounding names. Let's try 100,000 samples out of curiosity.

In [58]:
# Acquire 10000 bootstrap replicates using difference of proprotion
x = np.random.choice(data.call, size=w.shape[0])
y = np.random.choice(data.call, size=w.shape[0])
bs_replicates = draw_bs_pairs(x, y, diff_of_p, size=100000)

results = np.sum(bs_replicates>=diff_of_p(white_calls,black_calls))/len(bs_replicates)

print("p-value =", results)

p-value = 0.0


It look a little longer, but same result.

<div class="span5 alert alert-success">
<p> The data unquestionably reveal racial discrimination in the US job market. None of the 100,000 random samples found a difference in proportion of resumes that received callbacks from potential employers as great as the difference in proportion between black and white-sounding names. </p>
<p> This does not prove, however, that race is the most important factor in callback success. Exploratory data analysis on years of experience, education and other columns is recommended for further research.
</p>
</div>