# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
</div>
****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt


In [2]:
# pd.read_stata is the public pandas entry point for Stata files
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [4]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1,0,1,0,0,0,0,0,0,
1,b,1,3,3,6,0,1,1,0,316,...,1,0,1,0,0,0,0,0,0,
2,b,1,4,1,6,0,0,0,0,19,...,1,0,1,0,0,0,0,0,0,
3,b,1,3,4,6,0,1,0,1,313,...,1,0,1,0,0,0,0,0,0,
4,b,1,3,3,22,0,0,0,0,313,...,1,1,0,0,0,0,0,1,0,Nonprofit


## 1. What test is appropriate for this problem? Does CLT apply?

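A two-sample z-test for the difference between two proportions is appropriate, because we are comparing the callback rate (the proportion of resumes that received an interview request) for the b and w race identifiers. Each callback is a 0/1 (Bernoulli) outcome, so the central limit theorem applies to the sample proportions when the observations are independent and each group contains enough successes and failures (a common rule of thumb is at least 10 of each). Both groups contain 2,435 resumes, with 157 callbacks for b and 235 for w, so the samples are large enough for the central limit theorem to apply.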
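The rule-of-thumb check can be sketched as follows. The counts are copied from this notebook; the helper name `clt_ok` and the threshold of 10 are assumptions made for illustration, not part of the original analysis.

```python
# Counts taken from this notebook: 2,435 resumes per group,
# 157 callbacks for 'b' and 235 callbacks for 'w'.
n_b, n_w = 2435, 2435
calls_b, calls_w = 157, 235


def clt_ok(successes, n, threshold=10):
    """Rule-of-thumb check: at least `threshold` successes and failures.

    `clt_ok` is a hypothetical helper, and 10 is a conventional cutoff
    rather than a hard requirement.
    """
    failures = n - successes
    return successes >= threshold and failures >= threshold


print('b group passes: {}'.format(clt_ok(calls_b, n_b)))
print('w group passes: {}'.format(clt_ok(calls_w, n_w)))
```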

In [5]:
brequests = data[data.race=='b'].call
wrequests = data[data.race=='w'].call

# Single-argument print() calls run under both Python 2 and Python 3
print(len(brequests))
print(len(wrequests))

2435
2435


## 2. What are the null and alternate hypotheses?

Null hypothesis: Race has no effect on interview requests (proportion of requests for b = proportion of requests for w)

Alternative hypothesis: Race has an effect on interview requests (proportion of requests for b != proportion of requests for w)

The significance level used for this test will be the conventional 0.05 threshold.

## 3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.

In [6]:
#create a function to draw bootstrap replicates
def bootstrap_replicate_1D(data, func):
    bs_sample = np.random.choice(data, len(data))
    return func(bs_sample)

def draw_bs_reps(data, func, size=1):
    """Draw bootstrap replicates."""

    # Initialize array of replicates: bs_replicates
    bs_replicates = np.empty(size)

    # Generate replicates
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1D(data, func)

    return bs_replicates

In [7]:
# Create a function that finds the proportion of requests
def proportion(data):
    """Return sum(data) / len(data): the proportion of 1s for 0/1 data."""
    # float() guards against integer division under Python 2
    return np.sum(data) / float(len(data))

# Compute proportion of all requests, w requests, and b requests
mean_requests = proportion(data.call)
mean_requests_w = proportion(wrequests)
mean_requests_b = proportion(brequests)

# Compute difference of means
diff_means = mean_requests_w - mean_requests_b

# Generate shifted arrays
wrequests_shifted = wrequests - mean_requests_w + mean_requests
brequests_shifted = brequests - mean_requests_b + mean_requests

# Compute 10,000 bootstrap replicates from shifted arrays
bs_replicates_w = draw_bs_reps(wrequests_shifted, proportion, size=10000)
bs_replicates_b = draw_bs_reps(brequests_shifted, proportion, size=10000)

# Get replicates of difference of means: bs_replicates
bs_replicates = bs_replicates_w - bs_replicates_b

In [8]:
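print('The difference of observed means:  {}'.format(diff_means))

# Compute the central 95% range of the replicates generated under the null.
# This is the null distribution of the difference, not a confidence
# interval for the observed difference.
conf95 = np.percentile(bs_replicates, [2.5, 97.5])
print('95% confidence interval of difference of means (assuming null) =  {}'.format(conf95))

# Compute and print the two-sided p-value, matching the != alternative.
# np.mean avoids the integer division that int / int performs in Python 2.
p = np.mean(np.abs(bs_replicates) >= np.abs(diff_means))
print('p-value = {}'.format(p))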

The difference of observed means:  0.0320328542094
95% confidence interval of difference of means (assuming null) =  [-0.0151951   0.01478437]
p-value = 0
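The interval above describes the null distribution. A bootstrap confidence interval and margin of error for the observed difference would instead resample the unshifted data. The following is a minimal sketch under stated assumptions: the 0/1 arrays are rebuilt from the counts reported in this notebook (in the notebook itself, `brequests` and `wrequests` could be used directly), and the seed and the helper name `bootstrap_diff` are arbitrary choices for illustration.

```python
import numpy as np

# Rebuild 0/1 callback arrays from the counts reported in this notebook
# (157 of 2,435 for 'b', 235 of 2,435 for 'w').
n = 2435
b_calls = np.concatenate([np.ones(157), np.zeros(n - 157)])
w_calls = np.concatenate([np.ones(235), np.zeros(n - 235)])

rng = np.random.RandomState(42)  # fixed seed, chosen arbitrarily


def bootstrap_diff(w, b, size=10000):
    """Bootstrap replicates of mean(w) - mean(b) from the unshifted data."""
    reps = np.empty(size)
    for i in range(size):
        w_sample = rng.choice(w, len(w))  # resample with replacement
        b_sample = rng.choice(b, len(b))
        reps[i] = w_sample.mean() - b_sample.mean()
    return reps


reps = bootstrap_diff(w_calls, b_calls)
ci_low, ci_high = np.percentile(reps, [2.5, 97.5])
margin = (ci_high - ci_low) / 2.0  # half-width of the percentile interval

print('95% bootstrap CI for the difference: [{:.4f}, {:.4f}]'.format(ci_low, ci_high))
print('approximate margin of error: {:.4f}'.format(margin))
```

If the resulting interval excludes zero, it agrees with the hypothesis test in rejecting equal callback rates.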


In [9]:
#frequentist approach

#the sample difference of means is already calculated as diff_means

#find the standard error of the difference in proportions
#(each term is the variance of a sample proportion, p * (1 - p) / n;
#the margin of error is a critical value times se, e.g. 1.96 * se for 95%)
varb = (mean_requests_b * (1 - mean_requests_b)) / len(brequests)
varw = (mean_requests_w * (1 - mean_requests_w)) / len(wrequests)
se = np.sqrt(varb + varw)

# find zscore (use z because sample size is greater than 30)
zscore = (diff_means - 0) / se
zscore



4.1155504357300003

With a z-score more than 4 standard errors away from zero, the likelihood of getting a result at least as extreme as the one observed (assuming the null hypothesis is true) is much less than our significance level of 5%, so we reject the null hypothesis.
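The frequentist margin of error, confidence interval, and two-sided p-value follow from the same standard error. This is a sketch that works from the counts reported in this notebook rather than the loaded data; the 1.96 critical value assumes a 95% confidence level and a normal approximation, and the variable names are illustrative.

```python
import math

# Counts reported in this notebook
n_b, n_w = 2435, 2435
p_b = 157.0 / n_b  # callback rate for black-sounding names
p_w = 235.0 / n_w  # callback rate for white-sounding names
diff = p_w - p_b

# Unpooled standard error of the difference in proportions
se = math.sqrt(p_b * (1 - p_b) / n_b + p_w * (1 - p_w) / n_w)

z_crit = 1.96  # approximate normal critical value for 95% confidence
margin = z_crit * se
ci = (diff - margin, diff + margin)

# z-score and two-sided p-value; erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|))
z = diff / se
p_value = math.erfc(abs(z) / math.sqrt(2))

print('margin of error: {:.4f}'.format(margin))
print('95% CI: [{:.4f}, {:.4f}]'.format(ci[0], ci[1]))
print('z = {:.4f}, two-sided p-value = {:.2e}'.format(z, p_value))
```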

## 4. Write a story describing the statistical significance in the context of the original problem

In this study we found that race has a statistically significant impact on the number of interview requests received in the US labor market. A total of 4,870 resumes and their callback outcomes were collected, and 392 of those resumes received requests for interviews: 235 carried white-sounding names and 157 carried black-sounding names. To determine whether race played a significant role in the number of requests, we conducted a hypothesis test under the assumption that resumes with white-sounding names and resumes with black-sounding names had the same callback success rate. We found that, if the callback rates were truly equal, the probability of observing a difference at least as large as the one in our data would be almost zero. This result allowed us to reject the null hypothesis and conclude that the difference in callback success between white-sounding and black-sounding names is statistically significant.

## 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

Our testing showed that race/name is a statistically significant factor in callback success, but that is the extent of what was studied; it does not show that race/name is the most important factor. The data include other factors that could contribute to callback success, such as years of experience, education, and whether the applicant had served in the military. To test which factor is the most important, we would have to test all the candidate factors in the data and compare their impacts on callback success. This would help determine which factor had the highest impact overall.
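One way such a comparison might start is to compute the callback-rate gap and z-score for each binary factor in turn. The sketch below runs on synthetic data, not the study's data: `factor_a`, `factor_b`, the helper `rate_gap`, the seed, and the callback probabilities are all hypothetical and exist only to show the mechanics.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, chosen arbitrarily
n = 6000

# Hypothetical binary factors (stand-ins for columns such as military or honors)
factors = {
    'factor_a': rng.randint(0, 2, n),
    'factor_b': rng.randint(0, 2, n),
}

# Synthetic callbacks: factor_a raises the callback probability, factor_b does not
prob = 0.05 + 0.06 * factors['factor_a']
call = (rng.rand(n) < prob).astype(float)


def rate_gap(call, flag):
    """Return (callback-rate difference, z-score) for flag == 1 vs flag == 0."""
    g1, g0 = call[flag == 1], call[flag == 0]
    p1, p0 = g1.mean(), g0.mean()
    se = np.sqrt(p1 * (1 - p1) / len(g1) + p0 * (1 - p0) / len(g0))
    return p1 - p0, (p1 - p0) / se


results = {name: rate_gap(call, flag) for name, flag in factors.items()}
for name in sorted(results):
    gap, z = results[name]
    print('{}: gap = {:.4f}, z = {:.2f}'.format(name, gap, z))
```

This looks at one factor at a time, so it cannot separate factors that overlap with each other. A model that includes all factors at once, such as a logistic regression, would likely be needed before ranking their importance.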