# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [4]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [6]:
#Number of samples for white-sounding names
w_len=len(data[data.race=='w'])

#Number of samples for black-sounding names
b_len=len(data[data.race=='b'])

print('Number of samples for white-sounding names: ',w_len)
print('Number of samples for black-sounding names: ',b_len)

Number of samples for white-sounding names:  2435
Number of samples for black-sounding names:  2435


<h3>Question 1</h3>

What test is appropriate for this problem? Does CLT apply?

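- Since we are comparing two samples, a permutation test is appropriate here: it simulates the null hypothesis directly by reshuffling the race labels, which matches how the labels were randomly assigned in the experiment. A two-sample z-test for proportions is the corresponding frequentist approach.
- Since each sample has 2435 observations and the race values are assigned randomly to the resumes when presented to the employer, the CLT applies: the samples are large enough for the sample callback rates to be approximately normally distributed.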
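
The CLT claim can be checked with a quick simulation. The sketch below is illustrative only: it assumes a single callback rate equal to the pooled rate implied by the counts reported in this notebook (235 and 157 callbacks out of 2435 resumes each), and the variable names and seed are chosen here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 2435                      # resumes per group, as in this notebook
p = (235 + 157) / (2 * n)     # pooled callback rate from the reported counts

# Simulate 10,000 sample callback rates for samples of size n
sim_rates = rng.binomial(n, p, size=10000) / n

# Standard error predicted by the Bernoulli formula sqrt(p(1-p)/n)
se_theory = np.sqrt(p * (1 - p) / n)

# For a normal distribution, about 95% of values fall within 1.96 SE of the mean
coverage = np.mean(np.abs(sim_rates - p) <= 1.96 * se_theory)

print('Simulated SE: ', sim_rates.std())
print('Theoretical SE: ', se_theory)
print('Fraction within 1.96 SE: ', coverage)
```

If the normal approximation is reasonable, the simulated and theoretical standard errors should agree closely and the last fraction should be close to 0.95.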

<h3>Question 2</h3>

What are the null and alternate hypotheses?

- The null hypothesis: the callback rate for resumes with white-sounding names is the same as for resumes with black-sounding names.
- The alternate hypothesis: the callback rates for resumes with white-sounding names and black-sounding names are different.

$H_0: p_w = p_b$ versus $H_a: p_w \neq p_b$

where $p$ is the probability of success, i.e. the callback rate.

The statistic is the difference in callback rates between both samples.

In [7]:
w = data[data.race=='w']
b = data[data.race=='b']

<h3>Question 3</h3>

In [8]:
#Function that generates many bootstrap replicates from a data set
def draw_bs_reps(data, func, reps=1):
    
    # Initialize array of replicates: bs_replicates
    bs_replicates=np.empty(reps)
    
    # Generate replicates
    for i in range(reps):
        bs_replicates[i]=func(np.random.choice(data, size=len(data)))
        
    return bs_replicates

def callback_p(data):
    return np.sum(data)/len(data)

In [9]:
#Generate bootstrap replicates of white-sounding calls
bs_reps_w=draw_bs_reps(w.call,callback_p,reps=10000)

#Generate bootstrap replicates of black-sounding calls
bs_reps_b=draw_bs_reps(b.call,callback_p,reps=10000)

In [10]:
#Confidence interval of white-sounding replicates
ci_bs_reps_w=np.percentile(bs_reps_w,[2.5,97.5])
print('Callback rate 95% confidence interval of white-sounding names: ', ci_bs_reps_w)

#Confidence interval of black-sounding replicates
ci_bs_reps_b=np.percentile(bs_reps_b,[2.5,97.5])
print('Callback rate 95% confidence interval of black-sounding names: ', ci_bs_reps_b)

Callback rate 95% confidence interval of white-sounding names:  [0.08501027 0.10841889]
Callback rate 95% confidence interval of black-sounding names:  [0.0550308  0.07433265]


Since the lower limit of the white-sounding names is higher than the upper limit of the black-sounding names, there is no overlap of the confidence intervals and therefore we reject the null hypothesis.
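
Checking two separate intervals for overlap is a conservative comparison. A more direct alternative is to bootstrap the difference in callback rates itself and see whether its confidence interval contains 0. The sketch below is a minimal illustration under stated assumptions: it rebuilds the 0/1 callback arrays from the counts reported in this notebook (235 and 157 callbacks out of 2435 resumes each) instead of reading the data file, and the helper name `bootstrap_diff_ci` and the fixed seed are hypothetical choices made for this example.

```python
import numpy as np

def bootstrap_diff_ci(calls_a, calls_b, reps=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(calls_a) - mean(calls_b)."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(reps)
    for i in range(reps):
        # Resample each group independently, with replacement
        rs_a = rng.choice(calls_a, size=len(calls_a))
        rs_b = rng.choice(calls_b, size=len(calls_b))
        diffs[i] = rs_a.mean() - rs_b.mean()
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Rebuild the 0/1 callback arrays from the counts reported in this notebook
w_calls = np.concatenate([np.ones(235), np.zeros(2200)])
b_calls = np.concatenate([np.ones(157), np.zeros(2278)])

ci_low, ci_high = bootstrap_diff_ci(w_calls, b_calls)
print('95% bootstrap CI for the difference in callback rates: ', [ci_low, ci_high])
```

An interval that lies entirely above 0 supports rejecting the null hypothesis of equal callback rates.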

In [11]:
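#Compute differences in callback rates from the bs replicates
diff_of_callback_rates=bs_reps_w-bs_reps_b

#Fraction of replicates in which the white-sounding rate is at least the black-sounding rate
frac_w_higher=np.sum(diff_of_callback_rates>=0)/len(diff_of_callback_rates)
print('Fraction of replicates with difference >= 0: ',frac_w_higher)

Fraction of replicates with difference >= 0:  1.0


In all of the replicates the difference between callback rates was above 0, so the bootstrap distribution of the difference lies entirely on the side of a higher callback rate for white-sounding names. Note that this fraction is not a p-value: a p-value is computed under the null hypothesis of equal callback rates, and it is a small p-value, not a large one, that leads us to reject the null hypothesis.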
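
A p-value under the null hypothesis can be estimated with a permutation test: if race has no effect, the labels are exchangeable, so reshuffling them shows how large a difference arises by chance alone. The following is a rough sketch rather than a definitive implementation. It rebuilds the 0/1 callback arrays from the counts reported in this notebook (235 and 157 callbacks out of 2435 resumes each), and the function name `permutation_p_value` and the seed are assumptions introduced for illustration.

```python
import numpy as np

def permutation_p_value(calls_a, calls_b, reps=10000, seed=0):
    """Two-sided permutation p-value for the difference in means."""
    rng = np.random.default_rng(seed)
    observed = calls_a.mean() - calls_b.mean()
    pooled = np.concatenate([calls_a, calls_b])
    n_a = len(calls_a)
    count = 0
    for _ in range(reps):
        # Shuffle the pooled outcomes, then split into two pseudo-groups
        perm = rng.permutation(pooled)
        diff = perm[:n_a].mean() - perm[n_a:].mean()
        # Count permuted differences at least as extreme as the observed one
        # (small tolerance guards against floating-point ties)
        if abs(diff) >= abs(observed) - 1e-12:
            count += 1
    return observed, count / reps

# Rebuild the 0/1 callback arrays from the counts reported in this notebook
w_calls = np.concatenate([np.ones(235), np.zeros(2200)])
b_calls = np.concatenate([np.ones(157), np.zeros(2278)])

obs_diff, p_perm = permutation_p_value(w_calls, b_calls)
print('Observed difference in callback rates: ', obs_diff)
print('Permutation p-value: ', p_perm)
```

With 10,000 permutations the smallest nonzero p-value that can be resolved is 1/10000, so a result of 0 should be read as "below 1e-4" rather than as exactly zero.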

In [12]:
#Estimation of mean callrate for white and black samples
p_w=np.sum(w.call)/len(w.call)
p_b=np.sum(b.call)/len(b.call)

#Compute SE
SE_w=np.sqrt(p_w*(1-p_w)/w_len)
SE_b=np.sqrt(p_b*(1-p_b)/b_len)

print('callback rate w: ',p_w)
print('callback rate b: ',p_b)
print('SE callback rate w: ',SE_w)
print('SE callback rate b: ',SE_b)

callback rate w:  0.09650924024640657
callback rate b:  0.06447638603696099
SE callback rate w:  0.005984072178128066
SE callback rate b:  0.004977121442811946


In order to use the z-score to compute the margin of error, we first need to check the following conditions for both samples:

- np >= 10
- n(1-p) >= 10

In [13]:
#Check both samples meet the conditions
values=[len(w)*p_w, len(w)*(1-p_w), len(b)*p_b, len(b)*(1-p_b)]
print(values)

[235.0, 2200.0, 157.0, 2278.0]


In [14]:
#Compute z_score for a 95% confidence level
z_score_95=stats.norm.ppf(0.975)

#Compute margins of error
ME_w=z_score_95*SE_w
ME_b=z_score_95*SE_b

#Print margins of error
print('Margin of error w: ',ME_w)
print('Margin of error b: ',ME_b)

#Compute confidence intervals
CI_w=[p_w-ME_w,p_w+ME_w]
CI_b=[p_b-ME_b,p_b+ME_b]

#Print confidence intervals
print('Confidence interval w: ',CI_w)
print('Confidence interval b: ',CI_b)

Margin of error w:  0.011728565950019164
Margin of error b:  0.009754978774593444
Confidence interval w:  [0.08478067429638741, 0.10823780619642573]
Confidence interval b:  [0.054721407262367544, 0.07423136481155443]


We reach the same conclusion as with the bootstrap method: since the lower limit of the interval for white-sounding names is higher than the upper limit for black-sounding names, the confidence intervals do not overlap and we reject the null hypothesis at a 95% confidence level.
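
The same frequentist machinery also gives a margin of error and confidence interval for the difference in callback rates directly, which avoids comparing two intervals by eye. The sketch below assumes the two samples are independent and uses only the counts reported in this notebook (235 and 157 callbacks out of 2435 resumes each); the variable names are chosen here for illustration.

```python
import numpy as np
from scipy import stats

n_w, n_b = 2435, 2435
p_w, p_b = 235 / n_w, 157 / n_b

# Difference in callback rates and its (unpooled) standard error
diff = p_w - p_b
se_diff = np.sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)

# Margin of error and 95% confidence interval for the difference
z_crit = stats.norm.ppf(0.975)
me_diff = z_crit * se_diff
ci_diff = (diff - me_diff, diff + me_diff)

print('Difference in callback rates: ', diff)
print('Margin of error: ', me_diff)
print('95% confidence interval: ', ci_diff)
```

The interval runs from roughly 0.017 to 0.047 and excludes 0, which agrees with rejecting the null hypothesis.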

In [15]:
#Compute z_score
z_score=(p_w-p_b)/np.sqrt(SE_w**2+SE_b**2)
print('z-score: ',z_score)

#Compute p_value based on z_score
p_value=stats.norm.sf(abs(z_score))*2
print('p-value: ',p_value)

z-score:  4.11555043573
p-value:  3.862565207522622e-05


The probability of observing a z-score at least as extreme as the one we found is very small if the null hypothesis is true, so we reject the null hypothesis. The callback rates for black-sounding and white-sounding names are different.
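
The z-score in the cell above uses the unpooled standard error. A common textbook variant pools the two samples when estimating the standard error, because under the null hypothesis both groups share one callback rate. The sketch below shows that variant using the counts reported in this notebook; the variable names are assumptions made for the example, and the result should be close to, but not identical to, the unpooled z-score.

```python
import numpy as np
from scipy import stats

calls_w, n_w = 235, 2435
calls_b, n_b = 157, 2435

# Pooled callback rate under the null hypothesis p_w = p_b
p_pool = (calls_w + calls_b) / (n_w + n_b)

# Standard error of the difference using the pooled rate
se_pool = np.sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_b))

# z-score and two-sided p-value
z_pooled = (calls_w / n_w - calls_b / n_b) / se_pool
p_value_pooled = 2 * stats.norm.sf(abs(z_pooled))

print('pooled z-score: ', z_pooled)
print('pooled p-value: ', p_value_pooled)
```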

<h3>Question 4</h3>

In this dataset we have analyzed the difference in callbacks received by resumes for which the employer could infer from the name whether the applicant was black or white.

The data-generating process ensured that two high-quality and two low-quality resumes were sent for each job posting. Higher-quality resumes included things such as special honors, employment experience, extra computer skills, or volunteering experience.

The null hypothesis of the statistical test is that both callback rates are the same, that is, that there is no racial discrimination in callbacks. We found a very small p-value (about 3.9E-5). Based on this result, we reject the null hypothesis and conclude that the difference in callback rates is statistically significant: resumes with white-sounding names received callbacks about 9.7% of the time, versus about 6.4% for resumes with black-sounding names. Because the names were assigned to resumes at random, this gap is evidence of racial discrimination in the callback decision.

<h3>Question 5</h3>

#### Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

Our analysis shows a difference in callback success based on race, as signaled by the name. Other factors, such as years of experience, education, or honors, can also influence whether a person receives a call. Our analysis does not evaluate whether race/name is the most important factor in callback success.

It is quite possible that people with more experience receive more calls. To determine how other factors affect callback success, we would have to compare profiles within the same race group, since we have already found that race has a significant effect on callback success.
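
One simple way to look at another factor while holding race fixed is to compute callback rates within each combination of race and that factor. The sketch below is only an illustration of the mechanics: the helper `callback_rate_by` is a hypothetical name, and the small data frame is hand-made for the example and is not taken from the study.

```python
import pandas as pd

def callback_rate_by(df, factors):
    """Mean of the 0/1 'call' column within each combination of `factors`."""
    return df.groupby(factors)['call'].mean()

# Tiny hand-made frame for illustration only; these rows are NOT from the study
toy = pd.DataFrame({
    'race':   ['w', 'w', 'w', 'w', 'b', 'b', 'b', 'b'],
    'honors': [0, 0, 1, 1, 0, 0, 1, 1],
    'call':   [0, 1, 1, 1, 0, 0, 0, 1],
})

# Rows: honors level; columns: race; cells: callback rate
rates = callback_rate_by(toy, ['honors', 'race']).unstack('race')
print(rates)
```

On the real data the same helper could be called as `callback_rate_by(data, ['honors', 'race'])`, and reading down each race column shows how the callback rate changes with honors inside one race group.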

There is no need to amend our analysis for the purpose of evaluating racial discrimination. As mentioned before, four resumes were sent for each job position, so resume quality is balanced between the two groups:

- 1 white-sounding name, high-quality resume
- 1 black-sounding name, high-quality resume
- 1 white-sounding name, low-quality resume
- 1 black-sounding name, low-quality resume