# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

In [37]:
import pandas as pd
import numpy as np
from scipy import stats

In [38]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [39]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [40]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
</div>

##### For this problem, we can apply the bootstrap resampling technique.
##### The principles of the CLT are definitely applicable in our case: each group contains 2435 randomly assigned resumes, with well over 10 callbacks and 10 non-callbacks.
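As a rough illustration of the bootstrap idea, the sketch below resamples each group and builds a percentile interval for the difference in callback rates. It assumes the counts shown in this notebook's cell outputs (235 callbacks out of 2435 white-sounding resumes, 157 out of 2435 black-sounding resumes). The seed, the number of replicates, and all variable names are illustrative choices, not part of the original analysis.

```python
import numpy as np

# Callback counts and group sizes taken from the notebook's cell outputs.
n_w, calls_w = 2435, 235   # white-sounding names
n_b, calls_b = 2435, 157   # black-sounding names

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable
n_boot = 10_000                  # number of bootstrap replicates (arbitrary choice)

# Resampling n rows with replacement from a 0/1 column that contains k ones
# gives a Binomial(n, k/n) count of ones, so each replicate is drawn directly.
boot_w = rng.binomial(n_w, calls_w / n_w, size=n_boot) / n_w
boot_b = rng.binomial(n_b, calls_b / n_b, size=n_boot) / n_b
boot_diff = boot_w - boot_b

observed_diff = calls_w / n_w - calls_b / n_b
ci_low, ci_high = np.percentile(boot_diff, [2.5, 97.5])

print("observed difference:", round(observed_diff, 4))
print("95% bootstrap interval:", round(ci_low, 4), round(ci_high, 4))
```

If the interval excludes zero, the bootstrap agrees with a test that rejects equal callback rates.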

In [41]:
w = data[data.race=='w']
b = data[data.race=='b']

In [42]:
# Your solution to Q3 here

<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>


#### Counts of Resumes with White-Sounding and Black-Sounding Names

In [43]:
data.shape

(4870, 65)

### Total number of resumes in the experiment: 4870

In [44]:
w.shape

(2435, 65)

In [45]:
b.shape

(2435, 65)

##### Total number of resumes with white-sounding names: 2435
##### Total number of resumes with black-sounding names: 2435

In [46]:
whites_getting_calls = w[w.call==1]
black_getting_calls = b[b.call==1]

In [47]:
black_getting_calls.shape

(157, 65)

In [48]:
whites_getting_calls.shape

(235, 65)

##### Number of resumes with white-sounding names getting calls: 235
##### Number of resumes with black-sounding names getting calls: 157
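The counts from the cell outputs above (157 rows with `call == 1` for `b`, 235 for `w`) are easier to compare as rates. The sketch below computes the rates and applies the common rule of thumb for the normal approximation. The variable names and the threshold of 10 are illustrative assumptions.

```python
# Counts read from the cell outputs above (rows where call == 1, per group).
n_w, calls_w = 2435, 235   # white-sounding names
n_b, calls_b = 2435, 157   # black-sounding names

rate_w = calls_w / n_w
rate_b = calls_b / n_b

print("callback rate, white-sounding: {:.4f}".format(rate_w))
print("callback rate, black-sounding: {:.4f}".format(rate_b))
print("difference: {:.4f}".format(rate_w - rate_b))
print("ratio: {:.2f}".format(rate_w / rate_b))

# Rule-of-thumb check for the normal approximation: at least 10 successes
# and 10 failures in each group (the threshold of 10 is a convention).
clt_ok = all(count >= 10 for count in
             (calls_w, n_w - calls_w, calls_b, n_b - calls_b))
print("normal approximation reasonable:", clt_ok)
```

On these counts, white-sounding names receive roughly 1.5 times as many callbacks as black-sounding names.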

### Z-Test for Comparing Two Proportions

##### Null Hypothesis: the proportions of resumes with white-sounding and black-sounding names that receive interview calls are the same (p1 = p2).
##### Alternate Hypothesis: the two proportions are not the same (p1 ≠ p2).

#### Two Samples z-test for Proportions
$$z = \frac{p_1 - p_2}{\sqrt{p\,(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
#### where
$$p_1 = \frac{x_1}{n_1}, \qquad p_2 = \frac{x_2}{n_2}, \qquad p = \frac{x_1 + x_2}{n_1 + n_2}$$
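A worked instance of the statistic may help here. The sketch below uses the pooled standard error, with p = (x1 + x2)/(n1 + n2) and the sum 1/n1 + 1/n2 under the square root, which is the form the from-scratch implementation in this notebook computes. It relies only on the standard library, and the variable names are illustrative.

```python
import math

x1, n1 = 235, 2435   # white-sounding: callbacks, resumes
x2, n2 = 157, 2435   # black-sounding: callbacks, resumes

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)           # pooled proportion under H0
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))

print("z =", round(z, 3))
print("p-value =", p_value)
```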


In [50]:
# __future__ imports must come before any other statement
from __future__ import print_function
from __future__ import division

import numpy as np
import pandas as pd
import seaborn as sns
from numpy import sqrt  # explicit import in place of the `from pylab import *` wildcard
from IPython.display import Image
import matplotlib.ticker as mtick

import scipy.stats as stats
import statsmodels.stats.weightstats as wstats
from collections import OrderedDict

%matplotlib inline

In [51]:
whites_getting_calls = 235
total_sample_of_whites = 2435

black_getting_calls = 157
total_sample_of_blacks = 2435

In [66]:
# implementation from scratch
def ztest_proportion_two_samples(x1, n1, x2, n2, one_sided=False, alpha=0.05):
    p1 = x1/n1
    p2 = x2/n2

    # pooled proportion and standard error under H0: p1 == p2
    p_pool = (x1+x2)/(n1+n2)
    se_pooled = np.sqrt(p_pool*(1-p_pool)*(1/n1+1/n2))
    z = (p1-p2)/se_pooled

    # the confidence interval uses the unpooled standard error
    se = np.sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
    moe = stats.norm.ppf(1-alpha/2)*se  # critical value times standard error
    CI = ((p1-p2)-moe, (p1-p2)+moe)

    p = 1-stats.norm.cdf(abs(z))
    p *= 2-one_sided # if not one_sided: p *= 2
    return z, p, moe, CI


In [67]:
z,p,moe,CI = ztest_proportion_two_samples(whites_getting_calls, total_sample_of_whites, black_getting_calls, total_sample_of_blacks, one_sided=False)
print(' z-stat = {z} \n p-value = {p} \n margin of error = {moe:.4f} \n 95% Confidence Interval = ({lo:.4f}, {hi:.4f})'.format(z=z,p=p,moe=moe,lo=CI[0],hi=CI[1]))

 z-stat = 4.108412152434346 
 p-value = 3.983886837577444e-05 
 margin of error = 0.0153 
 95% Confidence Interval = (0.0168, 0.0473)


##### P-Value = 3.983886837577444e-05
##### Margin of Error (95%) = 0.0153
##### 95% Confidence Interval for the difference in callback rates = (0.0168, 0.0473)
##### The p-value is far below the 0.05 significance level, so we reject the null hypothesis.
##### Final Conclusion: 
##### The race signaled by the name on a resume has a statistically significant effect on whether it receives an interview call. This test does not show that race is the most important factor: other resume characteristics may also influence a candidate's prospects of getting interview calls.
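As a cross-check on the conclusion, the same comparison can be framed as a Pearson chi-square test on the 2x2 table of calls and non-calls. For a 2x2 table without continuity correction, the chi-square statistic equals the square of the pooled two-proportion z-statistic, so both tests give the same two-sided p-value. This is a minimal standard-library sketch using the counts from this notebook. It is a swapped-in technique for illustration, not part of the original analysis.

```python
import math

# 2x2 table of observed counts: rows are groups, columns are (call, no call).
observed = [[235, 2435 - 235],   # white-sounding names
            [157, 2435 - 157]]   # black-sounding names

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Pearson chi-square statistic without continuity correction.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

# With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print("chi-square =", round(chi2, 3))
print("p-value =", p_value)
```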