# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [10]:
# Only the libraries used in this notebook are imported
import pandas as pd
import numpy as np
from scipy import stats

In [11]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [16]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [13]:
data.head()

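| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |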


In [14]:
data.describe()

| | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | occupbroad | workinschool | ... | educreq | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4870.0 | 4870.0 | 4870.0 | 4870.0 | 4870.0 | 4870.0 | 4870.0 | 4870.0 | 4870.0 | 4870.0 | ... | 4870.0 | 4870.0 | 4870.0 | 4870.0 | 4870.0 | 4870.0 | 4870.0 | 4870.0 | 4870.0 | 4870.0 |
| mean | 3.61848 | 3.661396 | 7.842916 | 0.052772 | 0.411499 | 0.097125 | 0.448049 | 215.637782 | 3.48152 | 0.559548 | ... | 0.106776 | 0.437166 | 0.07269 | 0.082957 | 0.03039 | 0.08501 | 0.213963 | 0.267762 | 0.154825 | 0.165092 |
| std | 0.714997 | 1.219126 | 5.044612 | 0.223601 | 0.492156 | 0.296159 | 0.497345 | 148.127551 | 2.038036 | 0.496492 | ... | 0.308866 | 0.496083 | 0.259649 | 0.275854 | 0.171677 | 0.278932 | 0.410141 | 0.442847 | 0.361773 | 0.371308 |
| min | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 3.0 | 3.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 27.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 4.0 | 4.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | 267.0 | 4.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 75% | 4.0 | 4.0 | 9.0 | 0.0 | 1.0 | 0.0 | 1.0 | 313.0 | 6.0 | 1.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| max | 4.0 | 7.0 | 44.0 | 1.0 | 1.0 | 1.0 | 1.0 | 903.0 | 6.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |


<div class="span5 alert alert-success">
<p>Answers to Q1 and Q2</p>
</div>

<h2>1. What test is appropriate for this problem? </h2>
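<p>A two-sample z-test for the difference between two proportions is appropriate, because we are comparing the callback rates of two independent groups.</p>
<h2>Does CLT apply?</h2>
<p>Yes. Each callback is a Bernoulli (0/1) outcome, so a sample proportion is the mean of many such outcomes, and with samples this large the CLT makes its sampling distribution approximately normal. The test is still set up differently from a test of means, because the standard error is computed from the proportions themselves.</p>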
<p>The conditions we need for inference on proportions are:</p>
    <ul>
        <li><strong>Random:</strong> The data needs to come from a random sample or randomized experiment.</li>
        <li><strong>Normal:</strong> The sampling distribution of p&#770; needs to be approximately normal, which requires at least 10 expected successes and 10 expected failures in each group.</li>
        <li><strong>Independent:</strong> Individual observations need to be independent. If sampling without replacement, the sample size shouldn't be more than 10% of the population.</li>
    </ul>
<p>Because race was assigned to the resumes at random, the Random condition is met.</p>
<p>The 4,870 resumes in the sample appear to be less than 10% of the population of resumes, so the Independent condition is met.</p>
<p>We still need to check the Normal condition.</p>

In [17]:
white_df = data[data.race=='w']
black_df = data[data.race=='b']

# Number of resumes per race
white_resumes = len(white_df.race)
black_resumes = len(black_df.race)

# Number of calls per race (call is 0/1, so the sum is the number of callbacks)
white_calls = sum(white_df.call)
black_calls = sum(black_df.call)

# Sample proportions (the mean of call within each group)
white_sample_proportions = white_calls / white_resumes
black_sample_proportions = black_calls / black_resumes

print('White Sample Proportions:: ', white_sample_proportions)
print('Black Sample Proportions:: ', black_sample_proportions)

White Sample Proportions::  0.09650924024640657
Black Sample Proportions::  0.06447638603696099


In [18]:
# Pooled Proportions
pooled_proportions = (white_calls + black_calls) / (white_resumes + black_resumes)

# We use the pooled proportion as the rate for calculating the expected successes and failures
print('Pooled Proportions:: ', pooled_proportions)

Pooled Proportions::  0.08049281314168377


In [19]:
# Now let's check whether the expected successes and failures for each group are at least 10
success_white_resumes = white_resumes * pooled_proportions
failure_white_resumes = white_resumes * (1 - pooled_proportions)

success_black_resumes = black_resumes * pooled_proportions
failure_black_resumes = black_resumes * (1 - pooled_proportions)

print('Success White Resumes:: ', success_white_resumes)
print('Success Black Resumes:: ', success_black_resumes)
print('Failure White Resumes:: ', failure_white_resumes)
print('Failure Black Resumes:: ', failure_black_resumes)

Success White Resumes::  195.99999999999997
Success Black Resumes::  195.99999999999997
Failure White Resumes::  2239.0
Failure Black Resumes::  2239.0


<p>All of these values are larger than 10, so the Normal condition is met.</p>

<h2>2.  What are the null and alternate hypotheses?</h2>
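<p>H<sub>0</sub>: p<sub>w</sub> = p<sub>b</sub> (the callback rate is the same for white-sounding and black-sounding names)</p>
<p>H<sub>a</sub>: p<sub>w</sub> &ne; p<sub>b</sub> (the callback rates differ)</p>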

<h2>3.  Compute margin of error, confidence interval, and p-value.</h2>
<p>Margin of Error</p>
<p>ME = Z<sup>*</sup> sqrt((p<sub>1</sub>(1-p<sub>1</sub>)/n<sub>1</sub>) + (p<sub>2</sub>(1-p<sub>2</sub>)/n<sub>2</sub>))</p>
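
For reference, the hypotheses, the test statistic, and the margin of error can be written out in full. This is a sketch that assumes the standard two-proportion z procedure. The symbols are notation introduced here: $\hat{p}_w$ and $\hat{p}_b$ are the sample proportions (p<sub>1</sub> and p<sub>2</sub> in the formula above), $x_w$ and $x_b$ the callback counts, $n_w$ and $n_b$ the resume counts, and $\hat{p}$ the pooled proportion.

$$
\begin{aligned}
H_0 &: p_w - p_b = 0, \qquad H_a : p_w - p_b \neq 0 \\
\hat{p} &= \frac{x_w + x_b}{n_w + n_b} \\
z &= \frac{\hat{p}_w - \hat{p}_b}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_w} + \dfrac{1}{n_b}\right)}} \\
ME &= z^{*} \sqrt{\frac{\hat{p}_w(1-\hat{p}_w)}{n_w} + \frac{\hat{p}_b(1-\hat{p}_b)}{n_b}}
\end{aligned}
$$

Note that $z^{*}$ is the critical value for the chosen confidence level (about 1.96 for 95%), which is a different quantity from the observed test statistic $z$. The pooled proportion belongs in the test statistic because the null hypothesis assumes one common rate, while the confidence interval uses the two separate sample proportions.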


In [22]:
# Standard error of the difference in proportions under the null hypothesis.
# The null hypothesis says the two proportions are equal, so the pooled proportion stands in for both p1 and p2.

phat_se_diff = np.sqrt(((pooled_proportions*(1-pooled_proportions))/white_resumes)+ ((pooled_proportions*(1-pooled_proportions))/black_resumes))
phat_se_diff

0.007796894036170457

<p>Next we compute the z statistic, which measures how many standard errors the observed difference lies from the null value of 0.</p>

In [23]:
# Under the null hypothesis the expected difference is 0, so z is the observed difference divided by its standard error
z_value = (white_sample_proportions - black_sample_proportions)/phat_se_diff
z_value

4.108412152434346
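
As a cross-check on the hand-computed z statistic, the same test can be run as a chi-square test of independence on the 2x2 table of counts. This is a minimal sketch, not part of the original analysis: the counts 235 and 157 are the callbacks implied by the sample proportions above, and the equivalence relies on turning off the continuity correction, in which case the chi-square statistic equals the square of the pooled z.

```python
import numpy as np
from scipy import stats

# 2x2 table of counts implied by the notebook's output:
# rows = white-sounding, black-sounding; columns = callback, no callback.
table = np.array([[235, 2435 - 235],
                  [157, 2435 - 157]])

# correction=False disables Yates' continuity correction, so chi2 == z**2
# for the pooled two-proportion z-test.
chi2, p_value, dof, expected = stats.chi2_contingency(table, correction=False)

print('sqrt(chi2):', np.sqrt(chi2))
print('p-value:', p_value)
```

If the square root of the chi-square statistic matches the z value above, the manual calculation and the library routine agree.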

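In [24]:
# The margin of error uses the critical value z* for the chosen confidence level (95% here)
# and the unpooled standard error, not the observed z statistic.
z_critical = stats.norm.ppf(0.975)
se_unpooled = np.sqrt((white_sample_proportions*(1-white_sample_proportions))/white_resumes + (black_sample_proportions*(1-black_sample_proportions))/black_resumes)
margin_of_error = z_critical * se_unpooled
print('Margin of error:: ', "{:.6f}".format(margin_of_error))

Margin of error::  0.015255

Our margin of error at the 95% confidence level is about 0.015.

Now calculate the lower and upper bounds of the confidence interval for the difference in callback rates.

In [35]:
diff_proportions = white_sample_proportions - black_sample_proportions
lower_confidence_interval = diff_proportions - margin_of_error
upper_confidence_interval = diff_proportions + margin_of_error
print('The confidence interval is ',"{:10.6f}".format(lower_confidence_interval),' ',"{:10.6f}".format(upper_confidence_interval))

The confidence interval is    0.016778     0.047288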
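
Question 3 also asks for a bootstrapping approach. The sketch below is one way to get a bootstrap confidence interval for the difference in callback rates; it is an illustration under stated assumptions rather than the original analysis. The 0/1 arrays are rebuilt from the counts implied by the notebook (235 and 157 callbacks out of 2435 resumes each), and the seed and the number of replicates are arbitrary choices.

```python
import numpy as np

# Counts implied by the notebook's output: 235 of 2435 white-sounding and
# 157 of 2435 black-sounding resumes received a callback.
n_w, calls_w = 2435, 235
n_b, calls_b = 2435, 157

# Rebuild the 0/1 callback arrays from the counts (order does not matter).
white = np.concatenate([np.ones(calls_w), np.zeros(n_w - calls_w)])
black = np.concatenate([np.ones(calls_b), np.zeros(n_b - calls_b)])

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
n_reps = 10_000                  # number of bootstrap replicates (assumed)

boot_diffs = np.empty(n_reps)
for i in range(n_reps):
    # Resample each group with replacement, keeping its original size.
    boot_w = rng.choice(white, size=n_w, replace=True)
    boot_b = rng.choice(black, size=n_b, replace=True)
    boot_diffs[i] = boot_w.mean() - boot_b.mean()

# 95% percentile interval and the corresponding margin of error.
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])
boot_margin = (ci_high - ci_low) / 2

print('Bootstrap 95% CI:', ci_low, ci_high)
print('Bootstrap margin of error:', boot_margin)
```

The percentile interval makes no normality assumption, so it is a useful comparison for the frequentist interval.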


In [36]:
# Calculate the p-value
p_value = stats.norm.sf(abs(z_value))*2
print('p-value:: ',"{:10.6f}".format(p_value))

p-value::    0.000040
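
A resampling counterpart to the p-value above is a permutation test. The sketch below shuffles the pooled callback outcomes and re-splits them into two groups, which simulates the null hypothesis that the name on the resume has no effect. As before, the arrays are rebuilt from the counts implied by the notebook, and the seed and number of permutations are assumptions made for illustration.

```python
import numpy as np

# Counts implied by the notebook's output.
n_w, calls_w = 2435, 235
n_b, calls_b = 2435, 157

white = np.concatenate([np.ones(calls_w), np.zeros(n_w - calls_w)])
black = np.concatenate([np.ones(calls_b), np.zeros(n_b - calls_b)])
observed_diff = white.mean() - black.mean()

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
n_reps = 10_000                  # number of permutations (assumed)

pooled = np.concatenate([white, black])
perm_diffs = np.empty(n_reps)
for i in range(n_reps):
    # Shuffle the outcomes, then split them back into two groups of the
    # original sizes; any difference now arises by chance alone.
    shuffled = rng.permutation(pooled)
    perm_diffs[i] = shuffled[:n_w].mean() - shuffled[n_w:].mean()

# Two-sided p-value: share of permuted differences at least as extreme
# as the observed one.
p_value_perm = np.mean(np.abs(perm_diffs) >= abs(observed_diff))

print('Observed difference:', observed_diff)
print('Permutation p-value:', p_value_perm)
```

With a finite number of permutations the estimated p-value cannot resolve values much smaller than 1 / `n_reps`, so a result of 0 should be read as "below the resolution of this simulation".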


<div class="span5 alert alert-success">
<p>Answers to Q4 and Q5</p>
</div>

<h2>4.  Write a story describing the statistical significance in the context of the original problem.</h2>

The original problem focuses on racial discrimination against black-sounding names on resumes. The hypothesis was that discrimination causes applicants with black-sounding names to get fewer calls than otherwise identical applicants with white-sounding names.

The callback rates did differ: resumes with white-sounding names received a call about 9.7% of the time, compared with about 6.4% for resumes with black-sounding names.

I then calculated the p-value for the null hypothesis that there is no difference between the two callback rates. The p-value of 0.00004 is far below 0.05, so we can reject the null hypothesis and conclude that the difference in callback rates is statistically significant.

<h2>5.  Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?</h2>

No. My analysis shows that race/name is a statistically significant factor in callback success: the low p-value indicates a real difference between the callback rates for black-sounding and white-sounding names. It does not show that race/name is the most important factor, because no other factor was tested.

The data set has many other features that we did not compare with callback success. I'd suggest analyzing those factors as well, and using Cramér's V to measure the strength of the association between each one and callbacks.
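
A minimal sketch of the Cramér's V calculation suggested above, shown for the race-by-callback table. The helper name `cramers_v` is hypothetical, and the table uses the counts implied by the notebook's output. The helper computes the uncorrected Pearson chi-square and applies the usual formula V = sqrt(chi2 / (n * (min(rows, cols) - 1))).

```python
import numpy as np
from scipy import stats

def cramers_v(table):
    """Cramér's V for a contingency table of counts (hypothetical helper)."""
    table = np.asarray(table)
    # Pearson chi-square without Yates' continuity correction.
    chi2 = stats.chi2_contingency(table, correction=False)[0]
    n = table.sum()
    min_dim = min(table.shape) - 1
    return np.sqrt(chi2 / (n * min_dim))

# Race-by-callback table built from the counts implied by the notebook:
# rows = white-sounding, black-sounding; columns = callback, no callback.
race_by_call = [[235, 2200],
                [157, 2278]]

print("Cramér's V (race vs. call):", cramers_v(race_by_call))
```

For another categorical feature, the same helper could be given a table built with `pd.crosstab(data[column], data.call)`, which would let the strength of each association be compared on a common 0 to 1 scale.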