# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
</div>
****

In [154]:
%matplotlib inline

import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

In [155]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [156]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [157]:
data.head()

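| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Nonprofit |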


<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
</div>

1.) The appropriate test for this problem is a two-sample test of proportions (a two-proportion z-test), which checks whether the callback rate differs between resumes with black-sounding and white-sounding names.

The CLT does apply here. Each callback is a 0/1 (Bernoulli) outcome, so a sample proportion is simply the mean of those outcomes. Because the names were assigned to resumes at random and each group is large, the sampling distribution of each proportion, and of the difference between them, is approximately normal.

2.) Let p1 be the proportion of white-sounding names that received a callback and p2 be the proportion of black-sounding names that received a callback.

    H_0: p1 - p2 = 0

    H_a: p1 - p2 =/= 0
    
    alpha = 0.05
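
The CLT claim in answer 1 can be sanity-checked with the common rule of thumb that each group should contain at least 10 successes and 10 failures. The sketch below is an illustration, not part of the original analysis: `clt_counts` is a hypothetical helper, and the example array is a synthetic stand-in for `b.call`. The 157 callbacks match the count printed earlier, while the group size of 2435 is back-calculated from the printed proportions and should be treated as an assumption.

```python
import numpy as np

def clt_counts(calls, minimum=10):
    """Return (successes, failures, ok) for a 0/1 array of callbacks.

    `ok` is True when both counts reach `minimum`, the usual rule of
    thumb for treating a sample proportion as approximately normal.
    """
    calls = np.asarray(calls)
    successes = int(calls.sum())
    failures = int(len(calls) - successes)
    ok = successes >= minimum and failures >= minimum
    return successes, failures, ok

# Synthetic stand-in for b.call: 157 callbacks out of an assumed 2435 resumes
example = np.concatenate((np.ones(157), np.zeros(2435 - 157)))
print(clt_counts(example))
```

With the real data, the same helper could be called on `w.call` and `b.call` directly.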

In [158]:
w = data[data.race=='w']
b = data[data.race=='b']

In [159]:
# Your solution to Q3 here


In [160]:
p1 = sum(w.call)/len(w)
p2 = sum(b.call)/len(b)
p1, p2

(0.096509240246406572, 0.064476386036960986)

In [161]:
mean_diff = p1 - p2
mean_diff

0.032032854209445585

In [162]:
def permutations(data_1, data_2):
    "Pool the two data sets, shuffle them, and split the result into perm_sample_1 and perm_sample_2 with the original sizes"
    data = np.concatenate((data_1, data_2))
    permuted_data = np.random.permutation(data)
    
    perm_sample_1 = permuted_data[:len(data_1)]
    perm_sample_2 = permuted_data[len(data_1):]
    
    return perm_sample_1, perm_sample_2

In [163]:
def diff_of_prop(data_1, data_2):
    "Calculate the difference between the proportions of two 0/1 samples (used on each replicate)"
    diff = sum(data_1)/len(data_1) - sum(data_2)/len(data_2)
    return diff


In [164]:
def replicants(data_1, data_2, func, size = 1):
    "Take 2 data sets and a function, draw `size` permutations of the pooled data, and return an array of the function's value on each permutation"
    reps = np.empty(size)
    for i in range(size):
        permw, permb = permutations(data_1, data_2)
        
        reps[i] = func(permw, permb)
    return reps

In [180]:
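# These are permutation replicates: shuffling the race labels simulates the null hypothesis
bootreps = replicants(w.call, b.call, diff_of_prop, 10000)
# One-sided p-value; np.mean of the boolean mask avoids integer division,
# which truncates the fraction to 0 under Python 2
p = np.mean(bootreps >= mean_diff)
p

0.0001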

In [181]:
bootreps[bootreps >= mean_diff]

array([ 0.03531828])

In [182]:
conf = np.percentile(bootreps, [2.5,97.5])
conf

array([-0.01560575,  0.01478439])

The replicates above were generated under the null hypothesis (no difference between 'b' and 'w'), so this interval shows where 95% of the differences fall when race has no effect: roughly -0.016 to 0.015. Strictly speaking, these are permutation replicates rather than bootstrap samples. The observed difference of 0.032 lies well outside that interval.
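
The interval above describes the null distribution. A confidence interval for the difference itself can be sketched with a bootstrap that resamples each group with replacement. This is a minimal sketch under stated assumptions: `bootstrap_diff_ci` is a hypothetical helper that is not part of the original notebook, and the arrays are synthetic stand-ins built from callback counts (235 and 157 out of 2435) back-calculated from the printed proportions.

```python
import numpy as np

def bootstrap_diff_ci(calls_1, calls_2, size=10000, level=95, seed=0):
    """Percentile bootstrap CI for mean(calls_1) - mean(calls_2)."""
    rng = np.random.default_rng(seed)
    calls_1 = np.asarray(calls_1, dtype=float)
    calls_2 = np.asarray(calls_2, dtype=float)
    reps = np.empty(size)
    for i in range(size):
        # Resample each group separately, with replacement
        s1 = rng.choice(calls_1, size=len(calls_1), replace=True)
        s2 = rng.choice(calls_2, size=len(calls_2), replace=True)
        reps[i] = s1.mean() - s2.mean()
    tail = (100 - level) / 2
    return np.percentile(reps, [tail, 100 - tail])

# Synthetic stand-ins for w.call and b.call (counts are assumptions)
w_calls = np.concatenate((np.ones(235), np.zeros(2435 - 235)))
b_calls = np.concatenate((np.ones(157), np.zeros(2435 - 157)))

low, high = bootstrap_diff_ci(w_calls, b_calls, size=2000)
print(low, high)
```

If the resulting interval excludes 0, it agrees with rejecting the null hypothesis at the 5% level.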

The p-value, taken as the fraction of permutations with a difference at least as large as the observed one, is approximately 0: only one of the 10,000 replicates reached it.
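
Because the alternative hypothesis is two-sided, the p-value could also count replicates that are as extreme in either direction. The sketch below shows one way to do that. `two_sided_p` is a hypothetical helper, the replicates are synthetic (drawn from a normal distribution with an assumed spread of 0.0078), and adding 1 to the numerator and denominator is a common convention that keeps the estimate from being exactly zero. It is not something the original analysis did.

```python
import numpy as np

def two_sided_p(reps, observed):
    """Fraction of replicates at least as extreme as `observed` in absolute value."""
    reps = np.asarray(reps)
    extreme = np.sum(np.abs(reps) >= abs(observed))
    # The +1 terms count the observed statistic itself as one replicate
    return (extreme + 1) / (len(reps) + 1)

# Synthetic replicates standing in for the permutation results
rng = np.random.default_rng(0)
fake_reps = rng.normal(0, 0.0078, size=10000)
p_two_sided = two_sided_p(fake_reps, 0.032)
print(p_two_sided)
```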

Next, the frequentist approach.

At the 95% confidence level, the two-sided critical value is z = 1.96, so a z-score whose absolute value exceeds 1.96 lets me reject the null hypothesis. For my test, I will assume p1 - p2 = 0, and because the null hypothesis treats the two proportions as equal, I will use the pooled proportion when calculating the standard deviation of the difference.
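
Written out, the test statistic takes the following form. The symbols $x_1, x_2$ (callback counts) and $n_1, n_2$ (group sizes) are introduced here for illustration and do not appear in the original notebook.

$$
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad
z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
$$

When the two groups are the same size, $n_1 = n_2 = n$, the term under the square root reduces to $2\,\hat{p}\,(1-\hat{p})/n$.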

In [168]:
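# Pooled callback proportion under the null hypothesis (p1 = p2)
p_together = (sum(w.call) + sum(b.call))/(len(w) + len(b))
# Standard error of the difference; the factor 2/len(w) assumes both groups are the same size
std_diff = np.sqrt(2*(p_together)*(1-p_together)/len(w))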
z = (mean_diff - 0)/std_diff
z

4.1084121524343464

In [169]:
p = stats.norm.sf(abs(z))*2
p

3.9838868375850767e-05

This test corroborates the resampling result. The p-value is extremely low, so a difference as large as the one observed would be very unlikely if callbacks did not depend on the race suggested by the name.
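
Question 3 also asks for a margin of error and a confidence interval. A minimal sketch using the normal approximation follows, with an unpooled standard error because the interval does not assume the null hypothesis. `diff_ci` is a hypothetical helper, and the counts 235, 157 and 2435 are back-calculated from the proportions printed above, so treat them as assumptions rather than values read from the data.

```python
import numpy as np
from scipy import stats

def diff_ci(x1, n1, x2, n2, level=0.95):
    """Margin of error and CI for p1 - p2 using the normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error: each group keeps its own proportion
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = stats.norm.ppf(1 - (1 - level) / 2)
    margin = z_crit * se
    diff = p1 - p2
    return margin, (diff - margin, diff + margin)

# Assumed counts: callbacks and resumes for the 'w' and 'b' groups
margin, (low, high) = diff_ci(235, 2435, 157, 2435)
print(margin, low, high)
```

An interval that lies entirely above 0 is consistent with the z-test rejecting the null hypothesis.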

<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>

4.) Deshawn Miller, a man of Lithuanian and Scottish descent, is applying for jobs and is not having much success. He is finding that in order to get a single callback, he needs to send out around 15 applications.

One day, arriving at the first practice of the basketball league he just joined, he introduces himself as Deshawn, and almost every one of his teammates responds with something along the lines of, "You're white?!? When I heard Deshawn, I automatically thought you were going to be black!" This gets Deshawn thinking. If his teammates assumed, based solely on his name, that he was a black man, then maybe employers did too. And if employers thought he was black, they might be exhibiting a bias against him (whether implicit or explicit).

In order to test his hypothesis, he makes a simple change to his resume: he changes "Deshawn" to "Sean". After this change, he notices that he is consistently getting a callback after only about 10 applications.

Since Deshawn is only one person and cannot really tell how accurate or repeatable this finding is, he runs a study with 1,000 people. He asks them to send out 2 different resumes: one with their actual name, and one with a "black" or "white" version of their name. The results fall in line with what he suspected. Out of the 1,000 applicants, there is a 3% difference in callbacks between the "white" and "black" names, in favor of the white names. Since this is still not a very big number, he runs 10,000 permutations of the data, meaning he shuffles the data and randomly selects half of the resumes to be "white" and half of them to be "black". He then finds the difference in callbacks for each of the 10,000 runs. What he discovers is shocking.

Out of all 10,000 permutations, **only in one instance was there as big a difference in callbacks.** This has led Deshawn to believe that **there is a real difference in callbacks based on whether you have a black-sounding or white-sounding name.**

When running the permutations, 95% of all differences landed between +/- 1.5%, and the real difference was over 3%. Needless to say, he will continue using Sean as his name on his applications moving forward.

5.) Though I concluded that race/name has an effect on the number of callbacks from employers, I did nothing to rule out a stronger effect from any of the other variables. Looking at the table, there are 65 columns that might have an effect on the callback success rate.

The way I would adjust my analysis would be to run the other variables through the functions I wrote and see whether each one makes a difference in callback success. Some variables that might have an effect are military, years of experience, and number of jobs. I would then compare the results with one another, see which variable has the strongest correlation or the lowest p-value, and deem that one the most important of the bunch.
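
The comparison described above could be sketched as follows for binary (0/1) columns. This is an illustration under stated assumptions, not the original analysis: `perm_p_value` and `compare_binary_columns` are hypothetical helpers, the data frame is synthetic, and non-binary columns such as years of experience would first need to be split into two groups. A small p-value shows that an effect is detectable, not how large it is, so the difference in callback rate is reported alongside it.

```python
import numpy as np
import pandas as pd

def perm_p_value(calls_1, calls_2, size=2000, seed=0):
    """Two-sided permutation p-value for a difference in callback rates."""
    rng = np.random.default_rng(seed)
    observed = np.mean(calls_1) - np.mean(calls_2)
    pooled = np.concatenate((calls_1, calls_2))
    n1 = len(calls_1)
    count = 0
    for _ in range(size):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n1].mean() - shuffled[n1:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # +1 terms keep the estimate away from exactly zero
    return observed, (count + 1) / (size + 1)

def compare_binary_columns(df, columns, outcome="call"):
    """Rank 0/1 columns by the permutation p-value of their callback difference."""
    rows = []
    for col in columns:
        yes = df.loc[df[col] == 1, outcome].to_numpy(dtype=float)
        no = df.loc[df[col] == 0, outcome].to_numpy(dtype=float)
        diff, p = perm_p_value(yes, no)
        rows.append({"column": col, "diff": diff, "p_value": p})
    return pd.DataFrame(rows).sort_values("p_value").reset_index(drop=True)

# Synthetic data: only 'honors' is built to influence the callback rate
rng = np.random.default_rng(1)
n = 2000
demo = pd.DataFrame({
    "military": rng.integers(0, 2, n),
    "honors": rng.integers(0, 2, n),
})
demo["call"] = (rng.random(n) < 0.05 + 0.10 * demo["honors"]).astype(int)

result = compare_binary_columns(demo, ["military", "honors"])
print(result)
```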
