# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names and white-sounding names
print("black-sounding names call backs: {}".format(sum(data[data.race=='b'].call)))
print("white-sounding names call backs: {}".format(sum(data[data.race=='w'].call)))
print("total call backs: {}".format(sum(data.call)))

black-sounding names call backs: 157.0
white-sounding names call backs: 235.0
total call backs: 392.0


In [4]:
data.head()

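| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |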


In [5]:
#find total number of black-sounding and white-sounding names in the dataset.
print(data.race.value_counts())

w    2435
b    2435
Name: race, dtype: int64





### 1. What test is appropriate for this problem? Does CLT apply?
### 2. What are the null and alternate hypotheses?
* Both groups are large (2435 resumes each) and the race labels were assigned at random, so the observations can be treated as independent and the CLT applies. Each group also has well over 10 callbacks and 10 non-callbacks (157 and 2278 for black-sounding names, 235 and 2200 for white-sounding names), so the sampling distribution of each callback rate is close to normal. Because there are two groups to compare, a two-sample test is appropriate.
    * I will use a two-sample t-test, since only the sample variances are known, not the population variance.
    * The null hypothesis is that the mean callback rate is the same for both groups.
    * The alternate hypothesis is that the mean callback rates of the two groups differ.
    
    
* In addition, I will run a two-sample permutation test to check whether the difference in callbacks between white-sounding and black-sounding names is statistically significant.
* The test is set up as follows:
    * The null hypothesis is that the mean callback rate is the same for both groups.
    * I will disregard race by permuting the pooled callback data and randomly assigning half to the white-sounding name group and half to the black-sounding name group.
    * From these permuted groups I will calculate both means, and then the test statistic, which is the difference between the means.
    * I will repeat this process for 10000 iterations.
    * Finally, I will calculate the p-value as the number of simulated cases with a difference at least as extreme as the empirical test statistic, divided by the number of cases (10000).
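
The CLT claim in Q1 can be checked with the usual success/failure rule of thumb (at least 10 callbacks and 10 non-callbacks in each group). The snippet below is only an illustrative sketch: it hard-codes the counts printed earlier in the notebook, and the threshold of 10 and the helper name `clt_ok` are assumptions, not part of the original analysis.

```python
# Counts taken from the notebook output above (callbacks and group sizes).
groups = {
    "b": {"calls": 157, "n": 2435},
    "w": {"calls": 235, "n": 2435},
}

def clt_ok(calls, n, threshold=10):
    """Hypothetical helper: success/failure rule of thumb for a proportion."""
    return calls >= threshold and (n - calls) >= threshold

for race, g in groups.items():
    rate = g["calls"] / g["n"]
    # Print the group label, its callback rate, and whether the rule is met
    print(race, round(rate, 3), clt_ok(g["calls"], g["n"]))
```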
    
***

### 3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
* First, I will try the frequentist approach.
* Then, I will try the permutation approach.

In [6]:
#Separate out the two groups into sub dataframes
w = data[data.race=='w']
b = data[data.race=='b']

#### Frequentist Approach

In [7]:
#gather needed data for calculating conf int, margin of error, and p-value
b_mean = np.mean(b.call)
w_mean = np.mean(w.call)
b_var = np.var(b.call)
w_var = np.var(w.call)
mean_diff = w_mean - b_mean
std_error = np.sqrt(w_var/w.call.size + b_var/b.call.size)

print("The mean for the black-sounding name call backs is {}.".format(round(b_mean,3)))
print("The variance for the black-sounding name call backs is {}.".format(round(b_var,3)))
print("The mean for the white-sounding name call backs is {}.".format(round(w_mean,3)))
print("The variance for the white-sounding name call backs is {}.".format(round(w_var,3)))
print('')
print("The difference between means is {}.".format(round(mean_diff,3)))

#calculate margin of error from the standard error
#(the factor 2 is a rounded stand-in for 1.96, the 95% critical value of the normal distribution)
moe_95 = 2 * std_error
print("The average margin of error is {}.".format(round(moe_95,3)))
print("The 95% confidence interval is {} to {}.".format(round(mean_diff-moe_95,3),round(mean_diff+moe_95,3)))

#calculate the t score and p value from the 2 sample t test
t_score, p_val = stats.ttest_ind(w.call,b.call,equal_var=False)
print('')
print("The t score is {}, and the p-value is {}.".format(round(t_score,3),p_val))

The mean for the black-sounding name call backs is 0.064.
The variance for the black-sounding name call backs is 0.06.
The mean for the white-sounding name call backs is 0.097.
The variance for the white-sounding name call backs is 0.087.

The difference between means is 0.032.
The average margin of error is 0.016.
The 95% confidence interval is 0.016 to 0.048.

The t score is 4.115, and the p-value is 3.942941513645935e-05.
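
Because `call` is a 0/1 variable, the same comparison can also be written as a two-proportion z-test. The following is a minimal sketch under stated assumptions, not the method used in the cell above: it works only from the counts printed earlier, uses the textbook pooled standard error for the test statistic and the unpooled standard error with the 1.96 critical value for the interval, and the helper name `two_proportion_ztest` is hypothetical.

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Hypothetical helper: two-sided z-test for a difference in proportions."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Pooled proportion under the null hypothesis of equal callback rates
    p_pool = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Unpooled standard error for the 95% confidence interval
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    moe = 1.96 * se
    return z, p_value, (diff - moe, diff + moe)

# White-sounding names: 235 callbacks of 2435; black-sounding names: 157 of 2435
z, p_value, ci = two_proportion_ztest(235, 2435, 157, 2435)
print("z = {:.3f}, p = {:.2e}".format(z, p_value))
print("95% CI: {:.3f} to {:.3f}".format(*ci))
```

With 1.96 in place of the rounded factor of 2, the interval comes out slightly narrower than the one printed above.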


#### Permutation Approach

In [8]:
#setting up test parameters and functions
emp_diff = w_mean - b_mean
def diff_of_means(data_1, data_2):
    """Generates the difference between means of two sets of data."""
    return np.mean(data_1) - np.mean(data_2)

def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""

    # Concatenate the data sets: data
    data = np.concatenate((data1,data2))

    # Permute the concatenated array: permuted_data
    permuted_data = np.random.permutation(data)

    # Split the permuted array into two: perm_sample_1, perm_sample_2
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]

    return perm_sample_1, perm_sample_2
    
def draw_perm_reps(data_1, data_2, func, size=1):
    """Generate multiple permutation replicates."""

    # Initialize array of replicates: perm_replicates
    perm_replicates = np.empty(size)

    for i in range(size):
        # Generate permutation sample
        perm_sample_1, perm_sample_2 = permutation_sample(data_1,data_2)

        # Compute the test statistic
        perm_replicates[i] = func(perm_sample_1,perm_sample_2)

    return perm_replicates

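#create the permutation replicates
perm_reps = draw_perm_reps(w.call,b.call,diff_of_means,10000)
#central 95% range of the difference under the null hypothesis (centered on 0)
low, high = np.percentile(perm_reps,[2.5,97.5])
moe_perm_95 = (high-low)/2
#one-sided p-value: fraction of replicates at least as large as the observed difference
p_val = np.sum(perm_reps >= emp_diff)/perm_reps.size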

print("The 95% confidence interval is {} to {}.".format(round(low,3),round(high,3)))
print("The margin of error is {}.".format(round(moe_perm_95,3)))
print("The p-value is {}.".format(p_val))

The 95% confidence interval is -0.016 to 0.016.
The margin of error is 0.016.
The p-value is 0.0.
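
The exercise also asks for a bootstrap estimate. Unlike the permutation interval, which is centered on 0, a bootstrap interval resamples each group separately and so is centered near the observed difference. The sketch below is an illustration only: it rebuilds the 0/1 callback arrays from the counts printed earlier instead of reading the data file, and the function name `bootstrap_diff_ci`, the seed, and the number of replicates are all assumptions.

```python
import numpy as np

def bootstrap_diff_ci(data_1, data_2, size=10000, seed=0):
    """Hypothetical helper: percentile bootstrap CI for a difference of means."""
    rng = np.random.default_rng(seed)
    reps = np.empty(size)
    for i in range(size):
        # Resample each group with replacement, keeping its original size
        sample_1 = rng.choice(data_1, size=len(data_1), replace=True)
        sample_2 = rng.choice(data_2, size=len(data_2), replace=True)
        reps[i] = np.mean(sample_1) - np.mean(sample_2)
    # The 2.5th and 97.5th percentiles bound the central 95% of the replicates
    low, high = np.percentile(reps, [2.5, 97.5])
    return low, high

# Rebuild the 0/1 callback arrays from the counts reported above
w_call = np.array([1] * 235 + [0] * (2435 - 235))
b_call = np.array([1] * 157 + [0] * (2435 - 157))

low, high = bootstrap_diff_ci(w_call, b_call)
print("Bootstrap 95% CI: {:.3f} to {:.3f}".format(low, high))
print("Bootstrap margin of error: {:.3f}".format((high - low) / 2))
```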




### 4. Write a story describing the statistical significance in the context of the original problem.
* Both tests lead to the same conclusion: the null hypothesis can be rejected. The t-test gives a p-value of about 3.9e-05, and none of the 10000 permutation replicates produced a difference as large as the observed 0.032, so the permutation p-value is below 1/10000. The margins of error agree (0.016). The two intervals differ because the permutation test builds its distribution under the assumption of no difference, so its interval is centered on 0 and describes the differences expected by chance alone, whereas the frequentist interval is centered on the observed difference. The observed difference of 0.032 lies well outside the permutation interval of -0.016 to 0.016.
* Since both tests are statistically significant, we reject the null hypothesis that white-sounding and black-sounding names receive callbacks at the same rate. Resumes with white-sounding names were called back about 9.7% of the time, versus about 6.4% for black-sounding names, a gap of roughly 3.2 percentage points. Because the names were assigned at random to otherwise identical resumes, this gap points to racial discrimination in callbacks, within the bounds of this simple test.

***

### 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?
* No, the result does not mean that race is the most important factor in callback success. It only shows that race is a statistically significant factor. Other factors recorded in the study, such as job experience and education, were not considered alongside race. To compare their importance, a supervised learning model (logistic regression, SVM, kNN, etc.), possibly combined with principal component analysis, could be fit to estimate how much weight each factor carries in predicting callbacks.
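
As a rough illustration of the logistic regression idea, the sketch below fits a model with two predictors and compares their coefficients. Everything in it is an assumption made for demonstration: the data are SYNTHETIC (generated from made-up effect sizes, not taken from the study), the helper `fit_logistic` is a hypothetical NumPy-only Newton's method fit, and the variable names merely echo columns of the dataset.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Hypothetical helper: logistic regression fit by Newton's method (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # negative Hessian of the log-likelihood
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# SYNTHETIC data for illustration only; these are NOT the study's values.
rng = np.random.default_rng(0)
n = 20000
is_white = rng.integers(0, 2, n)     # 1 = white-sounding name
yearsexp = rng.integers(1, 20, n)    # made-up years of experience
X = np.column_stack([np.ones(n), is_white, yearsexp])
true_beta = np.array([-3.0, 0.4, 0.05])  # made-up intercept and effect sizes
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta_hat = fit_logistic(X, y)
for name, coef in zip(["intercept", "is_white", "yearsexp"], beta_hat):
    print("{}: {:.3f}".format(name, coef))
```

On the real data, the design matrix would instead hold columns such as `education` and `yearsexp` from `data`, and a fair comparison of coefficients would require putting the predictors on a common scale first.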