# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning black-sounding or white-sounding names to otherwise identical résumés and observing the impact on employers' requests for interviews.

### Data
In the dataset provided, each row represents a résumé. The 'race' column has two values, 'b' and 'w', indicating a black-sounding or white-sounding name. The 'call' column has two values, 1 and 0, indicating whether or not the résumé received a callback from the employer.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
</div>
****

In [22]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [11]:
# number of callbacks for black-sounding names
print(sum(data[data.race=='b'].call))

# number of callbacks for white-sounding names
print(sum(data[data.race=='w'].call))

157.0
235.0
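
The two counts above can also be summarized in one step with `groupby`. On the real data this would be `data.groupby('race')['call'].agg(['sum', 'count', 'mean'])`; to keep the sketch runnable without the `.dta` file, it builds a hypothetical stand-in frame, `toy`, from the printed callback counts and the 2,435 résumés per group reported later in the notebook.

```python
import pandas as pd

# Hypothetical stand-in for `data`, rebuilt from the printed counts (2,435 résumés per group)
n_per_group = 2435
toy = pd.DataFrame({
    "race": ["b"] * n_per_group + ["w"] * n_per_group,
    "call": [1.0] * 157 + [0.0] * (n_per_group - 157)
    + [1.0] * 235 + [0.0] * (n_per_group - 235),
})

# One row per group: callbacks (sum), résumés (count), and callback rate (mean)
summary = toy.groupby("race")["call"].agg(["sum", "count", "mean"])
print(summary.round(3))
```

Because `call` is 0/1, the `mean` column is the callback rate for each group.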


In [7]:
data.head()

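|   | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|----|----|-----------|--------|----------|--------|-----------|----------|----------|---------------|-----|---------|--------|-------|----------|----------|-------|------------|------------|---------|-----------|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |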


<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
</div>

**Question 1**: What test is appropriate for this problem? Does CLT apply?

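> The question is whether white-sounding and black-sounding résumés have different callback rates. Each group is a separate sample with a binary outcome (callback or no callback), so this calls for a two-sample test for a difference in proportions, i.e. a two-proportion z-test. At these sample sizes, a two-sample t-test on the 0/1 outcomes gives nearly the same result.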

> The Central Limit Theorem applies to this question because the following conditions hold: 

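>> 1. Independence: It is reasonable to assume that each sample is less than 10% of the population of black- and white-sounding résumés.
>> 2. Randomness: The experiment randomly assigned black- or white-sounding names to otherwise identical résumés.
>> 3. Sample size: Each group has 2,435 résumés (well over 30), and each has at least 10 callbacks and 10 non-callbacks (157 and 235 callbacks), so the normal approximation for a proportion is reasonable.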
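
For proportions, the sample-size condition is commonly checked as at least 10 successes and 10 failures in each group. A minimal sketch of that check using the callback counts and group sizes printed in this notebook; `success_failure_ok` is a hypothetical helper and the threshold of 10 is a rule of thumb, not a fixed standard.

```python
def success_failure_ok(successes, n, threshold=10):
    """Rule of thumb for the normal approximation: enough successes AND failures."""
    return successes >= threshold and (n - successes) >= threshold

# (callbacks, résumés) per group, taken from the outputs printed in this notebook
groups = {"black-sounding": (157, 2435), "white-sounding": (235, 2435)}

checks = {name: success_failure_ok(x, n) for name, (x, n) in groups.items()}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```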


**Question 2**: What are the null and alternate hypotheses?

> * H0: There is no difference in the proportion of callbacks between black-sounding names and white-sounding names.
> * H1: The proportion of callbacks to black-sounding names and white-sounding names is different.
> * Alpha = 0.05
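
In symbols, writing $p_w$ and $p_b$ for the population callback rates of white- and black-sounding résumés, $x$ for callback counts and $n$ for group sizes, the hypotheses and the standard pooled two-proportion $z$ statistic are:

$$
H_0: p_w = p_b \qquad \text{vs.} \qquad H_1: p_w \neq p_b
$$

$$
\hat p_w = \frac{x_w}{n_w}, \quad \hat p_b = \frac{x_b}{n_b}, \quad \hat p = \frac{x_w + x_b}{n_w + n_b}, \quad
z = \frac{\hat p_w - \hat p_b}{\sqrt{\hat p\,(1-\hat p)\left(\frac{1}{n_w} + \frac{1}{n_b}\right)}}
$$

With the counts printed in this notebook ($x_w = 235$, $x_b = 157$, $n_w = n_b = 2435$), $\hat p = 392/4870 \approx 0.0805$ and $z \approx 4.1$, close to the t-statistic reported by the t-test.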

In [17]:
w = data[data.race=='w']
b = data[data.race=='b']

**Question 3**: Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.

> **Frequentist Approach**

In [56]:
# Calculate the mean and size of each sample
# Because 'call' is 0 or 1, the mean is also the callback rate (share of successes).

b_mean = np.mean(b.call) 
w_mean = np.mean(w.call)

b_count = len(b.call)
w_count = len(w.call)


print('Mean Black-Sounding Resumes Callback: ',round(b_mean,3))
print('Mean White-Sounding Resumes Callback: ',round(w_mean,3))

print('Black Trials:',b_count, 'White Trials:',w_count)

Mean Black-Sounding Resumes Callback:  0.064
Mean White-Sounding Resumes Callback:  0.097
Black Trials: 2435 White Trials: 2435


In [46]:
# Standard error of each callback rate (the margin of error is the critical t-value times this)

print('The standard error black sample: ', round(stats.sem(b.call),3))

print('The standard error white sample: ', round(stats.sem(w.call),3))


# 95% confidence interval

# The confidence level is passed positionally: the keyword `alpha` was renamed `confidence` in recent SciPy releases
b_interval = stats.t.interval(0.95, df = b_count-1, loc = b_mean, scale=stats.sem(b.call))

print('Black Sample Confidence Interval: ', round(b_interval[0],3), '-',round(b_interval[1],3))

w_interval = stats.t.interval(0.95, df = w_count-1, loc = w_mean, scale=stats.sem(w.call))

print('White Sample Confidence Interval: ', round(w_interval[0],3), '-',round(w_interval[1],3))

#Hypothesis testing using t-test approach because population parameters are unknown. 

ttest_stat, pvalue_ttest = stats.ttest_ind(w.call,b.call)

print('T-statistic: ', ttest_stat)
print('P-value T-Test: ', pvalue_ttest)

# Reject the null hypothesis that the callback rates (proportions) for black- and white-sounding résumés are equal (p < 0.05).

The standard error black sample:  0.005
The standard error white sample:  0.006
Black Sample Confidence Interval:  0.055 - 0.074
White Sample Confidence Interval:  0.085 - 0.108
T-statistic:  4.114705290861751
P-value T-Test:  3.940802103128886e-05
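
The cell above reports standard errors but not the margin of error the question asks for. A minimal sketch of the frequentist calculation for the *difference* in callback rates, assuming the counts printed earlier (235 and 157 callbacks out of 2,435 résumés each): the margin of error is the normal critical value times the unpooled standard error, and the p-value comes from a two-sided, pooled two-proportion z-test. `two_proportion_summary` is a hypothetical helper, not part of SciPy.

```python
import numpy as np
from scipy import stats

def two_proportion_summary(x1, n1, x2, n2, confidence=0.95):
    """Difference p1 - p2 with a Wald CI, plus a pooled two-sided z-test."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Unpooled standard error, used for the confidence interval
    se_diff = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = stats.norm.ppf(1 - (1 - confidence) / 2)
    margin = z_crit * se_diff
    # Pooled standard error, valid under H0: p1 == p2
    p_pool = (x1 + x2) / (n1 + n2)
    se_pool = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se_pool
    p_value = 2 * stats.norm.sf(abs(z))
    return diff, margin, (diff - margin, diff + margin), z, p_value

# White-sounding vs black-sounding: callbacks and résumés per group
diff, margin, ci, z, p_value = two_proportion_summary(235, 2435, 157, 2435)
print(f"difference = {diff:.4f}, margin of error = {margin:.4f}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f}), z = {z:.3f}, p = {p_value:.2e}")
```

If statsmodels is installed, `statsmodels.stats.proportion.proportions_ztest([235, 157], [2435, 2435])` should return the same pooled z statistic, which makes a convenient cross-check.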


> **Bootstrap Approach**

In [47]:
# Define bootstrap function: `size` replicates of `function` applied to resamples of `v`

def bs_sample(v, function, size = 10000):
    
    # Initialize replicates
    replicates = np.empty(size)
    
    # Create replicates by resampling v with replacement
    for i in range(size):
        replicates[i] = function(np.random.choice(v, size=len(v)))

    return replicates

In [49]:
# Hypothesis testing using bootstrap replicates approach.

# Seed once for reproducibility (seeding inside bs_sample would give both groups identical resampling indices)
np.random.seed(42)

# Shift both samples to the pooled callback rate so that H0 (equal rates) holds
w_shifted = w.call - np.mean(w.call) + np.mean(data.call)
b_shifted = b.call - np.mean(b.call) + np.mean(data.call)

# Compute 10,000 bootstrap replicates from shifted arrays
w_bs = bs_sample(w_shifted, np.mean, size=10000)
b_bs = bs_sample(b_shifted, np.mean, size=10000)

# Replicates of the difference of means under H0

bs_replicates_diff = w_bs - b_bs

sample_diff_means = np.mean(w.call) - np.mean(b.call)

# Two-sided p-value: share of replicates at least as extreme as the observed difference

pvalue_race_diff = np.sum(np.abs(bs_replicates_diff) >= np.abs(sample_diff_means)) / len(bs_replicates_diff)

print('P-value bootstrap =', pvalue_race_diff)

# Reject the null hypothesis that the callback rates for black- and white-sounding résumés are equal.
# A p-value of 0.0 means no replicate was as extreme, i.e. p < 1/10,000.

P-value bootstrap = 0.0
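
An alternative to the shifted-bootstrap test, swapped in here as a cross-check, is a permutation test: under H0 the race labels are exchangeable, so shuffling them and recomputing the difference in callback rates traces out the null distribution. This is a minimal sketch assuming 0/1 arrays rebuilt from the printed counts (on the real data, `w.call` and `b.call`); `permutation_pvalue` is a hypothetical helper. Adding 1 to the numerator and denominator is a common convention that avoids reporting a p-value of exactly 0, which is how a result like `P-value bootstrap = 0.0` is better read: smaller than about 1 in 10,000.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=10_000, seed=42):
    """Two-sided permutation test for a difference in means (here: callback rates)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        # Shuffle the labels by permuting the pooled outcomes, then split back into groups
        perm = rng.permutation(pooled)
        diff = perm[: len(x)].mean() - perm[len(x):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the estimate away from exactly zero
    return observed, (count + 1) / (n_perm + 1)

# Stand-in 0/1 outcomes rebuilt from the printed counts; on the real data use w.call and b.call
white = np.concatenate([np.ones(235), np.zeros(2435 - 235)])
black = np.concatenate([np.ones(157), np.zeros(2435 - 157)])

observed, p_perm = permutation_pvalue(white, black)
print(f"observed difference = {observed:.4f}, permutation p-value = {p_perm:.5f}")
```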


In [55]:
# Mean and number of bootstrap replicates

# The replicates come from the shifted (null) arrays, so both means sit at about the pooled callback rate by construction.

b_mean_bs = np.mean(b_bs) 
w_mean_bs = np.mean(w_bs)

b_count_bs = len(b_bs)
w_count_bs = len(w_bs)


print('Bootstrap Mean Black-Sounding Resumes Callback: ',round(b_mean_bs,3))

print('Bootstrap Mean White-Sounding Resumes Callback: ',round(w_mean_bs,3))

print('Black replicates:',b_count_bs, 'White replicates:',w_count_bs)

Bootstrap Mean Black-Sounding Resumes Callback:  0.081
Bootstrap Mean White-Sounding Resumes Callback:  0.081
Black replicates: 10000 White replicates: 10000


In [53]:
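# Spread of the null bootstrap replicates
# Note: stats.sem() of the replicates is the standard error of their *average*, not of a callback rate;
# the bootstrap standard error of a callback rate would be the standard deviation of its replicates.

print('The standard error black sample: ', round(stats.sem(b_bs),3))

print('The standard error white sample: ', round(stats.sem(w_bs),3))


# 95% t-interval around the average replicate; because the replicates are null-shifted,
# this is not a confidence interval for either group's callback rate.

b_interval_bs = stats.t.interval(0.95, df = b_count_bs-1, loc = b_mean_bs, scale=stats.sem(b_bs))

print('Black Bootstrap Sample Confidence Interval: ', round(b_interval_bs[0],3), '-',round(b_interval_bs[1],3))

w_interval_bs = stats.t.interval(0.95, df = w_count_bs-1, loc = w_mean_bs, scale=stats.sem(w_bs))

print('White Bootstrap Sample Confidence Interval: ', round(w_interval_bs[0],3), '-',round(w_interval_bs[1],3))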

The standard error black sample:  0.0
The standard error white sample:  0.0
Black Bootstrap Sample Confidence Interval:  0.08 - 0.081
White Bootstrap Sample Confidence Interval:  0.08 - 0.081
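
The intervals above are built from null-shifted replicates, so they cluster around the pooled rate and say little about each group's callback rate. A percentile bootstrap interval instead resamples the *original* (unshifted) outcomes and takes the 2.5th and 97.5th percentiles of the replicates; the bootstrap standard error is the standard deviation of the replicates. A minimal sketch, assuming 0/1 arrays rebuilt from the printed counts (swap in `w.call` and `b.call` on the real data); `bootstrap_means` is a hypothetical helper.

```python
import numpy as np

def bootstrap_means(values, size=10_000, rng=None):
    """Bootstrap replicates of the mean: resample `values` with replacement `size` times."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    replicates = np.empty(size)
    for i in range(size):
        replicates[i] = rng.choice(values, size=len(values), replace=True).mean()
    return replicates

# Stand-in 0/1 outcomes rebuilt from the printed counts; on the real data use w.call and b.call
white = np.concatenate([np.ones(235), np.zeros(2435 - 235)])
black = np.concatenate([np.ones(157), np.zeros(2435 - 157)])

# One generator shared by both calls, so the two groups get independent resamples
rng = np.random.default_rng(42)
w_reps = bootstrap_means(white, rng=rng)
b_reps = bootstrap_means(black, rng=rng)
diff_reps = w_reps - b_reps

# Percentile intervals: the middle 95% of the replicates
ci_white = np.percentile(w_reps, [2.5, 97.5])
ci_black = np.percentile(b_reps, [2.5, 97.5])
ci_diff = np.percentile(diff_reps, [2.5, 97.5])

# Bootstrap standard error (spread of the replicates) and the matching margin of error
se_diff = np.std(diff_reps, ddof=1)
margin_diff = 1.96 * se_diff

print("white 95% CI:", ci_white.round(3))
print("black 95% CI:", ci_black.round(3))
print("difference 95% CI:", ci_diff.round(4), "margin of error:", round(margin_diff, 4))
```

The resulting intervals should land close to the frequentist ones computed earlier, and `margin_diff` plays the role of the bootstrap margin of error for the difference in callback rates.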


<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>

**Question 4**: Write a story describing the statistical significance in the context of the original problem.

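> * Using both a bootstrap approach and a frequentist approach (t-test), we find that a gap in callback rates as large as the one observed would be highly unlikely if white- and black-sounding résumés were called back at the same rate. The t-test p-value is about 3.9e-05, and the bootstrap p-value is reported as 0.0, meaning below 1 in 10,000 replicates; both are far below alpha = 0.05, so we reject the null hypothesis.

> * In practical terms, white-sounding résumés were called back 9.65% of the time (235 of 2,435) versus 6.45% for black-sounding résumés (157 of 2,435): a gap of about 3.2 percentage points, or roughly 50% more callbacks for otherwise identical résumés.

> * The frequentist 95% confidence intervals for the two callback rates (0.055-0.074 for black-sounding and 0.085-0.108 for white-sounding names) do not overlap, which is consistent with a real difference in the population callback rates. 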


**Question 5**: Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

> * **No, it does not mean it is the most important factor.** It simply establishes that there is a difference between the proportion of callbacks between white-sounding and black-sounding resumes. 

> * To rank factors by importance, we would need to take into account other variables that can affect callback rates, such as years of experience, education, or honors, which are also recorded in the data. 

> * Moreover, judging how important a factor is takes us into the realm of effect sizes and causality, which a significance test alone does not address.

> * I would adjust my analysis as follows: (1) take more factors into account and check whether the difference persists; (2) estimate the effect size of race on callbacks; (3) fit a probit or logit regression model to estimate that effect while controlling for the other factors available in the data. 
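
For step (2), one simple effect-size measure is the odds ratio of a callback for white- versus black-sounding résumés. With race as the only predictor, a logit model's race coefficient equals the log of this odds ratio, so the sketch below reproduces what a one-variable logit would report. `odds_ratio_wald` is a hypothetical helper using the standard Wald interval on the log scale, and the counts are those printed earlier in the notebook.

```python
import numpy as np
from scipy import stats

def odds_ratio_wald(x1, n1, x2, n2, confidence=0.95):
    """Odds ratio of a success in group 1 vs group 2, with a Wald CI on the log scale."""
    a, b = x1, n1 - x1  # group 1: callbacks, no callbacks
    c, d = x2, n2 - x2  # group 2: callbacks, no callbacks
    log_or = np.log((a * d) / (b * c))
    se = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    return np.exp(log_or), (np.exp(log_or - z * se), np.exp(log_or + z * se))

# White-sounding (group 1) vs black-sounding (group 2)
odds_ratio, (ci_low, ci_high) = odds_ratio_wald(235, 2435, 157, 2435)
print(f"odds ratio = {odds_ratio:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Adding covariates would mean fitting the model on `data` itself, for example something along the lines of `statsmodels.formula.api.logit('call ~ C(race) + yearsexp + honors', data=data).fit()` (not run here); the race coefficient would then be a log odds ratio adjusted for those variables.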

# End of Notebook