<h1> Examining Racial Discrimination in the US Job Market </h1>


<h2>Background</h2>

Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

<h2>Data</h2>

In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


<h2> Hypothesis Testing Approach</h2>

This approach to testing the problem consists of four steps: 
 (1) State the hypotheses.
 (2) Formulate an analysis plan.
 - Test method. Use the two-proportion z-test to determine whether the hypothesized difference between population
   proportions differs significantly from the observed sample difference.
 (3) Analyze sample data.
 
    - **Pooled sample proportion** Since the null hypothesis states that P1=P2, we use a pooled sample proportion (p) 
      to compute the standard error of the sampling distribution.

      p = (p1 * n1 + p2 * n2) / (n1 + n2)

      where p1 is the sample proportion from population 1, p2 is the sample proportion from population 2, n1 is the size 
      of sample 1, and n2 is the size of sample 2.
      <br>  <br>  
    - **Standard error** Compute the standard error (SE) of the sampling distribution of the difference between two proportions.
      
      SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] }

      where p is the pooled sample proportion, n1 is the size of sample 1, and n2 is the size of sample 2.
      <br>  <br>  
    - **Test statistic** The test statistic is a z-score (z) defined by the following equation.

      z = (p1 - p2) / SE

      where p1 is the proportion from sample 1, p2 is the proportion from sample 2, and SE is the standard error of the 
      sampling distribution.
      <br>  <br>
      
    - **P-value** The P-value is the probability, assuming the null hypothesis is true, of observing a sample statistic 
      at least as extreme as the test statistic. Since the test statistic is a z-score, use the standard normal 
      distribution (a table, a calculator, or scipy.stats.norm) to find the probability associated with the z-score.
     

 (4) Interpret results.
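
The steps above can be sketched as a small helper. This is a minimal illustration rather than the notebook's own code: the function name `two_proportion_z_test` and the counts passed to it are hypothetical, and it uses `math.erfc` from the standard library for the normal tail probability (scipy's `stats.norm.sf` would serve the same purpose).

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion z-test from success counts and sample sizes."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion: equivalent to (p1 * n1 + p2 * n2) / (n1 + n2)
    p = (x1 + x2) / (n1 + n2)
    # Standard error under H0; the whole bracket (1/n1 + 1/n2) multiplies p * (1 - p)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided normal tail probability: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p, se, z, p_value

# Hypothetical counts, for illustration only (not the resume data)
p, se, z, p_value = two_proportion_z_test(60, 500, 40, 500)
print(f"pooled p = {p:.3f}, SE = {se:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

Working from counts rather than rates avoids rounding the proportions before pooling them.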

In [174]:
import pandas as pd
import numpy as np
from scipy import stats
import math

In [170]:
dataset = pd.read_stata('data/us_job_market_discrimination.dta')

In [34]:
dataset.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [35]:
# number of rows in dataset
len(dataset)

4870

<h2> 1. What test is appropriate for this problem? Does CLT apply? </h2>

The appropriate test for this problem is the **two-proportion z-test**.

The **two-proportion z-test** is appropriate because we need to determine whether the difference between two proportions (the callback rates for black-sounding and white-sounding names) is significant. 

However, to use the **two-proportion z-test**, the following conditions must be met.

- The sampling method for each population is simple random sampling.
- The samples are independent.
- Each sample includes at least 10 successes and 10 failures.
- Each population is at least 20 times as big as its sample.



<h2> Does CLT apply? </h2>

Certain conditions must be met to use the CLT.

● **The samples must be independent**

According to the problem statement, researchers "randomly assign[ed] identical résumés to black-sounding or white-sounding names" and observed the impact on requests for interviews from employers. Because the names were assigned at random, the two groups can be treated as independent samples. 

● **The sample size must be “big enough”**

The “10% Rule” should apply (i.e. each sample must be no larger than 10% of its population) so that the observations can be treated as independent. Each of the black-sounding and white-sounding samples contains 2435 resumes, and we assume this is less than 10% of the corresponding population of candidates in the US job market. Each sample must also contain at least 10 successes and 10 failures, as listed above. 
 
For this problem the CLT conditions appear to be met. 
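
The success/failure part of the sample-size condition can be checked with a few lines of code. The sketch below is illustrative only: the helper name `clt_conditions_ok` is hypothetical, the group size of 2435 is the one stated above, and the callback rates are rounded from the values computed later in this notebook.

```python
def clt_conditions_ok(n, p_hat, min_count=10):
    """Return True if the sample has at least `min_count` expected successes and failures."""
    successes = n * p_hat
    failures = n * (1 - p_hat)
    return successes >= min_count and failures >= min_count

n = 2435  # size of each group, as stated above
# Rounded callback rates taken from the cells further down in this notebook
for label, rate in [("black-sounding", 0.0645), ("white-sounding", 0.0965)]:
    print(label, clt_conditions_ok(n, rate))
```

The 10% rule cannot be checked from the data alone, since it depends on the size of the population the resumes stand in for.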

In [78]:
# number of callbacks for black-sounding names
total_callblk_blk = sum(dataset[dataset.race=='b'].call)
# total number of resumes with black-sounding names
blk_samp_sz = len(dataset[dataset.race=='b'])

# callback rate for black-sounding names
blk_cb_rate = total_callblk_blk / blk_samp_sz
blk_cb_rate

0.064476386036960986

In [81]:
# number of callbacks for white-sounding names
total_callblk_wht = sum(dataset[dataset.race=='w'].call)
# total number of resumes with white-sounding names
wht_samp_sz = len(dataset[dataset.race=='w'])

# callback rate for white-sounding names
wht_cb_rate = total_callblk_wht / wht_samp_sz
wht_cb_rate

0.096509240246406572

The observed callback rates are about 6.4% for black-sounding names and about 9.7% for white-sounding names, a difference of about 3.2 percentage points. Because the two samples are independent (unpaired), the question is whether a difference of this size could plausibly arise by chance alone. The hypothesis test in the following sections addresses that question.

<h2>2. What are the null and alternate hypotheses?</h2>
For our unpaired test, the hypotheses are as follows:
- the **null hypothesis (H0)** is that there is no difference in the callback rates of black and white candidates, and any observed difference is purely due to chance.
- the **alternate hypothesis (H1)** is that the difference in the callback rates of black and white candidates is due to some non-random cause.


<h3>H0: callbk_blk - callbk_wht = 0</h3>  

(there is no difference between the rate of callbacks for black and white candidates, and any observed difference is purely due to chance).

<h3>H1: callbk_blk - callbk_wht != 0</h3>  

(there is a real difference in the rate of callbacks between black and white candidates, which is due to some non-random cause).
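
One way to see what "purely due to chance" means under H0 is a permutation test, which is a different technique from the z-test used in this notebook and is shown only as a cross-check. The sketch below is an assumption-laden illustration: the helper name `permutation_p_value` is hypothetical, and the counts (157 and 235 callbacks out of 2435 resumes each) are those implied by the callback rates reported above.

```python
import random

def permutation_p_value(calls_a, n_a, calls_b, n_b, n_shuffles=5000, seed=0):
    """Share of random relabelings whose rate gap is at least as large as the observed gap."""
    rng = random.Random(seed)
    total_calls = calls_a + calls_b
    total_n = n_a + n_b
    observed = abs(calls_a / n_a - calls_b / n_b)
    extreme = 0
    for _ in range(n_shuffles):
        # Under H0 the callbacks land on resumes at random; indices below n_a belong to group A
        winners = rng.sample(range(total_n), total_calls)
        sim_a = sum(1 for i in winners if i < n_a)
        sim_b = total_calls - sim_a
        if abs(sim_a / n_a - sim_b / n_b) >= observed:
            extreme += 1
    return extreme / n_shuffles

# Counts implied by the reported rates (an assumption, not read from the data file here)
p_perm = permutation_p_value(157, 2435, 235, 2435)
print("Permutation p-value:", p_perm)
```

A small value means that random assignment of callbacks rarely produces a gap as large as the observed one.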

<h2>3. Compute margin of error, confidence interval, and p-value</h2>

In [194]:
# Sample sizes and callback rates come from the cells above:
# wht_samp_sz, blk_samp_sz, wht_cb_rate, blk_cb_rate

# z value for 95% confidence (two-sided); stats.norm.ppf(q = 0.975) is approximately 1.96
z_val = 1.96

# Difference between the sample proportions (white minus black callback rate)
p_hat = wht_cb_rate - blk_cb_rate

# Unpooled standard error of the difference, used for the confidence interval
std_dev = math.sqrt((wht_cb_rate * (1 - wht_cb_rate) / wht_samp_sz) + (blk_cb_rate * (1 - blk_cb_rate) / blk_samp_sz))

# Margin of error (z_val * std_dev)
margin_err = z_val * std_dev

# Confidence interval for the difference in population proportions
ci_lower = p_hat - margin_err
ci_upper = p_hat + margin_err

# Pooled sample proportion
# p = (p1 * n1 + p2 * n2) / (n1 + n2)
# where p1 and p2 are the sample proportions and n1 and n2 are the sample sizes
p = (wht_cb_rate * wht_samp_sz + blk_cb_rate * blk_samp_sz) / (wht_samp_sz + blk_samp_sz)

# Standard error (SE) of the sampling distribution of the difference between two proportions
# SE = sqrt( p * ( 1 - p ) * [ (1/n1) + (1/n2) ] )
# Note that the whole bracket (1/n1 + 1/n2) multiplies p * (1 - p)
SE = math.sqrt(p * (1 - p) * ((1 / wht_samp_sz) + (1 / blk_samp_sz)))

# Test statistic: a z-score defined by z = (p1 - p2) / SE
z = (wht_cb_rate - blk_cb_rate) / SE

# P-value: probability, under the null hypothesis, of a z-score at least this extreme (two-sided)
p_value = 2 * stats.norm.sf(abs(z))

print('Pooled Proportion:', round(p, 4))
print('Standard Error:', round(SE, 4))
print('Test Statistic (z):', round(z, 2))
print('P-value: {:.2e}'.format(p_value))
print('Margin of Error:', round(margin_err, 4))
print('Confidence Interval:', round(ci_lower, 4), "<->", round(ci_upper, 4))

Pooled Proportion: 0.0805
Standard Error: 0.0078
Test Statistic (z): 4.11
P-value: 3.98e-05
Margin of Error: 0.0153
Confidence Interval: 0.0168 <-> 0.0473


<h2>4. Write a story describing the statistical significance in the context of the original problem.</h2>

Since we have a two-tailed test, the P-value is the probability that the z-score is less than -4.11 or greater than 4.11.

From the normal distribution, P(z < -4.11) is about 0.00002 and P(z > 4.11) is about 0.00002. Thus, the P-value is about 
0.00004.

Interpret results. Since the P-value (about 0.00004) is far below the significance level (0.05), we reject the null hypothesis. Resumes with white-sounding names received callbacks about 9.65% of the time, compared with about 6.45% for otherwise identical resumes with black-sounding names. The gap is about 3.2 percentage points, with a 95% confidence interval of roughly 1.7 to 4.7 percentage points. A difference this large would be very unlikely if the name on the resume had no effect on callbacks.
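
The margin of error and confidence interval for the difference in callback rates can also be packaged as a helper. This is a sketch under stated assumptions: the function name `diff_proportion_ci` is hypothetical, and the counts (235 and 157 callbacks out of 2435 resumes each) are those implied by the callback rates computed earlier. It uses the unpooled standard error, since the interval does not assume the two proportions are equal.

```python
import math

def diff_proportion_ci(x1, n1, x2, n2, z_crit=1.96):
    """Difference p1 - p2, its margin of error, and the confidence interval."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error of the difference between two independent proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_crit * se
    diff = p1 - p2
    return diff, margin, (diff - margin, diff + margin)

# Counts implied by the reported rates (white-sounding first, then black-sounding)
diff, margin, (lo, hi) = diff_proportion_ci(235, 2435, 157, 2435)
print(f"difference = {diff:.4f}, margin of error = {margin:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

With these counts the helper gives a margin of error of about 0.0153 and an interval of about 0.0168 to 0.0473, matching the values printed in section 3.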

<h2>5. Does your analysis mean that race/name is the most important factor in callback success? 
Why or why not? If not, how would you amend your analysis?</h2>