#### The following is a mini-project submission for Springboard's Data Science Career Track. Work completed by H. Passmore.
#### Project Summary: In this notebook I address statistical questions using a data set of 4870 independent samples. Using exploratory data analysis techniques, I address questions of appropriate statistical tests, confidence intervals, and comparison of resume success depending on the assumed race of applicants.

# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
****

### Start by importing modules and reading in the job market discrimination data file

In [2]:
# import modules
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm

# Build figures inline, set visualization style
%matplotlib inline
sns.set()

In [3]:
# read in the data file
data = pd.read_stata('us_job_market_discrimination.dta')

In [4]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [5]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [6]:
# check out the dataframe
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


## Q1: What test is appropriate for this problem? Does CLT apply?

__Appropriate Statistical Test__: Whether racial discrimination occurred during the call-back phase, based on the applicant's name and assumed race, can be addressed by comparing the call-back rates (call-backs divided by resumes sent) for the two groups. __A two-sample z-test for proportions is appropriate when you have two sample proportions to compare__ (e.g., calls/total for race = b and calls/total for race = w). In this case we use a two-tailed z-test.

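__Central Limit Theorem: The Central Limit Theorem applies because the resumes are independent samples and both groups are large (2435 resumes each, with well over 10 call-backs and 10 non-call-backs per group), so the sampling distribution of the difference in proportions is approximately normal.__ We rely on this approximation to calculate the critical value of the test.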
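The large-sample condition behind the normal approximation can be checked directly. The sketch below uses the call-back counts reported in this notebook; the threshold of 10 successes and 10 failures per group is a common rule of thumb, not a requirement stated by the study, and `normal_approx_ok` is a hypothetical helper name.

```python
# Counts taken from the outputs in this notebook
n_b, n_w = 2435, 2435          # resumes per group
calls_b, calls_w = 157, 235    # call-backs per group

def normal_approx_ok(successes, n, threshold=10):
    """Rule-of-thumb check (assumed threshold): both successes and failures reach it."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

ok_b = normal_approx_ok(calls_b, n_b)
ok_w = normal_approx_ok(calls_w, n_w)
print('Black-sounding names pass the check:', ok_b)
print('White-sounding names pass the check:', ok_w)
```

Both groups clear the threshold by a wide margin, which supports using a z-test here.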

## Q2: What are the null and alternate hypotheses?

__Null Hypothesis__: Proportions of resumes getting call-backs are THE SAME for black-sounding and white-sounding names.
    
__Alternative__: Proportions of call backs for black-sounding and white-sounding applicants are NOT EQUAL.

## Q3: Compute margin of error, confidence interval, and p-value.

In [7]:
# begin with a contingency table of 'race' and 'call'
table = pd.crosstab(data.race, data.call, margins=True)
table.columns = ['No','Yes','Total']
table.index = ['Black','White','Total']
table

Unnamed: 0,No,Yes,Total
Black,2278,157,2435
White,2200,235,2435
Total,4478,392,4870


In [14]:
# calculate z-test
from statsmodels.stats.proportion import proportions_ztest
counts = np.array([235, 157])
nobs = np.array([2435,2435])
stat, pval = proportions_ztest(counts, nobs, alternative='two-sided', prop_var=False)
print('Test statistic for z-test:', '{0:0.3f}'.format(stat))
print('p-value:', '{0:0.3f}'.format(pval))

Test statistic for z-test: 4.108
p-value: 0.000
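As a cross-check, the same statistic can be computed by hand. This is a minimal sketch that assumes the pooled-proportion form of the two-sample z-test, which is consistent with the value of 4.108 reported above; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

calls_w, calls_b = 235, 157
n_w, n_b = 2435, 2435

p_w = calls_w / n_w
p_b = calls_b / n_b

# Pooled proportion under the null hypothesis of equal call-back rates
p_pool = (calls_w + calls_b) / (n_w + n_b)
se_pool = np.sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_b))

z = (p_w - p_b) / se_pool
p_value = 2 * norm.sf(abs(z))  # two-sided p-value

print('z = {0:0.3f}'.format(z))
print('p-value = {0:0.2e}'.format(p_value))
```

Printing the p-value in scientific notation avoids the misleading `0.000` produced by three-decimal formatting; the value is small but not zero.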


In [11]:
# z-critical value for a two-sided 95% confidence interval
print('The critical value for 95% confidence is 1.96')
# estimate the standard error of the difference in proportions
p_b = (157/2435) # proportion of black-sounding names that were called back
p_w = (235/2435) # proportion of white-sounding names that were called back
prop_diff = p_w - p_b # difference in proportions
print('The difference in proportions is', '{0:0.3f}'.format(prop_diff))
sem = np.sqrt(((p_b*(1-p_b))/2435)+((p_w*(1-p_w))/2435))
# compute margin of error for the difference in proportions: z-critical * sem
moe = 1.96 * sem
print('The margin of error is', '{0:0.3f}'.format(moe))
# calculate the 95% CI using the margin of error
ci_low, ci_high = prop_diff - moe, prop_diff + moe
print('Lower bound of 95% CI', '{0:0.3f}'.format(ci_low))
print('Upper bound of 95% CI', '{0:0.3f}'.format(ci_high))

The critical value for 95% confidence is 1.96
The difference in proportions is 0.032
The margin of error is 0.015
Lower bound of 95% CI 0.017
Upper bound of 95% CI 0.047
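The hard-coded 1.96 can also be derived from the normal distribution, which makes it easy to change the confidence level. The following sketch wraps the interval calculation in a hypothetical helper, `diff_prop_ci`, using the same unpooled standard error as the cell above.

```python
import numpy as np
from scipy.stats import norm

def diff_prop_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z_crit = norm.ppf(1 - (1 - confidence) / 2)
    sem = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    moe = z_crit * sem
    return diff - moe, diff + moe

# White-sounding minus black-sounding call-back proportions
low, high = diff_prop_ci(235, 2435, 157, 2435)
print('95% CI: ({0:0.3f}, {1:0.3f})'.format(low, high))
```

With the default 95% level this should reproduce the interval reported above, roughly 0.017 to 0.047. Because the interval excludes zero, it agrees with the z-test result.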


## Q4: Write a story describing the statistical significance in the context of the original problem.

Hiring managers have a big influence on the fates of job applicants. Unconscious biases held by individuals who process job applications may have long-term effects on who gets hired and thus on employee diversity (or lack thereof). In a study of 4870 resumes sent to companies, we found evidence of racial bias against applications with African American-sounding names. In our study, where 'black-sounding' and 'white-sounding' names were randomly assigned to identical resumes, the proportion of 'black' resumes receiving follow-up calls from hiring managers (0.064) was significantly lower than the proportion of 'white' resumes receiving calls (0.097) (z = 4.108, p < 0.001). The 95% confidence interval for the difference in proportions is 0.017 to 0.047. We recommend that hiring managers adopt strict practices to prevent biased hiring, such as removing names, and even ages, from visible resume fields during application reviews.

## Q5: Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

* __Is race/name the most important factor for call-backs?__: Not necessarily. There are many other variables in the dataset that we have not explored yet, and some of them could be more influential. There could also be interactive effects (e.g., race and gender together may have a bigger influence than either alone).
* __Amend analysis__: This study was designed specifically to test for effects of racial bias on resume success. It is important not to go looking for other, unrelated effects in the data that the study design did not address. However, there could be combined factors that further elucidate the conscious and unconscious biases of hiring managers. For example, the difference in proportions within sub-groups of applicants (e.g., women, certain types of positions, entry-level positions) could be even larger than for the whole pool of applicants.
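A sub-group comparison of this kind could be sketched as follows. The records here are made up purely for illustration and are NOT the study data; `honors` stands in for whichever sub-group column is of interest, and the column name `gap_w_minus_b` is hypothetical.

```python
import pandas as pd

# Made-up toy records for illustration only (not the study data)
toy = pd.DataFrame({
    'race':   ['b', 'b', 'b', 'b', 'w', 'w', 'w', 'w'],
    'honors': [0, 0, 1, 1, 0, 0, 1, 1],
    'call':   [0, 0, 0, 1, 0, 1, 1, 1],
})

# Call-back rate for each (sub-group, race) combination
rates = toy.groupby(['honors', 'race'])['call'].mean().unstack('race')

# Gap between white-sounding and black-sounding call-back rates per sub-group
rates['gap_w_minus_b'] = rates['w'] - rates['b']
print(rates)
```

On the real data, each sub-group gap would need its own significance test, and testing many sub-groups increases the chance of false positives, so any such comparison should be planned in advance.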

### Alternative approach: Instead of comparing ratios, use a $\chi^2$ test of independence to see if observed counts are different from expected.

#### The question of whether racial discrimination occurred during the call-back phase of job application processing, based on the applicant's name and assumed race, can also be addressed by testing whether the frequency distribution of call-backs differs from the distribution expected with no bias. _The Chi-squared test is appropriate when you have count data (observed frequencies) of a categorical event to compare to expected frequencies under the null hypothesis._ A simple example of a Chi-square test is to determine if a coin is 'fair' by flipping it many times and recording the outcomes, and then comparing the frequency of heads to that expected from a fair coin.
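The coin example can be sketched with `scipy.stats.chisquare`. The flip counts below are hypothetical and chosen only to illustrate the calculation.

```python
from scipy import stats

# Hypothetical outcome: 60 heads and 40 tails in 100 flips
observed = [60, 40]

# With no expected frequencies given, chisquare assumes equal frequencies (50/50)
chi2_coin, p_coin = stats.chisquare(observed)

print('Test statistic:', chi2_coin)
print('p-value:', p_coin)
```

Here each outcome deviates from the expected 50 by 10, so the statistic is 10²/50 + 10²/50 = 4.0, which falls just past the 95% critical value for one degree of freedom.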

__Interpretation:__ Based on the chi-square test we reject our null hypothesis. The observed proportion of call-backs is not equal for the assumed racial categories. The chi-square test statistic (16.45) is much larger than the critical value (3.84), and the p-value is small. No margin of error is reported for the chi-square test because it assesses whether observed counts depart from expected counts rather than estimating a parameter with an interval around it.

In [12]:
# calculate chi-square statistic and p-value
# make a version of the contingency table without margin totals
table2 = pd.crosstab(data.race, data.call)

# calculate chi-square statistic, p-value,degrees of freedom, and expected frequencies from the observed data
chi2, p, dof, ex = stats.chi2_contingency(table2, correction=True)
print('Test statistic:', chi2)
print('p-value:', p)
print('Degrees of Freedom:', dof)
print('Expected frequencies:', ex)

Test statistic: 16.4490285842
p-value: 4.99757838996e-05
Degrees of Freedom: 1
Expected frequencies: [[ 2239.   196.]
 [ 2239.   196.]]


In [13]:
# find the chi-square critical value for 95% confidence
crit = stats.chi2.ppf(q =0.95, df = dof)
print('Critical value:', crit)

Critical value: 3.84145882069


The results of the $\chi^2$ test also support rejecting the null hypothesis. The observed proportion of call-backs is not equal for the two groups. Specifically, resumes with black-sounding names received fewer call-backs than resumes with white-sounding names, and the difference is statistically significant.
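To make the link between the two approaches concrete, the chi-square statistic can be computed by hand from the observed counts. This is a minimal sketch: the expected counts come from the row and column totals, and the 0.5 adjustment is the Yates continuity correction that `correction=True` requests for a 2x2 table. Variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Observed counts: rows = black, white; columns = no call, call
observed = np.array([[2278, 157],
                     [2200, 235]])

# Expected counts under independence: row total * column total / grand total
row_tot = observed.sum(axis=1, keepdims=True)
col_tot = observed.sum(axis=0, keepdims=True)
expected = row_tot * col_tot / observed.sum()

# Uncorrected statistic, and the version with the Yates continuity correction
chi2_plain = ((observed - expected) ** 2 / expected).sum()
chi2_yates = ((np.abs(observed - expected) - 0.5) ** 2 / expected).sum()

chi2_scipy, p_scipy, dof, _ = stats.chi2_contingency(observed, correction=True)

print('Uncorrected chi-square: {0:0.3f}'.format(chi2_plain))
print('Yates-corrected chi-square: {0:0.3f}'.format(chi2_yates))
print('scipy (correction=True): {0:0.3f}'.format(chi2_scipy))
```

The uncorrected statistic equals the square of the z statistic from the two-proportion test (4.108² ≈ 16.88), so for a 2x2 table the two tests are equivalent. The continuity correction is what lowers the reported value to about 16.45.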