# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
</div>
****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
# Load the Stata dataset with the public pandas reader
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [4]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [5]:
data.head()

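| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |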


In [6]:
w = data[data.race=='w']
b = data[data.race=='b']

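The Central Limit Theorem applies to this problem because the sample size of each group is far greater than 30 and the observations are independent of one another: no observation affects any other observation in the dataset. For a proportion, the usual additional check is that each group has at least 10 successes and 10 failures, which holds here (157 and 235 callbacks, with more than 2,000 non-callbacks in each group).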
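
The sample-size condition can be checked directly from the counts. The snippet below is a minimal sketch: it hard-codes the group size and callback counts reported in this notebook, and the threshold of 10 successes and 10 failures is a common rule of thumb assumed here, not something the dataset prescribes.

```python
# Counts taken from this notebook: 2,435 resumes per group.
n = 2435
callbacks = {"black": 157, "white": 235}

# Assumed rule of thumb: at least 10 successes and 10 failures per group.
MIN_COUNT = 10

for group, successes in callbacks.items():
    failures = n - successes
    ok = successes >= MIN_COUNT and failures >= MIN_COUNT
    print(f"{group}: successes={successes}, failures={failures}, normal approximation ok={ok}")

# True only if every group passes the check
clt_ok = all(k >= MIN_COUNT and (n - k) >= MIN_COUNT for k in callbacks.values())
```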

Race is a categorical variable here: each résumé carries either a black-sounding or a white-sounding name. The callback is categorical as well: it is either a success or a failure. We are therefore testing for an association between two categorical variables, so we will apply the **Chi-Square Test for Independence.**


In [7]:
w.shape, b.shape

((2435, 65), (2435, 65))

In [8]:
callback_df = data.pivot_table('id', 'call', 'race', aggfunc='count', margins=True)
callback_df.index = ['Failure', 'Success', 'Total']
callback_df.columns = ['Black', 'White', 'Total']

callback_df

| | Black | White | Total |
|---|---|---|---|
| Failure | 2278 | 2200 | 4478 |
| Success | 157 | 235 | 392 |
| Total | 2435 | 2435 | 4870 |


## Null and Alternate Hypothesis

To do hypothesis testing, we define the following:

* **Null Hypothesis:** There is no relationship between race and callbacks.
* **Alternate Hypothesis:** There is a relationship between race and callbacks.

We assume that the null hypothesis is true and set the significance level $\alpha$ to 10%, or 0.1.


In [9]:
total = callback_df.loc['Total', 'Total']
group_total = callback_df.loc['Total', 'Black']

In [10]:
# Pooled proportion of resumes that did (p1) and did not (p0) get a callback in the entire sample
p1 = callback_df.loc['Success', 'Total']/total
p0 = callback_df.loc['Failure', 'Total']/total

success_array = np.array(callback_df.loc['Success', ['Black', 'White']])
failure_array = np.array(callback_df.loc['Failure', ['Black', 'White']])
actual_array = np.array([success_array, failure_array])
actual_array

array([[ 157,  235],
       [2278, 2200]], dtype=int64)

Here p0 denotes the pooled probability of not receiving a callback and p1 the pooled probability of receiving one, both computed over the whole sample in the cell above. Under the null hypothesis, these proportions apply equally to both groups.
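
Under the null hypothesis, the expected count in each cell is the group size multiplied by the pooled probability. The sketch below rebuilds the observed table from the counts shown above and derives the expected counts by hand. The names `observed`, `row_totals`, `col_totals` and `expected` are illustrative and not part of the original analysis.

```python
import numpy as np

# Observed counts: rows are (Success, Failure), columns are (Black, White)
observed = np.array([[157, 235],
                     [2278, 2200]])

row_totals = observed.sum(axis=1)  # total successes and total failures
col_totals = observed.sum(axis=0)  # group sizes
total = observed.sum()

# Pooled probabilities of a callback (p1) and of no callback (p0)
p1 = row_totals[0] / total
p0 = row_totals[1] / total

# Expected count in each cell = pooled probability * group size
expected = np.outer([p1, p0], col_totals)
print(expected)
```

These are the same expected frequencies that `stats.chi2_contingency` returns as its fourth value.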

In [11]:
chi2_stat, p_val, dof, ex = stats.chi2_contingency(actual_array)
print("===Chi2 Stat===")
print(chi2_stat)
print("\n")
print("===Degrees of Freedom===")
print(dof)
print("\n")
print("===P-Value===")
print(p_val)
print("\n")
print("===Expected Frequencies===")
print(ex)

===Chi2 Stat===
16.44902858418937


===Degrees of Freedom===
1


===P-Value===
4.997578389963255e-05


===Expected Frequencies===
[[ 196.  196.]
 [2239. 2239.]]


The p-value obtained is far smaller than the threshold $\alpha$ of 0.1, so we reject the null hypothesis. In other words, **there is a statistically significant association between the race suggested by a candidate's name and the likelihood of getting a callback based on the résumé.**
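
One detail worth knowing: `stats.chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, which makes the test slightly more conservative. The sketch below rebuilds the observed table from the counts above and compares the corrected and uncorrected results; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Observed counts: rows are (Success, Failure), columns are (Black, White)
observed = np.array([[157, 235],
                     [2278, 2200]])

# Default behaviour: Yates' continuity correction is applied for 1 degree of freedom
chi2_corr, p_corr, dof, _ = stats.chi2_contingency(observed)

# Plain Pearson chi-square statistic without the correction
chi2_raw, p_raw, _, _ = stats.chi2_contingency(observed, correction=False)

print("with correction:   ", chi2_corr, p_corr)
print("without correction:", chi2_raw, p_raw)
```

Both versions lead to the same decision at $\alpha$ = 0.1; the correction only shrinks the statistic a little.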

Since the significance level $\alpha$ is 10%, the corresponding confidence level is $1 - \alpha$ = 90%.
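
For a two-sided interval, the confidence level determines the critical value of the standard normal distribution. As a small illustration (the variable names are assumptions for this sketch), the critical values for 90% and 95% confidence can be obtained with `stats.norm.ppf`:

```python
from scipy import stats

alpha = 0.10
confidence_level = 1 - alpha  # 0.90

# Two-sided critical value: leave alpha/2 in each tail
z_crit_90 = stats.norm.ppf(1 - alpha / 2)

# For comparison, the critical value for a 95% interval
z_crit_95 = stats.norm.ppf(1 - 0.05 / 2)

print(round(z_crit_90, 3), round(z_crit_95, 3))
```

The familiar value 1.96 therefore belongs to a 95% interval, while a 90% interval uses roughly 1.645.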

### Hypothesis Testing

The following hypotheses are defined:

* **Null Hypothesis**: There is no difference between the mean callback rates for résumés with black-sounding and white-sounding names.
* **Alternate Hypothesis**: There is a difference between the mean callback rates for résumés with black-sounding and white-sounding names.

We again set $\alpha$ to 0.1 and assume the null hypothesis to be true.
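
Written out formally, the test is a two-proportion z-test with a pooled proportion. The notation is introduced here for illustration: $\hat{p}_w$ and $\hat{p}_b$ are the observed callback rates, $x_w$ and $x_b$ the callback counts, and $n_w$ and $n_b$ the group sizes.

$$
H_0: p_w - p_b = 0 \qquad H_1: p_w - p_b \neq 0
$$

$$
\hat{p} = \frac{x_b + x_w}{n_b + n_w}, \qquad
z = \frac{(\hat{p}_w - \hat{p}_b) - 0}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_b} + \dfrac{1}{n_w}\right)}}
$$

The cells that follow compute these quantities as `p_hat`, `sigma_diff` (the denominator) and `z`.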

In [12]:
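# Callback rate for white-sounding names (the mean of a 0/1 variable)
white_mean = callback_df.loc['Success', 'White']/callback_df.loc['Total', 'White']
# Sum of squared deviations from the mean over successes (1) and failures (0)
white_std = callback_df.loc['Success', 'White'] * ((1 - white_mean) ** 2) + callback_df.loc['Failure', 'White'] * ((0 - white_mean) ** 2)
# Divide by the size of the white group itself to get the standard deviation
white_std = np.sqrt(white_std/callback_df.loc['Total', 'White'])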

white_mean, white_std

(0.09650924024640657, 0.29528834517039093)

In [13]:
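# Callback rate for black-sounding names (the mean of a 0/1 variable)
black_mean = callback_df.loc['Success', 'Black']/callback_df.loc['Total', 'Black']
# Sum of squared deviations from the mean over successes (1) and failures (0)
black_std = callback_df.loc['Success', 'Black'] * ((1 - black_mean) ** 2) + callback_df.loc['Failure', 'Black'] * ((0 - black_mean) ** 2)
# group_total is the size of the black group
black_std = np.sqrt(black_std/group_total)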

black_mean, black_std

(0.06447638603696099, 0.24559963697158382)
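
Because each callback is a 0/1 outcome, the standard deviations above reduce to the Bernoulli formula $\sqrt{p(1-p)}$. The short check below reproduces both values from the counts alone; `bernoulli_std` is a hypothetical helper written for this illustration.

```python
import numpy as np

def bernoulli_std(successes, n):
    """Population standard deviation of a 0/1 variable with the given success count."""
    p = successes / n
    return np.sqrt(p * (1 - p))

# Counts from the callback table: 235 and 157 callbacks out of 2,435 resumes each
white_std_check = bernoulli_std(235, 2435)
black_std_check = bernoulli_std(157, 2435)

print(white_std_check, black_std_check)
```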

In [14]:
mean_diff = white_mean - black_mean

In [15]:
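# Difference in callback rates expected under the null hypothesis
h0_mean = 0

# Pooled callback rate across both groups (the null hypothesis assumes one common rate)
p_hat = (callback_df.loc['Success', 'Black'] + callback_df.loc['Success', 'White'])/(callback_df.loc['Total', 'Black'] + callback_df.loc['Total', 'White'])
# Variance and standard error of the difference in proportions under the null hypothesis
var_diff = p_hat * (1- p_hat) * (1/callback_df.loc['Total', 'Black'] + 1/callback_df.loc['Total', 'White'])
sigma_diff = np.sqrt(var_diff)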

mean_diff, sigma_diff

(0.032032854209445585, 0.007796894036170457)

In [16]:
z = (mean_diff - h0_mean) / sigma_diff
z

4.108412152434346

In [17]:
# Two-sided p-value from the standard normal distribution
p = (1-stats.norm.cdf(z))*2
p

3.983886837577444e-05

In [18]:
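# 1.96 is the two-sided critical value for 95% confidence; a 90% interval would use about 1.645.
# sigma_diff is the pooled standard error from the hypothesis test.
error = 1.96 * sigma_diff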
error

0.015281912310894095

In [19]:
range_upper = mean_diff + error
range_lower = mean_diff - error
confidence_interval = range_lower, range_upper

confidence_interval

(0.01675094189855149, 0.04731476652033968)

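The p-value obtained is much lower than the significance level and close to the p-value obtained in the chi-square test. The small gap between the two arises because `stats.chi2_contingency` applies Yates' continuity correction to 2×2 tables by default; without the correction, the chi-square statistic equals the square of this z statistic. The interval above uses the critical value 1.96, so it is a 95% confidence interval, which is stricter than the 90% level implied by $\alpha$ = 0.1, and it does not contain 0. Therefore, we can safely reject the null hypothesis. This test further strengthens our initial claim of an association between race and callbacks.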
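
The exercise also asks for a bootstrap approach. The following is a minimal sketch, not part of the original analysis: it works from the callback counts alone, uses an arbitrary seed and 10,000 replicates, and all variable names are illustrative. Resampling a group's 0/1 outcomes with replacement is equivalent to drawing the resampled callback count from a binomial distribution, and shuffling the race labels under the null hypothesis is equivalent to a hypergeometric draw.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
n_b, n_w = 2435, 2435            # group sizes from the callback table
x_b, x_w = 157, 235              # callback counts from the callback table
n_boot = 10_000                  # assumed number of replicates

# Bootstrap: resampled callback rates for each group
boot_b = rng.binomial(n_b, x_b / n_b, size=n_boot) / n_b
boot_w = rng.binomial(n_w, x_w / n_w, size=n_boot) / n_w
boot_diffs = boot_w - boot_b

# 95% percentile interval and margin of error for the difference in rates
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])
margin = (ci_high - ci_low) / 2

# Permutation test: under the null, the 392 callbacks are spread at random,
# so the number landing in the white group is hypergeometric.
perm_w = rng.hypergeometric(x_b + x_w, (n_b + n_w) - (x_b + x_w), n_w, size=n_boot)
perm_diffs = perm_w / n_w - (x_b + x_w - perm_w) / n_b
observed_diff = x_w / n_w - x_b / n_b
p_perm = np.mean(np.abs(perm_diffs) >= observed_diff)

print((ci_low, ci_high), margin, p_perm)
```

With this many replicates, the permutation p-value can only be resolved down to about 1/10,000, so a value of 0 should be read as "smaller than 0.0001" rather than as exactly zero.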

## Conclusion and Final Remarks

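1. There is a statistically significant association between the race suggested by the name on a résumé and callback success. Because names were assigned to résumés at random, the difference in callback rates can be attributed to the name rather than to other résumé characteristics.
2. However, we cannot conclude that race is the most important factor for callback success. Other variables such as education and work experience may also play a role; comparing their importance would require extending the analysis, for example with a logistic regression of `call` on `race` together with columns such as `education` and `yearsexp`.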