# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [4]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [5]:
data.head()
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4870 entries, 0 to 4869
Data columns (total 65 columns):
id                    4870 non-null object
ad                    4870 non-null object
education             4870 non-null int8
ofjobs                4870 non-null int8
yearsexp              4870 non-null int8
honors                4870 non-null int8
volunteer             4870 non-null int8
military              4870 non-null int8
empholes              4870 non-null int8
occupspecific         4870 non-null int16
occupbroad            4870 non-null int8
workinschool          4870 non-null int8
email                 4870 non-null int8
computerskills        4870 non-null int8
specialskills         4870 non-null int8
firstname             4870 non-null object
sex                   4870 non-null object
race                  4870 non-null object
h                     4870 non-null float32
l                     4870 non-null float32
call                  4870 non-null float32
city        

In [6]:
black_entries = len(data[data['race']=='b'])
white_entries = len(data[data['race']=='w'])
total_entries = black_entries + white_entries

print(black_entries)
print(white_entries)
print(total_entries)


2435
2435
4870


In [7]:
white = sum(data[data.race=='w'].call) / len(data)
black = sum(data[data.race=='b'].call) / len(data)
w_and_b = sum(data.call) / len(data)

# Share of all callbacks that went to white-sounding names
# (this is not the per-resume callback rate)
white/w_and_b

0.59948979591836737

In [8]:
# Share of all callbacks that went to black-sounding names
black/w_and_b

0.40051020408163268

In [9]:
## Difference between the white and black shares of all callbacks
white/w_and_b - black/w_and_b

0.19897959183673469

Q1. What test is appropriate for this problem? Does CLT apply?

To use the normal model, the data must satisfy the following conditions:

1.  Randomization Condition: The data must be sampled or assigned randomly.
    * Independence Assumption: The sample values must be independent of each other, meaning that the outcome for one resume has no influence on the outcome for another. If we know that items were selected or assigned randomly, we can usually assume that independence holds.
    
The first condition is met since the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

2. 10% Condition: When the sample is drawn without replacement (usually the case), the sample size, n, should be no more than 10% of the population. The 4870 resumes are far fewer than 10% of all job applications in the US labor market, so this condition is met.

3. Sample Size Condition: The sample must be large enough for the sampling distribution to be approximately normal. The Central Limit Theorem does not say how large that is. For proportions, a common rule of thumb is that each group should contain at least 10 successes and at least 10 failures.

Each group contains 2435 resumes. The black-sounding group has 157 callbacks and 2278 non-callbacks, and the white-sounding group has 235 callbacks and 2200 non-callbacks, so both groups are large enough.

All three conditions are met, so we can use the CLT for our analysis. The appropriate statistical test for comparing two population proportions from independent samples is a two-proportion z-test.
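The sample size check can be reproduced from the counts printed earlier in the notebook. This is a minimal sketch: the helper name `meets_success_failure` and the threshold of 10 are illustrative assumptions (a common rule of thumb), not part of the original analysis.

```python
# Rule-of-thumb check: a group needs at least `threshold` callbacks
# (successes) and at least `threshold` non-callbacks (failures).
def meets_success_failure(successes, n, threshold=10):
    failures = n - successes
    return successes >= threshold and failures >= threshold

# Counts taken from the notebook output: (callbacks, resumes) per group
groups = {'b': (157, 2435), 'w': (235, 2435)}

for race, (callbacks, n) in groups.items():
    ok = meets_success_failure(callbacks, n)
    print(race, 'callbacks:', callbacks, 'non-callbacks:', n - callbacks, 'ok:', ok)
```

Both groups pass comfortably, which supports using the normal approximation for the two proportions.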


Q2. What are the null and alternate hypotheses?

H0: The callback rate is the same for white and black sounding names

Ha: The callback rate differs between white and black sounding names
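In symbols, the hypotheses and the pooled two-proportion z-statistic can be written as follows. The notation is introduced here for illustration: $p_b$ and $p_w$ are the true callback rates for black- and white-sounding names, $x$ and $n$ are the callback and resume counts in each group, and hats mark sample estimates.

```latex
H_0 : p_b = p_w
\qquad
H_a : p_b \neq p_w

z = \frac{\hat{p}_b - \hat{p}_w}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_b} + \dfrac{1}{n_w}\right)}},
\qquad
\hat{p} = \frac{x_b + x_w}{n_b + n_w}
```

The pooled estimate $\hat{p}$ appears in the standard error because, under $H_0$, both groups share a single callback rate.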

Q3. Compute margin of error, confidence interval, and p-value.

In [10]:
# Separate resumes by race: b and w
b_data = data[data.race=='b']
w_data = data[data.race=='w']

b_callback = sum(data[data.race=='b'].call)
w_callback = sum(data[data.race=='w'].call)

# Sample proportions of callbacks per race (b or w)
p1 = b_callback/len(b_data)
p2 = w_callback/len(w_data)

print('p1 = ' + str(p1))
print('p2 = ' + str(p2))

# Number of samples in each race (b or w)
n1 = len(b_data)
n2 = len(w_data)

print('n1 = ' + str(n1))
print('n2 = ' + str(n2))

# Callback percentage for both black and white names
p = (b_callback + w_callback)/(len(b_data)+len(w_data))
print('p = ' + str(p))

p1 = 0.064476386037
p2 = 0.0965092402464
n1 = 2435
n2 = 2435
p = 0.0804928131417


In [25]:
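def ztest_proportion_two_samples(x1, n1, x2, n2, one_sided=False):
    # Sample proportions for each group
    p1 = x1/n1
    p2 = x2/n2    

    # Pooled proportion and standard error under H0 (equal proportions)
    p_pooled = (x1 + x2)/(n1 + n2)
    se = (p_pooled*(1 - p_pooled)*((1/n1) + (1/n2)))**(1/2)
    
    z = (p1 - p2)/se
    # Upper-tail probability beyond |z|, doubled for a two-sided test
    p_value = 1 - stats.norm.cdf(abs(z))
    if not one_sided:
        p_value *= 2
    return z, p_value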

z, p = ztest_proportion_two_samples(b_callback, len(b_data), w_callback, len(w_data), one_sided=False)
print(' z-statistic = {z} \n p-value = {p}'.format(z=z,p=p))

 z-statistic = -4.108412152434346 
 p-value = 3.983886837577444e-05


In [24]:
def compute_standard_error_prop_two_samples(x1, n1, x2, n2):
    # Unpooled standard error of the difference between two proportions
    p1 = x1/n1
    p2 = x2/n2    
    se = p1*(1 - p1)/n1 + p2*(1 - p2)/n2
    return (se)**(1/2)
    
def zconf_interval_two_samples(x1, n1, x2, n2, alpha=0.05):
    # Two-sided (1 - alpha) confidence interval for p2 - p1
    p1 = x1/n1
    p2 = x2/n2    
    se = compute_standard_error_prop_two_samples(x1, n1, x2, n2)
    z_critical = stats.norm.ppf(1 - 0.5*alpha)
    return p2 - p1 - z_critical*se, p2 - p1 + z_critical*se

ci_low,ci_upp = zconf_interval_two_samples(b_callback, len(b_data), w_callback, len(w_data), alpha=0.05)

# ci_low and ci_upp are proportions, so format them as percentages
print(' 95% Confidence Interval = ( {0:.2%} , {1:.2%})' .format(ci_low, ci_upp))


 95% Confidence Interval = ( 1.68% , 4.73%)
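The margin of error for the difference in callback rates is the half-width of this interval. As a hedged cross-check, the sketch below recomputes it from the callback counts using only the Python standard library. `statistics.NormalDist` stands in for `stats.norm`, and the function name `margin_of_error_diff` is illustrative, not part of the original notebook.

```python
import math
from statistics import NormalDist

def margin_of_error_diff(x1, n1, x2, n2, alpha=0.05):
    """Half-width of the two-sided (1 - alpha) CI for p2 - p1 (unpooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z_critical * se

# Counts from the notebook: 157 of 2435 ('b') and 235 of 2435 ('w')
moe = margin_of_error_diff(157, 2435, 235, 2435)
diff = 235 / 2435 - 157 / 2435

print('difference = {:.4f}'.format(diff))
print('margin of error = {:.4f}'.format(moe))
print('95% CI = ({:.4f}, {:.4f})'.format(diff - moe, diff + moe))
```

With these counts, the difference is about 3.2 percentage points with a margin of error of about 1.5 percentage points.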


In [17]:
import math

# z-critical value for a two-sided 95% confidence interval
z_critical = stats.norm.ppf(0.975)   

# Point estimate: callback rate for white-sounding names
p = p2               

# Sample size: number of resumes with white-sounding names
n = n2                            

margin_of_error = z_critical * math.sqrt((p*(1-p))/n)
print('Z_critical value = {:.4f}'.format(z_critical))
print('Margin of Error = {:.4f}'.format(margin_of_error))

Z_critical value = 1.9600
Margin of Error = 0.0117


In [20]:
import math

# z-critical value for a two-sided 95% confidence interval
z_critical = stats.norm.ppf(0.975)   

# Point estimate: callback rate for black-sounding names
p = p1               

# Sample size: number of resumes with black-sounding names
n = n1                            

margin_of_error = z_critical * math.sqrt((p*(1-p))/n)
print('Z_critical value = {:.4f}'.format(z_critical))
print('Margin of Error = {:.4f}'.format(margin_of_error))

Z_critical value = 1.9600
Margin of Error = 0.0098


Q4. Write a story describing the statistical significance in the context of the original problem.

The z-statistic is -4.108, which means the observed difference in callback rates lies about 4.1 standard errors away from the zero difference expected under the null hypothesis. The p-value (0.0000398) is less than the significance level of 0.05, so we reject the null hypothesis. There is a statistically significant difference between the callback rates for 'white' and 'black' sounding names.

Callback rate for 'white' sounding names = 0.0965 (235 of 2435 resumes)

Callback rate for 'black' sounding names = 0.0645 (157 of 2435 resumes)


From our sample, we can conclude with 95% confidence that resumes with 'white' sounding names are called back between about 1.7 and 4.7 percentage points more often than resumes with 'black' sounding names. Put another way, about 60% of the 392 callbacks went to 'white' sounding names and about 40% went to 'black' sounding names.

Q5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

Our analysis does not show that race/name is the most important factor in callback success. It shows only that resumes with white-sounding names were significantly more likely to receive callbacks than otherwise identical resumes with black-sounding names. To judge which factor matters most, we would need to compare the effect of race with the effects of the other resume characteristics in the dataset, such as years of experience (yearsexp), level of education (education), and specific skills (computerskills, specialskills). One way to do this would be to fit a logistic regression of call on all of these variables and compare the estimated effects.
