# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
</div>
****

In [11]:
import pandas as pd
import numpy as np
from scipy import stats

In [12]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [85]:
# checking the type of data
type(data)

pandas.core.frame.DataFrame

In [13]:
# number of callbacks for white-sounding names   
sum(data[data.race=='w'].call)

235.0

In [14]:
# number of callbacks for black-sounding names   
sum(data[data.race=='b'].call)

157.0

**Note**: Black-sounding names received fewer callbacks (157) than white-sounding names (235).
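The two counts above can also be produced in one step together with the group sizes and callback rates. The following is a minimal sketch, assuming only the counts reported in this notebook (235 and 157 callbacks, 2,435 resumes per group). The DataFrame `toy` is a hypothetical stand-in for `data`, rebuilt from those counts so the example runs without the `.dta` file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`, rebuilt from the counts reported in this notebook
n_per_group = 2435
callbacks = {'w': 235, 'b': 157}
toy = pd.DataFrame({
    'race': np.repeat(['w', 'b'], n_per_group),
    'call': np.concatenate([
        np.repeat([1, 0], [callbacks['w'], n_per_group - callbacks['w']]),
        np.repeat([1, 0], [callbacks['b'], n_per_group - callbacks['b']]),
    ]),
})

# Callbacks, group size and callback rate per race in a single table
summary = toy.groupby('race')['call'].agg(['sum', 'count', 'mean'])
print(summary)
```

With the real dataset, the same `groupby` call on `data` should give the per-group totals without filtering each race separately.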

In [15]:
# Both groups contain the same number of resumes; only the last expression's value is displayed.
len(data[data.race=='w'])
len(data[data.race=='b'])

2435

In [16]:
# Looking at top five rows of the data
data.head()

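| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |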


In [17]:
# Looking at the total observations. 
len(data)

4870

In [27]:
# `data` is already a DataFrame; df is a new DataFrame object built from it.
df = pd.DataFrame(data)

In [28]:
# Looking at the shape of the data. 
df.shape

(4870, 65)

In [29]:
# Looking at the column names. 
df.columns

Index(['id', 'ad', 'education', 'ofjobs', 'yearsexp', 'honors', 'volunteer',
       'military', 'empholes', 'occupspecific', 'occupbroad', 'workinschool',
       'email', 'computerskills', 'specialskills', 'firstname', 'sex', 'race',
       'h', 'l', 'call', 'city', 'kind', 'adid', 'fracblack', 'fracwhite',
       'lmedhhinc', 'fracdropout', 'fraccolp', 'linc', 'col', 'expminreq',
       'schoolreq', 'eoe', 'parent_sales', 'parent_emp', 'branch_sales',
       'branch_emp', 'fed', 'fracblack_empzip', 'fracwhite_empzip',
       'lmedhhinc_empzip', 'fracdropout_empzip', 'fraccolp_empzip',
       'linc_empzip', 'manager', 'supervisor', 'secretary', 'offsupport',
       'salesrep', 'retailsales', 'req', 'expreq', 'comreq', 'educreq',
       'compreq', 'orgreq', 'manuf', 'transcom', 'bankreal', 'trade',
       'busservice', 'othservice', 'missind', 'ownership'],
      dtype='object')

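### Q1: What test is appropriate for this problem? Does CLT apply?

A two-sample z-test for the difference of two proportions is appropriate, because the outcome is binary (callback or no callback) and two independent groups are compared (black-sounding and white-sounding names). The CLT applies: names were assigned to resumes at random, and each group of 2,435 resumes contains well over 10 callbacks (235 and 157) and well over 10 non-callbacks (2,200 and 2,278), so the sampling distribution of each callback rate is approximately normal.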
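For a proportion, the sample size alone is not the whole story: a common rule of thumb (an assumption here, not something stated in the exercise) asks for at least 10 successes and 10 failures in each group before relying on the normal approximation. The sketch below checks that condition with the counts reported in this notebook; `normal_approx_ok` is a hypothetical helper name.

```python
# Counts reported in this notebook
n_w, n_b = 2435, 2435
calls_w, calls_b = 235, 157

def normal_approx_ok(successes, n, threshold=10):
    """Rule of thumb: at least `threshold` successes and `threshold` failures."""
    return successes >= threshold and (n - successes) >= threshold

clt_ok = normal_approx_ok(calls_w, n_w) and normal_approx_ok(calls_b, n_b)
print('Normal approximation reasonable for both groups:', clt_ok)
```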

### Q2: What are the null and alternate hypotheses?

**Hypotheses:**

- Null hypothesis, H0: the callback rate is the same for white-sounding and black-sounding names.
- Alternative hypothesis, Ha: the callback rates differ.
- Significance level: α = 0.05

### Q3: Computing the margin of error, confidence interval, and p-value using bootstrap and frequentist approaches

### Frequentist Approach

In [109]:
# Callback indicators (1 = callback, 0 = no callback) for each group.
b_calls = data[data.race=='b'].call
w_calls = data[data.race=='w'].call

# Callback rate of each group (the mean of a 0/1 column is a proportion) and their difference.
print('Sample mean for white callback rate = ' + str(w_calls.mean()))
print('Sample mean for black callback rate = ' + str(b_calls.mean()))
empirical_diff_means = w_calls.mean() - b_calls.mean()
print('The difference of means = ' + str(empirical_diff_means))

Sample mean for white callback rate = 0.09650924056768417
Sample mean for black callback rate = 0.0644763857126236
The difference of means = 0.03203285485506058


In [90]:
# Standard deviation of both samples 
print ('SD for callbacks for WHITE sounding names = ' + str(w_calls.std())) 
print ('SD for callbacks for BLACK sounding names = ' + str(b_calls.std()))


SD for callbacks for WHITE sounding names = 0.2953455150127411
SD for callbacks for BLACK sounding names = 0.24564945697784424


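Margin of error = critical value × standard error

Standard error of the difference: $SE = \sqrt{\dfrac{s_w^2}{n_w} + \dfrac{s_b^2}{n_b}}$, where $s_w^2$ and $s_b^2$ are the sample variances of the white and black callback indicators and $n_w$, $n_b$ are the group sizes (https://onlinecourses.science.psu.edu/stat414/node/209/).

Degrees of freedom: for the pooled two-sample t-test, $n_w + n_b - 2 = 4868$. With this many observations the t distribution is practically identical to the standard normal, so the z critical value is used.

Critical value at 95% confidence (two-sided) = 1.960, roughly two standard errors.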
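Because `call` is a 0/1 variable, its variance is simply p(1 − p), so the margin of error can be worked out from the counts alone. The sketch below does that arithmetic under the assumption that the counts reported in this notebook (235 and 157 callbacks out of 2,435 each) are all that is available; the variable names are illustrative.

```python
import math

n_w, n_b = 2435, 2435
p_w, p_b = 235 / n_w, 157 / n_b

# For a 0/1 variable the variance is p * (1 - p)
se_diff = math.sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)
z_crit = 1.96
margin = z_crit * se_diff
diff = p_w - p_b

print('difference:', round(diff, 4))
print('standard error:', round(se_diff, 4))
print('margin of error:', round(margin, 4))
print('95% CI:', (round(diff - margin, 4), round(diff + margin, 4)))
```

The result differs from a pandas-based calculation only in the last digits, because `Series.var()` divides by n − 1 while p(1 − p) corresponds to dividing by n.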

In [38]:
# Calculating the margin of error at 95% confidence.
import math
# Unpooled standard error of the difference in callback rates.
standard_error = math.sqrt(w_calls.var()/len(w_calls) + b_calls.var()/len(b_calls))
print('Margin of error = ' + str(1.96*standard_error))

Margin of error = 0.015258417380692


In [62]:
# 95% confidence interval for the difference in callback rates (normal approximation).
from scipy import stats as stat
ci_95 = stat.norm.interval(0.95, loc=empirical_diff_means, scale=standard_error)
print('The 95% confidence interval for the difference in callback rates is ' + str(ci_95))

The 95% confidence interval for the difference in callback rates is (0.016774717851368581, 0.047290991858752573)


In [44]:
# Two-sample t-test (pooled variance, which is the ttest_ind default).
t_score, p_value_t = stats.ttest_ind(w_calls,b_calls)
print('t_score : ' + str(t_score) + '    p_value : ' + str(p_value_t) )

t_score : 4.11470529086    p_value : 3.94080210313e-05


In [67]:
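# Two-sample z-test for the difference in callback rates.
null_diff = 0  # difference in rates under the null hypothesis
z_stat = (empirical_diff_means - null_diff) / standard_error
print('z-stat:', z_stat)

# Two-sided p-value from the standard normal distribution.
pval = stats.norm.sf(abs(z_stat)) * 2
print('The p-value for the difference in rates was ' + str(pval))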

z-stat: 4.1147383735462695
The p-value for the difference in rates was 3.87618807011e-05


**Conclusion from frequentist t and z tests:** The p-values from both the t-test and the z-test are far below α = 0.05, so we reject the null hypothesis. The callback rates for white-sounding and black-sounding names differ.
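As a cross-check, the textbook two-proportion z-test uses a pooled standard error, because under the null hypothesis both groups share a single callback rate. The sketch below is not the calculation used in the cells above (those use the unpooled standard error); it assumes only the counts reported in this notebook, and the variable names are illustrative.

```python
import math
from scipy import stats

n_w, n_b = 2435, 2435
calls_w, calls_b = 235, 157
p_w, p_b = calls_w / n_w, calls_b / n_b

# Under H0 both groups share one callback rate, estimated from the pooled data
p_pool = (calls_w + calls_b) / (n_w + n_b)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_b))

z_pooled = (p_w - p_b) / se_pool
p_two_sided = 2 * stats.norm.sf(abs(z_pooled))
print('pooled z:', round(z_pooled, 3))
print('two-sided p-value:', p_two_sided)
```

The pooled and unpooled statistics are nearly identical here because the two groups are the same size and both rates are small.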

### Bootstrap method to generate margin of error, confidence interval, and p-value

In [23]:
# Helper functions for drawing bootstrap replicates.
def bootstrap_replicate_1d(data, func):
    """Resample data with replacement and apply func to the resample."""
    return func(np.random.choice(data, size=len(data)))

def draw_bs_reps(data, func, size=1):
    """Draw bootstrap replicates."""

    # Initialize array of replicates: bs_replicates
    bs_replicates = np.empty(size)

    # Generate replicates
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data, func)

    return bs_replicates

def callback_p(data):
    return np.sum(data)/len(data)
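For 0/1 data such as `call`, resampling n values with replacement and taking the mean is equivalent to drawing a Binomial(n, p̂) count and dividing by n. The sketch below uses that shortcut to draw all replicates at once. It is an alternative to the loop-based helpers, not the method used in this notebook, and it assumes the counts reported here (235 and 157 callbacks out of 2,435 each); the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
n_w, n_b = 2435, 2435
p_hat_w, p_hat_b = 235 / n_w, 157 / n_b
n_reps = 10000

# Each draw is the number of callbacks in one bootstrap resample of a group
reps_w = rng.binomial(n_w, p_hat_w, size=n_reps) / n_w
reps_b = rng.binomial(n_b, p_hat_b, size=n_reps) / n_b

# Replicates of the difference in callback rates (white minus black)
diff_reps = reps_w - reps_b
ci_low, ci_high = np.percentile(diff_reps, [2.5, 97.5])
print('bootstrap SE of the difference:', diff_reps.std())
print('95% percentile interval:', (ci_low, ci_high))
```

The standard deviation of `diff_reps` is itself the bootstrap standard error of the difference, so it is not divided by the number of replicates.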

In [78]:
# Generate bootstrap replicates of the callback rate for black-sounding names
bs_reps_b = draw_bs_reps(b_calls, np.mean, size=10000)

# Generate bootstrap replicates of the callback rate for white-sounding names
bs_reps_w = draw_bs_reps(w_calls, np.mean, size=10000)

In [79]:
# means of bootstrap replicates. 
np.mean(bs_reps_w), np.mean(bs_reps_b)

(0.096398028703778976, 0.064439465999230741)

In [80]:
# Standard deviation of the bootstrap replicates (the bootstrap standard error of each rate): black, then white.
bs_reps_b.std(),bs_reps_w.std()

(0.0050226430315611428, 0.0059864017251575891)

In [81]:
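# The standard deviation of the replicates is already the standard error of each rate,
# so the variances are added without dividing by the number of replicates.
bs_standard_error = math.sqrt(bs_reps_w.var() + bs_reps_b.var())
print('Margin of error = ' + str(round(1.96*bs_standard_error, 4)))

Margin of error = 0.0153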


In [84]:
# Confidence interval
# Bootstrap replicates of the difference in callback rates (white minus black).
prop_diff = bs_reps_w - bs_reps_b
# 95% percentile bootstrap confidence interval.
conf_int = np.percentile(prop_diff, [2.5, 97.5])
print('95% confidence interval =', conf_int)

95% confidence interval = [ 0.01683778  0.04722793]


In [144]:
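# Calculating the p-value and deducing the significance of the test.
# The replicates in prop_diff are centered on the observed difference, so they are
# shifted to be centered at zero to simulate the null hypothesis of no difference.
alpha = 0.05
null_diff_reps = prop_diff - np.mean(prop_diff)
p_value = np.sum(null_diff_reps >= empirical_diff_means)/len(null_diff_reps)
if p_value <= alpha:
    print('We can reject the null hypothesis at alpha=0.05 for p-value = ' + str(p_value))
else:
    print('We cannot reject the null hypothesis at alpha=0.05 for p-value = ' + str(p_value))

**Note**: Without the shift, this comparison returns a value close to 0.5 (0.499 when this cell was first run), because about half of the replicates lie above the observed difference they are centered on. That value is not a p-value. The shifted result should be read alongside the permutation test and the pooled bootstrap test below.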


Method 2: Permutation test to determine whether the callback rates differ (code from Datacamp, Stats II).

In [119]:
def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""

    # Concatenate the data sets: data
    data = np.concatenate((data1, data2))

    # Permute the concatenated array: permuted_data
    permuted_data = np.random.permutation(data)

    # Split the permuted array into two: perm_sample_1, perm_sample_2
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]

    return perm_sample_1, perm_sample_2

def draw_perm_reps(data_1, data_2, func, size=1):
    """Generate multiple permutation replicates."""

    # Initialize array of replicates: perm_replicates
    perm_replicates = np.empty(size)

    for i in range(size):
        # Generate permutation sample
        perm_sample_1, perm_sample_2 = permutation_sample(data_1, data_2)

        # Compute the test statistic
        perm_replicates[i] = func(perm_sample_1, perm_sample_2)

    return perm_replicates

def diff_of_means(data_1, data_2):
    """Difference in means of two arrays."""

    # The difference of means of data_1, data_2: diff
    diff = np.mean(data_1)-np.mean(data_2)

    return diff

In [133]:
b_calls =data[data.race=='b'].call
w_calls =data[data.race=='w'].call

empirical_diff_means = diff_of_means(w_calls, b_calls)

In [134]:
# Draw 10,000 permutation replicates of the difference in callback rates.
# The argument order (white, black) matches empirical_diff_means.
perm_replicates = draw_perm_reps(w_calls, b_calls, diff_of_means, size=10000)

In [135]:
# Compute the one-sided p-value: fraction of replicates at least as large as the observed difference.
p = np.sum(perm_replicates >= empirical_diff_means) / len(perm_replicates)
# Print the result
print('p-value =', p)

p-value = 0.0001
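The p-value above is one-sided, while the alternative hypothesis in Q2 is two-sided. The sketch below shows one way to obtain a two-sided permutation p-value. It relies on a shortcut that is not part of the Datacamp code: for 0/1 data, the number of callbacks that land in one group after shuffling follows a hypergeometric distribution, so the shuffles can be drawn directly. The counts are the ones reported in this notebook and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n_w, n_b = 2435, 2435
calls_w, calls_b = 235, 157
total_calls = calls_w + calls_b
total_no_calls = (n_w + n_b) - total_calls
observed_diff = calls_w / n_w - calls_b / n_b

# Callbacks that land in the "white" group in each random relabelling
k_w = rng.hypergeometric(total_calls, total_no_calls, n_w, size=10000)
perm_diffs = k_w / n_w - (total_calls - k_w) / n_b

# Two-sided p-value: relabellings at least as extreme in either direction
p_two_sided = np.mean(np.abs(perm_diffs) >= abs(observed_diff))
print('two-sided permutation p-value =', p_two_sided)
```

With 10,000 replicates the smallest non-zero p-value that can be reported is 0.0001, so a result of 0 only means "below the resolution of the simulation".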


Method 3: A bootstrap test for identical distributions. 
Source: https://campus.datacamp.com/courses/statistical-thinking-in-python-part-2/introduction-to-hypothesis-testing?ex=12

In [140]:

# Concatenate the callbacks of both groups: total_concat
total_concat = np.concatenate((w_calls, b_calls))

In [141]:
# Initialize bootstrap replicates: bs_replicates
bs_replicates = np.empty(10000)

for i in range(10000):
    # Generate bootstrap sample
    bs_sample = np.random.choice(total_concat, size=len(total_concat))
    
    # Compute replicate
    bs_replicates[i] = diff_of_means(bs_sample[:len(b_calls)],
                                     bs_sample[len(b_calls):])

In [142]:
# Compute the observed difference in callback rates (white minus black): empirical_diff_means
empirical_diff_means = diff_of_means(w_calls, b_calls)

In [143]:
# Compute and print p-value: p
p = np.sum(bs_replicates >= empirical_diff_means) / len(bs_replicates)
print('p-value =', p)

p-value = 0.0001


**Conclusion from bootstrap testing:** Since the p-values from both the permutation test and the pooled bootstrap test are low, we reject the null hypothesis that there is no difference between white and black callback rates.

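### Q4: What are the results of the analysis?

The frequentist tests, the permutation test, and the pooled bootstrap test all indicate that the difference in callback rates is statistically significant. Resumes with white-sounding names received callbacks 9.65% of the time, against 6.45% for identical resumes with black-sounding names, a gap of about 3.2 percentage points (95% confidence interval roughly 1.7 to 4.7 points). A gap this large would be very unlikely (p < 0.001) if employers treated the two sets of names alike.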

### Q5: Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

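The results strongly indicate that the race suggested by a name affects whether an applicant receives a callback in the US job market. Because names were assigned to resumes at random, the difference can be attributed to the name and not to other resume characteristics.

However, this does not mean that race is the most important factor for callbacks. The analysis looked at race alone. To rank factors, we would have to perform a multivariate analysis (for example, a logistic regression of `call` on race together with other columns such as `education` and `yearsexp`) and compare the sizes of the estimated effects.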
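A multivariate analysis of a 0/1 outcome is often done with logistic regression. The sketch below illustrates the idea on synthetic data only: the rows are simulated, the "true" coefficients are arbitrary choices, and nothing printed here is an estimate from the study. The column names `is_white` and `yearsexp` mirror the notebook's variables for readability, and `fit_logistic` is a hypothetical helper that fits the model by Newton-Raphson using NumPy alone.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted callback probability
        w = p * (1.0 - p)                     # variance weights
        grad = X.T @ (y - p)                  # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])         # negative Hessian (information matrix)
        beta = beta + np.linalg.solve(hess, grad)
    std_err = np.sqrt(np.diag(np.linalg.inv(hess)))
    return beta, std_err

# Synthetic data, for illustration only (NOT the study's data)
rng = np.random.default_rng(0)
n = 5000
is_white = rng.integers(0, 2, size=n)     # randomly assigned, as in the experiment
yearsexp = rng.integers(1, 20, size=n)    # made-up years of experience
X = np.column_stack([np.ones(n), is_white, yearsexp])
true_beta = np.array([-3.0, 0.4, 0.03])   # arbitrary illustrative coefficients
call = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta_hat, std_err = fit_logistic(X, call)
for name, b, s in zip(['intercept', 'is_white', 'yearsexp'], beta_hat, std_err):
    print(name, round(b, 3), '+/-', round(s, 3))
```

On the real dataset, the same kind of model would be fitted to `call` with the resume characteristics as predictors, and the coefficients (ideally on comparable scales) would indicate which factors matter most.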