# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating a black-sounding or a white-sounding name. The 'call' column has two values, 1 and 0, indicating whether or not the resume received a callback from the employer.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
</div>
****

```python
import pandas as pd
import numpy as np
from scipy import stats
```

```python
data = pd.read_stata('data/us_job_market_discrimination.dta')
```

```python
# number of callbacks for black-sounding names
data.loc[data.race == 'b', 'call'].sum()
```

```
157.0
```

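```python
data.head()
```

| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |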


## 1) What test is appropriate for this problem? Does CLT apply?

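Following the argument that racial discrimination is pervasive and therefore lowers the employment chances of black applicants, the appropriate test is a one-tailed, two-sample test of the difference between the callback rates of the black-sounding and white-sounding groups. Because 'call' takes only the values 0 and 1, each group's mean is its callback proportion, so this is a comparison of two proportions: a two-proportion z-test is the textbook choice, and with samples this large a two-sample t-test on the 0/1 values gives almost the same result.

The CLT applies. Names were assigned to resumes at random, so the observations can reasonably be treated as independent, and with 2,435 resumes in each group (157 and 235 callbacks respectively), both groups have far more than 10 successes and 10 failures. The sampling distribution of the difference between the sample proportions is therefore approximately normal.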
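A common rule of thumb for relying on the normal approximation to a proportion is that each group has at least 10 successes and 10 failures; the threshold of 10 is a convention rather than a hard rule. A minimal check, using the callback counts from this notebook's `value_counts()` output (the helper name is illustrative):

```python
def clt_conditions_met(successes: int, n: int, threshold: int = 10) -> bool:
    """Success/failure rule of thumb for a normal approximation to a proportion."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Callbacks and group sizes, taken from the value_counts() output in this notebook
groups = {"b": (157, 2435), "w": (235, 2435)}

for race, (calls, n) in groups.items():
    print(race, "successes:", calls, "failures:", n - calls,
          "CLT ok:", clt_conditions_met(calls, n))
```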

```python
# examine data; split into black and white groups; generate statistics for each group
data.race.value_counts()
```

```
w    2435
b    2435
Name: race, dtype: int64
```

```python
data.call.value_counts()
```

```
0.0    4478
1.0     392
Name: call, dtype: int64
```

```python
# callback outcomes (0/1) for black-sounding names
df_b = data.loc[data.race == 'b', 'call']
```

```python
df_b.value_counts()
```

```
0.0    2278
1.0     157
Name: call, dtype: int64
```

```python
df_b.describe()
```

```
count    2435.000000
mean        0.064476
std         0.245649
min         0.000000
25%         0.000000
50%         0.000000
75%         0.000000
max         1.000000
Name: call, dtype: float64
```

```python
# callback outcomes (0/1) for white-sounding names
df_w = data.loc[data.race == 'w', 'call']
```

```python
df_w.value_counts()
```

```
0.0    2200
1.0     235
Name: call, dtype: int64
```

```python
df_w.describe()
```

```
count    2435.000000
mean        0.096509
std         0.295346
min         0.000000
25%         0.000000
50%         0.000000
75%         0.000000
max         1.000000
Name: call, dtype: float64
```

In terms of effect size alone, the output above shows that the callback rate for white-sounding names (9.65%) is roughly 50% higher than for black-sounding names (6.45%), an absolute gap of about 3.2 percentage points.
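The rates and the gap can be reproduced with a `groupby`. Since the `.dta` file is not bundled here, this sketch rebuilds only the `race` and `call` columns from the counts reported above (157 of 2,435 and 235 of 2,435); `calls_df` is a stand-in for the notebook's `data`.

```python
import pandas as pd

# Rebuild the two relevant columns from the callback counts reported in this notebook
counts = {"b": (157, 2435), "w": (235, 2435)}
frames = []
for race, (calls, n) in counts.items():
    call = [1.0] * calls + [0.0] * (n - calls)
    frames.append(pd.DataFrame({"race": race, "call": call}))
calls_df = pd.concat(frames, ignore_index=True)

# Mean of a 0/1 column is the callback rate; sum is the number of callbacks
summary = calls_df.groupby("race")["call"].agg(["mean", "sum", "count"])
print(summary)

rate_b, rate_w = summary.loc["b", "mean"], summary.loc["w", "mean"]
print(f"absolute gap: {100 * (rate_w - rate_b):.1f} percentage points")
print(f"relative gap: white rate is {rate_w / rate_b:.2f} times the black rate")
```

With the real file loaded, `data.groupby('race')['call'].agg(['mean', 'sum', 'count'])` produces the same summary in one line.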

## 2) What are the null and alternate hypotheses?

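Null hypothesis: $H_0: p_b - p_w = 0$ (a black-sounding name has no effect on the callback rate)

Alternative hypothesis: $H_A: p_b - p_w < 0$ (a black-sounding name lowers the callback rate)

Here $p_b$ and $p_w$ are the population callback rates for resumes with black-sounding and white-sounding names respectively.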

I will use the traditional benchmark of testing significance at the 5% level.

## 3) Compute margin of error, confidence interval, and p-value.

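Because 'call' is 0/1, each sample mean is a callback proportion: $\hat{p}_b = 157/2435 \approx 0.0645$ and $\hat{p}_w = 235/2435 \approx 0.0965$, with $n_b = n_w = 2435$. The standard error of their difference is

$$SE = \sqrt{\frac{\hat{p}_b(1-\hat{p}_b)}{n_b} + \frac{\hat{p}_w(1-\hat{p}_w)}{n_w}} \approx 0.00778$$

Margin of error (95%) = 1.96 × 0.00778 ≈ 0.01525

Difference between sample means = $\hat{p}_b - \hat{p}_w \approx -0.03203$

95% confidence interval (two-sided) = -0.03203 ± 0.01525 ≈ (-0.0473, -0.0168)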
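The margin of error and confidence interval can be computed with a small function. This sketch uses the unpooled standard error for a difference in proportions and a two-sided 95% interval (z = 1.96); the function name is illustrative, and the inputs are the callback counts and group sizes reported in this notebook.

```python
import math


def diff_in_proportions_ci(x_b, n_b, x_w, n_w, z=1.96):
    """Return (difference, standard error, margin of error, (lower, upper))."""
    p_b, p_w = x_b / n_b, x_w / n_w
    diff = p_b - p_w
    # Unpooled standard error of p_b - p_w
    se = math.sqrt(p_b * (1 - p_b) / n_b + p_w * (1 - p_w) / n_w)
    moe = z * se
    return diff, se, moe, (diff - moe, diff + moe)


diff, se, moe, (lo, hi) = diff_in_proportions_ci(157, 2435, 235, 2435)
print(f"diff={diff:.4f}  SE={se:.5f}  MOE={moe:.4f}  CI=({lo:.4f}, {hi:.4f})")
```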

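The test statistic is

$$t = \frac{\text{observed difference} - \text{hypothesized difference}}{\text{standard error}} = \frac{-0.032 - 0}{0.00778} \approx -4.11$$

This lies far below the one-tailed critical value of -1.645 at the 5% significance level. The corresponding p-value is too small to read from a standard t table, so it is computed with scipy.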
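Because 'call' is binary, a textbook alternative is the two-proportion z-test, which pools both groups when estimating the standard error under the null hypothesis. This is a swapped-in method, not the calculation used above; the sketch uses `scipy.stats.norm` for the one-tailed critical value and p-value, and the function name is illustrative.

```python
import math

from scipy import stats


def two_proportion_ztest_less(x_b, n_b, x_w, n_w):
    """One-tailed z-test of H0: p_b - p_w = 0 against HA: p_b - p_w < 0."""
    p_b, p_w = x_b / n_b, x_w / n_w
    # Under H0 both groups share one callback rate, estimated from the pooled data
    p_pool = (x_b + x_w) / (n_b + n_w)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_b + 1 / n_w))
    z = (p_b - p_w) / se_pool
    p_value = stats.norm.cdf(z)  # lower-tail probability
    return z, p_value


z_crit = stats.norm.ppf(0.05)  # one-tailed 5% critical value, about -1.645
z, p = two_proportion_ztest_less(157, 2435, 235, 2435)
print(f"z={z:.3f}  critical={z_crit:.3f}  one-tailed p={p:.2e}")
```

The pooled z statistic comes out very close to the t statistic above, as expected with samples of this size.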

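```python
# confirm results using scipy's built-in two-sample t-test
stats.ttest_ind(df_b, df_w)
```

```
Ttest_indResult(statistic=-4.1147052908617514, pvalue=3.9408021031288859e-05)
```

Note that `ttest_ind` returns a two-sided p-value by default. Because the statistic is negative, the one-tailed p-value for our alternative hypothesis is half of it, about 1.97e-05.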
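In SciPy 1.6 and later, `ttest_ind` accepts an `alternative` parameter, so the one-tailed test can be run directly. This sketch rebuilds 0/1 arrays from the reported counts as stand-ins for `df_b` and `df_w`; the t statistic depends only on the group means, variances and sizes, so it matches the output above.

```python
import numpy as np
from scipy import stats

# 0/1 callback arrays rebuilt from the reported counts (stand-ins for df_b and df_w)
calls_b = np.r_[np.ones(157), np.zeros(2435 - 157)]
calls_w = np.r_[np.ones(235), np.zeros(2435 - 235)]

two_sided = stats.ttest_ind(calls_b, calls_w)
one_sided = stats.ttest_ind(calls_b, calls_w, alternative="less")

print("t:", two_sided.statistic)
print("two-sided p:", two_sided.pvalue)
print("one-sided p:", one_sided.pvalue)  # half the two-sided value, since t < 0
```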

## 4) Write a story describing the statistical significance in the context of the original problem.

The t statistic of about -4.11 falls far inside the rejection region, so we reject the null hypothesis that black-sounding and white-sounding names have the same callback rate, in favour of the alternative that resumes with black-sounding names are less likely to receive a callback.

The p-value tells the same story. Even the two-sided p-value reported by scipy is only about 0.00004, and the one-tailed p-value is half that. If race truly had no effect on callbacks, a gap at least as large as the one observed would almost never arise by chance.

In addition, the 95% confidence interval for the difference lies entirely below zero: resumes with black-sounding names are estimated to receive roughly 1.7 to 4.7 percentage points fewer callbacks. This reinforces the conclusion that the effect is negative.

Because names were randomly assigned to otherwise identical resumes, this evidence alone lets us conclude that the race signalled by a name has a statistically significant effect on the callback rate.

## 5) Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

```python
data.columns
```

```
Index(['id', 'ad', 'education', 'ofjobs', 'yearsexp', 'honors', 'volunteer',
       'military', 'empholes', 'occupspecific', 'occupbroad', 'workinschool',
       'email', 'computerskills', 'specialskills', 'firstname', 'sex', 'race',
       'h', 'l', 'call', 'city', 'kind', 'adid', 'fracblack', 'fracwhite',
       'lmedhhinc', 'fracdropout', 'fraccolp', 'linc', 'col', 'expminreq',
       'schoolreq', 'eoe', 'parent_sales', 'parent_emp', 'branch_sales',
       'branch_emp', 'fed', 'fracblack_empzip', 'fracwhite_empzip',
       'lmedhhinc_empzip', 'fracdropout_empzip', 'fraccolp_empzip',
       'linc_empzip', 'manager', 'supervisor', 'secretary', 'offsupport',
       'salesrep', 'retailsales', 'req', 'expreq', 'comreq', 'educreq',
       'compreq', 'orgreq', 'manuf', 'transcom', 'bankreal', 'trade',
       'busservice', 'othservice', 'missind', 'ownership'],
      dtype='object')
```

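The list above shows that many factors other than race could influence the decision to call an applicant back, and none of them was considered in the analysis above. Because names were assigned at random, these resume characteristics should be balanced between the two groups on average, but some could still be correlated with race by chance in this sample and so appear to contribute to the gap. More fundamentally, a significant test result says nothing about how large the race effect is compared with the effect of education, experience or skills: statistical significance is not the same as importance.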
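One way to check how well randomization balanced the groups is to compare covariate means by race. This is a sketch; `balance_table` is a hypothetical helper, and the column names in the usage comment are taken from `data.columns` above.

```python
import pandas as pd


def balance_table(df: pd.DataFrame, covariates, group_col: str = "race") -> pd.DataFrame:
    """Mean of each covariate by group, plus the first-minus-second group difference."""
    # Rows: covariates; columns: one per group (here 'b' then 'w')
    means = df.groupby(group_col)[list(covariates)].mean().T
    means["diff"] = means.iloc[:, 0] - means.iloc[:, 1]
    return means


# Usage with the notebook's DataFrame (no output shown here):
# balance_table(data, ["education", "yearsexp", "honors", "volunteer", "computerskills"])
```

Differences close to zero suggest those features are not driving the race gap; large ones would be worth controlling for.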

Not only would we need to consider the features of the individual applicant (race, education, etc.), but also the features of the employer that may make a callback more or less likely depending on race (e.g. an employer located in a predominantly black neighbourhood).

To see which factors play the most important part in callback success, I would amend my analysis by examining the impact of other likely factors and the relationships between them. This could be done more comprehensively with regression analysis: because the outcome is binary, a logistic regression of 'call' on race and the other features would let us compare the coefficients of the features (ideally after standardizing them) and test for interactions between race and other factors.
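A minimal sketch of logistic regression fitted by Newton-Raphson (iteratively reweighted least squares) using only NumPy. In practice a library such as statsmodels (`statsmodels.api.Logit`) would also report standard errors; this version only returns the coefficients. `fit_logistic` and the feature choice in the usage comment are hypothetical, and the usage assumes the selected columns have no missing values.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, tol=1e-10):
    """Logistic regression via Newton-Raphson (IRLS).

    X must already contain an intercept column of ones.
    Returns the coefficient vector on the log-odds scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))  # predicted callback probabilities
        w = p * (1.0 - p)                    # IRLS weights
        grad = X.T @ (y - p)                 # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])        # Fisher information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical usage with the notebook's DataFrame:
# features = ["yearsexp", "education", "honors"]
# Z = (data[features] - data[features].mean()) / data[features].std()  # standardize
# X = np.column_stack([np.ones(len(data)), (data["race"] == "b").astype(float), Z])
# beta = fit_logistic(X, data["call"])  # beta[1]: log-odds effect of a black-sounding name
```

With race as the only predictor, the race coefficient equals the log of the odds ratio of callbacks between the two groups; adding standardized features puts their coefficients on a comparable scale.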