# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
</div>
****

In [1]:
import pandas as pd
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [2]:
# load the Stata file; pd.read_stata is the public entry point for the Stata reader
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [6]:
data.head().transpose()

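| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| id | b | b | b | b | b |
| ad | 1 | 1 | 1 | 1 | 1 |
| education | 4 | 3 | 4 | 3 | 3 |
| ofjobs | 2 | 3 | 1 | 4 | 3 |
| yearsexp | 6 | 6 | 6 | 6 | 22 |
| honors | 0 | 0 | 0 | 0 | 0 |
| volunteer | 0 | 1 | 0 | 1 | 0 |
| military | 0 | 1 | 0 | 0 | 0 |
| empholes | 1 | 0 | 0 | 1 | 0 |
| occupspecific | 17 | 316 | 19 | 313 | 313 |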


In [10]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4870 entries, 0 to 4869
Data columns (total 65 columns):
id                    4870 non-null object
ad                    4870 non-null object
education             4870 non-null int8
ofjobs                4870 non-null int8
yearsexp              4870 non-null int8
honors                4870 non-null int8
volunteer             4870 non-null int8
military              4870 non-null int8
empholes              4870 non-null int8
occupspecific         4870 non-null int16
occupbroad            4870 non-null int8
workinschool          4870 non-null int8
email                 4870 non-null int8
computerskills        4870 non-null int8
specialskills         4870 non-null int8
firstname             4870 non-null object
sex                   4870 non-null object
race                  4870 non-null object
h                     4870 non-null float32
l                     4870 non-null float32
call                  4870 non-null float32
city        

In [31]:
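# callback indicator (1.0 = called, 0.0 = not called) for each résumé with a black-sounding name
# summing this Series gives the number of callbacks: sum(data[data.race=='b'].call)
data[data.race=='b'].call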

2       0.0
3       0.0
7       0.0
8       0.0
9       0.0
10      0.0
12      0.0
14      0.0
17      0.0
19      0.0
20      0.0
25      0.0
26      0.0
28      0.0
29      0.0
31      0.0
33      0.0
36      0.0
37      0.0
38      0.0
41      0.0
44      0.0
45      0.0
47      0.0
49      0.0
50      0.0
53      0.0
55      0.0
57      0.0
59      0.0
       ... 
4808    0.0
4812    0.0
4815    0.0
4818    0.0
4820    0.0
4821    0.0
4823    0.0
4825    0.0
4827    0.0
4828    0.0
4832    0.0
4833    0.0
4835    0.0
4837    0.0
4840    0.0
4841    0.0
4842    0.0
4844    0.0
4848    1.0
4849    0.0
4850    0.0
4853    0.0
4856    0.0
4857    0.0
4858    0.0
4859    1.0
4864    0.0
4865    0.0
4866    0.0
4868    0.0
Name: call, dtype: float32

In [13]:
# number of résumés with black-sounding names
len(data[data.race=='b'])

2435

In [23]:
# ratio of callbacks for black-sounding names
ratio_b = sum(data[data.race=='b'].call)/len(data[data.race=='b'])
print('The ratio of callbacks for black-sounding names is ' + str(ratio_b))

The ratio of callbacks for black-sounding names is 0.064476386037


In [15]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [14]:
# number of résumés with white-sounding names
len(data[data.race=='w'])

2435

In [22]:
# ratio of callbacks for white-sounding names
ratio_w = sum(data[data.race=='w'].call)/len(data[data.race=='w'])
print('The ratio of callbacks for white-sounding names is ' + str(ratio_w))

The ratio of callbacks for white-sounding names is 0.0965092402464


In [21]:
# contingency table: race (rows) against callback outcome (columns)
df = pd.crosstab(data.race, data.call)
df

| race \ call | 0.0 | 1.0 |
|---|---|---|
| b | 2278 | 157 |
| w | 2200 | 235 |


There are 2435 résumés with white-sounding names and 2435 résumés with black-sounding names.
Of these, 235 résumés with white-sounding names and 157 résumés with black-sounding names received a callback.

The callback rate is 9.7% for white-sounding names and 6.4% for black-sounding names.

Now to find out if this is significant.

#### 1. What test is appropriate for this problem? Does CLT apply?

A two-sample z-test for proportions is appropriate here. It tests whether two groups differ significantly on a single categorical characteristic, in this case whether a résumé received a callback.

Requirements:
* A random sample from each of the groups being compared
* Categorical data

Both requirements are met: names were assigned to résumés at random, and the outcome is binary (called / not called). The CLT also applies. Each group contains 2435 résumés, with at least 157 callbacks and more than 2000 non-callbacks per group, so the sampling distribution of each callback rate, and of the difference between them, is approximately normal.
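The sample-size condition behind the normal approximation can be checked directly from the counts in the contingency table. The sketch below uses the common rule of thumb of at least 10 observations in every cell; the threshold and the helper name `normal_approx_ok` are assumptions made for illustration, not part of the original exercise.

```python
# Counts copied from the race-by-call contingency table in this notebook.
counts = {
    "w": {"called": 235, "not_called": 2200},
    "b": {"called": 157, "not_called": 2278},
}

MIN_COUNT = 10  # rule-of-thumb threshold (an assumption, not a hard rule)


def normal_approx_ok(group_counts, min_count=MIN_COUNT):
    """Return True if every outcome has at least `min_count` observations."""
    return all(n >= min_count for n in group_counts.values())


for race, group_counts in counts.items():
    print(race, group_counts, normal_approx_ok(group_counts))
```

Both groups pass the check by a wide margin, which supports using a z-test rather than an exact test.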

#### 2. What are the null and alternate hypotheses?

The null hypothesis is that the callback rate for people with black-sounding names is the same as the rate for people with white-sounding names. The alternate hypothesis is that these are not equal.
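
In symbols, writing $p_w$ and $p_b$ for the true callback rates of résumés with white-sounding and black-sounding names (notation introduced here for convenience), the two-sided hypotheses are:

$$
H_0:\; p_w - p_b = 0 \qquad\qquad H_A:\; p_w - p_b \neq 0
$$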

#### 3. Compute margin of error, confidence interval, and p-value.

In [43]:
n_w = len(data[data.race=='w'])
n_b = len(data[data.race=='b'])

# unpooled standard error of the difference between the two callback rates
SE = np.sqrt(ratio_w*(1 - ratio_w)/n_w + ratio_b*(1 - ratio_b)/n_b)

z = 1.96  # critical value for a 95% confidence interval, from the z-table

margin = abs(SE*z)  # margin of error

print("The 95%% confidence interval for the difference in callback rates (white-sounding minus black-sounding) is %f to %f" % (ratio_w - ratio_b - margin, ratio_w - ratio_b + margin))

The 95% confidence interval for the difference in callback rates (white-sounding minus black-sounding) is 0.016777 to 0.047288
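
The same interval can be reproduced from the four counts alone, which also makes the margin of error explicit. This is a minimal sketch that assumes only the counts in the contingency table (235 and 157 callbacks out of 2435 résumés each); the variable names are chosen here for illustration.

```python
import math

# Counts from the contingency table in this notebook
calls_w, n_w = 235, 2435
calls_b, n_b = 157, 2435

p_w = calls_w / n_w  # callback rate, white-sounding names
p_b = calls_b / n_b  # callback rate, black-sounding names
diff = p_w - p_b

# Unpooled standard error of the difference in proportions
se = math.sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)

z_crit = 1.96  # 95% two-sided critical value
margin = z_crit * se
ci_low, ci_high = diff - margin, diff + margin

print(f"difference      = {diff:.4f}")
print(f"margin of error = {margin:.4f}")
print(f"95% CI          = ({ci_low:.4f}, {ci_high:.4f})")
```

The margin of error comes out at roughly 0.015, that is, about 1.5 percentage points on either side of the observed difference of 3.2 percentage points.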


In [42]:
# two-sample z-test on the 0/1 callback indicators; returns (z statistic, two-sided p-value)
from statsmodels.stats.weightstats import ztest
z_score = ztest(data[data.race=='w'].call, data[data.race=='b'].call)
print('The p-value is ' + str(z_score[1]) + ' and the z-score is ' + str(z_score[0]))

The p-value is 3.87674291161e-05 and the z-score is 4.11470535675
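
As a cross-check, the textbook two-proportion z-test can be computed by hand from the counts. Note that this sketch uses the pooled-proportion standard error, which is an assumption that differs slightly from the variance estimate `ztest` uses on the raw 0/1 data, so the z-score is expected to be close to, but not identical with, the value printed above.

```python
import math

# Counts from the contingency table in this notebook
calls_w, n_w = 235, 2435
calls_b, n_b = 157, 2435

p_w = calls_w / n_w
p_b = calls_b / n_b

# Under H0 both groups share one callback rate, estimated by pooling
p_pool = (calls_w + calls_b) / (n_w + n_b)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_b))

z = (p_w - p_b) / se_pool

# Two-sided p-value for a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.3f}, p-value = {p_value:.2e}")
```

Both approaches give a z-score a little above 4 and a p-value on the order of 1e-5, so the conclusion does not depend on which variance estimate is used.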


#### 4. Write a story describing the statistical significance in the context of the original problem.

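Résumés with white-sounding names received callbacks 9.7% of the time, compared with 6.4% for identical résumés with black-sounding names. The 95% confidence interval for this difference runs from about 1.7 to 4.7 percentage points and excludes zero, and the p-value of roughly 0.00004 is far below 0.05. We can therefore reject the null hypothesis at the 5% significance level: the callback rates differ significantly, with white-sounding names receiving more callbacks.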

#### 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

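This may be true, but we cannot make that statement from this analysis. The test was only set up to determine whether the two callback rates differ significantly, not whether race/name is the most important factor in callback success. It is entirely possible that other résumé attributes in this dataset, such as education, years of experience, or particular skills, were better represented among the résumés that received a callback and matter as much or more. To amend the analysis, I would compare the effect of race with the effects of those other attributes, for example by including them together in a regression on the callback outcome.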
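
A first step toward that comparison is to look at callback rates broken down by other résumé attributes. The helper below, `callback_rate_by`, is a hypothetical name introduced for illustration, and the small `toy` frame contains made-up values that only demonstrate the call. On the real data one would pass `data` and a column such as `honors` or `computerskills`.

```python
import pandas as pd


def callback_rate_by(df, column, outcome="call"):
    """Callback rate and number of résumés for each level of `column`."""
    grouped = df.groupby(column)[outcome]
    return pd.DataFrame({"rate": grouped.mean(), "n": grouped.size()})


# Toy frame with made-up values, only to show how the helper is called.
toy = pd.DataFrame({
    "race":   ["b", "b", "w", "w", "b", "w"],
    "honors": [0, 1, 0, 1, 0, 1],
    "call":   [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
})

print(callback_rate_by(toy, "honors"))
print(callback_rate_by(toy, "race"))
```

Comparing the size of the gaps across attributes gives a rough sense of relative importance. A regression that includes all attributes at once would be needed to separate their effects properly.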