# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
</div>
****

In [2]:
import pandas as pd
import numpy as np
from scipy import stats

In [3]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [4]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4870 entries, 0 to 4869
Data columns (total 65 columns):
id                    4870 non-null object
ad                    4870 non-null object
education             4870 non-null int8
ofjobs                4870 non-null int8
yearsexp              4870 non-null int8
honors                4870 non-null int8
volunteer             4870 non-null int8
military              4870 non-null int8
empholes              4870 non-null int8
occupspecific         4870 non-null int16
occupbroad            4870 non-null int8
workinschool          4870 non-null int8
email                 4870 non-null int8
computerskills        4870 non-null int8
specialskills         4870 non-null int8
firstname             4870 non-null object
sex                   4870 non-null object
race                  4870 non-null object
h                     4870 non-null float32
l                     4870 non-null float32
call                  4870 non-null float32
city        

<b>Question 1:</b> What test is appropriate for this problem? Does CLT apply?

Here we want to compare two proportions: the callback rate for resumes with black-sounding names and the callback rate for resumes with white-sounding names. The sample is large (4,870 resumes), so a two-proportion z-test is a natural choice. The z-test requires the sampling distribution of each sample proportion to be approximately normal, which the CLT provides when the observations are drawn randomly and independently and when each group contains enough callbacks and non-callbacks.

The experimental setup suggests that the randomness condition holds, since names were assigned to resumes at random. I am somewhat skeptical about independence, though: several resumes were sent to the same job opening to test lower-quality versus higher-quality applicants regardless of race, so some observations may not be fully independent. If we accept both conditions, the large sample lets us treat the sampling distribution of the difference in proportions as approximately normal and use the z-statistic to test our hypothesis.
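As a quick sanity check on the normal approximation, the sketch below applies the common "success-failure" rule of thumb to each group. The threshold of 10 is a conventional textbook choice, not something the study specifies, and `normal_approx_ok` is a hypothetical helper name. The counts are the ones that appear in this notebook's contingency table.

```python
# Counts taken from this notebook's contingency table.
counts = {
    "black": {"callbacks": 157, "n": 2435},
    "white": {"callbacks": 235, "n": 2435},
}

def normal_approx_ok(successes, n, threshold=10):
    """Success-failure rule of thumb (threshold is an assumed convention):
    both the number of successes and the number of failures must reach it."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

checks = {group: normal_approx_ok(c["callbacks"], c["n"])
          for group, c in counts.items()}
print(checks)
```

Both groups clear the threshold by a wide margin, so the sample-size side of the CLT argument is not the concern here; independence is.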

<b>Question 2:</b> What are the null and alternate hypotheses?

As mentioned above, our null hypothesis is that the two proportions are equal, i.e. that the proportion of callbacks for people with black-sounding names is the same as the proportion of callbacks for people with white-sounding names. The alternate hypothesis will be that the proportions are different.

Let $p_b$ be the proportion of callbacks for resumes with black-sounding names and $p_w$ the corresponding proportion for white-sounding names. Then,<br>
$H_0$: $p_b = p_w$ <br>
$H_1$: $p_b \neq p_w$

<b>Question 3:</b> Compute margin of error, confidence interval, and p-value.

Let's first create a contingency table for our data set. It will come in handy in question 5 as well. From this table we can derive our sample proportions, $p_b$ and $p_w$.

In [87]:
# number of resumes with black-sounding names that received a callback
call_b = int(data[data.race == 'b'].call.sum())

# number of resumes with white-sounding names that received a callback
call_w = int(data[data.race == 'w'].call.sum())

# resumes with black-sounding names: total, and number without a callback
n_b = data[data.race == 'b'].race.count()
nocall_b = n_b - call_b

# resumes with white-sounding names: total, and number without a callback
n_w = data[data.race == 'w'].race.count()
nocall_w = n_w - call_w

In [77]:
contingency_table = pd.DataFrame([[call_b, call_w], [nocall_b, nocall_w]], index = ['callback', 'no callback'], columns = ['black', 'white'])

In [78]:
contingency_table

Unnamed: 0,black,white
callback,157,235
no callback,2278,2200


In [88]:
# callback proportions for black-sounding and white-sounding names

p_b = call_b/n_b
p_w = call_w/n_w

In [89]:
p_b

0.064476386036960986

In [90]:
p_w

0.096509240246406572

Now let's calculate our pooled sample proportion:<br>
$$p = \frac{p_b n_b + p_w n_w}{n_b + n_w}$$

In [91]:
p=(p_b*n_b + p_w*n_w)/(n_b+n_w)
p

0.080492813141683772

Then, our standard error is calculated as:<br>
$$SE = \sqrt{p(1 - p)\left(\frac{1}{n_b} + \frac{1}{n_w}\right)}$$

In [93]:
SE = np.sqrt(p*(1-p)*(1/n_b + 1/n_w))
SE

0.0077968940361704568

For a 99% confidence level, the critical z-value is approximately 2.58. Hence, our margin of error at 99% is plus or minus:

In [97]:
mar_error = 2.58 * SE 
mar_error

0.020115986613319779

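A 99% confidence interval for the difference between the groups would be centered on $p_b - p_w$ and would use the unpooled standard error. The interval computed here is instead centered on the pooled proportion, so it describes the overall callback rate under the null hypothesis: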

In [99]:
[p - mar_error, p + mar_error]

[0.060376826528363993, 0.10060879975500356]
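For comparison, here is a minimal sketch of a 99% confidence interval for the difference $p_b - p_w$ itself. It is not the calculation performed above: it assumes the unpooled standard error (the usual choice for an interval, as opposed to a test) and takes the critical value from the standard library's `statistics.NormalDist` rather than hard-coding 2.58. The counts come from the contingency table.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the contingency table above.
call_b, n_b = 157, 2435
call_w, n_w = 235, 2435

p_b = call_b / n_b
p_w = call_w / n_w
diff = p_b - p_w

# Unpooled standard error: each group keeps its own variance estimate.
se_diff = sqrt(p_b * (1 - p_b) / n_b + p_w * (1 - p_w) / n_w)

# Two-sided 99% critical value (about 2.576).
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)

margin = z_crit * se_diff
ci = (diff - margin, diff + margin)
print(ci)
```

The resulting interval is roughly (-0.052, -0.012). It lies entirely below zero, which agrees with rejecting the hypothesis of equal callback rates at the 1% level.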

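Finally, our z-score is equal to $$z = (p_b - p_w)/SE$$

The cell below computes the lower-tail area for this z-score. Because the alternate hypothesis is two-sided, the p-value is twice that area, about 4.0e-05:

In [101]:
z = (p_b - p_w)/SE
stats.norm.cdf(z)  # one-tail area; the two-sided p-value is 2x this value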

1.9919434187925383e-05

The p-value is the probability of observing a difference at least as extreme as ours if the null hypothesis were true. It is far below 1%, so we can reject the null hypothesis that the two proportions are equal at the 1% significance level.
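The whole test can be packaged into one small function. The sketch below is an illustrative helper (the name `two_proportion_ztest` is hypothetical and is not a SciPy function). It uses only the standard library and returns a two-sided p-value, which matches the two-sided alternate hypothesis stated in question 2.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided: count both tails beyond |z|.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# Callback counts and group sizes from the contingency table.
z, p_value = two_proportion_ztest(157, 2435, 235, 2435)
print(z, p_value)
```

With the notebook's counts this gives a z-score of about -4.11 and a two-sided p-value of about 4.0e-05, still well below the 1% threshold.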

<b>Question 4:</b> Write a story describing the statistical significance in the context of the original problem.

Even at a conservative significance level (1%), we can reject the null hypothesis that the proportions are equal and conclude that resumes with black-sounding names receive callbacks at a different rate than resumes with white-sounding names. With a relatively large sample of 2,435 resumes in each group, we observed callback rates of 6.4% for black-sounding names and 9.7% for white-sounding names. We assumed at the outset that each resume was assigned a name at random and sent out independently of the others, which should guard against general sampling bias.

<b> Question 5: </b> Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

This hypothesis test is only part of the picture. It suggests that name-based/race-based discrimination exists and matters, but one isolated test does not show that race/name is the most important factor in getting a callback. The 4,870 resumes in this experiment were not 100% identical: some listed more honors, and some had "applicants" living in better neighborhoods. The job openings themselves also differed, although the majority should have been in similar industries. A host of other factors therefore influenced the callbacks.

To improve this specific hypothesis test (in the absence of any regression analysis), I would narrow down the sample so that all other criteria are the same. For instance, I would compare callback proportions among resumes assigned female names only, then among resumes assigned male names only, and then repeat the comparison for other factors (equal honors, same neighborhood). Perhaps there is gender discrimination as well as name discrimination, so I would also want to test the proportions of callbacks for females vs. males. In other words, the hypothesis test above is just one piece of a bigger puzzle that needs to be carefully considered and investigated.
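The stratified comparison described above can be sketched as follows. The helper `callback_rates` is hypothetical, and the records are made-up toy rows used only to show the mechanics; they are NOT the study data and say nothing about the real callback rates.

```python
from collections import defaultdict

def callback_rates(records, stratum_key):
    """Callback rate for each (stratum value, race) pair."""
    totals = defaultdict(lambda: [0, 0])  # (stratum, race) -> [callbacks, resumes]
    for rec in records:
        cell = totals[(rec[stratum_key], rec["race"])]
        cell[0] += rec["call"]
        cell[1] += 1
    return {key: calls / n for key, (calls, n) in totals.items()}

# Made-up toy records for illustration only (not from the dataset).
toy = [
    {"race": "b", "sex": "f", "call": 0},
    {"race": "b", "sex": "f", "call": 1},
    {"race": "w", "sex": "f", "call": 1},
    {"race": "w", "sex": "f", "call": 1},
    {"race": "b", "sex": "m", "call": 0},
    {"race": "w", "sex": "m", "call": 0},
]

rates = callback_rates(toy, "sex")
print(rates)
```

On the notebook's DataFrame, `data.groupby(['sex', 'race']).call.mean()` should produce the same kind of table, after which the two-proportion test can be repeated within each stratum.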

The last analysis I wanted to try for the problem is the chi-square test for independence. We have our contingency table already in place, so it should be quite straightforward. The null hypothesis for the test is that race and callbacks are independent, while the alternate hypothesis is that the two variables are not independent.

In [103]:
stats.chi2_contingency(contingency_table)

(16.449028584189371, 4.9975783899632552e-05, 1, array([[  196.,   196.],
        [ 2239.,  2239.]]))

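The p-value in this case is also far below 1%. Note that `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so the statistic is slightly smaller than it would be without the correction: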

In [108]:
stats.chi2_contingency(contingency_table)[1]

4.9975783899632552e-05

Based on this p-value, we can reject the null hypothesis that the two variables are independent at the 1% significance level. The test once again suggests a statistically significant association between race and callbacks. We just don't know how strong this relationship is compared to other factors.
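To see where these numbers come from, the sketch below recomputes the chi-square statistic by hand from the contingency table, with and without Yates' continuity correction. It uses only the standard library; `chi2_stat` is a hypothetical helper, and the closed-form tail probability via `erfc` is valid only for one degree of freedom, as in this 2×2 table.

```python
from math import erfc, sqrt

# Observed counts: rows are callback / no callback, columns are black / white.
observed = [[157, 235], [2278, 2200]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected counts under independence: row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

def chi2_stat(obs, exp, yates=False):
    """Pearson chi-square; optionally shrink each |O - E| by 0.5 (Yates)."""
    stat = 0.0
    for obs_row, exp_row in zip(obs, exp):
        for o, e in zip(obs_row, exp_row):
            dev = abs(o - e)
            if yates:
                dev = max(dev - 0.5, 0.0)
            stat += dev ** 2 / e
    return stat

chi2_plain = chi2_stat(observed, expected)
chi2_yates = chi2_stat(observed, expected, yates=True)

# For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
p_plain = erfc(sqrt(chi2_plain / 2))
p_yates = erfc(sqrt(chi2_yates / 2))
print(chi2_plain, p_plain)
print(chi2_yates, p_yates)
```

The Yates-corrected values reproduce the `chi2_contingency` output above. The uncorrected statistic equals the square of the z-score from question 3, so the two tests are the same test in different clothing, and its p-value matches the two-sided z-test p-value.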