In [1]:
import pandas as pd
import numpy as np
from scipy import stats
from scipy.stats import ttest_ind
import statsmodels.formula.api as smf

In [2]:
data = pd.read_stata('resume_experiment.dta')
data.head()

Unnamed: 0,education,ofjobs,yearsexp,computerskills,call,female,black
0,4,2,6,1,0.0,1.0,0.0
1,3,3,6,1,0.0,1.0,0.0
2,4,1,6,1,0.0,1.0,1.0
3,3,4,6,1,0.0,1.0,1.0
4,3,3,22,1,0.0,1.0,0.0


In [3]:
#First, we create two separate data frames for the treatment (Black) and control (non-Black) groups,
#each of which contains 2435 observations.
treatment=data[data['black']==1]
control=data[data['black']==0]

In [4]:
treatment.shape

(2435, 7)

In [5]:
control.shape

(2435, 7)

### Exercise 1: Do gender and computer skills look balanced across race groups? (1 point)

In [6]:
#check means by treatment group first to see practical difference
data.groupby('black')[['female','yearsexp','computerskills']].mean()

black,female,yearsexp,computerskills
0.0,0.76386,7.856263,0.808624
1.0,0.774538,7.829569,0.832444


#### Answer
1. Because the treatment and control groups have the same number of observations, their means are directly comparable. The aggregated table suggests that gender, years of experience, and computer skills are similarly distributed across the two groups.
2. There are small differences in means between the two groups, but we cannot yet tell whether they are statistically significant. We need to check the t-test results.

In [7]:
#Check t-tests between the two groups

#Gender
print ('Number of females in the treatment group:', len(treatment[treatment['female']==1]))
print ('Number of females in the control group:', len(control[control['female']==1]))
print ('T-test for gender:', stats.ttest_ind(np.array(treatment['female']), np.array(control['female'])), '\n')

#Computer skills (counts of applicants whose resume lists computer skills)
print ('Number of applicants with computer skills in the treatment group:', len(treatment[treatment['computerskills']==1]))
print ('Number of applicants with computer skills in the control group:', len(control[control['computerskills']==1]))
print ('T-test for computer skills:', stats.ttest_ind(np.array(treatment['computerskills']), np.array(control['computerskills'])), '\n')

#Years of experience (the counts cover only applicants with exactly 1 year of experience)
print ('Number of applicants with 1 year of experience in the treatment group:', len(treatment[treatment['yearsexp']==1]))
print ('Number of applicants with 1 year of experience in the control group:', len(control[control['yearsexp']==1]))
print ('T-test for years of experience:', stats.ttest_ind(np.array(treatment['yearsexp']), np.array(control['yearsexp'])))


Number of females in the treatment group: 1886
Number of females in the control group: 1860
T-test for gender: Ttest_indResult(statistic=0.8841321018026016, pvalue=0.37666856909823254) 

Number of applicants with computer skills in the treatment group: 2027
Number of applicants with computer skills in the control group: 1969
T-test for computer skills: Ttest_indResult(statistic=2.1664271042751966, pvalue=0.030326933955391936) 

Number of applicants with 1 year of experience in the treatment group: 19
Number of applicants with 1 year of experience in the control group: 26
T-test for years of experience: Ttest_indResult(statistic=-0.18461970685747395, pvalue=0.8535350182481283)


#### Answer
1. The t-test results indicate that gender and years of experience are balanced across the treatment and control groups, since both p-values are larger than 0.05.
2. However, the share of resumes listing computer skills is significantly higher in the treatment group than in the control group (p = 0.03).
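A significant t-test does not say how large an imbalance is. As a rough complement, the sketch below computes a standardized mean difference (SMD) for the two binary covariates, using only the group means from the table above. The variance formula p(1 - p) assumes 0/1 indicators, and the 0.1 cutoff is a common rule of thumb rather than part of the assignment.

```python
import math

# Group means copied from the groupby table above (black == 1 is treatment).
means = {
    'female': {'treatment': 0.774538, 'control': 0.76386},
    'computerskills': {'treatment': 0.832444, 'control': 0.808624},
}

def smd_binary(p_treatment, p_control):
    """Standardized mean difference for a 0/1 variable.

    The variance of a 0/1 variable with mean p is p * (1 - p); the
    denominator pools the two group variances with equal weight.
    """
    pooled_var = (p_treatment * (1 - p_treatment) + p_control * (1 - p_control)) / 2
    return (p_treatment - p_control) / math.sqrt(pooled_var)

smd = {name: smd_binary(m['treatment'], m['control']) for name, m in means.items()}
for name, value in smd.items():
    # |SMD| < 0.1 is a commonly used (informal) threshold for "small" imbalance.
    print('{}: SMD = {:.3f}'.format(name, value))
```

Under these assumptions both values fall below 0.1, so the computer skills gap is statistically detectable but small in standardized terms.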

### Exercise 2: Does education and the number of previous jobs look balanced across racial groups? (2 points)

In [8]:
#compare educational level distribution among two groups
treatment_edu=treatment.groupby(['education'])['black'].agg('count')
print (treatment_edu)

education
0      28
1      22
2     132
3     493
4    1760
Name: black, dtype: int64


In [9]:
control_edu=control.groupby(['education'])['black'].agg('count')
print (control_edu)

education
0      18
1      18
2     142
3     513
4    1744
Name: black, dtype: int64


##### Note: The education levels appear to be similarly distributed across the two groups. We run a chi-square test next to check whether the distributions differ between the groups.

In [10]:
#chi-square test on education level (treatment counts as observed, control counts as expected)
stats.chisquare(np.array(treatment_edu), np.array(control_edu))

Power_divergenceResult(statistic=8.075185882899378, pvalue=0.08886254294681423)
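`stats.chisquare` is a goodness-of-fit test: it treats the control counts as fixed expected frequencies. An alternative is a chi-square test of homogeneity on the full 2x5 table, which treats both groups as samples. The sketch below uses `scipy.stats.chi2_contingency` with the counts copied from the two outputs above; the variable names are ours.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Education counts (levels 0-4) copied from the two groupby outputs above.
treatment_counts = [28, 22, 132, 493, 1760]
control_counts = [18, 18, 142, 513, 1744]

# Rows are groups, columns are education levels.
table = np.array([treatment_counts, control_counts])

# Returns the test statistic, p-value, degrees of freedom,
# and the expected counts under the null of identical distributions.
chi2_stat, p_value, dof, expected = chi2_contingency(table)
print('chi2 = {:.3f}, dof = {}, p-value = {:.3f}'.format(chi2_stat, dof, p_value))
```

With two groups and five levels the test has 4 degrees of freedom. Its p-value will generally differ from the goodness-of-fit result, so the two should not be compared number for number.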

In [11]:
#compare means of number of previous jobs between two groups
data.groupby('black').mean()['ofjobs']

black
0.0    3.664476
1.0    3.658316
Name: ofjobs, dtype: float64

In [12]:
#t-test on ofjobs
print ('T-test for number of previous jobs:', stats.ttest_ind(np.array(treatment['ofjobs']), np.array(control['ofjobs'])))

T-test for number of previous jobs: Ttest_indResult(statistic=-0.17629292771974545, pvalue=0.8600711511288889)


#### Answer
Given a standard significance level of 0.05, we found no statistical evidence of a difference in the distribution of education level or number of previous jobs between the treatment and control groups.

### Exercise 3: 
1. What do you make of the overall results on resume characteristics? 
2. Why do we care about whether these variables look similar across the race groups? (1 point)

In [13]:
data.head(2)

Unnamed: 0,education,ofjobs,yearsexp,computerskills,call,female,black
0,4,2,6,1,0.0,1.0,0.0
1,3,3,6,1,0.0,1.0,0.0


#### Answer
1. Overall, the treatment and control groups are well balanced on most features: gender, years of experience, number of previous jobs, and education. The level of computer skills, however, does differ between the groups.
2. Balance matters because the only systematic difference between the treatment and control groups should be the treatment variable, which in our case is race. Only then can we infer that a difference in outcomes between the two groups is "caused" by race. In our case, the only concern is computer skills: the treatment group has a higher average than the control group. This baseline difference could be problematic, because computer skills are something employers look for and could well affect their callback decisions. If we see a difference in the outcome variable, it could reflect the difference in computer skills as well as race, which makes it harder to reach a solid causal inference.

## Estimating Effect of Race

### Exercise 4
The variable of interest in the data set is the variable call, which indicates a call back for an interview. Perform a two-sample t-test comparing applicants with black sounding names and white sounding names.

In [14]:
#compare mean
data.groupby('black').mean()['call']

black
0.0    0.096509
1.0    0.064476
Name: call, dtype: float32

In [15]:
#t-test
stats.ttest_ind(np.array(treatment['call']), np.array(control['call']))

Ttest_indResult(statistic=-4.114705290861751, pvalue=3.940802103128886e-05)

The p-value (about 3.9e-05) is far below 0.05, which is statistical evidence to reject the null hypothesis that the population means are equal. The callback rates of the treatment and control groups differ: about 6.4% for the treatment group versus 9.7% for the control group.
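Because `call` is a 0/1 outcome, the same comparison can be cross-checked with a two-proportion z-test. The sketch below uses only the standard library. The callback counts (157 and 235) are not printed anywhere in the notebook; they are back-calculated from the group means times 2435 and should be read as an assumption.

```python
import math

n_treatment = n_control = 2435
# Assumed counts: 0.064476 * 2435 and 0.096509 * 2435, rounded.
calls_treatment, calls_control = 157, 235

p_treatment = calls_treatment / n_treatment
p_control = calls_control / n_control

# Pooled callback rate under the null hypothesis of equal proportions.
p_pooled = (calls_treatment + calls_control) / (n_treatment + n_control)
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_treatment + 1 / n_control))

z = (p_treatment - p_control) / se
# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print('difference = {:.4f}, z = {:.3f}, p-value = {:.2e}'.format(
    p_treatment - p_control, z, p_value))
```

The z statistic should land close to the t statistic above, since with samples this large the two tests are nearly equivalent.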


### Exercise 5
Now, use a regression model to estimate the differential likelihood of being called back by applicant race (i.e. the racial discrimination by employers).

In [16]:
smf.logit('call ~ black', data).fit().summary()

Optimization terminated successfully.
         Current function value: 0.278228
         Iterations 7


Dep. Variable:,call,No. Observations:,4870.0
Model:,Logit,Df Residuals:,4868.0
Method:,MLE,Df Model:,1.0
Date:,"Tue, 11 Feb 2020",Pseudo R-squ.:,0.006228
Time:,00:55:43,Log-Likelihood:,-1355.0
converged:,True,LL-Null:,-1363.5
,,LLR p-value:,3.771e-05

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-2.2366,0.069,-32.590,0.000,-2.371,-2.102
black,-0.4382,0.107,-4.083,0.000,-0.649,-0.228


#### Answer
1. The log odds of a callback change by -0.438 (95% CI: -0.649, -0.228) from "white" candidates to "black" candidates.
2. After exponentiating the coefficients, "white" candidates have odds of 0.107 of being called, while the odds for "black" candidates are multiplied by a factor of 0.645 (95% CI: 0.523, 0.796), a decrease of about 35%.
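The conversion from logit coefficients to odds and probabilities can be sketched with the standard library alone. The coefficients are copied from the summary table above; the helper name `odds_to_prob` is ours.

```python
import math

# Coefficients and CI bounds copied from the logit summary above.
intercept = -2.2366
coef_black = -0.4382
ci_low, ci_high = -0.649, -0.228

def odds_to_prob(odds):
    """Convert odds to a probability: p = odds / (1 + odds)."""
    return odds / (1 + odds)

odds_white = math.exp(intercept)               # baseline odds (black == 0)
odds_ratio = math.exp(coef_black)              # multiplicative change for black == 1
odds_black = odds_white * odds_ratio
or_ci = (math.exp(ci_low), math.exp(ci_high))  # CI for the odds ratio

p_white = odds_to_prob(odds_white)
p_black = odds_to_prob(odds_black)

print('odds (white) = {:.3f}, odds ratio = {:.3f}'.format(odds_white, odds_ratio))
print('odds ratio 95% CI = ({:.3f}, {:.3f})'.format(*or_ci))
print('P(call | white) = {:.4f}, P(call | black) = {:.4f}'.format(p_white, p_black))
```

With a single binary regressor the implied probabilities reproduce the group callback rates from Exercise 4 (about 0.0965 and 0.0645), which is a useful sanity check on the interpretation.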

### Exercise 6
Now let’s see if we can improve our estimates by adding in other variables as controls. Add in education, yearsexp, female, and computerskills – be sure to treat education as a categorical variable!

In [17]:
smf.logit('call ~ black + female + yearsexp + computerskills + C(education)', data).fit().summary()

Optimization terminated successfully.
         Current function value: 0.276112
         Iterations 7


Dep. Variable:,call,No. Observations:,4870.0
Model:,Logit,Df Residuals:,4861.0
Method:,MLE,Df Model:,8.0
Date:,"Tue, 11 Feb 2020",Pseudo R-squ.:,0.01378
Time:,00:55:43,Log-Likelihood:,-1344.7
converged:,True,LL-Null:,-1363.5
,,LLR p-value:,8.967e-06

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-2.4498,0.620,-3.949,0.000,-3.666,-1.234
C(education)[T.1],-0.0158,0.852,-0.019,0.985,-1.685,1.653
C(education)[T.2],0.0116,0.640,0.018,0.986,-1.242,1.265
C(education)[T.3],-0.0135,0.612,-0.022,0.982,-1.214,1.187
C(education)[T.4],-0.0427,0.604,-0.071,0.944,-1.226,1.140
black,-0.4352,0.108,-4.042,0.000,-0.646,-0.224
female,0.1638,0.137,1.198,0.231,-0.104,0.432
yearsexp,0.0370,0.009,3.989,0.000,0.019,0.055
computerskills,-0.2355,0.138,-1.710,0.087,-0.505,0.034


#### Answer
1. Holding the other covariates fixed, the log odds of a callback change by -0.435 (95% CI: -0.646, -0.224) from "white" candidates to "black" candidates.
2. After exponentiating the coefficients, a "white" candidate who is male, has 0 years of experience, no computer skills, and no reported education level has odds of 0.086 of being called. For a "black" candidate with exactly the same features, the odds are multiplied by a factor of 0.647 (95% CI: 0.524, 0.799).

## Estimating Heterogeneous Effects

### Exercise 7
These effects are the average effects. Now let’s look for heterogeneous treatment effects.

Look only at candidates with high educations. Is there more or less racial discrimination among these highly educated candidates?

In [18]:
# high education level model
smf.logit('call ~ black + female + yearsexp + computerskills', data[data['education']==4]).fit().summary()

Optimization terminated successfully.
         Current function value: 0.273868
         Iterations 7


Dep. Variable:,call,No. Observations:,3504.0
Model:,Logit,Df Residuals:,3499.0
Method:,MLE,Df Model:,4.0
Date:,"Tue, 11 Feb 2020",Pseudo R-squ.:,0.009338
Time:,00:55:43,Log-Likelihood:,-959.63
converged:,True,LL-Null:,-968.68
,,LLR p-value:,0.001185

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-2.5569,0.207,-12.326,0.000,-2.963,-2.150
black,-0.3984,0.127,-3.130,0.002,-0.648,-0.149
female,0.2881,0.150,1.924,0.054,-0.005,0.582
yearsexp,0.0235,0.012,1.936,0.053,-0.000,0.047
computerskills,-0.1456,0.159,-0.916,0.360,-0.457,0.166


In [19]:
# low education level model
smf.logit('call ~ black + female + yearsexp + computerskills', data[data['education'] != 4]).fit().summary()

Optimization terminated successfully.
         Current function value: 0.276776
         Iterations 7


Dep. Variable:,call,No. Observations:,1366.0
Model:,Logit,Df Residuals:,1361.0
Method:,MLE,Df Model:,4.0
Date:,"Tue, 11 Feb 2020",Pseudo R-squ.:,0.0419
Time:,00:55:43,Log-Likelihood:,-378.08
converged:,True,LL-Null:,-394.61
,,LLR p-value:,1.159e-06

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-1.9064,0.305,-6.251,0.000,-2.504,-1.309
black,-0.5518,0.203,-2.716,0.007,-0.950,-0.154
female,-0.5588,0.317,-1.762,0.078,-1.180,0.063
yearsexp,0.0702,0.015,4.587,0.000,0.040,0.100
computerskills,-0.4576,0.263,-1.740,0.082,-0.973,0.058


#### Answer
1. For higher educated candidates: 
    a. The log odds of a callback change by -0.398 (95% CI: -0.648, -0.149) from "white" candidates to "black" candidates.
    b. After exponentiating the coefficients, a "white" candidate who is male, has 0 years of experience and no computer skills has odds of 0.078 of being called. For a "black" candidate with exactly the same features, the odds are multiplied by a factor of 0.671 (95% CI: 0.523, 0.862). 

2. For lower educated candidates:
    a. The log odds of a callback change by -0.552 (95% CI: -0.950, -0.154) from "white" candidates to "black" candidates.
    b. After exponentiating the coefficients, a "white" candidate who is male, has 0 years of experience and no computer skills has odds of 0.149 of being called. For a "black" candidate with exactly the same features, the odds are multiplied by a factor of 0.576 (95% CI: 0.387, 0.857). 

3. Analysis:
The point estimates suggest that candidates with a higher education level face less racial discrimination than candidates with a lower education level. But because the confidence intervals of the "black" coefficient in the two models overlap substantially, we cannot conclude that the effect of race really differs between the two education groups.
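Overlapping confidence intervals are only an informal check. A slightly more direct approach is a z-test on the difference between the two "black" coefficients. The sketch below assumes the two subsamples are independent (they are disjoint subsets of the data) and relies on the usual normal approximation; the coefficients and standard errors are copied from the two summaries above.

```python
import math

# "black" coefficient and standard error from each model above.
b_high, se_high = -0.3984, 0.127   # education == 4
b_low, se_low = -0.5518, 0.203     # education != 4

# For independent estimates the variances add.
diff = b_high - b_low
se_diff = math.sqrt(se_high ** 2 + se_low ** 2)

z = diff / se_diff
# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print('difference = {:.4f}, SE = {:.4f}, z = {:.3f}, p-value = {:.3f}'.format(
    diff, se_diff, z, p_value))
```

Under these assumptions the difference is well within sampling noise, which is consistent with the cautious reading above. An equivalent alternative would be a single model with a `black` by education interaction term.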

### Exercise 8
Now let’s compare men and women – is discrimination greater for Black men or Black women?

In [20]:
#compare mean
treatment.groupby('female').mean()['call']

female
0.0    0.058288
1.0    0.066278
Name: call, dtype: float32

In [21]:
#t-test
stats.ttest_ind(np.array(treatment[treatment['female']==1]['call']), np.array(treatment[treatment['female']==0]['call']))

Ttest_indResult(statistic=0.6706419018702243, pvalue=0.5025123365847134)

#### Answer
Given a standard significance level of 0.05, we found no statistical evidence of a difference in callback rates between Black men and Black women.
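Comparing callback rates among Black applicants alone does not show whether the racial gap itself differs by gender. One way to look at that is a logit model with a `black` by `female` interaction. The sketch below is illustrative only: it fits the model on synthetic data generated with made-up coefficients, so its numbers say nothing about the resume experiment. On the real data, the same formula could be passed to `smf.logit` with `data`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the resume data (hypothetical values, not real results).
rng = np.random.default_rng(0)
n = 4000
synthetic = pd.DataFrame({
    'black': rng.integers(0, 2, size=n),
    'female': rng.integers(0, 2, size=n),
})

# Made-up data-generating process with no true interaction.
log_odds = -2.2 - 0.4 * synthetic['black'] + 0.1 * synthetic['female']
prob = 1 / (1 + np.exp(-log_odds.to_numpy()))
synthetic['call'] = rng.binomial(1, prob)

# 'black * female' expands to black + female + black:female.
interaction_model = smf.logit('call ~ black * female', synthetic).fit(disp=False)

# The black:female coefficient estimates how much the race gap (in log odds)
# differs for women relative to men.
print(interaction_model.params)
print(interaction_model.pvalues['black:female'])
```

A `black:female` coefficient close to zero would suggest a similar racial gap for men and women, while a clearly negative or positive one would suggest the gap differs by gender.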

### Exercise 9
Calculate and/or lookup the following online:

What is the share of applicants in our dataset with college degrees?

What share of Black adult Americans have college degrees (i.e. have completed a bachelors degree)?

In [22]:
whole_per = data[data['education']==4].shape[0] / data.shape[0] * 100
black_per = treatment[treatment['education']==4].shape[0] / treatment.shape[0] * 100

In [23]:
print('{} percent of applicants in our dataset have a college degree or higher'.format(whole_per))
print('{} percent of applicants with Black-sounding names in our dataset have a college degree or higher'.format(black_per))

71.95071868583163 percent of applicants in our dataset have a college degree or higher
72.27926078028747 percent of applicants with Black-sounding names in our dataset have a college degree or higher


According to the US Census Bureau, roughly 22% of Black Americans have at least a college degree, far below the 72% share in our dataset. Our dataset therefore does not reflect the educational distribution of the general population, and we cannot directly generalize our conclusions from this dataset to the general population.

### Exercise 10
What are the implications of your answers to Exercise 7 and to Exercise 9 to how you interpret the Average Treatment Effect you estimated in Exercise 6?

#### Answer
1. On Exercises 7 and 9: 

    a. The results suggest that there might be less racial discrimination among candidates with a college education or more, but we cannot be sure. From Exercise 9 we know that our dataset does not reflect the true distribution of education among Black adult Americans: it overrepresents college-educated candidates. So we cannot generalize the analysis directly to the general population. If candidates with higher education do experience less discrimination, the general population might on average experience more discrimination than this dataset indicates, since it contains a much larger share of people with lower education.
2. On Exercise 6: 

    a. Our model shows that "black" candidates in our dataset have lower odds of being called back than "non-black" candidates. But from Exercise 1 we know that the treatment and control groups differ at baseline: their computer skills differ significantly. The "black" candidates in our dataset are a highly educated group, so high computer skills are plausible, but the control group is similarly educated (about 72% of each group has education level 4), so education alone does not explain the gap. We therefore cannot be fully certain whether the difference in callbacks between the two groups is caused only by race or also by other features, although controlling for computer skills in Exercise 6 barely changed the race coefficient (-0.438 versus -0.435).
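The argument in 1a can be illustrated with a rough back-of-the-envelope reweighting. The sketch below takes the two subgroup "black" coefficients from Exercise 7 and averages them with two sets of weights: the college share in our dataset (about 72%) and the roughly 22% figure quoted in Exercise 9. Averaging log odds ratios this way is only an approximation, and it assumes the subgroup estimates (which are noisy and not significantly different from each other) carry over to the general population.

```python
# "black" coefficients (log odds ratios) from the Exercise 7 models.
effect_college = -0.3984      # education == 4
effect_no_college = -0.5518   # education != 4

def weighted_effect(share_college):
    """Average the two subgroup effects using the given college share."""
    return share_college * effect_college + (1 - share_college) * effect_no_college

sample_share = 0.7195       # college share in our dataset (Exercise 9)
population_share = 0.22     # approximate share quoted in Exercise 9

sample_weighted = weighted_effect(sample_share)
population_weighted = weighted_effect(population_share)

print('weighted by sample shares:     {:.3f}'.format(sample_weighted))
print('weighted by population shares: {:.3f}'.format(population_weighted))
```

Under these assumptions the population-weighted value is more negative than the sample-weighted one, which is the direction described in 1a. Given the uncertainty in the subgroup estimates, this is an illustration of the logic rather than an estimate.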