# Resume Experiment Analysis

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

In [2]:
resume = pd.read_stata('https://github.com/nickeubank/MIDS_Data/'
                       +'blob/master/resume_experiment/resume_experiment.dta?raw=true')
resume.head()

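|   | education | ofjobs | yearsexp | computerskills | call | female | black |
|---|---|---|---|---|---|---|---|
| 0 | 4 | 2 | 6 | 1 | 0.0 | 1.0 | 0.0 |
| 1 | 3 | 3 | 6 | 1 | 0.0 | 1.0 | 0.0 |
| 2 | 4 | 1 | 6 | 1 | 0.0 | 1.0 | 1.0 |
| 3 | 3 | 4 | 6 | 1 | 0.0 | 1.0 | 1.0 |
| 4 | 3 | 3 | 22 | 1 | 0.0 | 1.0 | 0.0 |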


## Exercise 1
### Check for balance

In [3]:
# Mean of each covariate within each treatment arm (black = 0 vs. black = 1)
check_balance_avg = resume.groupby('black').mean()
check_balance_avg = check_balance_avg[['yearsexp','computerskills','female']]
check_balance_avg

| black | yearsexp | computerskills | female |
|---|---|---|---|
| 0.0 | 7.856263 | 0.808624 | 0.76386 |
| 1.0 | 7.829569 | 0.832444 | 0.774538 |


In [4]:
# test for statistical significance of differences in means
ttest_yearsexp = stats.ttest_ind(resume[
    resume['black']==1]['yearsexp'], resume[resume['black']==0]['yearsexp'])
ttest_computerskills = stats.ttest_ind(resume[
    resume['black']==1]['computerskills'], resume[resume['black']==0]['computerskills'])
ttest_female =  stats.ttest_ind(resume[
    resume['black']==1]['female'], resume[resume['black']==0]['female'])
print('The t-statistic for the difference in years of experience is'
      , f'{round(ttest_yearsexp[0],2)} and the p-value is {round(ttest_yearsexp[1],2)}')
print('The t-statistic for the difference in computer skills is', 
      f'{round(ttest_computerskills[0],2)} and the p-value is {round(ttest_computerskills[1],2)}')
print('The t-statistic for the difference in gender is', 
      f'{round(ttest_female[0],2)} and the p-value is {round(ttest_female[1],2)}')


The t-statistic for the difference in years of experience is -0.18 and the p-value is 0.85
The t-statistic for the difference in computer skills is 2.17 and the p-value is 0.03
The t-statistic for the difference in gender is 0.88 and the p-value is 0.38


* The p-values for the differences across treatment arms are $0.85$ for years of experience ($> 0.05$), $0.03$ for computer skills ($< 0.05$), and $0.38$ for gender ($> 0.05$). At the 5% level, mean computer skills differ significantly across treatment arms, while years of experience and gender do not. Therefore, **years of experience and gender look balanced across race, but computer skills look imbalanced.**
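Significance tests depend on sample size, so a common complement is the standardized mean difference (SMD): the difference in group means divided by the pooled standard deviation. A rule of thumb often used in practice treats $|SMD| < 0.1$ as negligible imbalance; that cutoff is a convention, not part of this exercise. The sketch below computes the SMD for the two binary covariates from the group means in the balance table above (years of experience is omitted because its standard deviation is not printed).

```python
import math

# Group means from the balance table: (black = 0, black = 1)
means = {
    "computerskills": (0.808624, 0.832444),
    "female": (0.763860, 0.774538),
}


def smd_binary(p0: float, p1: float) -> float:
    """Standardized mean difference for a 0/1 variable.

    Within a group, a binary variable has variance p * (1 - p), so the pooled
    standard deviation is sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / 2).
    """
    pooled_sd = math.sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / 2)
    return (p1 - p0) / pooled_sd


smds = {name: smd_binary(*pair) for name, pair in means.items()}
for name, value in smds.items():
    print(f"{name}: SMD = {value:.3f}")
```

With these inputs the computer-skills SMD is about 0.06, which suggests the statistically significant gap is small in magnitude.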

## Exercise 2
### Check for education (categorical)

In [5]:
share_of_edu = pd.crosstab(resume['black'], resume['education'])
share_of_edu

| black \ education | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0.0 | 18 | 18 | 142 | 513 | 1744 |
| 1.0 | 28 | 22 | 132 | 493 | 1760 |


In [6]:
#chi-squared test for independence
chi2, p, dof, expected = stats.chi2_contingency(share_of_edu)
print(f'The chi-squared statistic is {round(chi2,2)} and the p-value is {round(p,2)}')

The chi-squared statistic is 3.41 and the p-value is 0.49


* The p-value of the chi-squared test is $0.49$ ($> 0.05$), so we cannot reject the hypothesis that education is independent of race: education looks balanced across racial groups.
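To show what `stats.chi2_contingency` computes, the sketch below recomputes the test by hand from the counts in the crosstab: expected counts under independence, the Pearson statistic, and its degrees of freedom. To avoid a SciPy dependency it uses the closed-form upper tail of the chi-squared distribution, which holds only for an even number of degrees of freedom (here 4). SciPy applies Yates' continuity correction only when there is 1 degree of freedom, so no correction is involved in this table.

```python
import math

import numpy as np

# Observed counts from the crosstab: rows = black (0, 1), columns = education 0..4
observed = np.array([
    [18, 18, 142, 513, 1744],
    [28, 22, 132, 493, 1760],
])

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Pearson chi-squared statistic and degrees of freedom (r - 1)(c - 1)
chi2_stat = float(((observed - expected) ** 2 / expected).sum())
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# For even dof k = 2m, the upper tail is exp(-x/2) * sum_{i<m} (x/2)^i / i!
m = dof // 2
half = chi2_stat / 2
p_value = math.exp(-half) * sum(half**i / math.factorial(i) for i in range(m))

print(f"chi2 = {chi2_stat:.2f}, dof = {dof}, p = {p_value:.2f}")
```

This reproduces the notebook's statistic of 3.41 and p-value of 0.49.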

## Exercise 3

* The average values of applicant gender (female) and years of experience (yearsexp), and the distribution of education (education), do not differ significantly across the two arms of the experiment (i.e. by black), so the data are balanced on these variables. However, average computer skills (computerskills) differ significantly across the two arms, so computer skills are imbalanced across race groups.

* By checking whether these variables look similar across race groups, we can judge whether our data are balanced between the control group (White-sounding names) and the treatment group (Black-sounding names).

* If the data are imbalanced, we should worry not only about baseline differences between the groups but also about the possibility that randomization failed. Either would be a threat to **internal validity**.




## Exercise 4
### Perform a two-sample t-test comparing applicants with Black-sounding and White-sounding names on callbacks

In [7]:
# First, compute the callback rate (mean of the 0/1 call variable) in each treatment arm
mean_calls = resume.groupby('black').mean()
mean_calls = mean_calls[['call']]
mean_calls

Unnamed: 0_level_0,call
black,Unnamed: 1_level_1
0.0,0.096509
1.0,0.064476


In [8]:
# Two-sample t-test (pooled variance, the scipy default) of callback rates
ttest_calls = stats.ttest_ind(resume[resume['black']==1]['call'], resume[resume['black']==0]['call'])
rate_white = mean_calls.loc[0, 'call']
rate_black = mean_calls.loc[1, 'call']
print('The callback rates for applicants with White- and Black-sounding names are',
      f'{round(rate_white*100, 2)}% and {round(rate_black*100, 2)}%, respectively.')
print('The t-statistic for the difference in calls is',
      f'{round(ttest_calls[0], 2)} and the p-value is {ttest_calls[1]}')
# Relative difference: change in the callback rate as a share of the White rate
per_diff = (rate_black - rate_white) / rate_white
print(f'The relative difference in callback rates is {round(per_diff*100, 2)}%')
# Absolute difference, expressed in percentage points
per_point_diff = rate_black - rate_white
print(f'The difference in callback rates is {round(per_point_diff*100, 2)} percentage points')


The callback rates for applicants with White- and Black-sounding names are 9.65% and 6.45%, respectively.
The t-statistic for the difference in calls is -4.11 and the p-value is 3.940800981423711e-05
The relative difference in callback rates is -33.19%
The difference in callback rates is -3.2 percentage points


* In relative terms, applicants with a Black-sounding name are $33.19\%$ less likely to get a callback for an interview than applicants with a White-sounding name.

* In absolute terms, the callback rate for Black-sounding names is $3.2$ percentage points lower ($6.45\%$ vs. $9.65\%$).

* Since the p-value is $3.94 \times 10^{-5}$ ($< 0.05$), this difference is statistically significant.
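The percentage-point and relative differences are easy to blur, so the arithmetic is spelled out below. The notebook does not print callback counts; 235 and 157 are the counts implied by the reported rates and the 2,435 resumes per arm visible in the education crosstab, so treat them as a reconstruction. The sketch also runs a two-proportion z-test, a large-sample alternative to the t-test above; with a binary outcome and this many observations the two statistics are nearly identical.

```python
import math

# Arm sizes from the education crosstab and callback counts implied by the
# reported rates (235 / 2435 = 9.65%, 157 / 2435 = 6.45%); a reconstruction.
n_white, n_black = 2435, 2435
calls_white, calls_black = 235, 157

rate_white = calls_white / n_white
rate_black = calls_black / n_black

pp_diff = rate_black - rate_white   # absolute difference, as a proportion
rel_diff = pp_diff / rate_white     # difference relative to the White rate

# Two-proportion z-test using the pooled proportion under H0 (equal rates)
p_pool = (calls_white + calls_black) / (n_white + n_black)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_white + 1 / n_black))
z = pp_diff / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

print(f"{pp_diff * 100:.2f} percentage points, {rel_diff * 100:.2f}% relative, z = {z:.2f}")
```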

## Exercise 5
### Use a linear probability model to estimate the differential likelihood of being called back by applicant race

In [9]:
# linear regression model
import statsmodels.api as sm
import statsmodels.formula.api as smf

# only use the variable: black
X = resume['black']
y = resume['call']
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
model.get_robustcov_results(cov_type='HC3').summary()

| Dep. Variable: | call | R-squared: | 0.003 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.003 |
| Method: | Least Squares | F-statistic: | 16.92 |
| Date: | Mon, 20 Feb 2023 | Prob (F-statistic): | 3.96e-05 |
| Time: | 21:00:08 | Log-Likelihood: | -562.24 |
| No. Observations: | 4870 | AIC: | 1128.0 |
| Df Residuals: | 4868 | BIC: | 1141.0 |
| Df Model: | 1 | | |
| Covariance Type: | HC3 | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0965 | 0.006 | 16.121 | 0.000 | 0.085 | 0.108 |
| black | -0.0320 | 0.008 | -4.114 | 0.000 | -0.047 | -0.017 |

| Omnibus: | 2969.205 | Durbin-Watson: | 1.44 |
|---|---|---|---|
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 18927.068 |
| Skew: | 3.068 | Prob(JB): | 0.0 |
| Kurtosis: | 10.458 | Cond. No. | 2.62 |


* The coefficient on `black` is $-0.0320$: having a Black-sounding name (as opposed to a White-sounding name) lowers the probability of getting a callback for an interview by about $3.20$ percentage points. This equals the difference in callback rates from Exercise 4 and is statistically significant ($p < 0.05$).
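With a single binary regressor, OLS reproduces the group means: the intercept is the White callback rate and the slope is the difference in rates. The numpy sketch below refits the model on a `call` vector rebuilt from the implied counts (235 and 157 callbacks out of 2,435 resumes per arm, a reconstruction rather than the raw data) and computes HC3 standard errors by hand. HC3 rescales each squared residual by $1/(1-h_{ii})^2$, where $h_{ii}$ is the observation's leverage. The result should match the coefficient, standard error, and t-statistic for `black` in the table above.

```python
import numpy as np

# Rebuild data consistent with the reported rates (reconstruction, not raw data):
# 2,435 resumes per arm, 235 White-name and 157 Black-name callbacks.
n = 2435
black = np.repeat([0.0, 1.0], n)
call = np.concatenate([
    np.r_[np.ones(235), np.zeros(n - 235)],
    np.r_[np.ones(157), np.zeros(n - 157)],
])

# Design matrix: constant + black indicator; fit by least squares
X = np.column_stack([np.ones_like(black), black])
beta, *_ = np.linalg.lstsq(X, call, rcond=None)
resid = call - X @ beta

# Leverage h_ii = x_i' (X'X)^{-1} x_i for every observation
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

# HC3 sandwich: (X'X)^{-1} X' diag(e_i^2 / (1 - h_ii)^2) X (X'X)^{-1}
weights = resid**2 / (1 - h) ** 2
meat = X.T @ (X * weights[:, None])
cov_hc3 = XtX_inv @ meat @ XtX_inv
se = np.sqrt(np.diag(cov_hc3))
t = beta / se

print(beta.round(4), se.round(3), t.round(3))
```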

## Exercise 6
### Improve estimates by adding in other variables as controls

In [10]:
# Add controls: education (categorical), yearsexp, female, and computerskills
resume['education'] = resume['education'].astype('category')
formula = 'call ~ black + C(education) + yearsexp + female + computerskills'
model = smf.ols(formula, data=resume).fit()
model.get_robustcov_results(cov_type='HC3').summary()


| Dep. Variable: | call | R-squared: | 0.008 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.006 |
| Method: | Least Squares | F-statistic: | 4.35 |
| Date: | Mon, 20 Feb 2023 | Prob (F-statistic): | 3.04e-05 |
| Time: | 21:00:08 | Log-Likelihood: | -551.02 |
| No. Observations: | 4870 | AIC: | 1120.0 |
| Df Residuals: | 4861 | BIC: | 1178.0 |
| Df Model: | 8 | | |
| Covariance Type: | HC3 | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.0821 | 0.040 | 2.053 | 0.040 | 0.004 | 0.160 |
| C(education)[T.1] | -0.0017 | 0.057 | -0.030 | 0.976 | -0.113 | 0.110 |
| C(education)[T.2] | -8.953e-05 | 0.042 | -0.002 | 0.998 | -0.082 | 0.082 |
| C(education)[T.3] | -0.0025 | 0.039 | -0.065 | 0.948 | -0.079 | 0.074 |
| C(education)[T.4] | -0.0047 | 0.038 | -0.124 | 0.901 | -0.080 | 0.070 |
| black | -0.0316 | 0.008 | -4.076 | 0.000 | -0.047 | -0.016 |
| yearsexp | 0.0032 | 0.001 | 3.665 | 0.000 | 0.001 | 0.005 |
| female | 0.0112 | 0.010 | 1.165 | 0.244 | -0.008 | 0.030 |
| computerskills | -0.0186 | 0.011 | -1.616 | 0.106 | -0.041 | 0.004 |

| Omnibus: | 2950.646 | Durbin-Watson: | 1.448 |
|---|---|---|---|
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 18631.25 |
| Skew: | 3.047 | Prob(JB): | 0.0 |
| Kurtosis: | 10.395 | Cond. No. | 225.0 |


* After adding education, yearsexp, female, and computerskills as controls, the coefficient on `black` is $-0.0316$: holding education, years of experience, gender, and computer skills constant, having a Black-sounding name (as opposed to a White-sounding name) lowers the probability of getting a callback by about $3.16$ percentage points. This result is statistically significant ($p < 0.05$).

* The adjusted $R^2$ rises from $0.003$ to $0.006$, so the controls explain slightly more of the variation in callbacks. The coefficient on `black` barely moves ($-0.0320$ to $-0.0316$), as expected if names were randomly assigned.

## Exercise 7
### Heterogeneous treatment effects by college degree

In [11]:
# College-degree indicator: 1 if education == 4 (a college degree), else 0
resume['college_degree'] = (resume['education'] == 4).astype(int)
resume.head()

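|   | education | ofjobs | yearsexp | computerskills | call | female | black | college_degree |
|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 2 | 6 | 1 | 0.0 | 1.0 | 0.0 | 1 |
| 1 | 3 | 3 | 6 | 1 | 0.0 | 1.0 | 0.0 | 0 |
| 2 | 4 | 1 | 6 | 1 | 0.0 | 1.0 | 1.0 | 1 |
| 3 | 3 | 4 | 6 | 1 | 0.0 | 1.0 | 1.0 | 0 |
| 4 | 3 | 3 | 22 | 1 | 0.0 | 1.0 | 0.0 | 0 |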


In [21]:
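# Interact black with the college-degree indicator so graduates can face a different penalty.
# C(college_degree) duplicates C(education)[T.4], so the design matrix is singular; statsmodels'
# default pinv solver still fits it, and the black and interaction coefficients should remain
# identified, but the fit may trigger multicollinearity warnings (silenced below).
resume['college_degree'] = resume['college_degree'].astype('category')
formula_ed = 'call ~ black + C(college_degree) + yearsexp + female + computerskills + black*C(college_degree) + C(education)'
model_ed = smf.ols(formula_ed, data=resume).fit()
warnings.filterwarnings('ignore')
model_ed.get_robustcov_results(cov_type='HC3').summary()
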
In [13]:
hypotheses_ed = 'black + black:C(college_degree)[T.1] = 0'
t_test_ed = model_ed.t_test(hypotheses_ed)
print(t_test_ed)

                             Test for Constraints                             
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0            -0.0282      0.009     -3.071      0.002      -0.046      -0.010


In [14]:
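# Change in the Black-name penalty for college graduates, as a share of the
# penalty for applicants without a college degree
rel_coef = model_ed.params['black:C(college_degree)[T.1]']/model_ed.params['black']
print(f'The change in the penalty for college graduates, relative to non-graduates, is {round(rel_coef*100, 2)}%')

The change in the penalty for college graduates, relative to non-graduates, is -30.48%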


* For applicants without a college degree, the coefficient on `black` is $-0.0405$: relative to a White-sounding name, a Black-sounding name lowers the probability of getting a callback by $4.05$ percentage points, holding everything else constant.

* For applicants with a college degree, the effect is $-0.0405 + 0.0123 = -0.0282$: a Black-sounding name lowers the callback probability by $2.82$ percentage points.

* The penalty for a Black-sounding name is therefore $1.23$ percentage points smaller for applicants with a college degree. Relative to the penalty faced by applicants without a college degree, it is $30.48\%$ smaller.

* The point estimates suggest more racial discrimination against applicants who do not have a college degree, but the difference is not statistically significant ($p = 0.478 > 0.05$).
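The coefficient arithmetic above can be checked directly. The sketch uses the rounded point estimates quoted in the bullets, so the relative gap comes out as about $-30.37\%$ rather than the notebook's $-30.48\%$, which uses unrounded coefficients. It covers only point estimates: the standard error reported by `model_ed.t_test` for `black + black:C(college_degree)[T.1]` also depends on the covariance between the two estimates, which is not printed here.

```python
# Rounded point estimates quoted in the text above
b_black = -0.0405        # Black-name penalty without a college degree
b_interaction = 0.0123   # black:C(college_degree)[T.1]

penalty_no_degree = b_black
penalty_degree = b_black + b_interaction

pp_gap = penalty_degree - penalty_no_degree  # equals the interaction term
rel_gap = b_interaction / b_black            # relative to the no-degree penalty

print(f"no degree: {penalty_no_degree * 100:.2f} pp, degree: {penalty_degree * 100:.2f} pp")
print(f"gap: {pp_gap * 100:.2f} pp, relative: {rel_gap * 100:.2f}%")
```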

## Exercise 8
### Heterogeneous treatment effects by gender

In [15]:
formula_gender = 'call ~ black + C(education) + yearsexp + female + computerskills + black*female'
model_gender = smf.ols(formula_gender, data=resume).fit()
model_gender.get_robustcov_results(cov_type='HC3').summary()

| Dep. Variable: | call | R-squared: | 0.008 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.006 |
| Method: | Least Squares | F-statistic: | 3.866 |
| Date: | Mon, 20 Feb 2023 | Prob (F-statistic): | 6.76e-05 |
| Time: | 21:00:08 | Log-Likelihood: | -551.0 |
| No. Observations: | 4870 | AIC: | 1122.0 |
| Df Residuals: | 4860 | BIC: | 1187.0 |
| Df Model: | 9 | | |
| Covariance Type: | HC3 | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.0807 | 0.040 | 1.996 | 0.046 | 0.001 | 0.160 |
| C(education)[T.1] | -0.0021 | 0.057 | -0.037 | 0.971 | -0.114 | 0.110 |
| C(education)[T.2] | -0.0001 | 0.042 | -0.003 | 0.998 | -0.082 | 0.082 |
| C(education)[T.3] | -0.0026 | 0.039 | -0.066 | 0.947 | -0.079 | 0.074 |
| C(education)[T.4] | -0.0048 | 0.038 | -0.125 | 0.900 | -0.080 | 0.070 |
| black | -0.0287 | 0.016 | -1.840 | 0.066 | -0.059 | 0.002 |
| yearsexp | 0.0032 | 0.001 | 3.668 | 0.000 | 0.001 | 0.005 |
| female | 0.0131 | 0.014 | 0.919 | 0.358 | -0.015 | 0.041 |
| computerskills | -0.0186 | 0.011 | -1.618 | 0.106 | -0.041 | 0.004 |

| Omnibus: | 2950.616 | Durbin-Watson: | 1.448 |
|---|---|---|---|
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 18630.964 |
| Skew: | 3.047 | Prob(JB): | 0.0 |
| Kurtosis: | 10.395 | Cond. No. | 226.0 |


In [16]:
hypotheses_female = 'black + black:female = 0'
t_test_female = model_gender.t_test(hypotheses_female)
print(t_test_female)

                             Test for Constraints                             
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0            -0.0325      0.009     -3.664      0.000      -0.050      -0.015


In [17]:
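# Black-name penalties in percentage points, from the rounded estimates above:
# 2.87 for men (black) and 3.25 for women (black + black:female)
perc_term = (2.87 - 3.25) / 3.25
print(f'The relative difference in the penalty is {round(perc_term*100, 2)}%')
# The inputs are already in percentage points, so no further scaling by 100
pert_point = 2.87 - 3.25
print(f'The difference in the penalty is {round(pert_point, 2)} percentage points')

The relative difference in the penalty is -11.69%
The difference in the penalty is -0.38 percentage points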


* For men, the coefficient on `black` is $-0.0287$: relative to a White-sounding name, a Black-sounding name lowers the probability of getting a callback by $2.87$ percentage points, holding everything else constant.

* For women, the effect is $-0.0287 - 0.0038 = -0.0325$: a Black-sounding name lowers the callback probability by $3.25$ percentage points, holding everything else constant.

* **The estimated penalty for having a Black-sounding name is greater for Black women**, by $0.38$ percentage points, but the difference is not statistically significant ($p = 0.831 > 0.05$).

## Exercise 9
### The share of applicants in our dataset with college degrees

In [18]:
# calculate the share of applicants with a college degree
share_of_college_degree = resume['college_degree'].value_counts(normalize=True)
share_of_college_degree

1    0.719507
0    0.280493
Name: college_degree, dtype: float64

* $71.95\%$ of applicants in the dataset have college degrees.

### The share of Black adult Americans who have college degrees (i.e. have completed a bachelor's degree)

In [19]:
# share of applicants with a college degree within each treatment arm (rows sum to 1)
share_of_college_black = pd.crosstab(resume['black'], resume['college_degree'], normalize='index')
share_of_college_black

| black | college_degree = 0 | college_degree = 1 |
|---|---|---|
| 0.0 | 0.283778 | 0.716222 |
| 1.0 | 0.277207 | 0.722793 |


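* $72.28\%$ of applicants with Black-sounding names in this dataset have college degrees (vs. $71.62\%$ of applicants with White-sounding names). This describes the experimental resumes, not Black adult Americans as a whole, among whom the share with a bachelor's degree is much lower.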

## Exercise 10


* Over $70\%$ of applicants in the experiment have a college degree, and the share is slightly higher for resumes with Black-sounding names ($72.28\%$). In the U.S. adult population, a far smaller share of people hold a bachelor's degree, and the share among Black adults is lower still.

* Exercise 7 suggests that the penalty for a Black-sounding name is larger for applicants without a college degree ($4.05$ vs. $2.82$ percentage points). Because the average Black American is much less likely to have a college degree than the applicants in this experiment, the ATE for the average Black American would likely be **larger in magnitude** than the ATE estimated here: having a Black-sounding name (as opposed to a White-sounding name) would reduce the probability of a callback by more (the coefficient would be more negative). This extrapolation rests on point estimates whose difference is not statistically significant, so it should be treated with caution.
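The extrapolation can be sketched as a reweighting: average the two education-specific penalties from Exercise 7 using a population's college share. This assumes each group's penalty carries over unchanged to the new population. The 30% college share below is a hypothetical placeholder for illustration, not a sourced statistic.

```python
def reweighted_penalty(share_college: float,
                       penalty_degree: float = -0.0282,
                       penalty_no_degree: float = -0.0405) -> float:
    """Average Black-name penalty for a population with a given college share.

    Defaults are the Exercise 7 point estimates; assumes the penalty within each
    education group is the same in the new population as in the experiment.
    """
    return share_college * penalty_degree + (1 - share_college) * penalty_no_degree


# College share among Black-name resumes in this experiment
in_sample = reweighted_penalty(0.722793)
# Hypothetical lower college share (illustration only, not a real statistic)
lower_share = reweighted_penalty(0.30)

print(f"experiment mix: {in_sample * 100:.2f} pp, 30% college: {lower_share * 100:.2f} pp")
```

With the experiment's own education mix, the reweighted penalty is about $-3.16$ percentage points, close to the Exercise 6 estimate; a lower college share pushes it further from zero.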

## Exercise 11
### Internal validity

* Internal validity is about whether a study has accurately measured a causal effect in the context being studied.

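* Computer skills differ significantly across the two groups, so the resumes that ended up with Black-sounding names do not look exactly like the resumes with White-sounding names. Because the other characteristics (years of experience, gender, education) are balanced, this single imbalance is not strong evidence that random assignment failed: with several balance tests, one significant result at the 5% level can arise by chance. Still, a baseline difference could bias a simple comparison of means, so internal validity is not perfectly ensured. Reassuringly, the coefficient on `black` barely changes when computer skills and the other characteristics are controlled for ($-0.0320$ vs. $-0.0316$).

* Exercise 10 asks about generalizing to the experience of the average Black American, so it relates to external validity rather than internal validity.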
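With four balance checks (three t-tests and one chi-squared test), a single p-value below 0.05 is not especially surprising even when randomization worked. The sketch below uses the p-values reported in Exercises 1 and 2 and treats the tests as independent, which is only an approximation since they use the same resumes. It computes the chance of at least one false positive and applies a Bonferroni correction, one standard (conservative) multiple-testing adjustment.

```python
# p-values from the balance checks in Exercises 1 and 2
p_values = {"yearsexp": 0.85, "computerskills": 0.03, "female": 0.38, "education": 0.49}
alpha = 0.05
k = len(p_values)

# Chance of at least one p < alpha among k independent tests when every null is true
prob_any_false_positive = 1 - (1 - alpha) ** k

# Bonferroni: compare each p-value against alpha / k
bonferroni_threshold = alpha / k
flagged = [name for name, p in p_values.items() if p < bonferroni_threshold]

print(f"P(at least one false positive) = {prob_any_false_positive:.3f}")
print(f"Bonferroni threshold = {bonferroni_threshold}, flagged: {flagged}")
```

Under these assumptions there is roughly an 18.5% chance of at least one "significant" imbalance, and the computer-skills p-value of 0.03 no longer clears the Bonferroni threshold of 0.0125.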

## Exercise 12
### External validity

* External validity is about whether we think the results of a given study are likely to generalize to other contexts.

* In the real world, the share of Black Americans with a college degree (as well as the overall distribution of education) differs from that in this experiment, so external validity is not ensured and the results of this study may not generalize to a new context.