<span style="color: #9370DB">*Bárbara Flores*</span>

# Resume Experiment Analysis

How much harder is it to get a job in the United States if you are Black than if you are White? Or, expressed differently, what is the *effect* of race on the difficulty of getting a job in the US?

In this exercise, we will be analyzing data from a real-world experiment designed to help answer this question. Namely, we will be analyzing data from a randomized experiment in which 4,870 fictitious resumes were sent out to employers in response to job adverts in Boston and Chicago in 2001. The resumes differ in various attributes, including the names of the applicants, and different resumes were randomly allocated to job openings.

The "experiment" part of the experiment is that resumes were randomly assigned Black- or White-sounding names, and the researchers then watched to see whether employers called the "applicants" with Black-sounding names at the same rate as the applicants with White-sounding names.

(Which names constituted "Black-sounding names" and "White-sounding names" was determined by analyzing names on Massachusetts birth certificates to determine which names were most associated with Black and White children, and then surveys were used to validate that the names were perceived as being associated with individuals of one racial category or the other. Also, please note I subscribe to the logic of [Kwame Anthony Appiah](https://www.theatlantic.com/ideas/archive/2020/06/time-to-capitalize-blackand-white/613159/) and chose to capitalize both the B in Black and the W in White). 

You can access the original article [here](https://www.aeaweb.org/articles?id=10.1257/0002828042002561).

**Note to Duke students:** if you are on the Duke campus network, you'll be able to access almost any academic journal article directly; if you are off campus and want access, you can just go to the [Duke Library](https://library.duke.edu/) website and search for the article title. Once you find it, you'll be asked to log in, after which you'll have full access to the article. You will also find this pattern holds true at nearly any major university in the US.



## Gradescope Autograding

Please follow [all standard guidance](https://www.practicaldatascience.org/html/autograder_guidelines.html) for submitting this assignment to the Gradescope autograder, including storing your solutions in a dictionary called `results` and ensuring your notebook runs from the start to completion without any errors.

For this assignment, please name your file `exercise_resume_experiment.ipynb` before uploading.

You can check that you have answers for all questions in your `results` dictionary with this code:

```python
assert set(results.keys()) == {
    "ex2_pvalue_computerskills",
    "ex2_pvalue_female",
    "ex2_pvalue_yearsexp",
    "ex3_pvalue_education",
    "ex4_validity",
    "ex5_pvalue",
    "ex5_white_advantage_percent",
    "ex5_white_advantage_percentage_points",
    "ex6_black_pvalue",
    "ex8_black_college",
    "ex8_black_nocollege",
    "ex8_college_heterogeneity",
    "ex9_gender_and_discrimination",
    "ex10_experiment_v_us",
}
```


### Submission Limits

Please remember that you are **only allowed FOUR submissions to the autograder.** Your last submission (if you submit 4 or fewer times), or your fourth submission (if you submit more than 4 times), will determine your grade. Submissions that error out will **not** count against this total.

That's one more than usual in case there are issues with exercise clarity.

## Checking for Balance

The first step in analyzing any experiment is to check whether you have *balance* across your treatment arms—that is to say, do the people who were randomly assigned to the treatment group look like the people who were randomly assigned to the control group. Or in this case, do the resumes that ended up with Black-sounding names look like the resumes with White-sounding names. 

Checking for balance is critical for two reasons. First, it's always possible that random assignment will create profoundly different groups: the *Law of Large Numbers* is only a "law" in the limit. So we want to make sure we have reasonably similar groups from the outset. And second, it's also always possible that the randomization wasn't actually implemented correctly. You would be amazed at the number of ways that "random assignment" can go wrong! So if you ever do find you're getting unbalanced data, you should worry not only about whether the groups have baseline differences, but also whether the "random assignment" was actually random!
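
To make the first point concrete, here is a small simulation sketch. It uses made-up data (a hypothetical covariate with mean 10 and standard deviation 3, not the resume data) and the helper names `covariate_gap` and `average_gap` are illustrative only. It shows how the typical chance imbalance between two randomly assigned arms shrinks as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(0)


def covariate_gap(n_units: int) -> float:
    """Randomly split n_units into two arms and return the absolute
    difference in the mean of a pre-treatment covariate."""
    covariate = rng.normal(loc=10, scale=3, size=n_units)  # hypothetical covariate
    treated = rng.permutation(n_units) < n_units // 2  # exactly half are treated
    return abs(covariate[treated].mean() - covariate[~treated].mean())


def average_gap(n_units: int, n_draws: int = 500) -> float:
    """Average imbalance over many hypothetical randomizations."""
    return float(np.mean([covariate_gap(n_units) for _ in range(n_draws)]))


gaps = {n: average_gap(n) for n in (20, 200, 2000)}
for n, gap in gaps.items():
    print(f"n = {n:>4}: average absolute gap in means = {gap:.3f}")
```

Even with a correct randomization, small samples can leave noticeable gaps between arms, which is why the balance check comes first.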

### Exercise 1

Download the data set from this experiment (`resume_experiment.dta`) from [github](https://github.com/nickeubank/MIDS_Data/tree/master/resume_experiment). To aid the autograder, please load the data directly from a URL.


In [32]:
# import requests
import warnings
import pandas as pd
from scipy.stats import chi2_contingency

warnings.filterwarnings("ignore")
warnings.simplefilter(action="ignore", category=FutureWarning)
pd.set_option("mode.copy_on_write", True)

results = dict()

In [33]:
resume_experiment = pd.read_stata(
    "https://github.com/nickeubank/MIDS_Data/raw/master/resume_experiment/resume_experiment.dta"
)

resume_experiment.head()

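|   | education | ofjobs | yearsexp | computerskills | call | female | black |
|---|---|---|---|---|---|---|---|
| 0 | 4 | 2 | 6 | 1 | 0.0 | 1.0 | 0.0 |
| 1 | 3 | 3 | 6 | 1 | 0.0 | 1.0 | 0.0 |
| 2 | 4 | 1 | 6 | 1 | 0.0 | 1.0 | 1.0 |
| 3 | 3 | 4 | 6 | 1 | 0.0 | 1.0 | 1.0 |
| 4 | 3 | 3 | 22 | 1 | 0.0 | 1.0 | 0.0 |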



### Exercise 2

- `black` is the treatment variable in the data set (whether the resume has a "Black-sounding" name).
- `call` is the dependent variable of interest (did the employer call the fictitious applicant for an interview).

In addition, the data include a number of variables to describe the other features in each fictitious resume, including the applicant's education level (`education`), years of experience (`yearsexp`), gender (`female`), computer skills (`computerskills`), and number of previous jobs (`ofjobs`). Each resume has a random selection of these attributes, so on average the Black-named fictitious applicant resumes have the same qualifications as the White-named applicant resumes.

Check for balance in terms of the average values of applicant gender (`female`), computer skills (`computerskills`), and years of experience (`yearsexp`) across the two arms of the experiment (i.e. by `black`). Calculate both the differences in means across treatment arms *and* test for statistical significance of these differences. Do gender, computer skills, and years of experience look balanced across race groups in terms of both statistical significance and magnitude of difference?

Store the p-values associated with your t-test of these variables in `ex2_pvalue_female`, `ex2_pvalue_computerskills`, and `ex2_pvalue_yearsexp`. **Round your values to 2 decimal places.**


In [34]:
import statsmodels.formula.api as smf

model1 = smf.ols("black ~ female", resume_experiment).fit()
model2 = smf.ols("black ~ computerskills", resume_experiment).fit()
model3 = smf.ols("black ~ yearsexp", resume_experiment).fit()

In [35]:
hypothesis = "female = 0"
print(model1.t_test(hypothesis))

                             Test for Constraints                             
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0             0.0150      0.017      0.884      0.377      -0.018       0.048


In [36]:
hypothesis = "computerskills = 0"
print(model2.t_test(hypothesis))

                             Test for Constraints                             
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0             0.0404      0.019      2.166      0.030       0.004       0.077


In [37]:
hypothesis = "yearsexp = 0"
print(model3.t_test(hypothesis))

                             Test for Constraints                             
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0            -0.0003      0.001     -0.185      0.854      -0.003       0.003


In [38]:
ex2_pvalue_female = model1.pvalues["female"]
ex2_pvalue_computerskills = model2.pvalues["computerskills"]
ex2_pvalue_yearsexp = model3.pvalues["yearsexp"]

results["ex2_pvalue_female"] = round(ex2_pvalue_female, 2)
results["ex2_pvalue_computerskills"] = round(ex2_pvalue_computerskills, 2)
results["ex2_pvalue_yearsexp"] = round(ex2_pvalue_yearsexp, 2)

print(f"P-value for gender obtained from the t-test: {ex2_pvalue_female:.2f}")
print(
    f"P-value for computer skills obtained from the t-test: {ex2_pvalue_computerskills:.2f}"
)
print(
    f"P-value for years of experience obtained from the t-test: {ex2_pvalue_yearsexp:.2f}"
)

P-value for gender obtained from the t-test: 0.38
P-value for computer skills obtained from the t-test: 0.03
P-value for years of experience obtained from the t-test: 0.85


> For `female` and `yearsexp`, the p-values are 0.38 and 0.85, so we find no statistically significant difference in gender or years of experience between resumes with Black-sounding and White-sounding names. For `computerskills`, the p-value of 0.03 indicates a difference that is statistically significant at the 5% level, so this variable is not perfectly balanced across the two arms.
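
The regressions above report p-values but not the differences in means that the exercise also asks for. A minimal sketch of computing both, using synthetic data (the `treated` and `covariate` columns are hypothetical stand-ins, not the experiment's variables):

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Synthetic stand-in for the experiment: a 0/1 treatment and one covariate
rng = np.random.default_rng(1)
df = pd.DataFrame(
    {
        "treated": rng.integers(0, 2, size=1_000),
        "covariate": rng.normal(loc=8, scale=5, size=1_000),
    }
)

# Magnitude: difference in covariate means (treated minus control)
group_means = df.groupby("treated")["covariate"].mean()
difference = group_means.loc[1] - group_means.loc[0]

# Significance: two-sample t-test of the same difference
t_stat, p_value = ttest_ind(
    df.loc[df["treated"] == 1, "covariate"],
    df.loc[df["treated"] == 0, "covariate"],
)

print(f"Difference in means: {difference:.3f} (p = {p_value:.2f})")
```

By default `ttest_ind` pools the two variances, which gives the same t-statistic as a simple regression of one variable on the other; passing `equal_var=False` would run Welch's test instead.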

### Exercise 3

Do a similar tabulation for education (`education`). Education is a categorical variable coded as follows:

- 0: Education not reported
- 1: High school dropout
- 2: High school graduate
- 3: Some college
- 4: College graduate or higher

Because these are categorical, you shouldn't just calculate and compare means—you should compare share or count of observations with each value (e.g., a chi-squared contingency table). You may also find the `pd.crosstab` function useful.

Does education look balanced across racial groups?

Store the p-value from your chi squared test in results under the key `ex3_pvalue_education`. **Please round to 2 decimal places.**

In [39]:
education_crosstab = pd.crosstab(
    resume_experiment["education"], resume_experiment["black"]
)

education_crosstab

| education | black = 0.0 | black = 1.0 |
|---|---|---|
| 0 | 18 | 28 |
| 1 | 18 | 22 |
| 2 | 142 | 132 |
| 3 | 513 | 493 |
| 4 | 1744 | 1760 |


In [40]:
res = chi2_contingency(education_crosstab)
ex3_pvalue_education = res.pvalue

results["ex3_pvalue_education"] = round(ex3_pvalue_education, 2)

print(
    f"P-value for education obtained from the chi-squared test: {ex3_pvalue_education:.2f}"
)

P-value for education obtained from the chi-squared test: 0.49
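
Raw counts can be hard to compare by eye, so it may help to look at the share of each arm at each education level. The sketch below rebuilds the table from the counts printed above (typed in by hand, so it does not need the data file) and reruns the same chi-squared test.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Counts copied from the crosstab above (rows: education 0-4; columns: black 0/1)
counts = pd.DataFrame(
    {0.0: [18, 18, 142, 513, 1744], 1.0: [28, 22, 132, 493, 1760]},
    index=pd.Index([0, 1, 2, 3, 4], name="education"),
)
counts.columns.name = "black"

# Share of each arm at each education level, in percent
shares = 100 * counts / counts.sum()
print(shares.round(1))

chi2, p_value, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.2f}")
```

In shares, the two arms are close at every level; for example, college graduates make up about 71.6% of the White-named resumes and 72.3% of the Black-named ones.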


> With a p-value of 0.49, the chi-squared test gives no grounds to reject the null hypothesis that education is independent of the race signaled by the name. Education therefore appears balanced across the two groups.

### Exercise 4

What do you make of the overall results on resume characteristics? Why do we care about whether these variables look similar across the race groups? And if they didn't look similar, would that be a threat to internal or external validity? 

Answer in markdown, then also store your answer to the question of whether imbalances are a threat to internal or external validity in `"ex4_validity"` as the string `"internal"` or `"external"`.


> The first thing we want to know when analyzing an experiment is whether the treatment and control groups are balanced. Even when assignment is truly random, the groups can turn out different by chance, which would violate the assumption of "No Baseline Differences" between them. Here the resume characteristics look broadly similar across the two arms, with computer skills the one variable showing a statistically significant difference. Imbalance is a threat to `internal validity`: it undermines our ability to attribute the difference in outcomes within this sample to the treatment rather than to pre-existing differences between the groups.

In [41]:
results["ex4_validity"] = "internal"

## Estimating Effect of Race

### Exercise 5

The variable of interest in the data set is the variable `call`, which indicates a call back for an interview. Perform a two-sample t-test comparing applicants with Black-sounding names and White-sounding names.

Interpret your results—in both percentage *and* in percentage points, what is the effect of having a Black-sounding name (as opposed to a White-sounding name) on your resume?

Store how much more likely a White applicant is to receive a call back than a Black applicant, in percentage and in percentage points, in `"ex5_white_advantage_percent"` and `"ex5_white_advantage_percentage_points"`. Please scale percentages so a value of `1` is 1% and percentage points so a value of `1` corresponds to 1 percentage point. **Please round these answers to 2 decimal places.**

Store the p-value of the difference in `"ex5_pvalue"`. **Please round your p-value to 5 decimal places.**

In [42]:
pd.crosstab(resume_experiment["call"], resume_experiment["black"])

| call | black = 0.0 | black = 1.0 |
|---|---|---|
| 0.0 | 2200 | 2278 |
| 1.0 | 235 | 157 |


In [43]:
mean_call_by_race = resume_experiment.groupby("black")["call"].mean()
mean_call_by_race

black
0.0    0.096509
1.0    0.064476
Name: call, dtype: float32

In [44]:
ex5_white_advantage_percent = (
    100 * (mean_call_by_race[0] - mean_call_by_race[1]) / mean_call_by_race[1]
)

ex5_white_advantage_percentage_points = 100 * (
    mean_call_by_race[0] - mean_call_by_race[1]
)

results["ex5_white_advantage_percent"] = round(ex5_white_advantage_percent, 2)
results["ex5_white_advantage_percentage_points"] = round(
    ex5_white_advantage_percentage_points, 2
)

In [45]:
model = smf.ols("call ~ black", resume_experiment).fit()

ex5_pvalue = model.pvalues["black"]
results["ex5_pvalue"] = round(ex5_pvalue, 5)

In [46]:
print(
    f"The advantage of having a White-sounding name over a Black-sounding name in terms of callback percentage is: {ex5_white_advantage_percent:.2f}%"
)
print(
    f"The advantage of having a White-sounding name over a Black-sounding name in terms of percentage points is: {ex5_white_advantage_percentage_points:.2f} percentage points"
)
print(
    f"The p-value of the difference in callback rates between applicants with Black-sounding names and White-sounding names is: {ex5_pvalue:.5f}"
)

The advantage of having a White-sounding name over a Black-sounding name in terms of callback percentage is: 49.68%
The advantage of having a White-sounding name over a Black-sounding name in terms of percentage points is: 3.20 percentage points
The p-value of the difference in callback rates between applicants with Black-sounding names and White-sounding names is: 0.00004
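
The distinction between percent and percentage points is easy to blur, so here is the same arithmetic worked from the callback counts in the crosstab above. This is a sketch that types the counts in by hand and uses `scipy.stats.ttest_ind` in place of the regression used in the solution.

```python
import numpy as np
from scipy.stats import ttest_ind

# Callback counts from the crosstab above
calls_white, calls_black = 235, 157
n_white = n_black = 2435  # 2200 + 235 and 2278 + 157

rate_white = calls_white / n_white
rate_black = calls_black / n_black

gap_points = 100 * (rate_white - rate_black)  # absolute gap, in percentage points
gap_percent = 100 * (rate_white - rate_black) / rate_black  # relative to the Black rate

# Rebuild the 0/1 outcome vectors to run the two-sample t-test
white = np.repeat([1, 0], [calls_white, n_white - calls_white])
black = np.repeat([1, 0], [calls_black, n_black - calls_black])
t_stat, p_value = ttest_ind(white, black)

print(f"{gap_points:.2f} percentage points, {gap_percent:.2f}%, p = {p_value:.5f}")
```

The percentage-point figure is the absolute gap between the two rates, while the percent figure divides that gap by the Black callback rate, which is why it is so much larger.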


> The callback rate is 9.65% for resumes with White-sounding names and 6.45% for resumes with Black-sounding names. That is a gap of 3.20 percentage points, which means a White-sounding name raises the chance of a callback by 49.68% relative to a Black-sounding name. The difference is statistically significant (p = 0.00004).

### Exercise 6

Now, use a linear probability model (a linear regression with a 0/1 dependent variable!) to estimate the differential likelihood of being called back by applicant race (i.e. the racial discrimination by employers). Please use [statsmodels](https://www.statsmodels.org/stable/index.html).

Since we have a limited dependent variable, be sure to use [heteroskedastic robust standard errors.](https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLSResults.get_robustcov_results.html) Personally, I prefer the `HC3` implementation, as it tends to do better with smaller samples than other implementations.

Interpret these results—what is the *effect* of having a Black-sounding name (as opposed to a White-sounding name) on your resume in terms of the likelihood you'll be called back? 

How does this compare to the estimate you got above in exercise 5?

Store the p-value associated with `black` in `"ex6_black_pvalue"`. **Please round your pvalue to 5 decimal places.**
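
To see what a heteroskedasticity-robust standard error actually changes, the sketch below computes classical and HC3 standard errors by hand with NumPy. It uses simulated data with hypothetical callback probabilities, not the experiment's data, and is meant only to illustrate the sandwich formula.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
treated = rng.integers(0, 2, size=n)
# Binary outcome whose probability (and hence variance) depends on treatment
outcome = rng.binomial(1, 0.10 - 0.03 * treated)

X = np.column_stack([np.ones(n), treated])  # intercept + treatment dummy
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ outcome
residuals = outcome - X @ beta

# Classical standard errors assume one common error variance
sigma2 = residuals @ residuals / (n - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC3: weight each squared residual by 1 / (1 - leverage)^2
leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
weights = (residuals / (1 - leverage)) ** 2
meat = X.T @ (X * weights[:, None])
se_hc3 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("coefficients:", beta.round(4))
print("classical SE:", se_classical.round(4))
print("HC3 SE:      ", se_hc3.round(4))
```

With a 0/1 outcome the error variance is p(1 - p), so it differs between arms whenever the callback rates differ, which is the reason robust standard errors are requested here.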

In [47]:
model = smf.ols("call ~ black", resume_experiment).fit()
print(model.get_robustcov_results("HC3").summary())

                            OLS Regression Results                            
Dep. Variable:                   call   R-squared:                       0.003
Model:                            OLS   Adj. R-squared:                  0.003
Method:                 Least Squares   F-statistic:                     16.92
Date:                Tue, 05 Mar 2024   Prob (F-statistic):           3.96e-05
Time:                        12:20:37   Log-Likelihood:                -562.24
No. Observations:                4870   AIC:                             1128.
Df Residuals:                    4868   BIC:                             1141.
Df Model:                           1                                         
Covariance Type:                  HC3                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0965      0.006     16.121      0.0

In [48]:
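# HC3-robust p-value (plain `model.pvalues` would be the non-robust one)
robust_model = smf.ols("call ~ black", resume_experiment).fit(cov_type="HC3", use_t=True)
ex6_black_pvalue = robust_model.pvalues["black"]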
results["ex6_black_pvalue"] = round(ex6_black_pvalue, 5)
print(
    f"The p-value associated with 'black' in the linear probability model is: {ex6_black_pvalue:.5f}"
)

The p-value associated with 'black' in the linear probability model is: 0.00004


> The coefficient of -0.0320 on `black` in the linear probability model indicates that applicants with Black-sounding names are approximately 3.20 percentage points less likely to be called back than those with White-sounding names. The p-value of 4e-05 is strong evidence against the null hypothesis of no effect. The coefficient equals the raw difference in callback rates from Exercise 5, and after rounding the p-value matches the one obtained there.

### Exercise 7

Even when doing a randomized experiment, adding control variables to your regression *can* improve the statistical efficiency of your estimates of the treatment effect (the upside is the potential to explain residual variation; the downside is more parameters to be estimated). Adding controls can be particularly useful when randomization left some imbalances in covariates (which you may have seen above). 

Now let's see if we can improve our estimates by adding in other variables as controls. Add in `education`, `yearsexp`, `female`, and `computerskills`—be sure to treat education as a categorical variable!
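
The efficiency argument can be illustrated with a short simulation. This is a sketch on synthetic data with a made-up treatment effect of 1 and a strongly predictive covariate (the column names are hypothetical); the gain in the real data may be much smaller if the controls explain little of the variation in callbacks.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 2_000
df = pd.DataFrame(
    {
        "treated": rng.integers(0, 2, size=n),
        "covariate": rng.normal(size=n),  # pre-treatment, unrelated to assignment
    }
)
# True treatment effect is 1; the covariate explains much of the outcome
df["outcome"] = 1.0 * df["treated"] + 2.0 * df["covariate"] + rng.normal(size=n)

without_control = smf.ols("outcome ~ treated", df).fit(cov_type="HC3")
with_control = smf.ols("outcome ~ treated + covariate", df).fit(cov_type="HC3")

for label, fit in [("no controls", without_control), ("with control", with_control)]:
    print(f"{label}: estimate = {fit.params['treated']:.3f}, SE = {fit.bse['treated']:.3f}")
```

Both regressions target the same treatment effect; the control only soaks up residual variation, which shrinks the standard error on `treated`.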

In [49]:
model = smf.ols(
    "call ~ black + C(education) + yearsexp + computerskills + female",
    resume_experiment,
).fit()
print(model.get_robustcov_results("HC3").summary())

                            OLS Regression Results                            
Dep. Variable:                   call   R-squared:                       0.008
Model:                            OLS   Adj. R-squared:                  0.006
Method:                 Least Squares   F-statistic:                     4.350
Date:                Tue, 05 Mar 2024   Prob (F-statistic):           3.04e-05
Time:                        12:20:37   Log-Likelihood:                -551.02
No. Observations:                4870   AIC:                             1120.
Df Residuals:                    4861   BIC:                             1178.
Df Model:                           8                                         
Covariance Type:                  HC3                                         
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
Intercept             0.0821      0.04

In [50]:
print(model.params["black"])

-0.03161371958887789


In [51]:
print(round(model.pvalues["black"], 5))

5e-05


> After adding control variables to my regression, the coefficient changed from -0.032 to -0.0316, with a p-value that changed from 4e-05 to 5e-05. In general, we can conclude that the results remain consistent. In other words, applicants with Black-sounding names have a lower probability of being called back compared to those with White-sounding names. 

## Estimating Heterogeneous Effects

### Exercise 8

As you may recall from some past readings (such as this one on the [migraine medication Aimovig](https://ds4humans.com/30_questions/15_answering_exploratory_questions.html#faithful-representations)), our focus on estimating *Average Treatment Effects* runs the risk of papering over variation in how individuals respond. In the case of Aimovig, for example, nearly no patients actually experienced the Average Treatment Effect of the medication; around half of patients experienced no benefit, while the other half experienced a benefit of about twice the average treatment effect.

So far in this analysis we've been focusing on the *average* effect of having a Black-sounding name (as compared to a White-sounding name). But we can actually use our regression framework to look for evidence of *heterogeneous treatment effects*—effects that are different for different types of people in our data. We accomplish this by *interacting* a variable we think may be related to experiencing a differential treatment effect with our treatment variable. For example, if we think that applicants with Black-sounding names who have a college degree are likely to experience less discrimination, we can interact `black` with an indicator for having a college degree. If having a college degree reduces discrimination, we could expect the interaction term to be positive. 

Is there more or less racial discrimination (the absolute magnitude difference in call back rates between Black and White applicants) among applicants who have a college degree? Store your answer as the string `"more discrimination"` or `"less discrimination"` under the key `"ex8_college_heterogeneity"`.

Please still include `education`, `yearsexp`, `female`, and `computerskills` as controls.

**Note:** it's relatively safe to assume that someone hiring employees who sees a resume that does *not* report education levels will assume the applicant does not have a college degree. So treat "No education reported" as "not having a college degree."

In percentage points, what is the difference in call back rates:

- between White applicants without a college degree and Black applicants without a college degree (`ex8_black_nocollege`).
- between White applicants with a college degree and Black applicants with a college degree (`ex8_black_college`).

Use negative values to denote a lower probability for Black applicants to get a call back. **Scale so a value of `1` is a one percentage point difference. Please round your answer to 2 decimal places.**

Focus on the coefficient values, even if the significance is low.
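
Reading subgroup effects off an interaction model takes a little care, so here is a sketch on synthetic data. The variable names `treated` and `subgroup` and the callback probabilities are hypothetical. The effect outside the subgroup is the coefficient on the treatment alone; the effect inside the subgroup is that coefficient plus the interaction term.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 4_000
df = pd.DataFrame(
    {
        "treated": rng.integers(0, 2, size=n),
        "subgroup": rng.integers(0, 2, size=n),
    }
)
# Hypothetical callback probabilities: the penalty is 4 points outside the
# subgroup and 2 points inside it
prob = 0.10 - 0.04 * df["treated"] + 0.02 * df["treated"] * df["subgroup"]
df["outcome"] = rng.binomial(1, prob)

# `treated * subgroup` expands to both main effects plus their interaction
fit = smf.ols("outcome ~ treated * subgroup", df).fit(cov_type="HC3")

effect_outside = 100 * fit.params["treated"]
effect_inside = 100 * (fit.params["treated"] + fit.params["treated:subgroup"])
print(f"Effect outside subgroup: {effect_outside:.2f} percentage points")
print(f"Effect inside subgroup:  {effect_inside:.2f} percentage points")
```

Without additional controls, these two numbers are exactly the differences in mean outcomes within each subgroup; with controls they are the adjusted equivalents.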

In [52]:
resume_experiment["college_degree"] = (resume_experiment["education"] == 4).astype(int)

In [53]:
model = smf.ols(
    "call ~ black * college_degree + yearsexp + computerskills + female",
    resume_experiment,
).fit()
print(model.get_robustcov_results("HC3").summary())

                            OLS Regression Results                            
Dep. Variable:                   call   R-squared:                       0.008
Model:                            OLS   Adj. R-squared:                  0.007
Method:                 Least Squares   F-statistic:                     5.874
Date:                Tue, 05 Mar 2024   Prob (F-statistic):           4.06e-06
Time:                        12:20:37   Log-Likelihood:                -550.77
No. Observations:                4870   AIC:                             1116.
Df Residuals:                    4863   BIC:                             1161.
Df Model:                           6                                         
Covariance Type:                  HC3                                         
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
Intercept                0.0849 

In [54]:
# Gap between Black and White applicants without a college degree,
# in percentage points: the coefficient on `black`.
ex8_black_nocollege = model.params["black"] * 100
results["ex8_black_nocollege"] = round(ex8_black_nocollege, 2)

# Gap between Black and White applicants with a college degree:
# the coefficient on `black` plus the `black:college_degree` interaction.
# The values below are typed in from the rounded coefficients in the summary
# above; the commented line computes the same sum from the fitted model.
# ex8_black_college = (model.params["black"] + model.params["black:college_degree"]) * 100
ex8_black_college = round((0.0123 - 0.0405) * 100, 2)
results["ex8_black_college"] = ex8_black_college

print(
    f"The difference in call back rates between Black applicants without a college degree and White applicants without a college degree is {ex8_black_nocollege:.2f} percentage points.\n"
    f"The difference in call back rates between Black applicants with a college degree and White applicants with a college degree is {ex8_black_college:.2f} percentage points."
)

The difference in call back rates between Black applicants without a college degree and White applicants without a college degree is -4.05 percentage points.
The difference in call back rates between Black applicants with a college degree and White applicants with a college degree is -2.82 percentage points.


In [55]:
ex8_college_heterogeneity = "less discrimination"
results["ex8_college_heterogeneity"] = ex8_college_heterogeneity

print(
    f"Therefore, among applicants with a college degree, there is {ex8_college_heterogeneity} in call back rates."
)

Therefore, among applicants with a college degree, there is less discrimination in call back rates.


### Exercise 9

Now let's compare men and women—is the penalty for having a Black-sounding name greater for Black men or Black women? Store your answer as `"greater discrimination for men"` or `"greater discrimination for women"` in `"ex9_gender_and_discrimination"`.

Focus on the coefficient values, even if the significance is low.

Again, please still include `education`, `yearsexp`, `female`, and `computerskills` as controls.

In [56]:
model = smf.ols(
    "call ~ black * female + C(education) + yearsexp + computerskills",
    resume_experiment,
).fit()
print(model.get_robustcov_results("HC3").summary())

                            OLS Regression Results                            
Dep. Variable:                   call   R-squared:                       0.008
Model:                            OLS   Adj. R-squared:                  0.006
Method:                 Least Squares   F-statistic:                     3.866
Date:                Tue, 05 Mar 2024   Prob (F-statistic):           6.76e-05
Time:                        12:20:37   Log-Likelihood:                -551.00
No. Observations:                4870   AIC:                             1122.
Df Residuals:                    4860   BIC:                             1187.
Df Model:                           9                                         
Covariance Type:                  HC3                                         
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
Intercept             0.0807      0.04

In [57]:
black_male = model.params["black"] * 100
black_female = (model.params["black"] + model.params["black:female"]) * 100

print(
    f"The difference in call back rates between Black male applicants and White male applicants is {black_male:.2f} percentage points.\n"
    f"The difference in call back rates between Black female applicants and White female applicants is {black_female:.2f} percentage points."
)

The difference in call back rates between Black male applicants and White male applicants is -2.87 percentage points.
The difference in call back rates between Black female applicants and White female applicants is -3.25 percentage points.


In [58]:
ex9_gender_and_discrimination = "greater discrimination for women"
results["ex9_gender_and_discrimination"] = ex9_gender_and_discrimination
print(f"Therefore, there is {ex9_gender_and_discrimination} in call back rates.")

Therefore, there is greater discrimination for women in call back rates.


### Exercise 10

Calculate and/or lookup the following online:

- What is the share of applicants in our dataset with college degrees?
- What share of Black adult Americans have college degrees (i.e. have completed a bachelors degree)?

Is the share of Black applicants with college degrees in this data `"greater"` or `"less"` than in the US? Store your answer as one of those strings in `"ex10_experiment_v_us"`.

> According to the Census' American Community Survey, in 2021 12% of the total U.S. population identified as Black or African American. Among Black residents aged 25 or over, 22.6% had earned a bachelor's degree or higher. 

In [59]:
college_degree_share = resume_experiment["college_degree"].mean()
black_college_degree_share = resume_experiment[resume_experiment["black"] == 1][
    "college_degree"
].mean()

print(
    f"The share of applicants in our dataset with college degrees is: {college_degree_share:.1%}"
)


print(
    f"The share of Black applicants in our dataset with college degrees is: {black_college_degree_share:.1%}"
)

The share of applicants in our dataset with college degrees is: 72.0%
The share of Black applicants in our dataset with college degrees is: 72.3%


In [60]:
ex10_experiment_v_us = "greater"
results["ex10_experiment_v_us"] = ex10_experiment_v_us

print(
    f"After researching online, we can conclude that the share of Black applicants with college degrees in this data is {ex10_experiment_v_us} than in the US."
)

After researching online, we can conclude that the share of Black applicants with college degrees in this data is greater than in the US.


### Exercise 11

Bearing in mind your answers to Exercise 8 and to Exercise 10, how do you think the Average Treatment Effect you estimated in Exercises 5 and 6 might generalize to the experience of the average Black American (i.e., how do you think the ATE for the average Black American would compare to the ATE estimated from this experiment)?


> Considering my answers to Exercises 8 and 10, I believe the Average Treatment Effect estimated in Exercises 5 and 6 understates the penalty faced by the average Black American.
>
> On one hand, the share of Black-named resumes with a college degree in our data (72.3%) is far higher than the share of Black adult Americans with one.
>
> On the other hand, the estimated penalty for a Black-sounding name is smaller among applicants with college degrees.
>
> Therefore, the ATE for the average Black American is likely larger in magnitude (more negative) than the ATE estimated in this experiment, which is pulled toward zero by the over-representation of college graduates in the sample.
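
A back-of-the-envelope sketch of this reasoning: reweight the two subgroup effects from Exercise 8 by the college share in the experiment and by the population share cited in Exercise 10. The function name `weighted_effect` is illustrative, and the calculation assumes college attainment is the only dimension on which the sample differs from the population and ignores the imprecision of the interaction estimate.

```python
# Subgroup effects from Exercise 8, in percentage points
effect_college = -2.82
effect_no_college = -4.05


def weighted_effect(share_college: float) -> float:
    """Average the two subgroup effects using a given college share."""
    return share_college * effect_college + (1 - share_college) * effect_no_college


sample_share = 0.723  # Black-named resumes with a college degree (Exercise 10)
population_share = 0.226  # Black adults with a bachelor's degree (figure cited in Exercise 10)

print(f"Weighted by experiment shares: {weighted_effect(sample_share):.2f}")
print(f"Weighted by population shares: {weighted_effect(population_share):.2f}")
```

Under these assumptions the reweighted penalty is roughly 3.77 percentage points, compared with about 3.16 when the experiment's own shares are used.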

### Exercise 12

What does your answer to Exercise 10 imply about the study's *internal* validity?

> The answer to Exercise 10 concerns how representative the sample is, which does not bear on internal validity. Internal validity depends on whether random assignment produced comparable groups, and the balance checks suggest it largely did: gender, years of experience, and education are distributed similarly across the two arms, with computer skills the only variable showing a statistically significant difference. We can therefore consider the study to have good internal validity, keeping in mind that it measures only the effect of having a Black-sounding name.

### Exercise 13

What does your answer to Exercise 10 imply about the study's *external* validity?

> The study has limited external validity if our goal is to generalize to the Black population of the United States. External validity always depends on the target population, and here the sample differs substantially from that population: college graduates are heavily over-represented. The estimate therefore does not accurately represent the experience of the average Black job seeker. The experiment also captures only the impact of the name on a resume, not the other differences that shape a person's experience in the labor market.

## What Did We Just Measure?

It's worth pausing for a moment to think about exactly what we've measured in this experiment. Was it the effect of race on hiring? Or the difference in the experience of the average White job applicant from the average Black job applicant?

Well... no. What we have measured in this experiment is **just** the effect of having a Black-sounding name (as opposed to a White-sounding name) on your resume on the likelihood of getting a followup call from someone hiring in Boston or Chicago given identical resumes. In that sense, what we've measured is a small *piece* of the difference in the experience of Black and White Americans when seeking employment. As anyone looking for a job knows, getting a call-back is obviously a crucial step in getting a job, so this difference—even if it's just one part of the overall difference—is remarkable.

In [61]:
assert set(results.keys()) == {
    "ex2_pvalue_computerskills",
    "ex2_pvalue_female",
    "ex2_pvalue_yearsexp",
    "ex3_pvalue_education",
    "ex4_validity",
    "ex5_pvalue",
    "ex5_white_advantage_percent",
    "ex5_white_advantage_percentage_points",
    "ex6_black_pvalue",
    "ex8_black_college",
    "ex8_black_nocollege",
    "ex8_college_heterogeneity",
    "ex9_gender_and_discrimination",
    "ex10_experiment_v_us",
}

In [63]:
results

{'ex2_pvalue_female': 0.38,
 'ex2_pvalue_computerskills': 0.03,
 'ex2_pvalue_yearsexp': 0.85,
 'ex3_pvalue_education': 0.49,
 'ex4_validity': 'internal',
 'ex5_white_advantage_percent': 49.68,
 'ex5_white_advantage_percentage_points': 3.2,
 'ex5_pvalue': 4e-05,
 'ex6_black_pvalue': 4e-05,
 'ex8_black_nocollege': -4.05,
 'ex8_black_college': -2.82,
 'ex8_college_heterogeneity': 'less discrimination',
 'ex9_gender_and_discrimination': 'greater discrimination for women',
 'ex10_experiment_v_us': 'greater'}