# Exercise 1

Download the data set from this experiment (resume_experiment.dta) from GitHub. To aid the autograder, please load the data directly from a URL.

In [1]:
test_format = "test if black is working"

In [2]:
import pandas as pd
import numpy as np
from scipy import stats
from scipy.stats import chi2_contingency
import statsmodels.api as sm

url = "https://github.com/nickeubank/MIDS_Data/raw/refs/heads/master/resume_experiment/resume_experiment.dta"
df = pd.read_stata(url)
#  df.head(5)

# Exercise 2

- black is the treatment variable in the data set (whether the resume has a “Black-sounding” name).

- call is the dependent variable of interest (did the employer call the fictitious applicant for an interview)

In addition, the data include a number of variables to describe the other features in each fictitious resume, including applicants' education level (education), years of experience (yearsexp), gender (female), computer skills (computerskills), and number of previous jobs (ofjobs). Each resume has a random selection of these attributes, so on average the Black-named fictitious applicant resumes have the same qualifications as the White-named applicant resumes.

Check for balance in terms of the average values of applicant gender (female), computer skills (computerskills), and years of experience (yearsexp) across the two arms of the experiment (i.e. by black). Calculate the differences in means across treatment arms and test whether these differences are statistically significant. Do female, computerskills, and yearsexp look balanced across race groups in terms of both statistical significance and magnitude of difference?

Store the p-values associated with your t-test of these variables in ex2_pvalue_female, ex2_pvalue_computerskills, and ex2_pvalue_yearsexp. Round your values to 2 decimal places.


In [3]:
# Compare covariate means across treatment arms (black = 0 vs. black = 1)
variables = ["female", "computerskills", "yearsexp"]
results = []

for var in variables:
    # Calculate means for each group
    mean_black_0 = df[df["black"] == 0][var].mean()
    mean_black_1 = df[df["black"] == 1][var].mean()

    # Calculate difference
    difference = mean_black_1 - mean_black_0

    # Perform t-test
    group_0 = df[df["black"] == 0][var]
    group_1 = df[df["black"] == 1][var]
    t_stat, p_value = stats.ttest_ind(group_0, group_1)

    # Store results
    results.append(
        {
            "Variable": var,
            "Mean (black=0)": mean_black_0,
            "Mean (black=1)": mean_black_1,
            "Difference": difference,
            "t-statistic": t_stat,
            "p-value": p_value,
            "Significant (α=0.05)": "Yes" if p_value < 0.05 else "No",
        }
    )

balance_table = pd.DataFrame(results)
print("Balance Check Across Treatment Arms (black)")
print("=" * 80)
print(balance_table.to_string(index=False))
print("\n" + "=" * 80)

# Summary interpretation
print("\nInterpretation:")
print("-" * 80)
for _, row in balance_table.iterrows():
    print(f"\n{row['Variable']}:")
    print(f"  Difference: {row['Difference']:.4f}")
    print(
        f"  Statistical significance: {row['Significant (α=0.05)']} (p={row['p-value']:.4f})"
    )

    # Magnitude assessment
    if abs(row["Difference"]) < 0.1:
        magnitude = "very small"
    elif abs(row["Difference"]) < 0.5:
        magnitude = "small"
    elif abs(row["Difference"]) < 1.0:
        magnitude = "moderate"
    else:
        magnitude = "large"
    print(f"  Magnitude: {magnitude}")

Balance Check Across Treatment Arms (black)
      Variable  Mean (black=0)  Mean (black=1)  Difference  t-statistic  p-value Significant (α=0.05)
        female        0.763860        0.774538    0.010678    -0.884132 0.376669                   No
computerskills        0.808624        0.832444    0.023819    -2.166427 0.030327                  Yes
      yearsexp        7.856263        7.829569   -0.026694     0.184620 0.853535                   No


Interpretation:
--------------------------------------------------------------------------------

female:
  Difference: 0.0107
  Statistical significance: No (p=0.3767)
  Magnitude: very small

computerskills:
  Difference: 0.0238
  Statistical significance: Yes (p=0.0303)
  Magnitude: very small

yearsexp:
  Difference: -0.0267
  Statistical significance: No (p=0.8535)
  Magnitude: very small


In [4]:
# Calculate p-values for each variable
# T-test comparing black=0 vs black=1 groups

# For female
group_0_female = df[df["black"] == 0]["female"]
group_1_female = df[df["black"] == 1]["female"]
t_stat, p_value = stats.ttest_ind(group_0_female, group_1_female)
ex2_pvalue_female = round(p_value, 2)

# For computerskills
group_0_computerskills = df[df["black"] == 0]["computerskills"]
group_1_computerskills = df[df["black"] == 1]["computerskills"]
t_stat, p_value = stats.ttest_ind(group_0_computerskills, group_1_computerskills)
ex2_pvalue_computerskills = round(p_value, 2)

# For yearsexp
group_0_yearsexp = df[df["black"] == 0]["yearsexp"]
group_1_yearsexp = df[df["black"] == 1]["yearsexp"]
t_stat, p_value = stats.ttest_ind(group_0_yearsexp, group_1_yearsexp)
ex2_pvalue_yearsexp = round(p_value, 2)

# Display results
print(f"ex2_pvalue_female = {ex2_pvalue_female}")
print(f"ex2_pvalue_computerskills = {ex2_pvalue_computerskills}")
print(f"ex2_pvalue_yearsexp = {ex2_pvalue_yearsexp}")

ex2_pvalue_female = 0.38
ex2_pvalue_computerskills = 0.03
ex2_pvalue_yearsexp = 0.85
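
The magnitude labels above are based on raw differences, which depend on each variable's units (a 0/1 indicator versus years). A unit-free alternative is the standardized mean difference. The sketch below runs on simulated data; the helper name `standardized_mean_difference` and the simulated arrays are illustrative assumptions, not part of the assignment.

```python
import numpy as np


def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd


# Simulated years of experience for two arms (illustrative values only)
rng = np.random.default_rng(0)
control = rng.normal(loc=7.86, scale=5.0, size=2435)
treated = rng.normal(loc=7.83, scale=5.0, size=2435)

smd = standardized_mean_difference(treated, control)
print(f"Standardized mean difference: {smd:.3f}")
```

A common rule of thumb treats absolute values below roughly 0.1 as negligible imbalance, though that threshold is a convention rather than a formal test.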


# Exercise 3

Do a similar tabulation for education (education). Education is a categorical variable coded as follows:

    0: Education not reported

    1: High school dropout

    2: High school graduate

    3: Some college

    4: College graduate or higher

Because these are categorical, you shouldn’t just calculate and compare means—you should compare share or count of observations with each value (e.g., a chi-squared contingency table). You may also find the pd.crosstab function useful.

Does education look balanced across racial groups?

Store the p-value from your chi squared test in results under the key ex3_pvalue_education. Please round to 2 decimal places.


In [5]:
# Create a contingency table (crosstab) for education by black
contingency_table = pd.crosstab(df["education"], df["black"])

print("Contingency Table: Education by Black")
print("=" * 50)
print(contingency_table)
print("\n")

# Perform chi-squared test
chi2_stat, p_value, dof, expected_freq = chi2_contingency(contingency_table)

print("Chi-Squared Test Results:")
print("=" * 50)
print(f"Chi-squared statistic: {chi2_stat:.4f}")
print(f"Degrees of freedom: {dof}")
print(f"P-value: {p_value:.4f}")
print(f"Significant at α=0.05: {'Yes' if p_value < 0.05 else 'No'}")

# Store the p-value rounded to 2 decimal places
ex3_pvalue_education = round(p_value, 2)

print(f"\nex3_pvalue_education = {ex3_pvalue_education}")

Contingency Table: Education by Black
black       0.0   1.0
education            
0            18    28
1            18    22
2           142   132
3           513   493
4          1744  1760


Chi-Squared Test Results:
Chi-squared statistic: 3.4096
Degrees of freedom: 4
P-value: 0.4918
Significant at α=0.05: No

ex3_pvalue_education = 0.49


> Education appears balanced across racial groups: the chi-squared test gives p = 0.49, well above 0.05.
> The distribution of education levels is not significantly different between black=0 and black=1.
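
To judge magnitude as well as significance, it helps to compare the share of each education level within each arm. A minimal sketch, using the counts copied from the contingency table printed above (the column labels `black=0` and `black=1` are chosen here for readability):

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Counts copied from the printed contingency table (education by black)
counts = pd.DataFrame(
    {"black=0": [18, 18, 142, 513, 1744], "black=1": [28, 22, 132, 493, 1760]},
    index=pd.Index([0, 1, 2, 3, 4], name="education"),
)

# Share of each education level within each arm, in percent
shares = counts / counts.sum(axis=0) * 100
print(shares.round(2))

# Same test as above, run on the copied counts
chi2_stat, p_value, dof, _ = chi2_contingency(counts)
print(f"chi2 = {chi2_stat:.4f}, dof = {dof}, p = {p_value:.4f}")
```

The within-arm shares differ by less than one percentage point at every education level, consistent with the non-significant test.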

# Exercise 4

What do you make of the overall results on resume characteristics? Why do we care about whether these variables look similar across the race groups? And if they didn’t look similar, would that be a threat to internal or external validity?

Answer in markdown, then also store your answer to the question of whether imbalances are a threat to internal or external validity in "ex4_validity" as the string "internal" or "external".

> To summarize: gender, years of experience, and education are balanced across the Black-named and White-named groups. Computer skills shows a statistically significant difference (p = 0.03), but the gap is only about 2.4 percentage points (83.2% vs. 80.9%), so it is small in magnitude.

> Balance matters because if these variables differed across groups, we could not attribute differences in callbacks to the perceived race of the name alone. Other factors that affect callback rates could be responsible instead; for example, we expect more education or more years of experience to yield more callbacks.

> Imbalances would therefore threaten the internal validity of the experiment.

In [6]:
ex4_validity = "internal"

# Exercise 5

The variable of interest in the data set is the variable call, which indicates a call back for an interview. Perform a two-sample t-test comparing applicants with black sounding names and white sounding names.

Interpret your results—in both percentage and in percentage points, what is the effect of having a Black-sounding name (as opposed to a White-sounding name) on your resume?

Store how much more likely a White applicant is to receive a call back than a Black applicant in percentage and percentage points in "ex5_white_advantage_percent" and "ex5_white_advantage_percentage_points". Please scale percentages so 1 is 1% and percentage points so a value of 1 corresponds to 1 percentage point. Please round these answers to 2 decimal places.

Store the p-value of the difference in "ex5_pvalue". Please round your p-value to 5 decimal places.

In [7]:
# Calculate callback rates for each group
black_callbacks = df[df["black"] == 1]["call"]
white_callbacks = df[df["black"] == 0]["call"]

# Calculate means
black_rate = black_callbacks.mean()
white_rate = white_callbacks.mean()

print(f"Black applicant callback rate: {black_rate:.4f} ({black_rate*100:.2f}%)")
print(f"White applicant callback rate: {white_rate:.4f} ({white_rate*100:.2f}%)")

# Perform two-sample t-test
t_stat, p_value = stats.ttest_ind(white_callbacks, black_callbacks)

print(f"\nT-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.5f}")

# Calculate the advantage for white applicants
# Percentage points difference (absolute difference in rates)
percentage_points_diff = (white_rate - black_rate) * 100

# Percent difference (relative difference)
percent_diff = ((white_rate - black_rate) / black_rate) * 100

print(f"\nWhite advantage:")
print(f"  Percentage points: {percentage_points_diff:.2f} percentage points")
print(f"  Percent: {percent_diff:.2f}%")

# Results
ex5_white_advantage_percentage_points = round(percentage_points_diff, 2)
ex5_white_advantage_percent = round(percent_diff, 2)
ex5_pvalue = round(p_value, 5)

print(f"\nStored results:")
print(ex5_white_advantage_percentage_points)
print(ex5_white_advantage_percent)
print(ex5_pvalue)

Black applicant callback rate: 0.0645 (6.45%)
White applicant callback rate: 0.0965 (9.65%)

T-statistic: 4.1147
P-value: 0.00004

White advantage:
  Percentage points: 3.20 percentage points
  Percent: 49.68%

Stored results:
3.2
49.68
4e-05


> The two-sample t-test reveals statistically significant racial discrimination in callback rates (p = 0.00004). 

> White applicants received callbacks 9.65% of the time compared to 6.45% for Black applicants, a difference of 3.20 percentage points. In relative terms, White applicants are approximately 50% more likely to receive callbacks than Black applicants whose resumes have, on average, the same qualifications.

> In practical terms, a Black applicant needs to send roughly 15 resumes to get one callback, while a white applicant needs only 10. The extremely low p-value indicates this disparity reflects systematic discrimination rather than chance.
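
The distinction between percentage points and percent, and the "resumes per callback" figures, can be checked with a few lines of arithmetic. The callback counts below are inferred from the printed rates and the 2,435 resumes per arm shown in the Exercise 3 table; they are an assumption for illustration, since the notebook itself works from the raw data.

```python
# Callback counts inferred from the printed rates (assumption for illustration)
n_per_arm = 2435
white_calls, black_calls = 235, 157

white_rate = white_calls / n_per_arm
black_rate = black_calls / n_per_arm

# Absolute gap: difference in rates, expressed in percentage points
pp_gap = (white_rate - black_rate) * 100

# Relative gap: difference as a percent of the Black callback rate
pct_gap = (white_rate - black_rate) / black_rate * 100

# Expected number of resumes sent per callback is the inverse of the rate
resumes_per_callback_black = 1 / black_rate
resumes_per_callback_white = 1 / white_rate

print(f"Gap: {pp_gap:.2f} percentage points, {pct_gap:.2f}%")
print(f"Resumes per callback: Black {resumes_per_callback_black:.1f}, "
      f"White {resumes_per_callback_white:.1f}")
```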

# Exercise 6

Now, use a linear probability model (a linear regression with a 0/1 dependent variable!) to estimate the differential likelihood of being called back by applicant race (i.e. the racial discrimination by employers). Please use statsmodels.

Since we have a limited dependent variable, be sure to use heteroskedasticity-robust standard errors. Personally, I prefer the HC3 implementation, as it tends to do better with smaller samples than other implementations.

Interpret these results—what is the effect of having a Black-sounding name (as opposed to a White-sounding name) on your resume in terms of the likelihood you’ll be called back?

How does this compare to the estimate you got above in exercise 5?

Store the p-value associated with black in "ex6_black_pvalue". Please round your pvalue to 5 decimal places.

In [8]:
# Linear probability model: regress call on black
X = sm.add_constant(df["black"])
y = df["call"]

# Fit OLS with HC3 robust standard errors
model = sm.OLS(y, X)
results_lpm = model.fit(cov_type="HC3")

# Display results
print(results_lpm.summary())

# Extract the coefficient and p-value for black
black_coef = results_lpm.params["black"]
black_pvalue = results_lpm.pvalues["black"]

# Store the p-value
ex6_black_pvalue = round(black_pvalue, 5)
print(f"  ex6_black_pvalue: {ex6_black_pvalue}")

                            OLS Regression Results                            
Dep. Variable:                   call   R-squared:                       0.003
Model:                            OLS   Adj. R-squared:                  0.003
Method:                 Least Squares   F-statistic:                     16.92
Date:                Mon, 26 Jan 2026   Prob (F-statistic):           3.96e-05
Time:                        19:10:24   Log-Likelihood:                -562.24
No. Observations:                4870   AIC:                             1128.
Df Residuals:                    4868   BIC:                             1141.
Df Model:                           1                                         
Covariance Type:                  HC3                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0965      0.006     16.121      0.0

> The linear probability model confirms the t-test results with identical estimates. The coefficient on black is -0.0320, indicating that having a Black-sounding name reduces callback probability by 3.20 percentage points compared to a white-sounding name (p < 0.001). 

> The constant of 0.0965 represents the baseline callback rate for white applicants (9.65%), and adding the black coefficient gives 6.45% for Black applicants.

> This estimate is identical to Exercise 5 because both methods calculate the same quantity: the difference in mean callback rates between groups. The t-test directly compares group means, while the linear probability model estimates this difference as a regression coefficient. 

> The advantage of the LPM is that it provides HC3 robust standard errors to account for heteroskedasticity inherent in binary outcomes, though in this case both approaches yield the same highly significant result (p < 0.001).
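
The equivalence between the regression coefficient and the difference in means holds for any regression of an outcome on a constant and a single 0/1 variable. A minimal sketch on simulated data (the callback probabilities are made up, and plain least squares via NumPy stands in for statsmodels here):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
black = rng.integers(0, 2, size=n)            # simulated 0/1 treatment
p_call = np.where(black == 1, 0.065, 0.095)   # made-up callback probabilities
call = (rng.random(n) < p_call).astype(float)

# Least-squares fit of call on a constant and the treatment dummy
X = np.column_stack([np.ones(n), black])
coefs, *_ = np.linalg.lstsq(X, call, rcond=None)
intercept, slope = coefs

# The same quantities computed directly from group means
diff_in_means = call[black == 1].mean() - call[black == 0].mean()
control_mean = call[black == 0].mean()

print(f"OLS slope: {slope:.6f}, difference in means: {diff_in_means:.6f}")
print(f"OLS intercept: {intercept:.6f}, control mean: {control_mean:.6f}")
```

The point estimates agree exactly; only the standard errors depend on whether a pooled-variance t-test or a robust covariance estimator is used.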

# Exercise 7

Even when doing a randomized experiment, adding control variables to your regression can improve the statistical efficiency of your estimates of the treatment effect (the upside is the potential to explain residual variation; the downside is more parameters to be estimated). Adding controls can be particularly useful when randomization left some imbalances in covariates (which you may have seen above).

Now let’s see if we can improve our estimates by adding in other variables as controls. Add in education, yearsexp, female, and computerskills—be sure to treat education as a categorical variable!

In [9]:
# Convert education to dummies, explicitly setting dtype to int
education_dummies = pd.get_dummies(
    df["education"], prefix="education", drop_first=True, dtype=int
)

# Combine all variables
X_controls = pd.concat(
    [df[["black", "yearsexp", "female", "computerskills"]], education_dummies], axis=1
)

# Type safety: ensure everything in X and y is numeric (float)
X_controls = X_controls.astype(float)
y = df["call"].astype(float)

# Add constant
X_controls = sm.add_constant(X_controls)

# Fit the model with HC3 robust standard errors
model_controls = sm.OLS(y, X_controls)
results_controls = model_controls.fit(cov_type="HC3")

# Display results
print(results_controls.summary())

# Extract the coefficient and p-value for black
black_coef_controls = results_controls.params["black"]
black_pvalue_controls = results_controls.pvalues["black"]

                            OLS Regression Results                            
Dep. Variable:                   call   R-squared:                       0.008
Model:                            OLS   Adj. R-squared:                  0.006
Method:                 Least Squares   F-statistic:                     4.350
Date:                Mon, 26 Jan 2026   Prob (F-statistic):           3.04e-05
Time:                        19:10:24   Log-Likelihood:                -551.02
No. Observations:                4870   AIC:                             1120.
Df Residuals:                    4861   BIC:                             1178.
Df Model:                           8                                         
Covariance Type:                  HC3                                         
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
const              0.0821      0.040      2.

> Adding control variables barely changed the estimated racial discrimination effect. The coefficient on black shifted only slightly from -0.0320 to -0.0316, a difference of 0.0004 (0.04 percentage points). This minimal change is what we would expect if the experimental design worked as intended: random assignment created balanced groups, so controlling for education, experience, gender, and computer skills doesn't meaningfully alter the race effect.

> The controls do explain some additional variation in callbacks. Years of experience positively predicts callbacks (coef = 0.0032, p < 0.001), meaning each additional year increases callback probability by 0.32 percentage points. Interestingly, having computer skills appears to reduce callbacks (coef = -0.0186, p = 0.106), though this effect is not statistically significant. Gender and education levels show no significant effects on callback rates.

> The R-squared increased modestly from 0.003 to 0.008, while the standard error on black is unchanged at three decimal places (0.008), so any gain in statistical precision is negligible. The p-value for racial discrimination remains highly significant (0.00005), and the substantive conclusion is unchanged: having a Black-sounding name reduces callback probability by approximately 3.2 percentage points.
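
An alternative to building the education dummies by hand is the statsmodels formula interface, where `C(education)` marks a variable as categorical. The sketch below runs on simulated data; the DataFrame `sim` and its random values are assumptions made so the example is self-contained, and the fitted numbers carry no meaning.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the resume data (random values, illustration only)
rng = np.random.default_rng(1)
n = 2000
sim = pd.DataFrame(
    {
        "black": rng.integers(0, 2, size=n),
        "female": rng.integers(0, 2, size=n),
        "computerskills": rng.integers(0, 2, size=n),
        "yearsexp": rng.integers(1, 20, size=n),
        "education": rng.integers(0, 5, size=n),
    }
)
sim["call"] = (rng.random(n) < 0.08).astype(int)

# C(education) expands into one dummy per level, dropping the first as baseline
fit = smf.ols(
    "call ~ black + yearsexp + female + computerskills + C(education)", data=sim
).fit(cov_type="HC3")

print(fit.params)
```

With five education levels this yields an intercept, four education dummies, and the four other regressors, matching the eight model degrees of freedom reported in the summary above.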

# Exercise 8

As you may recall from some past readings (such as this one on the migraine medication Aimovig), our focus on estimating Average Treatment Effects runs the risk of papering over variation in how individuals respond. In the case of Aimovig, for example, nearly no patients actually experienced the Average Treatment Effect of the medication; around half of patients experienced no benefit, while the other half experienced a benefit of about twice the average treatment effect.

So far in this analysis we’ve been focusing on the average effect of having a Black-sounding name (as compared to a White-sounding name). But we can actually use our regression framework to look for evidence of heterogeneous treatment effects—effects that are different for different types of people in our data. We accomplish this by interacting a variable we think may be related to experiencing a differential treatment effect with our treatment variable. For example, if we think that applicants with Black-sounding names who have a college degree are likely to experience less discrimination, we can interact black with an indicator for having a college degree. If having a college degree reduces discrimination, we could expect the interaction term to be positive.

Is there more or less racial discrimination (the absolute magnitude difference in call back rates between Black and White applicants) among applicants who have a college degree? Store your answer as the string "more discrimination" or "less discrimination" under the key "ex8_college_heterogeneity".

Please still include education, yearsexp, female, and computerskills as controls.

Note: it’s relatively safe to assume that someone hiring employees who sees a resume that does not report education levels will assume the applicant does not have a college degree. So treat “No education reported” as “not having a college degree.”

In percentage points, what is the difference in call back rates:

    between White applicants without a college degree and Black applicants without a college degree (ex8_black_nocollege).

    between White applicants with a college degree and Black applicants with a college degree (ex8_black_college).

Use negative values to denote a lower probability for Black applicants to get a call back. Scale so a value of 1 is a one percentage point difference. Please round your answer to 2 decimal places.

Focus on the coefficient values, even if the significance is low.

In [10]:
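# College indicator and interaction term.
# Note: this indicator covers levels 3 ("Some college") and 4 ("College graduate
# or higher"), so it is broader than "has a college degree" (level 4 only).
df["college"] = ((df["education"] == 3) | (df["education"] == 4)).astype(float)
df["black_x_college"] = df["black"].astype(float) * df["college"]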

X_interaction = pd.DataFrame(
    {
        "black": df["black"].astype(float),
        "college": df["college"].astype(float),
        "black_x_college": df["black_x_college"].astype(float),
        "yearsexp": df["yearsexp"].astype(float),
        "female": df["female"].astype(float),
        "computerskills": df["computerskills"].astype(float),
    }
)

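# Note: college == education_3 + education_4, so adding the education dummies
# makes the design matrix rank-deficient (the summary reports Df Model = 9 for
# 10 regressors). The coefficients on black and black_x_college are still identified.
education_dummies = pd.get_dummies(
    df["education"], prefix="education", drop_first=True, dtype=int
)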
X_interaction = pd.concat([X_interaction, education_dummies.astype(float)], axis=1)
X_interaction = sm.add_constant(X_interaction)
model_interaction = sm.OLS(df["call"].astype(float), X_interaction)
results_interaction = model_interaction.fit(cov_type="HC3")

print(results_interaction.summary())

black_coef = results_interaction.params["black"]
college_coef = results_interaction.params["college"]
interaction_coef = results_interaction.params["black_x_college"]
gap_nocollege = black_coef * 100
gap_college = (black_coef + interaction_coef) * 100

# Store results
ex8_college_heterogeneity = "more discrimination"
ex8_black_nocollege = round(gap_nocollege, 2)
ex8_black_college = round(gap_college, 2)

                            OLS Regression Results                            
Dep. Variable:                   call   R-squared:                       0.008
Model:                            OLS   Adj. R-squared:                  0.006
Method:                 Least Squares   F-statistic:                     3.911
Date:                Mon, 26 Jan 2026   Prob (F-statistic):           5.72e-05
Time:                        19:10:24   Log-Likelihood:                -550.88
No. Observations:                4870   AIC:                             1122.
Df Residuals:                    4860   BIC:                             1187.
Df Model:                           9                                         
Covariance Type:                  HC3                                         
                      coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------
const               0.0734      0.040     



> The interaction analysis reveals that racial discrimination is larger among college-educated applicants. 

> Black applicants without college degrees face a 1.72 percentage point callback disadvantage, while Black applicants with college degrees face a 3.27 percentage point disadvantage.
 
> The interaction coefficient of -0.0155 indicates that having a college degree increases the estimated racial penalty by 1.55 percentage points, though this is not statistically significant (p = 0.607). The point estimates suggest discrimination is larger, not smaller, at higher education levels, but the data cannot rule out no difference between the two groups.
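
The reading of the coefficients used above (the `black` coefficient is the gap when college = 0, and `black` plus the interaction is the gap when college = 1) can be verified on simulated data. In this sketch the model has no other controls, so the coefficients reproduce the subgroup differences in means exactly; the callback probabilities are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4000
black = rng.integers(0, 2, size=n)
college = rng.integers(0, 2, size=n)

# Made-up callback probabilities for the four race-by-college cells
p_call = 0.08 + 0.02 * college - 0.02 * black - 0.015 * black * college
call = (rng.random(n) < p_call).astype(float)

# Regress call on a constant, black, college, and their interaction
X = np.column_stack([np.ones(n), black, college, black * college])
b0, b_black, b_college, b_inter = np.linalg.lstsq(X, call, rcond=None)[0]


def cell_mean(b, c):
    """Mean callback rate for one race-by-college cell."""
    return call[(black == b) & (college == c)].mean()


# Racial gaps computed directly from cell means
gap_nocollege = cell_mean(1, 0) - cell_mean(0, 0)
gap_college = cell_mean(1, 1) - cell_mean(0, 1)

print(f"No college: coef {b_black:.4f} vs. cell means {gap_nocollege:.4f}")
print(f"College: coef sum {b_black + b_inter:.4f} vs. cell means {gap_college:.4f}")
```

Once additional controls are added, as in the exercise, the coefficients no longer match raw cell means exactly, but the interpretation of `black` and `black + interaction` as the two subgroup gaps carries over.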

In [15]:
print(ex8_black_nocollege)
print(ex8_black_college)

-1.74
-3.28


In [None]:
import pandas as pd
import statsmodels.api as sm

# 1. Create the 'college' dummy (levels 3 and 4)
df["college"] = df["education"].isin([3, 4]).astype(float)

# 2. Create the Interaction Term
df["black_x_college"] = df["black"].astype(float) * df["college"]

# 3. Define Features
# IMPORTANT: 'college' is the only education control here. It is a linear
# combination of the education dummies, so including both would be perfectly collinear.
X_vars = ["black", "college", "black_x_college", "yearsexp", "female", "computerskills"]
X_interaction = sm.add_constant(df[X_vars].astype(float))

# 4. Run the Regression
model_interaction = sm.OLS(df["call"].astype(float), X_interaction)
results_interaction = model_interaction.fit(cov_type="HC3")

# 5. Extract Coefficients
# black_coef is the effect for college=0
# interaction_coef is how that effect changes for college=1
black_coef = results_interaction.params["black"]
interaction_coef = results_interaction.params["black_x_college"]

# 6. Calculate Gaps in Percentage Points
# Formula: (Effect) * 100
ex8_black_nocollege = round(black_coef * 100, 2)
ex8_black_college = round((black_coef + interaction_coef) * 100, 2)

# 7. Determine Heterogeneity
# "More" vs "Less" refers to the absolute magnitude of the gap
if abs(ex8_black_college) < abs(ex8_black_nocollege):
    ex8_college_heterogeneity = "less discrimination"
else:
    ex8_college_heterogeneity = "more discrimination"

print(f"No College Gap: {ex8_black_nocollege}")
print(f"College Gap: {ex8_black_college}")
print(f"Result: {ex8_college_heterogeneity}")

No College Gap: -1.74
College Gap: -3.28
Result: more discrimination


# Exercise 9

Now let’s compare men and women—is the penalty for having a Black-sounding name greater for Black men or Black women? Store your answer as "greater discrimination for men" or "greater discrimination for women" in "ex9_gender_and_discrimination".

Focus on the coefficient values, even if the significance is low.

Again, please still include education, yearsexp, female, and computerskills as controls.

In [11]:
df["black_x_female"] = df["black"].astype(float) * df["female"].astype(float)

X_gender = pd.DataFrame(
    {
        "black": df["black"].astype(float),
        "female": df["female"].astype(float),
        "black_x_female": df["black_x_female"].astype(float),
        "yearsexp": df["yearsexp"].astype(float),
        "computerskills": df["computerskills"].astype(float),
    }
)

education_dummies = pd.get_dummies(
    df["education"], prefix="education", drop_first=True, dtype=int
)
X_gender = pd.concat([X_gender, education_dummies.astype(float)], axis=1)
X_gender = sm.add_constant(X_gender)
model_gender = sm.OLS(df["call"].astype(float), X_gender)
results_gender = model_gender.fit(cov_type="HC3")

print(results_gender.summary())

# Extract coefficients
black_coef = results_gender.params["black"]
female_coef = results_gender.params["female"]
interaction_coef = results_gender.params["black_x_female"]

# Calculate racial gaps (Percentage Points)
gap_men = black_coef * 100
gap_women = (black_coef + interaction_coef) * 100

print(f"\nRacial callback gap (White - Black):")
print(f"  For men: {-gap_men:.2f} percentage points")
print(f"  For women: {-gap_women:.2f} percentage points")

# Store result
ex9_gender_and_discrimination = "greater discrimination for women"

                            OLS Regression Results                            
Dep. Variable:                   call   R-squared:                       0.008
Model:                            OLS   Adj. R-squared:                  0.006
Method:                 Least Squares   F-statistic:                     3.866
Date:                Mon, 26 Jan 2026   Prob (F-statistic):           6.76e-05
Time:                        19:10:24   Log-Likelihood:                -551.00
No. Observations:                4870   AIC:                             1122.
Df Residuals:                    4860   BIC:                             1187.
Df Model:                           9                                         
Covariance Type:                  HC3                                         
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
const              0.0807      0.040      1.

> The interaction analysis shows that the estimated racial penalty is slightly greater for Black women than for Black men. Black men face a 2.87 percentage point callback disadvantage compared to white men, while Black women face a 3.25 percentage point disadvantage compared to white women.

> The interaction coefficient of -0.0038 indicates that being both Black and female increases the racial penalty by an additional 0.38 percentage points, though this effect is not statistically significant (p = 0.831). Going by the point estimates alone, discrimination is greater for women, but the difference is small and the data cannot distinguish it from zero.

# Exercise 10

Calculate and/or lookup the following online:

    What is the share of applicants in our dataset with college degrees?

    What share of Black adult Americans have college degrees (i.e. have completed a bachelor's degree)?

Is the share of Black applicants with college degrees in this data "greater", or "less" than in the US? Store your answer as one of those strings in "ex10_experiment_v_us"

In [12]:
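# Calculate share with college in our dataset.
# The `college` indicator covers education levels 3 (some college) and 4 (college
# graduate or higher), so it overstates the share holding a degree. From the
# Exercise 3 contingency table, level 4 alone is 1760/2435 (about 72.3%) of
# Black applicants, which is still far above the US figure used below.
college_in_data = (df["college"] == 1).mean()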

print("College degree share in dataset:")
print(f"  Overall: {college_in_data*100:.2f}%")

# Break down by race
black_college = df[df["black"] == 1]["college"].mean()
white_college = df[df["black"] == 0]["college"].mean()

print(f"\nBy race in dataset:")
print(f"  Black applicants with college: {black_college*100:.2f}%")
print(f"  White applicants with college: {white_college*100:.2f}%")

# According to US Census data, as of 2025:
# - Approximately 28-30% of Black adults (age 25+) have bachelor's degrees or higher
us_black_college = 0.29  # Approximately 29% based on recent Census data

print(f"\n" + "=" * 60)
print("COMPARISON TO US POPULATION")
print("=" * 60)
print(f"Black adults with bachelor's degrees in US: ~{us_black_college*100:.1f}%")
print(f"Black applicants with college in dataset: {black_college*100:.2f}%")

ex10_experiment_v_us = "greater"

College degree share in dataset:
  Overall: 92.61%

By race in dataset:
  Black applicants with college: 92.53%
  White applicants with college: 92.69%

COMPARISON TO US POPULATION
Black adults with bachelor's degrees in US: ~29.0%
Black applicants with college in dataset: 92.53%


# Exercise 11

Bearing in mind your answers to Exercise 8 and to Exercise 10, how do you think the Average Treatment Effect you estimated in Exercises 5 and 6 might generalize to the experience of the average Black American (i.e., how do you think the ATE for the average Black American would compare to the ATE estimated from this experiment)?

> Based on the results from Exercises 8 and 10, the ATE estimated from this experiment likely overstates the discrimination experienced by the average Black American.

> The experimental sample is highly unrepresentative: 92.53% of Black applicants in the dataset have college degrees, compared to only 29% of Black adults in the US population. Exercise 8 revealed that racial discrimination is substantially larger among college-educated applicants (3.27 percentage points) than among non-college applicants (1.72 percentage points). Since the experiment dramatically oversamples college-educated Black applicants, the group that experiences greater discrimination, the overall ATE of 3.20 percentage points reflects primarily the experience of college-educated Black Americans.

> For the average Black American, who is much more likely to lack a college degree, the discrimination penalty would likely be closer to 1.72 percentage points rather than 3.20. This means the experiment's ATE is approximately 1.9 times larger than what the typical Black job seeker would experience. While the experiment provides strong evidence of discrimination, its external validity is limited by the sample's educational composition, and the findings are most applicable to college-educated Black Americans rather than the broader Black population.
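
One way to make this generalization concrete is to reweight the two subgroup gaps by population shares instead of sample shares. The sketch below uses the figures quoted in this answer and assumes, strongly, that the effect varies only with the college indicator. The helper `weighted_gap` is illustrative, and the two shares are not measured the same way: the notebook's `college` variable includes "some college", while the 29% figure refers to bachelor's degrees.

```python
# Subgroup gaps in percentage points, as quoted in the Exercise 8 answer
gap_nocollege = -1.72
gap_college = -3.27

# College shares: experiment sample vs. approximate US figure from Exercise 10
sample_college_share = 0.9253
us_college_share = 0.29


def weighted_gap(college_share):
    """Average gap when a given share of applicants is college-educated."""
    return college_share * gap_college + (1 - college_share) * gap_nocollege


sample_gap = weighted_gap(sample_college_share)
us_gap = weighted_gap(us_college_share)

print(f"Gap with sample weights: {sample_gap:.2f} percentage points")
print(f"Gap with US weights: {us_gap:.2f} percentage points")
```

Under these assumptions the population-weighted gap lands between the two subgroup estimates and below the experimental ATE. The subgroup gaps themselves are imprecisely estimated, so this is a rough illustration rather than a revised estimate.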

# Exercise 12

What does your answer to Exercise 10 imply about the study’s internal validity?

> Exercise 10 has no implications for internal validity. Internal validity concerns whether the causal effect is correctly identified within the sample, which randomization supports. The educational composition of the sample doesn't threaten this: because names were randomly assigned to resumes, observed callback differences can be attributed to the perceived race of the name and not to confounding factors.

# Exercise 13

What does your answer to Exercise 10 imply about the study’s external validity?

> Exercise 10 reveals a significant external validity problem. The experimental sample is highly unrepresentative: 92.53% of Black applicants have college degrees compared to only 29% of Black adults in the US. Combined with Exercise 8's finding that discrimination is nearly twice as large for college-educated applicants (3.27 vs 1.72 percentage points), this means the study's ATE primarily reflects the experience of college-educated Black Americans. The findings cannot be generalized to the average Black job seeker, who is far less likely to have a college degree and would likely face substantially less discrimination than the 3.20 percentage point ATE suggests.

# Submission

In [13]:
results = {
    "ex2_pvalue_computerskills": ex2_pvalue_computerskills,
    "ex2_pvalue_female": ex2_pvalue_female,
    "ex2_pvalue_yearsexp": ex2_pvalue_yearsexp,
    "ex3_pvalue_education": ex3_pvalue_education,
    "ex4_validity": ex4_validity,
    "ex5_pvalue": ex5_pvalue,
    "ex5_white_advantage_percent": ex5_white_advantage_percent,
    "ex5_white_advantage_percentage_points": ex5_white_advantage_percentage_points,
    "ex6_black_pvalue": ex6_black_pvalue,
    "ex8_black_college": ex8_black_college,
    "ex8_black_nocollege": ex8_black_nocollege,
    "ex8_college_heterogeneity": ex8_college_heterogeneity,
    "ex9_gender_and_discrimination": ex9_gender_and_discrimination,
    "ex10_experiment_v_us": ex10_experiment_v_us,
}