In this module, we explore heterogeneous treatment effects as they relate to a (fictional) political experiment. Note that in your homework, you will analyze an actual political experiment.

Consider the following setup. Researchers want to understand whether exposing voters to information on homelessness increases their support for taxes on the wealthy. The treatment is a binary indicator for whether the voter received information on homelessness, and the response variable is a measure of support (from 0 to 100) for raising taxes on the wealthy. The experiment is conducted by flipping a coin before approaching each voter to determine treatment, so each voter is exposed to the information with 50% probability and is not exposed with 50% probability.


In [1]:
# Load libraries: numpy, pandas, and linear regression
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

Now we construct what is known as the data-generating process. Since this is simulated data (and since we want to see if our methods can pick up on the true subtleties of the data), we have the liberty to construct the sample as we see fit.

We include four covariates: political affiliation, age, gender, and state.

In [11]:
# Set the random seed so that the simulated data are reproducible
np.random.seed(3)

# For each voter, randomly sample his or her treatment status and covariates
n = 10000
data = pd.DataFrame(np.random.choice([0, 1], n, replace = True), 
                    columns = ['treatment'])
data['political'] = np.random.choice(['r', 'd'], n, replace = True)
data['age'] = np.random.choice(range(18, 66), n, replace = True)
data['gender'] = np.random.choice(['m', 'f', 'o'], n, replace = True)
states = ['ca', 'or', 'wa', 'nv', 'az', 'id', 'mt', 'wy', 'ut', 'co', 'nm']
data['states'] = np.random.choice(states, n, replace = True)

      treatment political  age gender states
0             0         r   43      o     wy
1             0         r   56      m     co
2             1         r   57      m     nm
3             1         r   51      m     wy
4             0         r   21      f     wy
...         ...       ...  ...    ...    ...
9995          0         r   38      o     ca
9996          0         d   42      m     az
9997          0         d   56      m     id
9998          0         r   22      f     ut
9999          0         d   63      f     wy

[10000 rows x 5 columns]


Now we need to simulate the process by which these covariates and treatments drive an outcome. We first specify the treatment effects, and second construct the process by which these treatments, covariates, and treatment effects drive the outcome. Third, we scale the data, to make it more interpretable.

In particular, we choose to have three treatment effects operating:

1.   Democratic voters respond strongly to treatment
2.   Republican voters respond moderately to treatment
3.   Californian voters respond moderately to treatment


In [13]:
# Define the preliminary treatment effects
beta_democrat = 10
beta_republican = 6
beta_california = 5

# Construct the outcome variable (support for taxes on the wealthy) 

data['y'] = (
  (data['political'] == 'd') * 40 + 
  (65 - data['age']) * 2 + 
  # Pretreatment (baseline) terms are above; treatment terms are below
  beta_democrat * data['treatment'] * (data['political'] == 'd') + 
  beta_republican * data['treatment'] * (data['political'] == 'r') + 
  beta_california * data['treatment'] * (data['states'] == 'ca') + 
  np.random.normal(0, 20, n))

# Do an optional scaling step, so that the response variable is between 0 and 100
scaling_factor = max(data['y']) - min(data['y'])
data['y'] = round(100 * (data['y'] - min(data['y']))/scaling_factor)
beta_democrat = beta_democrat * 100/scaling_factor
beta_republican = beta_republican * 100/scaling_factor
beta_california = beta_california * 100/scaling_factor

# Demonstrate the revised treatment effects, net of scaling
print("The Democrat treatment effect is: " + str(round(beta_democrat, 2)))
print("The Republican treatment effect is: " + str(round(beta_republican, 2)))
print("The California treatment effect is: " + str(round(beta_california, 2)))

The Democrat treatment effect is: 4.0
The Republican treatment effect is: 2.4
The California treatment effect is: 2.0


In [14]:
print(data)

      treatment political  age gender states     y
0             0         r   43      o     wy  45.0
1             0         r   56      m     co  26.0
2             1         r   57      m     nm  26.0
3             1         r   51      m     wy  36.0
4             0         r   21      f     wy  45.0
...         ...       ...  ...    ...    ...   ...
9995          0         r   38      o     ca  45.0
9996          0         d   42      m     az  52.0
9997          0         d   56      m     id  43.0
9998          0         r   22      f     ut  68.0
9999          0         d   63      f     wy  26.0

[10000 rows x 6 columns]


So let's remind ourselves: what is the expected treatment effect for an Oregon-based Democrat? What is the expected treatment effect for a California-based Republican?
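One way to check your answers is to read the expected effects straight off the data-generating process. The sketch below assumes the rescaled effects printed above (4.0, 2.4, and 2.0) and uses a hypothetical helper, `expected_effect`, that is not part of the notebook: it adds up whichever components apply to a given voter.

```python
# Rescaled treatment effects, copied from the printed output above
BETA_DEMOCRAT = 4.0
BETA_REPUBLICAN = 2.4
BETA_CALIFORNIA = 2.0


def expected_effect(political, state):
    """Expected treatment effect implied by the data-generating process.

    Hypothetical helper for illustration: every voter gets the effect for
    his or her party, and voters in 'ca' get the California effect on top.
    """
    party = BETA_DEMOCRAT if political == 'd' else BETA_REPUBLICAN
    california = BETA_CALIFORNIA if state == 'ca' else 0.0
    return party + california


oregon_democrat = expected_effect('d', 'or')
california_republican = expected_effect('r', 'ca')
print("Oregon-based Democrat: " + str(round(oregon_democrat, 2)))
print("California-based Republican: " + str(round(california_republican, 2)))
```

Because the effects are additive in the simulation, a California-based Republican receives both the Republican and the California components.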

Now, let's begin by running an exploratory regression model, without the treatment effect heterogeneity. What is the effect of treatment? Are the standard errors correct?

In [4]:
# Run the regression model
result = sm.ols(formula="y ~ political + treatment + age + gender + states", 
                data = data).fit()
print(result.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.749
Model:                            OLS   Adj. R-squared:                  0.748
Method:                 Least Squares   F-statistic:                     1984.
Date:                Tue, 01 Nov 2022   Prob (F-statistic):               0.00
Time:                        02:20:12   Log-Likelihood:                -35100.
No. Observations:               10000   AIC:                         7.023e+04
Df Residuals:                    9984   BIC:                         7.035e+04
Df Model:                          15                                         
Covariance Type:            nonrobust                                         
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept         88.2352      0.400    220.

Now we explicitly model treatment effect heterogeneity along a low-dimensional covariate. Here, we can reasonably suspect that political affiliation matters, and because it takes only two values, we can test for it directly without worrying very much about the multiple hypothesis testing problem.

In [5]:
# Run the regression model
result = sm.ols(formula="y ~ political + treatment + age + gender + states + treatment * political", 
                data = data).fit()
print(result.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.750
Model:                            OLS   Adj. R-squared:                  0.750
Method:                 Least Squares   F-statistic:                     1871.
Date:                Tue, 01 Nov 2022   Prob (F-statistic):               0.00
Time:                        02:20:13   Log-Likelihood:                -35077.
No. Observations:               10000   AIC:                         7.019e+04
Df Residuals:                    9983   BIC:                         7.031e+04
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                               coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------
Intercept               

Let's assume the standard errors were in fact correct. What is the treatment effect for a Democrat, and is it statistically significant? What is the treatment effect for a Republican, and is it statistically significant? Are they different from each other, and is that difference statistically significant? Which of these questions can you answer easily from this output, and which are difficult to answer from this output?
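The Republican effect is the sum of two coefficients (treatment plus the interaction), so its standard error depends on the covariance between them, which the summary table does not print. The sketch below illustrates the calculation with plain numpy on a small simulated data set; the data, the effect sizes, and the helper `linear_combination` are assumptions made for illustration, not the notebook's own values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small simulated stand-in for the notebook's data (illustrative values only)
n = 2000
treatment = rng.integers(0, 2, n)
republican = rng.integers(0, 2, n)
y = (50 + 4.0 * treatment - 15 * republican
     - 1.6 * treatment * republican + rng.normal(0, 8, n))

# Design matrix: intercept, treatment, republican, treatment x republican
X = np.column_stack([np.ones(n), treatment, republican,
                     treatment * republican])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical (nonrobust) covariance matrix of the coefficients
residuals = y - X @ beta
sigma2 = residuals @ residuals / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)


def linear_combination(weights):
    """Point estimate and standard error of a weighted sum of coefficients."""
    weights = np.asarray(weights, dtype=float)
    estimate = weights @ beta
    std_error = np.sqrt(weights @ cov @ weights)
    return estimate, std_error


democrat = linear_combination([0, 1, 0, 0])           # treatment alone
republican_effect = linear_combination([0, 1, 0, 1])  # treatment + interaction
difference = linear_combination([0, 0, 0, 1])         # interaction alone

for label, (estimate, std_error) in [("Democrat", democrat),
                                     ("Republican", republican_effect),
                                     ("Difference", difference)]:
    print(label + ": " + str(round(estimate, 3)) +
          " (std error " + str(round(std_error, 3)) + ")")
```

With a fitted statsmodels model, the same covariance matrix is available from `result.cov_params()`, so the Democrat effect and the difference can be read off the table directly, while the Republican effect needs this extra step.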




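Given all these complications, let's bootstrap! This gives us a standard error for each quantity of interest directly (the Democrat effect, the Republican effect, and their difference), without having to combine coefficients by hand.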

Notice that we still use regression as the tool here, but we only use the point estimates from each regression model to perform the bootstrap.

In [6]:
# Measure three components over each iteration: the Democrat effect, 
# the Republican effect, and the difference in effects
bootstrap_democrat = []
bootstrap_republican = []
bootstrap_difference = []

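# Remember: the sampling design was supposed to be 50-50 treatment-control, but 
# of course due to sampling error, it was slightly off. So we need to reweight
# when we do the Bootstrap iterations. Note the naming: weight_control is the
# treated share and is applied to control rows, while weight_treatment is the
# control share and is applied to treated rows, so that each group is drawn
# with total probability 0.5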
weight_control = sum(data['treatment'] == 1)/n
weight_treatment = 1 - weight_control
probability = ((data['treatment'] == 1) *  weight_treatment + 
               (data['treatment'] == 0) * weight_control)
probability = probability/sum(probability)

# Bootstrap through 300 iterations
# Note that 300 is normally low, but we do this in the interest of time
for i in range(300):

  # Sample observations (with the reweighting)
  index = np.random.choice(range(n), n, replace = True, p = probability)

  # Run the regression model for each bootstrap sample, selected with iloc
  result = sm.ols(formula="y ~ political + treatment + age + gender + states + treatment * political", 
                  data = data.iloc[index,]).fit()
  
  # Extract the two key coefficients by name: the treatment effect, and the
  # treatment effect interacted with being Republican
  sample_effect = result.params['treatment']
  sample_effect_republican_addon = result.params['treatment:political[T.r]']

  # Save the relevant indicators
  bootstrap_democrat.append(sample_effect)
  bootstrap_republican.append(sample_effect + sample_effect_republican_addon)
  bootstrap_difference.append(sample_effect_republican_addon)

  if i % 100 == 0:
    print("On iteration: " + str(i))


# Report summary statistics
print("The Democratic std error is: " + 
      str(round(np.std(bootstrap_democrat), 4)))
print("The Republican std error is: " + 
      str(round(np.std(bootstrap_republican), 4)))
print("The difference std error is: " + 
      str(round(np.std(bootstrap_difference), 4)))

# Remind ourselves of the actual treatment effects estimated from the regression
result = sm.ols(formula="y ~ political + treatment + age + gender + states + treatment * political", 
                data = data).fit()
effect = result.params['treatment']
effect_republican_addon = result.params['treatment:political[T.r]']
print("The Democratic treatment effect is: " + str(round(effect, 4)))
print("The Republican treatment_effect is: " + 
      str(round(effect + effect_republican_addon, 4)))
print("The difference in treatment effects is: " + 
      str(round(effect_republican_addon, 4)))

On iteration: 0
On iteration: 100
On iteration: 200
The Democratic std error is: 0.2313
The Republican std error is: 0.2407
The difference std error is: 0.3284
The Democratic treatment effect is: 4.2968
The Republican treatment_effect is: 2.1241
The difference in treatment effects is: -2.1727
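The bootstrap draws can also be turned into confidence intervals by taking percentiles of the draws. The sketch below is a minimal illustration on simulated data with a simple difference in means rather than the notebook's regression; the data, the 500 iterations, and the helper `difference_in_means` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in data (illustrative): true treatment effect of 4
n = 1000
treatment = rng.integers(0, 2, n)
y = 50 + 4.0 * treatment + rng.normal(0, 10, n)


def difference_in_means(y, treatment):
    """Mean outcome among the treated minus mean outcome among controls."""
    return y[treatment == 1].mean() - y[treatment == 0].mean()


# Bootstrap: resample rows with replacement and re-estimate each time
draws = []
for _ in range(500):
    index = rng.integers(0, n, n)
    draws.append(difference_in_means(y[index], treatment[index]))
draws = np.array(draws)

# Standard error and 95% percentile interval from the bootstrap draws
std_error = draws.std(ddof=1)
lower, upper = np.percentile(draws, [2.5, 97.5])
print("Std error: " + str(round(std_error, 3)))
print("95% interval: [" + str(round(lower, 3)) + ", " +
      str(round(upper, 3)) + "]")
```

The same `np.percentile` call could be applied to the lists `bootstrap_democrat`, `bootstrap_republican`, and `bootstrap_difference` collected above.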


In [17]:
print(sum(data['treatment'] == 1))
print(sum(data['treatment'] == 0))
print(weight_control)
print(weight_treatment)

4968
5032
0.4968
0.5032
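To see why these weights restore the 50-50 design, it helps to work through the arithmetic with the counts printed above. In this sketch (variable names are illustrative), each treated row is drawn with probability proportional to the control share and each control row with probability proportional to the treated share, so the two groups end up with equal total probability.

```python
# Counts printed above
n_treated, n_control = 4968, 5032
n = n_treated + n_control

# Each treated row is weighted by the control share, and vice versa
weight_per_treated_row = n_control / n   # 0.5032
weight_per_control_row = n_treated / n   # 0.4968

# Total probability mass assigned to each group, after normalizing
total = (n_treated * weight_per_treated_row +
         n_control * weight_per_control_row)
share_treated = n_treated * weight_per_treated_row / total
share_control = n_control * weight_per_control_row / total

print(round(share_treated, 6))
print(round(share_control, 6))
```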


But suppose you suspect heterogeneity along a higher-dimensional variable, e.g. state, which takes 11 values here. How might you test for that? One approach would be to assume that we can estimate it the same way we do for low-dimensional variables, i.e. by adding interaction terms to the regression.

Let's see how that works. What conclusions would you draw?

In [7]:
result = sm.ols(formula="y ~ political + treatment + age + gender + states + treatment * political + treatment * states",
                data = data).fit()
print(result.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.750
Model:                            OLS   Adj. R-squared:                  0.750
Method:                 Least Squares   F-statistic:                     1154.
Date:                Tue, 01 Nov 2022   Prob (F-statistic):               0.00
Time:                        02:21:13   Log-Likelihood:                -35066.
No. Observations:               10000   AIC:                         7.019e+04
Df Residuals:                    9973   BIC:                         7.038e+04
Df Model:                          26                                         
Covariance Type:            nonrobust                                         
                               coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------
Intercept               
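The concern with a regression like this one is multiple comparisons: with 11 states there are 10 state-by-treatment interaction terms, each tested separately. The arithmetic below is a rough sketch that assumes the tests are independent and that every true interaction is zero. It also shows the Bonferroni threshold, a simple correction that is swapped in here for illustration and is not the adjustment used in this notebook.

```python
n_tests = 10   # 11 states minus the reference category
alpha = 0.05

# Chance of at least one false positive across all tests, assuming the tests
# are independent and every true interaction is zero
family_wise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni: compare each p-value to alpha / n_tests instead of alpha
bonferroni_threshold = alpha / n_tests

print("Chance of at least one false positive: " +
      str(round(family_wise_error, 3)))
print("Bonferroni p-value threshold: " + str(round(bonferroni_threshold, 4)))
```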

Given that, we need to adjust in some way. The state-of-the-art ways -- Ridge Regression, Causal Forests, etc -- are beyond the scope of today's class in practical terms. Today, we'll demonstrate some of the spirit of Causal Forests, but using regression rather than random forests.

In [18]:
# We split the sample into two components, using the first component to identify
# statistically significant heterogeneity and the second component to estimate 
# the actual treatment effects
np.random.seed(4)

# Note that this isn't the Bootstrap: we are just splitting the sample
index = np.random.choice(range(n), round(n/2), replace = False)

In [19]:
# Now run the first regression with half the data
result = sm.ols(formula="y ~ political + treatment + age + gender + states + treatment * political + treatment * states", 
                data = data.drop(index)).fit()
print(result.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.746
Model:                            OLS   Adj. R-squared:                  0.745
Method:                 Least Squares   F-statistic:                     562.7
Date:                Tue, 01 Nov 2022   Prob (F-statistic):               0.00
Time:                        02:53:32   Log-Likelihood:                -17597.
No. Observations:                5000   AIC:                         3.525e+04
Df Residuals:                    4973   BIC:                         3.542e+04
Df Model:                          26                                         
Covariance Type:            nonrobust                                         
                               coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------
Intercept               

In [20]:
# Now run the second regression with the remaining half of the data
result = sm.ols(formula="y ~ political + treatment + age + gender + states + treatment * political + treatment * states", 
                data = data.iloc[index]).fit()
print(result.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.756
Model:                            OLS   Adj. R-squared:                  0.755
Method:                 Least Squares   F-statistic:                     593.1
Date:                Tue, 01 Nov 2022   Prob (F-statistic):               0.00
Time:                        02:55:40   Log-Likelihood:                -17452.
No. Observations:                5000   AIC:                         3.496e+04
Df Residuals:                    4973   BIC:                         3.513e+04
Df Model:                          26                                         
Covariance Type:            nonrobust                                         
                               coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------
Intercept               
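The two-step logic of sample splitting can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the notebook's procedure: it uses simulated data with five states, differences in means relative to a reference state instead of the regression above, and a Bonferroni-style cutoff for the selection step. The helpers `state_effect` and `extra_effect` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated stand-in data: effect of 3 everywhere, plus 5 extra in California
states = np.array(['az', 'ca', 'or', 'wa', 'nv'])
n = 20000
state = rng.choice(states, n)
treatment = rng.integers(0, 2, n)
y = 50 + (3 + 5 * (state == 'ca')) * treatment + rng.normal(0, 10, n)


def state_effect(y, treatment, mask):
    """Difference in means within the masked rows, and its variance."""
    treated = y[mask & (treatment == 1)]
    control = y[mask & (treatment == 0)]
    effect = treated.mean() - control.mean()
    variance = (treated.var(ddof=1) / treated.size +
                control.var(ddof=1) / control.size)
    return effect, variance


def extra_effect(rows, s, reference='az'):
    """Effect in state s minus the effect in the reference state."""
    y_r, t_r, state_r = y[rows], treatment[rows], state[rows]
    effect_s, var_s = state_effect(y_r, t_r, state_r == s)
    effect_0, var_0 = state_effect(y_r, t_r, state_r == reference)
    return effect_s - effect_0, np.sqrt(var_s + var_0)


# Split the sample once, without replacement (this is not the Bootstrap)
shuffled = rng.permutation(n)
half_a, half_b = shuffled[:n // 2], shuffled[n // 2:]

# Step 1: use half A only to decide which states show heterogeneity.
# 2.5 is roughly the two-sided normal critical value for 0.05 / 4 tests.
selected = []
for s in states[states != 'az']:
    estimate, std_error = extra_effect(half_a, s)
    if abs(estimate / std_error) > 2.5:
        selected.append(str(s))

# Step 2: use half B only to estimate the effects for the selected states
estimates = {s: extra_effect(half_b, s) for s in selected}
for s, (estimate, std_error) in estimates.items():
    print(s + ": " + str(round(estimate, 2)) +
          " (std error " + str(round(std_error, 2)) + ")")
```

Because the second half played no part in choosing which states to report, its estimates are not distorted by the selection step.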