# Minimum Wage Example Notebook with DiD

This notebook implements Difference-in-Differences in an application on
the effect of minimum wage changes on teen employment. We use data from
[Callaway
(2022)](https://bcallaway11.github.io/files/Callaway-Chapter-2022/main.pdf). The data are annual county level data from the United States covering 2001 to 2007. The outcome variable is log county-level teen employment, and the treatment variable is an indicator for whether the county has a minimum wage above the federal minimum wage. Note that this definition of the treatment variable makes the analysis straightforward but ignores the nuances of the exact value of the minimum wage in each county and how far those values are from the federal minimum. The data also include county population and county average annual pay.
See [Callaway and Sant’Anna
(2021)](https://www.sciencedirect.com/science/article/abs/pii/S0304407620303948)
for additional details on the data.

First, we will load some libraries.

In [1]:
!pip install doubleml

import numpy as np
import pandas as pd
import doubleml as dml
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression, LassoCV, RidgeCV, LogisticRegressionCV
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.preprocessing import PolynomialFeatures
import patsy
import warnings

warnings.filterwarnings("ignore")
np.random.seed(772023)

## Loading the data

In [2]:
data = pd.read_csv("https://raw.githubusercontent.com/CausalAIBook/MetricsMLNotebooks/main/data/minwage_data.csv", index_col=0)

In [3]:
data.head()

Unnamed: 0,countyreal,state_name,year,FIPS,emp0A01_BS,quarter,censusdiv,pop,annual_avg_pay,state_mw,fed_mw,treated,G,lemp,lpop,lavg_pay,region,ever_treated,id
1,2013,Alaska,2001,2013,15,1,9,2459,22155,5.65,5.15,1,2001,2.70805,7.80751,10.005818,4,1,2013
2,2013,Alaska,2002,2013,17,1,9,2664,28447,5.65,5.15,1,2001,2.833213,7.887584,10.255798,4,1,2013
3,2013,Alaska,2003,2013,12,1,9,2715,30184,7.15,5.15,1,2001,2.484907,7.906547,10.315067,4,1,2013
4,2013,Alaska,2004,2013,13,1,9,2677,27557,7.15,5.15,1,2001,2.564949,7.892452,10.224012,4,1,2013
5,2013,Alaska,2005,2013,11,1,9,2646,30396,7.15,5.15,1,2001,2.397895,7.880804,10.322066,4,1,2013


### Data Preparation

We remove observations that are already treated in the first observed period (2001). We drop all variables that we won't use in our analysis.

In [4]:
data = data.loc[(data.G==0) | (data.G>2001)]
data.drop(columns=["countyreal","state_name","FIPS","emp0A01_BS",
                   "quarter", "censusdiv","pop","annual_avg_pay",
                   "state_mw","fed_mw", "ever_treated"], inplace=True)

Next, we create the treatment groups. We focus our analysis exclusively on the set of counties that had wage increases away from the federal minimum wage in 2004. That is, we treat 2003 and earlier as the pre-treatment period.

In [5]:
years = [2001,2002,2003,2004,2005,2006,2007]
treat, cont = [], []
for year in years:
    treat.append(data.loc[(data.G == 2004) & (data.year == year)].copy())
    cont.append(data.loc[((data.G == 0) | (data.G > year)) & (data.year == year)].copy())

We assume that the basic assumptions, particularly parallel trends, hold after conditioning on pre-treatment variables: 2001 population, 2001 average pay and 2001 teen employment, as well as the region in which the county is located. (The region is characterized by four
categories.)

Consequently, we want to extract the control variables for both treatment and control group in 2001.

In [6]:
treat[0].drop(columns=["year","G","region","treated"], inplace=True)
cont[0].drop(columns=["year","G","region","treated"], inplace=True)

2003 serves as the pre-treatmeny period for both counties that do receive the treatment in 2004 and those that do not.

In [7]:
treatB = pd.merge(treat[2], treat[0], on = "id", suffixes = ["_pre","_0"])
treatB.drop(columns = ["treated","lpop_pre","lavg_pay_pre","year","G"], inplace= True)

contB = pd.merge(cont[2], cont[0], on = "id", suffixes = ["_pre","_0"])
contB.drop(columns = ["treated","lpop_pre","lavg_pay_pre","year","G"], inplace= True)

We estimate the ATET in 2004-2007, which corresponds to the effect in the year of treatment as well as in the three years after the treatment. The control observations are the observations that still have the federal minimum wage in each year. (The control group is shrinking in each year as additional units receive treatment).

In [8]:
tdid, cdid = [], []
for year in [3,4,5,6]:
    treat[year].drop(columns=["lpop","lavg_pay","year","G","region"], inplace=True)
    cont[year].drop(columns=["lpop","lavg_pay","year","G","region"], inplace=True)

    tdid.append(pd.merge(treat[year], treatB, on = "id"))
    tdid[year-3]["dy"] = tdid[year-3]["lemp"] - tdid[year-3]["lemp_pre"]
    tdid[year-3].drop(columns=["id","lemp","lemp_pre"], inplace=True)

    cdid.append(pd.merge(cont[year], contB, on = "id"))
    cdid[year-3]["dy"] = cdid[year-3]["lemp"] - cdid[year-3]["lemp_pre"]
    cdid[year-3].drop(columns=["id","lemp","lemp_pre"], inplace=True)


### Estimation of the ATET with DML

We estimate the ATET of the county level minimum wage being larger than the federal minimum with the DML algorithm presented in Section 16.3 in the book. This requires estimation of the nuisance functions $E[Y|D=0,X]$, $E[D|X]$ as well as $P(D = 1)$. For the conditional expectation functions, we will consider different modern ML regression methods, namely: Constant (= no controls); a linear combination of the controls; an expansion of the raw control variables including all third order interactions; Lasso (CV); Ridge (CV); Random Forest; Shallow Tree; Deep Tree; and CV Tree.
The methods indicated with CV have their tuning parameter selected by cross-validation.

We implement a helper for fitting the constant value model.

In [9]:
class DummyClassifier(object):
    def __init__(self, strategy=None):
        self._estimator_type = "classifier"
        pass
    def get_params(self, deep=True):
        return dict()
    def set_params(self):
        pass
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.prediction = np.mean(y)
    def predict_proba(self, X):
        return np.ones((X.shape[0],2)) * self.prediction

The following code block implements the DML cross-fitting procedure. Please note, that it will run for a while (around 15min). We do not implement the "Best" learner in python, because of the structre of the DML implementation in `DoubleML`. 

In [10]:
att = np.zeros((4,9))
se_att = np.zeros((4,9))
RMSE_d = np.zeros((4,9))
RMSE_y = np.zeros((4,9))
for year in range(3): # These are the years 2004, 2005, 2006, 2007
        print(f"Estimating ATET for year {2004+year}. Please wait.")
        did_data = pd.concat((tdid[year], cdid[year]))
        dummy_data = pd.get_dummies(did_data.region)
        dummy_data = dummy_data.rename(columns=lambda x: 'region_' + str(x))
        did_data = pd.concat((did_data, dummy_data.drop(columns=["region_4"])), axis=1)

        dml_data = dml.DoubleMLData(data = did_data, x_cols=["lemp_0","lpop_0","lavg_pay_0", 
                                                        "region_1", "region_2", "region_3"],
                                                y_col="dy",
                                                d_cols="treated")

        learners = [{"ml_g": DummyRegressor(strategy="mean"), "ml_m": DummyClassifier(strategy="mean")},
                {"ml_g": LinearRegression(), "ml_m": LogisticRegression()},
                {"ml_g": LinearRegression(), "ml_m": LogisticRegression()},
                {"ml_g": LassoCV(n_jobs=-1), "ml_m": LogisticRegressionCV(penalty="l1", solver="liblinear", n_jobs=-1)},
                {"ml_g": RidgeCV(), "ml_m": LogisticRegressionCV(n_jobs=-1)},
                {"ml_g": RandomForestRegressor(n_estimators=1000, max_features=4, n_jobs=-1), 
                "ml_m": RandomForestClassifier(n_estimators=1000, max_features=4, n_jobs=-1)},
                {"ml_g": DecisionTreeRegressor(max_depth=15, ccp_alpha=0, min_samples_split=10), 
                "ml_m": DecisionTreeClassifier(max_depth=15, ccp_alpha=0, min_samples_split=10)},
                {"ml_g": DecisionTreeRegressor(max_depth=3, ccp_alpha=0, min_samples_split=10),
                "ml_m": DecisionTreeClassifier(max_depth=3, ccp_alpha=0, min_samples_split=10)},
                {"ml_g": DecisionTreeRegressor(),
                "ml_m": DecisionTreeClassifier()}]

        for i in [0,1,5,6,7]: # Constant, Baseline, Random Forest, Deep Tree and Shallowtree
                dml_obj = dml.DoubleMLDID(dml_data, ml_g=learners[i]["ml_g"], 
                                        ml_m=learners[i]["ml_m"], trimming_threshold=0.05)
                dml_obj.fit()
                att[year,i] = dml_obj._coef
                se_att[year,i] = dml_obj._se
                RMSE_d[year,i] = dml_obj.rmses["ml_m"]
                RMSE_y[year,i] = np.mean(dml_obj.rmses["ml_g0"] + dml_obj.rmses["ml_g1"])

        # make interaction data for Region Specific index
        i = 2
        formula = " ~ region_1 * (lemp_0 + lpop_0 + lavg_pay_0) + region_2 * (lemp_0 + lpop_0 + lavg_pay_0) + region_3 * (lemp_0 + lpop_0 + lavg_pay_0)"
        design_matrix = patsy.dmatrix(formula, data=did_data)

        dml_data_reg = dml.DoubleMLData.from_arrays(x=design_matrix,
                                                y=did_data.dy.values,
                                                d=did_data.treated.values)

        dml_obj = dml.DoubleMLDID(dml_data_reg, ml_g=learners[i]["ml_g"], 
                                ml_m=learners[i]["ml_m"], trimming_threshold=0.05)
        dml_obj.fit()
        att[year,i] = dml_obj._coef
        se_att[year,i] = dml_obj._se
        RMSE_d[year,i] = dml_obj.rmses["ml_m"]
        RMSE_y[year,i] = np.mean(dml_obj.rmses["ml_g0"] + dml_obj.rmses["ml_g1"])

        # make interaction data for polynomial features
        pf = PolynomialFeatures(degree=3)
        poly_X = pf.fit_transform(did_data[["lemp_0","lpop_0","lavg_pay_0","region_1", "region_2", "region_3"]])

        dml_data_poly = dml.DoubleMLData.from_arrays(x=poly_X,
                                                y=did_data.dy.values,
                                                d=did_data.treated.values)

        for i in [3,4]:
                dml_obj = dml.DoubleMLDID(dml_data_poly, ml_g=learners[i]["ml_g"], 
                                ml_m=learners[i]["ml_m"], trimming_threshold=0.05)
                dml_obj.fit()
                att[year,i] = dml_obj._coef
                se_att[year,i] = dml_obj._se
                RMSE_d[year,i] = dml_obj.rmses["ml_m"]
                RMSE_y[year,i] = np.mean(dml_obj.rmses["ml_g0"] + dml_obj.rmses["ml_g1"])

        # run cross-validated tree
        i = 8
        grid = {"ml_m": {"max_depth": [15], "min_samples_split": [10], "ccp_alpha" : np.linspace(0,0.1,10)},
                "ml_g": {"max_depth": [15], "min_samples_split": [10], "ccp_alpha" : np.linspace(0,0.1,10)}}

        dml_obj = dml.DoubleMLDID(dml_data, ml_g=learners[i]["ml_g"], 
                        ml_m=learners[i]["ml_m"], trimming_threshold=0.05)
        dml_obj.tune(grid, n_jobs_cv=-1)
        dml_obj.fit()
        att[year,i] = dml_obj._coef
        se_att[year,i] = dml_obj._se
        RMSE_d[year,i] = dml_obj.rmses["ml_m"]
        RMSE_y[year,i] = np.mean(dml_obj.rmses["ml_g0"] + dml_obj.rmses["ml_g1"])

Estimating ATET for year 2004. Please wait.
Estimating ATET for year 2005. Please wait.
Estimating ATET for year 2006. Please wait.


The year `2007` has no observations of untreated individuals in region `1`. Thus, we need to do some adjustments for that year.

In [11]:
for year in [3]: # These are the years 2007
        print(f"Estimating ATET for year {2004+year}. Please wait.")
        did_data = pd.concat((tdid[year], cdid[year]))
        dummy_data = pd.get_dummies(did_data.region)
        dummy_data = dummy_data.rename(columns=lambda x: 'region_' + str(x))
        did_data = pd.concat((did_data, dummy_data.drop(columns=["region_4"])), axis=1)

        dml_data = dml.DoubleMLData(data = did_data, x_cols=["lemp_0","lpop_0","lavg_pay_0", 
                                                        "region_3", "region_2"],
                                                y_col="dy",
                                                d_cols="treated")

        learners = [{"ml_g": DummyRegressor(strategy="mean"), "ml_m": DummyClassifier(strategy="mean")},
                {"ml_g": LinearRegression(), "ml_m": LogisticRegression()},
                {"ml_g": LinearRegression(), "ml_m": LogisticRegression()},
                {"ml_g": LassoCV(n_jobs=-1), "ml_m": LogisticRegressionCV(penalty="l1", solver="liblinear", n_jobs=-1)},
                {"ml_g": RidgeCV(), "ml_m": LogisticRegressionCV(n_jobs=-1)},
                {"ml_g": RandomForestRegressor(n_estimators=1000, max_features=4, n_jobs=-1), 
                "ml_m": RandomForestClassifier(n_estimators=1000, max_features=4, n_jobs=-1)},
                {"ml_g": DecisionTreeRegressor(max_depth=15, ccp_alpha=0, min_samples_split=10), 
                "ml_m": DecisionTreeClassifier(max_depth=15, ccp_alpha=0, min_samples_split=10)},
                {"ml_g": DecisionTreeRegressor(max_depth=3, ccp_alpha=0, min_samples_split=10),
                "ml_m": DecisionTreeClassifier(max_depth=3, ccp_alpha=0, min_samples_split=10)},
                {"ml_g": DecisionTreeRegressor(),
                "ml_m": DecisionTreeClassifier()}]

        for i in [0,1,5,6,7]: # Constant, Baseline, Random Forest, Deep Tree and Shallowtree
                dml_obj = dml.DoubleMLDID(dml_data, ml_g=learners[i]["ml_g"], 
                                        ml_m=learners[i]["ml_m"], trimming_threshold=0.05)
                dml_obj.fit()
                att[year,i] = dml_obj._coef
                se_att[year,i] = dml_obj._se
                RMSE_d[year,i] = dml_obj.rmses["ml_m"]
                RMSE_y[year,i] = np.mean(dml_obj.rmses["ml_g0"] + dml_obj.rmses["ml_g1"])

        # make interaction data for Region Specific index
        i = 2
        formula = " ~ region_2 * (lemp_0 + lpop_0 + lavg_pay_0) + region_3 * (lemp_0 + lpop_0 + lavg_pay_0)"
        design_matrix = patsy.dmatrix(formula, data=did_data)

        dml_data_reg = dml.DoubleMLData.from_arrays(x=design_matrix,
                                                y=did_data.dy.values,
                                                d=did_data.treated.values)

        dml_obj = dml.DoubleMLDID(dml_data_reg, ml_g=learners[i]["ml_g"], 
                                ml_m=learners[i]["ml_m"], trimming_threshold=0.05)
        dml_obj.fit()
        att[year,i] = dml_obj._coef
        se_att[year,i] = dml_obj._se
        RMSE_d[year,i] = dml_obj.rmses["ml_m"]
        RMSE_y[year,i] = np.mean(dml_obj.rmses["ml_g0"] + dml_obj.rmses["ml_g1"])

        # make interaction data for polynomial features
        pf = PolynomialFeatures(degree=3)
        poly_X = pf.fit_transform(did_data[["lemp_0","lpop_0","lavg_pay_0", "region_2", "region_3"]])

        dml_data_poly = dml.DoubleMLData.from_arrays(x=poly_X,
                                                y=did_data.dy.values,
                                                d=did_data.treated.values)

        for i in [3,4]:
                dml_obj = dml.DoubleMLDID(dml_data_poly, ml_g=learners[i]["ml_g"], 
                                ml_m=learners[i]["ml_m"], trimming_threshold=0.05)
                dml_obj.fit()
                att[year,i] = dml_obj._coef
                se_att[year,i] = dml_obj._se
                RMSE_d[year,i] = dml_obj.rmses["ml_m"]
                RMSE_y[year,i] = np.mean(dml_obj.rmses["ml_g0"] + dml_obj.rmses["ml_g1"])

        # run cross-validated tree
        i = 8
        grid = {"ml_m": {"max_depth": [15], "min_samples_split": [10], "ccp_alpha" : np.linspace(0,0.1,10)},
                "ml_g": {"max_depth": [15], "min_samples_split": [10], "ccp_alpha" : np.linspace(0,0.1,10)}}

        dml_obj = dml.DoubleMLDID(dml_data, ml_g=learners[i]["ml_g"], 
                        ml_m=learners[i]["ml_m"], trimming_threshold=0.05)
        dml_obj.tune(grid, n_jobs_cv=-1)
        dml_obj.fit()
        att[year,i] = dml_obj._coef
        se_att[year,i] = dml_obj._se
        RMSE_d[year,i] = dml_obj.rmses["ml_m"]
        RMSE_y[year,i] = np.mean(dml_obj.rmses["ml_g0"] + dml_obj.rmses["ml_g1"])

Estimating ATET for year 2007. Please wait.


We start by reporting the RMSE obtained during cross-fitting for each learner in each period.

In [12]:
table1y = pd.DataFrame(RMSE_y[:,0:9].T, columns = ["2004","2005","2006","2007"], 
                       index = ["No Controls", "Basic", "Expansion", "Lasso (CV)", "Ridge (CV)",
                                "Random Forest","Deep Tree", "Shallow Tree", "Tree (CV)"])

table1y

Unnamed: 0,2004,2005,2006,2007
No Controls,0.355682,0.387948,0.418108,0.453009
Basic,0.366668,0.370151,0.404329,0.434847
Expansion,0.352975,0.384314,0.401772,0.445056
Lasso (CV),0.349643,0.368332,0.40583,0.433699
Ridge (CV),0.33587,0.379488,0.407717,0.4457
Random Forest,0.371786,0.379539,0.441016,0.486183
Deep Tree,0.426649,0.415303,0.486346,0.574548
Shallow Tree,0.393449,0.378987,0.426423,0.469549
Tree (CV),0.353772,0.384895,0.417904,0.452951


In [13]:
table1d = pd.DataFrame(RMSE_d[:,0:9].T, columns = ["2004","2005","2006","2007"], 
                       index = ["No Controls", "Basic", "Expansion", "Lasso (CV)", "Ridge (CV)",
                                "Random Forest","Deep Tree", "Shallow Tree", "Tree (CV)"])

table1d

Unnamed: 0,2004,2005,2006,2007
No Controls,0.198375,0.200732,0.21112,0.250505
Basic,0.194086,0.196242,0.204615,0.223249
Expansion,0.193859,0.197086,0.205414,0.223516
Lasso (CV),0.198695,0.201153,0.211804,0.251897
Ridge (CV),0.195181,0.197515,0.206271,0.222062
Random Forest,0.202125,0.200904,0.213112,0.23808
Deep Tree,0.226236,0.242409,0.234684,0.261518
Shallow Tree,0.196959,0.197994,0.206247,0.228207
Tree (CV),0.198375,0.200732,0.21112,0.239808


Here we see that the Deep Tree systematically performs worse in terms of cross-fit predictions than the other learners for both tasks and that Expansion performs similarly poorly for the outcome prediction. It also appears there is some signal in the regressors, especially for the propensity score, as all methods outside of Deep Tree and Expansion produce smaller RMSEs than the No Controls baseline. The other methods all produce similar RMSEs, with a small edge going to Ridge and Lasso. While it would be hard to reliably conclude which of the relatively good performing methods is statistically best here, one could exclude Expansion and Deep Tree from further consideration on the basis of out-of-sample performance suggesting
they are doing a poor job approximating the nuisance functions. Best (or a different ensemble) provides a good baseline that is principled in the sense that one could pre-commit to using the best learners without having first looked at the subsequent estimation results.

We report estimates of the ATET in each period in the following table.

In [14]:
table2 =np.zeros((18, 4))
table2[np.arange(0,18,2),] = att.T
table2[np.arange(1,18,2),] = se_att.T
table2 = pd.DataFrame(table2, columns=["2004","2005","2006","2007"],
                      index = ["No Controls","s.e.","Basic","s.e.",
                               "Expansion","s.e.","Lasso (CV)","s.e.",
                               "Ridge (CV)","s.e.","Random Forest","s.e.",
                               "Deep Tree","s.e.","Shallow Tree","s.e.",
                               "Tree (CV)","s.e."])

table2

Unnamed: 0,2004,2005,2006,2007
No Controls,-0.040141,-0.076152,-0.11683,-0.130854
s.e.,0.018996,0.020104,0.019786,0.022568
Basic,-0.025488,-0.04994,-0.054026,-0.069152
s.e.,0.019273,0.020061,0.019319,0.022559
Expansion,-0.023411,-0.048877,-0.050754,-0.062235
s.e.,0.019714,0.021169,0.019834,0.025347
Lasso (CV),-0.035736,-0.048229,-0.060006,-0.072672
s.e.,0.019053,0.019824,0.019619,0.022104
Ridge (CV),-0.025642,-0.043243,-0.049233,-0.054087
s.e.,0.0192,0.020162,0.019443,0.023782


Here, we see that all methods provide point estimates that suggest the effect of the minimum wage increase leads to decreases in youth employment with small effects in the initial period that become larger in the years following the treatment. This pattern seems economically plausible as it may take time for firms to adjust employment and other input choices in response to a minimum wage change. In the estimates that are reported in the book we have values that are not consistent with this pattern, however, they systematically underperform in terms of having poor cross-fit prediction performance. In terms of point estimates, the other pattern that emerges is that all estimates that use the covariates produce ATET estimates that are systematically smaller in magnitude than the No Controls baseline, suggesting that failing to include the controls may lead to overstatement of treatment effects in this example.

Turning to inference, we would reject the hypothesis of no minimum wage effect two or more years after the change at the 5% level, even after multiple testing correction, if we were to focus on many of the estimators.

### Assess pre-trends

Because we have data for the period 2001-2007, we can perform a so-called pre-trends test to provide some evidence about the plausibility of the conditional parallel trends assumption. Specifically, we can continue to use 2003 as the reference period but now consider 2002 to be the treatment period. Sensible economic mechanisms underlying the assumption would then typically suggest that the ATET in 2002 - before the 2004 minimum wage change we are considering - should be zero. Finding evidence that the ATET in 2002 is non-zero then calls into question the validity of the assumption.

We change the treatment status of those observations, which received treatment in 2004 in the 2002 data and create a placebo treatment as well as control group.

In [15]:
treat[1].drop(columns=["lpop","lavg_pay","year","G","region"], inplace=True)
treat[1].treated = 1  # Code these observations as treated

tdid02 = pd.merge(treat[1], treatB, on = "id")
tdid02["dy"] = tdid02["lemp"] - tdid02["lemp_pre"]
tdid02.drop(columns=["id","lemp","lemp_pre"], inplace=True)

cont[1].drop(columns=["lpop","lavg_pay","year","G","region"], inplace=True)

cdid02 = pd.merge(cont[1], contB, on = "id")
cdid02["dy"] = cdid02["lemp"] - cdid02["lemp_pre"]
cdid02.drop(columns=["id","lemp","lemp_pre"], inplace=True)

We repeat the exercise for obtaining our ATET estimates and standard error for 2004-2007. Particularly, we also use all the learners as mentioned above.

In [16]:
att_pre = np.zeros((1,9))
se_att_pre = np.zeros((1,9))
RMSE_d_pre = np.zeros((1,9))
RMSE_y_pre = np.zeros((1,9))
for year in range(1): # Only year 2002
        print(f"Estimating ATET for year {2002+year}. Please wait.")
        did_data = pd.concat((tdid02, cdid02))
        dummy_data = pd.get_dummies(did_data.region)
        dummy_data = dummy_data.rename(columns=lambda x: 'region_' + str(x))
        did_data = pd.concat((did_data, dummy_data.drop(columns=["region_4"])), axis=1)

        dml_data = dml.DoubleMLData(data = did_data, x_cols=["lemp_0","lpop_0","lavg_pay_0", 
                                                        "region_1", "region_2", "region_3"],
                                                y_col="dy",
                                                d_cols="treated")

        learners = [{"ml_g": DummyRegressor(strategy="mean"), "ml_m": DummyClassifier(strategy="mean")},
                {"ml_g": LinearRegression(), "ml_m": LogisticRegression()},
                {"ml_g": LinearRegression(), "ml_m": LogisticRegression()},
                {"ml_g": LassoCV(n_jobs=-1), "ml_m": LogisticRegressionCV(penalty="l1", solver="liblinear", n_jobs=-1)},
                {"ml_g": RidgeCV(), "ml_m": LogisticRegressionCV(n_jobs=-1)},
                {"ml_g": RandomForestRegressor(n_estimators=1000, max_features=4, n_jobs=-1), 
                "ml_m": RandomForestClassifier(n_estimators=1000, max_features=4, n_jobs=-1)},
                {"ml_g": DecisionTreeRegressor(max_depth=15, ccp_alpha=0, min_samples_split=10), 
                "ml_m": DecisionTreeClassifier(max_depth=15, ccp_alpha=0, min_samples_split=10)},
                {"ml_g": DecisionTreeRegressor(max_depth=3, ccp_alpha=0, min_samples_split=10),
                "ml_m": DecisionTreeClassifier(max_depth=3, ccp_alpha=0, min_samples_split=10)},
                {"ml_g": DecisionTreeRegressor(),
                "ml_m": DecisionTreeClassifier()}]

        for i in [0,1,5,6,7]: # Constant, Baseline, Random Forest, Deep Tree and Shallowtree
                dml_obj = dml.DoubleMLDID(dml_data, ml_g=learners[i]["ml_g"], 
                                        ml_m=learners[i]["ml_m"], trimming_threshold=0.05)
                dml_obj.fit()
                att_pre[year,i] = dml_obj._coef
                se_att_pre[year,i] = dml_obj._se
                RMSE_d_pre[year,i] = dml_obj.rmses["ml_m"]
                RMSE_y_pre[year,i] = np.mean(dml_obj.rmses["ml_g0"] + dml_obj.rmses["ml_g1"])

        # make interaction data for Region Specific index
        i = 2
        formula = " ~ region_1 * (lemp_0 + lpop_0 + lavg_pay_0) + region_2 * (lemp_0 + lpop_0 + lavg_pay_0) + region_3 * (lemp_0 + lpop_0 + lavg_pay_0)"
        design_matrix = patsy.dmatrix(formula, data=did_data)

        dml_data_reg = dml.DoubleMLData.from_arrays(x=design_matrix,
                                                y=did_data.dy.values,
                                                d=did_data.treated.values)

        dml_obj = dml.DoubleMLDID(dml_data_reg, ml_g=learners[i]["ml_g"], 
                                ml_m=learners[i]["ml_m"], trimming_threshold=0.05)
        dml_obj.fit()
        att_pre[year,i] = dml_obj._coef
        se_att_pre[year,i] = dml_obj._se
        RMSE_d_pre[year,i] = dml_obj.rmses["ml_m"]
        RMSE_y_pre[year,i] = np.mean(dml_obj.rmses["ml_g0"] + dml_obj.rmses["ml_g1"])

        # make interaction data for polynomial features
        pf = PolynomialFeatures(degree=3)
        poly_X = pf.fit_transform(did_data[["lemp_0","lpop_0","lavg_pay_0","region_1", "region_2", "region_3"]])

        dml_data_poly = dml.DoubleMLData.from_arrays(x=poly_X,
                                                y=did_data.dy.values,
                                                d=did_data.treated.values)

        for i in [3,4]:
                dml_obj = dml.DoubleMLDID(dml_data_poly, ml_g=learners[i]["ml_g"], 
                                ml_m=learners[i]["ml_m"], trimming_threshold=0.05)
                dml_obj.fit()
                att_pre[year,i] = dml_obj._coef
                se_att_pre[year,i] = dml_obj._se
                RMSE_d_pre[year,i] = dml_obj.rmses["ml_m"]
                RMSE_y_pre[year,i] = np.mean(dml_obj.rmses["ml_g0"] + dml_obj.rmses["ml_g1"])

        # run cross-validated tree
        i = 8
        grid = {"ml_m": {"max_depth": [15], "min_samples_split": [10], "ccp_alpha" : np.linspace(0,0.1,10)},
                "ml_g": {"max_depth": [15], "min_samples_split": [10], "ccp_alpha" : np.linspace(0,0.1,10)}}

        dml_obj = dml.DoubleMLDID(dml_data, ml_g=learners[i]["ml_g"], 
                        ml_m=learners[i]["ml_m"], trimming_threshold=0.05)
        dml_obj.tune(grid, n_jobs_cv=-1)
        dml_obj.fit()
        att_pre[year,i] = dml_obj._coef
        se_att_pre[year,i] = dml_obj._se
        RMSE_d_pre[year,i] = dml_obj.rmses["ml_m"]
        RMSE_y_pre[year,i] = np.mean(dml_obj.rmses["ml_g0"] + dml_obj.rmses["ml_g1"])

Estimating ATET for year 2002. Please wait.


We report the results in the following table.

In [17]:
tableP = np.zeros((4, 9))
tableP[0,:] = RMSE_y_pre
tableP[1,:] = RMSE_d_pre
tableP[2,:] = att_pre
tableP[3,:] = se_att_pre
tableP = pd.DataFrame(tableP.T, columns = ["RMSE Y","RMSE D","ATET","s.e."],
                      index = ["No Controls", "Basic", "Expansion",
                               "Lasso (CV)", "Ridge (CV)", "Random Forest","Deep Tree",
                               "Shallow Tree", "Tree (CV)"])

tableP

Unnamed: 0,RMSE Y,RMSE D,ATET,s.e.
No Controls,0.287126,0.194687,-0.004932,0.013258
Basic,0.295007,0.192069,0.003385,0.013469
Expansion,0.301267,0.191703,0.005378,0.013259
Lasso (CV),0.287478,0.194943,-0.003734,0.013216
Ridge (CV),0.317353,0.192812,0.00046,0.012827
Random Forest,0.308639,0.234943,-0.002581,0.010669
Deep Tree,0.345325,0.255873,0.022914,0.01874
Shallow Tree,0.341059,0.195106,-0.007538,0.013786
Tree (CV),0.285115,0.194687,-0.004861,0.013275


Here we see broad agreement across all methods in the sense of returning point estimates that are small in magnitude and small relative to standard errors. In no case would we reject the hypothesis that the pre-event effect in 2002 is different from zero at usual levels of significance. We note that failing to reject the hypothesis of no pre-event effects certainly does not imply that the conditional DiD assumption is in fact satisfied. For example, confidence intervals include values that would be consistent with relatively large pre-event effects. However, it is reassuring to see that there is not strong evidence of a violation of the underlying identifying assumption.