## Difference-in-Difference (DID)

Difference-in-Difference (DID) is one of the most widely used causal estimators. It compares the change in the outcome from the pre- to the post-intervention period in the treated units with the corresponding change in the control units.
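
As a minimal sketch of the idea, the classic two-group, two-period DID can be computed by hand from four group means. The numbers below are hypothetical toy values, not taken from the CaliforniaProp99 data, and the helper `did_2x2` is an illustrative name rather than part of azcausal.

```python
from statistics import mean

# Hypothetical toy outcomes (assumption: not from CaliforniaProp99)
treated_pre = [10.0, 12.0, 11.0]
treated_post = [8.0, 7.0, 9.0]
control_pre = [9.0, 10.0, 11.0]
control_post = [9.0, 9.0, 9.0]


def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    delta_treated = mean(treated_post) - mean(treated_pre)
    delta_control = mean(control_post) - mean(control_pre)
    return delta_treated - delta_control


att = did_2x2(treated_pre, treated_post, control_pre, control_post)
print(att)  # -2.0
```

Here the treated group drops by 3 while the control group drops by 1, so the estimated effect is -2. The control group's change stands in for what would have happened to the treated group without the intervention.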

In [None]:
from azcausal.data import CaliforniaProp99

# load an example data set with the columns Year, State, PacksPerCapita, treated.
df = CaliforniaProp99().load()

# rename the columns
df = df.rename(columns=dict(State='unit', Year='time', PacksPerCapita='outcome', treated='intervention'))

df.head(3)

### Panel

First, create a `Panel` object given the data.

In [None]:
from azcausal.core.panel import Panel
from azcausal.util import to_matrix, intervention_from_outcome

# convert to matrices where the index represents each Year (time) and each column a state (unit)
outcome = to_matrix(df, "time", "unit", "outcome", fillna=0.0)

# the time when the intervention started
start_time = df.query("intervention == 1")["time"].min()

# the units that have been treated
treat_units = list(df.query("intervention == 1")["unit"].unique())

# create the intervention matrix
intervention = intervention_from_outcome(outcome, start_time, treat_units)

# create a panel combining outcome and treatment
panel = Panel(outcome, intervention)
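
To make the shape of the intervention matrix concrete, here is a hand-rolled sketch using plain pandas. It assumes the matrix has the same index and columns as the outcome matrix and holds 1 for treated units from the start time onward and 0 elsewhere. The function `make_intervention` and the toy data are hypothetical, and the sketch is not the implementation of `intervention_from_outcome`.

```python
import pandas as pd

# Hypothetical toy outcome matrix: rows are time periods, columns are units
outcome = pd.DataFrame(
    {"A": [10.0, 11.0, 9.0, 8.0], "B": [9.0, 9.5, 9.0, 9.2], "C": [7.0, 7.1, 7.3, 7.2]},
    index=[1987, 1988, 1989, 1990],
)


def make_intervention(outcome, start_time, treat_units):
    """Return a 0/1 matrix marking treated units from start_time onward."""
    intervention = pd.DataFrame(0, index=outcome.index, columns=outcome.columns)
    intervention.loc[outcome.index >= start_time, treat_units] = 1
    return intervention


intervention = make_intervention(outcome, start_time=1989, treat_units=["A"])
print(intervention)
```

In this toy example only unit `A` is marked as treated, and only in 1989 and 1990.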

Then, run the DID analysis (very efficient on pre-formatted panel data).

In [None]:
from azcausal.core.error import Bootstrap
from azcausal.core.parallelize import Joblib
from azcausal.estimators.panel.did import DID

# initialize an estimator object, here difference in difference (did)
estimator = DID()

# run the estimator
result = estimator.fit(panel)

# plot the results
estimator.plot(result, title="CaliforniaProp99", show=False)

# create a process pool for parallelization
pool = Joblib(n_jobs=5, progress=False)

# run the error validation method
estimator.error(result, Bootstrap(n_samples=101), parallelize=pool)

# print out information about the estimate
print(result.summary(title="CaliforniaProp99"))
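
The error step above estimates uncertainty by resampling. As a rough illustration of the general idea, the sketch below bootstraps a DID estimate by resampling units with replacement and taking the standard deviation of the re-computed estimates. It assumes each unit is summarized by its post-minus-pre change. The data and the helper names are hypothetical, and azcausal's `Bootstrap` may resample differently.

```python
import random
from statistics import mean, stdev

# Hypothetical per-unit changes (post-period mean minus pre-period mean)
changes_treated = [-3.0, -2.5, -3.5, -2.0]
changes_control = [-1.0, -0.5, -1.5, -1.0, -0.8]


def did(changes_treated, changes_control):
    """DID as the difference between the average changes of the two groups."""
    return mean(changes_treated) - mean(changes_control)


def bootstrap_se(changes_treated, changes_control, n_samples=101, seed=0):
    """Standard error from resampling units with replacement within each group."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_samples):
        t = rng.choices(changes_treated, k=len(changes_treated))
        c = rng.choices(changes_control, k=len(changes_control))
        estimates.append(did(t, c))
    return stdev(estimates)


se = bootstrap_se(changes_treated, changes_control)
print(did(changes_treated, changes_control), se)
```

Fixing the seed makes the resampling reproducible. A larger `n_samples` gives a more stable standard error at the cost of more computation.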

### Regression

In [None]:
from azcausal.estimators.panel.did import DIDRegressor

# initialize an estimator object, here the regression-based variant of difference in difference
estimator = DIDRegressor()

# fit the estimator
result = estimator.fit(df)

# print out information about the estimate
print(result.summary(title="CaliforniaProp99", conf=95))

### Event Study

Exclude the relative time period -1 so that it serves as the reference period; all DID estimates are then reported relative to it.

In [None]:
from azcausal.estimators.panel.did import EventStudy

# now perform the event study
estimator = EventStudy(n_pre=None, exclude=-1)

# you can fit this with a data frame or panel
result = estimator.fit(df)

# show the results in a plot
estimator.plot(result)
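
The reference-period idea can be sketched without any library: compute the treated-minus-control gap in every period, then subtract the gap observed in the reference period. The series below are hypothetical toy values and `event_study` is an illustrative helper. This is a simplified, non-regression sketch and not how `EventStudy` is implemented.

```python
# Hypothetical average outcomes per period; the intervention starts at index 3
treated = [10.0, 11.0, 12.0, 9.0, 8.0]
control = [9.0, 10.0, 11.0, 11.0, 11.0]


def event_study(treated, control, start_index, reference=-1):
    """Per-period treated-minus-control gap, relative to the reference period.

    Keys of the result are relative times (0 is the first treated period).
    """
    gaps = [t - c for t, c in zip(treated, control)]
    ref_gap = gaps[start_index + reference]
    return {i - start_index: gap - ref_gap for i, gap in enumerate(gaps)}


effects = event_study(treated, control, start_index=3)
print(effects)  # {-3: 0.0, -2: 0.0, -1: 0.0, 0: -3.0, 1: -4.0}
```

The estimate at relative time -1 is zero by construction. Pre-intervention estimates close to zero suggest that treated and control units moved in parallel before the intervention.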

Alternatively, estimate DID for the periods before the intervention as well, here with `n_pre=8` and no excluded reference period.

In [None]:
from azcausal.estimators.panel.did import EventStudy

# now perform the event study
estimator = EventStudy(n_pre=8, exclude=None)

# you can fit this with a data frame or panel
result = estimator.fit(df)

# show the results in a plot
estimator.plot(result)