# Mastering Metrics Ch 5 - Differences in Differences

Differences in differences (DD) is a technique that compares how a trend changes in a group that receives some treatment with how it changes in a group that does not. In its simplest form, we work out the change (over some period of time) in some observable feature of the treatment group and compare it to the change in the same feature for the control group. For example, give one group a drug and the other a placebo, then compare the average change in weight in each group.

This method assumes that, absent the treatment, the trend *would be* the same in both groups, which may not be true. Thus, for such comparisons to be meaningful, we want to see evidence of a similar trend in the observed variable in each group before the treatment, or after the treatment is turned off.

One way of applying DD is to use a regression. Here is what that looks like with dummy variables:

$$Y_{xt} = \alpha + \beta TREAT_x + \gamma POST_t + \delta_{DD} (TREAT_x \times POST_t) + e_{xt}$$

$\delta_{DD}$ is our measure of the causal effect. The interaction term equals one only where treatment actually applies (treated group, post-treatment period) and resolves to zero where either condition fails, so $\delta_{DD}$ captures the difference between those two kinds of cases.
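
To see why the interaction coefficient is a "difference in differences", it can help to write out what the regression predicts in each of the four group/period cells. This is only a sketch that plugs 0 and 1 into the equation above, assuming the error term averages to zero within each cell:

$$\begin{aligned}
E[Y_{xt} \mid TREAT_x = 0, POST_t = 0] &= \alpha \\
E[Y_{xt} \mid TREAT_x = 1, POST_t = 0] &= \alpha + \beta \\
E[Y_{xt} \mid TREAT_x = 0, POST_t = 1] &= \alpha + \gamma \\
E[Y_{xt} \mid TREAT_x = 1, POST_t = 1] &= \alpha + \beta + \gamma + \delta_{DD}
\end{aligned}$$

Taking the change over time in the treated group and subtracting the change over time in the control group leaves only the interaction coefficient:

$$\big[(\alpha + \beta + \gamma + \delta_{DD}) - (\alpha + \beta)\big] - \big[(\alpha + \gamma) - \alpha\big] = \delta_{DD}$$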

Let's run it for real on some data that I've just written out from the textbook.

In [14]:
# Imports
import pandas as pd

# Set up data (one value per year for each district):
stlouis  = [170, 165, 135, 125, 115, 110]
atlanta  = [142, 138, 125, 118, 105, 105]
year     = [1929, 1930, 1931, 1932, 1933, 1934]

# Create dataframe (Atlanta rows first, then St. Louis):
data_df = pd.DataFrame()
data_df['Y'] = atlanta + stlouis

# Create some dummies:
data_df['TREAT'] = [1] * 6 + [0] * 6                 # 1 = Atlanta (treated), 0 = St. Louis (control)
data_df['POST'] = [x >= 1931 for x in year] * 2      # True from 1931 onwards
data_df['TREATxPOST'] = data_df['TREAT'] * data_df['POST']   # 1 only for treated rows from 1931 onwards

data_df

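| | Y | TREAT | POST | TREATxPOST |
|---|---|---|---|---|
| 0 | 142 | 1 | False | 0 |
| 1 | 138 | 1 | False | 0 |
| 2 | 125 | 1 | True | 1 |
| 3 | 118 | 1 | True | 1 |
| 4 | 105 | 1 | True | 1 |
| 5 | 105 | 1 | True | 1 |
| 6 | 170 | 0 | False | 0 |
| 7 | 165 | 0 | False | 0 |
| 8 | 135 | 0 | True | 0 |
| 9 | 125 | 0 | True | 0 |
| 10 | 115 | 0 | True | 0 |
| 11 | 110 | 0 | True | 0 |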


In [28]:
# Run a regression (statsmodels):
import statsmodels.api as sm

# Add a constant column (for alpha):
data_df['const'] = 1

# Cast every column to float (POST is boolean) so the design matrix is purely numeric.
# VERY IMPORTANT
data_df = data_df.astype(float)

# Regress Y on the constant and the three dummies:
lr = sm.OLS(endog=data_df['Y'],
            exog=data_df[['const', 'TREAT', 'POST', 'TREATxPOST']])
result = lr.fit()
result.summary()


| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | Y | R-squared: | 0.866 |
| Model: | OLS | Adj. R-squared: | 0.816 |
| Method: | Least Squares | F-statistic: | 17.25 |
| Date: | Wed, 30 Nov 2016 | Prob (F-statistic): | 0.000748 |
| Time: | 16:50:15 | Log-Likelihood: | -41.303 |
| No. Observations: | 12 | AIC: | 90.61 |
| Df Residuals: | 8 | BIC: | 92.55 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| const | 167.5000 | 6.548 | 25.581 | 0.000 | 152.401 | 182.599 |
| TREAT | -27.5000 | 9.260 | -2.970 | 0.018 | -48.854 | -6.146 |
| POST | -46.2500 | 8.020 | -5.767 | 0.000 | -64.743 | -27.757 |
| TREATxPOST | 19.5000 | 11.341 | 1.719 | 0.124 | -6.653 | 45.653 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 0.564 | Durbin-Watson: | 1.535 |
| Prob(Omnibus): | 0.754 | Jarque-Bera (JB): | 0.591 |
| Skew: | 0.307 | Prob(JB): | 0.744 |
| Kurtosis: | 2.102 | Cond. No. | 8.44 |


These numbers match those in the textbook (including the standard errors).
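
As a cross-check on the regression, the `TREATxPOST` coefficient in a two-group, two-period setup like this one should equal the plain difference of pre/post changes in group means. Here is a minimal sketch in plain Python using the same lists as above; the helper name `pre_post_change` is made up for illustration.

```python
from statistics import mean

# Same data as in the cells above
stlouis = [170, 165, 135, 125, 115, 110]   # control group
atlanta = [142, 138, 125, 118, 105, 105]   # treated group
year = [1929, 1930, 1931, 1932, 1933, 1934]


def pre_post_change(values, years, cutoff=1931):
    """Mean of the post-period values minus mean of the pre-period values."""
    pre = mean(v for v, y in zip(values, years) if y < cutoff)
    post = mean(v for v, y in zip(values, years) if y >= cutoff)
    return post - pre


change_treated = pre_post_change(atlanta, year)   # 113.25 - 140.0 = -26.75
change_control = pre_post_change(stlouis, year)   # 121.25 - 167.5 = -46.25

# Difference of the two differences
delta_dd = change_treated - change_control
print(delta_dd)   # 19.5, the TREATxPOST coefficient in the summary above
```

The control group's change (-46.25) is also the `POST` coefficient, which is what we would expect since `POST` picks up the time change for the group with `TREAT = 0`.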

### What if we want to do something bigger and more complicated?
Well, we can apply the above (or something similar). We can encode, for each of many groups over each of many periods, some output $Y_{st}$ and some combination of inputs. For example, say we want to look at the different US states over the period 1971 to 1988 (a period in which many of them changed their minimum drinking age). We can do so using a big sparse matrix of dummies for each state ($STATE_{ks}$), a big sparse matrix of dummies for each year ($YEAR_{jt}$), and a variable $LEGAL_{st}$ giving the proportion of people within the contentious age range of 18 to 21 who are allowed to drink in each state in each year. (In practice one state and one year are left out as the reference category, since a full set of dummies would be collinear with $\alpha$.) Written out, this gives us:
$$Y_{st} = \alpha + \delta_{DD} LEGAL_{st} + \sum_{k = state_1}^{state_n} \beta_k STATE_{ks} + \sum_{j = year_1}^{year_n} \gamma_j YEAR_{jt} + e_{st}$$
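
To make the state and year dummy setup concrete, here is a rough sketch on synthetic data (not the textbook's data). Everything in it is an assumption made up for illustration: the six state labels, the switch years, the effect sizes and the noise level. It uses the statsmodels formula interface, where `C(state)` and `C(year)` expand into the dummy columns and drop one reference category each.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical panel: six made-up states observed over 1971-1988
states = ["A", "B", "C", "D", "E", "F"]
years = range(1971, 1989)
# Year from which 18-20 year olds may no longer drink (None = never legal)
switch_year = {"A": 1975, "B": 1979, "C": 1983, "D": None, "E": 1973, "F": None}
true_delta = 2.0   # made-up effect of LEGAL that the regression should recover

state_effect = {s: rng.normal(10, 3) for s in states}   # plays the role of beta_k
year_effect = {t: rng.normal(0, 1) for t in years}      # plays the role of gamma_j

rows = []
for s in states:
    for t in years:
        legal = 1.0 if (switch_year[s] is not None and t < switch_year[s]) else 0.0
        y = state_effect[s] + year_effect[t] + true_delta * legal + rng.normal(0, 0.1)
        rows.append({"state": s, "year": t, "LEGAL": legal, "Y": y})
panel = pd.DataFrame(rows)

# State and year dummies via C(...); LEGAL carries delta_DD
fit = smf.ols("Y ~ LEGAL + C(state) + C(year)", data=panel).fit()
delta_hat = fit.params["LEGAL"]
print(f"estimated delta_DD = {delta_hat:.3f} (true value {true_delta})")
```

Here `LEGAL` is simply 0 or 1 to keep the sketch short; in the setup described above it would be a proportion between 0 and 1.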


### This is all well and good, but aren't we still assuming that the groups share a common trend which holds on either side of the treatment?
Yes. If we have enough data and want to let each state follow its own trend, we just need to add a control to the regression for a state-specific linear time trend (each state dummy interacted with time $t$):
$$Y_{st} = \alpha + \delta_{DD} LEGAL_{st} + \sum_{k = state_1}^{state_n} \beta_k STATE_{ks} + \sum_{j = year_1}^{year_n} \gamma_j YEAR_{jt} + \sum_{k = state_1}^{state_n} \theta_k (STATE_{ks} \times t) + e_{st}$$
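
A sketch of what the extra trend terms buy us, again on synthetic data where every number (states, switch years, trend slopes, noise) is invented for illustration. The data are generated with a different linear trend in each state, and the model is then fitted with and without the `C(state):t` terms, which is how the $\theta_k (STATE_{ks} \times t)$ block can be written in a statsmodels formula.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical panel: six made-up states observed over 1971-1988
states = ["A", "B", "C", "D", "E", "F"]
years = range(1971, 1989)
switch_year = {"A": 1975, "B": 1979, "C": 1983, "D": None, "E": 1973, "F": None}
true_delta = 2.0   # made-up effect of LEGAL

state_effect = {s: rng.normal(10, 3) for s in states}
year_effect = {t: rng.normal(0, 1) for t in years}
state_trend = {s: rng.normal(0, 0.5) for s in states}   # plays the role of theta_k

rows = []
for s in states:
    for yr in years:
        t = yr - 1971   # linear time index
        legal = 1.0 if (switch_year[s] is not None and yr < switch_year[s]) else 0.0
        y = (state_effect[s] + year_effect[yr] + state_trend[s] * t
             + true_delta * legal + rng.normal(0, 0.1))
        rows.append({"state": s, "year": yr, "t": t, "LEGAL": legal, "Y": y})
panel = pd.DataFrame(rows)

# Without state trends: LEGAL can soak up part of each state's own trend
no_trend = smf.ols("Y ~ LEGAL + C(state) + C(year)", data=panel).fit()

# With state-specific linear trends. The trend columns add up to t, which the
# year dummies already span; the default pseudo-inverse fit tolerates that
# redundancy, and the LEGAL coefficient is unaffected by it.
with_trend = smf.ols("Y ~ LEGAL + C(state) + C(year) + C(state):t", data=panel).fit()

delta_no_trend = no_trend.params["LEGAL"]
delta_with_trend = with_trend.params["LEGAL"]
print(f"without trends: {delta_no_trend:.3f}")
print(f"with trends:    {delta_with_trend:.3f} (true value {true_delta})")
```

How far the estimate without trends drifts from the true value depends on the random slopes drawn, so the gap between the two printed numbers is only indicative.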

### What if some groups are very different sizes?
We can use weighted least squares (WLS). This should not significantly change the result (after all, there's no reason the treatment effect itself will be larger in a large group), but averages measured over larger groups should be less volatile and thus more accurately measured, and WLS lets us give more weight to the observations we trust more for that reason. If its assumptions are met, this should increase the precision of the estimate.

Essentially, just pray that the WLS result is similar to the OLS one; if so, it's a good sign.
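
A rough sketch of that OLS versus WLS comparison on synthetic data. The populations, switch years and noise model are all assumptions invented here: each state's outcome is made noisier the smaller its (hypothetical) population, and WLS is weighted by population, which is proportional to the inverse of the noise variance under that assumption.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical panel: six made-up states with very different populations
states = ["A", "B", "C", "D", "E", "F"]
years = range(1971, 1989)
switch_year = {"A": 1975, "B": 1979, "C": 1983, "D": None, "E": 1973, "F": None}
population = {"A": 1, "B": 4, "C": 9, "D": 25, "E": 64, "F": 100}   # arbitrary units
true_delta = 2.0   # made-up effect of LEGAL

state_effect = {s: rng.normal(10, 3) for s in states}
year_effect = {t: rng.normal(0, 1) for t in years}

rows = []
for s in states:
    for t in years:
        legal = 1.0 if (switch_year[s] is not None and t < switch_year[s]) else 0.0
        # Assumption: noise variance is proportional to 1 / population
        noise = rng.normal(0, 0.5 / np.sqrt(population[s]))
        y = state_effect[s] + year_effect[t] + true_delta * legal + noise
        rows.append({"state": s, "year": t, "LEGAL": legal, "pop": population[s], "Y": y})
panel = pd.DataFrame(rows)

formula = "Y ~ LEGAL + C(state) + C(year)"
ols_fit = smf.ols(formula, data=panel).fit()
# Weights proportional to 1 / variance, i.e. to population under the assumption above
wls_fit = smf.wls(formula, data=panel, weights=panel["pop"]).fit()

ols_delta = ols_fit.params["LEGAL"]
wls_delta = wls_fit.params["LEGAL"]
print(f"OLS: {ols_delta:.3f} (std err {ols_fit.bse['LEGAL']:.3f})")
print(f"WLS: {wls_delta:.3f} (std err {wls_fit.bse['LEGAL']:.3f})")
```

If the two estimates sit close together, that is the good sign mentioned above; a large gap would suggest the effect differs with group size or that the model is misspecified.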