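Adapted from a [Coursera lecture on observational studies and strategies for estimating causal effects](https://www.coursera.org/learn/data-what-it-is-what-can-we-do-with-it/lecture/8l5Qa/observational-studies-strategies-for-estimating-causal-effects) and a [Princeton tutorial on DID in R (DID101R.pdf)](https://www.princeton.edu/~otorres/DID101R.pdf).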

In [None]:
# read.dta() from the foreign package reads Stata .dta files.
library(foreign)
mydata = read.dta("http://dss.princeton.edu/training/Panel101.dta")

In [None]:
head(mydata)

,country,year,y,y_bin,x1,x2,x3,opinion,op
,<fct>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<fct>,<dbl>
1,A,1990,1342787840,1,0.2779036,-1.1079559,0.28255358,Str agree,1
2,A,1991,-1899660544,0,0.3206847,-0.94872,0.49253848,Disag,0
3,A,1992,-11234363,0,0.3634657,-0.789484,0.70252335,Disag,0
4,A,1993,2645775360,1,0.246144,-0.885533,-0.09439092,Disag,0
5,A,1994,3008334848,1,0.424623,-0.7297683,0.94613063,Disag,0
6,A,1995,3229574144,1,0.4772141,-0.723246,1.02968037,Str agree,1


In [None]:
# Create a dummy variable to indicate the period after the treatment started.
# Let's assume that treatment started in 1994. In this case, years before 1994
# get a value of 0 and years from 1994 onward get a 1. If you already have this
# variable, skip this step.

mydata$time = ifelse(mydata$year >= 1994, 1, 0)

In [None]:
# Create a dummy variable to identify the group exposed to the treatment. In this
# example let's assume that countries E, F, and G (factor codes 5, 6, and 7) were
# treated (=1). Countries A-D (codes 1-4) were not treated (=0). If you already
# have this variable, skip this step.

mydata$treated = ifelse(mydata$country == "E" |
                        mydata$country == "F" |
                        mydata$country == "G", 1, 0)
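
The chained `==` comparisons above can also be written with `%in%`, which is easier to extend when the treated group is long. The sketch below uses a small made-up `country` factor (not the Panel101 data) only to check that the two forms agree; the names `treated_or` and `treated_in` are illustrative.

```r
# Made-up factor standing in for mydata$country (illustrative only).
country <- factor(c("A", "B", "C", "D", "E", "F", "G"))

# Original style: one comparison per treated country, joined with |.
treated_or <- ifelse(country == "E" | country == "F" | country == "G", 1, 0)

# Equivalent membership test against a vector of treated codes.
treated_in <- ifelse(country %in% c("E", "F", "G"), 1, 0)

print(data.frame(country, treated_or, treated_in))
```

One difference to keep in mind: for a missing country, `==` yields `NA` while `%in%` yields `FALSE`, so the two forms only agree when `country` has no `NA` values.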

In [None]:
# Create an interaction between time and treated. We will call this interaction
# ‘did’.

mydata$did = mydata$time * mydata$treated

In [None]:
# Estimating the DID estimator

didreg = lm(y ~ treated + time + did, data = mydata)
summary(didreg)


Call:
lm(formula = y ~ treated + time + did, data = mydata)

Residuals:
       Min         1Q     Median         3Q        Max 
-9.768e+09 -1.623e+09  1.167e+08  1.393e+09  6.807e+09 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)  3.581e+08  7.382e+08   0.485   0.6292  
treated      1.776e+09  1.128e+09   1.575   0.1200  
time         2.289e+09  9.530e+08   2.402   0.0191 *
did         -2.520e+09  1.456e+09  -1.731   0.0882 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.953e+09 on 66 degrees of freedom
Multiple R-squared:  0.08273,	Adjusted R-squared:  0.04104 
F-statistic: 1.984 on 3 and 66 DF,  p-value: 0.1249


In [None]:
# The coefficient on ‘did’ is the difference-in-differences estimate
# (-2.520e+09). It is negative and significant at the 10% level
# (p = 0.0882), but not at the 5% level.
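
Why the interaction coefficient is the DID estimate can be read off the four group-by-period means implied by the regression. The symbols $\beta_0,\dots,\beta_3$ and $\varepsilon_{it}$ are notation introduced here for illustration; they do not appear in the source tutorial.

$$
y_{it} = \beta_0 + \beta_1\,\text{treated}_i + \beta_2\,\text{time}_t + \beta_3\,(\text{treated}_i \times \text{time}_t) + \varepsilon_{it}
$$

$$
\begin{aligned}
E[y \mid \text{treated}=0,\ \text{time}=0] &= \beta_0 \\
E[y \mid \text{treated}=0,\ \text{time}=1] &= \beta_0 + \beta_2 \\
E[y \mid \text{treated}=1,\ \text{time}=0] &= \beta_0 + \beta_1 \\
E[y \mid \text{treated}=1,\ \text{time}=1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3
\end{aligned}
$$

Subtracting the control group's before/after change from the treated group's before/after change leaves only the interaction term:

$$
\big[(\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_1)\big] - \big[(\beta_0 + \beta_2) - \beta_0\big] = \beta_3
$$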

In [None]:
# Estimating the DID estimator with the `*` formula operator:
# treated*time expands to treated + time + treated:time, so there is
# no need to generate the interaction variable by hand.

didreg1 = lm(y ~ treated*time, data = mydata)
summary(didreg1)


Call:
lm(formula = y ~ treated * time, data = mydata)

Residuals:
       Min         1Q     Median         3Q        Max 
-9.768e+09 -1.623e+09  1.167e+08  1.393e+09  6.807e+09 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)  
(Intercept)   3.581e+08  7.382e+08   0.485   0.6292  
treated       1.776e+09  1.128e+09   1.575   0.1200  
time          2.289e+09  9.530e+08   2.402   0.0191 *
treated:time -2.520e+09  1.456e+09  -1.731   0.0882 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.953e+09 on 66 degrees of freedom
Multiple R-squared:  0.08273,	Adjusted R-squared:  0.04104 
F-statistic: 1.984 on 3 and 66 DF,  p-value: 0.1249


In [None]:
# The coefficient on ‘treated:time’ is the difference-in-differences
# estimate (‘did’ in the previous example), and it is identical:
# negative and significant at the 10% level (p = 0.0882), but not at
# the 5% level.
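
As a sanity check on what the interaction coefficient measures, the sketch below compares it with the difference of before/after mean changes computed by hand. It uses a tiny made-up data frame (`toy`, with invented values) rather than the Panel101 data, so the numbers say nothing about the results above; only the equality between the two calculations is the point.

```r
# Toy panel with made-up numbers (not Panel101): two units, four years each.
toy <- data.frame(
  unit = rep(c("ctrl", "trt"), each = 4),
  year = rep(1992:1995, times = 2),
  y    = c(1, 2, 4, 5,   # control: pre mean 1.5, post mean 4.5
           3, 4, 4, 5)   # treated: pre mean 3.5, post mean 4.5
)
toy$time    <- ifelse(toy$year >= 1994, 1, 0)
toy$treated <- ifelse(toy$unit == "trt", 1, 0)

# Mean of y in each cell, indexed as m[treated, time].
m <- tapply(toy$y, list(toy$treated, toy$time), mean)

# (treated post - treated pre) - (control post - control pre)
did_by_hand <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

# The same quantity from the regression with the interaction term.
fit <- lm(y ~ treated * time, data = toy)
did_by_lm <- unname(coef(fit)["treated:time"])

print(c(by_hand = did_by_hand, by_lm = did_by_lm))
```

For these invented values both calculations give -2: the treated group's mean rises by 1 while the control group's rises by 3.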