# 1. Introduction

This notebook shows how to estimate a differences-in-differences regression model using the Spanish regional data from Abadie and Gardeazabal (2003). The Basque Country struggled to gain independence from Spain, and the conflict turned violent in 1975, following Francisco Franco's death that year. The example uses the differences-in-differences methodology to estimate the effect of this conflict on regional GDP per capita. The Basque Country is the treatment group and the region of Catalonia is the control.

References:

Abadie, A., & Gardeazabal, J. (2003). The economic costs of conflict: A case study of the Basque Country. American Economic Review, 93(1), 113-132.


# 2. Preparing The Data: Time and Treatment Dummies

We start by importing the original dataset from Abadie and Gardeazabal (2003) and generating the time and treatment dummies.

In [2]:
# import libraries
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

In [3]:
# read data
data = pd.read_csv('basque_data.csv')

In [6]:
# examine head and tail
data.head()

Unnamed: 0.1,Unnamed: 0,regionno,regionname,year,gdpcap,sec.agriculture,sec.energy,sec.industry,sec.construction,sec.services.venta,sec.services.nonventa,school.illit,school.prim,school.med,school.high,school.post.high,popdens,invest
0,1,1,Spain (Espana),1955,2.354542,,,,,,,,,,,,,
1,2,1,Spain (Espana),1956,2.480149,,,,,,,,,,,,,
2,3,1,Spain (Espana),1957,2.603613,,,,,,,,,,,,,
3,4,1,Spain (Espana),1958,2.637104,,,,,,,,,,,,,
4,5,1,Spain (Espana),1959,2.66988,,,,,,,,,,,,,


In [7]:
data.tail()

Unnamed: 0.1,Unnamed: 0,regionno,regionname,year,gdpcap,sec.agriculture,sec.energy,sec.industry,sec.construction,sec.services.venta,sec.services.nonventa,school.illit,school.prim,school.med,school.high,school.post.high,popdens,invest
769,770,18,Rioja (La),1993,9.132391,,,,,,,,,,,,,16.765787
770,771,18,Rioja (La),1994,9.498,,,,,,,,,,,,,16.469452
771,772,18,Rioja (La),1995,9.752213,,,,,,,,,,,,,20.27565
772,773,18,Rioja (La),1996,10.056413,,,,,,,,,,,,,
773,774,18,Rioja (La),1997,10.476292,,,,,,,,,,,,,


In [21]:
# extract "treatment" data belonging to Basque Country (region # 17 in this dataset)
treatment_data = data.loc[data.regionno == 17]

In [22]:
# extract "control" data for Catalonia (region # 10)
control_data = data.loc[data.regionno == 10]

In [23]:
# drop all variables other than year, gdpcap (GDP per capita), and regionno (region number) from both datasets
treatment_data = treatment_data[['year','gdpcap','regionno']]
control_data = control_data[['year','gdpcap','regionno']]

In [24]:
# vertically concatenate dataframes to get a single dataset
merged_data = pd.concat([treatment_data, control_data], ignore_index=True)

In [25]:
# examine merged data
merged_data.head()

Unnamed: 0,year,gdpcap,regionno
0,1955,3.853185,17
1,1956,3.945658,17
2,1957,4.033562,17
3,1958,4.023422,17
4,1959,4.013782,17


In [26]:
merged_data.tail()

Unnamed: 0,year,gdpcap,regionno
81,1993,9.625107,10
82,1994,10.006427,10
83,1995,10.339903,10
84,1996,10.576264,10
85,1997,11.045416,10


In [32]:
# generate treatment (region) and time period dummies
merged_data['treatment_dummy'] = (merged_data['regionno'] == 17).astype(int) # .astype(int) converts True/False values to 1/0
merged_data['time_dummy'] = (merged_data['year'] > 1975).astype(int)

In [30]:
# add constant
merged_data = sm.add_constant(merged_data)

In [33]:
# examine data one final time
merged_data.head()

Unnamed: 0,const,year,gdpcap,regionno,treatment_dummy,time_dummy
0,1.0,1955,3.853185,17,1,0
1,1.0,1956,3.945658,17,1,0
2,1.0,1957,4.033562,17,1,0
3,1.0,1958,4.023422,17,1,0
4,1.0,1959,4.013782,17,1,0


In [34]:
merged_data.tail()

Unnamed: 0,const,year,gdpcap,regionno,treatment_dummy,time_dummy
81,1.0,1993,9.625107,10,0,1
82,1.0,1994,10.006427,10,0,1
83,1.0,1995,10.339903,10,0,1
84,1.0,1996,10.576264,10,0,1
85,1.0,1997,11.045416,10,0,1


# 3. Estimating the Differences-in-Differences Model

In [43]:
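# first, compute average GDP per capita AFTER the conflict began (post-treatment, year > 1975)
# note: regionno == 10 selects Catalonia (the control region); Basque Country is regionno == 17
merged_data.loc[(merged_data['year'] > 1975) & (merged_data['regionno'] == 10), 'gdpcap'].mean()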

8.582765304297181

In [44]:
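# next, compare that to average GDP per capita BEFORE the conflict (pre-treatment), again for regionno == 10
# note: year < 1975 leaves out 1975 itself, whereas time_dummy counts 1975 as pre-treatment
merged_data.loc[(merged_data['year'] < 1975) & (merged_data['regionno'] == 10), 'gdpcap'].mean()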

5.1495144606249506

Note that although GDP per capita went up after 1975, this does not mean that conflict is good for the economy: a simple before/after comparison also picks up growth that would have occurred anyway. Let's use differences-in-differences to tease out the conflict's effect.
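
The comparison that differences-in-differences makes can be sketched by hand: compute the mean outcome in each of the four region-by-period cells, take the before/after change within each region, and subtract the control region's change from the treated region's change. The snippet below is a minimal sketch on a small made-up panel; the `toy` frame and its values are hypothetical (not the Basque data), and only the dummy definitions are taken from this notebook.

```python
import pandas as pd

# Hypothetical toy panel (NOT the Basque data): two regions, four years each.
toy = pd.DataFrame({
    "regionno": [17, 17, 17, 17, 10, 10, 10, 10],
    "year":     [1974, 1975, 1976, 1977, 1974, 1975, 1976, 1977],
    "gdpcap":   [5.0, 5.2, 5.6, 5.8, 4.0, 4.2, 5.2, 5.4],
})

# Same dummy definitions as in the notebook.
toy["treatment_dummy"] = (toy["regionno"] == 17).astype(int)
toy["time_dummy"] = (toy["year"] > 1975).astype(int)

# Mean outcome in each of the four cells: rows = treatment_dummy, columns = time_dummy.
group_means = (
    toy.groupby(["treatment_dummy", "time_dummy"])["gdpcap"]
    .mean()
    .unstack("time_dummy")
)

# Before/after change within each region, then the difference of those changes.
change = group_means[1] - group_means[0]
did = change.loc[1] - change.loc[0]

print(group_means)
print("difference-in-differences:", did)
```

Applied to `merged_data` instead of `toy`, the same `groupby` call should tabulate the four means that the regression in this section is built on.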

In [69]:
# first, add interaction term
merged_data['interaction'] = merged_data['treatment_dummy']*merged_data['time_dummy']

In [70]:
# set up model object
diffs_in_diffs_model = sm.OLS(merged_data['gdpcap'],merged_data[['const','time_dummy','treatment_dummy','interaction']])
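
The regressors passed to `sm.OLS` here correspond to the usual two-group, two-period differences-in-differences specification. Written out, with $i$ indexing regions and $t$ indexing years (the Greek symbols and the $\bar{Y}$ notation for group means are introduced here for illustration and do not appear in the original notebook):

```latex
\begin{aligned}
\text{gdpcap}_{it} &= \beta_0
  + \beta_1\,\text{time\_dummy}_t
  + \beta_2\,\text{treatment\_dummy}_i
  + \beta_3\,(\text{treatment\_dummy}_i \times \text{time\_dummy}_t)
  + \varepsilon_{it} \\[4pt]
\beta_3 &= \left(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\right)
         - \left(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\right)
\end{aligned}
```

In this specification $\beta_0$ is the control region's pre-period mean, $\beta_1$ is the control region's before/after change, $\beta_2$ is the pre-period gap between the treated and control regions, and $\beta_3$, the coefficient on `interaction`, is the differences-in-differences estimate.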

In [71]:
# estimate model and show fit
diffs_in_diffs_model.fit().summary()

Dep. Variable:,gdpcap,R-squared:,0.607
Model:,OLS,Adj. R-squared:,0.593
Method:,Least Squares,F-statistic:,42.28
Date:,"Tue, 25 Feb 2020",Prob (F-statistic):,1.29e-16
Time:,22:39:12,Log-Likelihood:,-136.91
No. Observations:,86,AIC:,281.8
Df Residuals:,82,BIC:,291.6
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,5.2436,0.266,19.735,0.000,4.715,5.772
time_dummy,3.3392,0.371,8.989,0.000,2.600,4.078
treatment_dummy,0.1387,0.376,0.369,0.713,-0.609,0.886
interaction,-0.8547,0.525,-1.627,0.108,-1.900,0.190

Omnibus:,38.183,Durbin-Watson:,0.321
Prob(Omnibus):,0.0,Jarque-Bera (JB):,6.512
Skew:,0.223,Prob(JB):,0.0385
Kurtosis:,1.728,Cond. No.,6.93


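The coefficient on the interaction term, -0.8547, is the differences-in-differences estimate of the treatment effect. The point estimate is negative, suggesting that the conflict lowered GDP per capita in the Basque Country relative to Catalonia, but it is not statistically significant at conventional levels (p = 0.108; the 95% confidence interval, [-1.900, 0.190], includes zero).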
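
As a check on how to read the coefficient table, the four group means can be recovered from the reported coefficients and the differences-in-differences estimate rebuilt from them. This is a small arithmetic sketch: the variable names are illustrative, and the numbers are copied from the regression summary above.

```python
# Coefficients copied from the regression summary.
const = 5.2436             # control region, pre-treatment mean
time_coef = 3.3392         # time_dummy
treatment_coef = 0.1387    # treatment_dummy
interaction_coef = -0.8547 # interaction

# Implied mean GDP per capita in each region-by-period cell.
control_pre = const
control_post = const + time_coef
treated_pre = const + treatment_coef
treated_post = const + time_coef + treatment_coef + interaction_coef

# Change in the treated region minus change in the control region.
did = (treated_post - treated_pre) - (control_post - control_pre)

print("control:", round(control_pre, 4), "->", round(control_post, 4))
print("treated:", round(treated_pre, 4), "->", round(treated_post, 4))
print("difference-in-differences:", round(did, 4))
```

The implied post-1975 mean for the control region, about 8.58, agrees with the value of 8.5828 computed directly from `merged_data` for `regionno == 10` earlier in this section.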