### DID model to estimate the effect of hurricanes on house prices

A difference-in-differences (DID) model estimates the differential impact of a 'treatment' on the treated group of individuals.

The fitted DID model tells us whether there is evidence of a net-additional effect in the treated group that is purely treatment induced, what the estimated size of this effect is, whether the estimate is statistically significant, and, if so, what the 95% or 99% confidence interval around the estimated effect is.

Difference-In-Differences regression model:

$y_i=\beta_0 + \beta_1*TP_i + \beta_2*Treat_i + \beta_3*(TP_i*Treat_i) + \epsilon_i$

$y_i$ is the observed response for the $i$th observation (the value being measured in each group before and after treatment).

$\beta_0$ is the intercept.

$TP_i$ (Time Period) is a dummy variable that takes the value 0 or 1 depending on whether the $i$th measurement refers to the pre-treatment or post-treatment period respectively.

$Treat_i$ (Treated) is a dummy variable that takes the value 0 or 1 depending on whether the $i$th measurement refers to an individual in the control group or the treatment group respectively.

$TP_i*Treat_i$ is the interaction term, which equals 1 only for treatment-group observations in the post-treatment period.

$\epsilon_i$ is the error term associated with the $i$th observation. It captures the effect of all factors that the model was not able to adequately represent.


The two dummy variables in the model yield the following 2 x 2 matrix of regression equations:


<table>
<tr><th></th><th>$TP_i=0$</th><th>$TP_i=1$</th></tr>
<tr><td>$Treat_i=0$</td><td>$y_i=\beta_0+\epsilon_i$</td><td>$y_i=\beta_0+\beta_1+\epsilon_i$</td></tr>
<tr><td>$Treat_i=1$</td><td>$y_i=\beta_0+\beta_2+\epsilon_i$</td><td>$y_i=\beta_0+\beta_1+\beta_2+\beta_3+\epsilon_i$</td></tr>
</table>
    

Fitting the model by Ordinary Least Squares gives the following estimated conditional expectations:

$E(y_i | TP_i=0, Treat_i=0) = \hat{\beta_0}$

$E(y_i | TP_i=1, Treat_i=0) = \hat{\beta_0} + \hat{\beta_1}$

$E(y_i | TP_i=0, Treat_i=1) = \hat{\beta_0} + \hat{\beta_2}$

$E(y_i | TP_i=1, Treat_i=1) = \hat{\beta_0} + \hat{\beta_1} + \hat{\beta_2} + \hat{\beta_3}$
    
    
Calculate the difference in the expected value of $y_i$ between the before (pre-) and after (post-) treatment phases of the study.
    
For the treatment group, the difference in expectations works out as follows:

$E(y_i | TP_i=1, Treat_i=1) - E(y_i | TP_i=0, Treat_i=1) = (\hat{\beta_0} + \hat{\beta_1} + \hat{\beta_2} + \hat{\beta_3}) - (\hat{\beta_0} + \hat{\beta_2}) = \hat{\beta_1} + \hat{\beta_3}$

Similarly, for the control group we have:

$E(y_i | TP_i=1, Treat_i=0) - E(y_i | TP_i=0, Treat_i=0) = (\hat{\beta_0} + \hat{\beta_1}) - \hat{\beta_0} = \hat{\beta_1}$


The difference between the two differences gives us the net effect of the treatment on the treatment group:

$E(\text{DiD effect}) = (\hat{\beta_1} + \hat{\beta_3}) - \hat{\beta_1} = \hat{\beta_3}$

    
The difference-in-differences effect is the coefficient of the interaction term $TP_i*Treat_i$.

After the DID model is fitted, the estimated coefficient of the interaction term gives us the difference-in-differences effect that we are seeking.
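As a quick check of the algebra, the sketch below computes the difference-in-differences directly from the four cell means of a small made-up data set. The observations and the helper name `cell_mean` are illustrative assumptions. Because the model has one parameter per cell of the 2 x 2 design, OLS reproduces the cell means, so the quantities below correspond to the estimated coefficients.

```python
from statistics import mean

# Hypothetical observations as (TP, Treat, y). All values are made up.
observations = [
    (0, 0, 4.0), (0, 0, 6.0),    # control group, pre-treatment
    (1, 0, 7.0), (1, 0, 9.0),    # control group, post-treatment
    (0, 1, 3.0), (0, 1, 5.0),    # treatment group, pre-treatment
    (1, 1, 10.0), (1, 1, 12.0),  # treatment group, post-treatment
]

def cell_mean(tp, treat):
    """Sample mean of y for one (TP, Treat) cell."""
    return mean(y for t, g, y in observations if t == tp and g == treat)

# Quantities that correspond to the coefficients of the 2 x 2 model.
beta_0 = cell_mean(0, 0)                    # control group, pre-treatment level
beta_1 = cell_mean(1, 0) - cell_mean(0, 0)  # time effect in the control group
beta_2 = cell_mean(0, 1) - cell_mean(0, 0)  # pre-treatment gap between groups

# Pre/post change within each group, then the difference of the two changes.
treated_change = cell_mean(1, 1) - cell_mean(0, 1)
control_change = cell_mean(1, 0) - cell_mean(0, 0)
did_effect = treated_change - control_change  # plays the role of beta_3

print("treated change:", treated_change)
print("control change:", control_change)
print("DiD effect:", did_effect)
```

Adding `beta_0`, `beta_1`, `beta_2` and `did_effect` recovers the post-treatment mean of the treatment group, which mirrors the bottom-right cell of the 2 x 2 matrix of regression equations.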


### Example

A Difference-In-Differences model to estimate the effect of the 2005 Atlantic hurricane season (coastal weather events) on house prices.


TECHNICAL DATA GENERATION

Source: https://towardsdatascience.com/a-guide-to-using-the-difference-in-differences-regression-model-87cd2fb3224a


Defining the criteria for inclusion in the treatment group: we examine the Individual Assistance actions taken by the US Federal Emergency Management Agency and count the number of counties in each coastal state that qualified for receiving it ("NUM_IND_ASSIST"). Any state with a count greater than or equal to the median (14) falls into the treatment group, and the rest form the control group. This assignment is stored in "Disaster_Affected".
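The median-split rule can be sketched as follows. The state labels and county counts are placeholders chosen for illustration, not FEMA figures. Only the rule itself (count at or above the median means treated) comes from the description above.

```python
from statistics import median

# Hypothetical NUM_IND_ASSIST values: counties per state that qualified
# for Individual Assistance. Labels and counts are placeholders.
num_ind_assist = {
    "State A": 2,
    "State B": 30,
    "State C": 14,
    "State D": 9,
    "State E": 21,
}

# Threshold is the median count across all states in the panel.
threshold = median(num_ind_assist.values())

# Disaster_Affected = 1 when the count is at or above the median, else 0.
disaster_affected = {
    state: int(count >= threshold) for state, count in num_ind_assist.items()
}

print("median threshold:", threshold)
print(disaster_affected)
```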


Setting up the Time Period column: "Time_Period" is set to 0 to indicate the period before the start of the 2005 hurricane season, and to 1 to indicate the period after the end of the hurricane season.


Calculating the value of the response variable: the goal is to study the effect of the 2005 hurricane season on house prices in the coastal states, using the state-wise All Transactions House Price Index (HPI) published by the US Federal Reserve. The time series for the 24 states of interest are selected and combined into a 24-state data panel. The time periods of interest are the 4 quarters immediately prior to the 2005 hurricane season and the 4 quarters immediately following the season. For each state and each quarter, calculate the quarter-over-quarter (Q-o-Q) fractional change in the house price index: $\text{HPI Fractional Change} = \frac{HPI_i - HPI_{i-1}}{HPI_{i-1}}$, where $i$ here indexes quarters. Next, average these changes over each block of 4 quarters to arrive at the average fractional change in HPI both before and after the 2005 hurricane season. Repeating this calculation for each state gives the value of the response variable HPI_CHG for the pre-treatment and post-treatment phases. For each state, there are two response values: the top value is the pre-treatment value and the bottom one is the post-treatment value. Thus, there is one value corresponding to Time_Period=0 and another one corresponding to Time_Period=1. The column HPI_CHG is our response variable $y_i$.
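The response calculation for a single state can be sketched as below. The index levels are invented, and the function name `avg_fractional_change` is a hypothetical helper. The sketch also assumes that each 4-quarter block is supplied together with the quarter preceding it, so that four Q-o-Q changes can be formed.

```python
from statistics import mean

def avg_fractional_change(hpi_levels):
    """Average Q-o-Q fractional change, (HPI_i - HPI_{i-1}) / HPI_{i-1}."""
    changes = [
        (curr - prev) / prev
        for prev, curr in zip(hpi_levels, hpi_levels[1:])
    ]
    return mean(changes)

# Hypothetical HPI levels for one state: 5 levels give 4 quarterly changes.
pre_season_hpi = [100.0, 102.0, 105.06, 106.1106, 110.355024]
post_season_hpi = [110.0, 111.1, 112.211, 113.33311, 114.4664411]

# Two response values per state: one for Time_Period=0, one for Time_Period=1.
hpi_chg_pre = avg_fractional_change(pre_season_hpi)
hpi_chg_post = avg_fractional_change(post_season_hpi)

print(f"HPI_CHG (Time_Period=0): {hpi_chg_pre:.4f}")
print(f"HPI_CHG (Time_Period=1): {hpi_chg_post:.4f}")
```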



Building the Difference-In-Differences model for house price inflation: 

Equation for our DID model:

$HPI\_CHG_i=\beta_0 + \beta_1*Time\_Period_i + \beta_2*Disaster\_Affected_i + \beta_3*(Time\_Period_i*Disaster\_Affected_i) + \epsilon_i$



In [5]:
import pandas as pd
from patsy import dmatrices
import statsmodels.api as sm


In [8]:
#Load the data set into a Pandas Dataframe
df = pd.read_csv('us_fred_coastal_us_states_avg_hpi_before_after_2005.csv', header=0)

#Print it
#print(df)

In [9]:
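#Form the regression expression in Patsy syntax. Patsy adds the intercept automatically.
#Note: Time_Period*Disaster_Affected already expands to both main effects plus their
# interaction, so listing the main effects explicitly is redundant but harmless.
reg_exp = 'HPI_CHG ~ Time_Period + Disaster_Affected + Time_Period*Disaster_Affected'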



In [10]:
#Carve out the training matrices
y_train, X_train = dmatrices(reg_exp, df, return_type='dataframe')



In [11]:
#Build the DID model
did_model = sm.OLS(endog=y_train, exog=X_train)



In [12]:
#Train the model
did_model_results = did_model.fit()


In [13]:
#Print out the training results
did_model_results.summary()

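<table>
<tr><th>Dep. Variable:</th><td>HPI_CHG</td><th>R-squared:</th><td>0.536</td></tr>
<tr><th>Model:</th><td>OLS</td><th>Adj. R-squared:</th><td>0.504</td></tr>
<tr><th>Method:</th><td>Least Squares</td><th>F-statistic:</th><td>16.92</td></tr>
<tr><th>Date:</th><td>Sun, 05 Nov 2023</td><th>Prob (F-statistic):</th><td>1.88e-07</td></tr>
<tr><th>Time:</th><td>22:15:09</td><th>Log-Likelihood:</th><td>145.14</td></tr>
<tr><th>No. Observations:</th><td>48</td><th>AIC:</th><td>-282.3</td></tr>
<tr><th>Df Residuals:</th><td>44</td><th>BIC:</th><td>-274.8</td></tr>
<tr><th>Df Model:</th><td>3</td><th></th><td></td></tr>
<tr><th>Covariance Type:</th><td>nonrobust</td><th></th><td></td></tr>
</table>

<table>
<tr><th></th><th>coef</th><th>std err</th><th>t</th><th>P&gt;|t|</th><th>[0.025</th><th>0.975]</th></tr>
<tr><th>Intercept</th><td>0.0371</td><td>0.003</td><td>13.157</td><td>0.000</td><td>0.031</td><td>0.043</td></tr>
<tr><th>Time_Period</th><td>-0.0278</td><td>0.004</td><td>-6.985</td><td>0.000</td><td>-0.036</td><td>-0.020</td></tr>
<tr><th>Disaster_Affected</th><td>-0.0139</td><td>0.006</td><td>-2.258</td><td>0.029</td><td>-0.026</td><td>-0.001</td></tr>
<tr><th>Time_Period:Disaster_Affected</th><td>0.0197</td><td>0.009</td><td>2.260</td><td>0.029</td><td>0.002</td><td>0.037</td></tr>
</table>

<table>
<tr><th>Omnibus:</th><td>5.463</td><th>Durbin-Watson:</th><td>1.165</td></tr>
<tr><th>Prob(Omnibus):</th><td>0.065</td><th>Jarque-Bera (JB):</th><td>4.279</td></tr>
<tr><th>Skew:</th><td>0.623</td><th>Prob(JB):</th><td>0.118</td></tr>
<tr><th>Kurtosis:</th><td>3.767</td><th>Cond. No.</th><td>6.78</td></tr>
</table>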


We see that the adjusted R-squared is 0.504, so the model explains roughly half of the variance in the response variable HPI_CHG.

The p value of the F-statistic is 1.88e-07, which is statistically significant. This leads us to conclude that the model's variables are jointly significant and that together they do a good job of explaining the variance in HPI_CHG.

All coefficients are statistically significant as indicated by their p values which are all smaller than 0.05.
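The estimate, p value and confidence intervals of the interaction term can also be read from the results object instead of the printed summary. The sketch below uses a small made-up panel and the statsmodels formula interface rather than the `dmatrices` route used in the cells above. The data values and the name `toy` are illustrative assumptions, while the column names mirror the example.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy panel. Values are made up; column names mirror the example.
toy = pd.DataFrame({
    "Time_Period":       [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
    "Disaster_Affected": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "HPI_CHG": [0.036, 0.038, 0.037, 0.008, 0.010, 0.009,
                0.022, 0.024, 0.023, 0.014, 0.016, 0.015],
})

# a*b in the formula expands to both main effects plus the a:b interaction.
results = smf.ols("HPI_CHG ~ Time_Period * Disaster_Affected", data=toy).fit()

# The interaction term is labelled with a colon in the fitted results.
term = "Time_Period:Disaster_Affected"
did_estimate = results.params[term]
p_value = results.pvalues[term]
low95, high95 = results.conf_int(alpha=0.05).loc[term]
low99, high99 = results.conf_int(alpha=0.01).loc[term]

print(f"DiD estimate: {did_estimate:.4f} (p = {p_value:.4g})")
print(f"95% CI: [{low95:.4f}, {high95:.4f}]")
print(f"99% CI: [{low99:.4f}, {high99:.4f}]")
```

The same attribute lookups (`params`, `pvalues`, `conf_int`) should apply to `did_model_results` from the cells above, since both are fitted OLS results objects.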

The equation of the fitted model is: $HPI\_CHG_i = 0.0371 - 0.0278*Time\_Period_i - 0.0139*Disaster\_Affected_i + 0.0197*(Time\_Period_i*Disaster\_Affected_i) + \epsilon_i$

### Interpretation of each combination of the two dummy variables:

Time_Period_i=0 and Disaster_Affected_i=0

$E(HPI\_CHG) = 0.0371$

It is the expected Q-o-Q change in house price index in the control group states during the pre-hurricane period. This equation gives us the estimated mean inflation in house prices in the control group during the four quarters immediately preceding the 2005 hurricane season. The value of the estimated mean inflation is simply the intercept of the regression: 0.0371, or 3.71%.

Time_Period_i=1 and Disaster_Affected_i=0

$E(HPI\_CHG) = 0.0371 - 0.0278 = 0.0093$

It is the expected Q-o-Q change in house price index in the control group states during the post-hurricane period. This equation gives us the estimated mean inflation in house prices in the control group states in the post-treatment period, i.e. during the four quarters following the hurricane season. The value of the estimated mean inflation is 0.0093, or 0.93%.

Time_Period_i=0 and Disaster_Affected_i=1

$E(HPI\_CHG) = 0.0371 - 0.0139 = 0.0232$

It is the expected Q-o-Q change in house price index in the treatment group states during the pre-hurricane period. This equation gives us the estimated mean house price inflation in the treatment group states during the four quarters prior to the start of the hurricane season. The value of this inflation is 0.0232, or 2.32%.


Time_Period_i=1 and Disaster_Affected_i=1

$E(HPI\_CHG) = 0.0371 - 0.0278 - 0.0139 + 0.0197 = 0.0151$

It is the expected Q-o-Q change in house price index in the treatment group states during the post-hurricane period. This equation gives us the estimated mean house price inflation in the treatment group during the four quarters following the end of the hurricane season. The value of this inflation is 0.0151 or 1.51%.




<table>
<tr><th></th><th>Treatment Group</th><th>Control Group</th></tr>
<tr><th></th><th>E(HPI_CHG | Disaster_Affected=1)</th><th>E(HPI_CHG | Disaster_Affected=0)</th></tr>
<tr><td>Time_Period = 0</td><td>$2.32\%$</td><td>$3.71\%$</td></tr>
<tr><td>Time_Period = 1</td><td>$1.51\%$</td><td>$0.93\%$</td></tr>
<tr><td>$\Delta$E(HPI_CHG)</td><td>$-0.81\%$</td><td>$-2.78\%$</td></tr>
</table>
    

In the Disaster Affected group, the inflation in house prices in the four quarters following the hurricane season was 0.81 percentage points lower than the house price inflation experienced in the four quarters prior to the start of the hurricane season. In the non Disaster Affected group, the inflation in house prices in the four quarters following the hurricane season was 2.78 percentage points lower than in the four quarters prior to the start of the hurricane season.

The difference-in-differences effect between the two groups is:

$\Delta E(HPI\_CHG | Disaster\_Affected=1) - \Delta E(HPI\_CHG | Disaster\_Affected=0) = (-0.81\%) - (-2.78\%) = 1.97\%$

The value of 1.97% is exactly the coefficient of the Time_Period*Disaster_Affected interaction term (0.0197) reported by the fitted DID regression model.
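The arithmetic behind the four expected values and the difference-in-differences can be reproduced from the coefficients reported in the regression summary. This is a minimal sketch: the function name `expected_hpi_chg` and the short variable names are illustrative choices, and the coefficients are the rounded values from the summary.

```python
# Coefficients as reported (rounded) in the regression summary.
b_intercept = 0.0371
b_time = -0.0278         # Time_Period
b_affected = -0.0139     # Disaster_Affected
b_interaction = 0.0197   # Time_Period:Disaster_Affected

def expected_hpi_chg(time_period, disaster_affected):
    """Fitted mean of HPI_CHG for one combination of the two dummies."""
    return (b_intercept
            + b_time * time_period
            + b_affected * disaster_affected
            + b_interaction * time_period * disaster_affected)

# Pre/post change within each group.
change_treated = expected_hpi_chg(1, 1) - expected_hpi_chg(0, 1)
change_control = expected_hpi_chg(1, 0) - expected_hpi_chg(0, 0)

# Difference of the two changes: equals the interaction coefficient.
did = change_treated - change_control

for tp in (0, 1):
    for da in (0, 1):
        value = expected_hpi_chg(tp, da)
        print(f"Time_Period={tp}, Disaster_Affected={da}: {value:.2%}")
print(f"Change in treatment group: {change_treated:.2%}")
print(f"Change in control group: {change_control:.2%}")
print(f"Difference-in-differences: {did:.2%}")
```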

The estimated difference-in-differences of 1.97 percentage points suggests that house price inflation in the states that were especially affected by the 2005 hurricane season cooled down less than in the rest of the coastal states after the season ended.