### Application: Natural Experiment. Consider the empirical model of house prices and a garbage incinerator. In 1980, news broke that a garbage incinerator would be built somewhere in Springfield; in 1981, construction actually began. We want to know the effect of the garbage incinerator on house prices. Use the data in KIELMC for this exercise. See KIELMC description.txt for variable descriptions.

In [3]:
use "/mnt/c/Users/xunin/Dropbox/Working_Directory/Tex_MD/Teaching/tutorial/6002/W8/KIELMC.DTA", clear

### (a) The following simple regression model was estimated using only the year 1981 sample of data

$$ \widehat{lrprice} = 11.4785 - 0.4025 \cdot nearinc \tag{1} $$

### while the following was estimated using only the 1978 sample of data 

$$ \widehat{lrprice} = 11.2854 - 0.3399 \cdot nearinc \tag{2} $$

### where lrprice is the log of the real transaction price (in 1978 dollars) and nearinc is an indicator that a house is near the garbage incinerator. Explain why we cannot infer from the estimates in (1), based on the 1981 data alone, that the location of the incinerator caused the price of nearby houses to fall by an average of 40.25%. What evidence from model (2), based on the 1978 data, supports this conclusion?

#### The slope estimate in (1), based on the 1981 data, shows that being near the incinerator site is associated with a roughly 40.25% lower average selling price. It cannot be interpreted as a causal effect, because the estimate combines the causal effect of the incinerator with a pre-existing location effect. The estimates in (2) support this: they use data from 1978, when there was not yet any news about the new incinerator. They suggest that, even without any effect of the future incinerator, the location itself was less desirable: houses near the future site already sold for roughly 33.99% less on average than houses not near it.
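The two subsample regressions can be reproduced with a short sketch like the one below. It assumes KIELMC.DTA is already in memory (as loaded in the first cell) and uses the dummy `y81` to split the sample. The `display` lines convert each log-point slope into an exact percentage difference, since 100 times the coefficient is only an approximation for changes this large.

```stata
* Assumes KIELMC.DTA is already in memory (loaded in the first cell).

* Model (1): 1981 subsample only
regress lrprice nearinc if y81 == 1
* Exact percentage difference implied by the log-point coefficient
display "1981 exact % difference: " 100*(exp(_b[nearinc]) - 1)

* Model (2): 1978 subsample only
regress lrprice nearinc if y81 == 0
display "1978 exact % difference: " 100*(exp(_b[nearinc]) - 1)
```

With the reported slopes of -0.4025 and -0.3399, the exact differences work out to about -33.1% and -28.8%, so the qualitative conclusion is unchanged.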

### (b) As discussed in class, we may use the model below to estimate the effect of the garbage incinerator

$$ lrprice = \gamma_0 + \gamma_1 \cdot y81 + \gamma_2 \cdot nearinc + \gamma_3 \cdot y81 \cdot nearinc + u \tag{3} $$

### where y81 is a dummy variable equal to one if the year is 1981 and zero otherwise. One by one, interpret the coefficients $\gamma_1$, $\gamma_2$, and $\gamma_3$.

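#### $\gamma_0$ is the average log real price in 1978 of houses not near the incinerator site. $\gamma_1$ captures the change in the average log real price from 1978 to 1981 for houses not near the site, that is, the time effect. $\gamma_2$ captures the location effect that is not due to the garbage incinerator: the 1978 price difference between houses near the site and houses not near it. $\gamma_3$ captures the treatment effect of the garbage incinerator on house prices: the additional change from 1978 to 1981 for houses near the site.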
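These interpretations follow from writing out the four conditional means implied by model (3). This is a sketch that assumes $E(u \mid y81, nearinc) = 0$:

$$
\begin{aligned}
E(lrprice \mid y81 = 0,\ nearinc = 0) &= \gamma_0 \\
E(lrprice \mid y81 = 1,\ nearinc = 0) &= \gamma_0 + \gamma_1 \\
E(lrprice \mid y81 = 0,\ nearinc = 1) &= \gamma_0 + \gamma_2 \\
E(lrprice \mid y81 = 1,\ nearinc = 1) &= \gamma_0 + \gamma_1 + \gamma_2 + \gamma_3
\end{aligned}
$$

Taking the near-minus-far difference within each year, and then differencing across years, leaves only the interaction coefficient:

$$ \left[ (\gamma_0 + \gamma_1 + \gamma_2 + \gamma_3) - (\gamma_0 + \gamma_1) \right] - \left[ (\gamma_0 + \gamma_2) - \gamma_0 \right] = \gamma_3 $$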

### (c) Estimate model (3). Explain briefly why the OLS estimates of (3) are also called difference-in-differences (DID) estimates. Compare the estimate of $\gamma_3$ to the slope estimates in (1) and (2).

In [4]:
reg lrprice y81 nearinc y81nrinc


      Source |       SS           df       MS      Number of obs   =       321
-------------+----------------------------------   F(3, 317)       =     34.47
       Model |  11.8433724         3  3.94779079   Prob > F        =    0.0000
    Residual |   36.305771       317  .114529246   R-squared       =    0.2460
-------------+----------------------------------   Adj R-squared   =    0.2388
       Total |  48.1491434       320  .150466073   Root MSE        =    .33842

------------------------------------------------------------------------------
     lrprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y81 |   .1930937   .0453207     4.26   0.000     .1039263    .2822611
     nearinc |   -.339923   .0545555    -6.23   0.000    -.4472595   -.2325865
    y81nrinc |  -.0626489   .0834408    -0.75   0.453    -.2268166    .1015188
       _cons |   11.28542   .0305145   369.84   0.

To gauge how well this choice of variables does relative to alternatives, we can compute its AIC and BIC:

* The smaller the AIC, the better the specification is expected to do for prediction among the candidates being compared.

* The smaller the BIC, the more likely the specification is to match the population model among the candidates. BIC penalizes each extra parameter by $\ln N$ rather than 2, so it favors more parsimonious models.

* Given enough data, AIC and BIC typically agree on the "optimal" choice of variables.

* If AIC and BIC disagree, AIC points to the better combination for prediction, while BIC points to the variables most likely to be in the population model.

* Both criteria are only comparable across models fitted to the same dependent variable and the same sample.

In [5]:
estat ic


Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
-------------+---------------------------------------------------------------
           . |        321 -150.9887  -105.6752       4    219.3505   234.4362
-----------------------------------------------------------------------------
               Note: N=Obs used in calculating BIC; see [R] BIC note.
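As a check on what `estat ic` reports, the two criteria can be recomputed by hand from the log likelihood using the standard formulas $AIC = -2\,ll + 2k$ and $BIC = -2\,ll + k \ln N$. The sketch below copies the numbers from the output above; the scalar names (`loglik`, `npar`, `nobs`) are illustrative only.

```stata
* Values copied from the estat ic output for model (3)
* Log likelihood of the fitted model
scalar loglik = -105.6752
* Number of estimated parameters (the df column)
scalar npar = 4
* Number of observations
scalar nobs = 321

* AIC = -2*ll + 2*k ; BIC = -2*ll + k*ln(N)
scalar aic = -2*loglik + 2*npar
scalar bic = -2*loglik + npar*ln(nobs)
display "AIC = " aic "   BIC = " bic
```

The results agree with the table up to rounding of the log likelihood. Because $\ln(321) > 2$, BIC charges more per parameter than AIC here.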


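#### The estimate for nearinc, $-0.3399$, is identical to the slope estimate in (2), which represents the location effect. More importantly, the estimate for y81·nearinc is only $-0.063$, much smaller in magnitude than the slope estimate $-0.4025$ in (1), and it is not statistically significant ($t = -0.75$, $p = 0.453$). The slope estimate in (1) is the location effect plus the treatment effect: $-0.4025 \approx -0.3399 + (-0.0626)$. Equivalently, $\hat\gamma_3$ is the near-minus-far price difference in 1981 minus the same difference in 1978, which is why the OLS estimates of (3) are called difference-in-differences estimates.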
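The difference-in-differences label can be made concrete by computing the estimate directly from the four cell means. This is a minimal sketch that assumes KIELMC.DTA is still in memory; the scalar names are illustrative only.

```stata
* Assumes KIELMC.DTA is already in memory.
* Mean log real price in each of the four year-by-location cells
summarize lrprice if y81 == 1 & nearinc == 1, meanonly
scalar m_near81 = r(mean)
summarize lrprice if y81 == 1 & nearinc == 0, meanonly
scalar m_far81 = r(mean)
summarize lrprice if y81 == 0 & nearinc == 1, meanonly
scalar m_near78 = r(mean)
summarize lrprice if y81 == 0 & nearinc == 0, meanonly
scalar m_far78 = r(mean)

* Difference-in-differences: (near - far in 1981) minus (near - far in 1978)
scalar did = (m_near81 - m_far81) - (m_near78 - m_far78)
display "DID from cell means: " did
```

Because model (3) has one parameter for each of the four cells, this double difference should coincide with the OLS coefficient on y81nrinc.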

### (d) We may improve the estimation by replacing nearinc by the variable dist, which is the distance from a house to the incinerator site, in feet. Consider the model

$$\log \left( price \right) = \beta_0 + \delta_0 \cdot y81 + \beta_1 \cdot \log \left( dist \right) + \delta_1 \cdot y81 \cdot \log \left( dist \right) + \epsilon \tag{4}$$

### If building the incinerator reduces the value of homes closer to the site, what is the sign of $\delta_1$? What does it mean if $\beta_1>0$?

#### Other things equal, houses farther from the incinerator should be worth more, so $\delta_1>0$. If $\beta_1>0$, then the incinerator was located farther away from more expensive homes.
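The sign argument can be read off the elasticity of price with respect to distance implied by model (4), shown here as a short worked step:

$$ \frac{\partial \log(price)}{\partial \log(dist)} = \beta_1 + \delta_1 \cdot y81 $$

The elasticity is $\beta_1$ in 1978 and $\beta_1 + \delta_1$ in 1981. A positive $\delta_1$ therefore means that distance from the site became more valuable once the incinerator was announced, while $\beta_1$ measures how price varied with distance before any incinerator effect.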

### (e) Estimate model (4) and report the results. Interpret the coefficient on y81·log(dist). What do you conclude?

In [6]:
reg lprice y81 ldist y81ldist


      Source |       SS           df       MS      Number of obs   =       321
-------------+----------------------------------   F(3, 317)       =     69.22
       Model |  24.3172548         3  8.10575159   Prob > F        =    0.0000
    Residual |  37.1217306       317  .117103251   R-squared       =    0.3958
-------------+----------------------------------   Adj R-squared   =    0.3901
       Total |  61.4389853       320  .191996829   Root MSE        =     .3422

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y81 |  -.0113101   .8050622    -0.01   0.989     -1.59525     1.57263
       ldist |    .316689   .0515323     6.15   0.000     .2153005    .4180775
    y81ldist |   .0481862   .0817929     0.59   0.556    -.1127394    .2091117
       _cons |   8.058468   .5084358    15.85   0.

The AIC and BIC of this model are:

In [7]:
estat ic


Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
-------------+---------------------------------------------------------------
           . |        321 -190.1091  -109.2425       4    226.4849   241.5707
-----------------------------------------------------------------------------
               Note: N=Obs used in calculating BIC; see [R] BIC note.


#### While $\hat\delta_1 = 0.048$ has the expected sign, it is not statistically significant ($t \approx 0.59$, $p = 0.556$).

### (f) Add age, $age^2$, rooms, baths, log(intst), log(land), and log(area) to (4), where intst is the distance to the interstate in feet (for other regressors, see the variable description). Now, what do you conclude about the effect of the incinerator on housing values?

In [8]:
reg lprice y81 ldist y81ldist age agesq rooms baths lintst lland larea


      Source |       SS           df       MS      Number of obs   =       321
-------------+----------------------------------   F(10, 310)      =    114.55
       Model |   48.353762        10   4.8353762   Prob > F        =    0.0000
    Residual |  13.0852234       310  .042210398   R-squared       =    0.7870
-------------+----------------------------------   Adj R-squared   =    0.7802
       Total |  61.4389853       320  .191996829   Root MSE        =    .20545

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y81 |  -.2254466   .4946914    -0.46   0.649    -1.198824    .7479309
       ldist |   .0009226   .0446168     0.02   0.984    -.0868674    .0887125
    y81ldist |   .0624668   .0502788     1.24   0.215     -.036464    .1613976
         age |  -.0080075   .0014173    -5.65   0.

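The AIC and BIC of this model are far smaller than those of model (4), which has the same dependent variable (lprice) and sample, so both criteria prefer this specification: among these candidates it is the better model for prediction and the one more likely to match the population model. The values for model (3) are not directly comparable, because its dependent variable is lrprice rather than lprice.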

In [9]:
estat ic


Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
-------------+---------------------------------------------------------------
           . |        321 -190.1091   58.11391      11   -94.22782  -52.74197
-----------------------------------------------------------------------------
               Note: N=Obs used in calculating BIC; see [R] BIC note.


#### When we add the list of housing characteristics to the regression, the coefficient on y81·log(dist) becomes 0.062 (se = 0.050). So the estimated effect is larger: the elasticity of price with respect to dist rises by about 0.062 after the incinerator site was chosen. Its t statistic, however, is only 1.24. The p-value for the one-sided alternative $H_1: \delta_1 > 0$ is about 0.108, which is close to being significant at the 10% level.
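The one-sided p-value quoted above can be recovered from the reported coefficient and standard error. The sketch below copies both from the y81ldist row of the regression output and uses Stata's `ttail()` function for the upper-tail probability; the scalar names are illustrative only.

```stata
* Values copied from the y81ldist row of the regression output
scalar tstat = .0624668 / .0502788
* Residual degrees of freedom from the same output
scalar dfres = 310

* Upper-tail probability for H1: delta_1 > 0
scalar p_one = ttail(dfres, tstat)
display "t = " tstat "   one-sided p = " p_one
```

Since the estimate is positive, this one-sided p-value is half of the two-sided 0.215 reported by `regress`.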

### (g) Why is the coefficient on log(dist) positive and statistically significant in part (e) but not in part (f)? What does this say about the controls used in part (f)?

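#### In part (e), the coefficient on log(dist) is positive and statistically significant (0.317, $t = 6.15$), while in part (f) it is essentially zero (0.0009, $t = 0.02$). A likely explanation is that houses farther from the site were more valuable for reasons unrelated to the incinerator, and that the controls used in part (f) sufficiently capture this location effect.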

