## Introduction

This in-class example demonstrates how to model a time trend and how to estimate static and finite distributed lag models with time series data.

What you need to know:  
- The statsmodels and pandas modules in Python
- Theoretical concepts of time series regression models

See the list of [references](#References) for the concepts and techniques used in this exercise.
***

## Content
- [Model a Time Trend](#Model-a-Time-Trend)
- [Static Housing Investment Model](#Estimate-a-Static-Housing-Investment-Model)
- [Static Housing Investment Model with Time Trend](#Estimate-a-Static-Housing-Investment-Model-with-Time-Trend)
- [Finite Distributed Lag Model](#Estimate-a-Finite-Distributed-Lag-Model) 
- [Finite Distributed Lag Model with Time Trend](#Estimate-a-Finite-Distributed-Lag-Model-with-Time-Trend)
- [References](#References)

***
## Data Description

The data set is contained in a comma-separated values (CSV) file named ```HSEINV.csv``` with column headers.

The data is a set of annual observations on housing investment and a housing price index in the United States for 1947 through 1988.

The variables are described as follows:

| Name | Description |
| :--- | :--- |
| year     | 1947-1988 |
| inv      | real housing inv, millions $ |
| pop      | population, 1000s |
| price    | housing price index; 1982 = 1 |
| linv     | log(inv) |
| lpop     | log(pop) |
| lprice   | log(price) |
| t        | time trend: t = 1, ..., 42 |
| invpc    | per capita inv: inv/pop |
| linvpc   | log(invpc) |
| lprice_1 | lprice *last* period |
| linvpc_1 | linvpc *last* period |
| gprice   | lprice - lprice_1 |
| ginvpc   | linvpc - linvpc_1 |

***
## Load the required modules

```python
import numpy as np
import pandas as pd
import statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
```

***
## Import the data set

#### Load the data set into Python
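One possible way to load the file is `pandas.read_csv`. The sketch below reads from an in-memory stand-in so that it runs without the file; the three rows are made-up placeholder values, not the real data, and only a few of the columns are included.

```python
import io

import pandas as pd

# Stand-in for HSEINV.csv so the sketch runs anywhere.
# These rows are made-up placeholder values, NOT the real data.
sample_csv = io.StringIO(
    "year,inv,pop,price\n"
    "1949,120000,149000,0.85\n"
    "1947,100000,144000,0.80\n"
    "1948,110000,146000,0.82\n"
)

# In the exercise, pass the file path instead: pd.read_csv("HSEINV.csv")
df = pd.read_csv(sample_csv)

print(df.head())   # first rows, to check that the headers were parsed
print(df.dtypes)   # column types inferred by pandas
```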

#### Sort the data in ascending order by year
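Lags and differences depend on row order, so the rows should be in chronological order first. A minimal sketch, assuming the data frame is called `df` and using made-up placeholder rows:

```python
import pandas as pd

# Made-up placeholder rows in shuffled order (not the real data)
df = pd.DataFrame({"year": [1949, 1947, 1948], "price": [0.85, 0.80, 0.82]})

# Sort by year, then rebuild the row index so that row 0 is the earliest year
df = df.sort_values("year").reset_index(drop=True)

print(df)
```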

#### Generate time index

Create a new variable ```t``` such that $t=0$ in the first period.
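A minimal sketch of one way to build the index, assuming the rows are already sorted by year (the placeholder frame below stands in for the real data):

```python
import numpy as np
import pandas as pd

# Placeholder frame, already sorted by year (not the real data)
df = pd.DataFrame({"year": [1947, 1948, 1949, 1950]})

# Row position as the time index: 0 in the first period, then 1, 2, ...
df["t"] = np.arange(len(df))

# With annual data and no gaps, this gives the same result:
# df["t"] = df["year"] - df["year"].min()
print(df)
```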

#### Generate lag variable

Create a new column in the data set named ```lprice_1```, such that $lprice\_1 = \log(price_{t-1})$.
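The lag can be sketched with `Series.shift`, which moves each value down one row. The prices below are made-up placeholders, and the sketch assumes the rows are sorted by year:

```python
import numpy as np
import pandas as pd

# Placeholder prices, sorted by year (not the real data)
df = pd.DataFrame({"year": [1947, 1948, 1949, 1950],
                   "price": [0.80, 0.82, 0.85, 0.90]})
df["lprice"] = np.log(df["price"])

# shift(1) pairs each row with the previous row's value;
# the first period has no predecessor, so its lag is NaN
df["lprice_1"] = df["lprice"].shift(1)

print(df)
```

The missing value in the first row means that any regression using ```lprice_1``` loses one observation.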

#### Generate "*first-differencing*" variables

Create a new column named ```gprice```, such that $gprice = \log(price_t) - \log(price_{t-1})$
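A minimal sketch of the first difference, again on made-up placeholder prices. Because it is a difference of logs, ```gprice``` approximates the proportional change in price from one year to the next.

```python
import numpy as np
import pandas as pd

# Placeholder prices, sorted by year (not the real data)
df = pd.DataFrame({"price": [0.80, 0.82, 0.85, 0.90]})
df["lprice"] = np.log(df["price"])

# Current log price minus last period's log price
df["gprice"] = df["lprice"] - df["lprice"].shift(1)

# Series.diff() computes the same quantity in one call
same_result = df["lprice"].diff()

print(df)
```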

#### Compare the series of ```lprice``` with ```gprice``` 

***
## Model a Time Trend

#### Estimate the time trend of log housing investment
$$linvpc_t = \beta_0 + \beta_1 t + u_t$$
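One way to fit this regression is the statsmodels formula interface. The sketch below runs on simulated data (the intercept, slope and noise level are arbitrary assumptions), so its estimates say nothing about the real series.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the real data: 42 periods around a linear trend.
# The numbers -0.9, 0.008 and 0.1 are arbitrary choices for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({"t": np.arange(42)})
df["linvpc"] = -0.9 + 0.008 * df["t"] + rng.normal(scale=0.1, size=len(df))

# Regress linvpc on the time index; the formula adds an intercept by default
trend_fit = smf.ols("linvpc ~ t", data=df).fit()

print(trend_fit.params)   # estimated intercept and trend coefficient
```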

#### Get the estimation results

How would you interpret $\beta_1$?

At the 5% significance level, what would you conclude about $\beta_1$?
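A two-sided test of $H_0: \beta_1 = 0$ can be read off the fitted results through the p-value or the confidence interval. The helper `two_sided_decision` below is hypothetical (it is not part of statsmodels), and the data are simulated with a deliberately strong trend, so the outcome does not carry over to the real data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def two_sided_decision(result, name, alpha=0.05):
    """Hypothetical helper: test H0 that the coefficient `name` is zero."""
    p_value = float(result.pvalues[name])
    ci_low, ci_high = result.conf_int(alpha=alpha).loc[name]
    return {
        "estimate": float(result.params[name]),
        "p_value": p_value,
        "ci": (float(ci_low), float(ci_high)),
        "reject_h0": p_value < alpha,   # reject when p-value is below alpha
    }


# Simulated data with a strong trend (arbitrary slope 0.05, noise sd 0.1)
rng = np.random.default_rng(0)
df = pd.DataFrame({"t": np.arange(42)})
df["linvpc"] = 0.05 * df["t"] + rng.normal(scale=0.1, size=len(df))

decision = two_sided_decision(smf.ols("linvpc ~ t", data=df).fit(), "t")
print(decision)
```

Rejecting at level `alpha` is equivalent to zero lying outside the `1 - alpha` confidence interval.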

#### Plot the time series of log housing investment (predicted vs actual)
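A minimal plotting sketch, assuming simulated data in place of the real series. The fitted values come from the `fittedvalues` attribute of the results object; the `matplotlib.use("Agg")` line is only there so the sketch also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the real data (arbitrary slope and noise level)
rng = np.random.default_rng(0)
df = pd.DataFrame({"year": np.arange(1947, 1989)})
df["t"] = df["year"] - df["year"].min()
df["linvpc"] = 0.01 * df["t"] + rng.normal(scale=0.1, size=len(df))

trend_fit = smf.ols("linvpc ~ t", data=df).fit()
df["linvpc_hat"] = trend_fit.fittedvalues   # in-sample predictions

# Actual series as a solid line, predicted series as a dashed line
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(df["year"], df["linvpc"], label="actual")
ax.plot(df["year"], df["linvpc_hat"], linestyle="--", label="predicted")
ax.set_xlabel("year")
ax.set_ylabel("linvpc")
ax.legend()
```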

***
## Estimate a Static Housing Investment Model

Consider a log-log specification of the housing investment model:
$$linvpc_t = \beta_0 + \beta_1 lprice_t + u_t$$

#### Estimate the model

#### Get the estimation results

How would you interpret $\beta_1$?

At the 5% significance level, what would you conclude about $\beta_1$?

#### Plot the time series of log housing investment (predicted vs actual)

***
## Estimate a Static Housing Investment Model with Time Trend

Consider a log-log specification of the housing investment model:
$$linvpc_t = \beta_0 + \beta_1 lprice_t + \beta_2 t + u_t$$
where $t$ is the time trend.

#### Estimate the model
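Adding `t` to the formula is enough to include the trend. The sketch below uses simulated series that both trend upward but are otherwise unrelated (slopes and noise levels are arbitrary assumptions), to illustrate how omitting the trend can produce a large price coefficient that mostly disappears once the trend is controlled for.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: both series trend upward, but the true effect of
# lprice on linvpc is zero by construction (arbitrary slopes and noise)
rng = np.random.default_rng(2)
t = np.arange(42)
df = pd.DataFrame({
    "t": t,
    "lprice": 0.01 * t + rng.normal(scale=0.05, size=42),
    "linvpc": 0.02 * t + rng.normal(scale=0.05, size=42),
})

no_trend = smf.ols("linvpc ~ lprice", data=df).fit()
with_trend = smf.ols("linvpc ~ lprice + t", data=df).fit()

# Side-by-side coefficients; "t" is missing (NaN) in the first column
print(pd.DataFrame({"no trend": no_trend.params,
                    "with trend": with_trend.params}))
```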

#### Get the estimation results

At the 5% significance level, what would you conclude about $\beta_1$?

#### Plot the time series of log housing investment (predicted vs actual)

***
## Estimate a Finite Distributed Lag Model

Consider a log-log specification of the housing investment model:
$$linvpc_t = \beta_0 + \beta_1 lprice_t + \beta_2 lprice_{t-1} + u_t$$

#### Estimate the model

#### Get the estimation results

How would you interpret $\beta_1$?

How would you interpret $\beta_2$?
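In a finite distributed lag model, $\beta_1$ is the impact effect and $\beta_1 + \beta_2$ is the long-run effect of a permanent change in the regressor. A minimal sketch on simulated data (the coefficients 0.5 and 0.3 are arbitrary assumptions) that estimates the model and tests the sum of the two coefficients:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the real data (arbitrary coefficients and noise)
rng = np.random.default_rng(3)
df = pd.DataFrame({"lprice": rng.normal(scale=0.1, size=42)})
df["lprice_1"] = df["lprice"].shift(1)   # NaN in the first period
df["linvpc"] = (0.5 * df["lprice"] + 0.3 * df["lprice_1"]
                + rng.normal(scale=0.1, size=42))

# The formula interface drops rows with missing values, so the first
# period is excluded from the estimation sample
fdl_fit = smf.ols("linvpc ~ lprice + lprice_1", data=df).fit()

# Long-run effect: sum of the coefficients on the current and lagged price
lrp = fdl_fit.params["lprice"] + fdl_fit.params["lprice_1"]

# t test of H0: beta_1 + beta_2 = 0, which also reports a standard error
lrp_test = fdl_fit.t_test("lprice + lprice_1 = 0")
print(lrp_test)
```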

#### Plot the time series of log housing investment (predicted vs actual)

***
## Estimate a Finite Distributed Lag Model with Time Trend

Consider a log-log specification of the housing investment model:
$$linvpc_t = \beta_0 + \beta_1 lprice_t + \beta_2 lprice_{t-1} + \beta_3 t + u_t$$
where $t$ is the time trend.

#### Estimate the model
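Including `t` in the regression gives the same slope estimates as first removing a linear trend from every variable and then regressing the detrended series on each other. The sketch below checks this equivalence on simulated data (all slopes and noise levels are arbitrary assumptions).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the real data (arbitrary slopes and noise)
rng = np.random.default_rng(4)
t = np.arange(42)
df = pd.DataFrame({"t": t,
                   "lprice": 0.01 * t + rng.normal(scale=0.05, size=42)})
df["lprice_1"] = df["lprice"].shift(1)
df["linvpc"] = 0.02 * t + rng.normal(scale=0.05, size=42)

# Keep one common sample: the first period has no lagged price
df = df.dropna().reset_index(drop=True)

# Route 1: include the trend directly in the regression
fdl_trend_fit = smf.ols("linvpc ~ lprice + lprice_1 + t", data=df).fit()

# Route 2: detrend each variable (residuals from a regression on t),
# then regress the detrended series on each other
detrended = pd.DataFrame({
    name: smf.ols(f"{name} ~ t", data=df).fit().resid
    for name in ["linvpc", "lprice", "lprice_1"]
})
detrended_fit = smf.ols("linvpc ~ lprice + lprice_1", data=detrended).fit()

print(fdl_trend_fit.params[["lprice", "lprice_1"]])
print(detrended_fit.params[["lprice", "lprice_1"]])
```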

#### Get the estimation results

How would you interpret $\beta_1$?

How would you interpret $\beta_2$?

#### Plot the time series of log housing investment (predicted vs actual)

***
## References

- Jeffrey M. Wooldridge (2019). "Introductory Econometrics: A Modern Approach, 7e" Chapter 10.

- The pandas development team (2020). "[pandas-dev/pandas: Pandas](https://pandas.pydata.org/)." Zenodo.

- Seabold, Skipper, and Josef Perktold (2010). "[statsmodels: Econometric and statistical modeling with python](https://www.statsmodels.org/stable/examples/notebooks/generated/ols.html)." Proceedings of the 9th Python in Science Conference.