## Introduction

This in-class example demonstrates how to calculate heteroskedasticity-robust standard errors and how to test for the presence of heteroskedasticity.

What you need to know:  
- The Statsmodels and pandas modules in Python
- The theory of the linear regression model
- The theory of heteroskedasticity

See the list of [references](#References) for details on the concepts and techniques used in this exercise.

***

## Data Description

The data set is stored in a comma-separated values (CSV) file named ```hprice1.csv```, with column headers in the first row.

The variables are described as follows:
```
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
price           float  %9.0g                  house price, $1000s
assess          float  %9.0g                  assessed value, $1000s
bdrms           byte   %9.0g                  number of bdrms
lotsize         float  %9.0g                  size of lot in square feet
sqrft           int    %9.0g                  size of house in square feet
colonial        byte   %9.0g                  =1 if home is colonial style
lprice          float  %9.0g                  log(price)
lassess         float  %9.0g                  log(assess)
llotsize        float  %9.0g                  log(lotsize)
lsqrft          float  %9.0g                  log(sqrft)
-------------------------------------------------------------------------------
 ```

***
## Load the required modules

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm                 # core models and tools
import statsmodels.formula.api as smf        # R-style formula interface for OLS
import statsmodels.stats.diagnostic as smd   # heteroskedasticity tests
import matplotlib.pyplot as plt              # plotting
```

***
## Import the data set

#### Load the data set into Python
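The loading step can be sketched with `pandas.read_csv`, assuming `hprice1.csv` sits in the working directory and has a header row. The helper name `load_hprice` and the constant `HPRICE_COLUMNS` are hypothetical, introduced only for illustration; the column check simply confirms that every variable listed in the data description is present.

```python
import pandas as pd

# Variable names taken from the data description table
HPRICE_COLUMNS = ["price", "assess", "bdrms", "lotsize", "sqrft",
                  "colonial", "lprice", "lassess", "llotsize", "lsqrft"]

def load_hprice(source="hprice1.csv"):
    """Read the CSV (first row = column headers) and check the expected columns."""
    df = pd.read_csv(source)
    missing = [c for c in HPRICE_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"data set is missing columns: {missing}")
    return df
```

Usage, assuming the file is available locally: `df = load_hprice("hprice1.csv")`, then `df.head()` to inspect the first rows.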

***
## Estimate a House Price Model

Consider a house price model in terms of the levels:
$$price = \beta_0 + \beta_1 lotsize + \beta_2 sqrft + \beta_3 bdrms + u$$

### Model under the assumption of *homoskedasticity*

#### Estimate the model
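A minimal sketch of the estimation step, assuming the data sit in a pandas DataFrame `df` with the columns `price`, `lotsize`, `sqrft` and `bdrms`. It uses the Statsmodels formula interface (`smf.ols`), whose default `fit()` computes the usual non-robust covariance that is valid under homoskedasticity. The function name `fit_homoskedastic` is a hypothetical helper.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Level model: price on lotsize, sqrft and bdrms (an intercept is added automatically)
FORMULA = "price ~ lotsize + sqrft + bdrms"

def fit_homoskedastic(df: pd.DataFrame):
    """OLS with the default non-robust covariance, which assumes homoskedastic errors."""
    return smf.ols(FORMULA, data=df).fit()
```

With `df` holding the contents of `hprice1.csv`, `results = fit_homoskedastic(df)` returns a results object whose `cov_type` is `"nonrobust"`.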

#### Get the estimation results
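`results.summary()` prints the full regression report. The sketch below, which assumes the same DataFrame layout as above and uses the hypothetical helper name `estimation_results`, also collects the key columns (coefficient, standard error, t statistic, p-value) into a DataFrame so they can be compared later with the robust estimates.

```python
import pandas as pd
import statsmodels.formula.api as smf

def estimation_results(df: pd.DataFrame):
    """Fit the level model by OLS and return (results, coefficient table)."""
    results = smf.ols("price ~ lotsize + sqrft + bdrms", data=df).fit()
    # Usual (non-robust) inference: standard errors assume homoskedasticity
    table = pd.DataFrame({
        "coef": results.params,
        "std err": results.bse,
        "t": results.tvalues,
        "P>|t|": results.pvalues,
    })
    return results, table
```

Typical use: `results, table = estimation_results(df)` followed by `print(results.summary())`.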

***
### Test for heteroskedasticity

We use a few attributes of the Statsmodels linear regression [results](https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.RegressionResults.html) object:
- ```resid``` for the residuals of the estimated model
- ```model.exog``` for the matrix of exogenous regressors used in the estimation (including the constant column when the model has an intercept)

Label the test statistics. Later we use the ```zip()``` function to pair each label with its test statistic.

```python
labels = ['LM Statistic', 'LM-Test p-value', 'F-Statistic', 'F-Test p-value']
```

#### Breusch-Pagan test

Estimate the equation
$$\hat{u}^2 = \delta_0 + \delta_1 x_1 + \ldots + \delta_k x_k + error$$

Null hypothesis: 
$$H_0: \delta_1 = \delta_2 = \ldots = \delta_k = 0$$

#### White test

For a model that contains $k=3$ independent variables, estimate
$$\hat{u}^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_3 + \delta_4 x_1^2 + \delta_5 x_2^2 + \delta_6 x_3^2 + \delta_7 x_1 x_2 + \delta_8 x_1 x_3 + \delta_9 x_2 x_3 + error$$

Null hypothesis:
$$H_0: \delta_1 = \delta_2 = \ldots = \delta_9 = 0$$

#### White test (specific form)

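Because $\hat{y}$ is a linear function of all the regressors, $\hat{y}^2$ is a particular function of all their squares and cross products. The special form therefore replaces them with $\hat{y}$ and $\hat{y}^2$, which saves degrees of freedom: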
$$\hat{u}^2 = \delta_0 + \delta_1 \hat{y} + \delta_2 \hat{y}^2 + error$$

Null hypothesis:
$$H_0: \delta_1 = 0, \delta_2 = 0$$

Create the set of regressors

Conduct the test:

***
### Create a scatter plot to visualize the heteroskedastic variance
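The section does not say which variables to plot; one common choice, assumed here, is the residuals against the fitted values. A fan shape, with the spread of the residuals changing as the fitted price changes, is a visual sign of heteroskedasticity. The helper name `plot_residuals` is hypothetical.

```python
import matplotlib.pyplot as plt

def plot_residuals(results, ax=None):
    """Scatter the OLS residuals against the fitted values."""
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(results.fittedvalues, results.resid, s=12, alpha=0.7)
    ax.axhline(0.0, color="grey", linewidth=1)  # reference line at zero
    ax.set_xlabel("Fitted value of price")
    ax.set_ylabel("Residual")
    ax.set_title("Residuals vs fitted values")
    return ax
```

In a notebook, `plot_residuals(results)` followed by `plt.show()` displays the figure. Plotting against a single regressor such as `sqrft` instead is an equally reasonable variant.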

***
### Model under the assumption of *heteroskedasticity*

#### Estimate the model
For the heteroskedasticity-robust standard errors of MacKinnon and White (1985), set the covariance type with ```cov_type='HC1'``` in the ```fit()``` method of the OLS model.
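A minimal sketch, assuming the same DataFrame layout as before and the hypothetical helper name `fit_robust`. The point estimates are identical to plain OLS; only the covariance changes, to the HC1 sandwich estimator $\frac{n}{n-k-1}(X'X)^{-1}\left(\sum_i \hat{u}_i^2 x_i' x_i\right)(X'X)^{-1}$, where $x_i$ is the $i$-th row of the regressor matrix $X$.

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_robust(df: pd.DataFrame):
    """OLS point estimates with the HC1 heteroskedasticity-robust covariance."""
    return smf.ols("price ~ lotsize + sqrft + bdrms", data=df).fit(cov_type="HC1")
```

As far as we know, Statsmodels bases inference on the normal distribution when a robust `cov_type` is used; passing `use_t=True` to `fit()` requests t-based p-values instead.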

#### Get the estimation results
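To see what changes under heteroskedasticity, it helps to place the usual and robust standard errors side by side. This sketch, with the hypothetical helper name `compare_standard_errors`, fits the same model object twice; the coefficients are shared, and only the standard errors and p-values differ.

```python
import pandas as pd
import statsmodels.formula.api as smf

def compare_standard_errors(df: pd.DataFrame) -> pd.DataFrame:
    """Coefficients with usual and HC1-robust standard errors side by side."""
    model = smf.ols("price ~ lotsize + sqrft + bdrms", data=df)
    usual = model.fit()                    # assumes homoskedasticity
    robust = model.fit(cov_type="HC1")     # heteroskedasticity-robust
    return pd.DataFrame({
        "coef": usual.params,
        "usual se": usual.bse,
        "robust se (HC1)": robust.bse,
        "robust p-value": robust.pvalues,
    })
```

`print(compare_standard_errors(df))` then shows whether any conclusion about a coefficient's significance depends on the homoskedasticity assumption.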

***
## References
- Jeffrey M. Wooldridge (2012). "Introductory Econometrics: A Modern Approach, 5e" Chapter 8.
    
- Seabold, Skipper, and Josef Perktold (2010). "[statsmodels: Econometric and statistical modeling with python](https://www.statsmodels.org/stable/examples/notebooks/generated/ols.html)." Proceedings of the 9th Python in Science Conference.