## Introduction

This in-class example focuses on practical issues in estimating multiple linear regression models.

What you need to know:
- The statsmodels and pandas modules in Python
- Hands-on experience with multiple linear regression models
- Theoretical concepts behind various regression techniques

The concepts and techniques used in this exercise are covered in detail in the [references](#References).
***

## Content
- [Include an Interaction Term](#Include-an-Interaction-Term)
- [Reformulate the Model with Demeaned Interactive Term](#Reformulate-the-Model-with-Demeaned-Interactive-Term)
- [Log-Log Linear Regression Model](#Log-Log-Linear-Regression-Model) 
- [References](#References)

***

## Data Description

The data set is stored in a comma-separated values (CSV) file named `hprice1.csv`, with column headers.

The variables are described as follows:

| Name | Description |
| :--- | :--- |
| price    | house price, \$1000s |
| assess   | assessed value, \$1000s |
| bdrms    | number of bedrooms |
| lotsize  | size of lot in square feet |
| sqrft    | size of house in square feet |
| colonial | =1 if home is colonial style |
| lprice   | log(price) |
| lassess  | log(assess) |
| llotsize | log(lotsize) |
| lsqrft   | log(sqrft) |

***
## Load the required modules

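```python
# Standard library and numerical stack
import math
import numpy as np
import pandas as pd

# statsmodels: array-based API (sm) and R-style formula API (smf)
import statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf
```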

***
## Load Data and Check

#### Load the data set
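A minimal sketch of the loading step, assuming `hprice1.csv` sits in the working directory. The helper name `load_hprice` and the column check are illustrative additions, not part of the original notebook; the expected column names come from the data description table above.

```python
import pandas as pd

# Column names taken from the data description table
EXPECTED_COLUMNS = ["price", "assess", "bdrms", "lotsize", "sqrft",
                    "colonial", "lprice", "lassess", "llotsize", "lsqrft"]

def load_hprice(source="hprice1.csv"):
    """Read the housing data (hypothetical helper).

    `source` may be a file path or any file-like object accepted by pd.read_csv.
    Raises ValueError if a documented column is missing.
    """
    df = pd.read_csv(source)
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df
```

Usage: `df = load_hprice("hprice1.csv")`. Failing early on a missing column gives a clearer error than a formula that later refers to a nonexistent variable.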

#### Check if the data is properly imported
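One way to check the import, sketched with standard pandas calls: look at the shape, the first rows, the column types, and the count of missing values. The function name `check_data` is hypothetical.

```python
import pandas as pd

def check_data(df: pd.DataFrame) -> pd.Series:
    """Print quick sanity checks and return the missing-value count per column."""
    print(df.shape)    # (number of rows, number of columns)
    print(df.head())   # first five rows, to eyeball the values
    print(df.dtypes)   # every column in this data set should be numeric
    missing = df.isna().sum()
    print(missing)     # non-zero entries point to incomplete observations
    return missing
```

Usage: `check_data(df)` right after loading. `df.describe()` is a natural follow-up for spotting implausible values.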

***
## Include an Interaction Term

Following the housing example in the slides, consider a model with the following functional form:
$$price = \beta_0 + \beta_1 sqrft + \beta_2 bdrms + \beta_3 sqrft \cdot bdrms + u$$

Partial effect of $bdrms$ on $price$, holding all other variables fixed:
$$ \frac{\Delta price}{\Delta bdrms} = \beta_2 + \beta_3 sqrft $$
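Because the partial effect depends on $sqrft$, it is usually reported at a few house sizes. A small sketch of the formula above; the function name and the coefficient values in the usage line are hypothetical, not estimates from `hprice1.csv`.

```python
import numpy as np

def bdrms_partial_effect(beta2, beta3, sqrft):
    """Partial effect of bdrms on price: beta2 + beta3 * sqrft (vectorized over sqrft)."""
    return beta2 + beta3 * np.asarray(sqrft, dtype=float)

# Hypothetical coefficients, evaluated at three house sizes
effects = bdrms_partial_effect(-20.0, 0.02, [1500, 2000, 2500])
print(effects)
```

With real estimates, `beta2` and `beta3` would be the fitted coefficients on $bdrms$ and on the interaction term.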

Estimate the model with the interaction term.

#### Method 1: Generate a new interaction term
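A sketch of Method 1, assuming `df` holds the columns from the data description. The product is stored in a new column (the name `sqrft_bdrms` and the function name are hypothetical) and then enters the formula like any other regressor.

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_with_manual_interaction(df: pd.DataFrame):
    """Method 1: build sqrft * bdrms as an explicit column, then run OLS."""
    data = df.copy()  # leave the caller's DataFrame unchanged
    data["sqrft_bdrms"] = data["sqrft"] * data["bdrms"]
    return smf.ols("price ~ sqrft + bdrms + sqrft_bdrms", data=data).fit()
```

Usage: `res1 = fit_with_manual_interaction(df)` followed by `print(res1.summary())`.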

#### Method 2.1: Use Statsmodels Built-in Formula (*)
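In the formula language used by statsmodels, `a * b` expands to `a + b + a:b`, so the main effects and the interaction come from one term and no extra column is needed. A sketch, with a hypothetical function name:

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_with_star_formula(df: pd.DataFrame):
    """Method 2.1: 'sqrft * bdrms' adds sqrft, bdrms and their product sqrft:bdrms."""
    return smf.ols("price ~ sqrft * bdrms", data=df).fit()
```

The interaction coefficient appears in the output under the name `sqrft:bdrms`, and the estimates match those from Method 1.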

#### Method 2.2: Use Statsmodels Built-in Formula (:)
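The `:` operator adds only the product term, so the main effects must be listed explicitly. Omitting them would fit a different model. A sketch, with a hypothetical function name:

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_with_colon_formula(df: pd.DataFrame):
    """Method 2.2: list the main effects, then add the pure interaction sqrft:bdrms."""
    return smf.ols("price ~ sqrft + bdrms + sqrft:bdrms", data=df).fit()
```

This spells out exactly what `sqrft * bdrms` expands to, so the coefficients agree with Methods 1 and 2.1.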

***
## Reformulate the Model with Demeaned Interactive Term

Consider the following model:
$$ price = \alpha_0 + \delta_1 sqrft + \delta_2 bdrms + \delta_3 (sqrft - \overline{sqrft}) \cdot (bdrms - \overline{bdrms}) + u$$

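The partial effect of $bdrms$ is
$$\frac{\Delta price}{\Delta bdrms} = \delta_2 + \delta_3 (sqrft - \overline{sqrft})$$
At $sqrft = \overline{sqrft}$ the second term vanishes, so $\delta_2$ is the partial effect of $bdrms$ for a house of average size.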
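Expanding the demeaned product shows how this model relates to the interaction model with coefficients $\beta_0, \dots, \beta_3$. The derivation treats $\overline{sqrft}$ and $\overline{bdrms}$ as fixed sample means:

$$(sqrft - \overline{sqrft}) \cdot (bdrms - \overline{bdrms}) = sqrft \cdot bdrms - \overline{bdrms}\, sqrft - \overline{sqrft}\, bdrms + \overline{sqrft}\,\overline{bdrms}$$

Substituting, collecting terms, and matching coefficients with $price = \beta_0 + \beta_1 sqrft + \beta_2 bdrms + \beta_3 sqrft \cdot bdrms + u$ gives

$$\alpha_0 = \beta_0 - \beta_3 \overline{sqrft}\,\overline{bdrms}, \quad \delta_1 = \beta_1 + \beta_3 \overline{bdrms}, \quad \delta_2 = \beta_2 + \beta_3 \overline{sqrft}, \quad \delta_3 = \beta_3$$

Both sets of regressors span the same space, so the OLS estimates satisfy these identities exactly in the sample. The fitted values and the interaction coefficient are unchanged; only the intercept and the main-effect coefficients are re-expressed.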

#### Demean the required variables
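A sketch of the demeaning step, assuming `df` contains `sqrft` and `bdrms`. Each variable's sample mean is subtracted and the centered versions are multiplied. The new column names (`sqrft_dm`, `bdrms_dm`, `sqrft_bdrms_dm`) and the function name are hypothetical.

```python
import pandas as pd

def add_demeaned_interaction(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with centered sqrft, centered bdrms, and their product."""
    data = df.copy()  # keep the original data untouched
    data["sqrft_dm"] = data["sqrft"] - data["sqrft"].mean()
    data["bdrms_dm"] = data["bdrms"] - data["bdrms"].mean()
    data["sqrft_bdrms_dm"] = data["sqrft_dm"] * data["bdrms_dm"]
    return data
```

Only the interaction term is built from the centered variables. The main effects $sqrft$ and $bdrms$ stay in their original units, as in the model above.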

#### Estimate the regression model
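A self-contained sketch of the estimation step. It repeats the centering so that it runs on its own, and the function and column names are hypothetical. If the reformulation behaves as described, the `bdrms` coefficient here should equal the `bdrms` coefficient from the uncentered interaction model plus the interaction coefficient times the mean of `sqrft`.

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_demeaned_model(df: pd.DataFrame):
    """OLS of price on sqrft, bdrms, and the product of their demeaned versions."""
    data = df.copy()
    data["sqrft_bdrms_dm"] = ((data["sqrft"] - data["sqrft"].mean())
                              * (data["bdrms"] - data["bdrms"].mean()))
    return smf.ols("price ~ sqrft + bdrms + sqrft_bdrms_dm", data=data).fit()
```

Usage: `res_dm = fit_demeaned_model(df)`. The coefficient on `bdrms` together with its standard error then directly describes the effect of an extra bedroom for an average-sized house, with no need to combine coefficients by hand.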

***
## References

- Wooldridge, Jeffrey M. (2019). "Introductory Econometrics: A Modern Approach," 7th edition, Chapter 6.

- The pandas development team (2020). "[pandas-dev/pandas: Pandas](https://pandas.pydata.org/)." Zenodo.
    
- Seabold, Skipper, and Josef Perktold (2010). "[statsmodels: Econometric and statistical modeling with python](https://www.statsmodels.org/stable/examples/notebooks/generated/ols.html)." Proceedings of the 9th Python in Science Conference.