## Introduction

This in-class example demonstrates how the results of your data analysis can change with the functional form of the regression model.

What you need to know:  
- The statsmodels and pandas modules in Python
- Theoretical concepts of statistical moments
- Theoretical concepts of the simple linear regression model

See the list of [references](#References) for the concepts and techniques used in this exercise.
***

## Content
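- [Data Description](#Data-Description)
- [Load the required modules](#Load-the-required-modules)
- [Data check and summary statistics](#Data-check-and-summary-statistics)
- [Simple Linear Regression Model on levels](#Simple-Linear-Regression-Model-on-levels)
- [Simple Linear Regression Model (log-log)](#Simple-Linear-Regression-Model)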
- [References](#References)

***
## Data Description

The data set is contained in a comma-separated value (csv) file named ``hprice1.csv`` with column headers.

The variables are described as follows:

| Name | Description |
| :--- | :--- |
| price    | house price, \$1000s |
| assess   | assessed value, \$1000s |
| bdrms    | number of bedrooms |
| lotsize  | size of lot in square feet |
| sqrft    | size of house in square feet |
| colonial | =1 if home is colonial style |
| lprice   | log(price) |
| lassess  | log(assess) |
| llotsize | log(lotsize) |
| lsqrft   | log(sqrft) |

***
## Load the required modules

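```python
import math                            # scalar math, e.g. math.log
import numpy as np                     # vectorized math, e.g. np.log on a column
import pandas as pd                    # reading the CSV file and DataFrame work
import statsmodels
import statsmodels.api as sm           # array-based models, e.g. sm.OLS
import statsmodels.formula.api as smf  # formula-based models, e.g. smf.ols
import matplotlib.pyplot as plt        # scatter plots (assumes matplotlib is installed)
```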

***
## Data check and summary statistics

#### Load the data set
Read ``hprice1.csv`` into a pandas DataFrame; the first row of the file holds the column headers.
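Reading the file is a single `pd.read_csv` call. A minimal sketch, assuming `hprice1.csv` sits in the working directory; a made-up two-row CSV string stands in for the file so the snippet runs on its own:

```python
import io
import pandas as pd

# With the course file in the working directory (assumed location), this is simply:
#   df = pd.read_csv("hprice1.csv")
# read_csv treats the first row as column headers by default.
# Made-up two-row stand-in so the snippet runs without the file:
csv_text = io.StringIO("price,sqrft\n300,2000\n250,1800\n")
df = pd.read_csv(csv_text)
print(df.columns.tolist())  # ['price', 'sqrft']
```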

#### Check if the data is properly imported
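A few quick checks catch most import problems: the shape, the column types, the first rows, and missing values. A sketch on a made-up stand-in (one `bdrms` value is deliberately left empty to show how a missing value appears):

```python
import io
import pandas as pd

# Made-up stand-in; in class use df = pd.read_csv("hprice1.csv")
df = pd.read_csv(io.StringIO("price,sqrft,bdrms\n300,2000,3\n250,1800,\n"))

print(df.shape)         # (rows, columns)
print(df.dtypes)        # numeric columns should not show up as object (text)
print(df.head())        # first five rows
print(df.isna().sum())  # number of missing values per column
```

The empty cell makes `bdrms` a float column containing `NaN`, which `isna().sum()` reports as one missing value.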

#### Get statistical moments
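pandas has methods for the first four moments, and `describe()` adds quantiles. A sketch on simulated data; the simulation and its parameters (1.0, 0.6, 0.2) are hypothetical stand-ins, not `hprice1`:

```python
import numpy as np
import pandas as pd

# Hypothetical simulated stand-in for hprice1 (not the real data)
rng = np.random.default_rng(0)
sqrft = rng.uniform(1000, 4000, 50)
df = pd.DataFrame({"sqrft": sqrft,
                   "price": np.exp(1.0 + 0.6 * np.log(sqrft) + rng.normal(0, 0.2, 50))})

moments = pd.DataFrame({
    "mean": df.mean(),      # first moment
    "variance": df.var(),   # sample variance (ddof=1)
    "skewness": df.skew(),  # bias-adjusted sample skewness
    "kurtosis": df.kurt(),  # excess kurtosis (normal distribution = 0)
})
print(moments)
print(df.describe())        # count, mean, std, min, quartiles, max
```

Note that `kurt()` uses Fisher's definition, so a value near 0 (not 3) indicates normal-like tails.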

#### Create a new column named ``lsqrft_copy`` for $log(sqrft)$
Your result should match the existing column ``lsqrft``.
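`np.log` is the natural logarithm, so it reproduces $log(sqrft)$ column-wise. Because the stored column may be rounded, compare with a tolerance rather than `==`. A sketch on a hypothetical three-row stand-in, where rounding to six decimals is an assumption that mimics a stored column:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in; in class, df comes from hprice1.csv and already has lsqrft
df = pd.DataFrame({"sqrft": [1500.0, 2000.0, 2500.0]})
df["lsqrft"] = np.log(df["sqrft"]).round(6)  # mimics a column stored with limited precision

# Natural log of each element of the column
df["lsqrft_copy"] = np.log(df["sqrft"])

# Tolerance-based comparison for floating-point values
print(np.allclose(df["lsqrft_copy"], df["lsqrft"]))  # True
```

`df["sqrft"].apply(math.log)` gives the same values element by element; `np.log` is the vectorized choice.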

#### Create a scatter plot to visualize the relationship between price and sqrft

#### Create a scatter plot to visualize the relationship between log(price) and log(sqrft)
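Placing the two scatter plots side by side makes the change in functional form easy to see. A sketch using `DataFrame.plot.scatter` on simulated stand-in data (hypothetical, not `hprice1`):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical simulated stand-in for hprice1 (not the real data)
rng = np.random.default_rng(0)
sqrft = rng.uniform(1000, 4000, 50)
df = pd.DataFrame({"sqrft": sqrft,
                   "price": np.exp(1.0 + 0.6 * np.log(sqrft) + rng.normal(0, 0.2, 50))})
df["lprice"] = np.log(df["price"])  # hprice1 already provides lprice and lsqrft
df["lsqrft"] = np.log(df["sqrft"])

fig, (ax_lev, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
df.plot.scatter(x="sqrft", y="price", ax=ax_lev, title="Levels")
df.plot.scatter(x="lsqrft", y="lprice", ax=ax_log, title="Log-log")
ax_lev.set_xlabel("sqrft (square feet)")
ax_lev.set_ylabel("price (thousands of dollars)")
ax_log.set_xlabel("log(sqrft)")
ax_log.set_ylabel("log(price)")
fig.tight_layout()  # in a notebook the figure is displayed automatically
```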

***
## Simple Linear Regression Model on levels

#### Model Estimation by the Ordinary Least Squares (OLS) method

The model **on levels** is specified as $$price = \beta_0 + \beta_1 sqrft + u,$$
where $price$ is the house price in thousands of dollars, $sqrft$ is the size of the house in square feet, and $u$ is the error term.

#### Get the estimation results

#### How would you interpret the $\beta_1$ estimate?
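In the model on levels, the slope is the change in predicted price per one-unit change in $sqrft$. A worked reading that uses only the units from the data description (no estimate is filled in here):

$$\Delta \widehat{price} = \hat{\beta}_1 \, \Delta sqrft$$

One additional square foot changes the predicted price by $\hat{\beta}_1$ thousand dollars, that is $1000\,\hat{\beta}_1$ dollars; 100 additional square feet change it by $100\,\hat{\beta}_1$ thousand dollars.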

#### What percentage of the variation in price is explained by square footage?
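The share of variation explained is $R^2$ times 100, available as `rsquared` on the fitted results. The sketch below (simulated stand-in data, hypothetical) also checks it against $R^2 = 1 - SSR/SST$:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated stand-in for hprice1 (not the real data)
rng = np.random.default_rng(0)
sqrft = rng.uniform(1000, 4000, 50)
df = pd.DataFrame({"sqrft": sqrft,
                   "price": np.exp(1.0 + 0.6 * np.log(sqrft) + rng.normal(0, 0.2, 50))})

results_lev = smf.ols("price ~ sqrft", data=df).fit()
pct = 100 * results_lev.rsquared
print(f"R-squared = {results_lev.rsquared:.3f}: about {pct:.1f}% of the sample variation in price")

# Check: R-squared = 1 - SSR/SST
ssr = np.sum(results_lev.resid ** 2)                  # sum of squared residuals
sst = np.sum((df["price"] - df["price"].mean()) ** 2)  # total sum of squares
print(np.isclose(1 - ssr / sst, results_lev.rsquared))  # True
```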

***
## Simple Linear Regression Model

#### Model Estimation by the Ordinary Least Squares (OLS) method

The **log-log model** is specified as $$\log(price) = \beta_0 + \beta_1 \log(sqrft) + u,$$
where $\log(\cdot)$ denotes the natural logarithm.
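The data set already includes `lprice` and `lsqrft`, so the log-log model can be estimated directly from those columns. A sketch on simulated stand-in data (hypothetical, not `hprice1`), which recreates the two log columns first:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated stand-in for hprice1 (not the real data)
rng = np.random.default_rng(0)
sqrft = rng.uniform(1000, 4000, 50)
df = pd.DataFrame({"sqrft": sqrft,
                   "price": np.exp(1.0 + 0.6 * np.log(sqrft) + rng.normal(0, 0.2, 50))})
df["lprice"] = np.log(df["price"])  # hprice1 already provides these two columns
df["lsqrft"] = np.log(df["sqrft"])

results_log = smf.ols("lprice ~ lsqrft", data=df).fit()
print(results_log.summary())
print(f"beta_1 hat (elasticity): {results_log.params['lsqrft']:.3f}")
print(f"R-squared: {100 * results_log.rsquared:.1f}% of the variation in log(price)")
```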

#### Get the estimation results

#### How would you interpret the $\beta_1$ estimate?
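In the log-log model, the slope is an elasticity. Using $\Delta \log(x) \approx \Delta x / x$, which is accurate for small changes:

$$\hat{\beta}_1 = \frac{\Delta \widehat{\log(price)}}{\Delta \log(sqrft)} \approx \frac{\%\Delta \widehat{price}}{\%\Delta sqrft}$$

So a house with 1% more square footage is associated with a predicted price that is roughly $\hat{\beta}_1$ percent higher, regardless of the units in which price and size are measured.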

#### What percentage of the variation in log(price) is explained by log(square footage)?

#### Compared with the model on levels, which model fits the data better?
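The two reported $R^2$ values are not directly comparable: one measures explained variation in $price$, the other in $\log(price)$. One common workaround, sketched here on simulated stand-in data (hypothetical, not `hprice1`), converts the log-log fitted values back into price predictions and compares the squared correlation between $price$ and those predictions with the levels $R^2$. For OLS with an intercept, the levels $R^2$ itself equals the squared correlation between $price$ and its fitted values.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated stand-in for hprice1 (not the real data)
rng = np.random.default_rng(0)
sqrft = rng.uniform(1000, 4000, 50)
df = pd.DataFrame({"sqrft": sqrft,
                   "price": np.exp(1.0 + 0.6 * np.log(sqrft) + rng.normal(0, 0.2, 50))})
df["lprice"] = np.log(df["price"])
df["lsqrft"] = np.log(df["sqrft"])

results_lev = smf.ols("price ~ sqrft", data=df).fit()
results_log = smf.ols("lprice ~ lsqrft", data=df).fit()

# Price predictions implied by the log-log model. A positive scale factor is often
# applied to exp(fitted values), but it would not change the correlation below.
price_hat_log = np.exp(results_log.fittedvalues)
r2_log_on_price = np.corrcoef(df["price"], price_hat_log)[0, 1] ** 2

print(f"Levels model, R-squared for price:       {results_lev.rsquared:.3f}")
print(f"Log-log model, squared corr with price:  {r2_log_on_price:.3f}")
print(f"Log-log model, R-squared for log(price): {results_log.rsquared:.3f} (different scale)")
```

The first two numbers are on the same footing (variation in $price$); the third is not.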

***

## References

- Wooldridge, Jeffrey M. (2019). "Introductory Econometrics: A Modern Approach, 7e." Chapter 2.

- The pandas development team (2020). "[pandas-dev/pandas: Pandas](https://pandas.pydata.org/)." Zenodo.
    
- Seabold, Skipper, and Josef Perktold (2010). "[statsmodels: Econometric and statistical modeling with python](https://www.statsmodels.org/stable/examples/notebooks/generated/ols.html)." Proceedings of the 9th Python in Science Conference.