# Ames Housing Data Model Fitting, Evaluation and Selection

This project aims to predict housing prices in Ames, Iowa, using the well-known Ames Housing dataset from the Ames City Assessor's Office, made available by Dean De Cock of Truman State University in 2011. Professor De Cock's original paper can be found [here](http://jse.amstat.org/v19n3/decock.pdf).

Presentation slides for this project can be found [here](https://www.beautiful.ai/player/-N4J5UYshyuRtwl5G4I7).

This notebook includes Python code for model fitting, selection, and evaluation, as well as takeaways for potential home buyers, sellers, and flippers in 2011. Data exploration, extraction, cleaning, and transformation can be found in the Ames_EDA notebook.

# Sections and steps

- <a href="#IMP">Reading in Data</a><br>
    - Import packages needed and helper module
    - Read in training data
    - Read in test data
- <a href="#ER">Feature Selection</a><br>
    - Lasso Regression
        - Randomized search over a range of alphas
        - Cross-validation
    - Feature Analysis
    - Final Feature Selection
    - Exporting Finalized Datasets
- <a href="#LM">Linear Models</a><br>
    - MLR
    - Ridge
    - Elastic Net
- <a href="#NLM">Non-Linear Models</a><br>
    - Random Forest
        - Grid search with 5-fold cross-validation
        - Best hyperparameters
    - Gradient Boosting Tree Model
        - Grid search with 5-fold cross-validation
        - Best hyperparameters
- <a href="#MES">Model Evaluation and Selection</a><br>
- <a href="#TKW">Takeaways</a><br>
    - Extracting Feature Importance with Ridge Regression
    - Takeaways

<p><a name="IMP"></a></p>

## Importing Packages, Reading in Data

In [13]:
import pandas as pd
import numpy as np

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import Lasso

### Importing helper module

### Reading in cleaned data

In [2]:
pd.set_option('display.max_columns', None)

In [3]:
train = pd.read_csv('./data/cleaned/train_c.csv')

### Preprocessing Data for Linear Models

In [12]:
len(train.columns)

80

In [10]:
# Drop the 'Unnamed: 0' index artifact, the row Id, and the raw SalePrice (log_SalePrice is kept)
train.drop(columns = ['Unnamed: 0', 'Id', 'SalePrice'], inplace = True)

In [14]:
# Collect the names of the categorical (object-dtype) columns
cat_var = train.select_dtypes('O').columns.tolist()

# Collect the names of the numeric columns (this list still includes the target, log_SalePrice)
num_var = train.select_dtypes('number').columns.tolist()

In [15]:
len(cat_var)

39

In [16]:
len(num_var)

41

In [18]:
num_var

['MSSubClass',
 'LotFrontage',
 'LotArea',
 'OverallQual',
 'OverallCond',
 'YearBuilt',
 'YearRemodAdd',
 'MasVnrArea',
 'BsmtFinSF1',
 'BsmtFinSF2',
 'BsmtUnfSF',
 'TotalBsmtSF',
 '1stFlrSF',
 '2ndFlrSF',
 'LowQualFinSF',
 'GrLivArea',
 'BsmtFullBath',
 'BsmtHalfBath',
 'FullBath',
 'HalfBath',
 'BedroomAbvGr',
 'KitchenAbvGr',
 'TotRmsAbvGrd',
 'Fireplaces',
 'GarageYrBlt',
 'GarageCars',
 'WoodDeckSF',
 'OpenPorchSF',
 'EnclosedPorch',
 '3SsnPorch',
 'ScreenPorch',
 'PoolArea',
 'Fence',
 'MiscVal',
 'MoSold',
 'YrSold',
 'log_SalePrice',
 'AgeHome',
 'YrsSnRmdl',
 'BthrmAbvGrd',
 'BthrmBsmt']
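
The split into categorical and numeric columns above is the usual first step before fitting linear models. The following is a minimal sketch of one way the remaining preprocessing could look, not the notebook's actual code: the `demo` frame and its values are made up, `Neighborhood` stands in for any categorical column, and the choice of one-hot encoding with `pd.get_dummies` is an assumption. The one point it is meant to highlight is that the target, `log_SalePrice`, is separated out before scaling so it is never treated as a predictor.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny made-up frame borrowing a few Ames column names; values are illustrative only.
demo = pd.DataFrame({
    "GrLivArea": [1500, 2100, 900, 1750],
    "OverallQual": [6, 8, 4, 7],
    "Neighborhood": ["NAmes", "StoneBr", "OldTown", "NAmes"],
    "log_SalePrice": np.log([180000, 320000, 95000, 210000]),
})

# Hold the target out first so it is neither scaled nor used as a feature.
y = demo.pop("log_SalePrice")

# Numeric predictors vs. everything else (treated here as categorical).
num_cols = demo.select_dtypes("number").columns.tolist()
cat_cols = demo.select_dtypes(exclude="number").columns.tolist()

# Standardize numeric predictors to mean 0 and unit variance.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(demo[num_cols]),
    columns=num_cols,
    index=demo.index,
)

# One-hot encode categoricals, dropping one level per feature to avoid perfect collinearity.
dummies = pd.get_dummies(demo[cat_cols], drop_first=True, dtype=float)

# Final design matrix for a linear model.
X = pd.concat([scaled, dummies], axis=1)
print(X.columns.tolist())
```

Standardizing matters for the penalized models used later (Lasso, Ridge, Elastic Net), because the penalty treats all coefficients on the same scale.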

<p><a name="ER"></a></p>

## Feature Selection

- Lasso Regression
    - Randomized search over a range of alphas
    - Cross-validation
- Feature Analysis
- Final Feature Selection
- Exporting Finalized Datasets
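
The Lasso steps listed above (a randomized search over a range of alphas, scored by cross-validation) can be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebook's own code: the data are synthetic (`make_regression`), and the alpha range, number of iterations, 5-fold setting, and RMSE scoring are all placeholder choices. Features whose Lasso coefficient is shrunk to exactly zero are the candidates to drop.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared feature matrix: 20 features, 5 informative.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

# Scale inside the pipeline so each CV fold is standardized on its own training split.
pipe = make_pipeline(StandardScaler(), Lasso(max_iter=10_000))

# Sample alphas log-uniformly; the range and n_iter are assumptions for illustration.
search = RandomizedSearchCV(
    pipe,
    param_distributions={"lasso__alpha": loguniform(1e-3, 1e1)},
    n_iter=20,
    cv=5,
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X, y)

# Coefficients of the refit best model; non-zero entries mark the retained features.
coefs = search.best_estimator_.named_steps["lasso"].coef_
kept = np.flatnonzero(coefs)

print(f"best alpha: {search.best_params_['lasso__alpha']:.4f}")
print(f"features kept: {len(kept)} of {X.shape[1]}")
```

Sampling alpha on a log scale is a common choice because useful penalty strengths often span several orders of magnitude.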

<p><a name="LM"></a></p>

## Linear Models

<p><a name="NLM"></a></p>

## Non-Linear Models

<p><a name="MES"></a></p>

## Model Evaluation and Selection

<p><a name="TKW"></a></p>

## Takeaways