<a href="https://colab.research.google.com/github/SirivellaAnjani/House-Prices-Prediction/blob/main/House_Prices_Prediction_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **House Prices Prediction Using TensorFlow**

# Abstract

<div class="my-body">The purpose of this notebook is to build a machine learning model that accurately predicts the sale price of houses in King County, Washington. To achieve this, regression techniques are applied to a dataset of house sale prices recorded between May 2014 and May 2015, and k-fold cross-validation is used to check that the model's predictions are reliable. The resulting model is intended to give home buyers and sellers in the King County area useful insight into prices and to serve as a practical tool for real estate professionals.</div>

<div class="my-h1">Table of Contents</div>  

<div class='toc my-body'>
  <ol class="toc-list" role="list">
    <li>
      <a href="#chapter1" class="title">
        Data Description
      </a>
    </li>      
    <li>
      <a href="#chapter2" class="title end-list">
        Exploratory Data Analysis (EDA)
      </a>
      <ol role="list" class="toc-list">
        2.1 <a href="#chapter2.1" class="title">
          Import and Inspect Data
        </a>
      </ol>
        <ol role="list">
        2.2 <a href="#chapter2.2" class="title">
          Univariate Non-Graphical EDA 
        </a>
      </ol>
      <ol role="list" >
        2.3 <a href="#chapter2.3" class="title">
          Univariate Graphical EDA
        </a>
      </ol>
        <ol role="list" >
        2.4 <a href="#chapter2.4" class="title">
          Multivariate Non-Graphical EDA
        </a>
      </ol>
      <ol role="list" >
        2.5 <a href="#chapter2.5" class="title">
          Multivariate Graphical EDA
        </a>
      </ol>
    </li>      
    <li>
      <a href="#chapter3" class="title end-list">
        Data Preprocessing
      </a>
      <ol role="list" class="toc-list">
        3.1 <a href="#chapter3.1" class="title">
          Handling Missing Values
        </a>
      </ol>
        <ol role="list">
        3.2 <a href="#chapter3.2" class="title">
          Handling Categorical Values 
        </a>
      </ol>
      <ol role="list" >
        3.3 <a href="#chapter3.3" class="title">
          Feature Scaling
        </a>
      </ol>
    </li>
  </ol>
</div>

# 1. Data Description

<div class="my-body">The dataset contains 80 variables. The variable I am trying to predict is called the Label, and the input variables for the machine learning model are called Features.
</div>
<h3>Label</h3>
<div class="my-body">
SalePrice - the property's sale price in dollars.
</div>
<h3>Features</h3>
<div class="my-body">
    There are <em>79 features</em>. Complete details for every variable are in the <a target=_blank href="https://github.com/SirivellaAnjani/House-Prices-Prediction/blob/1e6bc8b45a2561fbc60766d8805df6f33f4952e8/data/data_description.txt">Data Description</a> file. Here is a brief description of each input variable:
</div>

    
| Input Variable | Description                                                    |
|:---------------|:---------------------------------------------------------------|
| MSSubClass     |  The building class                                            |
| MSZoning       |  The general zoning classification                             |
| LotFrontage    |  Linear feet of street connected to property                   |
| LotArea        |  Lot size in square feet                                       |
| Street         |  Type of road access                                           |
| Alley          |  Type of alley access                                          |
| LotShape       |  General shape of property                                     |
| LandContour    |  Flatness of the property                                      |
| Utilities      |  Type of utilities available                                   |
| LotConfig      |  Lot configuration                                             |
| LandSlope      |  Slope of property                                             |
| Neighborhood   |  Physical locations within Ames city limits                    |
| Condition1     |  Proximity to main road or railroad                            |
| Condition2     |  Proximity to main road or railroad (if a second is present)   |
| BldgType       |  Type of dwelling                                              |
| HouseStyle     |  Style of dwelling                                             |
| OverallQual    |  Overall material and finish quality                           |
| OverallCond    |  Overall condition rating                                      |
| YearBuilt      |  Original construction date                                    |
| YearRemodAdd   |  Remodel date                                                  |
| RoofStyle      |  Type of roof                                                  |
| RoofMatl       |  Roof material                                                 |
| Exterior1st    |  Exterior covering on house                                    |
| Exterior2nd    |  Exterior covering on house (if more than one material)        |
| MasVnrType     |  Masonry veneer type                                           |
| MasVnrArea     |  Masonry veneer area in square feet                            |
| ExterQual      |  Exterior material quality                                     |
| ExterCond      |  Present condition of the material on the exterior             |
| Foundation     |  Type of foundation                                            |
| BsmtQual       |  Height of the basement                                        |
| BsmtCond       |  General condition of the basement                             |
| BsmtExposure   |  Walkout or garden level basement walls                        |
| BsmtFinType1   |  Quality of basement finished area                             |
| BsmtFinSF1     |  Type 1 finished square feet                                   |
| BsmtFinType2   |  Quality of second finished area (if present)                  |
| BsmtFinSF2     |  Type 2 finished square feet                                   |
| BsmtUnfSF      |  Unfinished square feet of basement area                       |
| TotalBsmtSF    |  Total square feet of basement area                            |
| Heating        |  Type of heating                                               |
| HeatingQC      |  Heating quality and condition                                 |
| CentralAir     |  Central air conditioning                                      |
| Electrical     |  Electrical system                                             |
| 1stFlrSF       |  First Floor square feet                                       |
| 2ndFlrSF       |  Second floor square feet                                      |
| LowQualFinSF   |  Low quality finished square feet (all floors)                 |
| GrLivArea      |  Above grade (ground) living area square feet                  |
| BsmtFullBath   |  Basement full bathrooms                                       |
| BsmtHalfBath   |  Basement half bathrooms                                       |
| FullBath       |  Full bathrooms above grade                                    |
| HalfBath       |  Half baths above grade                                        |
| Bedroom        |  Number of bedrooms above basement level                       |
| Kitchen        |  Number of kitchens                                            |
| KitchenQual    |  Kitchen quality                                               |
| TotRmsAbvGrd   |  Total rooms above grade (does not include bathrooms)          |
| Functional     |  Home functionality rating                                     |
| Fireplaces     |  Number of fireplaces                                          |
| FireplaceQu    |  Fireplace quality                                             |
| GarageType     |  Garage location                                               |
| GarageYrBlt    |  Year garage was built                                         |
| GarageFinish   |  Interior finish of the garage                                 |
| GarageCars     |  Size of garage in car capacity                                |
| GarageArea     |  Size of garage in square feet                                 |
| GarageQual     |  Garage quality                                                |
| GarageCond     |  Garage condition                                              |
| PavedDrive     |  Paved driveway                                                |
| WoodDeckSF     |  Wood deck area in square feet                                 |
| OpenPorchSF    |  Open porch area in square feet                                |
| EnclosedPorch  |  Enclosed porch area in square feet                            |
| 3SsnPorch      |  Three season porch area in square feet                        |
| ScreenPorch    |  Screen porch area in square feet                              |
| PoolArea       |  Pool area in square feet                                      |
| PoolQC         |  Pool quality                                                  |
| Fence          |  Fence quality                                                 |
| MiscFeature    |  Miscellaneous feature not covered in other categories         |
| MiscVal        |  Dollar value of miscellaneous feature                         |
| MoSold         |  Month Sold                                                    |
| YrSold         |  Year Sold                                                     |
| SaleType       |  Type of sale                                                  |
| SaleCondition  |  Condition of sale                                             |
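
As a minimal sketch of the label/feature split described above, the snippet below builds a tiny, made-up DataFrame with three of the listed input variables plus `SalePrice`, then separates the label from the features. The rows and the names `houses`, `LABEL`, `numeric_features` and `categorical_features` are assumptions for illustration only; the real data has 79 feature columns and is loaded later in the notebook.

```python
import pandas as pd

# Hypothetical miniature stand-in for the real training data; the values
# are invented for illustration and are not rows from the actual file.
houses = pd.DataFrame(
    {
        "LotArea": [8000, 9500, 11000],
        "Neighborhood": ["CollgCr", "Veenker", "CollgCr"],
        "OverallQual": [7, 6, 7],
        "SalePrice": [200000, 180000, 225000],
    }
)

LABEL = "SalePrice"

# The label is the single column to predict; every other column is a feature.
label = houses[LABEL]
features = houses.drop(columns=[LABEL])

# Numeric and text-valued features usually need different preprocessing,
# so it helps to list them separately early on.
numeric_features = features.select_dtypes(include="number").columns.tolist()
categorical_features = features.select_dtypes(exclude="number").columns.tolist()

print(f"{features.shape[1]} features, label = {label.name}")
print("numeric:", numeric_features)
print("categorical:", categorical_features)
```

Keeping the label name in one constant means the same split can be reused unchanged once the full dataset is loaded.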


<p style="background-color:#93DEE4;
          color:#101010;
          font-family: Verdana, sans-serif;
          font-size:250%;
          text-align:justify;
          padding: 30px">2. Exploratory Data Analysis<a class="anchor" id="chapter2"></a></p>

Prior to importing the data, I will import the libraries required for the analysis:

In [2]:
!pip install tensorflow_decision_forests

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tensorflow_decision_forests
  Downloading tensorflow_decision_forests-1.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.4 MB)
Collecting wurlitzer
  Downloading wurlitzer-3.0.3-py3-none-any.whl (7.3 kB)
Installing collected packages: wurlitzer, tensorflow_decision_forests
Successfully installed tensorflow_decision_forests-1.3.0 wurlitzer-3.0.3


In [3]:
# Data Manipulation
import numpy as np
import pandas as pd

# Data Visualization
import seaborn as sns
import matplotlib.pyplot as plt

# Data Preprocessing
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.feature_selection import SelectPercentile, chi2

# Algorithms
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
import tensorflow as tf
import tensorflow_decision_forests as tfdf

# Evaluation Metrics
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.metrics import precision_score, recall_score, f1_score

# Visualize Metrics
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import roc_curve
from sklearn.metrics import RocCurveDisplay
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import PrecisionRecallDisplay

# Format Notebook
from IPython.display import display, HTML, display_html 

In [4]:
print("TensorFlow v" + tf.__version__)
print("TensorFlow Decision Forests v" + tfdf.__version__)

TensorFlow v2.12.0
TensorFlow Decision Forests v1.3.0



<p style="background-color:#93DEE4;
          color:#101010;
          font-family: Verdana, sans-serif;
          font-size:250%;
          text-align:justify;
          padding: 30px">2.1 Import and Inspect Data<a class="anchor" id="chapter2.1"></a></p>