# **5. House Price Predictor Notebook**

## Objectives

* Develop a working model for the House Price Predictor based on the cleaned and feature-engineered dataset.

## Inputs

* The cleaned training set, TrainSetCleaned; the resulting model is tested with TestSetClean. Path: /workspace/milestone-project-housing-issues/outputs/datasets/cleaned/TrainSetCleaned.csv

## Outputs

* A working model for the House Price Predictor

## Additional Comments

* As per the business case, the required performance for the model is an R² score of at least 0.75 on both the train set and the test set.
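
To make the performance requirement concrete, here is a minimal sketch of how R² can be computed and checked against the 0.75 threshold. The function names `r_squared` and `meets_requirement` and the sample prices are our own assumptions for illustration; they are not part of this project's pipeline, which may well use a library function instead.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot  # undefined if all y_true values are equal


def meets_requirement(r2_train, r2_test, threshold=0.75):
    """True only if both the train and test scores reach the threshold."""
    return r2_train >= threshold and r2_test >= threshold


# Made-up sale prices, for illustration only (not taken from the dataset)
actual = [100_000, 150_000, 200_000, 250_000]
predicted = [110_000, 140_000, 210_000, 240_000]
print(round(r_squared(actual, predicted), 3))
```

Both scores have to pass: a model with a high train score and a test score below 0.75 would not meet the business requirement.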


---

# Change working directory

We need to change the working directory from the notebook's folder (jupyter_notebooks) to its parent folder, the project root.
* We access the current directory with os.getcwd()

In [1]:
import os

# Get the current directory
current_dir = os.getcwd()
print("Current Directory:", current_dir)

# Change the directory to the new path
os.chdir('/workspace/milestone-project-housing-issues')

# Get the updated current directory
current_dir = os.getcwd()
print("New Current Directory:", current_dir)

Current Directory: /workspace/milestone-project-housing-issues/jupyter_notebooks
New Current Directory: /workspace/milestone-project-housing-issues
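
The cell above uses a hard-coded absolute path. As a possible alternative, the sketch below derives the parent folder from the current directory, which may be more portable across workspaces. The helper name `move_to_parent_dir` is hypothetical and not part of the original notebook.

```python
import os


def move_to_parent_dir():
    """Change the working directory to the parent of the current one."""
    parent = os.path.dirname(os.getcwd())
    os.chdir(parent)
    return os.getcwd()


# Usage (run once, from inside jupyter_notebooks):
# print("New Current Directory:", move_to_parent_dir())
```

One trade-off to keep in mind: the relative version moves up one level every time it runs, so re-running the cell would leave the project root, whereas the absolute path used above always lands in the same place.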


In [3]:
# Load the cleaned train set from outputs/datasets/cleaned/TrainSetCleaned.csv

import pandas as pd
df_houseprices_trainmodel = pd.read_csv("/workspace/milestone-project-housing-issues/outputs/datasets/cleaned/TrainSetCleaned.csv")
df_houseprices_trainmodel.head()

Unnamed: 0,BedroomAbvGr,BsmtExposure,BsmtFinType1,GarageFinish,GrLivArea,KitchenQual,LotArea,LotFrontage,OverallCond,OverallQual,TotalBsmtSF,SalePrice,Has2ndFloor,HasMasVnr,HasOpenPorch,HasGarage,ModsMade,SalePriceBand
0,3.0,0,0,0,42.755117,0,11694.0,90.0,0,0,1822.0,314813,0,0,0,0,0,7
1,2.0,1,1,1,29.899833,1,6600.0,60.0,0,1,894.0,109500,0,1,1,0,0,2
2,2.0,1,2,0,31.048349,1,13360.0,80.0,1,1,876.0,163500,0,1,1,0,1,3
3,3.0,1,3,1,41.097445,0,13265.0,59.0,0,2,1568.0,271000,0,0,0,0,0,6
4,3.0,1,1,1,39.255573,0,13704.0,111.5,0,3,1541.0,205000,0,0,0,0,1,4


In [4]:
# df_houseprices_trainmodel dataset summary stats
original_data_for_modelling = df_houseprices_trainmodel.describe()
original_data_for_modelling

Unnamed: 0,BedroomAbvGr,BsmtExposure,BsmtFinType1,GarageFinish,GrLivArea,KitchenQual,LotArea,LotFrontage,OverallCond,OverallQual,TotalBsmtSF,SalePrice,Has2ndFloor,HasMasVnr,HasOpenPorch,HasGarage,ModsMade,SalePriceBand
count,1168.0,1168.0,1168.0,1168.0,1168.0,1168.0,1168.0,1168.0,1168.0,1168.0,1168.0,1168.0,1168.0,1168.0,1168.0,1168.0,1168.0,1168.0
mean,2.866438,1.226027,2.6875,0.889555,38.363696,0.726027,9646.455908,68.976027,1.063356,2.777397,1049.327055,180808.898973,0.412671,0.60274,0.446062,0.049658,0.479452,3.560788
std,0.755536,0.997138,1.654092,0.897847,6.142895,0.705183,3561.534341,19.309832,1.595654,1.550112,386.601452,78499.911304,0.492526,0.48954,0.497295,0.217329,0.499792,2.190516
min,0.5,0.0,0.0,0.0,18.275667,0.0,1571.5,27.5,0.0,0.0,82.5,34900.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2.0,1.0,1.0,0.0,33.749064,0.0,7589.5,59.0,0.0,1.0,798.75,130000.0,0.0,0.0,0.0,0.0,0.0,2.0
50%,3.0,1.0,3.0,1.0,38.360135,1.0,9512.5,69.0,0.0,3.0,992.0,163000.0,0.0,1.0,0.0,0.0,0.0,3.0
75%,3.0,1.0,3.0,2.0,42.29066,1.0,11601.5,80.0,2.0,4.0,1276.25,215000.0,1.0,1.0,1.0,0.0,1.0,5.0
max,4.5,4.0,6.0,3.0,52.561868,3.0,17619.5,111.5,8.0,9.0,1992.5,755000.0,1.0,1.0,1.0,1.0,1.0,19.0


In [5]:
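# Note: info() is called on the describe() summary table, so it reports
# 8 rows (count to max) rather than the 1168 rows of the train set
original_data_for_modelling.info()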

<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, count to max
Data columns (total 18 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   BedroomAbvGr   8 non-null      float64
 1   BsmtExposure   8 non-null      float64
 2   BsmtFinType1   8 non-null      float64
 3   GarageFinish   8 non-null      float64
 4   GrLivArea      8 non-null      float64
 5   KitchenQual    8 non-null      float64
 6   LotArea        8 non-null      float64
 7   LotFrontage    8 non-null      float64
 8   OverallCond    8 non-null      float64
 9   OverallQual    8 non-null      float64
 10  TotalBsmtSF    8 non-null      float64
 11  SalePrice      8 non-null      float64
 12  Has2ndFloor    8 non-null      float64
 13  HasMasVnr      8 non-null      float64
 14  HasOpenPorch   8 non-null      float64
 15  HasGarage      8 non-null      float64
 16  ModsMade       8 non-null      float64
 17  SalePriceBand  8 non-null      float64
dtypes: float64(18)

---


# Push files to Repo

* If you do not need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
  # Create your folder here, for example:
  # os.makedirs(name='')
  pass  # placeholder so the otherwise empty try block is valid Python
except Exception as e:
  print(e)
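
As a sketch of what the folder-creation step could look like once a destination is chosen, the helper below creates a folder and any missing parents without failing if it already exists. The name `ensure_folder` and the example path in the comment are hypothetical; the actual output folder for this project is not specified here.

```python
import os


def ensure_folder(path):
    """Create the folder (and any missing parents); do nothing if it exists."""
    os.makedirs(path, exist_ok=True)
    return path


# Usage (the folder name is an assumption, replace it with your own):
# ensure_folder("outputs/ml_pipeline/v1")
```

Using `exist_ok=True` keeps the cell safe to re-run top-down, since a second run does not raise an error for the existing folder.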
