# Predict the Sale Price

## Objectives:

- Our objective for this notebook is to:
    - Train an ML pipeline using hyperparameter optimization.
    - Use the best features to predict a property sale price.

## Tasks:

- Load the data.
- Create the ML regressor pipelines.
- Split the data into train and test sets.
- Run a grid search with cross-validation (scikit-learn GridSearchCV).

### Inputs:

- outputs/datasets/cleaned/clean_house_price_records.csv

### Outputs:

- Train set (Features and target).
- Test set (Features and target).
- ML Pipeline to predict the sale price.
- Feature importance Plot.

### Additional comments:

+ This notebook follows the guidelines provided in walkthrough project 2: 'Churnometer'.
+ This notebook relates to the Modeling step of the CRISP-DM methodology.
+ This notebook and the ones that follow demonstrate the learning outcomes of the Code Institute Predictive Analytics and Machine Learning module.

___

## Change the working directory:

- The following steps change the working directory from the current folder to its parent folder.
- Access the current directory with os.getcwd().

In [1]:
import os
current_dir = os.getcwd()
current_dir

'/workspace/PP5-Predictive-Analysis/jupyter_notebooks'

Make the parent of the current directory the new current directory:

- os.path.dirname() gets the parent directory.
- os.chdir() sets the new current directory.

In [2]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory, congrats!")

You set a new current directory, congrats!


The following command will confirm the new current directory: 

In [3]:
current_dir = os.getcwd()
current_dir

'/workspace/PP5-Predictive-Analysis'
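
Note that the cells above move up one level every time they are run, so re-running them would leave the notebook above the project root. A minimal sketch of a guarded version follows, assuming the notebooks live in a folder named jupyter_notebooks as in the path printed earlier. The helper name move_to_project_root is hypothetical and not part of the original notebook.

```python
import os


def move_to_project_root(notebook_folder="jupyter_notebooks"):
    """Move to the parent directory only when inside the notebook folder.

    `notebook_folder` is an assumption taken from the path printed above.
    Returns the working directory after the (possible) change.
    """
    current_dir = os.getcwd()
    if os.path.basename(current_dir) == notebook_folder:
        os.chdir(os.path.dirname(current_dir))
    return os.getcwd()


print(move_to_project_root())
```

Because the directory name is checked first, calling the helper a second time leaves the working directory unchanged.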

___

## Import the necessary packages and set the pandas display options:

In [4]:
import numpy as np
import pandas as pd
pd.options.display.max_columns = None
pd.options.display.max_rows = None
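
Setting these options to None for the whole session means every frame is printed in full, which can become noisy with large outputs. If that turns out to be a problem, the same option can be applied temporarily with pd.option_context. A small sketch follows; the frame wide and its column names are made up purely for illustration.

```python
import pandas as pd

# Illustrative frame with more columns than pandas typically displays.
wide = pd.DataFrame([list(range(30))], columns=[f"col_{i}" for i in range(30)])

# The option applies only inside the with-block and is restored afterwards.
with pd.option_context("display.max_columns", None):
    full_repr = repr(wide)

print(full_repr)
```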

____

## Load the house price records from our cleaned dataset.

- Load the clean_house_price_records.csv dataset into a pandas DataFrame.

In [5]:
df = pd.read_csv("outputs/datasets/cleaned/clean_house_price_records.csv")
print(df.shape)
df.head()

(1460, 22)


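|  | 1stFlrSF | 2ndFlrSF | BedroomAbvGr | BsmtExposure | BsmtFinSF1 | BsmtFinType1 | BsmtUnfSF | GarageArea | GarageFinish | GarageYrBlt | GrLivArea | KitchenQual | LotArea | LotFrontage | MasVnrArea | OpenPorchSF | OverallCond | OverallQual | TotalBsmtSF | YearBuilt | YearRemodAdd | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 856 | 854 | 3 | No | 706 | GLQ | 150 | 548 | RFn | 2003 | 1710 | Gd | 8450 | 65 | 196 | 61 | 5 | 7 | 856 | 2003 | 2003 | 208500 |
| 1 | 1262 | 0 | 3 | Gd | 978 | ALQ | 284 | 460 | RFn | 1976 | 1262 | TA | 9600 | 80 | 0 | 0 | 8 | 6 | 1262 | 1976 | 1976 | 181500 |
| 2 | 920 | 866 | 3 | Mn | 486 | GLQ | 434 | 608 | RFn | 2001 | 1786 | Gd | 11250 | 68 | 162 | 42 | 5 | 7 | 920 | 2001 | 2002 | 223500 |
| 3 | 961 | 0 | 0 | No | 216 | ALQ | 540 | 642 | Unf | 1998 | 1717 | Gd | 9550 | 60 | 0 | 35 | 5 | 7 | 756 | 1915 | 1970 | 140000 |
| 4 | 1145 | 0 | 4 | Av | 655 | GLQ | 490 | 836 | RFn | 2000 | 2198 | Gd | 14260 | 84 | 350 | 84 | 5 | 8 | 1145 | 2000 | 2000 | 250000 |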
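
Before modelling, it can help to confirm that the loaded frame contains the target column and to see which columns are non-numeric, since those will need encoding. A minimal sketch follows, assuming SalePrice is the target as stated in the objectives. The helper summarise_frame is hypothetical, and the sample frame reuses a few values from the first three rows of the output above rather than reading the CSV.

```python
import pandas as pd


def summarise_frame(df, target="SalePrice"):
    """Return basic facts about a frame: shape, missing values, text columns."""
    if target not in df.columns:
        raise KeyError(f"Target column '{target}' not found")
    return {
        "shape": df.shape,
        "missing_values": int(df.isna().sum().sum()),
        "non_numeric_columns": df.select_dtypes(exclude="number").columns.tolist(),
    }


# A few columns taken from the first three rows of the output above.
sample = pd.DataFrame({
    "1stFlrSF": [856, 1262, 920],
    "BsmtExposure": ["No", "Gd", "Mn"],
    "KitchenQual": ["Gd", "TA", "Gd"],
    "SalePrice": [208500, 181500, 223500],
})

print(summarise_frame(sample))
```

In the notebook itself the same call could be made as summarise_frame(df) on the frame loaded in the cell above.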


___