# House Price Analytics

## 05 Data analysis for generating foresights

**Project:** Code Institute – Capstone Project

---
### **Objectives**
- Load the final house dataset
- Build a Machine Learning model to predict house prices with high accuracy

### **Inputs**
- `/data/models/house_price_model.pkl`

### **Outputs**
- A trained and fine-tuned model that powers a "Price Estimator" dashboard feature, giving buyers and sellers a realistic price range (minimum, average, maximum).
        
### **Additional Comments**
Confirm that `final_house_data.csv` exists under `data/processed/` (the path used by the loading cell below), then run this notebook from top to bottom.
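
The file check mentioned above can be scripted rather than done by hand. The following is a minimal sketch, assuming the notebook runs from a folder one level below the project root, so that the dataset sits at `../data/processed/final_house_data.csv` as in the loading cell further down. `dataset_exists` is a hypothetical helper and not part of the project.

```python
from pathlib import Path


def dataset_exists(path) -> bool:
    """Return True if `path` points to an existing, non-empty file."""
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0


# Assumed location, relative to the notebook's folder
DATA_FILE = Path("..") / "data" / "processed" / "final_house_data.csv"

if not dataset_exists(DATA_FILE):
    print(f"Missing or empty dataset: {DATA_FILE}")
```

The helper only reports the problem; it does not stop the notebook, so the loading cell's own error handling still applies.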

---

### Set up the file and load the dataset
Import the necessary libraries.

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import joblib

# Scikit-Learn
from sklearn.model_selection import train_test_split, KFold, cross_val_score, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Feature Engine
from feature_engine.selection import DropCorrelatedFeatures, SmartCorrelatedSelection
from feature_engine.encoding import OneHotEncoder

# Ignore future warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning) 

Set the working directory. `os.getcwd()` returns the current directory; the cell below keeps it as the working directory, so the relative data paths used later are resolved from the notebook's folder.

In [3]:
PROJECT_DIR = os.getcwd()  # The notebook's folder serves as the working directory
os.chdir(PROJECT_DIR)  # No-op unless PROJECT_DIR is pointed elsewhere above
# Uncomment the line below to verify the current working directory
# print("Working directory:", os.getcwd())
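
The relative paths in this notebook depend on the folder it is launched from. As a sketch of a more robust alternative, the project root could be located by walking up the directory tree until a marker folder is found. This assumes the root is the first ancestor containing a `data` directory. `find_project_root` is a hypothetical helper, not something the project defines.

```python
from pathlib import Path


def find_project_root(start, marker="data"):
    """Walk upwards from `start` until a folder containing `marker` is found."""
    start = Path(start).resolve()
    for folder in (start, *start.parents):
        if (folder / marker).is_dir():
            return folder
    raise FileNotFoundError(f"No '{marker}' directory found at or above {start}")


# Example usage (commented out so that running this cell changes nothing):
# root = find_project_root(Path.cwd())
# data_path = root / "data" / "processed"
```

Building an absolute `data_path` this way avoids calling `os.chdir` at all, so the result does not depend on where the notebook kernel was started.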

Load the final dataset, which resides in the `data/processed/` directory.

In [4]:
# LOAD DATASET
try:
    # Path to the processed data directory, relative to the notebook's folder
    data_path = os.path.join("..", "data", "processed")
    # Read the final dataset
    df = pd.read_csv(os.path.join(data_path, "final_house_data.csv"))
    print("Dataset loaded successfully.")
except Exception as e:
    print(e)
    print("Error loading the dataset.")
    df = pd.DataFrame()  # Create an empty DataFrame if loading fails

print(f"Original dataset shape: {df.shape}")

Dataset loaded successfully.
Original dataset shape: (21596, 31)
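
Beyond the shape, a few basic quality indicators are worth checking before modelling. The snippet below is an illustrative sketch: `summarise_frame` is a hypothetical helper, and the `demo` frame is made-up data used only to show the call, not the project's dataset.

```python
import pandas as pd


def summarise_frame(frame: pd.DataFrame) -> dict:
    """Collect basic quality indicators for a DataFrame."""
    return {
        "rows": len(frame),
        "columns": frame.shape[1],
        "missing_values": int(frame.isna().sum().sum()),
        "duplicate_rows": int(frame.duplicated().sum()),
    }


# Small made-up frame for illustration only
demo = pd.DataFrame({"a": [1, 2, 2, None], "b": ["x", "y", "y", "z"]})
print(summarise_frame(demo))
```

In this notebook the call would be `summarise_frame(df)`. An empty frame, which is the fallback when loading fails, reports zero rows and makes the failure easy to spot.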


---