# Real Estate Price Prediction Using Machine Learning

## Executive Summary
Accurately estimating house prices is a critical aspect of the real estate industry. Understanding what drives property value helps inform pricing strategies, investment decisions and negotiations. This project uses machine learning techniques to predict residential house prices. The goal is to build a predictive model that captures market patterns and produces realistic, data-driven price estimates.

### Problem Statement

Accurately estimating the price of residential properties is a key challenge in the real estate market. Buyers want fair deals, sellers want competitive pricing, and investors seek undervalued opportunities. This project aims to build a machine learning model that predicts house prices from key property features such as size, location, condition and amenities.

Using the "House Price Prediction" dataset from Kaggle, the goal is to develop and evaluate regression models that can learn from historical housing data and provide reliable price estimates for new, unseen properties.
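Because `price` is a continuous target, regression models like these are usually judged with error metrics such as MAE, RMSE and R² rather than classification accuracy. The following is a minimal sketch of how those three metrics are computed, both by hand and with scikit-learn. It uses four made-up price pairs; the predicted values are hypothetical and are not model output from this project.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted sale prices (illustrative values only)
y_true = np.array([313000.0, 342000.0, 420000.0, 550000.0])
y_pred = np.array([300000.0, 360000.0, 410000.0, 530000.0])

# Mean absolute error: average size of a miss, in dollars
mae = np.mean(np.abs(y_true - y_pred))
# Root mean squared error: penalises large misses more heavily than MAE
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
# R^2: share of the variance in price explained by the predictions
r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# The hand-written formulas agree with scikit-learn's implementations
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred)))
assert np.isclose(r2, r2_score(y_true, y_pred))

print(f"MAE:  {mae:,.0f}")
print(f"RMSE: {rmse:,.0f}")
print(f"R^2:  {r2:.3f}")
```

MAE and RMSE are in dollars, which makes them easy to explain to buyers and agents. R² is unitless and is better suited for comparing models on the same data.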

### Objectives
1. To support real estate agents and property developers in pricing homes more accurately, reducing the risk of overpricing or undervaluing properties.

2. To help home buyers and sellers make informed decisions by providing data-driven estimates of property values based on key housing features and market factors.

3. To uncover and analyze the key drivers of property value such as location, square footage, number of bedrooms/bathrooms, and neighborhood conditions.

4. To reduce manual valuation time and subjectivity by offering an automated prediction system that complements or enhances traditional property appraisal methods.

5. To identify pricing trends and anomalies within a local housing market, assisting stakeholders in spotting investment opportunities or areas of concern.

6. To simulate the impact of property improvements (e.g., renovations or additional rooms) on house value, guiding property owners on which upgrades yield the highest return.

7. To build a predictive tool that can be used by real estate platforms to enhance customer experience by offering instant price estimates on property listings.
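Objective 6 reduces to a what-if query against a fitted model: change one feature, hold the others fixed, and compare the predictions. The following is a minimal sketch that assumes a linear regression trained on synthetic data. The column names mirror this dataset's, but the price relationship (about $20,000 per bathroom) is invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
# Synthetic training data with a made-up price relationship
X = pd.DataFrame({
    "sqft_living": rng.integers(800, 4000, n),
    "bedrooms": rng.integers(1, 6, n),
    "bathrooms": rng.integers(1, 4, n),
})
y = 150 * X["sqft_living"] + 20000 * X["bathrooms"] + rng.normal(0, 10000, n)

model = LinearRegression().fit(X, y)

# One hypothetical property, before and after adding a bathroom
before = pd.DataFrame({"sqft_living": [1940], "bedrooms": [4], "bathrooms": [2]})
after = before.assign(bathrooms=before["bathrooms"] + 1)

# Difference in predicted price = estimated value of the upgrade
uplift = model.predict(after)[0] - model.predict(before)[0]
print(f"Estimated value added by one extra bathroom: ${uplift:,.0f}")
```

With a linear model, the uplift is simply the bathroom coefficient and is the same for every house. A tree-based model such as a random forest would give property-specific answers, which may better reflect interactions with size or location.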



### This project will involve:
1. Data Loading
2. Data Cleaning and Preprocessing
3. Exploratory Data Analysis (EDA)
4. Feature Engineering
5. Model Training and Evaluation
6. Conclusions & Recommendations
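The stages above can be sketched end to end with a scikit-learn `Pipeline`. This minimal sketch rests on several assumptions:

- A synthetic frame stands in for `HousingData.csv`.
- Median imputation and one-hot encoding of `city` stand in for the cleaning and feature steps.
- A `RandomForestRegressor` serves as an example model.

None of these choices come from the project itself.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Data loading: a synthetic stand-in for HousingData.csv (made-up values)
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "sqft_living": rng.integers(800, 4000, n).astype(float),
    "bedrooms": rng.integers(1, 6, n).astype(float),
    "city": rng.choice(["Seattle", "Kent", "Redmond"], n),
})
df.loc[::25, "sqft_living"] = np.nan  # inject some missing values to clean
premium = df["city"].map({"Seattle": 100000, "Kent": 0, "Redmond": 80000})
df["price"] = 200 * df["sqft_living"].fillna(2000) + premium + rng.normal(0, 20000, n)

X, y = df.drop(columns="price"), df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Cleaning/preprocessing: impute numeric gaps, one-hot encode the city
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["sqft_living", "bedrooms"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Model training and evaluation on held-out data
pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestRegressor(n_estimators=100, random_state=42)),
])
pipe.fit(X_train, y_train)
mae = mean_absolute_error(y_test, pipe.predict(X_test))
print(f"Test MAE: {mae:,.0f}")
```

Keeping preprocessing inside the pipeline means the imputation median is learned from the training split only, which avoids leaking test-set information.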

In [1]:
# Import the libraries used throughout the notebook
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Price is a continuous target, so regression models and metrics are used
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, RandomizedSearchCV, cross_validate, cross_val_score
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import joblib


### 1. Loading and inspecting the data

In [2]:
# Load the dataset with pandas and inspect the first 10 rows

df = pd.read_csv("Data/HousingData.csv")
df.head(10)

| | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | condition | sqft_above | sqft_basement | yr_built | yr_renovated | street | city | statezip | country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-05-02 00:00:00 | 313000.0 | 3.0 | 1.5 | 1340 | 7912 | 1.5 | 0 | 0 | 3 | 1340 | 0 | 1955 | 2005 | 18810 Densmore Ave N | Shoreline | WA 98133 | USA |
| 1 | 2014-05-02 00:00:00 | 2384000.0 | 5.0 | 2.5 | 3650 | 9050 | 2.0 | 0 | 4 | 5 | 3370 | 280 | 1921 | 0 | 709 W Blaine St | Seattle | WA 98119 | USA |
| 2 | 2014-05-02 00:00:00 | 342000.0 | 3.0 | 2.0 | 1930 | 11947 | 1.0 | 0 | 0 | 4 | 1930 | 0 | 1966 | 0 | 26206-26214 143rd Ave SE | Kent | WA 98042 | USA |
| 3 | 2014-05-02 00:00:00 | 420000.0 | 3.0 | 2.25 | 2000 | 8030 | 1.0 | 0 | 0 | 4 | 1000 | 1000 | 1963 | 0 | 857 170th Pl NE | Bellevue | WA 98008 | USA |
| 4 | 2014-05-02 00:00:00 | 550000.0 | 4.0 | 2.5 | 1940 | 10500 | 1.0 | 0 | 0 | 4 | 1140 | 800 | 1976 | 1992 | 9105 170th Ave NE | Redmond | WA 98052 | USA |
| 5 | 2014-05-02 00:00:00 | 490000.0 | 2.0 | 1.0 | 880 | 6380 | 1.0 | 0 | 0 | 3 | 880 | 0 | 1938 | 1994 | 522 NE 88th St | Seattle | WA 98115 | USA |
| 6 | 2014-05-02 00:00:00 | 335000.0 | 2.0 | 2.0 | 1350 | 2560 | 1.0 | 0 | 0 | 3 | 1350 | 0 | 1976 | 0 | 2616 174th Ave NE | Redmond | WA 98052 | USA |
| 7 | 2014-05-02 00:00:00 | 482000.0 | 4.0 | 2.5 | 2710 | 35868 | 2.0 | 0 | 0 | 3 | 2710 | 0 | 1989 | 0 | 23762 SE 253rd Pl | Maple Valley | WA 98038 | USA |
| 8 | 2014-05-02 00:00:00 | 452500.0 | 3.0 | 2.5 | 2430 | 88426 | 1.0 | 0 | 0 | 4 | 1570 | 860 | 1985 | 0 | 46611-46625 SE 129th St | North Bend | WA 98045 | USA |
| 9 | 2014-05-02 00:00:00 | 640000.0 | 4.0 | 2.0 | 1520 | 6200 | 1.5 | 0 | 0 | 3 | 1520 | 0 | 1945 | 2010 | 6811 55th Ave NE | Seattle | WA 98115 | USA |


This dataset contains detailed information on residential properties, including their physical attributes, location and sale prices. Key attributes include:
- `price`: The target variable, the sale price of the house.

- `bedrooms` & `bathrooms`: Number of bedrooms and bathrooms in the property.

- `sqft_living` & `sqft_lot`: Square footage of the interior living space and of the overall lot.

- `sqft_above` & `sqft_basement`: Square footage above ground and in the basement; in the sample rows shown, these two add up to `sqft_living`.

- `floors`: Number of floors in the house.

- `waterfront`, `view`, and `condition`: A binary flag for whether the property is on the waterfront, a rating of view quality, and a rating of the property's overall condition.

- `yr_built` & `yr_renovated`: Construction year and year of the last major renovation; a value of 0 appears to indicate no recorded renovation.

- `date`: Date of the sale.

- Location: `street`, `city`, `statezip` (state and ZIP code combined, e.g. "WA 98133") and `country`.
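A few of these columns need light parsing before a model can use them. The following is a minimal sketch that uses three rows copied from the `df.head(10)` output via `io.StringIO`, so it runs on its own. The derived column names (`zipcode`, `sale_year`, `house_age`, `was_renovated`) are hypothetical. Treating `yr_renovated == 0` as "never renovated" is an assumption.

```python
import io
import pandas as pd

# Three rows copied from the df.head(10) output; for the full file, use
# pd.read_csv("Data/HousingData.csv", parse_dates=["date"])
csv = io.StringIO(
    "date,price,bedrooms,bathrooms,sqft_living,sqft_above,sqft_basement,yr_built,yr_renovated,city,statezip\n"
    "2014-05-02 00:00:00,313000.0,3.0,1.5,1340,1340,0,1955,2005,Shoreline,WA 98133\n"
    "2014-05-02 00:00:00,2384000.0,5.0,2.5,3650,3370,280,1921,0,Seattle,WA 98119\n"
    "2014-05-02 00:00:00,420000.0,3.0,2.25,2000,1000,1000,1963,0,Bellevue,WA 98008\n"
)
df = pd.read_csv(csv, parse_dates=["date"])  # parse the sale date as datetime

# statezip packs state and ZIP together ("WA 98133"); keep the ZIP part
df["zipcode"] = df["statezip"].str.split().str[-1]

# Age of the house at the time of sale
df["sale_year"] = df["date"].dt.year
df["house_age"] = df["sale_year"] - df["yr_built"]

# Assumption: yr_renovated == 0 means no recorded renovation
df["was_renovated"] = (df["yr_renovated"] > 0).astype(int)

# In these rows, above-ground plus basement area equals the living area
assert (df["sqft_above"] + df["sqft_basement"] == df["sqft_living"]).all()

print(df[["city", "zipcode", "house_age", "was_renovated"]])
```

Keeping `zipcode` as a string treats it as a location label rather than a number. This is usually the safer default before any encoding step.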

In [3]:
# Checking the shape of the dataset (rows, columns)

df.shape

(4600, 18)
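`df.shape` only confirms the size of the table. A typical next step is to check column types, missing values and implausible target values. The following is a minimal sketch on a small hand-made frame. The missing bedroom count and the zero price are injected on purpose, and nothing here describes the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; the NaN and the zero price are injected on purpose
df = pd.DataFrame({
    "price": [313000.0, 0.0, 342000.0, 420000.0],
    "bedrooms": [3.0, 5.0, np.nan, 3.0],
    "sqft_living": [1340, 3650, 1930, 2000],
    "city": ["Shoreline", "Seattle", "Kent", "Bellevue"],
})

print(df.shape)   # (rows, columns)
print(df.dtypes)  # numeric vs. text columns

# Columns with at least one missing value
missing = df.isna().sum()
print(missing[missing > 0])

# A non-positive sale price is almost certainly a placeholder, not a real sale
zero_price = df[df["price"] <= 0]
print(f"{len(zero_price)} row(s) with a non-positive price")

# Summary statistics for the numeric columns
print(df.describe())
```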