**Introduction**

House price prediction is a significant problem in real estate and urban planning because property values are influenced by multiple factors such as house size, number of rooms, location quality, and construction year. Accurate price estimation helps buyers, sellers, and policymakers make informed decisions.

**Problem Statement**

Estimating house prices manually is difficult because several housing features interact, such as area, number of bedrooms and bathrooms, number of stories, parking, and location attributes like main-road access and preferred area. Traditional valuation methods can be subjective and inconsistent.

**Objectives**

To analyze housing features affecting house prices

To implement Multiple Linear Regression

To apply and compare Polynomial, KNN, and Decision Tree regression models

To evaluate model performance using MAE, MSE, RMSE, and R²

To test predictions on unseen data

Import Libraries

In [21]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


Load the Dataset

In [22]:

import pandas as pd

url = "https://raw.githubusercontent.com/softwareWCU/Machine-Learning-Regression-Models-using-House-Price-Dataset/main/Housing%20Price.csv"

df = pd.read_csv(url)
df.head()

Unnamed: 0,price,area,bedrooms,bathrooms,stories,mainroad,guestroom,basement,hotwaterheating,airconditioning,parking,prefarea,furnishingstatus
0,13300000,7420,4,2,3,yes,no,no,no,yes,2,yes,furnished
1,12250000,8960,4,4,4,yes,no,no,no,yes,3,no,furnished
2,12250000,9960,3,2,2,yes,no,yes,no,no,2,yes,semi-furnished
3,12215000,7500,4,2,2,yes,no,yes,no,yes,3,yes,furnished
4,11410000,7420,4,1,2,yes,yes,yes,no,yes,2,no,furnished


Define Features (X) and Target (y)

In [41]:
y = df['price']
X = df.drop('price', axis=1)


Encode the categorical columns as numeric features: map the yes/no columns to 1/0 and one-hot encode furnishingstatus.

In [43]:

binary_cols = [
    'mainroad', 'guestroom', 'basement',
    'hotwaterheating', 'airconditioning', 'prefarea'
]


for col in binary_cols:
    # Map only columns that still contain 'yes'/'no' strings. Re-running the
    # cell on already-encoded columns would otherwise produce NaN values,
    # which fillna(0) below would silently turn into zeros.
    if df[col].isin(['yes', 'no']).all():
        df[col] = df[col].map({'yes': 1, 'no': 0})


if 'furnishingstatus' in df.columns:
    df = pd.get_dummies(df, columns=['furnishingstatus'], drop_first=True)

df = df.fillna(0)

for col in df.columns:
    if col != 'price':
        df[col] = df[col].astype(int)


X = df.drop('price', axis=1)
y = df['price']

df.head()


Unnamed: 0,price,area,bedrooms,bathrooms,stories,mainroad,guestroom,basement,hotwaterheating,airconditioning,parking,prefarea,furnishingstatus_semi-furnished,furnishingstatus_unfurnished
0,13300000,7420,4,2,3,1,0,0,0,1,2,1,0,0
1,12250000,8960,4,4,4,1,0,0,0,1,3,0,0,0
2,12250000,9960,3,2,2,1,0,1,0,0,2,1,1,0
3,12215000,7500,4,2,2,1,0,1,0,1,3,1,0,0
4,11410000,7420,4,1,2,1,1,1,0,1,2,0,0,0
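To make the encoding step concrete, the following minimal sketch applies the same two transformations in plain Python to the first row of the dataset preview. Yes/no columns become 1/0, and `furnishingstatus` becomes two indicator columns, with the alphabetically first level (`furnished`) dropped as the baseline, which is what `drop_first=True` does. The helper name `encode_row` and the constants are illustrative assumptions, not part of the notebook.

```python
# Illustrative sketch only: mirrors the notebook's encoding on a single row.
BINARY_COLS = ["mainroad", "guestroom", "basement",
               "hotwaterheating", "airconditioning", "prefarea"]
FURNISHING_LEVELS = ["furnished", "semi-furnished", "unfurnished"]  # sorted order

def encode_row(row):
    """Return a copy of `row` with yes/no mapped to 1/0 and furnishingstatus one-hot encoded."""
    encoded = dict(row)
    for col in BINARY_COLS:
        encoded[col] = 1 if row[col] == "yes" else 0
    status = encoded.pop("furnishingstatus")
    # Skip the first level ("furnished"): it is implied when both indicators are 0.
    for level in FURNISHING_LEVELS[1:]:
        encoded[f"furnishingstatus_{level}"] = int(status == level)
    return encoded

# First row of the dataset preview (without the price target).
sample = {
    "area": 7420, "bedrooms": 4, "bathrooms": 2, "stories": 3,
    "mainroad": "yes", "guestroom": "no", "basement": "no",
    "hotwaterheating": "no", "airconditioning": "yes",
    "parking": 2, "prefarea": "yes", "furnishingstatus": "furnished",
}
encoded_sample = encode_row(sample)
print(encoded_sample)
```

The encoded row has 13 entries, matching the 13 feature columns reported by the train-test split.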


Train–Test Split

In [25]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training data:", X_train.shape)
print("Testing data (unseen):", X_test.shape)


Training data: (436, 13)
Testing data (unseen): (109, 13)
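The shapes above follow from the 80/20 split of 545 rows. As a rough sketch of what a shuffled split does, the snippet below shuffles row indices with a fixed seed and takes the first 20% (rounded up) as the test set. The function `split_indices` is a hypothetical helper; its shuffle is not the one scikit-learn uses, so the row membership will differ from the notebook even with the same seed value.

```python
import math
import random

def split_indices(n_samples, test_size=0.2, seed=42):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # seeded shuffle for repeatability
    n_test = math.ceil(n_samples * test_size)   # test set size, rounded up
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(545)
print(len(train_idx), len(test_idx))
```

No index appears in both lists, which is the property that makes the test rows unseen during training.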


Feature Scaling

In [26]:
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
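The scaler is fitted on the training data only and then reused on the test data, so no test-set statistics leak into training. A minimal sketch of that pattern for a single column is shown below. It assumes the population standard deviation (dividing by n), which is what `StandardScaler` uses. The helper names and the tiny train/test grouping of the previewed `area` values are illustrative assumptions.

```python
import math

def fit_standardizer(column):
    """Compute the mean and population standard deviation of a training column."""
    mean = sum(column) / len(column)
    variance = sum((x - mean) ** 2 for x in column) / len(column)
    return mean, math.sqrt(variance)

def standardize(column, mean, std):
    """Apply z = (x - mean) / std using statistics learned elsewhere."""
    return [(x - mean) / std for x in column]

train_area = [7420, 8960, 9960, 7500]   # hypothetical training rows
test_area = [7420]                      # hypothetical test row

mean, std = fit_standardizer(train_area)             # fit on training data only
train_scaled = standardize(train_area, mean, std)
test_scaled = standardize(test_area, mean, std)      # reuse training mean and std
print(mean, std, test_scaled)
```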


Linear Regression

In [27]:
lr_model = LinearRegression()
lr_model.fit(X_train_scaled, y_train)

y_pred_lr = lr_model.predict(X_test_scaled)


Multiple Linear Regression

In [46]:
mlr_model = LinearRegression()
mlr_model.fit(X_train_scaled, y_train)

y_pred_mlr = mlr_model.predict(X_test_scaled)


Polynomial Regression (Degree = 2)

In [29]:
poly = PolynomialFeatures(degree=2, include_bias=False)

X_train_poly = poly.fit_transform(X_train_scaled)
X_test_poly = poly.transform(X_test_scaled)

poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)

y_pred_poly = poly_model.predict(X_test_poly)
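To show what the degree-2 expansion produces, here is a small sketch in plain Python that builds the same kind of terms for one row: the original features followed by every squared term and pairwise product. The function `degree2_features` is a hypothetical helper written for illustration. The final lines count the columns this expansion would give for 13 input features.

```python
from itertools import combinations_with_replacement
from math import comb

def degree2_features(row):
    """Expand a row into its degree-1 terms followed by all degree-2 products."""
    linear = list(row)
    quadratic = [row[i] * row[j]
                 for i, j in combinations_with_replacement(range(len(row)), 2)]
    return linear + quadratic

# For [a, b] the expansion is [a, b, a*a, a*b, b*b].
expanded = degree2_features([2.0, 3.0])
print(expanded)

# Column count for 13 inputs: 13 linear terms plus C(14, 2) degree-2 terms.
n_features = 13
n_expanded = n_features + comb(n_features + 1, 2)
print(n_expanded)
```

With 13 inputs the expansion yields 104 columns, which is large relative to 436 training rows and may make the polynomial model more prone to overfitting than the linear one.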


K-Nearest Neighbors (KNN) Regression

In [30]:
knn_model = KNeighborsRegressor(n_neighbors=5)
knn_model.fit(X_train_scaled, y_train)

y_pred_knn = knn_model.predict(X_test_scaled)
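KNN regression predicts by averaging the targets of the k closest training points, which is why it is fitted on the scaled features: unscaled columns such as `area` would otherwise dominate the distance. The sketch below assumes Euclidean distance and equal weights (the `KNeighborsRegressor` defaults); the function `knn_predict` and the toy data are illustrative assumptions.

```python
import math

def knn_predict(train_X, train_y, query, k=5):
    """Average the targets of the k training points nearest to `query`."""
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Toy data: three points near the origin and two far away.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = [100.0, 110.0, 120.0, 300.0, 320.0]

prediction = knn_predict(train_X, train_y, [0.2, 0.1], k=3)
print(prediction)
```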


Decision Tree Regression

In [31]:
dt_model = DecisionTreeRegressor(
    max_depth=10,
    random_state=42
)

dt_model.fit(X_train, y_train)
y_pred_dt = dt_model.predict(X_test)
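The tree is fitted on the unscaled features because its splits depend only on the ordering of values within each feature, not on their scale. As a rough sketch of how one split is chosen under the squared-error criterion, the code below tries the midpoint between each pair of adjacent feature values and keeps the threshold with the lowest total squared error. The helpers `sse` and `best_split` and the toy data are illustrative assumptions; a real tree repeats this search over all features and recursively over the resulting partitions.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(feature, target):
    """Return (threshold, total SSE) of the best single split on one feature."""
    pairs = sorted(zip(feature, target))
    best = (None, sse(target))              # baseline: no split at all
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                        # no threshold between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Toy data: prices (in millions) jump between the small and large areas.
area = [3000, 3500, 4000, 8000, 9000]
price = [3.0, 3.2, 3.1, 9.0, 9.5]
threshold, error = best_split(area, price)
print(threshold, error)
```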


Evaluation Metrics

In [32]:
# Score every model with the same code so that each metric is computed
# from that model's own predictions.
predictions = {
    "Linear Regression": y_pred_lr,
    "Multiple Linear Regression": y_pred_mlr,
    "Polynomial Regression": y_pred_poly,
    "KNN Regression": y_pred_knn,
    "Decision Tree Regression": y_pred_dt,
}

for name, y_pred in predictions.items():
    mse = mean_squared_error(y_test, y_pred)
    print(name)
    print("MAE:", mean_absolute_error(y_test, y_pred))
    print("MSE:", mse)
    print("RMSE:", np.sqrt(mse))
    print("R2:", r2_score(y_test, y_pred))

Linear Regression
MAE: 970043.4039201642
MSE: 1754318687330.668
RMSE: 1324506.9600914402
R2: 0.6529242642153176
Multiple Linear Regression
MAE: 970043.4039201642
MSE: 1754318687330.668
RMSE: 1324506.9600914402
R2: 0.6529242642153176
Polynomial Regression
MAE: 1034749.2706758833
MSE: 1901686413946.449
RMSE: 1379016.466162188
R2: 0.6238
KNN Regression
MAE: 999594.6055045872
MSE: 1953996997258.202
RMSE: 1397854.4263471079
R2: 0.6134
Decision Tree Regression
MAE: 1194973.9296636085
MSE: 2551053758132.1855
RMSE: 1597201.8526573859
R2: 0.4953
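As a cross-check on the figures above: every model is scored on the same `y_test`, so R² and MSE are tied by R² = 1 − MSE / Var(y_test), where Var(y_test) is the population variance of the test targets. The sketch below recovers Var(y_test) from the Multiple Linear Regression pair of MSE and R², then derives a per-model R² from each printed MSE for comparison with the printed values. The MSE and R² inputs are copied from the output; the variable names are illustrative.

```python
# MSE values copied from the printed evaluation output.
mse = {
    "Multiple Linear Regression": 1754318687330.668,
    "Polynomial Regression": 1901686413946.449,
    "KNN Regression": 1953996997258.202,
    "Decision Tree Regression": 2551053758132.1855,
}
r2_mlr = 0.6529242642153176  # printed R2 for Multiple Linear Regression

# R2 = 1 - MSE / Var(y_test)  =>  Var(y_test) = MSE / (1 - R2)
var_y_test = mse["Multiple Linear Regression"] / (1 - r2_mlr)

# Apply the same relation to every model's MSE.
r2_from_mse = {name: 1 - value / var_y_test for name, value in mse.items()}
for name, r2 in r2_from_mse.items():
    print(f"{name}: R2 = {r2:.4f}")
```

A lower MSE always implies a higher R² here, so the two metrics should rank the models in the same order.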


Testing with Unseen Data

In [35]:

new_house = pd.DataFrame(
    data=np.zeros((1, X.shape[1])),
    columns=X.columns
)


new_house['area'] = 1200
new_house['bedrooms'] = 3
new_house['bathrooms'] = 2
new_house['stories'] = 1


new_house['mainroad'] = 1
new_house['guestroom'] = 0
new_house['basement'] = 0
new_house['airconditioning'] = 1
new_house['prefarea'] = 0


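new_house['furnishingstatus_semi-furnished'] = 0
new_house['furnishingstatus_unfurnished'] = 1

# hotwaterheating and parking keep their initial value of 0.
# Scale with the scaler fitted on the training data, then predict.
new_house_scaled = scaler.transform(new_house)
predicted_price = mlr_model.predict(new_house_scaled)
print("Predicted price (Multiple Linear Regression):", predicted_price[0])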


Results

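The dataset consists of 545 records and 13 input features after encoding, with price as the target variable. The data was split into 80% training (436 records) and 20% testing (109 records). Features were standardized for the linear, polynomial, and KNN models; the decision tree was trained on the unscaled features. On the test set, Multiple Linear Regression had the lowest errors (MAE ≈ 970,043; RMSE ≈ 1,324,507; R² ≈ 0.653), followed by KNN on MAE and Polynomial Regression on RMSE, while the Decision Tree had the highest errors (MAE ≈ 1,194,974; RMSE ≈ 1,597,202).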

Conclusion

This project applied four regression techniques to house price prediction. Among the models tested, Multiple Linear Regression had the lowest test-set MAE and RMSE, which makes it an effective baseline that is also simple, interpretable, and stable.