
# **House Pricing Prediction**

### **Import libraries**

In [38]:
import pandas as pd
import numpy as np

**Load dataset**

In [52]:
!wget -O ph_house.csv https://raw.githubusercontent.com/CLcosep/House_Price_Prediction/refs/heads/main/ph_house.csv

--2025-01-03 10:20:29--  https://raw.githubusercontent.com/CLcosep/House_Price_Prediction/refs/heads/main/ph_house.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 480 [text/plain]
Saving to: ‘ph_house.csv’


2025-01-03 10:20:29 (25.5 MB/s) - ‘ph_house.csv’ saved [480/480]



In [39]:
# read csv file
data = pd.read_csv('ph_house.csv')
print(data.head())

      Location  Size (sqm)  Bedrooms  Bathrooms  Year Built  Lot Size (sqm)  \
0       Makati          50         1          1        2015              75   
1  Quezon City          80         2          2        2010             100   
2        Pasig         120         3          3        2005             150   
3       Taguig         200         4          3        2018             250   
4    Cebu City          90         3          2        2012             120   

   Price (PHP)  
0      6500000  
1      8500000  
2     12500000  
3     18000000  
4     10000000  


**Split the dataset into X and y variables**

In [40]:
# Derive the house age from the year built
current_year = 2025
data['Age'] = current_year - data['Year Built']

features = ['Location', 'Size (sqm)', 'Bedrooms', 'Lot Size (sqm)', 'Age']
X = data[features]  # features
y = data['Price (PHP)']  # target

# One-hot encode Location; drop_first=True drops the first category, which becomes the baseline
X = pd.get_dummies(X, columns=['Location'], drop_first=True)
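
To make the effect of `drop_first=True` concrete, here is a small sketch on a made-up three-row frame (the `toy` frame and its values are illustrative and not taken from `ph_house.csv`). The alphabetically first category gets no column of its own and is represented by all remaining dummies being zero.

```python
import pandas as pd

# Toy frame; names and values are illustrative only.
toy = pd.DataFrame({
    "Location": ["Makati", "Pasig", "Taguig"],
    "Size (sqm)": [50, 120, 200],
})

# 'Makati' sorts first, so it is dropped and becomes the implicit baseline.
encoded = pd.get_dummies(toy, columns=["Location"], drop_first=True)
print(list(encoded.columns))
print(encoded)
```

The Makati row ends up with every `Location_*` column set to false, which is how the regression later distinguishes the baseline from the other locations.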

**Data split**

In [41]:
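# Split the data (train_test_split must be imported before it is used)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)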
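
The downloaded CSV is only 480 bytes, so a 20% hold-out leaves very few rows for testing, and the resulting scores can swing with the random seed. One common alternative is k-fold cross-validation. The sketch below runs on synthetic data rather than `ph_house.csv`; `X_demo`, `y_demo`, and the coefficients used to generate them are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data; values are illustrative only.
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({
    "Size (sqm)": rng.uniform(40, 200, size=20),
    "Bedrooms": rng.integers(1, 5, size=20),
})
noise = rng.normal(0, 500_000, size=20)
y_demo = 80_000 * X_demo["Size (sqm)"] + 400_000 * X_demo["Bedrooms"] + noise

# With 5 folds, each row is used for testing exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
neg_rmse = cross_val_score(
    LinearRegression(), X_demo, y_demo,
    cv=cv, scoring="neg_root_mean_squared_error",
)
rmse_per_fold = -neg_rmse  # scikit-learn reports the negated error
print("RMSE per fold:", rmse_per_fold)
print("Mean RMSE:", rmse_per_fold.mean())
```

Averaging over folds gives a steadier estimate than a single split, at the cost of fitting the model several times.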

**Import scikit-learn modules**

In [42]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

**Define and train the regression model**

In [43]:
# model training
model = LinearRegression()
model.fit(X_train, y_train)

print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

Coefficients: [ -62221.19563339  -28903.26113256  111777.41847168  -37778.80436661
 -368912.50043206 -755576.08733223 1277764.94541736  159990.7607005
  115497.55328645]
Intercept: 2764451.630566312


**Prediction results**

In [44]:
# run prediction
y_pred = model.predict(X_test)

# evaluate model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R2 Score: ", r2)

rmse = np.sqrt(mse)
print("Root Mean Squared Error: ", rmse)

Mean Squared Error: 12042239340599.031
R2 Score:  0.5993339960614
Root Mean Squared Error:  3470192.983192582
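
For reference, the three metrics can be reproduced by hand. MSE is in squared pesos, which is why it looks so large; RMSE is its square root and is in PHP, so it is easier to compare against actual prices. The sketch below uses made-up numbers (`y_true` and `y_hat` are illustrative, not the notebook's test split) and checks the manual formulas against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up prices in PHP, for illustration only.
y_true = np.array([5_000_000, 9_000_000, 12_000_000, 20_000_000], dtype=float)
y_hat = np.array([6_000_000, 8_000_000, 13_000_000, 18_000_000], dtype=float)

residuals = y_true - y_hat

# MSE: mean of the squared residuals
mse_manual = np.mean(residuals ** 2)
# RMSE: square root of the MSE, back in the target's units
rmse_manual = np.sqrt(mse_manual)
# R2: 1 minus (residual sum of squares / total sum of squares)
r2_manual = 1 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print("MSE:", mse_manual, "sklearn:", mean_squared_error(y_true, y_hat))
print("RMSE:", rmse_manual)
print("R2:", r2_manual, "sklearn:", r2_score(y_true, y_hat))
```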


**Test run**


In [45]:
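# Test run on the trained model
test_input = pd.DataFrame({
    # NOTE: data.head() lists 'Cebu City', not 'Cebu'. A label with no matching
    # training column is discarded by the reindexing step below, leaving all
    # Location dummies at 0 (the baseline category).
    'Location': ['Cebu'],
    'Size (sqm)': [112],
    'Bedrooms': [3],
    'Year Built': [2025],
    'Lot Size (sqm)': [120]
})

# Derive Age the same way as in training (0 here, since Year Built is 2025)
test_input['Age'] = current_year - test_input['Year Built']

test_input = pd.get_dummies(test_input, columns=['Location'])

# Add any training columns missing from the test input, filled with 0
for col in X.columns:
    if col not in test_input.columns:
        test_input[col] = 0

test_input = test_input[X.columns]  # Keep only the training columns, in training order

# Predict the price
predicted_price = model.predict(test_input)
print("Predicted Price (PHP):", predicted_price[0])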

Predicted Price (PHP): 9122258.152831446
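
The add-missing-columns-then-reorder step can be written more compactly with `DataFrame.reindex`, and it is worth reporting any location label the model never saw instead of dropping it silently. The helper below is a sketch: `align_to_training` and the `training_columns` list are hypothetical and only mimic the layout of `X.columns`.

```python
import pandas as pd

def align_to_training(raw, training_columns):
    """One-hot encode `raw` and match the training column layout.

    Returns the aligned frame plus the dummy columns that had no
    counterpart in training (for example, a misspelled location).
    """
    # No drop_first here: the training column list decides what is kept.
    encoded = pd.get_dummies(raw, columns=["Location"])
    unknown = [c for c in encoded.columns if c not in training_columns]
    # reindex adds missing columns (filled with 0), drops extras, and reorders.
    aligned = encoded.reindex(columns=training_columns, fill_value=0)
    return aligned, unknown

# Hypothetical training layout, standing in for list(X.columns).
training_columns = ["Size (sqm)", "Bedrooms", "Lot Size (sqm)", "Age",
                    "Location_Makati", "Location_Pasig"]

raw = pd.DataFrame({
    "Location": ["Cebu"],
    "Size (sqm)": [112],
    "Bedrooms": [3],
    "Lot Size (sqm)": [120],
    "Age": [0],
})

aligned, unknown = align_to_training(raw, training_columns)
print(aligned)
print("Unrecognized columns:", unknown)
```

A non-empty `unknown` list is a signal that the prediction falls back to the baseline location.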


**Save model**

In [49]:
# Save the model for reuse
from joblib import dump, load
dump(model, 'house_price_model.joblib')  # save model

['house_price_model.joblib']

In [50]:
# load saved model
loaded_model = load('house_price_model.joblib')
y_pred = loaded_model.predict(X_test)

**Run the saved model**

In [51]:
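# Run the saved model on new data
# New data (one house here; add more rows to score several houses)
new_data = pd.DataFrame({
    'Location': ['Makati'],
    'Size (sqm)': [450],
    'Bedrooms': [5],
    'Lot Size (sqm)': [315],
    'Age': [10]
})

# Preprocess the new data
# NOTE: with a single Location value, drop_first=True drops the only dummy column,
# so 'Makati' is not encoded and the row is scored as the baseline location.
new_data = pd.get_dummies(new_data, columns=['Location'], drop_first=True)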

# Add missing columns and reorder to match the training data
missing_cols = set(loaded_model.feature_names_in_) - set(new_data.columns)
for col in missing_cols:
    new_data[col] = 0
new_data = new_data[loaded_model.feature_names_in_]

# Predict house prices
predictions = loaded_model.predict(new_data)

# Print results
print("Predicted House Prices:", predictions)


Predicted House Prices: [9452496.06479387]
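
One way to avoid re-encoding new rows by hand is to save the encoder together with the model in a single scikit-learn `Pipeline`, so the loaded object accepts the raw `Location` strings. The sketch below is an alternative to the approach above, not the notebook's method; the training frame is synthetic, and `pipeline`, `restored`, and the file name `demo_pipeline.joblib` are hypothetical.

```python
import os
import tempfile

import pandas as pd
from joblib import dump, load
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic training data; values are illustrative only.
train = pd.DataFrame({
    "Location": ["Makati", "Pasig", "Taguig", "Makati", "Pasig", "Taguig"],
    "Size (sqm)": [50, 80, 120, 200, 90, 60],
    "Price (PHP)": [6_500_000, 8_000_000, 12_500_000,
                    21_000_000, 9_000_000, 7_000_000],
})

pipeline = Pipeline([
    ("encode", ColumnTransformer(
        # handle_unknown="ignore" encodes unseen locations as all zeros
        [("location", OneHotEncoder(handle_unknown="ignore"), ["Location"])],
        remainder="passthrough",   # keep the numeric columns as they are
        sparse_threshold=0,        # always produce a dense array
    )),
    ("regress", LinearRegression()),
])
pipeline.fit(train[["Location", "Size (sqm)"]], train["Price (PHP)"])

# Save and reload the whole pipeline, encoder included.
path = os.path.join(tempfile.mkdtemp(), "demo_pipeline.joblib")
dump(pipeline, path)
restored = load(path)

# Raw input: no manual get_dummies or column alignment needed.
new_house = pd.DataFrame({"Location": ["Makati"], "Size (sqm)": [100]})
print("Predicted price:", restored.predict(new_house))
```

Because the encoder remembers the categories it was fitted on, a single-row input is encoded the same way as a training row.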
