<a href="https://colab.research.google.com/github/Piripack/House-price-prediction/blob/main/Untitled22.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Property Price Prediction Project

## Project Overview

This project predicts average UK property prices for four property types (Detached, Semi-Detached, Terraced, and Flat) from historical data. The dataset covers property prices across UK regions, and the analysis focuses on data from 2005 onward. The goal is to showcase my skills in data preprocessing, feature engineering, machine learning model development, and evaluation.

## Table of Contents

- [Project Overview](#project-overview)
- [Data Cleaning & Preprocessing](#data-cleaning--preprocessing)
- [Feature Engineering](#feature-engineering)
- [Machine Learning Model](#machine-learning-model)
- [Model Evaluation](#model-evaluation)
- [Visualizations](#visualizations)
- [Conclusion](#conclusion)
- [Future Work](#future-work)

## Data Cleaning & Preprocessing

The dataset is loaded, cleaned, and preprocessed to remove missing values and irrelevant rows. The following steps are carried out:

1. **Convert Date to datetime format**: The 'Date' column is converted to the proper datetime format for easier manipulation.
2. **Remove Data Before 2005**: Data from years prior to 2005 is removed to focus on more recent trends.
3. **Rolling Averages**: A 12-month rolling average is calculated for each property type to smooth out fluctuations and identify long-term trends.
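
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: the helper name `clean_and_smooth`, the `_Rolling12` suffix, computing the average per region, and `min_periods=1` are all assumptions. The column names come from the dataset itself.

```python
import pandas as pd


def clean_and_smooth(df: pd.DataFrame,
                     price_col: str = "Detached_Average_Price") -> pd.DataFrame:
    """Parse dates, keep 2005 onward, and add a 12-month rolling average."""
    out = df.copy()

    # Step 1: convert the 'Date' column to datetime
    out["Date"] = pd.to_datetime(out["Date"])

    # Step 2: drop rows before 2005
    out = out[out["Date"] >= pd.Timestamp("2005-01-01")]

    # Step 3: 12-month rolling mean, computed separately for each region
    # (assumption: one row per region per month; min_periods=1 keeps the
    # first 11 months instead of leaving them as NaN)
    out = out.sort_values(["Region_Name", "Date"])
    out[f"{price_col}_Rolling12"] = (
        out.groupby("Region_Name")[price_col]
        .transform(lambda s: s.rolling(window=12, min_periods=1).mean())
    )
    return out.reset_index(drop=True)
```

Calling `clean_and_smooth(df, "Flat_Average_Price")` would apply the same smoothing to another property type.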

## Feature Engineering

The dataset is enhanced by creating new features such as:
- **Rolling averages** for each property type (Detached, Semi-Detached, Terraced, Flat) to highlight long-term trends.
- **Regional Aggregation**: Regional average prices for property types are computed to examine the regional variations in property prices.
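
As a rough sketch of the regional aggregation, one option is a `groupby` on `Region_Name`. The helper name `regional_averages` and the choice to sort by detached price are assumptions made for illustration; the notebook may aggregate differently.

```python
import pandas as pd

# Average-price columns as they appear in the dataset
PRICE_COLS = [
    "Detached_Average_Price",
    "Semi_Detached_Average_Price",
    "Terraced_Average_Price",
    "Flat_Average_Price",
]


def regional_averages(df: pd.DataFrame) -> pd.DataFrame:
    """Mean price per region for each property type, highest detached price first."""
    return (
        df.groupby("Region_Name")[PRICE_COLS]
        .mean()
        .sort_values("Detached_Average_Price", ascending=False)
    )
```

The result has one row per region, which makes it easy to compare regions side by side or to plot as a bar chart.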

## Machine Learning Model

A Random Forest Regressor is trained on data from 2005 to 2007 and tested on 2008 data. The model predicts prices for detached houses, but the same methodology can be extended to the other property types. Performance is measured with common regression metrics: RMSE, MAE, and R².
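
A year-based train/test split like the one described could look roughly like this. The function name `train_time_split`, the `feature_cols` argument, and the hyperparameters (`n_estimators=100`, `random_state=42`) are assumptions for illustration; this README does not specify which features or settings the notebook uses.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def train_time_split(df: pd.DataFrame, feature_cols,
                     target_col: str = "Detached_Average_Price"):
    """Train on 2005 to 2007, predict 2008. Returns (model, y_true, y_pred)."""
    years = pd.to_datetime(df["Date"]).dt.year

    # Split by calendar year so the test set lies strictly after the training set
    train = df[years.between(2005, 2007)]  # inclusive on both ends
    test = df[years == 2008]

    # Assumed hyperparameters; rows with missing values should be dropped first
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(train[feature_cols], train[target_col])

    y_pred = model.predict(test[feature_cols])
    return model, test[target_col].to_numpy(), y_pred
```

Splitting by year rather than at random avoids leaking future prices into the training set, which matters for time-ordered data.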

## Model Evaluation

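The model is evaluated on the held-out test set (2008 data) using the following metrics:
- **RMSE (Root Mean Squared Error)**: The square root of the mean squared error, which penalizes large errors more heavily.
- **MAE (Mean Absolute Error)**: The average absolute difference between actual and predicted prices.
- **R² Score**: The proportion of variance in the actual prices that the model explains.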
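
For reference, the three metrics can be computed directly from their definitions. The helper `regression_metrics` below is a hypothetical illustration written with NumPy; the notebook may use scikit-learn's metric functions instead.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Compute RMSE, MAE, and R² from actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    rmse = float(np.sqrt(np.mean(residuals ** 2)))  # root of the mean squared error
    mae = float(np.mean(np.abs(residuals)))         # mean absolute error

    # R² = 1 - SS_res / SS_tot (undefined when every actual value is identical)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "r2": r2}
```

RMSE and MAE are expressed in the same units as the prices, while R² is unitless and can be negative when the model does worse than predicting the mean.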

## Visualizations

Several visualizations are included to showcase the trends and evaluation results, including:
- **Price Trend Over Time**: Showing the property price trends over time.
- **Feature Importance**: Visualizing the most important features used by the Random Forest model.
- **Actual vs Predicted Prices**: Scatter plot comparing actual vs predicted property prices.
- **Model Evaluation Metrics**: Bar chart comparing RMSE, MAE, and R² scores for the training and testing sets.
- **Residual Plot**: Visualizing the residuals (errors) of the model to check for any patterns.
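
Two of these plots, actual vs predicted and the residual plot, could be drawn along these lines. The function name `plot_diagnostics`, the figure size, and the axis labels are assumptions; this is a sketch with Matplotlib, not the notebook's plotting code.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_diagnostics(y_true, y_pred):
    """Draw actual-vs-predicted and residual scatter plots side by side."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(12, 5))

    # Left: points on the dashed diagonal are perfect predictions
    ax_fit.scatter(y_true, y_pred, alpha=0.6)
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    ax_fit.plot([lo, hi], [lo, hi], linestyle="--", color="gray")
    ax_fit.set_xlabel("Actual price")
    ax_fit.set_ylabel("Predicted price")
    ax_fit.set_title("Actual vs Predicted Prices")

    # Right: residuals should scatter around zero without a visible pattern
    ax_res.scatter(y_pred, residuals, alpha=0.6)
    ax_res.axhline(0.0, linestyle="--", color="gray")
    ax_res.set_xlabel("Predicted price")
    ax_res.set_ylabel("Residual (actual - predicted)")
    ax_res.set_title("Residual Plot")

    fig.tight_layout()
    return fig
```

In a notebook, the returned figure is displayed automatically; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.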

## Conclusion

This project demonstrates the application of data preprocessing, feature engineering, and machine learning techniques to predict property prices. The model performed well with good evaluation scores and provides a solid foundation for further exploration and improvements.

## Future Work

Future work includes:
- **Hyperparameter tuning** to optimize the Random Forest model.
- **Expanding the model** to predict prices for other property types.
- **Time series forecasting** methods to predict future property prices.
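
As one possible starting point for the hyperparameter tuning item, a grid search with time-ordered cross-validation might look like this. The function name `tune_random_forest`, the parameter grid, and the number of splits are illustrative assumptions, not tuned recommendations. The rows of `X` and `y` are assumed to be sorted by date.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit


def tune_random_forest(X, y):
    """Grid-search a small Random Forest grid using time-ordered CV folds."""
    # Hypothetical grid, kept small so the search runs quickly
    param_grid = {
        "n_estimators": [50, 100],
        "max_depth": [None, 5],
    }
    search = GridSearchCV(
        RandomForestRegressor(random_state=42),
        param_grid,
        # Each fold trains on earlier rows and validates on later ones
        cv=TimeSeriesSplit(n_splits=3),
        # scikit-learn maximizes scores, so RMSE is negated
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X, y)
    return search
```

After fitting, `search.best_params_` holds the selected settings and `search.best_estimator_` is the model refit on all the supplied data.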


# Project Code

```python
import pandas as pd

# Load the dataset
file_path = 'Average-prices-Property-Type-2023-12.csv'
df = pd.read_csv(file_path)

# Show the first few rows of the dataset to understand its structure
df.head()
```


| Unnamed: 0 | Date | Region_Name | Area_Code | Detached_Average_Price | Detached_Index | Detached_Monthly_Change | Detached_Annual_Change | Semi_Detached_Average_Price | Semi_Detached_Index | Semi_Detached_Monthly_Change | Semi_Detached_Annual_Change | Terraced_Average_Price | Terraced_Index | Terraced_Monthly_Change | Terraced_Annual_Change | Flat_Average_Price | Flat_Index | Flat_Monthly_Change | Flat_Annual_Change |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1995-01-01 | England | E92000001 | 86314.15895 | 28.257874 | NaN | NaN | 51533.22543 | 27.436474 | NaN | NaN | 41489.82431 | 25.279664 | NaN | NaN | 45218.54082 | 23.762969 | NaN | NaN |
| 1 | 1995-01-01 | Wales | W92000004 | 66539.58684 | 32.491063 | NaN | NaN | 41043.45436 | 31.399881 | NaN | NaN | 32506.88477 | 30.777231 | NaN | NaN | 34061.27288 | 34.448112 | NaN | NaN |
| 2 | 1995-01-01 | Inner London | E13000001 | 194483.5365 | 16.399257 | NaN | NaN | 121073.17 | 15.327414 | NaN | NaN | 87553.48096 | 14.627111 | NaN | NaN | 73707.69351 | 15.492239 | NaN | NaN |
| 3 | 1995-01-01 | Outer London | E13000002 | 160329.9602 | 22.303302 | NaN | NaN | 94802.27143 | 21.065017 | NaN | NaN | 70087.65516 | 20.040752 | NaN | NaN | 58266.86811 | 21.764751 | NaN | NaN |
| 4 | 1995-01-01 | London | E12000007 | 161449.3055 | 21.715622 | NaN | NaN | 95897.5293 | 20.321394 | NaN | NaN | 73705.96582 | 18.023197 | NaN | NaN | 64618.57236 | 17.858341 | NaN | NaN |
