SG-Property-Price-Prediction

About Project

In this personal project, I attempt the different stages of an end-to-end data science project. This project uses a dataset from the Urban Redevelopment Authority (URA) API. It contains over 100,000 private property transactions between 2017-2022 in Singapore.

Summary of Project

Outline:

Acquiring Dataset from REST API (see API_data_extraction.py)
Data Processing, Feature Extraction & Exploratory Data Analysis (see Exploratory_Data_Analysis.ipynb)
Model Training, Experimentation and Selection (see Model_selection_experimentation.ipynb)
Created an app on Streamlit that allows users to try generating predictions for different inputs using the model. (Clone repo and run streamlit run app.py)

Key Findings:

After pivoting the project to only focus on Condominiums with lease within 100 years, prediction accuracy improved significantly (MAE reduced by 82.4%). Focusing only on landed properties did not achieve any improvement. This could mean the prices of landed property rely more heavily on other features not considered in this dataset.
Best performing model type: After testing Tree-based (Random Forest and XGBoost), Linear (Ridge Regression) and Neural Network (Multi-layer Perceptron), Random Forest model performed the best.
Limitations: Some properties are highly unlikely to be transacted (E.g. Units that are too old), and will not be represented in the dataset. The final model trained did not perform well in predicting properties that are not well represented by data points (properties with lease left < 40 years). Due to this limitation, the model cannot perform reliably in extrapolating the value of property over time.

Possible future iterations:

Merging of dataset with data from OneMap API, to get additional features such as proximity of property to facilities and amenities.

Personal reflections:

Completing this project allowed me to get my hands dirty with multiple python data science packages. Finding a dataset myself instead of picking one from Kaggle allowed me to learn methods of obtaining data (tried webscraping and APIs), and methods of cleaning and processing the data myself.
While the decisions I made during the data processing and model training and selection may not be the best, they allowed me to practice searching the web and learning from other projects and articles to decide the next step I should try.

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
Data		Data
.gitattributes		.gitattributes
API_data_extraction.py		API_data_extraction.py
Exploratory_Data_Analysis.ipynb		Exploratory_Data_Analysis.ipynb
Model_selection_experimentation.ipynb		Model_selection_experimentation.ipynb
README.md		README.md
app.py		app.py
best_model.pkl		best_model.pkl
final_features.pkl		final_features.pkl
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SG-Property-Price-Prediction

About Project

Summary of Project

Outline:

Key Findings:

Possible future iterations:

Personal reflections:

About

Languages

RoydonTay/SG-Property-Price-Prediction

Folders and files

Latest commit

History

Repository files navigation

SG-Property-Price-Prediction

About Project

Summary of Project

Outline:

Key Findings:

Possible future iterations:

Personal reflections:

About

Topics

Resources

Stars

Watchers

Forks

Languages