Analysing the Condominium Market in Singapore

This repo contains the code for our final project in CS5228 at the School of Computing, National University of Singapore.

Our team is named Data Nerds and our members are Aiden Low, Raivat Shah and Reshma Jawale.

System Requirements

Python v3.9 and above
Jupyter Notebook (you may already have it if you use Anaconda; if not, check this guide to install it)
If you plan to run this on a remote machine, prioritise CPU over memory (RAM) or GPU when picking a machine: our evidence suggests that RAM and GPU are much less important to this code than the CPU. We trained using a CPU-optimised Droplet from DigitalOcean.
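To confirm that an interpreter meets the Python 3.9 requirement before installing anything, a quick check along the following lines can help. This is only a sketch: the names `MIN_PYTHON` and `is_supported` are illustrative and are not part of this repo.

```python
import sys

# Minimum version stated in the requirements above.
MIN_PYTHON = (3, 9)


def is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the running interpreter's version and whether it qualifies.
current = ".".join(str(part) for part in sys.version_info[:3])
print(f"Python {current} supported: {is_supported()}")
```

Run it with the same `python` executable you intend to use for the virtual environment.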

Setup

Clone the repo (replace the link if you prefer HTTPS links):

git clone git@github.com:aidenywl/data-mining-property-prices.git 
cd data-mining-property-prices

Set up a virtual environment and install the required libraries:

python -m venv env
source env/bin/activate 
pip install -r requirements.txt

Project Structure

All the code is under the src folder. We have two types of code in this project: library code and notebooks.

Notebooks:

We organize our notebooks around the tasks we complete in our report. Notebooks are in /src/. Each notebook contains all of its model code in a single file.

Approach 1.ipynb: contains the code for approach 1
Approach 2A.ipynb: contains the code for approach 2A
Approach 2B.ipynb: contains the code for approach 2B
Approach 3A.ipynb: contains the code for approach 3A
Approach 3B.ipynb: contains the code for approach 3B
EDA.ipynb: contains the code for our exploratory data analysis and for generating the charts used in our report
feature_analysis.ipynb: contains the code for feature analysis

Library Code:

We abstract common functions used across the notebooks into a library package under /src/. Individual files are in /src/library_code.

auxiliary.py contains code for processing the auxiliary data
cleaning.py contains code for cleaning the primary dataset
constants.py contains the constant that stores the column names we ignore
imputation.py contains the code for imputation of values

Data:

All the data is found in /data/.

test.csv and train.csv are part of the primary dataset and are respectively the test and training sets.
/data/additional_data contains the data for consumer price index.
/data/auxiliary-data contains the data for distances to prominent places in Singapore
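As a quick sanity check that the primary dataset is where the layout above says it is, the sketch below builds the paths to train.csv and test.csv and reads the header row of a CSV file. The helper names `dataset_paths` and `read_header` are hypothetical and not part of the library code; only the file locations come from this README.

```python
import csv
from pathlib import Path


def dataset_paths(repo_root):
    """Return the primary dataset file locations under <repo_root>/data/."""
    data_dir = Path(repo_root) / "data"
    return {"train": data_dir / "train.csv", "test": data_dir / "test.csv"}


def read_header(csv_path):
    """Return the column names from the first row of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle))
```

For example, from the repository root, `read_header(dataset_paths(".")["train"])` would list the training set's columns without loading the whole file.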

Predictions:

By default, all predictions are stored in /predictions/ to keep them clearly separate from other types of files.
