RealValue is a machine learning project for predicting home prices in Toronto. Using TensorFlow convolutional neural networks in conjunction with a dense network component, owners can take a couple of pictures of their home, enter a few simple details and they will be provided an accurate price range of what their home is worth. This ease of use allows homeowners to be confident about their residential decisions and be more informed about the real estate market than ever before.
For more details about our project, please take a look at our Medium article .
Also, check out our website at real-value.ca to try out some of our algorithms!
Motivation and Goal
In the field of real estate, the idea of predicting the "right price" for a property is growing heavily in interest. Most current algorithms solely use statistical information about given properties as a form of input to predict its right price. However, these algorithms fail to include a notable form of data that often influences the perception of a buyer: visual data of the house. Recently, convolutional neural networks (CNNs) have increased in prominence for their ability to generate strong feature representations out of images and use those representations to accurately map visual inputs to scalar/vectorized outputs.
Our goal was to create a custom convolutional neural network to accurately predict Toronto housing prices with less than 20% error.
Features and Overview
- Combined CNN and dense network model
- Easy to swap CNN model architectures
- Easy to change dense network size
- Transfer learning using California and Toronto housing datasets
- California Dataset
- Also modified dataset to use latitude/longitude values in place of postal codes
- Custom Toronto Dataset we collected in February 2021 (157 houses)
- Image data augmentation (crop, rotate, mirroring, saturation, brightness)
- Inputs to Network:
- 2x2 Mosaic image (bedroom, bathroom, kitchen, frontal view)
- Price, Number of bedrooms, bathrooms, square feet, and postal code
- Configurable training (hyperparameters, model architecture) with
Installation and Quick Setup
To download our code:
git clone “url”
tensorflow 2.3.0, matplotlib, opencv-python, numpy, pandas, keras, sklearn
To install the dependent packages, run:
pip install -r requirements.txt
We initially trained our network on a dataset of California houses, created by Ahmed and Moustafa, consisting of both structured data (statistical property information in tabular form) and unstructured data (images). This dataset contains information for over 500 houses, each with 4 images of a bathroom, bedroom, kitchen, and frontal view. Statistical information for each house includes the number of bedrooms, bathrooms, square footage, postal code, and price.
We created our own Toronto real estate dataset by compiling the images, prices, number of rooms, surface areas, and postal codes of houses on the Toronto Regional Real Estate Board (TRREB) website. A Python script was used to accurately calculate the area of the house from provided measurements of each room. Like the California dataset, in our Toronto dataset, we had four images for each house for the frontal view, bedroom, bathroom, and kitchen.
Training the model with
Our model’s hyperparameters are stored in a config.yaml file. To start training, modify the
config.yaml if needed and issue the following command
Since data augmentation can take considerable time, we can set the
config.yaml to skip augmentation to start training immediately.
On the first run, set
import_mode: False in
config.yaml to perform data augmentation. On future runs, you can set
import_mode: True to skip data augmentation and use previous augmented data. You can always use
import_mode: False without issues; it just might be slower.
Note: If you switch/modify the dataset or augmentation multiplier, make sure to use
import_mode: True for the first run.
Adding/choosing different model architectures and other hyperparameters
To change hyperparameters like learning rate, optimizer, etc change the parameters on the corresponding lines in the
In particular, the CNN model and dense model layers are set by the following lines
# Train using RegNet as CNN and a 2 layer dense network (8 units in first layer, 4 units in second layer) CNN_model: 'RegNet' dense_model: - 8 - 4
The number of dense layers and their size can be changed using
Changing the CNN network is more involved, but still straightforward. If you want to add your
CustomNet, follow the instructions below. As a basic working example, check out how we defined
LeNet as a CNN in
models/CNN_models/lenet.py and then used it in
Define a function that returns your custom CNN as a
tf.keras.Model in a new file at
models/__init__.py to call your new function with your custom CNN
config.yaml to have `CNN_model: 'CustomNet'
Initial training on California Dataset
To train on the California dataset, specify
directory: 'raw_dataset' in the first line of the
config.yaml file. The California dataset is located in the
Transfer learning on Toronto Dataset
To apply transfer learning on the Toronto Dataset, specify
directory: 'toronto_raw_dataset' in the first line of the
config.yaml file. The Toronto dataset is located in the
Remember to set
import_mode: False in between switching datasets.
We achieved a test error of 23% using a Zip Code approach on the California dataset, and a test error of 17% using a Latitude and Longitude approach.
The Zip Code accuracy is nearly 4% better compared to contemporary approaches such as https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/.