TITLE - What machine learning can tell us about the future of housing markets across Canada

AIM: This project aims to analyze how the housing market may evolve, province by province.
To do so, we collected new housing price index data across Canada, covering January 1981 to October 2024, from the following Kaggle dataset: https://www.kaggle.com/datasets/noeyislearning/housing-price-indexes
We narrowed the analysis to the most recent data, from January 2018 onward, which will first be explored through visualizations. This data will then be fed to a machine learning model to gain insight into how prices may change in the future.
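The machine learning model itself is not shown in this section. As a hedged illustration only, a straight-line trend is a common baseline that any forecasting model should beat. The sketch below fits one with NumPy's `polyfit`; `linear_trend_forecast` is a hypothetical helper, not the project's model, and it assumes a series of consecutive months with no missing values.

```python
import numpy as np
import pandas as pd


def linear_trend_forecast(series: pd.Series, steps: int) -> pd.Series:
    """Fit a straight line to a monthly index series and extrapolate `steps` months.

    Assumes `series` holds consecutive months in order, indexed by 'YYYY-MM'
    strings (or monthly Periods), with no missing values.
    """
    # Use the month position 0, 1, 2, ... as the only feature
    x = np.arange(len(series))
    slope, intercept = np.polyfit(x, series.to_numpy(dtype=float), deg=1)

    # Positions and month labels for the forecast horizon
    future_x = np.arange(len(series), len(series) + steps)
    future_months = pd.period_range(series.index[-1], periods=steps + 1, freq="M")[1:]
    return pd.Series(intercept + slope * future_x, index=future_months)
```

For example, the Canada "Total (house and land)" rows of the cleaned data, indexed by Date, could be passed in with `steps=12`. A linear trend ignores seasonality and turning points, so it serves only as a reference point for the real model.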

The cleaned data was stored in a PostgreSQL database using pgAdmin 4.
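The project loaded the data through pgAdmin 4. A programmatic alternative, sketched below under the assumption that pandas is used for the upload, is `DataFrame.to_sql`. The helper `store_in_sql` and the table name `housing_index` are hypothetical. The demo uses an in-memory SQLite database so it runs anywhere; for PostgreSQL you would pass a SQLAlchemy engine instead, for example one from `create_engine("postgresql+psycopg2://user:password@localhost:5432/housing_db")`, where the credentials and database name are placeholders.

```python
import sqlite3

import pandas as pd


def store_in_sql(df: pd.DataFrame, con, table_name: str = "housing_index") -> int:
    """Write `df` to a SQL table (replacing any existing one) and return its row count.

    `con` may be a sqlite3 connection or a SQLAlchemy engine.
    """
    df.to_sql(table_name, con, if_exists="replace", index=False)
    count = pd.read_sql(f'SELECT COUNT(*) AS n FROM "{table_name}"', con)
    return int(count["n"].iloc[0])


# Demo: two rows taken from the cleaned output, written to an in-memory SQLite database
con = sqlite3.connect(":memory:")
demo = pd.DataFrame({
    "Date": ["2018-01", "2018-01"],
    "Geography": ["Canada", "Canada"],
    "New housing price indexes": ["Total (house and land)", "House only"],
    "Index Value": [103.3, 103.2],
})
print(store_in_sql(demo, con))  # 2
```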

VISUALIZATIONS: 1. 

PART 1 - DATA CLEANING

The dataset in the "Resources" folder is cleaned so that it contains only the information needed for the analysis: four columns and the rows from January 2018 onward. The cleaned version is saved to the "Cleaned Resources" folder.

In [1]:
import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt

In [3]:
# Load the CSV file of new housing price indexes across Canada
housing_index_path = Path("Resources/housing_price.csv")
housing_index = pd.read_csv(housing_index_path, encoding='ISO-8859-1')

In [4]:
housing_index.head()

,REF_DATE,GEO,DGUID,New housing price indexes,UOM,UOM_ID,SCALAR_FACTOR,SCALAR_ID,VECTOR,COORDINATE,VALUE,STATUS,SYMBOL,TERMINATED,DECIMALS
0,1981-01,Canada,2016A000011124,Total (house and land),"Index, 201612=100",347,units,0,v111955442,1.1,38.2,,,,1
1,1981-01,Canada,2016A000011124,House only,"Index, 201612=100",347,units,0,v111955443,1.2,36.1,,,,1
2,1981-01,Canada,2016A000011124,Land only,"Index, 201612=100",347,units,0,v111955444,1.3,40.6,E,,,1
3,1981-01,Atlantic Region,2016A00011,Total (house and land),"Index, 201612=100",347,units,0,v111955445,2.1,,..,,,1
4,1981-01,Atlantic Region,2016A00011,House only,"Index, 201612=100",347,units,0,v111955446,2.2,,..,,,1


In [5]:
housing_index.columns

Index(['REF_DATE', 'GEO', 'DGUID', 'New housing price indexes', 'UOM',
       'UOM_ID', 'SCALAR_FACTOR', 'SCALAR_ID', 'VECTOR', 'COORDINATE', 'VALUE',
       'STATUS', 'SYMBOL', 'TERMINATED', 'DECIMALS'],
      dtype='object')

In [7]:
# Keep only the columns needed for the analysis
columns_kept = ["REF_DATE", "GEO", "New housing price indexes", "VALUE"]
housing_index_cleaned = housing_index[columns_kept].copy()

housing_index_cleaned.head()

,REF_DATE,GEO,New housing price indexes,VALUE
0,1981-01,Canada,Total (house and land),38.2
1,1981-01,Canada,House only,36.1
2,1981-01,Canada,Land only,40.6
3,1981-01,Atlantic Region,Total (house and land),
4,1981-01,Atlantic Region,House only,
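The same column selection can also be done while reading the file, so the 11 unused columns are skipped during parsing rather than dropped afterwards. The sketch below uses the `usecols` parameter of `pd.read_csv`; the helper name `load_needed_columns` is hypothetical, and an in-memory copy of two rows from the raw preview stands in for the real file.

```python
import io

import pandas as pd

# Same selection as `columns_kept` in the cell above
COLUMNS_KEPT = ["REF_DATE", "GEO", "New housing price indexes", "VALUE"]


def load_needed_columns(source) -> pd.DataFrame:
    """Read only the needed columns; the others are skipped at parse time."""
    return pd.read_csv(source, usecols=COLUMNS_KEPT, encoding="ISO-8859-1")


# Two rows copied from the raw preview, used as an in-memory stand-in for the CSV file
SAMPLE = (
    "REF_DATE,GEO,DGUID,New housing price indexes,UOM,UOM_ID,SCALAR_FACTOR,"
    "SCALAR_ID,VECTOR,COORDINATE,VALUE,STATUS,SYMBOL,TERMINATED,DECIMALS\n"
    '1981-01,Canada,2016A000011124,Total (house and land),"Index, 201612=100",'
    "347,units,0,v111955442,1.1,38.2,,,,1\n"
    '1981-01,Atlantic Region,2016A00011,Total (house and land),"Index, 201612=100",'
    "347,units,0,v111955445,2.1,,..,,,1\n"
)

demo = load_needed_columns(io.BytesIO(SAMPLE.encode("ISO-8859-1")))
print(demo)
```

On the real data, the call would be `load_needed_columns(Path("Resources/housing_price.csv"))`.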


In [11]:
# Rename columns to shorter, more readable names
housing_index_renamed = housing_index_cleaned.rename(columns={"REF_DATE": "Date", "GEO": "Geography", "VALUE": "Index Value"})

housing_index_renamed.head()

,Date,Geography,New housing price indexes,Index Value
0,1981-01,Canada,Total (house and land),38.2
1,1981-01,Canada,House only,36.1
2,1981-01,Canada,Land only,40.6
3,1981-01,Atlantic Region,Total (house and land),
4,1981-01,Atlantic Region,House only,
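The preview shows that the earliest Atlantic Region rows have no index value (the raw file flags them with a `..` status). Before plotting or modelling, it can help to count such gaps per geography. The sketch below does that with a grouped sum of `isna()`; the helper name `missing_by_geography` is hypothetical, and the demo uses the five rows shown above.

```python
import numpy as np
import pandas as pd


def missing_by_geography(df: pd.DataFrame) -> pd.Series:
    """Count rows with no index value per geography, most gaps first."""
    is_missing = df["Index Value"].isna()
    return is_missing.groupby(df["Geography"]).sum().sort_values(ascending=False)


# The five rows shown in the preview above
preview = pd.DataFrame({
    "Date": ["1981-01"] * 5,
    "Geography": ["Canada"] * 3 + ["Atlantic Region"] * 2,
    "New housing price indexes": ["Total (house and land)", "House only", "Land only",
                                  "Total (house and land)", "House only"],
    "Index Value": [38.2, 36.1, 40.6, np.nan, np.nan],
})
print(missing_by_geography(preview))
```

Running the same function on the filtered dataset shows whether any series still has gaps after the date cut-off.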


In [12]:
# Keep rows from January 2018 onward (the data runs to 2024-10).
# Comparing 'YYYY-MM' strings works here because the months are zero-padded.
housing_index_final = housing_index_renamed[housing_index_renamed["Date"] >= "2018-01"].reset_index(drop=True)
housing_index_final

,Date,Geography,New housing price indexes,Index Value
0,2018-01,Canada,Total (house and land),103.3
1,2018-01,Canada,House only,103.2
2,2018-01,Canada,Land only,103.7
3,2018-01,Atlantic Region,Total (house and land),100.3
4,2018-01,Atlantic Region,House only,100.3
...,...,...,...,...
9835,2024-10,"Vancouver, British Columbia",House only,125.0
9836,2024-10,"Vancouver, British Columbia",Land only,123.5
9837,2024-10,"Victoria, British Columbia",Total (house and land),119.3
9838,2024-10,"Victoria, British Columbia",House only,124.8
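The string comparison in the cell above is correct only because every Date is a zero-padded 'YYYY-MM' string. Parsing the column into real dates is more robust if the format ever varies, and gives proper dates for later time-series plots and models. A minimal sketch with `pd.to_datetime`, where `filter_from` is a hypothetical helper and the demo values are toy numbers:

```python
import pandas as pd


def filter_from(df: pd.DataFrame, start: str = "2018-01") -> pd.DataFrame:
    """Keep rows whose Date is on or after `start`, comparing parsed dates."""
    dates = pd.to_datetime(df["Date"], format="%Y-%m")
    return df[dates >= pd.Timestamp(start)].reset_index(drop=True)


# Toy rows: only the last two fall on or after 2018-01
demo = pd.DataFrame({"Date": ["2017-12", "2018-01", "2024-10"], "Index Value": [1.0, 2.0, 3.0]})
print(filter_from(demo))
```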


In [13]:
# Save the cleaned dataset as a CSV file in the "Cleaned Resources" folder
output_dir = Path("Cleaned Resources")
output_dir.mkdir(exist_ok=True)
housing_index_final.to_csv(output_dir / "housing_index_final.csv", index=False)

PART 2 - VISUALIZATIONS

The cleaned dataset is explored and analyzed through a variety of visualizations.
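As a hedged starting point for these visualizations, the sketch below pivots the long-format data so each geography becomes one column, then draws one line per geography with pandas' built-in matplotlib plotting. The helper name `plot_index` and the axis labels are assumptions; the y-axis unit comes from the raw UOM column ("Index, 201612=100").

```python
import pandas as pd


def plot_index(df: pd.DataFrame, geographies, component: str = "Total (house and land)"):
    """Draw one line per geography for the chosen index component; return the Axes."""
    rows = df[
        (df["New housing price indexes"] == component)
        & df["Geography"].isin(geographies)
    ]
    # One row per month, one column per geography
    wide = rows.pivot(index="Date", columns="Geography", values="Index Value")
    ax = wide.plot(figsize=(10, 5))
    ax.set_xlabel("Month")
    ax.set_ylabel("Index (201612=100)")
    ax.set_title(f"New housing price index: {component}")
    return ax
```

For example, `plot_index(housing_index_final, ["Canada", "Vancouver, British Columbia", "Victoria, British Columbia"])` followed by `plt.show()` would compare the national index with the two British Columbia cities.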