We analyze the structure of the Canadian real estate market with affinity propagation, an unsupervised clustering technique.
We group local real estate markets that show similar historical price fluctuations. The final result is a market map revealing zones of strength and riskier areas for real estate investment.
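
As a rough illustration of the clustering idea, the sketch below runs scikit-learn's `AffinityPropagation` on synthetic data. Everything in it is an assumption made for illustration: the six hypothetical markets, the two made-up latent trends, and the choice of correlation between monthly changes as the similarity measure. It is not the pipeline used later in this notebook.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)

# Hypothetical data: 6 markets x 120 monthly changes, driven by two latent trends
trend_a = rng.normal(size=120)
trend_b = rng.normal(size=120)
changes = np.vstack(
    [trend_a + 0.1 * rng.normal(size=120) for _ in range(3)]
    + [trend_b + 0.1 * rng.normal(size=120) for _ in range(3)]
)

# Assumed similarity measure: correlation between the markets' monthly changes
similarity = np.corrcoef(changes)

# Affinity propagation picks the number of clusters itself;
# affinity="precomputed" tells it that we pass a similarity matrix
model = AffinityPropagation(affinity="precomputed", random_state=0)
labels = model.fit_predict(similarity)

print(labels)  # one cluster label per market
```

Unlike k-means, the number of clusters is not fixed in advance. It is influenced by the `preference` parameter, which defaults to the median of the input similarities.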

Data was downloaded from the Government of Canada open data portal: https://open.canada.ca/data/en.
We also downloaded data from https://stats.crea.ca/en-CA/ and https://www.crea.ca/housing-market-stats/mls-home-price-index/,
where HPI (Home Price Index) data is available. According to the StatCan website, the Home Price Index is the most advanced and accurate tool
to gauge home prices and trends.

(All this data might be used in the near future for an ML prediction bot of Canadian real estate price levels, along with
unemployment, GDP growth, real interest rate, inflation % change, and cost of materials/raw goods.)

In [2]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt

from sklearn import cluster, covariance, manifold
import seaborn as sns

In [3]:
# After comparing all the downloaded CSV files, we notice that the file a_18100205.csv, downloaded here:
# https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1810020502&pickMembers%5B0%5D=1.1&cubeTimeFrame.startMonth=01&cubeTimeFrame.startYear=1982&referencePeriods=19820101%2C19820101
# contains more than 50000 lines with data going back to 1981, whereas the other files contain fewer than 1000 lines of data.
# Therefore, this is the CSV we will work with.

data = pd.read_csv(r'C:\Users\hp\Desktop\Projects Coding\Affinity_Propagation_Canada_Real_Estate_Market\Real Estate Data\a_18100205.csv')

# VALUE is an index with base period December 2016 (201612 = 100)

# Previewing the data
data.head(20)

Unnamed: 0,REF_DATE,GEO,DGUID,New housing price indexes,UOM,UOM_ID,SCALAR_FACTOR,SCALAR_ID,VECTOR,COORDINATE,VALUE,STATUS,SYMBOL,TERMINATED,DECIMALS
0,1981-01,Canada,2016A000011124,Total (house and land),"Index, 201612=100",347,units,0,v111955442,1.1,38.2,,,,1
1,1981-01,Canada,2016A000011124,House only,"Index, 201612=100",347,units,0,v111955443,1.2,36.1,,,,1
2,1981-01,Canada,2016A000011124,Land only,"Index, 201612=100",347,units,0,v111955444,1.3,40.6,E,,,1
3,1981-01,Atlantic Region,2016A00011,Total (house and land),"Index, 201612=100",347,units,0,v111955445,2.1,,..,,,1
4,1981-01,Atlantic Region,2016A00011,House only,"Index, 201612=100",347,units,0,v111955446,2.2,,..,,,1
5,1981-01,Atlantic Region,2016A00011,Land only,"Index, 201612=100",347,units,0,v111955447,2.3,,..,,,1
6,1981-01,Newfoundland and Labrador,2016A000210,Total (house and land),"Index, 201612=100",347,units,0,v111955448,3.1,,..,,,1
7,1981-01,Newfoundland and Labrador,2016A000210,House only,"Index, 201612=100",347,units,0,v111955449,3.2,,..,,,1
8,1981-01,Newfoundland and Labrador,2016A000210,Land only,"Index, 201612=100",347,units,0,v111955450,3.3,,..,,,1
9,1981-01,"St. John's, Newfoundland and Labrador",2011S05031,Total (house and land),"Index, 201612=100",347,units,0,v111955451,4.1,36.1,,,,1
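
The preview shows that some locations have no VALUE in 1981 (their STATUS is `..`). Because clustering needs series that cover a common period, it can help to count the gaps per location before going further. The sketch below is a minimal check on six rows copied from the preview. The names `sample` and `missing_per_geo` are introduced here for illustration only.

```python
import pandas as pd

# Six rows copied from the preview above (only the columns needed here)
sample = pd.DataFrame({
    "REF_DATE": ["1981-01"] * 6,
    "GEO": ["Canada"] * 3 + ["Atlantic Region"] * 3,
    "New housing price indexes": ["Total (house and land)", "House only", "Land only"] * 2,
    "VALUE": [38.2, 36.1, 40.6, None, None, None],
})

# Parse the "YYYY-MM" strings into timestamps (first day of each month)
sample["REF_DATE"] = pd.to_datetime(sample["REF_DATE"], format="%Y-%m")

# Count missing index values per location
missing_per_geo = sample["VALUE"].isna().groupby(sample["GEO"]).sum()
print(missing_per_geo)
```

On the full dataframe the same two lines would apply to `data` directly, assuming the column names shown in the preview.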


In [4]:
"""
Each region is assigned an HPI value on a monthly basis since January 1981.
The GEO column contains strings naming regions, cities, or whole provinces; we extract its unique values here:
"""

# Extract the unique string values in the GEO column
locations = data.GEO.unique()

#Converting the array to a list 
locations_list = locations.tolist()
print(locations_list)

['Canada', 'Atlantic Region', 'Newfoundland and Labrador', "St. John's, Newfoundland and Labrador", 'Prince Edward Island', 'Charlottetown, Prince Edward Island', 'Nova Scotia', 'Halifax, Nova Scotia', 'New Brunswick', 'Saint John, Fredericton, and Moncton, New Brunswick', 'Quebec', 'Québec, Quebec', 'Sherbrooke, Quebec', 'Trois-Rivières, Quebec', 'Montréal, Quebec', 'Ottawa-Gatineau, Quebec part, Ontario/Quebec', 'Ontario', 'Ottawa-Gatineau, Ontario part, Ontario/Quebec', 'Oshawa, Ontario', 'Toronto, Ontario', 'Hamilton, Ontario', 'St. Catharines-Niagara, Ontario', 'Kitchener-Cambridge-Waterloo, Ontario', 'Guelph, Ontario', 'London, Ontario', 'Windsor, Ontario', 'Greater Sudbury, Ontario', 'Prairie Region', 'Manitoba', 'Winnipeg, Manitoba', 'Saskatchewan', 'Regina, Saskatchewan', 'Saskatoon, Saskatchewan', 'Alberta', 'Calgary, Alberta', 'Edmonton, Alberta', 'British Columbia', 'Kelowna, British Columbia', 'Vancouver, British Columbia', 'Victoria, British Columbia']
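
In the list above, city-level entries are written as "City, Province", while national, regional and provincial aggregates contain no comma. A possible way to separate the two groups, assuming this naming pattern holds for every entry, is sketched below on a short excerpt of the list. The names `city_locations` and `aggregate_locations` are hypothetical.

```python
# Excerpt of the locations printed above
locations_list = [
    "Canada",
    "Atlantic Region",
    "Ontario",
    "Toronto, Ontario",
    "Saint John, Fredericton, and Moncton, New Brunswick",
    "Alberta",
    "Edmonton, Alberta",
]

# Heuristic (assumption): a comma marks a city-level entry
city_locations = [name for name in locations_list if "," in name]
aggregate_locations = [name for name in locations_list if "," not in name]

print(city_locations)
print(aggregate_locations)
```

This is only a naming heuristic. It would also place 'Atlantic Region' and 'Prairie Region' among the aggregates, so entries that should be kept need to be added back by hand.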


In [5]:
# DGUID is the Dissemination Geography Unique Identifier, an identifier linking geospatial data with statistical data.
# We still drop the DGUID column because the GEO column gives enough information.
# Let's drop the columns we do not need: 'DGUID','SCALAR_FACTOR', 'SCALAR_ID', 'VECTOR', 'COORDINATE','STATUS','SYMBOL','TERMINATED', 'UOM_ID','DECIMALS'

df = data.drop(columns=['DGUID','SCALAR_FACTOR', 'SCALAR_ID', 'VECTOR', 'COORDINATE','STATUS','SYMBOL','TERMINATED', 'UOM_ID','DECIMALS'])
df.head()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,VALUE
0,1981-01,Canada,Total (house and land),"Index, 201612=100",38.2
1,1981-01,Canada,House only,"Index, 201612=100",36.1
2,1981-01,Canada,Land only,"Index, 201612=100",40.6
3,1981-01,Atlantic Region,Total (house and land),"Index, 201612=100",
4,1981-01,Atlantic Region,House only,"Index, 201612=100",


In [6]:
# House-only dataframe: as you may notice, "New housing price indexes" has three rows of data for each month and location:
# "Total (house and land)", "House only" and "Land only". We select only the rows with the "House only" data.

filter_list = ["House only"]
house_only_df = df[df["New housing price indexes"].isin(filter_list)]

# #We add '_' to Province names to simplify issues when filtering data 
# # to avoid confusion for example: Edmonton, Alberta VS Alberta
# house_only_df['GEO'] = house_only_df['GEO'].str.replace('Edmonton, Alberta', 'Edmonton, _Alberta')

house_only_df = house_only_df.reset_index()
#   house_only_df contains 20040 rows (index 0 to 20039).
house_only_df.tail()

Unnamed: 0,index,REF_DATE,GEO,New housing price indexes,UOM,VALUE
20035,60106,2022-09,"Edmonton, Alberta",House only,"Index, 201612=100",113.5
20036,60109,2022-09,British Columbia,House only,"Index, 201612=100",128.3
20037,60112,2022-09,"Kelowna, British Columbia",House only,"Index, 201612=100",128.5
20038,60115,2022-09,"Vancouver, British Columbia",House only,"Index, 201612=100",127.6
20039,60118,2022-09,"Victoria, British Columbia",House only,"Index, 201612=100",132.1
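
The commented-out lines in the cell above point to a real pitfall: a substring filter on 'Alberta' also matches 'Edmonton, Alberta'. The short sketch below shows the difference between a substring match and an exact match on a few GEO values. `str.contains` treats its pattern as a regular expression by default, so `regex=False` is used where a literal match is wanted. The series `geo` is a made-up excerpt for illustration.

```python
import pandas as pd

geo = pd.Series([
    "Alberta",
    "Calgary, Alberta",
    "Edmonton, Alberta",
    "St. John's, Newfoundland and Labrador",
])

# Substring match: also catches the two Alberta cities
substring_mask = geo.str.contains("Alberta")

# Exact match: only the province-level row
exact_mask = geo == "Alberta"

# Literal substring match: the "." in "St." is not treated as a regex wildcard
literal_mask = geo.str.contains("St. John's", regex=False)

print(substring_mask.sum(), exact_mask.sum(), literal_mask.sum())
```

With the full location names used in the cells below, substring matching happens to select the intended rows, but an exact comparison makes that independent of how the names overlap.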


We rely on the list of strings extracted earlier to generate a new dataframe for each region.

The dataframes will subsequently be concatenated to aggregate all data.

List:  

['Atlantic Region', 'Newfoundland and Labrador', "St. John's, Newfoundland and Labrador", 'Prince Edward Island', 'Charlottetown, Prince Edward Island', 'Nova Scotia', 'Halifax, Nova Scotia', 'New Brunswick', 'Saint John, Fredericton, and Moncton, New Brunswick', 'Quebec', 'Québec, Quebec', 'Sherbrooke, Quebec', 'Trois-Rivières, Quebec', 'Montréal, Quebec', 'Ottawa-Gatineau, Quebec part, Ontario/Quebec', 'Ontario', 'Ottawa-Gatineau, Ontario part, Ontario/Quebec', 'Oshawa, Ontario', 'Toronto, Ontario', 'Hamilton, Ontario', 'St. Catharines-Niagara, Ontario', 'Kitchener-Cambridge-Waterloo, Ontario', 'Guelph, Ontario', 'London, Ontario', 'Windsor, Ontario', 'Greater Sudbury, Ontario', 'Prairie Region', 'Manitoba', 'Winnipeg, Manitoba', 'Saskatchewan', 'Regina, Saskatchewan', 'Saskatoon, Saskatchewan', 'Alberta', 'Calgary, Alberta', 'Edmonton, Alberta', 'British Columbia', 'Kelowna, British Columbia', 'Vancouver, British Columbia', 'Victoria, British Columbia']
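
As an alternative to building one dataframe per location and concatenating them afterwards, the long table could be reshaped in a single step. The sketch below is one possible approach, not the one followed in the cells that come next. It uses `DataFrame.pivot` on a tiny excerpt with values taken from the outputs in this notebook, and the names `long_df` and `wide_df` are hypothetical.

```python
import pandas as pd

# Tiny excerpt in the same long format as house_only_df
long_df = pd.DataFrame({
    "REF_DATE": ["2022-08", "2022-08", "2022-09", "2022-09"],
    "GEO": ["Toronto, Ontario", "Halifax, Nova Scotia"] * 2,
    "VALUE": [114.6, 127.5, 114.6, 127.5],
})

# One row per month, one column per location
wide_df = long_df.pivot(index="REF_DATE", columns="GEO", values="VALUE")
print(wide_df)
```

`pivot` raises an error if a (REF_DATE, GEO) pair appears more than once, which doubles as a check that only one index type (here "House only") is left in the table.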

In [7]:
#Filtering data of "Atlantic Region" and storing it in atlantic

atlantic = house_only_df[house_only_df["GEO"].str.contains("Atlantic Region")]

atlantic.set_index("REF_DATE", inplace = True)
atlantic = atlantic.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
atlantic = atlantic.rename({'VALUE': 'HPI_atlantic'}, axis =1 )

#Index resetting
atlantic = atlantic.reset_index()

#Saving df in 'Processed Data' folder
atlantic.to_csv('Processed Data/atlantic.csv')
#Previewing
atlantic.tail()


Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_atlantic
496,2022-05,Atlantic Region,House only,"Index, 201612=100",120.9
497,2022-06,Atlantic Region,House only,"Index, 201612=100",121.1
498,2022-07,Atlantic Region,House only,"Index, 201612=100",121.1
499,2022-08,Atlantic Region,House only,"Index, 201612=100",121.1
500,2022-09,Atlantic Region,House only,"Index, 201612=100",121.2
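
The same filter, rename and reset steps are repeated for every location in the cells below. They could be wrapped in a small helper, sketched here under the assumption that the input has the columns of `house_only_df`. The function name `extract_region` and its parameters are hypothetical, and the helper uses an exact match on GEO rather than `str.contains`.

```python
import pandas as pd

def extract_region(house_only_df, geo_name, short_name):
    """Return the rows of one location with VALUE renamed to HPI_<short_name>."""
    # Exact match on the location name
    region = house_only_df[house_only_df["GEO"] == geo_name]
    # Drop the leftover 'index' column if it is present
    region = region.drop(columns=["index"], errors="ignore")
    # Rename the VALUE column to HPI_<short_name>
    region = region.rename(columns={"VALUE": f"HPI_{short_name}"})
    # Fresh 0..n-1 index, as in the cells of this notebook
    return region.reset_index(drop=True)

# Two rows copied from the house_only_df output above
demo = pd.DataFrame({
    "index": [60106, 60109],
    "REF_DATE": ["2022-09", "2022-09"],
    "GEO": ["Edmonton, Alberta", "British Columbia"],
    "New housing price indexes": ["House only", "House only"],
    "UOM": ["Index, 201612=100"] * 2,
    "VALUE": [113.5, 128.3],
})

edmonton = extract_region(demo, "Edmonton, Alberta", "edmonton_alberta")
print(edmonton)
```

Saving to the 'Processed Data' folder is left out of the helper so that the file name can be derived from `short_name` in one place, which avoids copy-paste slips in the output paths.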


In [8]:
#Filtering data of "St. John's, Newfoundland and Labrador" (stored in nfland_labrador)

nfland_labrador = house_only_df[house_only_df["GEO"].str.contains("St. John's, Newfoundland and Labrador")]
nfland_labrador.set_index("REF_DATE", inplace = True)
nfland_labrador = nfland_labrador.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
nfland_labrador = nfland_labrador.rename({'VALUE': 'HPI_nfland_labrador'}, axis =1 )

#Index resetting
nfland_labrador = nfland_labrador.reset_index()

#Saving df in 'Processed Data' folder
nfland_labrador.to_csv('Processed Data/nfland_labrador.csv')
#Previewing
nfland_labrador.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_nfland_labrador
496,2022-05,"St. John's, Newfoundland and Labrador",House only,"Index, 201612=100",106.7
497,2022-06,"St. John's, Newfoundland and Labrador",House only,"Index, 201612=100",107.4
498,2022-07,"St. John's, Newfoundland and Labrador",House only,"Index, 201612=100",107.4
499,2022-08,"St. John's, Newfoundland and Labrador",House only,"Index, 201612=100",107.4
500,2022-09,"St. John's, Newfoundland and Labrador",House only,"Index, 201612=100",108.1


In [9]:
'Prince Edward Island'

"""
We will not take into consideration the Provinces data in our first attempt.
"""

'\nWe will not take into consideration the Provinces data in our first attempt.\n'

In [10]:
#Filtering data of 'Charlottetown, Prince Edward Island' 

charlottetown_pei = house_only_df[house_only_df["GEO"].str.contains("Charlottetown, Prince Edward Island")]
charlottetown_pei.set_index("REF_DATE", inplace = True)
charlottetown_pei = charlottetown_pei.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
charlottetown_pei = charlottetown_pei.rename({'VALUE': 'HPI_charlottetown_pei'}, axis =1 )

#Index resetting
charlottetown_pei = charlottetown_pei.reset_index()

#Saving df in 'Processed Data' folder
charlottetown_pei.to_csv('Processed Data/charlottetown_pei.csv')
#Previewing
charlottetown_pei.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_charlottetown_pei
496,2022-05,"Charlottetown, Prince Edward Island",House only,"Index, 201612=100",125.5
497,2022-06,"Charlottetown, Prince Edward Island",House only,"Index, 201612=100",126.9
498,2022-07,"Charlottetown, Prince Edward Island",House only,"Index, 201612=100",127.3
499,2022-08,"Charlottetown, Prince Edward Island",House only,"Index, 201612=100",127.3
500,2022-09,"Charlottetown, Prince Edward Island",House only,"Index, 201612=100",127.0


In [11]:
'Nova Scotia' 

"""
We will not take into consideration the Provinces data in our first attempt.
"""

'\nWe will not take into consideration the Provinces data in our first attempt.\n'

In [12]:
#Filtering data of 'Halifax, Nova Scotia' 

halifax_ns = house_only_df[house_only_df["GEO"].str.contains("Halifax, Nova Scotia")]
halifax_ns.set_index("REF_DATE", inplace = True)
halifax_ns = halifax_ns.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
halifax_ns = halifax_ns.rename({'VALUE': 'HPI_halifax_ns'}, axis =1 )

#Index resetting
halifax_ns = halifax_ns.reset_index()

#Saving df in 'Processed Data' folder
halifax_ns.to_csv('Processed Data/halifax_ns.csv')
#Previewing
halifax_ns.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_halifax_ns
496,2022-05,"Halifax, Nova Scotia",House only,"Index, 201612=100",127.5
497,2022-06,"Halifax, Nova Scotia",House only,"Index, 201612=100",127.5
498,2022-07,"Halifax, Nova Scotia",House only,"Index, 201612=100",127.5
499,2022-08,"Halifax, Nova Scotia",House only,"Index, 201612=100",127.5
500,2022-09,"Halifax, Nova Scotia",House only,"Index, 201612=100",127.5


In [13]:
'New Brunswick'

"""
We will not take into consideration the Provinces data in our first attempt.
"""


'\nWe will not take into consideration the Provinces data in our first attempt.\n'

In [14]:
#Filtering data of 'Saint John, Fredericton, and Moncton, New Brunswick' 

stjohn_fredericton_moncton = house_only_df[house_only_df["GEO"].str.contains('Saint John, Fredericton, and Moncton, New Brunswick')]
stjohn_fredericton_moncton.set_index("REF_DATE", inplace = True)
stjohn_fredericton_moncton = stjohn_fredericton_moncton.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
stjohn_fredericton_moncton = stjohn_fredericton_moncton.rename({'VALUE': 'HPI_stjohn_fredericton_moncton'}, axis =1 )

#Index resetting
stjohn_fredericton_moncton = stjohn_fredericton_moncton.reset_index()

#Saving df in 'Processed Data' folder
stjohn_fredericton_moncton.to_csv('Processed Data/stjohn_fredericton_moncton.csv')
#Previewing
stjohn_fredericton_moncton.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_stjohn_fredericton_moncton
496,2022-05,"Saint John, Fredericton, and Moncton, New Brun...",House only,"Index, 201612=100",121.1
497,2022-06,"Saint John, Fredericton, and Moncton, New Brun...",House only,"Index, 201612=100",121.1
498,2022-07,"Saint John, Fredericton, and Moncton, New Brun...",House only,"Index, 201612=100",121.1
499,2022-08,"Saint John, Fredericton, and Moncton, New Brun...",House only,"Index, 201612=100",121.1
500,2022-09,"Saint John, Fredericton, and Moncton, New Brun...",House only,"Index, 201612=100",121.1


In [15]:
'Quebec' 

"""
We will not take into consideration the Provinces data in our first attempt.
"""


'\nWe will not take into consideration the Provinces data in our first attempt.\n'

In [16]:
#Filtering data of 'Québec, Quebec'

quebec_qc = house_only_df[house_only_df["GEO"].str.contains('Québec, Quebec')]
quebec_qc.set_index("REF_DATE", inplace = True)
quebec_qc = quebec_qc.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
quebec_qc = quebec_qc.rename({'VALUE': 'HPI_quebec_qc'}, axis =1 )

#Index resetting
quebec_qc = quebec_qc.reset_index()

#Saving df in 'Processed Data' folder
quebec_qc.to_csv('Processed Data/quebec_qc.csv')
#Previewing
quebec_qc.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_quebec_qc
496,2022-05,"Québec, Quebec",House only,"Index, 201612=100",132.6
497,2022-06,"Québec, Quebec",House only,"Index, 201612=100",133.3
498,2022-07,"Québec, Quebec",House only,"Index, 201612=100",133.3
499,2022-08,"Québec, Quebec",House only,"Index, 201612=100",133.3
500,2022-09,"Québec, Quebec",House only,"Index, 201612=100",133.3


In [17]:
#Filtering data of 'Sherbrooke, Quebec'

sherbrooke_qc = house_only_df[house_only_df["GEO"].str.contains('Sherbrooke, Quebec')]
sherbrooke_qc.set_index("REF_DATE", inplace = True)
sherbrooke_qc = sherbrooke_qc.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
sherbrooke_qc = sherbrooke_qc.rename({'VALUE': 'HPI_sherbrooke_qc'}, axis =1 )

#Index resetting
sherbrooke_qc = sherbrooke_qc.reset_index()

#Saving df in 'Processed Data' folder
sherbrooke_qc.to_csv('Processed Data/sherbrooke_qc.csv')
#Previewing
sherbrooke_qc.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_sherbrooke_qc
496,2022-05,"Sherbrooke, Quebec",House only,"Index, 201612=100",113.1
497,2022-06,"Sherbrooke, Quebec",House only,"Index, 201612=100",113.1
498,2022-07,"Sherbrooke, Quebec",House only,"Index, 201612=100",113.1
499,2022-08,"Sherbrooke, Quebec",House only,"Index, 201612=100",113.1
500,2022-09,"Sherbrooke, Quebec",House only,"Index, 201612=100",113.1


In [18]:
#Filtering data of 'Trois-Rivières, Quebec' 

troisriv_qc = house_only_df[house_only_df["GEO"].str.contains('Trois-Rivières, Quebec')]
troisriv_qc.set_index("REF_DATE", inplace = True)
troisriv_qc = troisriv_qc.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
troisriv_qc = troisriv_qc.rename({'VALUE': 'HPI_troisriv_qc'}, axis =1 )

#Index resetting
troisriv_qc = troisriv_qc.reset_index()

#Saving df in 'Processed Data' folder
troisriv_qc.to_csv('Processed Data/troisriv_qc.csv')
#Previewing
troisriv_qc.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_troisriv_qc
496,2022-05,"Trois-Rivières, Quebec",House only,"Index, 201612=100",112.6
497,2022-06,"Trois-Rivières, Quebec",House only,"Index, 201612=100",112.6
498,2022-07,"Trois-Rivières, Quebec",House only,"Index, 201612=100",112.6
499,2022-08,"Trois-Rivières, Quebec",House only,"Index, 201612=100",112.6
500,2022-09,"Trois-Rivières, Quebec",House only,"Index, 201612=100",112.6


In [19]:
#Filtering data of 'Montréal, Quebec'

mtl_qc = house_only_df[house_only_df["GEO"].str.contains('Montréal, Quebec')]
mtl_qc.set_index("REF_DATE", inplace = True)
mtl_qc = mtl_qc.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
mtl_qc = mtl_qc.rename({'VALUE': 'HPI_mtl_qc'}, axis =1 )

#Index resetting
mtl_qc = mtl_qc.reset_index()

#Saving df in 'Processed Data' folder
mtl_qc.to_csv('Processed Data/mtl_qc.csv')
#Previewing
mtl_qc.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_mtl_qc
496,2022-05,"Montréal, Quebec",House only,"Index, 201612=100",161.1
497,2022-06,"Montréal, Quebec",House only,"Index, 201612=100",160.9
498,2022-07,"Montréal, Quebec",House only,"Index, 201612=100",160.6
499,2022-08,"Montréal, Quebec",House only,"Index, 201612=100",160.7
500,2022-09,"Montréal, Quebec",House only,"Index, 201612=100",160.6


In [20]:
#Filtering data of 'Ottawa-Gatineau, Quebec part, Ontario/Quebec'

ottawa_gatineau_qc = house_only_df[house_only_df["GEO"].str.contains('Ottawa-Gatineau, Quebec part, Ontario/Quebec')]
ottawa_gatineau_qc.set_index("REF_DATE", inplace = True)
ottawa_gatineau_qc = ottawa_gatineau_qc.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
ottawa_gatineau_qc = ottawa_gatineau_qc.rename({'VALUE': 'HPI_ottawa_gatineau_qc'}, axis =1 )

#Index resetting
ottawa_gatineau_qc = ottawa_gatineau_qc.reset_index()

#Saving df in 'Processed Data' folder
ottawa_gatineau_qc.to_csv('Processed Data/ottawa_gatineau_qc.csv')
#Previewing
ottawa_gatineau_qc.tail()


Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_ottawa_gatineau_qc
496,2022-05,"Ottawa-Gatineau, Quebec part, Ontario/Quebec",House only,"Index, 201612=100",120.5
497,2022-06,"Ottawa-Gatineau, Quebec part, Ontario/Quebec",House only,"Index, 201612=100",120.5
498,2022-07,"Ottawa-Gatineau, Quebec part, Ontario/Quebec",House only,"Index, 201612=100",120.5
499,2022-08,"Ottawa-Gatineau, Quebec part, Ontario/Quebec",House only,"Index, 201612=100",120.5
500,2022-09,"Ottawa-Gatineau, Quebec part, Ontario/Quebec",House only,"Index, 201612=100",120.5


In [21]:
'Ontario'

"""
We will not take into consideration the Provinces data in our first attempt.
"""

'\nWe will not take into consideration the Provinces data in our first attempt.\n'

In [22]:
#Filtering data of 'Ottawa-Gatineau, Ontario part, Ontario/Quebec' 

ottawa_gatineau_ont = house_only_df[house_only_df["GEO"].str.contains('Ottawa-Gatineau, Ontario part, Ontario/Quebec')]
ottawa_gatineau_ont.set_index("REF_DATE", inplace = True)
ottawa_gatineau_ont = ottawa_gatineau_ont.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
ottawa_gatineau_ont = ottawa_gatineau_ont.rename({'VALUE': 'HPI_ottawa_gatineau_ont'}, axis =1 )

#Index resetting
ottawa_gatineau_ont = ottawa_gatineau_ont.reset_index()

#Saving df in 'Processed Data' folder
ottawa_gatineau_ont.to_csv('Processed Data/ottawa_gatineau_ont.csv')
#Previewing
ottawa_gatineau_ont.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_ottawa_gatineau_ont
496,2022-05,"Ottawa-Gatineau, Ontario part, Ontario/Quebec",House only,"Index, 201612=100",187.4
497,2022-06,"Ottawa-Gatineau, Ontario part, Ontario/Quebec",House only,"Index, 201612=100",188.9
498,2022-07,"Ottawa-Gatineau, Ontario part, Ontario/Quebec",House only,"Index, 201612=100",190.2
499,2022-08,"Ottawa-Gatineau, Ontario part, Ontario/Quebec",House only,"Index, 201612=100",189.4
500,2022-09,"Ottawa-Gatineau, Ontario part, Ontario/Quebec",House only,"Index, 201612=100",189.4


In [23]:
#Filtering data of 'Oshawa, Ontario'

oshawa_ont = house_only_df[house_only_df["GEO"].str.contains('Oshawa, Ontario')]
oshawa_ont.set_index("REF_DATE", inplace = True)
oshawa_ont = oshawa_ont.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
oshawa_ont = oshawa_ont.rename({'VALUE': 'HPI_oshawa_ont'}, axis =1 )

#Index resetting
oshawa_ont = oshawa_ont.reset_index()

#Saving df in 'Processed Data' folder
oshawa_ont.to_csv('Processed Data/oshawa_ont.csv')
#Previewing
oshawa_ont.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_oshawa_ont
496,2022-05,"Oshawa, Ontario",House only,"Index, 201612=100",125.3
497,2022-06,"Oshawa, Ontario",House only,"Index, 201612=100",125.3
498,2022-07,"Oshawa, Ontario",House only,"Index, 201612=100",125.3
499,2022-08,"Oshawa, Ontario",House only,"Index, 201612=100",125.3
500,2022-09,"Oshawa, Ontario",House only,"Index, 201612=100",125.3


In [24]:
#Filtering data of 'Toronto, Ontario' 

toronto_ont = house_only_df[house_only_df["GEO"].str.contains('Toronto, Ontario')]
toronto_ont.set_index("REF_DATE", inplace = True)
toronto_ont = toronto_ont.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
toronto_ont = toronto_ont.rename({'VALUE': 'HPI_toronto_ont'}, axis =1 )

#Index resetting
toronto_ont = toronto_ont.reset_index()

#Saving df in 'Processed Data' folder
toronto_ont.to_csv('Processed Data/toronto_ont.csv')
#Previewing
toronto_ont.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_toronto_ont
496,2022-05,"Toronto, Ontario",House only,"Index, 201612=100",114.6
497,2022-06,"Toronto, Ontario",House only,"Index, 201612=100",114.6
498,2022-07,"Toronto, Ontario",House only,"Index, 201612=100",114.6
499,2022-08,"Toronto, Ontario",House only,"Index, 201612=100",114.6
500,2022-09,"Toronto, Ontario",House only,"Index, 201612=100",114.6


In [25]:
#Filtering data of 'Hamilton, Ontario'

hamilton_ont = house_only_df[house_only_df["GEO"].str.contains('Hamilton, Ontario')]
hamilton_ont.set_index("REF_DATE", inplace = True)
hamilton_ont = hamilton_ont.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
hamilton_ont = hamilton_ont.rename({'VALUE': 'HPI_hamilton_ont'}, axis =1 )

#Index resetting
hamilton_ont = hamilton_ont.reset_index()

#Saving df in 'Processed Data' folder
hamilton_ont.to_csv('Processed Data/hamilton_ont.csv')
#Previewing
hamilton_ont.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_hamilton_ont
496,2022-05,"Hamilton, Ontario",House only,"Index, 201612=100",121.9
497,2022-06,"Hamilton, Ontario",House only,"Index, 201612=100",121.9
498,2022-07,"Hamilton, Ontario",House only,"Index, 201612=100",121.9
499,2022-08,"Hamilton, Ontario",House only,"Index, 201612=100",121.9
500,2022-09,"Hamilton, Ontario",House only,"Index, 201612=100",121.3


In [26]:
#Filtering data of 'St. Catharines-Niagara, Ontario'

stcath_niagara_ont = house_only_df[house_only_df["GEO"].str.contains('St. Catharines-Niagara, Ontario')]
stcath_niagara_ont.set_index("REF_DATE", inplace = True)
stcath_niagara_ont = stcath_niagara_ont.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
stcath_niagara_ont = stcath_niagara_ont.rename({'VALUE': 'HPI_stcath_niagara_ont'}, axis =1 )

#Index resetting
stcath_niagara_ont = stcath_niagara_ont.reset_index()

#Saving df in 'Processed Data' folder
stcath_niagara_ont.to_csv('Processed Data/stcath_niagara_ont.csv')
#Previewing
stcath_niagara_ont.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_stcath_niagara_ont
496,2022-05,"St. Catharines-Niagara, Ontario",House only,"Index, 201612=100",133.0
497,2022-06,"St. Catharines-Niagara, Ontario",House only,"Index, 201612=100",133.0
498,2022-07,"St. Catharines-Niagara, Ontario",House only,"Index, 201612=100",133.0
499,2022-08,"St. Catharines-Niagara, Ontario",House only,"Index, 201612=100",133.0
500,2022-09,"St. Catharines-Niagara, Ontario",House only,"Index, 201612=100",132.7


In [27]:
#Filtering data of 'Kitchener-Cambridge-Waterloo, Ontario'

kitchener_camb_water_ont = house_only_df[house_only_df["GEO"].str.contains('Kitchener-Cambridge-Waterloo, Ontario')]
kitchener_camb_water_ont.set_index("REF_DATE", inplace = True)
kitchener_camb_water_ont = kitchener_camb_water_ont.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
kitchener_camb_water_ont = kitchener_camb_water_ont.rename({'VALUE': 'HPI_kitchener_camb_water_ont'}, axis =1 )

#Index resetting
kitchener_camb_water_ont = kitchener_camb_water_ont.reset_index()

#Saving df in 'Processed Data' folder
kitchener_camb_water_ont.to_csv('Processed Data/kitchener_camb_water_ont.csv')
#Previewing
kitchener_camb_water_ont.tail()


Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_kitchener_camb_water_ont
496,2022-05,"Kitchener-Cambridge-Waterloo, Ontario",House only,"Index, 201612=100",154.3
497,2022-06,"Kitchener-Cambridge-Waterloo, Ontario",House only,"Index, 201612=100",155.2
498,2022-07,"Kitchener-Cambridge-Waterloo, Ontario",House only,"Index, 201612=100",154.8
499,2022-08,"Kitchener-Cambridge-Waterloo, Ontario",House only,"Index, 201612=100",155.4
500,2022-09,"Kitchener-Cambridge-Waterloo, Ontario",House only,"Index, 201612=100",154.7


In [28]:
#Filtering data of 'Guelph, Ontario' 

guelph_ont = house_only_df[house_only_df["GEO"].str.contains('Guelph, Ontario')]
guelph_ont.set_index("REF_DATE", inplace = True)
guelph_ont = guelph_ont.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
guelph_ont = guelph_ont.rename({'VALUE': 'HPI_guelph_ont'}, axis =1 )

#Index resetting
guelph_ont = guelph_ont.reset_index()

#Saving df in 'Processed Data' folder
guelph_ont.to_csv('Processed Data/guelph_ont.csv')
#Previewing
guelph_ont.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_guelph_ont
496,2022-05,"Guelph, Ontario",House only,"Index, 201612=100",134.1
497,2022-06,"Guelph, Ontario",House only,"Index, 201612=100",134.1
498,2022-07,"Guelph, Ontario",House only,"Index, 201612=100",134.1
499,2022-08,"Guelph, Ontario",House only,"Index, 201612=100",134.1
500,2022-09,"Guelph, Ontario",House only,"Index, 201612=100",134.1


In [29]:
#Filtering data of 'London, Ontario'

london_ont = house_only_df[house_only_df["GEO"].str.contains('London, Ontario')]
london_ont.set_index("REF_DATE", inplace = True)
london_ont = london_ont.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
london_ont = london_ont.rename({'VALUE': 'HPI_london_ont'}, axis =1 )

#Index resetting
london_ont = london_ont.reset_index()

#Saving df in 'Processed Data' folder
london_ont.to_csv('Processed Data/london_ont.csv')
#Previewing
london_ont.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_london_ont
496,2022-05,"London, Ontario",House only,"Index, 201612=100",156.1
497,2022-06,"London, Ontario",House only,"Index, 201612=100",156.1
498,2022-07,"London, Ontario",House only,"Index, 201612=100",156.1
499,2022-08,"London, Ontario",House only,"Index, 201612=100",156.1
500,2022-09,"London, Ontario",House only,"Index, 201612=100",156.1


In [30]:
#Filtering data of 'Windsor, Ontario'

windsor_ont = house_only_df[house_only_df["GEO"].str.contains('Windsor, Ontario')]
windsor_ont.set_index("REF_DATE", inplace = True)
windsor_ont = windsor_ont.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
windsor_ont = windsor_ont.rename({'VALUE': 'HPI_windsor_ont'}, axis =1 )

#Index resetting
windsor_ont = windsor_ont.reset_index()

#Saving df in 'Processed Data' folder
windsor_ont.to_csv('Processed Data/windsor_ont.csv')
#Previewing
windsor_ont.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_windsor_ont
496,2022-05,"Windsor, Ontario",House only,"Index, 201612=100",147.4
497,2022-06,"Windsor, Ontario",House only,"Index, 201612=100",149.0
498,2022-07,"Windsor, Ontario",House only,"Index, 201612=100",149.0
499,2022-08,"Windsor, Ontario",House only,"Index, 201612=100",149.0
500,2022-09,"Windsor, Ontario",House only,"Index, 201612=100",148.5


In [31]:
#Filtering data of 'Greater Sudbury, Ontario'

sudbury_ont = house_only_df[house_only_df["GEO"].str.contains('Greater Sudbury, Ontario')]
sudbury_ont.set_index("REF_DATE", inplace = True)
sudbury_ont = sudbury_ont.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
sudbury_ont = sudbury_ont.rename({'VALUE': 'HPI_sudbury_ont'}, axis =1 )

#Index resetting
sudbury_ont = sudbury_ont.reset_index()

#Saving df in 'Processed Data' folder
sudbury_ont.to_csv('Processed Data/sudbury_ont.csv')
#Previewing
sudbury_ont.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_sudbury_ont
496,2022-05,"Greater Sudbury, Ontario",House only,"Index, 201612=100",122.3
497,2022-06,"Greater Sudbury, Ontario",House only,"Index, 201612=100",122.5
498,2022-07,"Greater Sudbury, Ontario",House only,"Index, 201612=100",122.5
499,2022-08,"Greater Sudbury, Ontario",House only,"Index, 201612=100",122.5
500,2022-09,"Greater Sudbury, Ontario",House only,"Index, 201612=100",122.5


In [32]:
#Filtering data of 'Prairie Region'

prairie_region = house_only_df[house_only_df["GEO"].str.contains('Prairie Region')]
prairie_region.set_index("REF_DATE", inplace = True)
prairie_region = prairie_region.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
prairie_region = prairie_region.rename({'VALUE': 'HPI_prairie_region'}, axis =1 )

#Index resetting
prairie_region = prairie_region.reset_index()

#Saving df in 'Processed Data' folder
prairie_region.to_csv('Processed Data/prairie_region.csv')
#Previewing
prairie_region.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_prairie_region
496,2022-05,Prairie Region,House only,"Index, 201612=100",126.1
497,2022-06,Prairie Region,House only,"Index, 201612=100",126.2
498,2022-07,Prairie Region,House only,"Index, 201612=100",125.9
499,2022-08,Prairie Region,House only,"Index, 201612=100",126.5
500,2022-09,Prairie Region,House only,"Index, 201612=100",126.4


In [33]:
'Manitoba'

"""
We will not take into consideration the Provinces data in our first attempt.
"""

'\nWe will not take into consideration the Provinces data in our first attempt.\n'

In [34]:
#Filtering data of 'Winnipeg, Manitoba' 

winnipeg_manitoba = house_only_df[house_only_df["GEO"].str.contains('Winnipeg, Manitoba')]
winnipeg_manitoba.set_index("REF_DATE", inplace = True)
winnipeg_manitoba = winnipeg_manitoba.drop(columns=['index'], axis = 1)

#Renaming the VALUE column to HPI_region
winnipeg_manitoba = winnipeg_manitoba.rename({'VALUE': 'HPI_winnipeg_manitoba'}, axis =1 )

#Index resetting
winnipeg_manitoba = winnipeg_manitoba.reset_index()

#Saving df in 'Processed Data' folder
winnipeg_manitoba.to_csv('Processed Data/winnipeg_manitoba.csv')
#Previewing
winnipeg_manitoba.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_winnipeg_manitoba
496,2022-05,"Winnipeg, Manitoba",House only,"Index, 201612=100",158.6
497,2022-06,"Winnipeg, Manitoba",House only,"Index, 201612=100",158.6
498,2022-07,"Winnipeg, Manitoba",House only,"Index, 201612=100",158.6
499,2022-08,"Winnipeg, Manitoba",House only,"Index, 201612=100",158.6
500,2022-09,"Winnipeg, Manitoba",House only,"Index, 201612=100",158.6


In [35]:
'Saskatchewan'

"""
We will not take into consideration the Provinces data in our first attempt.
"""

'\nWe will not take into consideration the Provinces data in our first attempt.\n'

In [36]:
#Filtering data of 'Regina, Saskatchewan'

regina_sask = house_only_df[house_only_df["GEO"].str.contains('Regina, Saskatchewan')]
regina_sask = regina_sask.set_index("REF_DATE")
regina_sask = regina_sask.drop(columns=['index'])

#Renaming the VALUE column to HPI_regina_sask
regina_sask = regina_sask.rename(columns={'VALUE': 'HPI_regina_sask'})

#Index resetting
regina_sask = regina_sask.reset_index()

#Saving df in 'Processed Data' folder
regina_sask.to_csv('Processed Data/regina_sask.csv')
#Previewing
regina_sask.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_regina_sask
496,2022-05,"Regina, Saskatchewan",House only,"Index, 201612=100",103.0
497,2022-06,"Regina, Saskatchewan",House only,"Index, 201612=100",103.1
498,2022-07,"Regina, Saskatchewan",House only,"Index, 201612=100",103.1
499,2022-08,"Regina, Saskatchewan",House only,"Index, 201612=100",103.1
500,2022-09,"Regina, Saskatchewan",House only,"Index, 201612=100",103.0


In [37]:
#Filtering data of 'Saskatoon, Saskatchewan'

saskatoon_sask = house_only_df[house_only_df["GEO"].str.contains('Saskatoon, Saskatchewan')]
saskatoon_sask = saskatoon_sask.set_index("REF_DATE")
saskatoon_sask = saskatoon_sask.drop(columns=['index'])

#Renaming the VALUE column to HPI_saskatoon_sask
saskatoon_sask = saskatoon_sask.rename(columns={'VALUE': 'HPI_saskatoon_sask'})

#Index resetting
saskatoon_sask = saskatoon_sask.reset_index()

#Saving df in 'Processed Data' folder
saskatoon_sask.to_csv('Processed Data/saskatoon_sask.csv')
#Previewing
saskatoon_sask.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_saskatoon_sask
496,2022-05,"Saskatoon, Saskatchewan",House only,"Index, 201612=100",110.0
497,2022-06,"Saskatoon, Saskatchewan",House only,"Index, 201612=100",110.7
498,2022-07,"Saskatoon, Saskatchewan",House only,"Index, 201612=100",110.5
499,2022-08,"Saskatoon, Saskatchewan",House only,"Index, 201612=100",118.4
500,2022-09,"Saskatoon, Saskatchewan",House only,"Index, 201612=100",118.3


In [38]:
'Alberta' 

"""
We will not take into consideration the Provinces data in our first attempt.
"""

'\nWe will not take into consideration the Provinces data in our first attempt.\n'

In [39]:
#Filtering data of 'Calgary, Alberta'

calgary_alb = house_only_df[house_only_df["GEO"].str.contains('Calgary, Alberta')]
calgary_alb = calgary_alb.set_index("REF_DATE")
calgary_alb = calgary_alb.drop(columns=['index'])

#Renaming the VALUE column to HPI_calgary_alb
calgary_alb = calgary_alb.rename(columns={'VALUE': 'HPI_calgary_alb'})

#Index resetting
calgary_alb = calgary_alb.reset_index()

#Saving df in 'Processed Data' folder
calgary_alb.to_csv('Processed Data/calgary_alb.csv')
#Previewing
calgary_alb.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_calgary_alb
496,2022-05,"Calgary, Alberta",House only,"Index, 201612=100",133.4
497,2022-06,"Calgary, Alberta",House only,"Index, 201612=100",133.6
498,2022-07,"Calgary, Alberta",House only,"Index, 201612=100",133.2
499,2022-08,"Calgary, Alberta",House only,"Index, 201612=100",133.6
500,2022-09,"Calgary, Alberta",House only,"Index, 201612=100",133.0


In [40]:
#Filtering data of 'Edmonton, Alberta'

edmonton_alb = house_only_df[house_only_df["GEO"].str.contains('Edmonton, Alberta')]
edmonton_alb = edmonton_alb.set_index("REF_DATE")
edmonton_alb = edmonton_alb.drop(columns=['index'])

#Renaming the VALUE column to HPI_edmonton_alb
edmonton_alb = edmonton_alb.rename(columns={'VALUE': 'HPI_edmonton_alb'})

#Index resetting
edmonton_alb = edmonton_alb.reset_index()

#Saving df in 'Processed Data' folder
edmonton_alb.to_csv('Processed Data/edmonton_alb.csv')
#Previewing
edmonton_alb.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_edmonton_alb
496,2022-05,"Edmonton, Alberta",House only,"Index, 201612=100",113.5
497,2022-06,"Edmonton, Alberta",House only,"Index, 201612=100",113.6
498,2022-07,"Edmonton, Alberta",House only,"Index, 201612=100",113.2
499,2022-08,"Edmonton, Alberta",House only,"Index, 201612=100",113.4
500,2022-09,"Edmonton, Alberta",House only,"Index, 201612=100",113.5


In [41]:
'British Columbia'

"""
We will not take into consideration the Provinces data in our first attempt.
"""

'\nWe will not take into consideration the Provinces data in our first attempt.\n'

In [42]:
#Filtering data of 'Kelowna, British Columbia'

kelowna_bc = house_only_df[house_only_df["GEO"].str.contains('Kelowna, British Columbia')]
kelowna_bc = kelowna_bc.set_index("REF_DATE")
kelowna_bc = kelowna_bc.drop(columns=['index'])

#Renaming the VALUE column to HPI_kelowna_bc
kelowna_bc = kelowna_bc.rename(columns={'VALUE': 'HPI_kelowna_bc'})

#Index resetting
kelowna_bc = kelowna_bc.reset_index()

#Saving df in 'Processed Data' folder
kelowna_bc.to_csv('Processed Data/kelowna_bc.csv')
#Previewing
kelowna_bc.tail()


Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_kelowna_bc
496,2022-05,"Kelowna, British Columbia",House only,"Index, 201612=100",128.3
497,2022-06,"Kelowna, British Columbia",House only,"Index, 201612=100",128.3
498,2022-07,"Kelowna, British Columbia",House only,"Index, 201612=100",128.4
499,2022-08,"Kelowna, British Columbia",House only,"Index, 201612=100",128.5
500,2022-09,"Kelowna, British Columbia",House only,"Index, 201612=100",128.5


In [43]:
#Filtering data of 'Vancouver, British Columbia'

vancouver_bc = house_only_df[house_only_df["GEO"].str.contains('Vancouver, British Columbia')]
vancouver_bc = vancouver_bc.set_index("REF_DATE")
vancouver_bc = vancouver_bc.drop(columns=['index'])

#Renaming the VALUE column to HPI_vancouver_bc
vancouver_bc = vancouver_bc.rename(columns={'VALUE': 'HPI_vancouver_bc'})

#Index resetting
vancouver_bc = vancouver_bc.reset_index()

#Saving df in 'Processed Data' folder
vancouver_bc.to_csv('Processed Data/vancouver_bc.csv')
#Previewing
vancouver_bc.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_vancouver_bc
496,2022-05,"Vancouver, British Columbia",House only,"Index, 201612=100",127.2
497,2022-06,"Vancouver, British Columbia",House only,"Index, 201612=100",127.2
498,2022-07,"Vancouver, British Columbia",House only,"Index, 201612=100",127.6
499,2022-08,"Vancouver, British Columbia",House only,"Index, 201612=100",127.6
500,2022-09,"Vancouver, British Columbia",House only,"Index, 201612=100",127.6


In [44]:
#Filtering data of 'Victoria, British Columbia'

victoria_bc = house_only_df[house_only_df["GEO"].str.contains('Victoria, British Columbia')]
victoria_bc = victoria_bc.set_index("REF_DATE")
victoria_bc = victoria_bc.drop(columns=['index'])

#Renaming the VALUE column to HPI_victoria_bc
victoria_bc = victoria_bc.rename(columns={'VALUE': 'HPI_victoria_bc'})

#Index resetting
victoria_bc = victoria_bc.reset_index()

#Saving df in 'Processed Data' folder
victoria_bc.to_csv('Processed Data/victoria_bc.csv')
#Previewing
victoria_bc.tail()

Unnamed: 0,REF_DATE,GEO,New housing price indexes,UOM,HPI_victoria_bc
496,2022-05,"Victoria, British Columbia",House only,"Index, 201612=100",132.1
497,2022-06,"Victoria, British Columbia",House only,"Index, 201612=100",132.1
498,2022-07,"Victoria, British Columbia",House only,"Index, 201612=100",132.1
499,2022-08,"Victoria, British Columbia",House only,"Index, 201612=100",132.1
500,2022-09,"Victoria, British Columbia",House only,"Index, 201612=100",132.1


The data for each major Canadian city has been cleaned and saved to its own csv file in the "Processed Data" folder.
In 2_ we will merge all processed data before running the affinity propagation analysis.
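
The per-city cells above all repeat the same filter, rename and reset steps. As a minimal sketch, they could be folded into one helper. The function name `extract_region_hpi` and the toy dataframe are hypothetical and only stand in for `house_only_df`; the values are made up for illustration.

```python
import pandas as pd

def extract_region_hpi(house_only_df, geo_label, column_name):
    """Filter one GEO label and rename VALUE to column_name (hypothetical helper)."""
    # regex=False treats the label as plain text rather than a pattern
    region = house_only_df[house_only_df["GEO"].str.contains(geo_label, regex=False)]
    region = region.set_index("REF_DATE")
    # errors="ignore" keeps the helper usable when no 'index' column is present
    region = region.drop(columns=["index"], errors="ignore")
    region = region.rename(columns={"VALUE": column_name})
    return region.reset_index()

# Toy stand-in for house_only_df (made-up values, illustration only)
toy = pd.DataFrame({
    "index": [0, 1, 2, 3],
    "REF_DATE": ["2022-08", "2022-09", "2022-08", "2022-09"],
    "GEO": ["Calgary, Alberta", "Calgary, Alberta",
            "Edmonton, Alberta", "Edmonton, Alberta"],
    "VALUE": [1.0, 2.0, 3.0, 4.0],
})

# Mapping of dataframe name -> GEO label, following the naming used above
regions = {"calgary_alb": "Calgary, Alberta", "edmonton_alb": "Edmonton, Alberta"}
frames = {name: extract_region_hpi(toy, label, f"HPI_{name}")
          for name, label in regions.items()}
```

Each entry of `frames` could then be written out with `to_csv`, as the individual cells do.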

In [52]:
from functools import reduce

#Note: regina_sask and saskatoon_sask, processed above, are not part of this list
dataframes = [atlantic, nfland_labrador, charlottetown_pei, halifax_ns, stjohn_fredericton_moncton, quebec_qc, sherbrooke_qc, troisriv_qc, mtl_qc,
ottawa_gatineau_qc, ottawa_gatineau_ont, oshawa_ont, toronto_ont, hamilton_ont, guelph_ont, london_ont, windsor_ont, sudbury_ont, prairie_region,
winnipeg_manitoba, edmonton_alb, calgary_alb, victoria_bc, vancouver_bc, kelowna_bc]

#Keeping only REF_DATE and the HPI column of each dataframe before merging.
#Every dataframe also carries GEO, UOM and 'New housing price indexes' columns; merging them as-is
#produces repeated suffixed names (GEO_x, GEO_y, ...), which pandas 2.x rejects.
hpi_frames = [df.filter(regex = '^(REF_DATE|HPI_)') for df in dataframes]

#Merging dataframes on REF_DATE
merged_df = reduce(lambda left, right: pd.merge(left, right, on = 'REF_DATE'), hpi_frames)

#merged_df now holds only REF_DATE and the HPI columns of cities
cities_hpi_df = merged_df.copy()

#Saving dataframe as csv
cities_hpi_df.to_csv(r"C:\Users\hp\Desktop\Projects Coding\Affinity_Propagation_Canada_Real_Estate_Market\Processed Data\cities_hpi_df.csv")

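# Index base: December 2016 (201612) = 100.
# The HPI values are relative to that month; 201612 is a date, not a scaling factor,
# so the values cannot be converted to prices by multiplying by it.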

#Previewing
cities_hpi_df.head()

Unnamed: 0,REF_DATE,HPI_atlantic,HPI_nfland_labrador,HPI_charlottetown_pei,HPI_halifax_ns,HPI_stjohn_fredericton_moncton,HPI_quebec_qc,HPI_sherbrooke_qc,HPI_troisriv_qc,HPI_mtl_qc,...,HPI_london_ont,HPI_windsor_ont,HPI_sudbury_ont,HPI_prairie_region,HPI_winnipeg_manitoba,HPI_edmonton_alb,HPI_calgary_alb,HPI_victoria_bc,HPI_vancouver_bc,HPI_kelowna_bc
0,1981-01,,37.5,,,61.4,34.9,,,30.0,...,27.0,64.9,54.6,,29.3,36.5,27.8,206.8,96.1,
1,1981-02,,37.5,,,62.1,35.4,,,30.2,...,27.5,64.9,55.6,,29.7,36.8,28.1,209.1,97.5,
2,1981-03,,37.5,,,62.1,35.4,,,30.5,...,28.2,64.1,55.6,,30.3,36.8,28.6,210.6,97.5,
3,1981-04,,37.5,,,62.1,35.7,,,30.8,...,28.6,63.9,57.0,,30.5,36.9,30.1,210.6,97.7,
4,1981-05,,37.7,,,63.3,36.1,,,31.1,...,28.6,63.9,57.0,,31.1,38.2,30.1,212.4,97.7,
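
The preview shows NaN in several columns (for example HPI_atlantic) for the earliest months. As a minimal sketch on made-up numbers, one might check where each series starts before clustering. The dataframe `toy_hpi` is a hypothetical stand-in for `cities_hpi_df`, and whether to drop incomplete rows is an assumption, not a choice made in this notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for cities_hpi_df (made-up values, illustration only)
toy_hpi = pd.DataFrame({
    "REF_DATE": ["1981-01", "1981-02", "1981-03", "1981-04"],
    "HPI_a": [np.nan, np.nan, 50.0, 51.0],
    "HPI_b": [30.0, 30.2, 30.5, 30.8],
}).set_index("REF_DATE")

# First month with a non-missing value, per column
first_valid = toy_hpi.apply(lambda col: col.first_valid_index())

# Rows where every series has a value
complete = toy_hpi.dropna()
```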
