# Canadian Real Estate API - Get Statistics
## Overview
| Detail Tag            | Information                                                                                        |
|-----------------------|----------------------------------------------------------------------------------------------------|
| Originally Created By | Ariel Herrera, arielherrera@analyticsariel.com                                                      |
| External References   | <a href="https://rapidapi.com/apidojo/api/realtor-canadian-real-estate" target="_blank">Realtor API</a>|
| Input Datasets        | List for Sale |
| Output Datasets       | Table    |
| Input Data Source     | API |
| Output Data Source    | Pandas Dataframe |

## History
| Date         | Developed By  | Reason                                                |
|--------------|---------------|-------------------------------------------------------|
| 1st September 2020 | Ariel Herrera | Notebook created to get Canadian demographic data. |
| 20th October 2020 | Ariel Herrera | Created prototype dashboard. |

## Other Details
This Notebook is a prototype.

## Widgets

In [0]:
# remove widgets
dbutils.widgets.removeAll()

In [0]:
dbutils.widgets.text("selectedCity", "Montreal", "01) City")

In [0]:
selectedCity = dbutils.widgets.get("selectedCity")

In [0]:
displayHTML("""Geographic statistics for the area surrounding <b><font color="blue">""" + selectedCity + "</font></b>")

## Imports

In [0]:
from datetime import datetime
import pandas as pd
import requests
import json
import plotly.express as px

## Functions

In [0]:
def save_pandas_df(df, file_name, file_path='dbfs:/FileStore/tables/'):
  # Create a Spark DataFrame from a pandas DataFrame using Arrow
  df_spark = spark.createDataFrame(df)
  df_spark.write.format("com.databricks.spark.csv").mode('overwrite').option("header", "true").save(file_path + file_name)
  print('Saved file!')

In [0]:
def get_lat_lon(location, google_api_key):
  """
  Get latitude and longitude for a city.

  Parameters
  ----------
  @location [string]: Location entered by the user in the widget
  @google_api_key [string]: Google API key

  Returns
  -------
  [tuple of float]: Latitude and longitude

  """
  # api-endpoint 
  URL = "https://maps.googleapis.com/maps/api/geocode/json"

  # defining a params dict for the parameters to be sent to the API 
  PARAMS = {'address':location, 'key':google_api_key} 

  # sending get request and saving the response as response object 
  r = requests.get(url = URL, params = PARAMS) 
  
  # get data from response
  data = r.json()
  
  # location of the first geocoding result
  location_dict = data['results'][0]['geometry']['location']
  latitude = location_dict['lat']
  longitude = location_dict['lng']
  
  return latitude, longitude
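
The function above indexes `data['results'][0]` directly, so a city name that cannot be geocoded surfaces as an `IndexError`. The following is a minimal sketch of a more defensive parse, assuming the response carries a `status` field next to `results`. The helper name `extract_lat_lon` and the sample payload are hypothetical and hand-written for illustration; they are not part of the original notebook and not a real API response.

```python
def extract_lat_lon(data):
    """Return (lat, lng) from a parsed geocoding response, or raise ValueError."""
    results = data.get("results") or []
    # Fail with a readable message instead of an IndexError on empty results
    if data.get("status") != "OK" or not results:
        raise ValueError("Geocoding failed with status: {0}".format(data.get("status")))
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Hand-written payload for illustration only (assumed shape, not real output)
sample = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 45.5, "lng": -73.57}}}],
}
coords = extract_lat_lon(sample)
print(coords)
```

Raising early keeps a bad widget value from propagating into the statistics request further down.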

In [0]:
def get_api_key(api_key_id = "Realtor"):
  """
  Get the API key for accessing a web service.

  Keys are stored in a table of key ids and key values so they stay out of the notebook code.

  Parameters
  ----------
  @api_key_id [string]: Key value in dataframe

  Returns
  -------
  [string]: API Key

  """
  # load api keys file
  df_api_keys = spark.read.format('csv').options(header='true', inferSchema='true').load('/FileStore/tables/api_keys.csv').toPandas()
  
  # return api key if in dataset
  try:
    # get api key from id
    api_key = df_api_keys.loc[df_api_keys['Id'] == api_key_id]['Key'].iloc[0] # get key by id
    # return api key
    return api_key
  except IndexError:
    # get api key id list
    api_key_id_list = df_api_keys['Id'].unique().tolist()
    # print error message
    print('Cannot map key. Api key id must be one of the following options {0}'.format(api_key_id_list))
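
Note that the function above prints a message and implicitly returns `None` when the id is unknown, so the failure only shows up later when the key is used. One possible alternative, sketched below, raises a `KeyError` instead. The name `lookup_api_key` and the placeholder key values are hypothetical; the lookup takes the key table as an argument so it can be tried without Spark.

```python
import pandas as pd


def lookup_api_key(df_api_keys, api_key_id):
    """Return the key for api_key_id, raising KeyError if the id is unknown."""
    matches = df_api_keys.loc[df_api_keys["Id"] == api_key_id, "Key"]
    if matches.empty:
        options = df_api_keys["Id"].unique().tolist()
        raise KeyError(
            "Unknown api key id '{0}'; expected one of {1}".format(api_key_id, options)
        )
    return matches.iloc[0]


# Placeholder table for illustration; real keys should never be hard-coded
df_demo = pd.DataFrame(
    {"Id": ["Google", "Realtor"],
     "Key": ["placeholder-google-key", "placeholder-realtor-key"]}
)
demo_key = lookup_api_key(df_demo, "Realtor")
print(demo_key)
```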

In [0]:
def api_get_city_statistics(api_key, lat, lon):
  """
  Get city statistics from the Canadian realtor API.

  Parameters
  ----------
  @api_key [string]: RapidAPI key for the realtor API
  @lat [float]: Latitude
  @lon [float]: Longitude

  Returns
  -------
  [json]: Dictionary of city statistics

  """
  # url for api
  url = "https://realtor-canadian-real-estate.p.rapidapi.com/properties/get-statistics"

  # enter parameters
  querystring = {
    "CultureId":"1", # return in english
    "Latitude": str(lat),
    "Longitude": str(lon)
  }

  # header
  headers = {
    'x-rapidapi-host': "realtor-canadian-real-estate.p.rapidapi.com",
    'x-rapidapi-key': api_key
  }

  # response
  response = requests.request("GET", url, headers=headers, params=querystring)
  return response.json() # json format

## Local Constants

In [0]:
df_api_keys = spark.read.format('csv').options(header='true', inferSchema='true').load('/FileStore/tables/api_keys.csv').toPandas()

In [0]:
display(df_api_keys)

Id,Key
Google,<redacted>
Plotly,<redacted>
Realtor,<redacted>


In [0]:
google_api_key = get_api_key(api_key_id = "Google")
realtor_api_key = get_api_key(api_key_id = "Realtor")

## Data Retrieval

In [0]:
# get latitude and longitude based on city entered
latitude, longitude = get_lat_lon(selectedCity, google_api_key)

# get city statistics
stats_response = api_get_city_statistics(api_key=realtor_api_key, lat=latitude, lon=longitude)

In [0]:
# view stats response
stats_response

In [0]:
# get file name
stats_date_str = stats_response['ErrorCode']['ProductName'].split("[")[-1].replace("]", "")
stats_date = datetime.strptime(stats_date_str, '%A, %B %d, %Y %I:%M:%S %p')
stats_data_concat_str = stats_date.strftime('%Y-%m-%d')
stats_file_name = selectedCity + "_stats-data_" + stats_data_concat_str

# save file path
file_dir = '/dbfs/FileStore/tables/canadian_stats/'
stats_file_path = file_dir + stats_file_name

print(stats_file_path)
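
The file name logic above can be tried in isolation. This is a small sketch under the assumption that the product name ends with a bracketed timestamp in the format used by the `strptime` call; the sample string and the helper name `build_stats_file_name` are made up for illustration and do not come from the API.

```python
from datetime import datetime


def build_stats_file_name(city, product_name):
    """Build '<city>_stats-data_<YYYY-MM-DD>' from a bracketed timestamp."""
    # Keep only the text inside the last pair of square brackets
    date_str = product_name.split("[")[-1].replace("]", "")
    stats_date = datetime.strptime(date_str, "%A, %B %d, %Y %I:%M:%S %p")
    return "{0}_stats-data_{1}".format(city, stats_date.strftime("%Y-%m-%d"))


# Hypothetical product name; only the bracketed part matters here
sample_product_name = "Example Product [Tuesday, October 20, 2020 10:15:30 AM]"
file_name = build_stats_file_name("Montreal", sample_product_name)
print(file_name)
```

Because `%A` and `%B` are locale-dependent, this sketch assumes an English locale.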

In [0]:
# view all contents in the table
for i, data_table in enumerate(stats_response['Data'], start=1):
  print("Table {0}:".format(i), data_table['key'])
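
Since each entry in `Data` has a `key` and a `value`, the tables could also be collected into a dictionary of DataFrames rather than looked up by position. The sketch below uses a hand-built response; the table names "Tenure" and "Population" are hypothetical stand-ins, because the real keys are not shown here, and the values are assumed to arrive as strings.

```python
import pandas as pd

# Hand-built stand-in for stats_response (table names are hypothetical)
sample_response = {
    "Data": [
        {"key": "Tenure",
         "value": [{"key": "Own", "value": "733"}, {"key": "Rent", "value": "982"}]},
        {"key": "Population",
         "value": [{"key": "2018", "value": "2741"}, {"key": "2021", "value": "2899"}]},
    ]
}

# Map each table name to its own DataFrame
tables = {t["key"]: pd.DataFrame(t["value"]) for t in sample_response["Data"]}

for name, df in tables.items():
    print(name, df.shape)
```

Looking tables up by name avoids hard-coded positions such as `['Data'][9]`, which would break if the API changed the table order.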

## Data Exploration

In [0]:
# notes:
# assumptions - (1) amounts are in Canadian dollars, (2) daytime population is abbreviated by 1K, (3) population is abbreviated by 1K
# daytime population = commuter-adjusted population
df_table_1 = pd.DataFrame(stats_response['Data'][0]['value'])
df_table_1

Unnamed: 0,key,value
0,Daytime Population,143546
1,Number of Businesses,3218
2,Population size,2741
3,Median age,32.1
4,Average Household Size,1.6
5,Average Household Income,"$88,926.81"
6,Households with Children (%),38
7,Households without Children (%),62
8,Number of Households,1715


In [0]:
# notes:
# https://tradingeconomics.com/canada/retail-sales
# questions - (1) what industry are the retail sales? (2) what do the values mean?
df_table_2 = pd.DataFrame(stats_response['Data'][1]['value'])
df_table_2

Unnamed: 0,key,value
0,Unknown,570
1,< 1,1475
2,1 - 4.9,797
3,5 - 19.9,234
4,20 - 99.9,116
5,100+,26


In [0]:
# notes:
# questions - (1) total does not equal population in table 1, why?
df_table_3 = pd.DataFrame(stats_response['Data'][2]['value'])
df_table_3['value'] = df_table_3.apply(lambda x: int(x['value']), axis = 1)
print('sum of total population:', df_table_3['value'].sum())
print('population from table 1:', df_table_1.loc[df_table_1['key'] == "Population size"]['value'].iloc[0])
df_table_3

Unnamed: 0,key,value
0,0 - 4 years old,90
1,5 - 9 years old,23
2,10 - 19 years old,185
3,20 - 34 years old,1330
4,35 - 49 years old,623
5,50 - 54 years old,149
6,55 - 64 years old,243
7,65 - 69 years old,98
8,70 - 79 years old,69
9,80 - 84 years old,13


In [0]:
# notes:
# good for forecasting
df_table_4 = pd.DataFrame(stats_response['Data'][3]['value'])
df_table_4

Unnamed: 0,key,value
0,2013,2248
1,2018,2741
2,2021,2899
3,2023,3000
4,2028,3256


In [0]:
# notes:
df_table_5 = pd.DataFrame(stats_response['Data'][4]['value'])
df_table_5

Unnamed: 0,key,value
0,No cert. / Diploma / Degree,63
1,High school,490
2,Apprenticeship / Trade cert. / Diploma,48
3,Non-university cert. / Diploma,250
4,University cert. / Diploma below bachelor,118
5,University degree,1642


In [0]:
# notes:
df_table_6 = pd.DataFrame(stats_response['Data'][5]['value'])
df_table_6

Unnamed: 0,key,value
0,Married,768
1,Common law,309
2,Single,1357
3,Separated,42
4,Divorced,102
5,Widowed,33


In [0]:
# notes:
# questions - (1) what could this be used for? trends over year? focus on marketing?
df_table_7 = pd.DataFrame(stats_response['Data'][6]['value'])
df_table_7['value'] = df_table_7.apply(lambda x: int(x['value']), axis = 1)
df_table_7 = df_table_7.sort_values(by=['value'], ascending=False)
print('Length of table:', len(df_table_7))
df_table_7.head(10)

Unnamed: 0,key,value
1,French,1086
0,English,580
6,Arabic,258
5,Spanish,109
15,Persian,99
11,Chinese n.o.s,97
28,Other Languages,56
27,Turkish,56
2,Italian,45
4,Cantonese,38


In [0]:
# notes:
# assumptions - (1) in canadian dollars
df_table_8 = pd.DataFrame(stats_response['Data'][7]['value'])
df_table_8

Unnamed: 0,key,value
0,"$0 - $29,999",568
1,"$30,000 - $59,999",407
2,"$60,000 - $79,999",182
3,"$80,000 - $99,999",127
4,"$100,000 - $149,999",246
5,"$150,000 - $199,999",93
6,"$200,000+",92


In [0]:
# notes:
# if 25+ starts to decline we could see more renters or home buyers in the market
# questions - (1) values are not percentages; what is the base value?
df_table_9 = pd.DataFrame(stats_response['Data'][8]['value'])
df_table_9

Unnamed: 0,key,value
0,0 - 4 years old,88
1,5 - 9 years old,23
2,10 - 14 years old,16
3,15 - 19 years old,46
4,20 - 24 years old,57
5,25+ years old,60


In [0]:
# notes:
df_table_10 = pd.DataFrame(stats_response['Data'][9]['value'])
df_table_10

Unnamed: 0,key,value
0,Own,733
1,Rent,982


In [0]:
# notes:
df_table_11 = pd.DataFrame(stats_response['Data'][10]['value'])
df_table_11

Unnamed: 0,key,value
0,Before 1960,343
1,1961 - 1980,83
2,1981 - 1990,21
3,1991 - 2000,34
4,2001 - 2005,129
5,2006 - 2010,479
6,2011 - 2016,526
7,After 2016,100


In [0]:
# notes:
# diversity in occupations
df_table_12 = pd.DataFrame(stats_response['Data'][11]['value'])
df_table_12['value'] = df_table_12.apply(lambda x: int(x['value']), axis = 1)
df_table_12 = df_table_12.sort_values(by=['value'], ascending=False)
df_table_12

Unnamed: 0,key,value
2,"Business, Finance, Admin",196
1,Management,130
7,Sales and service,114
5,"Social Sciences, Education, Government, Religion",91
3,Sciences,86
4,Health,31
6,"Art, Culture, Recreation, Sport",30
0,Not Applicable,25


## Visualization

In [0]:
selectedCity = dbutils.widgets.get("selectedCity") # get widget for auto update
# generate plot
plot_1 = pd.DataFrame(stats_response['Data'][0]['value'])
plot_1 = plot_1.iloc[3:]
plot_1 = plot_1.rename(columns={"key": "City Stat", "value": "Value"})
display(plot_1)

City Stat,Value
Median age,32.1
Average Household Size,1.6
Average Household Income,"$88,926.81"
Households with Children (%),38
Households without Children (%),62
Number of Households,1715


In [0]:
selectedCity = dbutils.widgets.get("selectedCity") # get widget for auto update
# generate plot
plot_2 = pd.DataFrame(stats_response['Data'][2]['value'])
plot_2['key'] = plot_2.apply(lambda x: x['key'].split(" "), axis = 1)
plot_2['key'] = plot_2.apply(lambda x: x['key'][0] + x['key'][1] + x['key'][2], axis = 1)
plot_2['key'] = plot_2.apply(lambda x: "85+" if (x['key'] == "85+yearsold") else x['key'], axis = 1)
plot_2['value'] = plot_2.apply(lambda x: int(x['value']), axis = 1)
plot_2 = plot_2.rename(columns={"key": "Years Old", "value": "Value"})
fig = px.bar(plot_2, x='Years Old', y='Value')
fig.update_xaxes(title="")
fig

In [0]:
selectedCity = dbutils.widgets.get("selectedCity") # get widget for auto update
# generate plot
plot_3 = pd.DataFrame(stats_response['Data'][3]['value'])
plot_3['value'] = plot_3.apply(lambda x: int(x['value']), axis = 1)
plot_3 = plot_3.rename(columns={"key": "Year", "value": "Value"})
fig = px.line(plot_3, x="Year", y="Value")
fig.update_xaxes(title="", tickmode='linear')
fig

In [0]:
selectedCity = dbutils.widgets.get("selectedCity") # get widget for auto update
# generate plot
plot_4 = pd.DataFrame(stats_response['Data'][4]['value'])
plot_4['value'] = plot_4.apply(lambda x: int(x['value']), axis = 1)
plot_4 = plot_4.rename(columns={"key": "Education Level", "value": "Value"})
display(spark.createDataFrame(plot_4))
# px.pie(plot_4, values='Value', names='Education Level')

Education Level,Value
No cert. / Diploma / Degree,63
High school,490
Apprenticeship / Trade cert. / Diploma,48
Non-university cert. / Diploma,250
University cert. / Diploma below bachelor,118
University degree,1642


In [0]:
selectedCity = dbutils.widgets.get("selectedCity") # get widget for auto update
# generate plot
plot_5 = pd.DataFrame(stats_response['Data'][9]['value'])
plot_5['value'] = plot_5.apply(lambda x: int(x['value']), axis = 1)
plot_5 = plot_5.rename(columns={"key": "Status", "value": "Value"})
display(spark.createDataFrame(plot_5))
# px.pie(plot_5, values='Value', names='Status')

Status,Value
Own,733
Rent,982


In [0]:
selectedCity = dbutils.widgets.get("selectedCity") # get widget for auto update
# generate plot
plot_6 = pd.DataFrame(stats_response['Data'][10]['value'])
plot_6['value'] = plot_6.apply(lambda x: int(x['value']), axis = 1)
home_built_sum = plot_6['value'].sum()
plot_6['Prct'] = plot_6.apply(lambda x: str(int(round((x['value'] / home_built_sum) * 100))) + "%", axis = 1)
plot_6 = plot_6.rename(columns={"key": "Time Frame", "value": "Value"})
display(spark.createDataFrame(plot_6))

Time Frame,Value,Prct
Before 1960,343,20%
1961 - 1980,83,5%
1981 - 1990,21,1%
1991 - 2000,34,2%
2001 - 2005,129,8%
2006 - 2010,479,28%
2011 - 2016,526,31%
After 2016,100,6%
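
The percentage column above is computed row by row with `apply`. An equivalent vectorized form is sketched below on the same values; `df_built` is a name introduced here, and the result should match the `Prct` column shown in the output.

```python
import pandas as pd

# Values copied from the output above
df_built = pd.DataFrame(
    {"Time Frame": ["Before 1960", "1961 - 1980", "1981 - 1990", "1991 - 2000",
                    "2001 - 2005", "2006 - 2010", "2011 - 2016", "After 2016"],
     "Value": [343, 83, 21, 34, 129, 479, 526, 100]}
)

# Share of each time frame as a whole-number percentage string
share = df_built["Value"] / df_built["Value"].sum() * 100
df_built["Prct"] = share.round().astype(int).astype(str) + "%"
print(df_built)
```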


In [0]:
selectedCity = dbutils.widgets.get("selectedCity") # get widget for auto update
# generate plot
plot_7 = pd.DataFrame(stats_response['Data'][7]['value'])
plot_7['value'] = plot_7.apply(lambda x: int(x['value']), axis = 1)
plot_7 = plot_7.rename(columns={"key": "Income Bracket", "value": "Value"})
fig = px.bar(plot_7, x='Income Bracket', y='Value')
fig.update_xaxes(title="")
fig

## Write Output

# End Notebook