In [3]:
import openrouteservice
import folium
import pandas as pd
import geopandas as gpd
import pyspark
from pyspark.sql import SparkSession, functions as F
import json
import requests
import regex
from scipy import misc
import glob


For internal features, we examined the additional listing features, which were either generated automatically or entered manually. We took the 25 most common of these and assessed which of them affected the price the most.
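A minimal sketch of selecting the most common listing features and turning them into model inputs. It assumes each listing's features are stored as a list of strings in a hypothetical `features` column; the scraped data's real schema is not shown in this notebook, and the toy listings are illustrative only.

```python
import pandas as pd

# Hypothetical listings: the `features` column (a list of strings per listing)
# is an assumption, not the notebook's actual schema.
listings = pd.DataFrame({
    "features": [
        ["Furnished", "Dishwasher"],
        ["Air conditioning", "Dishwasher", "Heating"],
        ["Heating"],
        ["Furnished", "Air conditioning", "Dishwasher"],
    ],
})

TOP_N = 25

# One row per (listing, feature) pair, then count how often each feature appears.
feature_counts = listings["features"].explode().value_counts()
top_features = feature_counts.head(TOP_N).index.tolist()

# One binary indicator column per common feature, so a price model can use it.
for feat in top_features:
    listings[feat] = listings["features"].apply(lambda fs: int(feat in fs))
```

With real data, `top_features` would hold up to 25 names, and the indicator columns would sit alongside bedrooms, bathrooms and the proximity features as model inputs.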

In [5]:
from IPython.display import Image

In [6]:
rental_df = pd.read_csv("../data/curated/proximity_calc_final.csv", index_col=False)

In [7]:
img = '../plots/feat_import_external.png'
Image(url=img)

We can see that both internal and external features play a part in predicting the price of a rental property. The most important internal features are the number of bathrooms, the number of bedrooms and whether the property is furnished. The external features that matter most are proximity to shopping centres, schools and recreational centres. This is surprising, as shopping centres and schools appear to be far more important than distance to the CBD. Similarly, amenities such as air conditioning, dishwashers and heaters were not among the 10 most important features for price prediction, even though we had expected them to rank much higher.
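The notebook does not show which model or importance measure produced the plot above. As an illustration of how such a ranking can be read, the sketch below swaps in permutation importance on an ordinary least-squares fit: shuffle one feature at a time and measure how much the prediction error grows. The data are synthetic (price is built to depend strongly on bathrooms and bedrooms and barely on a dishwasher flag), so the result says nothing about the real listings.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic, illustrative data only (not the scraped listings).
X = np.column_stack([
    rng.integers(1, 4, n),   # bathrooms (1-3)
    rng.integers(1, 6, n),   # bedrooms (1-5)
    rng.integers(0, 2, n),   # dishwasher flag (0/1)
]).astype(float)
names = ["bathrooms", "bedrooms", "dishwasher"]
price = 200 * X[:, 0] + 80 * X[:, 1] + 5 * X[:, 2] + rng.normal(0, 20, n)


def design(X):
    # Prepend an intercept column.
    return np.column_stack([np.ones(len(X)), X])


def fit(X, y):
    # Ordinary least squares coefficients.
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return coef


def mse(coef, X, y):
    return float(np.mean((design(X) @ coef - y) ** 2))


coef = fit(X, price)
baseline = mse(coef, X, price)

# Permutation importance: error increase when one feature's values are shuffled.
importance = {}
for j, name in enumerate(names):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importance[name] = mse(coef, X_perm, price) - baseline

ranking = sorted(importance, key=importance.get, reverse=True)
```

A feature can be genuinely useful yet rank low if it rarely varies or overlaps with other features, which is one possible reason amenities such as dishwashers score below bedrooms and bathrooms.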

# Analyzing the affordability and livability of suburbs

In [8]:
ranked_suburbs = pd.read_csv("../data/curated/livability_suburbs_ranked")

We calculated affordability as the annual rent in each suburb (the weekly rent, `cost_text`, multiplied by 52) divided by the median income of the suburb's postcode, so a lower value means a smaller share of income is spent on rent. We used income by postcode rather than by suburb because we could not find a dataset of income by suburb, and we assumed that incomes in the suburbs within a postcode are close to the postcode's median.
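The ratio can be sketched as a postcode join followed by one column operation. The two suburbs, rents and incomes are copied from the ranked table below, and the ×52 annualisation reproduces its `affordability` values; the separate `rent` and `income` frames are a hypothetical layout for the inputs.

```python
import pandas as pd

# Rent per suburb (weekly) and income per postcode, values copied from the ranked table.
rent = pd.DataFrame({
    "SAL_NAME21": ["Macleod (Vic.)", "Caulfield East"],
    "POSTCODE": [3085, 3145],
    "cost_text": [225.0, 305.0],
})
income = pd.DataFrame({
    "POSTCODE": [3085, 3145],
    "median income": [54252, 56871],
})

WEEKS_PER_YEAR = 52

# Each suburb inherits the median income of its postcode.
suburbs = rent.merge(income, on="POSTCODE", how="left")

# Share of annual income spent on rent: lower means more affordable.
suburbs["affordability"] = suburbs["cost_text"] * WEEKS_PER_YEAR / suburbs["median income"]
```

Because several suburbs share one postcode (for example Hawthorn East and Glen Iris both map to 3123), they also share one income value, which is the assumption described above.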

In [24]:
ranked_suburbs.sort_values('affordability').head(5)

| Unnamed: 0.1 | Unnamed: 0 | SAL_NAME21 | POSTCODE | cost_text | no_prop_scraped | train_prox_2km | school_prox_2_min | recre_prox_2_min | shopping_prox_2_min | hosp_closest | median income | affordability | train_score | school_score | recre_score | hosp_score | afford_score | Total_Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 102 | 413 | Macleod (Vic.) | 3085 | 225.0 | 41 | 0.0 | 0.0 | 0.0 | 7.0 | 5.25 | 54252 | 0.21566 | 1 | 1 | 1 | 5 | 5 | 13 |
| 63 | 144 | Caulfield East | 3145 | 305.0 | 16 | 1.0 | 0.0 | 4.0 | 7.0 | 3.16 | 56871 | 0.278877 | 1 | 1 | 2 | 5 | 5 | 14 |
| 61 | 267 | Flora Hill | 3550 | 260.0 | 14 | 1.0 | 1.0 | 1.0 | 7.0 | 3.935 | 45529 | 0.296954 | 1 | 2 | 1 | 5 | 5 | 14 |
| 81 | 606 | South Kingsville | 3015 | 385.0 | 9 | 0.0 | 0.0 | 2.0 | 7.0 | 6.38 | 67405 | 0.297011 | 1 | 1 | 1 | 5 | 5 | 13 |
| 93 | 486 | Myrtleford | 3737 | 250.0 | 17 | 0.0 | 1.0 | 5.0 | 7.0 | 31.24 | 41646 | 0.312155 | 1 | 2 | 2 | 4 | 4 | 13 |


These are the five most affordable suburbs according to our data scraped from Domain. A limitation is that, as the `no_prop_scraped` column shows, only a small number of properties (9 to 41) were scraped for each of these suburbs, which could lead to inaccurate rankings. However, the alternative rent-by-suburb datasets we found had poor data quality, suburb names that did not match the shapefiles provided by the government, and significantly fewer suburbs, so we continued with the scraped data.
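One way to keep this limitation visible next to the ranking is to flag suburbs with few scraped listings. The counts below are copied from the affordability table; the 20-listing threshold is a hypothetical choice for illustration and was not part of the original analysis.

```python
import pandas as pd

# Listing counts for the five most affordable suburbs, from the table above.
top5 = pd.DataFrame({
    "SAL_NAME21": ["Macleod (Vic.)", "Caulfield East", "Flora Hill",
                   "South Kingsville", "Myrtleford"],
    "no_prop_scraped": [41, 16, 14, 9, 17],
})

MIN_LISTINGS = 20  # hypothetical threshold, not from the original analysis

# True where the ranking rests on fewer listings than the threshold.
top5["low_sample"] = top5["no_prop_scraped"] < MIN_LISTINGS
```

Under this threshold only Macleod would be ranked on a reasonably sized sample; a stricter or looser cut-off would change that, which is why it is left as a parameter.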

In [25]:
ranked_suburbs.sort_values('Total_Score',ascending = False).head(10)

| Unnamed: 0.1 | Unnamed: 0 | SAL_NAME21 | POSTCODE | cost_text | no_prop_scraped | train_prox_2km | school_prox_2_min | recre_prox_2_min | shopping_prox_2_min | hosp_closest | median income | affordability | train_score | school_score | recre_score | hosp_score | afford_score | Total_Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 311 | Hawthorn (Vic.) | 3122 | 450.0 | 143 | 3.0 | 4.0 | 7.0 | 7.0 | 4.25 | 61024 | 0.383456 | 3 | 5 | 3 | 5 | 3 | 19 |
| 1 | 312 | Hawthorn East | 3123 | 490.0 | 91 | 3.0 | 3.0 | 9.0 | 7.0 | 5.42 | 61518 | 0.414188 | 3 | 4 | 4 | 5 | 3 | 19 |
| 2 | 286 | Glen Iris (Vic.) | 3123 | 430.0 | 81 | 2.0 | 4.0 | 6.0 | 7.0 | 4.83 | 61518 | 0.363471 | 2 | 5 | 3 | 5 | 3 | 18 |
| 3 | 264 | Fitzroy North | 3068 | 550.0 | 46 | 3.0 | 2.0 | 8.0 | 7.0 | 3.355 | 63406 | 0.451061 | 3 | 3 | 4 | 5 | 3 | 18 |
| 4 | 609 | South Yarra | 3141 | 530.0 | 211 | 3.0 | 2.0 | 8.0 | 7.0 | 3.34 | 65707 | 0.419438 | 3 | 3 | 4 | 5 | 3 | 18 |
| 9 | 36 | Balaclava (Vic.) | 3183 | 537.5 | 38 | 3.0 | 3.5 | 3.0 | 7.0 | 3.295 | 56561 | 0.494157 | 3 | 4 | 2 | 5 | 3 | 17 |
| 11 | 618 | St Kilda East | 3183 | 490.0 | 82 | 2.0 | 4.5 | 5.0 | 7.0 | 3.185 | 56561 | 0.450487 | 2 | 5 | 2 | 5 | 3 | 17 |
| 10 | 688 | Watsonia | 3088 | 475.0 | 9 | 1.0 | 2.0 | 11.0 | 7.0 | 7.63 | 54240 | 0.455383 | 1 | 3 | 5 | 5 | 3 | 17 |
| 8 | 41 | Ballarat North | 3350 | 300.0 | 6 | 0.0 | 1.5 | 12.5 | 7.0 | 17.295 | 47466 | 0.328656 | 1 | 2 | 5 | 5 | 4 | 17 |
| 7 | 663 | Travancore | 3032 | 405.0 | 6 | 3.0 | 3.0 | 3.0 | 7.0 | 2.295 | 54220 | 0.388418 | 3 | 4 | 2 | 5 | 3 | 17 |


These are the ten most livable suburbs. Each metric (train, school, recreation and hospital proximity, plus affordability) was binned using its median values across all suburbs and scored out of 5, and the five scores were summed into a total livability score out of 25.
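The notebook does not show how the bins were defined, so the sketch below assumes equal-frequency (quantile) bins via `pd.qcut`. Judging by the ranked tables, higher station, school and recreation counts earn higher scores, while shorter hospital distances and lower rent-to-income ratios earn higher scores, so the scale is reversed for those two. Column names mirror the ranked table; the ten suburbs and their values are made up.

```python
import pandas as pd

# Illustrative metrics for ten hypothetical suburbs (not the notebook's data).
df = pd.DataFrame({
    "train_prox_2km": list(range(10)),                           # higher is better
    "school_prox_2_min": list(range(10)),                        # higher is better
    "recre_prox_2_min": list(range(10)),                         # higher is better
    "hosp_closest": [float(k) for k in range(1, 11)],            # distance, lower is better
    "affordability": [0.20 + 0.05 * k for k in range(10)],       # rent/income, lower is better
})


def score(series: pd.Series, higher_is_better: bool) -> pd.Series:
    """Split a metric into 5 equal-frequency bins and return scores 1..5 (5 = best)."""
    # Ranking first breaks ties, so many repeated values (e.g. 0.0 counts)
    # don't create duplicate bin edges, which pd.qcut would reject.
    bins = pd.qcut(series.rank(method="first"), q=5, labels=False) + 1
    return bins if higher_is_better else 6 - bins


df["train_score"] = score(df["train_prox_2km"], higher_is_better=True)
df["school_score"] = score(df["school_prox_2_min"], higher_is_better=True)
df["recre_score"] = score(df["recre_prox_2_min"], higher_is_better=True)
df["hosp_score"] = score(df["hosp_closest"], higher_is_better=False)
df["afford_score"] = score(df["affordability"], higher_is_better=False)

score_cols = ["train_score", "school_score", "recre_score", "hosp_score", "afford_score"]
df["Total_Score"] = df[score_cols].sum(axis=1)  # out of 25
```

Breaking ties by rank means two suburbs with identical values can land in adjacent bins; equal-width bins (`pd.cut`) or hand-picked thresholds would avoid that at the cost of uneven bin sizes.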

# Finding the highest growth suburbs

In [15]:
'Moorabbin Airport' 'Essendon Airport' 'Clyde North - South'
'Clyde North - North' 'Wollert' 'Craigieburn - North West'
'Cobblebank - Strathtulloh' 'Tarneit (West) - Mount Cottrell'
'Tarneit - North' 'Port Melbourne Industrial'

# Recommendations

Comparing the most livable suburbs with the suburbs predicted to have the highest future growth, no suburb appears in both lists. We would therefore recommend that clients look at rental properties in the currently most livable suburbs, such as Hawthorn and Glen Iris. These suburbs also have a reasonable number of scraped listings (143 for Hawthorn and 81 for Glen Iris), which gives some confidence in this recommendation.