In [4]:
import pandas as pd
import glob
from IPython.display import Image
rental_df = pd.read_csv("../data/curated/proximity_calc_final.csv", index_col=False)
ranked_suburbs = pd.read_csv("../data/curated/livability_suburbs_ranked")

# Approach

Our main data sources were:
- Listings scraped from the Domain website
- Data ordered from the Australian Bureau of Statistics (ABS) and the Victorian Government (vicgov)

One of our key goals was to determine which features affect the rental price of properties throughout the state.
We split these features into internal and external factors, as we believed it would be beneficial to see how both features within a house (such as kitchen appliances) and features surrounding the house (such as school proximity) affect the price.

### Internal Features

For internal features, in addition to the basic rental information (number of bedrooms, bathrooms and parking spaces), we looked at the additional listing features scraped from the website "domain.com".
- These features ranged from walk-in wardrobes to air conditioners to balconies and so on.
- From these features, we took the 25 most common and tried to see which affected the price the most. We stopped at 25 because feature frequency tapers off gradually beyond that point; we would have extended the list had the top 25 proved highly significant.

Here are the 25 most common additional features, in descending order of number of appearances:

- built in wardrobes         
- heating                     
- secure parking              
- dishwasher                 
- internal laundry            
- air conditioning            
- gas                         
- balcony  deck               
- no extra features listed    
- bath                        
- floorboards                 
- intercom                    
- fully fenced                
- study                       
- shed                        
- close to shops               
- ensuite                      
- close to transport           
- close to schools             
- ducted heating             
- furnished                   
- split system heating         
- remote garage               
- pets allowed                 
- garden  courtyard            
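
A frequency ranking like the one above could be produced along the following lines. This is a minimal sketch, not the notebook's actual code: the `listings` structure and the `top_features` helper are hypothetical, and it assumes each scraped listing is available as a list of feature strings.

```python
from collections import Counter

# Hypothetical input: one list of scraped feature strings per property.
listings = [
    ["built in wardrobes", "heating", "dishwasher"],
    ["heating", "balcony  deck"],
    ["built in wardrobes", "heating"],
    [],  # a listing with no extra features
]

def top_features(listings, n=25):
    """Return the n most common features as (feature, count) pairs, most frequent first."""
    counts = Counter()
    for features in listings:
        if not features:
            counts["no extra features listed"] += 1
        else:
            # set() so a feature repeated within one listing is counted once
            counts.update(set(features))
    return counts.most_common(n)

print(top_features(listings, n=2))
```

Counting each feature at most once per listing keeps the ranking in terms of "number of properties with this feature" rather than raw mentions.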

### External Features

For external features, using the location data collected from ABS, vicgov and other third-party sources such as QGIS, we counted the number of amenities within several radii (2, 5, 10 and 15 km) of each property to better understand which proximity matters most. We calculated these counts for amenities that our research suggested, and that we also believed, were important to a rental property's value: schools, recreational facilities, shopping centres, hospitals and train stations. We also included the distance to the CBD, as many people travel there for work.
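
The amenity counts per radius can be sketched as below. This is an illustration under stated assumptions: it uses straight-line (haversine) distance, whereas a routing service would give road distances instead, and the coordinates and the `count_within` helper are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def count_within(prop, amenities, radii_km=(2, 5, 10, 15)):
    """Count amenities within each radius of a property; returns {radius_km: count}."""
    distances = [haversine_km(prop[0], prop[1], lat, lon) for lat, lon in amenities]
    return {r: sum(d <= r for d in distances) for r in radii_km}

# Hypothetical coordinates for illustration only (roughly 0 km, 3 km and 10 km away)
property_location = (-37.8136, 144.9631)
schools = [(-37.8136, 144.9631), (-37.8400, 144.9631), (-37.9000, 144.9631)]
print(count_within(property_location, schools))
```

Because the radii are nested, each count includes the amenities already counted at the smaller radii.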

After these calculations, we fitted a basic XGBoost model, which predicts the price of a rental by combining the estimates of a set of weaker models (Amazon AWS). This allowed us to measure the importance of each feature to the rental price.
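
To illustrate the idea of combining weak models, here is a deliberately simplified gradient boosting sketch using one-split "stumps" and a split-count importance. It is not XGBoost itself (which adds regularisation, deeper trees and other refinements), and the toy data, function names and parameters are all hypothetical.

```python
def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) for the best single split."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:]

def fit_boosted(X, y, n_rounds=20, learning_rate=0.5):
    """Fit stumps one after another, each on the residuals left by the previous ones."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        preds = [p + learning_rate * (lm if row[j] <= t else rm) for row, p in zip(X, preds)]
    return base, learning_rate, stumps

def predict(model, row):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)

def split_importance(model, n_features):
    """Share of stumps that split on each feature."""
    counts = [0] * n_features
    for j, _, _, _ in model[2]:
        counts[j] += 1
    return [c / len(model[2]) for c in counts]

# Toy data: rent depends only on column 0 (bedrooms); column 1 is an unrelated flag.
X = [[1, 0], [1, 1], [2, 0], [2, 1], [3, 0], [3, 1], [4, 0], [4, 1]]
y = [300, 300, 400, 400, 500, 500, 600, 600]
model = fit_boosted(X, y)
importance = split_importance(model, n_features=2)
print(importance)
```

In this toy setup the unrelated flag never improves a split, so its importance stays at zero while the bedroom count carries all of it.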

In [7]:
img = '../plots/feat_import_external.png'
Image(url=img)

- We can see that both internal and external features play a part in predicting the price of a rental property.
- The most important internal features are the number of bathrooms, the number of bedrooms and whether or not the rental property is furnished.
    - Other additional features such as air conditioning, dishwashers and heaters were not among the top 10 most important features contributing to the price prediction. We had expected such features to rank much higher, especially in Victoria, where temperatures can be quite extreme in both summer and winter.
- The external features that matter most for rental price are the number of shopping centres, schools and recreational centres within the various proximities.
    - This is quite surprising: shopping centres and schools appear to be vastly more important than the distance to the CBD, which we had originally hypothesised to be important because of commuting distance to work.

### Choice of Granularity

We chose to use suburbs within postcodes over SA2 regions, as suburbs are more widely known and understood than SA2 regions, which are a complex grouping based on multiple criteria. Suburbs are also significantly more likely to be listed on any rental property website.

### Analyzing the affordability and livability of suburbs

Another component that we believed plays a large role in whether a person wants to rent a property is how affordable it is.
To quantify this, we followed a metric mentioned by the ABS: rent divided by income. The threshold they used to determine whether a property was "affordable" was 30%.

This metric has its limitations for higher-income households, as they can spend a larger percentage of their income on rent without being in financial stress.
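
As a worked illustration of the rent-to-income ratio, the sketch below annualises a weekly rent and divides by annual income. Treating `cost_text` as weekly rent and `median income` as annual income is an assumption, though it is consistent with the `affordability` values stored in `ranked_suburbs`; the function names and the choice of "at or below 30%" as the cut-off are hypothetical.

```python
WEEKS_PER_YEAR = 52
AFFORDABILITY_THRESHOLD = 0.30  # the 30% threshold mentioned above

def rent_to_income(weekly_rent, annual_income):
    """Annualised rent as a fraction of annual income."""
    return weekly_rent * WEEKS_PER_YEAR / annual_income

def is_affordable(weekly_rent, annual_income, threshold=AFFORDABILITY_THRESHOLD):
    """Assumption: a ratio at or below the threshold counts as affordable."""
    return rent_to_income(weekly_rent, annual_income) <= threshold

# Values taken from the Macleod (Vic.) row of ranked_suburbs
ratio = rent_to_income(225.0, 54252)
print(round(ratio, 5), is_affordable(225.0, 54252))
```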


In [12]:
ranked_suburbs.sort_values('affordability').head(5)

Unnamed: 0.1,Unnamed: 0,SAL_NAME21,POSTCODE,cost_text,no_prop_scraped,train_prox_2km,school_prox_2_min,recre_prox_2_min,shopping_prox_2_min,hosp_closest,median income,affordability,train_score,school_score,recre_score,hosp_score,afford_score,Total_Score
102,413,Macleod (Vic.),3085,225.0,41,0.0,0.0,0.0,7.0,5.25,54252,0.21566,1,1,1,5,5,13
63,144,Caulfield East,3145,305.0,16,1.0,0.0,4.0,7.0,3.16,56871,0.278877,1,1,2,5,5,14
61,267,Flora Hill,3550,260.0,14,1.0,1.0,1.0,7.0,3.935,45529,0.296954,1,2,1,5,5,14
81,606,South Kingsville,3015,385.0,9,0.0,0.0,2.0,7.0,6.38,67405,0.297011,1,1,1,5,5,13
93,486,Myrtleford,3737,250.0,17,0.0,1.0,5.0,7.0,31.24,41646,0.312155,1,2,2,4,4,13


In the media, we often hear that Melbourne is the most liveable city. We wanted to look at how liveable the suburbs are, as this likely also plays a significant part in how people decide where to rent.

Our approach was to reuse the amenity counts within each proximity and take the median for each suburb. We chose these amenities after researching and finding that the main factors contributing to liveability are stability, healthcare, culture and environment, education and infrastructure (Economist Intelligence Unit). We believed that the amenities we included reasonably cover these factors.

- After calculating the median proximities and affordability, we converted each into a score from 1 to 5 based on where its value fell relative to the range across the rest of the suburbs in the state.
- 1 is the lowest score and 5 the highest.
- We then totalled these scores into a livability score out of 25:

In [25]:
ranked_suburbs.sort_values('Total_Score',ascending = False).head(10)

Unnamed: 0.1,Unnamed: 0,SAL_NAME21,POSTCODE,cost_text,no_prop_scraped,train_prox_2km,school_prox_2_min,recre_prox_2_min,shopping_prox_2_min,hosp_closest,median income,affordability,train_score,school_score,recre_score,hosp_score,afford_score,Total_Score
0,311,Hawthorn (Vic.),3122,450.0,143,3.0,4.0,7.0,7.0,4.25,61024,0.383456,3,5,3,5,3,19
1,312,Hawthorn East,3123,490.0,91,3.0,3.0,9.0,7.0,5.42,61518,0.414188,3,4,4,5,3,19
2,286,Glen Iris (Vic.),3123,430.0,81,2.0,4.0,6.0,7.0,4.83,61518,0.363471,2,5,3,5,3,18
3,264,Fitzroy North,3068,550.0,46,3.0,2.0,8.0,7.0,3.355,63406,0.451061,3,3,4,5,3,18
4,609,South Yarra,3141,530.0,211,3.0,2.0,8.0,7.0,3.34,65707,0.419438,3,3,4,5,3,18
9,36,Balaclava (Vic.),3183,537.5,38,3.0,3.5,3.0,7.0,3.295,56561,0.494157,3,4,2,5,3,17
11,618,St Kilda East,3183,490.0,82,2.0,4.5,5.0,7.0,3.185,56561,0.450487,2,5,2,5,3,17
10,688,Watsonia,3088,475.0,9,1.0,2.0,11.0,7.0,7.63,54240,0.455383,1,3,5,5,3,17
8,41,Ballarat North,3350,300.0,6,0.0,1.5,12.5,7.0,17.295,47466,0.328656,1,2,5,5,4,17
7,663,Travancore,3032,405.0,6,3.0,3.0,3.0,7.0,2.295,54220,0.388418,3,4,2,5,3,17
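
The 1 to 5 scoring and the total out of 25 could be sketched as follows. The exact binning rule is not stated, so five equal-width bins over the observed range is an assumption, as is inverting the scale for measures where lower is better (such as the rent-to-income ratio); `score_1_to_5` and the sample values are hypothetical.

```python
def score_1_to_5(values, higher_is_better=True):
    """Map each value to an integer 1-5 using five equal-width bins over the observed range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / 5
    scores = []
    for v in values:
        if width == 0:
            bin_index = 0  # all values identical: everything lands in the same bin
        else:
            bin_index = min(int((v - lo) / width), 4)  # clamp the maximum into the top bin
        score = bin_index + 1
        scores.append(score if higher_is_better else 6 - score)
    return scores

# Hypothetical per-suburb medians
train_counts = [0, 1, 2, 3, 4]
affordability = [0.21, 0.33, 0.45, 0.58, 0.71]

train_scores = score_1_to_5(train_counts)                             # more stations is better
afford_scores = score_1_to_5(affordability, higher_is_better=False)   # lower ratio is better

# Total for one suburb: sum of its five component scores,
# here the train, school, recre, hosp and afford scores from the Hawthorn (Vic.) row.
hawthorn_total = sum([3, 5, 3, 5, 3])
print(train_scores, afford_scores, hawthorn_total)
```

Five components scored 1 to 5 give the maximum of 25; the Hawthorn (Vic.) components sum to its listed `Total_Score` of 19.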


- A limitation of both the livability and affordability measures is that, as the number of properties scraped per suburb (no_prop_scraped) shows, there are not many property samples for each suburb.
- This could lead to inaccuracies and incorrect rankings. However, the alternative datasets available for rent by suburb had poor data quality, had suburb names that did not match the shapefiles provided by the government, and covered significantly fewer suburbs, so we continued with the scraped data.
- In the end, the top-ranked suburbs for livability still had a minimum of around 50 samples.

# Finding the highest growth suburbs

We also thought population growth would predict the highest growth suburbs. Calculating the population increase ratio from 2010 to 2021 gave these results:

- Moorabbin Airport
- Essendon Airport
- Clyde North - South
- Clyde North - North
- Wollert
- Craigieburn - North West
- Cobblebank - Strathtulloh
- Tarneit (West) - Mount Cottrell
- Tarneit - North
- Port Melbourne Industrial
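
A ranking by population increase ratio could be computed roughly as below. This is a sketch with made-up figures: the region names, populations and the `top_growth` helper are hypothetical, and skipping regions with a zero starting population is an assumption made to avoid division by zero.

```python
def top_growth(pop_start, pop_end, n=10):
    """Rank regions by the ratio end population / start population, highest first."""
    ratios = {
        region: pop_end[region] / start
        for region, start in pop_start.items()
        if start > 0 and region in pop_end  # skip zero baselines and unmatched regions
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)[:n]

# Hypothetical figures for illustration only
pop_2010 = {"Region A": 100, "Region B": 2000, "Region C": 0, "Region D": 500}
pop_2021 = {"Region A": 400, "Region B": 2600, "Region C": 50, "Region D": 450}
print(top_growth(pop_2010, pop_2021, n=2))
```

A ratio rewards regions that start from a very small population, so areas with few residents in the base year may rank highly even when the absolute increase is small.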

### Assumptions, Limitations and Difficulties

- We had a hard time finding income by suburb, so we had to assume that income within a postcode was largely the same.
- Although on-premise ORS gives unlimited calculations per day, we were still limited by the processing power of our computers.
- Datasets with incorrect geolocations were quite common. Although we excluded the ones we could find, there were possibly more, which affects the proximity calculations and hence the other analyses.
- We assumed that the majority of the data outside of these errors was trustworthy.