In [2]:
# Necessary modules
import pandas as pd
from IPython.display import Image

### 0. Data Links
1. Postcodes Victoria, sourced from Postcodes Australia (https://postcodes-australia.com/state-postcodes/vic)
2. Victoria Suburbs Shapefile, sourced from Data.gov.au (https://data.gov.au/data/dataset/af33dd8c-0534-4e18-9245-fc64440f742e/resource/0a9f2827-49e9-4390-9ba4-89329736b16b/download/gda2020.zip)
3. Income and Population by SA2 Districts, sourced from Australian Bureau of Statistics ABS (https://www.abs.gov.au/methodologies/data-region-methodology/2011-23#data-downloads)


# Data and Pre-processing

### **Property dataset**

Collection:
* Extracted from the Domain website using BeautifulSoup, with URLs generated from suburb names and postcodes. For each suburb, a maximum of 1000 pages of properties were scraped by postcode. In total, 14,459 links were scraped, each with 20 features such as price, address, nearby schools, and other property attributes. Note that this dataset already included nearby schools, so we did not have to source school locations separately.

Pre-processing:
* Removed entries with missing prices.
* Removed the NBN details and Property Features columns because they were often empty on the website, so the analysis focuses on features that are consistently available.
* Removed unit numbers from addresses to ensure accurate geocoding and distance computation.
* Converted prices to floats to allow numerical analysis.
* Calculated the coordinates of each property using Geocoder.
* Removed entries with missing coordinates or located more than 600 km from Melbourne CBD. This eliminates properties located in NSW.
* Calculated the Euclidean distance from each property to the nearest school. We assumed that Euclidean distance is roughly proportional to driving distance, differing by a consistent factor, so it serves as a reasonable approximation.
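
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the helper names (`parse_price`, `strip_unit_number`, `haversine_km`, `within_cbd_radius`), the regular expressions, and the CBD coordinates are all assumptions, and great-circle (haversine) distance is used here for the 600 km cut-off.

```python
import math
import re

# Assumed approximate coordinates of Melbourne CBD (not taken from the dataset).
MELBOURNE_CBD = (-37.8136, 144.9631)


def parse_price(raw):
    """Return the first dollar amount in a price string as a float, or None."""
    match = re.search(r"\$\s*(\d[\d,]*(?:\.\d+)?)", raw or "")
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def strip_unit_number(address):
    """Drop a leading unit prefix such as '3/' or 'Unit 3, ' from an address."""
    pattern = r"^\s*(?:unit\s+\w+\s*[,/]?\s*|\w+\s*/\s*)"
    return re.sub(pattern, "", address, flags=re.IGNORECASE)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points in km."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def within_cbd_radius(lat, lon, max_km=600.0):
    """True if the point has coordinates and lies within max_km of the CBD."""
    if lat is None or lon is None:
        return False
    return haversine_km(lat, lon, *MELBOURNE_CBD) <= max_km
```

For example, `parse_price("$1,500/week")` yields `1500.0`, and a property geocoded to central Sydney fails `within_cbd_radius` and would be dropped.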

### **Supermarket dataset**

Collection:
* Queried supermarket data within Victoria from the Overpass API. This returned 967 supermarkets with features such as name, latitude, longitude, and address.
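
A query along the following lines could retrieve such data. This is a sketch under assumptions: the exact query used in the project is not given, so the `shop=supermarket` tag filter, the `ISO3166-2=AU-VIC` area selector, the public endpoint URL, and all function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumed public Overpass endpoint; the project may have used a different one.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_supermarket_query(state_code="AU-VIC", timeout_s=180):
    """Build an Overpass QL query for supermarkets inside one state area."""
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        f'area["ISO3166-2"="{state_code}"]->.state;\n'
        'nwr["shop"="supermarket"](area.state);\n'
        "out center;"
    )


def fetch_supermarkets(query):
    """POST the query to the endpoint and return the list of raw elements."""
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(OVERPASS_URL, data=data, timeout=300) as response:
        return json.load(response)["elements"]


def element_to_row(element):
    """Flatten one element; ways and relations carry coordinates in 'center'."""
    point = element if "lat" in element else element.get("center", {})
    return {
        "name": element.get("tags", {}).get("name"),
        "latitude": point.get("lat"),
        "longitude": point.get("lon"),
    }
```

Calling `fetch_supermarkets(build_supermarket_query())` and mapping `element_to_row` over the result would give one row per supermarket, ready to load into a DataFrame.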

Pre-processing:
* Calculated the Euclidean distance from each property to the nearest supermarket, under the same Euclidean-versus-driving-distance assumption as above.
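
The nearest-facility distance can be sketched as below. The projection is an assumption: the source does not say how coordinates were converted to meters, so this example uses a local equirectangular approximation around a reference latitude, with hypothetical function names and a brute-force search.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def to_local_xy(lat, lon, ref_lat=-37.8136):
    """Project latitude/longitude to approximate planar meters near ref_lat."""
    x = EARTH_RADIUS_M * math.radians(lon) * math.cos(math.radians(ref_lat))
    y = EARTH_RADIUS_M * math.radians(lat)
    return x, y


def nearest_distance_m(point, facilities):
    """Euclidean distance in meters from point to the closest facility.

    point is a (lat, lon) tuple; facilities is an iterable of (lat, lon) tuples.
    """
    px, py = to_local_xy(*point)
    projected = (to_local_xy(lat, lon) for lat, lon in facilities)
    return min(math.hypot(px - fx, py - fy) for fx, fy in projected)
```

With roughly 14,000 properties and under 1,000 supermarkets a brute-force scan is workable; for larger facility sets a spatial index such as a KD-tree would be the usual choice.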

### **Public Transport Victoria (PTV) dataset**

Collection:
* Retrieved shapefiles from PTV Data VIC for public transport stations: train (metro and regional), bus (metro and regional), and tram.

Pre-processing:
* Calculated the Euclidean distance (in meters) from each property to the nearest train, bus, and tram station, under the same Euclidean-versus-driving-distance assumption as above.

**Thus, the final properties with distances to the closest PTV stations, supermarkets, and schools are shown below:**


In [7]:
property_df = pd.read_parquet("../data/raw/property_details_w_distances.parquet")
property_df.head()

Unnamed: 0,title,description,street_address,suburb,postcode,price,bedrooms,bathrooms,parking,primary_property_type,...,floor_plans_count,virtual_tour,nearby_schools,latitude,longitude,distance_to_closest_train,distance_to_closest_bus,distance_to_closest_tram,distance_to_melbourne_cbd,distance_to_closest_supermarket
0,"60 Little Windrock Lane, Craigieburn VIC 3064 ...","View this 2 bedroom, 1 bathroom rental house a...","60 Little Windrock Lane, Craigieburn VIC 3064",Craigieburn,3064,450.0,2.0,1.0,1.0,House,...,0.0,False,589.584078,-37.588897,144.915516,3587.757681,241.893482,17783.424475,31577.387322,491.097565
1,"53 Were Street, Brighton VIC 3186 - House For ...","View this $1,500/week 4 bedroom, 2 bathroom re...","53 Were Street, Brighton VIC 3186",Brighton,3186,1490.0,4.0,2.0,2.0,House,...,2.0,True,279.20479,-37.92564,144.999904,1201.725436,263.466321,2485.958722,16794.359101,1910.765178
2,"43 Tackle Drive, Point Cook VIC 3030 - Townhou...","View this 3 bedroom, 2 bathroom rental townhou...","43 Tackle Drive, Point Cook VIC 3030",Point Cook,3030,550.0,3.0,2.0,2.0,Townhouse/Villa,...,0.0,True,845.354703,-37.906257,144.720254,3890.225343,713.624953,24933.663604,30206.465031,1381.56761
3,"3 Rostrevor Parade, Mont Albert VIC 3127 - Hou...","View this 5 bedroom, 2 bathroom rental house a...","3 Rostrevor Parade, Mont Albert VIC 3127",Mont Albert,3127,800.0,5.0,2.0,2.0,House,...,0.0,False,286.075947,-37.812918,145.10611,917.64228,424.603215,481.283363,15957.735572,1069.53936
4,"48 Roberts Street, Frankston VIC 3199 - Studio...","View this 9 bedroom, 3 bathroom rental studio ...","48 Roberts Street, Frankston VIC 3199",Frankston,3199,299.0,9.0,3.0,4.0,Apartment,...,1.0,False,941.545172,-38.154913,145.140409,420.672031,555.155741,36692.954646,52546.454397,612.948151


### **SA2 District Boundaries (ABS)**

We obtained the 2021 Statistical Area Level 2 (SA2) boundary dataset from the Australian Bureau of Statistics (ABS). The pre-processing steps were:
* Filtered the SA2 data to contain only Victoria.
* Converted features to the relevant data types, such as strings for SA2 area names and integers for population.

In [10]:
sa2Areas = pd.read_csv('../data/curated/SA2_Victoria.csv')
sa2Areas

Unnamed: 0,SA2_CODE_2021,SA2_NAME_2021,STATE_NAME_2021
0,201011001,Alfredton,Victoria
1,201011002,Ballarat,Victoria
2,201011005,Buninyong,Victoria
3,201011006,Delacombe,Victoria
4,201011007,Smythes Creek,Victoria
...,...,...,...
519,217041478,Moyne - West,Victoria
520,217041479,Warrnambool - North,Victoria
521,217041480,Warrnambool - South,Victoria
522,297979799,Migratory - Offshore - Shipping (Vic.),Victoria


### **Population and Income by SA2**

The Population dataset covers 2018 to 2023, with suburb, year, and estimated resident population as the selected features. The Income dataset covers 2016 to 2020, with suburb, year, and median employee income as the selected features. We forecasted population and income by SA2 district up to 2026 (population from 2024, income from 2021, as reflected in the tables below).

Both *income and population* follow the *same* pre-processing steps:
* Removed entries outside the Victorian SA2 codes.
* Removed features with missing values for population, and rows with missing values for income.
* Removed duplicate rows based on 'Label' and 'Year'.

Modelling growth rates:
We compared three models (Linear Regression, Random Forest, and Gradient Boosting) for predicting growth rates and chose the one with the lowest RMSE. The models were trained on data from 2019 to 2022 and tested on 2023. Linear Regression performed best, with an RMSE of 198, compared to 2182 for Random Forest and 2189 for Gradient Boosting.
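
The evaluation and extrapolation procedure can be illustrated with the linear model alone. This is a sketch, not the notebook's code: it fits a straight line to one suburb's yearly values by ordinary least squares, scores the held-out year with RMSE, and extrapolates to 2024 to 2026. The function names are hypothetical, refitting on all observed years before forecasting is an assumption, and the sample series is the Abbotsford population row from the population table.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate a fitted (slope, intercept) pair at x."""
    slope, intercept = model
    return slope * x + intercept


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Abbotsford estimated resident population, 2018 to 2023.
series = {2018: 9527, 2019: 9594, 2020: 9672, 2021: 9258, 2022: 9513, 2023: 10008}

train_years = [2019, 2020, 2021, 2022]
test_years = [2023]

# Fit on the training years and score the held-out year.
model = fit_line(train_years, [series[y] for y in train_years])
holdout_rmse = rmse(
    [series[y] for y in test_years],
    [predict(model, y) for y in test_years],
)

# Assumption: refit on every observed year before extrapolating.
final_model = fit_line(list(series), list(series.values()))
forecast = {year: predict(final_model, year) for year in (2024, 2025, 2026)}
```

With several candidate models, the same `rmse` call would be repeated per model and the lowest score kept. The numbers produced here will not reproduce the tables, because the notebook's exact features and fitting setup are not shown.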

Therefore, Linear Regression was selected to predict population and income growth for 2024-2026, as shown below.

In [19]:
income_by_sa2 = pd.read_csv("../data/curated/income_by_sa2.csv")
income_by_sa2

Unnamed: 0,Label,median_income_2016,median_income_2017,median_income_2018,median_income_2019,median_income_2020,median_income_2021,median_income_2022,median_income_2023,median_income_2024,median_income_2025,median_income_2026
0,Abbotsford,56001,56242,59923,61918,66091,69104.795585,71219.795864,74636.644356,77788.073345,80427.684311,83781.391430
1,Airport West,54708,56619,58350,60992,62920,65268.165154,68154.903250,70695.851426,73394.212934,76451.213723,79397.004301
2,Albert Park,64048,62922,64900,65880,68836,71758.449466,73918.640871,77000.919533,80239.921519,83055.592588,86331.495226
3,Alexandra,38251,39811,41230,42447,44033,46010.900933,47833.306333,49778.505143,51958.291329,54092.617842,56294.447393
4,Alfredton,49071,50119,52709,53765,55019,57905.793758,60042.535187,62020.613758,64882.959903,67482.620117,69928.862355
...,...,...,...,...,...,...,...,...,...,...,...,...
394,Yarram,40266,39960,42650,43228,44904,47529.419325,49053.429695,50953.269734,53495.237847,55533.119845,57691.004992
395,Yarraville,61906,63996,66465,69379,72428,75150.947880,78146.977115,81342.059481,84431.867671,87696.030832,91150.354200
396,Yarrawonga,40234,40400,41809,43914,45708,47484.605261,49661.654459,51747.588040,53812.175594,56123.344343,58452.379863
397,Yarriambiack,35901,37036,39000,42076,43823,45495.709284,47986.025739,49976.076330,51873.803656,54280.164801,56537.349739


In [26]:
population_by_sa2 = pd.read_csv('../data/curated/population_by_sa2.csv')
population_by_sa2

Unnamed: 0,Label,estimated_population_2018,estimated_population_2019,estimated_population_2020,estimated_population_2021,estimated_population_2022,estimated_population_2023,estimated_population_2024,estimated_population_2025,estimated_population_2026
0,Abbotsford,9527,9594,9672,9258,9513,10008,10527.676824,10914.578045,11189.069984
1,Airport West,8169,8390,8362,8240,8295,8464,8683.181491,8896.540332,9093.092637
2,Albert Park,16728,17081,16955,16011,16177,16861,17665.062455,18280.665009,18706.847174
3,Alexandra,6646,6687,6690,6771,6794,6836,6915.938566,7038.105197,7186.668076
4,Alfredton,13537,14434,15507,16841,18002,18997,19771.949269,20395.671019,20940.982750
...,...,...,...,...,...,...,...,...,...,...
512,Yarram,5437,5474,5545,5555,5588,5580,5594.979256,5653.608883,5754.511887
513,Yarraville,15991,16092,16068,15651,15661,16020,16523.217961,16987.482063,17369.407751
514,Yarrawonga,8297,8418,8508,8593,8727,8812,8901.992448,9023.455120,9184.454667
515,Yarriambiack,6639,6617,6583,6453,6376,6327,6344.344429,6418.401396,6530.575937


# Questions and Analysis

#### **1. What are the most important internal and external features in predicting rental prices?**

Two graphs, the Random Forest feature importance plot and a correlation heatmap of property features, provide insight into the internal and external factors that predict rental prices in Victoria, Australia.

**Internal Features:**
1. Highest: Bedrooms
* The feature importance graph shows the highest importance (~0.25). Reason: more bedrooms suggest larger homes and greater demand for land, and thus higher rent. This may also reflect a large share of families or groups of friends renting together in Victoria.
* The correlation heatmap shows the highest positive correlation with price (0.11), which is consistent with properties with more bedrooms commanding higher rent.
* This matches general property market expectations.
2. Lowest: Bathrooms
* Lowest importance among internal features (~0.20). Reason: many properties have a similar number of bathrooms, or families and roommates may not mind sharing bathrooms as long as they have private sleeping areas.
* Second-highest correlation with price (0.09), suggesting that more bathrooms are associated with higher rent. Reason: additional bathrooms increase privacy, which raises demand among roommates and thus the price.

**External Features:**
1. Highest: Distance to supermarkets
* Highest importance among external features (~0.09). Reason: convenient access to food and other resources improves quality of life, which increases rental demand.
* Unexpected, though very weak, positive correlation (0.015), suggesting that rental price increases slightly as distance to the nearest supermarket increases. Reason: suburban areas may offer more space and quiet than commercial areas, which increases rental desirability.
2. Lowest: Distance to schools
* Lowest importance (~0.05). Reason: families may prefer to rent rather than buy, and may consider a school's reputation instead of its distance from home. Rental properties may also appeal to individuals with no children, such as university students or career-focused professionals.
* The low correlation (0.057) supports this.
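
The gap between a feature's importance and its correlation with price is not contradictory: Pearson correlation only captures linear association, whereas a tree-based importance score can also reflect non-linear effects and interactions. The toy example below uses synthetic numbers, not project data, and `pearson` is a hypothetical helper. It shows a feature that fully determines the target yet has zero correlation with it.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [-3, -2, -1, 0, 1, 2, 3]
linear_target = [2 * v + 1 for v in x]   # straight-line dependence on x
curved_target = [v ** 2 for v in x]      # symmetric, non-linear dependence on x

r_linear = pearson(x, linear_target)   # perfect linear relationship
r_curved = pearson(x, curved_target)   # dependence that correlation misses
```

Here `r_linear` is 1 while `r_curved` is 0, even though `curved_target` is computed entirely from `x`.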

In [18]:
Image(url="../plots/feature_importance.png", height=600)

In [41]:
Image(url="../plots/correlation_property_features.png", width=700, height=500)

#### **2. What are the top 10 suburbs with the highest predicted growth rate?**

#### **3. What are the most liveable and affordable suburbs according to your chosen metrics?**

**Affordability**

1. Barplot of Bottom 20 Suburbs by Average Property Price:
* The average property price in the 20 most affordable suburbs ranges from approximately $25 to $200.
* Nyora is the most affordable suburb with an average price below $50, followed by Newmerella at around $75, and Kialla, Cannum, and others in the $150 to $200 range.
* This suggests that these suburbs are highly affordable compared to Melbourne CBD areas. They may attract families looking for lower-cost living options outside Victoria’s central cities.

2. Heatmap of Cheapest Suburbs:
* Lighter red to white regions indicate lower property prices.
* Southern and eastern Victoria are the most affordable (white areas). This aligns with Nyora, the cheapest suburb above, which is located southeast of Melbourne CBD.
* These areas are ideal for those prioritizing cost over access to city amenities.
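
The ranking behind the bar plot amounts to a group-by-mean followed by a sort. The sketch below assumes listings are available as `(suburb, price)` pairs, mirroring the `suburb` and `price` columns of the property table; the function name and the sample values are illustrative only.

```python
from collections import defaultdict


def cheapest_suburbs(listings, n=20):
    """Return the n suburbs with the lowest average price, cheapest first."""
    totals = defaultdict(lambda: [0.0, 0])  # suburb -> [price sum, listing count]
    for suburb, price in listings:
        totals[suburb][0] += price
        totals[suburb][1] += 1
    averages = {s: total / count for s, (total, count) in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1])[:n]


# Illustrative listings, not project data.
sample = [("A", 100), ("A", 200), ("B", 50), ("C", 400)]
bottom_two = cheapest_suburbs(sample, n=2)
```

Suburbs with very few listings can dominate such a ranking, so it is worth checking listing counts alongside the averages before interpreting the cheapest entries.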


In [39]:
Image(url='../plots/bottom_20_suburbs_property_price.png', height=500)

In [35]:
Image(url='../plots/reverse_property_price_heatmap.png', width=800, height=500)