In [2]:
# Necessary modules
import pandas as pd
from IPython.display import Image

## Target Audience

Our two target audiences are:
1) **Real Estate Investors**, looking to maximise returns by targeting properties with high growth potential and affordability.
2) **Potential Renters**, looking to live in affordable homes.

Our findings on regional affordability and on growth in rental prices, income, and population across Victoria support both audiences.

### 0. Data Links
1. Postcodes Victoria, sourced from Postcodes Australia (https://postcodes-australia.com/state-postcodes/vic)
2. Victoria Suburbs Shapefile, sourced from Data.gov.au (https://data.gov.au/data/dataset/af33dd8c-0534-4e18-9245-fc64440f742e/resource/0a9f2827-49e9-4390-9ba4-89329736b16b/download/gda2020.zip)
3. Income and Population by SA2 Districts, sourced from Australian Bureau of Statistics ABS (https://www.abs.gov.au/methodologies/data-region-methodology/2011-23#data-downloads)


# Data and Pre-processing

### **Property dataset**

Collection:
* Extracted from the Domain website using BeautifulSoup, generating URLs from suburb names and postcodes. For each suburb, up to 1,000 pages of listings were scraped by postcode. This yielded 14,459 scraped links, each with 20 features such as price, address, nearby schools, and other property attributes. Because the dataset already included nearby schools, we did not need to source school locations separately.

Pre-processing:
* Removed entries with missing prices.
* Removed the NBN details and Property Features columns, as they were often empty on the website; we focus on features that are frequently populated.
* Removed unit numbers from addresses to ensure accurate location and distance computation.
* Converted prices to floats to allow numerical analysis.
* Calculated the coordinates of each property using Geocoder.
* Removed entries with missing coordinates or located more than 600 km from Melbourne CBD. This eliminates properties located in NSW.
* Removed outliers using a Z-score threshold of sqrt(2 * log(N)), applied when the sample size N > 100.
* Calculated the Euclidean distance from each property to the nearest school. We assumed Euclidean distance is proportional to driving distance, differing by a consistent factor, so it serves as a reasonable approximation.
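The outlier rule above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes `log` means the natural logarithm, that the population standard deviation is used, and that samples of 100 or fewer are left untouched. The function name `remove_outliers` and the sample rents are hypothetical.

```python
import math
import statistics


def remove_outliers(prices):
    """Drop values whose |z-score| exceeds sqrt(2 * ln(N)).

    Assumption: the rule is applied only when N > 100; smaller samples
    are returned unchanged.
    """
    n = len(prices)
    if n <= 100:
        return list(prices)
    mean = statistics.fmean(prices)
    std = statistics.pstdev(prices)  # population standard deviation (assumed)
    if std == 0:
        return list(prices)  # all values identical, nothing to remove
    threshold = math.sqrt(2 * math.log(n))
    return [p for p in prices if abs(p - mean) / std <= threshold]


# Hypothetical weekly rents: 200 typical listings plus one extreme value.
rents = [400.0 + (i % 50) for i in range(200)] + [25000.0]
cleaned = remove_outliers(rents)
print(len(rents), len(cleaned))
```

The threshold grows slowly with N (about 3.26 for N = 201), so larger samples tolerate slightly more extreme values before a listing is dropped.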

### **Supermarket dataset**

Collection:
* Queried supermarket data within Victoria from the Overpass API. This yielded 967 supermarkets with features such as supermarket name, latitude, longitude, address, and more.

Pre-processing:
* Calculated the Euclidean distance from each property to the nearest supermarket. This relies on the same assumption about Euclidean and driving distance as above.

### **Public Transport Victoria (PTV) dataset**

Collection:
* Retrieved shapefiles from PTV Data VIC for transportation stations: train (metro and regional), bus (metro and regional), and trams.

Pre-processing:
* Calculated the Euclidean distance (in metres) from each property to the nearest train, bus, and tram station. This relies on the same assumption about Euclidean and driving distance as above.
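The nearest-amenity distance used for schools, supermarkets, and stations can be sketched as below. The notebook does not state how latitude and longitude were converted to metres, so this sketch assumes a simple equirectangular projection around an approximate Melbourne CBD reference point. The helper names, the reference coordinates, and the sample points are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def to_local_metres(lat, lon, ref_lat, ref_lon):
    """Project (lat, lon) to planar metres around a reference point.

    Assumption: an equirectangular approximation, which is adequate over
    short distances but not necessarily what the notebook used.
    """
    x = math.radians(lon - ref_lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y


def distance_to_nearest(prop, amenities, ref=(-37.8136, 144.9631)):
    """Straight-line distance in metres from a property to its closest amenity."""
    px, py = to_local_metres(*prop, *ref)
    return min(
        math.hypot(px - ax, py - ay)
        for ax, ay in (to_local_metres(lat, lon, *ref) for lat, lon in amenities)
    )


# Hypothetical property and two hypothetical supermarkets as (lat, lon).
home = (-37.8136, 144.9631)
supermarkets = [(-37.8226, 144.9631), (-37.9000, 145.1000)]
nearest_m = distance_to_nearest(home, supermarkets)
print(round(nearest_m))
```

For tens of thousands of properties, a spatial index (for example a k-d tree) would avoid comparing every property with every amenity; the brute-force loop here is only meant to show the calculation.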

**The final rental property dataset, with distances to the closest PTV stations, supermarkets, and schools, is shown below (10,357 rows).**


In [3]:
property_df = pd.read_parquet("../data/raw/property_details_w_distances.parquet")
print("Total entries:", len(property_df))
property_df.head()

Total entries: 10357


Unnamed: 0,title,description,street_address,suburb,postcode,price,bedrooms,bathrooms,parking,primary_property_type,...,floor_plans_count,virtual_tour,nearby_schools,latitude,longitude,distance_to_closest_train,distance_to_closest_bus,distance_to_closest_tram,distance_to_melbourne_cbd,distance_to_closest_supermarket
0,"60 Little Windrock Lane, Craigieburn VIC 3064 ...","View this 2 bedroom, 1 bathroom rental house a...","60 Little Windrock Lane, Craigieburn VIC 3064",Craigieburn,3064,450.0,2.0,1.0,1.0,House,...,0.0,False,589.584078,-37.588897,144.915516,3587.757681,241.893482,17783.424475,31577.387322,491.097565
1,"53 Were Street, Brighton VIC 3186 - House For ...","View this $1,500/week 4 bedroom, 2 bathroom re...","53 Were Street, Brighton VIC 3186",Brighton,3186,1490.0,4.0,2.0,2.0,House,...,2.0,True,279.20479,-37.92564,144.999904,1201.725436,263.466321,2485.958722,16794.359101,1910.765178
2,"43 Tackle Drive, Point Cook VIC 3030 - Townhou...","View this 3 bedroom, 2 bathroom rental townhou...","43 Tackle Drive, Point Cook VIC 3030",Point Cook,3030,550.0,3.0,2.0,2.0,Townhouse/Villa,...,0.0,True,845.354703,-37.906257,144.720254,3890.225343,713.624953,24933.663604,30206.465031,1381.56761
3,"3 Rostrevor Parade, Mont Albert VIC 3127 - Hou...","View this 5 bedroom, 2 bathroom rental house a...","3 Rostrevor Parade, Mont Albert VIC 3127",Mont Albert,3127,800.0,5.0,2.0,2.0,House,...,0.0,False,286.075947,-37.812918,145.10611,917.64228,424.603215,481.283363,15957.735572,1069.53936
4,"48 Roberts Street, Frankston VIC 3199 - Studio...","View this 9 bedroom, 3 bathroom rental studio ...","48 Roberts Street, Frankston VIC 3199",Frankston,3199,299.0,9.0,3.0,4.0,Apartment,...,1.0,False,941.545172,-38.154913,145.140409,420.672031,555.155741,36692.954646,52546.454397,612.948151


### **SA2 District Boundaries (ABS)**

We obtained the 2021 Statistical Area Level 2 (SA2) dataset from the Australian Bureau of Statistics (ABS). The pre-processing steps were:
* Filtered the SA2 data to Victoria only.
* Converted features to appropriate data types, such as strings for SA2 area names and integers for population.

In [4]:
sa2Areas = pd.read_csv('../data/curated/SA2_Victoria.csv')
print("Total entries:", len(sa2Areas))
sa2Areas.head()

Total entries: 524


Unnamed: 0,SA2_CODE_2021,SA2_NAME_2021,STATE_NAME_2021
0,201011001,Alfredton,Victoria
1,201011002,Ballarat,Victoria
2,201011005,Buninyong,Victoria
3,201011006,Delacombe,Victoria
4,201011007,Smythes Creek,Victoria


### **Population and Income by SA2**

The population dataset covers 2018 to 2023, with suburb, year, and estimated resident population as the selected features. The income dataset covers 2016 to 2020, with suburb, year, and median employee income as the selected features. We forecasted population and income by SA2 district for 2024 to 2026.

Both *income and population* follow the *same* pre-processing steps:
* Removed entries not within the Victorian SA2 codes.
* Removed missing features for population and rows with missing values for income.
* Removed duplicate rows based on 'Label' and 'Year'.

Modelling growth rates:
We compared three models (Linear Regression, Random Forest, and Gradient Boosting) for predicting growth rates and chose the one with the lowest RMSE. We trained on data from 2019 to 2022 and tested on 2023. Linear Regression performed best with an RMSE of 198, compared to 2182 for Random Forest and 2189 for Gradient Boosting.
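The train, test, and forecast procedure can be illustrated with a minimal sketch. It fits a straight-line trend to one hypothetical suburb, scores it on a held-out 2023 value with RMSE, and extrapolates to 2024-2026. The per-suburb trend line, the helper names, and the values are assumptions for illustration; the notebook's actual features and fitting code are not shown here.

```python
import math


def fit_line(years, values):
    """Ordinary least squares fit of value = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical population series for one suburb.
train_years, train_values = [2019, 2020, 2021, 2022], [1000, 1100, 1200, 1300]
test_years, test_values = [2023], [1390]

slope, intercept = fit_line(train_years, train_values)
test_pred = [intercept + slope * y for y in test_years]
test_rmse = rmse(test_values, test_pred)

# Extrapolate the fitted trend to the forecast years.
forecast = {y: intercept + slope * y for y in (2024, 2025, 2026)}
print(test_rmse, forecast)
```

The same `rmse` function would be applied to each candidate model's 2023 predictions, and the model with the smallest value kept.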

Therefore, Linear Regression was selected to predict population and income growth for 2024-2026, as shown below.

In [5]:
income_by_sa2 = pd.read_csv("../data/curated/income_by_sa2.csv")
print("Total entries:", len(income_by_sa2))
income_by_sa2.head()

Total entries: 399


Unnamed: 0,Label,median_income_2016,median_income_2017,median_income_2018,median_income_2019,median_income_2020,median_income_2021,median_income_2022,median_income_2023,median_income_2024,median_income_2025,median_income_2026
0,Abbotsford,56001,56242,59923,61918,66091,69104.795585,71219.795864,74636.644356,77788.073345,80427.684311,83781.39143
1,Airport West,54708,56619,58350,60992,62920,65268.165154,68154.90325,70695.851426,73394.212934,76451.213723,79397.004301
2,Albert Park,64048,62922,64900,65880,68836,71758.449466,73918.640871,77000.919533,80239.921519,83055.592588,86331.495226
3,Alexandra,38251,39811,41230,42447,44033,46010.900933,47833.306333,49778.505143,51958.291329,54092.617842,56294.447393
4,Alfredton,49071,50119,52709,53765,55019,57905.793758,60042.535187,62020.613758,64882.959903,67482.620117,69928.862355


In [6]:
population_by_sa2 = pd.read_csv('../data/curated/population_by_sa2.csv')
print("Total entries:", len(population_by_sa2))
population_by_sa2.head()

Total entries: 517


Unnamed: 0,Label,estimated_population_2018,estimated_population_2019,estimated_population_2020,estimated_population_2021,estimated_population_2022,estimated_population_2023,estimated_population_2024,estimated_population_2025,estimated_population_2026
0,Abbotsford,9527,9594,9672,9258,9513,10008,10527.676824,10914.578045,11189.069984
1,Airport West,8169,8390,8362,8240,8295,8464,8683.181491,8896.540332,9093.092637
2,Albert Park,16728,17081,16955,16011,16177,16861,17665.062455,18280.665009,18706.847174
3,Alexandra,6646,6687,6690,6771,6794,6836,6915.938566,7038.105197,7186.668076
4,Alfredton,13537,14434,15507,16841,18002,18997,19771.949269,20395.671019,20940.98275


# Questions and Analysis

## 1. What are the most important internal and external features in predicting rental prices?

Two graphs, a Random Forest feature-importance plot and a correlation heatmap of property features, provide insight into the internal and external factors that predict rental prices in Victoria, Australia.

**Internal Features:**
1. **Bathrooms**
* Point: bathrooms have the highest importance in predicting rental prices.
* Evidence: feature importance (0.175), positive correlation (0.46).
* Explain: more bathrooms mean more privacy, especially in shared or family homes, which raises demand and therefore rental price.
* Link: 
    * Investors can target properties with multiple bathrooms to attract families and groups of friends, leading to higher rent.
    * Renters should be aware that more bathrooms mean higher rental costs, so they face a trade-off between privacy and affordability.
2. **Bedrooms**
* Point: bedrooms have the second-highest importance among internal features in predicting rental prices.
* Evidence: feature importance (0.12), positive correlation (0.31).
* Explain: larger homes attract families and groups and need more land area, which raises demand and price.
* Link: 
    * Investors can target properties with more bedrooms in areas with rising demand to maximise returns.
    * Renters should understand that more bedrooms mean higher rental costs, but such homes are ideal for families or shared living.

**External Features:**
1. **Distance to closest tram stop** (highest external importance)
* Point: proximity to tram stops influences rental price.
* Evidence: feature importance (0.125), negative correlation (-0.02). Rental price tends to increase as the distance to the nearest tram stop decreases, although the near-zero correlation indicates that the linear relationship is very weak.
* Explain: convenient access to public transport raises demand among renters who do not drive.
* Link: 
    * Investors can target properties close to tram stops, which are usually near the CBD.
    * Renters face a trade-off between a convenient tram commute to work and higher rents.
2. **Distance to Melbourne CBD**
* Point: proximity to the city centre is important in predicting rental prices.
* Evidence: feature importance (0.12), negative correlation (-0.2). As the distance to the CBD decreases, the rental price increases.
* Explain: employment opportunities, convenient amenities, more universities, and city life.
* Link: 
    * Investors can target properties near the CBD for the highest growth potential.
    * Renters can save by living further from the CBD, where rents are lower.

Conclusion: 
* Investors should invest in properties near the CBD with many tram stops nearby and a large number of bathrooms or bedrooms. These typically suit student groups, who prefer apartments or shared houses in the city centre.
* Renters looking for affordable rent should consider suburban areas with limited tram access and be open to sharing bathrooms or bedrooms with roommates.

In [7]:
Image(url="../plots/feature_importance.png", height=400)

In [8]:
Image(url="../plots/correlation_heatmap.png", width=700, height=500)

## 2. What are the top 10 suburbs with the highest predicted growth rate?

We used LASSO regression to predict future median rental prices for 2024-2026, with population and income as the selected features.

We used historical growth trends in population and income, assuming that these features will continue to predict future rental growth.

From Table 1, the top 10 suburbs with highest predicted growth rate are:
1. Albert Park
2. Sunshine
3. Broadmeadows
4. Swan Hill
5. Fawkner
6. Elwood
7. Hurstbridge
8. Dandenong North
9. Thomastown
10. Buninyong

For real estate investors seeking strong and consistent price growth:
* Target suburbs like Albert Park, Sunshine, and Broadmeadows.
* Albert Park has the largest growth (22.56%), with a projected median rent of $668 by 2026. Sunshine and Broadmeadows (22.42% and 21.56% growth respectively) offer high median rents of around $453–$480.
* The average projected growth is 9.16% over 2023-2026.
* This may be due to population and income growth driving urbanisation and spending in Melbourne's outer suburbs, which increases demand and rental prices.
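As a rough sketch of how such growth figures can be computed and ranked, the snippet below assumes growth is measured from the 2023 median rent to the predicted 2026 median rent, which the notebook does not state explicitly for Table 1. The suburb names and rents in `rents` are hypothetical; only the Albert Park figures ($668, 22.56%) come from the text above.

```python
def percentage_growth(start_rent, end_rent):
    """Percentage change from start_rent to end_rent."""
    return (end_rent - start_rent) / start_rent * 100


# Hypothetical median weekly rents: (2023 value, predicted 2026 value).
rents = {
    "Suburb A": (500.0, 600.0),
    "Suburb B": (400.0, 440.0),
    "Suburb C": (300.0, 375.0),
}

growth = {suburb: percentage_growth(a, b) for suburb, (a, b) in rents.items()}
ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
print(ranked)

# Under the same assumption, Albert Park's quoted figures imply a
# 2023 base rent of roughly 668 / 1.2256.
implied_base = 668 / (1 + 22.56 / 100)
print(round(implied_base))
```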

From Figure 2:
* Median incomes are higher in Albert Park, Elwood, and Camberwell, at around $80,000.
* Wealthier, high-income areas attract demand and are more desirable for buyers, which leads to higher investment potential.

In [21]:
# Table 1 - top 10 suburbs with highest predicted growth
data = pd.read_csv("../data/curated/suburb_level_results.csv")
data

Unnamed: 0,Label,Percentage Growth
0,Albert Park,22.563038
1,Sunshine,22.42013
2,Broadmeadows,21.563606
3,Swan Hill,19.766879
4,Fawkner,19.431414
5,Elwood,18.827259
6,Hurstbridge,17.629904
7,Dandenong North,17.100228
8,Thomastown,16.633983
9,Buninyong,16.53809


In [26]:
# Figure 2 - Income of the highest growth suburbs
Image(url="../plots/average_income_per_suburb_1.png")

In [11]:
# Figure 3 - Average predicted rent per week projections
Image(url='../plots/average_median_price_projected.png', width=600, height=400)

## 3. What are the most liveable and affordable suburbs according to your chosen metrics?

##### **Liveability**

The Liveability Score is weighted and justified as follows:
1. Average Property Price: 50% weight, since affordability was assumed to be the top priority for renters seeking liveability.
2. Distance to the Closest Supermarket: 16.6% weight, since convenient access to daily necessities improves quality of life.
3. Average Distance to Public Transport (PTV): 16.6% weight, because convenient access to public transport reduces commuting stress and so improves quality of life.
4. Distance to the Closest School: 16.6% weight, as proximity to schools is important for families with children. This factor has the same weight as transport and supermarket proximity.

A lower liveability score is better, because every component (price and distances) is negatively related to living conditions. For example, a shorter distance to a supermarket means a property is more liveable.
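The weighting can be sketched as a weighted sum of rescaled components. The notebook does not say how price and distances were put on a common scale, so this sketch assumes min-max scaling to [0, 1] and treats each 16.6% weight as one sixth. The metric keys, function names, and suburb values are hypothetical.

```python
# Weights from the text; 16.6% is treated as 1/6 (assumption).
WEIGHTS = {"price": 0.5, "supermarket": 1 / 6, "transport": 1 / 6, "school": 1 / 6}


def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def liveability_scores(suburbs):
    """Return {suburb: score}, where a lower score means more liveable."""
    names = list(suburbs)
    scores = dict.fromkeys(names, 0.0)
    for metric, weight in WEIGHTS.items():
        scaled = min_max([suburbs[name][metric] for name in names])
        for name, value in zip(names, scaled):
            scores[name] += weight * value
    return scores


# Hypothetical suburbs: weekly rent in dollars, distances in metres.
suburbs = {
    "Suburb A": {"price": 400, "supermarket": 500, "transport": 300, "school": 400},
    "Suburb B": {"price": 600, "supermarket": 1500, "transport": 1300, "school": 1400},
    "Suburb C": {"price": 500, "supermarket": 1000, "transport": 800, "school": 900},
}
scores = liveability_scores(suburbs)
best = min(scores, key=scores.get)
print(scores, best)
```

Without some rescaling step, a distance measured in thousands of metres would dominate a rent measured in hundreds of dollars regardless of the weights, which is why a common scale is assumed here.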

Our results are described below:
* Top 3 liveable suburbs: West Melbourne, Seddon, and Footscray, from the bar plot of the top 20 liveable suburbs.
* From the heatmap, liveable suburbs are highly concentrated around Melbourne CBD and its nearby suburbs.
* Explain: suburbs near the CBD balance affordability with convenient access to necessities, commuting options, and education. West Melbourne, Seddon, and Footscray are a 20-minute drive from the CBD, which is a manageable commute, and their tram and service infrastructure is still well developed, which contributes to liveability.
* Link: 
    * Investors can invest in suburbs that are cheaper than the CBD, with more stable rental demand due to the balance of affordability and convenience.
    * Renters can live affordably without sacrificing convenience or paying CBD rents.

In [12]:
Image(url='../plots/top_20_suburbs_livability.png', width=600, height=400)

In [13]:
Image(url='../plots/property_livability_score_heatmap.png', width=600, height=400)

##### **Affordability**

Outer suburbs and regional areas offer cheaper rent and therefore more affordable living options for renters.
Here, we keep only suburbs with more than 5 properties, then find the top 20 most affordable suburbs.
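A minimal pandas sketch of this filter-then-rank step is shown below. It assumes the `suburb` and `price` columns seen in `property_df`; the function name `most_affordable`, its parameters, and the sample listings are hypothetical.

```python
import pandas as pd


def most_affordable(df, min_properties=5, top_n=20):
    """Mean rent per suburb, keeping suburbs with more than min_properties listings."""
    stats = df.groupby("suburb")["price"].agg(mean_rent="mean", listings="count")
    eligible = stats[stats["listings"] > min_properties]
    return eligible.sort_values("mean_rent").head(top_n)


# Hypothetical listings: Z is cheapest but has too few properties to qualify.
listings = pd.DataFrame(
    {
        "suburb": ["X"] * 6 + ["Y"] * 6 + ["Z"] * 2,
        "price": [300.0] * 6 + [500.0] * 6 + [100.0] * 2,
    }
)
result = most_affordable(listings)
print(result)
```

The minimum-listing filter keeps a single unusually cheap property from placing a suburb at the top of the ranking.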

Evidence:
* The bar plot shows that Byaduk North and Hamilton (both 3.5 hours from the CBD) have the lowest average rents, averaging $325 per week.
* The bar plot also shows Golden Point (1.5 hours from the CBD) among the 3 lowest average rents, averaging $325 per week.
* The heatmap shows that affordable rental areas are highly concentrated outside Melbourne CBD, in more rural areas.

Explanation:
* Byaduk North, Hamilton, and Golden Point are rural areas far from the city centre, which is consistent with them being the most affordable.
* There is less competition in these areas than in the city centre, so they are more affordable and attract families looking for larger, cheaper homes for their children.
* There is potential for long-term returns, since families may prefer cheaper rent with more space.

Link: 
* Investors: invest in Byaduk North, Hamilton, and Golden Point, the cheapest suburbs, where low prices support more stable rental demand.
* Renters: those prioritising a family-friendly, budget-friendly home can rent in Byaduk North and Hamilton for larger housing.

In [14]:
Image(url='../plots/bottom_20_suburbs_property_price.png', height=500)

In [15]:
Image(url='../plots/reverse_property_price_heatmap.png', width=700, height=400)