In [6]:
import pandas as pd
import geopandas as gpd


# Introduction
### Background
The Reserve Bank of Australia (RBA) has been raising interest rates since 2021, which has significantly affected the real estate market, and house prices have been falling since then. Many property investors are deciding whether to sell to cut their losses or to lease their property and wait for a possible market recovery. Renters, on the other hand, may consider buying a property and settling down instead of continuing to rent. In both situations, rental prices matter because they represent a substantial opportunity cost.

### Target and Methodology
This project aims to provide an overview of the rental market and to predict rental prices. We rely mainly on machine learning methods because of their predictive accuracy and their suitability for trend analysis.

# ADD MODEL DESCRIPTIONS | OVERVIEW HERE


### Value
The results may help renters identify properties that meet their needs. They may also help investors estimate rental revenue or opportunity costs.

# Dataset information

## Methodology
We collected data by direct download, manual request, web crawling, and API requests.

## Dataset detail
The datasets we used are summarised below:

### Time Related
| Name | Duration | Time unit | Area unit | Extra note |
| --- | --- | --- | --- | --- |
| ERP | 2010 - 2021 | Year | SA2 | Estimated Resident Population |
| Residential Property Price Index | Q1/2021 - Q4/2021 | Quarter | AUS | Represents house prices |
| 3-year bond | 07/2013 - 08/2022 | Month | AUS | Close to the risk-free rate |
| Median Rent | Q2/1999 - Q1/2021 | Quarter | SA2 | Median rent and deal count of each LGA district |
| Exchange Rate | 03/2010 - 06/2022 | Month | AUS | Exchange rate from AUD to USD |
| Immigration data | 2004 - 2019 | Year | Victoria | Victorian immigration data |
| Debt income ratio | 2009 - 2019 | 2 years | AUS | Measured every two years |


### District / Area Related
| Name | Unit | Extra note |
| --- | --- | --- |
| School location | Longitude / Latitude | Longitude and latitude of each school |
| ERP | SA2 | Estimated Resident Population |
| Median Household Income | SA2 | - |
| Median Rent | SA2 | Median rent and deal count of each SA2 district |
| Distance to CBD | SA2 | Distance from the centroid of each SA2 district to the CBD |
| PTV Station | SA2 | Number of stations (train, bus, coach, ...) |

### Property Related
| Name | Extra note |
| --- | --- |
| Position | Latitude and Longitude of Property |
| Number of Rooms | room types are bedroom, bathroom, and parking spot |
| Distance to school | Distance to Nearest Schools measured in meters |
| Distance to Station | Distance to Nearest Train Station OR CBD measured in meters |


## Time related

For time-related data, we use the SA2 code as part of the index.

#### Assumptions:
- **Constant increase/decrease rate** throughout the year. This applies to the ERP, immigration, and debt-income ratio data.

<img src="../plots/Constant Growth.svg" alt="Constant Growth Rate" style="width: 800px;" title="Constant Growth Rate" />
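
One way to read the constant-rate assumption is as linear interpolation: the change between two consecutive observations is spread evenly over the quarters in between. The sketch below follows that reading. The function name `interpolate_quarterly`, the choice to place each observation in quarter 1, and all numbers are assumptions made for illustration, not the project's actual code or data.

```python
def interpolate_quarterly(annual):
    """Spread yearly (or multi-year) observations over quarters.

    Assumes each observation falls in quarter 1 of its year and that the
    value changes by the same amount every quarter until the next one.
    """
    years = sorted(annual)
    quarterly = {}
    for start, end in zip(years, years[1:]):
        n_quarters = 4 * (end - start)
        step = (annual[end] - annual[start]) / n_quarters
        for i in range(n_quarters):
            year = start + i // 4
            quarter = i % 4 + 1
            quarterly[(year, quarter)] = annual[start] + step * i
    # The last observation has no later value to interpolate towards
    quarterly[(years[-1], 1)] = annual[years[-1]]
    return quarterly


# Illustrative yearly series (not real ERP figures)
yearly = interpolate_quarterly({2013: 100.0, 2014: 104.0})

# Illustrative series measured every two years, like the debt-income ratio
biennial = interpolate_quarterly({2009: 80.0, 2011: 100.0})
```

The same function covers both the yearly series and the series measured every two years, because the number of quarters between observations is derived from the gap in years.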


#### Analysis
<img src="../plots/History_rent.png" alt="Median Rent & Deal Count" style="width: 500px;" title="Median Rent & Deal Count" />

From the above graph of Victoria's historical median rent and deal count, we can observe that both rental price and deal count keep **increasing** at a relatively constant rate.

<img src="../plots/Part_his_rent.png" alt="Part of History Rent" style="width: 500px;" title="Part of History Rent" />

From the segment graph, we can observe that both deal count and rental price are at their **lowest** in **quarter 2** of each year.
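
A seasonal pattern like this can be checked numerically by averaging each measure per quarter across years. The following is a minimal sketch of that check. It borrows the column names `quarter`, `median_rent`, and `deal_count` from the curated history table, but the values are made up for illustration and the variable names are hypothetical.

```python
import pandas as pd

# Illustrative values only; not taken from the project's data
his = pd.DataFrame({
    "year":        [2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020],
    "quarter":     [1, 2, 3, 4, 1, 2, 3, 4],
    "median_rent": [400.0, 390.0, 405.0, 410.0, 415.0, 400.0, 420.0, 425.0],
    "deal_count":  [1200, 1000, 1100, 1150, 1250, 1050, 1150, 1200],
})

# Average each measure by quarter across all years
by_quarter = his.groupby("quarter")[["median_rent", "deal_count"]].mean()

# Quarter in which each measure has its lowest average
lowest = by_quarter.idxmin()
print(by_quarter)
print(lowest)
```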


In [7]:
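# Load the curated time-series table, dropping the saved index column
his_df = pd.read_csv("../data/curated/history_info.csv").drop(columns=["Unnamed: 0"])
# Show a single SA2 area as an example
his_df = his_df.query("SA2 == 201011001")
his_df.head()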

Unnamed: 0,SA2,year,quarter,population,bond,price_index,deal_count,median_rent,to_USD,immi_count,debt_ratio
0,201011001,2013,3,9550,2.9,105.0,1027,280.0,0.9309,30375,0.85875
1,201011001,2013,4,9714,2.96,109.0,1050,290.0,0.8948,30562,0.86
2,201011001,2014,1,9870,2.97,110.0,1251,295.0,0.9221,30932,0.85875
3,201011001,2014,2,10026,2.8,112.0,1069,280.0,0.942,31302,0.8575
4,201011001,2014,3,10182,2.8,113.0,1035,300.0,0.8752,31672,0.85625


## Related to SA2 Area standard

For SA2-level data, we use the most recent year available when a dataset covers multiple years.
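
As a rough sketch of this selection step, the snippet below keeps the row with the latest year for each SA2 area. The table `yearly`, the SA2 codes, and the values are hypothetical, and picking the latest year per area (rather than one latest year for the whole dataset) is an assumption.

```python
import pandas as pd

# Hypothetical yearly SA2-level table; codes and values are illustrative
yearly = pd.DataFrame({
    "SA2": [101, 101, 102, 102],
    "year": [2020, 2021, 2019, 2020],
    "ERP_population": [9800, 10100, 11500, 11900],
})

# For each SA2, keep only the row with the latest year
latest = (
    yearly.sort_values("year")
          .drop_duplicates(subset="SA2", keep="last")
          .sort_values("SA2")
          .reset_index(drop=True)
)
print(latest)
```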

#### Assumptions:
- We use a **right join** when converting data from the LGA area standard to the SA2 standard

<img src="../plots/right join.svg" alt="Right Join" style="width: 800px;"/>

- The **minimum distance** between any two points is **350 m** (graph retrieved from ../plots/Min_Distance.html)

<img src="../plots/min_distance.png" alt="Minimal Distance" style="width: 300px;"/>

- We shift the centroid of some SA2 areas by up to ±0.04 degrees of **latitude** when we cannot find any waypoints near the original position (graph retrieved from ../plots/Distance_shift.html)

<img src="../plots/distance_shift.png" alt="Centroid Shift" style="width: 300px;"/>
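
The right join in the first assumption can be sketched with `pandas.DataFrame.merge`. In this sketch the LGA-level rent table is the left frame and an SA2-to-LGA correspondence is the right frame, so every SA2 area is kept even when its LGA has no rent record. The frame names, area codes, and values are hypothetical placeholders, not the project's data.

```python
import pandas as pd

# Hypothetical LGA-level rent data (illustrative values)
lga_rent = pd.DataFrame({
    "LGA": ["LGA_A", "LGA_B"],
    "median_rent": [350.0, 380.0],
    "deal_count": [709, 1970],
})

# Hypothetical SA2-to-LGA correspondence; LGA_C has no rent record
sa2_to_lga = pd.DataFrame({
    "SA2": [101, 102, 103, 104],
    "LGA": ["LGA_A", "LGA_A", "LGA_B", "LGA_C"],
})

# Right join: keep every SA2 row and copy the LGA-level values onto it
sa2_rent = lga_rent.merge(sa2_to_lga, on="LGA", how="right")
print(sa2_rent)
```

With this join, SA2 areas that share an LGA receive the same median rent and deal count, and SA2 areas whose LGA is missing from the rent table keep a row with missing values instead of being dropped.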


#### Analysis

<img src="../plots/Top_10.png" alt="Median Rent Distribution" style="width: 800px;"/>

- (graph retrieved from ../plots/Top_10_rent.html)

From the above distribution of median rent across all SA2 areas in Q1 2021, we can observe that properties **close to city centres** generally have higher rents. Similarly high rents are also found in some **seaside towns** and in **East Melbourne**.



In [8]:
# Load the curated SA2-level table, dropping the saved index column
sa2_df = pd.read_csv("../data/curated/sa2_info.csv").drop(columns=["Unnamed: 0"])
sa2_df.head()

Unnamed: 0,SA2,school_count,ERP_population,median_income,metrobus_count,metrotrain_count,metrotram_count,regbus_count,regcoach_count,regtrain_count,skybus_count,recr_count,comm_count,deal_count,median_rent,cbd_dis
0,202011018,13,14951,1267,0,0,0,142,2,1,0,1,1,709,350.0,152998.1
1,202011022,9,21060,1238,0,0,0,130,3,1,0,0,0,709,350.0,144471.1
2,203011035,6,8065,1898,0,0,0,3,4,0,0,1,0,2478,378.0,123256.7
3,203031048,6,16716,1424,0,0,0,74,0,0,0,0,0,1970,380.0,98849.4
4,204011062,4,4142,1222,0,0,0,0,10,0,0,3,0,360,331.666667,350.0


## Related to Property conditions

We use web crawling to retrieve basic property attributes and API requests to obtain distance-related data.

#### Assumptions:
- We approximate the distance from each property to the CBD by the distance between the CBD and the centroid of the property's SA2 area (graph retrieved from ../plots/Route_Appro.html)

<img src="../plots/route_appro.png" alt="Route Approximation" style="width: 600px;"/>
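
In practice this approximation means every property inherits the CBD distance of its SA2 area. A minimal sketch of that step is a left merge on the `SA2` column. The frames `sa2_info` and `properties` and all values are hypothetical; only the column names `SA2`, `rent`, and `cbd_dis` follow the curated tables.

```python
import pandas as pd

# Hypothetical SA2-level distances from area centroid to the CBD, in meters
sa2_info = pd.DataFrame({
    "SA2": [101, 102],
    "cbd_dis": [125495.6, 8000.0],
})

# Hypothetical property listings
properties = pd.DataFrame({
    "SA2": [101, 101, 102],
    "rent": [490.0, 420.0, 600.0],
})

# Each property takes the centroid-to-CBD distance of its SA2 area
properties = properties.merge(sa2_info[["SA2", "cbd_dis"]], on="SA2", how="left")
print(properties)
```

All properties in the same SA2 area end up with an identical `cbd_dis`, which trades per-property precision for far fewer routing requests.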

In [9]:
# Load the curated property-level table, dropping the saved index column
rent_df = pd.read_csv("../data/curated/rent_distance.csv").drop(columns=["Unnamed: 0"])
rent_df.head()

Unnamed: 0,SA2,rent,bedroom,baths,parking,Latitude,Longitude,school_dis,station_dis,cbd_dis
0,201011001,490.0,4,2,2,-37.563073,143.793875,1651.7,5895.5,125495.6
1,201011001,420.0,4,2,2,-37.547241,143.770106,1249.5,7529.9,125495.6
2,201011001,520.0,4,2,2,-37.566319,143.800328,2094.0,6864.2,125495.6
3,201011001,440.0,4,2,2,-37.563453,143.789489,3988.4,7111.3,125495.6
4,201011001,440.0,4,2,2,-37.550549,143.786038,1120.8,6272.3,125495.6
