
# Summary of the Notebook

This notebook reshapes rental data from 2017 to 2024 into a yearly, per-suburb format and merges it with precomputed rental price predictions for 2025 to 2027. The workflow includes the following steps:

1. **Data Import**:
    - Import necessary libraries and read the rental data and prediction data from CSV files.

2. **Data Preparation**:
    - Filter and select relevant columns from the rental data.
    - Separate the data into house and unit/apartment types.

3. **Yearly Data Aggregation**:
    - Define a function that computes the mean rented price per suburb for each year from 2017 to 2024, for both house and unit/apartment data.

4. **Prediction Merging**:
    - Define a function that merges the yearly aggregated data with the 2025 to 2027 predictions of each model (`pred_0r`, `pred_lr`, `pred_rf`, `pred_combine`).
    - Attach each suburb's confidence score, taken from the 2025 prediction rows.

5. **Data Export**:
    - Export the final processed data to CSV files for houses and units/apartments.

The result is two CSV files, one for houses and one for units/apartments, with one row per suburb and model: mean rented prices for 2017 to 2024, predicted prices for 2025 to 2027, and a confidence score.


# Converting to Yearly Format for Growth

In [1]:
import pandas as pd
rental_df = pd.read_csv('../data/curated/rental-17-24.csv')
pred_unit = pd.read_csv('../data/curated/predict_unit.csv')
pred_house = pd.read_csv('../data/curated/predict_house.csv')

In [2]:
rental_df = rental_df[[
    'suburb', 'sa2_code', 'type', 'year', 'bed', 'bath', 'car', 'median_income',
    'population', 'cpi', 'unemployment_rate', 'time_city', 'avg_property_price',
    'rented_price'
]]
# rental_df = rental_df[rental_df["year"] ==2024]

house_df = rental_df[rental_df['type'] == 'House']
unit_df = rental_df[rental_df['type'] == 'Unit/apmt']

In [3]:
# Mean rented price per suburb for each year, one column per year.
def get_yearly(type_df):
    type_all = None
    for year in range(2017, 2025):
        # Name the column by year before merging, so that every year
        # (including 2018) ends up as rented_price_{year}.
        yearly = (
            type_df[type_df['year'] == year]
            .groupby('suburb', as_index=False)['rented_price']
            .mean()
            .rename(columns={'rented_price': f'rented_price_{year}'})
        )
        if type_all is None:
            type_all = yearly
        else:
            # Left join: only suburbs present in the first year (2017) are kept.
            type_all = type_all.merge(yearly, on='suburb', how='left')
    return type_all
house_all = get_yearly(house_df)
unit_all = get_yearly(unit_df)
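
For comparison, the same wide layout (one `rented_price_{year}` column per year) can be sketched with `pivot_table`. This is a minimal sketch on made-up toy data, not the notebook's method; the suburb names and prices are hypothetical. Unlike the left merges in `get_yearly`, a pivot keeps every suburb that appears in any year.

```python
import pandas as pd

# Toy data (hypothetical values) with the columns get_yearly relies on.
toy = pd.DataFrame({
    "suburb": ["A", "A", "A", "B", "B"],
    "year": [2017, 2017, 2018, 2017, 2018],
    "rented_price": [400.0, 500.0, 480.0, 300.0, 330.0],
})

# Mean rent per suburb and year, spread into one column per year.
wide = toy.pivot_table(index="suburb", columns="year",
                       values="rented_price", aggfunc="mean")
wide.columns = [f"rented_price_{year}" for year in wide.columns]
wide = wide.reset_index()
print(wide)
```

Whether dropping or keeping suburbs with no 2017 data is preferable depends on how the downstream analysis treats missing years.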

In [4]:
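def merge_pred(type_all, prediction):
    df_list = []
    # Build one copy of the yearly table per model's prediction column.
    for pred in ['pred_0r', 'pred_lr', 'pred_rf', 'pred_combine']:
        df_cur = type_all.copy()
        for year in range(2025, 2028):
            # This model's predictions for this year, renamed to pred_{year}.
            # rename() returns a new frame, avoiding an in-place edit of a slice.
            pred_cur = (
                prediction.loc[prediction['year'] == year, ['suburb', pred]]
                .rename(columns={pred: f'pred_{year}'})
            )
            df_cur = df_cur.merge(pred_cur, on='suburb', how='left')
        # Model label is the part after 'pred_', e.g. 'lr' or 'combine'.
        df_cur['model'] = pred.split('_')[1]
        df_list.append(df_cur)
    # Confidence comes from the 2025 rows and is attached to every model's rows.
    confi_df = prediction.loc[prediction['year'] == 2025, ['suburb', 'confidence']]
    res_df = pd.concat(df_list, ignore_index=True)
    res_df = res_df.merge(confi_df, on='suburb', how='left')
    return res_df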
house_final = merge_pred(house_all, pred_house)
unit_final = merge_pred(unit_all, pred_unit)
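
Because `merge_pred` joins on `suburb` alone, duplicate suburb rows in the prediction file would multiply rows in the result. A quick check can be sketched as follows. The helper name `has_one_row_per_suburb_model` and the toy values are hypothetical and not part of the notebook.

```python
import pandas as pd

def has_one_row_per_suburb_model(df: pd.DataFrame) -> bool:
    """Return True if every (suburb, model) pair appears at most once."""
    return not df.duplicated(subset=["suburb", "model"]).any()

# Toy frame (hypothetical values) in the layout merge_pred returns.
toy_final = pd.DataFrame({
    "suburb": ["A", "A", "B", "B"],
    "model": ["lr", "rf", "lr", "rf"],
    "pred_2025": [510.0, 505.0, 340.0, 345.0],
    "confidence": [0.8, 0.8, 0.6, 0.6],
})

ok = has_one_row_per_suburb_model(toy_final)
print(ok)
```

The same call could be applied to `house_final` and `unit_final` before exporting.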

In [5]:
house_final.to_csv('../data/curated/house_yearly.csv')
unit_final.to_csv('../data/curated/unit_yearly.csv')
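
As an illustration of how the exported layout supports a growth calculation, the sketch below derives relative growth from the 2024 mean rent to the 2027 prediction. It runs on made-up toy data. The column name `growth_2024_2027` and the choice of 2024 as the base year are assumptions, not something the notebook defines.

```python
import pandas as pd

# Toy rows (hypothetical values) using column names from the exported files.
toy = pd.DataFrame({
    "suburb": ["A", "B"],
    "rented_price_2024": [500.0, 400.0],
    "pred_2027": [550.0, 420.0],
    "model": ["lr", "lr"],
})

# Relative growth from the 2024 mean rent to the 2027 prediction.
# 'growth_2024_2027' is a hypothetical column name.
toy["growth_2024_2027"] = toy["pred_2027"] / toy["rented_price_2024"] - 1
print(toy[["suburb", "model", "growth_2024_2027"]])
```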