# Historical Rent Preprocessing
This notebook outlines the preprocessing of historical rent data supplied by the Department of Families, Fairness and Housing (DFFH).

Sections:
1. Rental Suburb Mapping
2. Data Restructuring & Cleaning
3. SAL Code Retrieval

Before running this notebook, ensure that the file moving-annual-rent-suburb-march-quarter-2024.xlsx has been downloaded to the data/landing directory. This can be done by running the download datasets.py script.

In [238]:
import pandas as pd

### 1 Rental Suburb Mapping
An initial inspection of the historical rent dataset (see the initial read of the data below) showed that it needed substantial cleaning. In particular, the suburb groups were ambiguously defined and showed no obvious pattern in their groupings. Further inspection of the Homes Victoria Rental Report that accompanies this data (the March quarter 2024 rental report can be downloaded from [https://www.dffh.vic.gov.au/publications/rental-report]) showed that neighbouring suburbs with similar rent characteristics were grouped together. However, the DFFH website supplies no shapefile for these groupings, only a figure in the report showing how the suburbs of Melbourne were split.

We emailed the DFFH to ask whether they could supply a shapefile, or some other data, that defined these suburb groupings specifically and clearly. However, they did not respond.

This was unfortunate, because our study is directly concerned with rent in the suburbs of Victoria. It was crucial to have some measure of historical rent prices in Victoria and, since no other dataset on this could be found, this dataset was our best choice, so it was important to get these groupings correct. The report also stated that the suburb boundaries were defined from the gazetted localities shapefile supplied at [https://discover.data.vic.gov.au/dataset/vicmap-admin]. The first step of preprocessing was therefore to visually inspect the suburb group boundaries by hand and line them up with the gazetted localities to determine which suburbs are in each group. This is not ideal, but it was necessary since the DFFH could not supply the data directly.

A final important note is that the data comes from a rental report. Suburbs that do not appear in the mapped lists are treated as out of scope for our study, on the assumption that the DFFH found the rental data for them to be non-existent or not worth reporting.



INSERT FIGURE?

In [239]:
# initial reading of the historical rent data
historical_rent_df = pd.read_excel("../../data/landing/moving-annual-rent-suburb-march-quarter-2024.xlsx", sheet_name="All properties")
historical_rent_df

Unnamed: 0,Moving annual rent by suburb,Unnamed: 1,Lease commenced in year ending,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 186,Unnamed: 187,Unnamed: 188,Unnamed: 189,Unnamed: 190,Unnamed: 191,Unnamed: 192,Unnamed: 193,Unnamed: 194,Unnamed: 195
0,All properties,,Mar 2000,,Jun 2000,,Sep 2000,,Dec 2000,,...,Mar 2023,,Jun 2023,,Sep 2023,,Dec 2023,,Mar 2024,
1,,,Count,Median,Count,Median,Count,Median,Count,Median,...,Count,Median,Count,Median,Count,Median,Count,Median,Count,Median
2,Inner Melbourne,Albert Park-Middle Park-West St Kilda,1143,260,1134,260,1177,270,1178,275,...,796,545,740,550,730,600,720,600,671,650
3,,Armadale,733,200,737,200,738,205,739,210,...,757,490,687,500,639,525,594,560,566,560
4,,Carlton North,864,260,814,260,799,265,736,270,...,497,620,495,630,467,650,418,670,384,680
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
156,,Wanagaratta,705,125,671,125,631,130,623,130,...,535,380,555,390,565,390,593,395,580,400
157,,Warragul,385,130,367,135,382,135,366,135,...,507,440,542,450,558,450,543,460,541,470
158,,Warrnambool,1266,130,1229,135,1204,135,1135,135,...,881,420,861,430,846,450,844,460,840,460
159,,Wodonga,1446,145,1439,145,1468,150,1449,150,...,1205,410,1187,420,1164,420,1155,430,1139,450


#### 1.1 The Mapping Method
This was done in an Excel worksheet.

The suburb groups from the historical rent data were copied into a column labelled "Rental suburbs". In the column next to it, named "SAL suburbs (gazetted localities)", we entered the list of all suburbs found in that group. These were identified by comparing the suburb borders in figure 2 of the report with the gazetted localities map. Entries follow the format shown below:

| Suburb Cluster | Rental suburbs | SAL suburbs (gazetted localities) |
| --- | --- | --- |
| Inner Melbourne | Albert Park-Middle Park-West St Kilda | albert park - middle park - st kilda west |

An additional column, "Suburb Cluster", separates the suburb groups geographically; it was also copied from the historical rent data. Each suburb group was inspected manually, as described above, against the gazetted localities data, whose boundaries lined up almost exactly one-to-one with the figure in the report.

Since the figure covers only Melbourne and its outer regions, the "Other regional centres" cluster could not be expanded into smaller suburbs like the other suburb groups in the data, so it was left as is.

Once the mapping was finished, the Excel file was saved as "rental_suburbs_to_SAL_mapping" in the data/raw/ directory.

In [240]:
# read in the mapped out rental groups
rental_groups_df = pd.read_excel('../../data/raw/rental_suburbs_to_SAL_mapping.xlsx')
print(rental_groups_df.shape)
rental_groups_df

(159, 3)


Unnamed: 0,Suburb Cluster,Rental suburbs,SAL suburbs (gazetted localities)
0,Inner Melbourne,Albert Park-Middle Park-West St Kilda,albert park - middle park - st kilda west
1,,Armadale,armadale
2,,Carlton North,carlton north - princes hill
3,,Carlton-Parkville,parkville - carlton
4,,CBD-St Kilda Rd,melbourne cbd
...,...,...,...
154,,Wangaratta,Wangaratta
155,,Warragul,Warragul
156,,Warrnambool,Warrnambool
157,,Wodonga,Wodonga


### 2 Data Restructuring & Cleaning

In [241]:
# this cell builds new column titles that make the historical rent dataframe more readable

# fill every NaN cell in the first row with the value of the cell to its left
columns = list(historical_rent_df.columns)
for prev_col, col in zip(columns, columns[1:]):
    if pd.isna(historical_rent_df.at[0, col]):
        historical_rent_df.at[0, col] = historical_rent_df.at[0, prev_col]

# merge first row and second row together
merged_row = historical_rent_df.iloc[0].astype(str) + ' ' + historical_rent_df.iloc[1].astype(str)
historical_rent_df.loc[1] = merged_row

# rename the first two columns
historical_rent_df.loc[1,"Moving annual rent by suburb"] = "Suburb Cluster"
historical_rent_df.loc[1,"Unnamed: 1"] = "Suburb(s)"

# set second row as the new columns titles
historical_rent_df.columns = historical_rent_df.iloc[1]

historical_rent_df

1,Suburb Cluster,Suburb(s),Mar 2000 Count,Mar 2000 Median,Jun 2000 Count,Jun 2000 Median,Sep 2000 Count,Sep 2000 Median,Dec 2000 Count,Dec 2000 Median,...,Mar 2023 Count,Mar 2023 Median,Jun 2023 Count,Jun 2023 Median,Sep 2023 Count,Sep 2023 Median,Dec 2023 Count,Dec 2023 Median,Mar 2024 Count,Mar 2024 Median
0,All properties,All properties,Mar 2000,Mar 2000,Jun 2000,Jun 2000,Sep 2000,Sep 2000,Dec 2000,Dec 2000,...,Mar 2023,Mar 2023,Jun 2023,Jun 2023,Sep 2023,Sep 2023,Dec 2023,Dec 2023,Mar 2024,Mar 2024
1,Suburb Cluster,Suburb(s),Mar 2000 Count,Mar 2000 Median,Jun 2000 Count,Jun 2000 Median,Sep 2000 Count,Sep 2000 Median,Dec 2000 Count,Dec 2000 Median,...,Mar 2023 Count,Mar 2023 Median,Jun 2023 Count,Jun 2023 Median,Sep 2023 Count,Sep 2023 Median,Dec 2023 Count,Dec 2023 Median,Mar 2024 Count,Mar 2024 Median
2,Inner Melbourne,Albert Park-Middle Park-West St Kilda,1143,260,1134,260,1177,270,1178,275,...,796,545,740,550,730,600,720,600,671,650
3,,Armadale,733,200,737,200,738,205,739,210,...,757,490,687,500,639,525,594,560,566,560
4,,Carlton North,864,260,814,260,799,265,736,270,...,497,620,495,630,467,650,418,670,384,680
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
156,,Wanagaratta,705,125,671,125,631,130,623,130,...,535,380,555,390,565,390,593,395,580,400
157,,Warragul,385,130,367,135,382,135,366,135,...,507,440,542,450,558,450,543,460,541,470
158,,Warrnambool,1266,130,1229,135,1204,135,1135,135,...,881,420,861,430,846,450,844,460,840,460
159,,Wodonga,1446,145,1439,145,1468,150,1449,150,...,1205,410,1187,420,1164,420,1155,430,1139,450
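The header flattening above can also be illustrated on a small frame. The following is a minimal sketch, assuming a toy three-row layout that mimics the spreadsheet (the frame `raw` and its values are illustrative, not the real data): the period labels are forward-filled across the first header row and joined with the Count/Median labels from the second.

```python
import pandas as pd

# Toy frame mimicking the two header rows of the spreadsheet (illustrative values).
raw = pd.DataFrame(
    [
        ["All properties", None, "Mar 2000", None, "Jun 2000", None],
        [None, None, "Count", "Median", "Count", "Median"],
        ["Inner Melbourne", "Armadale", 733, 200, 737, 200],
    ]
)

# Forward-fill the period labels across the first header row, then join them
# with the Count/Median labels from the second header row.
periods = raw.iloc[0].ffill()
measures = raw.iloc[1]
new_columns = (periods.astype(str) + " " + measures.astype(str)).tolist()

# The first two columns hold labels rather than measurements, so name them by hand.
new_columns[:2] = ["Suburb Cluster", "Suburb(s)"]

# Drop the two header rows and apply the flattened column names.
tidy = raw.iloc[2:].reset_index(drop=True)
tidy.columns = new_columns
print(tidy.columns.tolist())
```

Building the names in a separate list, rather than writing them back into a data row, keeps the header rows and the data rows clearly separated.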


In [242]:
# drop the two header rows now that they are captured in the column titles
historical_rent_df.drop([0,1], inplace=True)
historical_rent_df.reset_index(drop=True, inplace=True)
historical_rent_df.shape

(159, 196)

In [243]:
# label clusters of suburbs for all rows: the cluster name appears only on the
# first row of each cluster, so forward-fill it down the column
historical_rent_df["Suburb Cluster"] = historical_rent_df["Suburb Cluster"].ffill()
historical_rent_df

1,Suburb Cluster,Suburb(s),Mar 2000 Count,Mar 2000 Median,Jun 2000 Count,Jun 2000 Median,Sep 2000 Count,Sep 2000 Median,Dec 2000 Count,Dec 2000 Median,...,Mar 2023 Count,Mar 2023 Median,Jun 2023 Count,Jun 2023 Median,Sep 2023 Count,Sep 2023 Median,Dec 2023 Count,Dec 2023 Median,Mar 2024 Count,Mar 2024 Median
0,Inner Melbourne,Albert Park-Middle Park-West St Kilda,1143,260,1134,260,1177,270,1178,275,...,796,545,740,550,730,600,720,600,671,650
1,Inner Melbourne,Armadale,733,200,737,200,738,205,739,210,...,757,490,687,500,639,525,594,560,566,560
2,Inner Melbourne,Carlton North,864,260,814,260,799,265,736,270,...,497,620,495,630,467,650,418,670,384,680
3,Inner Melbourne,Carlton-Parkville,1303,251,1278,260,1280,260,1301,260,...,2953,500,2755,530,2687,550,2662,550,2543,570
4,Inner Melbourne,CBD-St Kilda Rd,2132,320,2264,320,2358,320,2361,320,...,13568,550,13505,580,13552,600,13564,620,13582,640
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
154,Other Regional Centres,Wanagaratta,705,125,671,125,631,130,623,130,...,535,380,555,390,565,390,593,395,580,400
155,Other Regional Centres,Warragul,385,130,367,135,382,135,366,135,...,507,440,542,450,558,450,543,460,541,470
156,Other Regional Centres,Warrnambool,1266,130,1229,135,1204,135,1135,135,...,881,420,861,430,846,450,844,460,840,460
157,Other Regional Centres,Wodonga,1446,145,1439,145,1468,150,1449,150,...,1205,410,1187,420,1164,420,1155,430,1139,450


In [244]:
# remove group total rows for the suburb clusters - can perform a groupby if we need this data later
null_rows = historical_rent_df[historical_rent_df["Suburb(s)"] == "Group Total"].index
historical_rent_df = historical_rent_df.drop(null_rows)
historical_rent_df.shape

(146, 196)
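If group totals are needed later, the counts can be rebuilt with a groupby, since counts are additive. The sketch below assumes a toy frame with a single count column (the frame `counts` holds only three of the rows shown earlier, so the sums are not the real cluster totals). Medians are a different matter: a cluster median cannot be recovered exactly from the medians of its member groups.

```python
import pandas as pd

# Three illustrative rows; the real frame has one row per suburb group.
counts = pd.DataFrame(
    {
        "Suburb Cluster": ["Inner Melbourne", "Inner Melbourne", "Other Regional Centres"],
        "Mar 2024 Count": [671, 566, 1139],
    }
)

# Sum the lease counts within each cluster.
cluster_totals = counts.groupby("Suburb Cluster")["Mar 2024 Count"].sum()
print(cluster_totals)
```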

In [245]:
# do the same as the cell above for the mapped suburbs file so that both dataframes have a consistent number of rows
null_rows = rental_groups_df[rental_groups_df["SAL suburbs (gazetted localities)"].isnull()].index
rental_groups_df = rental_groups_df.drop(null_rows)
rental_groups_df.shape

(146, 3)

In [246]:
# add the list of suburbs in each suburb group as a column to the historical rent data
# (pandas aligns the values on the row index, which the two dataframes still share)
historical_rent_df['SAL suburbs (gazetted localities)'] = rental_groups_df['SAL suburbs (gazetted localities)']
historical_rent_df.shape

(146, 197)
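Because assigning a column from another dataframe aligns on index labels rather than on row position, a cheap guard is to confirm that both indexes match before the assignment. A minimal sketch, assuming two toy frames (`rent` and `mapping` are hypothetical stand-ins for the two dataframes above):

```python
import pandas as pd

rent = pd.DataFrame({"Suburb(s)": ["Armadale", "Group Total", "Carlton North"]})
mapping = pd.DataFrame({"SAL suburbs": ["armadale", None, "carlton north - princes hill"]})

# Drop the same positions from both frames without resetting the index.
rent = rent[rent["Suburb(s)"] != "Group Total"].copy()
mapping = mapping.dropna(subset=["SAL suburbs"])

# Guard: the assignment below aligns on index labels, so they must be identical.
assert rent.index.equals(mapping.index)

rent["SAL suburbs"] = mapping["SAL suburbs"]
print(rent)
```

If the indexes differed, the assignment would still run but would silently leave NaN in any row without a matching label, which is why the explicit check is useful.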

In [247]:
historical_rent_df = historical_rent_df.reset_index(drop=True)

In [248]:
# transform each mapped suburb group into a Python list so the suburbs can be processed individually
historical_rent_df['Suburb_List'] = historical_rent_df['SAL suburbs (gazetted localities)'].apply(lambda x: x.split('-'))
historical_rent_df = historical_rent_df.drop(columns='SAL suburbs (gazetted localities)')
historical_rent_df = historical_rent_df.rename(columns={"Suburb_List":'SAL suburbs (gazetted localities)'})
historical_rent_df.head(5)

1,Suburb Cluster,Suburb(s),Mar 2000 Count,Mar 2000 Median,Jun 2000 Count,Jun 2000 Median,Sep 2000 Count,Sep 2000 Median,Dec 2000 Count,Dec 2000 Median,...,Mar 2023 Median,Jun 2023 Count,Jun 2023 Median,Sep 2023 Count,Sep 2023 Median,Dec 2023 Count,Dec 2023 Median,Mar 2024 Count,Mar 2024 Median,SAL suburbs (gazetted localities)
0,Inner Melbourne,Albert Park-Middle Park-West St Kilda,1143,260,1134,260,1177,270,1178,275,...,545,740,550,730,600,720,600,671,650,"[albert park , middle park , st kilda west]"
1,Inner Melbourne,Armadale,733,200,737,200,738,205,739,210,...,490,687,500,639,525,594,560,566,560,[armadale]
2,Inner Melbourne,Carlton North,864,260,814,260,799,265,736,270,...,620,495,630,467,650,418,670,384,680,"[carlton north , princes hill]"
3,Inner Melbourne,Carlton-Parkville,1303,251,1278,260,1280,260,1301,260,...,500,2755,530,2687,550,2662,550,2543,570,"[parkville , carlton]"
4,Inner Melbourne,CBD-St Kilda Rd,2132,320,2264,320,2358,320,2361,320,...,550,13505,580,13552,600,13564,620,13582,640,[melbourne cbd]


In [249]:
# strip leading and trailing whitespace from every SAL suburb in each suburb group
# and capitalise the first letter of every word
historical_rent_df['SAL suburbs (gazetted localities)'] = historical_rent_df['SAL suburbs (gazetted localities)'].apply(
    lambda suburbs: [suburb.strip().title() for suburb in suburbs]
)
historical_rent_df.head(5)

1,Suburb Cluster,Suburb(s),Mar 2000 Count,Mar 2000 Median,Jun 2000 Count,Jun 2000 Median,Sep 2000 Count,Sep 2000 Median,Dec 2000 Count,Dec 2000 Median,...,Mar 2023 Median,Jun 2023 Count,Jun 2023 Median,Sep 2023 Count,Sep 2023 Median,Dec 2023 Count,Dec 2023 Median,Mar 2024 Count,Mar 2024 Median,SAL suburbs (gazetted localities)
0,Inner Melbourne,Albert Park-Middle Park-West St Kilda,1143,260,1134,260,1177,270,1178,275,...,545,740,550,730,600,720,600,671,650,"[Albert Park, Middle Park, St Kilda West]"
1,Inner Melbourne,Armadale,733,200,737,200,738,205,739,210,...,490,687,500,639,525,594,560,566,560,[Armadale]
2,Inner Melbourne,Carlton North,864,260,814,260,799,265,736,270,...,620,495,630,467,650,418,670,384,680,"[Carlton North, Princes Hill]"
3,Inner Melbourne,Carlton-Parkville,1303,251,1278,260,1280,260,1301,260,...,500,2755,530,2687,550,2662,550,2543,570,"[Parkville, Carlton]"
4,Inner Melbourne,CBD-St Kilda Rd,2132,320,2264,320,2358,320,2361,320,...,550,13505,580,13552,600,13564,620,13582,640,[Melbourne Cbd]
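The split-and-clean step can be tried on plain strings. This is a small sketch, assuming a hypothetical helper `clean_group` and a few sample group strings; it also shows that `str.title()` lowercases every letter after the first in each word, so names such as "McKinnon" and acronyms such as "CBD" lose their internal capitals.

```python
# Sample group strings in the same "a - b - c" format as the mapping file.
raw_groups = ["albert park - middle park - st kilda west", "melbourne cbd", " mckinnon "]


def clean_group(group):
    """Split a group on '-' and tidy each suburb name."""
    return [name.strip().title() for name in group.split("-")]


cleaned = [clean_group(group) for group in raw_groups]
print(cleaned)
```

Splitting on a bare hyphen assumes that no individual suburb name contains a hyphen; a name that did would be split in two.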


In [250]:
# explode the df by the "SAL suburbs (gazetted localities)" column
# each row is now an individual suburb carrying the median rent of its suburb group for each quarter between 2000 and 2024
# SIDENOTE: we assume here that the median rent of a suburb group is approximately equal to the median rent of each
# suburb in that group
all_suburbs_rent_df = historical_rent_df.explode("SAL suburbs (gazetted localities)")
all_suburbs_rent_df.rename(columns=lambda x: x + ' (of suburb group)' if x.endswith('Count') else x, inplace=True)

# Rename the exploded column for clarity
all_suburbs_rent_df = all_suburbs_rent_df.rename(columns={
    "SAL suburbs (gazetted localities)": "SAL suburb",
    "Suburb(s)": "Suburb Group"
})
all_suburbs_rent_df

1,Suburb Cluster,Suburb Group,Mar 2000 Count (of suburb group),Mar 2000 Median,Jun 2000 Count (of suburb group),Jun 2000 Median,Sep 2000 Count (of suburb group),Sep 2000 Median,Dec 2000 Count (of suburb group),Dec 2000 Median,...,Mar 2023 Median,Jun 2023 Count (of suburb group),Jun 2023 Median,Sep 2023 Count (of suburb group),Sep 2023 Median,Dec 2023 Count (of suburb group),Dec 2023 Median,Mar 2024 Count (of suburb group),Mar 2024 Median,SAL suburb
0,Inner Melbourne,Albert Park-Middle Park-West St Kilda,1143,260,1134,260,1177,270,1178,275,...,545,740,550,730,600,720,600,671,650,Albert Park
0,Inner Melbourne,Albert Park-Middle Park-West St Kilda,1143,260,1134,260,1177,270,1178,275,...,545,740,550,730,600,720,600,671,650,Middle Park
0,Inner Melbourne,Albert Park-Middle Park-West St Kilda,1143,260,1134,260,1177,270,1178,275,...,545,740,550,730,600,720,600,671,650,St Kilda West
1,Inner Melbourne,Armadale,733,200,737,200,738,205,739,210,...,490,687,500,639,525,594,560,566,560,Armadale
2,Inner Melbourne,Carlton North,864,260,814,260,799,265,736,270,...,620,495,630,467,650,418,670,384,680,Carlton North
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
141,Other Regional Centres,Traralgon,851,125,823,120,831,125,807,125,...,385,922,390,910,390,880,395,842,410,Traralgon
142,Other Regional Centres,Wanagaratta,705,125,671,125,631,130,623,130,...,380,555,390,565,390,593,395,580,400,Wangaratta
143,Other Regional Centres,Warragul,385,130,367,135,382,135,366,135,...,440,542,450,558,450,543,460,541,470,Warragul
144,Other Regional Centres,Warrnambool,1266,130,1229,135,1204,135,1135,135,...,420,861,430,846,450,844,460,840,460,Warrnambool
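For readers unfamiliar with `explode`, here is a minimal sketch on a toy frame (the frame `groups` and its two rows are illustrative). Each list element becomes its own row, the other columns are repeated, and the original index label is kept, which is why the index above contains duplicates.

```python
import pandas as pd

groups = pd.DataFrame(
    {
        "Suburb Group": ["Carlton North", "Armadale"],
        "Mar 2024 Median": [680, 560],
        "SAL suburb": [["Carlton North", "Princes Hill"], ["Armadale"]],
    }
)

# One row per list element; the group-level values are copied to each suburb.
suburbs = groups.explode("SAL suburb")
print(suburbs)
```

Calling `reset_index(drop=True)` afterwards would give each suburb a unique row label if that is needed downstream.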


### 3 SAL Code Retrieval
In this section we find the SAL (Suburbs and Localities) code for each suburb in the exploded dataframe.

In [251]:
# read the converter df for suburb name to the SAL code
SAL_converter_df = pd.read_csv("../../data/landing/CG_SSC_2016_SAL_2021.csv")
SAL_converter_df

Unnamed: 0,SSC_CODE_2016,SSC_NAME_2016,SAL_CODE_2021,SAL_NAME_2021,RATIO_FROM_TO,INDIV_TO_REGION_QLTY_INDICATOR,OVERALL_QUALITY_INDICATOR,BMOS_NULL_FLAG
0,10001.0,Aarons Pass,10001,Aarons Pass,1.0,Good,Good,0
1,10002.0,Abbotsbury,10002,Abbotsbury,1.0,Good,Good,0
2,10003.0,Abbotsford (NSW),10003,Abbotsford (NSW),1.0,Good,Good,0
3,10004.0,Abercrombie,10004,Abercrombie,1.0,Good,Good,0
4,10005.0,Abercrombie River,10005,Abercrombie River,1.0,Good,Good,0
...,...,...,...,...,...,...,...,...
15751,90005.0,West Island,90005,West Island,1.0,Good,Good,0
15752,99494.0,No usual address (OT),99494,No usual address (OT),1.0,Good,Good,0
15753,99797.0,Migratory - Offshore - Shipping (OT),99797,Migratory - Offshore - Shipping (OT),1.0,Good,Good,0
15754,,,51265,Prince Regent River,1.0,Good,Good,3


In [252]:
# search the SAL converter df for every suburb in the historical rent data
# print the suburbs that cannot be found, along with how many there are
all_suburbs = all_suburbs_rent_df['SAL suburb']
mis_match = []
SAL_codes_list = list(SAL_converter_df["SAL_NAME_2021"])

for suburb in all_suburbs:
    if not suburb in SAL_codes_list:
        # perform a second check specifying the suburb is in Victoria
        adj_search_name = suburb + ' (Vic.)'
        if not adj_search_name in SAL_codes_list:
            mis_match.append(suburb)
print(len(mis_match))
mis_match

8


['Melbourne Cbd',
 'Mckinnon',
 'Hillside',
 'Fieldstone',
 'Bellfield',
 'Mccrae',
 'Newtown',
 'Ballarat']

Exception Handling
- 'Melbourne Cbd': different name in converter - handle as 'Melbourne'
- 'Mckinnon': typo (capitalisation) - handle as 'McKinnon'
- 'Hillside': different name in converter - handle as the Hillside in Melton
- 'Fieldstone': does not exist as an SAL - remove
- 'Bellfield': different name in converter - handle as the Bellfield in Banyule
- 'Mccrae': typo (capitalisation) - handle as 'McCrae'
- 'Newtown': different name in converter - handle as the Newtown in Greater Geelong
- 'Ballarat': different name in converter - handle as 'Ballarat Central'
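The lookup rules listed above (exact name, then the name with a ' (Vic.)' suffix, then a manual override) can be sketched as a small function. This is an illustration under stated assumptions, not the notebook's implementation: the miniature `converter` table, its codes, and the helper `find_sal_code` are all hypothetical placeholders.

```python
import pandas as pd

# Hypothetical miniature of the converter table; names and codes are placeholders.
converter = pd.DataFrame(
    {
        "SAL_NAME_2021": ["Armadale (Vic.)", "Warragul", "Melbourne"],
        "SAL_CODE_2021": [1, 2, 3],
    }
)
name_to_code = dict(zip(converter["SAL_NAME_2021"], converter["SAL_CODE_2021"]))

# Manual overrides for names that differ from the converter (placeholder code).
overrides = {"Melbourne Cbd": 3}


def find_sal_code(suburb):
    """Return the SAL code for a suburb, or -1 when no match is found."""
    for candidate in (suburb, suburb + " (Vic.)"):
        if candidate in name_to_code:
            return int(name_to_code[candidate])
    return overrides.get(suburb, -1)


codes = [find_sal_code(s) for s in ["Armadale", "Warragul", "Melbourne Cbd", "Fieldstone"]]
print(codes)
```

Building the dictionary once avoids rescanning the converter column for every suburb, and keeping the overrides in one mapping makes the exceptions easy to audit.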

In [253]:
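# correct the capitalisation in the "SAL suburb" column only, then refresh the series used by the next cell
all_suburbs_rent_df.loc[all_suburbs_rent_df["SAL suburb"] == "Mccrae", "SAL suburb"] = "McCrae"
all_suburbs_rent_df.loc[all_suburbs_rent_df["SAL suburb"] == "Mckinnon", "SAL suburb"] = "McKinnon"
all_suburbs = all_suburbs_rent_df["SAL suburb"]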
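When correcting values in a single column, it matters how the boolean mask is applied. The sketch below, on a toy two-column frame (illustrative values, object dtype to mirror a frame read from a spreadsheet with text header rows), contrasts assigning through a bare mask with assigning through `.loc` and a column label.

```python
import pandas as pd

df = pd.DataFrame(
    {"SAL suburb": ["Mccrae", "Armadale"], "Mar 2024 Median": [500, 560]},
    dtype=object,
)
mask = df["SAL suburb"] == "Mccrae"

# Bare mask: every column of the matching rows receives the value.
overwritten = df.copy()
overwritten[mask] = "McCrae"

# Mask plus column label: only the named column changes.
targeted = df.copy()
targeted.loc[mask, "SAL suburb"] = "McCrae"

print(overwritten)
print(targeted)
```

In the first result the rent value in the matching row is replaced by the string as well, so the `.loc` form with an explicit column is the safer choice for this kind of fix.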

In [254]:
# retrieve SAL codes from the converter of all suburbs and print those that are not found
SAL_codes = []
missing_count = 0
for suburb in all_suburbs:
    # check if suburb name is in the converter

    # suburb is in the converter
    if suburb in list(SAL_converter_df['SAL_NAME_2021']):
        suburb_SAL_2021 = SAL_converter_df[SAL_converter_df['SAL_NAME_2021'] == suburb]['SAL_CODE_2021'].values[0]
        SAL_codes.append(suburb_SAL_2021)

    # suburb is in the converter under a different name
    elif suburb + ' (Vic.)' in list(SAL_converter_df['SAL_NAME_2021']):
        adj_search_name = suburb + ' (Vic.)'
        suburb_SAL_2021 = SAL_converter_df[SAL_converter_df['SAL_NAME_2021'] == adj_search_name]['SAL_CODE_2021'].values[0]
        SAL_codes.append(suburb_SAL_2021)
        
    # suburb not found in the converter
    else:
        # exception handling of suburbs
        if suburb == "Melbourne Cbd":
            SAL_codes.append(21640)
        elif suburb == "Hillside":
            SAL_codes.append(21193)
        elif suburb == "Bellfield":
            SAL_codes.append(20198)
        elif suburb == "Newtown":
            SAL_codes.append(21938)
        elif suburb == "Ballarat":
            SAL_codes.append(20111)
        # both the search and the exception handling failed to find a SAL code for the suburb
        else:
            print(suburb + ' Not found')
            missing_count += 1
            SAL_codes.append(-1)

missing_count

Fieldstone Not found


1

In [255]:
# append SAL codes to the dataset
all_suburbs_rent_df['SAL_CODE_2021'] = SAL_codes
all_suburbs_rent_df.shape

(569, 198)

In [256]:
list(all_suburbs_rent_df.columns)

['Suburb Cluster',
 'Suburb Group',
 'Mar 2000 Count (of suburb group)',
 'Mar 2000 Median',
 'Jun 2000 Count (of suburb group)',
 'Jun 2000 Median',
 'Sep 2000 Count (of suburb group)',
 'Sep 2000 Median',
 'Dec 2000 Count (of suburb group)',
 'Dec 2000 Median',
 'Mar 2001 Count (of suburb group)',
 'Mar 2001 Median',
 'Jun 2001 Count (of suburb group)',
 'Jun 2001 Median',
 'Sep 2001 Count (of suburb group)',
 'Sep 2001 Median',
 'Dec 2001 Count (of suburb group)',
 'Dec 2001 Median',
 'Mar 2002 Count (of suburb group)',
 'Mar 2002 Median',
 'Jun 2002 Count (of suburb group)',
 'Jun 2002 Median',
 'Sep 2002 Count (of suburb group)',
 'Sep 2002 Median',
 'Dec 2002 Count (of suburb group)',
 'Dec 2002 Median',
 'Mar 2003 Count (of suburb group)',
 'Mar 2003 Median',
 'Jun 2003 Count (of suburb group)',
 'Jun 2003 Median',
 'Sep 2003 Count (of suburb group)',
 'Sep 2003 Median',
 'Dec 2003 Count (of suburb group)',
 'Dec 2003 Median',
 'Mar 2004 Count (of suburb group)',
 'Mar 2004 Medi

In [257]:
# reorder the columns so the SAL suburb name and SAL code are closer to the leftmost columns of the dataframe for readability
new_col_order = ['Suburb Cluster',
 'SAL_CODE_2021',
 'SAL suburb',
 'Suburb Group',
 'Mar 2000 Count (of suburb group)',
 'Mar 2000 Median',
 'Jun 2000 Count (of suburb group)',
 'Jun 2000 Median',
 'Sep 2000 Count (of suburb group)',
 'Sep 2000 Median',
 'Dec 2000 Count (of suburb group)',
 'Dec 2000 Median',
 'Mar 2001 Count (of suburb group)',
 'Mar 2001 Median',
 'Jun 2001 Count (of suburb group)',
 'Jun 2001 Median',
 'Sep 2001 Count (of suburb group)',
 'Sep 2001 Median',
 'Dec 2001 Count (of suburb group)',
 'Dec 2001 Median',
 'Mar 2002 Count (of suburb group)',
 'Mar 2002 Median',
 'Jun 2002 Count (of suburb group)',
 'Jun 2002 Median',
 'Sep 2002 Count (of suburb group)',
 'Sep 2002 Median',
 'Dec 2002 Count (of suburb group)',
 'Dec 2002 Median',
 'Mar 2003 Count (of suburb group)',
 'Mar 2003 Median',
 'Jun 2003 Count (of suburb group)',
 'Jun 2003 Median',
 'Sep 2003 Count (of suburb group)',
 'Sep 2003 Median',
 'Dec 2003 Count (of suburb group)',
 'Dec 2003 Median',
 'Mar 2004 Count (of suburb group)',
 'Mar 2004 Median',
 'Jun 2004 Count (of suburb group)',
 'Jun 2004 Median',
 'Sep 2004 Count (of suburb group)',
 'Sep 2004 Median',
 'Dec 2004 Count (of suburb group)',
 'Dec 2004 Median',
 'Mar 2005 Count (of suburb group)',
 'Mar 2005 Median',
 'Jun 2005 Count (of suburb group)',
 'Jun 2005 Median',
 'Sep 2005 Count (of suburb group)',
 'Sep 2005 Median',
 'Dec 2005 Count (of suburb group)',
 'Dec 2005 Median',
 'Mar 2006 Count (of suburb group)',
 'Mar 2006 Median',
 'Jun 2006 Count (of suburb group)',
 'Jun 2006 Median',
 'Sep 2006 Count (of suburb group)',
 'Sep 2006 Median',
 'Dec 2006 Count (of suburb group)',
 'Dec 2006 Median',
 'Mar 2007 Count (of suburb group)',
 'Mar 2007 Median',
 'Jun 2007 Count (of suburb group)',
 'Jun 2007 Median',
 'Sep 2007 Count (of suburb group)',
 'Sep 2007 Median',
 'Dec 2007 Count (of suburb group)',
 'Dec 2007 Median',
 'Mar 2008 Count (of suburb group)',
 'Mar 2008 Median',
 'Jun 2008 Count (of suburb group)',
 'Jun 2008 Median',
 'Sep 2008 Count (of suburb group)',
 'Sep 2008 Median',
 'Dec 2008 Count (of suburb group)',
 'Dec 2008 Median',
 'Mar 2009 Count (of suburb group)',
 'Mar 2009 Median',
 'Jun 2009 Count (of suburb group)',
 'Jun 2009 Median',
 'Sep 2009 Count (of suburb group)',
 'Sep 2009 Median',
 'Dec 2009 Count (of suburb group)',
 'Dec 2009 Median',
 'Mar 2010 Count (of suburb group)',
 'Mar 2010 Median',
 'Jun 2010 Count (of suburb group)',
 'Jun 2010 Median',
 'Sep 2010 Count (of suburb group)',
 'Sep 2010 Median',
 'Dec 2010 Count (of suburb group)',
 'Dec 2010 Median',
 'Mar 2011 Count (of suburb group)',
 'Mar 2011 Median',
 'Jun 2011 Count (of suburb group)',
 'Jun 2011 Median',
 'Sep 2011 Count (of suburb group)',
 'Sep 2011 Median',
 'Dec 2011 Count (of suburb group)',
 'Dec 2011 Median',
 'Mar 2012 Count (of suburb group)',
 'Mar 2012 Median',
 'Jun 2012 Count (of suburb group)',
 'Jun 2012 Median',
 'Sep 2012 Count (of suburb group)',
 'Sep 2012 Median',
 'Dec 2012 Count (of suburb group)',
 'Dec 2012 Median',
 'Mar 2013 Count (of suburb group)',
 'Mar 2013 Median',
 'Jun 2013 Count (of suburb group)',
 'Jun 2013 Median',
 'Sep 2013 Count (of suburb group)',
 'Sep 2013 Median',
 'Dec 2013 Count (of suburb group)',
 'Dec 2013 Median',
 'Mar 2014 Count (of suburb group)',
 'Mar 2014 Median',
 'Jun 2014 Count (of suburb group)',
 'Jun 2014 Median',
 'Sep 2014 Count (of suburb group)',
 'Sep 2014 Median',
 'Dec 2014 Count (of suburb group)',
 'Dec 2014 Median',
 'Mar 2015 Count (of suburb group)',
 'Mar 2015 Median',
 'Jun 2015 Count (of suburb group)',
 'Jun 2015 Median',
 'Sep 2015 Count (of suburb group)',
 'Sep 2015 Median',
 'Dec 2015 Count (of suburb group)',
 'Dec 2015 Median',
 'Mar 2016 Count (of suburb group)',
 'Mar 2016 Median',
 'Jun 2016 Count (of suburb group)',
 'Jun 2016 Median',
 'Sep 2016 Count (of suburb group)',
 'Sep 2016 Median',
 'Dec 2016 Count (of suburb group)',
 'Dec 2016 Median',
 'Mar 2017 Count (of suburb group)',
 'Mar 2017 Median',
 'Jun 2017 Count (of suburb group)',
 'Jun 2017 Median',
 'Sep 2017 Count (of suburb group)',
 'Sep 2017 Median',
 'Dec 2017 Count (of suburb group)',
 'Dec 2017 Median',
 'Mar 2018 Count (of suburb group)',
 'Mar 2018 Median',
 'Jun 2018 Count (of suburb group)',
 'Jun 2018 Median',
 'Sep 2018 Count (of suburb group)',
 'Sep 2018 Median',
 'Dec 2018 Count (of suburb group)',
 'Dec 2018 Median',
 'Mar 2019 Count (of suburb group)',
 'Mar 2019 Median',
 'Jun 2019 Count (of suburb group)',
 'Jun 2019 Median',
 'Sep 2019 Count (of suburb group)',
 'Sep 2019 Median',
 'Dec 2019 Count (of suburb group)',
 'Dec 2019 Median',
 'Mar 2020 Count (of suburb group)',
 'Mar 2020 Median',
 'Jun 2020 Count (of suburb group)',
 'Jun 2020 Median',
 'Sep 2020 Count (of suburb group)',
 'Sep 2020 Median',
 'Dec 2020 Count (of suburb group)',
 'Dec 2020 Median',
 'Mar 2021 Count (of suburb group)',
 'Mar 2021 Median',
 'Jun 2021 Count (of suburb group)',
 'Jun 2021 Median',
 'Sep 2021 Count (of suburb group)',
 'Sep 2021 Median',
 'Dec 2021 Count (of suburb group)',
 'Dec 2021 Median',
 'Mar 2022 Count (of suburb group)',
 'Mar 2022 Median',
 'Jun 2022 Count (of suburb group)',
 'Jun 2022 Median',
 'Sep 2022 Count (of suburb group)',
 'Sep 2022 Median',
 'Dec 2022 Count (of suburb group)',
 'Dec 2022 Median',
 'Mar 2023 Count (of suburb group)',
 'Mar 2023 Median',
 'Jun 2023 Count (of suburb group)',
 'Jun 2023 Median',
 'Sep 2023 Count (of suburb group)',
 'Sep 2023 Median',
 'Dec 2023 Count (of suburb group)',
 'Dec 2023 Median',
 'Mar 2024 Count (of suburb group)',
 'Mar 2024 Median'
]
all_suburbs_rent_df = all_suburbs_rent_df[new_col_order]
all_suburbs_rent_df.head(3)

1,Suburb Cluster,SAL_CODE_2021,SAL suburb,Suburb Group,Mar 2000 Count (of suburb group),Mar 2000 Median,Jun 2000 Count (of suburb group),Jun 2000 Median,Sep 2000 Count (of suburb group),Sep 2000 Median,...,Mar 2023 Count (of suburb group),Mar 2023 Median,Jun 2023 Count (of suburb group),Jun 2023 Median,Sep 2023 Count (of suburb group),Sep 2023 Median,Dec 2023 Count (of suburb group),Dec 2023 Median,Mar 2024 Count (of suburb group),Mar 2024 Median
0,Inner Melbourne,20018,Albert Park,Albert Park-Middle Park-West St Kilda,1143,260,1134,260,1177,270,...,796,545,740,550,730,600,720,600,671,650
0,Inner Melbourne,21677,Middle Park,Albert Park-Middle Park-West St Kilda,1143,260,1134,260,1177,270,...,796,545,740,550,730,600,720,600,671,650
0,Inner Melbourne,22345,St Kilda West,Albert Park-Middle Park-West St Kilda,1143,260,1134,260,1177,270,...,796,545,740,550,730,600,720,600,671,650
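The same reordering can be expressed without listing every quarter by hand. A minimal sketch, assuming a toy frame with a handful of the column names above (the names `leading` and `remaining` are illustrative): the columns to move are named once and the rest keep their existing order.

```python
import pandas as pd

df = pd.DataFrame(
    columns=[
        "Suburb Cluster",
        "Suburb Group",
        "Mar 2024 Count (of suburb group)",
        "Mar 2024 Median",
        "SAL suburb",
        "SAL_CODE_2021",
    ]
)

# Columns to place first, in the desired order.
leading = ["Suburb Cluster", "SAL_CODE_2021", "SAL suburb", "Suburb Group"]

# Every other column keeps its current relative order.
remaining = [col for col in df.columns if col not in leading]
df = df[leading + remaining]
print(df.columns.tolist())
```

This approach keeps working if more quarters are added to the source spreadsheet, whereas a hard-coded list would need updating.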


In [258]:
# output the dataset
all_suburbs_rent_df.to_csv("../../data/curated/historical_rent_cleaned.csv", index=False)