### Session setup

Import required modules

In [1]:
import pandas as pd
import re

### Read in and format data

Read in data from [Brisbane City Council](https://www.brisbane.qld.gov.au/clean-and-green/rubbish-tips-and-bins/rubbish-collections/kerbside-large-item-collection-service) (kerbside collection 2025-26, from BCC open data), [QGSO](https://www.qgso.qld.gov.au/geographies-maps/concordances/place-names-concordance) (Place Names Concordance 2022, used under CC-BY), and [ABS](https://www.abs.gov.au/statistics/people/people-and-communities/socio-economic-indexes-areas-seifa-australia/2021) (SEIFA SA2 Indexes, used under CC-BY). The Index of Relative Socio-economic Advantage and Disadvantage is used because the interest is in areas that have high access to material and social resources.

In [2]:
kerbside = pd.read_csv("../data/kerbside-large-item-collection-schedule 2025-26.csv", usecols=["Suburb", "Date of Collection"])
place_concord = pd.read_excel("../data/place-names-concordance-2022-edn.xlsx", sheet_name = "Place Names Concordance 2022")
seifa_21 = pd.read_excel("../data/Statistical Area Level 2, Indexes, SEIFA 2021.xlsx", sheet_name = "Table 3", skiprows = 4)

Check that the data was read in as expected

In [3]:
kerbside.head()

Unnamed: 0,Suburb,Date of Collection
0,ALGESTER,2025-07-14
1,CALAMVALE,2025-07-14
2,PARKINSON,2025-07-14
3,TARINGA,2025-07-21
4,AUCHENFLOWER,2025-07-21


In [4]:
place_concord.head()

Unnamed: 0,Place name (2021),Alternative place name (2021),Place type (2021),Place name longitude (GDA2020),Place name latitude (GDA2020),Suburb (2022),Postcode (2019),LGA code (2021),LGA name (2021),SA1 code (2016),...,CED name (2019),ILOC code (2021),ILOC name (2021),IARE code (2021),IARE name (2021),IREG code (2021),IREG name (2021),UCL code (2021),UCL name (2021),HHS name (2015)
0,A Creek,,STRM,151.584442,-24.648333,Gindoran,4676,33360,Gladstone (R),30805153006,...,Flynn,30500602,Gladstone - South Coast,305006,Gladstone,305,Rockhampton,331001,Remainder of State/Territory (Qld),Wide Bay
1,A Creek,,STRM,151.2,-25.5,Coonambula,4626,35760,North Burnett (R),31902150304,...,Flynn,30500803,North Burnett - Rural,305008,North Burnett,305,Rockhampton,331001,Remainder of State/Territory (Qld),Wide Bay
2,A Flat Creek,,STRM,152.317146,-26.374427,Manumbar,4601,33620,Gympie (R),31903151506,...,Wide Bay,30601101,Nanango - Kilkivan,306011,Nanango - Kilkivan,306,Toowoomba - Roma,331001,Remainder of State/Territory (Qld),Sunshine Coast
3,A W Creek,,STRM,142.94972,-13.00361,Lockhart River,4892,34570,Lockhart River (S),31501139615,...,Leichhardt,30300601,Lockhart River,303006,Lockhart River,303,Cape York,331001,Remainder of State/Territory (Qld),Torres and Cape
4,Aarons Folly Gully,,STRM,148.66667,-21.8,Strathfield,4742,33980,Isaac (R),31201133807,...,Capricornia,30500703,Nebo - Clermont,305007,Nebo - Clermont,305,Rockhampton,331001,Remainder of State/Territory (Qld),Mackay


Rename the columns that will be carried through to the output

In [3]:
place_concord.rename(columns={"Place name (2021)": "suburb"
                             , "SA2 code (2021)": "sa2_21_code"}
                    , inplace=True)

In [None]:
kerbside.rename(columns={"Suburb": "suburb"
                         , "Date of Collection": "collection_date"}
                , inplace=True)

In [7]:
seifa_21.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Ranking within Australia,Unnamed: 6,Unnamed: 7,Unnamed: 8,Ranking within State or Territory,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15
0,2021 Statistical Area Level 2 (SA2) 9-Digit Code,2021 Statistical Area Level 2 (SA2) Name,Usual Resident Population,Score,,Rank,Decile,Percentile,,State,Rank,Decile,Percentile,Minimum score for SA1s in area,Maximum score for SA1s in area,% Usual Resident Population without an SA1 lev...
1,101021007,Braidwood,4343,1000.677063,,1219,6,52,,NSW,306,5,49,948.45379,1072.300317,0
2,101021008,Karabar,8517,982.313373,,1029,5,44,,NSW,274,5,44,752.918555,1115.451518,0
3,101021009,Queanbeyan,11342,998.123224,,1193,6,51,,NSW,301,5,49,926.97065,1080.489718,0.001587
4,101021010,Queanbeyan - East,5085,1014.994386,,1357,6,58,,NSW,341,6,55,927.615874,1136.308168,0.006686


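Rename columns for easier referencing. The spreadsheet header spans two rows, so most columns are read in as `Unnamed: n` and the real labels sit in the first data row. Only the columns that are used later in the code are renamed.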

In [5]:
seifa_21.rename(
    columns={"Unnamed: 0": "sa2_21_code"
            , "Unnamed: 1": "sa2_21_name"
            , "Ranking within State or Territory": "state"
            , "Unnamed: 10": "state_rank"
            , "Unnamed: 11": "state_decile"
            , "Unnamed: 12": "state_percentile"
            , "Unnamed: 13": "min_sa1_score"
            , "Unnamed: 14": "max_sa1_score"}
    , inplace=True
)
seifa_21.dropna(subset=["sa2_21_name"], how = "all", inplace=True)

Confirm through visual inspection that the desired columns and names have been updated.

In [9]:
seifa_21.tail()

Unnamed: 0,sa2_21_code,sa2_21_name,Unnamed: 2,Unnamed: 3,Unnamed: 4,Ranking within Australia,Unnamed: 6,Unnamed: 7,Unnamed: 8,state,state_rank,state_decile,state_percentile,min_sa1_score,max_sa1_score,Unnamed: 15
2349,801111141,Namadgi,63,932.283577,,527,3,23,,ACT,2.0,1.0,2.0,932.283577,932.283577,0.0
2350,901011001,Christmas Island,1692,972.374332,,934,4,40,,OT,,,,952.023988,1011.096241,0.199173
2351,901021002,Cocos (Keeling) Islands,593,903.126462,,288,2,13,,OT,,,,842.198729,1082.17939,0.023609
2352,901031003,Jervis Bay,310,905.141211,,303,2,13,,OT,,,,783.240038,1064.184148,0.035484
2353,901041004,Norfolk Island,2188,957.778736,,799,4,34,,OT,,,,918.656647,990.531554,0.003199


Reduce the SEIFA data to QLD only for more reliable matching, as some locality names occur both within and outside QLD. Also confirm that the filter worked by checking that the first digit of every SA2 code is 3 (the ABS state code for QLD).

In [6]:
seifa_21 = seifa_21[seifa_21["state"] == "QLD"]
all((seifa_21["sa2_21_code"] >= 300000000) & (seifa_21["sa2_21_code"] < 400000000))

True
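As a rough illustration of the state check above, the leading digit of a 9-digit SA2 code can be pulled out with integer division. This is a plain-Python sketch; `sa2_state_digit` is a hypothetical helper name, and the three codes are copied from outputs and cells elsewhere in this notebook.

```python
def sa2_state_digit(sa2_code: int) -> int:
    """Return the leading digit of a 9-digit SA2 code (hypothetical helper)."""
    return sa2_code // 100_000_000


# Two QLD codes and one NSW code that appear elsewhere in this notebook
codes = [302031037, 310031290, 101021007]

# Keep only the codes whose leading digit is 3 (QLD)
qld_only = [code for code in codes if sa2_state_digit(code) == 3]
print(qld_only)
```

The range comparison in the cell above tests the same condition without extracting the digit.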

Reduce the place names concordance to Brisbane localities only. The SA2 code in the file is tied to the place name column, so the place name will be used to attach an SA2 code to each suburb in the scraped kerbside list.

In [7]:
# .copy() avoids a SettingWithCopyWarning when the suburb column is edited later
place_concord = place_concord[place_concord["LGA code (2021)"] == 31000].copy()

### Clean data

The kerbside suburb names are in upper case. Convert them to title case so they match the casing used in the place names concordance.

In [8]:
kerbside["suburb"] = kerbside["suburb"].str.title()
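To see what title-casing does to names with internal capitals, here is a small plain-Python sketch. It assumes that pandas `str.title()` behaves like Python's built-in `str.title()`, which is how the pandas documentation describes it. The sample names are upper-case versions of suburbs that appear in this notebook.

```python
# Title-casing capitalises only the first letter of each word,
# so internal capitals such as the "G" in "MacGregor" are not restored.
raw_names = ["ALGESTER", "THE GAP", "MACGREGOR", "MCDOWALL"]
titled = [name.title() for name in raw_names]
print(titled)
```

Names such as Macgregor and Mcdowall will therefore differ from any source that spells them MacGregor and McDowall.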

Identify any kerbside suburbs whose names do not appear in the concordance. Kerbside suburbs will be matched to the SEIFA deciles by locality name, so every name needs a counterpart.

In [9]:
locality_check = kerbside["suburb"].isin(place_concord["suburb"])
kerbside[~locality_check]

Unnamed: 0,suburb,collection_date
9,Chuwar,2025-07-28
39,Macgregor,2025-09-01
70,Albion,2025-10-27
79,Ascot,2025-11-03
90,Mcdowall,2025-12-01
108,The Gap,2026-02-02
120,Red Hill,2026-02-16
147,West End,2026-04-20
172,Mackenzie,2026-06-08


Identify why these kerbside suburbs don't match the concordance place names

In [10]:
missing_localities = place_concord["suburb"].str.contains("chuwar|macgregor|albion|ascot|mcdowall|gap|red hill|west end|mackenzie", flags=re.IGNORECASE)
place_concord[missing_localities]

Unnamed: 0,suburb,Alternative place name (2021),Place type (2021),Place name longitude (GDA2020),Place name latitude (GDA2020),Suburb (2022),Postcode (2019),LGA code (2021),LGA name (2021),SA1 code (2016),...,CED name (2019),ILOC code (2021),ILOC name (2021),IARE code (2021),IARE name (2021),IREG code (2021),IREG name (2021),UCL code (2021),UCL name (2021),HHS name (2015)
210,Albion (Brisbane City),,SUB,153.04417,-27.43361,Albion,4010,31000,Brisbane (C),30503111913,...,Brisbane,30100203,Brisbane City - Inner North-West,301002,Brisbane City,301,Brisbane,301001,Brisbane,Metro North
787,Ascot (Brisbane City),,SUB,153.06389,-27.42972,Ascot,4007,31000,Brisbane (C),30503112112,...,Brisbane,30100203,Brisbane City - Inner North-West,301002,Brisbane City,301,Brisbane,301001,Brisbane,Metro North
6656,Chuwar (Brisbane City),,LOCB,152.77278,-27.54917,Chuwar,4306,31000,Brisbane (C),31003128811,...,Blair,30100209,Brisbane City - Outer West,301002,Brisbane City,301,Brisbane,331001,Remainder of State/Territory (Qld),West Moreton
12238,Gap Creek,,STRM,152.915584,-27.497826,Brookfield,4069,31000,Brisbane (C),30402108714,...,Ryan,30100209,Brisbane City - Outer West,301002,Brisbane City,301,Brisbane,301001,Brisbane,Metro North
18735,MacGregor,,SUB,153.07583,-27.565,MacGregor,4109,31000,Brisbane (C),30303106110,...,Moreton,30100208,Brisbane City - Outer South,301002,Brisbane City,301,Brisbane,301001,Brisbane,Metro South
18766,Mackenzie (Brisbane City),,SUB,153.13028,-27.53583,Mackenzie,4156,31000,Brisbane (C),30303106401,...,Bonner,30100212,Brisbane City - Outer East,301002,Brisbane City,301,Brisbane,331001,Remainder of State/Territory (Qld),Metro South
19529,McDowall,,SUB,152.99389,-27.37889,McDowall,4053,31000,Brisbane (C),30201102610,...,Lilley,30100207,Brisbane City - Outer North,301002,Brisbane City,301,Brisbane,301001,Brisbane,Metro North
27359,Red Hill (Brisbane City),,SUB,153.00389,-27.45111,Red Hill,4059,31000,Brisbane (C),30504113609,...,Brisbane,30100202,Brisbane City - Inner North,301002,Brisbane City,301,Brisbane,301001,Brisbane,Metro North
32747,The Gap (Brisbane City),,SUB,152.94444,-27.44167,The Gap,4061,31000,Brisbane (C),30404110309,...,Ryan,30100207,Brisbane City - Outer North,301002,Brisbane City,301,Brisbane,301001,Brisbane,Metro North
35608,West End (Brisbane City),,SUB,153.00667,-27.48306,West End,4101,31000,Brisbane (C),30501111208,...,Griffith,30100204,Brisbane City - Inner South,301002,Brisbane City,301,Brisbane,301001,Brisbane,Metro South


Remove the text in brackets so the place names match the kerbside localities.
MacGregor and McDowall also need their internal capitals lowered (to Macgregor and Mcdowall) to match the title-cased kerbside file.

In [14]:
place_concord["suburb"] = place_concord["suburb"].str.replace(r"\(.+\)", "", regex=True)
#place_concord["suburb"] = place_concord["suburb"].str.replace(r"\bMt\b", "Mount", regex=True)
place_concord.loc[place_concord["suburb"] == "MacGregor", "suburb"] = "Macgregor"
place_concord.loc[place_concord["suburb"] == "McDowall", "suburb"] = "Mcdowall"
place_concord["suburb"] = place_concord["suburb"].str.strip()
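An alternative to fixing individual names by hand would be to compare both sources on a normalised key. The sketch below is plain Python and only illustrates the idea; `match_key` is a hypothetical helper, and the sample names are copied from the outputs above.

```python
import re


def match_key(name: str) -> str:
    """Hypothetical helper: drop a bracketed qualifier and ignore letter case."""
    without_brackets = re.sub(r"\s*\(.+\)", "", name)
    return without_brackets.strip().casefold()


concordance_names = ["Albion (Brisbane City)", "MacGregor", "McDowall", "The Gap (Brisbane City)"]
kerbside_names = ["Albion", "Macgregor", "Mcdowall", "The Gap"]

# Compare the two sources on the normalised key rather than the raw name
concordance_keys = {match_key(name) for name in concordance_names}
unmatched = [name for name in kerbside_names if match_key(name) not in concordance_keys]
print(unmatched)
```

With a key like this the original spellings can stay untouched in both tables, at the cost of carrying one extra column through the join.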

Repeat check to confirm that place names were updated correctly

In [18]:
locality_check = kerbside["suburb"].isin(place_concord["suburb"])
kerbside[~locality_check]

Unnamed: 0,suburb,collection_date


Combine the kerbside locality information with the place concordance locality information to get the SA2 code for each area. The SA2 code is required to look up the SEIFA decile. A left join is used so that every suburb in the kerbside list is kept for completeness.

In [19]:
kerbside_sa2 = kerbside.merge(place_concord, on="suburb", how="left")
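One way to check a left join, sketched here on toy data and assuming pandas is available, is the `indicator=True` option of `merge`. It adds a `_merge` column that marks rows found only in the left table. The Moggill and Oxley codes are copied from an output elsewhere in this notebook; "Example Suburb" is made up to force an unmatched row.

```python
import pandas as pd

# Toy frames: "Example Suburb" is hypothetical and has no concordance entry
kerb = pd.DataFrame({"suburb": ["Moggill", "Oxley", "Example Suburb"]})
concord = pd.DataFrame(
    {"suburb": ["Moggill", "Oxley"], "sa2_21_code": [304021086, 310011275]}
)

# indicator=True adds a "_merge" column with "both" or "left_only" for a left join
checked = kerb.merge(concord, on="suburb", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "suburb"].tolist()
print(unmatched)
```

Here the left table keeps all three rows, and the unmatched suburb is flagged instead of silently carrying a missing SA2 code.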


Confirm that merge was successful

In [21]:
print(kerbside_sa2.head(10))
print(kerbside_sa2.dtypes)
print(kerbside_sa2.shape)

           suburb collection_date Alternative place name (2021)  \
0        Algester      2025-07-14                           NaN   
1       Calamvale      2025-07-14                           NaN   
2       Parkinson      2025-07-14                           NaN   
3         Taringa      2025-07-21                           NaN   
4    Auchenflower      2025-07-21                           NaN   
5        St Lucia      2025-07-21                           NaN   
6          Milton      2025-07-21                           NaN   
7  Pinjarra Hills      2025-07-28                           NaN   
8      Bellbowrie      2025-07-28                           NaN   
9          Chuwar      2025-07-28                           NaN   

  Place type (2021)  Place name longitude (GDA2020)  \
0               SUB                       153.03361   
1               SUB                       153.04806   
2               SUB                       153.02917   
3               SUB                       

Check whether any SA2 codes are missing from the SEIFA advantage/disadvantage index, as the SA2 code will be used to join the collection list to the SEIFA information.

In [23]:
chk = kerbside_sa2["sa2_21_code"].isin(seifa_21["sa2_21_code"])
print(kerbside_sa2[~chk][["suburb", "sa2_21_code"]])
print(seifa_21[seifa_21["sa2_21_name"].str.contains("Kholo|Lake Manchester|Pinkenba|Lytton")])

              suburb  sa2_21_code
11             Kholo    310021279
16   Lake Manchester    310021279
17   Lake Manchester    310021279
78          Pinkenba    302031036
142           Lytton    301031014
     sa2_21_code            sa2_21_name Unnamed: 2   Unnamed: 3  Unnamed: 4  \
1175   302031037  Eagle Farm - Pinkenba       2076  1087.427601         NaN   

     Ranking within Australia Unnamed: 6 Unnamed: 7  Unnamed: 8 state  \
1175                     1965          9         84         NaN   QLD   

     state_rank state_decile state_percentile min_sa1_score max_sa1_score  \
1175        475            9               89    934.991758    1118.57369   

     Unnamed: 15  
1175           0  


Checking against the ABS community profiles supports reassigning Kholo and Pinkenba to populated neighbouring SA2s. Lake Manchester and Lytton are left unchanged, so they will have no SEIFA values after the join.

In [24]:
kerbside_sa2.loc[kerbside_sa2["suburb"] == "Pinkenba", "sa2_21_code"] = 302031037 #Eagle Farm - Pinkenba code
kerbside_sa2.loc[kerbside_sa2["suburb"] == "Kholo", "sa2_21_code"] = 310031290 #Karana Downs
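If more manual reassignments are needed, it may be easier to keep them in one mapping. The following plain-Python sketch shows the idea on a few rows copied from the output above; `sa2_overrides` is a hypothetical name, and the replacement codes are the two used in the cell above.

```python
# Overrides copied from the cell above: suburb -> replacement SA2 code
sa2_overrides = {"Pinkenba": 302031037, "Kholo": 310031290}

rows = [
    {"suburb": "Pinkenba", "sa2_21_code": 302031036},
    {"suburb": "Kholo", "sa2_21_code": 310021279},
    {"suburb": "Lytton", "sa2_21_code": 301031014},
]

# Replace the code only where an override exists; otherwise keep the original
for row in rows:
    row["sa2_21_code"] = sa2_overrides.get(row["suburb"], row["sa2_21_code"])

print(rows)
```

Suburbs without an override, such as Lytton here, keep their original code.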

Add the SEIFA deciles to the kerbside dates. A left join is used so that every suburb in the kerbside list is kept for completeness.

In [None]:
kerbside_sa2 = kerbside_sa2.merge(seifa_21, on="sa2_21_code", how="left")

print(kerbside_sa2.head(10))
print(kerbside_sa2.dtypes)
print(kerbside_sa2.shape)

Check for any duplicates introduced from the merge

In [26]:
possible_duplicates = kerbside_sa2.duplicated(
    ["sa2_21_code", "suburb", "state_rank"]
    , keep = False)
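# Inspect the flagged rows; print() is needed because this is not the last expression in the cell
print(kerbside_sa2.loc[possible_duplicates, ["suburb", "Postcode (2019)", "state_rank", "sa2_21_code"]])
# These are all true duplicates, so keeping the first or the last occurrence makes no difference
kerbside_sa2.drop_duplicates(["sa2_21_code", "suburb", "state_rank"], keep="first", inplace=True)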
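A toy illustration of the two calls used above, assuming pandas is available. The repeated Lake Manchester rows mirror the repeated rows in an earlier output, and Kholo (which shares the same SA2 code there) is included only for contrast.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "suburb": ["Lake Manchester", "Lake Manchester", "Kholo"],
        "sa2_21_code": [310021279, 310021279, 310021279],
    }
)

# keep=False flags every member of a duplicate group, which suits inspection
flagged = toy.duplicated(["suburb", "sa2_21_code"], keep=False)

# keep="first" retains one row per group and drops the rest
deduped = toy.drop_duplicates(["suburb", "sa2_21_code"], keep="first")
print(flagged.tolist(), len(deduped))
```

Kholo is not flagged even though its SA2 code repeats, because rows count as duplicates only when they match on every listed column.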

Some suburbs still appear more than once because they map to more than one SA2. Manually drop the rows whose SA2 name does not correspond to the suburb. A string distance measure between the suburb and the SA2 name might be a way to automate this.

In [27]:
kerbside_sa2["key"] = kerbside_sa2["suburb"] + "~" + kerbside_sa2["sa2_21_code"].astype("str")

kerbside_sa2.loc[kerbside_sa2["suburb"].duplicated(keep=False)
    , ["suburb", "Postcode (2019)", "state_rank", "sa2_21_name", "key"]]

Unnamed: 0,suburb,Postcode (2019),state_rank,sa2_21_name,key
18,Moggill,4069,530,Pinjarra Hills - Pullenvale,Moggill~304021091
19,Moggill,4070,482,Bellbowrie - Moggill,Moggill~304021086
43,Oxley,4074,215,Darra - Sumner,Oxley~310011271
44,Oxley,4075,436,Oxley (Qld),Oxley~310011275
91,Nundah,4017,423,Sandgate - Shorncliffe,Nundah~302041044
92,Nundah,4012,417,Nundah,Nundah~302031040
94,Kedron,4031,470,Kedron - Gordon Park,Kedron~302021031
95,Kedron,4032,429,Chermside West,Kedron~302021029
138,Bulimba,4152,519,Camp Hill,Bulimba~303011047
139,Bulimba,4171,527,Bulimba,Bulimba~305021114


In [28]:
keys_remove = pd.Series(["Moggill~304021091", "Oxley~310011271", "Nundah~302041044" 
                         , "Kedron~302021029", "Bulimba~303011047", "Tingalpa~301011002", "South Brisbane~305011107"
                         , "Yeerongpilly~303061078"])
kerbside_sa2 = kerbside_sa2.loc[~(kerbside_sa2["key"].isin(keys_remove))]
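The string-distance idea mentioned above could be sketched with the standard library's `difflib`. This is only one possible similarity measure and has not been checked against the full data set. `similarity` is a hypothetical helper, and the suburb and SA2 name pairs are copied from the duplicate listing above.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0 and 1 (hypothetical helper)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Suburb -> candidate SA2 names, copied from the duplicate listing above
candidates = {
    "Moggill": ["Pinjarra Hills - Pullenvale", "Bellbowrie - Moggill"],
    "Oxley": ["Darra - Sumner", "Oxley (Qld)"],
    "Nundah": ["Sandgate - Shorncliffe", "Nundah"],
    "Kedron": ["Kedron - Gordon Park", "Chermside West"],
    "Bulimba": ["Camp Hill", "Bulimba"],
}

# For each suburb, keep the SA2 name that looks most like the suburb name
best = {
    suburb: max(names, key=lambda name: similarity(suburb, name))
    for suburb, names in candidates.items()
}
print(best)
```

For these five suburbs the highest-scoring SA2 name is the one whose key is not in `keys_remove`. A check of this kind could flag candidates for review rather than replace the manual list outright.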

### User requirements and export data

Sorting by date makes it easy to see which suburbs are coming up

In [30]:
kerbside_sa2["collection_date"] = pd.to_datetime(kerbside_sa2["collection_date"], format = "%Y-%m-%d")
clean_kerb = kerbside_sa2.sort_values("collection_date")
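Once the dates are parsed and sorted, the upcoming collections can be picked out by comparing against a reference date. A minimal plain-Python sketch: the suburbs and dates are copied from the kerbside output near the top of the notebook, and `today` is a fixed, hypothetical reference date chosen for the example.

```python
from datetime import date

# (suburb, collection date) pairs copied from the kerbside output
schedule = [("Taringa", "2025-07-21"), ("Algester", "2025-07-14"), ("Chuwar", "2025-07-28")]

# Parse the ISO date strings and sort chronologically
parsed = sorted((date.fromisoformat(day), suburb) for suburb, day in schedule)

today = date(2025, 7, 20)  # hypothetical reference date
upcoming = [(suburb, day.isoformat()) for day, suburb in parsed if day >= today]
print(upcoming)
```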

Export as CSV for quick sharing. In addition to the state-level SA2 SEIFA rank, decile and percentile, the minimum and maximum SA1 scores are kept to reflect the variability in each suburb's level of advantage or disadvantage.

In [31]:
clean_kerb = clean_kerb[["suburb", "collection_date", "state_rank", "state_decile", "state_percentile", "min_sa1_score", "max_sa1_score"]]
clean_kerb.to_csv(
    "../data/kerbside collection 2025-26 with state seifa 2021 decile and rank.csv"
    , index=False
)