# Planning Permits Analysis

This analysis uses data from https://discover.data.vic.gov.au/dataset?q=planning%20permits.


'The Victorian Building Authority (VBA) collects information from building surveyors on the number, value and type of building permits issued each month as part of its functions under the Building Act 1993'
 
First, there is a building permit activity monthly summary dataset, which has been updated to April 2021 (https://discover.data.vic.gov.au/dataset/building-permit-activity-monthly-summaries). This is an aggregated dataset that includes data visualisations tracking measures such as building use and costs.


The summary dataset aggregates data from separate annual datasets, which run to 2020 on DataVic (https://discover.data.vic.gov.au/dataset/building-permit-activity-data-2020) and up to 2021 on the Victorian Building Authority (VBA) site. Within these datasets, each record (row) represents a single permit.

These annual datasets include over 40 pieces of information per record, such as details of what is to be built (or demolished), the intended use of the building, the ownership sector, and the building costs.

In addition, the location of the building can be viewed down to the street name level, with postal codes, suburbs and regions also included.

A comprehensive data dictionary for the building permit datasets can be found on the VBA site here (https://www.vba.vic.gov.au/about/data), as well as a detailed data quality statement PDF which includes clear summaries about what the data represents.

In [1]:
import numpy as np
import pandas as pd
import requests
import json
import geopandas as gpd
import os
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable


In [2]:
# VBA DataVic data, hosted on VBA website
dataset_urls = {
    '2021':'https://www.vba.vic.gov.au/__data/assets/excel_doc/0004/143572/VBA-DataVic-Building-Permits-2021.xlsx',
    '2020':'https://www.vba.vic.gov.au/__data/assets/file/0012/110028/VBA-DataVic-Building-Permits-2020.xlsb',
    '2019':'https://www.vba.vic.gov.au/__data/assets/file/0015/103515/VBA-DataVic-Building-Permits-2019.xlsb'
}

dfs = {}

for dataset in dataset_urls:
    if dataset_urls[dataset].endswith('xlsb'):
        # to read xlsb, you need to install pyxlsb using pip at command prompt (pip install pyxlsb)
        dfs[dataset] = pd.read_excel(dataset_urls[dataset],sheet_name=1,engine='pyxlsb') 
    else:
        dfs[dataset] = pd.read_excel(dataset_urls[dataset],sheet_name=1) 
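Reading each workbook straight from its URL means every run downloads the files again, which can be slow. A minimal sketch of one way around this, assuming the files are first saved locally, is to derive a cache path and the reader engine from the URL. The helper names `local_copy_path` and `excel_engine_for` and the `data` directory are hypothetical and not part of the original analysis.

```python
from pathlib import Path
from urllib.parse import urlparse

def local_copy_path(url, cache_dir="data"):
    """Return the path where a downloaded copy of `url` would be cached."""
    return Path(cache_dir) / Path(urlparse(url).path).name

def excel_engine_for(path):
    """Return 'pyxlsb' for binary workbooks, else None (the pandas default engine)."""
    return "pyxlsb" if str(path).lower().endswith(".xlsb") else None

url = "https://www.vba.vic.gov.au/__data/assets/file/0012/110028/VBA-DataVic-Building-Permits-2020.xlsb"
path = local_copy_path(url)
print(path.name, excel_engine_for(path))
```

Once a file has been saved to `path`, it could be read with `pd.read_excel(path, sheet_name=1, engine=excel_engine_for(path))`.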


In [None]:
# Do the datasets contain the same number of variables?
for year in dfs:
    print(f'{year}: {len(dfs[year].columns)}')
    
len(dfs['2019'].columns)== len(dfs['2020'].columns) == len(dfs['2021'].columns)

In [None]:
# The 2019 dataset has more variables than the subsequent years; let's come back to 2019 

# First, let's check that the 37 variables in the 2020 and 2021 datasets are the same

for i,assertion in enumerate(dfs['2020'].columns == dfs['2021'].columns):
    if not assertion:
        print("Year  Mis-matched column")
        for year in ['2020','2021']:
            print(f'{year}: {dfs[year].columns[i]}')


In [None]:
# The column 'BASIS_BCA' is mis-spelt in the 2020 dataset; let's correct that
dfs['2020'].rename(columns={'BASIS_ BCA':'BASIS_BCA'},inplace=True)

In [None]:
# let's confirm that's fixed now:
for i,assertion in enumerate(dfs['2020'].columns == dfs['2021'].columns):
    if not assertion:
        print("Year  Mis-matched column")
        for year in ['2020','2021']:
            print(f'{year}: {dfs[year].columns[i]}')

# yes, all good!
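The positional check above relies on both frames having the same number of columns in the same order. As a complementary sketch, a set-based comparison reports names present in one list but not the other regardless of order. The function `compare_columns` is a hypothetical helper and the column lists are a shortened illustration, not the full datasets.

```python
def compare_columns(left, right):
    """Return (only_in_left, only_in_right), preserving each input's order."""
    left_set, right_set = set(left), set(right)
    only_left = [c for c in left if c not in right_set]
    only_right = [c for c in right if c not in left_set]
    return only_left, only_right

# Shortened, illustrative column lists; note the stray space in 'BASIS_ BCA'
cols_2020 = ["permit_date", "BASIS_NOW", "BASIS_ BCA"]
cols_2021 = ["permit_date", "BASIS_BCA", "BASIS_NOW"]
only_2020, only_2021 = compare_columns(cols_2020, cols_2021)
print(only_2020, only_2021)
```

With real data this could be called as `compare_columns(dfs['2019'].columns, dfs['2020'].columns)`, which also works when the two years have different numbers of columns.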

In [None]:
# Let's look at the 2019 and 2020 variables
print(dfs['2019'].columns)
print(dfs['2020'].columns)


The 2019 variables are in a different order, and spelt differently.

To determine how to proceed, let's compare:

- The VBA data dictionary (last modified 2015, at time of writing)
- The 2019 dataset columns
- The consolidated 2020/21 columns (following space correction of BASIS_BCA variable name, above)
- A combined proposed plain text variable name without special characters 

| ID | Data dictionary        |          2019          |               2020/21               | Proposed variable name        |
|----|------------------------|:----------------------:|:-----------------------------------:|-------------------------------|
| 1  | permit_stage_number    | permit_stage_number    | permit_stage_number                 | Permit Stage Number           |
| 2  | permit_date            | permit_date            | permit_date                         | Permit Date                   |
| 3  | BASIS_Month_Y          | BASIS_Month_Y          | BASIS_Month_Y                       | Year                          |
| 4  | BASIS_Month_M          | BASIS_Month_M          | BASIS_Month_M                       | Month                         |
| 5  | Reported_Levy_amount   | Reported_Levy_amount   |                                     | Reported Levy Amount          |
| 6  | Calculated_Levy_amount | Calculated_Levy_amount |                                     | Calculated Levy Amount        |
| 7  |                        |                        | Original_Levy_Paid__c               | Original Levy Paid            |
| 8  | Reported_Cost_of_works | Reported_Cost_of_works | Reported_Cost_of_works              | Reported Cost Of Works        |
| 9  | Site_street            | Site_street            | site_street_name__c                 | Site Street                   |
| 10 | Site_suburb            | Site_suburb            | site_town_suburb__c                 | Site Suburb                   |
| 11 | site_pcode             | site_pcode             | site_postcode__c                    | Site Postcode                 |
| 12 | Municipal name         | Municipal Name         | Site_Municipality                   | Municipal Name                |
| 13 | Municipal full name    | Municipal Full Name    | Municipal Full Name                 | Municipal Full Name           |
| 14 | Region                 | Region                 | Region                              | Region                        |
| 15 | Sub_Region             | Sub_Region             | Sub_Region                          | Sub Region                    |
| 16 | Sub_Region1            | Sub_Region1            | Sub_Region1                         | Sub Region1                   |
| 17 | Allotment_Area         | Allotment_Area         | Allotment_Area__c                   | Allotment Area                |
| 18 | Builder_suburb         | Builder_suburb         | Builder_Town_Suburb__c              | Builder Suburb                |
| 19 | Builder_state          | Builder_state          | Builder_State__c                    | Builder State                 |
| 20 | Builder_pcode          | Builder_pcode          | Builder_Postcode__c                 | Builder Postcode              |
| 21 | Material_Code_Floor    | Material_Code_Floor    | Floor_Material__c                   | Material Code Floor           |
| 22 | Material_Code_Frame    | Material_Code_Frame    | Frame_Material__c                   | Material Code Frame           |
| 23 | Material_Code_Roof     | Material_Code_Roof     | Roof_Cladding_Material__c           | Material Code Roof            |
| 24 | Material_Code_Walls    | Material_Code_Walls    | External_Wall_Material__c           | Material Code Walls           |
| 25 | dwellings_before_work  | dwellings_before_work  | Number_of_Existing_Dwellings__c     | Existing Dwellings            |
| 26 | dwellings_after_work   | dwellings_after_work   | Number_of_New_Dwellings__c          | New Dwellings                 |
| 27 | Number_of_storeys      | Number_of_storeys      | Number_of_Storeys__c                | Storeys                       |
| 28 | number_demolished      | number_demolished      | Number_of_Dwellings_Demolished__c   | Dwellings Demolished          |
| 29 | Floor_area             | Floor_area             | Total_Floor_Area__c                 | Floor Area                    |
| 30 | Multiple_Dwellings     | Multiple_Dwellings     |                                     | Multiple Dwellings            |
| 31 | cost_of_works_domestic | cost_of_works_domestic |                                     | Cost Of Works Domestic        |
| 32 | Permit_app_date        | Permit_app_date        | Building_Permit_Application_Date__c | Permit Application Date       |
| 33 | BACV_applicable_flag   | BACV_applicable_flag   |                                     | BACV Applicable Flag          |
| 34 | Calculated_levy_BACV   | Calculated_levy_BACV   |                                     | Calculated Levy BACV          |
| 35 |                        |                        | DBDRV Levy                          | DBDRV Levy                    |
| 36 | solar_hot_water        | solar_hot_water        | Solar_Hot_Water_Indicator__c        | Solar Hot Water               |
| 37 | rainwater_tank         | rainwater_tank         | Rainwater_Tank_Indicator__c         | Rainwater Tank                |
| 38 | est_cost_project       | est_cost_project       | Total_Estimated_Cost_of_Works__c    | Total Estimated Cost of Works |
| 39 | BASIS_Zone             | BASIS_Building_Use     | BASIS_Building_Use                  | BASIS Building Use            |
| 40 | BASIS_NOW              | BASIS_NOW              | BASIS_NOW                           | BASIS NOW                     |
| 41 | BASIS_BCA              | BASIS_BCA              | BASIS_BCA                           | BASIS BCA                     |
| 42 | BASIS_OwnershipSector  | BASIS_OwnershipSector  | BASIS_Ownership_Sector              | BASIS Ownership Sector        |
| 43 | BASIS_OwnerBuilder     | BASIS_OwnerBuilder     | BASIS_Owner_Builder                 | BASIS Owner Builder           |

Through this comparison it is apparent that:

- The 2019 dataset is mostly in accord with the data dictionary
- The data dictionary variable 'BASIS_Zone' appears to have been renamed 'BASIS_Building_Use' in the 2019, 2020 and 2021 datasets
- Reported and calculated levy amounts are not recorded in 2020/21; there is an original levy paid variable instead
- Multiple dwellings and domestic cost of works were not recorded in 2020/21
- The BACV applicable flag and calculated levy were not recorded in 2020/21; there is, however, a 'DBDRV Levy'

Now we will ensure that each year shares the same variables and names, with missing values where these variables do not directly correspond, as per the proposed variable names in the table above.

In [None]:
# create columns which did not exist with null values
dfs['2019']['Original Levy Paid'] = np.nan
dfs['2019']['DBDRV Levy'] = np.nan
for year in ['2020','2021']:
    dfs[year]['Reported Levy Amount'] = np.nan
    dfs[year]['Calculated Levy Amount'] = np.nan
    dfs[year]['Multiple Dwellings'] = np.nan
    dfs[year]['Cost Of Works Domestic'] = np.nan
    dfs[year]['BACV Applicable Flag'] = np.nan
    dfs[year]['Calculated Levy BACV'] = np.nan

# rename columns
rename_2019_to_proposed = {'permit_stage_number':'Permit Stage Number','permit_date':'Permit Date','BASIS_Month_Y':'Year','BASIS_Month_M':'Month','Reported_Levy_amount':'Reported Levy Amount','Calculated_Levy_amount':'Calculated Levy Amount','Reported_Cost_of_works':'Reported Cost Of Works','Site_street':'Site Street','Site_suburb':'Site Suburb','site_pcode':'Site Postcode','Municipal Name':'Municipal Name','Municipal Full Name':'Municipal Full Name','Region':'Region','Sub_Region':'Sub Region','Sub_Region1':'Sub Region1','Allotment_Area':'Allotment Area','Builder_suburb':'Builder Suburb','Builder_state':'Builder State','Builder_pcode':'Builder Postcode','Material_Code_Floor':'Material Code Floor','Material_Code_Frame':'Material Code Frame','Material_Code_Roof':'Material Code Roof','Material_Code_Walls':'Material Code Walls','dwellings_before_work':'Existing Dwellings','dwellings_after_work':'New Dwellings','Number_of_storeys':'Storeys','number_demolished':'Dwellings Demolished','Floor_area':'Floor Area','Multiple_Dwellings':'Multiple Dwellings','cost_of_works_domestic':'Cost Of Works Domestic','Permit_app_date':'Permit Application Date','BACV_applicable_flag':'BACV Applicable Flag','Calculated_levy_BACV':'Calculated Levy BACV','solar_hot_water':'Solar Hot Water','rainwater_tank':'Rainwater Tank','est_cost_project':'Total Estimated Cost of Works','BASIS_Building_Use':'BASIS Building Use','BASIS_NOW':'BASIS NOW','BASIS_BCA':'BASIS BCA','BASIS_OwnershipSector':'BASIS Ownership Sector','BASIS_OwnerBuilder':'BASIS Owner Builder'}
rename_202x_to_proposed = {'permit_stage_number':'Permit Stage Number','permit_date':'Permit Date','BASIS_Month_Y':'Year','BASIS_Month_M':'Month','Original_Levy_Paid__c':'Original Levy Paid','Reported_Cost_of_works':'Reported Cost Of Works','site_street_name__c':'Site Street','site_town_suburb__c':'Site Suburb','site_postcode__c':'Site Postcode','Site_Municipality':'Municipal Name','Municipal Full Name':'Municipal Full Name','Region':'Region','Sub_Region':'Sub Region','Sub_Region1':'Sub Region1','Allotment_Area__c':'Allotment Area','Builder_Town_Suburb__c':'Builder Suburb','Builder_State__c':'Builder State','Builder_Postcode__c':'Builder Postcode','Floor_Material__c':'Material Code Floor','Frame_Material__c':'Material Code Frame','Roof_Cladding_Material__c':'Material Code Roof','External_Wall_Material__c':'Material Code Walls','Number_of_Existing_Dwellings__c':'Existing Dwellings','Number_of_New_Dwellings__c':'New Dwellings','Number_of_Storeys__c':'Storeys','Number_of_Dwellings_Demolished__c':'Dwellings Demolished','Total_Floor_Area__c':'Floor Area','Building_Permit_Application_Date__c':'Permit Application Date','DBDRV Levy':'DBDRV Levy','Solar_Hot_Water_Indicator__c':'Solar Hot Water','Rainwater_Tank_Indicator__c':'Rainwater Tank','Total_Estimated_Cost_of_Works__c':'Total Estimated Cost of Works','BASIS_Building_Use':'BASIS Building Use','BASIS_NOW':'BASIS NOW','BASIS_BCA':'BASIS BCA','BASIS_Ownership_Sector':'BASIS Ownership Sector','BASIS_Owner_Builder':'BASIS Owner Builder'}

dfs['2019'].rename(columns = rename_2019_to_proposed,inplace=True)
dfs['2020'].rename(columns = rename_202x_to_proposed,inplace=True)
dfs['2021'].rename(columns = rename_202x_to_proposed,inplace=True)

# order columns
columns = ['Permit Stage Number','Permit Date','Year','Month','Reported Levy Amount','Calculated Levy Amount','Original Levy Paid','Reported Cost Of Works','Site Street','Site Suburb','Site Postcode','Municipal Name','Municipal Full Name','Region','Sub Region','Sub Region1','Allotment Area','Builder Suburb','Builder State','Builder Postcode','Material Code Floor','Material Code Frame','Material Code Roof','Material Code Walls','Existing Dwellings','New Dwellings','Storeys','Dwellings Demolished','Floor Area','Multiple Dwellings','Cost Of Works Domestic','Permit Application Date','BACV Applicable Flag','Calculated Levy BACV','DBDRV Levy','Solar Hot Water','Rainwater Tank','Total Estimated Cost of Works','BASIS Building Use','BASIS NOW','BASIS BCA','BASIS Ownership Sector','BASIS Owner Builder']
for year in ['2019','2020','2021']:
    dfs[year] = dfs[year][columns]
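As an alternative sketch of the same harmonisation, `DataFrame.reindex(columns=...)` both orders the columns and creates any missing ones filled with NaN, so the placeholder columns do not need to be added by hand. The frames and values below are made-up stand-ins and `harmonise` is a hypothetical helper; only the column names come from the table above.

```python
import pandas as pd

# Toy frames standing in for two annual extracts (values are made up)
raw_2019 = pd.DataFrame({"permit_date": ["2019-01-02"], "Reported_Levy_amount": [120.0]})
raw_2020 = pd.DataFrame({"permit_date": ["2020-03-04"], "Original_Levy_Paid__c": [95.0]})

rename_2019 = {"permit_date": "Permit Date", "Reported_Levy_amount": "Reported Levy Amount"}
rename_2020 = {"permit_date": "Permit Date", "Original_Levy_Paid__c": "Original Levy Paid"}
target_columns = ["Permit Date", "Reported Levy Amount", "Original Levy Paid"]

def harmonise(frame, rename_map, columns):
    """Rename to the shared names, then add missing columns as NaN and fix the order."""
    return frame.rename(columns=rename_map).reindex(columns=columns)

h2019 = harmonise(raw_2019, rename_2019, target_columns)
h2020 = harmonise(raw_2020, rename_2020, target_columns)
print(h2020)
```

One trade-off is that `reindex` silently drops any column not listed in `columns`, so a misspelt source name would show up as an all-NaN column rather than an error.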
    


In [None]:
# Confirm that the year variable correctly indexes each year, and year is not missing
# This is important for when we join these separate datasets to ensure they can be 
# correctly distinguished
years = ['2019','2020','2021']
for year in years:
    print(f'\n{year}')
    print(dfs[year]['Year'].describe())
    print(f'Missing year: {dfs[year]["Year"].isna().sum()}')


In [None]:
# in the data (noted in exploratory analysis below), it looks like solar hot water and rainwater tank indicators are missing for 2020/21
# looking at the data, it's clear that this is because for these years they were coded not as 0/1 binary indicators
# (as per data dictionary), but instead as N/Y string indicators; this impacted concatenation of results for these variables.

# ie. if we map 'N' and 'Y' to corresponding integers, we see values for 2020/21

# so, let's fix these values

for var in ['Solar Hot Water','Rainwater Tank']:
    for year in ['2020','2021']:
        dfs[year][var] = dfs[year][var].map({'N':0,'Y':1})
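One thing to keep in mind with `Series.map` and a dictionary is that any value not in the dictionary becomes NaN, so running the cell above a second time would turn the already-converted 0/1 values into missing values. A small sketch of a check for unmapped values follows; `map_indicator` is a hypothetical helper and the sample values are invented for illustration.

```python
import pandas as pd

def map_indicator(series, mapping):
    """Map codes to integers and report any non-null values the mapping did not cover."""
    mapped = series.map(mapping)
    unexpected = series[series.notna() & mapped.isna()].unique().tolist()
    return mapped, unexpected

raw = pd.Series(["Y", "N", None, "y"])   # toy values; the lowercase 'y' is deliberate
mapped, unexpected = map_indicator(raw, {"N": 0, "Y": 1})
print(unexpected)
```

An empty `unexpected` list would suggest the N/Y coding was applied consistently; anything else is worth inspecting before the values are overwritten.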
        


In [None]:
# Combine the datasets!
df = pd.concat(dfs)
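Because `dfs` is a dictionary, `pd.concat` uses its keys as an outer index level, so the combined frame has a two-level index (dataset key, original row number). The sketch below, on toy frames, shows how those levels can be named and how the key could be moved into an ordinary column if a flat index is preferred. The level names `Dataset` and `Row` are assumptions chosen for illustration.

```python
import pandas as pd

frames = {
    "2019": pd.DataFrame({"Year": [2019, 2019]}),
    "2020": pd.DataFrame({"Year": [2020]}),
}
# Dictionary keys become the outer index level
combined = pd.concat(frames, names=["Dataset", "Row"])
# Move the dataset key into a column and replace the index with a simple range
flat = combined.reset_index(level="Dataset").reset_index(drop=True)
print(flat)
```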

In [None]:
# Label factor variables
factor_variable_labels = {
'Permit Stage Number':{0:'no stages applicable',1:'stage 1',2:'stage 2'},
'Material Code Floor':{20:'Concrete or stone',40:'Timber',80:'Other'},
'Material Code Frame':{40:'Timber',60:'Steel',70:'Aluminium',80:'Other'},
'Material Code Roof':{10:'Tiles',20:'Concrete or slate',30:'Fibre cement',60:'Steel',70:'Aluminium',80:'Other'},
'Material Code Walls':{11:'Brick, double',12:'Brick, veneer',20:'Concrete or stone',30:'Fibre cement',40:'Timber',50:'Curtain glass',60:'Steel',70:'Aluminium',80:'Other'},
'Solar Hot Water':{0:'No',1:'Yes'},
'Rainwater Tank':{0:'No',1:'Yes',},
'BASIS NOW':{1:'New building',2:'Re-erection',3:'Extension',4:'Alteration',5:'Change of Use',6:'Demolition',7:'Removal',8:'Other'},
'BASIS Ownership Sector':{'P':'private','L':'Local Government','S':'State Government','C':'Commonwealth Government'},
'BASIS Owner Builder':{0:'registered builder',-1:'owner builder', 2:'owner builder registered',np.nan:'non-domestic'}
}

for var in factor_variable_labels:
    df[var] = df[var].map(factor_variable_labels[var])

In [None]:
# Describe the combined data:
df.describe().astype(np.int64).transpose()

In [None]:
for var in factor_variable_labels:
    print("")
    print((100*(pd.crosstab(df[var],
                      df['Year'],
                      margins=True,
                      margins_name='Total',
                     normalize='columns'))).round(1))
    

In [None]:
df.columns

# Spatial data - attempt 1, using AURIN API
Long story short, this approach was abandoned as it became apparent that suburb boundaries from the 2016 census time point were not appropriate due to rezoning for new suburbs.  

# Get ABS ASGS 2016 suburb data from AURIN open api https://aurin.org.au/resources/aurin-apis/
suburbs = 'https://openapi.aurin.org.au//public/wfs?request=getFeature&version=1.0.0&outputFormat=json&typename=aurin:datasource-AU_Govt_ABS-UoM_AURIN_DB_GeoLevel_ssc_2016_aust'
r = requests.get(suburbs)
data = r.json()


gdf = gpd.GeoDataFrame.from_features(data['features'])
gdf = gdf[gdf['state_name_2016']=='Victoria']
gdf.plot()

gdf.info()

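# remove the state suffix as a whole (str.rstrip would strip a set of characters, not a suffix)
gdf['Suburb'] = gdf.ssc_name_2016.str.replace(r'\s*\(Vic\.\)$', '', regex=True)
gdf[['ssc_name_2016','Suburb']].head()

# the data isn't quite cleaned... there's a postcode here... but just one, so that's nice!
df['Suburb'] = df['Site Suburb'].str.strip().str.replace(r'\s+VIC$', '', regex=True).str.title()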
df[['Site Suburb','Suburb']].head()


df_merge = df.merge(gdf, on = ['Suburb'], how="left")

df_merge

# Let's enumerate how many did and did not successfully match up with the official suburb data!
df_merge['ssc_name_2016'].isna().value_counts()

# As a percentage

100*(df_merge['ssc_name_2016'].isna().value_counts())/len(df_merge)

Approximately 5% of suburbs were unable to be matched, most likely due to inconsistencies in spelling and format. 
For the purposes of this analysis, this discrepancy is acceptable; however, with manual cleaning or similarity mapping it could be lessened.

# Let's look at the top unmatched suburbs.

# Looking at these, the top unmatched suburbs were gazetted after 2016 when this suburb data was current.
# Let's find some more current suburb data!

df_merge[df_merge['ssc_name_2016'].isna()]['Suburb'].sort_values().value_counts().head(50)

# Spatial data linkage using ASGS 2021 Suburbs and Localities data
Suburbs and localities data from the 2021 Australian Statistical Geography Standard release were retrieved via the Australian Bureau of Statistics ESRI MapServer API in JSON (ESRIJSON) format.  Note that reading this format requires a version of fiona with the appropriate ESRIJSON driver (i.e. > 1.8.5); this analysis was run using fiona 1.8.20.  The environment.yml file used to create the Conda environment for this analysis will be made available.

suburbs = "https://geo.abs.gov.au/arcgis/rest/services/ASGS2021/SAL/MapServer/0/query?where=UPPER(STATE_NAME_2021)%20LIKE%20%27%25VICTORIA%25%27%20&text=&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&relationParam=&outFields=&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=7899&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&returnDistinctValues=false&resultOffset=&resultRecordCount=&f=json"

# read in the suburbs
gdf = gpd.read_file(suburbs)

# confirm the spatial coordinate reference system
gdf.crs

# list the available columns
gdf.columns

# plot the 2021 Victorian suburb boundaries
gdf.plot()

In [None]:
# That does not look right!  (it's incomplete)
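One possible explanation, which was not verified here, is that the map service caps the number of records returned per request. ArcGIS REST query responses can flag this with `exceededTransferLimit`, and the `resultOffset` and `resultRecordCount` parameters (left empty in the URL above) can be used to page through results where the service supports pagination. The sketch below shows the paging loop only; `fetch_page` is a hypothetical callable and the service is replaced by a stand-in so the logic can be run offline.

```python
def fetch_all_features(fetch_page, page_size=1000):
    """Collect features page by page until the server stops reporting a transfer limit.

    `fetch_page(offset, count)` is a hypothetical callable returning the parsed JSON
    of one query made with resultOffset=offset and resultRecordCount=count.
    """
    features, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        batch = page.get("features", [])
        features.extend(batch)
        if not batch or not page.get("exceededTransferLimit", False):
            return features
        offset += len(batch)

# Stand-in for the remote service: 5 records served at most 2 at a time
records = [{"attributes": {"id": i}} for i in range(5)]

def fake_fetch(offset, count):
    chunk = records[offset:offset + min(count, 2)]
    return {"features": chunk, "exceededTransferLimit": offset + len(chunk) < len(records)}

all_features = fetch_all_features(fake_fetch)
print(len(all_features))
```

In practice `fetch_page` would wrap a `requests.get` call against the query URL; the next section avoids the issue altogether by downloading the full boundary file.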

# Spatial attempt 3

In [None]:
suburbs_url = "https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3/jul2021-jun2026/access-and-downloads/digital-boundary-files/SAL_2021_AUST_GDA2020_SHP.zip"
suburbs_shp_zip = 'ABS_ASGS_2021_SAL.shp.zip'
def download_url(url, save_path, chunk_size=128):
    # stream the response to disk in chunks, failing early on an HTTP error
    r = requests.get(url, stream=True)
    r.raise_for_status()
    with open(save_path, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=chunk_size):
            fd.write(chunk)

if not os.path.exists(os.path.abspath(suburbs_shp_zip)):
    download_url(suburbs_url,suburbs_shp_zip)


In [None]:
# read in the suburbs
gdf = gpd.read_file(f"zip://{os.path.abspath(suburbs_shp_zip)}")
gdf = gdf[gdf['STE_NAME21']=='Victoria']
# the boundary file is distributed in GDA2020 (EPSG:7844); set this explicitly
gdf = gdf.set_crs(epsg=7844, allow_override=True)
gdf

In [None]:
# remove the state suffix as a whole (str.rstrip would strip a set of characters, not a suffix)
gdf.loc[:,'Suburb'] = gdf.SAL_NAME21.str.replace(r'\s*\(Vic\.\)$', '', regex=True)
gdf[['SAL_NAME21','Suburb']].head()


In [None]:
# Clean planning permits data suburbs (following a previous in depth look to inform this replacement strategy)
# In order:
#   - remove leading and trailing white space if any (doesn't appear to be, but could have been)    
#   - remove a common unnecessary suffix that won't match
#   - ensure title case (mostly helpful, but causes some complications, eg with 'McCrae', but deal with later)
df.loc[:,'Suburb'] = df['Site Suburb'] \
    .str.strip() \
    .str.replace(r'\s+VIC$', '', regex=True) \
    .str.title()

df[['Site Suburb','Suburb']].sort_values('Suburb').head(20)
# The building permits data isn't quite cleaned... there's a postcode here... but just one, so that's nice!  Also a few empties
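Suffix removal deserves some care when cleaning names like these: Python's `str.rstrip` (and the pandas `.str.rstrip` equivalent) treats its argument as a set of characters to strip, not as a literal suffix, so names that happen to end in one of those characters can be shortened. The short sketch below illustrates the difference with a plain-Python helper; `strip_suffix` is a hypothetical name.

```python
def strip_suffix(name, suffix):
    """Remove `suffix` only when `name` actually ends with it."""
    return name[: -len(suffix)] if suffix and name.endswith(suffix) else name

label = "Wonthaggi (Vic.)"
print(label.rstrip(" (Vic.)"))         # character-set stripping also removes the final 'i'
print(strip_suffix(label, " (Vic.)"))  # suffix removal leaves the name intact
```

The same applies to an upper-case ` VIC` suffix, where a name ending in `C` or `I` would lose its last letter under character-set stripping.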


In [None]:
# Ensure that the title case for "Mc" type suburbs includes uppercase following this (ie. McCrae, not Mccrae)
df.loc[df['Suburb'].str.startswith('Mc', na=False),'Suburb'] = df.loc[df['Suburb'].str.startswith('Mc', na=False),'Suburb'].apply(lambda x: f"{x[:2]}{x[2:].title()}")

df.loc[df['Suburb'].str.startswith('Mc', na=False),['Site Suburb','Suburb']]


In [None]:
# join the permits data with digital boundaries for suburbs and localities
gdf_suburbs = gdf[['SAL_NAME21','Suburb','geometry']].merge(df, on = ['Suburb'], how="right")
gdf_suburbs[['Suburb','geometry']].drop_duplicates().plot()


In [None]:
# Let's enumerate how many did and did not successfully match up with the official suburb data!
print(f"Suburbs not matched with geometry\n")
print(f"Counts:\n{gdf_suburbs['SAL_NAME21'].isna().value_counts()}\n")
print(f"Percentages:\n{100*(gdf_suburbs['SAL_NAME21'].isna().value_counts())/len(gdf_suburbs)}")
# About 1.4% not matched
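Another way to count unmatched records, sketched here on toy data, is the `indicator=True` option of `DataFrame.merge`, which adds a `_merge` column recording whether each row was found in the left frame only, the right frame only, or both. The frames below are invented stand-ins for the permits and boundary tables.

```python
import pandas as pd

# Toy stand-ins: four permits, and boundaries for two of the three suburbs
permits = pd.DataFrame({"Suburb": ["Carlton", "Carlton", "Newtown", "Fitzroy"]})
boundaries = pd.DataFrame({"Suburb": ["Carlton", "Fitzroy"],
                           "SAL_NAME21": ["Carlton", "Fitzroy"]})

matched = permits.merge(boundaries, on="Suburb", how="left", indicator=True)
unmatched_share = 100 * (matched["_merge"] == "left_only").mean()
print(f"{unmatched_share:.1f}% of permits unmatched")
```

This avoids relying on a particular boundary column being null to detect a failed match.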

In [None]:
# Let's look at the top unmatched suburbs -- many seem to be due to ambiguities...
gdf_suburbs[gdf_suburbs['SAL_NAME21'].isna()]['Suburb'].sort_values().value_counts()

In [None]:
len(gdf_suburbs[gdf_suburbs['SAL_NAME21'].isna()]['Suburb'].unique())

In [None]:
df.loc[df['Suburb']=='Newtown',['Site Suburb', 'Site Postcode',
       'Municipal Name', 'Municipal Full Name', 'Region', 'Sub Region',
       'Sub Region1']].drop_duplicates()

In [None]:
# gdf_suburbs.loc[gdf_suburbs['SAL_NAME21'].isna()].apply(lambda x: f'{x["Suburb"]} ({x["Municipal Name"]} - Vic.)')

#gdf_suburbs[gdf_suburbs['SAL_NAME21'].isna()]['Suburb','Site Suburb', 'Site Postcode',
#       'Municipal Name', 'Municipal Full Name', 'Region', 'Sub Region',
#       'Sub Region1']

#df.reset_index(-2)

In [None]:
gdf[gdf.SAL_NAME21.str.startswith("Newtown")]

In [None]:
gdf[gdf.SAL_NAME21.str.startswith("Hillside")]

In [None]:
# Create a constant as a non NA utility variable to use for aggregation
gdf_suburbs['constant'] = 1
# Aggregate by Year and Suburb, getting counts of permits by suburb for each year, and dropping any NA values
suburb_year_counts = gdf[['SAL_NAME21','Suburb','geometry']].merge(gdf_suburbs.groupby(['Year','Suburb'])['constant'].sum().reset_index(), on = ['Suburb'], how="left")
suburb_year_counts.rename(columns={'constant':'Permits'},inplace=True)
#suburb_year_counts = suburb_year_counts.dropna()
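The permit counts can also be obtained without a helper column, using `groupby(...).size()`, which counts rows per group. This is a sketch on made-up data rather than a change to the analysis above; note that rows with a missing group key are left out of the result by default.

```python
import pandas as pd

# Toy permits table; the final row has no suburb recorded
permits = pd.DataFrame({
    "Year": [2019, 2019, 2020, 2020, 2020],
    "Suburb": ["Carlton", "Carlton", "Carlton", "Fitzroy", None],
})
counts = permits.groupby(["Year", "Suburb"]).size().reset_index(name="Permits")
print(counts)
```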

In [None]:
# OPTIONAL: Display using geopandas
for year in suburb_year_counts.Year.unique():
    fig, ax = plt.subplots(1,1, figsize=(20,20))
    divider = make_axes_locatable(ax)
    cax = divider.append_axes("right", size="3%", pad=-1) #resize the colorbar
    suburb_year_counts.loc[suburb_year_counts['Year']==year].plot(
        column='Permits', 
        ax=ax,
        cax=cax,  
        legend=True,
        legend_kwds={'label': f"Planning permits {year}"})
    #suburb_year_counts.geometry.boundary.plot(color='#BABABA', ax=ax, linewidth=0.3) #Add some borders to the geometries
    ax.axis('off')

In [None]:
#fig = px.choropleth(suburb_year_counts,
#                   locations=suburb_year_counts.index,
#                   geojson=suburb_year_counts['geometry'].to_crs(epsg=4326).__geo_interface__,
#                   animation_frame=suburb_year_counts['Year'],
#                   color=suburb_year_counts["Permits"])
#fig.write_html('test.html')

In [None]:
## Set the data for the map
#data = go.Choropleth(
#        geojson = suburb_year_counts['geometry'].__geo_interface__,             #this is your GeoJSON
#        locations = suburb_year_counts.index,    #the index of this dataframe should align with the 'id' element in your geojson
#        z = df_merged.percent_unemployed, #sets the color value
#        text = df_merged.LGA_NAME20,    #sets text for each shape
#        colorbar=dict(thickness=20, ticklen=3, tickformat='%',outlinewidth=0), #adjusts the format of the colorbar
#        marker_line_width=1, marker_opacity=0.7, colorscale="Viridis", #adjust format of the plot
#        zmin=zmin, zmax=zmax,           #sets min and max of the colorbar
#        hovertemplate = "<b>%{text}</b><br>" +
#                    "%{z:.0%}<br>" +
#                    "<extra></extra>")  # sets the format of the text shown when you hover over each shape
#
## Set the layout for the map
#layout = go.Layout(
#    title = {'text': f"Population of Victoria, Australia",
#            'font': {'size':24}},       #format the plot title
#    mapbox1 = dict(
#        domain = {'x': [0, 1],'y': [0, 1]}, 
#        center = dict(lat=-36.5 , lon=145.5),
#        accesstoken = MAPBOX_ACCESSTOKEN, 
#        zoom = 6),                      
#    autosize=True,
#    height=650,
#    margin=dict(l=0, r=0, t=40, b=0))
#
## Generate the map
#fig=go.Figure(data=data, layout=layout)
#fig.show()

In [None]:
#fig

In [None]:
# project to meters (GDA94 VicGrid 94, EPSG 3111), then simplify to nearest meter, and reproject to lat lon for plotting
#suburb_year_counts.geometry