# Downloading external data

The Census dataset can be found here:
- https://www.abs.gov.au/census/find-census-data/datapacks?release=2021&product=GCP&geography=POA&header=S

In [1]:
import requests, zipfile
from urllib.request import urlretrieve
from io import BytesIO
import wget
import os
import pandas as pd
import geopandas as gpd

In [8]:
def mkdir(path):
    # Create the directory (and any missing parents); do nothing if it already exists
    os.makedirs(path, exist_ok=True)

# Census data

### General Community Profile Datapack

In [9]:
output_dir = '../data/raw/census/'

# mkdir only creates the directory if it does not already exist
mkdir(output_dir)

In [10]:
# Download the zip file and extract it into output_dir
response = requests.get("https://www.abs.gov.au/census/find-census-data/datapacks/download/2021_GCP_POA_for_AUS_short-header.zip")
response.raise_for_status()  # fail early on an HTTP error
with zipfile.ZipFile(BytesIO(response.content)) as archive:
    archive.extractall(output_dir)
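
The download-and-extract step above can be wrapped in a small reusable helper. The following is a minimal sketch rather than the notebook's own code: `extract_zip_bytes` and `download_and_extract` are hypothetical names, and the 60-second timeout is an arbitrary assumption.

```python
from __future__ import annotations

import zipfile
from io import BytesIO
from pathlib import Path


def extract_zip_bytes(content: bytes, output_dir: str) -> list[str]:
    """Extract an in-memory zip archive into output_dir and return its member names."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(BytesIO(content)) as archive:
        archive.extractall(output_dir)
        return archive.namelist()


def download_and_extract(url: str, output_dir: str, timeout: float = 60.0) -> list[str]:
    """Download url and extract it into output_dir (hypothetical helper)."""
    import requests  # third-party; imported here so extract_zip_bytes works offline

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # stop on 4xx/5xx instead of unzipping an error page
    return extract_zip_bytes(response.content, output_dir)
```

Separating the extraction from the download makes it possible to check the extraction logic against a small in-memory archive without touching the network.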

In [11]:

# Reads the tables listed in table_names (from position index onward) and joins them on POA code
def merge_tables(table_names, index):
    input_dir = '../data/raw/census/2021 Census GCP Postal Areas for AUS/'
    if index == len(table_names) - 1:
        return pd.read_csv(input_dir + table_names[index], index_col = False)
    else:
        return pd.read_csv(
            input_dir + table_names[index], index_col = False
        ).merge(
            merge_tables(table_names, index + 1),
            on = 'POA_CODE_2021'
        )

def get_census_df(table_names):
    df = merge_tables(table_names, 0)
    df.columns = df.columns.str.lower()
    df['postcode'] = df['poa_code_2021'].str[-4:]  # keep the last four characters as the postcode
    return df.drop('poa_code_2021', axis = 1)
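
The recursive join above can also be written as a left-to-right fold, which produces the same inner join. This is a minimal sketch, assuming each table is already loaded as a DataFrame with a shared `POA_CODE_2021` column; `merge_frames` is a hypothetical name, and the toy tables and their `count_a`/`count_b` columns are made up for illustration.

```python
from functools import reduce

import pandas as pd


def merge_frames(frames, key="POA_CODE_2021"):
    """Inner-join a non-empty list of DataFrames on key, left to right."""
    return reduce(lambda left, right: left.merge(right, on=key), frames)


# Toy tables standing in for two census CSVs (values are illustrative only).
table_a = pd.DataFrame({"POA_CODE_2021": ["POA2000", "POA3000"], "count_a": [10, 20]})
table_b = pd.DataFrame({"POA_CODE_2021": ["POA2000", "POA3000"], "count_b": [1, 2]})

merged = merge_frames([table_a, table_b])
merged.columns = merged.columns.str.lower()
merged["postcode"] = merged["poa_code_2021"].str[-4:]  # last four characters of the POA code
```

The fold avoids carrying an explicit `index` argument and does not grow the call stack as the number of tables increases.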


In [12]:
age_df = get_census_df([f'2021Census_{code}_AUST_POA.csv' for code in ['G04A', 'G04B']])
mkdir('../data/curated/census/')  # to_csv does not create missing directories
age_df.to_csv('../data/curated/census/age_data.csv', index = False)

### ABS Postal Areas

In [None]:
output_dir = '../data/raw/postcodes/'
mkdir(output_dir)
# Download the zip file and save it to disk (it is not extracted here)
response = requests.get('https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3/jul2021-jun2026/access-and-downloads/digital-boundary-files/POA_2021_AUST_GDA2020_SHP.zip')
response.raise_for_status()  # fail early on an HTTP error
with open(output_dir + 'abs_postal_areas.zip', 'wb') as f:
    f.write(response.content)

### ABS Postal Area analysis

Only 83% of postcodes in the consumer dataset appear as ABS Postal Areas. This is because Australian postcodes are managed by Australia Post and do not necessarily correspond to ABS Mesh Blocks (the smallest ABS geography, which is aggregated to form larger ABS geographies). Australia Post does not make postcode geographies publicly available.

Some official postcodes are not included in Postal Areas. This occurs when a Mesh Block cannot be allocated to a postcode. There are two situations where this occurs:
- a Mesh Block covers more than one whole postcode, and the Mesh Block can be allocated to only one postcode
- more than one Mesh Block partly covers a postcode, but all the Mesh Blocks are allocated to other postcodes, based on population.

Postal Areas also exclude postcodes that are not street delivery areas. These include post office boxes, mail back competitions, large volume receivers and specialist delivery postcodes. These postcodes are only valid for postal addresses and are not a valid location for population data.

There are open-source datasets that map postcodes to LGAs and other ABS statistical areas (the smallest being LGA), but these only give the statistical areas that a postcode intersects with, which is usually more than one. Therefore, for geographical analysis we will use only ABS Postal Areas. In doing this we trade some of the usable data for accuracy.
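
To see why a postcode-to-area lookup is ambiguous, one can count how many distinct areas each postcode maps to. This is a minimal sketch on a made-up lookup table; the `postcode` and `lga` column names and all values are assumptions for illustration, not the schema of any particular dataset.

```python
import pandas as pd

# Hypothetical postcode-to-LGA lookup; the values are made up for illustration.
lookup = pd.DataFrame(
    {
        "postcode": ["3000", "3000", "3001", "3002"],
        "lga": ["A", "B", "A", "C"],
    }
)

# Number of distinct areas each postcode intersects.
areas_per_postcode = lookup.groupby("postcode")["lga"].nunique()

# Postcodes that cannot be assigned to a single area.
ambiguous = areas_per_postcode[areas_per_postcode > 1].index.tolist()
share_ambiguous = len(ambiguous) / len(areas_per_postcode)
```

Any postcode listed in `ambiguous` would need an extra allocation rule before it could be used for area-level analysis, which is the step that using ABS Postal Areas avoids.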

In [None]:
consumer_details_df = pd.read_csv('../data/tables/tbl_consumer.csv', delimiter="|")
# Share of consumer records whose postcode is also an ABS Postal Area
consumer_details_df['postcode'].isin(age_df['postcode']).mean()
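
One thing to watch in a check like this is dtype: `read_csv` may parse postcodes as integers (an assumption about `tbl_consumer.csv`, whose contents are not shown here), while the census postcodes are built as strings, and `isin` does not match `3000` against `'3000'`. The following is a minimal sketch that normalises both sides to four-character strings first; `normalise_postcode` and `postcode_coverage` are hypothetical helpers, and the sample values are made up.

```python
import pandas as pd


def normalise_postcode(series):
    """Convert postcodes to zero-padded 4-character strings (e.g. 800 -> '0800')."""
    return series.astype(str).str.strip().str.zfill(4)


def postcode_coverage(consumer_postcodes, reference_postcodes):
    """Fraction of consumer rows whose postcode appears in the reference postcodes."""
    left = normalise_postcode(consumer_postcodes)
    right = set(normalise_postcode(reference_postcodes))
    return left.isin(right).mean()


# Made-up sample: integer postcodes on one side, strings on the other.
consumers = pd.Series([800, 3000, 3000, 9999])
reference = pd.Series(["0800", "3000"])
coverage = postcode_coverage(consumers, reference)
```

If both columns already share the same dtype, the normalisation is a no-op apart from the zero padding, so it is cheap to apply defensively.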

### Open-source Postcode Data

In [13]:
output_dir = '../data/raw/postcodes/'
mkdir(output_dir)
response = requests.get('https://www.matthewproctor.com/Content/postcodes/australian_postcodes.csv')
response.raise_for_status()  # fail early on an HTTP error

with open(output_dir + 'postcodes.csv', 'wb') as f:
    f.write(response.content)