This project provides the Australian Bureau of Statistics Australian Statistical Geography Standard (ASGS) in a PostgreSQL database using the PostGIS extension to store the geospatial data.
This project provides the scripts to load the source data as provided by the ABS into PostgreSQL (building from source) but also provides a database dump of the loaded data (using the pre-built package).
The ASGS is published as a combination of non-spatial CSV files and spatial SHAPE files. Both of these files are used in this loader and combined together to form this (unofficial) ASGS PostgreSQL schema.
You may also be interested in the corresponding abs2pgsql loader scripts which will load the ABS 2011 Census of Population and Housing into PostgreSQL, making use of this ASGS schema to provide the geographic standard to those statistics.
About the ASGS
The ASGS consists of around 22 individual structures which are classified into ABS Structures and Non-ABS Structures as shown in this diagram.
The 2011 release of the ASGS consists of Volumes 1-5. Volume 5 has not yet been released and as such these scripts just process volumes 1-4.
- Volume 1 - 1270.0.55.001 consists of the Main Structure and Greater Capital City Statistical Areas (documentation)
- Volume 2 - 1270.0.55.002 consists of the Indigenous Structure (documentation)
- Volume 3 - 1270.0.55.003 consists of the Non-ABS Structures (documentation)
- Volume 4 - 1270.0.55.004 consists of the Significant Urban Areas and Section of State/Urban Centres and Localities Structure (SUA, SOS/UCL) (documentation)
- Volume 5 - 1270.0.55.005 consists of the Remoteness Structure
The ABS Structures (vol 1, 2, 4, 5) will not change until the next Census in 2016, however the Non-ABS Structures (vol 3) will be updated annually.
For now the scripts load all the volumes into the asgs_2011 schema, although in the future this will need a rethink to address the annual updates and 5 year census updates.
About the target schema
The target PostgreSQL schema which stores the ASGS aims to be true to and consistent with the standard, however there is still room for flexibility and there are many different ways you could structure the target schema.
Versioning and Stable Releases
The current schema isn't fixed and I'm happy to make modifications if there are valid reasons. You may consider the current state of the schema an RC release. It can and may change. I hope to release stable versions of this loader and the dumps with a release version, but as of yet no stable release has been made. If significant architectural changes are made to the schema I will ensure I at least tag the current state before moving direction.
PostGIS Topology support
One future direction is to use PostGIS Topology support. As all of the ASGS structures are built up from Mesh Blocks we could elegantly model the ASGS as PostGIS topologies built up as aggregates of the Mesh Blocks.
Primary key codes for the ASGS structures are generally made unique by concatenating the code with a the code of the parent structure which it was built from. For example the S/T structure is built up from the SA4 structure. That is S/T's are built up from one or more SA4's.
This means that the unique code for SA4's is only unique within its S/T, so to obtain a globally unique code for that SA4 you need to prepend the S/T code.
So for example SA1's are unique with respect to their 11 digit code. However that 11 digit code is made up of S/T . SA4 . SA4 . SA3 . SA2 . SA1.
Currently the schema will store the materialised 11 digit SA1 code. A potential modification to the schema could store only the 2 digit SA1 code and rely on an accompanying SA2 foreign key to provide the globally unique primary key.
The ABS ASGS data is Copyright (c) Commonwealth of Australia and as per http://www.abs.gov.au/websitedbs/D3310114.nsf/Home/©+Copyright?opendocument it is released under the Creative Commons Attribution 2.5 Australia license.
Thanks to the ABS for releasing this data under a free and open license.
All the files within this asgs2pgsql repository are released under the CC0 license by Andrew Harvey firstname.lastname@example.org. Although not required, I would prefer you give Attribution and release derived works or modifications under the same CC0 license.
To the extent possible under law, the person who associated CC0 with this work has waived all copyright and related or neighboring rights to this work. http://creativecommons.org/publicdomain/zero/1.0/
Before You Start
Running these scripts is akin to building software from source. If you just want a copy of the database without needing to "build" it from source skip to the last section of this README.
gdal-bin (>= 1.7.0), libdbd-pg-perl,
postgresql-9.1-postgis | postgresql-9.3-postgis-2.1, libtext-csv-perl, unzip, wget
The scripts assume you have a PostgreSQL database up and running. We leave authentication to this database your responsibility through the PostgreSQL environment variables which are described in the next section. I also describe creating a database in the next section if you don't know how to set one up.
The simplest authentication setup is to allow local unauthenticated access to
your database. You can do this by adding the following example line to
local abs abs trust
This allows the database named
abs to be accessed by the database user
You could substitute trust with ident if you create the user
abs on your
system and run these scripts as that user.
Running the Scripts
Setting up the database environment
First you need to set up and export some PG environment variables otherwise the PostgreSQL defaults will be used. For example,
export PGHOST=localhost # not necessary if localhost export PGDATABASE=abs export PGUSER=abs
Refer to the PostgreSQL documentation for details on the environment variables which you can set.
If you don't already have a database and database user set up and you haven't loaded the PostGIS extensions into that database, then you can set up the user, database and PostGIS extensions on Debian using (replacing YOUR_DB and YOUR_DB_USER),
sudo su - postgres createuser --no-createdb --no-createrole --superuser YOUR_DB createdb --owner=YOUR_DB_USER YOUR_DB exit # then with your PG* environment variables set load the postgis extensions psql -f /usr/share/postgresql/9.1/contrib/postgis-1.5/postgis.sql # or if using >= PostGIS 2.1 psql -c "CREATE EXTENSION postgis SCHEMA asgs_2011;"
Once that is set up, the rest of the process is as follows.
Stage 1: Downloading the source ASGS data
Currently the download parameters are hard configured within
01-download-asgs.sh. These were created using the script
00-make-download-code.sh. I'm still not sure if this is the best approach or
if the download parameters should be parsed from the source HTML file at each
01-download-asgs.sh. For now you should be able to just run,
which should download and unzip the ASGS Volume 1-4 files.
If this step has been broken due to a change on the abs.gov.au web server
(because the source datasets are CC-BY 2.5 licensed) I host a copy of the
results after the
01-download-asgs.sh step at http://tianjara.net/data/asgs2pgsql/01-ASGS-ZIP/
Just download this directory (perhaps with a
mkdir -p 01-ASGS-ZIP && cd 01-ASGS-ZIP && wget -r -nd -nc -np http://tianjara.net/data/asgs2pgsql/01-ASGS-ZIP/
) then run 02-unzip-asgs.sh.
Stage 2: Loading the ASGS data into the database schema
This stage assumes you have the 02-ASGS-UNZIP directory from stage 1. With this just run,
When you load geographic data into PostgreSQL using the PostGIS extension you must define the coordinate system of that data. The coordinate system you should use depends on what you are most likely to use the data for.
For instance if you are going to be rendering web maps from the data it would make sense to store the data in PostGIS as the EPSG:900913 coordinate system.
If you want to do lots of analysis and calculations based on the areas of the regions it makes sense to load the data in the GDA94 / Australian Albers coordinate system (EPSG:3577) as that will give you the best area values. If you don't have any specific need it would be fine to leave it in the same coordinate system as the original shape files, unprojected lat longs in the GDA94 datum.
I have deliberated over which coordinate system should be the default, but have ultimately decided to respect the ASGS standard and use the same coordinate system as the source data from the ABS.
I present the common alternatives as user options. Though there are problems with supporting many different configurations including different pg_dumps need to be generated, the options add confusion to non-experts, makes it harder for other applications to rely on the one standard ASGS PostgreSQL schema if there are several different versions.
You can always do on the fly re-projection in PostGIS but quite often you will want to load the data in the coordinate system you wish to use it in to avoid performance issues associated with on the fly re-projection.
To switch the coordinate system which we load the data into check out the
configuration section at the top of the
Stage 3: Materialised Pyramids of Generalised Geometries (Optional)
You can optionally produce materialised pyramid tables of the generalised geometries,
This step requires that you have loaded the EPSG 3577 : GDA94 / Australian Albers coordinate system into your spatial_ref_sys table using the definition found in the spatial_ref_sys.sql file. You can do this by running,
psql -f /usr/share/postgresql/9.1/contrib/postgis-1.5/spatial_ref_sys.sql
Producing a new db dump
Once everything has been loaded into PostgreSQL using these scripts you can create a PostgreSQL dump file using,
pg_dump --format plain --schema "asgs_2011" --no-owner | xz > asgs_2011.sql.xz
Loading an existing db dump
I host a copy of this file at http://tianjara.net/data/asgs2pgsql/. After setting your PG* environment variables and creating a database (with the PostGIS extensions loaded) as described above, you can load the database dump using,
xzcat asgs_2011.sql.xz | psql -f -
Because I'm unsure how to install PostGIS into a schema other than public with PostGIS 1.4, the geometry_columns and spatial_ref_sys tables aren't included in the DB dump. I believe PostGIS 2.0 makes it easier to install the extension in another schema, so this issue should be resolved when PostGIS 2.0 is packaged for Debian and I'm able to load the PostGIS extensions into the asgs_2011 schema.
- You may run into the issue described here.
How to use the resulting database
Some examples of scripts you can run to pull intelligence from the resulting asgs2pgsql schema are included in the examples directory.