csv geo au

Kevin Ring edited this page Nov 29, 2016 · 15 revisions

csv-geo-au 1.5

csv-geo-au is a specification for publishing point or region-mapped Australian geospatial data in CSV format to data.gov.au and other open data portals. Datasets in this format are supported by TerriaJS (and hence the National Map) and are intended to be as reusable as possible. A State column in a CSV file with a resource format of csv-geo-au can unambiguously be understood to refer to an Australian state, for example.

Datasets with line feature or explicit polygons (instead of references to standard polygon boundaries) are not covered by this standard, and should be provided as GeoJSON.

Document Status: initial use. This document will evolve, but it is unlikely that field names currently recommended will become deprecated.

Terminology

  • Recommended: The best field name for maximum reusability. High priority for support in TerriaJS (the software that runs the National Map). Sometimes several options are recommended, depending on your need for precision.
  • Accepted: A field name which is reasonably reusable. Generally supported by TerriaJS.
  • Discouraged: A field name which is ambiguous or not intuitive to a wide audience, but may be commonly used due to existing software. Possibly supported by TerriaJS but may be discontinued.

It is generally acceptable to include "discouraged" fields if there is also recommended or accepted fields as the recommended or accepted fields will be used.

Goals

In designing this specification, we have tried to balance these goals:

  • Maximising the chance that existing CSV files may accidentally conform, correctly.
  • Allowing motivated dataset publishers to be very precise about the exact boundaries their data relates to.
  • Making column names guessable without consulting the specification.
  • Encouraging the production of datasets which are easy to use by consumers who are unaware of this specification.
  • Aligning with attribute names already used by authorities such as the ASGS

General recommendations

The CSV format MUST:

  • Consist only of one header row followed by data rows (no other metadata within the file)
  • Use , as field delimiter
  • Use \r\n (Windows) or \n (Linux, OSX) as end of line character
  • Use double quotes around any value containing a comma, and double-double quotes to represent double quotes: "like ""this"""

It SHOULD be encoded in UTF-8. Headers are not considered to be case-sensitive.

Use in data.gov.au

In data.gov.au and other CKAN-based portals, resources (individual files) that conform to this standard SHOULD be given a resource type of csv-geo-au. This is required for National Map to locate and display them. Resources with format set to csv-geo-au can also be previewed on data.gov.au like other CSV files.

Quick summary

Tables should look like one of these:

ID,Population,LGA_code_2015,State
1,100600,24600,VIC

or

ID,Population,Postcode,State
1,28000,3000,VIC

or

ID,Name,Lat,Lon
1,Bacchus Marsh Airport,-37.7313,144.4212

EITHER a latitude/longitude pair, OR one or more region fields should be provided.

Point data (latitude & longitude)

To encode individual points with a latitude and longitude, two fields are required. Each MUST be a number in decimal degrees. Numbers SHOULD NOT be enclosed in double quotes.

Recommended field names
  • Lat, Lon
Accepted field names
  • Latitude, Longitude;
  • Lat, Lng, Long
Discouraged
  • x, y;
  • WKT (single column with data in POINT(-37.8 144.9) format);
  • easting, northing;
  • combined format: (-37.8, 144.9);
  • GeoJSON

Locations SHOULD be given in the GDA94 datum (EPSG:4283), but WGS84 is acceptable (EPSG:4326). The difference is generally less than one metre. The datum chosen SHOULD be indicated in the metadata for the dataset. (There is currently no standard for this.)

LGAs, postcodes and other regions

Australian Statistical Geography Standard (ABS statistical regions)

For each boundary type, there are usually three field names that can be used for matching on codes:

  • "Field with year" (eg sa4_code_2011). This is the most precise, and recommended, particularly for boundaries which change frequently. Certain boundaries move significantly every year (eg LGA), and some are completely renumbered in each reissue (eg, Tourism Regions). (TerriaJS does not currently support different versions.)
  • "Field without year" (eg sa4_code). This is acceptable when the year is not known.

These field names generally match those used by the ABS. In addition, we define:

  • "Synonym" (eg sa4). This unofficial shorthand is useful for matching spreadsheets in this form, but it is not recommended due to ambiguity: does the field contain codes or names? (TerriaJS always assumes codes.)

In addition, we define field names for matching on names (eg sa4_name_2011).

Please note that there is currently no support for matching any ASGS regions other than LGA by name.

State/Territory (STE)

Name or code Field with year Field without year Synonyms
Full name (New South Wales) ste_name_2011 ste_name state
1 digit code (3=Queensland) ste_code_2011 ste_code ste

Note: TerriaJS may in the future support state abbreviations (TAS etc)

Statistical area 1 (SA1)

Note: The use of "maincode" here follows the ABS' convention.

Name or code Field with year Field without year Synonym
11-digit code sa1_maincode_2011 sa1_maincode sa1,sa1_code
7-digit code sa1_7digitcode_2011 sa1_7digitcode

Statistical area 2 (SA2)

Name or code Field with year Field without year Synonym
9-digit code
(1 digit state + 2 digit SA4 + 2 digit SA3 + 4)
sa2_code_2011 sa2_code sa2
5-digit code
(1 digit state + 4)
sa2_5digitcode_2011 sa2_5digitcode
Name (eg "O'Connor (WA)") sa2_name_2011 sa2_name

Statistical area 3 (SA3)

Name or code Field with year Field without year Synonym
5-digit code (1 digit state + 2 digit SA4 + 2 digits) sa3_code_2011 sa3_code sa3
Name (eg "North Sydney - Mosman") sa3_name_2011 sa3_name

Statistical area 4 (SA4)

Name or code Field with year Field without year Synonym
3 digit code (1-digit state code + 2) sa4_code_2011 sa4_code sa4
Name (eg "Melbourne - Inner South") sa4_name_2011 sa4_name

Greater Capital City Statistical Areas (GCCSA)

Name or code Field with year Field without year Synonym
5-character alphanumeric code
(1-digit state code + 4, eg 1GSYD)
gccsa_code_2011 gccsa_code gccsa
Name (eg "Greater Sydney") gccsa_name_2011 gccsa_name

Signifcant Urban Areas (SUA)

Name or code Field with year Field without year Synonym
4-digit code (non-hierarchical, eg 5009) sua_code_2011 sua_code sua
Name (eg "Warragul - Drouin") sua_name_2011 sua_name

Australia as a whole (AUS)

Name or code Field without year Synonyms
1 digit code (0=Australia) aus_code aus

ABS structures in the ASGS that are not currently supported by TerriaJS

Note: As of June 2015, no decisions have been made about future support of these structures.

Structure Name or code With year Without year Synonym
Mesh block 11 digit code mb_code_2011 mb_code mb
Section of state 2 digit code sos_code_2011 sos_code sos
Section of state range 3-digit code sosr_code_2011 sosr_code sosr
Urban Centres and Localities 6-digit code ucl_code_2011 ucl_code ucl
Indigenous Regions 3-digit code ireg_code_2011 ireg_code ireg
Indigenous Locations 8-digit code iloc_code_2011 iloc_code iloc
Indigenous Areas 6-digit code iare_code_2011 iare_code iare
Remoteness Areas 2-digit code ra_code_2011 ra_code ra

Non-ABS structures in ASGS

Postcode / postal area

A four digit Australian postcode.

Recommended field names
Authority Region name Name or code Field with year Field without year
PSMA Postcode 4 digit code postcode_2015 postcode
ABS Postal area (ABS approximation) 4 digit code poa_2011, poa_code_2011 poa, poa_code

Note: PSMA's boundaries are not open data, and TerriaJS hence uses the ABS Postal areas to display postcodes. They are not quite the same.)

For greater precision, additional fields Suburb and State MAY be provided. For example: Postcode 3068, Suburb Clifton Hill, State VIC.

Local Government Area

Name or code Field with year Field without year Synonym
5 digit code (eg "31000") lga_code_2014 lga_code lga
Name (eg "Brisbane") lga_name_2014 lga_name adm2 (see note)

Complete lists of 5 codes are available here.

Using name fields

The lga_name field SHOULD be used only a human-readable addition to lga_code. It is NOT recommended as the primary lookup (and is not currently supported as such by TerriaJS).

The adm2 field (not currently supported by TerriaJS) must contain the short form of the LGA name, with no "City of", "Council" etc. For example: "Melbourne", "Greater Geelong". It SHOULD be capitalised like this.

A separate State column (and/or lga_code column) MUST be provided, as LGA names are not unique across states.

Commonwealth Electoral Division

ABS approximations of electoral districts.

Name or code Field with year Field without year Synonym
3 digit code (1-digit state code + 2, e.g. "402") ced_code_2011 ced_code ced
Name (eg "Barker") ced_name_2011 ced_name

To explicitly map against actual AEC boundaries instead of ABS approximations:

Name or code Field with year Field without year Synonym
3 digit "division ID" *(eg "180") com_elb_id_2016 com_elb_id com_elb
Name (eg "Barker") com_elb_name_2016 com_elb_name

Note: the year here is the year of the AEC release, not ASGS. Division IDs are not the same as AEC codes, although they look similar. (ABS codes beginning 1xx are all in NSW, AEC ones can be in any state.)

State Electoral Division

ABS approximations of electoral districts.

Name or code Field with year Field without year Synonym
5 digit code (1-digit state code + 4, e.g. "20106") sed_code_2011 sed_code sed
Name (eg "Albert Park (Southern Metropolitan)") sed_name_2011 sed_name

To explicitly map against actual AEC boundaries instead of ABS approximations:

Name or code Field with year Field without year Synonym
5 digit code (1-digit state code + 4, e.g. "20106") sed_aec_code_2011 sed_aec_code sed_aec
Name (eg "Albert Park (Southern Metropolitan)") sed_aec_name_2011 sed_aec_name

Note: the year here is the year of the AEC release, not ASGS.

State Suburbs

ABS approximations of suburbs.

Name or code Field with year Field without year Synonym
5 digit code (1-digit state code + 4, e.g. "10002") ssc_code_2011 ssc_code ssc
Name (eg "Abbotsford (NSW)") ssc_name_2011 ssc_name

The field name suburb is currently treated as a synonym for ssc but may change.

Non-ABS structures in the ASGS that are not supported by TerriaJS

Structure Name or code With year Without year Without year (Synonym)
Australian Drainage Divisions 3 character code: D__ add_code_2011 add_code
Natural Resource Management Regions 3-digit code nrmr_code_2011 nrmr_code nrmr
Tourism Regions 5 character code: _R___ tr_code_2011 tr_code tr

Country boundaries

Name or code Field Synonyms
Two letter country code (ISO 3166-1 Alpha 2)
(eg AU)
cnt2 iso2
Three letter country code (ISO 3166-1 Alpha 3)
(eg AUS)
cnt3 iso3

Primary Health Network (Department of Health)

Name or code Field with year Field without year Synonym
6 character PHN___ code (eg PHN101) phn_code_2015 phn_code phn
Name (eg "Central and Eastern Sydney") phn_name_2015 phn_name

Statistical Local Area

An obsolete ABS structure roughly equivalent to an LGA.

Name or code Field with year Field without year Synonym
9-digit Code sla_code_2006 sla_code sla, sla_9digitcode_2006
5-digit Code sla_5digitcode_2006 sla_5digitcode
Name sla_name_2006 sla_name

Note: As this structure is obsolete, we recommend using the full field name sla_code_2006 in case the short form sla clashes with something else in the future.

Collection District

An obsolete ABS structure similar in size to an SA1.

Name or code Field with year Field without year Synonym
Code cd_code_2006 cd_code cd, cd_7digitcode_2006

Note: As this structure is obsolete, we recommend using the full field name cd_code_2006 in case the short form cd clashes with something else in the future.

Other boundaries not currently supported by TerriaJS

These are included here to support standardisation and future support by TerriaJS. As of June 2015, no decisions have been made about future support.

Dates and times

An optional date field MAY be used to indicate a date (and, optionally) time associated with a row. These formats of ISO8601 are acceptable:

Format Example Description
yyyy 2004
yyyy-mm 2004-05
yyyy-mm-dd 2004-05-01
yyyy-mm-ddThh:mm:ss 2004-05-01T19:43:16 recommended format without timezone (literal "T")
yyyy-mm-dd hh:mm:ss 2004-05-01T19:43:16 alternative format without timezone
yyyy-mm-ddThh:mm:ssZ 2004-05-01T19:43:16Z with UTC timezone (literal "Z")
yyyy-mm-ddThh:mm:ss+zz 2004-05-01T19:43:16+11 with timezone specified as + or -