Public transport data in GTFS format with schemas, a data package and tests

GTFS

Public transportation schedules and associated geographic information for South-East Queensland.

The data is a snapshot, and there are no plans to keep it up to date. The main purpose of this repository is to develop a data package and schemas for this dataset.

Data

The data is the General transit feed specification (GTFS): South East Queensland dataset published by Transport and Main Roads, Queensland Government, and licensed under Creative Commons Attribution. It was sourced on 07 September 2016.

The data follows the GTFS specification and some of its extensions, which define a common format for public transportation schedules and associated geographic information. The specification allows some files, and some columns within files, to be optional. As a result, the datapackage.json file and schemas in this repository may not work for other GTFS feeds.

The data is made up of a number of files.

Each data file is described by a schema. The schemas follow the JSON Table Schema specification.
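
As an illustration only, a schema for a few columns of stops.txt might look like the sketch below. The column names come from the GTFS specification, but the choice of types and constraints here is an assumption and may differ from the schema in this repository.

```python
import json

# Hypothetical schema for a few stops.txt columns; the repository's real
# schema may declare different fields, types, and constraints.
stops_schema = {
    "fields": [
        {"name": "stop_id", "type": "string",
         "constraints": {"required": True, "unique": True}},
        {"name": "stop_name", "type": "string",
         "constraints": {"required": True}},
        {"name": "stop_lat", "type": "number",
         "constraints": {"required": True, "minimum": -90, "maximum": 90}},
        {"name": "stop_lon", "type": "number",
         "constraints": {"required": True, "minimum": -180, "maximum": 180}},
    ],
    "primaryKey": "stop_id",
}

# Serialise the schema as it would appear in a .json schema file.
print(json.dumps(stops_schema, indent=2))
```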

These schemas will be combined into a datapackage.json file to fully describe the data collection. The datapackage.json file will follow the data package specification.
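
A minimal sketch of how per-file schemas could be combined into one descriptor is shown below. The helper `build_datapackage`, the package name `gtfs-example`, and the one-field schemas are hypothetical; a real datapackage.json would normally also carry a title, license, and other metadata.

```python
import json

def build_datapackage(name, schemas):
    """Combine per-file schemas into one data package descriptor."""
    resources = [
        {"name": filename.rsplit(".", 1)[0], "path": filename, "schema": schema}
        for filename, schema in sorted(schemas.items())
    ]
    return {"name": name, "resources": resources}

# Made-up, single-field schemas keyed by data file name.
schemas = {
    "routes.txt": {"fields": [{"name": "route_id", "type": "string"}]},
    "trips.txt": {"fields": [{"name": "trip_id", "type": "string"}]},
}

package = build_datapackage("gtfs-example", schemas)
print(json.dumps(package, indent=2))
```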

Preparation

The data was downloaded, unzipped, and then uploaded to GitHub.

Two data files (shapes.txt and trips.txt) were too large to upload to GitHub, so they were truncated before uploading. The truncated files are still adequate for testing valid data.
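
The exact truncation method is not recorded here. One possible approach, sketched below, keeps the header line and the first rows of a file. The function name `truncate_csv` and the row limit are hypothetical, and the sketch assumes one record per line (no quoted fields containing line breaks).

```python
from itertools import islice

def truncate_csv(source, target, max_rows):
    """Copy the header line plus at most max_rows data lines."""
    with open(source, encoding="utf-8", newline="") as src, \
         open(target, "w", encoding="utf-8", newline="") as dst:
        # max_rows + 1 accounts for the header line.
        dst.writelines(islice(src, max_rows + 1))
```

For example, `truncate_csv("shapes.txt", "shapes_truncated.txt", 1000)` would write the header and the first 1000 data lines to a new file.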

Tests

The focus of the tests is to ensure the schemas are correct. GTFS data validation tools already exist that test the data in more powerful ways than JSON Table Schema allows.

The tests consist of deliberately invalid data, used to confirm that each schema detects all errors (e.g. incorrect types and violated constraints).
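
The idea can be sketched with a small hand-written checker. This is not how Good Tables works internally; `check_row`, the two-field schema, and the sample rows are all hypothetical, and only the `required`, `minimum`, and `maximum` constraints are handled.

```python
def check_row(row, schema):
    """Return a list of error messages for one row (a dict of strings)."""
    errors = []
    for field in schema["fields"]:
        name = field["name"]
        constraints = field.get("constraints", {})
        value = row.get(name, "")
        if value == "":
            if constraints.get("required"):
                errors.append(f"{name}: required value is missing")
            continue
        if field.get("type") == "number":
            try:
                number = float(value)
            except ValueError:
                errors.append(f"{name}: {value!r} is not a number")
                continue
            if "minimum" in constraints and number < constraints["minimum"]:
                errors.append(f"{name}: {value} is below the minimum")
            if "maximum" in constraints and number > constraints["maximum"]:
                errors.append(f"{name}: {value} is above the maximum")
    return errors

# Hypothetical schema and rows, for illustration only.
schema = {"fields": [
    {"name": "stop_id", "type": "string", "constraints": {"required": True}},
    {"name": "stop_lat", "type": "number",
     "constraints": {"minimum": -90, "maximum": 90}},
]}

valid_row = {"stop_id": "1", "stop_lat": "-27.47"}
invalid_rows = [
    {"stop_id": "", "stop_lat": "-27.47"},   # missing required value
    {"stop_id": "2", "stop_lat": "north"},   # wrong type
    {"stop_id": "3", "stop_lat": "123.4"},   # violates the maximum
]

# The schema should accept the valid row and reject every invalid one.
assert check_row(valid_row, schema) == []
for row in invalid_rows:
    assert check_row(row, schema), f"schema failed to reject {row}"
```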

Results

The results can be verified using links to Good Tables. Tests include:

  • testing the valid data without a schema
  • testing the valid data with a schema
  • testing the invalid data with a schema

Good Tables doesn't check all types of errors (yet). Some things that are not checked include:

  • Foreign keys (see Good Tables #17, #8)
  • Some constraints (see Good Tables #55)
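
Until foreign keys are checked by the validator, a relationship can be verified separately. The sketch below assumes the usual GTFS relationship in which every `route_id` in trips.txt must exist in routes.txt; the helper `missing_foreign_keys` and the in-memory sample data are hypothetical.

```python
import csv
import io

def missing_foreign_keys(child_rows, child_key, parent_rows, parent_key):
    """Return child key values that have no matching row in the parent."""
    parent_values = {row[parent_key] for row in parent_rows}
    return sorted({row[child_key] for row in child_rows} - parent_values)

# Tiny in-memory stand-ins for routes.txt and trips.txt (made-up values).
routes = list(csv.DictReader(io.StringIO("route_id\nR1\nR2\n")))
trips = list(csv.DictReader(io.StringIO("trip_id,route_id\nT1,R1\nT2,R9\n")))

# R9 is referenced by a trip but is not defined in routes.
print(missing_foreign_keys(trips, "route_id", routes, "route_id"))  # ['R9']
```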

Automatic Testing

The scripts and .travis.yml file are used to automatically test the data that is defined in datapackage.json. Whenever there is a change to this repository, it triggers Travis to validate the data.
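
The repository's actual scripts are not reproduced here. As a rough sketch of what such a check could do, the hypothetical functions below read a descriptor, confirm each listed file exists, and compare its header row with the field names in its schema. A real validation step would check far more than this.

```python
import csv
import json
import os

def check_package(descriptor_path):
    """Return a list of problems found for the resources in a descriptor."""
    base = os.path.dirname(os.path.abspath(descriptor_path))
    with open(descriptor_path, encoding="utf-8") as handle:
        package = json.load(handle)
    problems = []
    for resource in package.get("resources", []):
        path = os.path.join(base, resource["path"])
        if not os.path.exists(path):
            problems.append(f"{resource['path']}: file not found")
            continue
        fields = resource.get("schema", {}).get("fields", [])
        expected = [field["name"] for field in fields]
        # utf-8-sig strips a byte order mark if the file starts with one.
        with open(path, encoding="utf-8-sig", newline="") as handle:
            header = next(csv.reader(handle), [])
        if expected and header != expected:
            problems.append(f"{resource['path']}: header does not match schema")
    return problems

def main(argv):
    """Print any problems and return an exit status for a CI job."""
    path = argv[1] if len(argv) > 1 else "datapackage.json"
    problems = check_package(path)
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

A continuous integration step could call `sys.exit(main(sys.argv))` so that any problem fails the build.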

The result of the last automatic test is shown by the "datapackage validation" build badge.

Schemas

The schemas were created using Data Packagist. To create one:

  • add some basic information about the data file (name, description, license, etc.)
  • upload the data file

Data Packagist will create a datapackage.json file for you. Download this file.

Good Tables can only use a JSON Table Schema for validation (see goodtables-web #65). You can extract the JSON Table Schema from the datapackage.json file: it is the part of the form {fields: [...]}. Save it as a separate file.
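
The extraction can also be scripted. The sketch below assumes each resource in datapackage.json has a `name` and an embedded `schema` object; the function `extract_schemas` and the `<name>.schema.json` naming convention are hypothetical.

```python
import json
import os

def extract_schemas(descriptor_path, output_dir):
    """Write each resource's embedded schema to its own JSON file."""
    with open(descriptor_path, encoding="utf-8") as handle:
        package = json.load(handle)
    written = []
    for resource in package.get("resources", []):
        schema = resource.get("schema")
        if not schema:
            continue  # nothing to extract for this resource
        # Hypothetical naming convention: <resource name>.schema.json
        target = os.path.join(output_dir, resource["name"] + ".schema.json")
        with open(target, "w", encoding="utf-8") as handle:
            json.dump(schema, handle, indent=2)
        written.append(target)
    return written
```

Calling `extract_schemas("datapackage.json", ".")` would write one schema file per resource into the current directory and return the paths written.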

Edit the schema file with an editor (e.g. Atom, jsoneditoronline.org) to add constraints and refine types and formats. You may like to use the JSON Schema for JSON Table Schema to improve your editing experience.

Some constraints use regular expressions to define a pattern. Use an online tool, e.g. regexr.com or regex101, to help create and test a regular expression.
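
A pattern can also be tried locally before it goes into a schema. The sketch below uses a hypothetical pattern for GTFS times and assumes the validator matches the pattern against the whole value. Regular expression dialects vary between tools, so a pattern that works in Python may still need adjusting.

```python
import re

# Hypothetical pattern for GTFS times in HH:MM:SS form. Hours may exceed 23
# because GTFS allows times past midnight for trips that span service days.
TIME_PATTERN = r"\d{1,2}:[0-5]\d:[0-5]\d"

def matches(pattern, value):
    """Return True when the pattern matches the whole value."""
    return re.fullmatch(pattern, value) is not None

# Try the pattern against values that should pass and values that should fail.
for value in ["08:15:00", "25:30:00", "8:15", "08:61:00"]:
    print(value, matches(TIME_PATTERN, value))
```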

View the Data Package

Data packages provide machine-readable metadata for your data. You can view a human-readable version of the data package, its data, and its readme file using the Data Package Viewer. The viewer has a couple of issues, including an incorrect link to the metadata (see data.okfn.org-new #9).

License

All items in this repository, apart from the data, are licensed under Creative Commons Attribution 4.0.