This repository has been archived by the owner on Dec 8, 2022. It is now read-only.

Internal model GTFS'

Patrick GENDRE edited this page May 23, 2016 · 13 revisions

Introduction

The internal model, GTFS', is close to GTFS but simplified, normalized and expanded for ease of use.

The Export plug-in re-exports the data according to the GTFS' model, for example:

$ gtfsrun tatrobus.sqlite GtfsExport --bundle=tatrobus.zip

Calendar

A calendar is simply a list of calendar dates. There are no more date ranges, days of the week, or positive/negative exceptions.

We also force a calendar to exist if it is defined in either calendars or calendar_dates (in the GTFS model the calendars table is optional, and a feed may define calendar_dates only).
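As an illustration of this flattening, here is a minimal Python sketch (not the gtfslib-python code) that turns one GTFS calendar row and its calendar_dates exceptions into a plain list of dates. The function expand_calendar and its dict-based inputs are hypothetical; only standard GTFS semantics are assumed (weekday flags applied between start_date and end_date, exception_type 1 adds a date, 2 removes it). A service defined only in calendar_dates is passed with calendar=None and still yields a calendar.

```python
from datetime import date, timedelta

# Weekday flag order follows GTFS calendar.txt (monday..sunday),
# which matches date.weekday() where Monday == 0.
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")


def expand_calendar(calendar, exceptions):
    """Flatten a GTFS calendar row plus its exceptions into sorted dates.

    calendar: dict with start_date, end_date (datetime.date) and the seven
              weekday flags (0/1), or None if the service only exists in
              calendar_dates.
    exceptions: list of (date, exception_type) tuples from calendar_dates.
    """
    active = set()
    if calendar is not None:
        day = calendar["start_date"]
        while day <= calendar["end_date"]:
            if calendar[WEEKDAYS[day.weekday()]] == 1:
                active.add(day)
            day += timedelta(days=1)
    for day, exception_type in exceptions:
        if exception_type == 1:      # service added on this date
            active.add(day)
        elif exception_type == 2:    # service removed on this date
            active.discard(day)
    return sorted(active)
```

For a weekday service from Monday 2016-05-02 to Sunday 2016-05-08, with Thursday removed and Saturday added, the result is the five dates 2, 3, 4, 6 and 7 May; no range or weekday logic remains for later queries.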

Calendar model diagram

This greatly simplifies the following queries:

  • List of calendars active on a given day or a set of days (SELECT ... WHERE calendar_dates.date = ?)
  • List of calendars active before/after a given date or within an interval (SELECT ... WHERE calendar_dates.date <= ?)
  • Computing the number of days a calendar is active (SELECT calendar.id, COUNT(calendar_dates.date) ...)
  • Computing the union of days a set of calendars is active (SELECT DISTINCT calendar_dates.date ...)
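The queries above can be tried against an in-memory SQLite database. This is a sketch on a hypothetical, simplified calendar_dates table (calendar_id, date), not the actual gtfslib-python schema; dates are stored as ISO 'YYYY-MM-DD' strings so that string comparison follows chronological order.

```python
import sqlite3

# Hypothetical, simplified schema: one row per (calendar, active date).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar_dates (calendar_id TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO calendar_dates VALUES (?, ?)",
    [("WEEK", "2016-05-02"), ("WEEK", "2016-05-03"),
     ("WEEK", "2016-05-04"), ("SAT", "2016-05-07")])

# Calendars active on a given day.
active = [row[0] for row in conn.execute(
    "SELECT calendar_id FROM calendar_dates WHERE date = ?", ("2016-05-03",))]

# Number of active days per calendar.
counts = dict(conn.execute(
    "SELECT calendar_id, COUNT(date) FROM calendar_dates GROUP BY calendar_id"))

# Union of the active days of a set of calendars.
union = [row[0] for row in conn.execute(
    "SELECT DISTINCT date FROM calendar_dates "
    "WHERE calendar_id IN (?, ?) ORDER BY date", ("WEEK", "SAT"))]
```

Each query is a plain filter or aggregate over dates; none needs to evaluate weekday flags or apply exceptions.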

Name change

A few fields have been renamed:

  • stop.parent_station has been renamed to stop.parent_station_id, for consistency with other objects, and because the name parent_station is now used by the field referring to the linked parent station object.

All other fields and class/table names are identical to GTFS.

Zone normalization

A new Zone class is introduced to normalize the stop-fare rule relationship. This class does not contain any fields other than the feed ID and the zone ID. The associated table is called zones.
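A minimal sketch of the normalization, assuming stops and fare rules are given as dicts keyed by GTFS column names: every distinct zone ID referenced by stops (zone_id) or fare rules (origin_id, destination_id, contains_id) becomes one Zone entity. The Zone dataclass and collect_zones function are hypothetical stand-ins, not the library's classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Hypothetical stand-in for a normalized zone: only feed and zone IDs."""
    feed_id: str
    zone_id: str


def collect_zones(feed_id, stops, fare_rules):
    """Return the set of distinct zones referenced by stops and fare rules."""
    zone_ids = {stop.get("zone_id") for stop in stops}
    for rule in fare_rules:
        # GTFS fare rules reference zones through these three columns.
        for column in ("origin_id", "destination_id", "contains_id"):
            zone_ids.add(rule.get(column))
    # Empty or missing references do not denote a zone.
    return {Zone(feed_id, zid) for zid in zone_ids if zid}
```

Because zones become first-class rows, stops and fare rules can both reference the same Zone instead of sharing a free-form string.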

Optional fields with default values

All optional fields with a default value are initialized to that value when not defined. These fields are:

  • stop.location_type
  • stop.wheelchair_boarding
  • trip.wheelchair_accessible
  • trip.bikes_allowed
  • agency.agency_id (in case a single agency exists)
  • route.agency (in case a single agency exists)

This allows simpler queries, since the caller does not have to check for missing values.
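A sketch of the defaulting step, assuming rows are dicts keyed by GTFS column names. In the GTFS reference, an empty value for each of the four numeric fields listed above means 0. The names DEFAULTS, apply_defaults and fill_route_agency are hypothetical, and the sketch does not assign an ID to a single agency that lacks one, since the value used by the library is not stated here.

```python
# GTFS defaults for the optional numeric fields listed above:
# an empty or missing value means 0 in the GTFS reference.
DEFAULTS = {
    "stops": {"location_type": 0, "wheelchair_boarding": 0},
    "trips": {"wheelchair_accessible": 0, "bikes_allowed": 0},
}


def apply_defaults(table, row):
    """Return a copy of a GTFS row with missing optional fields defaulted."""
    filled = dict(row)
    for field, default in DEFAULTS.get(table, {}).items():
        if filled.get(field) in (None, ""):
            filled[field] = default
    return filled


def fill_route_agency(routes, agencies):
    """If the feed has exactly one agency, link routes lacking an agency_id to it.

    Assumption: the single agency already has an agency_id; choosing one
    when it is missing is out of scope for this sketch.
    """
    if len(agencies) != 1:
        return routes
    agency_id = agencies[0].get("agency_id")
    return [dict(r, agency_id=r.get("agency_id") or agency_id) for r in routes]
```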

Interpolated stop times

All missing stop times are interpolated based on the distance between stops. Interpolated stop times have the field interpolated set to True.
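Distance-based interpolation can be sketched as follows: each missing time is placed between the surrounding known times in proportion to the distance covered. This is an illustration, not the library's implementation. It assumes a single time per stop (the model actually has separate arrival and departure times), times in seconds since midnight, cumulative distances along the trip, and rounding to whole seconds. The function name interpolate_times is hypothetical.

```python
def interpolate_times(times, distances):
    """Linearly interpolate missing times (None) from cumulative distances.

    times: times in seconds since midnight, None where missing; the first
           and last entries must be known (GTFS requires times at the
           first and last stops of a trip).
    distances: cumulative distance of each stop along the trip.
    Returns (filled_times, interpolated_flags).
    """
    filled = list(times)
    flags = [t is None for t in times]
    known = [i for i, t in enumerate(times) if t is not None]
    for a, b in zip(known, known[1:]):
        span = distances[b] - distances[a]
        for i in range(a + 1, b):
            # Fraction of the a->b distance covered at stop i; fall back to
            # an even split if both known stops are at the same distance.
            ratio = ((distances[i] - distances[a]) / span if span > 0
                     else (i - a) / (b - a))
            filled[i] = round(times[a] + ratio * (times[b] - times[a]))
    return filled, flags
```

For example, with known times 8:00:00 and 8:10:00 at 0 m and 4000 m, intermediate stops at 1000 m and 3000 m receive 8:02:30 and 8:07:30, and their flags are True.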

Stop times diagram

This allows simpler processing of trip times: every stop time always has its times set (except the first arrival and the last departure). For example, to query all departures within a given time range: ... WHERE stop_times.departure_time >= ? AND stop_times.departure_time <= ?.

First arrival, last departure

The first arrival time and last departure time of each trip are set to NULL (None).

This allows simpler queries. For example, a query for all departures from a stop only needs to select non-null departure times (... WHERE stop_times.departure_time IS NOT NULL), which ensures that the last stop time of each trip is excluded from the result. The same applies to all arrivals at a stop, where the first stop time of each trip must be excluded.
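A small SQLite sketch of a departures query, on a hypothetical, simplified stop_times table with times in seconds since midnight (not the actual gtfslib-python schema). Trip T1 ends at S2, so its stop time there has a NULL departure and is not returned as a departure from S2.

```python
import sqlite3

# Hypothetical, simplified stop_times table; times are seconds since midnight.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE stop_times (
    trip_id TEXT, stop_id TEXT, stop_sequence INTEGER,
    arrival_time INTEGER, departure_time INTEGER)""")
conn.executemany("INSERT INTO stop_times VALUES (?, ?, ?, ?, ?)", [
    # Trip T1: S1 -> S2; first arrival and last departure are NULL.
    ("T1", "S1", 0, None, 29700),
    ("T1", "S2", 1, 30000, None),
    # Trip T2: S2 -> S1.
    ("T2", "S2", 0, None, 30600),
    ("T2", "S1", 1, 30900, None),
])


def departures(stop_id, start, end):
    """Trips departing from stop_id between start and end (inclusive)."""
    return conn.execute(
        "SELECT trip_id, departure_time FROM stop_times "
        "WHERE stop_id = ? AND departure_time IS NOT NULL "
        "AND departure_time >= ? AND departure_time <= ? "
        "ORDER BY departure_time", (stop_id, start, end)).fetchall()
```

Once a time range is applied, the IS NOT NULL test is technically redundant, because SQL comparisons with NULL are never true; it is the condition that excludes last stop times in a query without a time range.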

Shape normalization

A new Shape class is introduced to normalize the trip->shape relationship. Shape points use the ShapePoint class. The shapes table stores normalized Shape entities, and a new shape_pts table stores ShapePoint entities. ShapePoint::shape_pt_sequence values are renumbered using a consecutive index starting from 0 (the same concept as StopTime::stop_sequence). Shape distances are converted to meters, and computed if missing (see below).
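The grouping and renumbering can be sketched as below, starting from flat shapes.txt rows given as dicts keyed by GTFS column names. The function normalize_shapes and its output layout are hypothetical; the sketch only orders each shape's points by their original sequence and renumbers them from 0.

```python
from collections import defaultdict


def normalize_shapes(shape_rows):
    """Group flat shapes.txt rows into {shape_id: [point, ...]}.

    Points are ordered by their original shape_pt_sequence (which GTFS only
    requires to increase along the shape) and renumbered 0, 1, 2, ...
    """
    by_shape = defaultdict(list)
    for row in shape_rows:
        by_shape[row["shape_id"]].append(row)
    shapes = {}
    for shape_id, rows in by_shape.items():
        rows.sort(key=lambda r: int(r["shape_pt_sequence"]))
        shapes[shape_id] = [
            {"shape_pt_sequence": index,
             "shape_pt_lat": float(r["shape_pt_lat"]),
             "shape_pt_lon": float(r["shape_pt_lon"])}
            for index, r in enumerate(rows)]
    return shapes
```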

Distances in meters

All missing traveled distances (stop_times.shape_dist_traveled, shape.shape_dist_traveled) are computed, and all distances (including existing ones) are converted to meters. If no shape is available, the distance is simply the straight-line distance between stops.
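For trips without a shape, cumulative stop distances could be computed as sketched below. The haversine great-circle formula with a mean Earth radius of 6371 km is used here as one reasonable reading of "straight-line distance"; the exact formula used by the library is not stated on this page, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cumulative_stop_distances(stops):
    """Cumulative distances (m) for a trip's ordered (lat, lon) stops.

    The first stop is at 0.0, as for trips without a shape.
    """
    if not stops:
        return []
    total = 0.0
    result = [0.0]
    for (lat1, lon1), (lat2, lon2) in zip(stops, stops[1:]):
        total += haversine_m(lat1, lon1, lat2, lon2)
        result.append(total)
    return result
```

One degree of longitude along the equator comes out at roughly 111.2 km with this radius.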

Distance diagram

This allows simpler queries based on distance (... WHERE stb.shape_dist_traveled - sta.shape_dist_traveled > ?) or on speed (... WHERE (stb.shape_dist_traveled - sta.shape_dist_traveled) / (stb.departure_time - sta.departure_time) > ?).
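A runnable SQLite version of the speed query, on a hypothetical, simplified stop_times table (times in seconds, distances in meters, so speed is in m/s). It restricts sta/stb to hops (consecutive stop_sequence on the same trip) and divides by stb.arrival_time rather than stb.departure_time, because the last stop time of a trip has a NULL departure.

```python
import sqlite3

# Hypothetical, simplified stop_times table: times in seconds since
# midnight, shape_dist_traveled in meters.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE stop_times (
    trip_id TEXT, stop_sequence INTEGER, arrival_time INTEGER,
    departure_time INTEGER, shape_dist_traveled REAL)""")
conn.executemany("INSERT INTO stop_times VALUES (?, ?, ?, ?, ?)", [
    ("T1", 0, None, 28800, 0.0),
    ("T1", 1, 28860, 28860, 600.0),   # 600 m in 60 s: 10 m/s
    ("T1", 2, 28920, None, 2400.0),   # 1800 m in 60 s: 30 m/s
])

# Hops faster than a threshold in m/s. SQLite returns NULL (not an error)
# for a division by zero, so zero-duration hops are simply not selected.
fast_hops = conn.execute("""
    SELECT sta.trip_id, sta.stop_sequence,
           (stb.shape_dist_traveled - sta.shape_dist_traveled)
             / (stb.arrival_time - sta.departure_time) AS speed
    FROM stop_times sta
    JOIN stop_times stb
      ON stb.trip_id = sta.trip_id
     AND stb.stop_sequence = sta.stop_sequence + 1
    WHERE (stb.shape_dist_traveled - sta.shape_dist_traveled)
          / (stb.arrival_time - sta.departure_time) > ?""", (20.0,)).fetchall()
```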

Note that while shape.shape_dist_traveled always starts at 0.0, there is no guarantee that stop_times.shape_dist_traveled starts at 0.0 for a given trip (a trip can start at any point along a shape). If a trip has no shape, stop_times.shape_dist_traveled starts at 0.0, but it is safer never to rely on this. To compute the distance traveled since the start of the trip, subtract the offset of the first stop time:

distance = stop_time.shape_dist_traveled - stop_time.trip.stop_times[0].shape_dist_traveled

Consecutive stop_sequence

All stop_times.stop_sequence values are renumbered from 0 using a consecutive index (0, 1, 2, 3...). The number of stop times in a trip is therefore always equal to the last stop sequence + 1.

This allows simpler queries on hops. For example:

  • all hops between two stops (... WHERE stb.stop_sequence = sta.stop_sequence + 1)
  • the number of stops between two stop_times (... stb.stop_sequence - sta.stop_sequence)
  • trips passing by stop A then stop B (... stb.stop_sequence > sta.stop_sequence), although this would also work with non-consecutive numbering
  • the number of stop times per trip (SELECT trip.trip_id, MAX(stop_times.stop_sequence)+1 ...), although this can also be done with a simple SQL COUNT()
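Two of these queries in runnable form, on a hypothetical, minimal stop_times table (trip_id, stop_id, stop_sequence), not the actual gtfslib-python schema. T1 visits A then B, while T2 visits B then A and is therefore not matched.

```python
import sqlite3

# Hypothetical, minimal stop_times table with consecutive stop_sequence.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER)")
conn.executemany("INSERT INTO stop_times VALUES (?, ?, ?)", [
    ("T1", "A", 0), ("T1", "C", 1), ("T1", "B", 2),
    ("T2", "B", 0), ("T2", "A", 1),
])

# Trips passing by stop A then stop B, with the stop_sequence difference
# (the number of hops from A to B).
a_then_b = conn.execute("""
    SELECT sta.trip_id, stb.stop_sequence - sta.stop_sequence AS hops
    FROM stop_times sta
    JOIN stop_times stb ON stb.trip_id = sta.trip_id
    WHERE sta.stop_id = ? AND stb.stop_id = ?
      AND stb.stop_sequence > sta.stop_sequence""", ("A", "B")).fetchall()

# Number of stop times per trip, relying on consecutive numbering from 0.
stop_counts = dict(conn.execute(
    "SELECT trip_id, MAX(stop_sequence) + 1 FROM stop_times GROUP BY trip_id"))
```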

Expansion of frequencies

All frequencies are expanded into normal trips, which are flagged as such with the Boolean frequency_generated. The exact_times flag is back-ported to the trip ("standard" trips have exact_times=1, that is, they are exactly scheduled). Both the original frequency rows and the original trips are then deleted.

The ID of a frequency-expanded trip is built by appending the trip departure time to the original trip ID, for example trip42@8:30:00, trip42@8:40:00, etc. This assumes that frequency-expanded trips do not overlap for the same original trip. (Note: this should hold according to the GTFS specification, but may not if two frequency rows associated with the same trip overlap.)
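A minimal sketch of how departures and IDs could be generated for one frequency row, with times in seconds since midnight. It only produces the trip IDs and departure times; shifting each stop time of the template trip is left out. Generating trips strictly before end_time is an assumption of this sketch (check the GTFS frequencies.txt reference for the exact boundary rule), and the H:MM:SS formatting without zero-padded hours is inferred from the trip42@8:30:00 example. Function names are hypothetical.

```python
def format_time(seconds):
    """Format seconds since midnight as H:MM:SS, hours not zero-padded.

    Hours may exceed 24 for trips running after midnight, as in GTFS.
    """
    return "%d:%02d:%02d" % (seconds // 3600, seconds // 60 % 60, seconds % 60)


def expand_frequency(trip_id, start_time, end_time, headway_secs):
    """Yield (expanded_trip_id, departure_time) for one frequency row.

    Assumption: a trip departs at start_time + n * headway_secs for every
    n >= 0 such that the departure is strictly before end_time.
    """
    departure = start_time
    while departure < end_time:
        yield "%s@%s" % (trip_id, format_time(departure)), departure
        departure += headway_secs
```

With start_time 8:30:00, end_time 8:50:00 and a 10-minute headway, this yields trip42@8:30:00 and trip42@8:40:00. Two overlapping frequency rows for the same trip could produce the same departure, and hence a duplicate ID, which is the caveat noted above.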

Frequencies diagram

Proposal (not implemented): expansion of transfers

In GTFS, a transfer defined for a station applies to all of the station's stops unless it is redefined for them. The proposal for GTFS' is to expand every station transfer to the station's stops, unless a transfer is already defined for those stops.

Goal: give API users easy access to transfers between stops without having to check for transfers between stations, since the check/load sequence can be rather complex (stop to stop, stop to station, station to stop, station to station...).

For example, assume we have station A with stops A1 and A2, station B with stops B1 and B2, and the following transfers:

from   to   type
A      B    0
A1     B1   3
B      A    1
B1     A    3

The transfer expansion process would produce the following:

from   to   type   source
A      B    0      Original
A1     B1   3      Original
A1     B2   0      Expanded
A2     B1   0      Expanded
A2     B2   0      Expanded
B      A    1      Original
B1     A    3      Original
B1     A1   3      Expanded
B1     A2   3      Expanded
B2     A1   1      Expanded
B2     A2   1      Expanded
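Since this proposal is not implemented, the sketch below is only one possible reading of it. It reproduces the example table's "Expanded" rows. Only stop-to-stop pairs are produced, and original stop-to-stop rules are never overridden. The precedence between stop-to-station and station-to-stop rules is not stated in the proposal; trying stop-to-station first is an assumption. The function name and input layout are hypothetical.

```python
def expand_transfers(transfers, parent):
    """Expand station-level transfers to stop-to-stop transfers.

    transfers: {(from_id, to_id): transfer_type} of the original rules,
               where IDs may be stops or stations.
    parent: {stop_id: parent_station_id or None} for every stop.
    Returns {(from_stop, to_stop): transfer_type} for expanded pairs only.
    """
    expanded = {}
    for from_stop, from_station in parent.items():
        for to_stop, to_station in parent.items():
            if (from_stop, to_stop) in transfers:
                continue  # already defined stop to stop: keep the original
            # Most specific rule first (this precedence order is assumed).
            for key in ((from_stop, to_station),
                        (from_station, to_stop),
                        (from_station, to_station)):
                if key in transfers:
                    expanded[(from_stop, to_stop)] = transfers[key]
                    break
    return expanded
```

Applied to the example (transfers A->B 0, A1->B1 3, B->A 1, B1->A 3), it returns exactly the seven rows marked Expanded in the table.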