
2015 Google Summer of Code Ideas

Sean Barbeau edited this page Mar 31, 2015 · 31 revisions

This page hosts ideas for the 2015 Google Summer of Code program, as discussed on the OTP mailing list.

GTFS-realtime validation tool

GTFS and GTFS-realtime (GTFS-rt) have become dominant formats for open data in the transit industry. While the GTFS format has a GTFS Feed Validator, no such open-source tool currently exists for GTFS-rt. In addition, because of how the GTFS-rt specification is defined, the spec itself does not provide strong guidance on which data fields should be populated for particular use cases of the data.

This project would implement a tool that takes a GTFS feed and a GTFS-rt feed as input, and examines the GTFS-rt feed to determine if it properly meets GTFS-rt best practices, both those explicitly listed in the spec and those based on the needs of consuming applications. The validator should be able to evaluate feeds implemented both via websockets (see this post for details) and as protobuf-encoded responses/files that are retrieved via a polling mechanism. Our preference is that the tool be implemented in Java or Python.

Sample execution steps:

  1. Load the corresponding GTFS (i.e., schedule data)
  2. Monitor the GTFS-rt feed over a certain amount of time (potentially days or more if needed)
  3. Log any warnings and failures in an easy-to-read format

Some errors will be instantly visible (e.g., timestamps in the wrong format), while other, more subtle issues may only surface after longer monitoring. The validator should write to its log while it is running, so that the output can be monitored before the run finishes when the testing period is extended.
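As a rough illustration of steps 2 and 3, here is a minimal Python sketch of a polling loop that logs each finding as soon as it is detected, rather than at the end of the run. Everything in it is an assumption made for illustration: the `monitor` function, the `(severity, message)` finding tuples, and the idea that each rule is a plain callable are not part of any existing tool, and `fetch_feed` stands in for whatever code would download and decode the GTFS-rt feed.

```python
import logging
import time
from typing import Callable, Iterable, List, Tuple

# Hypothetical finding type: (severity, message), severity is "warning" or "failure".
Finding = Tuple[str, str]


def monitor(fetch_feed: Callable[[], object],
            rules: Iterable[Callable[[object], List[Finding]]],
            polls: int,
            interval_seconds: float,
            logger: logging.Logger) -> int:
    """Poll the feed `polls` times and log every finding immediately.

    Returns the total number of failures seen across all polls.
    """
    rules = list(rules)  # allow the rules to be re-used on every poll
    failures = 0
    for poll in range(polls):
        feed = fetch_feed()  # assumed to return one decoded feed message
        for rule in rules:
            for severity, message in rule(feed):
                if severity == "failure":
                    failures += 1
                    logger.error("poll %d: %s", poll, message)
                else:
                    logger.warning("poll %d: %s", poll, message)
        if poll < polls - 1:
            time.sleep(interval_seconds)  # wait before the next poll
    return failures
```

Because findings go through the standard `logging` module, the log file can be followed while a multi-day run is still in progress.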

Sample rules to validate include:

  • stop_time_updates for a given trip_id must be sorted by increasing stop_sequence (this should always be enforced whether or not the feed contains the stop_sequence field). A TripUpdate can have multiple stop_time_updates (e.g., one prediction per stop), so this should be checked within a single feed message, not across multiple feed messages. (failure)
  • Frequency-based trip_updates must contain trip_id and start_time (failure)
  • Timestamps should be populated for all elements (warning)
  • All timestamps must be in POSIX time (i.e., number of seconds since January 1st 1970 00:00:00 UTC) (failure)
  • vehicle_id should be populated in trip_update (warning)
  • If both vehicle positions and trip updates are provided, VehicleDescriptor or TripDescriptor values should match between the two feeds (warning). Note that in a recent proposal, it was suggested to allow both object types in the same GTFS-rt feed endpoint. We should also clarify if this is officially allowed now, and if so suggest that developers implement both objects in a single feed.
  • If only delay is provided in a stop_time_update (and not a time), then the GTFS stop_times.txt must contain arrival_times and/or departure_times for all stops referenced in the GTFS-rt feed (i.e., not just timepoints) (failure)
  • All trip_ids provided in the GTFS-rt feed must appear in the GTFS data (failure)
  • All route_ids provided in the GTFS-rt feed must appear in the GTFS data (failure)
  • All units and ranges for all fields such as speed, latitude/longitude, bearing, etc. should be validated. For example, latitude has a valid range of [-90, 90] and longitude has a valid range of [-180, 180]; any other values should generate a failure. A latitude/longitude of (0,0) should be flagged as a warning because it is a common error, and locations that fall outside of a "reasonable" (to be determined) bounding box surrounding the agency should also generate warnings. As another example, a speed value of 55 should generate a warning: GTFS-rt requires meters/sec as the unit, and 55 meters/sec is about 123 miles/hour, which is an unlikely speed for a bus. (warning/failure)
  • Some rules for GTFS schedule validation that are currently classified as warnings should be added to this tool and classified as failures, because they will break real-time data propagation downstream in trips. Examples include “Too Fast Travel” and “Too Many Consecutive Stop Times With Same Time”. (failure)
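A few of the rules above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names, the `(severity, message)` tuples, and the use of plain Python values instead of decoded protobuf messages are all hypothetical. The 40 meters/sec speed threshold and the upper bound used to spot non-POSIX timestamps (for example, milliseconds) are placeholder values, and the ordering check assumes stop_sequence values have already been resolved (from the feed or from GTFS stop_times.txt) and must be strictly increasing.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical finding type: (severity, message).
Finding = Tuple[str, str]


def check_stop_sequence_order(trip_id: str, stop_sequences: List[int]) -> List[Finding]:
    """Within one TripUpdate, stop_sequence values must be increasing."""
    for prev, cur in zip(stop_sequences, stop_sequences[1:]):
        if cur <= prev:
            return [("failure", f"trip {trip_id}: stop_sequence {cur} follows {prev}")]
    return []


def check_posix_timestamp(value: int, max_seconds: int = 10**11) -> List[Finding]:
    """Flag values that cannot plausibly be seconds since 1970 (assumed bound)."""
    if value < 0 or value > max_seconds:
        return [("failure", f"timestamp {value} does not look like POSIX seconds")]
    return []


def check_position(lat: float, lon: float,
                   speed_mps: Optional[float] = None,
                   max_plausible_speed_mps: float = 40.0) -> List[Finding]:
    """Range checks for a vehicle position; the speed threshold is an assumption."""
    findings: List[Finding] = []
    if not -90.0 <= lat <= 90.0:
        findings.append(("failure", f"latitude {lat} outside [-90, 90]"))
    if not -180.0 <= lon <= 180.0:
        findings.append(("failure", f"longitude {lon} outside [-180, 180]"))
    if lat == 0.0 and lon == 0.0:
        findings.append(("warning", "position is (0, 0)"))
    if speed_mps is not None and speed_mps > max_plausible_speed_mps:
        findings.append(("warning", f"speed {speed_mps} m/s is implausibly high"))
    return findings


def check_trip_ids(feed_trip_ids: Iterable[str], gtfs_trip_ids: Set[str]) -> List[Finding]:
    """Every trip_id in the GTFS-rt feed must exist in the GTFS data."""
    return [("failure", f"trip_id {t} not in GTFS")
            for t in feed_trip_ids if t not in gtfs_trip_ids]
```

Returning a list of findings from every check, instead of raising on the first problem, keeps warnings and failures uniform so a single run can report all of them.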

See the GTFS-realtime mailing list for an extensive discussion.

Sample GTFS-rt feeds:

Possible projects that could be leveraged for this:

Sample open-source GTFS-rt feed producer projects (to better understand how GTFS-rt is produced):

OpenTripPlanner for Android

Many new enhancements could be implemented for the OpenTripPlanner for Android open-source app, including:

  • Implement real-time turn-by-turn navigation for transit, as discussed in issue 297
  • Add support for the Open Source Routing Machine (OSRM) as a new trip planning engine, as discussed in issue 409
  • Add weather information for destination, as discussed in issue 448
  • Update the app to the new Android Material Design concepts, as discussed in issue 449
  • Update the app to work with the newest OTP server master branch REST API (I'm not sure if this is currently working or broken)
  • Integrate with OneBusAway Android for better shortcuts to real-time transit information, as discussed in issue 21-comment