Project: GTFS feed creation

Nikhil VJ edited this page Nov 28, 2017 · 17 revisions

Aim of this project:

Create a program that automates generating a static GTFS feed for a public transit system. The input is the set of regular daily routes of a simple public transit system, as seen in Indian cities like Pune.

Skills needed :

Python, Java, PHP/MySQL, JavaScript or a similar programming language; reading and writing spreadsheets/CSV; an understanding of static GTFS

Project Status

Template mentioned in step 2 is prepared, with some starter data to work with. The stops and routes info mentioned in step 3 are yet to be prepared: they will be the outputs of the stops and routes projects. All programming is still to be done, and even the framework/language to use isn't fixed yet. This project is OPEN to join in / take up.


  • Lead: Nikhil VJ, nikhil.js (at), +919665831250
  • Gaurav Sitlani



1] GTFS reference (static):

Keep this page open :

2] Template

I (Nikhil) have prepared a "template for gtfs conversion" spreadsheet here:

Download a copy and open it up. Check out the worksheets in it.

3] Our internal database:

stops.txt : the stops database, which also serves as the stops.txt file in the GTFS feed.

routes-db : the routes database, which holds each route's name and timing information keyed to its unique id. Timing is given either as a list of timings (pipe-separated), or as first trip, frequency, last trip. (I have kept one route with timings and the other with a frequency.)
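As a rough sketch of how a program might read the two timing forms, the function below returns either a list of start times or a frequency description. The parameter names, the `|` separator, the `HH:MM` time format and the assumption that frequency is in minutes are all illustrative guesses, not the template's actual column layout; adjust them to match the real routes-db sheet.

```python
def parse_timing(timings="", first_trip="", frequency_mins="", last_trip=""):
    """Describe how one route is scheduled.

    All parameter names are hypothetical stand-ins for routes-db columns.
    Assumes start times are separated by "|" and frequency is in minutes.
    """
    # Split the timings cell and drop empty pieces (e.g. a trailing "|").
    starts = [t.strip() for t in timings.split("|") if t.strip()]
    if starts:
        return {"mode": "timings", "start_times": starts}
    if first_trip and frequency_mins and last_trip:
        return {
            "mode": "frequency",
            "first_trip": first_trip.strip(),
            "last_trip": last_trip.strip(),
            # GTFS frequencies.txt expresses headway in seconds.
            "headway_secs": int(float(frequency_mins) * 60),
        }
    raise ValueError("route has neither timings nor frequency information")


# Made-up example values, one per timing form.
fixed = parse_timing(timings="06:00|07:30|09:00")
freq = parse_timing(first_trip="05:30", frequency_mins="20", last_trip="22:00")
```

A route parsed as "timings" would later expand into one trip per start time, while a "frequency" route would get a single trip per direction plus a frequencies.txt entry.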

sequence-db : the sequence of stops in each route, using the unique ids of both the route and the stops. Up and down directions are defined separately.

These would get data filled in through the other projects on stops and routes data.

4] GTFS tables/files:

From the above three tables (which would be our internal db), the GTFS feed is created. The feed consists of a set of CSV files with a .txt extension.

  • stops.txt is as-is
  • routes.txt - each route's id and name
  • trips.txt - if a route has multiple timings instead of a frequency, it expands into one trip per timing here; otherwise it gets one trip. Oh, and the reverse direction gets separate trips.
  • frequencies.txt - if a route operates on a frequency, then it's here
  • stop_times.txt - where a trip expands into sequence of stops. This is where the main computation takes place.
  • calendar.txt - static for our purposes.
  • agency.txt - static for our purposes.
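Since each of these files is plain CSV, writing one out can be sketched with Python's standard `csv` module. The example below builds a minimal routes.txt in memory. The column names are standard GTFS fields (`route_type` 3 means bus), but the helper name `write_gtfs_table` and the sample row are made up for illustration.

```python
import csv
import io


def write_gtfs_table(out, fieldnames, rows):
    """Write a list of dict rows as CSV with a header line (hypothetical helper)."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Illustrative data only; real ids and names would come from routes-db.
routes = [
    {
        "route_id": "R1",
        "route_short_name": "1",
        "route_long_name": "Example Route One",
        "route_type": 3,  # 3 = bus in the GTFS reference
    },
]

buffer = io.StringIO()
write_gtfs_table(
    buffer,
    ["route_id", "route_short_name", "route_long_name", "route_type"],
    routes,
)
routes_txt = buffer.getvalue()
```

To write to disk instead of memory, the same helper can be passed a file opened with `open("routes.txt", "w", newline="", encoding="utf-8")`.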

Please study the GTFS reference site for knowing more about these files/tables.

5] Programming

After understanding the above (gulp!), program it so that, given an input with stops.txt, routes-db and sequence-db filled in, the program generates the remaining sheets. (They need not be sheets in an Excel file; that was just for my convenience. Each of these sheets will be output as a text file in CSV format.)

Here's a page on understanding the GTFS format, by studying a real GTFS data snippet.

Basic Algorithm:

  1. Pick one route from routes-db sheet.
  2. Load timings values from routes-db sheet for that route.
  3. From sequence-db sheet, load stopcode sequences for that route.
  4. Create entry in routes.txt
  5. Based on timings values, calculate number of trips to provision.
     5.1. If timing is in frequency format, then just one trip per direction.
     5.2. Else as many trips as starting times in either direction.
  6. Create trip entries for chosen route in trips.txt.
  7. If frequency-based route, create entry in frequencies.txt.
  8. For each trip, the sequence of stops is to be defined in stop_times.txt, i.e. if there are 30 stops then there are 30 rows for that trip, with stopcodes.
  9. Timing values are counted up from 00:00 hrs at the starting stop in case it's a frequency-based route, or from the given start times in case it's a fixed-timings route.
  10. Estimate the timings of subsequent stops by choosing some methodology. Some options:
     10.1. Assume some time interval between each stop, like 3 mins for example.
     10.2. Calculate distance between stops using lat-long values and assume some average speed of buses.
  11. Remember to set timepoint as 0 to indicate that time values are approximate and not exact.
  12. Repeat for all trips under the route.
  13. Repeat for all routes.
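The stop_times part of the algorithm above (steps 8 to 11) could be sketched roughly as follows. This is only one possible reading: the function names, the `(stop_id, lat, lon)` tuple layout and the sample coordinates are assumptions for illustration. It supports both estimation options: a fixed gap between stops, or great-circle (haversine) distance divided by an assumed average speed.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat-long points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def to_seconds(hhmm):
    """Convert "HH:MM" (or "HH:MM:SS", seconds ignored) to seconds."""
    h, m = hhmm.split(":")[:2]
    return int(h) * 3600 + int(m) * 60


def to_gtfs_time(seconds):
    """Format seconds as HH:MM:SS; GTFS allows hours beyond 24."""
    h, rem = divmod(int(round(seconds)), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def build_stop_times(trip_id, stops, start="00:00", gap_mins=3, speed_kmph=None):
    """Build stop_times rows for one trip.

    stops: list of (stop_id, lat, lon) in travel order (hypothetical layout).
    If speed_kmph is given, use distance / speed; else a fixed gap per stop.
    """
    rows = []
    clock = to_seconds(start)
    for seq, (stop_id, lat, lon) in enumerate(stops, start=1):
        if seq > 1:
            if speed_kmph:
                _, prev_lat, prev_lon = stops[seq - 2]
                clock += haversine_km(prev_lat, prev_lon, lat, lon) / speed_kmph * 3600
            else:
                clock += gap_mins * 60
        t = to_gtfs_time(clock)
        rows.append({
            "trip_id": trip_id,
            "arrival_time": t,
            "departure_time": t,
            "stop_id": stop_id,
            "stop_sequence": seq,
            "timepoint": 0,  # 0 = times are approximate
        })
    return rows


# Made-up stops and coordinates, not real PMPML data.
demo_stops = [("A", 18.52, 73.85), ("B", 18.53, 73.86), ("C", 18.54, 73.87)]
fixed_rows = build_stop_times("T1", demo_stops, start="06:00")
speed_rows = build_stop_times("T2", demo_stops, start="06:00", speed_kmph=20)
```

For a frequency-based route, `start` would stay at its default of "00:00". For a fixed-timings route, the function would be called once per start time, with a distinct `trip_id` each time.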

6] Clarification

As initial dummy data I've copied two routes from the PMPML database (also attached), so the ids are from there. Please ignore the -D suffix, since in GTFS one route has trips defined under it going up or down. For the sake of simplicity I have not defined the return (up) journeys here yet. We'll probably have to add columns in the routes-db sheet to define timings for return trips.

7] Bigger Picture:

This task / project ties in to a long-term process of improving PMPML through increased transparency and systematization. The global standard data format for public transit is GTFS, which is used by Google Transit and most transit-related apps. It critically needs a stop-centric database and routes info laid out in a systematic way, and from there we need a program that churns through this data and generates GTFS.

8] Open for design inputs

If you feel that the internal db can be structured in a better way, please put the better structure forward and let's adapt to that. This has to be done in consultation with the other projects if they've started, since their output is the internal db.

Taking this forward

A follow-on project would be creating a GUI system that operates on the three internal db tables. It would let a user change the route info comfortably, and the program would then take the updated internal db and generate a fresh GTFS feed. Here is an album of some mockups created by Nikhil, and here is a full presentation on it he had made earlier, before the Pune Open Data Portal had released this bus data.
