This is a demo application that has a Python back end and JavaScript / Leaflet maps front end. It uses GTFS (General Transit Feed Specification) and GTFS-RT (the extra realtime feeds for GTFS) to store and analyze transit system route, trip, stop and vehicle movement data in CrateDB.
GTFS and GTFS-RT are standard ways of representing this type of data, so in theory this project could be applied to any transit system that adopts them. However, there can be differences between transit agencies, so some aspects of the project may need adapting for a particular agency.
We have developed this demo using GTFS and GTFS-RT data from the Washington Metropolitan Area Transit Authority (WMATA), specifically for the DC Metro train system. The design of the database schema allows for data from multiple agencies / transit systems to be stored as long as each agency has a unique agency ID.
Here's a sped up demo of the front end running, showing train movements on the DC Metro system:
Individual trains can be tracked by clicking on them, which displays information about the train's current trip in a popup:
To run this project you'll need to install the following software:
- Python 3 (download) - we've tested this project with Python 3.12.2 on macOS Sequoia.
- Git command line tools (download).
- Your favorite code editor, to edit configuration files and browse/edit the code if you wish. Visual Studio Code is great for this.
- Access to a cloud or local CrateDB cluster (see below for details).
- A WMATA API key. These are free, and you can register for API access and get your key at the WMATA developer portal.
Next you'll need to get a copy of the code from GitHub by cloning the repository. Open up your terminal and change directory to wherever you store coding projects, then enter the following commands:
git clone https://github.com/crate/devrel-gtfs-transit.git
cd devrel-gtfs-transit
You'll need a CrateDB database to store the project's data in. Choose between a free hosted instance in the cloud, or run the database locally. Either option is fine.
Create a database in the cloud by first pointing your browser at console.cratedb.cloud.
Login or create an account, then follow the prompts to create a "CRFREE" database on shared infrastructure in the cloud of your choice (choose from Amazon AWS, Microsoft Azure and Google Cloud). Pick a region close to where you live to minimize latency between your machine running the code and the database that stores the data.
Once you've created your cluster, you'll see a "Download" button. This downloads a text file containing a copy of your database hostname, port, username and password. Make sure to download these as you'll need them later and won't see them again. Your credentials will look something like this example (exact values will vary based on your choice of AWS/Google Cloud/Azure etc):
Host: some-host-name.gke1.us-central1.gcp.cratedb.net
Port (PostgreSQL): 5432
Port (HTTPS): 4200
Database: crate
Username: admin
Password: the-password-will-be-here
Wait until the cluster status shows a green status icon and "Healthy" status before continuing. Note that it may take a few moments to provision your database.
The best way to run CrateDB locally is by using Docker. We've provided a Docker Compose file for you. Once you've installed Docker Desktop, you can start the database like this:
docker compose up
Once the database is up and running, you can access the console by pointing your browser at:
http://localhost:4200
Note that if you have something else running on port 4200 (CrateDB admin UI) or port 5432 (Postgres protocol port) you'll need to stop those other services first, or edit the Docker compose file to expose these ports at different numbers on your local machine.
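If you'd rather remap the ports than stop the conflicting services, the relevant part of the Compose file might look like this sketch (the service name and image tag here are assumptions; check the provided Docker Compose file for the actual values):

```yaml
services:
  cratedb:
    image: crate:latest
    ports:
      - "4201:4200"   # admin UI on localhost:4201 instead of 4200
      - "5433:5432"   # PostgreSQL protocol on localhost:5433 instead of 5432
```

Remember that if you remap the ports, you'll need to use the new port numbers in the connection URLs described below.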
We've provided a Python data loader script that will create the database tables in CrateDB for you.
You'll first need to create a virtual environment for the data loader and configure it:
cd gtfs-static
python -m venv venv
. ./venv/bin/activate
pip install -r requirements.txt
Now make a copy of the example environment file provided:
cp env.example .env
Edit the `.env` file, changing the value of `CRATEDB_URL` to be the connection URL for your CrateDB database.
If you're running CrateDB locally (for example with the provided Docker Compose file) there's nothing to change here.
If you're running CrateDB in the cloud, change the connection URL as follows, using the values for your cloud cluster instance:
https://admin:<password>@<hostname>:4200
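The URL simply combines the credentials from the file you downloaded when creating the cluster. As a sanity check, here's a small hypothetical Python helper (not part of the project) that illustrates the shape, percent-encoding the password in case it contains special characters:

```python
from urllib.parse import quote

def cratedb_url(user: str, password: str, host: str, port: int = 4200) -> str:
    """Build an HTTPS connection URL for CrateDB from individual credentials."""
    # Percent-encode the password so characters like '@' don't break the URL.
    return f"https://{user}:{quote(password, safe='')}@{host}:{port}"

print(cratedb_url("admin", "p@ssword", "example.net"))
# → https://admin:p%40ssword@example.net:4200
```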
Save your changes.
Next, run the data loader to create the tables used by this project:
python dataloader.py createtables
You should see output similar to this:
Created agencies table if needed.
Created networks table if needed.
Created routes table if needed.
Created vehicle positions table if needed.
Created trip updates table if needed.
Created trips table if needed.
Created stops table if needed.
Created stop_times table if needed.
Created config table if needed.
Finished creating any necessary tables.
Use the CrateDB console to verify that the above named tables were all created in the `doc` schema.
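One way to check is to run a query like this in the console (CrateDB exposes `information_schema` much as PostgreSQL does):

```sql
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'doc'
ORDER BY table_name;
```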
The next step is to load static data about the transport network into the database. We'll use Washington DC (WMATA) as an example.
First, load the configuration data for the agency:
python dataloader.py config-files/wmata.json
Now, load data into the `agencies` table:
python dataloader.py data-files/wmata/agency.txt
Next, populate the `routes` table:
python dataloader.py data-files/wmata/routes.txt
Then load the `stops` table. Here, `1` is the agency ID, and must match the spelling and capitalization of the agency ID in `agency.txt`:
python dataloader.py data-files/wmata/stops.txt 1
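If you're unsure what the agency ID is, `agency.txt` is just CSV, so a few lines of Python will show you. In practice you'd open `data-files/wmata/agency.txt`; the sample row inlined below is illustrative:

```python
import csv
import io

# A representative agency.txt row (the real file may have more columns).
sample = (
    "agency_id,agency_name,agency_url,agency_timezone\n"
    "1,WMATA,https://www.wmata.com,America/New_York\n"
)

for row in csv.DictReader(io.StringIO(sample)):
    print(row["agency_id"], row["agency_name"])  # → 1 WMATA
```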
Finally, insert data into the `networks` table. Here, `WMATA` is the agency name, and must match the spelling and capitalization of the agency name in `agency.txt`:
python dataloader.py geojson/wmata/wmata.geojson WMATA
This project has a web front end and a Flask application server. The front end is written in vanilla JavaScript and uses the Bulma framework for the majority of the styling. Leaflet is used to render maps and handle map events. The Flask application uses the CrateDB Python driver to talk to the database.
Before starting the front end Flask application, you'll need to create a virtual environment and configure it:
cd front-end
python -m venv venv
. ./venv/bin/activate
pip install -r requirements.txt
Now make a copy of the example environment file provided:
cp env.example .env
Edit the `.env` file, changing the value of `CRATEDB_URL` to be the connection URL for your CrateDB database.
If you're running CrateDB locally (for example with the provided Docker Compose file) there's nothing to change here.
If you're running CrateDB in the cloud, change the connection URL as follows, using the values for your cloud cluster instance:
https://admin:<password>@<hostname>:4200
Now, edit the values of `GTFS_AGENCY_NAME` and `GTFS_AGENCY_ID` to contain the agency name and ID for the agency you're using. These should match the values returned by this query:
SELECT agency_name, agency_id FROM agencies
For example, for Washington DC / WMATA, the correct settings are:
GTFS_AGENCY_NAME=WMATA
GTFS_AGENCY_ID=1
Don't forget that if either value contains a space, you'll need to surround the entire value with quotation marks.
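For example, a hypothetical agency with a space in its name would be configured like this:

```
GTFS_AGENCY_NAME="Example City Transit"
GTFS_AGENCY_ID=2
```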
Save your changes.
Now, start the front end application:
python app.py
Using your browser, visit http://localhost:8000 to view the map front end interface.
At this point you should see the route map for the agency that you're working with, along with the stations / stops on the routes. Clicking a station or stop should show information about it.
No vehicles will be visible on the map yet. To see these, you'll need to run the real time data receiver components (see below).
When you're finished with the front end application, stop it with Ctrl-C (but keep it running for now, so you'll be able to see the real time data soon).
The real time data receivers are responsible for reading real time vehicle location and other data from the transit agencies and saving it in the database.
First, create a virtual environment and install the dependencies:
cd front-end
python -m venv venv
. ./venv/bin/activate
pip install -r requirements.txt
Now make a copy of the example environment file provided:
cp env.example .env
Edit the `.env` file, changing the value of `CRATEDB_URL` to be the connection URL for your CrateDB database.
If you're running CrateDB locally (for example with the provided Docker Compose file) there's nothing to change here.
If you're running CrateDB in the cloud, change the connection URL as follows, using the values for your cloud cluster instance:
https://admin:<password>@<hostname>:4200
Now, edit the value of `GTFS_AGENCY_ID` to contain the ID for the agency you're using. It should match the value returned by this query:
SELECT agency_id FROM agencies
For example, for Washington DC / WMATA, the correct setting is:
GTFS_AGENCY_ID=1
Set the value of `SLEEP_INTERVAL` to the number of seconds that the component sleeps between checking the transit agency for updates. This defaults to `1`, but you may need to set a longer interval if the agency you're using implements rate limiting on its API endpoints.
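Conceptually, each receiver is a polling loop along these lines (a simplified sketch with hypothetical names, not the project's actual code):

```python
import time

def poll_feed(fetch, handle, sleep_interval=1, max_iterations=None):
    """Repeatedly fetch a feed and handle the result, sleeping between polls."""
    done = 0
    while max_iterations is None or done < max_iterations:
        handle(fetch())  # e.g. parse the protobuf feed and write rows to CrateDB
        done += 1
        time.sleep(sleep_interval)

# Example with a stubbed-out feed: collect three "updates" with no sleep.
updates = []
poll_feed(lambda: "update", updates.append, sleep_interval=0, max_iterations=3)
print(updates)  # → ['update', 'update', 'update']
```

A longer `sleep_interval` simply stretches the gap between `fetch` calls, which is how you stay under an agency's rate limit.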
Next, set the value of `GTFS_POSITIONS_FEED_URL` to the realtime vehicle movements endpoint URL for your agency. For example, for Washington DC / WMATA this is https://api.wmata.com/gtfs/rail-gtfsrt-vehiclepositions.pb.
Set the value of `GTFS_TRIPS_FEED_URL` to the realtime trip updates endpoint URL for your agency. For example, for Washington DC / WMATA this is https://api.wmata.com/gtfs/rail-gtfsrt-tripupdates.pb.
Set the value of `GTFS_TRIPS_SCHEDULE_URL` to the static GTFS URL for your agency. This will be a URL that serves a zip file. For example, for Washington DC / WMATA this is https://api.wmata.com/gtfs/rail-gtfs-static.zip.
Finally, if your agency requires an API key to access realtime data, set the values of `GTFS_POSITIONS_FEED_KEY`, `GTFS_TRIPS_FEED_KEY` and `GTFS_TRIPS_SCHEDULE_KEY` appropriately. You'll most likely use the same API key for each.
Save your changes.
The schedule of trips is stored in two tables in CrateDB: `trips` and `stop_times`. Update the schedule once daily by running the following command, where `1` is the agency ID:
python trip_schedule.py 1
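Rather than remembering to run this by hand each day, you could schedule it with cron; here's a sketch (the project path and the virtual environment location are assumptions you'll need to adjust):

```
# Run the trip schedule update at 04:00 every day.
0 4 * * * cd /path/to/the/receivers/directory && ./venv/bin/python trip_schedule.py 1
```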
Start gathering real time vehicle position data continuously by running this command:
python vehicle_positions.py
You should also start continuous gathering of real time trip update data by running:
python trip_updates.py
When you're finished with the real time data receivers, stop them with Ctrl-C.
Assuming that the Flask front end web application is running, you should now see vehicle movement details at http://localhost:8000. Clicking a vehicle should display a pop up with information about the trip that the vehicle is currently on: trip ID, next stops, time estimates etc.
Once the system's been running for a while, you might want to run some queries that analyze and aggregate data. We've provided some examples in the `example_queries.md` file.
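To give a flavor of what's possible, a query along these lines would count recent position reports per route (the column names here are assumptions; check `example_queries.md` and the actual schema for the real ones):

```sql
SELECT route_id, COUNT(*) AS position_reports
FROM vehicle_positions
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY route_id
ORDER BY position_reports DESC;
```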
Getting GeoJSON from GTFS:
https://github.com/BlinkTagInc/gtfs-to-geojson
cd gtfs-static
gtfs-to-geojson --configPath ./config_wmata.json
Getting GTFS static data for WMATA rail:
wget --header="api_key: <REDACTED>" https://api.wmata.com/gtfs/rail-gtfs-static.zip