A Ruby gem for manipulating GTFS feeds as DataFrames, built on Polars (ruby-polars).
This project was created to bring the power of partridge to Ruby.
⚠️ Warning: This gem is not ready for production use. It is currently in active development and the API may change without notice.
Install the gem and add to the application's Gemfile by executing:
bundle add gtfs_df
If bundler is not being used to manage dependencies, install the gem by executing:
gem install gtfs_df
require 'gtfs_df'
# Load from a zip file
feed = GtfsDf::Reader.load_from_zip('path/to/gtfs.zip')
# Access dataframes for each GTFS file
puts feed.agency.head
puts feed.routes.head
puts feed.trips.head
puts feed.stop_times.head
puts feed.stops.head
The library supports filtering feeds by any field in any table. The filter automatically cascades through the dependency graph to ensure referential integrity.
# Filter by agency
filtered_feed = feed.filter('agency' => { 'agency_id' => 'MTA' })
# Filter by route
filtered_feed = feed.filter('routes' => { 'route_id' => ['1', '2', '3'] })
# Filter by a service
filtered_feed = feed.filter('calendar' => { 'service_id' => 'WEEKDAY' })
# Multiple filters
filtered_feed = feed.filter(
'agency' => { 'agency_id' => 'MTA' },
'routes' => { 'route_type' => 1 } # Filter to subway routes
)
When you filter by a field, the library automatically:
- Filters the specified table
- Cascades the filter to related tables by following foreign key relationships
- Keeps only the data that maintains referential integrity
For example, filtering by agency_id will automatically filter routes, trips, stop_times, and stops to only include data for that agency.
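The cascade described above can be pictured with a minimal sketch. This is not the gem's implementation: plain Ruby arrays of hashes stand in for the Polars DataFrames, and the `cascade_edges` list and `cascade_filter` helper are hypothetical names introduced only for illustration. The sketch also only cascades downward (from agency towards stops), which is an assumption made to keep it short.

```ruby
# Illustrative only: arrays of hashes stand in for the DataFrames.
feed = {
  'agency' => [{ 'agency_id' => 'MTA' }, { 'agency_id' => 'NJT' }],
  'routes' => [
    { 'route_id' => '1', 'agency_id' => 'MTA' },
    { 'route_id' => '9', 'agency_id' => 'NJT' }
  ],
  'trips' => [
    { 'trip_id' => 't1', 'route_id' => '1' },
    { 'trip_id' => 't9', 'route_id' => '9' }
  ],
  'stop_times' => [
    { 'trip_id' => 't1', 'stop_id' => 's1' },
    { 'trip_id' => 't1', 'stop_id' => 's2' },
    { 'trip_id' => 't9', 'stop_id' => 's3' }
  ],
  'stops' => [{ 'stop_id' => 's1' }, { 'stop_id' => 's2' }, { 'stop_id' => 's3' }]
}

# Hypothetical edge list: [parent table, child table, shared key column].
def cascade_edges
  [
    %w[agency routes agency_id],
    %w[routes trips route_id],
    %w[trips stop_times trip_id],
    %w[stop_times stops stop_id]
  ]
end

# Filter one table, then prune each child table so that it only references
# keys still present in its parent. The input feed is left untouched.
def cascade_filter(feed, table, column, values)
  values = Array(values)
  result = feed.transform_values(&:dup)
  result[table] = result[table].select { |row| values.include?(row[column]) }

  cascade_edges.each do |parent, child, key|
    kept = result[parent].map { |row| row[key] }.uniq
    result[child] = result[child].select { |row| kept.include?(row[key]) }
  end
  result
end

filtered = cascade_filter(feed, 'agency', 'agency_id', 'MTA')
puts filtered['stops'].map { |row| row['stop_id'] }.inspect
```

Walking the edges in dependency order is what lets a single filter on agency reach stops, even though stops has no agency_id column of its own.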
# Write to a new zip file
GtfsDf::Writer.write_to_zip(filtered_feed, 'output/filtered_gtfs.zip')
See examples/split-by-agency for a complete example that splits a multi-agency GTFS feed into separate files per agency.
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.
- Time parsing
Just like partridge, we should parse Time as seconds since midnight. There's a draft in
lib/gtfs_df/utils.rb, but it's not used anywhere. I haven't figured out how to implement it properly with Polars.
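For reference, the conversion itself can be sketched in plain Ruby. This is not the draft in lib/gtfs_df/utils.rb, and it does not address the Polars integration; `gtfs_time_to_seconds` is a hypothetical helper name. It assumes GTFS times arrive as "HH:MM:SS" or "H:MM:SS" strings, where the hour may exceed 23 for trips that run past midnight of the service day.

```ruby
# Hypothetical helper, not part of gtfs_df: converts a GTFS time string to
# seconds since midnight. Returns nil for missing or blank values.
def gtfs_time_to_seconds(value)
  return nil if value.nil? || value.strip.empty?

  # Hours are not capped at 23, since GTFS allows values such as "25:10:05".
  match = /\A(\d{1,3}):([0-5]\d):([0-5]\d)\z/.match(value.strip)
  raise ArgumentError, "invalid GTFS time: #{value.inspect}" unless match

  hours, minutes, seconds = match.captures.map(&:to_i)
  hours * 3600 + minutes * 60 + seconds
end

puts gtfs_time_to_seconds('08:30:00') # => 30600
puts gtfs_time_to_seconds('25:10:05') # => 90605
```

Storing the result as an integer column keeps times past midnight comparable and sortable, which a clock-time type would not allow.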
Bug reports and pull requests are welcome on GitHub at https://github.com/davidmh/ruby-gtfs_df.
The gem is available as open source under the terms of the MIT License.