Research Request - GTFS Digest Metrics Shakedown #1059

@tiffanychu90

Complete the sections below when you receive a research request, and keep adding to this issue as you receive more details and produce deliverables. Be sure to also add the appropriate project-level label to this issue (e.g., gtfs-rt, DLA).

Research Question

Single sentence description: The draft GTFS digest is up; now decide which metrics are worth charting as time series and which are better captured in tables (mostly static, with little change over time).

Tables

Grain

  • route-direction
  • in the digest, use organization_source_record_id and organization_name rather than schedule_gtfs_dataset_key or name (schedule).
  • organize by district
  • display a meaningful route name; charts use this standardized route_name, not route_id:
    • run route_id through a script that standardizes / parses it, so we can identify the same route over time (route_id2).
    • further clean up by operator so that the combination of route_short_name and route_long_name has no redundancy (1 Route 1 -> Route 1)
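
The route-name cleanup in the last bullet could be sketched roughly as follows. The function name `clean_route_name` and the rule it applies (drop the short name when the long name already contains it as a whole token) are assumptions for illustration, not the digest's actual script.

```python
import re


def clean_route_name(route_short_name, route_long_name):
    """Combine short and long route names without repeating the short name.

    Hypothetical helper: the real per-operator cleanup may need more rules.
    """
    short = (route_short_name or "").strip()
    long_name = (route_long_name or "").strip()

    if not short:
        return long_name
    if not long_name:
        return short

    # Match the short name only as a whole token, so "1" is found in
    # "Route 1" but not inside "Route 10".
    pattern = r"(?<!\w)" + re.escape(short) + r"(?!\w)"
    if re.search(pattern, long_name, flags=re.IGNORECASE):
        return long_name

    return f"{short} {long_name}"


print(clean_route_name("1", "Route 1"))
print(clean_route_name("1", "Route 10"))
```

The whole-token match is the main design choice here: a plain substring check would wrongly treat "1" as redundant with "Route 10".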

Operator Stats ("digest/operator_profiles" and "digest/operator_routes")

  • monthly scheduled service hours by day of week (day_type, time_of_day)
  • number of routes, service area (length in miles, only count each route-direction once)
  • number of stops served, total stop arrivals, arrivals per stop
  • route typology breakdown

Schedule ("digest/operator_schedule_rt_category"?)

  • avg scheduled service minutes
  • avg stop distance (meters) -> change to per mile?
  • all_day / peak / offpeak scheduled n_trips
  • all_day / peak / offpeak scheduled frequency
  • monthly scheduled service hours by day of week (day_type) and time_of_day

Speeds ("digest/segment_speeds")

  • summary speeds for all_day / peak / offpeak

RT vs Schedule

  • trips breakdown by sched_vp_category (10 trips schedule only, 100 trips schedule_and_vp, 5 trips vp only)
    • .drop_duplicates() but keep some dates where this distribution changes
  • route breakdown by sched_vp_category ("digest/schedule_vp_metrics"): don't do this, because a route can have different numbers of n_scheduled_trips and n_vp_trips, so we could assign the route to the wrong category.
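
One way to read "drop duplicates but keep some dates where this distribution changes" is to keep a date only when its trip counts differ from the previous date for the same operator. The sketch below shows that idea in plain Python; the function name and the column names `service_date`, `schedule_only`, `schedule_and_vp`, and `vp_only` are assumptions, and the sample counts are made up.

```python
def keep_distribution_changes(rows, group_col, count_cols, date_col="service_date"):
    """Keep the first row per group plus every row whose counts differ
    from the previous date in that group."""
    kept = []
    last_seen = {}  # group value -> counts on the most recent date seen

    for row in sorted(rows, key=lambda r: (r[group_col], r[date_col])):
        counts = tuple(row[c] for c in count_cols)
        if last_seen.get(row[group_col]) != counts:
            kept.append(row)
        last_seen[row[group_col]] = counts

    return kept


# Hypothetical sample: ISO date strings sort chronologically.
sample = [
    {"organization_name": "A", "service_date": "2024-01-01",
     "schedule_only": 10, "schedule_and_vp": 100, "vp_only": 5},
    {"organization_name": "A", "service_date": "2024-02-01",
     "schedule_only": 10, "schedule_and_vp": 100, "vp_only": 5},
    {"organization_name": "A", "service_date": "2024-03-01",
     "schedule_only": 8, "schedule_and_vp": 102, "vp_only": 5},
    {"organization_name": "A", "service_date": "2024-04-01",
     "schedule_only": 10, "schedule_and_vp": 100, "vp_only": 5},
]

kept = keep_distribution_changes(
    sample, "organization_name",
    ["schedule_only", "schedule_and_vp", "vp_only"],
)
print([r["service_date"] for r in kept])
```

Comparing against the previous date, rather than dropping duplicates on the count columns alone, keeps the April row: the distribution returns to its January values, which is still a change worth showing.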

Charts

Schedule

  • monthly scheduled service hours by day of week (day_type) and time_of_day
  • number of trips by sched_vp_category?

Speeds

  • summary speeds for all_day / peak / offpeak

RT vs Schedule

  • filter to sched_vp_category = "schedule_and_vp"
  • vp per minute (goal line = 2)
  • % vp in scheduled shape (goal line = 100%) - use all_day
  • % RT journey with 1+/2+ vp (goal line = 100%) - use all_day, one chart shared for 1+ and 2+
  • % schedule journey with 1+/2+ vp (goal line = 100%) - use all_day, one chart shared for 1+ and 2+
  • comparison of scheduled to RT trip time (aggregated to route-direction)
    • breakdown of n_trips: early (5+ min early) / on-time (within 5 min early or late) / late (5+ min late)
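
The early / on-time / late breakdown in the last bullet could be computed along these lines. This is a sketch under assumptions: it compares trip durations in minutes, treats an RT trip that is 5+ minutes shorter than scheduled as early, and uses hypothetical field names (`route_name`, `direction_id`, `scheduled_minutes`, `rt_minutes`).

```python
from collections import Counter


def timeliness_category(scheduled_minutes, rt_minutes, threshold=5):
    """Classify one trip by the difference between RT and scheduled trip time."""
    diff = rt_minutes - scheduled_minutes
    if diff <= -threshold:
        return "early"    # 5+ min early
    if diff >= threshold:
        return "late"     # 5+ min late
    return "on_time"      # within 5 min early or late


def count_by_route_direction(trips):
    """Aggregate n_trips per category to the route-direction grain."""
    counts = {}
    for trip in trips:
        key = (trip["route_name"], trip["direction_id"])
        category = timeliness_category(trip["scheduled_minutes"], trip["rt_minutes"])
        counts.setdefault(key, Counter())[category] += 1
    return counts


# Made-up trips for illustration.
trips = [
    {"route_name": "Route 1", "direction_id": 0, "scheduled_minutes": 30, "rt_minutes": 24},
    {"route_name": "Route 1", "direction_id": 0, "scheduled_minutes": 30, "rt_minutes": 33},
    {"route_name": "Route 1", "direction_id": 0, "scheduled_minutes": 30, "rt_minutes": 35},
    {"route_name": "Route 2", "direction_id": 1, "scheduled_minutes": 20, "rt_minutes": 20},
]
breakdown = count_by_route_direction(trips)
print(breakdown)
```

A trip exactly 5 minutes off is counted as early or late here, following the "5+ min" wording above.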

Maps

Deliverables

  • Parameterized notebook with supporting scripts (charting, report styling, or table aggregations).
    • Plug-and-play approach. How the metrics are displayed and organized may get reshuffled, so set up a function for making each chart and then wrap them up for display.
  • If we want a dropdown by route, would altair's extract data help with displaying tables? Right now, tables are not filtered interactively, but that is desired.
  • Try out great_tables, which could help with displaying tables. Their nanoplots look cool, but they only take polars dfs, not pandas, and the df might have to be wide?
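
The plug-and-play idea above could be organized as one function per chart plus an ordered list that controls display, so reshuffling the report means editing only the list. In this sketch the chart functions return plain dicts as stand-ins for real chart objects, and every name (`service_hours_chart`, `speed_chart`, `SECTIONS`, `build_report`) is hypothetical.

```python
def service_hours_chart(rows):
    """Stand-in for a function that would build the service hours chart."""
    return {"title": "Scheduled service hours", "n_rows": len(rows)}


def speed_chart(rows):
    """Stand-in for a function that would build the summary speeds chart."""
    return {"title": "Summary speeds", "n_rows": len(rows)}


# Display order lives in one place; reorder or drop entries to reshuffle.
SECTIONS = [
    ("Schedule", service_hours_chart),
    ("Speeds", speed_chart),
]


def build_report(rows, sections=SECTIONS):
    """Call each chart function in order and pair it with its heading."""
    return [(heading, make_chart(rows)) for heading, make_chart in sections]


report = build_report([{"route_name": "Route 1"}, {"route_name": "Route 2"}])
for heading, chart in report:
    print(heading, chart["title"])
```

In a parameterized notebook, the final loop would display each chart under its heading instead of printing.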

Data Sources

Metadata

Labels

gtfs-rt, research request

Status

Done