Closed
Labels
gtfs-rt (Work related to GTFS-Realtime), research request (Issues that serve as a request for research: summary and handoff)
Description
Complete the sections below when receiving a research request, and continue to add to this issue as you receive additional details and produce deliverables. Be sure to also add the appropriate project-level label to this issue (e.g., gtfs-rt, DLA).
Research Question
Single sentence description: We have various aggregation levels that need to be harmonized across exports for easy joining in time-series visualizations.
- by day (single day vs multiple days)
- geometry (segment vs line geometry)
- by time (offpeak, peak, all day, time-of-day bins)
Certain columns do not enable aggregation and others are needed if we want to aggregate beyond a single day.
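As a rough illustration of why some columns are needed for aggregation, here is a minimal, dependency-free sketch that collapses time-of-day bins into peak, offpeak, and all-day groups and takes a trip-weighted mean speed. The bin labels (`am_peak`, `midday`, `pm_peak`) and the `speed_mph` and `n_trips` columns are assumptions made for the example, not names confirmed by the exports; the point is only that a count column has to be carried along if averages are to be recombined across bins or days.

```python
from collections import defaultdict

# Hypothetical time-of-day bin labels; the real exports may name these differently.
PEAK_BINS = {"am_peak", "pm_peak"}


def time_period(time_of_day):
    """Collapse a time-of-day bin into 'peak' or 'offpeak'."""
    return "peak" if time_of_day in PEAK_BINS else "offpeak"


def aggregate_speeds(rows):
    """Trip-weighted mean speed per (route_id, direction_id, stop_pair, time_period).

    Every row counts toward its peak/offpeak group and toward 'all_day'.
    """
    # key -> [sum of speed * n_trips, sum of n_trips]
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        segment = (row["route_id"], row["direction_id"], row["stop_pair"])
        for period in (time_period(row["time_of_day"]), "all_day"):
            acc = totals[segment + (period,)]
            acc[0] += row["speed_mph"] * row["n_trips"]
            acc[1] += row["n_trips"]
    return {key: weighted / n for key, (weighted, n) in totals.items() if n > 0}


# Made-up rows for illustration only.
rows = [
    {"route_id": "1", "direction_id": 0, "stop_pair": "A__B",
     "time_of_day": "am_peak", "speed_mph": 10.0, "n_trips": 30},
    {"route_id": "1", "direction_id": 0, "stop_pair": "A__B",
     "time_of_day": "midday", "speed_mph": 14.0, "n_trips": 10},
    {"route_id": "1", "direction_id": 0, "stop_pair": "A__B",
     "time_of_day": "pm_peak", "speed_mph": 8.0, "n_trips": 20},
]

result = aggregate_speeds(rows)
for key, speed in sorted(result.items()):
    print(key, round(speed, 2))
```

Averaging the already-averaged speeds without `n_trips` would weight a 10-trip bin the same as a 30-trip bin, which is one reason a plain mean column on its own does not support further aggregation.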
Detailed description:
- Time harmonization --> add offpeak so we publish 3 time periods: peak, offpeak, and all-day speeds
- Shapes are not easily used across months. Moving away from them means using `route_id`, `direction_id`, and `stop_pair` earlier...but where?
  - Exploratory work taking Apr and Oct 2023 dates shows that Big Blue Bus had very few joins on shape-stop_sequence, meaning we would not be able to track speeds over time in that segment.
  - Produce shape-stop values on a single day, but lose the multi day?
- Open data publishing: we ditch the internal keys (`shape_array_key`, `gtfs_dataset_key`) in favor of natural identifiers (`shape_id`, `route_id`) and a stable agency identifier (`organization_source_record_id`).
  - Clean up redundancies: we save out 2 versions of the export (with and without internal keys)...redundant and confusing.
- Goal: We need to be able to key into our exports whenever we get user feedback, but let's add the natural identifiers a lot earlier, and carry them through the aggregations. We can probably drop the internal keys in the step right before we zip up the shapefile.
- This should solve the fact that some exports use `schedule_gtfs_dataset_key` while others use `organization_source_record_id`, and we're redundantly querying several tables in `mart_transit_database` to get the crosswalk we want. If it's not solved, we need to save out a crosswalk we can use across `rt_vs_schedule` and `speeds` to better merge the dfs prior to visualizing.
- New folder structure for these exports so it's clear whether we're grabbing a single day or multiple days, and which unit of analysis (segment or line).
- Set up a `catalog.yml` so we can easily find the paths for all these aggregated exports.
- Potentially need to refactor `segment_speed_utils`. If we want to enable certain averages to route-direction, we need to move some functions here out of `rt_segment_speeds/scripts/` so they can be used elsewhere.
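The identifier handling described in the list above could look roughly like the following sketch: attach the stable agency identifier early through a saved crosswalk, carry it through the aggregations, and drop the internal keys only in the last step before publishing. The function names, the crosswalk shape (a plain dict), and all values are hypothetical; in practice this would likely be a dataframe merge rather than dict lookups.

```python
# Internal keys named in this issue; dropped only at the final export step.
INTERNAL_KEYS = {"shape_array_key", "gtfs_dataset_key", "schedule_gtfs_dataset_key"}


def add_natural_identifiers(rows, crosswalk):
    """Attach organization_source_record_id using schedule_gtfs_dataset_key.

    `crosswalk` maps schedule_gtfs_dataset_key -> organization_source_record_id.
    Rows without a match get None so they can be flagged instead of silently dropped.
    """
    return [
        {**row,
         "organization_source_record_id": crosswalk.get(row["schedule_gtfs_dataset_key"])}
        for row in rows
    ]


def drop_internal_keys(rows):
    """Remove internal keys right before the export is written out."""
    return [
        {col: value for col, value in row.items() if col not in INTERNAL_KEYS}
        for row in rows
    ]


# Made-up values for illustration only.
crosswalk = {"key_abc": "rec_001"}
speeds = [
    {"schedule_gtfs_dataset_key": "key_abc", "shape_array_key": "shp_1",
     "shape_id": "s1", "route_id": "1", "direction_id": 0, "speed_mph": 9.2},
]

with_ids = add_natural_identifiers(speeds, crosswalk)  # internal keys still present
export = drop_internal_keys(with_ids)                  # natural identifiers only
print(export)
```

Keeping the internal keys until the last step means a row in a published export can still be traced back internally when user feedback comes in, while only one version of the export needs to be saved.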
How will this research be used?
Enable usable time-series visualizations that are based on joining all the exports across `speeds`, `schedule`, and `rt_vs_schedule` areas.