
Research Request - refactor segment speed exports  #992

@tiffanychu90

Description


Complete the sections below when receiving a research request, and continue to add to this issue as you receive additional details and produce deliverables. Be sure to also add the appropriate project-level label to this issue (e.g., gtfs-rt, DLA).

Research Question

Single sentence description: We have various aggregation levels that need to be harmonized across exports for easy joining in time-series visualizations:

  • by day (single day vs multiple days)
  • geometry (segment vs line geometry)
  • by time (offpeak, peak, all day, time-of-day bins)

Certain columns prevent aggregation, and others are needed if we want to aggregate beyond a single day.

Detailed description:

  1. Time harmonization --> add offpeak so we publish 3 time periods: peak, offpeak, and all-day speeds (a sketch of this aggregation follows this list).
  • Shapes are not easily used across months. Moving away from them means using route_id, direction_id, and stop_pair earlier...but where?
    • Exploratory work comparing Apr and Oct 2023 dates shows that Big Blue Bus had very few joins on shape-stop_sequence, meaning we would not be able to track speeds over time for those segments.
    • Produce shape-stop values for a single day, but lose the multi-day comparison?
  2. Open data publishing: ditch the internal keys (shape_array_key, gtfs_dataset_key) in favor of natural identifiers (shape_id, route_id) and a stable agency identifier (organization_source_record_id); see the crosswalk sketch after this list.
    • Clean up redundancies: we currently save out 2 versions of the export (with and without internal keys), which is redundant and confusing.
    • Goal: we need to be able to key into our exports whenever we get user feedback, but let's add the natural identifiers a lot earlier and carry them through the aggregations. We can probably drop the internal keys in the step right before we zip up the shapefile.
    • This should resolve the inconsistency where some exports use schedule_gtfs_dataset_key while others use organization_source_record_id, and where we redundantly query several tables in mart_transit_database to get the crosswalk we want. If it isn't resolved, we need to save out a crosswalk we can use across rt_vs_schedule and speeds to better merge the dataframes prior to visualizing.
  3. New folder structure for these exports so it's clear whether we're grabbing a single day or multiple days, and what unit of analysis (segment or line).
  4. Set up a catalog.yml so we can easily find the paths for all these aggregated exports (see the catalog sketch after this list).
  5. Potentially refactor segment_speed_utils: if we want to enable certain averages at the route-direction level, we need to move some functions out of rt_segment_speeds/scripts/ into segment_speed_utils so they can be used elsewhere.
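A minimal sketch of the time harmonization in item 1, assuming a long trip-level speeds dataframe with a fine-grained time_of_day column and a speed_mph column (the column names, bin labels, and median statistic are assumptions, not the actual export schema):

```python
import pandas as pd

# Hypothetical mapping from fine-grained time-of-day bins to the published periods.
PEAK_BINS = ["AM Peak", "PM Peak"]


def add_time_period(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse fine-grained time-of-day bins into peak / offpeak."""
    return df.assign(
        time_period=df.time_of_day.map(
            lambda x: "peak" if x in PEAK_BINS else "offpeak"
        )
    )


def average_speeds(df: pd.DataFrame, group_cols: list) -> pd.DataFrame:
    """Average speeds by peak, offpeak, and all-day for the given grouping."""
    df = add_time_period(df)

    by_period = (
        df.groupby(group_cols + ["time_period"])
        .agg(p50_mph=("speed_mph", "median"))
        .reset_index()
    )
    all_day = (
        df.groupby(group_cols)
        .agg(p50_mph=("speed_mph", "median"))
        .reset_index()
        .assign(time_period="all_day")
    )
    return pd.concat([by_period, all_day], ignore_index=True)


# Usage (grouping columns are assumptions):
# speeds = average_speeds(
#     trip_speeds, ["route_id", "direction_id", "stop_pair", "service_date"]
# )
```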
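For item 2, a hedged sketch of attaching the natural identifiers early via a crosswalk and dropping the internal keys only at export time; the crosswalk and column names are assumptions about what the mart_transit_database query returns:

```python
import pandas as pd


def attach_natural_identifiers(
    speeds: pd.DataFrame, crosswalk: pd.DataFrame
) -> pd.DataFrame:
    """Merge in stable / natural identifiers early so they carry through
    every aggregation (column names here are assumptions)."""
    return speeds.merge(
        crosswalk[
            ["schedule_gtfs_dataset_key", "organization_source_record_id",
             "organization_name"]
        ].drop_duplicates(),
        on="schedule_gtfs_dataset_key",
        how="inner",
    )


def drop_internal_keys(df: pd.DataFrame) -> pd.DataFrame:
    """Drop warehouse-internal keys in the step right before zipping the shapefile."""
    internal_cols = [
        "shape_array_key", "gtfs_dataset_key", "schedule_gtfs_dataset_key",
    ]
    return df.drop(columns=[c for c in internal_cols if c in df.columns])
```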
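Items 3 and 4 could look roughly like the sketch below. The folder and catalog entry names are hypothetical placeholders, not a decided structure; it only illustrates the single-day vs. multi-day and segment vs. route-direction split, with an intake catalog lookup replacing hardcoded paths:

```python
import intake

# Hypothetical folder layout (names are placeholders, not the final structure):
#   .../rollup_singleday/speeds_route_dir_{date}.parquet
#   .../rollup_singleday/speeds_segment_{date}.parquet
#   .../rollup_multiday/speeds_route_dir_{start}_{end}.parquet
#   .../rollup_multiday/speeds_segment_{start}_{end}.parquet

# A catalog.yml registering each of these paths as a named source lets
# downstream notebooks look up exports by name instead of hardcoding paths.
catalog = intake.open_catalog("catalog.yml")      # path is an assumption
df = catalog.speeds_route_dir_singleday.read()    # entry name is hypothetical
```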

How will this research be used?

Enable usable time-series visualizations based on joining all the exports across the speeds, schedule, and rt_vs_schedule areas.
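To illustrate the end goal, a hedged sketch of the join the harmonized exports should enable; the dataframe names and join columns are assumptions about the final schema:

```python
import pandas as pd

# Hypothetical join keys shared across the speeds, schedule, and
# rt_vs_schedule exports once identifiers are harmonized.
JOIN_COLS = [
    "organization_source_record_id", "route_id", "direction_id",
    "time_period", "service_date",
]


def join_for_time_series(
    speeds: pd.DataFrame, rt_vs_schedule: pd.DataFrame, schedule: pd.DataFrame
) -> pd.DataFrame:
    """Join the three export families on the shared natural identifiers."""
    return (
        speeds
        .merge(rt_vs_schedule, on=JOIN_COLS, how="inner")
        .merge(schedule, on=JOIN_COLS, how="inner")
    )
```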

Metadata

Labels: gtfs-rt (work related to GTFS-Realtime), research request (issues that serve as a request for research: summary and handoff)
