Status updates #6
I've been at Modelling World for the last 2 days and didn't get much done. I started looking at ticketing events to figure out which GTFS
I need to look more closely, but I think they actually flip between serving two routes through the day, and I have an idea how to segment better.
I did start thinking about the UI to organize all the info we'll have. I think the main view should show the bus network -- drawing stops and lines to show route variants. The main control is filtering. By default, all data is shown. You can optionally filter by a set of dates (either a range, individual days, or a recurring pattern like weekends or Thursdays). This will hide any route variants not operating on those days (and hide any stops not visited). At all times, you also have a list of matching route variants, and you can select just one as well.
In this main view, there can be ways to show spatial patterns by coloring/styling the route lines and stops. There could be ways to show frequency, ridership, delays, and how much of each vehicle's capacity is filled. Based on the date filters, most of the time this would aggregate over a bunch of matching dates and show average / sum / median / something meaningful.
From this main view, you could click a stop. You'd get a bunch of info, broken down by the route variants that pass through there. There could just be the timetable from GTFS to start, along with some summary/visualization of delays or how the actual AVL trajectory matches the intended timetable. Then there could be a ridership tab, showing both 1st boardings and transfers for that stop + route variant combo. This probably gets shown as a timeseries, with configurable ranges / aggregation rules. There could be options to export CSV, and maybe some kind of simple anomaly detection if a large date range is selected.
Aside from this main network view, I still feel like some kind of "playback one day" mode could be useful, even if just for debugging. This would have the time slider and show real bus position, with styling / tooltips to indicate current capacity and delays. Maybe we draw a "ghost" bus to show how far the actual bus is behind the schedule.
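To make the date filter concrete, here is a minimal sketch, assuming each route variant already carries the set of dates it operates on. The names `RouteVariant` and `filter_variants` are hypothetical and only for illustration; they are not taken from the project's code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RouteVariant:
    name: str                 # hypothetical label, e.g. "A weekday"
    stops: tuple              # ordered stop IDs visited by this variant
    service_dates: frozenset  # dates on which this variant operates


def filter_variants(variants, selected_dates=None):
    """Return (matching variants, visible stop IDs).

    selected_dates=None means "no filter", so everything is shown.
    """
    if selected_dates is None:
        matching = list(variants)
    else:
        wanted = set(selected_dates)
        # Keep a variant if it runs on at least one selected date
        matching = [v for v in variants if v.service_dates & wanted]
    # Stops not visited by any matching variant get hidden
    visible_stops = {s for v in matching for s in v.stops}
    return matching, visible_stops
```

A range, a list of individual days, or "all Thursdays" can all be expanded into a plain set of dates before calling this, which keeps the filter itself simple.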
screencast.mp4
Part of the UI described above is now working. The main view shows stops and route variants, and you can filter by date. Next up will be clicking a stop to see different stats broken down by route variant. For the moment, I've decided route variant is the main unit of analysis that makes sense. The concept of a route is useful for reporting and communicating service to riders, but seeing exactly the stops it visits and the days it runs seems pretty vital.
You can now click a stop to see all of the route variants passing through it. (And they're filtered down based on the days selected.)
screencast.mp4
Next step is to put stats on boarding events and on delays for this route+stop combo here. But to do that, we still need to join all the datasets. As a small next step there for debugging, I'm drawing a green circle to show ticketing events. You can hover over one to see how far the event is from the bus's current position according to AVL. I haven't taken any stats yet to make sure these datasets match up, but eyeballing it so far, things look good.
screencast.mp4
First the easy update: on the web version (only), the app now has Mapbox drawn behind it. It doesn't sync super smoothly when you move around, but it's functional. I'll start thinking through how to display the extra routes/vehicles/stops on top with sane colors.
screencast.mp4
I've otherwise been trying to figure out how to match up the times/positions from ticketing data to narrow down a route variant. I modified the UI to be able to inspect the trajectories better:
screencast.mp4
The yellow circle is the selected bus. The pink line shows the "alternate theory" for where that bus is at the current time. The trajectory built from the BIL data will differ from the AVL trajectory when the bus doesn't pick up any passengers on some stops. That explains all of the straight lines between stops shown. So I still need to come up with a metric for scoring how likely a route variant "explains" AVL or BIL data (or both). I think I'm going to snap ticketing events to possible stops and check the order of those next.
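One way the "snap ticketing events to stops and check the order" idea could be scored is sketched below. This is only an assumption about how such a metric might look: the function names, the 50 meter snapping radius, and the use of projected coordinates in meters are all hypothetical.

```python
import math


def snap_to_stop(point, stops, max_dist_m):
    """Index of the nearest stop within max_dist_m of point, or None."""
    best, best_d = None, max_dist_m
    for i, stop in enumerate(stops):
        d = math.dist(point, stop)
        if d <= best_d:
            best, best_d = i, d
    return best


def order_score(events, stops, max_dist_m=50.0):
    """Fraction of consecutive snapped events whose stop index doesn't decrease.

    events: ticketing positions (x, y) in time order.
    stops: the variant's stop positions (x, y) in visit order.
    Returns 0.0 when fewer than two events snap to any stop.
    """
    idxs = [snap_to_stop(p, stops, max_dist_m) for p in events]
    idxs = [i for i in idxs if i is not None]
    if len(idxs) < 2:
        return 0.0
    in_order = sum(1 for a, b in zip(idxs, idxs[1:]) if b >= a)
    return in_order / (len(idxs) - 1)
```

A variant travelling in the opposite direction along the same street would snap to the same stops but in reverse order, so it would score near 0 while the correct direction scores near 1.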
I got kind of stuck today, but I at least wrote down the structure that I'm trying to assemble:
My latest idea to do the matching is to create a time-space trajectory from the GTFS schedule, then try to match AVL trajectories to that. To prune the search, the BIL data will at least say what possible routes (based on short name) a vehicle handles. GTFS trips have an effective start and end time, so that can be used to clip the AVL trajectory. If buses aren't horribly delayed, then some kind of distance function between trajectories might do the trick. I'll give that a shot first. I'm flying to New Orleans tomorrow for family stuff, so the next update might be Wednesday.
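As a rough illustration of "some kind of distance function between trajectories", the sketch below samples both trajectories at regular times within the scheduled trip's time range and averages the gap. The representation (lists of `(t, x, y)` with time in seconds and projected coordinates in meters), the 30 second step, and the function names are assumptions, not the project's actual approach.

```python
import bisect
import math


def position_at(traj, t):
    """Linearly interpolate (x, y) at time t; traj is [(t, x, y)] sorted by t."""
    times = [p[0] for p in traj]
    if t <= times[0]:
        return traj[0][1:]
    if t >= times[-1]:
        return traj[-1][1:]
    i = bisect.bisect_right(times, t)
    (t0, x0, y0), (t1, x1, y1) = traj[i - 1], traj[i]
    f = (t - t0) / (t1 - t0)
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))


def mean_distance(scheduled, avl, step_s=30):
    """Average gap in meters between the trajectories, sampled every step_s
    seconds over the scheduled trip's start..end (this clips the AVL data)."""
    start, end = scheduled[0][0], scheduled[-1][0]
    samples = range(int(start), int(end) + 1, step_s)
    gaps = [math.dist(position_at(scheduled, t), position_at(avl, t)) for t in samples]
    return sum(gaps) / len(gaps)
```

Because the comparison is made at equal times, a bus that follows the right geometry but runs late will still score badly, so this only has a chance of working when delays are small.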
Hiatus ending now. Some of the updates from the last few days are summarized in #9. I'll just show a few things. First, you can choose a vehicle, find all possible GTFS trips that it might serve, and scroll through trajectories. The pink line shows the equivalent position in the AVL data for that time, to eyeball if the AVL data matches the trip trajectory well or not:
screencast.mp4
And I made progress splitting the AVL trajectory into non-overlapping pieces, which should make it easier to reason about matching to trips. This splitting is still kind of brittle and has issues near interstate cloverleaves/fly-overs, but it's better:
screencast.mp4
A small bit of visualization today, showing how well the trajectory from an expected GTFS trip matches AVL:
screencast.mp4
The red line is AVL, clipped to the time range of the cyan GTFS trip. The geometry often matches pretty decently. The problem is that it looks like there's usually a big time offset / delay from the schedule. So if we look at where the bus is during one trip that's supposed to happen, it's almost meaningless -- the bus could still be working on another trip in the opposite direction.
A vague idea: try to first chop up one vehicle's trajectory into a sequence of route variants. In the common case, that's just two variants (opposite directions) back and forth. Then we know the sequence of expected stop positions, so we can find all times the vehicle passes close to each stop, and put things in order based on the expectation.
The other idea I started today was an inversion of how matching has been happening. Instead of trying to match vehicles to a list of trips, look at all of the trips that're supposed to happen as "demand", and vehicles that might be serving each trip as "supply." A vehicle can only serve one trip at a time, so make a table of non-overlapping time intervals per vehicle, and try doing the assignment backwards. The naive / greedy approach assigns about 3000 (half) of the expected trips to vehicles, with the other half unassigned. That's a huge missing chunk. I still need to understand the programmed file exceptions (#8) to account for these, probably. But a few wins -- for route 349, all 47 trips successfully get assigned to 1 vehicle. This is at least one simple case where we could now try to refine the assignment details.
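A minimal sketch of the naive / greedy "demand and supply" assignment, assuming each scheduled trip has a time interval and a list of candidate vehicles (for example, those whose BIL data mentions the trip's route short name). The function name and data shapes are hypothetical.

```python
def assign_trips(trips, candidates):
    """Greedily assign each trip to the first candidate vehicle that is free.

    trips: list of (trip_id, start_s, end_s).
    candidates: dict trip_id -> list of vehicle IDs that might serve it.
    Returns (dict trip_id -> vehicle, list of unassigned trip IDs).
    """
    busy = {}  # vehicle -> list of (start, end) intervals already taken
    assigned, unassigned = {}, []
    # Process trips in order of scheduled start time
    for trip_id, start, end in sorted(trips, key=lambda t: t[1]):
        for vehicle in candidates.get(trip_id, []):
            intervals = busy.setdefault(vehicle, [])
            # A vehicle can only serve one trip at a time
            if all(end <= s or start >= e for s, e in intervals):
                intervals.append((start, end))
                assigned[trip_id] = vehicle
                break
        else:
            unassigned.append(trip_id)
    return assigned, unassigned
```

Being greedy, this never revisits an earlier choice, so a trip can end up unassigned even when a different ordering would have fit everything.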
Huge progress with matching! A refinement of the idea from yesterday just enforces that times between stops increase. There's noise when a bus passes close to stops out-of-order, especially when there are two stops on opposite sides of a street. So if we can assume the first time a bus is near the first stop is correct, then we can build things up from there. Some examples for vehicle 224 (showing new UI bits useful for debugging):
screencast.mp4
Vehicle 224 follows route 430 from 5:56 to 6:40 -- verifying manually, it's a great match! The same heuristic claims another round of this happens from 7:42 to 10:12 -- but that's much longer, so what's happening? The problem is between stop 22 -> 23:
screencast.mp4
Using a 10 meter threshold, the trajectory doesn't get close enough until 8:59, even though from watching manually, it clearly happens at 8:01. This particular issue goes away if I increase the threshold to 20 meters, but then a few more resulting trips are found with unexplainably large ~hour long intervals between nearby stops. I'll keep iterating on this to only wind up with good trips.
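The "times between stops must increase" heuristic could look roughly like the sketch below. It assumes the AVL trajectory is a time-sorted list of `(t, x, y)` in projected meters and that the first time the bus is near the first stop is correct. The function name is hypothetical, and this version only enforces non-decreasing times (one AVL sample may match two adjacent stops).

```python
import math


def match_stop_times(avl, stops, threshold_m=10.0):
    """For each stop in order, find the first AVL time at or after the previous
    stop's matched time where the bus is within threshold_m of the stop.

    avl: [(t, x, y)] sorted by t. stops: ordered [(x, y)] for one variant.
    Returns one time per stop; None for a stop that can't be matched and for
    every stop after it.
    """
    times, i = [], 0
    for stop in stops:
        # Only search forward from the last matched sample
        while i < len(avl) and math.dist(avl[i][1:], stop) > threshold_m:
            i += 1
        if i == len(avl):
            times.extend([None] * (len(stops) - len(times)))
            break
        times.append(avl[i][0])
    return times
```

Searching only forward is what suppresses the out-of-order noise: a stop on the opposite side of the street that the bus brushes past early is ignored until its turn in the sequence comes. The cost is the failure mode described above, where one missed stop pushes every later match far into the future.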
No major algorithmic advances, but I'm now getting much better results for matching everything together. The trick has been to increase the maximum distance allowed between a bus position and a stop for a possible match. Currently, the total stats for one day are:
My strategy to keep improving this:
The matching process is now getting to the point where it produces enough useful data to really think about next steps. I have some scattered thoughts about that, but I'll try and post about that tomorrow.
Edit: I also posted sample data and started a discussion at movingpandas/movingpandas#229 to get more ideas from people working with trajectory data.
I fixed a bug with sorting vehicle schedules, and it really improved the sanity check on boarding times vs bus arrival times:
Of the matched, how long between the bus arriving and the ticketing?
99,375 count, 50%ile 43s, 90%ile 57min 26s, 99%ile 5hr 57min 14.6s, min 0s, mean 23min 9.5s, max 14hr 13min 57s
I made a bunch of UI changes to make it easier to debug vehicles doing strange things (not serving routes for a while, serving unusual trips with high delays) and jump back and forth between different bits:
screencast.mp4
Finally, I wrote up https://github.com/dabreegster/bus_spotting/blob/main/design.md, which breaks down the work so far as 3 layers. The focus for the remaining weeks will be on layer 3, though the matching in layer 2 still needs work.
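For reference, a summary like the one above can be computed along these lines. This is a sketch under assumptions: the nearest-rank percentile definition, the use of an absolute gap, and the function names are all hypothetical and may differ from what the tool actually does.

```python
def percentile(sorted_vals, p):
    """Nearest-rank percentile of an already sorted list; p is an integer 0..100."""
    n = len(sorted_vals)
    rank = max(1, (p * n + 99) // 100)  # integer ceiling of p * n / 100
    return sorted_vals[rank - 1]


def summarize_gaps(pairs):
    """Summarize the time between bus arrival and ticketing, in seconds.

    pairs: non-empty list of (bus_arrival_s, ticketing_s) for matched boardings.
    """
    gaps = sorted(abs(ticket - arrival) for arrival, ticket in pairs)
    return {
        "count": len(gaps),
        "min": gaps[0],
        "p50": percentile(gaps, 50),
        "p90": percentile(gaps, 90),
        "p99": percentile(gaps, 99),
        "mean": sum(gaps) / len(gaps),
        "max": gaps[-1],
    }
```

A median far below the mean, as in the numbers above, points at a long tail of suspicious matches rather than a uniformly loose join.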
I spent the last few days rearranging code to have a clear split between the single day and multiday model. The split includes the UIs -- the interface that lets you replay a single day and debug trajectory matching is totally unrelated to the multiday UI (which right now just displays the GTFS scheduled stuff). Remaining work will be on the multiday UI + analyses.
Edit: and I tried a fun route visualization experiment in #11.
I got inspired to work on #11 again. The import now optionally takes a .osm.xml file, pipes it through https://github.com/a-b-street/osm2streets, does extremely simple snapping of bus routes to the street network (with many known limitations), and attempts to draw non-overlapping routes. And as a side effect of bringing in a street network, we can subtly draw that underneath for additional context.
screencast.mp4
Now working on docs / a summary for tomorrow's meeting.
I'll post here every few days with current progress/problems. Here's the first. Since this is the first update, I'll also briefly cover what's happened in the past few days.
AVL
There's a mode to load an AVL file and animate bus trajectory, but it doesn't do anything useful yet:
screencast.mp4
Loading data
Most of the focus the past few days has been getting the import flow to work. When running natively, it's not hard to point to a folder with all the GTFS, BIL, AVL, etc data. But to be a web-first app, it's unclear how to read a whole directory in a browser. So instead, I decided to make the input just be a single .zip file. This can be loaded either natively or in the browser; nothing is uploaded anywhere in either case. That import takes a few seconds (currently) and will increase as we start importing longer timespans.
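To illustrate why a single .zip is convenient, here is a small sketch of reading every CSV-like table out of an archive that is already in memory, which is the situation in a browser after the user picks a file. It is written in Python purely for illustration. The function name and the assumption that the archive contains `.txt` / `.csv` tables are hypothetical.

```python
import csv
import io
import zipfile


def load_tables(zip_bytes):
    """Parse every .txt/.csv member of an in-memory zip into a list of row dicts.

    Nothing touches the filesystem or the network; the bytes are read in place.
    """
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith((".txt", ".csv")):
                # utf-8-sig strips the byte order mark some GTFS feeds include
                text = archive.read(name).decode("utf-8-sig")
                tables[name] = list(csv.DictReader(io.StringIO(text)))
    return tables
```

The same function works natively by passing it the bytes read from a file on disk, so both platforms can share one import path.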
The import only needs to happen once. So ideally after we import, we save the final model, and can more cheaply load that in the future. That works fine on native, but in the browser, I'm having trouble getting https://github.com/a-b-street/abstreet/blob/1df9eb940a3464ecdab4361cb191ada6807b696c/abstio/src/io_web.rs#L185 to work consistently. There seem to be file size limits with this trick.
GTFS
Most of the focus has been on rearranging GTFS data to be meaningful. Here's a demo of how things work now:
screencast.mp4
You can select a single route at a time. A route consists of many individual trips, but these can be further grouped into "variants" that better match intuition. A single route usually has about 4 variants -- outbound and inbound, for weekdays and weekends. A different set of stops is visited in each variant.
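One plausible way to group trips into variants is to key them on the route, the service calendar, and the exact ordered stop sequence. This is a sketch of that idea, not necessarily the grouping the tool uses. `route_id`, `service_id`, and `trip_id` are standard GTFS fields, while the `stop_ids` list (built from stop_times.txt) and the function name are assumptions.

```python
from collections import defaultdict


def group_variants(trips):
    """Group trips that share a route, service calendar, and stop sequence.

    trips: list of dicts with 'trip_id', 'route_id', 'service_id', and
    'stop_ids' (stop IDs in visit order).
    Returns dict (route_id, service_id, stop sequence) -> list of trip IDs.
    """
    variants = defaultdict(list)
    for trip in trips:
        key = (trip["route_id"], trip["service_id"], tuple(trip["stop_ids"]))
        variants[key].append(trip["trip_id"])
    return dict(variants)
```

With this key, outbound and inbound trips split apart because their stop sequences differ, and weekday and weekend trips split apart because their `service_id` differs.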
There's a "filter dates" widget at the top, but it doesn't work yet. GTFS describes routes that occur over a long timespan, so to ask questions about a route, it may be necessary to also specify when we want to view that route.
I'm trying to build up to the point where we can also load the AVL and BIL datasets, and start to link everything together. For a particular day, we can subset the GTFS schedule and find only the trips scheduled for that day. Then we can link up route_short_name and vehicle IDs, and figure out what GTFS trips a vehicle is serving. That'll let us interpret the AVL trajectory and produce an "actual" schedule of stop times (a list of (stop ID, arrival time, departure time) tuples). We can compare that to the "idealized" schedule in GTFS and start to measure delay. At first that definition will just focus on one day at a time, but then we can think about aggregating over longer timespans.
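Once both schedules exist as lists of (stop ID, arrival time, departure time) tuples, a first definition of delay could be as simple as the sketch below. It assumes times are in seconds, that both lists cover the same stops in the same order, and that delay is measured on arrival. The function name is hypothetical.

```python
def stop_delays(scheduled, actual):
    """Arrival delay per stop for one trip, in seconds; positive means late.

    scheduled / actual: lists of (stop_id, arrival_s, departure_s) in visit order.
    """
    if len(scheduled) != len(actual):
        raise ValueError("schedules cover a different number of stops")
    delays = []
    for (stop_id, sched_arrival, _), (actual_stop, actual_arrival, _) in zip(scheduled, actual):
        if stop_id != actual_stop:
            raise ValueError(f"stop mismatch: {stop_id} vs {actual_stop}")
        delays.append((stop_id, actual_arrival - sched_arrival))
    return delays
```

Aggregating over longer timespans would then amount to collecting these per-stop delays across many days and summarizing them per stop and route variant.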