
Revisiting Jensen's inequality: accessibility from travel-time percentiles vs accessibility percentiles from travel-times #785

Open
botanize opened this issue Jan 28, 2022 · 9 comments


@botanize

Hi, I use R5 for all sorts of accessibility work at Metro Transit (Minneapolis), and issue #66 affects our work. From the linked OpenTripPlanner issue 2148 I infer that you've "redefined" accessibility to mean the access to opportunities for travel-time percentiles.

My concern is that your definition creates a travel-time matrix that's a chimera of actual itineraries and results in accessibility that can never be achieved.

For example, let's say the median travel time from A to B is 30 minutes and from A to C is 30 minutes. B and C are job centers with 100,000 jobs each. Route 1 serves A to B, but takes 25 minutes for departures from 8 to 8:30 and 35 from 8:30 to 9, while Route 2 serves A to C but takes 35 minutes for departures from 8 to 8:30 and 25 from 8:30 to 9. The median travel-time matrix is going to say both B and C are within 30 minutes of A during the travel time window and 200,000 jobs are accessible. But in reality, at no time in the travel time window are both destinations available within 30 minutes and the true access to jobs during the window is 100,000.
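For concreteness, the example can be worked through in a few lines of Python (the minute-by-minute travel times and job counts come straight from the scenario above; the variable names and aggregation code are illustrative, not R5's API):

```python
from statistics import median

# Travel times (minutes) from A for each departure minute in the 8:00-9:00
# window, per the scenario above.
tt = {
    "B": [25] * 30 + [35] * 30,  # Route 1: fast before 8:30, slow after
    "C": [35] * 30 + [25] * 30,  # Route 2: slow before 8:30, fast after
}
jobs = {"B": 100_000, "C": 100_000}
CUTOFF = 30  # minutes

# Aggregate travel times first (percentile-of-travel-time): the median
# travel time to both B and C is 30, so both count as reachable.
access_from_median = sum(j for d, j in jobs.items() if median(tt[d]) <= CUTOFF)

# Aggregate accessibility instead ("average instantaneous accessibility"):
# compute access at each departure minute, then average over the window.
per_minute = [sum(j for d, j in jobs.items() if tt[d][m] <= CUTOFF)
              for m in range(60)]
avg_instantaneous = sum(per_minute) / len(per_minute)

print(access_from_median)  # 200000
print(avg_instantaneous)   # 100000.0
```

The two orders of aggregation disagree by a factor of two on the same network, which is the crux of this issue.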

I realize that calculating accessibility for each minute of the time window and each monte-carlo iteration is expensive, but I'd like the ability to get a "correct" accessibility measure from R5 that's comparable to the Accessibility Observatory method which correctly aggregates the accessibility, not the travel-time matrix.

@abyrd
Member

abyrd commented Jan 29, 2022

Hi @botanize, it's great to hear you're getting some use out of R5 in your planning work. Thanks for contacting us, and I'll try to clarify things.

Your question seems to be based on the idea that computing access separately at each departure time and averaging the resulting indicators is inherently "correct", and that we shifted away from that method to reduce computation or make a problem tractable. This is not the case.

First, we don't usually judge these definitions in terms of correctness. Much has been written about various measures of access in the academic literature and a wide range of methods and measures have been proposed. Our percentile-of-travel-time measure may be novel or unusual, but we don't consider it more or less "correct" than other definitions; it is just intentionally measuring something different that is appropriate for a specific use case. We have tried to be as transparent as possible about the implications of this method and its interpretation in the peer-reviewed articles we've published on our methodology. Some of our papers contain examples very similar to the one you gave, but as illustrations of why we prefer our method.

A distribution of instantaneous access values calculated independently at different departure times captures how access varies across a group of people each of whom is required to leave at one exact time, with those times uniformly distributed across the time window. Each one of these travel times may be biased (positively or negatively depending on your point of view) by whether it includes initial waiting time or allows the rider to plan and adjust. Notably, it gives equal weight to each minute in the window though demand may vary across those times. The prevalence of values in this distribution is tied to how many minutes within the time window produce them, not how many people experience them or how often a particular individual will experience them under different conditions.

By contrast, in our method the distribution captures the variation due to how thoroughly the rider plans their trip by choosing any departure time in the specified window, together with how well the operator adheres to schedules, intersperses vehicles on common trunks, and provides accurate information on disruptions. This variation can be further broken down into the variation due to underspecified or unknown parameters in the scenario (the relative phase of frequency-based routes for example) and due to passenger and operator behavior. It reveals the uncertainty and potential variation in the accessibility value produced by a particular scenario, due to behavioral variation and missing details in the scenario.

We would usually capture the kind of variation you're interested in by running more than one analysis over different time windows throughout the day (morning peak, evening peak, midday, evening).

The different percentiles in our distributions represent the differing experiences of riders who plan a trip looking at schedules and real-time information (low percentiles) versus the experience over many days of riders who leave randomly without consulting schedules (median) versus the one-off experience of someone who happens to leave at the worst moment or just misses the vehicle they should have taken (high percentile).

In underspecified scenarios (e.g. with a lot of low-frequency routes whose exact departure times are not given) the different percentiles in our distributions can also represent the best and worst case access riders may experience within the bounds of the scenario. Revealing this variation is very important for scenario comparison: a scenario that has a higher median instantaneous accessibility over a time window does not necessarily have a higher population-wide utility for all realizations of that scenario by riders and operators. It would be incorrect to judge such a scenario as inherently superior if its distribution of access scores overlapped significantly with that of another competing scenario. This is one of the greatest underlying motivations for our method: to be realistic about what models cannot tell us, revealing the degree of uncertainty that exists about which scenario provides greater utility.

Our tooling is primarily intended for cost-benefit analysis of diverging investment or operational scenarios all applying to a particular study region, not comparison of standardized scores for existing service across different regions.

Of course it's also interesting to see how these things vary over the course of the day, and we'd look at that by running more than one regional analysis over different time windows. But if you mix too many sources of variation together you lose the ability to judge when your scenario needs to be refined or when you're over-weighting service in low-demand periods etc.

From your point of view, our definition is overstating access because someone who must leave at exactly 8:15 AM can only reach 100k jobs. But for our use cases, the definition you favor is understating access because over the whole population whose utility we want to consider, some people can leave 15 minutes later and take jobs in the second job center instead of the first.

Like many access indicator definitions, our method is quite sensitive to the exact travel time threshold you choose. All 200,000 jobs are available at 30 minutes or not available at 29. This is why we encourage people to select a smooth decay function instead of a hard cutoff. If you were to choose a sigmoid decay function with a cutoff of 30 minutes, using median travel time all 200,000 of these jobs would count with a weight of 0.5. You get the same number you expect (100,000), but because the jobs are close to the threshold, with travel times evenly distributed on either side of it.
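To sketch that point with a generic logistic decay (an illustration of the idea, not R5's exact decay implementation), a destination whose median travel time falls exactly on the 30-minute inflection point receives a weight of 0.5:

```python
import math

def logistic_decay(t, cutoff=30.0, k=1.0):
    # Weight ~1 for very short trips, exactly 0.5 at the cutoff, ~0 for long
    # ones. k controls how sharply the weight falls off around the cutoff.
    return 1.0 / (1.0 + math.exp(k * (t - cutoff)))

jobs = {"B": 100_000, "C": 100_000}
median_tt = {"B": 30.0, "C": 30.0}  # both job centers from the example above

access = sum(jobs[d] * logistic_decay(median_tt[d]) for d in jobs)
print(access)  # 100000.0 -- each 100k job center counts at weight 0.5
```

With a hard 30-minute cutoff the same inputs would count all 200,000 jobs, so the decay function removes the discontinuity right at the threshold.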

Also keep in mind that we never look at only the median figure. We usually retain at least five different percentiles and look at all of them, and some users of R5 look mainly at the 5th or 95th percentiles for certain use cases or to reflect local rider behavior. Using a low percentile, both of the job centers in your example are 25 minutes away and using a high percentile they are both 35 minutes away.

If in interpreting these results you interactively explored the outputs at different travel time thresholds and percentiles, these discontinuities would be apparent and influence your choice of parameters. To encourage such exploratory interpretation, we have ensured that we can efficiently compute a matrix of at least 5x5 travel time thresholds and percentiles, and we do this by default in all regional analyses. Use of a more gradual decay function would also eliminate many discontinuities.

Finally, returning to the second point in my original statement: This change was not made to reduce the amount of computation or make simulation tractable. We can certainly compute access at every iteration and average them all together, and we used to do exactly this. This change was made because we found that the percentile-of-travel-time definition is more theoretically sound, allows more thorough discovery of realistic travel times, and lets more of the underlying variation and uncertainty show through in the results.

I hope this clarifies things, please let me know if you have any additional questions. I may move some of this comment into our documentation to provide more background on these choices.

@botanize
Author

I appreciate this detailed response! There's no mention of any of these things in the relevant closed issues, which is why I opened this issue.

Can you share links to the papers you mentioned (if open-access) or pre-prints? I've added my reactions to your comments below, but I'd like to review the literature you mentioned. I can follow up here, or directly with you after I've had a chance to review.

After reading your response a few times I think this comes down to a different set of assumptions. I am willing to make the assumption that each minute in the analysis time-window is equally important, in part because I think arrival time often is not terribly flexible, so choice of departure time may be limited by the required arrival time, and that shorter waits at the origin may mean longer waits at the destination. The systems I usually work with don't have many sub-5 or even sub-10 minute frequencies, or great headway reliability and schedule adherence for that matter, so arrival time sensitivity seems particularly important. I think your argument for aggregating travel-times is that people make decisions about when to leave based on total trip time, and that reducing waiting time at the origin reduces overall trip times (there's no compensatory wait at the destination). From there it's safer to assume that the distribution of start times is not uniform within the analysis window, but proportional to the trip-time itself (including waiting at the origin). Does that sound about right, or maybe I'm completely off?

Of course, neither set of assumptions is correct, but either may be useful in different circumstances or with different concerns.

> First, we don't usually judge these definitions in terms of correctness. Much has been written about various measures of access in the academic literature and a wide range of methods and measures have been proposed. Our percentile-of-travel-time measure may be novel or unusual, but we don't consider it more or less "correct" than other definitions; it is just intentionally measuring something different that is appropriate for a specific use case. We have tried to be as transparent as possible about the implications of this method and its interpretation in the peer-reviewed articles we've published on our methodology. Some of our papers contain examples very similar to the one you gave, but as illustrations of why we prefer our method.

> A distribution of instantaneous access values calculated independently at different departure times captures how access varies across a group of people each of whom is required to leave at one exact time, with those times uniformly distributed across the time window. Each one of these travel times may be biased (positively or negatively depending on your point of view) by whether it includes initial waiting time or allows the rider to plan and adjust. Notably, it gives equal weight to each minute in the window though demand may vary across those times. The prevalence of values in this distribution is tied to how many minutes within the time window produce them, not how many people experience them or how often a particular individual will experience them under different conditions.

Well yes, but there is no way to know the origins, destinations and time constraints for every person who would consider transit at any given time, and those would change constantly. I understand accessibility as a measure of the promise of the transit network. It's not intended to represent the sum of individual experiences, but the sum of possibilities. I also think the problem and example I shared applies to people needing to arrive at their destination at a prescribed time, which is common and often comes with very severe consequences (losing your job or childcare) that are usually not associated with departing an origin late.

> By contrast, in our method the distribution captures the variation due to how thoroughly the rider plans their trip by choosing any departure time in the specified window, together with how well the operator adheres to schedules, intersperses vehicles on common trunks, and provides accurate information on disruptions. This variation can be further broken down into the variation due to underspecified or unknown parameters in the scenario (the relative phase of frequency-based routes for example) and due to passenger and operator behavior. It reveals the uncertainty and potential variation in the accessibility value produced by a particular scenario, due to behavioral variation and missing details in the scenario.

I don't see how aggregating travel-times provides any insight into schedule adherence, headway management or service disruptions which are explicitly not accounted for in the GTFS schedule used for the analysis. Do you mean that this method creates results that are similar to the variability you'd see if you calculated accessibility from a set of as-run times (AVL)?

> We would usually capture the kind of variation you're interested in by running more than one analysis over different time windows throughout the day (morning peak, evening peak, midday, evening).

These are different service types, I'm interested in variation within and among service types.

> The different percentiles in our distributions represent the differing experiences of riders who plan a trip looking at schedules and real-time information (low percentiles) versus the experience over many days of riders who leave randomly without consulting schedules (median) versus the one-off experience of someone who happens to leave at the worst moment or just misses the vehicle they should have taken (high percentile).

This explanation makes some intuitive sense, but I think it requires an assumption that arrival time is flexible within the service window. There certainly are trip types where that's a reasonable assumption (recreation, shopping), but maybe most others are quite sensitive to arrival time (work commutes, medical, professional and social appointments). Trips that are sensitive to arrival time mean that if you're not waiting at your origin (because you planned your departure perfectly) you will probably be waiting at your destination—at least for service typical of mid-sized North American transit agencies.

> In underspecified scenarios (e.g. with a lot of low-frequency routes whose exact departure times are not given) the different percentiles in our distributions can also represent the best and worst case access riders may experience within the bounds of the scenario. Revealing this variation is very important for scenario comparison: a scenario that has a higher median instantaneous accessibility over a time window does not necessarily have a higher population-wide utility for all realizations of that scenario by riders and operators. It would be incorrect to judge such a scenario as inherently superior if its distribution of access scores overlapped significantly with that of another competing scenario. This is one of the greatest underlying motivations for our method: to be realistic about what models cannot tell us, revealing the degree of uncertainty that exists about which scenario provides greater utility.

I think we agree that the variation is critical, we just disagree about which variation, and what the variation means.

> Our tooling is primarily intended for cost-benefit analysis of diverging investment or operational scenarios all applying to a particular study region, not comparison of standardized scores for existing service across different regions.

I don't see why these should be incompatible.

> Of course it's also interesting to see how these things vary over the course of the day, and we'd look at that by running more than one regional analysis over different time windows. But if you mix too many sources of variation together you lose the ability to judge when your scenario needs to be refined or when you're over-weighting service in low-demand periods etc.

> From your point of view, our definition is overstating access because someone who must leave at exactly 8:15 AM can only reach 100k jobs. But for our use cases, the definition you favor is understating access because over the whole population whose utility we want to consider, some people can leave 15 minutes later and take jobs in the second job center instead of the first.

Again, this assumes that arrival time is as flexible as departure time, an assumption I'm not comfortable with.

> Like many access indicator definitions, our method is quite sensitive to the exact travel time threshold you choose. All 200,000 jobs are available at 30 minutes or not available at 29. This is why we encourage people to select a smooth decay function instead of a hard cutoff. If you were to choose a sigmoid decay function with a cutoff of 30 minutes, using median travel time all 200,000 of these jobs would count with a weight of 0.5. You get the same number you expect (100,000), but because the jobs are close to the threshold, with travel times evenly distributed on either side of it.

We also use decay functions, and weight jobs by travel time. But longer travel times will mean fewer time-weighted jobs. And if one cares about arrival times, as I do, then differences in travel times can be very real and lead to important differences in weighted job access.

> Also keep in mind that we never look at only the median figure. We usually retain at least five different percentiles and look at all of them, and some users of R5 look mainly at the 5th or 95th percentiles for certain use cases or to reflect local rider behavior. Using a low percentile, both of the job centers in your example are 25 minutes away and using a high percentile they are both 35 minutes away.

> If in interpreting these results you interactively explored the outputs at different travel time thresholds and percentiles, these discontinuities would be apparent and influence your choice of parameters. To encourage such exploratory interpretation, we have ensured that we can efficiently compute a matrix of at least 5x5 travel time thresholds and percentiles, and we do this by default in all regional analyses. Use of a more gradual decay function would also eliminate many discontinuities.

These are great points and great tools, but I don't think it would be too hard to come up with a service scenario that resulted in travel-time distributions that were shifted relative to each other depending on which part of the analysis window you were in.

> Finally, returning to the second point in my original statement: This change was not made to reduce the amount of computation or make simulation tractable. We can certainly compute access at every iteration and average them all together, and we used to do exactly this. This change was made because we found that the percentile-of-travel-time definition is more theoretically sound, allows more thorough discovery of realistic travel times, and lets more of the underlying variation and uncertainty show through in the results.

The computation comment was more of an aside, I was trying to rationalize the R5 method given my rider behavior assumptions. I think I am precisely concerned with the theoretical underpinnings of the choice. I do think that perhaps we have different assumptions about rider behavior and departure time flexibility. Your assessment of travel-times, variation and uncertainty seems entirely consistent with your assumptions. But perhaps there's room for R5 to support my assumptions as well?

@ansoncfit
Member

ansoncfit commented Feb 2, 2022

Hi @botanize, good to hear from you. I think further responses are pending, but in the meantime, the second paper under https://github.com/conveyal/r5#methodology may be of interest if you haven't seen it already. We also have a paper under revision about a frequency-based network (i.e. no published schedules) in a large Latin American city, which elaborates on some of the points above about variation and uncertainty. I can share that one once it's a bit further along in the review process.

@mattwigway
Contributor

mattwigway commented Feb 2, 2022

Late to the party here as I haven't followed R5 development as closely since I jumped ship to academia, but I wanted to give my two cents as I was closely involved in the original development of these algorithms. I think the paper that is most relevant to our thinking on this issue is actually the second one linked in the link @ansoncfit pasted above (Conway, Byrd and van Eggermond, 2018), specifically sections 2 and 3. The metric you describe from the Accessibility Observatory is what we called in the paper "average instantaneous accessibility."

> For example, let's say the median travel time from A to B is 30 minutes and from A to C is 30 minutes. B and C are job centers with 100,000 jobs each. Route 1 serves A to B, but takes 25 minutes for departures from 8 to 8:30 and 35 from 8:30 to 9, while Route 2 serves A to C but takes 35 minutes for departures from 8 to 8:30 and 25 from 8:30 to 9. The median travel-time matrix is going to say both B and C are within 30 minutes of A during the travel time window and 200,000 jobs are accessible. But in reality, at no time in the travel time window are both destinations available within 30 minutes and the true access to jobs during the window is 100,000.

I would argue in this case that 200,000 jobs are actually accessible with a median travel time of 30 minutes, because jobs are non-fungible (at least in the short term). As a worker, it doesn't matter to me that the job I got is accessible at a different time than another job I could have gotten. You are correct that R5 doesn't explicitly account for waiting at the destination, only waiting at the origin - although a high percentile of travel time is somewhat related as it represents how long the trip could take if the schedules did not line up well. If you really wanted to account for waiting at the destination, you would need to run a shortest-path search from destinations to origins (a "reverse search") which I believe R5 does not yet support. OpenTripPlanner did support it but it was a mess of conditional code that would change the behavior based on search direction. What we've discussed in the past with R5 is to instead create a "backwards network" by multiplying all stop arrival times by -1 and reversing the directions of all patterns. Then running our standard "forward search" algorithm will produce reverse search results.
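A minimal sketch of that "backwards network" transformation, assuming a pattern is represented as parallel lists of stops, arrival times, and departure times (a hypothetical data layout, not R5's internal one):

```python
def reverse_pattern(stops, arrivals, departures):
    """Negate times and reverse stop order so a standard forward search run
    on the transformed network yields reverse (arrive-by) search results."""
    return {
        "stops": list(reversed(stops)),
        # In the reversed timeline a vehicle "arrives" when it originally
        # departed, and "departs" when it originally arrived (both negated).
        "arrivals": [-t for t in reversed(departures)],
        "departures": [-t for t in reversed(arrivals)],
    }

# Pattern A -> B -> C; times are minutes past midnight.
rev = reverse_pattern(["A", "B", "C"], [480, 490, 500], [481, 491, 501])
print(rev["stops"])       # ['C', 'B', 'A']
print(rev["arrivals"])    # [-501, -491, -481]
print(rev["departures"])  # [-500, -490, -480]
```

Note that times still increase along the reversed pattern and each stop's arrival still precedes its departure, which is why an unmodified forward algorithm can run on the transformed network.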

@abyrd
Member

abyrd commented Feb 15, 2022

> I appreciate this detailed response! There's no mention of any of these things in the relevant closed issues, which is why I opened this issue.

Thanks for sharing your perspectives. Yes, we should definitely write more about this, add it to documentation and maybe add some comments on those issues to point readers to relevant documentation. Below are a few responses to your points - not intending to invalidate your reasoning about access, which may be driven by particular use cases, but just to further clarify where our choices and my reactions are coming from.

> I think your argument for aggregating travel-times is that people make decisions about when to leave based on total trip time, and that reducing waiting time at the origin reduces overall trip times (there's no compensatory wait at the destination).

We keep all travel times because people may make decisions based on total travel time, but may not choose to or be able to. Selecting specific percentiles then helps understand how wide the variation in accessibility results is under different behaviors.

> From there it's safer to assume that the distribution of start times is not uniform within the analysis window, but proportional to the trip-time itself (including waiting at the origin).

We try to avoid making up front behavioral assumptions. We are instead trying to characterize the potential of the transport-land use system under a range of behaviors, and establish error bars on that characterization, to avoid artificially simplistic comparisons of alternate scenarios.

> I also think the problem and example I shared applies to people needing to arrive at their destination at a prescribed time, which is common and often comes with very severe consequences (losing your job or childcare) that are usually not associated with departing an origin late.

We are indeed not handling the difference between opportunities with variable arrival time and those with a fixed arrival time. Opportunities with non-flexible arrival times may be narrowly concentrated at specific times determined by local culture (e.g. shift work or school) so perhaps the relevant measure here is just single arrival time (non-window) accessibility at a few key moments of the day. That does raise the problem of specifying an arrival time instead of a departure time, which I covered in another comment.

> I don't see how aggregating travel-times provides any insight into schedule adherence, headway management or service disruptions which are explicitly not accounted for in the GTFS schedule used for the analysis.

The main use case of our system, and the use case driving most of the design decisions, is comparing future scenarios layered on top of GTFS by planners. These scenarios are often specified in terms of new or modified routes with frequencies and perhaps synchronized transfer points, but not known departure times. I wasn't referring to how the network described in the baseline GTFS is managed, but the range of different operational choices that could all match a loosely-defined scenario. We can't use one specific realization of scenario A and claim it is better than one specific realization of another scenario B. We want to see if all possible operational realizations of scenario A are better than all possible operational realizations of scenario B.

> This explanation makes some intuitive sense, but I think it requires an assumption that arrival time is flexible within the service window. There certainly are trip types where that's a reasonable assumption (recreation, shopping), but maybe most others are quite sensitive to arrival time (work commutes, medical, professional and social appointments). Trips that are sensitive to arrival time mean that if you're not waiting at your origin (because you planned your departure perfectly) you will probably be waiting at your destination—at least for service typical of mid-sized North American transit agencies.

Inflexible arrival times are certainly an area we could handle better. But because inflexible arrival times are concentrated at specific clock times I would hesitate to summarize such results across a departure time window. It seems like accessibility to inflexible opportunities would be well-characterized by the single shortest travel time to arrive at each destination at a single fixed time. Where you really do want to characterize access with a fixed arrival time, but with equal weight given to all arrival times (e.g. medical and professional appointments) then yes, in that case averaging access computed for each separate arrival time seems like a good measure.

The majority of our customers have focused on peak-period access to employment, perhaps because the peak tends to define the overall dimension of infrastructure and service, and simply because the data are more readily available. More nuanced measures of access would need to apply different methods for different destination types.

@mattwigway
Contributor

In my comment above (and @abyrd's edit) I think there was some confusion about my use of the term reverse searches - as we've used it to mean two different things. I was simply referring to arrive-by searches in order to compute accessibility based on a time window at the destination, rather than the path-compression we had in OTP. I think Andrew is correct that one can also do this by simply using a long departure time window (probably one that ends at the desired arrival time) and letting range-RAPTOR find the earliest arrival for each departure time, and then filtering, though it's been a long time since I've thought very seriously about the RAPTOR algorithm.

@abyrd
Member

abyrd commented Feb 15, 2022

@mattwigway I didn't mean to edit your comment, I intended to reply to it! I'll see if I can fix that...

@abyrd
Member

abyrd commented Feb 15, 2022

Thanks for the links and commentary on the publications @mattwigway and @ansoncfit.

> If you really wanted to account for waiting at the destination, you would need to run a shortest-path search from destinations to origins (a "reverse search") which I believe R5 does not yet support. OpenTripPlanner did support it but it was a mess of conditional code that would change the behavior based on search direction. What we've discussed in the past with R5 is to instead create a "backwards network" by multiplying all stop arrival times by -1 and reversing the directions of all patterns. Then running our standard "forward search" algorithm will produce reverse search results.

One of our significant realizations in recent years (maybe via OTP2, or were we already talking about this in R5?) is that once you have efficient time-range searches, a separate reverse search implementation is redundant. Range Raptor will find the shortest travel times departing at any minute in a window. If you throw out the itineraries that arrive too late, you will have all the same results as a reverse search from the destination, assuming a window of sufficient width. The window can be established from upper and lower bounds on travel time between the origin and destination, even loose ones.
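The filtering step can be sketched as follows (toy `(departure, arrival)` pairs standing in for the output of a range search; the function is illustrative, not R5's API):

```python
def best_travel_time(itineraries, latest_arrival):
    """Given (departure, arrival) pairs from a forward range search, discard
    itineraries arriving after the deadline and return the shortest remaining
    travel time, or None if the destination cannot be reached in time."""
    feasible = [arr - dep for dep, arr in itineraries if arr <= latest_arrival]
    return min(feasible) if feasible else None

# Departures from 8:00 (480 minutes past midnight) with varying travel times.
itins = [(480, 505), (490, 525), (500, 530), (510, 545)]
print(best_travel_time(itins, latest_arrival=530))  # 25
print(best_travel_time(itins, latest_arrival=500))  # None
```

As long as the departure window is wide enough to contain every departure that could still arrive on time, this produces the same best results as a dedicated reverse search from the destination.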

Using this idea, R5 could be adapted to measure access to opportunities with fixed arrival times (e.g. shift work, schools) by just throwing out results that arrive after a specified time. That is, in addition to the departure time window, specifying a fixed arrival time. This would yield only one best travel time to each destination though, which is not so much an implementation problem as something that follows directly from the fixed arrival time constraint. (Note that this would also allow some serious optimization: destinations could be skipped for the rest of the search once any path was found to them). It's not clear how this would mesh with our inherently statistical approach for frequency-based scenario modeling, because it's not clear with fixed arrival times what determines the weight of different travel times in the distribution.

So considering all that, @botanize to me it doesn't seem coherent to assume a fixed but unknown latest acceptable arrival time at each destination, but also range the departure time over a window and compute summary statistics over the values produced at each of those departure times (e.g. compute the average of the instantaneous accessibilities for each departure time in a range, assuming inflexible "on time" arrival is required at each destination). If on-time arrival is required, why should the travel times for each departure time be equally weighted in the summary statistic, when only one departure time would actually be appropriate to reach each opportunity location on time? I'm not trying to characterize it as outright wrong, I just don't feel this should be seen as inherently "more correct".

@abyrd
Member

abyrd commented Feb 15, 2022

@mattwigway please confirm whether I have reverted your comment to its original form. There doesn't seem to be a way to revert edits so this involved some copy and paste from the history. And if you get a chance maybe repost your last comment so they appear in order.
