Why does the preparation time vary so much for different maps? #34

Closed
easbar opened this issue Jul 14, 2021 · 13 comments · Fixed by #37

@easbar (Owner) commented Jul 14, 2021

The preparation time varies drastically depending on the graph we use. Take for example these graphs (included in this repository), which are similar in size:

| name | nodes | edges | edges/node | preparation time (ms) | fast_graph edges | fast_graph edges/node | prep per node (μs) | query time (μs) |
|---|---|---|---|---|---|---|---|---|
| Bremen dist | 40_461 | 85_090 | 2.10 | 429 | 66_520 | 1.64 | 10 | 18 |
| Bremen time | 40_461 | 85_090 | 2.10 | 6_868 | 62_518 | 1.54 | 169 | 13 |
| Bremen uniform* | 40_461 | 85_090 | 2.10 | 307 | 65_669 | 1.62 | 8 | 15 |
| South Seattle | 29_763 | 49_604 | 1.67 | 16_750 | 64_029 | 2.15 | 560 | 32 |

The measurements shown in the table are just the result of running the tests in lib.rs on my laptop.

*Update: For the Bremen uniform graph I used the Bremen time graph, but set all edge weights to a constant value of 10.
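
For reference, a minimal sketch of how the uniform variant can be reproduced with the fast_paths API. `InputGraph::from_file`, `get_edges`, the `Edge` field names, and the `.gr` path are my assumptions based on the test maps shipped in this repo:

```rust
use fast_paths::InputGraph;
use std::time::Instant;

fn main() {
    // Load the Bremen time graph shipped with this repo (loader and
    // file path assumed).
    let time_graph = InputGraph::from_file("meta/test_maps/bremen_time.gr");

    // Rebuild it with every edge weight set to the constant 10.
    let mut uniform = InputGraph::new();
    for edge in time_graph.get_edges() {
        uniform.add_edge(edge.from, edge.to, 10);
    }
    uniform.freeze();

    // Time the CH preparation, then run one sanity-check query.
    let start = Instant::now();
    let fast_graph = fast_paths::prepare(&uniform);
    println!("preparation took {:?}", start.elapsed());
    let path = fast_paths::calc_path(&fast_graph, 0, 100);
    println!("weight 0 -> 100: {:?}", path.map(|p| p.get_weight()));
}
```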

I'd really like to find out where this difference comes from; it would be especially useful to speed up the preparation of the abstreet graphs (South Seattle here). Also related: #33.

@easbar (Owner, author) commented Jul 14, 2021

Surprisingly (to me), the map with the slowest preparation is the one with the smallest node degree (edges per node). But we also see a huge difference (roughly 17x) between Bremen dist and Bremen time, which share the same topology and only use different edge weights. The number of created shortcuts (fast_graph edges) per node does not seem to correlate with the preparation time: for South Seattle it is largest (2.15) and the preparation is the slowest, but for Bremen time it is smaller than for Bremen dist, and yet the preparation for Bremen time is still much slower.

@dabreegster (Contributor) commented:

My conceptual understanding of CHs is very weak, but is the number of shortcuts created relevant? My intuition is that CHs work by finding the most "useful" edges in the graph -- first route to the highway, then skip lots of searching until it's time to exit onto local roads again. The South Seattle map doesn't have a clear set of "highways", based on the current edge weights I'm using. I don't know about the Bremen time/dist graphs, but maybe that's also the case there.

Another simple experiment I can try: keep the same edge-expanded graph structure, but just use a simple distance cost function, ignoring speed, left turn penalties, etc. I'll report back if the preparation time differs dramatically because of that.

@easbar (Owner, author) commented Jul 14, 2021

I think the slow preparation is simply due to some edges with extreme weights. For example, if I remove all edges with weight >= 100 from the South Seattle car map, 48_547 of the original 49_604 edges remain (so almost all of them; only some outliers are removed -- the median weight is just 6). But the preparation time is already down to 2400ms (from 16750ms), a 7x speedup! If I simply set all edge weights to 10 (uniform edge weight distribution), the preparation time even goes down to 2000ms.
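
A sketch of that filtering experiment, assuming the same `InputGraph` accessors as in the snippet above (`get_edges` and the `Edge` field names are assumptions about the crate's API):

```rust
use fast_paths::InputGraph;

/// Rebuild an input graph, dropping all edges whose weight is at or
/// above `max_weight` (e.g. 100 for the South Seattle experiment).
fn filter_heavy_edges(graph: &InputGraph, max_weight: usize) -> InputGraph {
    let mut filtered = InputGraph::new();
    for edge in graph.get_edges() {
        if edge.weight < max_weight {
            filtered.add_edge(edge.from, edge.to, edge.weight);
        }
    }
    filtered.freeze();
    filtered
}
```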

The maximum edge weights in South Seattle are around 11_000. Does this even make sense? In a graph with a median edge weight of 6, that is like traversing roughly 1_800 median-weight edges, but the whole graph has only about 50_000 edges.

Anyway, @dabreegster, can you check why there are some edges with extreme weights in your graph? With my current understanding, you should expect much faster preparation times if you exclude these edges.

And obviously fast_paths should handle such large edge weights better. I think when fast_paths encounters such an edge it explores almost the entire graph searching for witnesses, and this can drastically increase the total preparation time.
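
Conceptually, the capped witness search behind the quick fix looks something like this. This is a simplified standalone sketch of the idea, not the actual fast_paths internals:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Simplified witness search: starting from `start`, look for a path to
/// `target` that is shorter than `limit_weight` (the weight of the path
/// via the node being contracted), but give up after settling `max_nodes`
/// nodes. `adj` is an adjacency list of (neighbor, weight) pairs.
fn witness_found(
    adj: &[Vec<(usize, usize)>],
    start: usize,
    target: usize,
    limit_weight: usize,
    max_nodes: usize,
) -> bool {
    let mut dist = vec![usize::MAX; adj.len()];
    let mut heap = BinaryHeap::new();
    dist[start] = 0;
    heap.push(Reverse((0, start)));
    let mut settled = 0;

    while let Some(Reverse((d, node))) = heap.pop() {
        if d > dist[node] {
            continue; // stale heap entry
        }
        if node == target && d < limit_weight {
            return true; // witness found, shortcut not needed
        }
        settled += 1;
        // The quick fix: abort once the search grows too large. We may
        // then insert an unnecessary shortcut, which is still correct,
        // just suboptimal.
        if settled >= max_nodes || d >= limit_weight {
            return false;
        }
        for &(next, w) in &adj[node] {
            let nd = d + w;
            if nd < dist[next] {
                dist[next] = nd;
                heap.push(Reverse((nd, next)));
            }
        }
    }
    false
}
```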

> but is the number of shortcuts created relevant?

Definitely relevant in terms of the memory used to store the graph, and also strongly related to the resulting query times. I think it is less clear how this corresponds to the preparation time; I just included it in the table because I was curious. A very bad heuristic would lead to a very large number of shortcuts, but I don't think we are in that regime here.

> My intuition is that CHs work by finding the most "useful" edges in the graph -- first route to the highway, then skip lots of searching until it's time to exit onto local roads again

I think a better intuition is that CH finds the most important nodes (important intersections) and adds artificial edges (shortcuts) between them, so the Dijkstra tree quickly expands (with only a few iterations) along these important intersections (think main roads).
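
To make that concrete: contracting a single node works roughly like this. A schematic sketch on an undirected adjacency list, reusing the `witness_found` helper from the snippet above; a real implementation also excludes the contracted node itself from the witness search and handles edge directions:

```rust
/// Schematic contraction of node `v`: for every pair of neighbors (u, w),
/// add the shortcut u -> w with weight w(u,v) + w(v,w), unless a witness
/// path that is at least as short already exists.
fn contract_node(adj: &mut Vec<Vec<(usize, usize)>>, v: usize, max_nodes: usize) {
    let neighbors = adj[v].clone();
    for &(u, w_uv) in &neighbors {
        for &(w, w_vw) in &neighbors {
            if u == w {
                continue;
            }
            let via_v = w_uv + w_vw;
            if !witness_found(adj, u, w, via_v, max_nodes) {
                adj[u].push((w, via_v)); // shortcut edge skipping v
            }
        }
    }
}
```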

> The South Seattle map doesn't have a clear set of "highways", based on the current edge weights I'm using

Yes, the intuition that a very 'hierarchical' graph works best for CH is very common, but here I saw that even using the same edge weight for all edges leads to a very fast preparation (and fast queries). Also, a 'hierarchical' road network (some roads clearly faster than others) does not necessarily imply 'hierarchical' edge weights, because an edge's weight is not just its speed but also depends on its length. For example, 'fast' edges on a highway are often also rather long (they connect two exits of a highway, for example), so their weight might be larger than the weight of some 'slow' country road.

> Another simple experiment I can try: keep the same edge-expanded graph structure, but just use a simple distance cost function, ignoring speed, left turn penalties, etc. I'll report back if the preparation time differs dramatically because of that.

Yes, definitely! The first thing I would try is setting all edge weights to a constant value, and then maybe excluding edges with a very large weight (maybe they can be removed entirely, because large-weight edges might never be part of a shortest path).

@easbar (Owner, author) commented Jul 14, 2021

@dabreegster Can you please also try this quick fix in your application? 634957a

@dabreegster (Contributor) commented:

> can you check why there are some edges with extreme weights in your graph?

I'm about to dig in and verify, but I suspect the problem is this: a-b-street/abstreet#582.
To model private areas where you cannot drive through, but you can start or end a trip, I add a huge penalty to the edge entering or leaving the area. The edge cost is the equivalent of 3 hours. https://github.com/a-b-street/abstreet/blob/10fdaa8e92fe4b5be5441077834cb71efeca4a24/map_model/src/pathfind/mod.rs#L132
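
For context, the pattern being described boils down to something like this hypothetical simplification (the real code is in the linked mod.rs; the constant and function names here are made up):

```rust
/// Hypothetical sketch of the access-restriction pattern: entering or
/// leaving a private zone adds a huge constant penalty, so through
/// traffic avoids the zone while trips starting or ending inside it
/// still work.
const ZONE_PENALTY_SECONDS: u64 = 3 * 3600; // the ~3 hour penalty

fn edge_cost(base_cost_seconds: u64, crosses_zone_border: bool) -> u64 {
    if crosses_zone_border {
        base_cost_seconds + ZONE_PENALTY_SECONDS
    } else {
        base_cost_seconds
    }
}
```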

I might need to revisit how this works...

About to try out your fix.

@easbar (Owner, author) commented Jul 14, 2021

> To model private areas where you cannot drive through, but you can start or end a trip,

Yes, this makes sense (GraphHopper does the same, for example). Still, the penalty should not be larger than, say, the cost of crossing the entire city. Either way, it should not be too much of your concern, because fast_paths should deal with large edge weights properly instead (there is no reason why that should not work).

@dabreegster (Contributor) commented:

The huge edge weights look like the smoking gun.

- Baseline: 15s
- Change zone_cost to add a few seconds of penalty instead of 3 hours: 3.2s
- Keep the high zone_cost but use your large_edge_weights_quick_fix branch: 3.2s

I can run some more tests to verify that the queries come out the same with your fix and without it, if that's helpful.

@easbar (Owner, author) commented Jul 14, 2021

> I can run some more tests to verify that the queries come out the same with your fix and without it, if that's helpful.

Sure, yes. I verified that the calculated routes are the same and that the queries are still fast using the South Seattle map, but of course it's better if you check as well. I'll definitely make this max_nodes parameter configurable, but I'm still wondering if there is something better we can do. For example, we could base max_nodes on previous searches, or explicitly check whether a given edge is an outlier in terms of edge weight. I will have to try with bigger OSM maps as well.
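
One way such an outlier check could look; this is a sketch of a possible heuristic, not something fast_paths currently does (the function name and factor are illustrative):

```rust
/// One possible outlier test: flag edges whose weight exceeds a multiple
/// of the graph's median edge weight. With South Seattle's median of 6
/// and factor 20, the cutoff would be 120, which catches the
/// ~11_000-weight zone edges while keeping almost all other edges.
fn weight_outlier_threshold(mut weights: Vec<usize>, factor: usize) -> usize {
    weights.sort_unstable();
    weights[weights.len() / 2] * factor // panics on an empty graph
}
```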

@dabreegster (Contributor) commented:

Rebuilding a map with your branch does change the serialized CH. But the net impact just looks like noise:

[screenshot: scatter plot comparing per-trip query times before and after the fix]

The points on the right are all practically on the diagonal, meaning individual trips were only marginally slower or faster. Individual routes I'm testing that involve the high-cost edges still behave as expected.

@easbar (Owner, author) commented Jul 14, 2021

> Rebuilding a map with your branch does change the serialized CH.

Yes, this is expected. What should not change is the outcome of the route queries: the same shortest path should be found, unless there are multiple shortest paths, in which case the result might differ, but the weight of the shortest path should always be the same. The speed of single queries is expected to change as well, obviously.
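
A check along these lines could be written against the public fast_paths API like so (a sketch; `get_num_nodes` and `get_weight` are the accessors I believe the crate exposes, and the deterministic sampling of node pairs is just for illustration):

```rust
use fast_paths::{FastGraph, InputGraph};

/// Compare shortest-path weights between two preparations of the same
/// input graph. Paths themselves may differ when several shortest paths
/// exist, but the weights must always match.
fn assert_same_weights(input: &InputGraph, a: &FastGraph, b: &FastGraph) {
    let n = input.get_num_nodes();
    for s in (0..n).step_by(97) {
        for t in (0..n).step_by(89) {
            let wa = fast_paths::calc_path(a, s, t).map(|p| p.get_weight());
            let wb = fast_paths::calc_path(b, s, t).map(|p| p.get_weight());
            assert_eq!(wa, wb, "weight mismatch for {} -> {}", s, t);
        }
    }
}
```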

@dabreegster (Contributor) commented:

> the same shortest path should be found, unless there are multiple shortest paths, in which case the result might differ, but the weight of the shortest path should always be the same

This explains the results I'm seeing. From my perspective, your fix appears correct.

Another quick test using a constant weight for every edge: 2.4s (down from 3.2s with just large_edge_weights_quick_fix).

dabreegster added a commit to a-b-street/abstreet that referenced this issue Jul 18, 2021
dramatically improve time to import and edit maps.

The fix helps all maps that use extremely high edge weights to prevent
people from cutting through private roads. There may be a more robust
fast_paths fix later, but I want to reap the benefits for tomorrow's
release.

The dramatic numbers:

- importing huge_seattle: 893s down to 108s
- editing huge_seattle: 102s down to 19s

Query speeds didn't appear to substantially change.

@PayasR commented Aug 10, 2021

Okay, your "quick fix" is actually the right thing to do when building CHs; the witness search is indeed supposed to be limited to a small set of nodes. Still, I'd recommend setting the threshold a little higher (500 nodes), so it doesn't affect query times on larger graphs.

@easbar (Owner, author) commented Aug 12, 2021

What do you base your recommendation of 500 nodes on? Did you run any measurements with different thresholds? I chose 100 for the sole purpose of providing a quick fix for @dabreegster, after trying a few different values on one of his maps. Using no limit at all yields the extreme case of maximum preparation time and a minimum number of unnecessary shortcuts. This works well unless there are edges with significantly larger weights than, say, the average edge weight of the given graph. For edges with average weights the limit is not needed, because witness searches are already limited by the weight of the skipped edges.

I thought about implementing some kind of heuristic that chooses the maximum-settled-nodes threshold depending on the contracted node, the weights of its adjacent edges, and the weight distribution of the graph, but this seems a bit overkill. For the actual implementation in this crate I think the threshold should just be configurable, so users can tune it for the kind of maps they are using. IMO there should even be separate thresholds for the priority calculation and the actual node contraction.
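
For example, a configurable variant could look like this (a hypothetical parameter struct, not the crate's current API):

```rust
/// Hypothetical preparation parameters: separate witness-search limits
/// for the priority (simulation) phase and the actual contraction phase.
pub struct PrepareParams {
    /// Max settled nodes per witness search while estimating node priorities.
    pub max_settled_nodes_priority: usize,
    /// Max settled nodes per witness search during contraction.
    pub max_settled_nodes_contraction: usize,
}

impl Default for PrepareParams {
    fn default() -> Self {
        PrepareParams {
            // 100 is the value used by the quick fix.
            max_settled_nodes_priority: 100,
            max_settled_nodes_contraction: 100,
        }
    }
}
```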
