Why does the preparation time vary so much for different maps? #34

Closed
easbar opened this issue Jul 14, 2021 · 13 comments · Fixed by #37

@easbar (Owner) commented Jul 14, 2021

The preparation time varies drastically depending on the graph we use. Take for example these graphs (included in this repository), which are similar in size:

| name | nodes | edges | edges/node | preparation time (ms) | fast_graph edges | fast_graph edges/node | prep per node (μs) | query time (μs) |
|---|---|---|---|---|---|---|---|---|
| Bremen dist | 40_461 | 85_090 | 2.10 | 429 | 66_520 | 1.64 | 10 | 18 |
| Bremen time | 40_461 | 85_090 | 2.10 | 6_868 | 62_518 | 1.54 | 169 | 13 |
| Bremen uniform* | 40_461 | 85_090 | 2.10 | 307 | 65_669 | 1.62 | 8 | 15 |
| South Seattle | 29_763 | 49_604 | 1.67 | 16_750 | 64_029 | 2.15 | 560 | 32 |

The measurements shown in the table are just the result of running the tests in lib.rs on my laptop.

*Update: For the Bremen uniform graph I used the Bremen time graph, but set all edge weights to a constant value of 10.
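
For reference, a minimal sketch of how the uniform variant can be reproduced with the fast_paths API. `InputGraph::from_file`, `get_edges`, the `Edge` field names, and the `.gr` path are my assumptions based on the test maps shipped in this repo:

```rust
use fast_paths::InputGraph;
use std::time::Instant;

fn main() {
    // Load the Bremen time graph shipped with this repo (loader and
    // file path assumed).
    let time_graph = InputGraph::from_file("meta/test_maps/bremen_time.gr");

    // Rebuild it with every edge weight set to the constant 10.
    let mut uniform = InputGraph::new();
    for edge in time_graph.get_edges() {
        uniform.add_edge(edge.from, edge.to, 10);
    }
    uniform.freeze();

    // Time the CH preparation, then run one sanity-check query.
    let start = Instant::now();
    let fast_graph = fast_paths::prepare(&uniform);
    println!("preparation took {:?}", start.elapsed());
    let path = fast_paths::calc_path(&fast_graph, 0, 100);
    println!("weight 0 -> 100: {:?}", path.map(|p| p.get_weight()));
}
```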

I'd really like to find out where this difference comes from; it would be especially useful to speed up the preparation of the abstreet graphs (South Seattle here). Also related: #33.

@easbar (Owner, author) commented Jul 14, 2021

Surprisingly (to me), the map with the slowest preparation is the one with the smallest node degree (edges per node). But we also see a huge difference (roughly 17x) between Bremen dist and Bremen time, which share the same topology and only use different edge weights. The number of created shortcuts (fast_graph edges) per node does not seem to correlate with the preparation time: for South Seattle it is largest (2.15) and the preparation is the slowest, but for Bremen time it is smaller than for Bremen dist, and yet the preparation for Bremen time is still much slower.

@dabreegster (Contributor) commented:

My conceptual understanding of CHs is very weak, but is the number of shortcuts created relevant? My intuition is that CHs work by finding the most "useful" edges in the graph -- first route to the highway, then skip lots of searching until it's time to exit onto local roads again. The South Seattle map doesn't have a clear set of "highways", based on the current edge weights I'm using. I don't know about the Bremen time/dist graphs, but maybe that's also the case there.

Another simple experiment I can try: keep the same edge-expanded graph structure, but just use a simple distance cost function, ignoring speed, left turn penalties, etc. I'll report back if the preparation time differs dramatically because of that.

@easbar (Owner, author) commented Jul 14, 2021

I think the slow preparation is simply due to some edges with extreme weights. For example, if I remove all edges with weight >= 100 from the South Seattle car map, 48_547 of the original 49_604 edges remain (so almost all of them; only some outliers are removed -- the median weight is just 6). But the preparation time is already down to 2400ms (from 16750ms), a 7x speedup! If I simply set all edge weights to 10 (uniform edge weight distribution), the preparation time even goes down to 2000ms.
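
A sketch of that filtering experiment, assuming the same `InputGraph` accessors as in the snippet above (`get_edges` and the `Edge` field names are assumptions about the crate's API):

```rust
use fast_paths::InputGraph;

/// Rebuild an input graph, dropping all edges whose weight is at or
/// above `max_weight` (e.g. 100 for the South Seattle experiment).
fn filter_heavy_edges(graph: &InputGraph, max_weight: usize) -> InputGraph {
    let mut filtered = InputGraph::new();
    for edge in graph.get_edges() {
        if edge.weight < max_weight {
            filtered.add_edge(edge.from, edge.to, edge.weight);
        }
    }
    filtered.freeze();
    filtered
}
```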

The maximum edge weights in South Seattle are around 11_000. Does this even make sense? In a graph with a median edge weight of 6, that is like traversing roughly 1_800 median-weight edges, but the whole graph has only about 50_000 edges.

Anyway, @dabreegster, can you check why there are some edges with extreme weights in your graph? With my current understanding, you should expect much faster preparation times if you exclude these edges.

And obviously fast_paths should handle such large edge weights better. I think when fast_paths encounters such an edge it explores almost the entire graph searching for witnesses, and this can drastically increase the total preparation time.
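
Conceptually, the capped witness search behind the quick fix looks something like this. This is a simplified standalone sketch of the idea, not the actual fast_paths internals:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Simplified witness search: starting from `start`, look for a path to
/// `target` that is shorter than `limit_weight` (the weight of the path
/// via the node being contracted), but give up after settling `max_nodes`
/// nodes. `adj` is an adjacency list of (neighbor, weight) pairs.
fn witness_found(
    adj: &[Vec<(usize, usize)>],
    start: usize,
    target: usize,
    limit_weight: usize,
    max_nodes: usize,
) -> bool {
    let mut dist = vec![usize::MAX; adj.len()];
    let mut heap = BinaryHeap::new();
    dist[start] = 0;
    heap.push(Reverse((0, start)));
    let mut settled = 0;

    while let Some(Reverse((d, node))) = heap.pop() {
        if d > dist[node] {
            continue; // stale heap entry
        }
        if node == target && d < limit_weight {
            return true; // witness found, shortcut not needed
        }
        settled += 1;
        // The quick fix: abort once the search grows too large. We may
        // then insert an unnecessary shortcut, which is still correct,
        // just suboptimal.
        if settled >= max_nodes || d >= limit_weight {
            return false;
        }
        for &(next, w) in &adj[node] {
            let nd = d + w;
            if nd < dist[next] {
                dist[next] = nd;
                heap.push(Reverse((nd, next)));
            }
        }
    }
    false
}
```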

> but is the number of shortcuts created relevant?

Definitely relevant in terms of the memory used to store the graph, and also strongly related to the resulting query times. I think it is less clear how this corresponds to the preparation time; I just included it in the table because I was curious. A very bad heuristic would lead to a very large number of shortcuts, but I don't think we are in that regime here.

> My intuition is that CHs work by finding the most "useful" edges in the graph -- first route to the highway, then skip lots of searching until it's time to exit onto local roads again

I think a better intuition is that CH finds the most important nodes (important intersections) and adds artificial edges (shortcuts) between them, so the Dijkstra tree quickly expands (with only a few iterations) along these important intersections (think main roads).
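
To make that concrete: contracting a single node works roughly like this. A schematic sketch on an undirected adjacency list, reusing the `witness_found` helper from the snippet above; a real implementation also excludes the contracted node itself from the witness search and handles edge directions:

```rust
/// Schematic contraction of node `v`: for every pair of neighbors (u, w),
/// add the shortcut u -> w with weight w(u,v) + w(v,w), unless a witness
/// path that is at least as short already exists.
fn contract_node(adj: &mut Vec<Vec<(usize, usize)>>, v: usize, max_nodes: usize) {
    let neighbors = adj[v].clone();
    for &(u, w_uv) in &neighbors {
        for &(w, w_vw) in &neighbors {
            if u == w {
                continue;
            }
            let via_v = w_uv + w_vw;
            if !witness_found(adj, u, w, via_v, max_nodes) {
                adj[u].push((w, via_v)); // shortcut edge skipping v
            }
        }
    }
}
```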

> The South Seattle map doesn't have a clear set of "highways", based on the current edge weights I'm using

Yes, the intuition that a very 'hierarchical' graph works best for CH is very common, but here I saw that even using the same edge weight for all edges leads to a very fast preparation (and fast queries). Also, a 'hierarchical' road network (some roads clearly faster than others) does not necessarily imply 'hierarchical' edge weights, because an edge's weight is not just its speed but also depends on its length. For example, 'fast' edges on a highway are often also rather long (they connect two exits of a highway, for example), so their weight might be larger than the weight of some 'slow' country road.

> Another simple experiment I can try: keep the same edge-expanded graph structure, but just use a simple distance cost function, ignoring speed, left turn penalties, etc. I'll report back if the preparation time differs dramatically because of that.

Yes, definitely! The first thing I would try is setting all edge weights to a constant value, and then maybe excluding edges with a very large weight (maybe they can be removed entirely, because large-weight edges might never be part of a shortest path).

@easbar (Owner, author) commented Jul 14, 2021

@dabreegster Can you please also try this quick fix in your application? 634957a

@dabreegster (Contributor) commented:

> can you check why there are some edges with extreme weights in your graph?

I'm about to dig in and verify, but I suspect the problem is this: a-b-street/abstreet#582.
To model private areas where you cannot drive through, but you can start or end a trip, I add a huge penalty to the edge entering or leaving the area. The edge cost is the equivalent of 3 hours. https://github.com/a-b-street/abstreet/blob/10fdaa8e92fe4b5be5441077834cb71efeca4a24/map_model/src/pathfind/mod.rs#L132
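
For context, the pattern being described boils down to something like this hypothetical simplification (the real code is in the linked mod.rs; the constant and function names here are made up):

```rust
/// Hypothetical sketch of the access-restriction pattern: entering or
/// leaving a private zone adds a huge constant penalty, so through
/// traffic avoids the zone while trips starting or ending inside it
/// still work.
const ZONE_PENALTY_SECONDS: u64 = 3 * 3600; // the ~3 hour penalty

fn edge_cost(base_cost_seconds: u64, crosses_zone_border: bool) -> u64 {
    if crosses_zone_border {
        base_cost_seconds + ZONE_PENALTY_SECONDS
    } else {
        base_cost_seconds
    }
}
```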

I might need to revisit how this works...

About to try out your fix.

@easbar (Owner, author) commented Jul 14, 2021

> To model private areas where you cannot drive through, but you can start or end a trip,

Yes, this makes sense (GraphHopper does the same, for example). Still, the penalty should not be larger than, say, the cost of crossing the entire city. Either way, it should not be too much of your concern, because fast_paths should deal with large edge weights properly instead (there is no reason why that should not work).

@dabreegster (Contributor) commented:

The huge edge weights look like the smoking gun.

- Baseline: 15s
- Change zone_cost to add a few seconds of penalty instead of 3 hours: 3.2s
- Keep the high zone_cost but use your large_edge_weights_quick_fix branch: 3.2s

I can run some more tests to verify that the queries come out the same with your fix and without it, if that's helpful.

@easbar (Owner, author) commented Jul 14, 2021

> I can run some more tests to verify that the queries come out the same with your fix and without it, if that's helpful.

Sure, yes. I verified that the calculated routes are the same and that the queries are still fast using the South Seattle map, but of course it's better if you check as well. I'll definitely make this max_nodes parameter configurable, but I'm still wondering if there is something better we can do. For example, we could base max_nodes on previous searches, or explicitly check whether a given edge is an outlier in terms of edge weight. I will have to try with bigger OSM maps as well.
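
One way such an outlier check could look; this is a sketch of a possible heuristic, not something fast_paths currently does (the function name and factor are illustrative):

```rust
/// One possible outlier test: flag edges whose weight exceeds a multiple
/// of the graph's median edge weight. With South Seattle's median of 6
/// and factor 20, the cutoff would be 120, which catches the
/// ~11_000-weight zone edges while keeping almost all other edges.
fn weight_outlier_threshold(mut weights: Vec<usize>, factor: usize) -> usize {
    weights.sort_unstable();
    weights[weights.len() / 2] * factor // panics on an empty graph
}
```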

@dabreegster (Contributor) commented:

Rebuilding a map with your branch does change the serialized CH. But the net impact just looks like noise:

[screenshot: scatter plot comparing per-trip query times before and after the fix]

The points on the right are all practically on the diagonal, meaning individual trips were only marginally slower or faster. Individual routes I'm testing that involve the high-cost edges still behave as expected.

@easbar (Owner, author) commented Jul 14, 2021

> Rebuilding a map with your branch does change the serialized CH.

Yes, this is expected. What should not change is the outcome of the route queries: the same shortest path should be found, unless there are multiple shortest paths, in which case the result might differ, but the weight of the shortest path should always be the same. The speed of single queries is expected to change as well, obviously.
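
A check along these lines could be written against the public fast_paths API like so (a sketch; `get_num_nodes` and `get_weight` are the accessors I believe the crate exposes, and the deterministic sampling of node pairs is just for illustration):

```rust
use fast_paths::{FastGraph, InputGraph};

/// Compare shortest-path weights between two preparations of the same
/// input graph. Paths themselves may differ when several shortest paths
/// exist, but the weights must always match.
fn assert_same_weights(input: &InputGraph, a: &FastGraph, b: &FastGraph) {
    let n = input.get_num_nodes();
    for s in (0..n).step_by(97) {
        for t in (0..n).step_by(89) {
            let wa = fast_paths::calc_path(a, s, t).map(|p| p.get_weight());
            let wb = fast_paths::calc_path(b, s, t).map(|p| p.get_weight());
            assert_eq!(wa, wb, "weight mismatch for {} -> {}", s, t);
        }
    }
}
```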

@dabreegster (Contributor) commented:

> the same shortest path should be found, unless there are multiple shortest paths, in which case the result might differ, but the weight of the shortest path should always be the same

This explains the results I'm seeing. From my perspective, your fix appears correct.

Another quick test using a constant weight for every edge: 2.4s (down from 3.2s with just large_edge_weights_quick_fix).

dabreegster added a commit to a-b-street/abstreet that referenced this issue Jul 18, 2021
dramatically improve time to import and edit maps.

The fix helps all maps that use extremely high edge weights to prevent
people from cutting through private roads. There may be a more robust
fast_paths fix later, but I want to reap the benefits for tomorrow's
release.

The dramatic numbers:

- importing huge_seattle: 893s down to 108s
- editing huge_seattle: 102s down to 19s

Query speeds didn't appear to substantially change.

@PayasR commented Aug 10, 2021

Okay, your "quick fix" is actually the right thing to do when building CHs; the witness search is indeed supposed to be limited to a small set of nodes. Still, I'd recommend setting the threshold a little higher (500 nodes), so it doesn't affect query times on larger graphs.

@easbar (Owner, author) commented Aug 12, 2021

What do you base your recommendation of 500 nodes on? Did you run any measurements with different thresholds? I chose 100 for the sole purpose of providing a quick fix for @dabreegster, after trying a few different values on one of his maps. Using no limit at all yields the extreme case of maximum preparation time and a minimum number of unnecessary shortcuts. This works well unless there are edges with significantly larger weights than, say, the average edge weight of the given graph. For edges with average weights the limit is not needed, because witness searches are already limited by the weight of the skipped edges.

I thought about implementing some kind of heuristic that chooses the maximum-settled-nodes threshold depending on the contracted node, the weights of its adjacent edges, and the weight distribution of the graph, but this seems a bit overkill. For the actual implementation in this crate I think the threshold should just be configurable, so users can tune it for the kind of maps they are using. IMO there should even be separate thresholds for the priority calculation and the actual node contraction.
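
For example, a configurable variant could look like this (a hypothetical parameter struct, not the crate's current API):

```rust
/// Hypothetical preparation parameters: separate witness-search limits
/// for the priority (simulation) phase and the actual contraction phase.
pub struct PrepareParams {
    /// Max settled nodes per witness search while estimating node priorities.
    pub max_settled_nodes_priority: usize,
    /// Max settled nodes per witness search during contraction.
    pub max_settled_nodes_contraction: usize,
}

impl Default for PrepareParams {
    fn default() -> Self {
        PrepareParams {
            // 100 is the value used by the quick fix.
            max_settled_nodes_priority: 100,
            max_settled_nodes_contraction: 100,
        }
    }
}
```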
