Why does the preparation time vary so much for different maps? #34
The preparation time varies drastically depending on the graph we use. Take for example these graphs (included in this repository), which are similar in size. The measurements shown in the table are just the result of running the tests in lib.rs on my laptop.

*Update: For the Bremen uniform graph I used the Bremen time graph, but set all edge weights to a constant value of 10.

I'd really like to find out where this difference comes from, and it would be especially useful to speed up the preparation of the abstreet graphs (South Seattle here) somehow. Also related: #33.

Comments
Surprisingly (for me), the map with the slowest preparation is the one with the smallest node degree (edges per node). But we also see a huge difference (approx. x17) between Bremen dist and time, which share the same topology and only use different edge weights. The number of created shortcuts (fast_graph edges) per node does not seem to correlate with the preparation time: for South Seattle it is the biggest (2.15) and the preparation is the slowest, but for Bremen time it is smaller than for Bremen dist, and the preparation for Bremen time is still much slower than for Bremen dist.
My conceptual understanding of CHs is very weak, but is the number of shortcuts created relevant? My intuition is that CHs work by finding the most "useful" edges in the graph -- first route to the highway, then skip lots of searching until it's time to exit onto local roads again. The South Seattle map doesn't have a clear set of "highways", based on the current edge weights I'm using. I don't know about the Bremen time/dist graphs, but maybe that's also the case there. Another simple experiment I can try: keep the same edge-expanded graph structure, but just use a simple distance cost function, ignoring speed, left turn penalties, etc. I'll report back if the preparation time differs dramatically because of that.
I think the slow preparation is simply due to some edges with extreme weights. For example, if I remove all edges with weight >= 100 from the South Seattle car map, 48_547 of the original 49_604 edges are left (so almost all of them, only some outliers are removed). The median weight is just 6. But the preparation time is already down to 2400ms (from 16750ms), a x7 speedup! If I simply set all edge weights to 10 (uniform edge weight distribution), the preparation time even goes down to 2000ms.

The maximum edge weights in South Seattle are around 11_000. Does this even make sense? In a graph with a median edge weight of 6 this might be like traversing 1800 edges, but the whole graph has only 50_000 edges. Anyway, @dabreegster, can you check why there are some edges with extreme weights in your graph? With my current understanding you should expect much faster preparation times if you exclude these edges.

And obviously fast_paths should deal better with such large edge weights. I think when fast_paths encounters such an edge it explores almost the entire graph to search for witnesses, and this can drastically increase the total preparation time.
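A minimal sketch of that filtering experiment, assuming the public fast_paths API (InputGraph::add_edge, freeze, prepare); the edge list and the 100 threshold are placeholders standing in for the real South Seattle data:

```rust
use fast_paths::InputGraph;
use std::time::Instant;

/// Builds an InputGraph from a raw edge list, skipping edges whose weight
/// exceeds `max_weight` (pass usize::MAX to keep everything), then times
/// the CH preparation.
fn prepare_with_weight_cap(edges: &[(usize, usize, usize)], max_weight: usize) {
    let mut input_graph = InputGraph::new();
    for &(from, to, weight) in edges {
        if weight <= max_weight {
            input_graph.add_edge(from, to, weight);
        }
    }
    input_graph.freeze();

    let start = Instant::now();
    let _fast_graph = fast_paths::prepare(&input_graph);
    println!(
        "preparation with weight cap {} took {:?}",
        max_weight,
        start.elapsed()
    );
}

fn main() {
    // Placeholder edge list; in the real experiment this would be the
    // South Seattle car graph loaded from disk.
    let edges: Vec<(usize, usize, usize)> = vec![(0, 1, 6), (1, 2, 7), (2, 0, 11_000)];
    prepare_with_weight_cap(&edges, usize::MAX); // baseline, all edges
    prepare_with_weight_cap(&edges, 100); // weight outliers removed
}
```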
Definitely relevant in terms of the memory used to store the graph, and also strongly related to the resulting query times. It is less clear how it corresponds to the preparation time, but I just included it in the table because I was curious. A very bad heuristic would lead to very many shortcuts, but I don't think we are in that regime here.
I think a better intuition would be that CH finds the most important nodes (important intersections) and adds artificial edges (shortcuts) between them, so the Dijkstra tree quickly expands (with only a few iterations) along these important intersections (think main roads).
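For intuition only, here is a toy sketch of what contracting a single node involves (this is not fast_paths' internal code): for each pair of neighbours of the contracted node, a local "witness" search tries to find a path that avoids that node and is no longer than the path through it; only if no such witness exists is a shortcut added.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

type Node = usize;
type Weight = u64;

/// Dijkstra from `source` towards `target`, ignoring `skip` (the node being
/// contracted) and stopping once all remaining distances exceed `limit`.
fn witness_dist(
    adj: &HashMap<Node, Vec<(Node, Weight)>>,
    source: Node,
    target: Node,
    skip: Node,
    limit: Weight,
) -> Weight {
    let mut dist: HashMap<Node, Weight> = HashMap::new();
    let mut heap = BinaryHeap::new();
    dist.insert(source, 0);
    heap.push(Reverse((0, source)));
    while let Some(Reverse((d, node))) = heap.pop() {
        if d > limit {
            break; // any witness would already be longer than the shortcut
        }
        if node == target {
            return d;
        }
        if d > *dist.get(&node).unwrap_or(&Weight::MAX) {
            continue; // stale heap entry
        }
        for &(next, w) in adj.get(&node).into_iter().flatten() {
            if next == skip {
                continue; // the contracted node may not lie on a witness path
            }
            let nd = d + w;
            if nd < *dist.get(&next).unwrap_or(&Weight::MAX) {
                dist.insert(next, nd);
                heap.push(Reverse((nd, next)));
            }
        }
    }
    Weight::MAX
}

fn main() {
    // Tiny directed example: contract node 1, which sits between 0 and 2.
    let mut adj: HashMap<Node, Vec<(Node, Weight)>> = HashMap::new();
    adj.insert(0, vec![(1, 3), (3, 10)]);
    adj.insert(1, vec![(0, 3), (2, 4)]);
    adj.insert(3, vec![(2, 10)]);

    let via = 3 + 4; // weight of the path 0 -> 1 -> 2 through the contracted node
    let witness = witness_dist(&adj, 0, 2, 1, via);
    if witness > via {
        // No witness found: a shortcut 0 -> 2 with weight 7 would be added.
        println!("add shortcut 0 -> 2 with weight {}", via);
    } else {
        println!("witness of weight {} found, no shortcut needed", witness);
    }
}
```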
Yes, the idea that a very 'hierarchical' graph works best for CH is very common, but here I saw that even using the same edge weight for all edges leads to a very fast preparation (and queries). Also, a 'hierarchical' road network (some roads clearly faster than others) does not necessarily imply 'hierarchical' edge weights, because the edge weight is not just the speed, but also depends on the length of an edge. For example, 'fast' edges on a highway are often also rather long (they connect two exits of a highway, for example), so their weight might be bigger than the weight of some 'slow' country road.
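A quick back-of-the-envelope illustration of that point, with made-up numbers (travel-time weight ≈ distance / speed):

```rust
/// Travel time in seconds for an edge of `distance_m` metres at `speed_kmh`.
fn travel_time_s(distance_m: f64, speed_kmh: f64) -> f64 {
    distance_m / (speed_kmh / 3.6) // m / (m/s) = seconds
}

fn main() {
    // A long edge between two highway exits vs. a short, slow residential street.
    let highway = travel_time_s(2_000.0, 100.0); // ~72 s
    let local = travel_time_s(300.0, 30.0); // ~36 s
    println!("highway edge: {:.0} s, local edge: {:.0} s", highway, local);
}
```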
Yes, definitely! The first thing I would try is setting all edge weights to a constant value, and then maybe try excluding edges with a very large weight (maybe they can be removed entirely, because large-weight edges might never be part of a shortest path).
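A short sketch of the first of those two experiments, again assuming the fast_paths InputGraph API and a placeholder edge list:

```rust
use fast_paths::InputGraph;

/// Builds the same topology but with every edge weight set to the same
/// constant, so only the structure of the graph influences the preparation.
fn uniform_weight_graph(edges: &[(usize, usize)], weight: usize) -> InputGraph {
    let mut input_graph = InputGraph::new();
    for &(from, to) in edges {
        input_graph.add_edge(from, to, weight);
    }
    input_graph.freeze();
    input_graph
}

fn main() {
    let edges = vec![(0, 1), (1, 2), (2, 0)]; // placeholder topology
    let input_graph = uniform_weight_graph(&edges, 10);
    let _fast_graph = fast_paths::prepare(&input_graph);
}
```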
@dabreegster Can you please also try this quick fix in your application? 634957a
I'm about to dig in and verify, but I suspect the problem is this: a-b-street/abstreet#582. I might need to revisit how this works... About to try out your fix.
Yes, this makes sense (GraphHopper does the same, for example). Still, the cost should not be more than, e.g., crossing the entire city. Either way it should not be too much of your concern, because such large edge weights should just be dealt with by fast_paths instead (there is no reason why that should not work).
The huge edge weights look like the smoking gun.
I can run some more tests to verify that the queries come out the same with your fix and without it, if that's helpful.
Sure, yes. I verified that the calculated routes are the same and that the queries are still fast using the South Seattle map, but of course it's better if you check as well. I'll definitely make this max_nodes parameter configurable, but I'm still wondering if there is something better we can do. For example, we could base max_nodes on previous searches, or explicitly check whether a given edge is an outlier in terms of its weight. I will also have to try with bigger OSM maps.
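One possible (purely hypothetical, not something the crate does) way to flag weight outliers up front: compare each edge against a multiple of the median, or of some higher percentile, of the weight distribution.

```rust
/// Returns a cutoff above which edge weights are treated as outliers,
/// here simply a multiple of the given percentile (e.g. 10x the median).
/// Panics on an empty input; this is only a sketch.
fn outlier_cutoff(mut weights: Vec<usize>, percentile: f64, factor: usize) -> usize {
    weights.sort_unstable();
    let idx = ((weights.len() - 1) as f64 * percentile).round() as usize;
    weights[idx] * factor
}

fn main() {
    // Placeholder weights; the real input would be all edge weights of the graph.
    let weights = vec![3, 5, 6, 6, 7, 9, 12, 11_000];
    let cutoff = outlier_cutoff(weights.clone(), 0.5, 10);
    let outliers = weights.iter().filter(|&&w| w > cutoff).count();
    println!("cutoff = {}, {} outlier edge(s)", cutoff, outliers);
}
```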
Yes, this is expected. What should not change is the outcome of the route queries (the same shortest path should be found unless there are multiple shortest paths in which case the result might be different, but the weight of the shortest path should always be the same). The speed of single queries is expected to change as well, obviously. |
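A sketch of such a consistency check, assuming the fast_paths query API (calc_path and ShortestPath::get_weight): compare only the path weights between two prepared graphs, since the node sequences may legitimately differ when several shortest paths exist.

```rust
use fast_paths::{FastGraph, InputGraph};

/// Compares shortest-path weights between two prepared graphs for a set of
/// (source, target) queries; returns the number of mismatches.
fn compare_weights(a: &FastGraph, b: &FastGraph, queries: &[(usize, usize)]) -> usize {
    queries
        .iter()
        .filter(|&&(s, t)| {
            let wa = fast_paths::calc_path(a, s, t).map(|p| p.get_weight());
            let wb = fast_paths::calc_path(b, s, t).map(|p| p.get_weight());
            // The node sequences may differ when there are ties, but the
            // weights (or "no path found") must match.
            wa != wb
        })
        .count()
}

fn main() {
    // Placeholder graph; in practice `a` and `b` would come from the two
    // versions of the preparation being compared.
    let mut input_graph = InputGraph::new();
    input_graph.add_edge(0, 1, 3);
    input_graph.add_edge(1, 2, 4);
    input_graph.freeze();
    let a = fast_paths::prepare(&input_graph);
    let b = fast_paths::prepare(&input_graph);
    println!("{} mismatching queries", compare_weights(&a, &b, &[(0, 2), (2, 0)]));
}
```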
This explains the results I'm seeing. From my perspective, your fix appears correct. Another quick test using a constant weight for every edge: 2.4s (down from 3.2s with just …)
dramatically improve time to import and edit maps. The fix helps all maps that use extremely high edge weights to prevent people from cutting through private roads. There may be a more robust fast_paths fix later, but I want to reap the benefits for tomorrow's release. The dramatic numbers:
- importing huge_seattle: 893s down to 108s
- editing huge_seattle: 102s down to 19s

Query speeds didn't appear to substantially change.
Okay your "quick fix" is actually the right thing to do when building CHs; witness search is actually supposed to be limited to a small set of nodes. Still, I'd recommend setting the threshold a little higher (500 nodes), so it doesn't affect the query times on larger graphs. |
What do you base your recommendation of 500 nodes on? Did you run any measurements with different thresholds? I chose 100 for the sole purpose of providing a quick fix for @dabreegster, after trying a few different values on one of his maps.

Using no limit at all yields the extreme case of maximum preparation time and the minimum number of unnecessary shortcuts. This works well unless there are edges with significantly larger weights than, say, the average edge weight of the given graph. For edges with average weights the limit is not needed, because witness searches are already limited by the weight of the skipped edges.

I thought about implementing some kind of heuristic that chooses the maximum-visited-nodes threshold depending on the contracted node, the weight of its adjacent edges and the weight distribution of the graph, but this seems a bit overkill. For the actual implementation in this crate I think the threshold should just be configurable, so users can optimize it for the kind of maps they are using. IMO there should even be different thresholds for priority calculation and actual node contraction.
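A hypothetical shape for such a configuration (these names are illustrative and not part of the fast_paths API): separate settled-node limits for the priority estimation and for the actual contraction.

```rust
/// Hypothetical preparation settings; the field names are illustrative only
/// and not part of the fast_paths API.
struct PreparationParams {
    /// Max nodes settled per witness search while estimating node priorities.
    max_settled_nodes_priority: usize,
    /// Max nodes settled per witness search during the actual contraction.
    max_settled_nodes_contraction: usize,
}

impl Default for PreparationParams {
    fn default() -> Self {
        PreparationParams {
            // The values discussed above: 100 as the quick fix, 500 as the
            // more conservative suggestion for larger graphs.
            max_settled_nodes_priority: 100,
            max_settled_nodes_contraction: 500,
        }
    }
}

fn main() {
    let params = PreparationParams::default();
    println!(
        "priority limit: {}, contraction limit: {}",
        params.max_settled_nodes_priority, params.max_settled_nodes_contraction
    );
}
```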