Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using min-distance as criterium/weight #9

Open
skinkie opened this issue Oct 24, 2016 · 24 comments
Open

Using min-distance as criterium/weight #9

skinkie opened this issue Oct 24, 2016 · 24 comments

Comments

@skinkie
Copy link
Contributor

skinkie commented Oct 24, 2016

The documentation suggest we have currently three vectors available that can potentially be used as weight.

std::vector<unsigned>geo_distance;
std::vector<unsigned>travel_time;
std::vector<unsigned>speed;

The distance as described on the frontpage example is in fact result cost in seconds. Thus is the sum of all travel_times over all used arc's. This suggests that the weight function is travel_time which is an argument to ContractionHierarchy::build. If a user would like to get travel_time in concert with distance as "given this is the shortest path, this is the travel time required" would this require to recurse over the path found and use the arc index to query the additional weight, or is there something more elegant possible? Ideally I would want to fly at it as a multi-criteria weight.

@skinkie
Copy link
Contributor Author

skinkie commented Oct 24, 2016

I have taken a peak at pinning and get_arc_path which is at this moment not very compatible with that concept. Would it be possible to store the shortest_path_meeting_node in a vector, or are forward_predecessor_node and backward_predecessor_node not stable in a pinned run?

@ben-strasser
Copy link
Member

I wrote a reply to your first post, but because of some reasons it does not appear here. Oh well, here is a copy:

Hi,

This suggests that the weight function is travel_time which is an argument to ContractionHierarchy::build.

If you want more flexibility, then look at the CCH stuff. A CH is designed for a single immutable weight. CH performance varies depending on the weights.

If a user would like to get travel_time in concert with distance as "given this is the shortest path, this is the travel time required" would this require to recurse over the path found and use the arc index to query the additional weight, or is there something more elegant possible?

Use the following code:

auto p = q.get_arc_path();
unsigned t = 0;
for(auto a:p)
    t += travel_time[a];

In terms of performance, it is very hard to beat, if you need the path anyway.

Ideally I would want to fly at it as a multi-criteria weight.
If you want to, for example, avoid tunnels, go at at most 50km/h, and hate going through some region you can do the following:

vector<int>weight(arc_count);
for(int xy=0; xy<arc_count; ++xy){
    if(is_tunnel(xy) || is_in_no_go_region(latitude[tail[xy]], longitude[tail[xy]])){
        weight[xy] = inf_weight;
    }else{
        weight[xy] = geo_distance[xy] * std::min(50, speed[xy]) * unit_conversion_factor;
    }
}

You can be very creative when it comes to weights.

With a CCH, but not with a CH, you have the proven guarantee, that the distance query running time and the memory consumption will not vary no matter how absurd the weights are that you throw at it. This property is very useful if you want to let the user specify the details of his weights.

Best Regards
Ben Strasser

As to your second post: I do not really understand what you are trying to achieve.

@skinkie
Copy link
Contributor Author

skinkie commented Oct 25, 2016

I am now trying the following:

ch_query.reset();
for(auto s:source_list) {
    if (s == invalid_id) {
        /* TODO: haversine */

    } else {
        ch_query.reset_source();
        ch_query.add_source(s);
        for (auto t:target_list) {
            json pair;
            if (t != invalid_id) {
                ch_query.reset_target().add_target(t).run();
                // ch_query.reset().add_source(s).add_target(t).run();
                std::vector<unsigned>arcs = ch_query.get_arc_path();
                unsigned distance = ch_query.get_distance();
                if (distance != 0) {
                    unsigned travel_time = 0;
                    for (auto a:arcs) travel_time += graph.travel_time[a];
                    pair.push_back(ch_query.get_distance());
                    pair.push_back(travel_time);
                }   
            }   
            if (pair.is_null()) {
                /* TODO: haversine */
            }   
            reply["matrix"].push_back(pair);
        }   
    }   
}   

I wondered if I could do the same more efficiently when operating on a run_to_pinned_targets or run_to_pinned_sources:

std::vector<unsigned> d = ch_query.reset_source().add_source(s).run_to_pinned_targets().get_distances_to_targets();

So something like: get_arc_paths.

@ben-strasser
Copy link
Member

Hi,

the basic usage pattern is

ContractionHierarchyQueryquery(ch);
std::vector<unsigned>source_list = ...;
std::vector<unsigned>target_list = ...;

query.reset().pin_targets(target_list);

for(auto  s:source_list){
     std::vector<unsigned> d = query.reset_source().add_source(s).run_to_pinned_targets().get_distances_to_targets();
     // d[i] contains the distance from s to target_list[i]
}

Your code is missing the pin_targets part.

From the API you can only get the distances to the target nodes. If you need the actual paths, you need to do individual queries.

The main rational for this design is that many-to-many queries are usually used as input to other problem settings such as for example the traveling sales man problem, which only requires the distances. Once the planning phase is completed, i.e., you know in which order your sales person visits the cities, you only need to compute the few paths that you actually need. Another question is what a hypothetical get_paths_to_pinned_targets should return. A vector of paths is rarely a good idea, as many subpaths will be duplicated. Perhaps the union of all paths? There is some code for the later there that is commented out. It's not fully tested because I did not see enough use cases to warrant the maintenance costs.

Best Regards
Ben Strasser

@skinkie
Copy link
Contributor Author

skinkie commented Oct 25, 2016

From the API you can only get the distances to the target nodes. If you need the actual paths, you need to do individual queries.

Actually I need both distance and traveltime as input for a second algorithm. That is why I am doing the individual queries above. The geo_distance only (or travel_time only) weight works well in concert with the run_to_pinned_targets.

@ben-strasser
Copy link
Member

Hi,

This suggests that the weight function is travel_time which is an
argument to |ContractionHierarchy::build|.

If you want more flexibility, then look at the CCH stuff. A CH is
designed for a single immutable weight. CH performance varies depending
on the weights.

If a user /would/ like to get travel_time in concert with distance as
"given this is the shortest path, this is the travel time required"
would this require to recurse over the path found and use the arc
index to query the additional weight, or is there something more
elegant possible?

Use the following code:

auto p = q.get_arc_path();
unsigned t = 0;
for(auto a:p)
t += travel_time[a];

In terms of performance, it is very hard to beat, if you need the path
anyway.

Ideally I would want to fly at it as a multi-criteria weight.

If you want to, for example, avoid tunnels, go at at most 50km/h, and
hate going through some region you can do the following:

vectorweight(arc_count);
for(int xy=0; xy<arc_count; ++xy){
if(is_tunnel(xy) || is_in_no_go_region(latitude[tail[xy]],
longitude[tail[xy]])){
weight[xy] = inf_weight;
}else{
weight[xy] = geo_distance[xy] * std::min(50, speed[xy]) *
unit_conversion_factor;
}
}

You can be very creative when it comes to weights.

With a CCH, but not with a CH, you have the proven guarantee, that the
distance query running time and the memory consumption will not vary no
matter how absurd the weights are that you throw at it. This property is
very useful if you want to let the user specify the details of his weights.

Best Regards
Ben Strasser

@ben-strasser
Copy link
Member

ben-strasser commented Oct 26, 2016 via email

@jlabeit
Copy link

jlabeit commented Sep 1, 2017

Hi,
I actually have a similar real word use case where I need to compute a distance matrix with route distances optimized by driving time but also need the driving distance of the routes.
The background is that of computing transportation costs of trucks.
Truck drivers take the fastest routes (e.g. take highways and not short country roads). However, they bill by driving distance. Thus when optimizing costs I actually need to get the driving distances of the routes.

@ben-strasser
Copy link
Member

ben-strasser commented Sep 6, 2017 via email

@skinkie
Copy link
Contributor Author

skinkie commented Sep 6, 2017

@ben-strasser the code to 'do' this is really easy. So unless you know that there is a more efficient way without traversal all paths, I can actually point to the code that wrote in my repository :)

@ben-strasser
Copy link
Member

ben-strasser commented Sep 7, 2017 via email

@skinkie
Copy link
Contributor Author

skinkie commented Sep 7, 2017

@ben-strasser can you give an hint how you would do so?

@jlabeit
Copy link

jlabeit commented Sep 7, 2017

Hi,

My idea for an interface would be a function that gets passed a binary reducer and a stop stop criteria.

template<class T> T reducer(T val, Edge edge);
template<class T> bool done(T val);
int max_output = 10;
T initval;
template<class T> vector<tuple<node,T>> reducePathsToTargets(initval, reducer, done, max_output);

The reducer then could be applied on the edges of the tree connecting the source and the target nodes. The tree could be pruned with the boolean function. If the tree nodes are traversed in the order given by the distances the function could stop after finding_max target nodes.
The example with the 10 closest nodes with no more than 10 km travel distance would look like this:

recucePathsToTargets(0, sumEdgeLengths, sum > 10km, 10)

Do you think this could work.
I could give it a try and implement it.

Regards,

Julian

@skinkie
Copy link
Contributor Author

skinkie commented Sep 7, 2017

My question would me more in the direction, consider that you would have a run_to_pinned_targets how would you get all results back given the paths calculated.

@jlabeit
Copy link

jlabeit commented Sep 7, 2017

Yes, I also want to get the results back after run_to_pinned_targets. I want to get more information about the paths but without explicitly enumerating all edges in the paths (thats what I am currently doing). Like Ben said, it actually is not very efficient to enumerate all edges if you just want to compute the lengths of the paths. As most paths usually share edges they only need to be enumerated once when computing the lengths.

@skinkie
Copy link
Contributor Author

skinkie commented Sep 7, 2017

@jlabeit I assume that pinned_run would keep the forward_first_out / backward_first_out, this would be a subgraph. For the subgraph we can create a forward_tentative_alt_distance and backward_tentative_alt_distance, by walking the subgraph, dijkstra style. get_distances_to_alt_targets would get the forward_tentative_alt_distance

...and this sounds much too simple. Because the get_arc_path puts a lot of effort in recovering the original input arcs, which align with the alternative weight. So I wonder what @ben-strasser considers the best approach here. I can think of some;

  1. having an extra attribute in the CH which can be summed (extending struct Arc)
  2. after run_to_pinned_targets have something analogue to get_arc_path but then for to create a graph, where dijkstra style all nodes can get a distance.

@skinkie
Copy link
Contributor Author

skinkie commented Sep 8, 2017

@ben-strasser short update I am preparing a branch with an attempt to implement option one from the above. "Please stop now" is appreciated, if it can be done much easier ;)

@ben-strasser
Copy link
Member

I'll look at it next week. I do not currently have time for an in-depth analysis of the situation.

having an extra attribute in the CH which can be summed (extending struct Arc)

I will not accept any patch that modifies the ContractionHierarchy structure in any significant way. Most applications do not need many-to-many and thus should not pay for it.

The current many-to-many code just reuses the data structures that are in-place anyway. It costs therefore nothing if you do not use the feature. One can make many-to-many faster if one would allow for more memory. If we need a new query object type, it might be a good idea to go for the faster variants.

after run_to_pinned_targets have something analogue to get_arc_path but then for to create a graph, where dijkstra style all nodes can get a distance.

In some early RoutingKit code there was code that extracted the union of the edges in all paths. This can be done efficiently. However, it is unclear how to use this set efficiently.

If one only wants to solve the problem of Julian, I think the following interface is the way to go:

// contains for every shortcut the corresponding weight
struct ContractionHierarchyExtraWeight{
  // computes the weights of every shortcut using the path unpacking datastructure
  void reset(ContractionHierarchy&ch, const vector<unsigned>&weight);
};

struct ContractionHierarchyQuery{
  // this function is for the one-to-one case. Using it is faster than extracting a path and then summing the weights. It extracts the up-down path and sums its weights.
  unsigned get_extra_weight_distance(const ContractionHierarchyExtraWeight&);

  // Extract the distance to all targets. Extracts (partial) up-down paths and sums the weights along them. I think one of the n-sized arrays is unused by the current algorithm. You can use it here. I'd like to avoid allocating extra memory as a subroutine.
  std::vector<unsigned> get_extra_weight_distances_to_targets(const ContractionHierarchyExtraWeight&);
  // the variants of get_extra_weight_distances_to_targets
};

Ideally one wants to mirror these functions for CustomizableContractionHierarchy, which is non-trivial because the query algorithms are actually not the same.

What I dislike about the interface above are:

  • Signed weights are not supported. There is no algorithmic limitation that justifies this.
  • The interface does not support the find-all-targets-within-range problem.

Then again maybe I want too much and the interface above is already good enough...

@skinkie
Copy link
Contributor Author

skinkie commented Sep 8, 2017

I will not accept any patch that modifies the ContractionHierarchy structure in any significant way. Most applications do not need many-to-many and thus should not pay for it.

Good to know, and no offence taken :) If I get it to work, we could at least use it to benchmark.

skinkie added a commit to bliksemlabs/RoutingKit that referenced this issue Sep 8, 2017
This is a proof of concept after the suggestion of @ben-strasser in RoutingKit#9 that the retreival of a second weight should be much faster than enumerating over all paths to recover it. This implementation proofs that. Sadly it has been implemented in such way that is very performant but breaks the API of RoutingKit, use this branch to see what code has been changed in order to support an extra attribute that is not used in routing, but purely in retreival.

This effort has given me better insight in the internal structures of RoutingKit, and specifically the min_to change was the last remaining bug in the code.

While the code compiles cleanly the CCH has not been verified, the CH is verified. The following commit contains a small benchmark utility.
@skinkie
Copy link
Contributor Author

skinkie commented Sep 8, 2017

The performance of the proof of concept. @ben-strasser @clinct

image

@ben-strasser
Copy link
Member

@skinkie If I understand your plot correctly, then it shows that using a one-to-many algorithm is faster than doing many one-to-one queries. Was that the main message of your plot?

The implementation you made has the problem, that people that do not have an extra weight must pay for it during the CH construction and the CCH customization.

Further, I can only have one extra weight. Distance, maximum weight, maximum height, and energy consumption are examples of four extra weights that are often needed.

Energy consumption can be negative, if recuperation is considered. Signed integers are needed for this case. If over- and undercharging is accounted for, the energy consumption is no longer a scalar and even more complex data structures are needed.

In your implementation, you use a simple addition for the extra weight. This can lead to problems, if several arcs on the shortest path have weight inf_weight. The sum will overflow and bad things happen.

The more I think about the more I like the following interface:

template<class Weight>
struct ContractionHierarchyExtraWeight{
  template<class LinkFunction>
  void reset(
    const ContractionHierarchy&ch, 
    const vector<Weight>&input_arc_weight, 
    const LinkFunction&link
  );
};

struct ContractionHierarchyQuery{
  template<class Weight, class LinkFunction>
  unsigned get_extra_weight_distance(
    const ContractionHierarchyExtraWeight<Weight>&extra_weight, 
    const LinkFunction&link
  );

  template<class Weight, class LinkFunction>
  std::vector<Weight> get_extra_weight_distances_to_targets(
    const ContractionHierarchyExtraWeight<Weight>&extra_weight,
    const LinkFunction&link,
    std::vector<Weight>&temporary_buffer_with_node_count_size
  );

  template<class Weight, class LinkFunction>
  std::vector<Weight> get_extra_weight_distances_to_targets(
    const ContractionHierarchyExtraWeight<Weight>&extra_weight,
    const LinkFunction&link
  ){
    std::vector<Weight>buf(ch->node_count());
    return get_extra_weight_distances_to_targets(extra_weight, link, buf);
  }
  
};

struct SaturatedAddition{
unsigned operator()(unsigned l, unsigned r)const{
  if(l >= inf_weight-r)
   return inf_weight;
  else
   return l+r;
}

int operator()(int l, int r)const{
  if(l >= static_cast<int>(inf_weight)-r)
   return static_cast<int>(inf_weight);
  else
   return l+r;
}
};

ContractionHierarchyExtraWeight::reset iterates over the nodes bottom to top according to the CH order. For every nodes it enumerates outgoing forward and backward arcs. For these, it looks up whether they correspond to an input arc. If so, the input weight is copied. Otherwise, it looks up the weights of the two arcs out of which the shortcut was build and links these weights.

get_extra_weight_distance computes an up-down-path and looks up the corresponding weights and links them together.

run_to_pinned_targets needs to be extended to store predcessor arcs and nodes. In a preliminary version of RoutingKit this feature already existed. By quickly looking at the code it seems I never got round to fully removing it and all that is needed is to adjust the downward sweep in pinned_run.

get_extra_weight_distances_to_targets computes the extra weight costs/distances to all targets. It does so by starting at an arbitrary target and follows the predecessors until a source is reached. It stores the intermediate values from the source to the intermediate nodes in the temporary buffer. We can recycle one of the was_pushed flag arrays to indicate which nodes have a valid value. The algorithm continues and follows the predcessors from the next target. Once it finds an intermediate node which already has a valid value it can stop and link the values together.

I do not think that adding this feature to the CCH code is useful. The reason is that the ExtraWeight objects needs to be rebuild after each customization and that building it for a CH is cheaper than for a CCH. I there believe that if one needs many-to-many queries combined with extra weights, then using a perfect customization is the best route anyway.

What the proposed interface above does not support is a bounded range and POI-like queries. However, my feeling is that to make these really efficient, the internal data structures need to be changed too much. My feeling is that a specialized query object is necessary.

I do not want to template the functions of SaturatedAddition because I want to prevent that it is instantiated for a type for which inf_weight overflows (such as a short).

There are only two things I dislike about the interface above:

  • The link function must be passed to every function. An alternative would be to make the functor part of the ContractionHierarchyExtraWeight object. However, this is problematic if the functor contains references to local variables which go out of scope. This easily gets a nightmare to debug.
  • Too many templates.

Any comments?

PS: I do not know if I will find time to implement this any time soon.

@skinkie
Copy link
Contributor Author

skinkie commented Sep 14, 2017

@skinkie If I understand your plot correctly, then it shows that using a one-to-many algorithm is faster than doing many one-to-one queries. Was that the main message of your plot?

Exactly, and show the scale of improvement. Especially for the travelmatrix case - traveltime + geodistance results - where @clinct and @jlabeit are interested in.

The implementation you made has the problem, that people that do not have an extra weight must pay for it during the CH construction and the CCH customization.

Further, I can only have one extra weight. Distance, maximum weight, maximum height, and energy consumption are examples of four extra weights that are often needed.

I fully agree. This implements a common used pattern which can be implemented using RoutingKit, but is far from generic. Hence it is interesting to plot your generic interface along with it. On the other hand: this code exists now, and can be used until a more generic way is available.

Energy consumption can be negative, if recuperation is considered. Signed integers are needed for this case. If over- and undercharging is accounted for, the energy consumption is no longer a scalar and even more complex data structures are needed.

Shouldn't energy consumption be part of another structure so it can be used deterministic, and implement charging via hubs?

@ben-strasser
Copy link
Member

@jlabeit @skinkie Latest commit introduces some functions to do this. Please test.

@jlabeit
Copy link

jlabeit commented Oct 10, 2017

Nice! I'll try it out as soon as possible.
EDIT: Works like a charm for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants