
Clustering heuristic for CVRP #74

Closed
jcoupey opened this issue Jan 15, 2018 · 6 comments
jcoupey commented Jan 15, 2018

I'd like to take a first step toward handling multiple vehicles by adding capacity constraints to force load-balancing between available vehicles. I think we could get interesting results with a cluster-first-route-second approach. Advantages would include:

  • reusing the existing TSP building block for each vehicle (potentially parallelized) once the clustering is done (Feature/sub-problem #65 made that simple)
  • cutting down the complexity (solving a 1000-jobs instance with 10 vehicles with even load would be in the same range as solving a 100-point TSP)
  • a good clustering would probably yield solutions that "look good" (though that may need checking)

Things to keep in mind for implementation:

  • capacity descriptions at job/vehicle level should be an array of values to account for several potential limitations
  • in order to easily model many situations, we want to retain the current API where vehicles can have different start and end locations (or even only one of the two), so not always the plain single-depot setting. This calls for benchmarking, as different start/end settings would probably trigger different biases in any clustering approach.
  • we might get decent solutions with that simple approach, but a local search phase is probably mandatory afterward...
@jcoupey jcoupey added the CVRP label Jan 15, 2018
@jcoupey jcoupey added this to the v1.2.0 milestone Jan 15, 2018
@jcoupey jcoupey self-assigned this Jan 15, 2018
This was referenced Jan 15, 2018

jcoupey commented Feb 6, 2018

I committed a first draft for a clustering approach in #75. Here is why it's not possible to just use a classical clustering technique like k-means:

  • the number of clusters is fixed (one per vehicle)
  • each cluster should contain start/end locations for the matching vehicle
  • cluster "size", i.e. the sum of job amount values (not the number of jobs), must match the vehicle capacity constraint

Clusters are built around start/end locations, and a "good" clustering should minimize the sum of "job spreading" (for some metric) over the clusters. As a measure of point closeness, we use the length of a spanning tree. The implementation is derived from Prim's algorithm, except that we want to end up with a forest of spanning trees (one per cluster). Thus we make the same kind of greedy choice, but we maintain one heap per cluster. Clusters are built concurrently, picking the best possible addition to the best cluster at each step. Note: there is no guarantee that final spanning trees are minimal at cluster level.

The rationale for this strategy is:

  • the length of a minimal spanning tree provides a bound for the cost of the resulting route within each cluster, so it makes sense to use it to avoid computing actual routes at the clustering phase
  • clusters are built super fast
  • handling capacity constraints is easy
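A minimal sketch of this forest-growing idea (illustrative only: the data layout, function names, and capacity handling below are my assumptions, not VROOM's actual code):

```python
import heapq

def parallel_clusters(jobs, vehicles, dist):
    """Grow one spanning tree per vehicle, Prim-style, with one heap
    per cluster. jobs: {job_id: (x, y, amount)};
    vehicles: list of (start_xy, capacity); dist: metric on (x, y)."""
    clusters = [[] for _ in vehicles]
    loads = [0] * len(vehicles)
    heaps = []
    # Seed each cluster's heap with edges from the vehicle start location.
    for start, _cap in vehicles:
        h = [(dist(start, jobs[j][:2]), j) for j in jobs]
        heapq.heapify(h)
        heaps.append(h)
    unassigned = set(jobs)
    while unassigned:
        # Pick the globally cheapest feasible addition across all clusters.
        best = None
        for v, h in enumerate(heaps):
            while h and (h[0][1] not in unassigned
                         or loads[v] + jobs[h[0][1]][2] > vehicles[v][1]):
                heapq.heappop(h)  # stale or capacity-violating entry
            if h and (best is None or h[0][0] < best[0]):
                best = (h[0][0], v, h[0][1])
        if best is None:
            break  # no feasible addition left; some jobs stay unassigned
        _, v, j = best
        clusters[v].append(j)
        loads[v] += jobs[j][2]
        unassigned.discard(j)
        # j is a new tree vertex: push its edges to the remaining jobs.
        for k in unassigned:
            heapq.heappush(heaps[v], (dist(jobs[j][:2], jobs[k][:2]), k))
    return clusters, sorted(unassigned)
```

Note that popping a capacity-violating job out of a cluster's heap for good is safe here: loads only ever grow, so a job that doesn't fit now can never fit in that cluster later.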


jcoupey commented Feb 6, 2018

A couple of gifs are probably worth a long explanation.

Here is a "good" case where all jobs are assigned and after applying a TSP to each cluster, the solution cost is already under +15% from the optimal value:

x-n110-k13_loop

This second example shows how the greedy assignments can end up with really wrong clusters when only bad choices are left (all jobs assigned but around +55% from the optimal)...

b-n56-k7_loop

Also there is no guarantee so far that all jobs will be served, even for a problem where it's possible.


jcoupey commented Feb 9, 2018

Turns out the parallel construction of the clusters is not always the best choice. It's good though in many situations where clusters are expected to be scattered geographically because vehicles start/end are distributed.

In the specific case where vehicles start/end are equal (think single depot), the sequential_clustering approach (pushed in the latest commits) is usually much more efficient:

  • clusters are built in turn for each vehicle until full capacity is reached
  • to mitigate the greedy choices, we use the notion of regret, which serves as a rough evaluation of the cost of fetching a job later on if we don't add it to the current cluster
  • clusters are initialized with the most demanding jobs with regard to capacity. This vastly improves the number of served jobs, as we make sure the hardest-to-fit jobs are included from the start, and not left over at a point where they can no longer be served.

The outline of this sequential approach is actually very close to the well-known insertion heuristic proposed by Solomon except we work at cluster level so we don't need to test all possible insertions at route level.
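A rough sketch of the sequential idea, with the regret term approximated as the distance from the next vehicle's start (a simplification: the names, data layout, and exact regret definition here are my assumptions, not the actual implementation):

```python
def sequential_clusters(jobs, vehicles, dist, regret_weight=1.0):
    """Fill clusters one vehicle at a time.
    jobs: {job_id: (x, y, amount)}; vehicles: list of (start_xy, capacity).
    The candidate score mixes closeness to the current cluster with a
    regret term: how costly it would be to reach the job from the
    *next* vehicle's start if we skip it now."""
    unassigned = set(jobs)
    clusters = []
    for v, (start, cap) in enumerate(vehicles):
        cluster, load = [], 0
        # Seed with the most capacity-demanding job that still fits.
        seed = max((j for j in unassigned if jobs[j][2] <= cap),
                   key=lambda j: jobs[j][2], default=None)
        if seed is not None:
            cluster.append(seed)
            load = jobs[seed][2]
            unassigned.discard(seed)
        next_start = vehicles[v + 1][0] if v + 1 < len(vehicles) else None
        while True:
            candidates = [j for j in unassigned if load + jobs[j][2] <= cap]
            if not candidates:
                break
            def score(j):
                # Closeness to the current cluster (or to the start point).
                d = (min(dist(jobs[c][:2], jobs[j][:2]) for c in cluster)
                     if cluster else dist(start, jobs[j][:2]))
                # High regret (far from the next vehicle) favors adding now.
                regret = dist(next_start, jobs[j][:2]) if next_start else 0.0
                return d - regret_weight * regret
            j = min(candidates, key=score)
            cluster.append(j)
            load += jobs[j][2]
            unassigned.discard(j)
        clusters.append(cluster)
    return clusters, sorted(unassigned)
```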


jcoupey commented Feb 28, 2018

So we end up with three parameters for the clustering scheme:

  1. build strategy (parallel or sequential)
  2. cluster init strategy (none, nearest job, job with higher_amount)
  3. weighting factor for regret (see above comments)

For now I'm happily firing a bunch of combinations of the above and picking the best result before moving to the TSP solving phase. This probably calls for more tuning at some point, but is fine for now as it's still fast (and can be parallelized). Also we get a more "all-round" approach, as a single tuning can be very good on some problems but really poor on others (different problem shapes really call for different clustering parameters).
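The parameter sweep amounts to something like the following sketch (the parameter values and the `solve` callback are illustrative assumptions, not the actual set used):

```python
from itertools import product

STRATEGIES = ("parallel", "sequential")
INITS = ("none", "nearest", "higher_amount")
REGRETS = (0.0, 0.5, 1.0, 2.0)  # illustrative regret weighting factors

def best_clustering(solve):
    """Try every parameter combination and keep the cheapest result.
    `solve` is a caller-supplied callback running the clustering + TSP
    phases for one combination and returning the solution cost."""
    best = None
    for strategy, init, regret in product(STRATEGIES, INITS, REGRETS):
        cost = solve(strategy, init, regret)
        if best is None or cost < best[0]:
            best = (cost, strategy, init, regret)
    return best
```

Since the combinations are independent, the loop body is exactly the kind of work that can be dispatched to a thread pool.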

I'll do a little tidying of the output logs, update the docs, and then merge #75 into master.


jcoupey commented Feb 28, 2018

On the benchmarking side, I've looked at results on CVRPLIB. The classes that can easily be tested are A, B, E, F, M, P, X (see conversion script).

Overall, 44737 jobs out of 44993 are served (that's 99.4%). We manage to get all jobs served for 159 instances out of 189. This means that only the 30 instances that are the tightest with regard to capacities have unassigned jobs (usually a small number) after the heuristic.

For those 159 instances, it makes sense to compare our overall costs to the best known solutions. Gaps range from 0% to 46.9% with an average of 15.8%. Median gap is at 15.2%.
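For reference, a gap here is the relative distance to the best known solution cost; the numbers below are toy values, not the actual CVRPLIB results:

```python
from statistics import mean, median

def gap_pct(cost, best_known):
    # Relative distance to the best known solution, in percent.
    return 100.0 * (cost - best_known) / best_known

# Toy illustration with made-up (cost, best_known) pairs.
gaps = [gap_pct(c, b) for c, b in [(1000, 1000), (1150, 1000), (1300, 1000)]]
```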

This looks promising for a fast heuristic approach and we now want to look at what happens when we get those solutions through a local search phase.


jcoupey commented Mar 9, 2018

Landed in master with #75.
