
Clustering heuristic for CVRP #74

Closed
jcoupey opened this issue Jan 15, 2018 · 6 comments
jcoupey commented Jan 15, 2018

I'd like to take a first step toward handling multiple vehicles by adding capacity constraints to force load-balancing between available vehicles. I think we could get interesting results with a cluster-first-route-second approach. Advantages would include:

  • reusing the existing TSP building block for each vehicle (potentially parallelized) once the clustering is done (Feature/sub-problem #65 made that simple)
  • cutting down the complexity (solving a 1000-jobs instance with 10 vehicles with even load would be in the same range as solving a 100-point TSP)
  • a good clustering would probably yield solutions that "look good" (though that may need checking)

Things to keep in mind for implementation:

  • capacity descriptions at job/vehicle level should be an array of values to account for several potential limitations
  • in order to easily model many situations, we want to retain the current API where vehicles can have different start and end locations (or even only one of the two), so not always the plain single-depot setting. This calls for benchmarking, as different start/end settings would probably trigger different biases in any clustering approach.
  • we might get decent solutions with that simple approach, but a local search phase is probably mandatory afterward...
@jcoupey jcoupey added the CVRP label Jan 15, 2018
@jcoupey jcoupey added this to the v1.2.0 milestone Jan 15, 2018
@jcoupey jcoupey self-assigned this Jan 15, 2018
This was referenced Jan 15, 2018

jcoupey commented Feb 6, 2018

I committed a first draft for a clustering approach in #75. Here is why it's not possible to just use a classical clustering technique like k-means:

  • the number of clusters is fixed (one per vehicle)
  • each cluster should contain start/end locations for the matching vehicle
  • cluster "size", i.e. the sum of job amount values (not the number of jobs), must match the vehicle capacity constraint

Clusters are built around start/end locations, and a "good" clustering should minimize the sum of "job spreading" (for some metric) over the clusters. As a measure of point closeness, we use the length of a spanning tree. The implementation is derived from Prim's algorithm, except that we want to end up with a forest of spanning trees (one per cluster). Thus we make the same kind of greedy choice, but we maintain one heap per cluster. Clusters are built concurrently, picking the best possible addition to the best cluster at each step. Note: there is no guarantee that final spanning trees are minimal at cluster level.

The rationale for this strategy is:

  • the length of a minimal spanning tree provides a bound for the cost of the resulting route within each cluster, so it makes sense to use it to avoid computing actual routes at the clustering phase
  • clusters are built super fast
  • handling capacity constraints is easy
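A minimal sketch of this forest-growing idea (illustrative only: the data layout, function names, and capacity handling below are my assumptions, not VROOM's actual code):

```python
import heapq

def parallel_clusters(jobs, vehicles, dist):
    """Grow one spanning tree per vehicle, Prim-style, with one heap
    per cluster. jobs: {job_id: (x, y, amount)};
    vehicles: list of (start_xy, capacity); dist: metric on (x, y)."""
    clusters = [[] for _ in vehicles]
    loads = [0] * len(vehicles)
    heaps = []
    # Seed each cluster's heap with edges from the vehicle start location.
    for start, _cap in vehicles:
        h = [(dist(start, jobs[j][:2]), j) for j in jobs]
        heapq.heapify(h)
        heaps.append(h)
    unassigned = set(jobs)
    while unassigned:
        # Pick the globally cheapest feasible addition across all clusters.
        best = None
        for v, h in enumerate(heaps):
            while h and (h[0][1] not in unassigned
                         or loads[v] + jobs[h[0][1]][2] > vehicles[v][1]):
                heapq.heappop(h)  # stale or capacity-violating entry
            if h and (best is None or h[0][0] < best[0]):
                best = (h[0][0], v, h[0][1])
        if best is None:
            break  # no feasible addition left; some jobs stay unassigned
        _, v, j = best
        clusters[v].append(j)
        loads[v] += jobs[j][2]
        unassigned.discard(j)
        # j is a new tree vertex: push its edges to the remaining jobs.
        for k in unassigned:
            heapq.heappush(heaps[v], (dist(jobs[j][:2], jobs[k][:2]), k))
    return clusters, sorted(unassigned)
```

Note that popping a capacity-violating job out of a cluster's heap for good is safe here: loads only ever grow, so a job that doesn't fit now can never fit in that cluster later.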


jcoupey commented Feb 6, 2018

A couple of gifs are probably worth a long explanation.

Here is a "good" case where all jobs are assigned and after applying a TSP to each cluster, the solution cost is already under +15% from the optimal value:

x-n110-k13_loop

This second example shows how the greedy assignments can end up with really wrong clusters when only bad choices are left (all jobs assigned but around +55% from the optimal)...

b-n56-k7_loop

Also there is no guarantee so far that all jobs will be served, even for a problem where it's possible.


jcoupey commented Feb 9, 2018

Turns out the parallel construction of the clusters is not always the best choice. It's good though in many situations where clusters are expected to be scattered geographically because vehicles start/end are distributed.

In the specific case where vehicles start/end are equal (think single depot), the sequential_clustering approach (pushed in the latest commits) is usually much more efficient:

  • clusters are built in turn for each vehicle until full capacity is reached
  • to mitigate the greedy choices, we use the notion of regret, which serves as a rough evaluation of the cost of fetching a job later on if we don't add it to the current cluster
  • clusters are initialized with the most demanding jobs with regard to capacity. This vastly improves the number of served jobs, as we make sure the hardest-to-fit jobs are included from the start, and not left over at a point where they can no longer be served.

The outline of this sequential approach is actually very close to the well-known insertion heuristic proposed by Solomon except we work at cluster level so we don't need to test all possible insertions at route level.
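A rough sketch of the sequential idea, with the regret term approximated as the distance from the next vehicle's start (a simplification: the names, data layout, and exact regret definition here are my assumptions, not the actual implementation):

```python
def sequential_clusters(jobs, vehicles, dist, regret_weight=1.0):
    """Fill clusters one vehicle at a time.
    jobs: {job_id: (x, y, amount)}; vehicles: list of (start_xy, capacity).
    The candidate score mixes closeness to the current cluster with a
    regret term: how costly it would be to reach the job from the
    *next* vehicle's start if we skip it now."""
    unassigned = set(jobs)
    clusters = []
    for v, (start, cap) in enumerate(vehicles):
        cluster, load = [], 0
        # Seed with the most capacity-demanding job that still fits.
        seed = max((j for j in unassigned if jobs[j][2] <= cap),
                   key=lambda j: jobs[j][2], default=None)
        if seed is not None:
            cluster.append(seed)
            load = jobs[seed][2]
            unassigned.discard(seed)
        next_start = vehicles[v + 1][0] if v + 1 < len(vehicles) else None
        while True:
            candidates = [j for j in unassigned if load + jobs[j][2] <= cap]
            if not candidates:
                break
            def score(j):
                # Closeness to the current cluster (or to the start point).
                d = (min(dist(jobs[c][:2], jobs[j][:2]) for c in cluster)
                     if cluster else dist(start, jobs[j][:2]))
                # High regret (far from the next vehicle) favors adding now.
                regret = dist(next_start, jobs[j][:2]) if next_start else 0.0
                return d - regret_weight * regret
            j = min(candidates, key=score)
            cluster.append(j)
            load += jobs[j][2]
            unassigned.discard(j)
        clusters.append(cluster)
    return clusters, sorted(unassigned)
```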


jcoupey commented Feb 28, 2018

So we end up with three parameters for the clustering scheme:

  1. build strategy (parallel or sequential)
  2. cluster init strategy (none, nearest job, job with higher_amount)
  3. weighting factor for regret (see above comments)

For now I'm happily firing a bunch of combinations of the above and picking the best result before moving to the TSP solving phase. This probably calls for more tuning at some point, but is fine for now as it's still fast (and can be parallelized). Also we get a more "all-round" approach, as a single tuning can be very good on some problems but really poor on others (different problem shapes really call for different clustering parameters).
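The parameter sweep amounts to something like the following sketch (the parameter values and the `solve` callback are illustrative assumptions, not the actual set used):

```python
from itertools import product

STRATEGIES = ("parallel", "sequential")
INITS = ("none", "nearest", "higher_amount")
REGRETS = (0.0, 0.5, 1.0, 2.0)  # illustrative regret weighting factors

def best_clustering(solve):
    """Try every parameter combination and keep the cheapest result.
    `solve` is a caller-supplied callback running the clustering + TSP
    phases for one combination and returning the solution cost."""
    best = None
    for strategy, init, regret in product(STRATEGIES, INITS, REGRETS):
        cost = solve(strategy, init, regret)
        if best is None or cost < best[0]:
            best = (cost, strategy, init, regret)
    return best
```

Since the combinations are independent, the loop body is exactly the kind of work that can be dispatched to a thread pool.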

I'll do a little tidying of the output logs, update the docs, and then merge #75 into master.


jcoupey commented Feb 28, 2018

On the benchmarking side, I've looked at results on CVRPLIB. The classes that can easily be tested are A, B, E, F, M, P, X (see conversion script).

Overall, 44737 jobs out of 44993 are served (that's 99.4%). We manage to get all jobs served for 159 instances out of 189. This means that only the 30 instances that are the tightest with regard to capacities have unassigned jobs (usually a small number) after the heuristic.

For those 159 instances, it makes sense to compare our overall costs to the best known solutions. Gaps range from 0% to 46.9% with an average of 15.8%. Median gap is at 15.2%.
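For reference, a gap here is the relative distance to the best known solution cost; the numbers below are toy values, not the actual CVRPLIB results:

```python
from statistics import mean, median

def gap_pct(cost, best_known):
    # Relative distance to the best known solution, in percent.
    return 100.0 * (cost - best_known) / best_known

# Toy illustration with made-up (cost, best_known) pairs.
gaps = [gap_pct(c, b) for c, b in [(1000, 1000), (1150, 1000), (1300, 1000)]]
```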

This looks promising for a fast heuristic approach and we now want to look at what happens when we get those solutions through a local search phase.


jcoupey commented Mar 9, 2018

Landed in master with #75.
