Algorithm tuning #33

Closed · N-Wouda opened this issue Aug 1, 2022 · 58 comments · Fixed by #152
Labels: dynamic, enhancement, static


N-Wouda commented Aug 1, 2022

There are lots of parameters to the HGS algorithm, not many of which seem to be tuned particularly well. At some point, we should run a tuning tool (e.g., smac) to determine a good set of parameters.

Parameters, in this sense, also include "which operators to use" (see also, e.g., #32).

N-Wouda added the enhancement label on Aug 1, 2022

N-Wouda commented Sep 12, 2022

We should probably have different configurations for different instance sizes. A natural split seems to be in three parts, like how the quali/final run-times are split.


leonlan commented Sep 26, 2022

We might also have to consider separate parameter tuning for the dynamic instances. The dynamic dispatch instances are very different from those of the static problem because of their relatively tight time windows and small instance sizes.

Based on greedy/hindsight/rollout, a rough estimate is that 40% of the instances have 100-150 customers, 40% have 50-100 customers, 10% have fewer than 50, and 10% have 150-200.


N-Wouda commented Oct 4, 2022

**Static**

We should have different configurations for each of the three instance sizes (of <300, 300-500, >500 clients), since they get different time limits and are in general very different in size and/or shape.

We currently have these parameters (22 in total):

| Param | Default value | Description |
|---|---|---|
| nbIter | 10K | Number of iters without improvement before restarting |
| initialTimeWarpPenalty | 1 | |
| nbPenaltyManagement | 100 | Number of iterations between penalty updates |
| feasBooster | 2 | Penalty increase if no feasible solutions yet |
| penaltyIncrease | 1.2 | Penalty increase if insufficient (but not zero) feasible solutions |
| penaltyDecrease | 0.85 | Penalty decrease if too many feasible solutions |
| minPopSize | 25 | Minimum population size |
| generationSize | 40 | Min + gen size = max pop size before culling |
| nbElite | 4 | Number of elite individuals biased to survive in the survival mechanism |
| nbClose | 5 | Number of closest individuals used to calculate average diversity |
| targetFeasible | 40% | Target percentage of generated offspring that are feasible (between penalty updates) |
| nbKeepOnRestart | 1 | Number of best feasible solutions to keep on restart |
| repairProbability | 50% | Probability of attempting to repair infeasible offspring |
| repairBooster | 10 | Penalty booster when repairing (hammering out infeasibilities) |
| selectProbability | 90% | Geometric selection probability of best crossover offspring |
| nbGranular | 40 | Number of 'near' clients explored by node operators in local search |
| weightWaitTime | 2 | Penalty for waiting time when determining granular neighbourhoods |
| weightTimeWarp | 10 | Penalty for time warps when determining granular neighbourhoods |
| circleSectorOverlapTolerance | 0 degrees | Overlap tolerance (in degrees) before route operators are applied |
| minCircleSectorSize | 15 degrees | Minimum circle sector size (overlap is bounded from below by this) |
| destroyPct | 20% | Percentage of the solution to destroy in broken pairs exchange crossover |
| postProcessPathLength | 4 | Subpath length to optimally recombine by enumeration in local search postprocessing |

Additionally, if we have to tune the node, route, and crossover operators as well (another 10-15 or so), we have maybe 40 parameters in total. That's quite a lot. Can we shorten this list somehow, or do you know of a tuning algorithm that can deal with this? I'm not sure SMAC can.

**Dynamic**

This probably also needs its own static configuration, particularly for the simulations (since those need to be fast). The configuration for the epoch optimisation after rollout can probably be shared with that for the small instances.

Rollout needs a static parameter configuration (for the simulations), plus the following additional parameters:

| Param | Default value | Description |
|---|---|---|
| N_LOOKAHEAD | 3 | Number of periods to simulate ahead |
| SIM_TLIM_FACTOR | 50% | Percentage of the epoch time limit to take for simulating |
| SIM_SOLVE_ITERS | 25 | Number of static solver iterations to use in a simulation |
| DISPATCH_THRESHOLD | 15% | Threshold to clear before dispatching a customer in the current epoch |


N-Wouda commented Oct 4, 2022

Each parameter lies in some [min, max] range. One simple way to get a feel for this would be to run two benchmarks for each parameter value: one where we increase it from the default to the max value, and one where we decrease it to the min value. That's doable, and should quickly give us a feel for which parameters really matter, and which do not seem to be as relevant. Then we can take the subset of parameters that really matter and tune those together.
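To make that concrete, here is a minimal sketch of such a one-factor-at-a-time screening. The `benchmark` helper is hypothetical, and the two parameters and their `[min, max]` ranges are placeholders, not tuned values:

```python
def benchmark(params: dict) -> float:
    """Hypothetical: run the solver on the benchmark instances with these
    parameters and return the average objective."""
    raise NotImplementedError

DEFAULTS = {"nbGranular": 40, "repairProbability": 0.50}  # ..., all 22 params
RANGES = {"nbGranular": (10, 100), "repairProbability": (0.0, 1.0)}  # assumed

baseline = benchmark(DEFAULTS)
impact = {}

for name, (low, high) in RANGES.items():
    # Two runs per parameter: the default configuration with this one
    # parameter pushed to its min, and pushed to its max.
    deviations = [abs(benchmark({**DEFAULTS, name: value}) - baseline)
                  for value in (low, high)]
    impact[name] = max(deviations)

# Parameters with the largest deviations are the ones that really matter.
for name in sorted(impact, key=impact.get, reverse=True):
    print(name, impact[name])
```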


N-Wouda commented Oct 4, 2022

@leonlan @LuukPentinga @jaspervd96 @rijalarpan @MarjoleinAerts @sbhulai: what do you think?

rijalarpan commented:

> @leonlan @LuukPentinga @jaspervd96 @rijalarpan @MarjoleinAerts @sbhulai: what do you think?

> Each parameter lies in some [min, max] range. One simple way to get a feel for this would be to run two benchmarks for each parameter value: one where we increase it from the default to the max value, and one where we decrease it to the min value. That's doable, and should quickly give us a feel for which parameters really matter, and which do not seem to be as relevant. Then we can take the subset of parameters that really matter and tune those together.

It is a mighty load of parameters to tune. I think the min/max idea may not work fully, because in my experience some of them behave non-linearly. I would consider a set of values for each parameter and test them in a full factorial way. Then, as you suggested, go deeper into a few of them. I do not think parameter optimization packages like the ones available for machine learning will work here.


leonlan commented Oct 5, 2022

I agree with @rijalarpan; these parameters usually behave non-linearly (e.g., destroyPct), so we need more than min-max. A factorial design should be manageable, and I can also help tune part of the parameters. Some other thoughts:

  • I don't think we need to try all parameters. For example, the initial time warp penalty of 1 probably doesn't have to be changed, because the penalty-related parameters should deal with that. Actually, I think with a factorial design we can easily figure out which parameters need to be tuned and which do not. (This is what you also propose.)
  • We could try to optimize logical parameter groups. For example, (penaltyIncrease, penaltyDecrease, targetFeasible) seem to belong together and are rather independent from the rest. We could use SMAC or some other tuning algorithm to tune such a group.

Also, I filled in the remaining '??' descriptions for some parameters (nbElite, nbClose, weightTimeWarp, weightWaitTime).


N-Wouda commented Oct 5, 2022

> A factorial design should be manageable

We have 40+ parameters. How is even the simplest factorial design (roughly 2^40) manageable?


leonlan commented Oct 5, 2022

> > A factorial design should be manageable
>
> We have 40+ parameters. How is even the simplest factorial design (roughly 2^40) manageable?

It's not 😅 My understanding of the (full) factorial design was flawed. Let me rephrase. A full factorial design is not feasible, but it should be manageable to perform a factorial design for smaller logical parameter groups.

MarjoleinAerts commented:

I agree with the comments to check a couple of values for each parameter and then focus on the ones with the most impact. Min and max might not give good indications, as also suggested in the comments above.

rijalarpan commented:

> > > A factorial design should be manageable
> >
> > We have 40+ parameters. How is even the simplest factorial design (roughly 2^40) manageable?
>
> It's not 😅 My understanding of the (full) factorial design was flawed. Let me rephrase. A full factorial design is not feasible, but it should be manageable to perform a factorial design for smaller logical parameter groups.

Could a procedure comparable to variable neighborhood descent (VND) be pragmatic here? It would go as follows (see the sketch after this list):

  1. Shuffle the parameters into a random order.
  2. Select one of them and optimize it with all others fixed.
  3. Fix the optimal choice from the previous step and continue until all parameter values are fixed.
  4. Repeat steps 1 to 3 until time runs out.

Then you can try a full-factorial setup on the limited set of parameters that proved of greatest value in the above process, and focus on those.
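A minimal sketch of this loop, assuming a hypothetical `evaluate` callable that returns the average objective for a configuration, and placeholder candidate value lists:

```python
import random
import time

def tune(defaults: dict, candidates: dict, evaluate, budget: float):
    """Coordinate-descent-style tuning: optimize one parameter at a time
    with all others fixed, in random order, until the budget runs out."""
    best, best_cost = dict(defaults), evaluate(defaults)
    start = time.perf_counter()

    while time.perf_counter() - start < budget:  # step 4: repeat 1-3
        names = list(candidates)
        random.shuffle(names)                    # step 1: random order

        for name in names:                       # steps 2-3: optimize one
            for value in candidates[name]:       # parameter, fix the best
                trial = {**best, name: value}
                cost = evaluate(trial)

                if cost < best_cost:
                    best, best_cost = trial, cost

    return best, best_cost
```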


leonlan commented Oct 6, 2022

Here are our current parameters and operators organized per category. I like @rijalarpan's suggestion to use a VND kind of approach to optimize this, but I think it's best done per parameter group instead of per single parameter. The parameter groups are not super big, so within each (sub)group you could do a full-factorial approach.


**Penalty management**

| Param | Default value | Description |
|---|---|---|
| initialTimeWarpPenalty | 1 | |
| nbPenaltyManagement | 100 | Number of iterations between penalty updates |
| feasBooster | 2 | Penalty increase if no feasible solutions yet |
| penaltyIncrease | 1.2 | Penalty increase if insufficient (but not zero) feasible solutions |
| penaltyDecrease | 0.85 | Penalty decrease if too many feasible solutions |
| targetFeasible | 40% | Target percentage of generated offspring that are feasible (between penalty updates) |
| repairProbability | 50% | Probability of attempting to repair infeasible offspring |
| repairBooster | 10 | Penalty booster when repairing (hammering out infeasibilities) |

**Population management**

| Param | Default value | Description |
|---|---|---|
| minPopSize | 25 | Minimum population size |
| generationSize | 40 | Min + gen size = max pop size before culling |
| nbElite | 4 | Number of elite individuals biased to survive in the survival mechanism |
| nbClose | 5 | Number of closest individuals used to calculate average diversity |

**Restart mechanism**

| Param | Default value | Description |
|---|---|---|
| nbIter | 10K | Number of iters without improvement before restarting |
| nbKeepOnRestart | 1 | Number of best feasible solutions to keep on restart |

**Crossover**

| Param | Default value | Description |
|---|---|---|
| selectProbability | 90% | Geometric selection probability of best crossover offspring |
| destroyPct | 20% | Percentage of the solution to destroy in broken pairs exchange crossover |
| brokenPairsExchange | N | - |
| selectiveRouteExchange | Y | - |
| edgeAssembly | N | - |

**Local search**

*Node ops*

| Param | Default value |
|---|---|
| Exchange 10 | Y |
| Exchange 11 | Y |
| Exchange 20 | Y |
| MoveTwoClientsReversed | Y |
| Exchange 21 | Y |
| Exchange 22 | Y |
| Exchange 31 | N |
| Exchange 32 | N |
| Exchange 33 | N |
| TwoOpt | Y |

*Granular neighborhoods*

| Param | Default value | Description |
|---|---|---|
| nbGranular | 40 | Number of 'near' clients explored by node operators in local search |
| weightWaitTime | 2 | Penalty for waiting time when determining granular neighborhoods |
| weightTimeWarp | 10 | Penalty for time warps when determining granular neighborhoods |

*Intensification and route ops*

| Param | Default value | Description |
|---|---|---|
| circleSectorOverlapTolerance | 0 degrees | Overlap tolerance (in degrees) before route operators are applied |
| minCircleSectorSize | 15 degrees | Minimum circle sector size (overlap is bounded from below by this) |
| shouldIntensify | Y | Whether to apply route and enumeration operators to new best solutions |
| RelocateStar | Y | |
| SwapStar | Y | |

**Post-process**

| Param | Default value | Description |
|---|---|---|
| postProcessPathLength | 4 | Subpath length to optimally recombine by enumeration in local search postprocessing |


N-Wouda commented Oct 6, 2022

Added three more node ops that we keep forgetting about :-).


N-Wouda commented Oct 6, 2022

I like the idea of doing things by logical group. If we want to do a factorial design we need to come up with levels for the non-binary parameters (e.g. what's 'low' or 'high' for nbGranular?).

We might also just dump every group into, say, SMAC or ParamILS and have that thing tune for us. The number of parameters we have in each group is manageable for these algorithms, so I suspect they can get us a reasonably good configuration quite quickly. If the resulting configuration of tuning each group in isolation is not already effective, we can use the tuning trajectories to identify which parameters should be tuned together in a follow-up step.
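As an illustration, a rough sketch of tuning the penalty management group, assuming SMAC3's v2 Python API. Only the defaults come from the tables above; the ranges are guesses, and the target function is a stub:

```python
from ConfigSpace import Configuration, ConfigurationSpace, Float, Integer
from smac import HyperparameterOptimizationFacade, Scenario

cs = ConfigurationSpace()
cs.add_hyperparameters([
    Integer("nbPenaltyManagement", (25, 500), default=100),
    Float("feasBooster", (1.0, 10.0), default=2.0),
    Float("penaltyIncrease", (1.0, 5.0), default=1.2),
    Float("penaltyDecrease", (0.25, 1.0), default=0.85),
    Float("targetFeasible", (0.0, 1.0), default=0.4),
    Float("repairProbability", (0.0, 1.0), default=0.5),
    Integer("repairBooster", (1, 25), default=10),
])

def evaluate(config: Configuration, seed: int) -> float:
    """Stub: run the static solver with this configuration on the
    benchmark instances and return the average cost."""
    raise NotImplementedError

scenario = Scenario(cs, deterministic=False, n_trials=200)
incumbent = HyperparameterOptimizationFacade(scenario, evaluate).optimize()
```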


N-Wouda commented Oct 6, 2022

@rijalarpan @leonlan [and anyone else that's interested] shall we have a short meeting tomorrow at, say, 15h to discuss this specific subject further? I can do anytime tomorrow, except 14-15h.


leonlan commented Oct 6, 2022

> @rijalarpan @leonlan [and anyone else that's interested] shall we have a short meeting tomorrow at, say, 15h to discuss this specific subject further? I can do anytime tomorrow, except 14-15h.

I'm available all day before 15:30, so we could meet 15:00-15:30.

rijalarpan commented:

> @rijalarpan @leonlan [and anyone else that's interested] shall we have a short meeting tomorrow at, say, 15h to discuss this specific subject further? I can do anytime tomorrow, except 14-15h.
>
> I'm available all day before 15:30, so we could meet 15:00-15:30.

I am away on holiday, but I can find time between 09:00-10:00.


N-Wouda commented Oct 6, 2022

Tomorrow at 9 works for me. I'll send a Google Meet link in a bit.


N-Wouda commented Oct 6, 2022

Here's the link: https://meet.google.com/ekp-weba-xci. See you tomorrow!


N-Wouda commented Oct 7, 2022

What we just discussed:

  • Tuning by group is the most sensible thing to do, given that we cannot do everything at once.
  • The groups are ordered, and the order in which we tune them probably matters (e.g., first tuning group 1 and then group 2 gives different results than first tuning group 2 and then group 1). We should experiment with this a bit to see if it matters (too) much.
  • We'll start with tuning the penalty management, population management, and restart mechanism groups.
  • We'll use a tuning algorithm for this (e.g., SMAC or ParamILS). Niels will have a look at this later today.
  • After tuning each group, we might want to tune the most important parameters of all groups together in a second tuning round. But that's for later.


N-Wouda commented Oct 9, 2022

After this Friday (14 October) I would like to freeze any new static changes, and focus solely on tuning the static solver. So any planned or open PRs that impact the static solver should ideally be merged before next weekend!


N-Wouda commented Oct 24, 2022

None of the population parameter sets appear to be better than our current default: the one promising set I found gets an average objective +15 above our current settings. So we should probably keep the defaults for population management.

I'll set up the LS parameters now, and run that. That's the last set of static parameters to tune!


N-Wouda commented Oct 26, 2022

The LS stuff is in now too, and there's potential here for another thirty-ish points of improvement. There are a few candidate parameter settings that I'll benchmark later today, and then I'll pick the best settings based on what comes out of that.


N-Wouda commented Oct 26, 2022

@leonlan @jaspervd96 before tuning the dynamic part: is there anything in the works there still that I have to wait for?

jmhvandoorn commented:

I'm still trying all kinds of variations I can think of, but (from my side at least) nothing clearly beats the original yet. So I would not wait for that, and start tuning soon to have enough time for it.

In the "worst case", we find something better at the last minute and run it with the same tuning as the original, instead of something specifically tuned for it.

jmhvandoorn commented:

Small addition: we should decide whether we can/want to tune certain parameters separately per epoch.

E.g., not one fixed threshold, but a separate threshold for each epoch.


N-Wouda commented Oct 26, 2022

> Small addition: we should decide whether we can/want to tune certain parameters separately per epoch.

I talked to @leonlan about this, and we figured we might try a different threshold for epoch 0, 1, and >1. Or just 0 and >0, because the first epoch is a bit special.


N-Wouda commented Oct 26, 2022

Raw configurations + results of the static tuning runs are available here.


N-Wouda commented Oct 26, 2022

The best configuration is this one:

nbGranular = 34
weightWaitTime = 18
weightTimeWarp = 20
circleSectorOverlapToleranceDegrees = 216
minCircleSectorSizeDegrees = 332
postProcessPathLength = 7

with an average cost of 164228, and 48234 iterations.


N-Wouda commented Oct 26, 2022

> minCircleSectorSizeDegrees = 332

This means that we're more or less trying all routes in the intensification phase. Based on that, I'm also trying something where we get rid of the whole circle sector overlap thing and just try all routes.


leonlan commented Oct 26, 2022

It seems that tuning the population parameters hasn't resulted in a significant improvement. There's one set of values that seems slightly better, so I'm testing that now. But it might not pan out - the defaults here were already quite good.

I think it may be worthwhile to tune nbIter independently. Correct me if I'm wrong, but the current parameter tuning tests an entire configuration; consequently, any positive effect of changing nbIter is probably negated by negative effects of changes to the other population parameters.


leonlan commented Oct 26, 2022

> @leonlan @jaspervd96 before tuning the dynamic part: is there anything in the works there still that I have to wait for?

I don't have any major changes, but I'm undecided whether we should use n_lookahead equal to 1, 2, or 3. For each value, we seem to get similar performance when setting the corresponding threshold to 35%, 25%, or 15%, respectively. We could use parameter tuning to figure out which number of lookahead periods to use, but the threshold ranges and values would have to differ in each case. Is that doable? On the other hand, since the performance of each is very similar, it shouldn't matter much if we just fix 1, 2, or 3, and then we reduce the number of parameters to tune.


N-Wouda commented Oct 27, 2022

> minCircleSectorSizeDegrees = 332
>
> This means that we're more or less trying all routes in the intensification phase. Based on that, I'm also trying something where we get rid of the whole circle sector overlap thing and just try all routes.

This gets 164224 average cost with 48905 iterations, so more or less the same outcome. It's a lot simpler, however, so I'll open a PR removing the circle sector stuff.


N-Wouda commented Oct 27, 2022

For tuning dynamic: a full dynamic run on eight cores takes about 3h20min, so 3h40min should be plenty on the cluster.

A dynamic instance is derived from a static instance. This process uses some randomness, so there's a seed controlling this (--instance_seed). The solver also uses randomness, and there's a different seed controlling that (--solver_seed). This results in different variabilities over ten runs with different seeds:

  • Varying instance seed: average cost 371015, with stddev 1189
  • Varying solver seed: average cost 371008, with stddev 181

In the qualification and final rounds, we will be evaluated with different solver seeds (the instances - and their seeds - are fixed for us). So it's good that the variability in the solver seed is quite a bit smaller.
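As an illustration, this is roughly how such a variability estimate can be computed. The `solve.py` entry point is a placeholder name; the two seed flags are the ones mentioned above:

```python
import statistics
import subprocess

def run(instance_seed: int, solver_seed: int) -> float:
    # Placeholder driver; assumes the script prints the total cost on
    # its last line of output.
    out = subprocess.run(
        ["python", "solve.py",
         "--instance_seed", str(instance_seed),
         "--solver_seed", str(solver_seed)],
        capture_output=True, text=True, check=True)
    return float(out.stdout.splitlines()[-1])

# Fix the instance seed and vary the solver seed (and vice versa) to
# separate the two sources of variability.
costs = [run(instance_seed=1, solver_seed=seed) for seed in range(10)]
print(statistics.mean(costs), statistics.stdev(costs))
```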


N-Wouda commented Oct 28, 2022

For rollout, we have the following parameters:

| Parameter | Default | Meaning |
|---|---|---|
| rollout_tlim_factor | 0.7 | Fraction of epoch time to spend on simulations; the rest goes to solving the dynamic instance. |
| n_cycles | 1 | Number of simulation cycles to perform |
| n_simulations | 50 | Number of simulations in each cycle |
| n_lookahead | 1 | Number of epochs to sample ahead in each simulation |
| n_requests | 100 | Number of requests per sampled epoch |
| dispatch_threshold | 0.35 | We dispatch if the average number of times a request is dispatched (across the simulations) is above this threshold. |
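A minimal sketch of the dispatch_threshold rule as described above, with hypothetical names (not the actual implementation):

```python
def dispatch_decisions(dispatch_counts: dict, n_simulations: int,
                       threshold: float = 0.35) -> dict:
    """A request is dispatched in the current epoch when the fraction of
    simulations that dispatched it exceeds the threshold."""
    return {req: count / n_simulations > threshold
            for req, count in dispatch_counts.items()}

# "r1" was dispatched in 40 of 50 simulations (0.8 > 0.35): dispatch now.
# "r2" was dispatched in 10 of 50 (0.2 <= 0.35): postpone.
print(dispatch_decisions({"r1": 40, "r2": 10}, n_simulations=50))
```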

I propose the following ranges:

| Parameter | Default | Min | Max |
|---|---|---|---|
| rollout_tlim_factor | 0.5 | 0.6 | 0.9 |
| n_cycles | 1 | 1 | 3 |
| n_simulations | 50 | 25 | 100 |
| n_lookahead | 1 | 1 | 5 |
| n_requests | 100 | 50 | 100 |
| dispatch_threshold | 0.35 | 0 | 1.0 |

But I do not have a lot of intuition for this. @leonlan @jaspervd96 what do you think?


leonlan commented Oct 28, 2022 via email


N-Wouda commented Oct 29, 2022

I'm running 200 scenarios with Leon's suggested parameters (see #152 for details), and five solver seeds each. So 1,000 experiments, each lasting up to 8h. The first 200 of those are underway, and I hope the rest completes by tomorrow.

jmhvandoorn commented:

Perfect. Let me know if there's anything left to be done (or run).


leonlan commented Oct 29, 2022

I ran main on final time limits for nbIter 2K-10K (1K step size), and 12.5K-20K (2.5K step size), each with 10 seeds. This is the result:

[figure: average objective for each nbIter value, 2K-20K]

Seems like 5K, 8K and 12.5K improve the baseline (10K) the most. I'm now running these values on 30 seeds and will pick the best one.


N-Wouda commented Oct 29, 2022

The first batch of 500 experiments has nearly finished. I just started the second batch, which should hopefully complete overnight or early tomorrow morning. Then we should have a final dynamic config later tomorrow.

[figure: average objective for each nbIter value]

@leonlan so as long as we pick something in (5K, 10K), we're more or less set, with perhaps good values being either 5K or 8K? I had expected a slightly smoother figure, and am now unsure what to make of this exactly.


leonlan commented Oct 30, 2022

I was surprised by that as well. Turns out my experiments were not using 10 different seeds but just a single fixed value. 🤦 I'll rerun the experiment for a subset of the values (5k-10k) but with correct seeds.

(My 30-seed experiment failed anyhow because I ran out of budget on my GPU account.)


N-Wouda commented Oct 30, 2022

#152 now contains the new dynamic parameters. I'm running a few more evaluations (including the baseline) to make sure they're, in fact, better than what we had. Expect those results in 2-4 hours.


leonlan commented Oct 30, 2022

These are the correct results for nbIter:
[figure: average objective for each nbIter value, 5K-10K]

The differences are minimal when comparing 8K to 10K (-5 pts). As with the last figure, I would have expected this one to be a bit smoother (the 9K result stands out). I don't think it's worth changing the nbIter value.


N-Wouda commented Oct 30, 2022

> Expect those results in 2-4 hours.

| | Baseline | Tuned |
|---|---|---|
| Quali | 370991.3 | 369459.2 |
| Final | 369536.8 | 368541.7 |

So an improvement of 1.5K with the qualification time limit, and around 1K using the final time limit.
