Algorithm tuning #33
We should probably have different configurations for different instance sizes. A natural split seems to be in three parts, like how the quali/final run-times are split.
We also might have to consider separate parameter tuning for the dynamic instances. The dynamic dispatch instances are very different from those of the static problem because of the relatively tight time windows and small instance sizes. Based on greedy/hindsight/rollout, a rough estimate is that 40% of the instances have between 100-150 customers, 40% between 50-100 customers, 10% fewer than 50 customers, and 10% 150-200 customers.
Static
We should have different configurations for each of the three instance sizes (<300, 300-500, >500 clients), since they get different time limits and are in general very different in size and/or shape. We currently have these parameters (22 in total):
Additionally, if we have to tune the node, route, and crossover operators (another 10-15 or so) as well, we have maybe 40 parameters in total. That's quite a lot. Can we shorten this list somehow, or do you know of a tuning algorithm that can deal with this? I'm not sure SMAC can.
Dynamic
This probably also needs its own static configuration, particularly for the simulations (since those need to be fast). The configuration for the epoch optimisation after rollout can probably be shared with that for the small instances. Rollout needs a static parameter configuration (for the simulations), and the following additional parameters:
Each parameter lies in some [min, max] range. One simple way to get a feel for this would be to run two benchmarks for each parameter value: one where we increase it from the default to the max value, and one where we decrease it to the min value. That's doable, and should quickly give us a feel for which parameters really matter, and which do not seem to be as relevant. Then we can take the subset of parameters that really matter and tune those together.
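As a rough sketch of that one-at-a-time idea (the parameter names, defaults, and ranges below are just placeholders, not our actual configuration):

```python
# Rough sketch of the one-at-a-time benchmark idea described above.
# Parameter names, defaults, and ranges are placeholders, not our real config.
DEFAULTS = {"nbGranular": 40, "destroyPct": 15, "minPopSize": 25}
RANGES = {"nbGranular": (10, 100), "destroyPct": (5, 50), "minPopSize": (10, 50)}


def one_at_a_time(defaults, ranges):
    """Yield configs equal to the default, except one parameter at its min or max."""
    for name, (lo, hi) in ranges.items():
        for value in (lo, hi):
            config = dict(defaults)
            config[name] = value
            yield name, value, config


for name, value, config in one_at_a_time(DEFAULTS, RANGES):
    # submit a benchmark run for `config` here; two runs per parameter in total
    print(f"benchmark: {name}={value}")
```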
@leonlan @LuukPentinga @jaspervd96 @rijalarpan @MarjoleinAerts @sbhulai: what do you think?
It is a mighty load of parameters to tune. I think the min and max idea may not work fully, because in my experience some of them have non-linear behaviour. I would consider a set of values for each parameter and test in a full factorial way. Then, as you suggested, go deeper into a few of them. I do not think parameter optimization packages like the ones available for machine learning will work here.
I agree with @rijalarpan; these parameters are usually non-linear in behavior (e.g., destroyPct), so we need more than min-max. A factorial design should be manageable and I can also help to tune part of the parameters. Some other thoughts
Also I filled in the remaining
We have 40+ parameters. How is even the simplest factorial design (roughly 2^40 runs) manageable?
It's not 😅 My understanding of the (full) factorial design was flawed. Let me rephrase: a full factorial design over everything is not feasible, but it should be manageable to perform a factorial design for smaller, logical parameter groups.
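For instance, a full factorial over one small group stays cheap; a sketch with made-up group members and levels:

```python
# Sketch of a full factorial design within one small, logical parameter group.
# The group members and their candidate levels are made up for illustration.
from itertools import product

group = {
    "minPopSize": [15, 25, 50],
    "generationSize": [20, 40, 80],
    "nbElite": [2, 4, 8],
}

configs = [dict(zip(group, values)) for values in product(*group.values())]
print(len(configs))  # 3**3 = 27 runs for this group, instead of 2**40 overall
```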
Agree with the comments to check a couple of values for the parameters and then focus on the ones with the most impact. Min and max might not give good indications, as also suggested in the comments above.
Could a procedure comparable to variable neighbourhood descent (VND) be pragmatic here? It would follow a similar procedure:
Then you can set up a full factorial on the limited set of parameters that proved of greatest value in the above process, and focus on those.
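One possible reading of that procedure as code, with `evaluate` standing in for a benchmark run that returns average cost (all names here are placeholders):

```python
# One possible reading of the VND-like procedure: sweep the parameters one at a
# time, keep any improving value, and repeat until a full pass yields no
# improvement. `evaluate` is a placeholder for a benchmark run returning cost.
def coordinate_tune(defaults, candidates, evaluate):
    best = dict(defaults)
    best_cost = evaluate(best)
    improved = True
    while improved:
        improved = False
        for name, values in candidates.items():
            for value in values:
                trial = {**best, name: value}
                cost = evaluate(trial)
                if cost < best_cost:
                    best, best_cost = trial, cost
                    improved = True
    return best, best_cost
```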
Here are our current parameters and operators organized per category. I like the suggestion by @rijalarpan to do a VND kind of approach to optimize this, but I think it's best done per parameter group instead of per single parameter. Those parameter group sizes are not super big, so within each (sub)group you could do a full-factorial approach.
- Penalty management
- Population management
- Restart mechanism
- Crossover
- Local search
  - Node ops
  - Granular neighborhoods
  - Intensification and route ops
- Post-process
Added three more node ops that we keep forgetting about :-).
I like the idea of doing things by logical group. If we want to do a factorial design we need to come up with levels for the non-binary parameters (e.g. what's 'low' or 'high' for nbGranular?). We might also just dump every group into, say, SMAC or ParamILS and have that thing tune for us. The number of parameters we have in each group is manageable for these algorithms, so I suspect they can get us a reasonably good configuration quite quickly. If the resulting configuration from tuning each group in isolation is not already effective, we can use the tuning trajectories to identify which parameters should be tuned together in a follow-up step.
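If we don't want to commit to a tuner right away, even a plain random search per group gives us a baseline to compare against. A zero-dependency sketch (names and ranges are placeholders, not our real parameters):

```python
# Not SMAC or ParamILS themselves, but a simple stand-in for tuning one group:
# uniform random search over that group's space. Names/ranges are placeholders.
import random

rng = random.Random(42)
SPACE = {"repairProbability": (0, 100), "nbPenaltyManagement": (25, 500)}


def sample_config(space):
    return {name: rng.randint(lo, hi) for name, (lo, hi) in space.items()}


candidates = [sample_config(SPACE) for _ in range(50)]  # benchmark these, keep the best
```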
@rijalarpan @leonlan [and anyone else that's interested] shall we have a short meeting tomorrow at, say, 15h to discuss this specific subject further? I can do anytime tomorrow, except 14-15h.
I'm available all day before 15:30. So 15:00-15:30 I could meet.
I am away for holidays but I can find time if between 09:00-10:00.
Tomorrow at 9 works for me. I'll send a Google meet in a bit.
Here's the link: https://meet.google.com/ekp-weba-xci. See you tomorrow!
What we discussed just now:
After this Friday (14 October) I would like to freeze any new static changes, and focus solely on tuning the static solver. So any planned or open PRs that impact the static solver should ideally be merged before next weekend!
None of the population parameter sets appear to be better than our current default: the one promising set I found gets an average objective +15 above our current settings. So we should probably keep the defaults for population management. I'll set up the LS parameters now, and run that. That's the last set of static parameters to tune!
The LS stuff is in now too, and there's potential here for another thirty-ish points of improvement. There are a few candidate parameter settings that I'll benchmark later today, and then I'll pick the best settings based on what comes out of that.
@leonlan @jaspervd96 before tuning the dynamic part: is there anything still in the works there that I have to wait for?
Still trying all kinds of variations I can think of, but (from my side at least) still nothing that clearly beats the original. In the "worst case", we might find something better at the last minute, using the same tuning as the original instead of something specifically tuned for it.
Small addition: we should decide, though, whether we can/want to tune certain parameters separately per epoch. E.g. not one fixed threshold, but a separate threshold for each epoch.
I talked to @leonlan about this, and we figured we might try a different threshold for epochs 0, 1, and >1. Or just 0 and >0, because the first epoch is a bit special.
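A minimal sketch of what that could look like, assuming we only distinguish epoch 0 from later epochs (the values are illustrative, not tuned):

```python
# Minimal sketch of an epoch-dependent dispatch threshold, assuming we only
# distinguish epoch 0 from later epochs. The values are illustrative, not tuned.
def dispatch_threshold(epoch: int) -> float:
    return 0.5 if epoch == 0 else 0.35
```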
Raw configurations + results of the static tuning runs are available here.
The best configuration is this one:
with an average cost of 164228, and 48234 iterations.
This means that we're more or less trying all routes in the intensification phase. Based on that, I'm also trying something where we get rid of the whole circle sector overlap thing and just try all routes.
I think it may be worthwhile to tune
I don't have any major changes, but I'm undecided whether we should use
This gets 164224 average cost with 48905 iterations, so more or less the same outcome. It's a lot simpler, however, so I'll open a PR removing the circle sector stuff.
For tuning dynamic: a full dynamic run on eight cores takes about 3h20min, so 3h40min should be plenty on the cluster. A dynamic instance is derived from a static instance. This process uses some randomness, so there's a seed controlling this (
On the qualification and finals, we will be evaluated with different solver seeds (the instances - and their seeds - are fixed for us). So it's good that the variability in that is quite a bit smaller.
For rollout, we have the following parameters:

| Parameter | Default | Meaning |
|---|---|---|
| rollout_tlim_factor | 0.7 | Fraction of epoch time to spend on simulations; the rest goes to solving the dynamic instance. |
| n_cycles | 1 | Number of simulation cycles to perform. |
| n_simulations | 50 | Number of simulations in each cycle. |
| n_lookahead | 1 | Number of epochs to sample ahead in each simulation. |
| n_requests | 100 | Number of requests per sampled epoch. |
| dispatch_threshold | 0.35 | We dispatch if the average number of times a request is dispatched (across the simulations) is above this threshold. |

I propose the following ranges:

| Parameter | Default | Min | Max |
|---|---|---|---|
| rollout_tlim_factor | 0.7 | 0.6 | 1.0 |
| n_cycles | 1 | 1 | 3 |
| n_simulations | 50 | 25 | 100 |
| n_lookahead | 1 | 1 | 5 |
| n_requests | 100 | 50 | 100 |
| dispatch_threshold | 0.35 | 0 | 1.0 |

But I do not have a lot of intuition for this. @leonlan @jaspervd96 what do you think?
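For clarity, this is how I read the dispatch_threshold rule from the table above, assuming each simulation yields a boolean dispatch decision per request:

```python
# Sketch of how dispatch_threshold is applied, per the table above, assuming
# each simulation yields a boolean "dispatch now?" decision for every request.
def should_dispatch(sim_decisions, dispatch_threshold=0.35):
    """Dispatch if the fraction of simulations dispatching this request exceeds the threshold."""
    return sum(sim_decisions) / len(sim_decisions) > dispatch_threshold
```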
Most parameter ranges look fine. The only change I suggest is to make dispatch_threshold depend on n_lookahead. All other parameters can be sampled independently.
n_lookahead=1 should take dispatch threshold (20, 35, 50) as (min, default, max)
For 2, (10, 25, 40)
For 3, (5, 15, 30)
For 4 and 5 I don’t know, but something like (5, 10, 20) should work. Maybe even skip 4, I don’t think it’s worth it to try both 4 and 5 because they don’t differ a lot from each other.
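Summarising those conditional ranges (assuming the listed values are percentages of the 0-1 dispatch_threshold, since 35 -> 0.35 matches the current default for n_lookahead=1):

```python
# The conditional ranges suggested above, assuming the listed values are
# percentages of the 0-1 dispatch_threshold. Entries for 4 and 5 are tentative.
THRESHOLD_RANGE = {  # n_lookahead: (min, default, max)
    1: (0.20, 0.35, 0.50),
    2: (0.10, 0.25, 0.40),
    3: (0.05, 0.15, 0.30),
    4: (0.05, 0.10, 0.20),  # possibly skip 4 and only try 5
    5: (0.05, 0.10, 0.20),
}
```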
I'm running 200 scenarios with Leon's suggested parameters (see #152 for details), and five solver seeds each. So 1,000 experiments, each lasting up to 8h. The first 200 of those are underway, and I hope the rest completes by tomorrow.
Perfect. Let me know if there would be anything left to be done (or run).
The first batch of 500 experiments has nearly finished. I just started the second batch, which should hopefully complete overnight/early tomorrow morning. Then we should have a final dynamic config later tomorrow. @leonlan so as long as we pick something in (5K, 10K), we're more or less set, with perhaps good values being either 5K or 8K? I had expected a slightly smoother figure, and am now unsure what to make of this exactly.
I was surprised by that as well. Turns out my experiments were not using 10 different seeds but just a single fixed value. 🤦 I'll rerun the experiment for a subset of the values (5k-10k) but with correct seeds. (My 30-seed experiment failed anyhow because I ran out of budget on my GPU account.)
#152 now contains the new dynamic parameters. I'm running a few more evaluations (including the baseline) to make sure they're, in fact, better than what we had. Expect those results in 2-4 hours.
So an improvement of 1.5K with the qualification time limit, and around 1K using the final time limit.
There are lots of parameters to the HGS algorithm, not many of which seem to be tuned particularly well. At some point, we should run a tuning tool (e.g., SMAC) to determine a good set of parameters.
Parameters, in this sense, also include "which operators to use" (see also e.g. #32).