
Impact of simulation-solution quality on rollout performance #100

Closed · leonlan opened this issue on Sep 29, 2022 · 17 comments


leonlan commented Sep 29, 2022

To investigate

  1. Run $N$ simulations with very good solution quality and $N$ simulations with not-so-good solution quality, and see what happens. (To try this out, we can simply set the environment time limit very high.) A rough sketch of such a comparison follows after this list.
    • 25 iterations gets us to roughly a 10% gap; 20s/2000 iterations gets us to roughly a 2% gap (for 400-500 customer instances).
  2. (WIP) Increase the lookaheads to 8 (which is the maximum). Use 120s epoch time limits, because otherwise the number of simulations may be too small. Other settings are the same.
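For the first point, such a comparison could be set up roughly as in the sketch below. This is only a sketch: `solve_simulation`, `sample_scenario`, and the exact iteration budgets are illustrative assumptions, not the repository's actual helpers.

```python
# Illustrative sketch only: solve_simulation and sample_scenario stand in for the
# project's actual simulation helpers; names, signatures, and budgets are assumptions.
def compare_simulation_quality(epoch_state, n_sims=100, cheap_iters=25, good_iters=2000):
    """Solve n_sims sampled scenarios twice: once with a cheap iteration budget
    (roughly 10% gap) and once with a large budget (roughly 2% gap), and return
    both sets of simulation solutions for comparison."""
    cheap_solutions, good_solutions = [], []
    for _ in range(n_sims):
        scenario = sample_scenario(epoch_state)  # sampled future requests for the lookahead
        cheap_solutions.append(solve_simulation(scenario, cheap_iters))
        good_solutions.append(solve_simulation(scenario, good_iters))
    return cheap_solutions, good_solutions
```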

Observations

  • Running more simulations did not help (60s epochs vs 120s epochs). This seems to suggest we already run enough simulations, which is surprising since it's only 50-150 per epoch.

leonlan commented Sep 30, 2022

The table below shows the average results over 10 seeds on the [a-d]*-id instances for greedy, rollout, and 'rollout-long', with a 60-second epoch time limit. See this commit if you want to see the adjustments in detail.

Rollout-long is rollout where the simulation instances are solved using 500 HGS iterations and a slightly adjusted config. The number of simulations is kept similar between rollout and rollout-long, so that the only difference is the quality of the simulation solutions.

There is not really an epoch time limit for rollout-long. The only restriction is that the dispatch instance is solved within half the epoch time, just like rollout.
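Roughly, a rollout-long epoch then looks like the sketch below. Only the budget split follows the description above; the helper names (`sample_scenario`, `solve_simulation`, `decide_dispatch`, `solve_dispatch`, `n_simulations`) are illustrative assumptions, not the actual code.

```python
# Sketch of the per-epoch budget split described above; helper names are illustrative.
def rollout_long_epoch(epoch_state, epoch_time_limit=60):
    dispatch_budget = epoch_time_limit / 2  # dispatch instance gets half the epoch, like rollout

    simulation_solutions = []
    # Simulations are not bound by the epoch time limit: each simulation instance
    # is solved "well" (e.g. ~500 HGS iterations) instead of as fast as possible.
    for _ in range(n_simulations(epoch_state)):
        scenario = sample_scenario(epoch_state)
        simulation_solutions.append(solve_simulation(scenario, 500))

    to_dispatch = decide_dispatch(simulation_solutions)
    return solve_dispatch(to_dispatch, time_limit=dispatch_budget)
```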

[image: results table for greedy, rollout, and rollout-long]

Having better-quality simulation solutions gives a ~400 improvement. This is not a lot, given that the solutions from rollout-long are quite a bit better than those from rollout. I don't have hard numbers on this, but the "long" solutions are on average 5% better in the first few epochs. In the later epochs this goes down to 0.1-0.5%, because the simulation instances become smaller.


N-Wouda commented Sep 30, 2022

@leonlan what am I looking at in the above results? I guess dynamic performance, but how did you increase the epoch time limit to allow for more iterations?


leonlan commented Sep 30, 2022

@N-Wouda I updated my earlier comment! There's some merit in improving the solution quality, but the current experiment shows that the gain is only very small. I'll try a few more experiments in the coming days.


leonlan commented Oct 1, 2022

I ran a similar experiment to the one mentioned earlier: 10 seeds, [a-d]* instances. But this time I used N_LOOKAHEAD=8 and 120s epoch time limits.

[image: results table with N_LOOKAHEAD=8 and 120s epoch time limits]

Just like before, rollout-long improves on rollout, but only by a little (500-ish).

The results are very similar to the previous experiment, even though we have 120s epoch time limits and more lookaheads. As mentioned before, more simulations generally don't lead to better rollout performance (i.e., we already have enough simulations). Moreover, more than 3 lookahead periods don't produce many more requests, and those extra requests are hard to combine (tight time windows and release dates).


leonlan commented Oct 1, 2022

In summary:

  • Solving the simulation instances much better only yields a small improvement in reward. In the numerical experiments I ran, this was about 400-500 points on average, using 4x more computation time. This seems to indicate that "good enough" solutions are sufficient, even though they may be 10-15% away from the optimal solution. The general conclusion is: improve the simulation solutions where possible, but don't overdo it. Here are some low-effort ideas to improve simulation-solution quality (a rough sketch of the first one follows after this list):
    • Initialize solutions using the postponed routes from the previous epoch.
    • Improve release-time-based checks in the static solver. Most things, like neighborhood computation, greedy repair, etc., do not take release times into account yet. This is an easy win.
    • Re-use the previous simulation solution as a starting solution.
  • Increasing N_LOOKAHEAD doesn't seem to help a lot, so this is mostly a parameter tuning question.
  • Increasing the number of simulations doesn't seem to help a lot either. This means that we already simulate enough with the current configuration.
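A minimal sketch of the first idea, warm-starting the simulation solver with last epoch's postponed routes. `hgs_solve` and its `initial_solutions` argument are assumed names for illustration, not the solver's real API:

```python
# Sketch only: hgs_solve and its initial_solutions parameter are assumed names,
# not the actual HGS bindings used in this repository.
def solve_simulation_warm(sim_instance, postponed_routes, iterations=500):
    """Warm-start the simulation solver with the routes postponed in the
    previous epoch, keeping only clients that still exist in this instance."""
    clients = set(sim_instance["client_ids"])
    valid_routes = [route for route in postponed_routes
                    if all(client in clients for client in route)]
    return hgs_solve(sim_instance,
                     iterations=iterations,
                     initial_solutions=[valid_routes] if valid_routes else None)
```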

An important note is that all these observations have to be interpreted within the context of the threshold dispatching criterion. If we use some other dispatching criterion based on costs, the picture may change completely. See #101 for more on that.

I'll leave this issue open for a while to discuss these experiments. Also let me know if you have any suggestions for numerical experiments to try out.


N-Wouda commented Oct 1, 2022

@leonlan do you have any idea/intuition for what happens when we change the threshold?

Edit: ah, that's issue #101 :-)


N-Wouda commented Oct 1, 2022

Re. tuning. It's easiest to tune if we pass the settings in via the function arguments, rather than the constants file we have now. For this I suggest we refactor fairly substantially, removing run_dispatch and the other dynamic strategies (unless anyone wants to continue with those, but since there's less than a month left in the competition, I suspect not). Then we can have a single rollout/ module implementing just rollout, and taking its arguments directly from the solver script.

I'll see if I can write out a tuning script with SMAC next week, both for static and dynamic/rollout.
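To make the refactor concrete, the entry point could look something like the sketch below; the parameter names and defaults are placeholders to illustrate the idea, not a final interface. A tuner such as SMAC would then search directly over these keyword arguments.

```python
# Sketch of a single rollout entry point whose tunables are plain keyword arguments
# instead of module-level constants. Names and defaults are illustrative only.
def rollout(env, solver_seed, *,
            n_lookahead=3,
            sim_iterations=500,
            dispatch_threshold=0.15):
    """Run the rollout dispatching strategy with explicit, tunable parameters."""
    ...
```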


leonlan commented Oct 1, 2022

> Re. tuning. It's easiest to tune if we pass the settings in via the function arguments, rather than the constants file we have now. For this I suggest we refactor fairly substantially, removing run_dispatch and the other dynamic strategies (unless anyone wants to continue with those, but since there's less than a month left in the competition, I suspect not). Then we can have a single rollout/ module implementing just rollout, and taking its arguments directly from the solver script.
>
> I'll see if I can write out a tuning script with SMAC next week, both for static and dynamic/rollout.

Sure. We should keep greedy as a baseline reference, but we can just make a module for that as well.


leonlan commented Oct 1, 2022

> @leonlan do you have any idea/intuition for what happens when we change the threshold?
>
> Edit: ah, that's issue #101 :-)

I changed DISPATCH_THRESHOLD from 0.15 to 0.20. Other factors remained the same as in the last experiment.
[image: results table for DISPATCH_THRESHOLD = 0.20]

The differences between rollout and rollout-long are now much larger: there's a difference of about 1500, up from 500 before. Interesting! What's also interesting: rollout with the 0.15 threshold vs. rollout with the 0.20 threshold is still about the same.
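For readers unfamiliar with the criterion: one way such a threshold rule can be set up is sketched below. This is an illustration only; the exact rule in the code (e.g. the direction of the comparison and what counts as "dispatched") may differ.

```python
# Illustration of a threshold-based dispatching rule; this is not necessarily the
# exact rule implemented in this repository.
def threshold_dispatch(simulated_dispatch_sets, request_ids, threshold=0.15):
    """Dispatch a request if it was dispatched in at least a `threshold` fraction
    of the simulation solutions; postpone it otherwise."""
    n_sims = len(simulated_dispatch_sets)
    dispatch = set()
    for request in request_ids:
        frac = sum(request in dispatched for dispatched in simulated_dispatch_sets) / n_sims
        if frac >= threshold:
            dispatch.add(request)
    return dispatch
```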


leonlan commented Oct 1, 2022

I experimented with a few more threshold values. The setup is:

  • instances [a-d]* (enough instances to be representative of benchmark results)
  • 10 seeds
  • greedy, rollout, and rollout long (better simulation solutions). Note that the number of simulations in rollout long is roughly equal to the number of simulations in rollout.

The thresholds used for rollout are 15% (original), 20%, 30% and 40%. There is one slightly confusing thing due to a mistake of mine:

  • 15 and 20% were run with 120s epoch time limits
  • 30 and 40% were run with 60s epoch time limits

AFAICT the time limit does not play a huge role, because the only significant difference is how many simulations you get, and a 60s epoch time limit often already gives enough simulations.

Here is a summarized table of results:
[image: summarized results table for thresholds 15%, 20%, 30%, and 40%]

The best scenario, rollout_long30, now improves over greedy by 20K! Insane. That means that there is definitely merit in trying to obtain better solutions.

Crazy stuff! I'm stoked 😄

jmhvandoorn commented:

Very nice!! Could you share the exact configuration you are using for rollout_long30?


leonlan commented Oct 1, 2022

> Very nice!! Could you share the exact configuration you are using for rollout_long30?

Here you can find it:

```python
sim_sol_long, cost_long, is_feas_long = solve_simulation(
    sim_inst,
    500,  # 500 iterations is often around 5% from the BKS.
    initialTimeWarpPenalty=50,
    nbPenaltyManagement=20,
    generationSize=30,
    minPopSize=20,
)
```

Note that it still uses an older version of the rollout code (but with the same performance). The rollout-long solutions are 2-8% better than rollout's in the earlier epochs; in the later epochs this falls off because the instances become smaller. Actually, I even fall back on regular rollout if rollout-long does not improve the solutions. But these are all (insignificant) details, I hope 😅.


jmhvandoorn commented Oct 1, 2022

> Note that the number of simulations in rollout long is roughly equal to the number of simulations in rollout.

How is that possible? The time to run a single simulation is 20x as long, right? Or what am I missing?


leonlan commented Oct 1, 2022

> Note that the number of simulations in rollout long is roughly equal to the number of simulations in rollout.
>
> How is that possible? The time to run a single simulation is 20x as long, right? Or what am I missing?

Sorry, that's a bit confusing indeed. From a single "normal" rollout simulation I extrapolate how long each simulation takes, and then calculate how many simulations would have fit in the time budget. That is precisely the number of "long" simulations that rollout-long is allowed to do.
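In other words, something like the back-of-the-envelope calculation below. The names (`solve_simulation`, `sample_scenario`, `epoch_state`, `sim_time_budget`) are illustrative; the real code may measure this differently.

```python
import time

# Sketch of how the number of "long" simulations is matched to regular rollout.
start = time.perf_counter()
solve_simulation(sample_scenario(epoch_state))   # one "normal" (cheap) simulation
t_single = time.perf_counter() - start

n_normal_sims = int(sim_time_budget / t_single)  # cheap simulations that would have fit
n_long_sims = n_normal_sims                      # rollout-long gets the same count,
                                                 # so only solution quality differs
```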

jmhvandoorn commented:

> Sorry, that's a bit confusing indeed. From a single "normal" rollout simulation I extrapolate how long each simulation takes, and then calculate how many simulations would have fit in the time budget. That is precisely the number of "long" simulations that rollout-long is allowed to do.

So do I understand correctly that this means "rollout long" is relevant as a test, but not possible within 60s per epoch?


leonlan commented Oct 1, 2022

> Sorry, that's a bit confusing indeed. From a single "normal" rollout simulation I extrapolate how long each simulation takes, and then calculate how many simulations would have fit in the time budget. That is precisely the number of "long" simulations that rollout-long is allowed to do.
>
> So do I understand correctly that this means "rollout long" is relevant as a test, but not possible within 60s per epoch?

That's correct. These experiments show that, if we can get "rollout-long quality" solutions within 60s, then we can improve our dynamic performance even further. At the moment this is not possible, but honestly, it should not be a very big challenge to achieve (see my comment above with three suggestions on how to improve solution quality). I also think that tuning HGS will work, but I have a poor understanding of how to tune these parameters.


leonlan commented Oct 10, 2022

Unfortunately, the results for rollout_long30 were incomplete. A single instance set was missing, which makes the comparison faulty, since the other methods did have results for all instance sets. The sole reason rollout_long30 appeared better was the missing data.

This means that my conclusion that improving the simulation solution quality could lead to drastic improvements was incorrect :-( There are small gains (1-2K), which have already been merged in #132.

I'll close this issue, but feel free to reopen it when relevant.

leonlan closed this as completed on Oct 10, 2022.