
Impact of simulation-solution quality on rollout performance #100

Closed · leonlan opened this issue on Sep 29, 2022 · 17 comments


leonlan commented Sep 29, 2022

To investigate

  1. Run $N$ simulations with very good solution quality and $N$ simulations with not-so-good solution quality, and see what happens. (To try this out, we can simply set the environment time limit very high.) A rough sketch of such a comparison follows after this list.
    • 25 iterations gets us to roughly a 10% gap; 20s/2000 iterations gets us to roughly a 2% gap (for 400-500 customer instances).
  2. (WIP) Increase the lookaheads to 8 (which is the maximum). Use 120s epoch time limits, because otherwise the number of simulations may be too small. Other settings are the same.
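For the first point, such a comparison could be set up roughly as in the sketch below. This is only a sketch: `solve_simulation`, `sample_scenario`, and the exact iteration budgets are illustrative assumptions, not the repository's actual helpers.

```python
# Illustrative sketch only: solve_simulation and sample_scenario stand in for the
# project's actual simulation helpers; names, signatures, and budgets are assumptions.
def compare_simulation_quality(epoch_state, n_sims=100, cheap_iters=25, good_iters=2000):
    """Solve n_sims sampled scenarios twice: once with a cheap iteration budget
    (roughly 10% gap) and once with a large budget (roughly 2% gap), and return
    both sets of simulation solutions for comparison."""
    cheap_solutions, good_solutions = [], []
    for _ in range(n_sims):
        scenario = sample_scenario(epoch_state)  # sampled future requests for the lookahead
        cheap_solutions.append(solve_simulation(scenario, cheap_iters))
        good_solutions.append(solve_simulation(scenario, good_iters))
    return cheap_solutions, good_solutions
```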

Observations

  • Running more simulations did not help (60s epochs vs 120s epochs). This seems to suggest we already run enough simulations, which is surprising since it's only 50-150 per epoch.

leonlan commented Sep 30, 2022

The table below shows the average results over 10 seeds on the [a-d]*-id instances for greedy, rollout, and 'rollout-long', with a 60-second epoch time limit. See this commit if you want to see the adjustments in detail.

Rollout-long is rollout where the simulation instances are solved using 500 HGS iterations and a slightly adjusted config. The number of simulations is kept similar between rollout and rollout-long, so that the only difference is the quality of the simulation solutions.

There is not really an epoch time limit for rollout-long. The only restriction is that the dispatch instance is solved within half the epoch time, just like rollout.
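Roughly, a rollout-long epoch then looks like the sketch below. Only the budget split follows the description above; the helper names (`sample_scenario`, `solve_simulation`, `decide_dispatch`, `solve_dispatch`, `n_simulations`) are illustrative assumptions, not the actual code.

```python
# Sketch of the per-epoch budget split described above; helper names are illustrative.
def rollout_long_epoch(epoch_state, epoch_time_limit=60):
    dispatch_budget = epoch_time_limit / 2  # dispatch instance gets half the epoch, like rollout

    simulation_solutions = []
    # Simulations are not bound by the epoch time limit: each simulation instance
    # is solved "well" (e.g. ~500 HGS iterations) instead of as fast as possible.
    for _ in range(n_simulations(epoch_state)):
        scenario = sample_scenario(epoch_state)
        simulation_solutions.append(solve_simulation(scenario, 500))

    to_dispatch = decide_dispatch(simulation_solutions)
    return solve_dispatch(to_dispatch, time_limit=dispatch_budget)
```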

[image: results table for greedy, rollout, and rollout-long]

Having better-quality simulation solutions gives a ~400 improvement. This is not a lot, given that the solutions from rollout-long are quite a bit better than those from rollout. I don't have hard numbers on this, but the "long" solutions are on average 5% better in the first few epochs. In the later epochs this goes down to 0.1-0.5%, because the simulation instances become smaller.


N-Wouda commented Sep 30, 2022

@leonlan what am I looking at in the above results? I guess dynamic performance, but how did you increase the epoch time limit to allow for more iterations?


leonlan commented Sep 30, 2022

@N-Wouda I updated my earlier comment! There's some merit in improving the solution quality, but the current experiment shows that the gain is only very small. I'll try a few more experiments in the coming days.


leonlan commented Oct 1, 2022

I ran a similar experiment to the one mentioned earlier: 10 seeds, [a-d]* instances. But this time I used N_LOOKAHEAD=8 and 120s epoch time limits.

[image: results table with N_LOOKAHEAD=8 and 120s epoch time limits]

Just like before, rollout-long improves on rollout, but only by a little (500-ish).

The results are very similar to the previous experiment, even though we have 120s epoch time limits and more lookaheads. As mentioned before, more simulations generally don't lead to better rollout performance (i.e., we already have enough simulations). Moreover, more than 3 lookahead periods don't produce many more requests, and those extra requests are hard to combine (tight time windows and release dates).


leonlan commented Oct 1, 2022

In summary:

  • Solving the simulation instances much better only yields a small improvement in reward. In the numerical experiments I ran, this was about 400-500 points on average, using 4x more computation time. This seems to indicate that "good enough" solutions are sufficient, even though they may be 10-15% away from the optimal solution. The general conclusion is: improve the simulation solutions where possible, but don't overdo it. Here are some low-effort ideas to improve simulation-solution quality (a rough sketch of the first one follows after this list):
    • Initialize solutions using the postponed routes from the previous epoch.
    • Improve release-time-based checks in the static solver. Most things, like neighborhood computation, greedy repair, etc., do not take release times into account yet. This is an easy win.
    • Re-use the previous simulation solution as a starting solution.
  • Increasing N_LOOKAHEAD doesn't seem to help a lot, so this is mostly a parameter tuning question.
  • Increasing the number of simulations doesn't seem to help a lot either. This means that we already simulate enough with the current configuration.
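A minimal sketch of the first idea, warm-starting the simulation solver with last epoch's postponed routes. `hgs_solve` and its `initial_solutions` argument are assumed names for illustration, not the solver's real API:

```python
# Sketch only: hgs_solve and its initial_solutions parameter are assumed names,
# not the actual HGS bindings used in this repository.
def solve_simulation_warm(sim_instance, postponed_routes, iterations=500):
    """Warm-start the simulation solver with the routes postponed in the
    previous epoch, keeping only clients that still exist in this instance."""
    clients = set(sim_instance["client_ids"])
    valid_routes = [route for route in postponed_routes
                    if all(client in clients for client in route)]
    return hgs_solve(sim_instance,
                     iterations=iterations,
                     initial_solutions=[valid_routes] if valid_routes else None)
```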

An important note is that all these observations have to be interpreted within the context of the threshold dispatching criterion. If we use some other dispatching criterion based on costs, the picture may change completely. See #101 for more on that.

I'll leave this issue open for a while to discuss these experiments. Also let me know if you have any suggestions for numerical experiments to try out.


N-Wouda commented Oct 1, 2022

@leonlan do you have any idea/intuition for what happens when we change the threshold?

Edit: ah, that's issue #101 :-)


N-Wouda commented Oct 1, 2022

Re. tuning. It's easiest to tune if we pass the settings in via the function arguments, rather than the constants file we have now. For this I suggest we refactor fairly substantially, removing run_dispatch and the other dynamic strategies (unless anyone wants to continue with those, but since there's less than a month left in the competition, I suspect not). Then we can have a single rollout/ module implementing just rollout, and taking its arguments directly from the solver script.

I'll see if I can write out a tuning script with SMAC next week, both for static and dynamic/rollout.
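To make the refactor concrete, the entry point could look something like the sketch below; the parameter names and defaults are placeholders to illustrate the idea, not a final interface. A tuner such as SMAC would then search directly over these keyword arguments.

```python
# Sketch of a single rollout entry point whose tunables are plain keyword arguments
# instead of module-level constants. Names and defaults are illustrative only.
def rollout(env, solver_seed, *,
            n_lookahead=3,
            sim_iterations=500,
            dispatch_threshold=0.15):
    """Run the rollout dispatching strategy with explicit, tunable parameters."""
    ...
```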


leonlan commented Oct 1, 2022

> Re. tuning. It's easiest to tune if we pass the settings in via the function arguments, rather than the constants file we have now. For this I suggest we refactor fairly substantially, removing run_dispatch and the other dynamic strategies (unless anyone wants to continue with those, but since there's less than a month left in the competition, I suspect not). Then we can have a single rollout/ module implementing just rollout, and taking its arguments directly from the solver script.
>
> I'll see if I can write out a tuning script with SMAC next week, both for static and dynamic/rollout.

Sure. We should keep greedy as a baseline reference, but we can just make a module for that as well.


leonlan commented Oct 1, 2022

> @leonlan do you have any idea/intuition for what happens when we change the threshold?
>
> Edit: ah, that's issue #101 :-)

I changed DISPATCH_THRESHOLD from 0.15 to 0.20. Other factors remained the same as in the last experiment.
[image: results table for DISPATCH_THRESHOLD = 0.20]

The differences between rollout and rollout-long are now much larger: there's a difference of about 1500, up from 500 before. Interesting! What's also interesting: rollout with the 0.15 threshold vs. rollout with the 0.20 threshold is still about the same.
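For readers unfamiliar with the criterion: one way such a threshold rule can be set up is sketched below. This is an illustration only; the exact rule in the code (e.g. the direction of the comparison and what counts as "dispatched") may differ.

```python
# Illustration of a threshold-based dispatching rule; this is not necessarily the
# exact rule implemented in this repository.
def threshold_dispatch(simulated_dispatch_sets, request_ids, threshold=0.15):
    """Dispatch a request if it was dispatched in at least a `threshold` fraction
    of the simulation solutions; postpone it otherwise."""
    n_sims = len(simulated_dispatch_sets)
    dispatch = set()
    for request in request_ids:
        frac = sum(request in dispatched for dispatched in simulated_dispatch_sets) / n_sims
        if frac >= threshold:
            dispatch.add(request)
    return dispatch
```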


leonlan commented Oct 1, 2022

I experimented with a few more threshold values. The setup is:

  • instances [a-d]* (enough instances to be representative of benchmark results)
  • 10 seeds
  • greedy, rollout, and rollout long (better simulation solutions). Note that the number of simulations in rollout long is roughly equal to the number of simulations in rollout.

The thresholds used for rollout are 15% (original), 20%, 30% and 40%. There is one slightly confusing thing due to a mistake of mine:

  • 15 and 20% were run with 120s epoch time limits
  • 30 and 40% were run with 60s epoch time limits

AFAICT the time limit does not play a huge role, because the only significant difference is how many simulations you get, and a 60s epoch time limit often already gives enough simulations.

Here is a summarized table of results:
[image: summarized results table for thresholds 15%, 20%, 30%, and 40%]

The best scenario, rollout_long30, now improves over greedy by 20K! Insane. That means that there is definitely merit in trying to obtain better solutions.

Crazy stuff! I'm stoked 😄

jmhvandoorn commented:

Very nice!! Could you share the exact configuration you are using for rollout_long30?


leonlan commented Oct 1, 2022

> Very nice!! Could you share the exact configuration you are using for rollout_long30?

Here you can find it:

```python
sim_sol_long, cost_long, is_feas_long = solve_simulation(
    sim_inst,
    500,  # 500 iterations is often around 5% from the BKS.
    initialTimeWarpPenalty=50,
    nbPenaltyManagement=20,
    generationSize=30,
    minPopSize=20,
)
```

Note that it still uses an older version of the rollout code (but with the same performance). The rollout-long solutions are 2-8% better than rollout's in the earlier epochs; in the later epochs this falls off because the instances become smaller. Actually, I even fall back on regular rollout if rollout-long does not improve the solutions. But these are all (insignificant) details, I hope 😅.


jmhvandoorn commented Oct 1, 2022

> Note that the number of simulations in rollout long is roughly equal to the number of simulations in rollout.

How is that possible? The time to run a single simulation is 20x as long, right? Or what am I missing?


leonlan commented Oct 1, 2022

> Note that the number of simulations in rollout long is roughly equal to the number of simulations in rollout.
>
> How is that possible? The time to run a single simulation is 20x as long, right? Or what am I missing?

Sorry, that's a bit confusing indeed. From a single "normal" rollout simulation I extrapolate how long each simulation takes, and then calculate how many simulations would have fit in the time budget. That is precisely the number of "long" simulations that rollout-long is allowed to do.
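In other words, something like the back-of-the-envelope calculation below. The names (`solve_simulation`, `sample_scenario`, `epoch_state`, `sim_time_budget`) are illustrative; the real code may measure this differently.

```python
import time

# Sketch of how the number of "long" simulations is matched to regular rollout.
start = time.perf_counter()
solve_simulation(sample_scenario(epoch_state))   # one "normal" (cheap) simulation
t_single = time.perf_counter() - start

n_normal_sims = int(sim_time_budget / t_single)  # cheap simulations that would have fit
n_long_sims = n_normal_sims                      # rollout-long gets the same count,
                                                 # so only solution quality differs
```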

jmhvandoorn commented:

> Sorry, that's a bit confusing indeed. From a single "normal" rollout simulation I extrapolate how long each simulation takes, and then calculate how many simulations would have fit in the time budget. That is precisely the number of "long" simulations that rollout-long is allowed to do.

So do I understand correctly that this means "rollout long" is relevant as a test, but not possible within 60s per epoch?


leonlan commented Oct 1, 2022

> Sorry, that's a bit confusing indeed. From a single "normal" rollout simulation I extrapolate how long each simulation takes, and then calculate how many simulations would have fit in the time budget. That is precisely the number of "long" simulations that rollout-long is allowed to do.
>
> So do I understand correctly that this means "rollout long" is relevant as a test, but not possible within 60s per epoch?

That's correct. These experiments show that, if we can get "rollout-long quality" solutions within 60s, then we can improve our dynamic performance even further. At the moment this is not possible, but honestly, it should not be a very big challenge to achieve (see my comment above with three suggestions on how to improve solution quality). I also think that tuning HGS will work, but I have a poor understanding of how to tune these parameters.


leonlan commented Oct 10, 2022

Unfortunately, the results for rollout_long30 were incomplete. A single instance set was missing, which makes the comparison faulty, since the other methods did have results for all instance sets. The sole reason rollout_long30 appeared better was the missing data.

This means that my conclusion that improving the simulation solution quality could lead to drastic improvements was incorrect :-( There are small gains (1-2K), which have already been merged in #132.

I'll close this issue, but feel free to reopen it when relevant.

leonlan closed this as completed on Oct 10, 2022.