WIP: better adapt HyperbandSearchCV for exploratory searches #532
This PR is not ready to be merged. I still need to verify the implementation.
What does this PR implement?
Together, these two features mean the user can exit hyper-parameter selection early once the score is high enough.
This histogram is from the 200 runs of
I'll take your question as "how can the search be made more aggressive?"
The aggressiveness of Hyperband's early stopping scheme can be configured (which I forgot to mention in the talk!*). This is controlled by
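As a rough sketch of what this aggressiveness knob changes (my own illustration of the general successive-halving scheme, not dask-ml's implementation; `eta` here stands for the halving base):

```python
def successive_halving_schedule(n_models, eta):
    """Return (models, partial_fit_iters) per rung of one bracket.

    Each rung keeps roughly the best 1/eta of the models and trains
    the survivors eta times longer, so a larger eta prunes harder.
    """
    schedule = []
    iters = 1
    while n_models >= 1:
        schedule.append((n_models, iters))
        n_models //= eta
        iters *= eta
    return schedule

# eta=3 (a common default) vs eta=4 (more aggressive pruning)
print(successive_halving_schedule(27, 3))  # [(27, 1), (9, 3), (3, 9), (1, 27)]
print(successive_halving_schedule(27, 4))  # [(27, 1), (6, 4), (1, 16)]
```

A larger base reaches a single surviving model after fewer rungs, spending less total training time on configurations that are discarded early.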
It's also possible to only run bracket 4:
```python
from dask_ml.model_selection import HyperbandSearchCV, SuccessiveHalvingSearchCV

search = HyperbandSearchCV(model, params, max_iter=max_iter)
bracket_params = search.metadata["brackets"]["SuccessiveHalvingSearchCV params"]
bracket = SuccessiveHalvingSearchCV(model, params, **bracket_params)
```
This will require the user to decide how important the training data is, and which bracket to choose.
* I have removed that typo from the slides
Aggressiveness is not what I meant; that's just the base of the "halving", right?
What I meant was what you showed last. The paper says that running only the bracket that puts the least emphasis on the amount of training data almost always wins, and that in practice Successive Halving is about as good as Hyperband.
Good. Yes, almost always* (there's one case where the second most exploratory bracket is optimal, and it has a narrow search space). The other experiments all run Hyperband (via
Looks like repeating the most exploratory bracket finds slightly lower losses than Hyperband on average, at least in their example. I've also simulated this with my data (the same example as #532 (comment)):
Here "2 repeats" means "2 repeats of the most exploratory bracket". It looks like 2 or 3 runs of the most exploratory bracket recovers the Hyperband performance, at least for this example.
I'll run some more simulations to see how this performs over time. I'll probably edit the search space a bit to make it more difficult for the most adaptive bracket to be optimal.
* The one exception: Figure 4 (which runs LeNet) finds that the second most exploratory bracket is optimal
They're the same thing. "Repeating the most exploratory bracket two times" is "running two instances of that bracket in parallel". i.e., "2 repeats" means "running 2 copies of the most exploratory bracket in parallel".
#532 (comment) is only concerned with the final score, not any of the intermediate scores. I plan to run those simulations somewhat soon.
The plot above is an estimate of the final validation score for running a different number of the most exploratory bracket in parallel. It's a vertical histogram, so the x-axis is the frequency from my simulations.
Earlier, I simulated Hyperband 200 times so I also have the results from 200 runs of the most exploratory bracket (shown in "1 repeat"). The other plots (with more repeats) are generated by pulling from that histogram, and taking the maximum of the values pulled. Something like
```python
import numpy as np

def simulate(final_scores, repeats):
    return np.random.choice(final_scores, size=repeats).max()

n = 200  # I ran Hyperband 200 times in an earlier simulation

# For each run, pull the best validation score from bracket=4
one_run = [simulation_results(i, bracket=4) for i in range(n)]  # values in "1 repeat"

two_runs = [simulate(one_run, 2) for _ in range(1000)]
three_runs = [simulate(one_run, 3) for _ in range(1000)]
```
The plot I showed above is only for the final score. It does not characterize how the score changes over time.
Adding more runs is not the same as repeating a bracket, I think: you'd keep the best 50% of the combined pool, not the best 50% from each repeat, so merging them would be less randomized.
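A toy illustration of this point (my own sketch with made-up scores, not code from the PR): one merged bracket keeps the best half of the pooled models, while separate repeats keep the best half of *each* repeat.

```python
scores_a = [0.9, 0.8, 0.7, 0.6]  # hypothetical validation scores, repeat 1
scores_b = [0.5, 0.4, 0.3, 0.2]  # hypothetical validation scores, repeat 2

def top_half(scores):
    """Survivors after one halving step: the best 50% of `scores`."""
    return sorted(scores, reverse=True)[: len(scores) // 2]

# Separate repeats: the best half of each repeat survives
separate = sorted(top_half(scores_a) + top_half(scores_b), reverse=True)

# One merged bracket: the best half of the pooled scores survives
combined = top_half(scores_a + scores_b)

print(separate)  # [0.9, 0.8, 0.5, 0.4]
print(combined)  # [0.9, 0.8, 0.7, 0.6]
```

When one repeat happens to draw a stronger batch of configurations, the merged bracket concentrates all its survivors there, whereas separate repeats preserve some diversity from the weaker batch.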
Hyperband is as expensive as 4 repeats, but it looks worse in the histogram, right? It even looks worse than two repeats to me. So according to your histogram, doing repeats is better than Hyperband, right?