Skorch + HyperbandSearchCV Example #664
I know that @stsievert has a full example somewhere, but I wasn't able to find it easily in his repositories. Hopefully he can find the link. @ToddMorrill, are you passing NumPy arrays to `fit`? Can you provide a full example and the traceback that gives the `ValueError`?
It's the "image-denoising" model at https://github.com/stsievert/dask-hyperband-comparison/. This is a PyTorch model, defined in image-denoising/autoencoder.py.
I also tune the batch size in my example. The batch size is used by PyTorch internals to approximate the loss function's gradient during optimization. The relevant line in my hyperparameters is:

```python
params = {
    ...
    'batch_size': [32, 64, 128, 256, 512],
}
```

The notebook that tunes the hyperparameters is at image-denoising/Run.ipynb.
I think Dask arrays should be passed too, but it looks like that's okay:

dask-ml/dask_ml/model_selection/_incremental.py, lines 162 to 165 in e6183d6
Thanks for the rapid response @TomAugspurger, @stsievert. I actually got unlucky with my choice of chunk size. Essentially, the last chunk had a size of 1 (i.e. `len(X) % chunks == 1`), which clearly doesn't leave any data to train/test on. I changed `chunks` and resolved that issue.

I'm now facing a new error. I suspect it has to do with the number of cross-validation splits and/or the chunk size. How do I choose chunks/batches to avoid these issues? Can I lower the cross-validation from 5 splits to 3?
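The dangling-chunk pitfall described above can be checked with a little arithmetic before calling `fit`. A minimal sketch, with hypothetical sizes since the thread doesn't give the real dataset length:

```python
# Hypothetical sizes chosen to reproduce the dangling-chunk pitfall;
# the thread doesn't give the real dataset length.
n_samples = 2049
chunk_size = 256

last_chunk = n_samples % chunk_size  # size of the final, partial chunk
print(last_chunk)  # 1: a chunk holding a single sample

# A size-1 chunk can't be split into train and test sets, which is what
# produces the "resulting train set will be empty" ValueError.
# One fix: pick a chunk size that divides n_samples evenly.
assert n_samples % 683 == 0  # e.g. chunks of 683 leave no remainder
```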
What's the traceback for the error?
Here's everything:
Thanks for the traceback. This error looks to be internal to your model: the Skorch model is doing some of its own cross-validation. It looks like passing `train_split=None` disables that. I passed the same parameters in image-denoising/Run.ipynb:

```python
model = TrimParams(  # wrapper around NeuralNetRegressor
    module=Autoencoder,
    criterion=torch.nn.BCELoss,
    warm_start=True,
    train_split=None,
    max_epochs=1,
    callbacks=[],
)
```
Fantastic. That did it! I'm up and running now. Thank you.
@stsievert I have some questions about rules of thumb for Hyperband. Say my development dataset (train + validation) is 3000 data points, I typically train on 80% of that data and validate on 20%, and I make 5 passes through my dataset. In other words, my model converges after it sees 5 * 3000 * 0.8 = 12000 data points. My grid search parameter dictionary yields 24 unique combinations. Say 250 data points is a good chunk size; then, by the rule of thumb, the number of parameter combinations to sample is 12000 / 250 = 48.

The results from Hyperband look great! The reason I ask is that randomized grid search in Skorch takes less time (restricted to 8 parameter combinations * 3 cross-validation splits with 5 passes through my training data) than the strategy I cited above for Hyperband. Say I only want to try 8 of my 24 parameter combinations in Hyperband; how do I do that while being mindful of the number of training data points necessary for convergence?
When I use Hyperband, I specify the chunk size according to this rule of thumb: https://ml.dask.org/hyper-parameter-search.html#hyperband-parameters-rule-of-thumb.
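Plugging the numbers from this thread into that rule of thumb gives a concrete sketch (the figures are illustrative, taken from the comment above, not a recommendation):

```python
# dask-ml Hyperband rule of thumb, with the numbers from this thread.
n_train = int(3000 * 0.8)        # training examples per pass: 2400
n_passes = 5                     # passes needed for convergence
n_examples = n_passes * n_train  # examples the model must see: 12000

chunk_size = 250                     # "a good chunk size" above
n_params = n_examples // chunk_size  # parameter combinations to sample: 48

# Per the rule of thumb, max_iter equals n_params, and X is rechunked so
# each block holds chunk_size examples before calling search.fit.
max_iter = n_params
print(n_examples, n_params, max_iter)  # 12000 48 48
```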
A space of 24 total parameters doesn't sound like much. Are they all discrete, or are some continuous? I ask because it doesn't sound like the search is "compute bound," as described at https://ml.dask.org/hyper-parameter-search.html#scaling-hyperparameter-searches. I tend to favor continuous parameters over discrete parameters after reading "Random search for hyper-parameter optimization" by Bergstra and Bengio:

```python
## not preferred
# params = {
#     "lr": [1e-3, 1e-2, 1e-1, 1e0],
#     "weight_decay": [1e-5, 1e-4],
#     "alpha": [1e-3, 1e-2, 1e-1],
# }

## preferred
from scipy.stats import loguniform

params = {
    "lr": loguniform(1e-3, 1e0),
    "weight_decay": loguniform(1e-5, 1e-4),
    "alpha": loguniform(1e-3, 1e-1),
}
```
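A quick way to see what `loguniform` buys you: its samples are uniform on a log scale, so each decade of the range gets roughly equal coverage, whereas a plain uniform distribution would crowd the top decade. A small check (the sample count and seed are arbitrary):

```python
import numpy as np
from scipy.stats import loguniform

rv = loguniform(1e-3, 1e0)
samples = rv.rvs(size=10_000, random_state=0)

# All samples stay inside the requested bounds.
assert samples.min() >= 1e-3 and samples.max() <= 1e0

# Roughly a third of the samples fall in each decade:
# [1e-3, 1e-2), [1e-2, 1e-1), [1e-1, 1e0].
frac_lowest_decade = np.mean(samples < 1e-2)
assert abs(frac_lowest_decade - 1 / 3) < 0.05
```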
Good stuff, thank you. I think that dialed it in; now I'm getting high-quality results in much less time. I'm just running some experiments on my laptop but will scale up to a bigger search space on another machine soon. Currently my search space is discrete, but I'll give continuous sampling a try where it makes sense (e.g. dropout).
@ToddMorrill could you re-open this issue? Judging by your experience, it'd be really useful to include an example of Skorch + Hyperband in dask-examples. I'd certainly appreciate a PR!

Sure, let me see if I can clean up my example.

I'd be happy to clean up your example too. If you could make a PR, I could send in another PR to your branch.
Can you post a working example that uses Skorch and HyperbandSearchCV? I haven't been able to find an actual working example.
The biggest challenge I've faced so far is determining the batch size being fed to the model. It's unclear whether that is the chunk size, and beyond that, it's unclear how Skorch's batching interacts with it.

If I run `search.fit(X, y)` with NumPy arrays, the chunk size is massive and the grid search is very slow. If I chunk X and y into Dask arrays, I get the following error:

```
ValueError: With n_samples=1, test_size=0.058823529411764705 and train_size=0.9411764705882353, the resulting train set will be empty. Adjust any of the aforementioned parameters.
```