Revise documentation on hyperparameter search #88
Comments
Perhaps it would also make sense to change the benchmarks below to iterate over a small fixed set of values for random_state, so that the comparison is as apples-to-apples as possible?
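To make the suggestion concrete, here is a minimal sketch of what iterating over a fixed set of random_state values might look like. It uses plain scikit-learn only; the parameter grid, dataset, and estimator are illustrative choices, not taken from the thread's actual benchmark.

```python
# Hypothetical benchmark sketch: time a grid search for a small fixed
# set of random_state values so that repeated runs (e.g. scikit-learn
# vs. dask-searchcv) are generated from identical data splits.
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

RANDOM_STATES = [0, 1, 2]  # small fixed set, as suggested above
param_grid = {"C": [0.1, 1.0, 10.0]}  # illustrative grid

timings = {}
for rs in RANDOM_STATES:
    # Same rs gives the same synthetic dataset on every run.
    X, y = make_classification(n_samples=500, n_features=20, random_state=rs)
    search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=3)
    start = time.perf_counter()
    search.fit(X, y)
    timings[rs] = time.perf_counter() - start

print(timings)
```

Averaging the per-seed timings (rather than timing a single run) would also smooth out VM scheduling noise of the kind mentioned below.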
@jimmywan it would be great to get updated benchmarks. Is this something that you would feel comfortable contributing?
My primary environment is an Ubuntu 14.04 VM running on a Windows 10 host, so I was thinking you might want fewer variables involved; but if that's not a concern, I'd be happy to try to help out a bit. From what I have read, I like what you've been doing with dask-* and the direction you're heading, so I'm happy to help.
I presume the differences would cancel each other out, but perhaps not. Either way, I'd be happy to run the benchmark as well if you're able to put one together. Also, this issue may interest you: scikit-learn/scikit-learn#10068
Took a stab at this; here are the results. Hope this helps.
Results
Benchmark code
I started with this, and made some modifications that I thought would make the benchmarks more comparable:
I didn't put a whole lot more effort into changing the pipeline. I'm not super familiar with classification tasks or this dataset.
Environment
Python virtualenv contents:
Host OS details:
VM details:
Guest OS details:
Thanks @jimmywan! That'll be very helpful. I'd like to add a "Benchmarks" or "Performance" section to
Are you interested in making a PR with that? Otherwise, I'll get to it by the end of the week.
@TomAugspurger I don't think I have time to put more work into it this week. Feel free to take and modify as you see fit.
Right around the time that I stumbled upon dask-searchcv, scikit-learn 0.19 was released, in which pipelines gained support for the memory parameter:
http://scikit-learn.org/stable/whats_new.html#version-0-19
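For reference, a minimal sketch of how that memory parameter is used: with memory set, fitted transformers are cached on disk, so repeated fits with the same data and parameters (as happens during a grid search over downstream steps) reuse the cached result instead of recomputing. The pipeline steps and dataset here are illustrative assumptions, not from the thread.

```python
# Sketch of scikit-learn >= 0.19's Pipeline(memory=...) caching.
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

cachedir = mkdtemp()  # transformer fits are cached under this directory
pipe = Pipeline(
    [
        ("reduce_dim", PCA(n_components=5)),
        ("clf", LogisticRegression(max_iter=1000)),
    ],
    memory=cachedir,
)

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
pipe.fit(X, y)        # first fit populates the cache
score = pipe.score(X, y)
print(score)
```

This overlaps with one of dask-searchcv's selling points (avoiding redundant fits of early pipeline stages), which is why a benchmark comparing the two seems worthwhile.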
As such, perhaps this section of the docs should be revised:
I'd be interested to hear whether any work has been done to compare and contrast dask-searchcv with scikit-learn's changes in 0.19. Presumably dask-searchcv makes it easier to harness multiple machines, but even some rudimentary benchmarks could be interesting and appropriate.
From:
http://dask-ml.readthedocs.io/en/latest/hyper-parameter-search.html#efficient-search