Multidimensional vw hyperparameter optimization with hyperopt #867
Conversation
This seems like an obviously good idea and it's entirely isolated, so I merged it. Ariel has a number of good suggestions; I'm sure working through them will improve it. -John

P.S. If you happen to be going to NIPS, tell me and we'll squeeze you in.
Hi Ariel, thank you for the response! Here are the answers to your comments.

Nevertheless, it seems I can unify the command-line syntax. Intuitively, this expression … So, should I (a) change the separator of the range and distribution options, or (b) make …? On top of this, …
@JohnLangford @arielf Thank you for merging! I'll work through your suggestions and ideas about improving it.
All functionality (and even a bit more) and almost all of the syntax is in line with @martinpopel's suggestion here: https://github.com/martinpopel/vowpal_wabbit/wiki/vw-hyperopt-plans.
See also these Stack Overflow questions:
http://stackoverflow.com/questions/33262598/get-holdout-loss-in-vowpal-wabbit
http://stackoverflow.com/questions/33242742/multidimensional-hyperparameter-search-with-vw-hypersearch-in-vowpal-wabbit
Thank you @martinpopel and @arielf for the inspiration! I did it!
Unlike `vw-hypersearch`, `vw-hyperopt.py` can be multidimensional. It implements the Tree of Parzen Estimators (TPE) and random search algorithms from `hyperopt`. TPE uses an adaptive sampling strategy and addresses the explore-exploit dilemma. There is some research showing that TPE may return a better hyperparameter configuration than manual expert choice or grid search.

Here is an example of using `vw-hyperopt`:

```
python vw-hyperopt.py --train train.dat --holdout holdout.dat --max_evals 200 --outer_loss_function logistic --vw_space '--algorithms=[ftrl,sgd] --l2=[1e-8..1e-4]LO --l1=[1e-8..1e-4]LO -l=[0.01..10]L --ftrl_alpha=[5e-5..5e-1]L --ftrl_beta=[0.01..1] --passes=[1..10]I -q=[SE+SZ+DR,SE]O --ignore=[T]O --loss_function=[logistic] -b=[29]'
```
The quoted part after the `--vw_space` flag means literally the same thing as @martinpopel described. It is converted into a `hyperopt` tree-like search space. There is even additional functionality: you can list different combinations of quadratic features by separating namespace combinations with the "+" symbol (see the example above). A rough sketch of how such range specifications could map to `hyperopt` expressions is shown below.

Another new piece of functionality is that you can optimize hyperparameters with respect to a custom metric, such as ROC-AUC, which can differ from the inner vw loss function. This can sometimes be useful. The corresponding flag is `--outer_loss_function`. For now, only `logistic` (default) and `roc-auc` are implemented, but this list can easily be expanded.
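The exact parsing code lives in `vw-hyperopt.py`; as an illustration only, here is a minimal sketch, assuming a hypothetical `parse_range` helper and the suffix conventions described above (L for log-scale, I for integer, O for an optional flag):

```python
import re
from math import log

from hyperopt import hp

def parse_range(name, spec):
    """Hypothetical helper: translate '[low..high]' plus optional suffixes
    (L = log-scale, I = integer, O = the flag may be omitted) into a
    hyperopt stochastic expression. Not the actual vw-hyperopt parser."""
    m = re.match(r'\[(.+)\.\.(.+)\](L?)(I?)(O?)$', spec)
    low, high = float(m.group(1)), float(m.group(2))
    log_scale, integer, optional = (g != '' for g in m.groups()[2:])
    if integer:
        expr = hp.quniform(name, low, high, 1)  # integer-valued, step 1
    elif log_scale:
        expr = hp.loguniform(name, log(low), log(high))
    else:
        expr = hp.uniform(name, low, high)
    if optional:
        # 'O': let hyperopt decide whether to drop the flag entirely.
        expr = hp.choice(name + '_present', [None, expr])
    return expr

# e.g. --l2=[1e-8..1e-4]LO and -l=[0.01..10]L from the command above:
space = {'--l2': parse_range('l2', '[1e-8..1e-4]LO'),
         '-l': parse_range('l', '[0.01..10]L')}
```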
Yet another additional feature is that you can specify several algorithms at once; they will be converted to a `hyperopt.hp.choice()` object. Currently SGD and FTRL-Proximal are supported. If there are prohibited flags, such as `--ftrl_alpha` for SGD, they are simply excluded from the `hyperopt` search space for that particular algorithm; a conditional-space sketch follows this paragraph.
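The exact construction is internal to `vw-hyperopt`, but as a hedged sketch (flag names and ranges taken from the example command, the nesting structure assumed), the per-algorithm subspaces could be branches of `hp.choice`, so FTRL-only flags never appear in the SGD branch:

```python
from math import log

from hyperopt import hp

# Each branch of hp.choice carries only the flags valid for that algorithm,
# so e.g. --ftrl_alpha is simply absent from the SGD branch.
space = hp.choice('algorithm', [
    {
        'type': 'sgd',
        '-l': hp.loguniform('sgd_l', log(0.01), log(10)),
        '--l2': hp.loguniform('sgd_l2', log(1e-8), log(1e-4)),
    },
    {
        'type': 'ftrl',
        '--ftrl_alpha': hp.loguniform('ftrl_alpha', log(5e-5), log(5e-1)),
        '--ftrl_beta': hp.uniform('ftrl_beta', 0.01, 1),
    },
])
```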
You can also specify the maximum number of hyperparameter combinations to explore with `--max_evals` (default: 100).

The possible (non-critical, I hope) problems and ways to improve my module, as I see them now:
- … `vw-hyperopt` …
- … the `--holdout_off` flag, in order to make use of all the training data and also to make sure that the model is always saved (see the mentioned issue).
- … `scikit-learn` methods using these files.
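Finally, to make the overall flow concrete, here is a self-contained sketch (not `vw-hyperopt` itself) of the outer loop: `hyperopt`'s TPE proposes a configuration, vw trains and predicts through the shell, and scikit-learn's `roc_auc_score` plays the role of the outer loss. The file names, the fixed command template, and the one-label-per-line `holdout_labels.txt` file are assumptions for this example:

```python
import subprocess

import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK
from sklearn.metrics import roc_auc_score

# Assumed file names for this sketch.
TRAIN, HOLDOUT, LABELS = 'train.dat', 'holdout.dat', 'holdout_labels.txt'

def objective(params):
    """Train vw with the sampled params and return 1 - AUC on the holdout set."""
    cmd = ('vw -d {train} --loss_function logistic -f model.vw '
           '--l2 {l2:g} -l {l:g} --holdout_off'.format(train=TRAIN, **params))
    subprocess.check_call(cmd, shell=True)
    subprocess.check_call(
        'vw -d {holdout} -t -i model.vw -p preds.txt'.format(holdout=HOLDOUT),
        shell=True)
    # Assumes untagged examples, so preds.txt has one raw score per line;
    # ROC-AUC only needs the ranking, not calibrated probabilities.
    y_true = np.loadtxt(LABELS)
    y_score = np.loadtxt('preds.txt')
    # hyperopt minimizes, so return the complement of ROC-AUC.
    return {'loss': 1.0 - roc_auc_score(y_true, y_score), 'status': STATUS_OK}

space = {'l2': hp.loguniform('l2', np.log(1e-8), np.log(1e-4)),
         'l': hp.loguniform('l', np.log(0.01), np.log(10))}

best = fmin(objective, space, algo=tpe.suggest, max_evals=100)
print(best)
```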