
Hyperparameter optimization #128

Merged
merged 33 commits into master on Jan 28, 2016

Conversation

acrellin
Member

Add estimator hyperparameter optimization capability; misc. performance improvements and bug fixes. Fixes #103

@acrellin
Member Author

Still need to improve the web UI, but this works for now.

The model/estimator whose hyperparameters are to be optimized.
model_params : dict or list of dict
Dictionary with parameter names as keys and lists of parameter values
to try as values, or a list of such dictionaries.
Contributor

When is this a list of dicts versus a dict? Trying to wrap my head around the different use cases...

Member Author

It never will be in the webapp usage, but package-API users who know a bit about hyperparameter optimization may want to utilize this - check out http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html
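For readers following along, here is an illustrative sketch of the distinction (the parameter names are made up, and the expansion is re-implemented in plain Python rather than calling scikit-learn): a single dict describes one grid, the cross-product of its value lists, while a list of dicts describes several independent grids searched in turn, which is useful when some parameters only make sense together.

```python
from itertools import product

def expand_param_grid(param_grid):
    """Expand a GridSearchCV-style parameter grid into candidate settings.

    A single dict yields the cross-product of its value lists; a list of
    dicts yields the concatenation of each sub-grid's candidates.
    """
    if isinstance(param_grid, dict):
        param_grid = [param_grid]
    candidates = []
    for grid in param_grid:
        keys = sorted(grid)
        for values in product(*(grid[k] for k in keys)):
            candidates.append(dict(zip(keys, values)))
    return candidates

# One grid: 2 x 2 = 4 candidate settings.
single = expand_param_grid(
    {"n_estimators": [10, 50], "max_features": ["sqrt", "log2"]})

# Two grids: 2 + 2 = 4 candidates, e.g. kernel-specific SVM parameters
# where gamma is only meaningful for the rbf kernel.
multi = expand_param_grid([
    {"kernel": ["linear"], "C": [1, 10]},
    {"kernel": ["rbf"], "C": [1], "gamma": [0.1, 1.0]},
])
```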

@acrellin
Member Author

Sphinx is suddenly failing in the Drone build... all tests are passing locally, though.

@bnaul
Contributor

bnaul commented Dec 22, 2015

👹

Did you rebase everything after the latest merge of doc stuff? The error I see in Drone shouldn't be happening with the new import trickery.

@acrellin
Member Author

I agree - but only the last few commits are failing, and I definitely didn't change anything Sphinx- or Drone-related... Take a look at
http://nipy.bic.berkeley.edu:8080/github.com/mltsp/mltsp/hyperparameter_optimization/5a5e0a30c3c181c27168930add729f417c44525c
- that build passed, and only after that did it start failing.

@bnaul
Contributor

bnaul commented Dec 22, 2015

Does anything happen if you git pull --rebase [your_upstream] master locally? Something with this branch being out-of-date seems by far the most likely culprit.

@acrellin
Member Author

Ok, just rebased again and pushed - looks happy now.

@acrellin
Member Author

What do you all think of the idea of the optimize-hyperparameters-and-fit-to-training-data function returning only the best-fitting estimator, as opposed to the GridSearchCV object, which contains the estimator as an attribute and implements the predict and predict_proba methods? I can see arguments for both sides, such as easier access to attributes like classes_ versus retaining potentially useful info like grid_scores_.

dest_type) or \
(params_to_optimize and k in params_to_optimize and \
isinstance(ast.literal_eval(model_params[k]), list) \
and (type(x) in dest_types_list for x in \
Contributor

Yikes...guess there's no way for this to be simple but maybe it would help a bit to define ast.literal_eval(model_params[k]) before the if block. Also, should there be an all around the last list comprehension?
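One way to sketch the suggested cleanup (variable names are taken from the snippet above, but the toy values are hypothetical and the surrounding function is omitted): evaluate the literal once before the check, and wrap the final generator expression in all() so every element is actually tested.

```python
import ast

# Hypothetical stand-ins for the real inputs, for illustration only.
model_params = {"n_estimators": "[10, 50, 100]"}
params_to_optimize = ["n_estimators"]
dest_types_list = [int]

for k in model_params:
    # Evaluate the stringified literal once, up front.
    value = ast.literal_eval(model_params[k])
    is_optimized_list = (
        params_to_optimize
        and k in params_to_optimize
        and isinstance(value, list)
        # all(...) instead of a bare generator, which is always truthy.
        and all(type(x) in dest_types_list for x in value)
    )
```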

@bnaul
Contributor

bnaul commented Dec 31, 2015

@acrellin I can also see arguments for both sides but I guess I'd vote for returning the GridSearchCV object, it's easy enough to extract the model but going the other direction would be impossible.
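A toy illustration of that asymmetry, using a made-up stand-in class rather than the real GridSearchCV: the fitted search object retains both the winning estimator and the per-candidate scores, so extracting the model is one attribute access, while a bare estimator carries no record of the search.

```python
class TinySearch:
    """Toy stand-in for a fitted GridSearchCV result (illustrative only)."""
    def __init__(self, candidates, score_fn):
        scored = [(score_fn(c), c) for c in candidates]
        self.grid_scores_ = scored                           # retained search info
        self.best_params_ = max(scored)[1]                   # winning setting
        self.best_estimator_ = ("model", self.best_params_)  # stand-in for the model

search = TinySearch([{"C": 1}, {"C": 10}], score_fn=lambda p: 1.0 / p["C"])
model = search.best_estimator_  # easy: peel the model off the search object
# The reverse is impossible: `model` alone has no grid_scores_ to recover.
```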

@bnaul
Contributor

bnaul commented Jan 21, 2016

This looks good to me 👍; wanna rebase so we can see that sweet green 💵?

@bnaul
Contributor

bnaul commented Jan 21, 2016

This failed the first time for some reason (said RethinkDB wasn't running) but re-building fixed it.

@acrellin
Member Author

I've seen that before, too... A bit troubling


"""
# To fit with fixed, non-optimized params, must be wrapped in list
if isinstance(model_params, dict):
Contributor

if isinstance(model_params, dict):
    model_params = [model_params]
for param_dict in model_params:
    ...

@acrellin
Member Author

All tests are passing on my machine (both with Python 3.5 and a clean 2.7 virtual env), and none of the relevant data files have changes between http://nipy.bic.berkeley.edu:8080/github.com/mltsp/mltsp/hyperparameter_optimization/6761809af2e20bafe7c3ec7b9b415047382f7134 and http://nipy.bic.berkeley.edu:8080/github.com/mltsp/mltsp/hyperparameter_optimization/fa3c460e84b35c9f2c28f80f80110efc2595a3af. Anyone have any ideas re: the sudden xray errors?

@acrellin
Member Author

@stefanv @bnaul Please take a look and let me know what you think when you have a chance.

@bnaul
Contributor

bnaul commented Jan 27, 2016

👍 @stefanv ?

@@ -199,3 +208,20 @@ def extract_data_archive(archive_path, extract_dir=None):
file_paths = [f for f in all_paths if not os.path.isdir(f)]
archive.close()
return file_paths


def robust_literal_eval_dict(input_dict):
Contributor

Here, how about:

def robust_literal_eval(value):
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value

and then

d = {k: robust_literal_eval(v) for k, v in some_dict.items()} 
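For illustration, the suggested helper in action on a toy parameter dict (the parameter names here are invented, and the except clause is narrowed to the exceptions ast.literal_eval actually raises for non-literal input):

```python
import ast

def robust_literal_eval(value):
    """Evaluate a Python literal string, falling back to the raw value."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):  # non-literal strings pass through unchanged
        return value

# Stringified form-field values, as the webapp might submit them.
params = {"n_estimators": "[10, 50]", "max_features": "sqrt", "n_jobs": "-1"}
evaluated = {k: robust_literal_eval(v) for k, v in params.items()}
```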

@stefanv
Contributor

stefanv commented Jan 28, 2016

👍 looks really good--thanks for your patience with us, Ari :)

stefanv added a commit that referenced this pull request Jan 28, 2016
@stefanv stefanv merged commit 8aa727c into cesium-ml:master Jan 28, 2016
@acrellin acrellin deleted the hyperparameter_optimization branch November 22, 2016 19:54

Successfully merging this pull request may close these issues.

Add functionality for model hyperparameter optimization & add corresponding input fields in browser UI
3 participants