GridSearchCV for data transformers? #24

Closed
monktastic opened this issue May 18, 2017 · 6 comments

Comments

@monktastic

There's a note that says:

Don’t forget that you can treat some of the data preparation steps as hyperparameters. For example, the grid search will automatically find out whether or not to add a feature you were not sure about (e.g., using the add_bedrooms_per_room hyperparameter of your CombinedAttributesAdder transformer).

It's not obvious to me how to do this. The given example only shows how to search the parameters of the model, not the data pipeline.

Looking online, it seems that one way is to add the model to the pipeline, and then identify the hyperparameters as stagename__hyperparamname?
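A minimal sketch of that naming scheme, under stated assumptions: the data is synthetic, the step names `imputer` and `reg` are arbitrary labels chosen for this example, and `SimpleImputer` and `Ridge` merely stand in for the book's own transformer and regressor. Each grid key is the step name, a double underscore, then the hyperparameter name.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data (an assumption for illustration, not the book's housing set).
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)
X[rng.random(X.shape) < 0.1] = np.nan  # blank out roughly 10% of the values

# Step names are arbitrary; they only need to match the grid keys below.
pipeline = Pipeline([
    ("imputer", SimpleImputer()),  # data preparation step
    ("reg", Ridge()),              # model
])

# Keys follow <step name>__<hyperparameter name>.
param_grid = {
    "imputer__strategy": ["mean", "median"],
    "reg__alpha": [0.1, 1.0, 10.0],
}

grid_search = GridSearchCV(pipeline, param_grid, cv=3,
                           scoring="neg_mean_squared_error")
grid_search.fit(X, y)
print(grid_search.best_params_)
```

Because the whole pipeline is the estimator handed to `GridSearchCV`, the preparation step is refit inside each cross-validation fold along with the model.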

Thanks for your fantastic book!

@monktastic
Author

Ah, I now see that the exercise 5 solution shows us how to do this.

@ageron
Owner

ageron commented May 19, 2017

Hey, thanks for the kind words, it's always nice to hear! :)
I'm glad you found the answer to your question; exercise 5 does indeed cover this.
Cheers!

@ageron ageron closed this as completed May 19, 2017
@monktastic
Author

monktastic commented May 19, 2017

If I may make a suggestion: it may be helpful to change the wording so that it's clear that this is an exercise, not something we've already learned ("don't forget that..."; "the grid search will automatically..."). And that it's not something we can figure out on our own, but requires us to look up the solution (either once we've gotten to the exercises, or -- since we don't know that a question and answer will be forthcoming -- somewhere on the internet).

@ageron
Owner

ageron commented May 20, 2017

I think I wrote "don't forget" because I already mentioned the idea at the top of page 63:

In this example the transformer has one hyperparameter, add_bedrooms_per_room, set to True by default (it is often helpful to provide sensible defaults). This hyperparameter will allow you to easily find out whether adding this attribute helps the Machine Learning algorithms or not. More generally, you can add a hyperparameter to gate any data preparation step that you are not 100% sure about. The more you automate these data preparation steps, the more combinations you can automatically try out, making it much more likely that you will find a great combination (and saving you a lot of time).
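A rough sketch of such a gated step, for illustration only: `RatioAdder` and its `add_ratio` flag are hypothetical names invented here, not the book's `CombinedAttributesAdder`, and the column indices are assumptions. The flag is an ordinary constructor argument, so scikit-learn's `BaseEstimator` can expose it through `get_params()` and `set_params()`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class RatioAdder(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: optionally appends column 0 / column 1."""

    def __init__(self, add_ratio=True):  # sensible default: feature enabled
        self.add_ratio = add_ratio

    def fit(self, X, y=None):
        return self  # nothing to learn from the data

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if not self.add_ratio:
            return X  # gate closed: pass the data through unchanged
        ratio = X[:, 0] / X[:, 1]
        return np.c_[X, ratio]  # gate open: append the extra column


X = np.array([[2.0, 4.0], [3.0, 6.0]])
with_ratio = RatioAdder().fit_transform(X)
without_ratio = RatioAdder(add_ratio=False).fit_transform(X)
print(with_ratio.shape, without_ratio.shape)  # (2, 3) (2, 2)

# BaseEstimator derives get_params/set_params from the __init__ signature,
# which is what lets a search toggle the flag.
print(RatioAdder().get_params())  # {'add_ratio': True}
```

Placed in a pipeline under a step name such as `adder` (again an arbitrary label), the flag could then be searched with a grid entry like `"adder__add_ratio": [True, False]`.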

That said, it may indeed help to be more explicit, thanks for the suggestion.

@monktastic
Author

Yes, indeed you did. I think what confused me was that we'd only seen GridSearchCV called on an instance of the regressor class, and I don't think we'd seen an example of a pipeline that contained both a transformer and regressor, so it wasn't clear how it was possible to apply it to a transformer.

@ageron
Owner

ageron commented May 21, 2017

Got it, thanks for your feedback, I'll see what I can do to clarify this for future editions. :)
