Use make_column_selector where appropriate. #92
Just a single question. Otherwise, this is good to be merged.
- 'workclass', 'education', 'marital-status', 'occupation',
- 'relationship', 'race', 'native-country', 'sex']
+categorical_columns_selector = selector(dtype_include=object)
+categorical_columns = categorical_columns_selector(data)
Maybe we can split this into a new cell here, just to show the output of using the selector?
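For context, a minimal sketch of what such a cell could display. The small `data` frame below is a made-up stand-in for the dataset used in the notebook (only the column names `workclass` and `sex` come from the diff; the values and the numeric columns are assumptions), and `selector` is assumed to be an alias for `sklearn.compose.make_column_selector`, as the PR title suggests:

```python
import pandas as pd
from sklearn.compose import make_column_selector as selector

# Hypothetical toy data standing in for the notebook's dataset.
data = pd.DataFrame({
    "age": [25, 38, 52],
    "workclass": ["Private", "State-gov", "Private"],
    "hours-per-week": [40, 50, 45],
    "sex": ["Male", "Female", "Female"],
}).astype({"workclass": object, "sex": object})  # force object dtype explicitly

# The selector is a callable: applied to a dataframe, it returns the
# names of the columns whose dtype matches `dtype_include`.
categorical_columns_selector = selector(dtype_include=object)
categorical_columns = categorical_columns_selector(data)

print(categorical_columns)
```

Ending the cell on `categorical_columns` (or printing it, as here) would let readers see that the selector returns a plain list of column names, which can be used in the same places as the hand-written list.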
I guess we already do it in some earlier notebooks, e.g. here:
https://inria.github.io/scikit-learn-mooc/python_scripts/03_categorical_pipeline.html#working-with-categorical-variables
I am wondering whether it is better to do it in all notebooks or only in one of the notebooks at the beginning.
Yep, it is true that this is the fourth notebook.
This is also fine.
On Tue, 17 Nov 2020 at 15:41, Loïc Estève wrote:
In python_scripts/04_parameter_tuning_search.py:
> from sklearn.preprocessing import OrdinalEncoder
-categorical_columns = [
- 'workclass', 'education', 'marital-status', 'occupation',
- 'relationship', 'race', 'native-country', 'sex']
+categorical_columns_selector = selector(dtype_include=object)
+categorical_columns = categorical_columns_selector(data)
I guess we already do it in some earlier notebooks, e.g. here:
https://inria.github.io/scikit-learn-mooc/python_scripts/03_categorical_pipeline.html#working-with-categorical-variables
[image: image]
<https://user-images.githubusercontent.com/1680079/99403694-0ae2e680-28eb-11eb-8e5a-35884bd6bc6e.png>
I am wondering whether it is better to do it in all notebooks or only in one of the notebooks at the beginning.
--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
I left the one in the data exploration notebook because at this stage we don't want to introduce make_column_selector.