
Encode features with "unknown" class in categorical #287


Merged
merged 6 commits into from
Oct 16, 2018


WillKoehrsen
Contributor

Addresses #280 by checking for "unknown" class in a categorical column to be encoded and adding an unknown_token parameter to specify the string used to stand in for the unknown class. If "unknown" is present in the column and unknown_token="unknown", an AssertionError is raised telling the user to change the unknown_token parameter.
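For readers following the thread, the collision check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: `check_unknown_token` is a hypothetical helper name, and the column is modeled as a plain list of strings rather than a feature matrix column.

```python
def check_unknown_token(values, unknown_token="unknown"):
    """Hypothetical helper: fail if the placeholder is already a real category.

    `values` stands in for the categorical column to be encoded.
    """
    if unknown_token in set(values):
        # Mirrors the behavior described in the PR text: raise an
        # AssertionError asking the user to pick a different token.
        raise AssertionError(
            "Column already contains {!r}; change the unknown_token "
            "parameter".format(unknown_token)
        )


# No collision: passes silently.
check_unknown_token(["coke", "pepsi"])

# Collision avoided by choosing a different placeholder.
check_unknown_token(["coke", "unknown"], unknown_token="__other__")
```

Calling `check_unknown_token(["coke", "unknown"])` with the default token would raise the AssertionError.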

@kmax12 kmax12 self-requested a review October 15, 2018 16:48
@WillKoehrsen
Contributor Author

WillKoehrsen commented Oct 15, 2018

Changed the fix to be much simpler: Unknown classes are now encoded as "{feature_name} is unknown". Thanks to @kmax12 for the suggestion.
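A minimal sketch of why the simpler naming avoids the collision, assuming a top-n one-hot encoding. The function name `one_hot_with_unknown`, the `top_n` argument, and the "{feature_name} = {value}" naming for known classes are assumptions made for illustration; only the "{feature_name} is unknown" name comes from the comment above.

```python
from collections import Counter


def one_hot_with_unknown(feature_name, values, top_n=2):
    """Hypothetical sketch: one-hot encode the top_n most common values.

    Everything else is flagged in a "<feature_name> is unknown" column.
    """
    top = [v for v, _ in Counter(values).most_common(top_n)]
    unknown_col = "{} is unknown".format(feature_name)
    rows = []
    for v in values:
        # Assumed naming for known classes: "<feature_name> = <value>".
        row = {"{} = {}".format(feature_name, t): int(v == t) for t in top}
        # A literal "unknown" value gets "<feature_name> = unknown",
        # which cannot clash with "<feature_name> is unknown".
        row[unknown_col] = int(v not in top)
        rows.append(row)
    return rows


rows = one_hot_with_unknown("drink", ["coke", "unknown", "coke", "unknown", "tea"])
```

Here the literal value "unknown" is one of the two most common classes, so it gets its own "drink = unknown" column, while "tea" falls into "drink is unknown".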

defaults to True
to_encode (list[str]): List of feature names to encode.
features not in this list are unencoded in the output matrix
defaults to encode all necessary features.
inplace (bool): Encode feature_matrix in place. Defaults to False.
verbose (str): Print progress info.
unknown_token (str): token used to replace unknown class in categorical column.
Contributor


I don't think we need this anymore.

@kmax12
Contributor

kmax12 commented Oct 15, 2018

Fix linting and address my one comment, then it's good to merge.

@kmax12
Contributor

kmax12 commented Oct 16, 2018

Looks good. Merging

@kmax12 kmax12 merged commit 4938bb5 into master Oct 16, 2018
@gsheni gsheni deleted the encode_features_unknown branch October 24, 2018 15:37
@rwedge rwedge mentioned this pull request Oct 31, 2018

2 participants