Allow user to filter which features to encode for OneHotEncoder #1249

Merged
merged 7 commits into from
Oct 2, 2020

angela97lin

Closes #1237.

I chose to raise a ValueError if any column in features_to_encode does not exist in the input DataFrame.
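
For readers skimming the thread, the check described above could look roughly like the following. This is a minimal sketch, not the code merged in this PR: the helper name `check_features_to_encode` and the error message wording are assumptions made for illustration.

```python
import pandas as pd


def check_features_to_encode(X, features_to_encode):
    """Hypothetical helper: fail fast if a requested column is absent from X."""
    if features_to_encode is None:
        # Nothing was requested explicitly, so there is nothing to validate.
        return
    missing = [col for col in features_to_encode if col not in X.columns]
    if missing:
        # The message text is illustrative, not the library's actual wording.
        raise ValueError(f"Columns {missing} were not found in the input data.")


X = pd.DataFrame({"color": ["red", "blue"], "size": [1, 2]})
check_features_to_encode(X, ["color"])  # passes silently
```

Raising early, instead of silently skipping unknown names, means a typo in `features_to_encode` surfaces when the encoder is first used.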

@angela97lin angela97lin self-assigned this Sep 30, 2020

codecov bot commented Sep 30, 2020

Codecov Report

Merging #1249 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1249   +/-   ##
=======================================
  Coverage   99.93%   99.93%           
=======================================
  Files         207      207           
  Lines       12997    13031   +34     
=======================================
+ Hits        12988    13022   +34     
  Misses          9        9           
Impacted Files Coverage Δ
evalml/tests/component_tests/test_components.py 100.00% <ø> (ø)
evalml/tests/pipeline_tests/test_pipelines.py 100.00% <ø> (ø)
...components/transformers/encoders/onehot_encoder.py 100.00% <100.00%> (ø)
...alml/tests/component_tests/test_one_hot_encoder.py 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6fba86a...197daac.

@angela97lin angela97lin marked this pull request as ready for review October 1, 2020 15:00

@freddyaboulton freddyaboulton left a comment

@angela97lin I think this is great! Right now, we fit and transform on different columns, which can cause bugs. I'm a fan of letting users fit/transform on the columns they pass in as features_to_encode to get around this, but I'm open to other solutions.
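
To illustrate the idea in this comment (fit and transform acting on the same, user-chosen columns), here is a toy sketch built on pandas. It is not evalml's `OneHotEncoder`: the class name `ToyOneHot`, its attributes, and the choice to encode unseen categories as all zeros are assumptions made for the example.

```python
import pandas as pd


class ToyOneHot:
    """Toy encoder: remembers the columns and categories seen at fit time."""

    def __init__(self, features_to_encode):
        # Columns the user explicitly asked to encode (required in this sketch).
        self.features_to_encode = list(features_to_encode)

    def fit(self, X):
        # Record the categories per column so transform uses the same ones.
        self._categories = {
            col: sorted(X[col].dropna().unique())
            for col in self.features_to_encode
        }
        return self

    def transform(self, X):
        out = X.drop(columns=self.features_to_encode)
        for col in self.features_to_encode:
            # Fixing the categories keeps output columns stable across datasets;
            # values unseen during fit become NaN and encode as all zeros.
            fixed = pd.Series(
                pd.Categorical(X[col], categories=self._categories[col]),
                index=X.index,
            )
            out = pd.concat([out, pd.get_dummies(fixed, prefix=col)], axis=1)
        return out


train = pd.DataFrame({"color": ["red", "blue"], "size": [1, 2]})
new = pd.DataFrame({"color": ["blue", "green"], "size": [3, 4]})
encoded = ToyOneHot(["color"]).fit(train).transform(new)
```

Because the column list is fixed at construction and reused in both methods, `fit` and `transform` cannot drift apart on which columns they touch.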

@angela97lin angela97lin added this to the October 2020 milestone Oct 1, 2020

@freddyaboulton freddyaboulton left a comment

@angela97lin Thanks, this looks great!

@angela97lin angela97lin merged commit 95e05e4 into main Oct 2, 2020
@angela97lin angela97lin deleted the 1237_ohe_filter branch October 2, 2020 16:52
@dsherry dsherry mentioned this pull request Oct 29, 2020
Development

Successfully merging this pull request may close these issues.

OneHotEncoder: allow user to filter which features to encode
2 participants