Allow user to filter which features to encode for OneHotEncoder#1249

Merged
angela97lin merged 7 commits into main from 1237_ohe_filter
Oct 2, 2020

Conversation

@angela97lin
Contributor

Closes #1237.

I chose to raise a ValueError if any column in features_to_encode does not exist in the input DataFrame.
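The check described above could look roughly like the following. This is a minimal sketch, not the code merged in this PR: the helper name `check_features_to_encode` and the error message wording are made up for illustration, and the function takes plain column labels (for example, the result of `list(X.columns)`) so that it runs without any dependencies.

```python
def check_features_to_encode(columns, features_to_encode):
    """Raise ValueError if any requested feature is absent from `columns`.

    Hypothetical helper for illustration only; not evalml's actual code.
    """
    if features_to_encode is None:
        # Nothing was requested explicitly, so there is nothing to validate.
        return
    available = set(columns)
    missing = [col for col in features_to_encode if col not in available]
    if missing:
        # The message text is an assumption, not the library's wording.
        raise ValueError(
            "Could not find and encode {} in input data.".format(
                ", ".join(str(col) for col in missing)
            )
        )


# Column labels as they might come from `X.columns` on the input DataFrame.
columns = ["col_1", "col_2", "col_3"]

# Every requested column exists, so this passes silently.
check_features_to_encode(columns, ["col_1", "col_3"])

# "col_4" is not in the input, so a ValueError is raised.
try:
    check_features_to_encode(columns, ["col_1", "col_4"])
except ValueError as err:
    error_message = str(err)
```

Failing early like this surfaces a typo in `features_to_encode` before fitting starts, instead of silently skipping the misspelled column.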

@angela97lin angela97lin self-assigned this Sep 30, 2020
@codecov

codecov bot commented Sep 30, 2020

Codecov Report

Merging #1249 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1249   +/-   ##
=======================================
  Coverage   99.93%   99.93%           
=======================================
  Files         207      207           
  Lines       12997    13031   +34     
=======================================
+ Hits        12988    13022   +34     
  Misses          9        9           
| Impacted Files | Coverage Δ |
|---|---|
| evalml/tests/component_tests/test_components.py | 100.00% <ø> (ø) |
| evalml/tests/pipeline_tests/test_pipelines.py | 100.00% <ø> (ø) |
| ...components/transformers/encoders/onehot_encoder.py | 100.00% <100.00%> (ø) |
| ...alml/tests/component_tests/test_one_hot_encoder.py | 100.00% <100.00%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6fba86a...197daac.

@angela97lin angela97lin marked this pull request as ready for review October 1, 2020 15:00
Contributor

@freddyaboulton freddyaboulton left a comment

@angela97lin I think this is great! Right now, we fit and transform on different columns, which can cause bugs. I'm a fan of letting users fit and transform on the columns they pass in as features_to_encode to get around this, but I'm open to other solutions.
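To make the suggestion concrete, here is a toy sketch of an encoder that learns categories only for the columns named in `features_to_encode` and then encodes exactly those columns at transform time. It is an illustration under simplifying assumptions, not evalml's `OneHotEncoder`: the class name `TinyOneHotEncoder` is hypothetical, and the data is a plain dict mapping column names to lists rather than a DataFrame.

```python
class TinyOneHotEncoder:
    """Toy encoder over a dict of column name -> list of values.

    Hypothetical class for illustration only; not evalml's implementation.
    """

    def __init__(self, features_to_encode):
        self.features_to_encode = list(features_to_encode)
        self.categories_ = {}

    def fit(self, X):
        missing = [c for c in self.features_to_encode if c not in X]
        if missing:
            raise ValueError("Columns not found in input: {}".format(missing))
        # Learn categories only for the user-selected columns.
        self.categories_ = {
            c: sorted(set(X[c])) for c in self.features_to_encode
        }
        return self

    def transform(self, X):
        # Columns that were not fitted pass through unchanged.
        out = {c: list(v) for c, v in X.items() if c not in self.categories_}
        # Encode exactly the columns seen during fit, so fit and transform agree.
        for col, cats in self.categories_.items():
            for cat in cats:
                out["{}_{}".format(col, cat)] = [int(v == cat) for v in X[col]]
        return out


X = {
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "S"],
    "price": [1.0, 2.0, 3.0],
}
encoded = TinyOneHotEncoder(features_to_encode=["color"]).fit(X).transform(X)
```

Because `transform` iterates over the categories stored during `fit`, the two steps cannot drift apart: `size` is categorical but was not requested, so it is left untouched.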

Comment thread evalml/tests/component_tests/test_one_hot_encoder.py
Comment thread evalml/pipelines/components/transformers/encoders/onehot_encoder.py
@angela97lin angela97lin added this to the October 2020 milestone Oct 1, 2020
Contributor

@freddyaboulton freddyaboulton left a comment

@angela97lin Thanks, this looks great!

@angela97lin angela97lin merged commit 95e05e4 into main Oct 2, 2020
@angela97lin angela97lin deleted the 1237_ohe_filter branch October 2, 2020 16:52
@dsherry dsherry mentioned this pull request Oct 29, 2020

Development

Successfully merging this pull request may close these issues.

OneHotEncoder: allow user to filter which features to encode