One-hot encoder: fix bug with column names, so all data types are supported #897

dsherry merged 3 commits into master from ds_882_ohe_bug on Jun 30, 2020


@dsherry dsherry commented Jun 29, 2020

Fix #882

The bug: sklearn's one-hot encoder's get_feature_names method requires its input feature names to be str. When the input dataframe had non-string column names, we passed those names in unchanged, which caused an error.

The fix is to first convert all the column names to str, then pass them into get_feature_names.
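
For readers who want to see the idea in isolation, here is a minimal sketch of the fix, not the actual change in this PR. The helper name `encoded_feature_names` and the toy dataframe are hypothetical. It also assumes that newer scikit-learn releases expose the method as `get_feature_names_out`, so it falls back to `get_feature_names` only when the newer name is missing.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder


def encoded_feature_names(encoder, columns):
    """Hypothetical helper: cast column labels to str before asking
    the fitted encoder for its output feature names."""
    str_columns = [str(col) for col in columns]
    # Assumption: newer scikit-learn uses get_feature_names_out; the
    # version targeted by this PR used get_feature_names.
    getter = getattr(encoder, "get_feature_names_out", None) or encoder.get_feature_names
    return [str(name) for name in getter(str_columns)]


# Toy dataframe whose column labels are the integers 0 and 1, not strings.
X = pd.DataFrame([["a", "x"], ["b", "y"], ["a", "y"]])

encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(X.to_numpy())

names = encoded_feature_names(encoder, X.columns)
print(names)
```

Each output name is the stringified column label joined to a category value, so integer labels such as 0 and 1 no longer reach the encoder as non-str inputs.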

codecov bot commented Jun 29, 2020

Codecov Report

Merging #897 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #897   +/-   ##
  Coverage   99.78%   99.78%           
  Files         195      195           
  Lines        8950     8966   +16     
+ Hits         8931     8947   +16     
  Misses         19       19           

| Impacted Files | Coverage Δ |
| --- | --- |
| ...components/transformers/encoders/ | 100.00% <100.00%> (ø) |
| ...alml/tests/component_tests/ | 100.00% <100.00%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 24cc793...b0f1ca4.

@dsherry dsherry marked this pull request as ready for review Jun 29, 2020
@angela97lin angela97lin left a comment

LGTM! Glad we got this in 😄

@dsherry dsherry merged commit cf6ad68 into master Jun 30, 2020
2 checks passed
@dsherry dsherry deleted the ds_882_ohe_bug branch Jun 30, 2020
@angela97lin angela97lin mentioned this pull request Jun 30, 2020
Successfully merging this pull request may close these issues.

One-hot encoder breaks for any string/categorical-typed inputs