One-hot-encode categorical column #22

Merged: 3 commits, Nov 9, 2017


michcio1234

Previously it was impossible to one-hot-encode a categorical column.
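For context, one-hot encoding a categorical column turns each row's category code into a single 1 in the matching output column. The sketch below does this with pandas and SciPy, assuming a hypothetical helper `one_hot_categorical` that returns a CSR matrix plus the category labels. It illustrates the technique only and is not the code merged in this PR.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def one_hot_categorical(series):
    """Hypothetical helper: one-hot-encode a Series as a sparse CSR matrix.

    Returns (matrix, categories). Reuses the integer codes pandas already
    stores for a categorical column instead of looking up distinct values.
    """
    if not isinstance(series.dtype, pd.CategoricalDtype):
        series = series.astype("category")
    codes = series.cat.codes.to_numpy()
    categories = list(series.cat.categories)
    # Missing values have code -1; they are skipped, giving an all-zero row.
    mask = codes >= 0
    rows = np.arange(len(series))[mask]
    cols = codes[mask].astype(np.int64)
    data = np.ones(rows.shape[0], dtype=np.uint8)
    matrix = sp.csr_matrix(
        (data, (rows, cols)),
        shape=(len(series), len(categories)),
    )
    return matrix, categories


s = pd.Series(["a", "b", "a", None], dtype="category")
m, cats = one_hot_categorical(s)
print(cats)  # ['a', 'b']
print(m.toarray())
```

Storing only the 1s keeps memory proportional to the number of rows rather than rows times categories.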

@codecov

codecov bot commented Nov 2, 2017

Codecov Report

Merging #22 into master will increase coverage by 0.22%.
The diff coverage is 91.66%.

@@            Coverage Diff             @@
##           master      #22      +/-   ##
==========================================
+ Coverage   82.73%   82.96%   +0.22%     
==========================================
  Files           6        6              
  Lines         672      681       +9     
==========================================
+ Hits          556      565       +9     
  Misses        116      116
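The percentages above follow from the line counts: coverage is hits divided by lines, and misses are lines minus hits. The reported figures are consistent with Codecov truncating (not rounding) to two decimals. That is an assumption here, inferred from the numbers: 556/672 is 0.82738..., shown as 82.73% rather than 82.74%. A small check, with `truncated_percent` and `coverage_summary` as hypothetical helper names:

```python
import math


def truncated_percent(fraction: float) -> float:
    """Express a fraction as a percentage truncated to two decimals.

    Truncation (rather than rounding) is an assumption inferred from the
    figures in this Codecov report.
    """
    return math.trunc(fraction * 10000) / 100


def coverage_summary(hits: int, lines: int) -> dict:
    """Coverage percentage and number of missed lines for one commit."""
    return {
        "coverage": truncated_percent(hits / lines),
        "misses": lines - hits,
    }


before = coverage_summary(556, 672)  # master
after = coverage_summary(565, 681)   # #22
# The delta is taken from the unrounded ratios, then truncated.
delta = truncated_percent(565 / 681 - 556 / 672)

print(before)  # {'coverage': 82.73, 'misses': 116}
print(after)   # {'coverage': 82.96, 'misses': 116}
print(delta)   # 0.22
```

If the truncation assumption holds, it also explains why the reported change is +0.22% even though the displayed values differ by 0.23: the unrounded difference is about 0.228 percentage points.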
Impacted Files                Coverage Δ
sparsity/dask/reshape.py      94.73% <ø> (ø)           ⬆️
sparsity/sparse_frame.py      85.6% <91.66%> (+0.32%)  ⬆️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b1d05f...5c3ceb3.

@michcio1234
Author

@kayibal I'd like to merge it.

@kayibal kayibal left a comment

OK, I'm not sure it is necessary to have this complicated logic to check which categories to use. One could just say that if the column is categorical, everything is already known and the categories argument could be ignored. But I don't think it does much harm either!
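The reviewer's point rests on the fact that a pandas categorical column already carries its full list of categories in its dtype, including categories that never occur in the data, so nothing has to be computed. For any other column the distinct values have to be scanned. A minimal sketch, with `known_categories` as a hypothetical helper name:

```python
import pandas as pd


def known_categories(series: pd.Series) -> list:
    """Hypothetical helper: list the categories of a column.

    A categorical column stores them in its dtype (observed or not);
    any other column has to be scanned for its distinct values.
    """
    if isinstance(series.dtype, pd.CategoricalDtype):
        return list(series.cat.categories)
    return sorted(series.dropna().unique())


cat_col = pd.Series(pd.Categorical(["b", "a"], categories=["a", "b", "c"]))
str_col = pd.Series(["b", "a", None])

print(known_categories(cat_col))  # ['a', 'b', 'c']
print(known_categories(str_col))  # ['a', 'b']
```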

@michcio1234
Author

Thanks for the comment. You're partially right: if we have a categorical column, then everything should be known. But we cannot ignore the categories argument, because it tells the function which columns should be encoded. And I wouldn't have written this complicated logic if the tests didn't rely on the order of the returned one-hot-encoded columns.
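The reply hinges on two things: the categories argument also selects which columns get encoded, and the tests depend on the order of the output columns. The sketch below illustrates both with dense pandas output. The dict form of `categories` (column name mapped to a category list, or to `None` to reuse a categorical dtype's categories) and the function name `one_hot_encode` are assumptions for illustration, not necessarily the signature used in `sparsity`.

```python
import pandas as pd


def one_hot_encode(df: pd.DataFrame, categories: dict) -> pd.DataFrame:
    """Hypothetical sketch: one-hot-encode only the columns in `categories`.

    Each key is a column to encode; its value is a list of categories, or
    None to reuse the categories stored in a categorical dtype. Columns not
    listed are dropped. Output columns follow the order of `categories`,
    then the order of each category list.
    """
    parts = []
    for column, cats in categories.items():
        series = df[column]
        if cats is None:
            if not isinstance(series.dtype, pd.CategoricalDtype):
                raise ValueError(f"{column!r} is not categorical; pass its categories")
            cats = list(series.cat.categories)
        # Re-coding against an explicit list fixes the output column order;
        # values outside `cats` become NaN and encode as an all-zero row.
        coded = pd.Series(pd.Categorical(series, categories=cats), index=df.index)
        parts.append(pd.get_dummies(coded, prefix=column, dtype=int))
    return pd.concat(parts, axis=1)


df = pd.DataFrame({
    "color": pd.Categorical(["red", "blue", "red"], categories=["red", "green", "blue"]),
    "size": ["S", "L", "M"],
    "id": [1, 2, 3],
})
out = one_hot_encode(df, {"size": ["S", "M", "L"], "color": None})
print(list(out.columns))
```

Because dicts preserve insertion order (Python 3.7+) and each column is re-coded against an explicit category list, the output column order depends only on the arguments, not on the order in which values happen to appear in the data.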

@kayibal
kayibal commented Nov 9, 2017

Ah, OK, I understand :)

@michcio1234 michcio1234 merged commit 05867f6 into master Nov 9, 2017

2 participants