New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
drop_first attribute added in encode features #647
Conversation
Codecov Report
@@ Coverage Diff @@
## master #647 +/- ##
=========================================
Coverage ? 97.42%
=========================================
Files ? 118
Lines ? 9560
Branches ? 0
=========================================
Hits ? 9314
Misses ? 246
Partials ? 0
Continue to review full report at Codecov.
|
Thank you for the contribution, @ayushpatidar. A couple things
Once you are passing all the github checks, we can review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments. Once addressed, i believe this is ready to merge!
@@ -23,6 +23,9 @@ def encode_features(feature_matrix, features, top_n=10, include_unknown=True, | |||
defaults to encode all necessary features. | |||
inplace (bool): Encode feature_matrix in place. Defaults to False. | |||
verbose (str): Print progress info. | |||
drop_first (bool): Whether to get l-1 dummies out of l categorical |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's use k
instead of l
unique = val_counts.head(top_n).index.tolist() | ||
select_n = top_n | ||
if drop_first: | ||
if len(val_counts) != 1 or top_n != 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if we remove the conditional and just do these two lines?
select_n = min(len(val_counts), top_n)
select_n = max(select_n - 1, 1) # make sure at least 1 category is selected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kmax12
But we have to keep outer conditional statement ( if drop_first ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, sorry, I was just referring to inner conditional
es.entity_from_dataframe(entity_id='a', dataframe=df, index='index', make_index=True) | ||
features, feature_defs = dfs(entityset=es, target_entity='a') | ||
features_enc, feature_defs_enc = encode_features(features, feature_defs, | ||
drop_first=True, top_n=10, include_unknown=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to include top_n
in this test case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks for the contribution!
Pull Request Description
drop_first attribute is added as param in encode_feature() function.
This feature will help user to control on redundant feature in dataframe
Fixes #635
After creating the pull request: in order to pass the changelog_updated check you will need to update the "Future Release" section of
docs/source/changelog.rst
to include this pull request.