Keep non-feature columns when encoding feature matrix #111
Codecov Report
@@ Coverage Diff @@
## master #111 +/- ##
=========================================
+ Coverage 88.24% 88.3% +0.05%
=========================================
Files 73 73
Lines 7412 7447 +35
=========================================
+ Hits 6541 6576 +35
Misses 871 871
@@ -65,6 +65,8 @@ def encode_features(feature_matrix, features, top_n=10, include_unknown=True,
    X = feature_matrix.copy()

    encoded = []
    feature_names = [f.get_name() for f in features]
    extra_columns = [col for col in X.columns if col not in feature_names]
To make this safer, can we add a check that every feature you pass in is a column in the feature matrix? With this change, we might silence issues where the feature matrix and the features don't match.
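A minimal sketch of the kind of check being requested, written against plain lists of column names rather than a real DataFrame. The helper names `check_features_in_matrix` and `split_columns` are hypothetical and are not part of this PR or of the library.

```python
def check_features_in_matrix(matrix_columns, feature_names):
    """Raise if any feature name has no matching column (hypothetical helper)."""
    columns = set(matrix_columns)
    missing = [name for name in feature_names if name not in columns]
    if missing:
        raise ValueError(
            "Features not found in feature matrix: {}".format(", ".join(missing))
        )


def split_columns(matrix_columns, feature_names):
    """Validate first, then return the non-feature columns in their original order."""
    check_features_in_matrix(matrix_columns, feature_names)
    names = set(feature_names)
    return [col for col in matrix_columns if col not in names]
```

Validating before computing the extra columns means a mismatched feature list fails loudly instead of being silently treated as "everything else is an extra column".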
Looks good to me. Merging in.
#78 added the option of including non-feature columns with the calculated feature matrix, but `encode_features` was not including them when creating an encoded feature matrix. This PR fixes `encode_features` so it includes non-feature columns in the encoded feature matrix.
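The behavior described above can be illustrated with a small stand-in that uses a dict of column lists instead of a DataFrame. This is a sketch under stated assumptions, not the library's implementation: `encode_matrix`, its `categorical` parameter, and the `"name = value"` column naming are all chosen here for illustration.

```python
from collections import Counter


def encode_matrix(matrix, feature_names, categorical, top_n=10):
    """One-hot encode categorical features and keep non-feature columns.

    matrix: dict mapping column name -> list of values (stand-in for a DataFrame).
    feature_names: names of the columns that are features.
    categorical: set of feature names to one-hot encode (assumed input).
    """
    # Columns that are not features, e.g. a label column carried alongside.
    extra_columns = [col for col in matrix if col not in feature_names]

    encoded = {}
    for name in feature_names:
        values = matrix[name]
        if name not in categorical:
            # Non-categorical features pass through unchanged.
            encoded[name] = list(values)
            continue
        # One indicator column per value, for the top_n most frequent values.
        for value, _ in Counter(values).most_common(top_n):
            encoded["{} = {}".format(name, value)] = [int(v == value) for v in values]

    # Append the non-feature columns so they survive encoding.
    for col in extra_columns:
        encoded[col] = list(matrix[col])
    return encoded


matrix = {"age": [30, 40, 30], "country": ["US", "FR", "US"], "label": [1, 0, 1]}
result = encode_matrix(matrix, ["age", "country"], {"country"})
```

Here `label` is not a feature, yet it still appears in `result` after the encoded feature columns.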