Allow users to directly set feature output column names and save during serialization #2142
Codecov Report
@@                  Coverage Diff                   @@
##           serialization-updates   #2142    +/-   ##
=========================================================
+ Coverage                 99.21%   99.23%   +0.01%
=========================================================
  Files                       143      143
  Lines                     16912    17033    +121
=========================================================
+ Hits                      16780    16902    +122
+ Misses                      132      131      -1
Can we test that a feature stacked on a renamed column from a feature with multiple outputs uses the renamed name?
)

direct_name = ["device_direct"]
direct_feat.set_feature_names(direct_name)
For single-output features, do we want set_feature_names and rename to both work?
Not sure - are you suggesting that we only allow set_feature_names to work for multi-output features, and that renaming the column of a single-output feature should go through rename, as can be done currently?
As I think about it, I like the idea of having set_feature_names operate only on multi-output features - it seems more consistent with the method name and would reduce the scope of the change. We could add a check in the method to raise an error if number_output_features == 1.
@rwedge What do you think about that approach?
@thehomebrewnerd I like it. It makes things simpler if only features with number_output_features > 1 can use this method.
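The guard agreed on in this thread could be sketched roughly as follows. This is a toy stand-in, not the code merged in this PR: the class name `MultiOutputFeatureSketch`, the `_names` attribute, the error type, and the length check are all assumptions made for illustration. Only `set_feature_names` and `number_output_features` come from the discussion.

```python
class MultiOutputFeatureSketch:
    """Hypothetical stand-in for a feature; not the real FeatureBase."""

    def __init__(self, default_names):
        self._default_names = list(default_names)
        self.number_output_features = len(self._default_names)
        self._names = None  # populated only when the user overrides the defaults

    def get_feature_names(self):
        # Prefer user-supplied names, otherwise fall back to the defaults.
        if self._names is not None:
            return list(self._names)
        return list(self._default_names)

    def set_feature_names(self, names):
        # Guard discussed in the review: single-output features use rename instead.
        if self.number_output_features == 1:
            raise ValueError(
                "set_feature_names only applies to multi-output features; "
                "use rename for a single-output feature."
            )
        names = list(names)
        # Assumed extra check: one name per output column.
        if len(names) != self.number_output_features:
            raise ValueError(
                f"expected {self.number_output_features} names, got {len(names)}"
            )
        self._names = names
```

Keeping the override in a separate attribute that stays `None` until set makes it easy to tell later whether the user ever customized the names.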
@rwedge Sounds good. I'll make this change and add the additional test case you recommended.
@rwedge I added a test that checks the name of a feature created by stacking on a |
Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
…2136)
* serialization and deserialization improvements
* add pr number
* Improve feature deserialization to use common primitive instances (#2127)
* update feature deserialization
* update release notes
* lint fix
* fix comment
* fix for list inputs and test cleanup
* remove files
* remove file
* use tmp_path
* Allow users to directly set feature output column names and save during serialization (#2142)
* add set_feature_names method
* initial serialization updates
* update release notes
* fix tests
* code cleanup - only store names when set
* only use on multi-output features
* fix test
* lint fix
* remove unused functions
* add more test cases
* update serialization test
* Update featuretools/feature_base/feature_base.py
Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
* Refactor feature serialization to avoid storing duplicate primitive information (#2144)
* initial refactor of serialization
* update release notes
* lint and remove breakpoint
* new approach without hash
* refactor and update tests
* remove comment
* update s3 file
* update feature base args
* lint fix
* misc cleanup
* remove extra str casting
* fix spelling error
* remove instance cache
* update save and load docstring examples
* lint fix
* more docstring cleanup
* update release notes
* tweak serialization
* update json
Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
Allow users to directly set feature output column names and save during serialization
Adds a set_feature_names method to FeatureBase that lets users set feature column names directly, overriding the default names. Also updates serialization to save and restore these names.
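As a rough illustration of the save-and-restore behavior described above, the sketch below stores custom names only when they were set and falls back to generated defaults otherwise. It does not use the featuretools serialization code: the helper names `feature_to_dict` and `names_from_dict`, the `"feature_names"` key, and the `name[i]` default pattern are assumptions chosen for the example.

```python
import json


def feature_to_dict(base_name, n_outputs, custom_names=None):
    """Build a JSON-ready description of a multi-output feature (hypothetical layout)."""
    data = {"base_name": base_name, "n_outputs": n_outputs}
    # Store names only when the user set them, so default features stay compact.
    if custom_names is not None:
        data["feature_names"] = list(custom_names)
    return data


def names_from_dict(data):
    """Recover output column names, preferring any saved custom names."""
    if "feature_names" in data:
        return list(data["feature_names"])
    # Assumed default pattern: the base name suffixed with the output index.
    return [f"{data['base_name']}[{i}]" for i in range(data["n_outputs"])]


# Round trip through JSON text, as a saved feature file would.
payload = json.dumps(
    feature_to_dict("N_MOST_COMMON(log.product_id)", 2, ["top_product", "second_product"])
)
restored_names = names_from_dict(json.loads(payload))
```

Omitting the key when no custom names exist keeps previously saved default features readable by the same loader, since the fallback path regenerates their names.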