Allow users to directly set feature output column names and save during serialization #2142
Codecov Report

Coverage diff (base: serialization-updates, head: #2142)

            serialization-updates    #2142      +/-
Coverage                   99.21%   99.23%   +0.01%
Files                         143      143
Lines                       16912    17033     +121
Hits                        16780    16902     +122
Misses                        132      131       -1
rwedge left a comment:
Can we test that a feature stacked on a renamed column from a feature with multiple outputs uses the renamed name?
    direct_name = ["device_direct"]
    direct_feat.set_feature_names(direct_name)
For single-output features, do we want both set_feature_names and rename to work?
Not sure - are you suggesting that we only allow set_feature_names to work for multi-output features, and that renaming feature columns for single-output features should be done through rename, as can be done currently?
As I think about it I think I like the idea of having set_feature_names operate only on multi-output features - seems more consistent with the method name, and would reduce the scope of the change. We could add a check in the method to raise an error if number_output_features == 1.
@rwedge What do you think about that approach?
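For reference, a minimal sketch of the check proposed above. SketchFeature is a hypothetical stand-in and not the actual FeatureBase; only the names set_feature_names and number_output_features come from this thread, while the length check, the storage attributes, and the error messages are assumptions made for illustration.

```python
class SketchFeature:
    """Hypothetical stand-in for a feature object (not Featuretools' FeatureBase)."""

    def __init__(self, default_names):
        self._default_names = list(default_names)
        self._names = None  # custom names are stored only once they are set

    @property
    def number_output_features(self):
        return len(self._default_names)

    def get_feature_names(self):
        # Fall back to the default names when no custom names were set.
        source = self._names if self._names is not None else self._default_names
        return list(source)

    def set_feature_names(self, names):
        # Proposed guard: single-output features should use rename instead.
        if self.number_output_features == 1:
            raise ValueError(
                "set_feature_names can only be used on features with "
                "more than one output column"
            )
        # Assumed extra check: one new name per output column.
        if len(names) != self.number_output_features:
            raise ValueError("number of names must match number of output columns")
        self._names = list(names)


multi = SketchFeature(["output[0]", "output[1]"])
multi.set_feature_names(["first", "second"])
print(multi.get_feature_names())  # ['first', 'second']
```

Raising early keeps rename as the only path for single-output features, which matches the narrower scope discussed here.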
@thehomebrewnerd I like it. It makes things simpler if only features with number_output_features > 1 can use this method
@rwedge Sounds good. I'll make this change and add the additional test case you recommended.
@rwedge I added a test that checks the name of a feature created by stacking on a |
Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
…2136)
* serialization and deserialization improvements
* add pr number
* Improve feature deserialization to use common primitive instances (#2127)
* update feature deserialization
* update release notes
* lint fix
* fix comment
* fix for list inputs and test cleanup
* remove files
* remove file
* use tmp_path
* Allow users to directly set feature output column names and save during serialization (#2142)
* add set_feature_names method
* initial serialization updates
* update release notes
* fix tests
* code cleanup - only store names when set
* only use on multi-output features
* fix test
* lint fix
* remove unused functions
* add more test cases
* update serialization test
* Update featuretools/feature_base/feature_base.py
Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
* Refactor feature serialization to avoid storing duplicate primitive information (#2144)
* initial refactor of serialization
* update release notes
* lint and remove breakpoint
* new approach without hash
* refactor and update tests
* remove comment
* update s3 file
* update feature base args
* lint fix
* misc cleanup
* remove extra str casting
* fix spelling error
* remove instance cache
* update save and load docstring examples
* lint fix
* more docstring cleanup
* update release notes
* tweak serialization
* update json
Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
Allow users to directly set feature output column names and save during serialization
Adds a set_feature_names method to FeatureBase that lets users directly set feature column names, overriding the default names. Also updates serialization to save and restore these names.
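As a rough illustration of the save-and-restore behavior described above, here is a self-contained sketch. The SketchFeature class, its to_dict/from_dict helpers, and the "feature_names" key are all hypothetical and do not reflect the actual Featuretools serialization format; the sketch only mirrors the idea from the commit list of storing names only when they have been set.

```python
import json


class SketchFeature:
    """Hypothetical stand-in for a multi-output feature (not FeatureBase)."""

    def __init__(self, default_names, names=None):
        self._default_names = list(default_names)
        # Custom names are kept only when the user has set them.
        self._names = list(names) if names is not None else None

    def get_feature_names(self):
        source = self._names if self._names is not None else self._default_names
        return list(source)

    def set_feature_names(self, names):
        self._names = list(names)

    def to_dict(self):
        data = {"default_names": self._default_names}
        if self._names is not None:
            data["feature_names"] = self._names  # hypothetical key
        return data

    @classmethod
    def from_dict(cls, data):
        # A missing key means the feature keeps its default names.
        return cls(data["default_names"], names=data.get("feature_names"))


feature = SketchFeature(["output[0]", "output[1]"])
feature.set_feature_names(["first", "second"])

# Round-trip through JSON to mimic saving and loading.
restored = SketchFeature.from_dict(json.loads(json.dumps(feature.to_dict())))
print(restored.get_feature_names())  # ['first', 'second']
```

Omitting the key when no names were set keeps the serialized form unchanged for features that use their default names.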