Skip to content

Allow users to directly set feature output column names and save during serialization#2142

Merged
thehomebrewnerd merged 14 commits intoserialization-updatesfrom
custom-feature-col-names
Jun 27, 2022
Merged

Allow users to directly set feature output column names and save during serialization#2142
thehomebrewnerd merged 14 commits intoserialization-updatesfrom
custom-feature-col-names

Conversation

@thehomebrewnerd
Copy link
Contributor

Allow users to directly set feature output column names and save during serialization

Adds a set_feature_names method to FeatureBase to directly allow for setting of feature column names to override the default name. Also updates serialization to save and restore these names.

@codecov
Copy link

codecov bot commented Jun 23, 2022

Codecov Report

Merging #2142 (7626192) into serialization-updates (97b93f0) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@                    Coverage Diff                    @@
##           serialization-updates    #2142      +/-   ##
=========================================================
+ Coverage                  99.21%   99.23%   +0.01%     
=========================================================
  Files                        143      143              
  Lines                      16912    17033     +121     
=========================================================
+ Hits                       16780    16902     +122     
+ Misses                       132      131       -1     
Impacted Files Coverage Δ
featuretools/feature_base/feature_base.py 97.87% <100.00%> (+0.33%) ⬆️
...utational_backend/test_calculate_feature_matrix.py 100.00% <100.00%> (ø)
...retools/tests/primitive_tests/test_feature_base.py 100.00% <100.00%> (ø)
...ests/primitive_tests/test_feature_serialization.py 99.60% <100.00%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 97b93f0...7626192. Read the comment docs.

Copy link
Contributor

@rwedge rwedge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test that a feature stacked on a renamed column from a feature with multiple outputs uses the renamed name?

)

direct_name = ["device_direct"]
direct_feat.set_feature_names(direct_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for single output features, do we want set_feature_names and rename to both work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure - are you suggesting that we just allow set_feature_names to work for multi-output features and renaming feature columns for single output should be done through rename as you can be done currently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I think about it I think I like the idea of having set_feature_names operate only on multi-output features - seems more consistent with the method name, and would reduce the scope of the change. We could add a check in the method to raise an error if number_output_features == 1.

@rwedge What do you think about that approach?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thehomebrewnerd I like it. It makes things simpler if only features with number_output_features > 1 can use this method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rwedge Sounds good. I'll make this change and add the additional test case you recommended.

@thehomebrewnerd
Copy link
Contributor Author

Can we test that a feature stacked on a renamed column from a feature with multiple outputs uses the renamed name?

@rwedge I added a test that checks the name of a feature created by stacking on a FeatureOutputSlice of a multi-output feature. Also updated tests to include a multi-output GroupbyTransformFeature.

@thehomebrewnerd thehomebrewnerd requested a review from rwedge June 27, 2022 18:49
Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
@thehomebrewnerd thehomebrewnerd merged commit 1e7d881 into serialization-updates Jun 27, 2022
@thehomebrewnerd thehomebrewnerd deleted the custom-feature-col-names branch June 27, 2022 21:12
thehomebrewnerd added a commit that referenced this pull request Jun 30, 2022
…2136)

* serialization and deserialization improvements

* add pr number

* Improve feature deserialization to use common primitive instances (#2127)

* update feature deserialization

* update release notes

* lint fix

* fix comment

* fix for list inputs and test cleanup

* remove files

* remove file

* use tmp_path

* Allow users to directly set feature output column names and save during serialization (#2142)

* add set_feature_names method

* initial serialization updates

* update release notes

* fix tests

* code cleanup - only store names when set

* only use on multi-output features

* fix test

* lint fix

* remove unused functions

* add more test cases

* update serialization test

* Update featuretools/feature_base/feature_base.py

Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>

Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>

* Refactor feature serialization to avoid storing duplicate primitive information (#2144)

* initial refactor of serialization

* update release notes

* lint and remove breakpoint

* new approach without hash

* refactor and update tests

* remove comment

* update s3 file

* update feature base args

* lint fix

* misc cleanup

* remove extra str casting

* fix spelling error

* remove instance cache

* update save and load docstring examples

* lint fix

* more docstring cleanup

* update release notes

* tweak serialization

* update json

Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
@ozzieD ozzieD mentioned this pull request Jun 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants