Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow users to directly set feature output column names and save during serialization #2142

Merged
merged 14 commits into from
Jun 27, 2022

Conversation

thehomebrewnerd
Copy link
Contributor

Allow users to directly set feature output column names and save during serialization

Adds a set_feature_names method to FeatureBase to directly allow for setting of feature column names to override the default name. Also updates serialization to save and restore these names.

@codecov
Copy link

codecov bot commented Jun 23, 2022

Codecov Report

Merging #2142 (7626192) into serialization-updates (97b93f0) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@                    Coverage Diff                    @@
##           serialization-updates    #2142      +/-   ##
=========================================================
+ Coverage                  99.21%   99.23%   +0.01%     
=========================================================
  Files                        143      143              
  Lines                      16912    17033     +121     
=========================================================
+ Hits                       16780    16902     +122     
+ Misses                       132      131       -1     
Impacted Files Coverage Δ
featuretools/feature_base/feature_base.py 97.87% <100.00%> (+0.33%) ⬆️
...utational_backend/test_calculate_feature_matrix.py 100.00% <100.00%> (ø)
...retools/tests/primitive_tests/test_feature_base.py 100.00% <100.00%> (ø)
...ests/primitive_tests/test_feature_serialization.py 99.60% <100.00%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 97b93f0...7626192. Read the comment docs.

Copy link
Contributor

@rwedge rwedge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test that a feature stacked on a renamed column from a feature with multiple outputs uses the renamed name?

)

direct_name = ["device_direct"]
direct_feat.set_feature_names(direct_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for single output features, do we want set_feature_names and rename to both work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure - are you suggesting that we just allow set_feature_names to work for multi-output features and renaming feature columns for single output should be done through rename as you can be done currently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I think about it I think I like the idea of having set_feature_names operate only on multi-output features - seems more consistent with the method name, and would reduce the scope of the change. We could add a check in the method to raise an error if number_output_features == 1.

@rwedge What do you think about that approach?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thehomebrewnerd I like it. It makes things simpler if only features with number_output_features > 1 can use this method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rwedge Sounds good. I'll make this change and add the additional test case you recommended.

@thehomebrewnerd
Copy link
Contributor Author

Can we test that a feature stacked on a renamed column from a feature with multiple outputs uses the renamed name?

@rwedge I added a test that checks the name of a feature created by stacking on a FeatureOutputSlice of a multi-output feature. Also updated tests to include a multi-output GroupbyTransformFeature.

Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
@thehomebrewnerd thehomebrewnerd merged commit 1e7d881 into serialization-updates Jun 27, 2022
@thehomebrewnerd thehomebrewnerd deleted the custom-feature-col-names branch June 27, 2022 21:12
thehomebrewnerd added a commit that referenced this pull request Jun 30, 2022
…2136)

* serialization and deserialization improvements

* add pr number

* Improve feature deserialization to use common primitive instances (#2127)

* update feature deserialization

* update release notes

* lint fix

* fix comment

* fix for list inputs and test cleanup

* remove files

* remove file

* use tmp_path

* Allow users to directly set feature output column names and save during serialization (#2142)

* add set_feature_names method

* initial serialization updates

* update release notes

* fix tests

* code cleanup - only store names when set

* only use on multi-output features

* fix test

* lint fix

* remove unused functions

* add more test cases

* update serialization test

* Update featuretools/feature_base/feature_base.py

Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>

Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>

* Refactor feature serialization to avoid storing duplicate primitive information (#2144)

* initial refactor of serialization

* update release notes

* lint and remove breakpoint

* new approach without hash

* refactor and update tests

* remove comment

* update s3 file

* update feature base args

* lint fix

* misc cleanup

* remove extra str casting

* fix spelling error

* remove instance cache

* update save and load docstring examples

* lint fix

* more docstring cleanup

* update release notes

* tweak serialization

* update json

Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
@ozzieD ozzieD mentioned this pull request Jun 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants