Feature and primitive serialization and deserialization improvements #2136

Merged: 23 commits merged Jun 30, 2022


@thehomebrewnerd (Contributor) commented Jun 23, 2022

Serialization and deserialization improvements

Implements several changes to improve serialization and deserialization of Features:

  • Uses a common primitive object when deserializing features that share the same primitive, rather than creating a separate primitive instance for each feature (Improve feature deserialization to use common primitive instances #2127)
  • Serializes each primitive only once, rather than saving duplicate primitive information for every feature that uses it
  • Allows users to easily rename Feature output column names, and serializes/deserializes these custom names correctly
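The three bullets describe a "store once on save, share on load" scheme plus optional custom output names. The following is a minimal, library-free sketch of that idea, not featuretools' actual implementation or file format: `Primitive`, `Feature`, `serialize`, `deserialize`, and the JSON keys `primitives`/`features`/`names` are all hypothetical. It assumes that features sharing a primitive hold the same Python object, so primitives are deduplicated by object identity (`id()`), and it writes custom names only when they were set.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity-based equality: distinct instances never compare equal
class Primitive:
    """Hypothetical stand-in for a primitive: a name plus its init arguments."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass(eq=False)
class Feature:
    """Hypothetical stand-in for a feature that references a primitive."""
    feature_name: str
    primitive: Primitive
    custom_names: Optional[List[str]] = None  # set only when the user renames outputs


def serialize(features: List[Feature]) -> str:
    keys_by_object: Dict[int, str] = {}  # id(primitive object) -> short string key
    primitives: Dict[str, dict] = {}
    entries = []
    for feat in features:
        key = keys_by_object.get(id(feat.primitive))
        if key is None:
            # First time this primitive object is seen: store its definition once.
            key = str(len(keys_by_object))
            keys_by_object[id(feat.primitive)] = key
            primitives[key] = {"name": feat.primitive.name, "args": feat.primitive.args}
        entry = {"name": feat.feature_name, "primitive": key}
        if feat.custom_names is not None:
            entry["names"] = feat.custom_names  # only store names when they were set
        entries.append(entry)
    return json.dumps({"primitives": primitives, "features": entries})


def deserialize(payload: str) -> List[Feature]:
    data = json.loads(payload)
    # Build exactly one Primitive instance per stored definition...
    shared = {k: Primitive(d["name"], d["args"]) for k, d in data["primitives"].items()}
    # ...and give that same instance to every feature that references its key.
    return [Feature(e["name"], shared[e["primitive"]], e.get("names")) for e in data["features"]]


# Usage: two features share one primitive object; a third has custom output names.
mean = Primitive("mean")
features = [
    Feature("MEAN(a)", mean),
    Feature("MEAN(b)", mean),
    Feature("N_MOST_COMMON(c)", Primitive("n_most_common", {"n": 2}), ["top_1", "top_2"]),
]
payload = serialize(features)
loaded = deserialize(payload)
```

Keying on object identity avoids hashing primitive arguments (which may be unhashable). The trade-off is that two separately constructed but identical primitives are still stored twice. In featuretools itself, the user-facing entry points are `ft.save_features` and `ft.load_features`.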

@thehomebrewnerd thehomebrewnerd marked this pull request as draft June 23, 2022 13:27
codecov bot commented Jun 23, 2022

Codecov Report

Merging #2136 (2ba8139) into main (3c99fad) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #2136      +/-   ##
==========================================
+ Coverage   99.22%   99.23%   +0.01%     
==========================================
  Files         143      143              
  Lines       16937    17199     +262     
==========================================
+ Hits        16805    17068     +263     
+ Misses        132      131       -1     
| Impacted Files | Coverage Δ |
|---|---|
| ...aturetools/tests/primitive_tests/test_agg_feats.py | 99.51% <ø> (-0.01%) ⬇️ |
| ...ools/tests/primitive_tests/test_direct_features.py | 100.00% <ø> (ø) |
| ...imitive_tests/test_groupby_transform_primitives.py | 100.00% <ø> (ø) |
| ...ls/tests/primitive_tests/test_identity_features.py | 100.00% <ø> (ø) |
| ...s/tests/primitive_tests/test_transform_features.py | 99.86% <ø> (-0.01%) ⬇️ |
| featuretools/feature_base/feature_base.py | 97.85% <100.00%> (+0.32%) ⬆️ |
| featuretools/feature_base/features_deserializer.py | 100.00% <100.00%> (ø) |
| featuretools/feature_base/features_serializer.py | 100.00% <100.00%> (ø) |
| featuretools/primitives/utils.py | 99.60% <100.00%> (+<0.01%) ⬆️ |
| ...utational_backend/test_calculate_feature_matrix.py | 100.00% <100.00%> (ø) |
| ... and 6 more | |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3c99fad...2ba8139.

thehomebrewnerd and others added 5 commits June 23, 2022 10:29
)

* update feature deserialization

* update release notes

* lint fix

* fix comment

* fix for list inputs and test cleanup

* remove files

* remove file

* use tmp_path
…ng serialization (#2142)

* add set_feature_names method

* initial serialization updates

* update release notes

* fix tests

* code cleanup - only store names when set

* only use on multi-output features

* fix test

* lint fix

* remove unused functions

* add more test cases

* update serialization test

* Update featuretools/feature_base/feature_base.py

Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
…nformation (#2144)

* initial refactor of serialization

* update release notes

* lint and remove breakpoint

* new approach without hash

* refactor and update tests

* remove comment

* update s3 file

* update feature base args

* lint fix

* misc cleanup

* remove extra str casting
@thehomebrewnerd changed the title from "[DRAFT] Feature and primitive serialization and deserialization improvements" to "Feature and primitive serialization and deserialization improvements" Jun 29, 2022
@thehomebrewnerd thehomebrewnerd marked this pull request as ready for review June 29, 2022 16:21
Review thread on featuretools/feature_base/feature_base.py (resolved)
Review thread on featuretools/primitives/utils.py (resolved)
Review thread on featuretools/primitives/utils.py (outdated, resolved)
@gsheni gsheni requested review from ozzieD and rwedge June 30, 2022 15:40
@rwedge (Contributor) left a comment


Let's mention `set_feature_names` in the release notes as an improvement.

@thehomebrewnerd (Contributor, Author) commented

> Let's mention set_feature_names in the release notes as an improvement

Split that change out into its own entry in the Enhancements section, instead of grouping it with the other changes in the Changes section.

ozzieD previously approved these changes Jun 30, 2022
@thehomebrewnerd thehomebrewnerd enabled auto-merge (squash) June 30, 2022 18:30
@thehomebrewnerd thehomebrewnerd merged commit 8c55cfb into main Jun 30, 2022
@thehomebrewnerd thehomebrewnerd deleted the serialization-updates branch June 30, 2022 18:54
@ozzieD ozzieD mentioned this pull request Jun 30, 2022
4 participants