Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add relationship_path to agg and direct features #544

Merged
merged 24 commits into from
May 29, 2019

Conversation

CJStadler
Copy link
Contributor

This adds an optional constructor parameter, and stores the relationship_path on the feature. If no relationship_path is passed to the constructor we search for a path between the target entity and base entity. If none or more than one is found we raise an error. When there are more than one possible paths between entities the feature name will include the full path.

Also add parent_name and child_name to Relationship.

This does not update any other logic to use the paths, as this will be added in later commits.

See #543 for context.

@codecov
Copy link

codecov bot commented May 16, 2019

Codecov Report

Merging #544 into master will increase coverage by 0.11%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #544      +/-   ##
==========================================
+ Coverage   96.35%   96.46%   +0.11%     
==========================================
  Files         116      117       +1     
  Lines        9234     9511     +277     
==========================================
+ Hits         8897     9175     +278     
+ Misses        337      336       -1
Impacted Files Coverage Δ
...eaturetools/computational_backends/feature_tree.py 100% <ø> (ø) ⬆️
...etools/tests/entityset_tests/test_serialization.py 100% <ø> (ø) ⬆️
...tests/computational_backend/test_pandas_backend.py 100% <ø> (ø) ⬆️
featuretools/synthesis/deep_feature_synthesis.py 96.76% <100%> (ø) ⬆️
featuretools/entityset/entityset.py 94.97% <100%> (+0.11%) ⬆️
.../tests/primitive_tests/test_features_serializer.py 100% <100%> (ø) ⬆️
featuretools/feature_base/features_deserializer.py 98.07% <100%> (+0.11%) ⬆️
featuretools/entityset/deserialize.py 100% <100%> (ø) ⬆️
featuretools/entityset/relationship.py 97.67% <100%> (+2.43%) ⬆️
featuretools/entityset/serialize.py 100% <100%> (ø) ⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8c0d698...d86b9dc. Read the comment docs.

@CJStadler CJStadler force-pushed the feature-relationship-paths branch from 6c9b644 to 79a3e41 Compare May 16, 2019 14:53
featuretools/feature_base/feature_base.py Outdated Show resolved Hide resolved
featuretools/feature_base/feature_base.py Outdated Show resolved Hide resolved
@kmax12 kmax12 self-requested a review May 16, 2019 15:57
CJStadler added 4 commits May 20, 2019 15:13
This adds an optional constructor parameter, and stores the
relationship_path on the feature. If no relationship_path is passed to
the constructor we search for a path between the target entity and base
entity. If none or more than one is found we raise an error.  When there
are more than one possible paths between entities the feature name will
include the full path.

Also add parent_name and child_name to Relationship.

This does not update any other logic to use the paths, as this will be
added in later commits.
Since we need to detect nodes to which there are multiple paths, we only
want to stop recursion the second time we visit a node.
Comparing the entities directly is very expensive, and this was
significantly slowing down dfs.
So that the caller can access it without needing to check the feature
type.
@CJStadler CJStadler force-pushed the feature-relationship-paths branch from bb5d1c1 to 3d087ea Compare May 20, 2019 19:14
Copy link
Contributor

@kmax12 kmax12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the overall approach looks good. I left a few comments about methods that were implemented that look similar to stuff we already have. In particular, I wonder if there is overlap with the path find logic and the methods we already have on EntitySet.

featuretools/entityset/relationship.py Outdated Show resolved Hide resolved
featuretools/entityset/relationship.py Show resolved Hide resolved
featuretools/feature_base/feature_base.py Outdated Show resolved Hide resolved
featuretools/feature_base/feature_base.py Show resolved Hide resolved
featuretools/feature_base/feature_base.py Outdated Show resolved Hide resolved
featuretools/feature_base/feature_base.py Outdated Show resolved Hide resolved
featuretools/entityset/relationship.py Show resolved Hide resolved
CJStadler added 10 commits May 22, 2019 09:41
And add logic for detecting old version.
- Remove description_to_relationship and relationship_to_description.
- Rename get_arguments to to_dictionary.
And the same for the backward version.

The new methods are generators which yield each path. This allows us to
stop searching if we only need one path.

Note that we ignore paths with cycles, meaning that a path may be said
to be unique if the only other paths contain cycles.
@CJStadler CJStadler requested a review from kmax12 May 22, 2019 20:52
featuretools/feature_base/feature_base.py Show resolved Hide resolved
featuretools/feature_base/feature_base.py Outdated Show resolved Hide resolved
featuretools/feature_base/feature_base.py Outdated Show resolved Hide resolved
featuretools/feature_base/feature_base.py Outdated Show resolved Hide resolved
featuretools/primitives/standard/aggregation_primitives.py Outdated Show resolved Hide resolved
featuretools/tests/entityset_tests/test_es_metadata.py Outdated Show resolved Hide resolved
featuretools/feature_base/feature_base.py Outdated Show resolved Hide resolved
featuretools/entityset/entityset.py Outdated Show resolved Hide resolved
@CJStadler CJStadler requested a review from kmax12 May 28, 2019 19:41
These were previously overlooked, and untested.
kmax12
kmax12 previously approved these changes May 28, 2019
Copy link
Contributor

@kmax12 kmax12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one minor comment, but otherwise I think this looks good!

featuretools/feature_base/feature_base.py Show resolved Hide resolved
This is equivalent, but makes the flow clearer.
Copy link
Contributor

@kmax12 kmax12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Go ahead and merge!

@CJStadler CJStadler merged commit ecd722c into master May 29, 2019
@CJStadler CJStadler deleted the feature-relationship-paths branch May 29, 2019 17:07
@rwedge rwedge mentioned this pull request Jun 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants