Add relationship_path to agg and direct features #544
Codecov Report
@@            Coverage Diff             @@
##           master     #544      +/-   ##
==========================================
+ Coverage   96.35%   96.46%   +0.11%
==========================================
  Files         116      117       +1
  Lines        9234     9511     +277
==========================================
+ Hits         8897     9175     +278
+ Misses        337      336       -1
Force-pushed from 6c9b644 to 79a3e41.
This adds an optional constructor parameter and stores the relationship_path on the feature. If no relationship_path is passed to the constructor, we search for a path between the target entity and the base entity; if none or more than one is found, we raise an error. When more than one path exists between the entities, the feature name includes the full path. Also add parent_name and child_name to Relationship. This does not update any other logic to use the paths; that will be added in later commits.
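A minimal Python sketch of the lookup rule described in this commit. `Relationship` here is a hypothetical stand-in carrying only `parent_name` and `child_name`, candidate paths are passed in precomputed, and both `resolve_relationship_path` and the `SUM(sessions.transactions.amount)` naming format are illustrative assumptions, not the featuretools API.

```python
from typing import List, NamedTuple, Optional, Tuple


class Relationship(NamedTuple):
    # Hypothetical stand-in for a parent/child link, identified by entity names.
    parent_name: str
    child_name: str


Path = Tuple[Relationship, ...]


def resolve_relationship_path(candidates: List[Path],
                              relationship_path: Optional[Path] = None) -> Tuple[Path, bool]:
    """Pick the path a feature should use (hypothetical helper).

    Returns (path, show_full_path). show_full_path is True when more than one
    candidate path exists, so the feature name must say which one was used.
    """
    ambiguous = len(candidates) > 1
    if relationship_path is not None:
        # An explicit path always wins; the name spells it out if it is ambiguous.
        return tuple(relationship_path), ambiguous
    if not candidates:
        raise ValueError("No path found between the target and base entities")
    if ambiguous:
        raise ValueError("More than one path found; pass relationship_path explicitly")
    return candidates[0], False


def agg_feature_name(primitive: str, variable: str, path: Path, show_full_path: bool) -> str:
    """Hypothetical naming scheme for an aggregation feature (target -> ... -> base)."""
    if show_full_path:
        # Every child entity along the path, e.g. "sessions.transactions".
        entities = ".".join(rel.child_name for rel in path)
    else:
        # Only the base entity, e.g. "transactions".
        entities = path[-1].child_name
    return f"{primitive}({entities}.{variable})"


direct = (Relationship("customers", "transactions"),)
via_sessions = (Relationship("customers", "sessions"),
                Relationship("sessions", "transactions"))
path, show = resolve_relationship_path([direct, via_sessions], via_sessions)
print(agg_feature_name("SUM", "amount", path, show))
```

Passing both candidate paths without an explicit `relationship_path` raises, while passing the path through `sessions` returns it with `show_full_path=True`, so the sketch's name spells out every hop.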
Since we need to detect nodes to which there are multiple paths, we only want to stop recursion the second time we visit a node.
Comparing the entities directly is very expensive, and this was significantly slowing down dfs.
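One way to sketch this kind of traversal in Python, assuming an acyclic graph keyed by hypothetical string entity ids (comparing ids is cheap, unlike comparing entity objects). The function name `find_multiply_reachable` is illustrative. In this sketch a node is flagged on its second visit and is still expanded that time, so its descendants are reached twice and flagged too; once a node has been visited twice, later visits return immediately.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_multiply_reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Return the ids of nodes reachable from start by more than one path.

    graph maps an entity id to the ids it links to; it is assumed acyclic.
    Nodes are tracked by id, so no entity objects are ever compared.
    """
    visits: Dict[str, int] = defaultdict(int)
    multiple: Set[str] = set()

    def visit(node_id: str) -> None:
        visits[node_id] += 1
        if visits[node_id] > 2:
            # Already expanded twice, so its descendants were reached twice too.
            return
        if visits[node_id] == 2:
            # A second arrival means at least two distinct paths lead here.
            multiple.add(node_id)
        for child_id in graph.get(node_id, []):
            visit(child_id)

    visit(start)
    return multiple


diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
print(sorted(find_multiply_reachable(diamond, "a")))
```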
So that the caller can access it without needing to check the feature type.
Force-pushed from bb5d1c1 to 3d087ea.
The overall approach looks good. I left a few comments about newly implemented methods that look similar to code we already have. In particular, I wonder whether the path-finding logic overlaps with the methods we already have on EntitySet.
featuretools/tests/computational_backend/test_pandas_backend.py
And add logic for detecting old version.
- Remove description_to_relationship and relationship_to_description.
- Rename get_arguments to to_dictionary.
And the same for the backward version. The new methods are generators which yield each path. This allows us to stop searching if we only need one path. Note that we ignore paths with cycles, meaning that a path may be said to be unique if the only other paths contain cycles.
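A minimal sketch of a path generator in this style, over a plain adjacency dict of hypothetical entity names rather than an EntitySet; `find_paths`, `reverse_graph`, and `has_unique_path` are illustrative names, not featuretools methods. Paths that revisit a node are skipped, the backward search runs the same generator on reversed edges, and the uniqueness check pulls at most two paths, so the search stops early.

```python
from itertools import islice
from typing import Dict, Iterator, List, Tuple

Graph = Dict[str, List[str]]


def find_paths(graph: Graph, start: str, goal: str) -> Iterator[Tuple[str, ...]]:
    """Yield each cycle-free path from start to goal, one at a time."""
    def walk(node: str, path: Tuple[str, ...]) -> Iterator[Tuple[str, ...]]:
        if node == goal:
            yield path
            return
        for nxt in graph.get(node, []):
            if nxt in path:
                # Skip any path that would revisit a node (i.e. contain a cycle).
                continue
            yield from walk(nxt, path + (nxt,))

    yield from walk(start, (start,))


def reverse_graph(graph: Graph) -> Graph:
    """Flip every edge so the same generator can search in the backward direction."""
    reversed_graph: Graph = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reversed_graph.setdefault(dst, []).append(src)
    return reversed_graph


def has_unique_path(graph: Graph, start: str, goal: str) -> bool:
    # Pull at most two paths; the generator is never asked for a third.
    return len(list(islice(find_paths(graph, start, goal), 2))) == 1


g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(list(find_paths(g, "a", "d")))
```

Because cyclic paths are skipped, a start/goal pair whose only alternative route goes around a cycle still counts as unique, matching the caveat in the commit message.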
This can be inferred from relationship_path, but we don't want to change the API.
Between the same entities.
There is no reason to override the parent method here.
These were previously overlooked, and untested.
Left one minor comment, but otherwise I think this looks good!
This is equivalent, but makes the flow clearer.
Looks great. Go ahead and merge!
See #543 for context.