-
Notifications
You must be signed in to change notification settings - Fork 879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compare entities by ID in DFS #637
Conversation
This is much faster than fully comparing the entities.
Codecov Report
@@ Coverage Diff @@
## master #637 +/- ##
==========================================
- Coverage 97.43% 97.42% -0.02%
==========================================
Files 118 118
Lines 9538 9539 +1
==========================================
Hits 9293 9293
- Misses 245 246 +1
Continue to review full report at Codecov.
|
@@ -108,7 +108,8 @@ def _is_unique(self): | |||
"""Is there any other relationship with same parent and child entities?""" | |||
es = self.child_entity.entityset | |||
relationships = es.get_forward_relationships(self._child_entity_id) | |||
n = len([r for r in relationships if r.parent_entity == self.parent_entity]) | |||
n = len([r for r in relationships | |||
if r._parent_entity_id == self._parent_entity_id]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case wasn't causing the observed performance drop, but I also don't see any reason to compare the actual entities here.
Even though |
for v in self.variables: | ||
if v not in other.variables: | ||
return False | ||
if set(self.variables) != set(other.variables): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed most of the places where this would be hit, but I thought I might as well make this comparison faster while we're looking at it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
docs/source/changelog.rst
Outdated
@@ -14,6 +14,7 @@ Changelog | |||
* Keep dataframe sorted by time during feature calculation (:pr:`626`) | |||
* Fix bug in encode_features that created duplicate columns of | |||
features with multiple outputs (:pr:`622`) | |||
* Fix performance regression in DFS (:pr:`637`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to Future Release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is much faster than fully comparing the entities.