Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compare entities by ID in DFS #637

Merged
merged 13 commits into from Jul 8, 2019
3 changes: 2 additions & 1 deletion docs/source/changelog.rst
Expand Up @@ -5,6 +5,7 @@ Changelog
**Future Release**
* Enhancements
* Fixes
* Fix performance regression in DFS (:pr:`637`)
* Changes
* Moved dask, distributed imports (:pr:`634`)
* Documentation Changes
Expand All @@ -13,7 +14,7 @@ Changelog
* Miscellaneous changes ()

Thanks to the following people for contributing to this release:
:user:`rwedge`, :user:`gsheni`
:user:`rwedge`, :user:`gsheni`, :user:`CJStadler`

**v0.9.1 July 3, 2019**
* Enhancements
Expand Down
5 changes: 2 additions & 3 deletions featuretools/entityset/entity.py
Expand Up @@ -117,9 +117,8 @@ def __eq__(self, other, deep=False):
return False
if len(self.variables) != len(other.variables):
return False
for v in self.variables:
if v not in other.variables:
return False
if set(self.variables) != set(other.variables):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed most of the places where this would be hit, but I thought I might as well make this comparison faster while we're looking at it.

return False
if deep:
if self.last_time_index is None and other.last_time_index is not None:
return False
Expand Down
3 changes: 2 additions & 1 deletion featuretools/entityset/relationship.py
Expand Up @@ -108,7 +108,8 @@ def _is_unique(self):
"""Is there any other relationship with same parent and child entities?"""
es = self.child_entity.entityset
relationships = es.get_forward_relationships(self._child_entity_id)
n = len([r for r in relationships if r.parent_entity == self.parent_entity])
n = len([r for r in relationships
if r._parent_entity_id == self._parent_entity_id])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case wasn't causing the observed performance drop, but I also don't see any reason to compare the actual entities here.


assert n > 0, 'This relationship is missing from the entityset'

Expand Down
8 changes: 4 additions & 4 deletions featuretools/feature_base/feature_base.py
Expand Up @@ -397,10 +397,10 @@ def __init__(self, base_feature, child_entity, relationship=None, name=None):
def _handle_relationship(self, child_entity, relationship):
if relationship:
relationship_child = relationship.child_entity
assert child_entity == relationship_child, \
assert child_entity.id == relationship_child.id, \
'child_entity must be the relationship child entity'

assert self.parent_entity == relationship.parent_entity, \
assert self.parent_entity.id == relationship.parent_entity.id, \
'Base feature must be defined on the relationship parent entity'
else:
child_relationships = child_entity.entityset.get_forward_relationships(child_entity.id)
Expand Down Expand Up @@ -528,11 +528,11 @@ def _handle_relationship_path(self, parent_entity, relationship_path):

_is_forward, first_relationship = relationship_path[0]
first_parent = first_relationship.parent_entity
assert parent_entity == first_parent, \
assert parent_entity.id == first_parent.id, \
'parent_entity must match first relationship in path.'

_is_forward, last_relationship = relationship_path[-1]
assert self.child_entity == last_relationship.child_entity, \
assert self.child_entity.id == last_relationship.child_entity.id, \
'Base feature must be defined on the entity at the end of relationship_path'

path_is_unique = parent_entity.entityset \
Expand Down
3 changes: 3 additions & 0 deletions featuretools/variable_types/variable.py
Expand Up @@ -43,6 +43,9 @@ def __eq__(self, other):
self.id == other.id and \
self.entity_id == other.entity_id

def __hash__(self):
return hash((self.id, self.entity_id))

def __repr__(self):
ret = u"<Variable: {} (dtype = {})>".format(self.name, self.type_string)

Expand Down