Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compare entities by ID in DFS #637

Merged
merged 13 commits into from
Jul 8, 2019
1 change: 1 addition & 0 deletions docs/source/changelog.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ Changelog
* Keep dataframe sorted by time during feature calculation (:pr:`626`)
* Fix bug in encode_features that created duplicate columns of
features with multiple outputs (:pr:`622`)
* Fix performance regression in DFS (:pr:`637`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to Future Release

* Changes
* Remove unused variance_selection.py file (:pr:`613`)
* Remove Timedelta data param (:pr:`619`)
Expand Down
5 changes: 2 additions & 3 deletions featuretools/entityset/entity.py
Original file line number Diff line number Diff line change
Expand Up @@ -117,9 +117,8 @@ def __eq__(self, other, deep=False):
return False
if len(self.variables) != len(other.variables):
return False
for v in self.variables:
if v not in other.variables:
return False
if set(self.variables) != set(other.variables):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed most of the places where this would be hit, but I thought I might as well make this comparison faster while we're looking at it.

return False
if deep:
if self.last_time_index is None and other.last_time_index is not None:
return False
Expand Down
3 changes: 2 additions & 1 deletion featuretools/entityset/relationship.py
Original file line number Diff line number Diff line change
Expand Up @@ -108,7 +108,8 @@ def _is_unique(self):
"""Is there any other relationship with same parent and child entities?"""
es = self.child_entity.entityset
relationships = es.get_forward_relationships(self._child_entity_id)
n = len([r for r in relationships if r.parent_entity == self.parent_entity])
n = len([r for r in relationships
if r._parent_entity_id == self._parent_entity_id])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case wasn't causing the observed performance drop, but I also don't see any reason to compare the actual entities here.


assert n > 0, 'This relationship is missing from the entityset'

Expand Down
8 changes: 4 additions & 4 deletions featuretools/feature_base/feature_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -397,10 +397,10 @@ def __init__(self, base_feature, child_entity, relationship=None, name=None):
def _handle_relationship(self, child_entity, relationship):
if relationship:
relationship_child = relationship.child_entity
assert child_entity == relationship_child, \
assert child_entity.id == relationship_child.id, \
'child_entity must be the relationship child entity'

assert self.parent_entity == relationship.parent_entity, \
assert self.parent_entity.id == relationship.parent_entity.id, \
'Base feature must be defined on the relationship parent entity'
else:
child_relationships = child_entity.entityset.get_forward_relationships(child_entity.id)
Expand Down Expand Up @@ -528,11 +528,11 @@ def _handle_relationship_path(self, parent_entity, relationship_path):

_is_forward, first_relationship = relationship_path[0]
first_parent = first_relationship.parent_entity
assert parent_entity == first_parent, \
assert parent_entity.id == first_parent.id, \
'parent_entity must match first relationship in path.'

_is_forward, last_relationship = relationship_path[-1]
assert self.child_entity == last_relationship.child_entity, \
assert self.child_entity.id == last_relationship.child_entity.id, \
'Base feature must be defined on the entity at the end of relationship_path'

path_is_unique = parent_entity.entityset \
Expand Down
3 changes: 3 additions & 0 deletions featuretools/variable_types/variable.py
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,9 @@ def __eq__(self, other):
self.id == other.id and \
self.entity_id == other.entity_id

def __hash__(self):
return hash((self.id, self.entity_id))

def __repr__(self):
ret = u"<Variable: {} (dtype = {})>".format(self.name, self.type_string)

Expand Down