-
Notifications
You must be signed in to change notification settings - Fork 883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove unnecessary sorting from normalize_entity #535
Conversation
We can assume the base entity dataframe is sorted by time index, so if we are using the same time index for the new entity then there is no need to sort.
@@ -730,7 +730,7 @@ def normalize_entity(self, base_entity_id, new_entity_id, index, | |||
|
|||
if isinstance(make_time_index, str): | |||
base_time_index = make_time_index | |||
new_entity_time_index = base_entity[make_time_index].id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason I can see for this is to check that the make_time_index
exists in the base entity (if it doesn't then this would raise). If that should be checked here then I think it would be better to add an explicit assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes let's add an explicit assertion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is already checked when the entity is created:
https://github.com/Featuretools/featuretools/blob/4b6a5a4d38de94a64c08ad15b5b8cefa9954b6df/featuretools/entityset/entity.py#L574-L575
Is it worthwhile to add an earlier check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's split this off to a separate issue to be dealt with later. Looks like there are two situations where we could improve the error message:
- Specifying an existing column for
make_time_index
but not adding that column toadditional_variables
orcopy_variables
- Setting
make_time_index
to a column that doesn't exist in the base frame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I can open an issue.
Codecov Report
@@ Coverage Diff @@
## master #535 +/- ##
==========================================
+ Coverage 96.26% 96.26% +<.01%
==========================================
Files 114 114
Lines 9253 9258 +5
==========================================
+ Hits 8907 8912 +5
Misses 346 346
Continue to review full report at Codecov.
|
If make_time_index=True then the base time index will be used, so the df is already sorted, but it will be given a new name.
@@ -790,6 +792,7 @@ def normalize_entity(self, base_entity_id, new_entity_id, index, | |||
new_entity_id, | |||
new_entity_df, | |||
index, | |||
already_sorted=already_sorted, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update the normalize_entityset
tests to confirm that created time indexes are sorted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
We can assume the base entity dataframe is sorted by time index, so if
we are using the same time index for the new entity then there is no
need to sort.