Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary sorting from normalize_entity #535

Merged
merged 3 commits into from
May 9, 2019

Conversation

CJStadler
Copy link
Contributor

We can assume the base entity dataframe is sorted by time index, so if
we are using the same time index for the new entity then there is no
need to sort.

We can assume the base entity dataframe is sorted by time index, so if
we are using the same time index for the new entity then there is no
need to sort.
@@ -730,7 +730,7 @@ def normalize_entity(self, base_entity_id, new_entity_id, index,

if isinstance(make_time_index, str):
base_time_index = make_time_index
new_entity_time_index = base_entity[make_time_index].id
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason I can see for this is to check that the make_time_index exists in the base entity (if it doesn't then this would raise). If that should be checked here then I think it would be better to add an explicit assertion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes let's add an explicit assertion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is already checked when the entity is created:
https://github.com/Featuretools/featuretools/blob/4b6a5a4d38de94a64c08ad15b5b8cefa9954b6df/featuretools/entityset/entity.py#L574-L575
Is it worthwhile to add an earlier check?

Copy link
Contributor

@rwedge rwedge May 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's split this off to a separate issue to be dealt with later. Looks like there are two situations where we could improve the error message:

  • Specifying an existing column for make_time_index but not adding that column to additional_variables or copy_variables
  • Setting make_time_index to a column that doesn't exist in the base frame

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I can open an issue.

@codecov
Copy link

codecov bot commented May 9, 2019

Codecov Report

Merging #535 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #535      +/-   ##
==========================================
+ Coverage   96.26%   96.26%   +<.01%     
==========================================
  Files         114      114              
  Lines        9253     9258       +5     
==========================================
+ Hits         8907     8912       +5     
  Misses        346      346
Impacted Files Coverage Δ
featuretools/entityset/entityset.py 95.08% <100%> (+0.02%) ⬆️
featuretools/tests/entityset_tests/test_es.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b6a5a4...ba30d54. Read the comment docs.

If make_time_index=True then the base time index will be used, so the df
is already sorted, but it will be given a new name.
@@ -790,6 +792,7 @@ def normalize_entity(self, base_entity_id, new_entity_id, index,
new_entity_id,
new_entity_df,
index,
already_sorted=already_sorted,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update the normalize_entityset tests to confirm that created time indexes are sorted

@CJStadler CJStadler requested a review from rwedge May 9, 2019 18:43
Copy link
Contributor

@rwedge rwedge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@CJStadler CJStadler merged commit a7fcbd8 into master May 9, 2019
@CJStadler CJStadler deleted the remove-normalize-sort branch May 9, 2019 18:46
@rwedge rwedge mentioned this pull request May 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants