Change normalize_entity to update secondary_time_index #59
Codecov Report
@@ Coverage Diff @@
## master #59 +/- ##
==========================================
+ Coverage 87.19% 87.36% +0.16%
==========================================
Files 74 74
Lines 6973 6986 +13
==========================================
+ Hits 6080 6103 +23
+ Misses 893 883 -10
@Seth-Rothschild can you write a test for this?
@Seth-Rothschild thanks for writing the test. this looks good to me to merge. any last concerns?
The only thing I'm worried about is the standard format for the secondary time index. Right now it's set explicitly as
es = entityset
es.normalize_entity('log', 'values', 'value',
                    make_time_index=True,
                    make_secondary_time_index={'datetime': []},
This should be a dictionary mapping a date column to be used as the secondary time index column, to all the columns that became "known" or were recorded in the data at that time. To test this functionality (which I can see would not get handled properly) you should include an additional column to get added to the secondary time index of the new entity.
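As a rough illustration of the dictionary shape described in this comment, here is a minimal sketch in plain Python. The column names `comments` and `value_2` and the helper `columns_known_at` are hypothetical, chosen only for the example; they are not taken from the PR or from the featuretools API.

```python
# Hypothetical column names, for illustration only.
# Key: the date column used as the secondary time index.
# Value: the columns whose values only became "known" at that time.
make_secondary_time_index = {
    "datetime": ["comments", "value_2"],
}


def columns_known_at(secondary_time_index, time_column):
    """Return a copy of the columns recorded at `time_column`.

    Returns an empty list when `time_column` is not a secondary time index.
    """
    return list(secondary_time_index.get(time_column, []))
```

A test along these lines would then check that the extra columns, and not just the date column, end up in the new entity's secondary time index.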
Good point. I was using the secondary_time_index to create a "time_last" time index, but I'll pull along other columns in the test.
featuretools/entityset/entityset.py
Outdated
@@ -801,6 +802,8 @@ def normalize_entity(self, base_entity_id, new_entity_id, index,
        self.delete_entity_variables(base_entity_id, additional_variables)

        new_entity = self.entity_stores[new_entity_id]
        if make_secondary_time_index:
            new_entity.secondary_time_index = {secondary_time_index: [secondary_time_index]}
This should follow the logic in BaseEntity.__init__(). You'll want to append all the variables included in the values of the make_secondary_time_index dict here, except for the time index itself, whose name changed and is already in the list. To be complete, we should probably allow for multiple secondary time indices too, right? The dictionary format allows an arbitrary number of them. In this case the new_entity_secondary_time_index parameter to normalize_entity() would need to be a dictionary mapping the column names in the base entity to the ones you want in the new entity.
So, if I'm understanding correctly, these two lines should be changed to
    new_entity.secondary_time_index = secondary_time_index or {}
    for ti, cols in new_entity.secondary_time_index.items():
        if ti not in cols:
            cols.append(ti)
It seems like multiple secondary time indices are also asserted against with
    assert len(make_secondary_time_index) == 1, "Can only provide 1 secondary time index"
in entityset.py.
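The loop proposed in this comment can be tried outside featuretools. The following is a minimal standalone sketch, assuming the secondary time index is a plain dict of lists; the function name `with_time_index_in_columns` is hypothetical, and unlike the snippet in the comment it copies the input instead of mutating it.

```python
def with_time_index_in_columns(secondary_time_index=None):
    """Ensure each secondary time index column appears in its own column list.

    Hypothetical helper, not part of featuretools. Works on a copy so the
    caller's dictionary is left unchanged.
    """
    # Treat None as "no secondary time index", mirroring `... or {}`.
    result = {ti: list(cols) for ti, cols in (secondary_time_index or {}).items()}
    for ti, cols in result.items():
        if ti not in cols:
            cols.append(ti)
    return result
```

With the `{'datetime': []}` value used in the test snippet earlier in the thread, this yields `{'datetime': ['datetime']}`, which has the same shape as the dictionary built explicitly in the outdated diff.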
@bschreck I think this is probably clean enough with the latest commit. We might consider making a long term issue to handle multiple secondary time indices, but that seems out of scope for this fix.
Yup I agree!
looks good to merge to me!
For issue #58: the secondary_time_index attribute of an entityset isn't properly updated if it's set explicitly with new_entity_secondary_time_index in normalize_entity. This is a fix, but there might be a cleaner one.