Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change normalize_entity to update secondary_time_index #59

Merged
merged 8 commits into from
Jan 11, 2018

Conversation

Seth-Rothschild
Copy link
Contributor

For issue #58 the secondary_time_index attribute of an entityset isn't properly updated if it's set explicitly with new_entity_secondary_tiime_index in normalize_entity. This is a fix but there might be a cleaner one.

@codecov-io
Copy link

codecov-io commented Jan 4, 2018

Codecov Report

Merging #59 into master will increase coverage by 0.16%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #59      +/-   ##
==========================================
+ Coverage   87.19%   87.36%   +0.16%     
==========================================
  Files          74       74              
  Lines        6973     6986      +13     
==========================================
+ Hits         6080     6103      +23     
+ Misses        893      883      -10
Impacted Files Coverage Δ
tests/entityset_tests/test_pandas_es.py 99.74% <0%> (ø) ⬆️
entityset/entity.py 76.9% <0%> (+0.29%) ⬆️
entityset/entityset.py 90.02% <0%> (+2.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b8919f2...a0457f5. Read the comment docs.

@kmax12
Copy link
Contributor

kmax12 commented Jan 4, 2018

@Seth-Rothschild can you write a test for this?

@kmax12
Copy link
Contributor

kmax12 commented Jan 4, 2018

@Seth-Rothschild thanks for writing the test. this looks good to me to merge. any last concerns?

@Seth-Rothschild
Copy link
Contributor Author

The only thing I'm worried about is the standard format for the secondary time index. Right now it's set explicitly as {'column_name': 'column_name'} both in normalize_entity and the test, but since it's a dictionary I don't necessarily trust that's the only form it can take.

es = entityset
es.normalize_entity('log', 'values', 'value',
make_time_index=True,
make_secondary_time_index={'datetime': []},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a dictionary mapping a date column to be used as the secondary time index column, to all the columns that became "known" or were recorded in the data at that time. To test this functionality (which I can see would not get handled properly) you should include an additional column to get added to the secondary time index of the new entity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I was using the secondary_time_index to create a "time_last" time index, but I'll pull along other columns in the test.

@@ -801,6 +802,8 @@ def normalize_entity(self, base_entity_id, new_entity_id, index,
self.delete_entity_variables(base_entity_id, additional_variables)

new_entity = self.entity_stores[new_entity_id]
if make_secondary_time_index:
new_entity.secondary_time_index = {secondary_time_index: [secondary_time_index]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should follow the logic in BaseEntity.__init__(). You'll want to append all the variables included in the values of the make_secondary_time_index dict here, except for the time index itself whose name changed and is already in the list. To be complete, we should probably allow for multiple secondary_time_index's too, right? The dictionary format allows an arbitrary number of them. In this case the new_entity_secondary_time_index parameter to normalize_entity() would need to be a dictionary mapping the column names in the base entity to the ones you want in the new entity.

Copy link
Contributor Author

@Seth-Rothschild Seth-Rothschild Jan 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if I'm understanding correctly, these two lines should be changed to

new_entity.secondary_time_index = secondary_time_index or {}
for ti, cols in new_entity.secondary_time_index.items():
    if ti not in cols:
        cols.append(ti)

It seems like multiple secondary time index is also asserted against with

assert len(make_secondary_time_index) == 1, "Can only provide 1 secondary time index"

in entityset.py.

@Seth-Rothschild
Copy link
Contributor Author

@bschreck I think this is is probably clean enough with the latest commit. We might consider making a long term issue to handle multiple secondary time indices, but that seems out of scope for this fix.

@bschreck
Copy link
Contributor

Yup I agree!

@kmax12
Copy link
Contributor

kmax12 commented Jan 11, 2018

looks good to merge to me!

@kmax12 kmax12 merged commit ec9593d into master Jan 11, 2018
@Seth-Rothschild Seth-Rothschild deleted the second_time_index_fix branch January 11, 2018 21:33
@rwedge rwedge mentioned this pull request Jan 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants