Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved handling of integer time index #128

Merged
merged 27 commits into from Apr 6, 2018
Merged

Conversation

rwedge
Copy link
Contributor

@rwedge rwedge commented Apr 2, 2018

Addresses #99 and #125.

  • Time values in cutoff_time that are not numeric or datetime will raise an error
  • If target entity has an integer time index, feature matrix will also have integer time index
  • Extra columns in cutoff_time will now pass through correctly when using integer time index
  • Incompatible time index and cutoff time types will raise an error

@codecov-io
Copy link

codecov-io commented Apr 2, 2018

Codecov Report

Merging #128 into master will increase coverage by 0.35%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
+ Coverage   88.47%   88.83%   +0.35%     
==========================================
  Files          73       73              
  Lines        7558     7730     +172     
==========================================
+ Hits         6687     6867     +180     
+ Misses        871      863       -8
Impacted Files Coverage Δ
featuretools/tests/testing_utils/mock_ds.py 87.02% <100%> (+1.07%) ⬆️
featuretools/variable_types/variable.py 79.06% <100%> (+0.24%) ⬆️
...computational_backends/calculate_feature_matrix.py 98.63% <100%> (+0.04%) ⬆️
...utational_backend/test_calculate_feature_matrix.py 99.11% <100%> (+0.18%) ⬆️
...aturetools/tests/entityset_tests/test_pandas_es.py 99.76% <100%> (+0.03%) ⬆️
featuretools/tests/dfs_tests/test_dfs_method.py 98.07% <100%> (+0.03%) ⬆️
featuretools/entityset/entity.py 79.37% <100%> (+1.3%) ⬆️
featuretools/utils/wrangle.py 53.5% <100%> (+3.04%) ⬆️
featuretools/entityset/base_entityset.py 91.16% <100%> (+0.04%) ⬆️
featuretools/entityset/entityset.py 90.26% <100%> (-0.03%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 59fc551...4fa9090. Read the comment docs.

if isinstance(group[target_time].iloc[0], _numeric_types):
grouped = [[np.inf, group]]
else:
grouped = [[datetime.now(), group]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move this datetime.now() outside of calc_results. basically, we want every chunk that's part of a single cfm call to use the same time.

@@ -591,25 +591,19 @@ def _filter_and_sort(self, df, time_last=None,
"""
if self.time_index:
if time_last is not None and not df.empty:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible for time last to be none by this point in the code? If not, let's remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entity.get_all_instances and a few others call this without specifying a time last

@@ -110,6 +113,11 @@ def calculate_feature_matrix(features, cutoff_time=None, instance_ids=None,
if not isinstance(cutoff_time, pd.DataFrame):
if cutoff_time is None:
cutoff_time = datetime.now()
# if integer time index, use max value as cutoff time instead
if target_entity.time_index:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the target entity might not have a time index, but some other entity in the entityset might. perhaps it makes sense to add a method or attribute to an entityset that returns if it is a datetime or numeric time index.

we should assume all entities in an entityset are using the same units for the time index, so this method could do the check when being initialized or when set_time_index is called.

@@ -291,6 +291,11 @@ class TimeIndex(Variable):
_dtype_repr = "time_index"


class NumericTimeIndex(TimeIndex, Numeric):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add to API reference in documentation?

@kmax12
Copy link
Contributor

kmax12 commented Apr 6, 2018

Looks great. Merging

@kmax12 kmax12 changed the title Integer time index bugfixes Improved handling of integer time index Apr 6, 2018
@kmax12 kmax12 merged commit 906777b into master Apr 6, 2018
@rwedge rwedge mentioned this pull request Apr 13, 2018
rwedge added a commit that referenced this pull request Apr 13, 2018
**v0.1.20** Apr 13, 2018
* Improved chunking when calculating feature matrices  (#121)
* Primitives as strings in DFS parameters (#129)
* Integer time index bugfixes (#128)
* Add make_temporal_cutoffs utility function (#126)
* Show all entities, switch shape display to row/col (#124)
* fixed num characters nan fix (#118)
* modify ignore_variables docstring (#117)
@rwedge rwedge deleted the fix_int_time_index_cfm_bug branch June 3, 2019 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants