Remove list of times as an option for cutoff_time in calculate_feature_matrix #165

Merged
kmax12 merged 2 commits into master from no-cutoff-time-list on Jun 8, 2018

rwedge
Contributor

@rwedge rwedge commented Jun 7, 2018

Passing a list of times as cutoff_time produced unexpected behavior when instance_ids was not a list of the same length. Some examples:

ft.calculate_feature_matrix(features=features,
                            entityset=es,
                            instance_ids=[0, 1, 2],
                            cutoff_time=[0, 1])

This calculates features for instance 0 at time 0, instance 1 at time 1, and doesn't calculate features for instance 2.

ft.calculate_feature_matrix(features=features,
                            entityset=es,
                            instance_ids=[0, 1],
                            cutoff_time=[0, 1, 2])

This calculates features for instance 0 at time 0 and instance 1 at time 1, and never uses time 2.
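
Both outcomes described so far match positional pairing that stops at the shorter list. The plain-Python sketch below reproduces that pairing with `zip` purely as an illustration. Whether featuretools used `zip` internally is not stated in this PR, and the variable names are made up for the example.

```python
# Illustration only: positional pairing that stops at the shorter input.
# This is NOT featuretools code; it just mirrors the outcomes described above.

# Case 1: more instances than cutoff times.
instance_ids = [0, 1, 2]
cutoff_times = [0, 1]
pairs_case_1 = list(zip(instance_ids, cutoff_times))
# Instances left without a cutoff time (no error is raised by zip).
unpaired_instances = instance_ids[len(pairs_case_1):]

# Case 2: more cutoff times than instances.
instance_ids_2 = [0, 1]
cutoff_times_2 = [0, 1, 2]
pairs_case_2 = list(zip(instance_ids_2, cutoff_times_2))
# Cutoff times that are never used.
unused_times = cutoff_times_2[len(pairs_case_2):]

print(pairs_case_1, unpaired_instances)
print(pairs_case_2, unused_times)
```

In both cases the extra element is dropped without any warning, which is the kind of surprise this PR removes.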

ft.calculate_feature_matrix(features=features,
                            entityset=es,
                            cutoff_time=[0, 1])

This calculates features for the first instance in the entity's dataframe at time 0 and for the second instance in the entity's dataframe at time 1.

Requiring a list of instance_ids with the same length as the list of cutoff times is one way to resolve this issue. However, equal-length lists of instances and times are easily represented as a DataFrame, and removing the list option leaves one less input case to handle.
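
As a rough sketch of what a caller might do instead, the helper below pairs each instance id with its cutoff time and refuses mismatched lengths, producing rows that can be turned into a DataFrame. The helper name `build_cutoff_rows` and the column names `instance_id` and `time` are assumptions made for this example; check the featuretools documentation for the column layout your version expects.

```python
# Hypothetical helper, not part of featuretools: builds explicit
# (instance id, cutoff time) rows and fails loudly on a length mismatch.

def build_cutoff_rows(instance_ids, cutoff_times):
    """Pair each instance id with its cutoff time, one row per pair."""
    if len(instance_ids) != len(cutoff_times):
        raise ValueError(
            "instance_ids and cutoff_times must have the same length "
            f"(got {len(instance_ids)} and {len(cutoff_times)})"
        )
    # Column names here are assumed for illustration.
    return [
        {"instance_id": instance_id, "time": time}
        for instance_id, time in zip(instance_ids, cutoff_times)
    ]


rows = build_cutoff_rows([0, 1], [0, 1])
print(rows)

# A mismatch now raises instead of silently dropping an element.
try:
    build_cutoff_rows([0, 1, 2], [0, 1])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```

The rows can then be wrapped with `pandas.DataFrame(rows)` and passed as `cutoff_time`, so every instance is explicitly tied to the time it should be evaluated at.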

@codecov-io

codecov-io commented Jun 7, 2018

Codecov Report

Merging #165 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #165      +/-   ##
==========================================
+ Coverage    92.9%   92.92%   +0.01%     
==========================================
  Files          72       72              
  Lines        7753     7785      +32     
==========================================
+ Hits         7203     7234      +31     
- Misses        550      551       +1

Impacted Files | Coverage Δ
...computational_backends/calculate_feature_matrix.py | 97.87% <100%> (ø) ⬆️
...utational_backend/test_calculate_feature_matrix.py | 99.58% <100%> (+0.02%) ⬆️
featuretools/utils/gen_utils.py | 64.44% <0%> (-2.23%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update aabdfb9...a09c760.

@kmax12
Contributor

kmax12 commented Jun 8, 2018

Looks good to me!

@kmax12 kmax12 merged commit 80286b6 into master Jun 8, 2018
@rwedge rwedge mentioned this pull request Jun 22, 2018
@rwedge rwedge deleted the no-cutoff-time-list branch June 10, 2019 14:31