Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move query_by_values to EntitySet #1251

Merged
merged 10 commits into from Dec 15, 2020
Merged

Move query_by_values to EntitySet #1251

merged 10 commits into from Dec 15, 2020

Conversation

thehomebrewnerd
Copy link
Collaborator

Move query_by_values to EntitySet

Moves the query_by_values method from Entity to EntitySet. Also moves the method _vals_to_series to EntitySet as this method was only called from inside query_by_values.

Closes #1244

@codecov
Copy link

codecov bot commented Dec 8, 2020

Codecov Report

Merging #1251 (7e180be) into main (41cb8a4) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1251      +/-   ##
==========================================
+ Coverage   98.34%   98.35%   +0.01%     
==========================================
  Files         134      134              
  Lines       14430    14439       +9     
==========================================
+ Hits        14191    14202      +11     
+ Misses        239      237       -2     
Impacted Files Coverage Δ
featuretools/entityset/entity.py 94.26% <ø> (-0.08%) ⬇️
featuretools/tests/entityset_tests/test_entity.py 100.00% <ø> (ø)
...s/computational_backends/feature_set_calculator.py 98.69% <100.00%> (ø)
featuretools/entityset/entityset.py 97.80% <100.00%> (+0.21%) ⬆️
featuretools/tests/entityset_tests/test_es.py 98.84% <100.00%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 41cb8a4...7e180be. Read the comment docs.

Copy link
Collaborator

@rwedge rwedge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple if statements in _vals_to_series aren't getting hit. Are there input combinations to featuretools that would result in these lines getting hit?

@thehomebrewnerd
Copy link
Collaborator Author

A couple if statements in _vals_to_series aren't getting hit. Are there input combinations to featuretools that would result in these lines getting hit?

I can take a closer look. This code was copied directly from the code that was in Entity, so I'm assuming it was uncovered there as well. I'll confirm.

If it was uncovered previously, do you want to try to add new tests to fix the coverage as part of this PR?

@thehomebrewnerd
Copy link
Collaborator Author

A couple if statements in _vals_to_series aren't getting hit. Are there input combinations to featuretools that would result in these lines getting hit?

I can take a closer look. This code was copied directly from the code that was in Entity, so I'm assuming it was uncovered there as well. I'll confirm.

If it was uncovered previously, do you want to try to add new tests to fix the coverage as part of this PR?

After further review, these lines also are uncovered currently in main. Looking at the code flow, I don't think it is possible for these lines to ever get hit during normal circumstances.

First of all, we specify that instance ids passed to calculate_feature_matirx should be of type list in the docstring. However, even if a single value were passed, this would get converted to a series here:

# convert list or range object into series
if not isinstance(instance_ids, pd.Series):
instance_ids = pd.Series(instance_ids)

If the user does not specify instance ids, we grab a series from the dataframe here:

if instance_ids is None:
index_var = target_entity.index
df = target_entity._handle_time(target_entity.df,
time_last=cutoff_time,
training_window=training_window,
include_cutoff_time=include_cutoff_time)
instance_ids = df[index_var]

So, given those two scenarios I don't think it is possible for our normal code flow to ever call query_by_values and consequently _vals_to_series with instances that are anything but a pd.Series. We do have several tests that call query_by_values passing a list of instance values.

Unless I have missed something, I do not believe that the uncovered code is necessary. The documentation says that the user should pass a list for instance ids. Even if a single value is passed, this will get converted to a series by the time _vals_to_series gets called. I haven't tested what would happen if a DataFrame is passed, but since we are not telling users that is supported in the documentation, I'm not sure we need to handle that anyway.

With all this in mind we could:

  1. Delete the uncovered code
  2. Add a couple new tests that use a single instance value and a DataFrame

I think the first options is better as it removes what appears to be dead code. Thoughts?

@rwedge
Copy link
Collaborator

rwedge commented Dec 10, 2020

It's dead code as far as the featuretools code flow is concerned, but those input options are described in the query_by_values docstring as valid inputs. Since query_by_values is a public method, it could be getting used. I know someone on the slack channel ended up using query by values for their purposes.

What if we added some tests now, and made an issue about removing those input options in the future? See if we get any feedback.

@thehomebrewnerd
Copy link
Collaborator Author

It's dead code as far as the featuretools code flow is concerned, but those input options are described in the query_by_values docstring as valid inputs. Since query_by_values is a public method, it could be getting used. I know someone on the slack channel ended up using query by values for their purposes.

What if we added some tests now, and made an issue about removing those input options in the future? See if we get any feedback.

The tests should be easy enough to add. As a side note to all this, query_by_values is not currently listed in the API Reference. Do you want it added there as well?

@rwedge
Copy link
Collaborator

rwedge commented Dec 10, 2020

Let's leave it out for now

@@ -6,13 +6,14 @@ Release Notes
* Enhancements
* Fixes
* Changes
* Move ``query_by_values`` method from ``Entity`` to ``EntitySet`` (:pr:`1251`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add a "Breaking Changes" section to highlight the move

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

rwedge
rwedge approved these changes Dec 15, 2020
Copy link
Collaborator

@rwedge rwedge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good once updated to even with main

@thehomebrewnerd thehomebrewnerd merged commit 8e82989 into main Dec 15, 2020
1 check passed
@thehomebrewnerd thehomebrewnerd deleted the move-query-by-vals branch December 15, 2020 16:56
@rwedge rwedge mentioned this pull request Dec 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move query_by_values from Entity to EntitySet
2 participants