Allow where clauses on direct features in Deep Feature Synthesis #279

Merged: kmax12 merged 4 commits into master from interesting-values-direct-features on Oct 15, 2018

@kmax12 kmax12 commented Oct 7, 2018

DFS automatically adds where clauses to aggregation features based on the values in the interesting_values property of another variable within that entity.

This PR allows DFS to add where clauses using the interesting values of a direct feature. To accomplish this, I added a variable property to direct features; previously it was defined only for identity features.
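To illustrate what a where clause on a direct feature computes, here is a minimal plain-Python sketch that does not use featuretools at all. The toy tables, the department labels, and the helper name `count_where_direct` are all hypothetical and only loosely modelled on the mock ecommerce dataset; the actual DFS implementation may differ.

```python
from collections import Counter

# Hypothetical parent table: product id -> department.
products = {
    "car": "electronics",
    "toothpaste": "health",
    "coke zero": "food",
}

# Hypothetical child table: one row per logged event.
log = [
    {"session_id": 0, "product_id": "coke zero"},
    {"session_id": 0, "product_id": "car"},
    {"session_id": 0, "product_id": "coke zero"},
    {"session_id": 1, "product_id": "toothpaste"},
    {"session_id": 1, "product_id": "coke zero"},
]

# Values of the direct feature that we want where clauses for.
interesting_values = ["food", "electronics"]


def count_where_direct(log, products, interesting_values):
    """Count log rows per (session, department), restricted to interesting values.

    Conceptually this mirrors a feature such as
    COUNT(log WHERE products.department = <value>) computed for each session.
    """
    counts = Counter()
    for row in log:
        # The "direct feature": look up the parent's department for this row.
        department = products[row["product_id"]]
        # The "where clause": keep only rows matching an interesting value.
        if department in interesting_values:
            counts[(row["session_id"], department)] += 1
    return counts


counts = count_where_direct(log, products, interesting_values)
for (session_id, department), n in sorted(counts.items()):
    print(f"session {session_id}: COUNT(log WHERE department = {department}) = {n}")
```

Departments that are not listed as interesting (here, "health") produce no conditional feature, which is why the interesting values need to be visible on the direct feature in the first place.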

First reported by @favstats on Stack Overflow: https://stackoverflow.com/questions/52673694/specifying-interesting-variables-with-featuretools-does-not-work

codecov-io commented Oct 7, 2018

Codecov Report

Merging #279 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #279      +/-   ##
==========================================
+ Coverage   94.45%   94.45%   +<.01%     
==========================================
  Files          71       71              
  Lines        7698     7705       +7     
==========================================
+ Hits         7271     7278       +7     
  Misses        427      427

| Impacted Files | Coverage Δ | |
|---|---|---|
| featuretools/tests/testing_utils/mock_ds.py | 87.4% <ø> (ø) | ⬆️ |
| featuretools/synthesis/deep_feature_synthesis.py | 93.29% <100%> (+0.01%) | ⬆️ |
| featuretools/primitives/direct_feature.py | 95.83% <100%> (+0.37%) | ⬆️ |
| ...ols/tests/dfs_tests/test_deep_feature_synthesis.py | 98.45% <100%> (ø) | ⬆️ |
| ...sts/feature_function_tests/test_direct_features.py | 100% <100%> (ø) | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fcb9cda...b1e7f9a. Read the comment docs.

favstats commented Oct 7, 2018

This fixed my issue! Thank you so much for the very quick help, this is really amazing!

@kmax12 kmax12 requested a review from rwedge October 10, 2018 21:49
@@ -34,6 +34,8 @@ def make_ecommerce_files(with_integer_time_index=False, base_path=None, file_loc
product_df = pd.DataFrame({'id': ['Haribo sugar-free gummy bears', 'car',
'toothpaste', 'brown bag', 'coke zero',
'taco clock'],
'department': ["food", "electronics", "food",

Maybe "health" for the toothpaste's department type?

@@ -544,6 +549,7 @@ def test_where_different_base_feats(es):
assert hashed not in where_feats


# TODO: not clear what this tests

Should we add this as a backlog issue?

kmax12 commented Oct 15, 2018

@rwedge I addressed your comments. Does this look good to merge?

rwedge commented Oct 15, 2018

Looks good

@kmax12 kmax12 merged commit fcc93e7 into master Oct 15, 2018
@gsheni gsheni deleted the interesting-values-direct-features branch October 24, 2018 15:37
@rwedge rwedge mentioned this pull request Oct 31, 2018