Fix missing groupby features #754

Merged
merged 7 commits into from Sep 25, 2019
1 change: 1 addition & 0 deletions docs/source/changelog.rst
@@ -13,6 +13,7 @@ Changelog
         * Added error message when DateTimeIndex is a variable but not set as the time_index (:pr:`723`)
         * Fixed CumCount and other group-by transform primitives that take ID as input (:pr:`733`)
         * Fix progress bar undercounting (:pr:`743`)
+        * Fixed issue where some group-by features weren't created (:pr:`754`)
         * Updated training_window error assertion to only check against observations (:pr:`728`)
         * Don't delete the whole destination folder while saving entityset (:pr:`717`)
     * Changes
12 changes: 10 additions & 2 deletions featuretools/synthesis/deep_feature_synthesis.py
@@ -517,16 +517,24 @@ def _build_transform_features(self, all_features, entity, max_depth=0,
                                                         entity,
                                                         new_max_depth,
                                                         input_types,
-                                                        groupby_prim,
-                                                        require_direct_input=require_direct_input)
+                                                        groupby_prim)
             # get IDs to use as groupby
             id_matches = self._features_by_type(all_features=all_features,
                                                 entity=entity,
                                                 max_depth=new_max_depth,
                                                 variable_type=set([Id]))
+            # If require_direct_input, require a DirectFeature in input or as a
+            # groupby, and don't create features of inputs/groupbys which are
+            # all direct features with the same relationship path
             for matching_input in matching_inputs:
                 if all(bf.number_output_features == 1 for bf in matching_input):
                     for id_groupby in id_matches:
+                        if require_direct_input and (
+                            _all_direct_and_same_path(matching_input + (id_groupby,)) or
+                            not any([isinstance(feature, DirectFeature) for
+                                     feature in (matching_input + (id_groupby, ))])
+                        ):
+                            continue
                         new_f = GroupByTransformFeature(list(matching_input),
                                                         groupby=id_groupby,
                                                         primitive=groupby_prim)
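
The skip condition in this hunk can be read in isolation as a small predicate over the inputs plus the groupby. The sketch below is a standalone illustration, not the featuretools implementation: `Feat`, `all_direct_and_same_path`, and `should_skip` are hypothetical stand-ins, and the behavior assumed for `_all_direct_and_same_path` is taken only from the code comment in the diff ("all direct features with the same relationship path").

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Feat:
    """Hypothetical stand-in for a featuretools feature."""
    name: str
    # Relationship path for a direct feature; None means "not a direct feature".
    direct_path: Optional[str] = None

    @property
    def is_direct(self) -> bool:
        return self.direct_path is not None


def all_direct_and_same_path(features: Tuple[Feat, ...]) -> bool:
    """Assumed semantics: every feature is direct and all share one path."""
    if not all(f.is_direct for f in features):
        return False
    return len({f.direct_path for f in features}) == 1


def should_skip(matching_input: Tuple[Feat, ...],
                id_groupby: Feat,
                require_direct_input: bool) -> bool:
    """Mirror the `continue` condition from the diff above."""
    if not require_direct_input:
        return False
    # The groupby is checked together with the inputs, as in the diff.
    candidates = tuple(matching_input) + (id_groupby,)
    return (all_direct_and_same_path(candidates)
            or not any(f.is_direct for f in candidates))
```

Under these assumptions, a direct input such as `products.rating` grouped by a non-direct `session_id` is kept, while a combination with no direct feature at all, or one where the inputs and the groupby are all direct along the same path, is skipped.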
11 changes: 11 additions & 0 deletions featuretools/tests/synthesis/test_deep_feature_synthesis.py
@@ -271,6 +271,17 @@ def test_make_groupby_features(es):
                               "CUM_SUM(value) by session_id"))


+def test_make_indirect_groupby_features(es):
+    dfs_obj = DeepFeatureSynthesis(target_entity_id='log',
+                                   entityset=es,
+                                   agg_primitives=[],
+                                   trans_primitives=[],
+                                   groupby_trans_primitives=['cum_sum'])
+    features = dfs_obj.build_features()
+    assert (feature_with_name(features,
+                              "CUM_SUM(products.rating) by session_id"))
+
+
 def test_make_groupby_features_with_id(es):
     dfs_obj = DeepFeatureSynthesis(target_entity_id='sessions',
                                    entityset=es,
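
For readers unfamiliar with the feature name asserted in the new test, "CUM_SUM(products.rating) by session_id" denotes a running total of `products.rating` computed separately within each `session_id`. The following is a minimal plain-Python sketch of that idea; `cum_sum_by` is a hypothetical helper, it does not use featuretools, and it makes no attempt to reproduce the library's handling of missing values or time ordering.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def cum_sum_by(values: Sequence[float],
               group_ids: Sequence[Hashable]) -> List[float]:
    """Running sum of `values`, restarted independently for each group id."""
    running = defaultdict(float)  # group id -> total seen so far
    out = []
    for value, group in zip(values, group_ids):
        running[group] += value
        out.append(running[group])
    return out
```

Rows keep their original order, and each output entry only reflects earlier rows from the same group.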