Remove transform stacking from DFS #1119
docs/source/changelog.rst
**Breaking Changes**

* ``ft.dfs`` will no longer build features from transform primitives where one of the
  inputs was also built with a transform primitive. This will make some
Would "where one of the inputs was a Transform feature, a GroupByTransform feature, or a Direct Feature of a Transform / GroupByTransform feature" be an accurate description?
yes, that would be the most descriptive way to say this
ended up adding this in
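The rule agreed on in this thread can be sketched as a small predicate. This is a minimal illustration only: the `Feature` class, its `kind` labels, and `blocks_transform_stacking` are hypothetical stand-ins, not featuretools classes or the code in this PR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    """Hypothetical stand-in for a feature definition (not a featuretools class)."""
    kind: str  # e.g. "identity", "transform", "groupby_transform", "direct", "aggregation"
    inputs: List["Feature"] = field(default_factory=list)


TRANSFORM_KINDS = {"transform", "groupby_transform"}


def blocks_transform_stacking(feature: Feature) -> bool:
    """Return True if a transform primitive should not be applied to this input."""
    if feature.kind in TRANSFORM_KINDS:
        return True
    if feature.kind == "direct":
        # A direct feature of a transform / groupby-transform feature is also excluded.
        return any(base.kind in TRANSFORM_KINDS for base in feature.inputs)
    return False


raw = Feature("identity")
transformed = Feature("transform", [raw])
direct_of_transform = Feature("direct", [transformed])
aggregated = Feature("aggregation", [raw])

print(blocks_transform_stacking(raw))                  # plain column: allowed as input
print(blocks_transform_stacking(direct_of_transform))  # excluded by the rule above
```

Under this reading, aggregation features and direct features of aggregations remain valid inputs to transform primitives.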
The code changes look to be in a good spot. Let's get it up to date with the main branch and see what's left.
Codecov Report
@@           Coverage Diff           @@
##             main    #1119   +/-   ##
=======================================
  Coverage   98.34%   98.35%
=======================================
  Files         126      126
  Lines       13268    13309    +41
=======================================
+ Hits        13049    13090    +41
  Misses        219      219
@@ -242,7 +246,7 @@ def __init__(self,
             self.ignore_entities,
             self.ignore_variables,
             self.es)
-        self.seed_features = seed_features or []
+        self.seed_features = sorted(seed_features or [], key=lambda f: f.primitive.name)
why not sort by unique name, for features?
Good point--changed
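To illustrate why the sort key matters here: two seed features can share a primitive, and a sort keyed on the primitive name then keeps whatever relative order the caller passed in. The `SeedFeature` tuple and its field names below are hypothetical stand-ins for illustration, not the featuretools API.

```python
from collections import namedtuple

# Hypothetical stand-in: a seed feature with a primitive name and a unique name.
SeedFeature = namedtuple("SeedFeature", ["primitive_name", "unique_name"])

a = SeedFeature("add_numeric", "quantity + unit_price")
b = SeedFeature("add_numeric", "quantity + total")

# Python's sort is stable, so ties on the primitive name preserve input order.
by_primitive_1 = sorted([a, b], key=lambda f: f.primitive_name)
by_primitive_2 = sorted([b, a], key=lambda f: f.primitive_name)

# Unique names differ per feature, so the result no longer depends on input order.
by_unique_1 = sorted([a, b], key=lambda f: f.unique_name)
by_unique_2 = sorted([b, a], key=lambda f: f.unique_name)

print(by_primitive_1 == by_primitive_2)  # input order leaks through
print(by_unique_1 == by_unique_2)        # deterministic ordering
```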
Looks good!
Hi @tamargrey, is there any way to manually replicate the transform stacking behaviour that was removed in this PR? For instance if I just have one table and I want to have
Hi @amin-nejad, your best bet is going to be to create those depth 2 features as seed features to dfs. You could make the features directly and pass them into dfs:

    import featuretools as ft
    from featuretools.primitives import (
        AddNumeric,
        DivideNumeric,
    )
    from featuretools.variable_types import Numeric

    es = ft.demo.load_retail()

    # Create the features directly
    depth_one_feature = ft.Feature([es['order_products']['quantity'], es['order_products']['unit_price']],
                                   primitive=AddNumeric)
    depth_two_feature = ft.Feature([depth_one_feature, es['order_products']['total']],
                                   primitive=DivideNumeric)

    feature_matrix, feature_defs = ft.dfs(entityset=es,
                                          target_entity="order_products",
                                          agg_primitives=['mean'],
                                          trans_primitives=[],
                                          seed_features=[depth_two_feature])  # Just this one feature has been added
    feature_defs

Another option could be to take the results of dfs and stack on them by creating the features directly:

    # Reuses the imports and entity set from the first snippet
    feature_matrix, feature_defs = ft.dfs(entityset=es,
                                          target_entity="order_products",
                                          agg_primitives=['mean'],
                                          trans_primitives=[AddNumeric])  # Just this one transform primitive is used

    # Can also get stacking on the results of dfs by creating the Features directly
    stacked_features = [ft.Feature([feat, es['order_products']['unit_price']], primitive=DivideNumeric)
                        for feat in feature_defs if feat.variable_type == Numeric]

Let me know if that helps!
Thanks very much @tamargrey and sorry for the late reply. I followed the latter approach which works well for me. MWE for anyone interested. I would suggest adding a small note to the docs about this as I found it a little confusing that transform primitives do not stack and are not subject to the
Remove transform stacking from DFS
Dfs has been returning a different set of features depending on the order of the trans_primitives input list. Our goal with this PR is to make it such that both the set of features we get from dfs and their order are the same independent of the order of the input list.

ToDo:
- Update dfs so it doesn't create features that contain transform primitives stacked directly on one another
- In the DeepFeatureSynthesis object, sort all the primitive lists according to their names (this touches all types of primitives and not just transform primitives)

Testing:
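As an illustration of the order-independence goal stated in this description (not the PR's actual tests or implementation), the idea of sorting primitive lists by name before generating features can be sketched with a toy generator. `build_feature_names` and the primitive and column names are hypothetical.

```python
def build_feature_names(trans_primitives, columns):
    """Toy generator: one feature name per (primitive, column) pair.

    Sorting the primitive names first means the caller's ordering
    cannot affect which features come out or in what order.
    """
    ordered = sorted(trans_primitives)
    return [f"{prim}({col})" for prim in ordered for col in columns]


first = build_feature_names(["negate", "absolute"], ["quantity", "total"])
second = build_feature_names(["absolute", "negate"], ["quantity", "total"])

print(first == second)  # same features, same order, regardless of input order
```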