-
Notifications
You must be signed in to change notification settings - Fork 869
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Primitive refactor #364
Primitive refactor #364
Conversation
Codecov Report
@@ Coverage Diff @@
## master #364 +/- ##
=========================================
+ Coverage 95.33% 95.53% +0.2%
=========================================
Files 86 89 +3
Lines 8032 7555 -477
=========================================
- Hits 7657 7218 -439
+ Misses 375 337 -38
Continue to review full report at Codecov.
|
* remove incorrect commutative attributes * handle rsub override and test reverse overrides * rename weekend primitive is_weekend * updated weekend to is_weekend in docs * test values for scalar_subtract_numeric * rename subtract_numeric and scalar_subtract_numeric to subtract_numeric_feature and scalar_subtract_numeric_feature * revert subtract_numeric_feature to subtract_numeric
Args: | ||
entity (Entity): entity this feature is being calculated for | ||
base_featres (list[FeatureBase]): list of base features for primitive | ||
primitive (): primitive to calculate. if not initilized when passed, gets initialized with no arguments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing the type information for primitive
# if self.use_previous and self.use_previous.is_absolute(): | ||
# entity = self.entity | ||
# time_var = IdentityFeature(entity[entity.time_index]) | ||
# deps += [time_var] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been commented out for a while, I think we should either remove it or make an issue about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the time index automatically gets added, so we don't need to put the time index feature as a dependent. removed
|
||
base_entity = set([f.entity for f in base_features]) | ||
assert len(base_entity) == 1, \ | ||
"More than one entity for base features" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's potentially two checks in this init for whether the base features share the same entity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. fixed
featuretools/tests/computational_backend/test_calculate_feature_matrix.py
Outdated
Show resolved
Hide resolved
featuretools/tests/computational_backend/test_pandas_backend.py
Outdated
Show resolved
Hide resolved
featuretools/tests/computational_backend/test_pandas_backend.py
Outdated
Show resolved
Hide resolved
seed_feature_sessions = Count(es['log']["id"], es['sessions']) > 2 | ||
seed_feature_log = Hour(es['log']['datetime']) | ||
session_agg = Last(seed_feature_log, es['sessions']) | ||
seed_feature_sessions = ft.Feature(es['log']["id"], parent_entity=es['sessions'], primitive=Count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it changes the test but seed_feature_sessions
is missing the > 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya, i noticed that. i dont think adding the >2
improved the test so i simplified it
This PR is separates the concept of a Primitive from a Feature. The internals of Featuretools change, but it is has minimal impact on the external API.
Compared to before, a primitive is now only aware of the data it take in. A feature is then defined by input variables and/or features, as well as the primitive that will be applied. Put another way, a feature takes the specific entities and variables of an entityset and the primitive to be applied as its input so the primitive doesn't have to work about it.
This has several advantages
To give an example, here is how a feature is currently defined using the count primitive
Now, you define the inputs to the feature and provide the primitive as an input.
if a primitive has parameters it can be used like this
the API for calling DFS doesn't change with the exception of being able to provide primitive with arguments