-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds preprocessing component to handle datetime featurization #838
Conversation
Codecov Report
@@ Coverage Diff @@
## master #838 +/- ##
========================================
Coverage 99.68% 99.68%
========================================
Files 193 195 +2
Lines 7598 7699 +101
========================================
+ Hits 7574 7675 +101
Misses 24 24
Continue to review full report at Codecov.
|
evalml/pipelines/components/transformers/preprocessing/datetime_featurization.py
Outdated
Show resolved
Hide resolved
evalml/pipelines/components/transformers/preprocessing/datetime_featurization.py
Outdated
Show resolved
Hide resolved
evalml/pipelines/components/transformers/preprocessing/datetime_featurization.py
Outdated
Show resolved
Hide resolved
evalml/pipelines/components/transformers/preprocessing/datetime_featurization.py
Outdated
Show resolved
Hide resolved
evalml/pipelines/components/transformers/preprocessing/datetime_featurization.py
Show resolved
Hide resolved
if len(invalid_features) > 0: | ||
raise ValueError("{} are not valid options for features_to_extract".format(", ".join([f"'{feature}'" for feature in invalid_features]))) | ||
|
||
parameters = {"features_to_extract": features_to_extract} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Note that if we change the name of these features in the future, we may have to include some sort of update logic here for backwards compatibility. So let's double-check the feature names.
evalml/pipelines/components/transformers/preprocessing/datetime_featurization.py
Outdated
Show resolved
Hide resolved
return self | ||
|
||
def transform(self, X, y=None): | ||
"""Transforms data X by creating new features using existing DateTime columns, and then dropping those DateTime columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there ever a case where a user wouldn't want the datetime to be automatically dropped? And if so is it worthwhile to expose a drop_datetime_column
boolean in __init__
, or to simply not drop here and ask users to use our drop column component?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I can't think of any cases right now, but I think we can always put up another issue/PR for it if we do see that that's the case :)
evalml/pipelines/components/transformers/preprocessing/datetime_featurization.py
Outdated
Show resolved
Hide resolved
@angela97lin reminder to please move this to "Review" status in zenhub |
|
||
def test_datetime_featurization_no_features_to_extract(): | ||
datetime_transformer = DateTimeFeaturization(features_to_extract=[]) | ||
rng = pd.date_range('2020-02-24', periods=20, freq='D') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know pandas had this, helpful! Not a concern for this PR but I wonder if they handle months properly. It's surprising how many edge-cases pop up with time series data, lol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I left comments about the impl and about what to make public vs keep private. Other than that, ready to merge from my perspective.
It's exciting to be taking what I believe is our first step towards supporting time series modeling!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once all of dylan's feedback is addressed
Closes #627