Adds preprocessing component to handle datetime featurization #838

Merged
angela97lin merged 14 commits into master from 627_datetime on Jun 9, 2020

Conversation

Contributor

@angela97lin angela97lin commented Jun 4, 2020

Closes #627

@angela97lin angela97lin self-assigned this Jun 4, 2020
@angela97lin angela97lin changed the title Adds component to handle datetime featurization Adds preprocessing component to handle datetime featurization Jun 4, 2020

codecov bot commented Jun 4, 2020

Codecov Report

Merging #838 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff            @@
##           master     #838    +/-   ##
========================================
  Coverage   99.68%   99.68%            
========================================
  Files         193      195     +2     
  Lines        7598     7699   +101     
========================================
+ Hits         7574     7675   +101     
  Misses         24       24            
Impacted Files                                           Coverage Δ
evalml/pipelines/components/__init__.py                  100.00% <ø> (ø)
...alml/pipelines/components/transformers/__init__.py    100.00% <100.00%> (ø)
.../components/transformers/preprocessing/__init__.py    100.00% <100.00%> (ø)
...ansformers/preprocessing/datetime_featurization.py    100.00% <100.00%> (ø)
...sts/component_tests/test_datetime_featurization.py    100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c0d7f41...a9dbdcf.

@angela97lin angela97lin requested review from kmax12 and dsherry Jun 8, 2020
@angela97lin angela97lin marked this pull request as ready for review Jun 8, 2020
@angela97lin angela97lin requested a review from kmax12 Jun 9, 2020
if len(invalid_features) > 0:
    raise ValueError("{} are not valid options for features_to_extract".format(", ".join([f"'{feature}'" for feature in invalid_features])))

parameters = {"features_to_extract": features_to_extract}
Collaborator

@dsherry dsherry Jun 9, 2020

Looks good. Note that if we change the name of these features in the future, we may have to include some sort of update logic here for backwards compatibility. So let's double-check the feature names.
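For readers following the thread, the validation under review can be sketched as a standalone function. This is only an illustration: the helper name `validate_features_to_extract` and the contents of `VALID_FEATURES` are assumptions, since the excerpt above does not show the actual list of supported feature names.

```python
# Hypothetical set of supported names; the real list is not shown in this excerpt.
VALID_FEATURES = {"year", "month", "day_of_week", "hour"}


def validate_features_to_extract(features_to_extract):
    """Return the parameters dict, or raise ValueError naming unsupported features."""
    invalid_features = [f for f in features_to_extract if f not in VALID_FEATURES]
    if len(invalid_features) > 0:
        # Same message format as the line under review.
        raise ValueError("{} are not valid options for features_to_extract".format(
            ", ".join([f"'{feature}'" for feature in invalid_features])))
    return {"features_to_extract": features_to_extract}
```

Because the accepted names end up stored in `parameters`, renaming one later would invalidate previously saved parameters, which is the backwards-compatibility concern raised in the comment.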

        return self

    def transform(self, X, y=None):
        """Transforms data X by creating new features using existing DateTime columns, and then dropping those DateTime columns
Collaborator

@dsherry dsherry Jun 9, 2020

Is there ever a case where a user wouldn't want the datetime column to be automatically dropped? And if so, is it worthwhile to expose a drop_datetime_column boolean in __init__, or to simply not drop here and ask users to use our drop column component?

Contributor Author

@angela97lin angela97lin Jun 9, 2020

Hmm, I can't think of any cases right now, but I think we can always put up another issue/PR for it if we do see that that's the case :)
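To make the trade-off in this exchange concrete, here is a minimal sketch of what an opt-out flag could look like. It is not the merged component: `featurize_datetimes` is a hypothetical standalone function, the `<column>_<feature>` naming is an assumption, and `drop_datetime_column` is only the parameter name floated in the question above.

```python
import pandas as pd


def featurize_datetimes(X, features_to_extract=("year", "month"), drop_datetime_column=True):
    """Add one column per (datetime column, feature) pair; optionally drop the originals."""
    X_t = X.copy()
    # "datetime64" selects datetime columns regardless of their time unit.
    datetime_cols = list(X_t.select_dtypes(include=["datetime64"]).columns)
    for col in datetime_cols:
        for feature in features_to_extract:
            # Series.dt exposes attributes such as .year and .month.
            X_t[f"{col}_{feature}"] = getattr(X_t[col].dt, feature)
    if drop_datetime_column:
        X_t = X_t.drop(columns=datetime_cols)
    return X_t
```

With the default `drop_datetime_column=True` this mirrors the behaviour described in the docstring (features are added, the source columns are dropped); passing `False` keeps the original datetime columns alongside the new features.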

Collaborator

dsherry commented Jun 9, 2020

@angela97lin reminder to please move this to "Review" status in ZenHub


def test_datetime_featurization_no_features_to_extract():
    datetime_transformer = DateTimeFeaturization(features_to_extract=[])
    rng = pd.date_range('2020-02-24', periods=20, freq='D')
Collaborator

@dsherry dsherry Jun 9, 2020

I didn't know pandas had this, helpful! Not a concern for this PR but I wonder if they handle months properly. It's surprising how many edge cases pop up with time series data, lol.
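On the question of months, a short sketch of how `pd.date_range` and `pd.DateOffset` behave around month boundaries may help. The dates below are chosen for illustration (the first range is the one used in the test); this only exercises pandas itself, not the new component.

```python
import pandas as pd

# The daily range from the test crosses the 2020 leap day into March.
rng = pd.date_range("2020-02-24", periods=20, freq="D")
leap_day, first_of_march = rng[5], rng[6]  # 2020-02-29 and 2020-03-01

# With a month-start frequency, a start date that is not on the offset
# is rolled forward to the next month start.
month_starts = pd.date_range("2020-01-31", periods=3, freq="MS")

# Adding one calendar month to Jan 31 clips to the last valid day of February.
shifted = pd.Timestamp("2020-01-31") + pd.DateOffset(months=1)  # 2020-02-29
```

So a 20-day daily range is unaffected by month length, while month-based frequencies and offsets snap to calendar boundaries rather than adding a fixed number of days.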

dsherry approved these changes Jun 9, 2020
Collaborator

@dsherry dsherry left a comment

LGTM! I left comments about the impl and about what to make public vs keep private. Other than that, ready to merge from my perspective.

It's exciting to be taking what I believe is our first step towards supporting time series modeling!

kmax12 approved these changes Jun 9, 2020
Contributor

@kmax12 kmax12 left a comment

LGTM once all of Dylan's feedback is addressed

@angela97lin angela97lin merged commit 2b54ced into master Jun 9, 2020
2 checks passed
@angela97lin angela97lin deleted the 627_datetime branch Jun 9, 2020
@angela97lin angela97lin mentioned this pull request Jun 30, 2020
Successfully merging this pull request may close these issues.

Add a component that handles datetime featurization
3 participants