-
Notifications
You must be signed in to change notification settings - Fork 872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow DFS to take in strings or datetime values into cutoff_time argument #2147
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2147 +/- ##
=======================================
Coverage 99.21% 99.22%
=======================================
Files 143 143
Lines 16917 16937 +20
=======================================
+ Hits 16785 16805 +20
Misses 132 132
Continue to review full report at Codecov.
|
featuretools/synthesis/dfs.py
Outdated
@@ -242,6 +244,9 @@ def dfs( | |||
if not isinstance(entityset, EntitySet): | |||
entityset = EntitySet("dfs", dataframes, relationships) | |||
|
|||
if isinstance(cutoff_time, str): | |||
cutoff_time = dateutil.parser.parse(cutoff_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if this raises a ParserError or OverflowError?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I added handling for parseutil's ParserError and a plain OverflowError (looks like parseutil doesn't have a OverflowError defined, as far as I can tell). In the event of a ParserError is it best to just raise a ValueError?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. @thehomebrewnerd any clue on what we usually do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have tended to use pandas.to_datetime
a lot for these types of conversions. Is there a reason to move away from that here and use dateutil.parser.parse
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also probably need to think about situations where we have a numeric time index instead of a datetime time index. Do we also want to handle the conversion for those cases? "100"
-> 100
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used dateutil.parser.parse
only because it was in the issue description; I can change it to pandas.to_datetime
right now. Should I just re-raise if throws an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbadithe My suggestion would be to catch the pandas exception and raise an error with a more meaningful message alerting the user that they have supplied an invalid cutoff time value.
Merge branch 'dfs-cutoff_time-should-take-datetime-string' of https://github.com/alteryx/featuretools into dfs-cutoff_time-should-take-datetime-string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Fixes #2014
Makes it possible for users to pass in strings, or datetime values into dfs's
cutoff_time
argument