Allow DFS to take in strings or datetime values into cutoff_time argument #2147

Merged
merged 24 commits into from
Jun 30, 2022


sbadithe
Contributor

Fixes #2014

Makes it possible for users to pass strings or datetime values into the cutoff_time argument of dfs.

@codecov

codecov bot commented Jun 26, 2022

Codecov Report

Merging #2147 (66fff26) into main (20dbcb4) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2147   +/-   ##
=======================================
  Coverage   99.21%   99.22%           
=======================================
  Files         143      143           
  Lines       16917    16937   +20     
=======================================
+ Hits        16785    16805   +20     
  Misses        132      132           

Impacted Files                                    Coverage Δ
featuretools/synthesis/dfs.py                     100.00% <ø> (ø)
featuretools/computational_backends/utils.py      96.60% <100.00%> (+0.15%) ⬆️
featuretools/tests/synthesis/test_dfs_method.py   100.00% <100.00%> (ø)


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20dbcb4...66fff26.

@@ -242,6 +244,9 @@ def dfs(
    if not isinstance(entityset, EntitySet):
        entityset = EntitySet("dfs", dataframes, relationships)

    if isinstance(cutoff_time, str):
        cutoff_time = dateutil.parser.parse(cutoff_time)
Contributor

@gsheni gsheni Jun 27, 2022

What if this raises a ParserError or OverflowError?

Contributor Author

Thanks. I added handling for dateutil's ParserError and a plain OverflowError (as far as I can tell, dateutil doesn't define its own OverflowError). In the event of a ParserError, is it best to just raise a ValueError?
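The dateutil-based handling described above could look roughly like this. It is a minimal sketch, not the code merged in this PR: parse_cutoff_time is a hypothetical helper name, and it assumes dateutil 2.8.1 or later, where ParserError (a ValueError subclass) can be imported from dateutil.parser. OverflowError is Python's built-in exception.

```python
from datetime import datetime

from dateutil.parser import ParserError, parse


def parse_cutoff_time(value):
    """Hypothetical helper: parse a string cutoff time with dateutil.

    Non-string values (e.g. datetime objects) are returned unchanged.
    """
    if not isinstance(value, str):
        return value
    try:
        return parse(value)
    except (ParserError, OverflowError) as err:
        # ParserError already subclasses ValueError; re-raising as a plain
        # ValueError gives callers a single exception type to catch.
        raise ValueError(f"Invalid cutoff_time string: {value!r}") from err


print(parse_cutoff_time("2022-06-30"))  # 2022-06-30 00:00:00
print(parse_cutoff_time(datetime(2022, 6, 30, 12)))  # 2022-06-30 12:00:00
```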

Contributor

Not sure. @thehomebrewnerd, any idea what we usually do here?

Contributor

We have tended to use pandas.to_datetime a lot for these types of conversions. Is there a reason to move away from that here and use dateutil.parser.parse instead?
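For comparison, here is a small sketch of the pandas.to_datetime approach, assuming a reasonably recent pandas. It accepts strings and datetime objects alike and returns a pandas.Timestamp, which subclasses datetime.datetime, so both input types end up as the same kind of value.

```python
from datetime import datetime

import pandas as pd

# pandas.to_datetime handles both a date string and a datetime object,
# returning a pandas.Timestamp in each case.
from_string = pd.to_datetime("2022-06-30 14:30")
from_datetime = pd.to_datetime(datetime(2022, 6, 30, 14, 30))

print(type(from_string).__name__)  # Timestamp
print(from_string == from_datetime)  # True
print(isinstance(from_string, datetime))  # True
```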

Contributor

We probably also need to think about situations where we have a numeric time index instead of a datetime time index. Do we also want to handle the conversion for those cases? "100" -> 100?
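One possible way to handle the numeric case, sketched under assumptions: convert_cutoff_time and its numeric_time_index flag are hypothetical, and in featuretools the flag would have to be derived from the target dataframe's time index. Strings go through pandas.to_numeric when the index is numeric and pandas.to_datetime otherwise.

```python
import pandas as pd


def convert_cutoff_time(value, numeric_time_index):
    """Hypothetical sketch: convert a string cutoff time according to the
    type of the target dataframe's time index.

    numeric_time_index is an assumed flag; it is not a featuretools API.
    """
    if not isinstance(value, str):
        # Already a number, datetime, etc.: leave it alone.
        return value
    if numeric_time_index:
        # "100" becomes the number 100, "2.5" becomes 2.5
        return pd.to_numeric(value)
    return pd.to_datetime(value)


print(convert_cutoff_time("100", numeric_time_index=True))
print(convert_cutoff_time("2022-06-30", numeric_time_index=False))  # 2022-06-30 00:00:00
```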

Contributor Author

@sbadithe sbadithe Jun 27, 2022

I used dateutil.parser.parse only because it was in the issue description; I can change it to pandas.to_datetime right now. Should I just re-raise if it throws an exception?

Contributor

@sbadithe My suggestion would be to catch the pandas exception and raise an error with a more meaningful message, alerting the user that they have supplied an invalid cutoff time value.
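A minimal sketch of that suggestion, assuming pandas does the parsing: to_cutoff_time is a hypothetical helper, not the function featuretools ships. The errors pandas raises for unparseable or out-of-range date strings subclass ValueError, so a single except clause can catch them and re-raise with a clearer message. Non-string values are passed through, since dfs also accepts a datetime or a cutoff-time DataFrame.

```python
import pandas as pd


def to_cutoff_time(value):
    """Hypothetical helper: convert a string cutoff time to a pandas.Timestamp.

    Non-string values (datetimes, or a cutoff-time DataFrame) are returned
    unchanged so pandas.to_datetime never tries to reinterpret them.
    """
    if not isinstance(value, str):
        return value
    try:
        return pd.to_datetime(value)
    except (ValueError, OverflowError) as err:
        # pandas' parse errors (and OutOfBoundsDatetime) subclass ValueError;
        # OverflowError is caught too in case the underlying parser raises it.
        raise ValueError(
            f"Invalid cutoff_time {value!r}: expected a datetime or a string "
            "that pandas.to_datetime can parse."
        ) from err


print(to_cutoff_time("2022-06-30"))  # 2022-06-30 00:00:00

try:
    to_cutoff_time("not a date")
except ValueError as err:
    print(err)  # the friendlier message, with the pandas error chained as the cause
```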

@sbadithe sbadithe marked this pull request as ready for review June 28, 2022 17:06
@sbadithe sbadithe changed the title [DRAFT] Dfs cutoff time takes datetime and string types dfs cutoff time takes datetime and string types Jun 28, 2022
@sbadithe sbadithe requested a review from dvreed77 June 29, 2022 23:44
Contributor

@dvreed77 dvreed77 left a comment

lgtm

@gsheni gsheni enabled auto-merge (squash) June 30, 2022 14:13
@gsheni gsheni changed the title dfs cutoff time takes datetime and string types Allow DFS to take a string cutoff time Jun 30, 2022
@gsheni gsheni changed the title Allow DFS to take a string cutoff time Allow DFS to take in strings or datetime values into cutoff_time argument Jun 30, 2022
@gsheni gsheni merged commit 3c99fad into main Jun 30, 2022
@gsheni gsheni deleted the dfs-cutoff_time-should-take-datetime-string branch June 30, 2022 14:56
@ozzieD ozzieD mentioned this pull request Jun 30, 2022
Labels: None yet
Projects: None yet
Development

Successfully merging this pull request may close these issues.

dfs cutoff_time should take a datetime string
4 participants