-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/datetime functions #91
Feature/datetime functions #91
Conversation
Codecov Report
@@ Coverage Diff @@
## main #91 +/- ##
===========================================
- Coverage 100.00% 99.71% -0.29%
===========================================
Files 34 34
Lines 1381 1427 +46
Branches 189 196 +7
===========================================
+ Hits 1381 1423 +42
- Misses 0 3 +3
- Partials 0 1 +1
Continue to review full report at Codecov.
|
Thank you so much @Yuhuishishishi and a very warm welcome to the project! Happy to have you with me. That is definitely not a "noob" PR!. Just some general thoughts (which you have probably already thought of), before I add some detailed comments:
I guess you are planning to do these things anyway, just to have them written somewhere. |
dask_sql/mappings.py
Outdated
except TypeError: | ||
# interval type is not recognized, fall back to default case | ||
pass | ||
|
||
# Calcite will always convert to milliseconds | ||
# no matter what the actual interval is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems to not be correct, or? Maybe we should change the comment then.
Thanks for finding out!
# Calcite will always convert to milliseconds | ||
# no matter what the actual interval is | ||
# I am not sure if this breaks somewhere, | ||
# but so far it works | ||
# Issue: if sql_type is INTERVAL MICROSECOND, and value <= 1000, literal_value will be rounded to 0 | ||
return timedelta(milliseconds=float(str(literal_value))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit of your class (pd.tseries.offsets.DateOffset
) to the datetime
timedelta? Just for me to understand - I am very fine with changing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With pd.tseries.offsets
, my understanding is that it's more convenient to specify timedelta with data-dependent length. E.g., when incrementing the current timestamp by "1 month", with standard timedelta, we need to compute if to add 30/31/28/29 days depending on the month/year of the current timestamp.
With pd.tseries.offsets
, we can use `pd.tseries.offsets.DateOffset(months=1), and pandas shall take care of the day conversion under the hood.
Similar thing happens to "year" increment as well, due to leap years.
For last_day
implementation, it is convenient to use pd.tseries.offsets.MonthEnd
, which shifts the time to the last day of the month.
For more generic cases (such as + hours/minutes/seconds), I think the standard timedelta and pd.tseries.offsets
will all do the tricks.
dask_sql/physical/rex/core/call.py
Outdated
}, f"Round method can only be either ceil or floor" | ||
|
||
super().__init__( | ||
lambda x: pd.api.types.is_datetime64_any_dtype(x) or isinstance(x, datetime), # if the series is dt type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe create a temporary variable for the lambda function
dask_sql/physical/rex/core/call.py
Outdated
def _round_datetime(self, *operands): | ||
df, unit = operands | ||
|
||
if is_frame(df): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we need this part multiple times (here and e.g. also in the "last_day" function): should we create a utility function?
@@ -37,26 +37,32 @@ def of(self, op: "Operation") -> "Operation": | |||
return Operation(lambda x: self(op(x))) | |||
|
|||
|
|||
class TensorScalarOperation(Operation): | |||
class PredicteBasedOperation(Operation): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one!
The
I am pretty new Codecov, but I think I am going to figure out the uncovered lines and will add tests to cover them (mostly exception handling logics). Adding a "WIP" mark to the PR before they are completed. |
Hm, that is strange (because lines with wrong formatting should not be committable). Which version of black are you using? I am using
Probably you already know, but you can also run the tests locally with coverage display:
But if you need any help, I am happy to help you find out! |
95be7ab
to
a608483
Compare
@nils-braun I have updated the PR to address some of your comments. Thanks for all the nice suggestions. |
Very nice @Yuhuishishishi |
@nils-braun Appreciate all the comments and help. Glad to help with the project! |
Noob here.
I am trying to help with #11.
I have done a few remaining datetime related functions:
I am still working on:
Would mind helping me review the diff and provide a few directions? Thanks