Feature/datetime functions #91

Merged

Yuhuishishishi
Contributor

Noob here.

I am trying to help with #11.
I have implemented a few of the remaining datetime-related functions:

  • TIMESTAMPADD
  • LASTDAY
  • CEIL (time unit=YEAR, WEEK, MONTH, QUARTER currently do not work)
  • FLOOR (time unit=YEAR, WEEK, MONTH, QUARTER currently do not work)

I am still working on:

  • TIMESTAMPDIFF

Would you mind helping me review the diff and giving me some direction? Thanks.

@codecov-io
codecov-io commented Nov 29, 2020

Codecov Report

Merging #91 (95be7ab) into main (ad53a36) will decrease coverage by 0.28%.
The diff coverage is 88.57%.

@@             Coverage Diff             @@
##              main      #91      +/-   ##
===========================================
- Coverage   100.00%   99.71%   -0.29%     
===========================================
  Files           34       34              
  Lines         1381     1427      +46     
  Branches       189      196       +7     
===========================================
+ Hits          1381     1423      +42     
- Misses           0        3       +3     
- Partials         0        1       +1     
Impacted Files                           Coverage Δ
dask_sql/mappings.py                     96.20% <66.66%> (-3.80%) ⬇️
dask_sql/physical/rex/core/call.py       99.58% <96.15%> (-0.42%) ⬇️
dask_sql/input_utils.py                  100.00% <0.00%> (ø)
dask_sql/physical/rel/logical/join.py    100.00% <0.00%> (ø)
dask_sql/physical/rel/logical/filter.py  100.00% <0.00%> (ø)

@nils-braun
Collaborator

nils-braun commented Nov 30, 2020

Thank you so much @Yuhuishishishi, and a very warm welcome to the project! Happy to have you with us.

That is definitely not a "noob" PR! Just some general thoughts (which you have probably already considered) before I add some detailed comments:

  • could you fix the style errors so that we have a consistent style? Running "black" on the files should probably do the trick.
  • do you plan to add some tests for the remaining uncovered lines or should I help you with this?

I guess you are planning to do these things anyway; I just want to have them written down somewhere.

except TypeError:
# interval type is not recognized, fall back to default case
pass

# Calcite will always convert to milliseconds
# no matter what the actual interval is
Collaborator

That does not seem to be correct, does it? Maybe we should change the comment then.
Thanks for finding this out!

# Calcite will always convert to milliseconds
# no matter what the actual interval is
# I am not sure if this breaks somewhere,
# but so far it works
# Issue: if sql_type is INTERVAL MICROSECOND, and value <= 1000, literal_value will be rounded to 0
return timedelta(milliseconds=float(str(literal_value)))
Collaborator

What is the benefit of your class (pd.tseries.offsets.DateOffset) over the datetime timedelta? Just for me to understand; I am perfectly fine with changing it.

Contributor Author

My understanding is that pd.tseries.offsets makes it more convenient to specify time deltas whose length depends on the data. For example, when incrementing a timestamp by "1 month" with a standard timedelta, we need to work out whether to add 28, 29, 30, or 31 days, depending on the month and year of that timestamp.
With pd.tseries.offsets, we can use `pd.tseries.offsets.DateOffset(months=1)`, and pandas takes care of the day conversion under the hood.

The same applies to "year" increments, because of leap years.

For the last_day implementation, it is convenient to use pd.tseries.offsets.MonthEnd, which shifts the timestamp to the last day of the month.

For more generic cases (such as adding hours, minutes, or seconds), I think the standard timedelta and pd.tseries.offsets both do the trick.
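
To illustrate the difference described in this comment, here is a minimal sketch. It is not code from the PR, and the example dates and variable names are chosen only for illustration.

```python
from datetime import timedelta

import pandas as pd

ts = pd.Timestamp("2020-01-31")

# A fixed-length timedelta cannot express "one month"; 30 days is only a guess
# and overshoots February here.
fixed = ts + timedelta(days=30)

# DateOffset is calendar-aware: the day is clipped to the end of February
# (2020 is a leap year).
one_month = ts + pd.tseries.offsets.DateOffset(months=1)

# Adding one year to Feb 29 lands on Feb 28 of the following non-leap year.
one_year = pd.Timestamp("2020-02-29") + pd.tseries.offsets.DateOffset(years=1)

# MonthEnd(0) rolls a timestamp forward to the last day of its own month
# and leaves it unchanged if it already is a month end.
last_day = pd.Timestamp("2020-02-10") + pd.tseries.offsets.MonthEnd(0)

print(fixed, one_month, one_year, last_day)
```

The same offsets can be added to a whole datetime series, which is why they suit column-wise SQL functions.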

}, f"Round method can only be either ceil or floor"

super().__init__(
lambda x: pd.api.types.is_datetime64_any_dtype(x) or isinstance(x, datetime), # if the series is dt type
Collaborator

Maybe create a temporary variable for the lambda function
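
One way to read this suggestion, sketched under assumptions: the predicate from the diff is pulled out into a named function before it is passed on. The name `is_datetime_like` is hypothetical and does not appear in the PR.

```python
from datetime import datetime

import pandas as pd


def is_datetime_like(x) -> bool:
    """Hypothetical named predicate replacing the inline lambda.

    True for series with a datetime dtype and for datetime scalars.
    """
    return pd.api.types.is_datetime64_any_dtype(x) or isinstance(x, datetime)


datetime_series = pd.Series(pd.to_datetime(["2020-01-01", "2020-06-30"]))
integer_series = pd.Series([1, 2, 3])

print(is_datetime_like(datetime_series))
print(is_datetime_like(integer_series))
print(is_datetime_like(datetime(2020, 1, 1)))
```

A named predicate keeps the `super().__init__(...)` call short and gives the check a place for a docstring.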

def _round_datetime(self, *operands):
df, unit = operands

if is_frame(df):
Collaborator

As we need this part multiple times (here and e.g. also in the "last_day" function): should we create a utility function?
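
A rough sketch of what such a utility could look like, assuming plain pandas only. The PR's `is_frame` check presumably also covers dask collections, which this sketch ignores, and the name `datetime_methods` is hypothetical.

```python
from datetime import datetime

import pandas as pd


def datetime_methods(operand):
    """Hypothetical helper: return an object exposing floor/ceil/round.

    Series expose these methods through the .dt accessor, scalars through
    pd.Timestamp, so callers no longer need to branch themselves.
    """
    if isinstance(operand, pd.Series):
        return operand.dt
    return pd.Timestamp(operand)


series = pd.Series(pd.to_datetime(["2020-05-17 13:45:00"]))

# The same call shape now works for a column and for a single value.
floored_series = datetime_methods(series).floor("D")
ceiled_scalar = datetime_methods(datetime(2020, 5, 17, 13, 45)).ceil("D")

print(floored_series.iloc[0], ceiled_scalar)
```

With a helper like this, the rounding and last-day operations could share one dispatch point instead of repeating the frame check.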

@@ -37,26 +37,32 @@ def of(self, op: "Operation") -> "Operation":
return Operation(lambda x: self(op(x)))


-class TensorScalarOperation(Operation):
+class PredicteBasedOperation(Operation):
Collaborator

Nice one!

@Yuhuishishishi
Contributor Author

Yuhuishishishi commented Dec 1, 2020

could you fix the style errors so that we have a consistent style? Running "black" on the files should probably do the trick.

Running black also reformats some lines that were not originally part of this PR, which creates additional formatting-only diffs. Is that fine?

do you plan to add some tests for the remaining uncovered lines or should I help you with this?

I am pretty new to Codecov, but I think I can figure out the uncovered lines and will add tests to cover them (mostly exception-handling logic). I am adding a "WIP" mark to the PR until they are completed.

@Yuhuishishishi Yuhuishishishi changed the title Feature/datetime functions [WIP] Feature/datetime functions Dec 1, 2020
@nils-braun
Collaborator

Running black also reformats some lines that were not originally part of this PR, which creates additional formatting-only diffs. Is that fine?

Hm, that is strange (because lines with wrong formatting should not be committable). Which version of black are you using? I am using black, version 19.10b0.

I am pretty new to Codecov, but I think I can figure out the uncovered lines and will add tests to cover them (mostly exception-handling logic). I am adding a "WIP" mark to the PR until they are completed.

Probably you already know, but you can also run the tests locally with coverage display:

pytest tests --cov-report term-missing

If you need any help, I am happy to look into it with you!

@Yuhuishishishi Yuhuishishishi changed the title [WIP] Feature/datetime functions Feature/datetime functions Dec 7, 2020
@Yuhuishishishi
Contributor Author

@nils-braun I have updated the PR to address some of your comments. Thanks for all the nice suggestions.

@nils-braun
Collaborator

Very nice @Yuhuishishishi
That one is ready to merge. If you have any more additions to it, feel free to open a new PR.
Very happy to have you on board in this project!

@nils-braun nils-braun merged commit 6fa88ed into dask-contrib:main Dec 7, 2020
@Yuhuishishishi
Contributor Author

@nils-braun Appreciate all the comments and help. Glad to help with the project!
