Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Translate date variables to, e.g., "date'2019-12-10'" in generated SQL #44

Closed
OssiLehtinen opened this issue Dec 10, 2019 · 10 comments
Closed
Labels

Comments

@OssiLehtinen
Copy link

@OssiLehtinen OssiLehtinen commented Dec 10, 2019

Issue Description

I'm sorry for bombarding you with all the issues. In any case, this one is more of a feature request.

The thing is, unlike most other databases, Athena does not interpret iso-date strings as dates, so a query such as:
select * from table where date_col <= '2019-12-10'
fails.

To make it work, one needs to explicitly cast the date-string to a date. That is, this works:
select * from table where date_col <= date'2019-12-10'

To make things work on the R side when using dplyr, something like this needs to be used (with 'today' a R variable of class Date):
... %>% filter(date_col <= as.Date(today))
So, one needs to cast the Date as a Date for the generated sql to be correct.

This is something one can live with, so no rush with this one, but it would be a big convenience, if RAthena would detect when a date variable is used and add the date-prefix automatically, so code such as
... %>% filter(date_col <= today)
would work similar to most other databases.

Unfortunately I can't figure out how to do this so I'm just throwing it here if some solution comes to mind. In principle, I think, overloading the method escape.Date from dbplyr could help here, but I'm drawing a blank on how to accomplish this.

I want to emphasize, that this is more of an issue/feature of Athena/Presto itself, but this kind of functionality in RAthena would hide things from the user with no downside I can immediately think of.

@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Dec 10, 2019

will select * from table where date_col <= date'2019-12-10' fix this issue or does it have been something like: select * from table where date_col <= cast('2019-12-10' as date)

@OssiLehtinen
Copy link
Author

@OssiLehtinen OssiLehtinen commented Dec 10, 2019

The shorthand version of the first one works. The latter one works also, and I wouldn't be surprised if both were treated in an identical way under the hood.

@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Dec 10, 2019

This looks promising: https://prestodb.io/docs/current/functions/datetime.html it looks like the Presto and Athena accept the function date(var1) and passes it to cast(var1 as date)

This should work in the current implementation of RAthena:

library(DBI)

con <- dbConnect(RAthena::athena())

today<- Sys.Date()
tbl(con, "iris") %>% 
  mutate(DATE = date(today))

Is this what you meant or did you mean another implementation?

@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Dec 10, 2019

As Presto can use MySQL Date Functions

MySQL Date Functions:
The functions in this section use a format string that is compatible with the MySQL date_parse and str_to_date functions. The following table, based on the MySQL manual, describes the format specifiers

It might be worth inverstigating RMySQL: https://github.com/r-dbi/RMySQL to see how Date functions are implemented in sql_translate_env.R

@OssiLehtinen
Copy link
Author

@OssiLehtinen OssiLehtinen commented Dec 10, 2019

This looks promising: https://prestodb.io/docs/current/functions/datetime.html it looks like the Presto and Athena accept the function date(var1) and passes it to cast(var1 as date)

This should work in the current implementation of RAthena:

library(DBI)

con <- dbConnect(RAthena::athena())

today<- Sys.Date()
tbl(con, "iris") %>% 
  mutate(DATE = date(today))

Is this what you meant or did you mean another implementation?

This snippet does work, but does not address the issue of having to cast the date as a date.

This does not work, but perhaps could be made to work (R after all knows today is a Date and not a string):

library(DBI)

con <- dbConnect(RAthena::athena())

today<- Sys.Date()
tbl(con, "iris") %>% 
  mutate(DATE = today)

Or to be precise, that one produces a column of char type and the one you should produces a date-column. In filter statements an error if produced if date(today) (or as.Date(today)) is not used.

Currently the date variables are translated to quoted strings in SQL, but maybe they could be translated to date('2019-01-01') (or date'2019-01-01, or even cast('2019-01-01' as date), each one works)?

@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Dec 10, 2019

Not a 100% sure how to implement a solution like this. Will have to raise a ticket on dplyr around how this would work if at all.

@OssiLehtinen
Copy link
Author

@OssiLehtinen OssiLehtinen commented Dec 11, 2019

Yeah, I've been trying to figure it out too, and am starting to think some changes to dplyr/dbplyr would be needed.

The best thing I've come up with is to redefine escape.Date function and use assingInNamespace to override dbplyr's own internal method, but this is ugly as h*** and also something CRAN will reject, I think.

I mean something like this in, e.g., zzz.R accomplishes what I was going after, but...

escape.Date.AthenaConnection <- function(x, parens = NA, collapse = ", ", con = NULL) {
   x <- as.character(x)
   paste0("date'", x, "'")
}

.onLoad <- function(libname, pkgname) {
  ...
  assignInNamespace("escape.Date", escape.Date.AthenaConnection, ns="dbplyr")  
  ...
}

As a bonus, there is an error message when assignInNamespace is called.

@OssiLehtinen
Copy link
Author

@OssiLehtinen OssiLehtinen commented Dec 11, 2019

Anyway, especially since there doesn't seem to be a straightforward (and proper) way to do this, I wouldn't put too high of a priority here. Having to cast the dates is something one can live with. It's pretty much the same issue with any tool when using Athena. 'Fixing' this would of course set RAthena apart from the rest of the pack :)

@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Dec 11, 2019

I will set this to low priority. If you are able to get a working solution I will be happy to review and see how feasible it will to add to the package.

@OssiLehtinen
Copy link
Author

@OssiLehtinen OssiLehtinen commented Dec 12, 2019

I posted a question at dbplyr asking for guidance on how this could be addressed:

tidyverse/dbplyr#383

DyfanJones added a commit that referenced this issue Jan 22, 2020
Add an S3 method for special handling of dates and iso-formatted date strings in dbplyr
DyfanJones added a commit to DyfanJones/noctua that referenced this issue Jan 24, 2020
@DyfanJones DyfanJones mentioned this issue Jan 31, 2020
25 of 25 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.