[R] Implement lubridate's individual date/time parsers #30030
Comments
Dewey Dunnington / @paleolimbot: Some testing that might be useful when putting together a PR:
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
test_dates <- tibble::tibble(
string_ymd = c("2021-09-10", "2021/09/10", "20210910", "2021 Sep 10", "2021 September 10", NA),
string_dmy = c("10-09-2021", "10/09/2021", "10092021", "10 Sep 2021", "10 September 2021", NA),
string_mdy = c("09-10-2021", "09/10/2021", "09102021", "Sep 10 2021", "September 10 2021", NA),
date = c(rep(as.Date("2021-09-10"), 5), NA),
date_midnight = c(rep(as.POSIXct("2021-09-10 00:00:00", tz = "UTC"), 5), NA)
)
# these get dropped by as.POSIXct if the system tz is UTC?
attr(test_dates$date_midnight, "tzone") <- "UTC"
test_datetimes <- tibble::tibble(
# a leading space keeps the date and time parts unambiguous
string_ymd_hms = stringr::str_c(test_dates$string_ymd, " 01:23:45"),
string_dmy_hms = stringr::str_c(test_dates$string_dmy, " 01:23:45"),
string_mdy_hms = stringr::str_c(test_dates$string_mdy, " 01:23:45"),
string_ymd_hm = stringr::str_c(test_dates$string_ymd, " 01:23"),
string_dmy_hm = stringr::str_c(test_dates$string_dmy, " 01:23"),
string_mdy_hm = stringr::str_c(test_dates$string_mdy, " 01:23"),
string_ymd_h = stringr::str_c(test_dates$string_ymd, " 01"),
string_dmy_h = stringr::str_c(test_dates$string_dmy, " 01"),
string_mdy_h = stringr::str_c(test_dates$string_mdy, " 01"),
date_second = c(rep(as.POSIXct("2021-09-10 01:23:45", tz = "UTC"), 5), NA),
date_minute = c(rep(as.POSIXct("2021-09-10 01:23", tz = "UTC"), 5), NA),
date_hour = c(rep(as.POSIXct("2021-09-10", tz = "UTC") + 60 * 60, 5), NA)
)
# these get dropped by as.POSIXct if the system tz is UTC?
attr(test_datetimes$date_second, "tzone") <- "UTC"
attr(test_datetimes$date_minute, "tzone") <- "UTC"
attr(test_datetimes$date_hour, "tzone") <- "UTC"
# tests with lubridate, R eval
library(testthat, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
expect_identical(ymd(test_dates$string_ymd), test_dates$date)
expect_identical(dmy(test_dates$string_dmy), test_dates$date)
expect_identical(mdy(test_dates$string_mdy), test_dates$date)
expect_identical(ymd(test_dates$string_ymd, tz = "UTC"), test_dates$date_midnight)
expect_identical(dmy(test_dates$string_dmy, tz = "UTC"), test_dates$date_midnight)
expect_identical(mdy(test_dates$string_mdy, tz = "UTC"), test_dates$date_midnight)
expect_identical(
ymd_hms(test_datetimes$string_ymd_hms, tz = "UTC"),
test_datetimes$date_second
)
expect_identical(
dmy_hms(test_datetimes$string_dmy_hms, tz = "UTC"),
test_datetimes$date_second
)
expect_identical(
mdy_hms(test_datetimes$string_mdy_hms, tz = "UTC"),
test_datetimes$date_second
)
expect_identical(
ymd_hm(test_datetimes$string_ymd_hm, tz = "UTC"),
test_datetimes$date_minute
)
expect_identical(
dmy_hm(test_datetimes$string_dmy_hm, tz = "UTC"),
test_datetimes$date_minute
)
expect_identical(
mdy_hm(test_datetimes$string_mdy_hm, tz = "UTC"),
test_datetimes$date_minute
)
expect_identical(
ymd_h(test_datetimes$string_ymd_h, tz = "UTC"),
test_datetimes$date_hour
)
expect_identical(
dmy_h(test_datetimes$string_dmy_h, tz = "UTC"),
test_datetimes$date_hour
)
expect_identical(
mdy_h(test_datetimes$string_mdy_h, tz = "UTC"),
test_datetimes$date_hour
)
Dragoș Moldovan-Grünfeld / @dragosmg:
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(arrow))
suppressPackageStartupMessages(library(lubridate))
df <- tibble(x = c("09-01-01", "09-01-02", "09-01-03"))
df
#> # A tibble: 3 × 1
#> x
#> <chr>
#> 1 09-01-01
#> 2 09-01-02
#> 3 09-01-03
# lubridate::ymd()
df %>%
mutate(y = ymd(x))
#> # A tibble: 3 × 2
#> x y
#> <chr> <date>
#> 1 09-01-01 2009-01-01
#> 2 09-01-02 2009-01-02
#> 3 09-01-03 2009-01-03
# %y = two-digit (short) year: parsed correctly
df %>%
record_batch() %>%
mutate(y = strptime(x, format = "%y-%m-%d", unit = "us")) %>%
collect()
#> # A tibble: 3 × 2
#> x y
#> <chr> <dttm>
#> 1 09-01-01 2009-01-01 00:00:00
#> 2 09-01-02 2009-01-02 00:00:00
#> 3 09-01-03 2009-01-03 00:00:00
# %Y = four-digit (long) year: this should fail (return NA) for us to be able to rely on coalesce
df %>%
record_batch() %>%
mutate(y = strptime(x, format = "%Y-%m-%d", unit = "us")) %>%
collect()
#> # A tibble: 3 × 2
#> x y
#> <chr> <dttm>
#> 1 09-01-01 0008-12-31 23:58:45
#> 2 09-01-02 0009-01-01 23:58:45
#> 3 09-01-03 0009-01-02 23:58:45
Therefore, my early (and somewhat naive) conclusion would be that we cannot implement
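To make the coalesce concern above concrete, here is a minimal base R sketch. It is not Arrow code: `parse_first_match()` is a hypothetical helper, and it assumes that base R's `as.Date()` accepts a two-digit string for `%Y` in much the same way the Arrow `strptime()` output above suggests. It tries each format in turn and keeps the first non-NA result per element, so a format that "succeeds" with the wrong year masks the correct one.

```r
# Hypothetical helper (not part of arrow or lubridate): try each format in
# turn and keep the first non-NA result per element, similar to coalesce().
parse_first_match <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    parsed <- as.Date(x, format = fmt)
    # only fill positions that are still missing
    fill <- is.na(out) & !is.na(parsed)
    out[fill] <- parsed[fill]
  }
  out
}

x <- c("09-01-01", "2021-09-10", NA)

# %Y first: "09" is accepted as the year 9, so %y is never consulted
year_first <- parse_first_match(x, c("%Y-%m-%d", "%y-%m-%d"))

# %y first: "2021-09-10" does not match %y-%m-%d and falls through to %Y
short_first <- parse_first_match(x, c("%y-%m-%d", "%Y-%m-%d"))
```

In this sketch the order of the formats decides the outcome, which is why a `%Y` parse that does not fail on short years makes a plain coalesce over candidate formats unreliable.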
Parse dates with year, month, and day components:
ymd() ydm() mdy() myd() dmy() dym() yq() ym() my()
Parse date-times with year, month, and day, hour, minute, and second components:
ymd_hms() ymd_hm() ymd_h() dmy_hms() dmy_hm() dmy_h() mdy_hms() mdy_hm() mdy_h() ydm_hms() ydm_hm() ydm_h()
Parse periods with hour, minute, and second components:
ms() hm() hms()
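The function names above encode the order of the components, so one possible starting point is to expand an order into candidate `strptime()` formats. The sketch below is only an illustration under stated assumptions: `order_to_formats()` is a hypothetical helper, it handles only the `y`, `m`, and `d` components with numeric months, and the separator list is an arbitrary choice rather than anything lubridate or Arrow defines.

```r
# Hypothetical helper: expand an order such as "ymd" into strptime formats,
# one per separator. Only y, m and d are handled in this sketch.
order_to_formats <- function(order, seps = c("-", "/", " ", "")) {
  tokens <- c(y = "%Y", m = "%m", d = "%d")
  # look up the format token for each letter of the order
  parts <- tokens[strsplit(order, "")[[1]]]
  vapply(
    seps,
    function(sep) paste(parts, collapse = sep),
    character(1),
    USE.NAMES = FALSE
  )
}

dmy_formats <- order_to_formats("dmy")
```

Month names (`%b`, `%B`), two-digit years, and the time components would each need additional candidate formats.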
Reporter: Nicola Crane / @thisisnic
Assignee: Rok Mihevc / @rok
Note: This issue was originally created as ARROW-14471. Please see the migration documentation for further details.