[R] Reconsider behavior of as.<type>.ArrowDatum functions #28097

Closed
asfimport opened this issue Apr 8, 2021 · 2 comments

Comments

asfimport commented Apr 8, 2021

As discussed at #9942 (comment), the as.double(), as.integer(), and as.character() methods for ArrowDatum return R vectors of the specified R types, whereas in dplyr, these same functions perform casts to the analogous Arrow types.

Compare the definitions:

  • ArrowDatum methods:

    arrow/r/R/arrow-datum.R

    Lines 139 to 145 in ace2bfc

    as.double.ArrowDatum <- function(x, ...) as.double(as.vector(x), ...)
    #' @export
    as.integer.ArrowDatum <- function(x, ...) as.integer(as.vector(x), ...)
    #' @export
    as.character.ArrowDatum <- function(x, ...) as.character(as.vector(x), ...)

  • dplyr functions:

    arrow/r/R/dplyr.R

    Lines 399 to 432 in f2db785

    as.character = function(x) {
      FUN("cast", x, options = cast_options(to_type = string()))
    },
    as.double = function(x) {
      FUN("cast", x, options = cast_options(to_type = float64()))
    },
    as.integer = function(x) {
      FUN(
        "cast",
        x,
        options = cast_options(
          to_type = int32(),
          allow_float_truncate = TRUE,
          allow_decimal_truncate = TRUE
        )
      )
    },
    as.integer64 = function(x) {
      FUN(
        "cast",
        x,
        options = cast_options(
          to_type = int64(),
          allow_float_truncate = TRUE,
          allow_decimal_truncate = TRUE
        )
      )
    },
    as.logical = function(x) {
      FUN("cast", x, options = cast_options(to_type = boolean()))
    },
    as.numeric = function(x) {
      FUN("cast", x, options = cast_options(to_type = float64()))
    },

Consider whether the ArrowDatum methods should instead perform casts and keep the data in Arrow, so that the user would also have to call as.vector() to get the data back as an R vector.
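
To make the difference concrete, here is a minimal sketch that contrasts the two behaviors on a standalone Array. It assumes the arrow R package is installed. The sample values are made up, and using `$cast(int32(), safe = FALSE)` to stand in for the proposed behavior is an assumption chosen for illustration, not something taken from the package internals quoted above.

```r
library(arrow)

# Illustrative data: a small Arrow array of doubles with a missing value.
x <- Array$create(c(1.5, 2.5, NA))

# Current ArrowDatum method: the data is converted to an R vector first,
# so the result is a plain R integer vector.
r_result <- as.integer(x)
class(r_result)  # "integer"

# Cast alternative (assumed stand-in for the proposal): the result stays in
# Arrow. safe = FALSE permits float truncation, similar in spirit to
# allow_float_truncate = TRUE in the dplyr definitions.
arrow_result <- x$cast(int32(), safe = FALSE)
inherits(arrow_result, "ArrowDatum")
arrow_result$type

# Under the proposal, an explicit as.vector() would be needed to get an
# R vector back.
as.vector(arrow_result)
```

With the cast approach the user decides when the data leaves Arrow, at the cost of an extra as.vector() call for code that expects an R vector.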

Reporter: Ian Cook / @ianmcook
Assignee: Ian Cook / @ianmcook

Related issues:

Note: This issue was originally created as ARROW-12292. Please see the migration documentation for further details.

Ian Cook / @ianmcook:
We might also want to define ArrowDatum methods for dplyr's pull() and collect() generics. These would do the same thing as as.vector() when dplyr is loaded.
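
A rough sketch of what such methods could look like is shown here. The method names `pull.ArrowDatum` and `collect.ArrowDatum` are hypothetical and are defined in the user's session purely for illustration. This assumes arrow and dplyr are both installed and that the installed arrow version does not already register methods with these names.

```r
library(arrow)
library(dplyr)

# Hypothetical S3 methods (not part of the arrow package as quoted in this
# issue): both simply delegate to as.vector().
pull.ArrowDatum <- function(.data, ...) as.vector(.data)
collect.ArrowDatum <- function(x, ...) as.vector(x)

a <- Array$create(1:3)

pulled <- pull(a)        # dispatches on the "ArrowDatum" class
collected <- collect(a)  # same result via the collect() generic
```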

Ian Cook / @ianmcook:
Closing this for now because the current behavior seems fine. Can reopen later if we decide to reconsider this.
