[R] handling of various blob representations #343
Comments
Thanks. I don't understand "root cause". In what way do you propose that DBI should change? I don't mind changing DBItest here and testing only a |
@krlmlr Yeah, sorry, that's what I meant: Currently, being able to handle |
DBI aside, I do think that it's pretty reasonable to expect
tibble::tibble(x = list(raw())) |> nanoarrow::as_nanoarrow_array()
#> Error in infer_nanoarrow_schema.default(X[[i]], ...): Can't infer Arrow type for object of class list

For comparison, Arrow handles this fine:

tibble::tibble(x = list(raw())) |> arrow::as_arrow_array()
#> StructArray
#> <struct<x: binary>>
#> -- is_valid: all not null
#> -- child 0 type: binary
#> [
#>
#> ]

Created on 2023-12-22 with reprex v2.0.2
@paleolimbot the following does the trick for me:

infer_nanoarrow_schema.AsIs <- function(x, ...) {
  # unfortunately NextMethod() goes directly to `default`,
  # so drop the leading "AsIs" class and dispatch again
  class(x) <- class(x)[-1]
  infer_nanoarrow_schema(x)
}

infer_nanoarrow_schema.list <- function(x, ...) {
  is_raw <- vapply(x, is.raw, logical(1))
  if (!all(is_raw)) {
    stop("Only lists of raw vectors are currently supported", call. = FALSE)
  }
  # sum as double: an integer sum beyond .Machine$integer.max yields NA,
  # which would make the condition below fail
  if (length(x) > 0 && sum(as.numeric(lengths(x))) > .Machine$integer.max) {
    nanoarrow::na_large_binary()
  } else {
    nanoarrow::na_binary()
  }
}

where the list implementation would be simple in nanoarrow, as it's a copy of the blob implementation.
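The class-stripping dispatch above can be tried without nanoarrow installed. What follows is only a minimal base R sketch: `guess_binary_type()` is a hypothetical stand-in generic (it is not part of nanoarrow or DBI) that returns a type name as a string, so the only thing it illustrates is how removing the `AsIs` class lets dispatch reach the `list` method.

```r
# Hypothetical stand-in for infer_nanoarrow_schema(); not a nanoarrow function.
guess_binary_type <- function(x, ...) UseMethod("guess_binary_type")

guess_binary_type.default <- function(x, ...) {
  stop("Can't infer type for object of class ", class(x)[1], call. = FALSE)
}

guess_binary_type.AsIs <- function(x, ...) {
  # Drop the leading "AsIs" class and dispatch again on what remains.
  # For I(list(...)) nothing remains, so the implicit class "list" is used.
  class(x) <- class(x)[-1]
  guess_binary_type(x, ...)
}

guess_binary_type.list <- function(x, ...) {
  if (!all(vapply(x, is.raw, logical(1)))) {
    stop("Only lists of raw vectors are currently supported", call. = FALSE)
  }
  # Sum as double so that a very large total cannot overflow to NA.
  if (sum(as.numeric(lengths(x))) > .Machine$integer.max) {
    "large_binary"
  } else {
    "binary"
  }
}

guess_binary_type(list(as.raw(1:3)))            # bare list of raw vectors
guess_binary_type(I(list(as.raw(1:3), raw())))  # same data wrapped in I()
```

Under this sketch both calls return "binary", while a list holding anything other than raw vectors raises the "Only lists of raw vectors" error.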
I'd ideally push that into C but will at the very least add the R version to the next nanoarrow release (early Jan).
There are several ways to handle "blob" data in a data.frame:
of which only the first is currently accepted by nanoarrow. In DBI this is handled by AsIs class: https://github.com/r-dbi/DBI/blob/main/R/dbiDataType_AsIs.R

For now I'm doing this in adbi by implementing the generic for the two types I need (https://github.com/r-dbi/adbi/blob/main/R/nanoarrow.R).
Not sure how you feel about this. If you think this could also live in nanoarrow, I'm happy to submit a PR.
One problem with this is that it might pose a conflict with arbitrary nested types such as
But then again, this could be (and I believe currently is) handled via vctrs_list_of. Maybe bare lists are only allowed to contain raw vectors and this is some sort of "legacy" interface?

@krlmlr maybe DBI should change here to fix the root cause? Then again, with that there's always backwards compatibility to worry about.
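To make the "bare lists may only contain raw vectors" idea concrete, here is a small base R sketch. The helper name `is_bare_raw_list()` is hypothetical, and the rule it encodes is just one possible reading of the suggestion above, not existing nanoarrow or DBI behaviour.

```r
# Hypothetical helper: TRUE only for classless (or AsIs-wrapped) lists
# whose elements are all raw vectors.
is_bare_raw_list <- function(x) {
  if (!is.list(x)) {
    return(FALSE)
  }
  # Anything carrying another class (e.g. "vctrs_list_of", "data.frame")
  # is left to its own, more specific handling.
  if (length(setdiff(class(x), c("AsIs", "list"))) > 0) {
    return(FALSE)
  }
  all(vapply(x, is.raw, logical(1)))
}

# Mimics the class vector of a vctrs list_of without requiring vctrs.
nested <- structure(
  list(1:3, 4:5),
  class = c("vctrs_list_of", "vctrs_vctr", "list")
)

is_bare_raw_list(list(as.raw(1:2), raw()))  # bare list of raw vectors
is_bare_raw_list(I(list(as.raw(1:2))))      # AsIs-wrapped list
is_bare_raw_list(list(1:3, letters))        # arbitrary nested data
is_bare_raw_list(nested)                    # classed list
```

With this rule the first two calls give TRUE and the last two give FALSE, so arbitrary nested data would have to go through a classed container rather than a bare list.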