[R] dplyr n function cannot be called with dplyr::n() #20246

Closed
asfimport opened this issue May 13, 2022 · 1 comment

asfimport commented May 13, 2022

I am trying to summarize an Arrow dataset in R using the n() function from dplyr, but it fails when called with the namespaced dplyr::n() syntax, even though it works fine as plain n(). I also tried the n_distinct() function and hit the same issue.

library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#>     timestamp
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
dir <- file.path(tempdir(), "test-data")
test_data <- data.frame(A = 1:10)
write_dataset(test_data, dir)

# 1. This works
data2 <- open_dataset(dir) %>%
  summarise(N = n())
data2
#> FileSystemDataset (query)
#> N: int32
#>
#> See $.data for the source Arrow object
collect(data2)
#> # A tibble: 1 × 1
#>       N
#>   <int>
#> 1    10

# 2. But this does not work
data1 <- open_dataset(dir) %>%
  summarise(N = dplyr::n())
#> Error: Error : Expression dplyr::n() not supported in Arrow
#> Call collect() first to pull data into R.
data1
#> Error in eval(expr, envir, enclos): object 'data1' not found

Created on 2022-05-13 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22 ucrt)
#>  os       Windows 10 x64 (build 19044)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.utf8
#>  ctype    English_United States.utf8
#>  tz       America/Los_Angeles
#>  date     2022-05-13
#>  pandoc   2.17.1.1 @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  arrow       * 8.0.0   2022-05-09 [1] CRAN (R 4.2.0)
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.2.0)
#>  bit           4.0.4   2020-08-04 [1] CRAN (R 4.2.0)
#>  bit64         4.0.5   2020-08-30 [1] CRAN (R 4.2.0)
#>  cli           3.3.0   2022-04-25 [1] CRAN (R 4.2.0)
#>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.2.0)
#>  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.2.0)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
#>  dplyr       * 1.0.9   2022-04-28 [1] CRAN (R 4.2.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate      0.15    2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  generics      0.1.2   2022-01-31 [1] CRAN (R 4.2.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.2.0)
#>  knitr         1.39    2022-04-26 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.2.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.2.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.14    2022-04-25 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
#>  tibble        3.1.7   2022-05-03 [1] CRAN (R 4.2.0)
#>  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.2.0)
#>  tzdb          0.3.0   2022-03-28 [1] CRAN (R 4.2.0)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.2.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.31    2022-05-10 [1] CRAN (R 4.2.0)
#>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.2.0)
#>
#>  [1] C:/Users/sbashevkin/AppData/Local/R/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.0/library
#>
#> ──────────────────────────────────────────────────────────────────────────────

Reporter: Sam Bashevkin

Related issues:

Note: This issue was originally created as ARROW-16577. Please see the migration documentation for further details.

Jonathan Keane / @jonkeane:
Thanks for the report! We don't currently support calling functions with the package namespace attached (for example, dplyr::n()), though it is something we are thinking about and plan to support (see ARROW-14575 for some discussion and possible approaches). We don't have a timeline for this, but it helps knowing that someone is looking for it!

If you don't mind, I'm going to close this issue, but please feel free to continue the discussion on ARROW-14575.

Thanks again!
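
Until namespaced calls are supported, two workarounds follow from the report and from the error message above. The sketch below is not an official recommendation; it assumes the arrow 8.0.0 and dplyr 1.0.9 versions listed in the session info, and the directory name and the result names res_arrow and res_local are made up for illustration.

```r
library(arrow)
library(dplyr)

# Illustrative dataset, same shape as the one in the report.
# The directory name is an assumption, not taken from the report.
dir <- file.path(tempdir(), "test-data-workaround")
write_dataset(data.frame(A = 1:10), dir)

# Workaround 1: call n() without the dplyr:: prefix so that arrow
# can translate the expression and run the count inside Arrow.
res_arrow <- open_dataset(dir) %>%
  summarise(N = n()) %>%
  collect()

# Workaround 2: follow the hint in the error message and collect()
# first. After collect() the data is an ordinary tibble, so
# dplyr::n() is evaluated by dplyr itself rather than by arrow.
res_local <- open_dataset(dir) %>%
  collect() %>%
  dplyr::summarise(N = dplyr::n())
```

The first form keeps the aggregation in Arrow and pulls only the one-row result into R. The second form loads the whole dataset into memory before counting, so it is probably only practical when the data fits comfortably in RAM.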
