GH-18487: [R] Read Text (CSV/JSON) from character vector #33968
Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
r/R/json.R
Outdated
#' { "hello": 3.25, "world": null }
#' { "hello": 0.0, "world": true, "yo": null }
#' ', tf, useBytes = TRUE)
#' json <- '
lintr seems to complain about this line... (r-lib/lintr#1908)
Thanks for making these changes. A couple of suggestions, and a question which may warrant a comment addition, but otherwise this looks good to me.
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Benchmark runs are scheduled for baseline = 32c7130 and contender = 0074a66. 0074a66 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
…e#33968)

### Rationale for this change

Allows literal strings to be read directly through the `I()` function in the same way as the `readr::read_csv()` function.

This is useful for checking behavior without the need to create temporary files.

```r
> read_csv_arrow(I("x,y\n1,2\n3,4"))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
```

```r
> read_csv_arrow(I(c(
  "x,y
1,2
3,4"
)))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
```

```r
> read_csv_arrow(I(c("x,y", "1,2", "3,4")))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
```

### What changes are included in this PR?

In `read_csv_arrow` and `read_json_arrow`, if the first argument `file` inherits `AsIs` class, `file` is now interpreted as literal data.

This is consistent with the behavior of `readr::read_csv()`, which is widely used to read text files as data frames.

This is a breaking change; the behavior of wrapping a path as a string with `I()` is changed. For example:

#### readr::read_csv

```r
> readr::read_csv(I(readr::readr_example("mtcars.csv")))
Rows: 0 Columns: 1
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): /usr/local/lib/R/site-library/readr/extdata/mtcars.csv

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 0 × 1
# … with 1 variable:
#   /usr/local/lib/R/site-library/readr/extdata/mtcars.csv <chr>
# ℹ Use `colnames()` to see all variable names
```

#### arrow 10.0.1's arrow::read_csv_arrow

```r
> arrow::read_csv_arrow(I(readr::readr_example("mtcars.csv")))
# A tibble: 32 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
 1    21     6   160   110   3.9  2.62  16.5     0     1     4     4
 2    21     6   160   110   3.9  2.88  17.0     0     1     4     4
 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
 5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
 6  18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
 7  14.3     8   360   245  3.21  3.57  15.8     0     0     3     4
 8  24.4     4  147.    62  3.69  3.19    20     1     0     4     2
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# … with 22 more rows
# ℹ Use `print(n = ...)` to see more rows
```

#### This PR's arrow::read_csv_arrow

```r
> arrow::read_csv_arrow(I(readr::readr_example("mtcars.csv")))
Error:
! Invalid: CSV parse error: Empty CSV file or block: cannot infer number of columns
Run `rlang::last_error()` to see where the error occurred.
```

* Closes: apache#18487

Lead-authored-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
Co-authored-by: eitsupi <50911393+eitsupi@users.noreply.github.com>
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
### Rationale for this change

Make small improvements that I noticed while reading the documentation.

### What changes are included in this PR?

- Fix typo of inline code (``` `format = "parquet"`` ``` to `` `format = "parquet"` ``)
- Link `FileFormat$create()` to `FileFormat`
- Updating Rd file I forgot to update in #33968

### Are these changes tested?

Only the documentation is changed, and the generated Rd files are included in this PR.

### Are there any user-facing changes?

No.

Authored-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
Signed-off-by: Dewey Dunnington <dewey@fishandwhistle.net>
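### Rationale for this change

Allows literal strings to be read directly through the `I()` function, in the same way as the `readr::read_csv()` function.

This is useful for checking behavior without the need to create temporary files.

### What changes are included in this PR?

In `read_csv_arrow` and `read_json_arrow`, if the first argument `file` inherits the `AsIs` class, `file` is now interpreted as literal data.

This is consistent with the behavior of `readr::read_csv()`, which is widely used to read text files as data frames.

This is a breaking change: wrapping a path string with `I()` no longer reads the file at that path. For example, given `I(readr::readr_example("mtcars.csv"))`:

- `readr::read_csv` treats the path as literal data and returns a 0 × 1 tibble whose only column is named after the path.
- arrow 10.0.1's `arrow::read_csv_arrow` reads the file and returns the 32 × 11 `mtcars` tibble.
- This PR's `arrow::read_csv_arrow` treats the path as literal data and fails with `Invalid: CSV parse error: Empty CSV file or block: cannot infer number of columns`.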
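
As a rough illustration of the `AsIs` check described under "What changes are included in this PR?", the dispatch can be sketched in base R. This is not the arrow implementation: `read_csv_sketch` is a hypothetical helper, and it uses `utils::read.csv()` as a stand-in parser so that the example runs without arrow installed.

```r
# Hypothetical helper, NOT arrow's code: it only illustrates branching on
# the "AsIs" class that I() attaches to its argument.
read_csv_sketch <- function(file) {
  if (inherits(file, "AsIs")) {
    # Literal data: join the vector elements with newlines, so that
    # I(c("x,y", "1,2", "3,4")) and I("x,y\n1,2\n3,4") read the same way.
    text <- paste(unclass(file), collapse = "\n")
    return(utils::read.csv(text = text))
  }
  # Anything not wrapped in I() is still treated as a path.
  utils::read.csv(file)
}

# Literal data wrapped in I()
literal <- read_csv_sketch(I(c("x,y", "1,2", "3,4")))

# The same data read from a temporary file, for comparison
path <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2", "3,4"), path)
from_file <- read_csv_sketch(path)
```

Under this sketch, a path wrapped in `I()` is parsed as CSV text rather than opened as a file, which is the breaking change noted above.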