[R] Read CSV from character vector #18487
Comments
Antoine Pitrou / @pitrou: |
Weston Pace / @westonpace: |
Neal Richardson / @nealrichardson:
> x <- c("a,b", "1,2", "3,4")
> b <- buffer(charToRaw(paste(x, collapse = "\n")))
> read_csv_arrow(b)
  a b
1 1 2
2 3 4
`read_csv_arrow()` can already read an `arrow::Buffer`; we just have to put the character vector into a Buffer. There's surely a more efficient way to do that, but this would work.
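The Buffer approach from this comment can be wrapped in a small helper. The following is a minimal sketch, assuming the `arrow` package is installed; `lines_to_raw()` is a hypothetical name used for illustration and is not part of arrow.

```r
# Hypothetical helper (not part of arrow): join CSV lines with newlines
# and convert the result to raw bytes, ready to be wrapped in a Buffer.
lines_to_raw <- function(x) {
  stopifnot(is.character(x))
  charToRaw(paste(x, collapse = "\n"))
}

x <- c("a,b", "1,2", "3,4")
bytes <- lines_to_raw(x)

# Only call into arrow if it is available.
if (requireNamespace("arrow", quietly = TRUE)) {
  df <- arrow::read_csv_arrow(arrow::buffer(bytes))
  print(df)
}
```

Keeping the byte conversion separate from the arrow call makes the conversion step easy to check on its own.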
Apache Arrow JIRA Bot: |
Note that this has not worked since readr::read_csv(c("a,b", "1,2", "3,4"))
#> Error: 'a,b' does not exist in current working directory ('/tmp/Rtmp5bv5aV/reprex-4864c762e4-bared-pika').
Created on 2023-01-19 with reprex v2.0.2
AsIs
readr::read_csv(I(c("a,b", "1,2", "3,4")))
#> Rows: 2 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (2): a, b
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 2 × 2
#>       a     b
#>   <dbl> <dbl>
#> 1     1     2
#> 2     3     4
Created on 2023-01-19 with reprex v2.0.2
Perhaps
### Rationale for this change

Allows literal strings to be read directly through the `I()` function in the same way as the `readr::read_csv()` function. This is useful for checking behavior without the need to create temporary files.

```r
> read_csv_arrow(I("x,y\n1,2\n3,4"))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
```

```r
> read_csv_arrow(I(c(
"x,y
1,2
3,4"
)))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
```

```r
> read_csv_arrow(I(c("x,y", "1,2", "3,4")))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
```

### What changes are included in this PR?

In `read_csv_arrow` and `read_json_arrow`, if the first argument `file` inherits the `AsIs` class, `file` is now interpreted as literal data. This is consistent with the behavior of `readr::read_csv()`, which is widely used to read text files as data frames.

This is a breaking change: a path string wrapped in `I()` is no longer treated as a path. For example:

#### readr::read_csv

```r
> readr::read_csv(I(readr::readr_example("mtcars.csv")))
Rows: 0 Columns: 1
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): /usr/local/lib/R/site-library/readr/extdata/mtcars.csv

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 0 × 1
# … with 1 variable:
#   /usr/local/lib/R/site-library/readr/extdata/mtcars.csv <chr>
# ℹ Use `colnames()` to see all variable names
```

#### arrow 10.0.1's arrow::read_csv_arrow

```r
> arrow::read_csv_arrow(I(readr::readr_example("mtcars.csv")))
# A tibble: 32 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# … with 22 more rows
# ℹ Use `print(n = ...)` to see more rows
```

#### This PR's arrow::read_csv_arrow

```r
> arrow::read_csv_arrow(I(readr::readr_example("mtcars.csv")))
Error:
! Invalid: CSV parse error: Empty CSV file or block: cannot infer number of columns
Run `rlang::last_error()` to see where the error occurred.
```

* Closes: #18487

Lead-authored-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
Co-authored-by: eitsupi <50911393+eitsupi@users.noreply.github.com>
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
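The `AsIs` rule described in the commit message can be illustrated with base R alone. This is a rough sketch of the dispatch idea, not the PR's actual implementation; `input_kind()` is a hypothetical name, and `"data.csv"` is a placeholder path that is never opened.

```r
# Hypothetical helper: decide how a reader should treat its `file` argument.
# It mirrors the rule described in the commit message, nothing more.
input_kind <- function(file) {
  if (inherits(file, "AsIs")) "literal" else "path"
}

input_kind(I("x,y\n1,2\n3,4"))  # wrapped in I(): treated as literal CSV data
input_kind("data.csv")          # plain string: treated as a file path

# A path that was wrapped in I() can be turned back into a plain string
# with unclass(), which drops the "AsIs" class.
wrapped <- I("data.csv")
input_kind(unclass(wrapped))
```

Under this rule, code that previously passed `I(path)` would need to pass the plain path (for example via `unclass()`) to keep reading from the file.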
`readr::read_csv()` lets you read in data from a character vector, which is useful for (e.g.) taking the output of a system call and reading it in as a data.frame.
One solution would be similar to ARROW-9235, perhaps treating it as a textConnection.
Another solution is to write to a tempfile.
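The tempfile route could look roughly like the sketch below. `read_lines_via_tempfile()` is a hypothetical helper, not an arrow function; it assumes the reader loads the data eagerly, since the file is deleted as soon as the helper returns. Base R's `utils::read.csv` is used so the sketch runs without extra packages.

```r
# Hypothetical helper: write lines to a temporary file, pass the path to
# `reader`, and delete the file afterwards even if the reader fails.
read_lines_via_tempfile <- function(lines, reader, ...) {
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path), add = TRUE)
  writeLines(lines, path)
  reader(path, ...)
}

x <- c("a,b", "1,2", "3,4")

# Base R reader used here; arrow::read_csv_arrow could be passed as
# `reader` instead when the arrow package is installed.
df <- read_lines_via_tempfile(x, utils::read.csv)
```

Compared with the Buffer approach, this costs a round trip through the file system but works with any reader that accepts a path.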
Reporter: Neal Richardson / @nealrichardson
Note: This issue was originally created as ARROW-11441. Please see the migration documentation for further details.