[R] Read CSV from character vector #18487

Closed
Tracked by #33370
asfimport opened this issue Jan 30, 2021 · 5 comments · Fixed by #33968

@asfimport
Collaborator

readr::read_csv() lets you read in data from a character vector, useful for (e.g.) taking the results of a system call and reading it in as a data.frame.

> readr::read_csv(c("a,b", "1,2", "3,4"))
# A tibble: 2 x 2
      a     b
  <dbl> <dbl>
1     1     2
2     3     4

One solution would be similar to ARROW-9235, perhaps, treating it as a textConnection.

Another solution is to write to a tempfile.
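The tempfile route could look roughly like the sketch below. It assumes the arrow package is installed; the variable names are only illustrative, and `read_csv_arrow()` is assumed to return a data frame here, so the file can be removed right after the read.

```r
library(arrow)

# Character vector holding one CSV line per element
x <- c("a,b", "1,2", "3,4")

# Write the lines to a temporary file, read it back, then clean up
tmp <- tempfile(fileext = ".csv")
writeLines(x, tmp)
df <- read_csv_arrow(tmp)
unlink(tmp)
```

This costs a round trip through the filesystem, which is acceptable for small inputs but wasteful for large ones.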

Reporter: Neal Richardson / @nealrichardson

Note: This issue was originally created as ARROW-11441. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Antoine Pitrou / @pitrou:
cc @thisisnic

@asfimport
Collaborator Author

Weston Pace / @westonpace:
If you can expose the character vector as an Arrow buffer and then wrap it with arrow::io::BufferReader you should be able to read it with the CSV reader.

@asfimport
Collaborator Author

Neal Richardson / @nealrichardson:
Weston's suggestion works:

> x <- c("a,b", "1,2", "3,4")
> b <- buffer(charToRaw(paste(x, collapse = "\n")))
> read_csv_arrow(b)
  a b
1 1 2
2 3 4

read_csv_arrow can already read an arrow::Buffer; we just have to put the character vector into a Buffer. There's surely a more efficient way to do that, but this works.

@asfimport
Collaborator Author

Apache Arrow JIRA Bot:
This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per project policy. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

@eitsupi
Contributor

eitsupi commented Jan 19, 2023

Note that this has not worked since readr 2.0.0.

readr::read_csv(c("a,b", "1,2", "3,4"))
#> Error: 'a,b' does not exist in current working directory ('/tmp/Rtmp5bv5aV/reprex-4864c762e4-bared-pika').

Created on 2023-01-19 with reprex v2.0.2

The input must now be wrapped in I(), which gives it the AsIs class.

readr::read_csv(I(c("a,b", "1,2", "3,4")))
#> Rows: 2 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (2): a, b
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 2 × 2
#>       a     b
#>   <dbl> <dbl>
#> 1     1     2
#> 2     3     4

Created on 2023-01-19 with reprex v2.0.2

Perhaps I() could be supported here as well, so that read_csv_arrow() behaves the same way?

thisisnic added a commit that referenced this issue Feb 6, 2023
### Rationale for this change

Allows literal strings to be read directly through the `I()` function in the same way as the `readr::read_csv()` function.
This is useful for checking behavior without the need to create temporary files.

```r
> read_csv_arrow(I("x,y\n1,2\n3,4"))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
```

```r
> read_csv_arrow(I(c(
        "x,y
        1,2
        3,4"
    )))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
```

```r
> read_csv_arrow(I(c("x,y", "1,2", "3,4")))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
```

### What changes are included in this PR?

In `read_csv_arrow` and `read_json_arrow`, if the first argument `file` inherits from the `AsIs` class, it is now interpreted as literal data.
This is consistent with the behavior of `readr::read_csv()`, which is widely used to read text files as data frames.
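The check described above can be illustrated in base R. This is only a sketch of the idea, not the code in the PR: `is_literal_input()` and `literal_to_raw()` are hypothetical helper names, and joining vector elements with newlines is an assumption borrowed from the `paste(x, collapse = "\n")` workaround earlier in the thread.

```r
# Hypothetical helper: TRUE when `file` was wrapped in I() and should be
# treated as literal data rather than as a path
is_literal_input <- function(file) inherits(file, "AsIs")

# Hypothetical helper: join the elements with newlines and convert to raw
# bytes, which could then be handed to arrow::buffer()
literal_to_raw <- function(file) {
  stopifnot(inherits(file, "AsIs"), is.character(file))
  charToRaw(paste(file, collapse = "\n"))
}

is_literal_input(I(c("x,y", "1,2")))  # TRUE
is_literal_input("path/to/file.csv")  # FALSE
```

Dispatching on the class keeps plain strings working as paths, so only callers that opt in with `I()` get the literal-data behavior.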

This is a breaking change: a path string wrapped in `I()` is no longer treated as a path.

For example:

#### readr::read_csv

```r
> readr::read_csv(I(readr::readr_example("mtcars.csv")))
Rows: 0 Columns: 1
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): /usr/local/lib/R/site-library/readr/extdata/mtcars.csv

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 0 × 1
# … with 1 variable:
#   /usr/local/lib/R/site-library/readr/extdata/mtcars.csv <chr>
# ℹ Use `colnames()` to see all variable names
```
#### arrow 10.0.1's arrow::read_csv_arrow

```r
> arrow::read_csv_arrow(I(readr::readr_example("mtcars.csv")))
# A tibble: 32 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# … with 22 more rows
# ℹ Use `print(n = ...)` to see more rows
```

#### This PR's arrow::read_csv_arrow

```r
> arrow::read_csv_arrow(I(readr::readr_example("mtcars.csv")))
Error:
! Invalid: CSV parse error: Empty CSV file or block: cannot infer number of columns
Run `rlang::last_error()` to see where the error occurred.
```
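
Code that relied on the old behavior could drop the `AsIs` class before the call so the value is treated as a path again. A minimal base R sketch follows; the path is a made-up placeholder, and this is one possible adaptation rather than something the PR prescribes.

```r
# Hypothetical path that some earlier code wrapped in I()
wrapped <- I("data/mtcars.csv")

# unclass() drops the AsIs class, leaving a plain character string
plain <- unclass(wrapped)

inherits(wrapped, "AsIs")  # TRUE: would now be parsed as literal CSV text
inherits(plain, "AsIs")    # FALSE: would be treated as a file path
```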
* Closes: #18487

Lead-authored-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
Co-authored-by: eitsupi <50911393+eitsupi@users.noreply.github.com>
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
thisisnic added this to the 12.0.0 milestone Feb 6, 2023