GH-18487: [R] Read Text (CSV/JSON) from character vector #33968

Merged
merged 6 commits into apache:master from r-read-asis on Feb 6, 2023

Conversation

@eitsupi eitsupi commented Feb 1, 2023

Rationale for this change

Allow literal strings wrapped in I() to be read directly, in the same way as readr::read_csv() does.
This is useful for checking behavior without the need to create temporary files.

> read_csv_arrow(I("x,y\n1,2\n3,4"))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
> read_csv_arrow(I(c(
        "x,y
        1,2
        3,4"
    )))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
> read_csv_arrow(I(c("x,y", "1,2", "3,4")))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
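
The first and third calls above give the same table, which suggests that a multi-element character vector is treated as one line per element. The following is a minimal sketch of that idea in base R only; it uses `read.csv(text = ...)` as a stand-in and is an assumption for illustration, not a description of how arrow handles the input internally.

```r
# Base R only: illustrates the idea, not arrow's implementation.
single <- "x,y\n1,2\n3,4"          # one string with embedded newlines
lines <- c("x,y", "1,2", "3,4")    # one element per line

# Joining the elements with newlines reproduces the single-string form.
joined <- paste(lines, collapse = "\n")

# Both forms parse to the same data frame through the `text` argument,
# so no temporary file is needed in either case.
df_single <- read.csv(text = single)
df_lines <- read.csv(text = joined)
```

Either form is convenient in a test or a reprex, because the data lives next to the call that reads it.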

What changes are included in this PR?

In read_csv_arrow() and read_json_arrow(), if the first argument, file, inherits from the AsIs class, it is now interpreted as literal data rather than as a path.
This is consistent with the behavior of readr::read_csv(), which is widely used to read text files as data frames.
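
The class check described above can be sketched in a few lines of base R. `read_csv_sketch()` is a hypothetical helper written for this illustration, with `read.csv()` standing in for the Arrow reader; it is not the code added by this PR.

```r
# Hypothetical helper (not part of arrow): decide how to treat `file`.
read_csv_sketch <- function(file) {
  if (inherits(file, "AsIs")) {
    # Wrapped in I(): the character vector is the data itself.
    read.csv(text = paste(file, collapse = "\n"))
  } else {
    # Plain string: treat it as a path to a file.
    read.csv(file)
  }
}

# Literal data, marked with I().
from_text <- read_csv_sketch(I("x,y\n1,2\n3,4"))

# The same data written to a temporary file and read by path.
tf <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2", "3,4"), tf)
from_path <- read_csv_sketch(tf)
unlink(tf)
```

The point of dispatching on `AsIs` rather than guessing from the string content is that the caller states the intent explicitly, so a string that happens to look like a path is never misread.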

This is a breaking change: a path string wrapped in I() is no longer read as a file, but is parsed as literal data.

For example:

readr::read_csv

> readr::read_csv(I(readr::readr_example("mtcars.csv")))
Rows: 0 Columns: 1
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): /usr/local/lib/R/site-library/readr/extdata/mtcars.csv

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 0 × 1
# … with 1 variable:
#   /usr/local/lib/R/site-library/readr/extdata/mtcars.csv <chr>
# ℹ Use `colnames()` to see all variable names

arrow 10.0.1's arrow::read_csv_arrow

> arrow::read_csv_arrow(I(readr::readr_example("mtcars.csv")))
# A tibble: 32 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# … with 22 more rows
# ℹ Use `print(n = ...)` to see more rows

This PR's arrow::read_csv_arrow

> arrow::read_csv_arrow(I(readr::readr_example("mtcars.csv")))
Error:
! Invalid: CSV parse error: Empty CSV file or block: cannot infer number of columns
Run `rlang::last_error()` to see where the error occurred.
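
Code that previously passed an I()-wrapped path would need to pass a plain string instead. Where the wrapper is applied somewhere upstream, one option is to drop the class before calling the reader. The sketch below uses base R only, and the path is a hypothetical placeholder.

```r
# Hypothetical path, used only for illustration.
wrapped <- I("data/mtcars.csv")

# The AsIs class is what marks the string as literal data.
is_literal_before <- inherits(wrapped, "AsIs")

# unclass() drops the AsIs marker and leaves an ordinary character string,
# which a reader would treat as a path again.
plain <- unclass(wrapped)
is_literal_after <- inherits(plain, "AsIs")
```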

Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
github-actions bot commented Feb 1, 2023

⚠️ GitHub issue #18487 has been automatically assigned in GitHub to PR creator.

@eitsupi eitsupi changed the title GH-18487: [R] Read CSV/JSON from character vector GH-18487: [R] Read Text (CSV/JSON) from character vector Feb 1, 2023
r/R/json.R Outdated
#' { "hello": 3.25, "world": null }
#' { "hello": 0.0, "world": true, "yo": null }
#' ', tf, useBytes = TRUE)
#' json <- '
lintr seems to complain about this line... (r-lib/lintr#1908)

Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
@thisisnic

Thanks for making these changes. A couple of suggestions, and a question which may warrant a comment addition, but otherwise this looks good to me.

eitsupi and others added 4 commits February 3, 2023 13:54
Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Co-authored-by: Nic Crane <thisisnic@gmail.com>
@thisisnic thisisnic merged commit 0074a66 into apache:master Feb 6, 2023
@eitsupi eitsupi deleted the r-read-asis branch February 6, 2023 11:06
@ursabot
ursabot commented Feb 6, 2023

Benchmark runs are scheduled for baseline = 32c7130 and contender = 0074a66. 0074a66 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.28% ⬆️0.03%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.1% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 0074a661 ec2-t3-xlarge-us-east-2
[Finished] 0074a661 test-mac-arm
[Finished] 0074a661 ursa-i9-9960x
[Finished] 0074a661 ursa-thinkcentre-m75q
[Finished] 32c71306 ec2-t3-xlarge-us-east-2
[Failed] 32c71306 test-mac-arm
[Finished] 32c71306 ursa-i9-9960x
[Finished] 32c71306 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

sjperkins pushed a commit to sjperkins/arrow that referenced this pull request Feb 10, 2023
* Closes: apache#18487

Lead-authored-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
Co-authored-by: eitsupi <50911393+eitsupi@users.noreply.github.com>
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
gringasalpastor pushed a commit to gringasalpastor/arrow that referenced this pull request Feb 17, 2023
eitsupi added a commit to eitsupi/arrow that referenced this pull request Feb 24, 2023
Signed-off-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
fatemehp pushed a commit to fatemehp/arrow that referenced this pull request Feb 24, 2023
paleolimbot pushed a commit that referenced this pull request Feb 28, 2023
### Rationale for this change

Make small improvements that I noticed while reading the documentation.

### What changes are included in this PR?

- Fix typo of inline code (``` `format = "parquet"`` ``` to `` `format = "parquet"` ``)
- Link `FileFormat$create()` to `FileFormat`
- Update the Rd file that I forgot to update in #33968

### Are these changes tested?

Only the documentation is changed, and the generated Rd files are included in this PR.

### Are there any user-facing changes?

No.

Authored-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
Signed-off-by: Dewey Dunnington <dewey@fishandwhistle.net>
Development

Successfully merging this pull request may close these issues.

[R] Read CSV from character vector
3 participants