[R] open_csv_dataset() error if schema supplied and col_names left as TRUE (the default) #34092

Closed
thisisnic opened this issue Feb 9, 2023 · 0 comments · Fixed by #34217

@thisisnic
Member

Describe the bug, including details regarding any error messages, version, and platform.

I was putting together an example to diagnose a user error, and got an unexpected error message:

library(arrow)
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
tf <- tempfile()
dodgy_vals <- "x,y\n0,1\n' ', \n3,4"
cat(dodgy_vals)
#> x,y
#> 0,1
#> ' ', 
#> 3,4
writeLines(dodgy_vals, tf)
open_csv_dataset(tf, schema = schema(x = int64(), y = int64()))
#> Error in `check_schema()`:
#> ! Values in `column_names` must match `schema` field names
#> ✖ `x` and `y` not present in `column_names`

#> Backtrace:
#>     ▆
#>  1. └─arrow (local) `<fn>`(...)
#>  2.   └─arrow::open_dataset(...)
#>  3.     └─DatasetFactory$create(...)
#>  4.       └─FileFormat$create(match.arg(format), ...)
#>  5.         └─CsvFileFormat$create(schema = schema, ...)
#>  6.           └─arrow:::check_schema(options[["schema"]], options[["read_options"]]$column_names)
#>  7.             └─rlang::abort(...)

This may be because the col_names param defaults to TRUE.
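To illustrate the suspected mechanism, here is a minimal base-R sketch of a name check like the one in the backtrace. `check_names_sketch()` is a hypothetical stand-in, not arrow's actual `check_schema()`, and it assumes that leaving `col_names = TRUE` results in an empty `column_names` vector reaching the check; the issue does not confirm this.

```r
# Hypothetical, simplified stand-in for the name check; not arrow's real code.
check_names_sketch <- function(schema_names, column_names) {
  # Schema fields that have no counterpart among the supplied column names
  missing <- setdiff(schema_names, column_names)
  if (length(missing) > 0) {
    stop("not present in `column_names`: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Explicit names that match the schema pass the check.
check_names_sketch(c("x", "y"), c("x", "y"))

# Assumption: with col_names = TRUE no explicit names are supplied, so the
# check compares the schema against an empty vector and every field is
# reported as missing.
res <- tryCatch(
  check_names_sketch(c("x", "y"), character(0)),
  error = function(e) conditionMessage(e)
)
print(res)
#> [1] "not present in `column_names`: x, y"
```

If that assumption holds, one plausible fix is to run the check only when column names were supplied explicitly, since a header row cannot be compared before the file is read.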

Component(s)

R

thisisnic changed the title from "[R] bad default in open_csv_dataset()" to "[R] open_csv_dataset() error if schema supplied and col_names left as TRUE (the default)" Feb 16, 2023
assignUser pushed a commit that referenced this issue Feb 23, 2023
…es left as TRUE (the default) (#34217)

Before this PR:

``` r
library(arrow)
tf <- tempfile()
df <- tibble::tibble(x = 1, b = 2)
write_csv_arrow(df, tf)
open_csv_dataset(tf, schema = schema(x = int64(), y = int64()), skip = 1)
#> Error in `check_schema()`:
#> ! Values in `column_names` must match `schema` field names
#> ✖ `x` and `y` not present in `column_names`

#> Backtrace:
#>     ▆
#>  1. └─arrow (local) `<fn>`(...)
#>  2.   └─arrow::open_dataset(...)
#>  3.     └─DatasetFactory$create(...)
#>  4.       └─FileFormat$create(match.arg(format), ...)
#>  5.         └─CsvFileFormat$create(schema = schema, ...)
#>  6.           └─arrow:::check_schema(options[["schema"]], options[["read_options"]]$column_names)
#>  7.             └─rlang::abort(...)
```

After this PR:

``` r
library(arrow)
tf <- tempfile()
df <- tibble::tibble(x = 1, b = 2)
write_csv_arrow(df, tf)
open_csv_dataset(tf, schema = schema(x = int64(), y = int64()), skip = 1)
#> FileSystemDataset with 1 csv file
#> x: int64
#> y: int64
```
* Closes: #34092

Authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Jacob Wujciak-Jens <jacob@wujciak.de>
assignUser added this to the 12.0.0 milestone Feb 23, 2023
fatemehp pushed a commit to fatemehp/arrow that referenced this issue Feb 24, 2023
thisisnic added a commit to thisisnic/arrow that referenced this issue Mar 1, 2023