[R] Better error handling for DatasetFactory$Finish() when no format specified #28529

Closed
asfimport opened this issue May 14, 2021 · 3 comments

Comments

@asfimport

When I call the following code:

 

library(arrow)

tf <- tempfile()
dir.create(tf)
# recursive = TRUE is required for unlink() to remove a directory
on.exit(unlink(tf, recursive = TRUE))
write_csv_arrow(mtcars[1:5, ], file.path(tf, "file1.csv"))
write_csv_arrow(mtcars[6:11, ], file.path(tf, "file2.csv"))
ds <- open_dataset(c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")))

I get the following error: 

 Error: IOError: Could not open parquet input source '/tmp/RtmpSug6P8/file714931976ac54/file1.csv': Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

However, the documentation for open_dataset() does not say that the input source cannot be a CSV or must be a Parquet file.

I think this is due to calling DatasetFactory$Finish() when schema is NULL and the input files have no inherent schema (i.e. they are CSVs).

Reporter: Nicola Crane / @thisisnic
Assignee: Nicola Crane / @thisisnic

Note: This issue was originally created as ARROW-12791. Please see the migration documentation for further details.

@asfimport

Neal Richardson / @nealrichardson:
Judging from the error message, it's because "parquet" is the default file format, and those aren't Parquet files. https://arrow.apache.org/docs/r/reference/dataset_factory.html
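
If "parquet" is indeed the default, the reported call should work once the format is named explicitly. A minimal sketch, assuming the arrow package is installed and that open_dataset() accepts format = "csv" for a vector of file paths; the final row count is only a sanity check and is not part of the original report:

```r
library(arrow)

# Write two small CSV files into a temporary directory.
tf <- tempfile()
dir.create(tf)
write_csv_arrow(mtcars[1:5, ], file.path(tf, "file1.csv"))
write_csv_arrow(mtcars[6:11, ], file.path(tf, "file2.csv"))

# Name the format explicitly instead of relying on the "parquet" default.
ds <- open_dataset(
  c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")),
  format = "csv"
)

# Scan the dataset into an in-memory Table to confirm both files were read.
tab <- Scanner$create(ds)$ToTable()
tab$num_rows

# Clean up; recursive = TRUE is needed because tf is a directory.
unlink(tf, recursive = TRUE)
```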

@asfimport

Neal Richardson / @nealrichardson:
You could try to catch the "Parquet magic bytes not found in footer" error message inside open_dataset() and return a different/helpful message like "Looks like these are not parquet files, did you mean to specify a 'format'?" or something.
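
A rough base-R sketch of that idea, for illustration only. Both open_as_parquet() and open_with_hint() are hypothetical names, the first one merely simulates the failure quoted in the report, and none of this is claimed to be what the eventual fix in arrow does:

```r
# Hypothetical stand-in for the dataset-opening call; it fails the same way
# the report describes when a CSV file is read as Parquet.
open_as_parquet <- function(path) {
  stop(
    "IOError: Could not open parquet input source '", path, "': ",
    "Invalid: Parquet magic bytes not found in footer. ",
    "Either the file is corrupted or this is not a parquet file.",
    call. = FALSE
  )
}

# Hypothetical wrapper: rethrow the Parquet-specific failure with a hint
# about `format`, and pass every other error through unchanged.
open_with_hint <- function(path, opener = open_as_parquet) {
  tryCatch(
    opener(path),
    error = function(e) {
      msg <- conditionMessage(e)
      if (grepl("Parquet magic bytes not found in footer", msg, fixed = TRUE)) {
        stop(
          msg, "\n",
          "Looks like these are not parquet files, ",
          "did you mean to specify a 'format'?",
          call. = FALSE
        )
      }
      stop(e)
    }
  )
}

# Capture the improved message instead of aborting the script.
res <- tryCatch(open_with_hint("file1.csv"), error = conditionMessage)
cat(res, "\n")
```

Matching on the message text is fragile if the upstream wording ever changes, which is why the sketch appends the hint to the original message rather than replacing it.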

@asfimport

Neal Richardson / @nealrichardson:
Issue resolved by pull request 10326
#10326

@asfimport asfimport added this to the 5.0.0 milestone Jan 11, 2023