[R] Expose FileSystemFactoryOptions #30771
Comments
Dewey Dunnington / @paleolimbot: Where this lives in Python: arrow/python/pyarrow/parquet.py Lines 1181 to 1186 in ad073b7
We don't have anything similar to this approach in R, but it looks like these options are exposed in the C++ API as well: arrow/cpp/src/arrow/dataset/discovery.h Lines 146 to 195 in f8661e0
...which we could wire up to R here: Lines 171 to 172 in 03219e2
and here: Line 181 in 03219e2
Neal Richardson / @nealrichardson:
Joris Van den Bossche / @jorisvandenbossche:
Yes, that's correct (so this is actually also an issue for Python).
I was thinking exactly the same. Based on the Python snippet from the legacy code, it is clear that it ignores some files based on both prefixes and suffixes. It might be possible to have a more advanced (callback-based?) filename filter option, but I suppose that simpler prefix and suffix ignore options would cover almost all use cases?
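To make the prefix-plus-suffix idea concrete, here is a minimal Python sketch of such a filter. The function name `should_ignore` and its `ignore_prefixes` / `ignore_suffixes` parameters are hypothetical and not part of any Arrow API. The default values are taken from the legacy snippet quoted in this issue. The check applies only to the base name of each path, which is an assumption about the desired behavior.

```python
from pathlib import PurePosixPath


def should_ignore(path, ignore_prefixes=(".", "_"),
                  ignore_suffixes=(".crc", "_$folder$")):
    """Return True if the base name of `path` starts with an ignored
    prefix or ends with an ignored suffix.

    Hypothetical helper for illustration only; not an Arrow function.
    """
    name = PurePosixPath(path).name
    # str.startswith / str.endswith accept a tuple of candidates.
    return (name.startswith(tuple(ignore_prefixes))
            or name.endswith(tuple(ignore_suffixes)))


paths = [
    "data/part-0.parquet",
    "data/part-0.parquet.crc",   # checksum file
    "data/year=2020_$folder$",   # directory marker
    "data/_SUCCESS",             # leading underscore
    "data/.hidden",              # leading dot
]
kept = [p for p in paths if not should_ignore(p)]
```

Because only the base name is inspected, a file under a directory such as `_tmp/` would still be kept by this sketch. A callback-based filter could cover cases like that, at the cost of a larger API surface.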
Neal Richardson / @nealrichardson:
ARROW-4406 notes that:
This was fixed in pyarrow, but R still has this issue.
The R side does not seem to have something similar to:
def _should_silently_exclude(self, file_name):
    return (file_name.endswith('.crc') or  # Checksums
            file_name.endswith('_$folder$') or  # HDFS directories in S3
            file_name.startswith('.') or  # Hidden files starting with .
            file_name.startswith('_') or  # Hidden files starting with _
            file_name in EXCLUDED_PARQUET_PATHS)
Reporter: Bob Rudis
Assignee: Neal Richardson / @nealrichardson
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-15280. Please see the migration documentation for further details.