[Python] Different get_file_info behaviour between pyarrow.fs.S3FileSystem and s3fs #36983
Ok, so here is what the Python docstring says:
def get_file_info(self, paths_or_selector):
    """
    Get info for the given files.
    [...]
    Parameters
    ----------
    paths_or_selector : FileSelector, path-like or list of path-likes
        Either a selector object, a path-like object or a list of
        path-like objects. The selector's base directory will not be
        part of the results, even if it exists. If it doesn't exist,
        use `allow_not_found`.
    [...]
    """
Quoting the relevant part above:
So with [...]
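To make the documented contract concrete, here is a minimal plain-Python model of it. This is not pyarrow's implementation: it assumes the bucket is a flat dict mapping object paths to sizes, and `SelectorModel` and `list_selector` are hypothetical stand-ins for `FileSelector` and `get_file_info` (non-recursive case only).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SelectorModel:
    """Hypothetical stand-in for pyarrow.fs.FileSelector (non-recursive only)."""
    base_dir: str
    allow_not_found: bool = False


def list_selector(objects: Dict[str, int], selector: SelectorModel) -> List[str]:
    """List the direct children of base_dir, never base_dir itself.

    `objects` maps flat object paths (S3-style keys) to their sizes.
    """
    prefix = selector.base_dir.rstrip("/") + "/"
    # In a flat key space the base "exists" if any key lives under it,
    # including a zero-byte marker key equal to the prefix itself.
    if not any(path.startswith(prefix) for path in objects):
        if selector.allow_not_found:
            return []
        raise FileNotFoundError(selector.base_dir)
    children = set()
    for path in objects:
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):].rstrip("/")
        if not rest:
            # The base directory's own marker: excluded per the docstring.
            continue
        # Keep only the first path component below the base (non-recursive).
        children.add(prefix + rest.split("/", 1)[0])
    return sorted(children)
```

With `allow_not_found=True` a missing base yields an empty list instead of `FileNotFoundError`, and the `bucket/folder/` marker key never shows up in the results.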
Hmm, potentially related issue? #37555
The issue with the [...]
I guess we want the behaviour in [...]
This is tricky to work around, because a genuine 0-length file could have been created by the user. I would suggest simply ignoring the issue in the tests.
One thing to note: I believe that "directories" provide the ending [...]
You're right, that would probably work indeed.
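The trailing-slash idea from the last two comments can be sketched like this. It assumes raw listing entries are `(key, size)` pairs and that directory markers are zero-byte keys ending in `/`; both helper names are hypothetical. A size-only filter also throws away a genuine empty file, while the trailing-slash check keeps it.

```python
from typing import List, Tuple

Entry = Tuple[str, int]  # (object key, size in bytes)


def drop_empty_objects(entries: List[Entry]) -> List[Entry]:
    """Naive filter: drops every 0-length object, including genuine empty files."""
    return [(key, size) for key, size in entries if size != 0]


def drop_directory_markers(entries: List[Entry]) -> List[Entry]:
    """Drops only 0-length keys ending in '/', i.e. assumed directory markers."""
    return [(key, size) for key, size in entries
            if not (size == 0 and key.endswith("/"))]
```

For a listing such as `[("folder/", 0), ("folder/empty.txt", 0), ("folder/data.parquet", 42)]`, only `drop_directory_markers` keeps the user's `folder/empty.txt`.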
…fs.S3FileSystem and s3fs (#37768)
### What changes are included in this PR?
Update `test_get_file_info_with_selector` to check that the base directory is not included in the results for the `S3FileSystem` or `s3fs` filesystem. Also remove old comments and flexible checks.
* Closes: #36983
Lead-authored-by: AlenkaF <frim.alenka@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
I need to use s3fs as the filesystem in the `dataset` constructor due to performance considerations raised in #33169. However, when I try to do so I get the following stack trace:
This is due to inconsistencies in the result of `get_file_info` from the s3fs filesystem when mapped into a `PyFileSystem` vs the pyarrow S3 filesystem.
Code to demonstrate:
This 0-length file is created by S3 when the folder is created, and can be seen by calling:
I assume, though I cannot find where, that somewhere in the pyarrow S3 filesystem code these 0-length files are ignored. The filtering behaviour should be the same for fsspec filesystems.
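One way to picture the mismatch, and a possible workaround on the fsspec side, is the following sketch. A plain dict stands in for the bucket; `create_folder`, `list_detail` and `drop_base_entry` are hypothetical helpers (not boto3, s3fs or pyarrow calls), and the entry dicts only loosely mimic the `name`/`size` keys of fsspec's detailed listings. The raw listing contains the zero-byte `folder/` marker, and the filter removes only the entry naming the base directory, so genuine empty files survive.

```python
from typing import Any, Dict, List

Bucket = Dict[str, bytes]  # object key -> object body (stand-in for an S3 bucket)


def create_folder(bucket: Bucket, name: str) -> None:
    """Mimic creating a folder: store a zero-byte object whose key ends in '/'."""
    bucket[name.rstrip("/") + "/"] = b""


def list_detail(bucket: Bucket, base_dir: str) -> List[Dict[str, Any]]:
    """Raw prefix listing as detail dicts; the base's own marker is included."""
    prefix = base_dir.rstrip("/") + "/"
    return [{"name": key, "size": len(body)}
            for key, body in sorted(bucket.items()) if key.startswith(prefix)]


def drop_base_entry(base_dir: str, entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Remove only the entry naming base_dir itself; keep genuine empty files."""
    base = base_dir.rstrip("/")
    return [e for e in entries if e["name"].rstrip("/") != base]
```

After `create_folder(bucket, "folder")` plus a couple of real objects, `list_detail(bucket, "folder")` starts with the `folder/` marker of size 0, and `drop_base_entry("folder", ...)` returns just the objects under it, matching what the `FileSelector` docstring promises.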
Component(s)
Python