Listing files on S3 is slow due to the recursive nature of the algorithm.
The following change modifies the behavior of the S3Result to include all objects but no "grouping" (directories). This dramatically lowers the number of HTTP calls.
If you issue a list request with a delimiter, you can browse your hierarchy at only one level, skipping over and summarizing the (possibly millions of) keys nested at deeper levels.
My conclusion is that the delimiter's purpose is to reduce the number of files returned when you do not need to retrieve all the files. If we are doing a recursive listing then I think it is consistent with other projects and S3's intentions that we do not specify the delimiter.
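To illustrate the difference in request counts, here is a minimal sketch using a toy in-memory model of a bucket. It is not the Arrow implementation: `list_objects`, `recursive_listing`, `flat_listing` and the sample keys are hypothetical names invented for this example, and the model ignores pagination (real S3 returns at most 1000 keys per request, so a flat listing of N keys still needs roughly N/1000 requests).

```python
from typing import Iterable, List, Optional, Tuple


def list_objects(
    keys: Iterable[str], prefix: str = "", delimiter: Optional[str] = None
) -> Tuple[List[str], List[str]]:
    """Toy model of a single S3 list request (hypothetical, no pagination).

    Returns (contents, common_prefixes), mimicking how S3 groups keys
    that contain the delimiter after the prefix.
    """
    contents: List[str] = []
    common_prefixes = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Key is nested deeper: summarize it as a "directory".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common_prefixes)


def recursive_listing(keys: Iterable[str]) -> Tuple[List[str], int]:
    """List every object by walking one level at a time with a delimiter."""
    keys = list(keys)
    calls = 0
    found: List[str] = []
    pending = [""]  # prefixes still to visit
    while pending:
        prefix = pending.pop()
        contents, prefixes = list_objects(keys, prefix, "/")
        calls += 1  # one simulated HTTP request per visited prefix
        found.extend(contents)
        pending.extend(prefixes)
    return sorted(found), calls


def flat_listing(keys: Iterable[str]) -> Tuple[List[str], int]:
    """List every object with a single request and no delimiter."""
    contents, _ = list_objects(list(keys), "", None)
    return contents, 1


# Hypothetical bucket contents: 5 objects spread over 4 "directories".
KEYS = [
    "5.parquet",
    "a/1.parquet",
    "a/b/2.parquet",
    "a/b/c/3.parquet",
    "d/4.parquet",
]

if __name__ == "__main__":
    files_recursive, calls_recursive = recursive_listing(KEYS)
    files_flat, calls_flat = flat_listing(KEYS)
    print(calls_recursive, calls_flat)
    print(files_recursive == files_flat)
```

In this model the delimiter-based walk issues one request per directory level visited (the root plus each nested prefix), while the delimiter-free listing returns the same set of objects in a single request.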
The suggested change is to add an option to Selector, e.g. `no_directory_result` or something similar.
Reporter: Francois Saint-Jacques / @fsaintjacques
Related issues:
Note: This issue was originally created as ARROW-8884. Please see the migration documentation for further details.