Allow to pass exact file paths #5167

@bes1002t

Description

I want to sync just a few hundred specific files from my bucket to my local machine. The bucket contains about 500 000 files.

Issue1: The sync process takes a long time, because the AWS CLI checks every file, both in the bucket and locally, for changes.
Solution1: Use 'aws s3api list-objects' with a JMESPath query to list all files that have changed in the last 14 days.
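
The idea behind Solution1 can be sketched in Python as a client-side filter over the JSON that 'aws s3api list-objects-v2 --output json' prints. This is only an illustration: it swaps the JMESPath query for plain Python, the function name `recently_changed_keys` and the sample data are hypothetical, and it assumes every entry in `Contents` carries a `Key` and an ISO 8601 `LastModified` string with a UTC offset or a trailing `Z`.

```python
from datetime import datetime, timedelta, timezone


def recently_changed_keys(listing, now, days=14):
    """Return the keys of objects modified within the last `days` days.

    `listing` is a dict shaped like list-objects-v2 JSON output (assumption).
    `now` must be a timezone-aware datetime.
    """
    cutoff = now - timedelta(days=days)
    keys = []
    for obj in listing.get("Contents", []):
        # Older Python versions cannot parse a trailing "Z", so normalise it.
        stamp = obj["LastModified"].replace("Z", "+00:00")
        if datetime.fromisoformat(stamp) >= cutoff:
            keys.append(obj["Key"])
    return keys


# Hypothetical sample standing in for real CLI output.
sample = {
    "Contents": [
        {"Key": "reports/new.csv", "LastModified": "2020-04-20T10:00:00+00:00"},
        {"Key": "reports/old.csv", "LastModified": "2020-01-01T00:00:00.000Z"},
    ]
}
print(recently_changed_keys(sample, datetime(2020, 4, 25, tzinfo=timezone.utc)))
```

Filtering locally still requires listing the whole bucket once, but it avoids comparing every object against the local copy.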

Issue2: 'aws s3api list-objects' returns a list of file paths. How can I sync each of those paths?
Solution2: Pass each file path as '--include=<FILE_PATH>' to the 'aws s3 sync' command.
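
A minimal Python sketch of Solution2, assuming the keys are already known. It only builds the argument list and does not run anything; the helper name `build_sync_command` and the bucket and directory values are hypothetical. One detail worth noting: 'aws s3 sync' includes everything by default, so the '--include' filters only narrow the transfer when they follow an '--exclude "*"'.

```python
def build_sync_command(bucket_url, local_dir, keys):
    """Build an `aws s3 sync` argument list that transfers only `keys`.

    Keys are assumed to be relative to `bucket_url`, because the CLI
    evaluates filter patterns relative to the source location.
    """
    # Exclude everything first; filters that appear later take precedence,
    # so each --include re-admits one path.
    argv = ["aws", "s3", "sync", bucket_url, local_dir, "--exclude", "*"]
    for key in keys:
        argv += ["--include", key]
    return argv


# Hypothetical bucket, directory and keys.
command = build_sync_command("s3://my-bucket", "./data", ["a/1.txt", "b/2.txt"])
print(command)
```

The resulting list could be handed to `subprocess.run(command, check=True)`. Keys that contain `*`, `?` or `[` would still be interpreted as patterns, and every '--include' is still matched against every listed object, which is exactly the cost described in Issue3.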

Issue3: With a few hundred '--include' arguments the sync command takes a long time, because for each '--include', sync iterates over all files to check whether a file matches the pattern.
Solution3: ???

There are two feature requests I want to address here. Fortunately, a ticket for one of them was already opened a few days ago: #5160

The other feature request is a command line argument for 'aws s3 sync' that accepts exact paths instead of patterns. When a pattern is passed, sync has to iterate over all files, but when you are sure that you are passing only exact paths, it does not. This would greatly speed up syncing just a few files from a large bucket with many files.

An alternative would be to call the cp command for each file path, appending the file path to the bucket URL. But this approach has two issues:

  1. It is very slow, because the whole command is executed once per file, which includes setting up the connection to the AWS servers each time.
  2. cp does not check whether the file has changed and needs to be downloaded; it always downloads the file, which makes the script even slower.
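
The check that cp lacks (point 2) could be approximated in a wrapper script before each download. The Python sketch below is loosely modelled on the size and last-modified comparison that 'aws s3 sync' documents. It is not the CLI's actual implementation, the name `needs_download` is hypothetical, and the remote size and timestamp are assumed to come from an earlier listing call.

```python
import os
from datetime import datetime, timezone


def needs_download(local_path, remote_size, remote_modified):
    """Guess whether a remote object should be fetched again.

    `remote_modified` must be a timezone-aware datetime.
    """
    if not os.path.exists(local_path):
        return True
    info = os.stat(local_path)
    # A size mismatch means the local copy is stale or incomplete.
    if info.st_size != remote_size:
        return True
    local_modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    # Same size: fetch only if the remote object is newer than the local file.
    return remote_modified > local_modified
```

This only removes the unnecessary downloads; starting one cp process per file (point 1) remains as slow as before.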

Metadata

Assignees

No one assigned

    Labels

    closed-for-staleness
    feature-request: A feature should be added or improved.
    p3: This is a minor priority issue
    response-requested: Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.
    s3
    s3sync
