andyil/s3shutil

Pythonic copy, move and sync for Amazon S3

s3shutil is the easiest-to-use and fastest way to move directories and files around in S3.

December 2023: New! sync operation

The sync operation incrementally copies to the destination the files that were added to the source since the last copy. It supports all directions: S3 to S3, S3 to local drive, and local drive to S3.

Installation

We recommend installing from the official PyPI repository.

$ pip install s3shutil

Design Principles

  • A simple and intuitive string-based API.
  • Symmetric API: downloads and uploads work the same way.
  • Exposes powerful and performant one-liners.
  • Emulates the well-known shutil standard module API.
  • Uses performance boosts behind the scenes (multithreading, batching, server-to-server operations).
  • No dependencies except boto3.
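The "batching" principle can be illustrated with a minimal sketch. This is not s3shutil's actual code: it assumes a boto3-style S3 client, whose `delete_objects` call accepts up to 1000 keys per request, and the helper names `chunked` and `delete_keys` are hypothetical.

```python
from typing import Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def delete_keys(client, bucket: str, keys: Iterable[str], batch_size: int = 1000) -> int:
    """Delete keys in batches and return the number of requests sent.

    `client` is assumed to behave like boto3.client("s3") (hypothetical usage,
    not taken from s3shutil's source).
    """
    requests = 0
    for batch in chunked(keys, batch_size):
        # One request removes the whole batch instead of one request per key.
        client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": key} for key in batch], "Quiet": True},
        )
        requests += 1
    return requests
```

With 2500 keys and the default batch size, this sketch sends three requests rather than 2500.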

Similar Projects

aws-cli is amazing if you need to run from the command line or a shell script, but it does not expose a Python API.

s3fs is solid and has amassed a sizeable community. s3fs exposes an object-oriented API, whereas s3shutil exposes a simple string-oriented API that enables really simple one-liners. Another advantage of s3shutil is that it is symmetric and agnostic to the direction in which data is moving: with s3shutil, uploading, downloading, and syncing files all work the same way.

You can configure bucket lifecycle rules to mass-delete objects, but that is more of a governance feature than a library you can run inside your code, following your own logic and timing.

S3 replication is good for copying new files, but it will not retroactively copy existing objects.

Using s3shutil

s3shutil uses boto3 internally and we assume you have your credentials set up properly.

Using s3shutil is super easy:

Import is mandatory, no surprises here:

import s3shutil

Then you can do powerful things with simple one-liners:

# download a tree from s3
s3shutil.copytree('s3://bucket/my/path', '/home/myuser/files/')

# upload a tree to s3
s3shutil.copytree('/home/users/pics/', 's3://bucket/path/archive/')

# copy between two s3 locations
# same or different bucket
s3shutil.copytree('s3://bucket2/files/someth/', 's3://bucket1/backup/old/')

# delete (recursively) entire prefix
s3shutil.rmtree('s3://bucket/my-files/documents/')
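For comparison, here is how the equivalent operations look on local paths with the standard `shutil` module that s3shutil emulates. The directory and file names are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path

# Build a small throwaway source tree.
root = Path(tempfile.mkdtemp())
src = root / "pics"
(src / "2023").mkdir(parents=True)
(src / "2023" / "a.txt").write_text("hello")

# Local counterpart of s3shutil.copytree(src, dst).
dst = root / "archive"
shutil.copytree(src, dst)
copied = sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file())

# Local counterpart of s3shutil.rmtree(path).
shutil.rmtree(src)
```

The call shapes match the one-liners above; with s3shutil, either argument may be an `s3://` location instead of a local path.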

Just released! (December 2023), tree_sync operation:

It copies only the files that are missing in the destination. It also deletes extra files in the destination.

# sync download
s3shutil.tree_sync('s3://bucket/files/docs/', '/home/myuser/docs')

# sync upload
s3shutil.tree_sync('/home/myuser/files/', 's3://bucket/files/docs-v2/')

# sync two bucket locations
s3shutil.tree_sync('s3://bucket/files/docs/', 's3://bucket2/a/b/c')
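The sync semantics (copy what is missing, delete what is extra) can be sketched for two local directories. This is an illustration only, not s3shutil's implementation: the function names `list_files` and `sync_local` are hypothetical, and the sketch compares files by relative path alone, which is an assumption, since this README does not say how s3shutil decides that a file is missing.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Set, Tuple


def list_files(root: Path) -> Set[str]:
    """Return the relative POSIX paths of all files under `root`."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def sync_local(src: Path, dst: Path) -> Tuple[Set[str], Set[str]]:
    """Make `dst` hold the same set of relative paths as `src`.

    Returns (copied, deleted). Files present on both sides are left untouched
    (assumption: comparison is by path only).
    """
    dst.mkdir(parents=True, exist_ok=True)
    src_files, dst_files = list_files(src), list_files(dst)
    missing = src_files - dst_files
    extra = dst_files - src_files
    for rel in missing:
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src / rel, target)
    for rel in extra:
        (dst / rel).unlink()
    return missing, extra


# Demo on throwaway directories.
root = Path(tempfile.mkdtemp())
src, dst = root / "src", root / "dst"
(src / "sub").mkdir(parents=True)
dst.mkdir()
(src / "a.txt").write_text("a")
(src / "sub" / "b.txt").write_text("b")
(dst / "a.txt").write_text("old")
(dst / "stale.txt").write_text("x")

copied, deleted = sync_local(src, dst)
```

In the demo, only `sub/b.txt` is copied and only `stale.txt` is deleted; `a.txt` already exists in the destination and is not touched.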

Conclusions

  • s3shutil detects on its own whether a location is in S3 (it starts with s3://) or not.
  • All operations share a similar string-based API of powerful one-liners.
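The `s3://` prefix check can be sketched as follows. The helper names `is_s3` and `split_s3_url` are hypothetical and are not part of s3shutil's documented API; they only show how an `s3://bucket/key` string breaks down.

```python
from typing import Tuple

S3_PREFIX = "s3://"


def is_s3(location: str) -> bool:
    """True if the string names an S3 location rather than a local path."""
    return location.startswith(S3_PREFIX)


def split_s3_url(location: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    if not is_s3(location):
        raise ValueError("not an S3 location: %r" % location)
    # Plain string partitioning keeps characters such as '?' or '#' in the key.
    bucket, _, key = location[len(S3_PREFIX):].partition("/")
    return bucket, key
```

For example, `split_s3_url('s3://bucket/my/path')` gives `('bucket', 'my/path')`, while `is_s3('/home/myuser/files/')` is `False`.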

Test Matrix

s3shutil is thoroughly unit tested in all the combinations of:

Python Versions:

  • 3.12
  • 3.11
  • 3.10
  • 3.9
  • 3.8
  • 3.7

And boto3 Versions:

  • 1.33
  • 1.30
  • 1.28
  • 1.27
  • 1.26
  • 1.25
  • 1.24
  • 1.23
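To give a sense of the size of this matrix, the combinations can be enumerated with a short sketch. The variable names are illustrative; how the project's CI actually builds the matrix is not shown in this README.

```python
from itertools import product

python_versions = ["3.12", "3.11", "3.10", "3.9", "3.8", "3.7"]
boto3_versions = ["1.33", "1.30", "1.28", "1.27", "1.26", "1.25", "1.24", "1.23"]

# Every (Python, boto3) pair from the two lists above.
matrix = list(product(python_versions, boto3_versions))
print(len(matrix))  # 6 * 8 = 48 combinations
```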

Contact me

Just use it! You can also send an email to andyworms@gmail.com. All emails are (eventually) answered. Also read the code, fork, open a PR, or start a discussion.