s3rocket uploads and downloads large volumes of streaming data to and from Amazon S3 very quickly. It is well suited to uploading large directory trees or data that streams out of a process.
You don't need to know the size of the data in advance in order to upload it. The data is split among S3 objects of a fixed size and is uploaded on the fly in multiple threads.
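The idea of cutting a stream of unknown length into fixed-size blocks can be illustrated with the standard split utility. This is only a sketch: s3rocket buffers blocks in memory and uploads them from several threads, whereas the example writes files to a temporary directory. The chunk_stream function and the "block." naming are made up for illustration and say nothing about how s3rocket names its objects.

```shell
# Hypothetical helper: split stdin into fixed-size blocks without knowing
# the total size up front. $1 is the block size in bytes, $2 an existing
# output directory.
chunk_stream() {
    block_size=$1
    out_dir=$2
    # split reads stdin ('-') until EOF and starts a new file every
    # block_size bytes; the last block may be shorter.
    split -b "$block_size" - "$out_dir/block."
}

# Example: a 10-byte stream cut into 4-byte blocks yields three files
# (4 + 4 + 2 bytes); concatenating them in name order restores the stream.
demo_dir=$(mktemp -d)
printf '0123456789' | chunk_stream 4 "$demo_dir"
ls "$demo_dir"
cat "$demo_dir"/block.*
rm -rf "$demo_dir"
```

Because each block is complete as soon as block_size bytes have arrived, work on it can start before the rest of the stream exists, which is what makes uploading on the fly possible.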
It is overkill for small files. If you're not transferring at least several gigabytes of data at a time, this is probably not for you. This tool was designed to upload or download ~50 GB of data spread across ~100,000 files to or from an EC2 instance in 10 minutes or less.
s3rocket is a victim of the GNU GPLv3 virus. See the file COPYING for more information.
libs3 >= 2.0
You can get it at http://libs3.ischo.com.s3.amazonaws.com/index.html
I recommend building libs3 from source.
Make sure you have header files installed for libcurl. On some systems, you will need to install a separate "-dev" or "-devel" package. If you have a 'curl-config' command, you're probably fine.
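A quick way to run the check described above is to look for curl-config on the PATH before building. This is a minimal sketch; the have_cmd helper is a made-up name, and finding curl-config is a good hint, not a guarantee, that the headers are usable.

```shell
# Hypothetical helper: succeeds if the named command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd curl-config; then
    # curl-config ships with the libcurl development files.
    echo "curl-config found: $(curl-config --version)"
else
    echo "curl-config not found; install the libcurl -dev or -devel package"
fi
```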
gnutls, gcrypt (linux only)
libcurl has an unfortunate tendency to depend on gnutls and gcrypt on some Linux systems. Even worse, gnutls blows up in multi-threaded apps unless you initialize it very carefully.
The s3rocket makefile assumes that this initialization is needed on Linux, since popular distributions seem to require it.
As a result, you'll need dev packages for gnutls and gcrypt as well.
If you want to disable the gnutls/gcrypt initialization, perhaps because you know that your libcurl uses an alternative SSL library or because you don't intend to use SSL, you can do so by editing the makefile.
$ make && sudo make install
Usage: s3rocket [OPTION]... put FILE S3BUCKET[:PREFIX]
         copy a file or stream to S3BUCKET. Use '-' to read data from stdin
   or: s3rocket [OPTION]... get S3BUCKET[:PREFIX] FILE
         copy data uploaded using s3rocket from S3BUCKET to FILE. Use '-' to
         place data on stdout.

S3BUCKET[:PREFIX] addresses a single unit of data uploaded by s3rocket
  S3BUCKET is the bucket name on S3.
  PREFIX is an optional prefix that should be prepended to object names in
  that bucket.

Environment:
  S3_ACCESS_KEY_ID     : S3 access key ID (required)
  S3_SECRET_ACCESS_KEY : S3 secret access key (required)
  S3_HOSTNAME          : specify alternative S3 host (optional)

Options:
  -j [N], --jobs=[N]              use up to N threads (default=10)
  -b [SIZE], --block-size=[SIZE]  set the block size for uploads (default=64M)
  -v, --verbose                   print verbose log information
  -u, --unencrypted               use HTTP instead of HTTPS
  --help                          display help and exit

Performance Notes:
  The performance of s3rocket depends on large in-memory buffers to smooth
  out inconsistent performance of transfers to and from S3 without causing
  extra disk I/O. Plan to make available (block-size * jobs) bytes of memory
  for the duration of the transfer.

  On fast EC2 machines with 10gbe connections, unencrypted, jobs=20,
  block-size=64M seems to give the best throughput. In most cases, the
  bottleneck is going to be the disk that you are reading from or writing
  to, or the process consuming/producing the data.
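The memory rule in the performance notes is plain multiplication, so it is easy to work out before picking -j and -b. The sketch below uses a made-up helper, buffer_needed, and keeps the result in the same unit as the block size, so passing 64 for "-b 64M" gives an answer in "M".

```shell
# Hypothetical helper: memory to set aside for the transfer, expressed in
# the same unit as the block size. $1 is the number of jobs, $2 the block
# size.
buffer_needed() {
    n_jobs=$1
    block_size=$2
    echo $((n_jobs * block_size))
}

buffer_needed 10 64   # defaults (-j 10 -b 64M): prints 640
buffer_needed 20 64   # -j 20 -b 64M: prints 1280
```

With the default settings that comes to 640M of buffers, and the 20-job configuration mentioned above needs twice that.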
Upload a directory tree to S3
This example disables HTTPS, splits the stream into 64 MB blocks, and uploads
in 20 threads. Objects are placed in the bucket "test" with the prefix "foo".
tar cf - my_directory | s3rocket -u -j 20 -b 64M put - test:foo
Download a directory tree from S3
This example disables HTTPS and downloads in 20 threads.
s3rocket -u -j 20 get test:foo - | tar xf -
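Both examples assume that the required variables from the Environment section are set. A small guard like the one below can catch a missing credential before a long pipeline starts. It is a sketch only: check_s3_env is a made-up name, and it only tests that the variables are non-empty in the current shell, not that they are exported or valid.

```shell
# Hypothetical guard: succeeds only if both required variables are
# non-empty; reports each missing one on stderr.
check_s3_env() {
    missing=0
    for name in S3_ACCESS_KEY_ID S3_SECRET_ACCESS_KEY; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage sketch (not run here): start the transfer only if the check passes.
# check_s3_env && tar cf - my_directory | s3rocket -u -j 20 -b 64M put - test:foo
```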