Deprecated / Unsupported

This repository is no longer under active development or support. It has been deprecated in favor of https://github.com/chanzuckerberg/s3parcp.

S3MI

Transfer big files fast between S3 and EC2.

Pronounced semi.

INSTALLATION

pip install 'git+https://github.com/chanzuckerberg/s3mi.git'

COMMANDS

`s3mi cp s3://huge_file destination`

- fast 2 GB/sec download from S3
- may be constrained by destination write bandwidth
  - when the destination is an EBS gp2 volume, write bandwidth is only 250 MB/sec
  - RAID can increase that to 1.75 GB/sec on select instance types [1]
  - suspected Linux kernel write limit 1.4 GB/sec [4]
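The limits above combine as a simple minimum: a copy can go no faster than the slower of the S3 download and the destination write. A minimal sketch of that arithmetic, using the figures quoted above. The function names and the 100 GB file size are illustrative assumptions, not part of s3mi.

```python
def effective_rate_mb_per_sec(download_mb_per_sec, write_mb_per_sec):
    """The copy is limited by the slower of the two stages."""
    return min(download_mb_per_sec, write_mb_per_sec)


def transfer_seconds(size_gb, download_mb_per_sec, write_mb_per_sec):
    """Rough transfer time, treating 1 GB as 1000 MB and ignoring startup cost."""
    rate = effective_rate_mb_per_sec(download_mb_per_sec, write_mb_per_sec)
    return size_gb * 1000 / rate


# Figures from the list above: 2 GB/sec from S3, 250 MB/sec to one gp2 volume,
# 1.75 GB/sec to a RAID array on select instance types.
# The 100 GB file size is a made-up example.
single_volume = transfer_seconds(100, 2000, 250)   # 400.0 seconds
raid_array = transfer_seconds(100, 2000, 1750)     # about 57 seconds
```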

`s3mi cat s3://huge_file | some_command`

- use cases
  - expand uncompressed or lz4-compressed archives

    `s3mi cat s3://gigabytes.tar | tar xf -`
  - stream through a really fast computation

    `s3mi cat s3://gigabytes_of_text | wc -l`
- in AWS, use lz4 instead of gzip or bzip2 [5]
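The same streaming pattern can be driven from a script. The sketch below is an illustration, not part of s3mi: `count_lines` is a hypothetical helper that runs any command and counts newlines in its output chunk by chunk, so the data never has to fit in memory or on disk. It does roughly what piping into `wc -l` does.

```python
import subprocess


def count_lines(command, chunk_size=1 << 20):
    """Run `command` and count newline bytes in its stdout, one chunk at a time."""
    lines = 0
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            lines += chunk.count(b"\n")
    # Leaving the `with` block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"{command[0]} exited with status {proc.returncode}")
    return lines
```

With s3mi installed, a call such as `count_lines(["s3mi", "cat", "s3://gigabytes_of_text"])` would mirror the `wc -l` example, assuming the S3 path exists.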

`s3mi raid array-name [number-of-slices] [slice-size]`

Use RAID 0 over EBS volumes to overcome destination bandwidth limits.

Example:

s3mi raid my_raid 3 668

Creates 3 x 668 GB EBS gp2 volumes, RAIDs those together, and mounts on /mnt/my_raid. The my_raid identifier must be unique across all your instances, and its slices will be named my_raid_3_{0..2}.
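To make the naming and sizing concrete, here is a small sketch. `slice_names` and `total_capacity_gb` are hypothetical helpers; the naming pattern is inferred from the my_raid_3_{0..2} example above, and RAID 0 capacity is taken as the plain sum of the slices, ignoring filesystem overhead.

```python
def slice_names(array_name, number_of_slices):
    """Names inferred from the example: <array>_<slice count>_<slice index>."""
    return [f"{array_name}_{number_of_slices}_{i}" for i in range(number_of_slices)]


def total_capacity_gb(number_of_slices, slice_size_gb):
    """RAID 0 stripes across all slices, so raw capacity is their sum."""
    return number_of_slices * slice_size_gb


print(slice_names("my_raid", 3))   # ['my_raid_3_0', 'my_raid_3_1', 'my_raid_3_2']
print(total_capacity_gb(3, 668))   # 2004
```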

Lifecycle:

Depending on the value of the instance DeleteOnTermination attribute, the RAID volumes may be deleted when the instance terminates, or they may persist and remain available to be mounted again on another instance under the special device /dev/md127.

Optimal RAID configuration:

The ideal number-of-slices is the per-instance EBS bandwidth limit [1] divided by the per-volume EBS bandwidth limit [2].

The slice-size must be large enough for the full per-volume bandwidth to remain available even after the volume's initial credits have been exhausted [2].

In Dec 2018, the ideal settings for a c5.9xlarge with gp2 EBS are
```
* number-of-slices = 3

* slice-size >= 334 GB   (increase this if you need more IOPS or space)
```
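The slice-count rule above reduces to a division. A minimal sketch, assuming the 250 MB/sec per-volume gp2 figure quoted in this README. Rounding down to whole volumes is an assumption, and the 875 MB/sec per-instance limit is an input invented for illustration, so check [1] for the real value of your instance type. The helper name is hypothetical.

```python
GP2_VOLUME_MB_PER_SEC = 250    # per-volume limit quoted in this README


def ideal_number_of_slices(instance_mb_per_sec,
                           volume_mb_per_sec=GP2_VOLUME_MB_PER_SEC):
    """Whole volumes that fit under the per-instance EBS bandwidth limit."""
    return max(1, int(instance_mb_per_sec // volume_mb_per_sec))


# 1.75 GB/sec is the RAID ceiling quoted earlier for select instance types.
print(ideal_number_of_slices(1750))   # 7
# 875 MB/sec is an assumed per-instance limit, for illustration only.
print(ideal_number_of_slices(875))    # 3
```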

`s3mi raid array-name <block_device> <block_device> ...`

Use RAID 0 with instance NVMe devices. For example,

s3mi raid my_raid /dev/nvme{1..8}n1

will create RAID 0 from the 8 slices /dev/nvme{1..8}n1, and mount it on /mnt/my_raid.
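The `{1..8}` in the example is shell brace expansion, so s3mi receives eight separate device arguments. A short sketch of the equivalent list follows; `nvme_devices` is a hypothetical helper, and the existence check only tests whether each path is present on the machine where the script runs.

```python
import os


def nvme_devices(first, last):
    """Device paths matching the shell pattern /dev/nvme{first..last}n1."""
    return [f"/dev/nvme{i}n1" for i in range(first, last + 1)]


devices = nvme_devices(1, 8)
print(len(devices), devices[0], devices[-1])   # 8 /dev/nvme1n1 /dev/nvme8n1

# Report which of the expected devices are missing on this instance.
missing = [d for d in devices if not os.path.exists(d)]
print("missing:", missing)
```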

As of 2018, this is especially useful on i3.metal instances.

To see what devices exist on the instance, try lsblk.

Any pre-existing data on the devices will be lost.

`s3mi tweak-vm`

Configure VM parameters to delay the onset of synchronous (slow) I/O. This helps write operations complete faster through more aggressive caching.
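This README does not list which parameters `s3mi tweak-vm` changes. As an assumption based on the references below, the sketch reads the two Linux writeback settings they discuss, vm.dirty_ratio and vm.dirty_background_ratio, from /proc/sys/vm so they can be inspected before and after running the command. `read_vm_settings` is a hypothetical helper.

```python
import os

# Assumed to be the relevant settings; taken from the references, not from s3mi.
SETTINGS = ("dirty_ratio", "dirty_background_ratio")


def read_vm_settings(base_dir="/proc/sys/vm", names=SETTINGS):
    """Return {name: int value} for each setting file present under base_dir."""
    values = {}
    for name in names:
        path = os.path.join(base_dir, name)
        if os.path.exists(path):
            with open(path) as handle:
                values[name] = int(handle.read().strip())
    return values


print(read_vm_settings())   # values depend on the machine; empty off Linux
```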

REFERENCES

[1] Per-instance EBS bandwidth limits: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-ec2-config.html

[2] Per-volume EBS bandwidth limits: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

[3] Better Linux Disk Caching & Performance With vm.dirty_ratio & vm.dirty_background_ratio: https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/

[4] Toward Less Annoying Background Writeback: https://lwn.net/Articles/682582/

[5] Faster compression algorithms with a lower compression ratio, like lz4, perform far better overall in AWS than gzip and bzip2, because each compute node has so much S3 bandwidth available. Contrast this with a university compute cluster, where filer bandwidth per CPU may be severely constrained, making bzip2 more appropriate. For this reason, s3mi adds no value with bzip2 or gzip data, but it substantially accelerates work with lz4 or uncompressed data in the AWS Elastic Compute Cloud.
