GitHub - awslabs/aws-crt-s3-benchmarks: Benchmarking for multiple AWS S3 libraries.

aws-crt-s3-benchmarks

This project is for benchmarking different S3 workloads using various languages and S3 clients.

This project is under active development and subject to change.

Running Benchmarks

Requirements

To start:
- Python 3.9+ with pip
On Amazon Linux 2023, a script is provided to install further tools. Otherwise, depending on the language you want to benchmark, you'll need:
- CMake 3.22+
- C99 / C++20 compiler (e.g. gcc, clang)
- JDK17+ (e.g. corretto, openjdk)
- Maven
- Python C extension headers and libraries (e.g. python3-devel)

To benchmark ALL the workloads, your machine needs 300+ GiB of disk space available, and fast enough internet to upload a terabyte to S3 within your lifetime. But if you're only running 1 workload, you'll upload fewer files and use less disk space.

Your machine must have AWS credentials, with permission to read and write to an S3 bucket.

Get Started

First, clone this repo.

Then install the requirements listed above. On Amazon Linux 2023, you can simply run this script:

sudo ./aws-crt-s3-benchmarks/scripts/install-tools-AL2023.py

Then, install packages needed by the python scripts:

python3 -m pip install -r aws-crt-s3-benchmarks/scripts/requirements.txt

Prepare S3 Files

Next, run scripts/prep-s3-files.py. This script creates and configures an S3 bucket, put files in S3 for benchmarks to download, and create files on disk for benchmarks to upload:

usage: prep-s3-files.py [-h] --bucket BUCKET --region REGION --files-dir FILES_DIR
                        [--workloads WORKLOADS [WORKLOADS ...]]

Create files (on disk, and in S3 bucket) needed to run the benchmarks

optional arguments:
  -h, --help            show this help message and exit
  --bucket BUCKET       S3 bucket (will be created if necessary)
  --region REGION       AWS region (e.g. us-west-2)
  --files-dir FILES_DIR
                        Root directory for files to upload and download (e.g. ~/files)
  --workloads WORKLOADS [WORKLOADS ...]
                        Path to specific workload.run.json file. If not specified,
                        everything in workloads/ is prepared (uploading
                        100+ GiB to S3 and creating 100+ GiB on disk).

This script can be run repeatedly. It skips unnecessary work (e.g. won't upload a file that already exists).

S3 Clients

Here are the IDs used for various S3 Clients, and the runner you must build to benchmark them:

S3_CLIENT	Actual S3 Client Used	Language	Benchmark Runner
`crt-c`	aws-c-s3	`c`	runners/s3-benchrunner-c
`crt-python`	aws-crt-python	`python`	runners/s3-benchrunner-python
`boto3-crt`	boto3 using CRT	`python`	runners/s3-benchrunner-python
`boto3-classic`	boto3 with pure-python transfer manager	`python`	runners/s3-benchrunner-python
`cli-crt`	AWS CLI v2 using CRT	`python`	runners/s3-benchrunner-python
`cli-classic`	AWS CLI v2 with pure-python transfer manager	`python`	runners/s3-benchrunner-python
`crt-java`	aws-crt-java	`java`	runners/s3-benchrunner-java
`sdk-java-client-crt`	aws-sdk-java-v2 with CRT based S3AsyncClient	`java`	runners/s3-benchrunner-java
`sdk-java-tm-crt`	aws-sdk-java-v2 with CRT based S3TransferManager	`java`	runners/s3-benchrunner-java
`sdk-java-client-classic`	aws-sdk-java-v2 with NON-CRT based S3AsyncClient	`java`	runners/s3-benchrunner-java
`sdk-java-tm-classic`	aws-sdk-java-v2 with NON-CRT based S3TransferManager	`java`	runners/s3-benchrunner-java

Build a Runner

You must build a "runner" for the S3 client you'll be benchmarking. For example, build runners/s3-benchrunner-python to benchmark aws-crt-python, boto3, or AWS CLI.

Run scripts/build-runner.py:

usage: build-runner.py [-h] --lang {c,python,java} --build-dir BUILD_DIR [--branch BRANCH]

Build a runner and its dependencies

optional arguments:
  -h, --help            show this help message and exit
  --lang {c,python,java}
                        Build s3-benchrunner-<lang>
  --build-dir BUILD_DIR
                        Root dir for build artifacts
  --branch BRANCH       Git branch/commit/tag to use when pulling dependencies

The last line of output from build-runner.py displays the RUNNER_CMD you'll need in the next step.

NOTE: Each runner has a README.md with more advanced instructions. build-runner.py isn't meant to handle advanced use cases like tweaking dependencies, iterating locally, DEBUG builds, etc.

Run a Benchmark

All runners have the same command line interface, and expect to be run from the FILES_DIR you passed to the prep-s3-files.py script.

cd FILES_DIR

RUNNER_CMD S3_CLIENT WORKLOAD BUCKET REGION TARGET_THROUGHPUT

S3_CLIENT: ID of S3 client to use (See table above)
RUNNER_CMD: Command to launch runner (e.g. java -jar path/to/runner.jar) This is the last line printed by build-runner.py in the previous step.
WORKLOAD: Path to workload .run.json file (see: workloads/)
BUCKET: S3 bucket name (e.g. my-test-bucket)
REGION: AWS Region (e.g. us-west-2)
TARGET_THROUGHPUT: Target throughput, in gigabits per second. Floating point allowed. Enter the EC2 type's "Network Bandwidth (Gbps)" (e.g. "100.0" for c5n.18xlarge)

Most runners should search for AWS credentials something like this.

If you want to run multiple workloads (or ALL workloads) in one go, use this helper script: run-benchmarks.py.

Authoring New Workloads

See workloads/

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
.github/workflows		.github/workflows
cdk		cdk
runners		runners
scripts		scripts
workloads		workloads
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md

License

awslabs/aws-crt-s3-benchmarks

Folders and files

Latest commit

History

Repository files navigation

aws-crt-s3-benchmarks

Running Benchmarks

Requirements

Get Started

Prepare S3 Files

S3 Clients

Build a Runner

Run a Benchmark

Authoring New Workloads

Security

License

About

Topics

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Languages