# Learn S3 transfer configuration with Syne Tune
In this repo, we show how to tune and learn a good configuration for the Boto3 `download_file` function, using Bayesian optimization from the open-source Syne Tune library (https://github.com/awslabs/syne-tune).
This notebook was developed on the `conda_python3` kernel of an ml.m5d.12xlarge SageMaker Notebook instance. However, you can run it anywhere you (1) have connectivity and permissions to use Amazon S3 and (2) can install the associated dependencies. **You do NOT need Jupyter notebooks or Amazon SageMaker to use Syne Tune**. They are used here as a convenience to support the example.

In [None]:
# Install dependencies
! pip install -r requirements.txt

# Make a big file
In this example we tune the S3 download of a large test file (10 GiB, created with `fallocate` below).

In [None]:
file_name = 'random_file.txt'  # name of the test file, also used as the S3 key
file_path = '/home/ec2-user/SageMaker/'  # local directory that receives the downloads

In [None]:
! fallocate -l 10GiB $file_name
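
The `fallocate` command is a Linux utility. Where it is not available, a file of the same size can be written from Python instead. The sketch below is an illustration and not part of the original notebook: the helper name `make_random_file` and the 64 MiB chunk size are assumptions. Unlike `fallocate`, it fills the file with random bytes from `os.urandom`, so it takes noticeably longer to create a large file.

```python
import os
import tempfile
from pathlib import Path


def make_random_file(path, size_bytes, chunk_bytes=64 * 1024 * 1024):
    """Write `size_bytes` of random data to `path` in chunks (hypothetical helper)."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_bytes, remaining)
            f.write(os.urandom(n))  # random payload, written chunk by chunk
            remaining -= n
    return Path(path)


# Small demonstration in a throwaway directory (1 MiB only).
# For the file used in this notebook, pass size_bytes=10 * 1024**3.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = make_random_file(Path(tmp_dir) / "sample.bin", 1024 * 1024)
    sample_size = sample.stat().st_size
    print(sample_size)
```

Writing in chunks keeps memory use bounded by `chunk_bytes` regardless of the target size.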

In [None]:
# We will tune downloads from this bucket
bucket = '<enter an S3 bucket here>'

In [None]:
%%time

# send the random file to the S3 bucket
! aws s3 cp $file_name "s3://$bucket/$file_name"

# Test default download speed

In [None]:
from pathlib import Path
import tempfile
import time

import boto3

s3 = boto3.resource('s3')

durations = []
for _ in range(3):
    # Download into a fresh temporary directory so runs never overwrite each other
    with tempfile.TemporaryDirectory(dir=file_path) as local_path:
        t1 = time.perf_counter()
        s3.Object(bucket_name=bucket, key=file_name).download_file(
            Filename=str(Path(local_path) / file_name)
        )
        # Stop the clock before the temporary directory is cleaned up
        duration = time.perf_counter() - t1

    print(duration)
    durations.append(duration)

print(f"avg: {sum(durations)/len(durations)}")

# Run the tuner locally

In [None]:
%%time

! python launcher.py \
    --bucket $bucket \
    --key $file_name \
    --file_path $file_path \
    --file_name $file_name \
    --init boto3_defaults \
    --n_downloads 3 \
    --search bayes_fifo \
    --max_tuning_time 12000
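
To give a feel for what a tuning loop does, here is a minimal sketch of random search over transfer settings. It is deliberately not the method used above: the notebook runs Bayesian optimization through Syne Tune, while random search is a simpler baseline swapped in here for illustration. The ranges in `SEARCH_SPACE` are hypothetical and are not taken from `launcher.py`, and `toy_objective` is an arbitrary stand-in for "mean download time" that says nothing about real S3 behavior.

```python
import math
import random

# Hypothetical (low, high) integer ranges, sampled log-uniformly.
SEARCH_SPACE = {
    "max_concurrency": (1, 512),
    "max_io_queue": (1, 1000),
    "io_chunksize": (64 * 1024, 16 * 1024**2),
    "multipart_chunksize": (5 * 1024**2, 1024**3),
}


def sample_config(rng):
    """Draw one configuration, each value log-uniform within its range."""
    config = {}
    for name, (low, high) in SEARCH_SPACE.items():
        value = round(math.exp(rng.uniform(math.log(low), math.log(high))))
        config[name] = min(high, max(low, value))  # guard against rounding drift
    return config


def random_search(objective, n_trials, seed=0):
    """Evaluate `n_trials` random configs; return (best_trial, all_trials)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((objective(config), config))
    return min(trials, key=lambda trial: trial[0]), trials


def toy_objective(config):
    """Arbitrary stand-in for a measured download time (lower is better)."""
    return abs(math.log(config["max_concurrency"] / 64)) + abs(
        math.log(config["multipart_chunksize"] / (64 * 1024**2))
    )


(best_value, best_config), trials = random_search(toy_objective, n_trials=20)
print(best_value, best_config)
```

In a real run, the objective would time actual downloads with the sampled settings, as the cells in this notebook do.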

# Analyze results
While the tuner is running, you can open the notebook `Evaluation.ipynb` to follow tuning progress in real time.

# Test best config
You can test a specific `TransferConfig` below. The example is a good configuration found by Syne Tune on a 5 GiB file.

In [None]:
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    max_io_queue=100,
    max_concurrency=317,
    io_chunksize=7271130,
    multipart_chunksize=864349587,
)

durations = []
for _ in range(5):
    # Download into a fresh temporary directory so runs never overwrite each other
    with tempfile.TemporaryDirectory(dir=file_path) as local_path:
        t1 = time.perf_counter()
        s3.Object(bucket_name=bucket, key=file_name).download_file(
            Filename=str(Path(local_path) / file_name),
            Config=config
        )
        # Stop the clock before the temporary directory is cleaned up
        duration = time.perf_counter() - t1

    print(duration)
    durations.append(duration)

print(f"avg: {sum(durations)/len(durations)}")
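
The byte values in the example configuration are easier to read in MiB. The sketch below converts them and estimates how many chunks a file is split into for a given `multipart_chunksize`. It assumes that a large download is fetched in pieces of at most `multipart_chunksize` bytes, which is our reading of the transfer behavior rather than something stated in this notebook. `describe_config` is a hypothetical helper.

```python
import math

MIB = 1024**2


def describe_config(file_size_bytes, multipart_chunksize, io_chunksize):
    """Express chunk sizes in MiB and count chunks per file (hypothetical helper)."""
    return {
        "multipart_chunksize_mib": multipart_chunksize / MIB,
        "io_chunksize_mib": io_chunksize / MIB,
        # Assumption: one chunk per `multipart_chunksize` bytes, last one partial
        "n_chunks": math.ceil(file_size_bytes / multipart_chunksize),
    }


# Values from the example configuration, applied to a 5 GiB file.
info = describe_config(
    file_size_bytes=5 * 1024**3,
    multipart_chunksize=864349587,
    io_chunksize=7271130,
)
print(info)
```

Under this assumption, the example `multipart_chunksize` is about 824 MiB, which splits a 5 GiB file into 7 chunks.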