# **S3**


# Basic operations

https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

To use Boto3, you must first import it and indicate which service or services you’re going to use:

In [2]:
import boto3

# Let's use Amazon S3
s3 = boto3.resource("s3")
display(s3)

s3.ServiceResource()

Now that you have an `s3` resource, you can send requests to the service. The following code uses the `buckets` collection to print out all bucket names:

In [1]:
import boto3

# Print out bucket names
s3 = boto3.resource("s3")
for bucket in s3.buckets.all():
    print(bucket.name)

pepper-bucket
pepper-labs-fruits


You can also upload and download binary data. For example, the following uploads a new file to S3, assuming that the bucket `pepper-labs-fruits` already exists:

In [4]:
import os
from pepper.env import get_project_dir
project_dir = get_project_dir()
apple_1_path = os.path.join(project_dir, "data", "im", "sample_300", "Apple Braeburn", "91_100.jpg")
print(apple_1_path)

C:\Users\franc\Projects\pepper_cloud_based_model\data\im\sample_300\Apple Braeburn\91_100.jpg


In [5]:
# Upload a new file
apple_1 = open(apple_1_path, "rb")

In [6]:
s3.Bucket("pepper-labs-fruits").put_object(Key="Apple Braeburn/91_100.jpg", Body=apple_1)

s3.Object(bucket_name='pepper-labs-fruits', key='Apple Braeburn/91_100.jpg')

In [None]:
# Upload a new file, closing the file handle once the upload completes
with open("test.jpg", "rb") as data:
    s3.Bucket("pepper-labs-fruits").put_object(Key="test.jpg", Body=data)

# More advanced examples

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-examples.html

## [Uploading files](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html)

The AWS SDK for Python provides a pair of methods to upload a file to an S3 bucket.

The `upload_file` method accepts a file name, a bucket name, and an object name. The method handles large files by splitting them into smaller chunks and uploading each chunk in parallel.

In [10]:
import logging
import os

import boto3
from boto3.exceptions import S3UploadFailedError
from botocore.exceptions import ClientError


def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then the base name of file_name is used
    :return: True if file was uploaded, else False
    """

    # If S3 object_name was not specified, use the base name of file_name
    if object_name is None:
        object_name = os.path.basename(file_name)

    # Upload the file
    s3_client = boto3.client("s3")
    try:
        s3_client.upload_file(file_name, bucket, object_name)
    except (ClientError, S3UploadFailedError) as e:
        # upload_file re-raises client errors as S3UploadFailedError
        logging.error(e)
        return False
    return True

In [8]:
import os
from pepper.env import get_project_dir
project_dir = get_project_dir()
apple_2_path = os.path.join(project_dir, "data", "im", "sample_300", "Apple Braeburn", "r_326_100.jpg")
print(apple_2_path)

C:\Users\franc\Projects\pepper_cloud_based_model\data\im\sample_300\Apple Braeburn\r_326_100.jpg


In [11]:
upload_file(apple_2_path, "pepper-labs-fruits", "Apple Braeburn/r_326_100.jpg")

True

The `upload_fileobj` method accepts a readable file-like object. The file object must be opened in binary mode, not text mode.

In [None]:
s3 = boto3.client('s3')
with open("FILE_NAME", "rb") as f:
    s3.upload_fileobj(f, "BUCKET_NAME", "OBJECT_NAME")
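
Because `upload_fileobj` only needs a readable binary file-like object, data already held in memory can be wrapped in `io.BytesIO` rather than written to disk first. The following is a minimal sketch under that assumption; the helper name `upload_bytes` is hypothetical (it is not part of Boto3), and the S3 client is passed in by the caller.

```python
import io


def upload_bytes(s3_client, data, bucket, object_name):
    """Upload an in-memory bytes payload through upload_fileobj.

    `s3_client` is expected to be a Boto3 S3 client, e.g. boto3.client("s3").
    """
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("data must be bytes; encode text before uploading")
    # BytesIO provides the binary, readable file-like object the method expects
    with io.BytesIO(data) as buffer:
        s3_client.upload_fileobj(buffer, bucket, object_name)
```

With a real client, a call could look like `upload_bytes(boto3.client("s3"), "hello".encode("utf-8"), "BUCKET_NAME", "OBJECT_NAME")`.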

The `upload_file` and `upload_fileobj` methods are provided by the S3 `Client`, `Bucket`, and `Object` classes. The method functionality provided by each class is identical. No benefits are gained by calling one class’s method over another’s. Use whichever class is most convenient.

### The ExtraArgs parameter

Both `upload_file` and `upload_fileobj` accept an optional `ExtraArgs` parameter that can be used for various purposes. The list of valid `ExtraArgs` settings is specified in the `ALLOWED_UPLOAD_ARGS` attribute of the `S3Transfer` object at [`boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS).

The following `ExtraArgs` setting specifies metadata to attach to the S3 object.

```python
s3.upload_file(
    'FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME',
    ExtraArgs={'Metadata': {'mykey': 'myvalue'}}
)
```

The following `ExtraArgs` setting assigns the canned ACL (access control list) value ‘public-read’ to the S3 object.

```python
s3.upload_file(
    'FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME',
    ExtraArgs={'ACL': 'public-read'}
)
```

The `ExtraArgs` parameter can also be used to set custom or multiple ACLs.

```python
s3.upload_file(
    'FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME',
    ExtraArgs={
        'GrantRead': 'uri="http://acs.amazonaws.com/groups/global/AllUsers"',
        'GrantFullControl': 'id="01234567890abcdefg"',
    }
)
```
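
`ContentType` is another key accepted in `ExtraArgs`; when it is not set, S3 typically stores the object with a generic binary content type, which matters if the images are later served over HTTP. The sketch below builds an `ExtraArgs` dictionary with a content type guessed from the file extension. The helper name `build_extra_args` is hypothetical and only illustrates one way to assemble the dictionary.

```python
import mimetypes


def build_extra_args(file_name, metadata=None):
    """Build an ExtraArgs dict with a guessed ContentType and optional metadata."""
    extra_args = {}
    content_type, _ = mimetypes.guess_type(file_name)
    if content_type is not None:
        extra_args["ContentType"] = content_type
    if metadata:
        # S3 user metadata values must be strings
        extra_args["Metadata"] = {k: str(v) for k, v in metadata.items()}
    return extra_args


print(build_extra_args("91_100.jpg", {"label": "Apple Braeburn"}))
```

The result can be passed directly as `ExtraArgs=build_extra_args(file_path)` to `upload_file` or `upload_fileobj`.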

### The Callback parameter

Both `upload_file` and `upload_fileobj` accept an optional `Callback` parameter. The parameter references a class that the Python SDK invokes intermittently during the transfer operation.

Invoking an instance of a Python class executes the class’s `__call__` method. On each invocation, the instance is passed the number of bytes transferred since the previous call, so keeping a running total is enough to implement a progress monitor.

The following `Callback` setting instructs the Python SDK to create an instance of the ProgressPercentage class. During the upload, the instance’s `__call__` method will be invoked intermittently.

```python
s3.upload_file(
    'FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME',
    Callback=ProgressPercentage('FILE_NAME')
)
```

An example implementation of the `ProgressPercentage` class is shown below.

In [12]:
import os
import sys
import threading

class ProgressPercentage(object):

    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # To simplify, assume this is hooked up to a single filename
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            sys.stdout.write(
                "\r%s  %s / %s  (%.2f%%)" % (
                    self._filename, self._seen_so_far, self._size,
                    percentage))
            sys.stdout.flush()

In [18]:
import os
from pepper.env import get_project_dir
project_dir = get_project_dir()
sample_300_dir = os.path.join(project_dir, "data", "im", "sample_300")
bucket_name = "pepper-labs-fruits"
object_class = "Apple Golden 1"
object_id = "63_100.jpg"
object_name = f"{object_class}/{object_id}"
file_path = os.path.join(sample_300_dir, object_class, object_id)

print(f"bucket_name: {bucket_name}")
print(f"object_name: {object_name}")
print(f"file_path: {file_path}")

s3 = boto3.client("s3")
s3.upload_file(
    file_path, bucket_name, object_name,
    Callback=ProgressPercentage(file_path)
)

bucket_name: pepper-labs-fruits
object_name: Apple Golden 1/63_100.jpg
file_path: C:\Users\franc\Projects\pepper_cloud_based_model\data\im\sample_300\Apple Golden 1\63_100.jpg
C:\Users\franc\Projects\pepper_cloud_based_model\data\im\sample_300\Apple Golden 1\63_100.jpg  5843 / 5843.0  (100.00%)

## [Downloading files](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-download-file.html)

The methods provided by the AWS SDK for Python to download files are similar to those provided to upload files.

The `download_file` method accepts the names of the bucket and object to download and the filename to save the file to.

```python
import boto3

s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME')
```

In [23]:
s3 = boto3.client("s3")
dl_file_path = os.path.join(
    project_dir, "tmp",
    object_name.replace("/", "_").replace(" ", "_")
)
print(dl_file_path)
s3.download_file(bucket_name, object_name, dl_file_path)

C:\Users\franc\Projects\pepper_cloud_based_model\tmp\Apple_Golden_1_63_100.jpg


The `download_fileobj` method accepts a writeable file-like object. The file object must be opened in binary mode, not text mode.

```python
s3 = boto3.client('s3')
with open('FILE_NAME', 'wb') as f:
    s3.download_fileobj('BUCKET_NAME', 'OBJECT_NAME', f)
```

Like their upload cousins, the download methods are provided by the S3 `Client`, `Bucket`, and `Object` classes, and each class provides identical functionality. Use whichever class is convenient.

Also like the upload methods, the download methods support the optional `ExtraArgs` and `Callback` parameters.

The list of valid `ExtraArgs` settings for the download methods is specified in the `ALLOWED_DOWNLOAD_ARGS` attribute of the `S3Transfer` object at [`boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS).

The download method’s `Callback` parameter is used for the same purpose as the upload method’s. The upload and download methods can both invoke the same Callback class.
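
One caveat when reusing `ProgressPercentage` for downloads: it reads the total size from the local file with `os.path.getsize`, and that file does not exist yet when a download starts. A possible variant, sketched here under the assumption that the expected size is known in advance, takes the total explicitly. The class name `TransferProgress` is hypothetical, and the transfer is simulated by calling the instance by hand.

```python
import sys
import threading


class TransferProgress:
    """Progress callback that takes the expected size explicitly."""

    def __init__(self, label, total_bytes):
        self._label = label
        self._total = total_bytes
        self._seen_so_far = 0
        self._lock = threading.Lock()

    @property
    def percentage(self):
        if self._total == 0:
            return 100.0
        return self._seen_so_far / self._total * 100

    def __call__(self, bytes_amount):
        # Callbacks may be invoked from several transfer threads
        with self._lock:
            self._seen_so_far += bytes_amount
            sys.stdout.write(
                f"\r{self._label}  {self._seen_so_far} / {self._total}"
                f"  ({self.percentage:.2f}%)"
            )
            sys.stdout.flush()


# Simulate a transfer delivering three chunks of a 5843-byte object
progress = TransferProgress("Apple Golden 1/63_100.jpg", 5843)
for chunk_size in (2048, 2048, 1747):
    progress(chunk_size)
```

With a real client, the size could be read beforehand from `s3.head_object(Bucket=bucket_name, Key=object_name)["ContentLength"]`, and the instance passed as `Callback=` to `download_file`.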

## [File transfer configuration](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3.html#file-transfer-configuration)

* Multipart transfers
* Concurrent transfer operations
* Threads
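
These settings are exposed through `boto3.s3.transfer.TransferConfig` (for example `multipart_threshold`, `multipart_chunksize`, `max_concurrency`, and `use_threads`), which is passed to the transfer methods as `Config=`. To make the multipart idea concrete, the sketch below works out how a file would be cut into parts for a given threshold and chunk size. It is a simplified illustration of the arithmetic, not the SDK's actual logic; the function name `plan_parts` is hypothetical, and the 8 MB values are assumed to match Boto3's documented defaults.

```python
import math

MB = 1024 * 1024


def plan_parts(file_size, multipart_threshold=8 * MB, multipart_chunksize=8 * MB):
    """Return the sizes of the parts a transfer of `file_size` bytes would use."""
    if file_size < multipart_threshold:
        # Small files go through in a single request
        return [file_size]
    n_parts = math.ceil(file_size / multipart_chunksize)
    sizes = [multipart_chunksize] * (n_parts - 1)
    # The final part carries whatever is left over
    sizes.append(file_size - multipart_chunksize * (n_parts - 1))
    return sizes


print(plan_parts(5843))          # a small image: a single part
print(len(plan_parts(20 * MB)))  # 20 MB with 8 MB chunks: three parts
```

The sample images in this notebook are only a few kilobytes each, so under these assumptions they would never trigger a multipart transfer.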

## [Presigned URLs](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html#presigned-urls)

## [Bucket policies](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-bucket-policies.html)

## [Access permissions](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-access-permissions.html)

## [Using an Amazon S3 bucket as a static web host](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-static-web-host.html)

## [Bucket CORS configuration](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-configuring-buckets.html)

## [AWS PrivateLink for Amazon S3](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-privatelink.html)

# S3: from v1 to v2

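```python
import boto3
import botocore.exceptions

# Excerpt: `is_s3_path`, `path`, and `contents` come from the surrounding project code
if is_s3_path(path):
    s3 = boto3.resource("s3")
    # Strip the 5-character "s3://" scheme, then split the bucket from the key
    bucket, key = path[5:].split("/", 1)
    try:
        obj = s3.Object(bucket, key)
        obj.put(Body=contents.encode("utf-8"))
    except botocore.exceptions.ClientError as e:
        raise RuntimeError(f"Failed to write file to S3: {e}") from e
```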
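
The slicing in the excerpt above relies on the `s3://` scheme being five characters long, and the tuple unpacking fails with an unhelpful error when the path has no key part (for example `s3://bucket`). A slightly more defensive parser could look like the sketch below. Both functions are hypothetical stand-ins: the project's own `is_s3_path` is not shown here, and `split_s3_path` is a name introduced only for illustration.

```python
def is_s3_path(path):
    """Return True for URIs of the form s3://bucket/key."""
    return isinstance(path, str) and path.startswith("s3://")


def split_s3_path(path):
    """Split an s3:// URI into (bucket, key)."""
    if not is_s3_path(path):
        raise ValueError(f"Not an S3 path: {path!r}")
    # partition never raises, so missing parts can be reported explicitly
    bucket, _, key = path[len("s3://"):].partition("/")
    if not bucket or not key:
        raise ValueError(f"Expected s3://bucket/key, got {path!r}")
    return bucket, key


print(split_s3_path("s3://pepper-labs-fruits/Apple Braeburn/91_100.jpg"))
```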

Get the size of an image stored on S3:

In [1]:
from fruits.storage_utils import get_s3_object_size
import os
from pepper.env import get_project_dir

project_dir = get_project_dir()
sample_300_dir = os.path.join(project_dir, "data", "im", "sample_300")
bucket_name = "pepper-labs-fruits"
object_class = "Apple Golden 1"
object_id = "63_100.jpg"
object_name = f"{object_class}/{object_id}"
file_path = os.path.join(sample_300_dir, object_class, object_id)
print(get_s3_object_size(bucket_name, object_name))

5843
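
The implementation of `get_s3_object_size` is not shown in this notebook. One plausible approach, sketched below as an assumption, relies on the client's `head_object` call, which returns an object's metadata without downloading its body. The function name `object_size` is hypothetical, and the client is passed in by the caller.

```python
def object_size(s3_client, bucket_name, object_name):
    """Return the size in bytes of an S3 object without downloading it.

    `s3_client` is expected to be a Boto3 S3 client, e.g. boto3.client("s3").
    """
    # head_object fetches only the object's metadata
    response = s3_client.head_object(Bucket=bucket_name, Key=object_name)
    return response["ContentLength"]
```

A call such as `object_size(boto3.client("s3"), bucket_name, object_name)` would raise a `ClientError` if the object does not exist, so callers may want to handle that case.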


Upload an image to S3:

In [2]:
from fruits.storage_utils import upload_image_object
image_class = "Avocado ripe"
image_id = "181_100"
upload_image_object(sample_300_dir, image_class, image_id)

Avocado ripe/181_100  4026 / 4026.0  (100.00%)

True

List the subdirectories under a root path.

Locally:

In [1]:
import os
from pepper.env import get_project_dir
from fruits.storage_utils import list_subdirs

project_dir = get_project_dir()
sample_300_dir = os.path.join(project_dir, "data", "im", "sample_300")
print(list_subdirs(sample_300_dir))

['Apple Braeburn', 'Apple Crimson Snow', 'Apple Golden 1', 'Apple Golden 2', 'Apple Golden 3', 'Apple Granny Smith', 'Apple Pink Lady', 'Apple Red 1', 'Apple Red 2', 'Apple Red 3', 'Apple Red Delicious', 'Apple Red Yellow 1', 'Apple Red Yellow 2', 'Apricot', 'Avocado', 'Avocado ripe', 'Banana', 'Banana Lady Finger', 'Banana Red', 'Beetroot', 'Blueberry', 'Cactus fruit', 'Cantaloupe 1', 'Cantaloupe 2', 'Carambula', 'Cauliflower', 'Cherry 1', 'Cherry 2', 'Cherry Rainier', 'Cherry Wax Black', 'Cherry Wax Red', 'Cherry Wax Yellow', 'Chestnut', 'Clementine', 'Cocos', 'Corn', 'Corn Husk', 'Cucumber Ripe', 'Cucumber Ripe 2', 'Dates', 'Eggplant', 'Fig', 'Ginger Root', 'Granadilla', 'Grape Blue', 'Grape Pink', 'Grape White', 'Grape White 2', 'Grape White 3', 'Grape White 4', 'Grapefruit Pink', 'Grapefruit White', 'Guava', 'Hazelnut', 'Huckleberry', 'Kaki', 'Kiwi', 'Kohlrabi', 'Kumquats', 'Lemon', 'Lemon Meyer', 'Limes', 'Lychee', 'Mandarine', 'Mango', 'Mango Red', 'Mangostan', 'Maracuja', 'Melo

On S3:

In [2]:
from fruits.storage_utils import list_s3_subdirs
bucket_name = "pepper-labs-fruits"
print(list_s3_subdirs(bucket_name, ""))

['Apple Braeburn', 'Apple Golden 1', 'Avocado ripe', 'Avocado']
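
S3 has no real directories, so listing "subdirectories" amounts to grouping keys by a delimiter. The sketch below shows one way this could be done with the `list_objects_v2` paginator and `Delimiter="/"`, which makes S3 return the grouped prefixes under `CommonPrefixes`. It is an assumption about how a helper like `list_s3_subdirs` might work, not its actual code; the name `list_prefixes` is hypothetical.

```python
def list_prefixes(s3_client, bucket_name, prefix=""):
    """List the "subdirectories" directly under `prefix` in a bucket."""
    paginator = s3_client.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket_name, Prefix=prefix, Delimiter="/")
    subdirs = []
    for page in pages:
        # CommonPrefixes is absent from pages that contain no grouped keys
        for entry in page.get("CommonPrefixes", []):
            # Entries look like "Apple Braeburn/"; drop the parent prefix and slash
            subdirs.append(entry["Prefix"][len(prefix):].rstrip("/"))
    return subdirs
```

Using the paginator rather than a single call matters once a bucket holds more entries than one response page can carry.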


Count the number of files per directory:

Locally:

In [1]:
import os
from pepper.env import get_project_dir
from fruits.storage_utils import count_files

project_dir = get_project_dir()
sample_300_dir = os.path.join(project_dir, "data", "im", "sample_300")
print(count_files(sample_300_dir))

({'Apple Braeburn': 2, 'Apple Crimson Snow': 3, 'Apple Golden 1': 2, 'Apple Golden 2': 2, 'Apple Golden 3': 2, 'Apple Granny Smith': 2, 'Apple Pink Lady': 2, 'Apple Red 1': 2, 'Apple Red 2': 2, 'Apple Red 3': 2, 'Apple Red Delicious': 2, 'Apple Red Yellow 1': 3, 'Apple Red Yellow 2': 3, 'Apricot': 2, 'Avocado': 2, 'Avocado ripe': 2, 'Banana': 2, 'Banana Lady Finger': 2, 'Banana Red': 2, 'Beetroot': 2, 'Blueberry': 2, 'Cactus fruit': 2, 'Cantaloupe 1': 2, 'Cantaloupe 2': 2, 'Carambula': 2, 'Cauliflower': 3, 'Cherry 1': 2, 'Cherry 2': 3, 'Cherry Rainier': 3, 'Cherry Wax Black': 3, 'Cherry Wax Red': 2, 'Cherry Wax Yellow': 2, 'Chestnut': 2, 'Clementine': 2, 'Cocos': 2, 'Corn': 3, 'Corn Husk': 3, 'Cucumber Ripe': 3, 'Cucumber Ripe 2': 3, 'Dates': 2, 'Eggplant': 2, 'Fig': 4, 'Ginger Root': 1, 'Granadilla': 2, 'Grape Blue': 4, 'Grape Pink': 2, 'Grape White': 2, 'Grape White 2': 2, 'Grape White 3': 2, 'Grape White 4': 2, 'Grapefruit Pink': 2, 'Grapefruit White': 2, 'Guava': 2, 'Hazelnut': 2, 

On S3:

In [2]:
from fruits.storage_utils import count_s3_objects
bucket_name = "pepper-labs-fruits"
print(count_s3_objects(bucket_name))

({'Apple Braeburn': 2, 'Apple Golden 1': 1, 'Avocado ripe': 1, 'Avocado': 2}, 6)
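
The result above pairs a per-folder dictionary with a grand total. Once the object keys have been listed, that shape can be reproduced with plain Python, as in the following sketch. The function name `count_keys_by_folder` is hypothetical and the actual `count_s3_objects` implementation may differ; the keys used are the ones uploaded earlier in this notebook.

```python
from collections import Counter


def count_keys_by_folder(keys):
    """Count object keys per top-level "folder" and in total."""
    counts = Counter()
    for key in keys:
        folder, sep, _ = key.partition("/")
        # Keys without a slash sit at the bucket root
        counts[folder if sep else ""] += 1
    return dict(counts), sum(counts.values())


keys = [
    "Apple Braeburn/91_100.jpg",
    "Apple Braeburn/r_326_100.jpg",
    "Apple Golden 1/63_100.jpg",
]
print(count_keys_by_folder(keys))
# prints ({'Apple Braeburn': 2, 'Apple Golden 1': 1}, 3)
```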


Regression test for the third version of `sample_images` (local version):

In [1]:
import os
from pepper.env import get_project_dir
from pepper.utils import create_if_not_exist
from fruits.storage_utils import sample_images
project_dir = get_project_dir()
raw_src_im_dir = os.path.join(project_dir, "dataset", "fruits-360_dataset", "Test")
sample_300_im_dir = os.path.join(project_dir, "data", "im", "sample_300")
create_if_not_exist(sample_300_im_dir)
target_dist = sample_images(raw_src_im_dir, sample_300_im_dir, 300)
display(sum(target_dist.values()))

300

Bulk copy test to S3:

In [2]:
import os
from pepper.env import get_project_dir
from pepper.utils import create_if_not_exist
from fruits.storage_utils import (
    list_subdirs,
    count_files,
    compute_target_dist,
    copy_files_to_s3
)
n_samples = 10
project_dir = get_project_dir()
target_bucket = "pepper-labs-fruits"
root_path = os.path.join(project_dir, "dataset", "fruits-360_dataset", "Test")
# Get the list of subdirectories
subdirs = list_subdirs(root_path)
# Count the number of images in each folder
image_counts, n_total = count_files(root_path, subdirs)
# Calculate the number of images to sample from each folder
target_dist = compute_target_dist(subdirs, image_counts, n_total, n_samples)
# Copy to S3
copy_files_to_s3(root_path, target_bucket, subdirs, target_dist)
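
The code of `compute_target_dist` is not shown here. A common way to spread a fixed number of samples across classes in proportion to their sizes is the largest-remainder method, sketched below as an assumption about what such a step could look like. The function name `proportional_allocation` is hypothetical, and the class counts in the example are made up for illustration.

```python
def proportional_allocation(counts, n_samples):
    """Split `n_samples` across classes in proportion to their sizes.

    Uses the largest-remainder method so the result sums exactly to n_samples.
    """
    n_total = sum(counts.values())
    if n_total == 0:
        raise ValueError("No items to sample from")
    if not 0 <= n_samples <= n_total:
        raise ValueError("n_samples must be between 0 and the number of items")
    quotas = {name: count * n_samples / n_total for name, count in counts.items()}
    allocation = {name: int(quota) for name, quota in quotas.items()}
    leftover = n_samples - sum(allocation.values())
    # Hand the remaining units to the classes with the largest fractional parts
    by_remainder = sorted(
        quotas, key=lambda name: quotas[name] - allocation[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Illustrative counts only, not real dataset figures
example_counts = {"Apple Braeburn": 50, "Banana": 30, "Kiwi": 20}
print(proportional_allocation(example_counts, 7))
```

Rounding each quota independently could miss the requested total by a few units; distributing the leftover by remainder avoids that.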

Safe cleanup with double confirmation:

Locally:

In [3]:
import os
from pepper.env import get_project_dir
from fruits.storage_utils import count_files, clean_directory
project_dir = get_project_dir()
root_path = os.path.join(project_dir, "data", "im", "sample_300")
# subdirs = list_subdirs(root_path)
image_counts, n_total = count_files(root_path)
display(n_total)
display(len(image_counts))
clean_directory(root_path)

304

131

The directory C:\Users\franc\Projects\pepper_cloud_based_model\data\im\sample_300
contains 304 files and 131 directories.
The directory C:\Users\franc\Projects\pepper_cloud_based_model\data\im\sample_300 has been successfully deleted.


On S3:

In [1]:
from fruits.storage_utils import count_s3_objects
target_bucket = "pepper-labs-fruits"
count_s3_objects(target_bucket, "A")

({'Apple Braeburn': 2, 'Apple Golden 1': 1, 'Avocado ripe': 1, 'Avocado': 2},
 6)

Deleting a "directory", which on S3 is really just a root represented by a key prefix.

In [2]:
from fruits.storage_utils import clean_s3_directory

target_bucket = "pepper-labs-fruits"
clean_s3_directory(target_bucket, "Apple Golden 1")

The S3 directory s3://pepper-labs-fruits/Apple Golden 1
contains 1 files and 1 directories.
The S3 directory s3://pepper-labs-fruits/Apple Golden 1 has been successfully deleted.


Deleting all objects whose key starts with A (that is, everything in the directories whose names start with A):

In [3]:
from fruits.storage_utils import clean_s3_directory

target_bucket = "pepper-labs-fruits"
clean_s3_directory(target_bucket, "A")

The S3 directory s3://pepper-labs-fruits/A
contains 5 files and 3 directories.
The S3 directory s3://pepper-labs-fruits/A has been successfully deleted.
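
Since a "directory" is only a key prefix, deleting one means listing every key that starts with the prefix and removing those keys in batches. The sketch below illustrates that idea with the `list_objects_v2` paginator and `delete_objects`. It is an assumption about the general technique, not the code of `clean_s3_directory`, and it omits the confirmation prompts; the name `delete_prefix` is hypothetical.

```python
def delete_prefix(s3_client, bucket_name, prefix):
    """Delete every object whose key starts with `prefix`; return the count."""
    paginator = s3_client.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # Each page holds at most 1000 keys, the limit of one delete_objects call
        objects = [{"Key": item["Key"]} for item in page.get("Contents", [])]
        if objects:
            s3_client.delete_objects(Bucket=bucket_name, Delete={"Objects": objects})
            deleted += len(objects)
    return deleted
```

Prefix matching is purely textual: `"Apple Golden 1"` would also match keys under a folder named `Apple Golden 10`, so a trailing slash (`"Apple Golden 1/"`) is the safer choice when a single folder is meant, and an empty prefix matches the whole bucket.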


Full reset of the bucket:

In [3]:
from fruits.storage_utils import clean_s3_directory

target_bucket = "pepper-labs-fruits"
clean_s3_directory(target_bucket, "")

The S3 directory s3://pepper-labs-fruits/
contains 378 files and 131 directories.
The S3 directory s3://pepper-labs-fruits/ has been successfully deleted.


Sampling directly to S3:

In [2]:
import os
from pepper.env import get_project_dir
from fruits.storage_utils import sample_images_local_to_s3

n_samples = 300
project_dir = get_project_dir()
bucket_name = "pepper-labs-fruits"
root_path = os.path.join(project_dir, "dataset", "fruits-360_dataset", "Test")

sample_images_local_to_s3(root_path, bucket_name, n_samples)

{'Apple Braeburn': 2,
 'Apple Crimson Snow': 2,
 'Apple Golden 1': 2,
 'Apple Golden 2': 2,
 'Apple Golden 3': 2,
 'Apple Granny Smith': 2,
 'Apple Pink Lady': 2,
 'Apple Red 1': 2,
 'Apple Red 2': 2,
 'Apple Red 3': 2,
 'Apple Red Delicious': 2,
 'Apple Red Yellow 1': 2,
 'Apple Red Yellow 2': 3,
 'Apricot': 2,
 'Avocado': 2,
 'Avocado ripe': 2,
 'Banana': 2,
 'Banana Lady Finger': 2,
 'Banana Red': 2,
 'Beetroot': 2,
 'Blueberry': 2,
 'Cactus fruit': 2,
 'Cantaloupe 1': 2,
 'Cantaloupe 2': 2,
 'Carambula': 3,
 'Cauliflower': 3,
 'Cherry 1': 2,
 'Cherry 2': 4,
 'Cherry Rainier': 3,
 'Cherry Wax Black': 2,
 'Cherry Wax Red': 2,
 'Cherry Wax Yellow': 2,
 'Chestnut': 2,
 'Clementine': 2,
 'Cocos': 2,
 'Corn': 2,
 'Corn Husk': 2,
 'Cucumber Ripe': 2,
 'Cucumber Ripe 2': 2,
 'Dates': 3,
 'Eggplant': 2,
 'Fig': 3,
 'Ginger Root': 1,
 'Granadilla': 2,
 'Grape Blue': 5,
 'Grape Pink': 2,
 'Grape White': 2,
 'Grape White 2': 2,
 'Grape White 3': 2,
 'Grape White 4': 2,
 'Grapefruit Pink': 2,
 