## Connecting to S3 with Data Connections and Boto3

This notebook uses `boto3`, the AWS SDK for Python.
Some workbench images provided with Red Hat OpenShift AI (RHOAI), such as `Standard Data Science`, include this library.

The data connection injects the parameters required to connect to S3 as environment variables.

1. Install `boto3` if the workbench image does not already include it, and then import the required packages.

In [2]:
%pip install boto3

In [1]:
import os
import io
import boto3

2. Retrieve the environment variables injected by the data connection.

In [2]:
key_id = os.getenv("AWS_ACCESS_KEY_ID")
secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
region = os.getenv("AWS_DEFAULT_REGION")
endpoint = os.getenv("AWS_S3_ENDPOINT")
bucket_name = os.getenv("AWS_S3_BUCKET")
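
Because `os.getenv` returns `None` for a variable that is not set, a quick check can catch a missing or misconfigured data connection before any S3 call fails. The following is a minimal sketch; the helper name `find_missing_vars` is hypothetical, and treating all five variables as required is an assumption that you might need to adjust for your storage provider.

```python
import os

# Variable names used by the cell above.
REQUIRED_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "AWS_S3_ENDPOINT",
    "AWS_S3_BUCKET",
]


def find_missing_vars(env, names=REQUIRED_VARS):
    """Return the names that are unset or empty in the given mapping."""
    return [name for name in names if not env.get(name)]


# Report the variable names only, never their values.
missing = find_missing_vars(os.environ)
if missing:
    print("Missing data connection variables:", ", ".join(missing))
else:
    print("All data connection variables are set.")
```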

3. View a retrieved value.
Note that the `key_id` matches the value provided in the form when creating the data connection.

> WARNING: Because cell outputs are saved as part of the notebook file, be cautious when printing sensitive information to notebook output.
If you leave sensitive credentials printed in an output cell, then you might accidentally leak this information when the notebook is committed to version control.

In [3]:
key_id

'minio'
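
If you need to confirm that a credential is set without saving its full value in the notebook output, one option is to print a masked version. This is a sketch only; the `mask_secret` helper is hypothetical and is not part of `boto3` or of the data connection.

```python
def mask_secret(value, visible=2):
    """Return the value with all but the last `visible` characters masked."""
    if not value:
        return "<not set>"
    if len(value) <= visible:
        # Too short to reveal anything safely, so mask every character.
        return "*" * len(value)
    hidden = len(value) - visible
    return "*" * hidden + value[hidden:]


# The string below is a placeholder, not a real credential.
print(mask_secret("example-secret-value"))
```

In the notebook, you could call `mask_secret(secret_key)` instead of displaying `secret_key` directly.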

4. Use the values retrieved from the data connection to create a connection to the S3 bucket.

In [4]:
s3 = boto3.client(
    "s3",
    region_name=region,
    aws_access_key_id=key_id,
    aws_secret_access_key=secret_key,
    endpoint_url=endpoint,
    use_ssl=False
)
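
The cell above hard-codes `use_ssl=False`. As an alternative, the following sketch gathers the client arguments in one place and enables TLS only when the endpoint URL uses `https`. The `build_client_kwargs` helper and the example values are hypothetical, and deriving `use_ssl` from the URL scheme is an assumption rather than something the data connection does for you.

```python
from urllib.parse import urlparse


def build_client_kwargs(env):
    """Build keyword arguments for boto3.client from data connection variables."""
    endpoint = env.get("AWS_S3_ENDPOINT")
    return {
        "region_name": env.get("AWS_DEFAULT_REGION"),
        "aws_access_key_id": env.get("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": env.get("AWS_SECRET_ACCESS_KEY"),
        "endpoint_url": endpoint,
        # Assumption: use TLS only when the endpoint URL starts with https.
        "use_ssl": urlparse(endpoint or "").scheme == "https",
    }


# Placeholder values for illustration; none of these are real.
example_env = {
    "AWS_DEFAULT_REGION": "us-east-1",
    "AWS_ACCESS_KEY_ID": "example-key-id",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
    "AWS_S3_ENDPOINT": "http://minio.example:9000",
}
kwargs = build_client_kwargs(example_env)
print(kwargs["endpoint_url"], kwargs["use_ssl"])
```

In the notebook, the equivalent call would be `s3 = boto3.client("s3", **build_client_kwargs(os.environ))`.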

5. Use the connection to retrieve the list of available buckets.

In [5]:
response = s3.list_buckets()
response["Buckets"]

[{'Name': 'projects-data-bucket',
  'CreationDate': datetime.datetime(2025, 1, 18, 3, 19, 55, 289000, tzinfo=tzlocal())}]

6. Upload a file to the bucket via the connection.

In [6]:
# create a file-like object containing bytes that represent the "hello world" string
file_obj = io.BytesIO(b"hello world")

# upload the file-like object to the S3 bucket specified in the data connection
# the name of the "file" in S3 is "hello.txt"
s3.upload_fileobj(file_obj, bucket_name, Key="hello.txt")

7. List the contents of the bucket specified in the data connection.

In [7]:
# retrieve the metadata of the objects within the bucket
objects = s3.list_objects_v2(Bucket=bucket_name)

# output the key (name) of each object within the bucket
# the "Contents" entry is absent when the bucket is empty
for obj in objects.get("Contents", []):
    print(obj["Key"])

hello.txt


> NOTE: Optionally, check the corresponding S3 bucket to verify that it contains the new `hello.txt` object.
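
A single `list_objects_v2` call returns at most 1,000 objects, which is enough for this exercise but not for larger buckets. The following sketch follows the continuation token until every page has been read. The `iter_object_keys` helper is hypothetical; it expects a client such as the `s3` object created earlier.

```python
def iter_object_keys(s3_client, bucket):
    """Yield every object key in the bucket, following continuation tokens."""
    kwargs = {"Bucket": bucket}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page (or the whole bucket) is empty.
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            break
        # Ask for the next page, starting where this one ended.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

In the notebook, `for key in iter_object_keys(s3, bucket_name): print(key)` would print the same keys as the cell above.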

8. Download the file from the S3 bucket to a new location.

In [8]:
s3.download_file(bucket_name, "hello.txt", "new_hello.txt")

9. Confirm that the file browser pane on the left displays a new file called `new_hello.txt`.
Open the file and verify that its contents are `hello world`.

> NOTE: You might need to refresh the file browser by clicking the `Refresh the file browser` button in the file browser pane.
The button displays as a circular arrow.
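
As an alternative to opening the file in the file browser, you can check the object contents programmatically by downloading it into memory. This is a minimal sketch; the `read_object_bytes` helper is hypothetical and expects a client such as the `s3` object created earlier.

```python
import io


def read_object_bytes(s3_client, bucket, key):
    """Download an object into an in-memory buffer and return its bytes."""
    buffer = io.BytesIO()
    # download_fileobj writes the object body into any writable binary file-like object.
    s3_client.download_fileobj(bucket, key, buffer)
    return buffer.getvalue()
```

In the notebook, `read_object_bytes(s3, bucket_name, "hello.txt")` should return the bytes that were uploaded earlier, so comparing the result with `b"hello world"` confirms the round trip.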

Return to the course book to finish the exercise.