# 2. Upload Datasets into S3 Bucket

The following steps upload the `ADE-Corpus-V2` dataset prepared in [`01_prepare_ade_dataset.ipynb`](./01_prepare_ade_dataset.ipynb) to an S3 bucket.


## Step 1: Create BTP Object Storage Instance

1. Create an [Object Store](https://discovery-center.cloud.sap/serviceCatalog/object-store) instance in SAP BTP.

2. Create a default service key for the created Object Store instance.

   ![Create Service Key for SAP BTP Object Store instance](../docs/images/btp-s3-credentials.png)

3. Copy the [`.env.example`](../.env.example) file to `.env` in the repository root:

   ```bash
   cp .env.example .env
   ```

4. Fill in the values in [`.env`](../.env) with the credentials from the service key created in step 2.


## Step 2: Upload Dataset to S3 Bucket

In [1]:
import os
import boto3
from dotenv import load_dotenv

load_dotenv()

True
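
A `True` result from `load_dotenv()` does not guarantee that every value in `.env` was filled in. The following is a minimal sketch of a check for the three variables this notebook reads with `os.getenv`; the helper name `missing_env_vars` and the constant `REQUIRED_VARS` are hypothetical and not part of the original notebook.

```python
import os

# Variable names that this notebook reads via os.getenv(...)
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_S3_BUCKET_NAME")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


# Report anything that still needs a value in .env
missing = missing_env_vars()
if missing:
    print("Missing values in .env: %s" % ", ".join(missing))
else:
    print("All required variables are set.")
```

Passing a plain dictionary as `environ` makes the helper easy to try out without touching the real environment.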

In [2]:
client_s3 = boto3.client(
    "s3",
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
)

bucket = os.getenv("AWS_S3_BUCKET_NAME")

# local directory (one level above this notebook) whose contents are uploaded
dir_evals = "evals"

# key prefix under which the files are stored in the S3 bucket
prefix_s3 = "ade-v2"

In [4]:
# list all objects in the bucket
response = client_s3.list_objects_v2(Bucket=bucket)

if 'Contents' in response:
    for obj in response['Contents']:
        print("Found object: %s" % obj['Key'])
else:
    print("No objects found.")

No objects found.
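
A single `list_objects_v2` call returns at most 1,000 keys, so the listing above is only complete for small buckets. As a sketch of how a larger bucket could be listed, the hypothetical helper `iter_object_keys` below uses the boto3 paginator for the same operation; it takes the client as an argument so that it does not depend on the variables defined earlier.

```python
def iter_object_keys(client, bucket, prefix=""):
    """Yield every object key in the bucket, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page that holds no objects
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Usage with the client and bucket defined earlier in this notebook:
# for key in iter_object_keys(client_s3, bucket, prefix_s3):
#     print("Found object: %s" % key)
```

Passing `prefix_s3` as the prefix limits the listing to the keys this notebook manages.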


In [None]:
from botocore.exceptions import ClientError

local_directory = os.path.abspath(os.path.join(os.getcwd(), "..", dir_evals))

# enumerate local files recursively
for root, dirs, files in os.walk(local_directory):

    for filename in files:
        # construct the full local path and the path relative to the root directory
        local_path = os.path.join(root, filename)
        relative_path = os.path.relpath(local_path, local_directory)

        # construct the S3 key; keys use forward slashes regardless of the local OS
        s3_path = "/".join([prefix_s3, dir_evals, *relative_path.split(os.sep)])

        print('Searching "%s" in "%s"' % (s3_path, bucket))

        try:
            # raises ClientError with a 404 code if the object does not exist
            client_s3.head_object(Bucket=bucket, Key=s3_path)
            exists = True
        except ClientError as err:
            if err.response["Error"]["Code"] not in ("404", "NoSuchKey", "NotFound"):
                raise  # e.g. access denied: do not treat it as a missing object
            exists = False

        if exists:
            # delete the stale copy so that the upload below replaces it
            print("Path found on S3! Deleting %s..." % s3_path)
            try:
                client_s3.delete_object(Bucket=bucket, Key=s3_path)
            except ClientError:
                print("Unable to delete %s..." % s3_path)
                continue

        print("Uploading %s..." % s3_path)
        client_s3.upload_file(local_path, bucket, s3_path)

Searching "ade-v2/evals/testdata/ade-v2-300.json" in "hcp-98f1a85b-214c-414a-8a01-77d06f0c89d8"
Uploading ade-v2/evals/testdata/ade-v2-300.json...
Searching "ade-v2/evals/runs/gemini-2.5-flash.json" in "hcp-98f1a85b-214c-414a-8a01-77d06f0c89d8"
Uploading ade-v2/evals/runs/gemini-2.5-flash.json...
Searching "ade-v2/evals/runs/mistral-large-instruct.json" in "hcp-98f1a85b-214c-414a-8a01-77d06f0c89d8"
Uploading ade-v2/evals/runs/mistral-large-instruct.json...
Searching "ade-v2/evals/runs/anthropic-claude-4-sonnet.json" in "hcp-98f1a85b-214c-414a-8a01-77d06f0c89d8"
Uploading ade-v2/evals/runs/anthropic-claude-4-sonnet.json...
Searching "ade-v2/evals/runs/gpt-4.1.json" in "hcp-98f1a85b-214c-414a-8a01-77d06f0c89d8"
Uploading ade-v2/evals/runs/gpt-4.1.json...
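
To confirm that every local file ended up in the bucket, the expected keys can be compared with the keys returned by a listing. The sketch below is one possible way to do this; `expected_keys` and `missing_uploads` are hypothetical helpers, and `key_prefix` is assumed to be the combined prefix used above (for example `"ade-v2/evals"`).

```python
import os


def expected_keys(local_directory, key_prefix):
    """Map each file under local_directory to the S3 key it should have."""
    keys = set()
    for root, _dirs, files in os.walk(local_directory):
        for filename in files:
            relative = os.path.relpath(os.path.join(root, filename), local_directory)
            # S3 keys use forward slashes regardless of the local separator
            keys.add("/".join([key_prefix, *relative.split(os.sep)]))
    return keys


def missing_uploads(local_directory, key_prefix, remote_keys):
    """Return the expected keys that are absent from remote_keys, sorted."""
    return sorted(expected_keys(local_directory, key_prefix) - set(remote_keys))


# Usage with the names defined earlier in this notebook:
# response = client_s3.list_objects_v2(Bucket=bucket, Prefix=prefix_s3)
# remote = [obj["Key"] for obj in response.get("Contents", [])]
# print(missing_uploads(local_directory, prefix_s3 + "/" + dir_evals, remote))
```

An empty list means that every local file has a matching object in the bucket.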
