## S3 Data Transfer

This notebook demonstrates uploading to and downloading from an S3 bucket using temporary S3 session credentials issued by the hub.

## Credentials

To get S3 session credentials for your workspace bucket:
1. Visit https://eodatahub.org.uk/workspaces/
2. Ensure the correct workspace is selected (icons on the far left)
3. Select "Credentials" from the sidebar
4. Select the S3 Token tab
5. Click "Request Temporary AWS S3 Credentials"

You will receive credentials in the form:
```
Access Key ID: ASIAT...
Secret Access Key: yADQk...
Session Token: IQoJb...
Expiration: 2025-04-09T11:39:10Z
```

You will need to copy the Access Key ID, Secret Access Key and Session Token into a local environment file (.env). The file should take the form:
```
ACCESS_KEY_ID=ASIAT...
SECRET_ACCESS_KEY=yADQk...
SESSION_TOKEN=IQoJb...
```

This file is used to load the session credentials into the notebook in the cells below. Update the .env filename as required.

Note that, by default, these session credentials last only one hour.
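
Because the credentials are short-lived, it can help to check how much time is left before starting a large transfer. The following is a minimal sketch that parses the `Expiration` timestamp in the format shown above. The helper name `seconds_until_expiry` is hypothetical and is not part of any hub or AWS API; you would need to pass in the expiration value yourself.

```python
from datetime import datetime, timezone


def seconds_until_expiry(expiration, now=None):
    """Return the seconds left before `expiration` (negative if already expired).

    `expiration` is a UTC timestamp such as "2025-04-09T11:39:10Z".
    """
    expires_at = datetime.strptime(expiration, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    if now is None:
        now = datetime.now(timezone.utc)
    return (expires_at - now).total_seconds()


# Example with a fixed "now" so that the result is reproducible
fixed_now = datetime(2025, 4, 9, 11, 0, 0, tzinfo=timezone.utc)
remaining = seconds_until_expiry("2025-04-09T11:39:10Z", now=fixed_now)
print(f"Credentials expire in {remaining / 60:.0f} minutes")
```

If the remaining time is close to zero, request a fresh set of credentials before continuing.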

In [None]:
# Install dependencies, if required
%pip install boto3 python-dotenv

In [None]:
import os
from dotenv import load_dotenv

env_file = ".env"  # update file name as required

# Load session credentials
load_dotenv(env_file)

# Check all required environment variables are present
ACCESS_KEY_ID = os.environ.get("ACCESS_KEY_ID")
SECRET_ACCESS_KEY = os.environ.get("SECRET_ACCESS_KEY")
SESSION_TOKEN = os.environ.get("SESSION_TOKEN")

missing_vars = [k for k, v in {
    "ACCESS_KEY_ID": ACCESS_KEY_ID, 
    "SECRET_ACCESS_KEY": SECRET_ACCESS_KEY, 
    "SESSION_TOKEN": SESSION_TOKEN,
}.items() if not v]

if missing_vars:
    print("The following environment variables are missing:")
    for var in missing_vars:
        print(f" - {var}")
else:
    print("All required environment variables are set.")

In [None]:
import boto3

# Set up the Boto3 client and authenticate with the session credentials.
session = boto3.Session(
    aws_access_key_id=ACCESS_KEY_ID,
    aws_secret_access_key=SECRET_ACCESS_KEY,
    aws_session_token=SESSION_TOKEN,
)
s3 = session.client("s3")

In [None]:
import os

# Upload to bucket from local directory

# Configure bucket details
workspace = "my-workspace"
bucket_name = "workspaces-eodhp"  # This is the prod bucket, needs to be updated for environment, e.g. "workspaces-eodhp-staging"
prefix = f"{workspace}"  # you can add subdirectories to the end of this if you wish to upload to a subdirectory of your bucket, e.g. `f"{workspace}/path/to/subdir"`
local_dir = "local/path/to/dir"  # local directory to be uploaded, update as required

for root, dirs, files in os.walk(local_dir):
    for file in files:
        local_path = os.path.join(root, file)
        relative_path = os.path.relpath(local_path, local_dir)
        s3_path = os.path.join(prefix, relative_path).replace("\\", "/")  # replace windows paths with posix

        print(f"Uploading {local_path} to s3://{bucket_name}/{s3_path}")
        try:
            s3.upload_file(local_path, bucket_name, s3_path)
        except Exception as e:
            print(f"Failed to upload {local_path}: {str(e)}")
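
The key-mapping step in the upload loop can also be written with `pathlib`, which avoids the manual backslash replacement. This is a minimal sketch of that one step only; the helper name `build_s3_key` is hypothetical and the paths are placeholders.

```python
from pathlib import PurePath


def build_s3_key(prefix, local_dir, local_path):
    """Map a local file path to an S3 key under `prefix`, always using "/" separators."""
    # relative_to() raises ValueError if local_path is not inside local_dir
    relative = PurePath(local_path).relative_to(local_dir)
    return f"{prefix.rstrip('/')}/{relative.as_posix()}"


print(build_s3_key("my-workspace", "local/path/to/dir", "local/path/to/dir/sub/file.txt"))
```

`PurePath` performs no filesystem access, so the mapping can be checked without any files being present.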


In [None]:
from pathlib import Path
import os.path

# Download an S3 prefix to local directory

# Configure bucket details
workspace = "my-workspace"
bucket_name = "workspaces-eodhp"  # This is the prod bucket, needs to be updated for environment, e.g. "workspaces-eodhp-staging"
prefix = f"{workspace}"  # you can add subdirectories to the end of this if you wish to download from a subdirectory of your bucket, e.g. `f"{workspace}/path/to/subdir"`
local_dir = "local/path/to/dir"  # local directory to be downloaded to, update as required

# Ensure local directory exists
Path(local_dir).mkdir(parents=True, exist_ok=True)

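# Append "/" so only keys under this prefix match (and not, e.g., "my-workspace-2/...")
list_prefix = prefix.rstrip("/") + "/"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket_name, Prefix=list_prefix):
    for obj in page.get("Contents", []):
        s3_key = obj["Key"]  # Full S3 key
        if s3_key.endswith("/"):
            continue  # Skip zero-byte "folder" placeholder objects
        relative_path = os.path.relpath(s3_key, list_prefix)  # Path relative to the prefix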
        local_path = os.path.join(local_dir, relative_path)  # Local file path

        # Ensure local subdirectories exist
        Path(os.path.dirname(local_path)).mkdir(parents=True, exist_ok=True)

        # Download the file
        print(f"Downloading {s3_key} to {local_path}")
        s3.download_file(bucket_name, s3_key, local_path)
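
Object keys come from the bucket, so it can be worth checking that each computed local path stays inside `local_dir` before writing to it. The sketch below shows one possible way to map a key to a local path with such a check. `local_path_for_key` is a hypothetical helper, not part of boto3, and the keys used are illustrative only.

```python
import posixpath
from pathlib import Path


def local_path_for_key(s3_key, prefix, local_dir):
    """Map an S3 key under `prefix` to a path inside `local_dir`.

    Raises ValueError if the key is not under the prefix or would escape local_dir.
    """
    list_prefix = prefix.rstrip("/") + "/"
    if not s3_key.startswith(list_prefix):
        raise ValueError(f"{s3_key!r} is not under prefix {list_prefix!r}")

    # Collapse "." and ".." segments, then reject anything pointing outside local_dir
    relative = posixpath.normpath(s3_key[len(list_prefix):])
    if relative == ".." or relative.startswith("../") or posixpath.isabs(relative):
        raise ValueError(f"{s3_key!r} would be written outside {local_dir!r}")

    return Path(local_dir).joinpath(*relative.split("/"))


print(local_path_for_key("my-workspace/a/b.tif", "my-workspace", "downloads"))
```

A key that fails the check raises `ValueError`, so the calling loop can decide whether to skip the object or stop.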
