---
title: S3 Data Upload and Download
author: GSI Environmental Inc.
date: 2025-11-17
code-fold: false
execute:
  freeze: true
---

This page provides simple examples of how to upload and download Parquet files to and from an AWS S3 bucket using the Boto3 library in Python.

## Parquet Upload to S3

Uploading files to S3 requires configured AWS credentials. You can set them up with the AWS CLI (`aws configure`) or through the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. This example assumes those credentials grant permission to upload objects to the specified S3 bucket.

In [None]:
import os
from pathlib import Path

import boto3

# AWS S3 configuration
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
AWS_S3_BUCKET = os.getenv("AWS_S3_BUCKET", "phyto-indicator")
AWS_S3_PREFIX = "data/"

DATA_DIR = Path.cwd().parent.parent / "phyto-indicator-data"

def s3_upload_parquet_files(bucket: str = AWS_S3_BUCKET, prefix: str = AWS_S3_PREFIX, local_dir: Path = DATA_DIR) -> None:
    """Upload Parquet files from a local directory to an S3 bucket.

    Each subdirectory of ``local_dir`` is treated as one dataset and is
    uploaded under ``<prefix><dataset_name>/``.

    Args:
        bucket (str): Name of the S3 bucket.
        prefix (str): S3 prefix (folder path) to upload files to.
        local_dir (Path): Local directory containing Parquet files.
    """
    # If the key variables are unset (None), Boto3 falls back to its
    # default credential chain (shared config files, environment, etc.).
    s3_client = boto3.client(
        "s3",
        region_name="us-west-2",
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )
    # Use the function arguments rather than the module-level defaults.
    for dataset_dir in local_dir.iterdir():
        if dataset_dir.is_dir():
            dataset_name = dataset_dir.name
            print(f"Uploading dataset {dataset_name} to S3...")
            for root, _, files in os.walk(dataset_dir):
                for file in files:
                    if file.endswith(".parquet"):
                        file_path = Path(root) / file
                        # as_posix() keeps forward slashes in the key on Windows.
                        relative_path = file_path.relative_to(dataset_dir).as_posix()
                        s3_key = f"{prefix}{dataset_name}/{relative_path}"
                        print(f"Uploading {file_path} to s3://{bucket}/{s3_key}...")
                        s3_client.upload_file(
                            Filename=str(file_path),
                            Bucket=bucket,
                            Key=s3_key,
                        )

if __name__ == "__main__":
    s3_upload_parquet_files(
        bucket=AWS_S3_BUCKET,
        prefix=AWS_S3_PREFIX,
        local_dir=DATA_DIR,
    )
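
To make the mapping from local paths to S3 object keys easier to see, the sketch below isolates that step in a small pure function. The helper name `build_s3_key` and the directory and file names in the usage example are hypothetical and chosen only for illustration; the key layout (`<prefix><dataset_name>/<relative path>`) follows the upload function above.

```python
from pathlib import Path


def build_s3_key(prefix: str, dataset_dir: Path, file_path: Path) -> str:
    """Build the S3 object key for a file inside a dataset directory.

    Hypothetical helper that mirrors the key layout used by the upload
    example: <prefix><dataset_name>/<path relative to the dataset dir>.
    """
    # as_posix() keeps forward slashes in the key, even on Windows.
    relative = file_path.relative_to(dataset_dir).as_posix()
    return f"{prefix}{dataset_dir.name}/{relative}"


if __name__ == "__main__":
    # Hypothetical local layout, used only for illustration.
    dataset_dir = Path("phyto-indicator-data") / "example-dataset"
    file_path = dataset_dir / "year=2024" / "part-0.parquet"
    print(build_s3_key("data/", dataset_dir, file_path))
```

Because the function touches neither the file system nor S3, it can be used to preview the keys an upload would produce before any data is transferred.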

## Download from S3

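This example demonstrates how to download Parquet files from an S3 bucket using Boto3. It assumes the bucket is publicly readable. Note that the default Boto3 client still signs its requests, so AWS credentials must be configured even for a public bucket; fully anonymous access requires a client configured for unsigned requests. If the bucket is private, your credentials must also grant permission to list and read objects under the specified prefix.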

In [None]:
import os
import boto3

def s3_download(bucket: str, prefix: str, local_dir: str) -> None:
    """Download files from S3 bucket to local directory.

    Args:
        bucket (str): Name of the S3 bucket.
        prefix (str): S3 prefix (folder path) to download files from.
        local_dir (str): Local directory to save downloaded files.
    """
    s3 = boto3.client("s3", region_name="us-west-2")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue
            local_path = os.path.join(local_dir, key)
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            print(f"Downloading {key} -> {local_path}")
            s3.download_file(bucket, key, local_path)


if __name__ == "__main__":
    s3_download(
        bucket="phyto-indicator",
        prefix="data/",
        local_dir="./local-dir"
    )