# Working with S3 Buckets

`boto3` is the AWS SDK for Python. This notebook explores simple operations with the Amazon S3 cloud storage service: creating and deleting S3 buckets, uploading files to them, and downloading files from them. There is much more to `boto3` than this, as you can find out in the <a href="https://boto3.amazonaws.com/v1/documentation/api/latest/index.html">`boto3` documentation</a>.

This notebook loosely follows some of the examples on <a href="https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-creating-buckets.html">this AWS documentation page</a> and on the pages that follow it.

The examples are geared mainly to <a href="https://aws.amazon.com/sagemaker/">SageMaker</a> users, but should be useful more broadly as well. They are meant to be run in a SageMaker Jupyter notebook instance.

## Module Imports

In [1]:
import logging
import boto3
from botocore.exceptions import ClientError

## Creating a Bucket


In [2]:
def create_bucket(bucket_name, region=None):
    """Create an S3 bucket in a specified region
    If a region is not specified, the bucket is created in the S3 default
    region (us-east-1).
    :param bucket_name: Name of the bucket to create
    :param region: String region to create bucket in, e.g., 'us-west-2'
    :return: True if bucket created, else False
    """

    # Create bucket
    try:
        if region is None:
            s3_client = boto3.client('s3')
            s3_client.create_bucket(Bucket=bucket_name)
        else:
            s3_client = boto3.client('s3', region_name=region)
            location = {'LocationConstraint': region}
            s3_client.create_bucket(Bucket=bucket_name,
                                    CreateBucketConfiguration=location)
    except ClientError as e:
        logging.error(e)
        return False
    return True
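
The function above passes `CreateBucketConfiguration` only when a region is given. One case it does not cover: to our knowledge, S3 rejects an explicit `LocationConstraint` of `us-east-1`, so calling `create_bucket(name, region='us-east-1')` would likely fail. The sketch below isolates that decision in a hypothetical helper, `create_bucket_kwargs` (our own name, not part of `boto3`), that builds the keyword arguments without contacting AWS.

```python
def create_bucket_kwargs(bucket_name, region=None):
    """Build keyword arguments for s3_client.create_bucket().

    Hypothetical helper for illustration; not part of boto3.
    """
    kwargs = {'Bucket': bucket_name}
    # Assumption: us-east-1 is the S3 default region and must not be
    # sent as an explicit LocationConstraint.
    if region is not None and region != 'us-east-1':
        kwargs['CreateBucketConfiguration'] = {'LocationConstraint': region}
    return kwargs


print(create_bucket_kwargs('sagemaker-duke-example', 'us-east-2'))
print(create_bucket_kwargs('sagemaker-duke-example', 'us-east-1'))
```

The result could then be passed along as `s3_client.create_bucket(**create_bucket_kwargs(bucket_name, region))`.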

The following routine creates SageMaker-accessible buckets with `duke` in their names, to promote a consistent bucket naming convention. We pass our AWS region, `us-east-2` (Ohio), as the `region` argument, since our resources are located there. The routine returns both the success flag and the full bucket name.

In [3]:
def create_standard_bucket(suffix):
    bucket_name = 'sagemaker-duke-' + suffix
    return create_bucket(bucket_name, region='us-east-2'), bucket_name

In [4]:
response, full_bucket_name = create_standard_bucket('carlo-test-2')
print(response, full_bucket_name)

True sagemaker-duke-carlo-test-2
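
Since the suffix is supplied by the caller, it may help to check the resulting name before asking S3 to create the bucket. The sketch below is a minimal check under assumed rules, a subset of the S3 naming rules as we understand them: 3 to 63 characters; lowercase letters, digits, hyphens, and dots only; a letter or digit at each end; no consecutive dots. The name `is_valid_bucket_name` is hypothetical, and passing this check does not guarantee that S3 accepts the name or that the name is still available.

```python
import re

# Assumed subset of the S3 bucket naming rules (see the lead-in above).
_BUCKET_NAME_PATTERN = re.compile(r'[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]')


def is_valid_bucket_name(name):
    """Return True if name passes the assumed naming checks."""
    if _BUCKET_NAME_PATTERN.fullmatch(name) is None:
        return False
    # Consecutive dots are not allowed.
    return '..' not in name


print(is_valid_bucket_name('sagemaker-duke-' + 'carlo-test-2'))
print(is_valid_bucket_name('sagemaker-duke-' + 'Carlo_Test'))
```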


## Checking if a Bucket Exists

In [5]:
def bucket_exists(bucket_name, region='us-east-2'):
    s3_client = boto3.client('s3')
    try:
        bucket = s3_client.head_bucket(Bucket=bucket_name)
        region_found = bucket['ResponseMetadata']['HTTPHeaders']['x-amz-bucket-region']
        if region is None:
            return True
        else:
            return region_found == region
    except ClientError:
        return False

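Someone actually created a public bucket `yada` in region `us-east-1`. The first call below therefore returns `False`, because the bucket's region does not match the default `us-east-2`, while the second call, which skips the region check, returns `True` unless the bucket has been deleted.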

In [6]:
print(bucket_exists('yada'))
print(bucket_exists('yada', region=None))
print(bucket_exists('yadazz'))
print(bucket_exists(full_bucket_name))

False
True
False
True
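
Note that `bucket_exists` returns `False` for any `ClientError`, so a bucket that exists but that we are not allowed to access looks the same as a missing one. The sketch below shows one way to tell these cases apart, assuming that `head_bucket` reports the HTTP status as the string error code (`'404'` for a missing bucket, `'403'` for a forbidden one). Both function names are hypothetical.

```python
def describe_head_bucket_error(error_code):
    """Map a HeadBucket error code to a short status string."""
    if error_code == '404':
        return 'missing'
    if error_code == '403':
        # The bucket exists, but these credentials may not access it.
        return 'forbidden'
    return 'unknown'


def bucket_status(bucket_name):
    """Return 'accessible', 'forbidden', 'missing', or 'unknown'."""
    # Imported here so the pure helper above works without boto3 installed.
    import boto3
    from botocore.exceptions import ClientError

    s3_client = boto3.client('s3')
    try:
        s3_client.head_bucket(Bucket=bucket_name)
    except ClientError as e:
        return describe_head_bucket_error(e.response['Error']['Code'])
    return 'accessible'
```

Calling `bucket_status('yadazz')` requires AWS credentials, just like the cells above.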


## Listing Buckets

The following code lists all the buckets in the AWS account, and therefore needs the appropriate credentials to do so. You can achieve this by one of the methods described <a href="https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html">here</a>.

You can also define an appropriate role to use when creating a notebook instance. To check your current role, do the following.

In [7]:
from sagemaker import get_execution_role
role = get_execution_role()
print(role)

arn:aws:iam::513002341673:role/sagemaker


In [8]:
def list_all_buckets():
    s3 = boto3.client('s3')
    response = s3.list_buckets()
    return [bucket['Name'] for bucket in response['Buckets']]

In [9]:
buckets = list_all_buckets()
for bucket in buckets:
    print(bucket)

sagemaker-duke-carlo-test
sagemaker-duke-carlo-test-2
sagemaker-duke-carlo-tutorial
sagemaker-duke-shuzhi
sagemaker-duke-vision
sagemaker-studio-5fixvt9sp9h
sagemaker-studio-fgiy4vsxlzl
sagemaker-studio-i0ft6t3t96c
sagemaker-studio-r0xgnmr9u1n
sagemaker-studio-tn7l6bfsx9
sagemaker-studio-xtha4s3jd3
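
The listing mixes the buckets that follow our naming convention with the ones that SageMaker Studio created. A small filter on the name prefix can separate them. This is a sketch that works on the plain list of names returned by `list_all_buckets`; the helper name `filter_by_prefix` is hypothetical.

```python
def filter_by_prefix(bucket_names, prefix='sagemaker-duke-'):
    """Keep only the bucket names that start with prefix."""
    return [name for name in bucket_names if name.startswith(prefix)]


sample = ['sagemaker-duke-vision', 'sagemaker-studio-xtha4s3jd3']
print(filter_by_prefix(sample))  # ['sagemaker-duke-vision']
```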


## Transferring Files to and from Buckets

In [10]:
def upload_file(file_name, bucket_name, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: Name of local file to upload
    :param bucket_name: Name of the bucket to upload to
    :param object_name: S3 object name. If not specified then file_name is used
    :return: True if file was uploaded, else False
    """

    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name

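    # Upload the file. upload_file() returns None, so there is no response
    # to keep. It re-raises client errors as S3UploadFailedError, so we
    # catch that exception as well.
    s3_client = boto3.client('s3')
    try:
        s3_client.upload_file(file_name, bucket_name, object_name)
    except (ClientError, boto3.exceptions.S3UploadFailedError) as e:
        logging.error(e)
        return False
    return True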

In [11]:
!date > date.txt
response = upload_file('date.txt', full_bucket_name)
response

True
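
When `object_name` is omitted, `upload_file` uses `file_name` verbatim as the object name, so a local path such as `data/date.txt` becomes part of the name. The sketch below shows one way to derive the object name from the base file name instead, with an optional prefix. The helper `make_object_key` is hypothetical, and it assumes POSIX-style local paths, as on a SageMaker notebook instance.

```python
import posixpath
from pathlib import PurePosixPath


def make_object_key(file_name, prefix=''):
    """Build an S3 object name from a local path and an optional prefix."""
    base_name = PurePosixPath(file_name).name
    # S3 object names use '/' as the conventional separator.
    return posixpath.join(prefix, base_name) if prefix else base_name


print(make_object_key('data/date.txt'))          # date.txt
print(make_object_key('data/date.txt', 'logs'))  # logs/date.txt
```

The result could then be passed as the third argument, for example `upload_file('data/date.txt', full_bucket_name, make_object_key('data/date.txt', 'logs'))`.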

In [12]:
def download_file(object_name, bucket_name, file_name=None):
    """Download an object (file) from an S3 bucket

    :param object_name: S3 name of object to download
    :param bucket_name: Name of the bucket to download from
    :param file_name: Name of local file. If not specified then object_name is used
    :return: True if file was downloaded, else False
    """

    # If the local file_name was not specified, use object_name
    if file_name is None:
        file_name = object_name

    # Download the file. download_file() returns None, so there is no
    # response to keep.
    s3_client = boto3.client('s3')
    try:
        s3_client.download_file(bucket_name, object_name, file_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True

In [13]:
response = download_file('date.txt', full_bucket_name, 'date_copy.txt')
!ls date*

date_copy.txt  date.txt
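
Listing the files shows that the copy exists, but not that its contents match the original. One way to check the round trip is to compare checksums of the two local files, as sketched below. The helper names `file_sha256` and `files_match` are hypothetical, and the comparison uses only the standard library, so it makes no claims about checksums stored by S3.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a local file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path_a, path_b):
    """Return True if both files have identical contents."""
    return file_sha256(path_a) == file_sha256(path_b)
```

After the download above, `files_match('date.txt', 'date_copy.txt')` should return `True`.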


## Deleting a Bucket

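The following code assumes that versioning is not enabled on the bucket. It first deletes all objects in the bucket, since S3 does not delete a bucket that still contains objects.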

In [14]:
def delete_bucket(bucket_name):
    s3 = boto3.resource('s3')
    if bucket_exists(bucket_name):
        bucket = s3.Bucket(bucket_name)
        bucket.objects.all().delete()
        s3_client = boto3.client('s3')
        s3_client.delete_bucket(Bucket=bucket_name)

In [15]:
delete_bucket(full_bucket_name)
print(bucket_exists(full_bucket_name))

False
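
If versioning were enabled, deleting the current objects would not be enough, because older versions and delete markers would remain in the bucket. The sketch below shows how the same steps might look in that case. It assumes that `bucket` is a `boto3` S3 `Bucket` resource, such as `boto3.resource('s3').Bucket(bucket_name)`, and uses its `object_versions` collection. The function name is hypothetical, and we have not run it against the buckets in this notebook.

```python
def delete_bucket_with_versions(bucket):
    """Empty and delete a bucket, including old object versions.

    `bucket` is assumed to be a boto3 S3 Bucket resource, e.g.
    boto3.resource('s3').Bucket(bucket_name).
    """
    # Remove every object version and delete marker first.
    bucket.object_versions.delete()
    # The bucket is now empty and can be deleted.
    bucket.delete()
```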
