![https://pieriantraining.com/](../PTCenteredPurple.png)

In this notebook we are going to learn about the basics of boto3 using the S3 service

The basic approach to tackle any AWS project using boto3 is as follows:
1) Identify whether your desired service is available (either by listing all services or by checking [here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/index.html))
2) If your service is available, you need to add the corresponding policy to your group / account in IAM (we already did that for S3)
3) Check the list of available functions you can use (just click on the service within the boto3 documentation; [here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html) for S3)
4) Identify the operations you want to execute and choose the corresponding functions
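
Step 1 can also be checked from code. The following is a minimal sketch: `is_service_available` is a hypothetical helper name of our own, while `get_available_services()` is the boto3 session method that lists the service names boto3 knows about. The lookup itself does not contact AWS.

```python
def is_service_available(session, service_name):
    """Return True if the session can create a client for `service_name`."""
    # get_available_services() returns a list of names such as "s3" or "ec2".
    return service_name.lower() in session.get_available_services()


def example():
    # Requires boto3 to be installed; call example() to try it out.
    import boto3

    session = boto3.session.Session()
    print(is_service_available(session, "s3"))
```

Calling `example()` in a notebook cell prints whether the `s3` service name is known to your installed boto3 version.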

## Difference between Client, Paginators, Waiters and Resources
When browsing the S3 documentation, you might notice that there are different ways to achieve your goals:

**Client**:<br />
A client in Boto3 represents the connection to a specific AWS service. It provides low-level access to the API operations offered by that service. When you create a Boto3 client for a specific AWS service, you can use it to make direct API requests and manage resources associated with that service. Clients are typically used when you need fine-grained control over the API calls and their parameters.

**Paginators** ([Doc](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html)):<br />
AWS service APIs often return large amounts of data that can be paginated for easier handling. Paginators in Boto3 are helpers that allow you to iterate over paginated API responses seamlessly. They automatically make subsequent API calls to retrieve additional pages of data as needed, abstracting away the complexity of dealing with pagination manually. Paginators are useful when you're dealing with API operations that return a large number of results.

**Waiters** ([Doc](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/clients.html#waiters)):<br />
Waiters in Boto3 are used to wait for a certain condition or state to be met in an AWS resource before proceeding with further actions. They help you poll an AWS service API until a specific state is reached. For example, you might use a waiter to wait for a specific Amazon EC2 instance to reach the "running" state after launching it. Waiters are especially helpful in situations where you need to ensure that an operation has completed before moving on.

**Resources** ([Doc](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html)):<br />
Resources in Boto3 provide a higher-level, more Pythonic interface to interact with AWS services. They abstract away the details of the API calls and provide a more object-oriented way of working with AWS resources. Resources represent AWS entities like instances, S3 buckets, DynamoDB tables, etc. When you work with resources, you can use methods and attributes that make it easier to manage and manipulate these entities without directly dealing with the underlying API calls.

Summary:
1) Client: Low-level API interaction with an AWS service.
2) Paginators: Handle paginated results from AWS service APIs.
3) Waiters: Wait for specific conditions to be met in AWS resources.
4) Resources: High-level, object-oriented interface to interact with AWS resources.

In this course we will mostly use the **Client** and **Resources** APIs.
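
Waiters are not used elsewhere in this notebook, so here is a small sketch of what one looks like for S3. It assumes a boto3 S3 client is passed in and that the bucket is created in the default `us-east-1` region; `create_bucket_and_wait` is a hypothetical helper name, while `bucket_exists` is one of the waiters the S3 client provides.

```python
def create_bucket_and_wait(client, bucket_name):
    """Create a bucket, then block until S3 reports that it exists."""
    client.create_bucket(Bucket=bucket_name)
    # The "bucket_exists" waiter polls HeadBucket until the bucket is
    # reachable or the waiter runs out of attempts (then it raises an error).
    client.get_waiter("bucket_exists").wait(Bucket=bucket_name)
    return bucket_name
```

With a real client this would be called as `create_bucket_and_wait(boto3.client("s3"), "some-unique-name")`.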

## Uploading a file to Amazon S3
To upload a file to [S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html), we first need to create a bucket that will store the file.

To create a bucket using the client, we can use the **client.create_bucket(\*\*kwargs)** function.
Note that you can pass many arguments to this function (access control lists, desired region, ownership; full list [here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/create_bucket.html)). The only required one, however, is the bucket name.

In [1]:
import boto3

client = boto3.client('s3')  # First we create the boto3 s3 client
client.create_bucket(Bucket="my-first-bucket")  # call the create_bucket function and pass the bucket name

BucketAlreadyExists: An error occurred (BucketAlreadyExists) when calling the CreateBucket operation: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.

As you can see, this bucket name already exists. Finding a good bucket name can be a little tricky because Amazon S3 has a global namespace, which means that no two buckets can have the same name, regardless of which region or AWS account they belong to (within a partition). Thus you need to use a unique bucket name, so do not copy the exact bucket name below.
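
One way to avoid name collisions is to append a random suffix. The sketch below is only an illustration: `unique_bucket_name` and `create_bucket_kwargs` are hypothetical helper names, and the region default is an assumption. It also shows that outside `us-east-1` the region is passed to `create_bucket` as a `LocationConstraint`.

```python
import uuid


def unique_bucket_name(prefix):
    """Append a random suffix so the name is unlikely to collide globally."""
    suffix = uuid.uuid4().hex[:12]
    name = f"{prefix}-{suffix}".lower()  # bucket names must be lowercase
    if not 3 <= len(name) <= 63:
        raise ValueError("S3 bucket names must be 3 to 63 characters long")
    return name


def create_bucket_kwargs(name, region="us-east-1"):
    """Build the keyword arguments for client.create_bucket()."""
    kwargs = {"Bucket": name}
    # Outside us-east-1 the target region is given as a LocationConstraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

These could be combined as `client.create_bucket(**create_bucket_kwargs(unique_bucket_name("my-first-bucket"), "eu-west-1"))`.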

In [2]:
client.create_bucket(Bucket="my-first-bucket-jose")  # In us-east-1, re-creating a bucket you already own changes nothing; other regions raise BucketAlreadyOwnedByYou

{'ResponseMetadata': {'RequestId': 'V8RHWXFHN5TGSVHB',
  'HostId': 'LmVeBAG3jHtD0TmOa6YvbyxuIsU7RyP/EMi1PQZJNIeoYJPppfhwMygVbYROtDs7NZ1zfuRogKWvJzhgznRKTlwW4JyxqPfM',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'LmVeBAG3jHtD0TmOa6YvbyxuIsU7RyP/EMi1PQZJNIeoYJPppfhwMygVbYROtDs7NZ1zfuRogKWvJzhgznRKTlwW4JyxqPfM',
   'x-amz-request-id': 'V8RHWXFHN5TGSVHB',
   'date': 'Mon, 07 Aug 2023 12:46:27 GMT',
   'location': '/my-first-bucket-jose',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0},
 'Location': '/my-first-bucket-jose'}

We have now created a bucket. If you open the S3 console in AWS, you should be able to see your new bucket.

### File uploading

To upload a file to this bucket we can use the **upload_file(filename, bucket, file_name_in_bucket)** function ([Doc](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/upload_file.html)).

In [3]:
with open("Test.txt", "w") as f:
    f.write("Hello World!")

In [4]:
client.upload_file(Filename="Test.txt", Bucket="my-first-bucket-jose", Key="Test_in_bucket.txt")

If you browse to your bucket you should see the new file inside
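
`upload_file` also accepts an `ExtraArgs` dictionary for per-object settings. As a small sketch, we could guess the `ContentType` from the file extension so that browsers handle the file correctly. The helper names `upload_extra_args` and `upload_with_content_type` are hypothetical, and the client is assumed to be a boto3 S3 client.

```python
import mimetypes


def upload_extra_args(filename):
    """Guess a Content-Type for `filename`; return {} if the type is unknown."""
    content_type, _encoding = mimetypes.guess_type(filename)
    return {"ContentType": content_type} if content_type else {}


def upload_with_content_type(client, filename, bucket, key):
    # ExtraArgs is the upload_file parameter that carries object settings.
    client.upload_file(Filename=filename, Bucket=bucket, Key=key,
                       ExtraArgs=upload_extra_args(filename))
```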

### File downloading
Of course we can also download files from S3!
To do so, let's use the **download_file(bucket, file_name_in_bucket, filename_local)** function.

In [5]:
client.download_file(Bucket="my-first-bucket-jose", Key="Test_in_bucket.txt", Filename="Test_local.txt")

In [2]:
with open("Test_local.txt", "r") as f:
    print(f.read())

Hello World!


Great! We just downloaded the file again

### File deletion
To delete a file, use the **delete_object(bucket, file_name_in_bucket)** function

In [3]:
client.delete_object(Bucket="my-first-bucket-jose", Key="Test_in_bucket.txt")

{'ResponseMetadata': {'RequestId': 'NSHMAV68DGW3QZ76',
  'HostId': 'cbFbreBkZ2r2fLWAKQhtoDzl8aI6VRFCyakz8Lm6rE33m+IyrLAUt/Xscj5np1pinTRxsS1vhv8=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': 'cbFbreBkZ2r2fLWAKQhtoDzl8aI6VRFCyakz8Lm6rE33m+IyrLAUt/Xscj5np1pinTRxsS1vhv8=',
   'x-amz-request-id': 'NSHMAV68DGW3QZ76',
   'date': 'Fri, 11 Aug 2023 17:47:06 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}

Let's try to download that file and see what happens

In [4]:
client.download_file(Bucket="my-first-bucket-jose", Key="Test_in_bucket.txt", Filename="Test_local.txt")

ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
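
In a script we would usually catch this error instead of letting it stop the program. The sketch below assumes a boto3 S3 client is passed in; `object_exists` is a hypothetical helper name. It relies on `client.exceptions.ClientError` and on the error code stored in the exception's `response` dictionary.

```python
def object_exists(client, bucket, key):
    """Return True if `key` exists in `bucket`, False if S3 answers 404."""
    try:
        client.head_object(Bucket=bucket, Key=key)
        return True
    except client.exceptions.ClientError as err:
        # The error code lives in the parsed response attached to the exception.
        if err.response["Error"]["Code"] in ("404", "NoSuchKey", "NotFound"):
            return False
        raise  # anything else (e.g. 403 access denied) is a real problem
```

Checking first with `object_exists(client, "my-first-bucket-jose", "Test_in_bucket.txt")` lets us skip the download when the file is gone.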

## Resources
Let's take a look at some operations we can achieve using resources:

### List all buckets
As we already learnt in the very first notebook, we can list all buckets

In [7]:
s3 = boto3.resource("s3")
list(s3.buckets.all())

[s3.Bucket(name='amazon-course-downloads'),
 s3.Bucket(name='gcp-text-storage'),
 s3.Bucket(name='gcp-text-storage-translation-output'),
 s3.Bucket(name='mp3-output-transcription'),
 s3.Bucket(name='mp3-output-translation'),
 s3.Bucket(name='mp3-polly-output'),
 s3.Bucket(name='mp3-transcription-bucket-chinese'),
 s3.Bucket(name='mp3files-ffmpeg'),
 s3.Bucket(name='mp4-files-for-translation'),
 s3.Bucket(name='mp4-output-bucket-chinese'),
 s3.Bucket(name='mp4-output-new'),
 s3.Bucket(name='mp4-upload-translation'),
 s3.Bucket(name='my-first-bucket-jose'),
 s3.Bucket(name='polly-output-bucket-chinese'),
 s3.Bucket(name='polly-translated-video'),
 s3.Bucket(name='speech-to-speech-test'),
 s3.Bucket(name='test-mp4-upload-chinese'),
 s3.Bucket(name='translated-captions')]

### Create bucket instance
To create an instance of a specific bucket we can use the **Bucket(name)** class

In [8]:
bucket = s3.Bucket("my-first-bucket-jose")

We can now leverage this instance to perform all sorts of operations, including:

### File Uploading

In [9]:
bucket.upload_file(Filename="Test_local.txt", Key="another_file.txt")

### File Listing

In [10]:
for file in bucket.objects.all():
    print(file.key)

Test_in_bucket.txt
another_file.txt
someDirectory/Test.txt


### File Downloading

In [11]:
bucket.download_file(Key="another_file.txt", Filename="another_local_file.txt")

### File Filtering
If you want to grab all files whose keys start with a given prefix, e.g., **Test**, you can use the **objects.filter()** function. ([Doc](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/bucket/objects.html#filter))

In [12]:
list(bucket.objects.filter(Prefix="Test"))

[s3.ObjectSummary(bucket_name='my-first-bucket-jose', key='Test_in_bucket.txt')]

### Directory structure
S3 has no real directories, but you can emulate a directory structure by uploading a file and using the desired path as the Key:

In [13]:
bucket.upload_file(Filename="Test_local.txt", Key="someDirectory/Test.txt")

In [14]:
for file in bucket.objects.all():
    print(file.key)

Test_in_bucket.txt
another_file.txt
someDirectory/Test.txt
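
Because these "directories" are only key prefixes, we can ask S3 to group keys by a delimiter. The following sketch assumes a boto3 S3 client; `list_top_level_folders` is a hypothetical helper name, and it uses the `list_objects_v2` paginator with `Delimiter="/"`.

```python
def list_top_level_folders(client, bucket):
    """Return the key prefixes that end at the first "/" in the bucket."""
    paginator = client.get_paginator("list_objects_v2")
    folders = []
    # With Delimiter="/", S3 groups keys that share a prefix up to the
    # delimiter and reports each group once under "CommonPrefixes".
    for page in paginator.paginate(Bucket=bucket, Delimiter="/"):
        for entry in page.get("CommonPrefixes", []):
            folders.append(entry["Prefix"])
    return folders
```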


## Paginators
We will end this section by taking a look at the **ListObjects** Paginator

In [15]:
paginator = client.get_paginator("list_objects")
res = paginator.paginate(Bucket=bucket.name)

In [16]:
list(res)

[{'ResponseMetadata': {'RequestId': 'HZSV8C72NM6RP5F8',
   'HostId': 'ewIXNC6oX4UuSTa/TUfYwrb5CDE8l+P1bAYxoVNrffFlgR2LgFfwMmQ1AvWBrWzkTkJ+GR8cAk3ZFkjtNk3JXg==',
   'HTTPStatusCode': 200,
   'HTTPHeaders': {'x-amz-id-2': 'ewIXNC6oX4UuSTa/TUfYwrb5CDE8l+P1bAYxoVNrffFlgR2LgFfwMmQ1AvWBrWzkTkJ+GR8cAk3ZFkjtNk3JXg==',
    'x-amz-request-id': 'HZSV8C72NM6RP5F8',
    'date': 'Mon, 07 Aug 2023 12:46:36 GMT',
    'x-amz-bucket-region': 'us-east-1',
    'content-type': 'application/xml',
    'transfer-encoding': 'chunked',
    'server': 'AmazonS3'},
   'RetryAttempts': 0},
  'IsTruncated': False,
  'Marker': '',
  'Contents': [{'Key': 'Test_in_bucket.txt',
    'LastModified': datetime.datetime(2023, 8, 7, 12, 46, 28, tzinfo=tzutc()),
    'ETag': '"ed076287532e86365e841e92bfc50d8c"',
    'Size': 12,
    'StorageClass': 'STANDARD',
    'Owner': {'DisplayName': 'jose.portilla',
     'ID': 'c3713249811e2dca12591ee4163e8b9034512caa92f633e902824359f05334f0'}},
   {'Key': 'another_file.txt',
    'LastModi

You can also search the paginated results with a JMESPath expression, for example to iterate over every entry in **Contents**:

In [17]:
for item in res.search("Contents"):
    print(item["Key"], item["Owner"])

Test_in_bucket.txt {'DisplayName': 'jose.portilla', 'ID': 'c3713249811e2dca12591ee4163e8b9034512caa92f633e902824359f05334f0'}
another_file.txt {'DisplayName': 'jose.portilla', 'ID': 'c3713249811e2dca12591ee4163e8b9034512caa92f633e902824359f05334f0'}
someDirectory/Test.txt {'DisplayName': 'jose.portilla', 'ID': 'c3713249811e2dca12591ee4163e8b9034512caa92f633e902824359f05334f0'}
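
If we only need the keys, we can also loop over the pages ourselves. This is a minimal sketch, assuming a boto3 S3 client: `iter_object_keys` is a hypothetical helper name, it uses the newer `list_objects_v2` paginator, and `PageSize` controls how many keys are requested per API call.

```python
def iter_object_keys(client, bucket, prefix="", page_size=1000):
    """Yield every key under `prefix`, fetching `page_size` keys per request."""
    paginator = client.get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket=bucket,
        Prefix=prefix,
        PaginationConfig={"PageSize": page_size},
    )
    for page in pages:
        # "Contents" is missing from a page when no keys match.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

Since this is a generator, pages are only requested as we iterate, e.g. `for key in iter_object_keys(client, bucket.name): print(key)`.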


## Advanced

We can also inspect more advanced properties, like the encryption or the ACL of a bucket

In [24]:
client.get_bucket_encryption(Bucket="my-first-bucket-jose")

{'ResponseMetadata': {'RequestId': 'NK6BX5PDF6SJS5BT',
  'HostId': 'KgwclOe3BYHMPuhbaVvVXhW+v/ppu5kRraM2LV6hdR4eFoSyoPeya079L9KeggNPlYeJc5c68K4=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'KgwclOe3BYHMPuhbaVvVXhW+v/ppu5kRraM2LV6hdR4eFoSyoPeya079L9KeggNPlYeJc5c68K4=',
   'x-amz-request-id': 'NK6BX5PDF6SJS5BT',
   'date': 'Mon, 07 Aug 2023 12:51:01 GMT',
   'transfer-encoding': 'chunked',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'ServerSideEncryptionConfiguration': {'Rules': [{'ApplyServerSideEncryptionByDefault': {'SSEAlgorithm': 'AES256'},
    'BucketKeyEnabled': False}]}}

In [25]:
client.get_bucket_acl(Bucket="my-first-bucket-jose")

{'ResponseMetadata': {'RequestId': '3CJMRC7CJAN2W1SW',
  'HostId': 'BXesrTqCTcFZUcgROWhFBWiZ30f44CZjQh4iYpm97ClEZDqtsjNHxU1KO+UF9P8etbI43eFKtSo=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'BXesrTqCTcFZUcgROWhFBWiZ30f44CZjQh4iYpm97ClEZDqtsjNHxU1KO+UF9P8etbI43eFKtSo=',
   'x-amz-request-id': '3CJMRC7CJAN2W1SW',
   'date': 'Mon, 07 Aug 2023 12:51:29 GMT',
   'content-type': 'application/xml',
   'transfer-encoding': 'chunked',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'Owner': {'DisplayName': 'jose.portilla',
  'ID': 'c3713249811e2dca12591ee4163e8b9034512caa92f633e902824359f05334f0'},
 'Grants': [{'Grantee': {'DisplayName': 'jose.portilla',
    'ID': 'c3713249811e2dca12591ee4163e8b9034512caa92f633e902824359f05334f0',
    'Type': 'CanonicalUser'},
   'Permission': 'FULL_CONTROL'}]}

### Lifecycle configuration
We can also add a [lifecycle configuration](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/put_bucket_lifecycle_configuration.html) to our bucket.
Note that these configurations can become very large.
For our case, let us simply add a rule that expires objects after 180 days.

The LifecycleConfiguration we pass below consists of the Rules list; our single rule contains the Expiration dictionary, a Prefix (can be empty), and the Status.

[All Rules](https://docs.aws.amazon.com/AmazonS3/latest/userguide/intro-lifecycle-rules.html)

In [39]:
client.put_bucket_lifecycle_configuration(
    Bucket="my-first-bucket-jose",
    LifecycleConfiguration={
        'Rules': [
            {
                'Expiration': {'Days': 180},  # expire objects 180 days after creation
                'Prefix': '',                 # empty prefix: the rule applies to every object
                'Status': 'Enabled',
            }
        ]
    },
)

{'ResponseMetadata': {'RequestId': '1P3YKXQE9QXQE0E6',
  'HostId': 'lU8vKjAh73OLaMqmnwRvFrNwdSXJwmEiOVeYzahCyPVg/CWza470rEM98Hs33bazxmFfNAv3OD4=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'lU8vKjAh73OLaMqmnwRvFrNwdSXJwmEiOVeYzahCyPVg/CWza470rEM98Hs33bazxmFfNAv3OD4=',
   'x-amz-request-id': '1P3YKXQE9QXQE0E6',
   'date': 'Mon, 07 Aug 2023 13:06:17 GMT',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0}}

In [40]:
client.get_bucket_lifecycle_configuration(Bucket="my-first-bucket-jose")

{'ResponseMetadata': {'RequestId': '07B385ZXMQ1KFAZJ',
  'HostId': 'RMZdWIB2tHo/q4gzbXd/7OF2STRCYcNLFLLlRm87bb1x7C2yMBT5R6NhQ/rwIYoOPvO/XR5XjBc=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'RMZdWIB2tHo/q4gzbXd/7OF2STRCYcNLFLLlRm87bb1x7C2yMBT5R6NhQ/rwIYoOPvO/XR5XjBc=',
   'x-amz-request-id': '07B385ZXMQ1KFAZJ',
   'date': 'Mon, 07 Aug 2023 13:06:37 GMT',
   'server': 'AmazonS3',
   'content-length': '288'},
  'RetryAttempts': 0},
 'Rules': [{'Expiration': {'Days': 180},
   'ID': 'NGU0NThlNGQtNmJlZS00NTA5LWE5YjEtYTczMDc3NGQ2ZDZj',
   'Prefix': '',
   'Status': 'Enabled'}]}

You can also inspect the rule by browsing to your S3 console, selecting your bucket, and clicking on **Management**.
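
The rule above sets `Prefix` directly on the rule, which the boto3 reference describes as no longer used in favour of a `Filter` element. The sketch below builds the same kind of rule with a `Filter`; `expiration_rule` and `lifecycle_configuration` are hypothetical helper names, and the optional rule ID is our own addition.

```python
def expiration_rule(days, prefix="", rule_id=None):
    """Build one lifecycle rule that expires matching objects after `days`."""
    rule = {
        "Expiration": {"Days": days},
        "Filter": {"Prefix": prefix},  # empty prefix matches every object
        "Status": "Enabled",
    }
    if rule_id is not None:
        rule["ID"] = rule_id  # S3 generates an ID when none is given
    return rule


def lifecycle_configuration(*rules):
    """Wrap rules in the structure put_bucket_lifecycle_configuration expects."""
    return {"Rules": list(rules)}
```

It could then be passed as `LifecycleConfiguration=lifecycle_configuration(expiration_rule(180))`.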

## Sharing your files using presigned URLs
By default, your buckets are private and nobody else is able to access your files.
If you want to share a file without making it public, you can generate a [presigned URL](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html), which gives anyone who has the link temporary access.

Note that not all client functions can be presigned (e.g., download_file cannot). However, the most important ones, such as get_object to download a file, work!

In [42]:
url = client.generate_presigned_url(ClientMethod="get_object",  # Name of the boto s3 function
                                    Params={"Bucket": "my-first-bucket-jose",
                                            "Key": "Test_in_bucket.txt"},
                                    ExpiresIn=1800) # expires in 1800 seconds

In [43]:
url

'https://my-first-bucket-jose.s3.amazonaws.com/Test_in_bucket.txt?AWSAccessKeyId=AKIAW4HPMF34XSD4CG63&Signature=QqJD78%2Fkqsof6knKAOsdYHLYVbA%3D&Expires=1691416362'

You can now share this URL with anybody you like for the next 1800 seconds (30 minutes).
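
To check when such a link stops working, we can read the expiry from its query string. This is a sketch under assumptions: `presigned_url_expiry` is a hypothetical helper name, and it only looks at the URL itself, covering the `Expires` timestamp style shown above and the `X-Amz-Date` plus `X-Amz-Expires` style used by Signature Version 4 URLs.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url):
    """Return the UTC datetime at which a presigned S3 URL expires."""
    query = parse_qs(urlsplit(url).query)
    if "Expires" in query:
        # Older signature style: absolute expiry as a Unix timestamp.
        return datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)
    if "X-Amz-Date" in query and "X-Amz-Expires" in query:
        # Signature Version 4 style: signing time plus a lifetime in seconds.
        signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
        return signed_at.replace(tzinfo=timezone.utc) + lifetime
    raise ValueError("URL has no recognizable expiry parameters")
```

Calling `presigned_url_expiry(url)` on the URL generated above returns a timezone-aware `datetime`.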

## Bucket deletion
Last but not least, let us delete our bucket using **delete_bucket(bucket_name)**

In [5]:
client.delete_bucket(Bucket="my-first-bucket-jose")

ClientError: An error occurred (BucketNotEmpty) when calling the DeleteBucket operation: The bucket you tried to delete is not empty

As you can see, we can only delete a bucket if it is empty!
To delete all objects in a bucket we can use the following operation:

In [9]:
s3 = boto3.resource("s3")
bucket = s3.Bucket("my-first-bucket-jose")
bucket.objects.all().delete()

[{'ResponseMetadata': {'RequestId': 'SAQ3BX3DWZYW54S6',
   'HostId': 'jaljsqLRE6HprtEr4vU7XlZG1P570o266d1GRCna9oXVDwpbZ40I8JOzrechPtm30MuvBeDXx6k=',
   'HTTPStatusCode': 200,
   'HTTPHeaders': {'x-amz-id-2': 'jaljsqLRE6HprtEr4vU7XlZG1P570o266d1GRCna9oXVDwpbZ40I8JOzrechPtm30MuvBeDXx6k=',
    'x-amz-request-id': 'SAQ3BX3DWZYW54S6',
    'date': 'Fri, 11 Aug 2023 17:52:29 GMT',
    'content-type': 'application/xml',
    'transfer-encoding': 'chunked',
    'server': 'AmazonS3',
    'connection': 'close'},
   'RetryAttempts': 0},
  'Deleted': [{'Key': 'another_file.txt'}, {'Key': 'someDirectory/Test.txt'}]}]

In [10]:
client.delete_bucket(Bucket="my-first-bucket-jose")

{'ResponseMetadata': {'RequestId': 'X2BJX6RAJN49738X',
  'HostId': 'bD6rZcsu7Lfu9MOklBT6MB7/NHJv+G3Qf7VHLxIm5l2P6eN0dty22xtp9U1QLXbmJYacvisdbHU=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': 'bD6rZcsu7Lfu9MOklBT6MB7/NHJv+G3Qf7VHLxIm5l2P6eN0dty22xtp9U1QLXbmJYacvisdbHU=',
   'x-amz-request-id': 'X2BJX6RAJN49738X',
   'date': 'Fri, 11 Aug 2023 17:52:33 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}
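
The two steps can be combined into one helper. The sketch below assumes a boto3 S3 resource is passed in, and `empty_and_delete_bucket` is a hypothetical name. It additionally removes object versions, which matters if versioning was ever enabled on the bucket (an assumption that does not apply to the bucket used in this notebook).

```python
def empty_and_delete_bucket(s3_resource, bucket_name):
    """Remove every object (and object version), then delete the bucket."""
    bucket = s3_resource.Bucket(bucket_name)
    # Deletes current objects; on a versioned bucket this only adds delete markers.
    bucket.objects.all().delete()
    # Deletes all stored versions and delete markers, which a versioned
    # bucket needs before it counts as empty.
    bucket.object_versions.all().delete()
    bucket.delete()
```

With the objects from this notebook it would be called as `empty_and_delete_bucket(boto3.resource("s3"), "my-first-bucket-jose")`. Be careful: this permanently deletes everything in the bucket.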