**Amazon Simple Storage Service (Amazon S3)** is an object storage service that offers scalability, data availability, security, and performance.


Amazon S3 is designed for 99.999999999% (11 9's) of durability, and stores data for millions of applications for companies all around the world.


An **Amazon S3 bucket** is a container for stored files. Files stored in S3 are referred to as **objects**.



**Boto3** is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python. It allows Python developers to write software that makes use of services like Amazon S3 and Amazon EC2.


## Create an Amazon S3 bucket

The name of an Amazon S3 bucket must be unique across all AWS accounts and all regions of the AWS platform. Each bucket is created in a specific region, which can be chosen to minimize latency or to address regulatory requirements.

In [1]:
import logging
import boto3
from botocore.exceptions import ClientError

# src/data/dotenv_example.py
import os
from dotenv import load_dotenv, find_dotenv

# find .env automagically by walking up directories until it's found
dotenv_path = find_dotenv()

# load up the entries as environment variables 
load_dotenv(dotenv_path, verbose=True)

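# The names passed to os.environ.get must match the keys defined in the .env file.
# Here the access key ID is read from AWS_ACCESS_KEY, not AWS_ACCESS_KEY_ID.
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY")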

BUCKET_NAME = 'invest-bot-data'
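Because os.environ.get returns None for a missing key, a misspelled variable name only shows up later as an authentication error. The following is a minimal sketch of an early check; the helper name missing_settings and the example_env dictionary are hypothetical and stand in for the real environment.

```python
import os


def missing_settings(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Hypothetical stand-in for os.environ so the sketch runs anywhere.
example_env = {"AWS_ACCESS_KEY": "placeholder-id"}
required = ["AWS_ACCESS_KEY", "AWS_SECRET_ACCESS_KEY"]

# Lists every required name that still needs a value.
print(missing_settings(required, example_env))
```

Calling missing_settings(required) without a second argument checks the real process environment after load_dotenv has run.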

In [2]:
s3_client = boto3.client('s3', aws_access_key_id=AWS_ACCESS_KEY_ID, 
                               aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
                               region_name='us-east-2',
                               config=boto3.session.Config(signature_version='s3v4'))

# location = {'LocationConstraint': 'us-east-2'}
# s3_client.create_bucket(Bucket='invest-bot-data', 
#                         CreateBucketConfiguration=location)
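The commented-out call above passes a LocationConstraint for us-east-2. One detail worth noting is that us-east-1 is the default location and is rejected if sent as a LocationConstraint. The sketch below builds the keyword arguments for create_bucket with that rule in mind; the helper name create_bucket_kwargs is hypothetical.

```python
def create_bucket_kwargs(bucket_name, region=None):
    """Build keyword arguments for s3_client.create_bucket."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be sent as a constraint.
    if region and region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


print(create_bucket_kwargs("invest-bot-data", "us-east-2"))
print(create_bucket_kwargs("invest-bot-data", "us-east-1"))

# With a configured client (not executed here):
# s3_client.create_bucket(**create_bucket_kwargs(BUCKET_NAME, "us-east-2"))
```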

## Listing Buckets

In [3]:
r = s3_client.list_buckets()
for bucket in r['Buckets']:
    print(bucket['Name'])

invest-bot-data


## Uploading files


In [4]:
s3_client.upload_file(r'flower_image.jpeg', BUCKET_NAME, 'data/flower_image.jpeg')

## Upload as File Object

In [5]:
with open(r'flower_image.jpeg', 'rb') as file:
    s3_client.upload_fileobj(file, BUCKET_NAME, 'data/flower_image.jpeg')

## Extra Args
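Both upload_file and upload_fileobj accept an optional ExtraArgs parameter, a dictionary of additional settings for the upload.

Commonly used keys include ACL (a canned access control list), Metadata (user-defined key/value pairs), and ContentType. The example below sets ACL to public-read, which makes the uploaded object publicly readable.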


In [14]:
# response = s3_client.upload_file(r'flower_image.jpeg', BUCKET_NAME, 'data/flower_image.jpeg', ExtraArgs={'ACL':'public-read'})
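Beyond ACL, ExtraArgs can carry a content type and user metadata. The sketch below assembles such a dictionary with the standard mimetypes module; the helper name build_extra_args and the metadata values are hypothetical.

```python
import mimetypes


def build_extra_args(filename, metadata=None):
    """Build an ExtraArgs dict with a guessed ContentType and optional Metadata."""
    extra_args = {}
    content_type, _encoding = mimetypes.guess_type(filename)
    if content_type:
        extra_args["ContentType"] = content_type
    if metadata:
        # User metadata is stored as string key/value pairs.
        extra_args["Metadata"] = {str(k): str(v) for k, v in metadata.items()}
    return extra_args


extra_args = build_extra_args("flower_image.jpeg", {"source": "notebook"})
print(extra_args)

# With a configured client (not executed here):
# s3_client.upload_file("flower_image.jpeg", BUCKET_NAME,
#                       "data/flower_image.jpeg", ExtraArgs=extra_args)
```

Setting ContentType explicitly is likely to matter when objects are served to browsers, since it is returned as the Content-Type header on download.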

## Downloading files

The methods provided by the AWS SDK for Python to download files are similar to those provided to upload files.


The download_file method accepts the names of the bucket and object to download and the filename to save the file to.

In [7]:
s3_client.download_file(BUCKET_NAME, 'data/flower_image.jpeg', 'flower_image_downloaded.jpeg')

In [8]:
with open('flower_image_downloaded.jpeg', 'wb') as file:
    s3_client.download_fileobj(BUCKET_NAME, 'data/flower_image.jpeg', file)
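Since download_fileobj writes to any writable binary file object, an object can also be read into memory without touching disk. The sketch below shows that pattern; download_to_bytes and FakeS3Client are hypothetical names, and the fake client only exists so the example runs without AWS access.

```python
import io


def download_to_bytes(s3_client, bucket, key):
    """Download an object into memory and return its content as bytes."""
    buffer = io.BytesIO()
    s3_client.download_fileobj(bucket, key, buffer)
    return buffer.getvalue()


class FakeS3Client:
    """Stand-in for a boto3 S3 client; writes fixed bytes instead of calling AWS."""

    def download_fileobj(self, Bucket, Key, Fileobj):
        Fileobj.write(b"fake image bytes")


data = download_to_bytes(FakeS3Client(), "invest-bot-data", "data/flower_image.jpeg")
print(len(data))
```

With the real client from the earlier cells, the call would be download_to_bytes(s3_client, BUCKET_NAME, 'data/flower_image.jpeg').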

## File transfer configuration


When uploading, downloading, or copying a file or S3 object, the AWS SDK for Python automatically manages retries and multipart and non-multipart transfers.

These operations use default settings that suit most scenarios. For special cases, the defaults can be overridden by passing a TransferConfig object to the transfer method.

## Multipart transfers

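Multipart transfers occur when the file size exceeds the value of the multipart_threshold attribute of TransferConfig. The following example sets the threshold to 5 GB, so smaller files are uploaded in a single request.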


In [9]:
from boto3.s3.transfer import TransferConfig
GB = 1024 ** 3
config = TransferConfig(multipart_threshold=5*GB)

s3_client.upload_file(r'flower_image.jpeg', BUCKET_NAME, 'data/flower_image.jpeg', Config=config)
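To make the effect of the threshold concrete, the sketch below estimates whether a file would be sent as a multipart transfer and in how many parts. The function plan_upload is hypothetical and is not part of boto3. The 8 MB defaults mirror the documented TransferConfig defaults, and treating a size exactly equal to the threshold as multipart is an assumption about boundary behavior.

```python
import math

MB = 1024 ** 2


def plan_upload(file_size, multipart_threshold=8 * MB, multipart_chunksize=8 * MB):
    """Return (is_multipart, part_count) for a file of file_size bytes."""
    # Assumption: sizes at or above the threshold are sent as multipart.
    if file_size < multipart_threshold:
        return False, 1
    return True, math.ceil(file_size / multipart_chunksize)


print(plan_upload(5 * MB))    # below the threshold: one request
print(plan_upload(20 * MB))   # above the threshold: split into chunks
print(plan_upload(20 * MB, multipart_threshold=5 * 1024 ** 3))
```

The last call uses the 5 GB threshold from the cell above, under which a 20 MB file is not split.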

## Presigned URLs

A user who does not have AWS credentials or permission to access an S3 object can be granted temporary access by using a presigned URL.

A presigned URL is generated by an AWS user who has access to the object. The generated URL is then given to the unauthorized user. The presigned URL can be entered in a browser or used by a program or HTML webpage. The credentials used by the presigned URL are those of the AWS user who generated the URL.

A presigned URL remains valid for a limited period of time, which is specified in seconds through the ExpiresIn argument when the URL is generated.

In [10]:
r_pre_signed = s3_client.generate_presigned_url('get_object', 
                                                Params={'Bucket': BUCKET_NAME,
                                                        'Key': 'data/flower_image.jpeg',
                                                        }, 
                                                ExpiresIn=3600)
print(r_pre_signed)

https://invest-bot-data.s3.amazonaws.com/data/flower_image.jpeg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA5FBAEOTKWVDHLAGS%2F20230825%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20230825T171958Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=cc445f202dc1e074ec37a98515ef714a84bf5f8c2a560186f1b402a3cddbcc73
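The query string of the generated URL carries the signing time (X-Amz-Date) and the validity period (X-Amz-Expires), so the expiry moment can be derived from the URL alone. The sketch below does this with the standard library; presigned_url_expiry is a hypothetical helper, and example_url is a made-up URL with placeholder credential and signature values.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url):
    """Return the UTC time at which a SigV4 presigned URL stops being valid."""
    query = parse_qs(urlsplit(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))


# Hypothetical URL; credential and signature are placeholders.
example_url = (
    "https://example-bucket.s3.amazonaws.com/data/flower_image.jpeg"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Credential=EXAMPLEKEYID%2F20230825%2Fus-east-2%2Fs3%2Faws4_request"
    "&X-Amz-Date=20230825T171958Z&X-Amz-Expires=3600"
    "&X-Amz-SignedHeaders=host&X-Amz-Signature=placeholder"
)

print(presigned_url_expiry(example_url).isoformat())
```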


## Bucket policies

An S3 bucket can have an optional policy that grants access permissions to other AWS accounts or AWS Identity and Access Management (IAM) users. Bucket policies are defined using the same JSON format as a resource-based IAM policy.

## Retrieve a Bucket Policy

In [23]:
r = s3_client.get_bucket_policy(Bucket=BUCKET_NAME)
print(r)

{'ResponseMetadata': {'RequestId': 'NS7PAK2R1V3TAHKE', 'HostId': 'wqHrfHiQCASmzPjxzTONbj9mfFdLPGHaELOJJX3PxIB2RcIj7FX+hLzvG7/0no1ejkXZ8xgUUKk=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'wqHrfHiQCASmzPjxzTONbj9mfFdLPGHaELOJJX3PxIB2RcIj7FX+hLzvG7/0no1ejkXZ8xgUUKk=', 'x-amz-request-id': 'NS7PAK2R1V3TAHKE', 'date': 'Fri, 25 Aug 2023 17:41:12 GMT', 'content-type': 'application/json', 'transfer-encoding': 'chunked', 'server': 'AmazonS3'}, 'RetryAttempts': 0}, 'Policy': '{"Version":"2012-10-17","Statement":[{"Sid":"AddPerm","Effect":"Allow","Principal":"*","Action":"s3:GetObject","Resource":"arn:aws:s3:::invest-bot-data/*"}]}'}
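In the response, the value under the Policy key is a JSON string rather than a dictionary, so it needs to be decoded before individual statements can be inspected. A minimal sketch, using a trimmed copy of the response printed above; the helper name policy_statements is hypothetical.

```python
import json


def policy_statements(response):
    """Decode the 'Policy' JSON string from a get_bucket_policy response."""
    policy = json.loads(response["Policy"])
    statements = policy["Statement"]
    # A policy may hold a single statement object instead of a list.
    return statements if isinstance(statements, list) else [statements]


# Trimmed copy of the response shown above (ResponseMetadata omitted).
response = {
    "Policy": '{"Version":"2012-10-17","Statement":[{"Sid":"AddPerm",'
              '"Effect":"Allow","Principal":"*","Action":"s3:GetObject",'
              '"Resource":"arn:aws:s3:::invest-bot-data/*"}]}'
}

for statement in policy_statements(response):
    print(statement["Sid"], statement["Effect"], statement["Action"])
```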


## Set a bucket policy

A bucket's policy can be set by calling the put_bucket_policy method.

The policy is defined in the same JSON format as an IAM policy and must be passed to the method as a JSON string, not as a Python dictionary.



## Policy Format

The **Sid (statement ID)** is an optional identifier that you provide for the policy statement. You can assign a Sid value to each statement in a statement array.

The **Effect** element is required and specifies whether the statement results in an allow or an explicit deny. Valid values for Effect are Allow and Deny.

By default, access to resources is denied. 

Use the **Principal** element in a policy to specify the principal that is allowed or denied access to a resource.

You can specify any of the following principals in a policy:

- AWS account and root user
- IAM users
- Federated users (using web identity or SAML federation)
- IAM roles
- Assumed-role sessions
- AWS services
- Anonymous users


The **Action** element describes the specific action or actions that will be allowed or denied. 

We specify a value using a service namespace as an action prefix (iam, ec2, sqs, sns, s3, etc.) followed by the name of the action to allow or deny.

The **Resource** element specifies the object or objects that the statement covers. We specify a resource using an ARN. Amazon Resource Names (ARNs) uniquely identify AWS resources.
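The elements described above can be checked before a policy is sent to S3. The following is a rough sketch of such a check, not a full validator: statement_problems is a hypothetical helper, and it deliberately ignores the negated variants (NotPrincipal, NotAction, NotResource) and the optional Condition element.

```python
def statement_problems(statement):
    """Return a list of problems found in one bucket policy statement."""
    problems = []
    if statement.get("Effect") not in ("Allow", "Deny"):
        problems.append("Effect must be 'Allow' or 'Deny'")
    for element in ("Principal", "Action", "Resource"):
        if element not in statement:
            problems.append(f"missing {element}")
    actions = statement.get("Action", [])
    actions = [actions] if isinstance(actions, str) else actions
    for action in actions:
        # Actions look like '<service>:<name>', for example 's3:GetObject'.
        if action != "*" and ":" not in action:
            problems.append(f"action {action!r} has no service prefix")
    return problems


good = {
    "Sid": "AddPerm",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::invest-bot-data/*",
}
bad = {"Effect": "allow", "Action": "GetObject"}

print(statement_problems(good))
print(statement_problems(bad))
```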

Let's define a policy that enables any user to retrieve any object stored in the bucket identified by the BUCKET_NAME variable.

In [None]:
BUCKET_NAME

'invest-bot-data'

In [22]:
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET_NAME}/*"
        }
    ]
}

# put_bucket_policy expects the policy as a JSON string, so serialize the dict first
policy = json.dumps(policy)
s3_client.put_bucket_policy(Bucket=BUCKET_NAME, Policy=policy)

{'ResponseMetadata': {'RequestId': '35G2MFG15ERT48FM',
  'HostId': 'Ykt30oyk+2laI1AA2IWeLRL4GemdbjKwLimoJk6Av15oi/BhgSo5LRhifr9Lt/jfutRFd5d1HtI=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': 'Ykt30oyk+2laI1AA2IWeLRL4GemdbjKwLimoJk6Av15oi/BhgSo5LRhifr9Lt/jfutRFd5d1HtI=',
   'x-amz-request-id': '35G2MFG15ERT48FM',
   'date': 'Fri, 25 Aug 2023 17:26:29 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}

## Delete a bucket policy


In [24]:
s3_client.delete_bucket_policy(Bucket=BUCKET_NAME)

{'ResponseMetadata': {'RequestId': 'MPWZ0VYEQKJ88YJW',
  'HostId': '5RgHePgTcKKDqVXeBpUFYpsaKw6pOU1kCd05Yk6oqhvOgYGkYpz8b8afvy+28+8DV9bGgps8URSYw+VuE8W61Q==',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': '5RgHePgTcKKDqVXeBpUFYpsaKw6pOU1kCd05Yk6oqhvOgYGkYpz8b8afvy+28+8DV9bGgps8URSYw+VuE8W61Q==',
   'x-amz-request-id': 'MPWZ0VYEQKJ88YJW',
   'date': 'Fri, 25 Aug 2023 17:41:46 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}

## CORS Configuration

Cross-Origin Resource Sharing (CORS) enables client web applications in one domain to access resources in another domain. An S3 bucket can be configured to enable cross-origin requests. The configuration defines rules that specify the allowed origins, HTTP methods (GET, PUT, etc.), and other elements.
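To illustrate how such rules are evaluated, the sketch below checks whether a request's origin and method are covered by a list of CORS rules. It is a simplification that only approximates the matching S3 performs: the helper names are hypothetical, the example rule and origins are made up, AllowedHeaders is ignored, and an origin pattern is assumed to contain at most one * wildcard.

```python
def origin_matches(pattern, origin):
    """Match an origin against a pattern containing at most one '*' wildcard."""
    if "*" not in pattern:
        return pattern == origin
    prefix, suffix = pattern.split("*", 1)
    return (len(origin) >= len(prefix) + len(suffix)
            and origin.startswith(prefix)
            and origin.endswith(suffix))


def cors_allows(rules, origin, method):
    """Return True if any rule allows the given method from the given origin."""
    return any(
        method in rule["AllowedMethods"]
        and any(origin_matches(p, origin) for p in rule["AllowedOrigins"])
        for rule in rules
    )


# Hypothetical rule: GET and PUT from any subdomain of example.com.
rules = [{"AllowedMethods": ["GET", "PUT"],
          "AllowedOrigins": ["https://*.example.com"]}]

print(cors_allows(rules, "https://app.example.com", "GET"))
print(cors_allows(rules, "https://app.example.com", "DELETE"))
print(cors_allows(rules, "https://example.org", "GET"))
```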

## Retrieve a bucket CORS configuration

Retrieve a bucket's CORS configuration by calling the AWS SDK for Python get_bucket_cors method.


In [29]:
r = s3_client.get_bucket_cors(Bucket=BUCKET_NAME)
print(r)

{'ResponseMetadata': {'RequestId': 'VZYRR9XGRN6RBZ1E', 'HostId': '18poMDIi7+jpobsB53ajWoAWsxpOcsZwZb24phIXcg7UExBXTe/OMrZKuTqCNbPoWgp/LGBn8QNwA5IpLA7b/w==', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': '18poMDIi7+jpobsB53ajWoAWsxpOcsZwZb24phIXcg7UExBXTe/OMrZKuTqCNbPoWgp/LGBn8QNwA5IpLA7b/w==', 'x-amz-request-id': 'VZYRR9XGRN6RBZ1E', 'date': 'Fri, 25 Aug 2023 17:51:43 GMT', 'transfer-encoding': 'chunked', 'server': 'AmazonS3'}, 'RetryAttempts': 0}, 'CORSRules': [{'AllowedHeaders': ['Authorization'], 'AllowedMethods': ['GET', 'PUT'], 'AllowedOrigins': ['*'], 'ExposeHeaders': ['GET', 'PUT'], 'MaxAgeSeconds': 3000}]}


## Set a bucket CORS configuration

In [28]:
cors_configuration = {
    'CORSRules': [{
        'AllowedHeaders': ['Authorization'],
        'AllowedMethods': ['GET', 'PUT'],
        'AllowedOrigins': ['*'],
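        # ExposeHeaders is meant to list response header names (for example 'ETag');
        # the HTTP method names below are kept as originally run.
        'ExposeHeaders': ['GET', 'PUT'],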
        'MaxAgeSeconds': 3000
    }]
}


s3_client.put_bucket_cors(Bucket=BUCKET_NAME, CORSConfiguration=cors_configuration)

{'ResponseMetadata': {'RequestId': 'XNN9CHPFR717B1TW',
  'HostId': 'DO9Kn9LPskIJYpGD5O453ZWBH77EoHmBNFIRyhkqOqVGPXFGrdzgrydvJEjMu/dYTVzquzLSwvLJAXqKlzyhhQ==',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'DO9Kn9LPskIJYpGD5O453ZWBH77EoHmBNFIRyhkqOqVGPXFGrdzgrydvJEjMu/dYTVzquzLSwvLJAXqKlzyhhQ==',
   'x-amz-request-id': 'XNN9CHPFR717B1TW',
   'date': 'Fri, 25 Aug 2023 17:51:25 GMT',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0}}