# Python boto3: S3 Linking Examples

### This Jupyter notebook shows simple beginner examples of interacting with AWS S3 through boto3. More complex, automated setups may be better served by a service such as CloudFormation.

### Before you begin this demo:

* Follow the directions in the boto3 quickstart guide to link boto3 to your AWS account:
    * https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration

1. `pip install boto3`
2. Configure your AWS credentials for boto3
    * If you have the AWS CLI installed, you can use it to create your credentials file:
    * `aws configure`
    
### Important notes on the following material:
* **S3 bucket names share a global namespace and MUST BE UNIQUE**
* **BEST PRACTICE: MAKE S3 BUCKETS PRIVATE**
```
client = boto3.client('s3')
response = client.put_public_access_block(
    Bucket=my_bucket,
    PublicAccessBlockConfiguration={
        'BlockPublicAcls': True,
        'IgnorePublicAcls': True,
        'BlockPublicPolicy': True,
        'RestrictPublicBuckets': True
    }
)
```
* A single `put_object` request can upload at most 5 GB; `upload_file` uses multipart upload for larger files
* Use the botocore `Stubber` to test code without calling AWS

### Response Syntax
* DeleteObject: https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObject.html

Response Elements
* If the action is successful, the service sends back an HTTP 204 response. Only headers are returned; there is no response body.

### Useful Links
* Boto3 S3 Migration Guide: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/migrations3.html
* Boto3 Selecting Services: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/index.html
* Boto3 Testing Stubber: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/stubber.html


In [None]:
import boto3
import pandas as pd

session = boto3.Session()
s3 = boto3.resource('s3')

# S3 Buckets - resource('s3')

`s3 = boto3.resource('s3')`

* Read bucket
* Create bucket
* Check if bucket exists
* Delete Bucket
* Read objects in buckets
* Add objects to bucket
* Delete objects in buckets

# Working with buckets

This section works with creating, reading, and deleting S3 buckets. 

### Read All S3 Buckets

* `s3.buckets.all()`

In [None]:
# Print out bucket names
for bucket in s3.buckets.all():
    print(bucket.name)

### Create S3 Bucket

* `s3.create_bucket()`
* **BUCKET NAMES ARE GLOBAL AND CANNOT BE REUSED**
    * if the name is already taken, the call fails with an error such as `BucketAlreadyExists` or `BucketAlreadyOwnedByYou`
* You will have to choose your own bucket name

In [None]:
my_bucket='dana-demo-bucket'
bucket = s3.Bucket(my_bucket)

s3.create_bucket(Bucket=my_bucket
    , CreateBucketConfiguration={'LocationConstraint': 'us-west-2'})
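
One detail worth sketching: S3 rejects a `LocationConstraint` of `us-east-1`, so the configuration argument has to be left out for that region. The helper below, `create_bucket_kwargs`, is a hypothetical name introduced here for illustration; it only builds the keyword arguments and does not call AWS.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build keyword arguments for create_bucket() for the given region."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be sent as a LocationConstraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


print(create_bucket_kwargs("dana-demo-bucket", "us-west-2"))
print(create_bucket_kwargs("dana-demo-bucket", "us-east-1"))

# With the notebook's resource object this could be used as:
#   s3.create_bucket(**create_bucket_kwargs(my_bucket, "us-west-2"))
```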


### BEST PRACTICES: SET BUCKET TO PRIVATE

In [None]:
# Low-level S3 client (not created earlier in this notebook)
client = boto3.client('s3')

response = client.put_public_access_block(
    Bucket=my_bucket,
    PublicAccessBlockConfiguration={
        'BlockPublicAcls': True,
        'IgnorePublicAcls': True,
        'BlockPublicPolicy': True,
        'RestrictPublicBuckets': True
    }
)
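
To confirm the setting took effect, the configuration can be read back with `get_public_access_block`. The sketch below assumes `s3_client` is a `boto3.client('s3')`; the function name and the `REQUIRED_FLAGS` constant are illustrative. If no public access block has ever been set on the bucket, the call raises a `ClientError` rather than returning an empty configuration.

```python
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def public_access_fully_blocked(s3_client, bucket_name):
    """Return True only if all four public access block flags are enabled."""
    response = s3_client.get_public_access_block(Bucket=bucket_name)
    config = response["PublicAccessBlockConfiguration"]
    return all(config.get(flag, False) for flag in REQUIRED_FLAGS)


# Example (requires AWS credentials):
#   import boto3
#   print(public_access_fully_blocked(boto3.client("s3"), "dana-demo-bucket"))
```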

### Checking if a bucket exists 

**Method 1**: Pull the entire list of buckets and check whether the bucket is in it
* `s3.buckets.all()`



In [None]:
my_bucket='dana-demo-bucket'

# Compare the name against every bucket the account owns
for bucket in s3.buckets.all():
    if my_bucket == bucket.name:
        print("Bucket Exists")


**Method 2**: Use a bucket attribute
* `bucket.creation_date`
* boto3 fills this attribute by calling ListBuckets behind the scenes, so it is `None` for buckets your account does not own or cannot list
* it doesn't require going down to the low-level client API

In [None]:
my_bucket='dana-demo-bucket'
bucket = s3.Bucket(my_bucket)

if bucket.creation_date:
    print("The bucket exists, created:", bucket.creation_date)
else:
    print("The bucket does not exist")
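
A third option, sketched here as an alternative to the two methods above, is the client's `head_bucket` call, which checks a single bucket without listing them all. The function name is illustrative and `s3_client` is assumed to be a `boto3.client('s3')`.

```python
def bucket_exists(s3_client, bucket_name):
    """Return True if HeadBucket succeeds, False on a 404, and re-raise anything else."""
    try:
        s3_client.head_bucket(Bucket=bucket_name)
        return True
    except s3_client.exceptions.ClientError as error:
        # HeadBucket reports a missing bucket with error code "404".
        if error.response["Error"]["Code"] in ("404", "NoSuchBucket"):
            return False
        # For example "403": the bucket exists but this account may not access it.
        raise


# Example (requires AWS credentials):
#   import boto3
#   print(bucket_exists(boto3.client("s3"), "dana-demo-bucket"))
```

Re-raising on other codes keeps a permissions problem from being mistaken for a missing bucket.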

### Delete Bucket
* `s3.Bucket(my_bucket).delete()`
* The bucket must be empty before it can be deleted


In [None]:
my_bucket='dana-demo-bucket'

s3.Bucket(my_bucket).delete()
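
Because S3 refuses to delete a bucket that still holds objects, a small helper can empty it first. This is a sketch assuming `bucket` is an `s3.Bucket` resource and that versioning is not enabled (a versioned bucket would also need its object versions removed); the function name is illustrative.

```python
def empty_and_delete_bucket(bucket):
    """Delete every object in `bucket` (an s3.Bucket resource), then the bucket itself."""
    # The collection-level delete() removes the objects in batches.
    bucket.objects.all().delete()
    # Only an empty bucket can be deleted.
    bucket.delete()


# Example (requires AWS credentials):
#   import boto3
#   empty_and_delete_bucket(boto3.resource("s3").Bucket("dana-demo-bucket"))
```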

### Create Bucket unless it exists

* `bucket.creation_date`
* `s3.create_bucket()`


In [None]:
my_bucket='dana-demo-bucket'
bucket = s3.Bucket('dana-demo-bucket')

if bucket.creation_date:
    print("The bucket already exists, created:", bucket.creation_date)
else:
    s3.create_bucket(Bucket=my_bucket
    , CreateBucketConfiguration={'LocationConstraint': 'us-west-2'})
    print("Created your bucket:", my_bucket)


# Working with Objects in Buckets

This section works with the objects in a bucket using boto3 resource functions and the boto3 client. It covers reading dataframes into and out of buckets, checking whether an object exists in a bucket, and using object functions.


### Check for objects in bucket

* `bucket.objects.all()`

In [None]:
my_bucket='dana-demo-bucket'
bucket = s3.Bucket(my_bucket)

print(len(list(bucket.objects.all())))

### Add Object to Bucket (like a dataframe as CSV)

1. Read a local example CSV file into a dataframe
2. Convert the dataframe back into CSV format in an in-memory buffer
3. Use `put_object` to load it into the bucket


`boto3.client()` https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#client

* `client = boto3.client('s3')`
* `client.put_object()`


In [None]:
from io import StringIO


df = pd.read_csv('My_Local_File.csv')

csv_buffer = StringIO()
df.to_csv(csv_buffer)

bucket = s3.Bucket('dana-demo-bucket')
client = boto3.client('s3')

path = 'My_Local_File' + '.csv'
client.put_object(
              Body=csv_buffer.getvalue()
            , Bucket=my_bucket
            , Key=path)
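
Note that `df.to_csv(csv_buffer)` also writes the dataframe index, which comes back as an extra unnamed column when the file is read again. The sketch below is one possible variant that drops the index; the function name is illustrative, `s3_client` is assumed to be a `boto3.client('s3')`, and setting `ContentType` is an optional choice rather than something the notebook does.

```python
def put_dataframe_as_csv(s3_client, df, bucket_name, key):
    """Serialize `df` to CSV without the index column and upload it with put_object."""
    # to_csv() with no path returns the CSV text; index=False drops the row index.
    csv_text = df.to_csv(index=False)
    return s3_client.put_object(
        Body=csv_text.encode("utf-8"),
        Bucket=bucket_name,
        Key=key,
        ContentType="text/csv",
    )


# Example (requires AWS credentials and pandas):
#   put_dataframe_as_csv(boto3.client("s3"), df, "dana-demo-bucket", "My_Local_File.csv")
```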

In [None]:
#df.head()

### Add df in a folder

In [None]:
bucket = s3.Bucket('dana-demo-bucket')
filename = 'My_Local_File'
client = boto3.client('s3')
    
path = "folder/" + filename + '.csv' # ADD FOLDER TO PATH
client.put_object(
              Body=csv_buffer.getvalue()
            , Bucket=my_bucket
            , Key=path)
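
S3 has no real folders: `folder/` is simply a prefix of the object key. As a sketch, the objects "inside" a folder can be listed by filtering on that prefix. The function name is illustrative and `bucket` is assumed to be an `s3.Bucket` resource.

```python
def keys_under_prefix(bucket, prefix):
    """Return the keys of all objects in `bucket` whose key starts with `prefix`."""
    return [obj.key for obj in bucket.objects.filter(Prefix=prefix)]


# Example (requires AWS credentials):
#   import boto3
#   bucket = boto3.resource("s3").Bucket("dana-demo-bucket")
#   print(keys_under_prefix(bucket, "folder/"))
```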

## Now the Easy Way...
### Add file using S3 client
* `s3.meta.client.upload_file()`
* https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.upload_file

In [None]:
my_bucket='dana-demo-bucket'
bucket = s3.Bucket(my_bucket)

# upload_file switches to multipart upload for large files, so the 5 GB single-PUT limit does not apply
s3.meta.client.upload_file('hello_world.txt', my_bucket, 'hello.txt') 
s3.meta.client.upload_file('My_Local_File.csv', my_bucket, 'local_file.csv')
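
For larger files it can help to see progress. `upload_file` accepts a `Callback` that is called with the number of bytes transferred in each chunk; the `UploadProgress` class below is an illustrative sketch of such a callback, not part of boto3.

```python
import threading


class UploadProgress:
    """Callable that accumulates the byte counts reported during an upload."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes
        self.seen_bytes = 0
        # The callback may be invoked from several transfer threads.
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        with self._lock:
            self.seen_bytes += bytes_amount
            if self.total_bytes:
                percent = 100 * self.seen_bytes / self.total_bytes
            else:
                percent = 100.0
            print(f"{self.seen_bytes} / {self.total_bytes} bytes ({percent:.1f}%)")


# With the notebook's resource object this could be used as:
#   import os
#   size = os.path.getsize('My_Local_File.csv')
#   s3.meta.client.upload_file('My_Local_File.csv', my_bucket, 'local_file.csv',
#                              Callback=UploadProgress(size))
```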

### Send df to S3 Bucket Function

In [None]:
def send_dataframe_to_S3(df, bucketName, path):
    import boto3
    from io import StringIO
    
    
    print("WRITING to S3 bucket", bucketName)

    csv_buffer = StringIO()
    df.to_csv(csv_buffer)
    client = boto3.client('s3')
    
    try:
        response = client.put_object(
              Body=csv_buffer.getvalue()
            , Bucket=bucketName
            , Key=path)
        print("SUCCESSFUL WRITE to the S3 BUCKET")
        print("YOUR FILES ARE HERE: ", path)
        
    except Exception as e:
        print("UNSUCCESSFUL write to S3 BUCKET")
        print("ERROR:", e)

#########################################################
        
from io import StringIO

filename = 'My_Local_File'
df = pd.read_csv(filename + '.csv')

my_bucket='dana-demo-bucket'

path = filename + '.csv'
send_dataframe_to_S3(df, my_bucket, path)

### S3 CSV to df

**Method 1**: Direct link to a single object
* `df = pd.read_csv(path)`
* pandas needs the optional `s3fs` package installed to read `s3://` paths

In [None]:
file = filename + ".csv"
path = "s3://" + my_bucket + "/" + file
print(path)

df = pd.read_csv(path)
df.head()

**Method 2:** key/body to get list of dataframes


CSV objects in S3 buckets are returned as bytes and need to be decoded before `pandas.read_csv()` can read them from a string buffer.
* `bucket.objects.all()`
    * `obj.key`
    * `obj.get()`
* `io.StringIO`

In [None]:
from io import StringIO

my_bucket='dana-demo-bucket'
bucket = s3.Bucket(my_bucket)

df_list = []
for obj in bucket.objects.all():
    key = obj.key
    body = obj.get()['Body'].read() # body stored in bytes

    csv_string = body.decode('utf-8')
    df_list.append(pd.read_csv(StringIO(csv_string)))
    
print(len(df_list))
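
The loop above tries to parse every object as CSV, which fails if the bucket also holds other files such as `hello.txt`. A possible refinement, sketched below with an illustrative function name, is to decode only keys with a `.csv` suffix; `bucket` is assumed to be an `s3.Bucket` resource.

```python
def read_csv_texts(bucket, suffix=".csv"):
    """Return {key: decoded text} for every object in `bucket` whose key ends with `suffix`."""
    texts = {}
    for obj in bucket.objects.all():
        if not obj.key.endswith(suffix):
            continue  # skip objects that are not CSV files
        body = obj.get()["Body"].read()  # raw bytes
        texts[obj.key] = body.decode("utf-8")
    return texts


# Each value can then be parsed the same way as in the cell above:
#   df = pd.read_csv(StringIO(text))
```

Keeping the keys alongside the text makes it easier to tell which dataframe came from which object.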

In [None]:
df_list[0]

### Read ALL Objects from S3 Bucket

* `bucket.objects.all()`

In [None]:
bucket = s3.Bucket('dana-demo-bucket')

obj_list = list(bucket.objects.all())
if len(obj_list) > 0:
    print(len(obj_list), "Objects Exist")
else:
    print("This bucket is empty")
    

### Print list of objects

In [None]:
obj_list

### Print all object "keys" or names

In [None]:
my_bucket='dana-demo-bucket'
bucket = s3.Bucket(my_bucket)

for file in bucket.objects.all():
    print(file.key)

### Check if Object Exists

In [None]:
bucket = s3.Bucket('dana-demo-bucket')

file_to_find = 'My_Local_File.csv'
for file in bucket.objects.all():
    if file.key == file_to_find:
        print(file_to_find, "EXISTS in S3")
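
Scanning every key works for small buckets but gets slow as the bucket grows. As an alternative sketch, the client's `head_object` call checks a single key directly. The function name is illustrative and `s3_client` is assumed to be a `boto3.client('s3')`.

```python
def object_exists(s3_client, bucket_name, key):
    """Return True if HeadObject succeeds, False on a 404, and re-raise anything else."""
    try:
        s3_client.head_object(Bucket=bucket_name, Key=key)
        return True
    except s3_client.exceptions.ClientError as error:
        # HeadObject reports a missing key with error code "404".
        if error.response["Error"]["Code"] in ("404", "NoSuchKey"):
            return False
        raise


# Example (requires AWS credentials):
#   import boto3
#   print(object_exists(boto3.client("s3"), "dana-demo-bucket", "My_Local_File.csv"))
```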


### Delete Single Object in S3 Bucket

In [None]:
my_bucket='dana-demo-bucket'

file_to_delete = 'My_Local_File.csv'
obj = s3.Object(my_bucket, file_to_delete)
obj.delete()

### Clean up: Delete ALL Objects in S3 Bucket

In [None]:
my_bucket='dana-demo-bucket'
bucket = s3.Bucket(my_bucket)

print("number objects: ", len(list(bucket.objects.all())))

for obj in bucket.objects.all():
    
    ob = s3.Object(my_bucket,  obj.key)
    ob.delete()
    
print("number objects: ", len(list(bucket.objects.all())))
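
The loop above sends one delete request per object. A batched variant, sketched here with illustrative function names, uses the client's `delete_objects` call, which accepts up to 1000 keys per request; `s3_client` is assumed to be a `boto3.client('s3')`.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_keys(s3_client, bucket_name, keys, batch_size=1000):
    """Request deletion of `keys`, at most `batch_size` keys per DeleteObjects call."""
    requested = 0
    for batch in chunked(list(keys), batch_size):
        s3_client.delete_objects(
            Bucket=bucket_name,
            Delete={"Objects": [{"Key": key} for key in batch], "Quiet": True},
        )
        requested += len(batch)
    # This is the number of keys sent for deletion, not a confirmed count.
    return requested


# Example (requires AWS credentials):
#   keys = [obj.key for obj in bucket.objects.all()]
#   delete_keys(boto3.client("s3"), "dana-demo-bucket", keys)
```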