### AWS Boto3


#### Boto3

 - A library to interact with AWS services
 - To create a client, pass the service name (e.g. 's3'), a region name, and AWS credentials (a key ID and secret)

#### IAM
 - To create a key and secret for boto3, use the Identity and Access Management (IAM) service
 - Create IAM (sub-)users to control access to services in the AWS account
 - An access key ID and secret access key together form the credentials that authenticate an IAM user
 - In the AWS console, search for IAM under Services, then on the IAM screen go to Users -> Add User
 - Enter a "user name" and select "programmatic access"
 - Select "attach existing policies directly" and search for the proper policies to add
 - Create the user and save their credentials
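
Once the credentials are saved, one way to keep them out of the notebook is to read them from environment variables. The following is a minimal sketch, assuming the key and secret were exported as `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` (the standard names that boto3 also looks for); the helper name `client_kwargs` is hypothetical.

```python
import os


def client_kwargs(region_name="us-east-1", environ=None):
    """Build keyword arguments for boto3.client() from environment variables."""
    env = os.environ if environ is None else environ
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "region_name": region_name,
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


# Usage (requires boto3 and real credentials):
# import boto3
# s3 = boto3.client("s3", **client_kwargs())
```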
 
#### Create s3 and list its buckets

In [None]:
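import boto3

# AWS_KEY_ID and AWS_SECRET hold the IAM user's key ID and secret (defined beforehand)
# Generate the boto3 client for interacting with S3
s3 = boto3.client('s3', region_name='us-east-1',
                  # Set up AWS credentials
                  aws_access_key_id=AWS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET)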
# List the buckets
buckets = s3.list_buckets()

# Print the buckets
print(buckets)

#### Create sns and list its topics

In [None]:
# Generate the boto3 client for interacting with SNS
sns = boto3.client('sns', region_name='us-east-1',
                   aws_access_key_id=AWS_KEY_ID,
                   aws_secret_access_key=AWS_SECRET)

# List SNS topics
topics = sns.list_topics()

# Print out the list of SNS topics
print(topics)

#### Boto operations on s3

 - Instantiate a client connection to s3
 - Create three new buckets
 - List buckets
 - Print response
 - Delete one bucket
 - List and print again

In [None]:
import boto3

# Create boto3 client to S3
s3 = boto3.client('s3', region_name='us-east-1',
                  aws_access_key_id=AWS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET)

# Create the buckets
response_staging = s3.create_bucket(Bucket='gim-staging')
response_processed = s3.create_bucket(Bucket='gim-processed')
response_test = s3.create_bucket(Bucket='gim-test')

# Print out the response
print(response_staging)

# Get the list_buckets response
response = s3.list_buckets()

# Iterate over Buckets from .list_buckets() response
for bucket in response['Buckets']:
  
    # Print the Name for each bucket
    print(bucket['Name'])

In [None]:
# Delete the gim-test bucket
s3.delete_bucket(Bucket='gim-test')

# Get the list_buckets response
response = s3.list_buckets()

# Print each bucket's name
for bucket in response['Buckets']:
    print(bucket['Name'])

#### Delete multiple s3 buckets

In [None]:
# Get the list_buckets response
response = s3.list_buckets()

# Delete all the buckets with 'gim', create replacements.
for bucket in response['Buckets']:
    if 'gim' in bucket['Name']:
        s3.delete_bucket(Bucket=bucket['Name'])
    
s3.create_bucket(Bucket='gid-staging')
s3.create_bucket(Bucket='gid-processed')
  
# Print bucket listing after deletion
response = s3.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])
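
Note that `delete_bucket()` only succeeds on an empty bucket, so the loop above works because the buckets were just created. Below is a sketch of a helper that empties a bucket first with `list_objects()` and `delete_object()`. The name `empty_and_delete_bucket` is hypothetical, `s3` is assumed to be a boto3 S3 client, and the bucket is assumed to be unversioned.

```python
def empty_and_delete_bucket(s3, bucket_name):
    """Delete every object in the bucket, then the bucket itself.

    Returns the list of deleted object keys.
    """
    deleted = []
    while True:
        # list_objects returns at most 1000 objects per call,
        # so keep listing until nothing is left
        response = s3.list_objects(Bucket=bucket_name)
        contents = response.get("Contents", [])
        if not contents:
            break
        for obj in contents:
            s3.delete_object(Bucket=bucket_name, Key=obj["Key"])
            deleted.append(obj["Key"])
    s3.delete_bucket(Bucket=bucket_name)
    return deleted
```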

#### S3 buckets vs objects

 - Similar relationship as folders and files in a file management system
![image](assets/boto3/bucket_vs_object.png)

#### Uploading objects
 - Use the **upload_file()** method
 - The Filename parameter is the local file path
 - The Bucket parameter takes the name of the bucket we're uploading to
 - The Key parameter is what we name the object in s3
 - The method does not return anything; if there is an error, it raises an exception

#### Listing objects
 - The **list_objects()** method returns at most 1000 objects per call, even if the bucket holds more
 - Use the Prefix argument to limit the response to objects whose key starts with the prefix
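
Because a single call returns at most 1000 objects, larger buckets need pagination. Here is a minimal sketch, assuming the newer `list_objects_v2()` call and its `ContinuationToken` / `NextContinuationToken` fields (these notes otherwise use `list_objects()`). The helper name `iter_keys` is hypothetical, and `s3` is assumed to be a boto3 S3 client.

```python
def iter_keys(s3, bucket_name, prefix=""):
    """Yield every object key that starts with the prefix, following pagination."""
    kwargs = {"Bucket": bucket_name, "Prefix": prefix}
    while True:
        response = s3.list_objects_v2(**kwargs)
        # 'Contents' is absent when nothing matches
        for obj in response.get("Contents", []):
            yield obj["Key"]
        if not response.get("IsTruncated"):
            break
        # Ask for the next page on the following call
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

For example, `list(iter_keys(s3, 'gid-staging', '2018/final_'))` would collect all matching keys, however many pages the listing spans.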

#### Viewing a single object
 - Use the **head_object()** method, passing the bucket's name and the object's key, to get the object's metadata without downloading it

#### Downloading a file
 - Use the **download_file()** method, specifying the bucket and key of the object you want to download and the local filename to save it to

#### Delete an object
 - use the **delete_object()** method to delete the object in the bucket passing the bucket and key

In [None]:
# Upload final_report.csv to gid-staging
s3.upload_file(Bucket='gid-staging',
              # Set filename and key
               Filename='final_report.csv', 
               Key='2019/final_report_01_01.csv')

# Get object metadata and print it
response = s3.head_object(Bucket='gid-staging', 
                       Key='2019/final_report_01_01.csv')

# Print the size of the uploaded object
print(response['ContentLength'])

In [None]:
# List only objects that start with '2018/final_'
response = s3.list_objects(Bucket='gid-staging', 
                           Prefix='2018/final_')

# Iterate over the objects
if 'Contents' in response:
    for obj in response['Contents']:
        # Delete the object
        s3.delete_object(Bucket='gid-staging', Key=obj['Key'])

# Print the remaining objects in the bucket
response = s3.list_objects(Bucket='gid-staging')

# 'Contents' is missing when the bucket is empty, so default to an empty list
for obj in response.get('Contents', []):
    print(obj['Key'])

#### AWS permissions system
 - 1. IAM policies attached to a user grant access permissions; they apply to all AWS services
 - 2. Bucket policies control access to a bucket and the objects within it
 - 3. ACLs (Access Control Lists) let us set permissions on specific objects within a bucket
 - 4. Presigned URLs let us provide temporary access to an object

IAM and bucket policies are used in multi-user environments
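
To make the second option concrete: a bucket policy is a JSON document. Below is a sketch, assuming the standard policy grammar (`Version`, `Statement`, `Effect`, `Principal`, `Action`, `Resource`), of a policy that makes every object in one bucket publicly readable. The helper name `public_read_policy` and the `Sid` value are illustrative only.

```python
import json


def public_read_policy(bucket_name):
    """Return a JSON bucket policy allowing anyone to read objects in the bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            # Applies to every object key in the bucket
            "Resource": "arn:aws:s3:::{}/*".format(bucket_name),
        }],
    }
    return json.dumps(policy)


# Usage (requires a boto3 S3 client named s3):
# s3.put_bucket_policy(Bucket='gid-staging', Policy=public_read_policy('gid-staging'))
```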

#### ACLs
 - Are entities attached to objects in s3
 - Can be private or public-read
 - Private by default
 - Set an object to "public-read" with **s3.put_object_acl(Bucket='bucket-name',Key='file.type',ACL='public-read')**
 - Or on upload with **s3.upload_file(Bucket='bucket-name', Filename='file.type', Key='file.type', ExtraArgs={'ACL':'public-read'})**

#### Accessing public object
 - Use the following URL format: https://{bucket}.s3.amazonaws.com/{key}
 - For instance, "https://{}.s3.amazonaws.com/{}".format( "gid-requests", "2019/potholes.csv")
 - pandas can read a public CSV directly from that URL: df = pd.read_csv(url)
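
The URL format can be wrapped in a small helper. This is a sketch only: the function name `public_object_url` is hypothetical, and percent-encoding the key with `urllib.parse.quote` is an added precaution for keys that contain spaces or other special characters.

```python
from urllib.parse import quote


def public_object_url(bucket_name, key):
    """Return the public URL of an object, percent-encoding the key."""
    # Keep '/' unescaped so folder-like prefixes stay readable
    return "https://{}.s3.amazonaws.com/{}".format(bucket_name, quote(key, safe="/"))


print(public_object_url("gid-requests", "2019/potholes.csv"))
# https://gid-requests.s3.amazonaws.com/2019/potholes.csv
```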

#### How access is decided
![image](assets/boto3/how_access_1.png)
![image](assets/boto3/how_access_2.png)

In [None]:
# Upload the final_report.csv to gid-staging bucket
s3.upload_file(
  # Complete the filename
  Filename='./final_report.csv', 
  # Set the key and bucket
  Key='2019/final_report_2019_02_20.csv', 
  Bucket='gid-staging',
  # During upload, set ACL to public-read
  ExtraArgs = {
    'ACL': 'public-read'})

In [None]:
# List only objects that start with '2019/final_'
response = s3.list_objects(Bucket='gid-staging', Prefix='2019/final_')

# Iterate over the objects
# 'Contents' is missing when no object matches the prefix
for obj in response.get('Contents', []):
  
    # Give each object ACL of public-read
    s3.put_object_acl(Bucket='gid-staging', 
                      Key=obj['Key'], 
                      ACL='public-read')
    
    # Print the Public Object URL for each object
    print("https://{}.s3.amazonaws.com/{}".format('gid-staging', obj['Key']))

#### Reading private files

To read private s3 files:
 - either use the boto3 client's download_file() method to download the file, then read it from disk with pandas or another tool
 - or use the get_object() method and pass the response's 'Body' to pandas
 - or use presigned URLs, which stay valid for a set amount of time (ExpiresIn, in seconds)
   - s3.generate_presigned_url(ClientMethod='get_object', ExpiresIn=3600, Params={'Bucket': 'gid-requests', 'Key': 'potholes.csv'})
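
The `get_object()` route does not strictly need pandas, since the response's `'Body'` is a readable byte stream. Here is a minimal sketch using only the standard library, assuming a UTF-8 encoded CSV; `read_csv_rows` is a hypothetical helper, and `s3` is assumed to be a boto3 S3 client.

```python
import csv
import io


def read_csv_rows(s3, bucket_name, key):
    """Fetch a private CSV object and return its rows as a list of dicts."""
    obj = s3.get_object(Bucket=bucket_name, Key=key)
    # obj['Body'] is a file-like stream of bytes; decode it before parsing
    text = obj["Body"].read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```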

In [None]:
# Generate presigned_url for the uploaded object
share_url = s3.generate_presigned_url(
  # Specify allowable operations
  ClientMethod='get_object',
  # Set the expiration time
  ExpiresIn=3600,
  # Set bucket and shareable object's name
  Params={'Bucket': 'gid-staging','Key': 'final_report.csv'}
)

# Print out the presigned URL
print(share_url)

In [None]:
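import pandas as pd

df_list = []

# response is assumed to hold a list_objects() result for the gid-requests bucket
for file in response['Contents']: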
    # For each file in response load the object from S3
    obj = s3.get_object(Bucket='gid-requests', Key=file['Key'])
    # Load the object's StreamingBody with pandas
    obj_df = pd.read_csv(obj['Body'])
    # Append the resulting DataFrame to list
    df_list.append(obj_df)

# Concat all the DataFrames with pandas
df = pd.concat(df_list)

# Preview the resulting DataFrame
df.head()

#### Pandas to html and s3 upload

Some useful media types to pass as ContentType when uploading
 - application/json
 - image/png
 - application/pdf
 - text/csv
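
Rather than typing the media type by hand, it can be guessed from the file extension. Below is a sketch that uses the standard library's `mimetypes` module to build the `ExtraArgs` dict for `upload_file()`; the helper name `extra_args_for` and the `application/octet-stream` fallback are assumptions made for illustration.

```python
import mimetypes


def extra_args_for(filename, public=False):
    """Build an ExtraArgs dict for upload_file() with a guessed ContentType."""
    content_type, _ = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown
    extra_args = {"ContentType": content_type or "application/octet-stream"}
    if public:
        extra_args["ACL"] = "public-read"
    return extra_args


# Usage (requires a boto3 S3 client named s3):
# s3.upload_file(Filename='lines.html', Bucket='datacamp-public', Key='index.html',
#                ExtraArgs=extra_args_for('lines.html', public=True))
```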

In [None]:
# Generate an HTML table with no border and selected columns
services_df.to_html('./services_no_border.html',
           # Keep specific columns only
           columns=['service_name', 'link'],
           # Set border
           border=0)

# Generate an html table with border and all columns.
services_df.to_html('./services_border_all_columns.html', 
           border=1)

In [None]:
# Upload the lines.html file to S3
s3.upload_file(Filename='lines.html', 
               # Set the bucket name
               Bucket='datacamp-public', Key='index.html',
               # Configure uploaded file
               ExtraArgs = {
                 # Set proper content type
                 'ContentType':'text/html',
                 # Set proper ACL
                 'ACL': 'public-read'})

# Print the S3 Public Object URL for the new file,
# i.e. https://datacamp-public.s3.amazonaws.com/index.html
print("https://{}.s3.amazonaws.com/{}".format('datacamp-public', 'index.html'))

#### Mini project to read data and upload as html to s3

In [None]:
# Combine daily requests for February
# request_files is assumed to hold the 'Contents' list of a list_objects() response

df_list = []

# Load each object from s3
for file in request_files:
    s3_day_reqs = s3.get_object(Bucket='gid-requests', 
                                Key=file['Key'])
    # Read the DataFrame into pandas, append it to the list
    day_reqs = pd.read_csv(s3_day_reqs['Body'])
    df_list.append(day_reqs)

# Concatenate all the DataFrames in the list
all_reqs = pd.concat(df_list)

# Preview the DataFrame
all_reqs.head()

In [None]:
# Upload aggregated reports for February

# Write agg_df to a CSV and HTML file with no border
agg_df.to_csv('./feb_final_report.csv')
agg_df.to_html('./feb_final_report.html', border=0)

# Upload the generated CSV to the gid-reports bucket
# (the key ends in .csv so the HTML upload below does not overwrite it)
s3.upload_file(Filename='./feb_final_report.csv',
               Key='2019/feb/final_report.csv', Bucket='gid-reports',
               ExtraArgs={'ACL': 'public-read'})

# Upload the generated HTML to the gid-reports bucket
s3.upload_file(Filename='./feb_final_report.html',
               Key='2019/feb/final_report.html', Bucket='gid-reports',
               ExtraArgs={'ContentType': 'text/html',
                          'ACL': 'public-read'})

In [None]:
# Update index to include February

# List the gid-reports bucket objects starting with 2019/
objects_list = s3.list_objects(Bucket='gid-reports', Prefix='2019/')

# Convert the response contents to DataFrame
objects_df = pd.DataFrame(objects_list['Contents'])

# Create a column "Link" that contains Public Object URL
base_url = "http://gid-reports.s3.amazonaws.com/"
objects_df['Link'] = base_url + objects_df['Key']

# Preview the resulting DataFrame
objects_df.head()

In [None]:
# Upload the new index

# Write objects_df to an HTML file
objects_df.to_html('report_listing.html',
    # Set clickable links
    render_links=True,
    # Isolate the columns
    columns=['Link', 'LastModified', 'Size'])

# Overwrite index.html key by uploading the new file
s3.upload_file(
  Filename='./report_listing.html', Key='index.html', 
  Bucket='gid-reports',
  ExtraArgs = {
    'ContentType': 'text/html', 
    'ACL': 'public-read'
  })

![image](assets/boto3/result.png)