# Introduction to AWS Boto in Python

## Chapter 1: Putting Files in the Cloud!

### Intro to AWS and Boto3

#### What is Amazon Web Services?

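AWS is a cloud platform: we pay for its services (storage, notifications, text and image analysis) and use them in our projects instead of running that infrastructure ourselves.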

#### What is Boto3?

In [None]:
import boto3

s3 = boto3.client('s3',
                  region_name='us-east-1',
                  aws_access_key_id=AWS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET)

response = s3.list_buckets()

#### AWS console
* Create an account

#### Creating keys with IAM
* IAM = Identity and Access Management
* We create IAM sub-users to control access to AWS resources in our account
* Credentials (key/secret combo) are what authenticate IAM users.
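
The cells in these notes use `AWS_KEY_ID` and `AWS_SECRET` without showing where they come from. A minimal sketch of one option, assuming the credentials are stored in the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` (the names boto3 itself looks for), is shown here. The helper `client_kwargs` is a hypothetical name, not part of boto3.

```python
import os


def client_kwargs(env=os.environ, region="us-east-1"):
    """Build keyword arguments for boto3.client() from environment variables."""
    try:
        key_id = env["AWS_ACCESS_KEY_ID"]
        secret = env["AWS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"Missing credential variable: {missing}") from None
    return {
        "region_name": region,
        "aws_access_key_id": key_id,
        "aws_secret_access_key": secret,
    }


# Demo with a fake environment so the sketch runs without real credentials.
demo = client_kwargs({
    "AWS_ACCESS_KEY_ID": "AKIA-EXAMPLE",
    "AWS_SECRET_ACCESS_KEY": "not-a-real-secret",
})
```

With real variables set, the client could then be created as `boto3.client('s3', **client_kwargs())`, which keeps the secret out of the notebook itself.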

#### AWS services
* S3 or Simple Storage Service lets us store files in the cloud.
* SNS or Simple Notification Service lets us send emails and texts.
* Comprehend performs sentiment analysis on blocks of text.
* Rekognition extracts text from images and looks for cats in a picture...

### Diving into buckets

#### S3 Components - Buckets
* Desktop folders
* Own permission policy
* Website storage
* Generate logs

#### S3 Components - Objects
* Files within the buckets (csv, jpg, etc.)

#### What can we do with buckets?
* Create Bucket
* List Buckets
* Delete Buckets

#### Creating a Bucket

In [None]:
import boto3

s3 = boto3.client('s3',
                  region_name='us-east-1',
                  aws_access_key_id=AWS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET)

bucket = s3.create_bucket(Bucket='gid-requests')

#### Our bucket in the console
* Bucket names need to be unique across all of S3

#### Listing Buckets

In [None]:
import boto3

s3 = boto3.client('s3',
                  region_name='us-east-1',
                  aws_access_key_id=AWS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET)

bucket_response = s3.list_buckets()

In [None]:
buckets = bucket_response['Buckets']

for bucket in buckets:
    print(bucket['Name'])

#### Deleting buckets

In [None]:
import boto3

s3 = boto3.client('s3',
                  region_name='us-east-1',
                  aws_access_key_id=AWS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET)

# Client methods only accept keyword arguments; the bucket must be empty
response = s3.delete_bucket(Bucket='gid-requests')

### Uploading and retrieving files

#### Buckets and objects
* An object can be anything: an image, video file, CSV or a log file

#### A Bucket
* A bucket has a **name**
* **Name** is a string
* **Unique** name in all of S3.
* Contains **many** objects

#### An Object
* An object has a **key**
* **Key** is the full path from the bucket root
* **Unique** key in the bucket
* Can only be in **one** parent bucket
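
To make the "full path" idea concrete, here is a small sketch using plain strings, with no AWS call involved. It treats keys as flat strings in which a "folder" is only a shared prefix, and mimics how a `Prefix` filter selects keys. The key names and the helper `keys_with_prefix` are hypothetical.

```python
# Hypothetical keys, all living in one bucket. A "path" is simply part of
# the key string, so "2018/" is a shared prefix rather than a real folder.
keys = [
    "gid_requests_2019_01_01.csv",
    "2018/final_report.csv",
    "2018/draft_report.csv",
    "2019/potholes.csv",
]


def keys_with_prefix(all_keys, prefix):
    """Mimic what a Prefix filter selects: keys that start with the string."""
    return [key for key in all_keys if key.startswith(prefix)]


final_2018 = keys_with_prefix(keys, "2018/final_")
all_2018 = keys_with_prefix(keys, "2018/")
```

Because each key is unique within its bucket, uploading to an existing key replaces that object rather than creating a second one.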

#### Creating the client

In [None]:
s3 = boto3.client('s3',
                  region_name='us-east-1',
                  aws_access_key_id=AWS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET)

#### Uploading files

In [None]:
s3.upload_file(
    Filename='gid_requests_2019_02_01.csv',
    Bucket='gid-requests',
    Key='gid_requests_2019_02_01.csv'
)

#### Listing objects in a bucket

In [None]:
response = s3.list_objects(
    Bucket='gid-requests',
    MaxKeys=2,
    Prefix='gid_requests_2019_'
)

print(response)
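
A single list call returns a limited number of keys (at most 1,000, or fewer when `MaxKeys` is set), so larger buckets need pagination. The sketch below shows one way to walk every page with the client's `list_objects_v2` paginator. `iter_keys` is a hypothetical helper, and `FakeS3` / `FakePaginator` are stand-ins that only exist so the example runs without AWS.

```python
def iter_keys(client, bucket, prefix=""):
    """Yield every key under a prefix, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Stand-in objects so the sketch runs offline; a real boto3 S3 client
# would be passed to iter_keys() instead.
class FakePaginator:
    def __init__(self, pages):
        self.pages = pages

    def paginate(self, **kwargs):
        return iter(self.pages)


class FakeS3:
    def get_paginator(self, operation_name):
        return FakePaginator([
            {"Contents": [{"Key": "gid_requests_2019_01_01.csv"}]},
            {"Contents": [{"Key": "gid_requests_2019_01_02.csv"}]},
            {},
        ])


found = list(iter_keys(FakeS3(), "gid-requests", "gid_requests_2019_"))
```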

#### Getting object metadata

In [None]:
response = s3.head_object(
    Bucket='gid-requests',
    Key='gid_requests_2019_02_01.csv'
)

print(response)

In [None]:
# Print the size of the uploaded object
print(response['ContentLength'])

#### Downloading files

In [None]:
s3.download_file(
    Filename='gid_requests_2019_02_01.csv',
    Bucket='gid-requests',
    Key='gid_requests_2019_02_01.csv'
)

#### Deleting objects

In [None]:
s3.delete_object(
    Bucket='gid-requests',
    Key='gid_requests_2019_02_01.csv'
)

In [None]:
# List only objects that start with '2018/final_'
response = s3.list_objects(Bucket='gid-staging', 
                           Prefix='2018/final_')
print(response)

# Iterate over the objects
if 'Contents' in response:
    for obj in response['Contents']:
        # Delete the object
        s3.delete_object(Bucket='gid-staging', Key=obj['Key'])

# Print the keys of remaining objects in the bucket
response = s3.list_objects(Bucket='gid-staging')

# 'Contents' is missing when the bucket is empty, so fall back to an empty list
for obj in response.get('Contents', []):
    print(obj['Key'])
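
Deleting objects one by one costs one request per key. As an alternative sketch, the client's `delete_objects` call accepts up to 1,000 keys per request through its `Delete` argument. The helper below only builds those payloads; `delete_batches` and the sample keys are hypothetical.

```python
def delete_batches(keys, batch_size=1000):
    """Split keys into payloads shaped for the Delete argument of delete_objects."""
    keys = list(keys)
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        yield {"Objects": [{"Key": key} for key in chunk]}


# Hypothetical keys; batch_size=2 only keeps the demo small.
batches = list(delete_batches(
    ["2018/final_a.csv", "2018/final_b.csv", "2018/final_c.csv"],
    batch_size=2,
))
```

Each payload would then be sent with `s3.delete_objects(Bucket='gid-staging', Delete=batch)`.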

## Chapter 2: Sharing Files Securely

### Keeping objects secure

#### Why care about permissions?

In [None]:
import pandas as pd

# Gives an error because the object is private by default
df = pd.read_csv("https://gid-staging.s3.amazonaws.com/potholes.csv")

#### AWS Permissions Systems
* IAM
    * Attach IAM policies to a user
* Bucket Policy
* ACL - Access Control Lists
* Presigned URL

#### ACLs
* ACLs are entities attached to objects in S3
* 2 types:
    * private
    * public-read
    
**Upload File**

In [None]:
s3.upload_file(
    Filename='potholes.csv', Bucket='gid-requests', Key='potholes.csv')

**Set ACL to `public-read`**

In [None]:
s3.put_object_acl(
    Bucket='gid-requests', Key='potholes.csv', ACL='public-read')

#### Setting ACLs on upload
* Upload file with `public-read` ACL

In [None]:
s3.upload_file(
    Bucket='gid-requests',
    Filename='potholes.csv',
    Key='potholes.csv',
    ExtraArgs={'ACL':'public-read'})

#### Accessing public objects
* S3 object URL Template
`https://{bucket}.s3.amazonaws.com/{key}`

#### Generating public object URL
* Generate Object URL String

In [None]:
url = "https://{}.s3.amazonaws.com/{}".format(
    "gid-requests",
    "2019/potholes.csv")

In [None]:
# Read the URL into Pandas
df = pd.read_csv(url)
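
Keys that contain spaces or other special characters need percent-encoding before they go into a URL. A minimal sketch of a helper that fills in the object URL template and encodes the key is shown below; `object_url` is a hypothetical name and the second key is made up for illustration.

```python
from urllib.parse import quote


def object_url(bucket, key):
    """Fill in the https://{bucket}.s3.amazonaws.com/{key} template."""
    # quote() percent-encodes characters such as spaces but keeps '/' as is.
    return "https://{}.s3.amazonaws.com/{}".format(bucket, quote(key))


url = object_url("gid-requests", "2019/potholes.csv")
spaced = object_url("gid-requests", "2019/pothole report.csv")
```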

#### How access is decided
* If it's a presigned URL, it will allow the download
* If not, it will check the policies to make sure they allow the download

#### Review
* IAM: "What can this user do in AWS?"
* Bucket Policy: "Who can access this S3 bucket?"
* ACL: "Who can access this object?"
* Presigned URL: Lets us grant temporary access to objects

### Accessing private objects in S3

#### Downloading a private file
* Download File

In [None]:
s3.download_file(
    Filename='potholes_local.csv',
    Bucket='gid-staging',
    Key='2019/potholes_private.csv')

* Read from Disk

In [None]:
pd.read_csv("./potholes_local.csv")

#### Accessing private files
* Use `.get_object()`

In [None]:
obj = s3.get_object(Bucket='gid-requests', Key='2019/potholes.csv')

# Response is tons of metadata
print(obj)

In [None]:
# Pandas knows how to handle this
pd.read_csv(obj['Body'])

#### Pre-signed URLs
* Expire after a certain timeframe
* Great for temporary access

In [None]:
# Generate Presigned URL
share_url = s3.generate_presigned_url(
    ClientMethod='get_object',
    ExpiresIn=3600,
    Params={'Bucket': 'gid-requests', 'Key': 'potholes.csv'}
)

In [None]:
# Open in Pandas
pd.read_csv(share_url)
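
`ExpiresIn` is given in seconds, which is easy to get wrong for longer lifetimes. The sketch below wraps the same `generate_presigned_url` call so the lifetime is expressed as a `timedelta`. `share_url_for` is a hypothetical helper, and `RecordingClient` is a stand-in so the example runs without AWS.

```python
from datetime import timedelta


def share_url_for(client, bucket, key, valid_for=timedelta(hours=1)):
    """Ask the client for a presigned GET URL that expires after valid_for."""
    return client.generate_presigned_url(
        ClientMethod="get_object",
        ExpiresIn=int(valid_for.total_seconds()),
        Params={"Bucket": bucket, "Key": key},
    )


# Stand-in client that records the call instead of contacting AWS.
class RecordingClient:
    def generate_presigned_url(self, ClientMethod, Params, ExpiresIn):
        self.last_call = (ClientMethod, Params, ExpiresIn)
        return "https://example.invalid/presigned"


fake = RecordingClient()
result = share_url_for(fake, "gid-requests", "potholes.csv", timedelta(minutes=30))
```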

#### Load multiple files into one DataFrame

In [None]:
# Create list to hold our DataFrames
df_list = []

# Request the list of csv's from s3 with prefix; Get contents
response = s3.list_objects(
    Bucket='gid-requests',
    Prefix='2019/')

# Get response contents
request_files = response['Contents']

# Iterate over each object
for file in request_files:
    obj = s3.get_object(Bucket='gid-requests', Key=file['Key'])
    
    # Read it as DataFrame
    obj_df = pd.read_csv(obj['Body'])
    
    # Append DataFrame to list
    df_list.append(obj_df)
    
df = pd.concat(df_list)

### Sharing files through a website

#### Serving HTML Pages

#### HTML table in Pandas
* Convert DataFrame to html

In [None]:
df.to_html('table_agg.html', 
           render_links=True,
           columns=['service_name', 'request_count', 'info_link'])

#### Borders

In [None]:
df.to_html('table_agg.html', 
           render_links=True,
           columns=['service_name', 'request_count', 'info_link'],
           border=0 # removes border. border=1 keeps border.
          )

#### Uploading an HTML file to S3

In [None]:
s3.upload_file(
    Filename='./table_agg.html',
    Bucket='datacamp-website',
    Key='table.html',
    ExtraArgs = {
        'ContentType': 'text/html',
        'ACL': 'public-read'
    })

#### Accessing HTML file
* S3 object URL Template
`https://{bucket}.s3.amazonaws.com/{key}`

#### Uploading other types of content

In [None]:
s3.upload_file(
    Filename='./plot_image.png',
    Bucket='datacamp-website',
    Key='plot_image.png',
    ExtraArgs = {
        'ContentType': 'image/png',
        'ACL': 'public-read'
    })

#### IANA Media Types
* JSON: `application/json`
* PNG: `image/png`
* PDF: `application/pdf`
* CSV: `text/csv`
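
Rather than typing the media type by hand for every upload, it can usually be guessed from the file extension. The sketch below uses the standard library's `mimetypes` module to build an `ExtraArgs` dictionary. `upload_extra_args` is a hypothetical helper, and falling back to `application/octet-stream` for unknown extensions is an assumption of this example.

```python
import mimetypes


def upload_extra_args(filename, public=False):
    """Build an ExtraArgs dict with a ContentType guessed from the extension."""
    content_type, _encoding = mimetypes.guess_type(filename)
    extra_args = {"ContentType": content_type or "application/octet-stream"}
    if public:
        extra_args["ACL"] = "public-read"
    return extra_args


html_args = upload_extra_args("table_agg.html", public=True)
png_args = upload_extra_args("plot_image.png")
unknown_args = upload_extra_args("notes.zzzqq")
```

The result can be passed straight through, for example `s3.upload_file(..., ExtraArgs=upload_extra_args('./plot_image.png', public=True))`.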

#### Generating an index page

In [None]:
# List the gid-reports bucket objects starting with 2019/
r = s3.list_objects(Bucket='gid-reports', Prefix='2019/')

# Convert the response contents to DataFrame
objects_df = pd.DataFrame(r['Contents'])

# Create a column "Link" that contains website url + key
base_url = "https://datacamp-website.s3.amazonaws.com/"
objects_df['Link'] = base_url + objects_df['Key']

# Write DataFrame to html
objects_df.to_html('report_listing.html',
                   columns=['Link', 'LastModified', 'Size'],
                   render_links=True)

#### Uploading index page

In [None]:
s3.upload_file(
    Filename='./report_listing.html',
    Bucket='datacamp-website',
    Key='index.html',
    ExtraArgs = {
        'ContentType': 'text/html',
        'ACL': 'public-read'
    })

### Case Study: Generating a Report Repository

#### The Steps

1. Prepare the data
    * Download files for the month from the raw data bucket
    * Concatenate them into one csv
    * Create an aggregated DataFrame
    
    
2. Create the report
    * Write the DataFrame to CSV and HTML
    * Generate a Bokeh plot, save as HTML
    
   
3. Upload report to sharable website
    * Create `gid-reports` bucket
    * Upload all three files for the month to S3
    * Generate an index.html file that lists all the files
    
#### Raw data bucket
* Private files
* Daily CSVs of requests from the App
* Raw data

#### Read raw data files

In [None]:
# Create list to hold our DataFrames
df_list = []

# Request the list of csv's from S3 with prefix; Get contents
response = s3.list_objects(
    Bucket='gid-requests',
    Prefix='2019_jan')

# Get response contents
request_files = response['Contents']

# Iterate over each object
for file in request_files:
    obj = s3.get_object(Bucket='gid-requests', Key=file['Key'])
    
    # Read it as DataFrame
    obj_df = pd.read_csv(obj['Body'])
    
    # Append DataFrame to list
    df_list.append(obj_df)
    
# Concatenate all the DataFrames in the list
df = pd.concat(df_list)

# Preview the DataFrame
df.head()

#### Create aggregated reports
* Perform some aggregation
* `df.to_csv('jan_final_report.csv')`
* `df.to_html('jan_final_report.html')`
* `jan_final_chart.html`

#### Report bucket
* Bucket website
* Publicly Accessible
* Aggregated data and HTML reports

#### Upload Aggregated CSV

In [None]:
# Upload Aggregated CSV to S3
s3.upload_file(Filename='./jan_final_report.csv',
               Key='2019/jan/final_report.csv',
               Bucket='gid-reports',
               ExtraArgs = {'ACL':'public-read'})

#### Upload HTML Table

In [None]:
# Upload HTML table to S3
s3.upload_file(Filename='./jan_final_report.html',
               Key='2019/jan/final_report.html',
               Bucket='gid-reports',
               ExtraArgs = {
                    'ContentType':'text/html',
                    'ACL':'public-read'})

#### Upload HTML Chart

In [None]:
# Upload Aggregated Chart to S3
s3.upload_file(Filename='./jan_final_chart.html',
               Key='2019/jan/final_chart.html',
               Bucket='gid-reports',
               ExtraArgs = {
                    'ContentType':'text/html',
                    'ACL':'public-read'})
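
The three uploads above follow one naming pattern: a local file `jan_final_report.csv` becomes the key `2019/jan/final_report.csv`. A small sketch of deriving the key from the local name, so the pattern stays consistent from month to month, could look like this. `report_key` is a hypothetical helper and assumes local names always start with `<month>_`.

```python
import os


def report_key(local_path, year):
    """Map a local name like 'jan_final_report.csv' to '<year>/jan/final_report.csv'."""
    name = os.path.basename(local_path)
    month, _, rest = name.partition("_")
    if not rest:
        raise ValueError(f"Expected '<month>_<name>', got {name!r}")
    return f"{year}/{month}/{rest}"


local_files = [
    "./jan_final_report.csv",
    "./jan_final_report.html",
    "./jan_final_chart.html",
]
keys = [report_key(path, 2019) for path in local_files]
```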

#### Create index.html

In [None]:
# List the gid-reports bucket objects starting with 2019/
r = s3.list_objects(Bucket='gid-reports', Prefix='2019/')

# Convert the response contents to DataFrame
objects_df = pd.DataFrame(r['Contents'])

# Create a column "Link" that contains website url + key
base_url = 'https://gid-reports.s3.amazonaws.com/'
objects_df['Link'] = base_url + objects_df['Key']

# Write DataFrame to html
objects_df.to_html('report_listing.html',
                  columns=['Link', 'LastModified', 'Size'],
                  render_links=True)

#### Upload index.html

In [None]:
s3.upload_file(
    Filename='./report_listing.html',
    Key='index.html',
    Bucket='gid-reports',
    ExtraArgs = {
        'ContentType': 'text/html',
        'ACL': 'public-read'
    })

## Chapter 3: Reporting and Notifying

### SNS Topics

#### SNS: Simple Notification Service
* Can send emails, push notifications, texts, etc.

#### Creating an SNS Topic 

In [None]:
sns = boto3.client('sns',
                   region_name='us-east-1',
                   aws_access_key_id=AWS_KEY_ID,
                   aws_secret_access_key=AWS_SECRET)

response = sns.create_topic(Name='city_alerts')

topic_arn = response['TopicArn']

#### Listing topics

In [None]:
response = sns.list_topics()

topics = response['Topics']

#### Deleting topics

In [None]:
sns.delete_topic(TopicArn='arn:aws:sns:us-east-1:320333787981:city_alerts')
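
A topic ARN packs the region, account ID and topic name into one colon-separated string. The sketch below splits an ARN into those parts and looks up a topic by name in a list shaped like the `Topics` entry of a `list_topics()` response. `parse_topic_arn` and `find_topic_arn` are hypothetical helpers, and the sample list reuses the ARN from the cell above.

```python
def parse_topic_arn(arn):
    """Split an SNS topic ARN into its colon-separated parts."""
    prefix, partition, service, region, account_id, name = arn.split(":", 5)
    if prefix != "arn" or service != "sns":
        raise ValueError(f"Not an SNS topic ARN: {arn!r}")
    return {
        "partition": partition,
        "region": region,
        "account_id": account_id,
        "name": name,
    }


def find_topic_arn(topics, name):
    """Return the ARN of the topic with the given name, or None."""
    for topic in topics:
        if parse_topic_arn(topic["TopicArn"])["name"] == name:
            return topic["TopicArn"]
    return None


# Sample shaped like the 'Topics' list in a list_topics() response.
sample_topics = [{"TopicArn": "arn:aws:sns:us-east-1:320333787981:city_alerts"}]
parts = parse_topic_arn(sample_topics[0]["TopicArn"])
match = find_topic_arn(sample_topics, "city_alerts")
```

Looking the ARN up by name avoids hardcoding the account ID when deleting a topic.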