### AWS Boto3

 - docs: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/index.html

#### Boto3

 - A library to interact with AWS services
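 - To create a client, pass the service name (e.g. s3); the region name and AWS credentials must also be available, either passed explicitly as in the cells below or picked up from environment variables or the AWS config files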
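
The cells in these notes pass `AWS_KEY_ID` and `AWS_SECRET` explicitly. One way to keep those values out of the notebook is to read them from environment variables. The sketch below is only an illustration: the helper name `load_aws_settings` is hypothetical, and it assumes the credentials were exported under the standard `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION` names.

```python
import os


def load_aws_settings(env=None):
    """Build keyword arguments for boto3.client() from environment variables."""
    env = os.environ if env is None else env
    return {
        # Fall back to the region used throughout these notes
        'region_name': env.get('AWS_DEFAULT_REGION', 'us-east-1'),
        # A KeyError here means the credentials were never exported
        'aws_access_key_id': env['AWS_ACCESS_KEY_ID'],
        'aws_secret_access_key': env['AWS_SECRET_ACCESS_KEY'],
    }


# Intended use (needs boto3 and exported credentials):
# s3 = boto3.client('s3', **load_aws_settings())
```

The mapping is passed in as a parameter so the helper can be tried with a plain dictionary before touching real credentials.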

#### IAM
 - To create a key and secret for boto3, use the Identity and Access Management (IAM) service
 - Create IAM (sub-)users to control access to services in the AWS account
 - An access key ID and secret access key pair are the credentials that authenticate an IAM user
 - In the AWS console, search for IAM to open the service, then go to Users -> Add User
 - Enter a "user name" and select "programmatic access"
 - Select "attach existing policies directly" and search for the proper policies to add
 - Create the user and save their credentials
 
#### Create an s3 client and list its buckets

In [None]:
import boto3

# Generate the boto3 client for interacting with S3
# AWS_KEY_ID and AWS_SECRET hold the IAM user's credentials
s3 = boto3.client('s3', region_name='us-east-1',
                  aws_access_key_id=AWS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET)
# List the buckets
buckets = s3.list_buckets()

# Print the buckets
print(buckets)

#### Create an sns client and list its topics

In [None]:
# Generate the boto3 client for interacting with SNS
sns = boto3.client('sns', region_name='us-east-1', 
                         aws_access_key_id=AWS_KEY_ID, 
                         aws_secret_access_key=AWS_SECRET)

# List SNS topics
topics = sns.list_topics()

# Print out the list of SNS topics
print(topics)

#### Boto operations on s3

 - Instantiate client connection to s3
 - Create three new buckets
 - List buckets
 - Print response
 - Delete one bucket
 - List and print again

In [None]:
import boto3

# Create boto3 client to S3
s3 = boto3.client('s3', region_name='us-east-1', 
                         aws_access_key_id=AWS_KEY_ID, 
                         aws_secret_access_key=AWS_SECRET)

# Create the buckets
response_staging = s3.create_bucket(Bucket='gim-staging')
response_processed = s3.create_bucket(Bucket='gim-processed')
response_test = s3.create_bucket(Bucket='gim-test')

# Print out the response
print(response_staging)

# Get the list_buckets response
response = s3.list_buckets()

# Iterate over Buckets from .list_buckets() response
for bucket in response['Buckets']:
  
    # Print the Name for each bucket
    print(bucket['Name'])

In [None]:
# Delete the gim-test bucket
s3.delete_bucket(Bucket='gim-test')

# Get the list_buckets response
response = s3.list_buckets()

# Print each bucket's name
for bucket in response['Buckets']:
    print(bucket['Name'])

#### Delete multiple s3 buckets

In [None]:
# Get the list_buckets response
response = s3.list_buckets()

# Delete all the buckets with 'gim', create replacements.
for bucket in response['Buckets']:
    if 'gim' in bucket['Name']:
        s3.delete_bucket(Bucket=bucket['Name'])
    
s3.create_bucket(Bucket='gid-staging')
s3.create_bucket(Bucket='gid-processed')
  
# Print bucket listing after deletion
response = s3.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])

#### S3 buckets vs objects

 - Similar relationship as folders and files in a file management system
![image](assets/boto3/bucket_vs_object.png)

#### Uploading objects
 - Use the **upload_file()** method
 - the Filename parameter is the local file path
 - the Bucket parameter is the name of the bucket we're uploading to
 - the Key parameter is the name the object gets in s3
 - the method returns nothing; if there is an error it raises an exception

#### Listing objects
 - the **list_objects()** method returns at most 1000 objects per call, even if the bucket holds more
 - use the Prefix argument to limit the response to objects whose key starts with the prefix
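
Because a single listing call stops at 1000 objects, larger buckets need pagination. The sketch below assumes an S3 client like the `s3` object created earlier and relies on the client's `list_objects_v2` paginator. The helper name `iter_object_keys` is hypothetical.

```python
def iter_object_keys(s3_client, bucket, prefix=''):
    """Yield every object key under a prefix, following pagination."""
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is missing from a page when nothing matches
        for obj in page.get('Contents', []):
            yield obj['Key']


# Intended use with the s3 client created earlier:
# for key in iter_object_keys(s3, 'gid-staging', prefix='2019/'):
#     print(key)
```

Writing it as a generator means keys are produced page by page instead of being collected into one large list first.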

#### Viewing a single object
 - use the **head_object()** method passing the bucket's name and object's key

#### Downloading a file
 - use the **download_file()** method to download a file specifying the bucket and key of the object you want to download

#### Delete an object
 - use the **delete_object()** method to delete the object in the bucket passing the bucket and key

In [None]:
# Upload final_report.csv to gid-staging
s3.upload_file(Bucket='gid-staging',
              # Set filename and key
               Filename='final_report.csv', 
               Key='2019/final_report_01_01.csv')

# Get object metadata and print it
response = s3.head_object(Bucket='gid-staging', 
                       Key='2019/final_report_01_01.csv')

# Print the size of the uploaded object
print(response['ContentLength'])

In [None]:
# List only objects that start with '2018/final_'
response = s3.list_objects(Bucket='gid-staging', 
                           Prefix='2018/final_')

# Iterate over the objects
if 'Contents' in response:
    for obj in response['Contents']:
        # Delete the object
        s3.delete_object(Bucket='gid-staging', Key=obj['Key'])

# Print the remaining objects in the bucket
response = s3.list_objects(Bucket='gid-staging')

# 'Contents' is absent from the response when no objects remain
for obj in response.get('Contents', []):
    print(obj['Key'])

#### AWS permissions system
 - 1. IAM policies attached to a user grant access permissions and can apply to any AWS service
 - 2. Bucket policies control access to a bucket and the objects within it
 - 3. ACLs (Access Control Lists) set permissions on specific objects within a bucket
 - 4. Presigned URLs provide temporary access to an object

IAM and bucket policies are used in multi-user environments
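
As an illustration of what a bucket policy looks like, the sketch below builds a policy document that allows anonymous reads of objects under one key prefix. The helper name `public_read_policy` and the `Sid` value are hypothetical. Applying the policy would go through the client's `put_bucket_policy` method, and account or bucket level Block Public Access settings may reject it.

```python
import json


def public_read_policy(bucket, prefix=''):
    """Return a bucket policy (JSON string) allowing anonymous reads under a prefix."""
    policy = {
        'Version': '2012-10-17',
        'Statement': [{
            'Sid': 'PublicReadForPrefix',
            'Effect': 'Allow',
            'Principal': '*',
            'Action': 's3:GetObject',
            # Matches every object whose key starts with the prefix
            'Resource': 'arn:aws:s3:::{}/{}*'.format(bucket, prefix),
        }],
    }
    return json.dumps(policy)


# Intended use with an s3 client:
# s3.put_bucket_policy(Bucket='gid-reports',
#                      Policy=public_read_policy('gid-reports', '2019/'))
```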

#### ACLs
 - Are entities attached to objects in s3 (buckets can carry ACLs too)
 - Can be private or public-read
 - Private by default
 - Set an object to "public-read" with **s3.put_object_acl(Bucket='bucket-name',Key='file.type',ACL='public-read')**
 - Or on upload with **s3.upload_file(Bucket='bucket-name', Filename='file.type', Key='file.type', ExtraArgs={'ACL':'public-read'})**

#### Accessing public object
 - Use the following URL format: https://{bucket}.s3.amazonaws.com/{key}
 - For instance, "https://{}.s3.amazonaws.com/{}".format("gid-requests", "2019/potholes.csv")
 - A public URL can be read directly with pandas: df = pd.read_csv(url)
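
Plain string formatting works for simple keys, but a key containing spaces or other special characters needs percent-encoding before it is a valid URL. A minimal sketch, with the hypothetical helper name `public_object_url` and a made-up second key:

```python
from urllib.parse import quote


def public_object_url(bucket, key):
    """Build the public URL for an object, percent-encoding the key."""
    # quote() leaves '/' alone by default, so key "folders" stay readable
    return 'https://{}.s3.amazonaws.com/{}'.format(bucket, quote(key))


print(public_object_url('gid-requests', '2019/potholes.csv'))
# https://gid-requests.s3.amazonaws.com/2019/potholes.csv
print(public_object_url('gid-requests', '2019/pothole report.csv'))
# https://gid-requests.s3.amazonaws.com/2019/pothole%20report.csv
```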

#### How access is decided
![image](assets/boto3/how_access_1.png)
![image](assets/boto3/how_access_2.png)

In [None]:
# Upload the final_report.csv to gid-staging bucket
s3.upload_file(
  # Complete the filename
  Filename='./final_report.csv', 
  # Set the key and bucket
  Key='2019/final_report_2019_02_20.csv', 
  Bucket='gid-staging',
  # During upload, set ACL to public-read
  ExtraArgs = {
    'ACL': 'public-read'})

In [None]:
# List only objects that start with '2019/final_'
response = s3.list_objects(Bucket='gid-staging', Prefix='2019/final_')

# Iterate over the objects
for obj in response['Contents']:
  
    # Give each object ACL of public-read
    s3.put_object_acl(Bucket='gid-staging', 
                      Key=obj['Key'], 
                      ACL='public-read')
    
    # Print the Public Object URL for each object
    print("https://{}.s3.amazonaws.com/{}".format('gid-staging', obj['Key']))

#### Reading private files

To read private s3 files
 - either use boto3 client download_file() method to download the file and read from disk with pandas or other method
 - or use the get_object() method whose obj['Body'] should then be passed to pandas
 - or use pre-signed urls that are active for a pre-determined amount of time
   - s3.generate_presigned_url(  ClientMethod='get_object',  ExpiresIn=3600,  Params={'Bucket': 'gid-requests','Key': 'potholes.csv'})
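
The `Body` returned by `get_object()` is a file-like stream, which is why it can be handed to a reader directly. The sketch below shows the same idea with only the standard library, using an in-memory `io.BytesIO` as a stand-in for the S3 body. The helper name `rows_from_body` and the sample data are hypothetical.

```python
import csv
import io


def rows_from_body(body, encoding='utf-8'):
    """Parse CSV bytes from any object with a .read() method into a list of dicts."""
    text = body.read().decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))


# Stand-in for obj['Body']; with S3 this would be
# s3.get_object(Bucket='gid-requests', Key='potholes.csv')['Body']
fake_body = io.BytesIO(b'service_name,count\npothole,3\n')
print(rows_from_body(fake_body))
```

Note that the stream can only be read once, so the parsed rows should be kept if they are needed again.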

In [None]:
# Generate presigned_url for the uploaded object
share_url = s3.generate_presigned_url(
  # Specify allowable operations
  ClientMethod='get_object',
  # Set the expiration time
  ExpiresIn=3600,
  # Set bucket and shareable object's name
  Params={'Bucket': 'gid-staging','Key': 'final_report.csv'}
)

# Print out the presigned URL
print(share_url)

In [None]:
import pandas as pd

df_list = []

for file in response['Contents']:
    # For each file in response load the object from S3
    obj = s3.get_object(Bucket='gid-requests', Key=file['Key'])
    # Load the object's StreamingBody with pandas
    obj_df = pd.read_csv(obj['Body'])
    # Append the resulting DataFrame to list
    df_list.append(obj_df)

# Concat all the DataFrames with pandas
df = pd.concat(df_list)

# Preview the resulting DataFrame
df.head()

#### Pandas to html and s3 upload

Some useful media types
 - application/json
 - image/png
 - application/pdf
 - text/csv
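
Rather than typing the media type by hand, it can usually be guessed from the file extension with the standard library's `mimetypes` module. The sketch below builds an `ExtraArgs` dictionary for `upload_file()`. The helper name `upload_extra_args` and the fallback type are assumptions for illustration.

```python
import mimetypes


def upload_extra_args(filename, public=False):
    """Guess ExtraArgs for upload_file() from the file extension."""
    content_type, _ = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown
    extra_args = {'ContentType': content_type or 'application/octet-stream'}
    if public:
        extra_args['ACL'] = 'public-read'
    return extra_args


print(upload_extra_args('report_listing.html', public=True))
```

The result can be passed straight through, for example `s3.upload_file(..., ExtraArgs=upload_extra_args('report_listing.html', public=True))`.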

In [None]:
# Generate an HTML table with no border and selected columns
services_df.to_html('./services_no_border.html',
           # Keep specific columns only
           columns=['service_name', 'link'],
           # Set border
           border=0)

# Generate an html table with border and all columns.
services_df.to_html('./services_border_all_columns.html', 
           border=1)

In [None]:
# Upload the lines.html file to S3
s3.upload_file(Filename='lines.html', 
               # Set the bucket name
               Bucket='datacamp-public', Key='index.html',
               # Configure uploaded file
               ExtraArgs = {
                 # Set proper content type
                 'ContentType':'text/html',
                 # Set proper ACL
                 'ACL': 'public-read'})

# Print the S3 Public Object URL for the new file.
# https://datacamp-public.s3.amazonaws.com/index.html
print("https://{}.s3.amazonaws.com/{}".format('datacamp-public', 'index.html'))

#### Mini project to read data and upload as html to s3

In [None]:
# Combine daily requests for February

df_list = []

# Load each object from s3
for file in request_files:
    s3_day_reqs = s3.get_object(Bucket='gid-requests', 
                                Key=file['Key'])
    # Read the DataFrame into pandas, append it to the list
    day_reqs = pd.read_csv(s3_day_reqs['Body'])
    df_list.append(day_reqs)

# Concatenate all the DataFrames in the list
all_reqs = pd.concat(df_list)

# Preview the DataFrame
all_reqs.head()

In [None]:
# Upload aggregated reports for February

# Write agg_df to a CSV and HTML file with no border
agg_df.to_csv('./feb_final_report.csv')
agg_df.to_html('./feb_final_report.html', border=0)

# Upload the generated CSV to the gid-reports bucket
s3.upload_file(Filename='./feb_final_report.csv', 
    Key='2019/feb/final_report.csv', Bucket='gid-reports',
    ExtraArgs = {'ACL': 'public-read'})

# Upload the generated HTML to the gid-reports bucket
s3.upload_file(Filename='./feb_final_report.html', 
    Key='2019/feb/final_report.html', Bucket='gid-reports',
    ExtraArgs = {'ContentType': 'text/html', 
                 'ACL': 'public-read'})

In [None]:
# Update index to include February

# List the gid-reports bucket objects starting with 2019/
objects_list = s3.list_objects(Bucket='gid-reports', Prefix='2019/')

# Convert the response contents to DataFrame
objects_df = pd.DataFrame(objects_list['Contents'])

# Create a column "Link" that contains Public Object URL
base_url = "http://gid-reports.s3.amazonaws.com/"
objects_df['Link'] = base_url + objects_df['Key']

# Preview the resulting DataFrame
objects_df.head()

In [None]:
# Upload the new index

# Write objects_df to an HTML file
objects_df.to_html('report_listing.html',
    # Set clickable links
    render_links=True,
    # Isolate the columns
    columns=['Link', 'LastModified', 'Size'])

# Overwrite index.html key by uploading the new file
s3.upload_file(
  Filename='./report_listing.html', Key='index.html', 
  Bucket='gid-reports',
  ExtraArgs = {
    'ContentType': 'text/html', 
    'ACL': 'public-read'
  })

![image](assets/boto3/result.png)

#### Simple notification service - SNS

![image](assets/boto3/sns.png)

 - Send messages, emails, push notifications
 - Publishers post messages to topics
 - Subscribers receive them
 - Every topic has an ARN (Amazon Resource Name), that is a unique id for this topic
 - Under each topic there are subscriptions with unique ids
 - If you create a topic that already exists it doesn't throw an error but returns the ARN of the existing topic
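
An ARN is a colon-separated string of the form `arn:partition:service:region:account-id:resource`, so its parts can be pulled out with a split. A minimal sketch, with the hypothetical helper name `parse_arn` and the example ARN used later in these notes:

```python
def parse_arn(arn):
    """Split an ARN into its named parts."""
    # Format: arn:partition:service:region:account-id:resource
    # maxsplit=5 keeps any ':' inside the resource part intact
    prefix, partition, service, region, account_id, resource = arn.split(':', 5)
    if prefix != 'arn':
        raise ValueError('not an ARN: {}'.format(arn))
    return {
        'partition': partition,
        'service': service,
        'region': region,
        'account_id': account_id,
        'resource': resource,
    }


parts = parse_arn('arn:aws:sns:us-east-1:123456789012:streets_critical')
print(parts['resource'])  # streets_critical
```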

In [None]:
# Initialize boto3 client for SNS
sns = boto3.client('sns', 
                   region_name='us-east-1', 
                   aws_access_key_id=AWS_KEY_ID, 
                   aws_secret_access_key=AWS_SECRET)

# Create the city_alerts topic
response = sns.create_topic(Name="city_alerts")
c_alerts_arn = response['TopicArn']

# Re-create the city_alerts topic using a oneliner
c_alerts_arn_1 = sns.create_topic(Name='city_alerts')['TopicArn']

# Compare the two to make sure they match
print(c_alerts_arn == c_alerts_arn_1)

In [None]:
# Create list of departments
departments = ['trash', 'streets', 'water']

for dept in departments:
    # For every department, create a general topic
    sns.create_topic(Name="{}_general".format(dept))
    
    # For every department, create a critical topic
    sns.create_topic(Name="{}_critical".format(dept))

# Print all the topics in SNS
response = sns.list_topics()
print(response['Topics'])

#### SNS subscriptions
 - Each subscription has a unique id, a protocol and a status
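
The status matters in practice: an email subscription stays unconfirmed until the recipient clicks the confirmation link. The sketch below separates confirmed from unconfirmed entries, assuming that an unconfirmed subscription carries a placeholder such as `PendingConfirmation` in `SubscriptionArn` instead of a real ARN. The helper name and the sample entries are hypothetical.

```python
def split_by_confirmation(subscriptions):
    """Separate subscriptions with a real ARN from those still awaiting confirmation."""
    confirmed, pending = [], []
    for sub in subscriptions:
        # Assumption: only confirmed subscriptions have a value starting with 'arn:'
        if sub['SubscriptionArn'].startswith('arn:'):
            confirmed.append(sub)
        else:
            pending.append(sub)
    return confirmed, pending


# Hypothetical entries shaped like response['Subscriptions']
sample = [
    {'SubscriptionArn': 'arn:aws:sns:us-east-1:123456789012:streets_critical:abc',
     'Protocol': 'sms', 'Endpoint': '+15555550100'},
    {'SubscriptionArn': 'PendingConfirmation',
     'Protocol': 'email', 'Endpoint': 'someone@example.com'},
]
confirmed, pending = split_by_confirmation(sample)
print(len(confirmed), len(pending))  # 1 1
```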

In [None]:
# Subscribe Elena's phone number to streets_critical topic
# str_critical_arn 'arn:aws:sns:us-east-1:123456789012:streets_critical'
resp_sms = sns.subscribe(
  TopicArn = str_critical_arn, 
  Protocol='sms', Endpoint="+16196777733")

# Print the SubscriptionArn
print(resp_sms['SubscriptionArn'])

In [None]:
# Subscribe Elena's email to streets_critical topic.
resp_email = sns.subscribe(
  TopicArn = str_critical_arn, 
  Protocol='email', Endpoint="eblock@sandiegocity.gov")

# Print the SubscriptionArn
print(resp_email['SubscriptionArn'])

In [None]:
# For each email in contacts, create subscription to streets_critical
for email in contacts['Email']:
    sns.subscribe(TopicArn = str_critical_arn,
                # Set channel and recipient
                Protocol = 'email',
                Endpoint = email)

# List subscriptions for streets_critical topic, convert to DataFrame
response = sns.list_subscriptions_by_topic(
  TopicArn = str_critical_arn)
subs = pd.DataFrame(response['Subscriptions'])

# Preview the DataFrame
subs.head()

In [None]:
# List subscriptions for streets_critical topic.
response = sns.list_subscriptions_by_topic(
  TopicArn = str_critical_arn)

# For each subscription, if the protocol is SMS, unsubscribe
for sub in response['Subscriptions']:
    if sub['Protocol'] == 'sms':
        sns.unsubscribe(SubscriptionArn=sub['SubscriptionArn'])

# List subscriptions for streets_critical topic in one line
subs = sns.list_subscriptions_by_topic(
  TopicArn=str_critical_arn)['Subscriptions']

# Print the subscriptions
print(subs)

#### Publishing messages

![image](assets/boto3/publishing.png)

In [None]:
# If there are over 100 potholes, create a message
if streets_v_count > 100:
  # The message should contain the number of potholes.
  message = "There are {} potholes!".format(streets_v_count)
  # The email subject should also contain number of potholes
  subject = "Latest pothole count is {}".format(streets_v_count)

  # Publish the email to the streets_critical topic
  sns.publish(
    TopicArn = str_critical_arn,
    # Set subject and message
    Message = message,
    Subject = subject
  )

In [None]:
# Loop through every row in contacts
for idx, row in contacts.iterrows():
    
    # Publish an ad-hoc sms to the user's phone number
    response = sns.publish(
        # Set the phone number
        PhoneNumber = str(row['Phone']),
        # The message should include the user's name
        Message = 'Hello {}'.format(row['Name'])
    )
   
    print(response)

![image](assets/boto3/sns_use_case.png)

#### Creating multi-level topics

In [None]:
dept_arns = {} 

for dept in departments:
    # For each department, create a critical topic
    critical = sns.create_topic(Name="{}_critical".format(dept))
    # For each department, create an extreme topic
    extreme = sns.create_topic(Name="{}_extreme".format(dept))
    # Place the created TopicARNs into a dictionary 
    dept_arns['{}_critical'.format(dept)] = critical['TopicArn']
    dept_arns['{}_extreme'.format(dept)] = extreme['TopicArn']

# Print the filled dictionary.
print(dept_arns)

#### Different protocols per topic level

In [None]:
for index, user_row in contacts.iterrows():
    # Get topic names for the user's dept
    critical_tname = '{}_critical'.format(user_row['Department'])
    extreme_tname = '{}_extreme'.format(user_row['Department'])

    # Get or create the TopicArns for a user's department.
    critical_arn = sns.create_topic(Name=critical_tname)['TopicArn']
    extreme_arn = sns.create_topic(Name=extreme_tname)['TopicArn']

    # Subscribe each user's email to the critical topic
    sns.subscribe(TopicArn = critical_arn, 
                Protocol='email', Endpoint=user_row['Email'])
    # Subscribe each user's phone number to the extreme topic
    sns.subscribe(TopicArn = extreme_arn, 
                Protocol='sms', Endpoint=str(user_row['Phone']))

#### Sending multi-level alerts

In [None]:
if vcounts['water'] > 100:
  # If over 100 water violations, publish to water_critical
  sns.publish(
    TopicArn = dept_arns['water_critical'],
    Message = "{} water issues".format(vcounts['water']),
    Subject = "Help fix water violations NOW!")

if vcounts['water'] > 300:
  # If over 300 violations, publish to water_extreme
  sns.publish(
    TopicArn = dept_arns['water_extreme'],
    Message = "{} violations! RUN!".format(vcounts['water']),
    Subject = "THIS IS BAD.  WE ARE FLOODING!")

#### Rekognition - a computer vision service

 - its features include detecting text and objects in an image

In [None]:
# Initiate client
rekog = boto3.client(
  'rekognition',
  region_name='us-east-1',
  aws_access_key_id=AWS_KEY_ID,
  aws_secret_access_key=AWS_SECRET)

# Image config
image1 = {'S3Object': {'Bucket': 'datacamp-gid-images', 'Name': 'report_1010.jpg'}}
image2 = {'S3Object': {'Bucket': 'datacamp-gid-images', 'Name': 'report_1111.jpg'}}

 
# Use Rekognition client to detect labels
image1_response = rekog.detect_labels(
    # Specify the image as an S3Object; Return one label
    Image=image1, MaxLabels=1)

# Print the labels
print(image1_response['Labels'])

# Use Rekognition client to detect labels
image2_response = rekog.detect_labels(
    Image=image2, MaxLabels=1)

# Print the labels
print(image2_response['Labels'])

#### Detection of multiple objects

![image](assets/boto3/object_detection.png)

In [None]:
# Create an empty counter variable
cats_count = 0
# Iterate over the labels in the response
for label in response['Labels']:
    # Find the cat label, look over the detected instances
    if label['Name'] == 'Cat':
        for instance in label['Instances']:
            # Only count instances with confidence > 85
            if (instance['Confidence'] > 85):
                cats_count += 1
# Print count of cats
print(cats_count)

#### Text detection

![image](assets/boto3/text_detection.png)

In [None]:
# Create empty list of words
words = []
# Iterate over the TextDetections in the response dictionary
for text_detection in response['TextDetections']:
    # If TextDetection type is WORD, append it to words list
    if text_detection['Type'] == 'WORD':
        # Append the detected text
        words.append(text_detection['DetectedText'])
# Print out the words list
print(words)

In [None]:
# Create empty list of lines
lines = []
# Iterate over the TextDetections in the response dictionary
for text_detection in response['TextDetections']:
    # If TextDetection type is LINE, append it to lines list
    if text_detection['Type'] == 'LINE':
        # Append the detected text
        lines.append(text_detection['DetectedText'])
# Print out the lines list
print(lines)

#### Language detection

In [None]:
comprehend = boto3.client('comprehend',
    region_name='us-east-1',
    aws_access_key_id=AWS_KEY_ID, aws_secret_access_key=AWS_SECRET)

In [None]:
# For each dataframe row
for index, row in dumping_df.iterrows():
    # Get the public description field
    description = dumping_df.loc[index, 'public_description']
    if description != '':
        # Detect language in the field content
        resp = comprehend.detect_dominant_language(Text=description)
        # Assign the top choice language to the lang column.
        dumping_df.loc[index, 'lang'] = resp['Languages'][0]['LanguageCode']
        
# Count the total number of spanish posts
spanish_post_ct = len(dumping_df[dumping_df.lang == 'es'])
# Print the result
print("{} posts in Spanish".format(spanish_post_ct))

![image](assets/boto3/language_detection.png)

#### Text translation

In [None]:
translate = boto3.client('translate',
    region_name='us-east-1',
    aws_access_key_id=AWS_KEY_ID, aws_secret_access_key=AWS_SECRET)

In [2]:
for index, row in dumping_df.iterrows():
    
    # Get the public_description into a variable
    description = dumping_df.loc[index, 'public_description']
    
    if description != '':
        
        # Translate the public description
        resp = translate.translate_text(
            Text=description, 
            SourceLanguageCode='auto', TargetLanguageCode='en')
        
        # Store original language in original_lang column
        dumping_df.loc[index, 'original_lang'] = resp['SourceLanguageCode']
        
        # Store the translation in the translated_desc column
        dumping_df.loc[index, 'translated_desc'] = resp['TranslatedText']
        
# Preview the resulting DataFrame
dumping_df = dumping_df[['service_request_id', 'original_lang', 'translated_desc']]
dumping_df.head()

![image](assets/boto3/translate_text.png)

#### Sentiment analysis

In [None]:
for index, row in dumping_df.iterrows():
    # Get the translated_desc into a variable
    description = dumping_df.loc[index, 'translated_desc']
    if description != '':
        # Get the detect_sentiment response
        response = comprehend.detect_sentiment(
          Text=description, 
          LanguageCode='en')
        # Get the sentiment key value into sentiment column
        dumping_df.loc[index, 'sentiment'] = response['Sentiment']
# Preview the dataframe
dumping_df.head()

![image](assets/boto3/sentiment_text.png)

#### Bringing it all together

In [None]:
for index, row in scooter_requests.iterrows():
    # For every DataFrame row
    desc = scooter_requests.loc[index, 'public_description']
    if desc != '':
        # Detect the dominant language
        resp = comprehend.detect_dominant_language(Text=desc)
        lang_code = resp['Languages'][0]['LanguageCode']
        scooter_requests.loc[index, 'lang'] = lang_code
        # Use the detected language to determine sentiment
        scooter_requests.loc[index, 'sentiment'] = comprehend.detect_sentiment(
          Text=desc, 
          LanguageCode=lang_code)['Sentiment']
# Perform a count of sentiment by group.
counts = scooter_requests.groupby(['sentiment', 'lang']).count()
counts.head()

In [None]:
# Get topic ARN for scooter notifications
topic_arn = sns.create_topic(Name='scooter_notifications')['TopicArn']

for index, row in scooter_requests.iterrows():
    # Check if notification should be sent
    if row['sentiment'] == 'NEGATIVE' and row['img_scooter'] == 1:
        # Construct a message to publish to the scooter team.
        message = "Please remove scooter at {}, {}. Description: {}".format(
            row['long'], row['lat'], row['public_description'])

        # Publish the message to the topic!
        sns.publish(TopicArn = topic_arn,
                    Message = message, 
                    Subject = "Scooter Alert")