
                                       AWS Comprehend Sentiment Analysis Using Python
This notebook shows how to call Amazon Comprehend through boto3 (the AWS SDK for Python), both for real-time sentiment analysis and for scheduled analysis jobs.

1. Credentials. For boto3 to work, you need to create an IAM user, obtain an aws_access_key_id and aws_secret_access_key, and configure your credentials using the AWS Command Line Interface (AWS CLI).
2. Cost. If you are on the AWS free tier, you can analyze 50K units a month for free. A unit is 100 characters. In my example, every tweet is ~2 units. In the scheduled job I analyze 10K tweets at once, so the free tier runs out quickly; after that it costs $1 per 10K units. Be sure to check pricing before you proceed: https://aws.amazon.com/comprehend/pricing/
3. Reference. Boto3 Comprehend: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/comprehend.html Boto3 S3: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
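
To make the cost note above concrete, here is a rough sketch of the unit arithmetic. The helper names (`estimate_units`, `estimate_cost`) are hypothetical, and the constants are taken from the figures quoted above, which may be out of date. The pricing page may also apply a minimum number of units per request; that is left as a `min_units` parameter, since it is an assumption to verify.

```python
import math

UNIT_CHARS = 100          # one unit is 100 characters (from the note above)
FREE_TIER_UNITS = 50_000  # free units per month (from the note above)
PRICE_PER_UNIT = 0.0001   # i.e. $1 per 10K units; check the pricing page


def estimate_units(texts, min_units=1):
    """Approximate the number of billable units for a list of texts.

    min_units is a hypothetical knob for a per-request minimum charge.
    """
    return sum(max(min_units, math.ceil(len(t) / UNIT_CHARS)) for t in texts)


def estimate_cost(texts, free_units=FREE_TIER_UNITS, min_units=1):
    """Return (total_units, estimated_cost_in_usd) after the free allowance."""
    units = estimate_units(texts, min_units=min_units)
    billable = max(0, units - free_units)
    return units, billable * PRICE_PER_UNIT


# 10K tweets of about 200 characters each is about 20K units,
# which already uses a large share of the 50K free units.
units, cost = estimate_cost(["x" * 200] * 10_000)
print(units, cost)
```

Passing `free_units=0` gives the cost once the free allowance is used up.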

In [3]:
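import os
import boto3
from botocore.exceptions import ClientError
import pandas as pd
import json
import tarfile
from dotenv import load_dotenv

# Read variables from a local .env file into the environment.
load_dotenv()

# Download the sample tweets and keep a local copy.
link_data = "https://github.com/Amul-Thantharate/Comprihend-2-With-Pandas/blob/main/wallmarts_tweets.csv?raw=true"
local_file_name = "Comprehend/wallmarts_tweets_1k.csv"
df = pd.read_csv(link_data, header=None, names=['wallmart_tweets'], dtype=str, encoding='utf-8')
# to_csv does not create missing directories, so create the folder first.
os.makedirs(os.path.dirname(local_file_name), exist_ok=True)
df.to_csv(local_file_name, index=False, header=False, encoding='utf-8')
df.head()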


,wallmart_tweets
0,Tony Hawk’s Pro Skater 1+2 (PS4) is $33.88 on ...
1,@CassieFambro we were just saying that yesterd...
2,@lxoG21 I love me some Walmart candles lol the...
3,I actually am too 🤔 need to go shopping. 24/7 ...
4,@diancalondon Bill was.....Sunday morning Khak...


                                            Real Time Single Record Processing 
                                        With this type of processing you can analyze a single piece of text of up to 5,000 bytes (UTF-8 encoded).
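
Because the limit is counted in bytes rather than characters, text with emoji or accented letters reaches it sooner than `len(text)` suggests. The sketch below shows one way to check and, if needed, trim a string before sending it. The helper names (`utf8_size`, `truncate_utf8`) are hypothetical, and cutting the text off is only one possible policy; splitting it into several requests is another.

```python
MAX_BYTES = 5000  # per-document limit quoted above


def utf8_size(text):
    """Size of the text in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_utf8(text, max_bytes=MAX_BYTES):
    """Trim text so that its UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a multi-byte character that was cut in half.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


sample = "need to go shopping 🤔"
print(len(sample), utf8_size(sample))  # character count vs. byte count
safe_text = truncate_utf8(sample)
```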

In [7]:
text = df.loc[4].item()
print(text)
# Create the Comprehend client once; the cells below reuse it.
comprehend = boto3.client(service_name='comprehend', region_name='us-east-1')
sentiment_output = comprehend.detect_sentiment(Text=text, LanguageCode='en')
# sentiment_output['SentimentScore'] holds the per-class confidence scores.
sentiment_output['Sentiment']

@diancalondon Bill was.....Sunday morning Khaki Walmart fly....in his own way...the heart wants what it wants. Yeah. Maybe the pickens are slim midwest? The only one I understood and felt bad for was Barb, because she felt she owed Bill for being there. https://t.co/BOCvIDvAmc


'NEUTRAL'

                                                                    Real-Time Batch Processing 
*** Up to 25 documents of up to 5,000 bytes each, submitted in a list. For larger jobs, use the Async Batch API. ***
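
If you have more than 25 documents but still want synchronous results, one option is to slice the list into chunks of 25 and call the batch API once per chunk. The sketch below is an illustration under assumptions: `chunked` and `detect_sentiment_in_batches` are hypothetical helper names, `client` is assumed to be a boto3 Comprehend client, and no retry or error handling is included. Every call is billed as usual.

```python
def chunked(items, size=25):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def detect_sentiment_in_batches(client, texts, language_code="en", size=25):
    """Return {position in texts: sentiment label}, one API call per chunk."""
    labels = {}
    offsets = range(0, len(texts), size)
    for offset, chunk in zip(offsets, chunked(texts, size)):
        response = client.batch_detect_sentiment(
            TextList=chunk, LanguageCode=language_code
        )
        for item in response["ResultList"]:
            # 'Index' is relative to the chunk, so shift it by the offset.
            labels[offset + item["Index"]] = item["Sentiment"]
    return labels


# Chunk sizes for 60 documents: two full batches and one partial batch.
print([len(c) for c in chunked(list(range(60)))])
```

With the `comprehend` client used in this notebook, the call would look like `detect_sentiment_in_batches(comprehend, list(df.wallmart_tweets))`.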

                                                                    

In [16]:
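# batch_detect_sentiment accepts at most 25 documents per call.
text_list = list(df.wallmart_tweets[0:25])
# print(text_list)
sentiment_batch = comprehend.batch_detect_sentiment(TextList=text_list, LanguageCode='en')
# Each result carries an 'Index' that points back into text_list, e.g. text_list[10].
sentiment_batch['ResultList'][10]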

{'Index': 10,
 'Sentiment': 'NEUTRAL',
 'SentimentScore': {'Positive': 0.31565114855766296,
  'Negative': 0.15727275609970093,
  'Neutral': 0.5268921256065369,
  'Mixed': 0.0001840569166233763}}

In [20]:
def parse_sentiment_batch(data):
    """Convert a batch_detect_sentiment response into a DataFrame indexed by document position."""
    results = data['ResultList']
    scores = pd.DataFrame([item['SentimentScore'] for item in results])
    scores['Sentiment'] = [item.get('Sentiment') for item in results]
    scores['Index'] = [item.get('Index') for item in results]
    return scores.set_index('Index')

# text is still the tweet at row 4 from the single-record example.
print(text)
parse_sentiment_batch(sentiment_batch).head()

@diancalondon Bill was.....Sunday morning Khaki Walmart fly....in his own way...the heart wants what it wants. Yeah. Maybe the pickens are slim midwest? The only one I understood and felt bad for was Barb, because she felt she owed Bill for being there. https://t.co/BOCvIDvAmc


Index,Positive,Negative,Neutral,Mixed,Sentiment
0,0.000829,8.5e-05,0.999075,1.1e-05,NEUTRAL
1,0.033051,0.472191,0.494564,0.000194,NEUTRAL
2,0.994689,0.00011,0.005175,2.7e-05,POSITIVE
3,0.079764,0.101556,0.812737,0.005943,NEUTRAL
4,0.074933,0.331529,0.593397,0.00014,NEUTRAL
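
`parse_sentiment_batch` reads only `ResultList`. A batch response also contains an `ErrorList` for documents that could not be processed, and each entry there has its own `Index`. The sketch below shows one way to line both lists up with the input texts so that failed documents are not silently dropped. `align_batch_results` is a hypothetical helper, and the sample response is made up for illustration; it is not real service output.

```python
def align_batch_results(texts, response):
    """Return one row per input text with its sentiment or its error code."""
    rows = [{"text": t, "sentiment": None, "error": None} for t in texts]
    for item in response.get("ResultList", []):
        rows[item["Index"]]["sentiment"] = item["Sentiment"]
    for err in response.get("ErrorList", []):
        rows[err["Index"]]["error"] = err.get("ErrorCode")
    return rows


# Made-up response shaped like the batch output shown earlier.
sample_texts = ["great candles", "", "store was closed"]
sample_response = {
    "ResultList": [
        {"Index": 0, "Sentiment": "POSITIVE"},
        {"Index": 2, "Sentiment": "NEGATIVE"},
    ],
    "ErrorList": [
        {"Index": 1, "ErrorCode": "SampleErrorCode", "ErrorMessage": "sample"},
    ],
}

for row in align_batch_results(sample_texts, sample_response):
    print(row)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(...)` if a table is more convenient.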
