# Amazon Comprehend - Sentiment Example
### Assess sentiment of customer reviews

Objective: Use the Amazon Comprehend service to detect sentiment

Input: Customer Review headline and body  
Output: Overall sentiment and scores for Positive, Negative, Neutral, Mixed  

https://docs.aws.amazon.com/comprehend/latest/dg/how-sentiment.html  
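A DetectSentiment response carries an overall `Sentiment` label and a `SentimentScore` map with `Positive`, `Negative`, `Neutral`, and `Mixed` confidence values. The minimal sketch below flattens one response into a single record, which makes it easy to turn many responses into DataFrame rows. The helper name `flatten_sentiment` is hypothetical. The sample dictionary only mimics the documented response shape, and its numbers are made up rather than taken from a real call.

```python
from typing import Any, Dict

# Score keys in the SentimentScore map of a DetectSentiment response
SCORE_KEYS = ("Positive", "Negative", "Neutral", "Mixed")


def flatten_sentiment(response: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a DetectSentiment-style response into one flat dict (hypothetical helper)."""
    scores = response["SentimentScore"]
    record = {"sentiment": response["Sentiment"]}
    for key in SCORE_KEYS:
        # Missing keys default to 0.0 so every record has all four score columns
        record[key.lower()] = float(scores.get(key, 0.0))
    return record


# Illustrative response shape only; the scores are invented, not from a real call
sample = {
    "Sentiment": "POSITIVE",
    "SentimentScore": {"Positive": 0.97, "Negative": 0.01, "Neutral": 0.015, "Mixed": 0.005},
}
print(flatten_sentiment(sample))
```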

### Customer Reviews for Major Appliances

**The Amazon reviews public dataset is no longer accessible.**

**Please use the file included in the course Git repository: data\customer_reviews_with_sentiment.parquet.**  

**It contains nearly 97,000 customer reviews for major appliances, along with the sentiment of each review.**

In [None]:
## Install pyarrow package

!pip install pyarrow

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re
import pyarrow.parquet as pq

import boto3

### Load data

In [3]:
# Forward slashes work on Windows, macOS, and Linux
parquet_file_name = "data/customer_reviews_with_sentiment.parquet"

In [4]:
df = pd.read_parquet(parquet_file_name)

In [5]:
print('Rows: {0}, Columns: {1}'.format(df.shape[0],df.shape[1]))

Rows: 96834, Columns: 16


In [6]:
df.index.max()

96833

In [20]:
# The original review dataset did not include sentiment.
# To save time and money, I have included the review sentiment in this file!

# In this notebook, we learn how to use the Comprehend service to assess sentiment
# for a few reviews.
df.columns

Index(['marketplace', 'customer_id', 'review_id', 'product_id',
       'product_parent', 'product_title', 'product_category', 'star_rating',
       'helpful_votes', 'total_votes', 'vine', 'verified_purchase',
       'review_headline', 'review_body', 'review_date', 'sentiment'],
      dtype='object')

In [8]:
df.isna().any(axis=0)

marketplace          False
customer_id          False
review_id            False
product_id           False
product_parent       False
product_title        False
product_category     False
star_rating          False
helpful_votes        False
total_votes          False
vine                 False
verified_purchase    False
review_headline      False
review_body          False
review_date           True
sentiment            False
dtype: bool

In [9]:
# Look for any rows that have NA
rows_missing_values = df.isna().any(axis=1)

In [10]:
df[rows_missing_values]

| | marketplace | customer_id | review_id | product_id | product_parent | product_title | product_category | star_rating | helpful_votes | total_votes | vine | verified_purchase | review_headline | review_body | review_date | sentiment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 95250 | US | 14267148 | R3FCCZQ31S2Z4Q | B000IN22I2 | 99564707 | Igloo FR28WH 2.8-Cu-Ft Refrigerator, White | Major Appliances | 5 | 0 | 0 | N | N | It does what it says on the tin" This little f... | 2008-07-21 | | POSITIVE |


In [11]:
# Replace embedded newlines, tabs, and carriage returns
pattern = r'[\n\t\r]+'

In [12]:
# Use the re module's sub method to find text matching the pattern and replace it.
text = 'ab,cd\n\tef'

print('original text:', text)

print('after substitution:', re.sub(pattern,' ', text))

original text: ab,cd
	ef
after substitution: ab,cd ef


In [13]:
# Collapse embedded newlines, tabs, and carriage returns in the free-text columns
for col in ['product_title', 'review_headline', 'review_body']:
    df[col] = df[col].str.replace(pattern, ' ', regex=True)
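`re.sub` raises a `TypeError` when it receives a non-string, such as a missing value (pandas stores it as the float `NaN`). The three text columns cleaned here have no missing values, but a column that did would need a guard. A minimal sketch, where `clean_text` is a hypothetical helper name:

```python
import math
import re

# Same pattern as above: one or more newlines, tabs, or carriage returns
PATTERN = re.compile(r"[\n\t\r]+")


def clean_text(value):
    """Collapse embedded newlines/tabs/carriage returns into a single space.

    Non-string values (for example float('nan') for a missing cell) are
    returned unchanged instead of raising TypeError.
    """
    if not isinstance(value, str):
        return value
    return PATTERN.sub(" ", value)


print(clean_text("ab,cd\n\tef"))             # ab,cd ef
print(math.isnan(clean_text(float("nan"))))  # True
```

In the notebook it would be applied per column, for example `df['review_body'] = df['review_body'].map(clean_text)`.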

In [14]:
df.head()

| | marketplace | customer_id | review_id | product_id | product_parent | product_title | product_category | star_rating | helpful_votes | total_votes | vine | verified_purchase | review_headline | review_body | review_date | sentiment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | US | 16199106 | R203HPW78Z7N4K | B0067WNSZY | 633038551 | FGGF3032MW Gallery Series 30" Wide Freestandin... | Major Appliances | 5 | 0 | 0 | N | Y | If you need a new stove, this is a winner. | What a great stove. What a wonderful replacem... | 2015-08-31 | POSITIVE |
| 1 | US | 16374060 | R2EAIGVLEALSP3 | B002QSXK60 | 811766671 | Best Hand Clothes Wringer | Major Appliances | 5 | 1 | 1 | N | Y | Five Stars | worked great | 2015-08-31 | POSITIVE |
| 2 | US | 15322085 | R1K1CD73HHLILA | B00EC452R6 | 345562728 | Supco SET184 Thermal Cutoff Kit | Major Appliances | 5 | 0 | 0 | N | Y | Fast Shipping | Part exactly what I needed. Saved by purchasi... | 2015-08-31 | POSITIVE |
| 3 | US | 32004835 | R2KZBMOFRMYOPO | B00MVVIF2G | 563052763 | Midea WHS-160RB1 Compact Single Reversible Doo... | Major Appliances | 5 | 1 | 1 | N | Y | Five Stars | Love my refrigerator! ! Keeps everything cold... | 2015-08-31 | POSITIVE |
| 4 | US | 25414497 | R6BIZOZY6UD01 | B00IY7BNUW | 874236579 | Avalon Bay Portable Ice Maker | Major Appliances | 5 | 0 | 0 | N | Y | Five Stars | No more running to the store for ice! Works p... | 2015-08-31 | POSITIVE |


In [15]:
df['review_body'].head()

0    What a great stove.  What a wonderful replacem...
1                                         worked great
2    Part exactly what I needed.  Saved by purchasi...
3    Love my refrigerator! ! Keeps everything  cold...
4    No more running to the store for ice!  Works p...
Name: review_body, dtype: object

In [16]:
# Print the headline and body of the first 10 reviews
for i in range(10):
    print(df.iloc[i]['review_headline'] + ' - ' + df.iloc[i]['review_body'])
    print()

If you need a new stove, this is a winner. - What a great stove.  What a wonderful replacement for my sort of antique.  Enjoy it every day.

Five Stars - worked great

Fast Shipping - Part exactly what I needed.  Saved by purchasing myself.

Five Stars - Love my refrigerator! ! Keeps everything  cold..will recommend!

Five Stars - No more running to the store for ice!  Works perfectly.

Piece of Junk - It would not cool below 55 degrees and has now stopped working all together.  I would NOT recommend this piece of junk to anyone.

Works awesome for apt size 110 dryer - Works awesome for apt size 110 dryer. Handles load from apt size washer just fine. It does take longer to dry. Electric cost savings over a full size 220 is worth the time. Does not add much humidity unless lint filter is full.

Five Stars - exactly what I wanted!

Four Stars - AS advertised

but has poor insulation in the top - It works as advertised, but has poor insulation in the top. Like the 3rd shelf, it comes in h

### Get Sentiment of Reviews using Comprehend AI Service

**Warning: Running Comprehend's sentiment detection on 100,000 reviews cost USD 65.**

**So, in these labs, we use Comprehend to assess sentiment for only 15 reviews.**
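Scaling the USD 65 figure linearly gives about USD 0.00065 per review, so 15 reviews cost roughly one cent. Amazon Comprehend's pricing page measures standard NLP requests in units of 100 characters, with a 3-unit (300-character) minimum per request. The sketch below makes both estimates. The unit rule is an assumption taken from that pricing page, and no per-unit price is hard-coded because it can change. `estimated_cost` and `billing_units` are hypothetical helpers.

```python
import math

# Figure quoted in this notebook: USD 65 for 100,000 reviews
OBSERVED_COST_USD = 65.0
OBSERVED_REVIEWS = 100_000


def estimated_cost(num_reviews: int) -> float:
    """Scale the observed cost linearly to a different number of reviews."""
    return OBSERVED_COST_USD / OBSERVED_REVIEWS * num_reviews


def billing_units(text: str, chars_per_unit: int = 100, min_units: int = 3) -> int:
    """Billing units for one document, assuming 100-character units and a 3-unit minimum."""
    return max(min_units, math.ceil(len(text) / chars_per_unit))


print(round(estimated_cost(15), 5))                # 0.00975
print(billing_units("Five Stars - worked great"))  # 3 (minimum applies)
print(billing_units("x" * 450))                    # 5
```

Short reviews like "Five Stars - worked great" are still billed at the minimum, so the per-review cost depends more on request count than on text length for this dataset.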

In [17]:
session = boto3.Session(region_name='us-east-1')

In [18]:
client = session.client('comprehend')

In [None]:
# Try some examples
sentiment = client.detect_sentiment(
    Text="It's insulting that @awscloud marked an EBS volume limit increase support request as low severity but I can't do anything while I wait.",
    LanguageCode='en'
)

In [None]:
sentiment['Sentiment'],sentiment['SentimentScore']
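`SentimentScore` holds a confidence value for every label, so ranking the scores shows how close the runner-up is. A small gap between the top two often points to an ambiguous or mixed review. A minimal sketch, where `ranked_scores` and `margin` are hypothetical helpers and the score dictionary is illustrative. In the notebook you would pass `sentiment['SentimentScore']` instead.

```python
from typing import Dict, List, Tuple


def ranked_scores(sentiment_score: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (label, score) pairs sorted from most to least confident."""
    return sorted(sentiment_score.items(), key=lambda kv: kv[1], reverse=True)


def margin(sentiment_score: Dict[str, float]) -> float:
    """Gap between the top two scores; a small gap suggests an ambiguous text."""
    ranked = ranked_scores(sentiment_score)
    return ranked[0][1] - ranked[1][1]


# Hypothetical scores for illustration only
scores = {"Positive": 0.05, "Negative": 0.80, "Neutral": 0.10, "Mixed": 0.05}
print(ranked_scores(scores)[0])  # ('Negative', 0.8)
print(round(margin(scores), 2))  # 0.7
```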

In [26]:
%%time
# Sentiment of reviews:
# one round trip to the Comprehend service per review
for i in range(15):
    review = df.iloc[i]['review_headline'] + ' - ' + df.iloc[i]['review_body']
    print(review)
    sentiment = client.detect_sentiment(Text=review,LanguageCode='en')
    print(sentiment['Sentiment'])
    print()

If you need a new stove, this is a winner. - What a great stove.  What a wonderful replacement for my sort of antique.  Enjoy it every day.
POSITIVE

Five Stars - worked great
POSITIVE

Fast Shipping - Part exactly what I needed.  Saved by purchasing myself.
POSITIVE

Five Stars - Love my refrigerator! ! Keeps everything  cold..will recommend!
POSITIVE

Five Stars - No more running to the store for ice!  Works perfectly.
POSITIVE

Piece of Junk - It would not cool below 55 degrees and has now stopped working all together.  I would NOT recommend this piece of junk to anyone.
NEGATIVE

Works awesome for apt size 110 dryer - Works awesome for apt size 110 dryer. Handles load from apt size washer just fine. It does take longer to dry. Electric cost savings over a full size 220 is worth the time. Does not add much humidity unless lint filter is full.
POSITIVE

Five Stars - exactly what I wanted!
POSITIVE

Four Stars - AS advertised
POSITIVE

but has poor insulation in the top - It works as 
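Comprehend limits each document's size in UTF-8 bytes, not characters (5,000 bytes per document for sentiment detection at the time of writing; confirm against the current Comprehend quotas page). The batch cell below slices the body to 4,000 characters. That keeps mostly-ASCII text under the limit, but characters such as "é" or emoji take more than one byte each. A byte-aware truncation sketch follows. `truncate_utf8` is hypothetical, and the 5,000-byte default is an assumption to verify.

```python
MAX_BYTES = 5000  # assumed per-document limit; verify against current Comprehend quotas


def truncate_utf8(text: str, max_bytes: int = MAX_BYTES) -> str:
    """Cut text to at most max_bytes UTF-8 bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a multi-byte character that the byte slice cut in half
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


sample_review = "Five Stars - " + "é" * 3000  # 13 + 6000 = 6013 bytes
short = truncate_utf8(sample_review)
print(len(short.encode("utf-8")))  # 4999
```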

In [27]:
%%time

# Batch processing: up to 25 reviews per round trip
results = []

# Get sentiment for the first 15 reviews (body truncated to 4,000 characters)
review = list((df.iloc[0:15]['review_headline'] + ' - ' + df.iloc[0:15]['review_body'].str.slice(0,4000)).values)

# Initialize a placeholder for the returned values
temp_results = ['']*len(review)

sentiment = client.batch_detect_sentiment(TextList=review,LanguageCode='en')

# Store each sentiment at the position given by its Index
for s in sentiment['ResultList']:
    temp_results[s['Index']] = s['Sentiment']

# Record the error code for any document that failed
for s in sentiment['ErrorList']:
    temp_results[s['Index']] = s['ErrorCode']

results.extend(temp_results)

CPU times: user 24 ms, sys: 0 ns, total: 24 ms
Wall time: 222 ms
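The cell above sends a single batch, which works because 15 reviews fit within BatchDetectSentiment's limit of 25 documents per call. For more reviews, the list has to be split into batches of at most 25. The `Index` in each result is relative to its own batch, so every label must be written back at the right position. A minimal sketch follows. `chunks` and `batch_sentiment` are hypothetical helpers, and the function does no retrying or throttling.

```python
from typing import Any, Iterator, List, Sequence

BATCH_LIMIT = 25  # BatchDetectSentiment accepts at most 25 documents per call


def chunks(items: Sequence[str], size: int = BATCH_LIMIT) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_sentiment(client: Any, texts: Sequence[str], language_code: str = "en") -> List[str]:
    """Return one label (or error code) per text, in the original order."""
    results: List[str] = []
    for batch in chunks(texts):
        response = client.batch_detect_sentiment(TextList=list(batch), LanguageCode=language_code)
        labels = [""] * len(batch)
        # Index in ResultList/ErrorList is relative to the current batch
        for item in response["ResultList"]:
            labels[item["Index"]] = item["Sentiment"]
        for item in response["ErrorList"]:
            labels[item["Index"]] = item["ErrorCode"]
        results.extend(labels)
    return results
```

With the `client` created earlier, `batch_sentiment(client, review)` would reproduce the single-batch result above, and the same call scales to longer lists at one round trip per 25 reviews.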


In [28]:
for idx, r in enumerate(review):
    print(r)
    print(results[idx])

If you need a new stove, this is a winner. - What a great stove.  What a wonderful replacement for my sort of antique.  Enjoy it every day.
POSITIVE
Five Stars - worked great
POSITIVE
Fast Shipping - Part exactly what I needed.  Saved by purchasing myself.
POSITIVE
Five Stars - Love my refrigerator! ! Keeps everything  cold..will recommend!
POSITIVE
Five Stars - No more running to the store for ice!  Works perfectly.
POSITIVE
Piece of Junk - It would not cool below 55 degrees and has now stopped working all together.  I would NOT recommend this piece of junk to anyone.
NEGATIVE
Works awesome for apt size 110 dryer - Works awesome for apt size 110 dryer. Handles load from apt size washer just fine. It does take longer to dry. Electric cost savings over a full size 220 is worth the time. Does not add much humidity unless lint filter is full.
POSITIVE
Five Stars - exactly what I wanted!
POSITIVE
Four Stars - AS advertised
POSITIVE
but has poor insulation in the top - It works as advertise
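Because the dataset already includes a `sentiment` column, the Comprehend labels can be checked against it. A standard-library sketch of the agreement rate follows. `agreement_rate` is a hypothetical helper, and the example lists are illustrative, not results from this notebook.

```python
from typing import Sequence


def agreement_rate(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of positions where the two label lists match."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not predicted:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


# Illustrative labels only
print(agreement_rate(["POSITIVE", "NEGATIVE", "MIXED"], ["POSITIVE", "NEGATIVE", "NEUTRAL"]))
```

In the notebook, `agreement_rate(results, df['sentiment'].iloc[:15].tolist())` would compare the batch output with the stored labels for the same 15 reviews.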