# Amazon Comprehend - Sentiment Example
### Assess sentiment of customer reviews

Objective: Use the Amazon Comprehend service to detect sentiment

Input: Customer Review headline and body  
Output: Overall sentiment and scores for Positive, Negative, Neutral, Mixed  

https://docs.aws.amazon.com/comprehend/latest/dg/how-sentiment.html  

Dataset and Problem Description:  
https://s3.amazonaws.com/amazon-reviews-pds/readme.html   
https://s3.console.aws.amazon.com/s3/buckets/amazon-reviews-pds/?region=us-east-2  

File: s3://amazon-reviews-pds/tsv/amazon_reviews_us_Major_Appliances_v1_00.tsv.gz

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re

# Connect to Comprehend to get Sentiment
import boto3

### Download Customer Reviews from Amazon Public Dataset

In [2]:
!aws s3 cp s3://amazon-reviews-pds/tsv/amazon_reviews_us_Major_Appliances_v1_00.tsv.gz .

download: s3://amazon-reviews-pds/tsv/amazon_reviews_us_Major_Appliances_v1_00.tsv.gz to ./amazon_reviews_us_Major_Appliances_v1_00.tsv.gz


### Load and Prepare the Review Data

In [3]:
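# on_bad_lines='warn' (pandas >= 1.3) replaces the removed error_bad_lines/warn_bad_lines flags
df = pd.read_csv('amazon_reviews_us_Major_Appliances_v1_00.tsv.gz',
                 sep='\t', on_bad_lines='warn')  # add nrows=1000 for a quick test run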

b'Skipping line 5583: expected 15 fields, saw 22\nSkipping line 22814: expected 15 fields, saw 22\nSkipping line 22883: expected 15 fields, saw 22\nSkipping line 29872: expected 15 fields, saw 22\nSkipping line 37242: expected 15 fields, saw 22\nSkipping line 59693: expected 15 fields, saw 22\n'
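The skipped lines report 22 fields where 15 were expected. One plausible cause in a tab-separated file of free text is a stray double quote at the start of a field, which the parser treats as opening a quoted field. The sketch below uses a small made-up sample (not taken from the dataset) to show how `quoting=csv.QUOTE_NONE` makes pandas treat quote characters as ordinary text. Whether this recovers the skipped rows in the actual file is an assumption that would need checking.

```python
import csv
import io

import pandas as pd

# Hypothetical three-column sample; the first headline starts with an
# unbalanced double quote, as free-text reviews sometimes do.
sample = (
    'review_headline\treview_body\treview_date\n'
    '"Does what it says\tThis little fridge works.\t2008-07-21\n'
    'Five Stars\tworked great\t2015-08-31\n'
)

# QUOTE_NONE disables quote handling, so every tab separates two fields.
sample_df = pd.read_csv(io.StringIO(sample), sep='\t', quoting=csv.QUOTE_NONE)

print(sample_df.shape)
print(sample_df.loc[0, 'review_headline'])
```

With quote handling disabled, the quote character stays in the text, so it may need to be stripped later if it matters downstream.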


In [4]:
print('Rows: {0}, Columns: {1}'.format(df.shape[0],df.shape[1]))

Rows: 96834, Columns: 15


In [5]:
df.index.max()

96833

In [6]:
df.columns

Index(['marketplace', 'customer_id', 'review_id', 'product_id',
       'product_parent', 'product_title', 'product_category', 'star_rating',
       'helpful_votes', 'total_votes', 'vine', 'verified_purchase',
       'review_headline', 'review_body', 'review_date'],
      dtype='object')

In [7]:
df.isna().any(axis=0)

marketplace          False
customer_id          False
review_id            False
product_id           False
product_parent       False
product_title        False
product_category     False
star_rating          False
helpful_votes        False
total_votes          False
vine                 False
verified_purchase    False
review_headline       True
review_body           True
review_date           True
dtype: bool

In [8]:
# Look for any rows that have NA
rows_missing_values = df.isna().any(axis=1)

In [9]:
df[rows_missing_values]

Unnamed: 0,marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date
3254,US,29686651,R1DGB2U8KV9HKP,B00GOFUISY,138120585,"FIREBIRD New 36"" European Style Wall Mount Sta...",Major Appliances,3,2,3,N,Y,Three Stars,,2015-08-06
8640,US,733945,R1QOC0UPHADKFZ,B00GHXU3VA,931824698,"GOLDEN VANTAGE 30"" European Style Ventless/Duc...",Major Appliances,5,9,9,N,Y,Five Stars,,2015-06-20
11556,US,18030318,R36Z529A9SVZ14,B002ROS27U,461806580,Whynter UIM-155 Stainless Steel Built-In Ice M...,Major Appliances,1,54,72,N,Y,One Star,,2015-05-25
15500,US,52655156,RZR2BV8UJXB3J,B00DNSO2UK,316513931,Haier Wine Cellar with Electronic Controls,Major Appliances,5,10,10,N,Y,Working great so far!,,2015-04-16
16453,US,24105158,R15NCTE2RINP6W,B005KT4LK6,236627965,Whirlpool WTW8800YW Cabrio 4.6 Cu. Ft. White T...,Major Appliances,1,1,1,N,N,One Star,,2015-04-07
22583,US,48624154,R223L9DVYCY4J5,B000S0PRNM,136191470,LG : WM2233HW 27 XL Front-Load Washer - White,Major Appliances,1,0,0,N,N,the worse washer and dryer set ever,,2015-02-15
29680,US,51669844,R18VF51XXHU2UE,B00DOHHZHM,221894244,Koolatron Beer Keg Cooler Brown,Major Appliances,1,10,18,N,N,Paperweight,,2014-12-13
36130,US,8711378,R3NXEY6CSAUFR,B0050KKS5C,758706493,316075103 BAKE ELEMENT REPAIR PART FOR FRIGIDA...,Major Appliances,3,1,1,N,Y,,Did not fix problem - my fault.,2014-10-08
95250,US,14267148,R3FCCZQ31S2Z4Q,B000IN22I2,99564707,"Igloo FR28WH 2.8-Cu-Ft Refrigerator, White",Major Appliances,5,0,0,N,N,"It does what it says on the tin""\tThis little ...",2008-07-21,


In [10]:
df['review_headline'] = df['review_headline'].fillna(' ')
df['review_body'] = df['review_body'].fillna(' ')

In [11]:
# Replace each run of embedded newlines, tabs, and carriage returns with a single space
pattern = r'[\n\t\r]+'

In [12]:
# Use re.sub to find each run of matching characters and replace it with a space.
text = 'ab,cd\n\tef'

print('original text:', text)

print('after substitution:', re.sub(pattern,' ', text))

original text: ab,cd
	ef
after substitution: ab,cd ef


In [13]:
df['product_title'] = df['product_title'].map(lambda x: re.sub(pattern,' ',x))
df['review_headline'] = df['review_headline'].map(lambda x: re.sub(pattern,' ',x))
df['review_body'] = df['review_body'].map(lambda x: re.sub(pattern,' ',x))
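As an alternative to `map` with a lambda, pandas string methods can apply the same regular expression to a whole column. This is a minimal sketch on a made-up `demo` frame, not a change to the cells above; it also shows why the missing values are filled first.

```python
import pandas as pd

pattern = r'[\n\t\r]+'

# Hypothetical sample with embedded whitespace and one missing value
demo = pd.DataFrame({'review_body': ['works\n\tgreat', 'no\r\nissues', None]})

# Fill missing values first, then replace each run of matching
# characters with a single space across the whole column.
demo['review_body'] = (
    demo['review_body']
    .fillna(' ')
    .str.replace(pattern, ' ', regex=True)
)

print(demo['review_body'].tolist())
```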

In [14]:
df.head()

Unnamed: 0,marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date
0,US,16199106,R203HPW78Z7N4K,B0067WNSZY,633038551,"FGGF3032MW Gallery Series 30"" Wide Freestandin...",Major Appliances,5,0,0,N,Y,"If you need a new stove, this is a winner.",What a great stove. What a wonderful replacem...,2015-08-31
1,US,16374060,R2EAIGVLEALSP3,B002QSXK60,811766671,Best Hand Clothes Wringer,Major Appliances,5,1,1,N,Y,Five Stars,worked great,2015-08-31
2,US,15322085,R1K1CD73HHLILA,B00EC452R6,345562728,Supco SET184 Thermal Cutoff Kit,Major Appliances,5,0,0,N,Y,Fast Shipping,Part exactly what I needed. Saved by purchasi...,2015-08-31
3,US,32004835,R2KZBMOFRMYOPO,B00MVVIF2G,563052763,Midea WHS-160RB1 Compact Single Reversible Doo...,Major Appliances,5,1,1,N,Y,Five Stars,Love my refrigerator! ! Keeps everything cold...,2015-08-31
4,US,25414497,R6BIZOZY6UD01,B00IY7BNUW,874236579,Avalon Bay Portable Ice Maker,Major Appliances,5,0,0,N,Y,Five Stars,No more running to the store for ice! Works p...,2015-08-31


In [None]:
df['review_body'].head()

In [15]:
# Some examples of review title and body
for i in range(10):
    print(df.iloc[i]['review_headline'] + ' - ' + df.iloc[i]['review_body'])
    print()

If you need a new stove, this is a winner. - What a great stove.  What a wonderful replacement for my sort of antique.  Enjoy it every day.

Five Stars - worked great

Fast Shipping - Part exactly what I needed.  Saved by purchasing myself.

Five Stars - Love my refrigerator! ! Keeps everything  cold..will recommend!

Five Stars - No more running to the store for ice!  Works perfectly.

Piece of Junk - It would not cool below 55 degrees and has now stopped working all together.  I would NOT recommend this piece of junk to anyone.

Works awesome for apt size 110 dryer - Works awesome for apt size 110 dryer. Handles load from apt size washer just fine. It does take longer to dry. Electric cost savings over a full size 220 is worth the time. Does not add much humidity unless lint filter is full.

Five Stars - exactly what I wanted!

Four Stars - AS advertised

but has poor insulation in the top - It works as advertised, but has poor insulation in the top. Like the 3rd shelf, it comes in h

### Get Sentiment of Reviews using Comprehend AI Service

In [16]:
session = boto3.Session(region_name='us-east-1')

In [17]:
client = session.client('comprehend')

In [19]:
# Try some examples
sentiment = client.detect_sentiment(
    Text="It's insulting that @awscloud marked an EBS volume limit increase support request as low severity but I can't do anything while I wait.",
    LanguageCode='en'
)

In [20]:
sentiment['Sentiment'],sentiment['SentimentScore']

('NEGATIVE',
 {'Positive': 0.006640770472586155,
  'Negative': 0.9318274259567261,
  'Neutral': 0.061524178832769394,
  'Mixed': 7.676783752685878e-06})
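The `SentimentScore` dictionary holds one confidence value per label. As a small illustration using the scores printed above, the helper below (`top_label` is a hypothetical name) picks the highest-scoring label and flags low-confidence results. The 0.6 threshold is an arbitrary assumption, not a Comprehend recommendation.

```python
def top_label(scores, threshold=0.6):
    """Return (label, score, is_confident) for the highest-scoring label."""
    label = max(scores, key=scores.get)
    score = scores[label]
    return label.upper(), score, score >= threshold


# Scores copied from the example response above
example_scores = {
    'Positive': 0.006640770472586155,
    'Negative': 0.9318274259567261,
    'Neutral': 0.061524178832769394,
    'Mixed': 7.676783752685878e-06,
}

print(top_label(example_scores))
```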

In [26]:
%%time
# Sentiment of reviews:
# one round trip to the Comprehend service for each review
for i in range(15):
    review = df.iloc[i]['review_headline'] + ' - ' + df.iloc[i]['review_body']
    print(review)
    sentiment = client.detect_sentiment(Text=review,LanguageCode='en')
    print(sentiment['Sentiment'])
    print()

If you need a new stove, this is a winner. - What a great stove.  What a wonderful replacement for my sort of antique.  Enjoy it every day.
POSITIVE

Five Stars - worked great
POSITIVE

Fast Shipping - Part exactly what I needed.  Saved by purchasing myself.
POSITIVE

Five Stars - Love my refrigerator! ! Keeps everything  cold..will recommend!
POSITIVE

Five Stars - No more running to the store for ice!  Works perfectly.
POSITIVE

Piece of Junk - It would not cool below 55 degrees and has now stopped working all together.  I would NOT recommend this piece of junk to anyone.
NEGATIVE

Works awesome for apt size 110 dryer - Works awesome for apt size 110 dryer. Handles load from apt size washer just fine. It does take longer to dry. Electric cost savings over a full size 220 is worth the time. Does not add much humidity unless lint filter is full.
POSITIVE

Five Stars - exactly what I wanted!
POSITIVE

Four Stars - AS advertised
POSITIVE

but has poor insulation in the top - It works as 

In [27]:
%%time

# Batch processing: up to 25 reviews in one round trip
results = []

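# Get sentiment for the first 15 reviews.
# Only the review body is truncated (to 4000 characters), presumably to keep
# each document within Comprehend's per-document size limit.
review = list((df.iloc[0:15]['review_headline'] + ' - ' + df.iloc[0:15]['review_body'].str.slice(0,4000)).values)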
#print(review)

# Initialize placeholders for the return values
temp_results = ['']*len(review)

sentiment = client.batch_detect_sentiment(TextList=review,LanguageCode='en')

# Store each sentiment at the position of its input review
for s in sentiment['ResultList']:
    #print(s['Index']+i,s['Sentiment'])
    temp_results[s['Index']] = s['Sentiment']

# For reviews that failed, store the error code instead
for s in sentiment['ErrorList']:
    #print(s['Index']+i,s['ErrorCode'])
    temp_results[s['Index']] = s['ErrorCode']

results.extend(temp_results)

CPU times: user 24 ms, sys: 0 ns, total: 24 ms
Wall time: 222 ms
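The cell above sends a single batch of 15 reviews. To cover a longer list, the same call can be repeated over slices of at most 25 texts. The following is a minimal sketch, not the notebook's original code: `batch_sentiments` and `truncate_utf8` are hypothetical helpers, the 5000-byte cap reflects my understanding of Comprehend's per-document limit, and `_StubClient` only stands in for a boto3 Comprehend client so the sketch runs offline.

```python
def truncate_utf8(text, max_bytes=5000):
    """Trim text so its UTF-8 encoding fits in max_bytes (assumed limit)."""
    encoded = text.encode('utf-8')
    if len(encoded) <= max_bytes:
        return text
    # errors='ignore' drops a multi-byte character cut in half at the boundary
    return encoded[:max_bytes].decode('utf-8', errors='ignore')


def batch_sentiments(client, texts, batch_size=25, language_code='en'):
    """Return one sentiment label (or error code) per input text."""
    results = []
    for start in range(0, len(texts), batch_size):
        batch = [truncate_utf8(t) for t in texts[start:start + batch_size]]
        response = client.batch_detect_sentiment(TextList=batch,
                                                 LanguageCode=language_code)
        batch_results = [''] * len(batch)
        for item in response['ResultList']:
            batch_results[item['Index']] = item['Sentiment']
        for item in response['ErrorList']:
            batch_results[item['Index']] = item['ErrorCode']
        results.extend(batch_results)
    return results


class _StubClient:
    """Offline stand-in that labels every text NEUTRAL (illustration only)."""

    def __init__(self):
        self.batch_sizes = []

    def batch_detect_sentiment(self, TextList, LanguageCode):
        self.batch_sizes.append(len(TextList))
        return {
            'ResultList': [{'Index': i, 'Sentiment': 'NEUTRAL'}
                           for i in range(len(TextList))],
            'ErrorList': [],
        }


stub = _StubClient()
labels = batch_sentiments(stub, ['some review text'] * 60)
print(stub.batch_sizes, len(labels))
```

Passing the real `client` instead of the stub makes billable API calls, so it is worth trying a small slice first.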


In [28]:
for idx, r in enumerate(review):
    print(r)
    print(results[idx])

If you need a new stove, this is a winner. - What a great stove.  What a wonderful replacement for my sort of antique.  Enjoy it every day.
POSITIVE
Five Stars - worked great
POSITIVE
Fast Shipping - Part exactly what I needed.  Saved by purchasing myself.
POSITIVE
Five Stars - Love my refrigerator! ! Keeps everything  cold..will recommend!
POSITIVE
Five Stars - No more running to the store for ice!  Works perfectly.
POSITIVE
Piece of Junk - It would not cool below 55 degrees and has now stopped working all together.  I would NOT recommend this piece of junk to anyone.
NEGATIVE
Works awesome for apt size 110 dryer - Works awesome for apt size 110 dryer. Handles load from apt size washer just fine. It does take longer to dry. Electric cost savings over a full size 220 is worth the time. Does not add much humidity unless lint filter is full.
POSITIVE
Five Stars - exactly what I wanted!
POSITIVE
Four Stars - AS advertised
POSITIVE
but has poor insulation in the top - It works as advertise

### Get Sentiment for All Reviews

### Warning: The code below accumulated USD 65 in charges for around 100,000 reviews.  
### Do not run this code, as you will incur charges.
### I have commented out the cell below.
### Instead, use the file with sentiments that I generated: customer_reviews_with_sentiment_compressed.txt.gz 
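To get a rough sense of the cost before calling the service on every review, character counts can be converted into billing units. The sketch below assumes sentiment requests are billed in units of 100 characters, with a 3-unit minimum per request, at USD 0.0001 per unit. These figures are assumptions and should be checked against the current Amazon Comprehend pricing page; `estimate_cost_usd` is a hypothetical helper.

```python
import math


def estimate_cost_usd(texts, price_per_unit=0.0001, chars_per_unit=100, min_units=3):
    """Estimate the charge for one request per text (all rates are assumptions)."""
    units = sum(max(min_units, math.ceil(len(t) / chars_per_unit)) for t in texts)
    return units * price_per_unit


# Two made-up reviews: a short one (billed at the minimum) and a 650-character one
sample_reviews = ['Five Stars - worked great', 'x' * 650]
print(round(estimate_cost_usd(sample_reviews), 6))
```

Under these assumed rates, USD 65 for roughly 100,000 reviews would correspond to an average of about 6.5 units per review.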

### Copy the file to your bucket
#### NOTE: Replace the bucket name 'aws-glue-cl' with the name of your own bucket

In [30]:
!aws s3 cp customer_reviews_with_sentiment_compressed.txt.gz s3://aws-glue-cl/customer_review/customer_reviews_with_sentiment_compressed.txt.gz

upload: ./customer_reviews_with_sentiment_compressed.txt.gz to s3://aws-glue-cl/customer_review/customer_reviews_with_sentiment_compressed.txt.gz
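The same copy can also be done from Python. This sketch assumes a boto3 S3 client and uses its `upload_file` method; the bucket name `my-bucket` in the commented example is a placeholder, and `upload_sentiment_file` is a hypothetical helper that mirrors the key layout of the command above.

```python
def upload_sentiment_file(s3_client, bucket,
                          filename='customer_reviews_with_sentiment_compressed.txt.gz',
                          prefix='customer_review'):
    """Upload the local file to s3://<bucket>/<prefix>/<filename> and return the URI."""
    key = f'{prefix}/{filename}'
    s3_client.upload_file(Filename=filename, Bucket=bucket, Key=key)
    return f's3://{bucket}/{key}'


# Example (requires AWS credentials and an existing bucket):
# import boto3
# upload_sentiment_file(boto3.client('s3'), 'my-bucket')
```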
