# Getting insight from customer reviews using Amazon Comprehend

## Comprehend Topic Mapping & Sentiment Analysis Notebook
In the previous notebook we ran a topic modeling job. In this notebook we take the output of that job and map each topic number to a human-readable topic name. We then detect the sentiment of each review and aggregate topics and sentiments per item, giving a combined view of what customers talk about for each product and how they feel about it.





### Import Libraries

In [None]:
# Library imports
import pandas as pd
import boto3
from collections import Counter

# boto3 session to access service
session = boto3.Session()
comprehend = boto3.client('comprehend', region_name=session.region_name)

### Input Paths

In [None]:
# S3 bucket
BUCKET = 'clothing-shoe-jewel-tm-blog'
# Final dataframe on S3 where the topics/sentiments are joined with the original dataset
S3_FEEDBACK_TOPICS = f's3://{BUCKET}/out/FinalDataframe.csv'

### Output paths

In [None]:
# Final raw output and final aggregated version of the output
S3_RAW_OUTPUT = f's3://{BUCKET}/out/_RawOutput.csv'
S3_OUTPUT = f's3://{BUCKET}/out/_TopicKeySentiment.csv'

## Variables

In [None]:
# Top 3 topics per product will be aggregated
TOP_TOPICS = 3

# Working on English language only. 
language_code = 'en'

In [None]:
# Names for the 5 topics, assigned by a human reviewer or a subject-matter expert (SME)
topicMaps = {
    0: 'Product comfortability',
    1: 'Product Quality and Price',
    2: 'Product Size',
    3: 'Product Color',
    4: 'Product Return',
}

## Processing doc-topics to map each document to its topic numbers

In [None]:
# Loading documents and topics assigned to each of them by Comprehend
docTopics = pd.read_csv('comprehend-out/doc-topics.csv')
docTopics.head()

In [None]:
# Creating a field with doc number. 
# This doc number is the line number of the input file to Comprehend.
docTopics['doc'] = docTopics['docname'].str.split(':').str[1]
docTopics['doc'] = docTopics['doc'].astype(int)
docTopics.head()
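
The `docname` parsing above can be tried on a small stand-in frame. This is a minimal sketch: the file name `reviews.csv` and the values are made up, and the column layout (`docname`, `topic`, `proportion`) is assumed to match the Comprehend output loaded above. It uses `rsplit` rather than `split` so that a colon inside the file name would not shift the result, which is a defensive choice and not something the original cell requires.

```python
import pandas as pd

# Toy stand-in for doc-topics.csv; file name and values are hypothetical.
sample = pd.DataFrame({
    'docname': ['reviews.csv:0', 'reviews.csv:1', 'reviews.csv:1'],
    'topic': [2, 0, 4],
    'proportion': [1.0, 0.7, 0.3],
})

# Take the text after the LAST colon and convert it to an integer line number.
sample['doc'] = sample['docname'].str.rsplit(':', n=1).str[-1].astype(int)
print(sample[['docname', 'doc']])
```

Note that the same `doc` value can appear on several rows when a document is assigned more than one topic.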

## Generate topic names from topic terms

In [None]:
# Load topics and associated terms
topicTerms = pd.read_csv('comprehend-out/topic-terms.csv')

In [None]:
# Consolidate terms for each topic
aggregatedTerms = topicTerms.groupby('topic')['term'].aggregate(lambda term: term.unique().tolist()).reset_index()

In [None]:
# Sneak peek
aggregatedTerms.head(10)

In [None]:
# Map topic names to topic number
aggregatedTerms['TopicNames'] = aggregatedTerms['topic'].apply(lambda x:topicMaps[x])

In [None]:
# Sneak peek
aggregatedTerms.head(10)
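
The term consolidation and name mapping above can be sketched on toy data. The terms, weights, and the deliberately incomplete `topic_names` dictionary below are invented for illustration. The sketch uses `Series.map` instead of `apply` with a dictionary lookup; the difference is that `map` leaves a missing value for topic numbers that have no name yet, where the lookup would raise a `KeyError`.

```python
import pandas as pd

# Toy stand-in for topic-terms.csv; values are hypothetical.
topic_terms = pd.DataFrame({
    'topic': [0, 0, 1, 1, 1],
    'term': ['comfortable', 'soft', 'price', 'quality', 'price'],
    'weight': [0.9, 0.4, 0.8, 0.6, 0.1],
})
topic_names = {0: 'Product comfortability'}  # topic 1 intentionally left unnamed

# One row per topic, holding the list of its distinct terms.
aggregated = (
    topic_terms.groupby('topic')['term']
    .agg(lambda s: s.unique().tolist())
    .reset_index()
)

# Unmapped topic numbers become NaN rather than raising an error.
aggregated['TopicNames'] = aggregated['topic'].map(topic_names)
unnamed = aggregated.loc[aggregated['TopicNames'].isna(), 'topic'].tolist()
print(aggregated)
print('Topics still needing a name:', unnamed)
```

Listing the unnamed topics this way can be a quick check that `topicMaps` covers every topic number the job produced.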

## Load main feedback data

In [None]:
# Load the main dataframe that the Comprehend results will be merged into
feedbackTopics = pd.read_csv(S3_FEEDBACK_TOPICS)

## Adding back topic number, terms, and names to main data

In [None]:
# Joining topic numbers to main data
# The index of feedbackTopics corresponds to the doc field of the docTopics dataframe
feedbackTopics = pd.merge(feedbackTopics, 
                          docTopics, 
                          left_index=True, 
                          right_on='doc', 
                          how='left')

In [None]:
# Reviews will now have topic numbers, associated terms, and topic names
feedbackTopics = feedbackTopics.merge(aggregatedTerms, 
                                      on='topic', 
                                      how='left')
feedbackTopics.head()
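
The index-to-`doc` join above behaves in two ways that are easy to miss, shown here on toy data. A review with several topics is repeated once per topic, and a review with no topic keeps a single row with missing topic fields. The frames below are hypothetical. The second half, which keeps only the highest-`proportion` topic per document, is an optional variation and not part of the original flow.

```python
import pandas as pd

# Hypothetical reviews; the default 0..n-1 index plays the role of the line number.
reviews = pd.DataFrame({
    'asin': ['A1', 'A2', 'A3'],
    'reviewText_cleaned': ['fits well', 'too small and wrong color', 'ok'],
})
# Document 1 has two topics, document 2 has none.
doc_topics = pd.DataFrame({
    'doc': [0, 1, 1],
    'topic': [0, 2, 4],
    'proportion': [1.0, 0.6, 0.4],
})

merged = pd.merge(reviews, doc_topics, left_index=True, right_on='doc', how='left')
print(len(reviews), 'reviews ->', len(merged), 'rows after the join')

# Optional variation: keep only the dominant topic of each document first.
dominant = doc_topics.loc[doc_topics.groupby('doc')['proportion'].idxmax()]
merged_dominant = pd.merge(reviews, dominant, left_index=True, right_on='doc', how='left')
print(len(merged_dominant), 'rows when only the dominant topic is kept')
```

Whether to keep every topic or only the dominant one depends on how the later per-product counts should be weighted.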

## Generate sentiments for each feedback

In [None]:
# Function for detecting sentiment
def detect_sentiment(text, language_code):
    comprehend_json_out = comprehend.detect_sentiment(Text=text, LanguageCode=language_code)
    return comprehend_json_out

In [None]:
# Raw JSON response from Comprehend sentiment detection for each review
feedbackTopics['comprehend_sentiment_json_out'] = feedbackTopics['reviewText_cleaned'].apply(lambda x: detect_sentiment(x, language_code))

In [None]:
# Extracting the sentiment label from the raw Comprehend JSON
feedbackTopics['sentiment'] = feedbackTopics['comprehend_sentiment_json_out'].apply(lambda x: x['Sentiment'])

In [None]:
# Sneak peek
feedbackTopics.head(2)
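
Calling `detect_sentiment` once per row issues one request per review. As a possible alternative, the boto3 Comprehend client also offers `batch_detect_sentiment`, which accepts up to 25 documents per call. The helper below is a sketch under that assumption: `batch_sentiments` and `FakeComprehend` are hypothetical names introduced here, and the fake client exists only so the example runs without AWS credentials. With the real client, documents that fail are reported in the response's `ErrorList`, so their label stays `None` in this sketch.

```python
def batch_sentiments(texts, client, language_code='en', batch_size=25):
    """Return one sentiment label per text; None where no result came back."""
    labels = [None] * len(texts)
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        response = client.batch_detect_sentiment(
            TextList=chunk, LanguageCode=language_code)
        # 'Index' is the position of the document within this chunk.
        for item in response['ResultList']:
            labels[start + item['Index']] = item['Sentiment']
    return labels


class FakeComprehend:
    """Offline stand-in for the boto3 client (hypothetical, for this example only)."""

    def batch_detect_sentiment(self, TextList, LanguageCode):
        results = [
            {'Index': i, 'Sentiment': 'POSITIVE' if 'good' in text else 'NEGATIVE'}
            for i, text in enumerate(TextList)
        ]
        return {'ResultList': results, 'ErrorList': []}


texts = ['good fit'] * 26 + ['bad zipper']
labels = batch_sentiments(texts, FakeComprehend())
print(labels[0], labels[-1], len(labels))
```

In the notebook, the call would look like `batch_sentiments(feedbackTopics['reviewText_cleaned'].tolist(), comprehend, language_code)`, assuming the column contains no missing values.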

## Combining Topics and Sentiments

In [None]:
# Creating a composite key of topic name and sentiment.
# This is because we are counting frequency of this combination.
feedbackTopics['TopicSentiment'] = feedbackTopics['TopicNames'] + '_' + feedbackTopics['sentiment']

In [None]:
# Sneak peek
feedbackTopics.head(2)
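
One detail of the composite key: a review that received no topic in the left join has a missing `TopicNames`, and its key would likely end up missing as well. The sketch below fills such rows with a placeholder before concatenating. The label `'Unassigned'` and the toy frame are assumptions made for illustration.

```python
import pandas as pd

# Toy frame: the second review has no topic name.
df = pd.DataFrame({
    'TopicNames': ['Product Size', None],
    'sentiment': ['NEGATIVE', 'POSITIVE'],
})

# Replace missing topic names with a placeholder, then build the key.
df['TopicSentiment'] = df['TopicNames'].fillna('Unassigned') + '_' + df['sentiment']
print(df['TopicSentiment'].tolist())
```

Dropping such rows instead of labeling them is equally reasonable if unassigned reviews should not appear in the per-product summary.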

## Aggregate topics and sentiment for each item

In [None]:
# Create product id group
asinWiseDF = feedbackTopics.groupby('asin')

In [None]:
# Each product now has a list of topics and sentiment combo (topics can appear multiple times)
topicDF = asinWiseDF['TopicSentiment'].apply(lambda x:list(x)).reset_index()

In [None]:
# Count appearances of each topic-sentiment combo per product
topicDF['TopTopics'] = topicDF['TopicSentiment'].apply(Counter)

In [None]:
# Sorting topics-sentiment combo based on their appearance
topicDF['TopTopics'] = topicDF['TopTopics'].apply(lambda x: sorted(x, key=x.get, reverse=True))

In [None]:
# Select the top TOP_TOPICS topic-sentiment combos for each product
topicDF['TopTopics'] = topicDF['TopTopics'].apply(lambda x: x[:TOP_TOPICS])

In [None]:
# Sneak peek
topicDF.head()
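
The count, sort, and slice steps above can also be expressed with `Counter.most_common`, shown here on a hand-made list for one product. The combos and `TOP_K` are illustrative values. The `isinstance` filter is an added precaution that skips missing keys, which would otherwise be counted like any other value.

```python
from collections import Counter

# Topic-sentiment combos of one hypothetical product; None mimics a missing key.
combos = [
    'Product Size_NEGATIVE', 'Product Size_NEGATIVE', 'Product Color_POSITIVE',
    'Product Size_NEGATIVE', 'Product Color_POSITIVE', 'Product Return_NEUTRAL',
    None,
]
TOP_K = 2

# Count only real string keys, then keep the labels of the K most frequent.
counts = Counter(label for label in combos if isinstance(label, str))
top = [label for label, _ in counts.most_common(TOP_K)]
print(top)
```

Applied to the notebook's frame, the same idea would be `topicDF['TopicSentiment'].apply(lambda combos: [label for label, _ in Counter(combos).most_common(TOP_TOPICS)])`.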

In [None]:
# Saving topics-sentiment combo for each item locally
topicDF.to_csv('data_out/topic_sentiment.csv', index=False)

In [None]:
# Loading product metadata to add to reviews and their Comprehend information
meta_data = pd.read_csv('data/meta_data.csv')

In [None]:
# Adding the topic-sentiment combo back to product metadata
finalDF = topicDF.merge(meta_data, on='asin', how='left')

In [None]:
# Only selecting a subset of fields
finalDF = finalDF[['asin', 'TopTopics', 'category', 'title']]

In [None]:
# Frequency of sentiments for all reviews
feedbackTopics['sentiment'].value_counts()

In [None]:
# Saving the final output locally
finalDF.to_csv('data_out/reviewTopicsSentiments.csv', index=False)
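
Because `TopTopics` holds Python lists, each list is written to the CSV as its string representation. A reader of the file gets strings back unless they are parsed again. The sketch below round-trips a one-row toy frame through an in-memory buffer and restores the lists with `ast.literal_eval`. The buffer stands in for the file on disk, and the row is made up.

```python
import ast
import io

import pandas as pd

# One hypothetical product with its top topic-sentiment combos.
df = pd.DataFrame({
    'asin': ['A1'],
    'TopTopics': [['Product Size_NEGATIVE', 'Product Color_POSITIVE']],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Parse the list column back into real lists while reading.
restored = pd.read_csv(buffer, converters={'TopTopics': ast.literal_eval})
print(type(restored.loc[0, 'TopTopics']), restored.loc[0, 'TopTopics'])
```

Storing the column as JSON text or as one row per combo would be alternatives if the file is consumed outside Python.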