# **Sentiment Analysis - Amazon Reviews**

In [0]:
import pandas as pd
import numpy as np
import nltk

In [0]:
# download vader lexicon
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


True

In [0]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

## Read Data

In [0]:
reviews = pd.read_csv('amazonreviews.tsv', sep='\t')

In [0]:
reviews.head()

Unnamed: 0,label,review
0,pos,Stuning even for the non-gamer: This sound tra...
1,pos,The best soundtrack ever to anything.: I'm rea...
2,pos,Amazing!: This soundtrack is my favorite music...
3,pos,Excellent Soundtrack: I truly like this soundt...
4,pos,"Remember, Pull Your Jaw Off The Floor After He..."


In [0]:
reviews.label.value_counts()

neg    5097
pos    4903
Name: label, dtype: int64

In [0]:
# check if any missing values
reviews.isnull().any()

label     False
review    False
dtype: bool

There are no missing values, so we can proceed.

# Sentiment Analysis

In [0]:
sid = SentimentIntensityAnalyzer()

In [0]:
# calculating sentiment score for a single review
review = reviews.review[2]

In [0]:
print(review)

Amazing!: This soundtrack is my favorite music of all time, hands down. The intense sadness of "Prisoners of Fate" (which means all the more if you've played the game) and the hope in "A Distant Promise" and "Girl who Stole the Star" have been an important inspiration to me personally throughout my teen years. The higher energy tracks like "Chrono Cross ~ Time's Scar~", "Time of the Dreamwatch", and "Chronomantique" (indefinably remeniscent of Chrono Trigger) are all absolutely superb as well.This soundtrack is amazing music, probably the best of this composer's work (I haven't heard the Xenogears soundtrack, so I can't say for sure), and even if you've never played the game, it would be worth twice the price to buy it.I wish I could give it 6 stars.


In [0]:
sid.polarity_scores(review)

{'compound': 0.9858, 'neg': 0.04, 'neu': 0.692, 'pos': 0.268}
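As a quick sanity check on this output: in VADER, neg, neu and pos are proportions of the text and should sum to roughly 1, while compound is a separate normalized score between -1 and 1. The sketch below only reuses the numbers printed above; it does not call VADER itself.

```python
# Scores copied from the output above for the review at index 2.
scores = {'compound': 0.9858, 'neg': 0.04, 'neu': 0.692, 'pos': 0.268}

# neg, neu and pos are proportions of the text, so they should sum to ~1.
proportions = scores['neg'] + scores['neu'] + scores['pos']
print(round(proportions, 3))

# compound is a separate normalized score bounded by -1 and 1.
assert -1.0 <= scores['compound'] <= 1.0
```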

The overall sentiment of the text is represented by the compound score.
According to the VADER documentation, the compound score can be interpreted as:
- positive sentiment: compound score >= 0.05
- neutral sentiment: (compound score > -0.05) and (compound score < 0.05)
- negative sentiment: compound score <= -0.05

In [0]:
# function to classify a text as positive, neutral or negative based on its compound sentiment score
def sentiment_label(text):
  # reuse the analyzer created above (sid) instead of rebuilding it on every call
  senti_scores = sid.polarity_scores(text)
  compound_score = senti_scores['compound']

  if compound_score >= 0.05:
    return 'Pos'
  elif -0.05 < compound_score < 0.05:
    return 'Neutral'
  else:
    return 'Neg'

In [0]:
sentiment_label(review)

'Pos'

We can now apply the function to all the reviews and get the sentiment label.

In [0]:
reviews['pred_label'] = reviews['review'].apply(sentiment_label)

In [0]:
reviews.head()

Unnamed: 0,label,review,pred_label
0,pos,Stuning even for the non-gamer: This sound tra...,Pos
1,pos,The best soundtrack ever to anything.: I'm rea...,Pos
2,pos,Amazing!: This soundtrack is my favorite music...,Pos
3,pos,Excellent Soundtrack: I truly like this soundt...,Pos
4,pos,"Remember, Pull Your Jaw Off The Floor After He...",Pos


In [0]:
# convert pred_label to lowercase to match the original labels
reviews['pred_label'] = reviews['pred_label'].str.lower()

In [0]:
reviews.head()

Unnamed: 0,label,review,pred_label
0,pos,Stuning even for the non-gamer: This sound tra...,pos
1,pos,The best soundtrack ever to anything.: I'm rea...,pos
2,pos,Amazing!: This soundtrack is my favorite music...,pos
3,pos,Excellent Soundtrack: I truly like this soundt...,pos
4,pos,"Remember, Pull Your Jaw Off The Floor After He...",pos


### Comparing the original label and predicted label

In [0]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [0]:
# accuracy
accuracy_score(reviews['label'], reviews['pred_label'])

0.6981

In [0]:
print(classification_report(reviews['label'], reviews['pred_label']))

              precision    recall  f1-score   support

         neg       0.86      0.51      0.64      5097
     neutral       0.00      0.00      0.00         0
         pos       0.65      0.90      0.76      4903

    accuracy                           0.70     10000
   macro avg       0.50      0.47      0.46     10000
weighted avg       0.76      0.70      0.70     10000
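
The macro averages in this report look low partly because the neutral class, which has zero support, is averaged in with equal weight. The sketch below redoes that arithmetic from the rounded values printed in the report; `macro_average` is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
# Per-class values copied (already rounded) from the classification report.
report = {
    'neg':     {'precision': 0.86, 'recall': 0.51},
    'neutral': {'precision': 0.00, 'recall': 0.00},
    'pos':     {'precision': 0.65, 'recall': 0.90},
}

def macro_average(rows, metric):
    """Unweighted mean of one metric over the given class rows."""
    return sum(row[metric] for row in rows) / len(rows)

all_rows = list(report.values())
# Only neg and pos actually occur in the original label column.
real_rows = [report['neg'], report['pos']]

for metric in ('precision', 'recall'):
    with_neutral = macro_average(all_rows, metric)
    without_neutral = macro_average(real_rows, metric)
    print(f"{metric}: {with_neutral:.2f} with neutral, {without_neutral:.3f} without")
```

The weighted averages are not affected in the same way, because each class is weighted by its support and neutral has none.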





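The classifier struggles much more with negative reviews than with positive ones: recall is only 0.51 for neg, against 0.90 for pos. The neutral row has zero support because the dataset contains only pos and neg labels, so every review predicted as neutral counts as an error.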
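
To see where the misclassified reviews end up (predicted as the opposite class or as neutral), one option is to tally each (true, predicted) pair. The sketch below uses hypothetical toy labels, not the actual dataset, and only the standard library.

```python
from collections import Counter

# Hypothetical toy labels for illustration; NOT taken from the dataset.
true_labels = ['neg', 'neg', 'neg', 'neg', 'pos', 'pos', 'pos']
pred_labels = ['neg', 'pos', 'pos', 'neutral', 'pos', 'pos', 'neg']

# Count each (true, predicted) pair; this is a confusion matrix in dict form.
pair_counts = Counter(zip(true_labels, pred_labels))

classes = ['neg', 'neutral', 'pos']
print('true \\ pred', *classes, sep='\t')
for true in ('neg', 'pos'):
    # A Counter returns 0 for pairs that never occurred.
    row = [pair_counts[(true, pred)] for pred in classes]
    print(true, *row, sep='\t')
```

On the real data, `confusion_matrix(reviews['label'], reviews['pred_label'], labels=['neg', 'neutral', 'pos'])`, using the function imported earlier, should produce the same kind of table.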

Let's have a look at some misclassified positive and negative reviews.

In [0]:
miss_label = reviews[reviews['label'] != reviews['pred_label']]
# filter on miss_label's own column so the boolean mask aligns with the frame being indexed
pos_miss = miss_label[miss_label['label'] == 'pos']
neg_miss = miss_label[miss_label['label'] == 'neg']

In [0]:
# positive reviews misclassified as negative or neutral
for review in pos_miss['review'].head():
  print(review)
  print()

Old and good: This book is worth to keep in your collection as it does not only advise what to do with sourdough but Ruth also told you what the picture of past 100 years ago in Alaska where no stand mixer nor any civilized stuffs in kitchen, just a pot of sourdough.

Good but received defective book: I bought this book because we are moving to Germany and I wanted to get a good overview of all the countries in Europe that we could travel too. Unfortunately pages 457-480 on Greece are not in English - looks to be Spanish. There was some sort of printing problem! But I found the rest of the book to give good highlights of the countries.

unknown Africa: You don't expect music from africa to be so profesionally produced. Me from the white world, don't understand the lyrics, but we do understand the music which is a mixture between salsa, soukous and fado with always the african feeling for rithem. The soft almost borred voice of Oliver N'goma fits pefectly in the music. The CD is recorde

In [0]:
# negative reviews misclassified as positive or neutral
for review in neg_miss['review'].head():
  print(review)
  print()

Oh please: I guess you have to be a romance novel lover for this one, and not a very discerning one. All others beware! It is absolute drivel. I figured I was in trouble when a typo is prominently featured on the back cover, but the first page of the book removed all doubt. Wait - maybe I'm missing the point. A quick re-read of the beginning now makes it clear. This has to be an intentional churning of over-heated prose for satiric purposes. Phew, so glad I didn't waste $10.95 after all.

sizes recomended in the size chart are not real: sizes are much smaller than what is recomended in the chart. I tried to put it and sheer it!. I guess you should not buy this item in the internet..it is better to go to the store and check it

mens ultrasheer: This model may be ok for sedentary types, but I'm active and get around alot in my job - consistently found these stockings rolled up down by my ankles! Not Good!! Solution: go with the standard compression stocking, 20-30, stock #114622. Excelle