This notebook shows how to perform sentiment analysis on Amazon reviews using VADER (Valence Aware Dictionary and sEntiment Reasoner), the sentiment analysis tool available in the Natural Language Toolkit (NLTK). VADER is a lexicon- and rule-based tool designed for social media text, but it can also be applied to other domains, such as product reviews.

In [None]:
# Installation: Natural Language Toolkit (NLTK)
!pip install nltk

In [None]:
# First run only: download the NLTK corpora and models (the 'all' collection is large)
import nltk
nltk.download('all')

In [None]:
# Import the required libraries
import pandas as pd
# Text preprocessing
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
# VADER, a rule-based tool for scoring the sentiment of text
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Load the dataset
df = pd.read_csv('https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/amazon.csv')
df

Unnamed: 0,reviewText,Positive
0,This is a one of the best apps acording to a b...,1
1,This is a pretty good version of the game for ...,1
2,this is a really cool game. there are a bunch ...,1
3,"This is a silly game and can be frustrating, b...",1
4,This is a terrific game on any pad. Hrs of fun...,1
...,...,...
19995,this app is fricken stupid.it froze on the kin...,0
19996,Please add me!!!!! I need neighbors! Ginger101...,1
19997,love it! this game. is awesome. wish it had m...,1
19998,I love love love this app on my side of fashio...,1


In [None]:
def preprocess_text(text):
    '''
    Preprocess the input text: tokenize it, remove English stop words (which
    usually carry little sentiment), and reduce the remaining words to their
    base form with lemmatization.
    '''
    tokens = word_tokenize(text)
    # Build the stop-word set once per call instead of reloading the list for every token
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [token for token in tokens if token.lower() not in stop_words]
    lemmatizer = WordNetLemmatizer()
    lemmatized_tokens = [lemmatizer.lemmatize(token) for token in filtered_tokens]
    return ' '.join(lemmatized_tokens)
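
One point to keep in mind with this step: NLTK's English stop-word list includes negation words such as "not" and "no", and VADER's rules rely on negations to flip the polarity of nearby words. The following is a minimal sketch, not part of the original notebook, of a filter that keeps negations. The function name `remove_stopwords_keep_negations`, the small `STOP_WORDS` sample, and the `NEGATIONS` set are illustrative assumptions; in the notebook the stop-word set would come from `stopwords.words('english')`.

```python
# Hypothetical sample; in the notebook use set(stopwords.words('english')).
STOP_WORDS = {"this", "is", "a", "the", "not", "no", "and", "it"}
# Assumed set of negation words to preserve for VADER's negation handling.
NEGATIONS = {"not", "no", "nor", "never"}


def remove_stopwords_keep_negations(tokens, stop_words=STOP_WORDS, negations=NEGATIONS):
    """Drop stop words from a token list but keep negation words."""
    kept = []
    for token in tokens:
        lowered = token.lower()
        # Keep the token if it is a negation, or if it is not a stop word at all
        if lowered in negations or lowered not in stop_words:
            kept.append(token)
    return kept


tokens = ["this", "is", "not", "a", "good", "game"]
print(remove_stopwords_keep_negations(tokens))
```

Whether this improves the scores on this dataset is untested here; it would need to be checked against the evaluation at the end of the notebook.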

In [37]:
df['reviewText'] = df['reviewText'].apply(preprocess_text)

In [40]:
df['reviewText'].head()

0    one best apps acording bunch people agree bomb...
1    pretty good version game free . LOTS different...
2    really cool game . bunch level find golden egg...
3    silly game frustrating , lot fun definitely re...
4    terrific game pad . Hrs fun . grandkids love ....
Name: reviewText, dtype: object

In [19]:
# Load the VADER rule-based sentiment analyser
analyser = SentimentIntensityAnalyzer()
def get_sentiment(text):
    # The compound score is a normalised value between -1 and 1 that summarises the sentiment of the text
    scores = analyser.polarity_scores(text)
    # Label the review positive (1) if the compound score is above 0.05, otherwise negative (0)
    sentiment = 1 if scores['compound'] > 0.05 else 0
    return sentiment
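
The function above collapses the compound score into two classes. As a hedged sketch, the thresholds commonly suggested in VADER's own documentation (positive at or above 0.05, negative at or below -0.05, neutral in between) can be written as a small helper. The name `label_from_compound` and the `threshold` parameter are illustrative and not part of NLTK.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Example compound scores (made-up values for illustration only)
for score in (0.62, 0.0, -0.41):
    print(score, label_from_compound(score))
```

The notebook's binary rule treats everything that is not clearly positive, including neutral text, as negative, which matches the two-class `Positive` column in the dataset.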

In [None]:
df['sentiment'] = df['reviewText'].apply(get_sentiment)
df['sentiment']

Evaluation: accuracy, precision, and recall using scikit-learn

In [21]:
from sklearn.metrics import classification_report, confusion_matrix
print(classification_report(df['Positive'], df['sentiment']))
print(confusion_matrix(df['Positive'], df['sentiment']))

              precision    recall  f1-score   support

           0       0.63      0.59      0.61      4767
           1       0.88      0.89      0.88     15233

    accuracy                           0.82     20000
   macro avg       0.75      0.74      0.75     20000
weighted avg       0.82      0.82      0.82     20000

[[ 2833  1934]
 [ 1662 13571]]
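
The figures in the report can be re-derived from the confusion matrix, where rows are true labels and columns are predicted labels (scikit-learn's convention). A minimal sketch in plain Python, using the four counts printed above; the variable names are illustrative:

```python
# Counts copied from the confusion matrix: rows = true label, columns = predicted label
tn, fp = 2833, 1934   # true negatives, false positives (true label 0)
fn, tp = 1662, 13571  # false negatives, true positives (true label 1)

total = tn + fp + fn + tp

# Share of all reviews that were labelled correctly
accuracy = (tp + tn) / total

# Positive class (label 1)
precision_pos = tp / (tp + fp)
recall_pos = tp / (tp + fn)

# Negative class (label 0)
precision_neg = tn / (tn + fn)
recall_neg = tn / (tn + fp)

print(f"accuracy          {accuracy:.2f}")
print(f"precision (1)     {precision_pos:.2f}")
print(f"recall (1)        {recall_pos:.2f}")
print(f"precision (0)     {precision_neg:.2f}")
print(f"recall (0)        {recall_neg:.2f}")
```

Working through the counts this way makes the class imbalance visible: most of the 20,000 reviews are positive, so overall accuracy is dominated by the positive class.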


Conclusion: On this dataset, VADER, a rule-based sentiment analyser designed primarily for social media text, reaches approximately 82% accuracy. It performs better on positive reviews (F1 0.88) than on negative ones (F1 0.61). Accuracy can be improved with machine learning models or hybrid approaches that integrate contextual understanding and domain-specific lexicon adjustments. Advanced NLP techniques, such as transformers, can also help capture nuanced sentiment and improve overall performance.