<style>
  .centered {
    text-align: center;
    font-size: 40px;
    font-weight: bold;
  }
</style>

<p class="centered">Sentiment Analysis</p>

**What is sentiment analysis?** 

In simple terms, sentiment analysis is the process of mining text (opinions, reviews or single sentences) to predict the emotion it expresses. It relies on natural language processing (NLP), a branch of computer science concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. Sentiment analysis typically classifies text into three classes: “Positive”, “Negative” or “Neutral”. It analyzes the data and labels the ‘better’ and ‘worse’ sentiments as positive and negative respectively.

Sentiment analysis is helpful in a variety of applications. Here it is used to understand real customer feedback based on comments and reviews.

---
To carry out this analysis, I tried to answer a series of questions and to check whether my assumptions were right.

From the modeling point of view, different approaches were used:
- an approach using the powerful functionalities of the library NLTK (Natural Language ToolKit - https://www.nltk.org/) with the VADER model;
- some Machine Learning models (KNN, Naive Bayes, Logistic Regression, SVM, Random Forest, XGBoost) along with pre-trained Deep Learning models (such as HuggingFace's RoBERTa);
- extra: ready-made pipelines that make sentiment analysis quick and easy (this will be really useful for the Streamlit sentiment analyzer web app).

---

This is the analyzed product: 
- Product: https://www.amazon.co.uk/PreSonus-3-5-inch-High-Definition-Active-Monitors/dp/B075QVMBT9/ref=cm_cr_arp_d_product_top?ie=UTF8
- Reviews: https://www.amazon.co.uk/product-reviews/B075QVMBT9/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews&sortBy=recent&pageNumber=1

## Importing Dependencies

In [21]:
import matplotlib.pyplot as plt
from matplotlib import style
from datetime import datetime
import pandas as pd
import numpy as np
import seaborn as sns
import json
import re

In [22]:
pd.set_option("max_colwidth", None)
plt.style.use('ggplot')

## Loading Data

After the data scraping/mining step I ended up with a JSON file, which needs to be converted into a pandas DataFrame to simplify the analysis.

This is the purpose of the **json_2_pandas** function: it takes the path of the JSON file as input, then opens it in read ('r') mode to load the data. After loading the data, I iterate through the object to extract review dates, titles, ratings and contents, and add them to a dictionary.

A second function, **format_date**, uses the datetime module to convert the dates into a format that is easier to handle in pandas.

In [23]:
def format_date(css_date):
    # keep only the last three tokens (day, month, year) and join them into a string
    date = css_date.split()[-3:]
    date_string = " ".join(date)

    # change format
    date_object = datetime.strptime(date_string, '%d %B %Y')
    formatted_date = date_object.strftime('%Y-%m-%d')
    return formatted_date
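
As a quick sanity check, here is a minimal, self-contained sketch of the same conversion. The sample string is only an assumption about what the scraped 'place and date' field looks like (it is not taken from the dataset), and `format_date_sketch` is a hypothetical name used just for this illustration.

```python
from datetime import datetime

def format_date_sketch(raw_date):
    # keep only the last three tokens: day, month name, year
    day_month_year = " ".join(raw_date.split()[-3:])
    # parse e.g. "28 May 2023" and re-emit it in ISO format (YYYY-MM-DD)
    return datetime.strptime(day_month_year, "%d %B %Y").strftime("%Y-%m-%d")

# assumed example of a scraped value, for illustration only
sample = "Reviewed in the United Kingdom on 28 May 2023"
print(format_date_sketch(sample))
```

Note that `%B` matches the full month name in the current locale, so this sketch assumes English month names.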

In [24]:
def json_2_pandas(json_path):
    with open(json_path, 'r') as json_file:
        data = json.load(json_file)
    
    reviews = {"Date": [],
               "Title": [],
               "Rating": [],
               "Content": []}

    for page in data:
        if len(page) != 0:          # if there are reviews in that page list
            for review in page:
                # append to the lists in the dictionary the desired elements
                reviews['Date'].append(format_date(review['place and date']))
                reviews['Title'].append(review["title"])
                reviews['Rating'].append(int(review["rating"][:1]))
                reviews['Content'].append(review["body"])
    
    reviews = pd.DataFrame.from_dict(reviews)
    return reviews
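
To make the expected input clearer, the sketch below shows the nested structure the loader iterates over: a list of pages, where each page is a list of review dictionaries. The sample values (including the "5.0 out of 5 stars" rating format) are assumptions made up for illustration; only the four keys come from the function above.

```python
import json

# hypothetical sample mirroring the structure the loader expects:
# a list of pages, each page being a list of review dictionaries
raw = """
[
  [
    {"place and date": "Reviewed in the United Kingdom on 28 May 2023",
     "title": "Good sound",
     "rating": "5.0 out of 5 stars",
     "body": "Very happy with purchase."}
  ],
  []
]
"""

pages = json.loads(raw)

# flatten the pages (empty pages contribute nothing) and keep the leading digit of the rating
ratings = [int(review["rating"][:1]) for page in pages for review in page]
titles = [review["title"] for page in pages for review in page]

print(ratings, titles)
```

Taking only the first character of the rating works as long as ratings are single-digit star counts, which is the case for a 1 to 5 scale.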

Converting the .json file into a pandas DataFrame.

In [25]:
path = r'G:\Il mio Drive\MAGISTRALE\IT Coding\Project\Sentiment-Analysis-on-Amazon-product-reviews\Data\B075QVMBT9_reviews.json'   # raw string so the backslashes are not read as escape sequences
df = json_2_pandas(json_path = path)

In [26]:
print('Before correction: ', df.iloc[2,1])
df.iloc[2,1] = "Its a beauty"           # can correct it right away since I saw it
print('After correction: ', df.iloc[2,1])

Before correction:  Its a beuaty
After correction:  Its a beauty


In [27]:
print(df.iloc[16]['Content'])

Bought these speakers to use for my new gaming pc and they haven’t disappointed. Well packaged, look and sound great. I often listen to music such as House so wanted speakers that could also have good bass and they would perform. Brilliant for such a low price.


In [28]:
print(df.iloc[13]['Content'])

Great set of speakers. Good quality sound. Easy connections. Very happy 🙂👍


As we can see from these examples, the first one contains a typing error, which can obviously occur when writing a review. The second example shows that long reviews have been scraped properly, and the third one shows that reviews can also contain emojis.

But there could be some missing values: let's check.

In [29]:
df[df['Content'].str.len() == 0]

| | Date | Title | Rating | Content |
|---|---|---|---|---|
| 12 | 2023-05-07 | Great deal. | 5 | |
| 53 | 2023-03-04 | Impressive | 5 | |
| 78 | 2022-12-29 | Loud, and very clear audio | 5 | |
| 167 | 2022-06-24 | Awesome! General balanced sound! | 5 | |
| 291 | 2021-10-29 | great job presonus | 5 | |
| 296 | 2021-10-17 | Amazing | 5 | |
| 387 | 2021-04-12 | The best speakers I have ever owned | 5 | |
| 563 | 2020-05-07 | Amazing Sound. Best I have ever heard | 5 | |
| 601 | 2020-02-14 | Good quality | 3 | |


We have some missing review contents, but thanks to the title and the rating we can still draw some sentiment insights. In fact, we can already say that these empty reviews are all highly positive, apart from the last one (601), which is fairly neutral: its title comments on the good quality of the product but nothing more.
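
The check above only catches contents that are exactly the empty string. As a hedged sketch on invented toy data (not the scraped reviews), the same idea can be extended to also flag `None`/NaN values and whitespace-only strings:

```python
import pandas as pd

# toy data, invented for illustration; not taken from the scraped reviews
toy = pd.DataFrame({
    "Title": ["Great deal.", "Good sound", "Impressive", "Okay"],
    "Content": ["", "Very happy with purchase.", None, "   "],
})

# treat NaN/None, empty strings and whitespace-only strings as missing content
is_blank = toy["Content"].fillna("").str.strip().eq("")

print(toy[is_blank])
```

Whether whitespace-only contents should count as missing is a judgment call; here they are, because they carry no text to analyze.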

## Data Cleaning and PreProcessing

In [30]:
# convert everything to lower case first, to maximize matching of stopwords (and more)
df['Title'] = df['Title'].str.lower()
df['Content'] = df['Content'].str.lower()

### DATE

Quick date manipulation to obtain 3 columns Day, Month and Year:

In [31]:
# splitting day, month and year in 3 separate columns
date = df['Date'].str.split("-", n=2, expand=True)      # splitting all the values in the column at most 2 times
df['Year'] = date[0].astype(int)
df['Month'] = date[1].astype(int)
df['Day'] = date[2].astype(int)
df = df.drop(['Date'], axis=1)
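
An alternative to splitting the strings by hand is to parse the column once and read the components through the `.dt` accessor. This is only a sketch on a toy DataFrame (the two dates are made-up examples in the same YYYY-MM-DD format), not the approach used in this notebook:

```python
import pandas as pd

# toy dates in the same YYYY-MM-DD format produced by format_date
toy = pd.DataFrame({"Date": ["2023-05-28", "2020-02-14"]})

# parse once, then read the components through the .dt accessor
parsed = pd.to_datetime(toy["Date"], format="%Y-%m-%d")
toy["Year"] = parsed.dt.year
toy["Month"] = parsed.dt.month
toy["Day"] = parsed.dt.day
toy = toy.drop(columns=["Date"])

print(toy)
```

Parsing has the side benefit of failing on malformed dates, whereas string splitting would silently accept them until the `astype(int)` step.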

### DIY SENTIMENT

Function that creates a new column named 'Sentiment' based on the customer rating.

In [32]:
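def rating_2_sentiment(rating):
    # map a 1-5 star rating to a sentiment label
    if rating == 3:
        sentiment = 'Neutral'
    elif rating == 4 or rating == 5:
        sentiment = 'Positive'
    elif rating == 1 or rating == 2:
        sentiment = 'Negative'
    else:
        # fail loudly on unexpected values instead of hitting an unbound variable
        raise ValueError(f'Unexpected rating: {rating}')
    return sentiment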

In [33]:
# apply function to Rating column
df['Sentiment'] = df['Rating'].apply(rating_2_sentiment)
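
The same rule can also be written as a plain dictionary lookup, which makes the rating-to-label mapping visible at a glance. The sketch below is an illustration only: `RATING_TO_SENTIMENT`, `rating_to_sentiment_sketch` and the "Unknown" fallback label are hypothetical names and choices, not part of the notebook.

```python
# rating -> sentiment label, mirroring the rule used in this notebook
RATING_TO_SENTIMENT = {
    1: "Negative",
    2: "Negative",
    3: "Neutral",
    4: "Positive",
    5: "Positive",
}

def rating_to_sentiment_sketch(rating):
    # unknown ratings get an explicit label instead of raising an error
    return RATING_TO_SENTIMENT.get(rating, "Unknown")

print([rating_to_sentiment_sketch(r) for r in (1, 3, 5, 0)])
```

With pandas, the dictionary could be passed directly to `Series.map`, in which case ratings without an entry become NaN.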

### CLEAN

Putting the title + the content of the review in a new column for completeness. This is useful not only for evaluating the sentiment, but also for keeping a readable full-text version of each observation.

The functions below remove unnecessary parts (URLs, special characters, punctuation) and replace English contractions with their expansions using the "re" module.

In [34]:
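# Removing special characters
def clean(content):
    content = re.sub(r'\W+', ' ', content)                            # every run of non-word characters (punctuation, emoji, symbols) becomes one space
    content = re.sub(r'[:;=]\s*[-]?[)D(\[\]/\\OpP]', '', content)     # text emoticons; no-op here, since the line above already removed ':', ';' and '='
    content = re.sub(r'[^\w\s]', '', content)                         # leftover punctuation; also a no-op after the first substitution
    return content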

def clean_urls(content):
    return re.sub(r'http\S+', '', content)

# Expansion of english contractions
def contraction_expansion(content):
    content = re.sub(r"won\'t", "will not", content)
    content = re.sub(r"can\'t", "can not", content)
    content = re.sub(r"don\'t", "do not", content)
    content = re.sub(r"shouldn\'t", "should not", content)
    content = re.sub(r"needn\'t", "need not", content)
    content = re.sub(r"hasn\'t", "has not", content)
    content = re.sub(r"haven\'t", "have not", content)
    content = re.sub(r"weren\'t", "were not", content)
    content = re.sub(r"mightn\'t", "might not", content)
    content = re.sub(r"didn\'t", "did not", content)
    content = re.sub(r"n\'t", " not", content)
    return content

#Data preprocessing
def data_cleaning(content):
    # remove the URLs first
    content = clean_urls(content)
    content = contraction_expansion(content)
    content = clean(content)   
    return content
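
To see why the order of the steps matters, here is a reduced, self-contained sketch of the pipeline: URLs are removed first, contractions are expanded while the apostrophes are still present, and only then are non-word characters stripped. The sample sentence and URL are invented, `expand_contractions` and `clean_sketch` are hypothetical names, only a subset of the contractions is covered, "won't" is expanded to "will not", and the final `strip()` is an addition for tidier output.

```python
import re

def expand_contractions(text):
    # a few specific cases first, then the generic "n't" -> " not" rule
    text = re.sub(r"won't", "will not", text)
    text = re.sub(r"can't", "can not", text)
    text = re.sub(r"n't", " not", text)
    return text

def clean_sketch(text):
    text = re.sub(r"http\S+", "", text)     # drop URLs
    text = expand_contractions(text)        # must run before apostrophes are stripped
    text = re.sub(r"\W+", " ", text)        # punctuation, emoji and symbols -> single space
    return text.strip()

sample = "passive speaker won't connect.. faulty :( see https://example.com/help"
print(clean_sketch(sample))
```

If the non-word characters were stripped before expanding contractions, "won't" would turn into "won t" and the negation would be lost.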

In [35]:
# join title and content with '. ' even though some titles already end with punctuation;
# the extra punctuation is cleaned up in the Clean Review column
df['Review'] = df['Title'] + '. ' + df['Content']
df['Clean Review'] = df['Title'] + '. ' + df['Content']
df['Clean Review'] = df['Clean Review'].apply(data_cleaning)

### STOP WORDS

Creating a column without stop words. Stop words are words that do not affect the overall sentiment of a review. However, the default NLTK stop-word list contains words such as not, hasn't and wouldn't, which actually convey a negative sentiment. Removing them would end up contradicting the target variable (sentiment), so I curated a list of stop words that carry no negative sentiment and have no negative alternatives.

Obviously these are just some of the many possible stop words, but I'll use them as a reference for some EDA.

In [36]:
# stop_words = stopwords.words('english')
# print(stop_words)
stop_words = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y']
new_stopwords = ["would", "shall", "could", "might"]
stop_words.extend(new_stopwords)
len(stop_words)

142

In [37]:
# overwrite the Clean Review column, dropping every word that appears in the stop_words list
# (lambda function + generator expression over the whitespace-split tokens)
df['Clean Review'] = df['Clean Review'].apply(lambda x: ' '.join(word for word in x.split() if word not in stop_words))
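
The effect of keeping negations out of the stop-word list can be shown on a single sentence. The sketch below uses a deliberately tiny, hypothetical stop-word set (`stop_words_sketch`) and an invented sentence; the notebook itself uses the much longer list defined above.

```python
# tiny stop-word set for illustration; the notebook uses a much longer list
stop_words_sketch = {"i", "the", "is", "a", "it", "with", "very"}

def remove_stop_words(text):
    # keep every token that is not a stop word; "not" survives on purpose
    return " ".join(word for word in text.split() if word not in stop_words_sketch)

print(remove_stop_words("the sound is not very good"))
```

Had "not" been treated as a stop word, the sentence would have been reduced to "sound good", flipping its apparent sentiment.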

My objective is to have different columns:
- Title and content of reviews
- Title + content column with stopwords, punctuation and emoticons to maintain the context of the sentence
- Title + content column without stopwords, punctuation and emoticons to then create the sparse matrix and train the models.

Let's now visualize the complete DataFrame.

In [38]:
# reorganize DataFrame
df = df[['Clean Review', 'Review', 'Title', 'Content', 'Rating', 'Sentiment', 'Year', 'Month', 'Day']]
df.head(15)

Unnamed: 0,Clean Review,Review,Title,Content,Rating,Sentiment,Year,Month,Day
0,good sound seem decent build quality good sound happy purchase,good sound. seem decent build quality and good sound. very happy with purchase.,good sound,seem decent build quality and good sound. very happy with purchase.,5,Positive,2023,5,28
1,not realise bad audio setup considering used quite respectable setup many years ago feel trap using bluetooth speakers leave lot desired price provide perfect audio experience filling 3x3m room hardly auditorium quality amazing feel like subwoofer genres average listening experience without neighbours complaining beautiful almost brings tear eye,"i didn't realise how bad my audio setup was. considering i used to have quite a respectable setup many years ago i've feel into a trap of using bluetooth speakers which leave a lot to be desired. for the price these provide a perfect audio experience. now i'm only filling a 3x3m room so hardly an auditorium but the quality is amazing. i do feel like i could do with a subwoofer for some genres but for the average listening experience, without the neighbours complaining, it's beautiful. almost brings a tear to my eye.",i didn't realise how bad my audio setup was,"considering i used to have quite a respectable setup many years ago i've feel into a trap of using bluetooth speakers which leave a lot to be desired. for the price these provide a perfect audio experience. now i'm only filling a 3x3m room so hardly an auditorium but the quality is amazing. i do feel like i could do with a subwoofer for some genres but for the average listening experience, without the neighbours complaining, it's beautiful. almost brings a tear to my eye.",5,Positive,2023,5,27
2,beauty love compact shape sound,its a beauty. love its compact shape and sound,its a beauty,love its compact shape and sound,5,Positive,2023,5,26
3,perfect upgrade pc audio ruining cheap pc speakers blow mind clean sound small frame good value money,"perfect upgrade for pc audio. if you ruining cheap pc speakers those will blow your mind. clean sound, small frame, good value for money.",perfect upgrade for pc audio,"if you ruining cheap pc speakers those will blow your mind. clean sound, small frame, good value for money.",5,Positive,2023,5,23
4,wonderful monitors years since using monitors use dt770 250 audio work wow pleased added studio first sound sound quite flat good accurately eq sound bass even meaning not exaggerated mids well balanced sound great high end also fairly well balanced get nice air snap eqing bayer dynamic dt770 250 find headphones air snap bit highs default lower mids probably little punch not much going back forth 770 monitors similar ish experience build quality size petit size not take much room build absolutely fine price no issues standard connections means speaker wire phono 3 5mm unbalanced recommend using trs trs help eliminate noise emf backs also access bass treble knobs help tune sound room front power switch 3 5mm ports headphones phone play music phone via correct cable usually 3 5mm 3 5mm note headphone not powerful enough power 250ohm headphones overall monitors brilliant well worth price,"wonderful monitors. it's been some years since using any monitors as i use dt770 250's for all my audio work and wow, i'm so pleased i added these to my studio...first the sound...the sound is quite flat which is good as you can accurately eq your sound. the bass is ""even"", meaning that it's there but it's not exaggerated. the mids are very well balanced (they sound great) and the high end is also fairly well balanced and you can get some nice air and snap when eqing. for those that have the bayer dynamic dt770 250's, you'll find your headphones have more ""air"" and snap a bit more in the highs by default, the lower mids probably have a little more punch, but not much. going back and forth between the 770's and the monitors should be a similar(ish) experience...build quality and size..very petit in size, they don't take up much room. the build is absolutely fine for the price. no issues there...you have standard connections by means of speaker wire and phono to 3.5mm (unbalanced). 
i'd recommend using some trs to trs to help eliminate noise from emf...on the backs you also have access to bass and treble knobs to help tune the sound to your room...on the front you have the power switch and 3.5mm ports for headphones and phone so you can play music from your phone via the correct cable, usually 3.5mm to 3.5mm. just note that the headphone in isn't powerful enough to power 250ohm headphones...overall, these monitors are brilliant and well worth the price.",wonderful monitors,"it's been some years since using any monitors as i use dt770 250's for all my audio work and wow, i'm so pleased i added these to my studio...first the sound...the sound is quite flat which is good as you can accurately eq your sound. the bass is ""even"", meaning that it's there but it's not exaggerated. the mids are very well balanced (they sound great) and the high end is also fairly well balanced and you can get some nice air and snap when eqing. for those that have the bayer dynamic dt770 250's, you'll find your headphones have more ""air"" and snap a bit more in the highs by default, the lower mids probably have a little more punch, but not much. going back and forth between the 770's and the monitors should be a similar(ish) experience...build quality and size..very petit in size, they don't take up much room. the build is absolutely fine for the price. no issues there...you have standard connections by means of speaker wire and phono to 3.5mm (unbalanced). i'd recommend using some trs to trs to help eliminate noise from emf...on the backs you also have access to bass and treble knobs to help tune the sound to your room...on the front you have the power switch and 3.5mm ports for headphones and phone so you can play music from your phone via the correct cable, usually 3.5mm to 3.5mm. just note that the headphone in isn't powerful enough to power 250ohm headphones...overall, these monitors are brilliant and well worth the price.",5,Positive,2023,5,21
5,not loud enough probably suited bedroom use not anything bass kept cutting highs mids times also not great opted krk instead,not loud enough. these are probably suited for bedroom use and not anything more. the bass kept cutting out with the highs and mids at times also doing the same. not great at all - i opted for krk’s instead!,not loud enough,these are probably suited for bedroom use and not anything more. the bass kept cutting out with the highs and mids at times also doing the same. not great at all - i opted for krk’s instead!,2,Negative,2023,5,20
6,faulty passive speaker not connect main speaker faulty disappointed,faulty. passive speaker won't connect to the main speaker.. faulty.very disappointed,faulty,passive speaker won't connect to the main speaker.. faulty.very disappointed,1,Negative,2023,5,18
7,brilliant monitor speakers dj ing absolutely love speakers perfect dj ing home loud enough little boogie kids sound quality great glad research ahead time build quality really good volume pot nice smooth great home music production gaming well worth investment,"brilliant monitor speakers for dj'ing. absolutely love these speakers - perfect for dj'ing at home and loud enough for a little boogie with the kids. sound quality is great and i'm glad i did my research ahead of time. build quality is really good, the volume pot is nice and smooth...these would be great for home music production or for gaming...well worth the investment.",brilliant monitor speakers for dj'ing,"absolutely love these speakers - perfect for dj'ing at home and loud enough for a little boogie with the kids. sound quality is great and i'm glad i did my research ahead of time. build quality is really good, the volume pot is nice and smooth...these would be great for home music production or for gaming...well worth the investment.",5,Positive,2023,5,17
8,good not good harman kardon soundticks good not good harman kardon soundticks though smaller overall,very good but not as good as the harman kardon soundticks. very good but not as good as the harman kardon soundticks..though these are smaller overall,very good but not as good as the harman kardon soundticks,very good but not as good as the harman kardon soundticks..though these are smaller overall,4,Positive,2023,5,14
9,crackling sound connected bluetooth like preface review saying tried 2 different sets speakers bluetooth version brand new exact issue must model issue rather one problem first pair bought constant crackling buzzing noise coming passive speaker connected bluetooth noise disappeared disconnected bluetooth appeared instantly connected audible low volumes ruined listening experience exchanged another set second set exact problem tried connecting different bluetooth device tried moving speakers different rooms moving electrical devices away swapping cables around noise not disappear no matter tried really disappointing massively turned away purchasing presonus products started returns process 2nd may 2023 posted item day still not refund today 12th may 2023 love update refund please,"crackling sound when connected to bluetooth. i’d like to preface this review by saying i tried 2 different sets of these speakers (both the bluetooth version), brand new, and both had the exact same issue, so must be a model issue rather than a one-off problem...the first pair i bought had a constant crackling/buzzing noise coming from the passive speaker when connected to bluetooth. the noise disappeared when disconnected from bluetooth, but appeared instantly when connected. it was audible at low volumes which ruined the listening experience so i exchanged for another set...this second set had the exact same problem. i tried connecting a different bluetooth device, i tried moving the speakers to different rooms, moving electrical devices away, swapping cables around… but the noise would not disappear no matter what i tried...really disappointing and has massively turned me away from purchasing presonus products again...i started the returns process on 2nd may 2023, posted the item the same day, and still have not had a refund - today is 12th may 2023. 
would love an update on my refund please?",crackling sound when connected to bluetooth,"i’d like to preface this review by saying i tried 2 different sets of these speakers (both the bluetooth version), brand new, and both had the exact same issue, so must be a model issue rather than a one-off problem...the first pair i bought had a constant crackling/buzzing noise coming from the passive speaker when connected to bluetooth. the noise disappeared when disconnected from bluetooth, but appeared instantly when connected. it was audible at low volumes which ruined the listening experience so i exchanged for another set...this second set had the exact same problem. i tried connecting a different bluetooth device, i tried moving the speakers to different rooms, moving electrical devices away, swapping cables around… but the noise would not disappear no matter what i tried...really disappointing and has massively turned me away from purchasing presonus products again...i started the returns process on 2nd may 2023, posted the item the same day, and still have not had a refund - today is 12th may 2023. would love an update on my refund please?",1,Negative,2023,5,12


In [39]:
df.shape

(740, 9)

We ended up with a pandas DataFrame of 740 reviews, each with a date, title, rating, content (sometimes empty) and a DIY sentiment column. Let's jump into the analysis.

In [40]:
df.to_csv('G:/Il mio Drive/MAGISTRALE/IT Coding/Project/Sentiment-Analysis-on-Amazon-product-reviews/Data/clean_reviews.csv')