# <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>1 | About</b></div>

Sentiment analysis of customer reviews of Apple on Trustpilot, using Requests, BeautifulSoup, NLTK and TextBlob.

## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>2 | Data overview</b></div>
- Web-scraped reviews from Apple's Trustpilot page (6,080 samples, 2011 to 2023)
- Created additional features from the reviews for a more in-depth data analysis
    - word_count, char_count, average_word_length, stopword_count, stopword_rate
- Pre-processed the data
    - lowercasing, removing punctuation and stop words, removing frequently recurring words via a custom stop word list, and lemmatization

## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>3 | Stack</b></div>

- Requests
- BeautifulSoup
- pandas
- NumPy
- NLTK
- TextBlob
- Data Cleaning and Interpretation


## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>4 | Extracting and collecting business reviews</b></div>

In [1]:
# imports
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

In [2]:
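# container for all scraped review texts
reviews = []

# pages 1 to 304 (np.arange excludes the stop value 305), covering reviews from 2011 to 2023
pages = np.arange(1, 305, 1)

# main loop: for each page, scrape all reviews
for page in pages:
    # keep the response in its own variable instead of overwriting the loop variable
    response = requests.get("https://www.trustpilot.com/review/www.apple.com" + "?page=" + str(page))
    soup = BeautifulSoup(response.text, "html.parser")

    # review containers; this auto-generated class name reflects Trustpilot's markup at scrape time and may change
    review_div = soup.find_all("div", class_="styles_reviewContent__0Q2Tg")

    # extracting the review text from each container
    for container in review_div:
        raw_content = container.find("p")
        # skip containers without a text paragraph instead of failing on None
        if raw_content is not None:
            reviews.append(raw_content.text)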
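To make the parsing step concrete without hitting the network, here is a minimal sketch that uses only the standard library's `html.parser` on a hand-written HTML fragment. The markup is an assumption modelled on the selectors above (a `div` with class `styles_reviewContent__0Q2Tg` wrapping a `p`); real Trustpilot pages contain much more structure. The hypothetical `ReviewTextParser` class mirrors `find_all(...)` plus `find("p")`: it keeps the first paragraph of each container and skips containers that have none.

```python
from html.parser import HTMLParser

# class name copied from the scraper above; assumed to match the page markup
REVIEW_CLASS = "styles_reviewContent__0Q2Tg"


class ReviewTextParser(HTMLParser):
    """Collect the text of the first <p> inside each review container <div>."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._div_depth = 0   # > 0 while inside a review container (counts nested divs)
        self._in_p = False    # True while reading the container's first <p>
        self._seen_p = False  # True once the current container's <p> has been read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            if self._div_depth:
                self._div_depth += 1
            elif REVIEW_CLASS in classes:
                self._div_depth = 1
                self._seen_p = False
        elif tag == "p" and self._div_depth and not self._seen_p:
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.reviews.append("".join(self._buffer))
            self._in_p = False
            self._seen_p = True
        elif tag == "div" and self._div_depth:
            self._div_depth -= 1

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


# hypothetical page fragment: two review containers and one unrelated div
page_html = """
<div class="styles_reviewContent__0Q2Tg"><h2>Title</h2><p>Best Products ever</p></div>
<div class="other"><p>Not a review</p></div>
<div class="styles_reviewContent__0Q2Tg"><p>Good shop!! :))))</p></div>
"""

parser = ReviewTextParser()
parser.feed(page_html)
parser.close()
print(parser.reviews)  # ['Best Products ever', 'Good shop!! :))))']
```

BeautifulSoup is the more convenient choice for the real pages; the sketch only shows what the two lookups select.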

## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>5 | Data Cleaning and Features creation</b></div>

In [3]:
# creating dataframe
df = pd.DataFrame(np.array(reviews), columns=["review"])
df.head()

|   | review |
|---|---|
| 0 | Forgot my screen lock code ok my fault. So it ... |
| 1 | accidnetly pressed the wrong button on my phon... |
| 2 | Booked an appointment to have screen protector... |
| 3 | Their phones are so glitchy. Things just start... |
| 4 | Though I like the Apple products, god forbid y... |


In [4]:
# number of reviews
len(df["review"])

6080

In [5]:
# creating word_count feature for each review
df["word_count"] = df["review"].apply(lambda x: len(x.split()))
df.head()

|   | review | word_count |
|---|---|---|
| 0 | Forgot my screen lock code ok my fault. So it ... | 83 |
| 1 | accidnetly pressed the wrong button on my phon... | 63 |
| 2 | Booked an appointment to have screen protector... | 62 |
| 3 | Their phones are so glitchy. Things just start... | 48 |
| 4 | Though I like the Apple products, god forbid y... | 135 |


In [6]:
# creating character count feature for each review
df["char_count"] = df["review"].apply(lambda x: len(x))
df.head()

|   | review | word_count | char_count |
|---|---|---|---|
| 0 | Forgot my screen lock code ok my fault. So it ... | 83 | 427 |
| 1 | accidnetly pressed the wrong button on my phon... | 63 | 350 |
| 2 | Booked an appointment to have screen protector... | 62 | 326 |
| 3 | Their phones are so glitchy. Things just start... | 48 | 288 |
| 4 | Though I like the Apple products, god forbid y... | 135 | 756 |


In [7]:
# function to retrieve the average length of words
def average_words(x):
    words = x.split()
    return sum(len(word) for word in words) / len(words)

In [8]:
# creating average word length for each review
df["average_word_length"] = df["review"].apply(lambda x: average_words(x))
df.head()

|   | review | word_count | char_count | average_word_length |
|---|---|---|---|---|
| 0 | Forgot my screen lock code ok my fault. So it ... | 83 | 427 | 4.156627 |
| 1 | accidnetly pressed the wrong button on my phon... | 63 | 350 | 4.555556 |
| 2 | Booked an appointment to have screen protector... | 62 | 326 | 4.241935 |
| 3 | Their phones are so glitchy. Things just start... | 48 | 288 | 5.020833 |
| 4 | Though I like the Apple products, god forbid y... | 135 | 756 | 4.6 |
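The three length features are related but not interchangeable: `char_count` includes spaces and punctuation, while `average_word_length` only counts the characters of whitespace-separated tokens (punctuation attached to a word still counts). A small pure-Python sketch of the same definitions, checked against review 1240 (`Good shop!! :))))`, 3 words, 17 characters, average 5.0) from the sorted table further down. The helper name `length_features` and the zero-word guard are assumptions for illustration, not part of the notebook.

```python
def length_features(review: str) -> dict:
    """Word count, character count and average word length, as defined in the notebook."""
    words = review.split()              # whitespace tokenisation, punctuation stays attached
    word_count = len(words)
    char_count = len(review)            # every character, including spaces and punctuation
    # guard against empty reviews (an assumption; every scraped review has at least one word)
    average_word_length = sum(len(w) for w in words) / word_count if word_count else 0.0
    return {
        "word_count": word_count,
        "char_count": char_count,
        "average_word_length": average_word_length,
    }


features = length_features("Good shop!! :))))")
print(features)  # {'word_count': 3, 'char_count': 17, 'average_word_length': 5.0}

# for single-spaced text, the difference between the two character totals is the number of spaces
gap = features["char_count"] - features["word_count"] * features["average_word_length"]
print(gap)  # 2.0
```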


In [9]:
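# importing NLTK and downloading its stop word corpus (skipped if already present)
import nltk
nltk.download("stopwords")
from nltk.corpus import stopwords

# English stop words, used for the features below and for stop word removal in section 6;
# a set makes the membership tests faster than a list
stop_words = set(stopwords.words("english"))

# creating two more features
# stopword_count: number of whitespace tokens that are stop words (case-insensitive)
df["stopword_count"] = df["review"].apply(lambda x: len([word for word in x.split() if word.lower() in stop_words]))
# stopword_rate: share of the review's words that are stop words
df["stopword_rate"] = df["stopword_count"] / df["word_count"]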

In [10]:
df.sort_values(by="stopword_rate")

|   | review | word_count | char_count | average_word_length | stopword_count | stopword_rate |
|---|---|---|---|---|---|---|
| 1162 | Today I've received 24 inch mac. Feels nice | 8 | 43 | 4.500000 | 0 | 0.000000 |
| 1240 | Good shop!! :)))) | 3 | 17 | 5.000000 | 0 | 0.000000 |
| 5806 | Expensive. | 1 | 10 | 10.000000 | 0 | 0.000000 |
| 2763 | scammers xoxo | 2 | 13 | 6.000000 | 0 | 0.000000 |
| 620 | Incredible! Bad service... | 3 | 26 | 8.000000 | 0 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... |
| 2507 | Do NOT tell what I can and can not read or wha... | 29 | 126 | 3.379310 | 20 | 0.689655 |
| 2031 | I am on the phone with costumer service and sh... | 29 | 148 | 4.137931 | 20 | 0.689655 |
| 4256 | Too overrated and such is the price for any se... | 10 | 52 | 4.300000 | 7 | 0.700000 |
| 5833 | This is a very helpful thing and its funny to | 10 | 45 | 3.600000 | 7 | 0.700000 |
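The stop word features can be checked by hand on review 5833 above: 7 of its 10 words are stop words, giving a rate of 0.7. The sketch below mirrors the notebook's rule (lowercase each whitespace token, then test membership) but, to stay self-contained, uses a small hand-picked subset of NLTK's English list instead of the full `stopwords.words("english")`; the helper name `stopword_features` is hypothetical.

```python
# small subset of NLTK's English stop words, enough for these examples (assumption: not the full list)
STOP_WORDS = frozenset({"this", "is", "a", "very", "and", "its", "to", "i", "it"})


def stopword_features(review: str, stop_words=STOP_WORDS) -> tuple:
    """Return (stopword_count, stopword_rate) using the notebook's token rule."""
    words = review.split()
    count = sum(1 for w in words if w.lower() in stop_words)
    rate = count / len(words) if words else 0.0
    return count, rate


print(stopword_features("This is a very helpful thing and its funny to"))  # (7, 0.7)

# punctuation is still attached at this stage, so "it." is not matched; only "I" counts
print(stopword_features("I like it."))  # (1, 0.3333333333333333)
```

The second call shows why these raw-text counts slightly underestimate stop word usage: tokens such as `fault.` or `it.` only match after punctuation removal.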


In [11]:
df.describe()

|   | word_count | char_count | average_word_length | stopword_count | stopword_rate |
|---|---|---|---|---|---|
| count | 6080.0 | 6080.0 | 6080.0 | 6080.0 | 6080.0 |
| mean | 106.790625 | 575.440132 | 4.508994 | 50.607072 | 0.440333 |
| std | 118.343695 | 635.878216 | 0.707333 | 58.858325 | 0.098719 |
| min | 1.0 | 10.0 | 2.4 | 0.0 | 0.0 |
| 25% | 33.0 | 176.0 | 4.166667 | 14.0 | 0.404762 |
| 50% | 71.0 | 384.0 | 4.430769 | 33.0 | 0.461538 |
| 75% | 135.0 | 728.25 | 4.727553 | 64.0 | 0.5 |
| max | 1020.0 | 5323.0 | 31.333333 | 536.0 | 0.714286 |


## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>6 | Data Preprocessing</b></div>

### <b><span style='color:#58A2A8'>6.1</span> | Removing redundant words</b>

In [12]:
df.review

0       Forgot my screen lock code ok my fault. So it ...
1       accidnetly pressed the wrong button on my phon...
2       Booked an appointment to have screen protector...
3       Their phones are so glitchy. Things just start...
4       Though I like the Apple products, god forbid y...
                              ...                        
6075    I have owned a Mac since 1997 and never looked...
6076    If there is anything electronic that I need in...
6077    Apple is one of the best e-companies I bought ...
6078    Down the years I have bought Apple products an...
6079    Apple are one of the best electronic companies...
Name: review, Length: 6080, dtype: object

In [13]:
# transforming reviews to lowercase
df["lowercase"] = df["review"].apply(lambda x: " ".join(word.lower() for word in x.split()))
df.head()

|   | review | word_count | char_count | average_word_length | stopword_count | stopword_rate | lowercase |
|---|---|---|---|---|---|---|---|
| 0 | Forgot my screen lock code ok my fault. So it ... | 83 | 427 | 4.156627 | 42 | 0.506024 | forgot my screen lock code ok my fault. so it ... |
| 1 | accidnetly pressed the wrong button on my phon... | 63 | 350 | 4.555556 | 26 | 0.412698 | accidnetly pressed the wrong button on my phon... |
| 2 | Booked an appointment to have screen protector... | 62 | 326 | 4.241935 | 27 | 0.435484 | booked an appointment to have screen protector... |
| 3 | Their phones are so glitchy. Things just start... | 48 | 288 | 5.020833 | 19 | 0.395833 | their phones are so glitchy. things just start... |
| 4 | Though I like the Apple products, god forbid y... | 135 | 756 | 4.6 | 60 | 0.444444 | though i like the apple products, god forbid y... |


In [14]:
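# removing punctuation: drop every character that is neither a word character nor whitespace
# (raw string for the pattern, and regex=True because pandas >= 2.0 treats the pattern literally by default)
df["punctuation"] = df["lowercase"].str.replace(r"[^\w\s]", "", regex=True)
df.head()
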
|   | review | word_count | char_count | average_word_length | stopword_count | stopword_rate | lowercase | punctuation |
|---|---|---|---|---|---|---|---|---|
| 0 | Forgot my screen lock code ok my fault. So it ... | 83 | 427 | 4.156627 | 42 | 0.506024 | forgot my screen lock code ok my fault. so it ... | forgot my screen lock code ok my fault so it a... |
| 1 | accidnetly pressed the wrong button on my phon... | 63 | 350 | 4.555556 | 26 | 0.412698 | accidnetly pressed the wrong button on my phon... | accidnetly pressed the wrong button on my phon... |
| 2 | Booked an appointment to have screen protector... | 62 | 326 | 4.241935 | 27 | 0.435484 | booked an appointment to have screen protector... | booked an appointment to have screen protector... |
| 3 | Their phones are so glitchy. Things just start... | 48 | 288 | 5.020833 | 19 | 0.395833 | their phones are so glitchy. things just start... | their phones are so glitchy things just start ... |
| 4 | Though I like the Apple products, god forbid y... | 135 | 756 | 4.6 | 60 | 0.444444 | though i like the apple products, god forbid y... | though i like the apple products god forbid yo... |
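The punctuation step relies on the regular expression `[^\w\s]`, which matches any character that is neither a word character nor whitespace. A standard-library sketch of the same lowercase-then-strip sequence shows why contractions such as `dont`, `im` and `cant` show up in the frequency counts later on: the apostrophe is deleted rather than the contraction being expanded. The function name `normalise` is hypothetical.

```python
import re

# same pattern as the pandas call: anything that is not a word character or whitespace
PUNCTUATION = re.compile(r"[^\w\s]")


def normalise(review: str) -> str:
    """Lowercase (collapsing whitespace, as the notebook's split/join does) and strip punctuation."""
    lowered = " ".join(word.lower() for word in review.split())
    return PUNCTUATION.sub("", lowered)


print(normalise("I don't know, I'm done!"))  # i dont know im done
```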


In [15]:
# removing stop words 
df["stopwords"] = df["punctuation"].apply(lambda x: " ".join(word for word in x.split() if word not in stop_words))
df.head()

|   | review | word_count | char_count | average_word_length | stopword_count | stopword_rate | lowercase | punctuation | stopwords |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Forgot my screen lock code ok my fault. So it ... | 83 | 427 | 4.156627 | 42 | 0.506024 | forgot my screen lock code ok my fault. so it ... | forgot my screen lock code ok my fault so it a... | forgot screen lock code ok fault ask appleid p... |
| 1 | accidnetly pressed the wrong button on my phon... | 63 | 350 | 4.555556 | 26 | 0.412698 | accidnetly pressed the wrong button on my phon... | accidnetly pressed the wrong button on my phon... | accidnetly pressed wrong button phone disabili... |
| 2 | Booked an appointment to have screen protector... | 62 | 326 | 4.241935 | 27 | 0.435484 | booked an appointment to have screen protector... | booked an appointment to have screen protector... | booked appointment screen protectors fitted ip... |
| 3 | Their phones are so glitchy. Things just start... | 48 | 288 | 5.020833 | 19 | 0.395833 | their phones are so glitchy. things just start... | their phones are so glitchy things just start ... | phones glitchy things start acting like banner... |
| 4 | Though I like the Apple products, god forbid y... | 135 | 756 | 4.6 | 60 | 0.444444 | though i like the apple products, god forbid y... | though i like the apple products god forbid yo... | though like apple products god forbid forget p... |
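One side effect of this step is worth keeping in mind for the sentiment scores later: NLTK's English stop word list includes negations such as `not`, `no` and `nor`, so they are removed together with the filler words. The sketch below uses a small subset of that list (an assumption for self-containment) and a hypothetical `remove_stop_words` helper that applies the notebook's filtering rule to a short phrase.

```python
# subset of NLTK's English stop words, including the negations it contains (assumption: not the full list)
STOP_WORDS = frozenset({"the", "is", "was", "not", "no", "nor", "very"})


def remove_stop_words(text: str, stop_words=STOP_WORDS) -> str:
    """Drop every whitespace token found in the stop word set, as in the notebook."""
    return " ".join(word for word in text.split() if word not in stop_words)


print(remove_stop_words("the service was not good"))  # service good

# keeping negations by subtracting them from the stop word set before filtering
print(remove_stop_words("the service was not good", STOP_WORDS - {"not", "no", "nor"}))  # service not good
```

Whether to keep negations depends on the downstream use; for polarity scoring they carry meaning that the plain stop word list discards.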


In [16]:
# word frequency count across all reviews, used to spot frequently recurring words
pd.Series(" ".join(df["stopwords"]).split()).value_counts()[:30]

apple       11700
phone        5001
service      2640
customer     2527
iphone       2497
would        2236
get          2213
new          2183
store        1919
one          1828
back         1749
products     1742
time         1724
told         1601
dont         1506
even         1427
never        1392
support      1388
buy          1364
said         1358
company      1292
product      1264
like         1168
money        1158
could        1139
days         1027
years        1025
im           1024
cant         1019
got          1006
dtype: int64
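`pd.Series(...).value_counts()` on the joined tokens is a plain word-frequency count. The same idea with `collections.Counter` from the standard library, on a few toy reviews (hypothetical data, not from the scraped set), looks like this; `most_common(n)` plays the role of `value_counts()[:n]`, although ties may be ordered differently.

```python
from collections import Counter

# toy cleaned reviews (hypothetical data, not from the scraped set)
cleaned = [
    "apple phone service customer",
    "apple store phone",
    "apple support",
]

# join all reviews, split into tokens and count occurrences
counts = Counter(" ".join(cleaned).split())
print(counts.most_common(2))  # [('apple', 3), ('phone', 2)]
```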

In [17]:
# removing frequently recurring, low-information words identified in the count above
other_stop_words = ["would", "get", "one", "told", "even", "said", "days"] # a lot more can be added

# putting together the cleaned pre-processed review
df["cleaned_review"] = df["stopwords"].apply(lambda x: " ".join(word for word in x.split() if word not in other_stop_words))
pd.Series(" ".join(df["cleaned_review"]).split()).value_counts()[:30]

apple       11700
phone        5001
service      2640
customer     2527
iphone       2497
new          2183
store        1919
back         1749
products     1742
time         1724
dont         1506
never        1392
support      1388
buy          1364
company      1292
product      1264
like         1168
money        1158
could        1139
years        1025
im           1024
cant         1019
got          1006
call          991
issue         983
still         976
use           964
pro           930
bought        929
another       922
dtype: int64

In [18]:
df.head()

|   | review | word_count | char_count | average_word_length | stopword_count | stopword_rate | lowercase | punctuation | stopwords | cleaned_review |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Forgot my screen lock code ok my fault. So it ... | 83 | 427 | 4.156627 | 42 | 0.506024 | forgot my screen lock code ok my fault. so it ... | forgot my screen lock code ok my fault so it a... | forgot screen lock code ok fault ask appleid p... | forgot screen lock code ok fault ask appleid p... |
| 1 | accidnetly pressed the wrong button on my phon... | 63 | 350 | 4.555556 | 26 | 0.412698 | accidnetly pressed the wrong button on my phon... | accidnetly pressed the wrong button on my phon... | accidnetly pressed wrong button phone disabili... | accidnetly pressed wrong button phone disabili... |
| 2 | Booked an appointment to have screen protector... | 62 | 326 | 4.241935 | 27 | 0.435484 | booked an appointment to have screen protector... | booked an appointment to have screen protector... | booked appointment screen protectors fitted ip... | booked appointment screen protectors fitted ip... |
| 3 | Their phones are so glitchy. Things just start... | 48 | 288 | 5.020833 | 19 | 0.395833 | their phones are so glitchy. things just start... | their phones are so glitchy things just start ... | phones glitchy things start acting like banner... | phones glitchy things start acting like banner... |
| 4 | Though I like the Apple products, god forbid y... | 135 | 756 | 4.6 | 60 | 0.444444 | though i like the apple products, god forbid y... | though i like the apple products god forbid yo... | though like apple products god forbid forget p... | though like apple products god forbid forget p... |


### <b><span style='color:#58A2A8'>6.2</span> | Lemmatization using TextBlob</b>

In [19]:
# imports
import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')
from textblob import Word

[nltk_data] Downloading package wordnet to /usr/share/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /usr/share/nltk_data...


In [20]:
# lemmatizing the cleaned review
df["lemmatized"] = df["cleaned_review"].apply(lambda x: " ".join(Word(word).lemmatize() for word in x.split()))
df.head()

|   | review | word_count | char_count | average_word_length | stopword_count | stopword_rate | lowercase | punctuation | stopwords | cleaned_review | lemmatized |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Forgot my screen lock code ok my fault. So it ... | 83 | 427 | 4.156627 | 42 | 0.506024 | forgot my screen lock code ok my fault. so it ... | forgot my screen lock code ok my fault so it a... | forgot screen lock code ok fault ask appleid p... | forgot screen lock code ok fault ask appleid p... | forgot screen lock code ok fault ask appleid p... |
| 1 | accidnetly pressed the wrong button on my phon... | 63 | 350 | 4.555556 | 26 | 0.412698 | accidnetly pressed the wrong button on my phon... | accidnetly pressed the wrong button on my phon... | accidnetly pressed wrong button phone disabili... | accidnetly pressed wrong button phone disabili... | accidnetly pressed wrong button phone disabili... |
| 2 | Booked an appointment to have screen protector... | 62 | 326 | 4.241935 | 27 | 0.435484 | booked an appointment to have screen protector... | booked an appointment to have screen protector... | booked appointment screen protectors fitted ip... | booked appointment screen protectors fitted ip... | booked appointment screen protector fitted iph... |
| 3 | Their phones are so glitchy. Things just start... | 48 | 288 | 5.020833 | 19 | 0.395833 | their phones are so glitchy. things just start... | their phones are so glitchy things just start ... | phones glitchy things start acting like banner... | phones glitchy things start acting like banner... | phone glitchy thing start acting like banner n... |
| 4 | Though I like the Apple products, god forbid y... | 135 | 756 | 4.6 | 60 | 0.444444 | though i like the apple products, god forbid y... | though i like the apple products god forbid yo... | though like apple products god forbid forget p... | though like apple products god forbid forget p... | though like apple product god forbid forget pa... |


## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>7 | Sentiment Analysis</b></div>

In [21]:
# imports
from textblob import TextBlob

In [22]:
# polarity: from -1 to 1, where -1 indicates negative sentiment, 0 neutral sentiment, and 1 positive sentiment
# subjectivity: from 0 to 1, where 0 indicates an objective statement and 1 a subjective statement
# computing TextBlob's sentiment once per review, then unpacking both scores
sentiment = df["lemmatized"].apply(lambda x: TextBlob(x).sentiment)
df["polarity"] = sentiment.apply(lambda s: s.polarity)
df["subjectivity"] = sentiment.apply(lambda s: s.subjectivity)

In [23]:
# dropping the intermediate preprocessing columns to keep the view readable
df.drop(columns=["lowercase", "punctuation", "stopwords", "cleaned_review", "lemmatized"], inplace=True)
df.head()

|   | review | word_count | char_count | average_word_length | stopword_count | stopword_rate | polarity | subjectivity |
|---|---|---|---|---|---|---|---|---|
| 0 | Forgot my screen lock code ok my fault. So it ... | 83 | 427 | 4.156627 | 42 | 0.506024 | 0.333333 | 0.333333 |
| 1 | accidnetly pressed the wrong button on my phon... | 63 | 350 | 4.555556 | 26 | 0.412698 | 0.075 | 0.625 |
| 2 | Booked an appointment to have screen protector... | 62 | 326 | 4.241935 | 27 | 0.435484 | 0.025 | 0.322917 |
| 3 | Their phones are so glitchy. Things just start... | 48 | 288 | 5.020833 | 19 | 0.395833 | 0.0 | 0.0 |
| 4 | Though I like the Apple products, god forbid y... | 135 | 756 | 4.6 | 60 | 0.444444 | -0.138095 | 0.54881 |


In [24]:
df.describe()

|   | word_count | char_count | average_word_length | stopword_count | stopword_rate | polarity | subjectivity |
|---|---|---|---|---|---|---|---|
| count | 6080.0 | 6080.0 | 6080.0 | 6080.0 | 6080.0 | 6080.0 | 6080.0 |
| mean | 106.790625 | 575.440132 | 4.508994 | 50.607072 | 0.440333 | 0.030779 | 0.476655 |
| std | 118.343695 | 635.878216 | 0.707333 | 58.858325 | 0.098719 | 0.298574 | 0.218315 |
| min | 1.0 | 10.0 | 2.4 | 0.0 | 0.0 | -1.0 | 0.0 |
| 25% | 33.0 | 176.0 | 4.166667 | 14.0 | 0.404762 | -0.1 | 0.366667 |
| 50% | 71.0 | 384.0 | 4.430769 | 33.0 | 0.461538 | 0.00533 | 0.483333 |
| 75% | 135.0 | 728.25 | 4.727553 | 64.0 | 0.5 | 0.171429 | 0.6 |
| max | 1020.0 | 5323.0 | 31.333333 | 536.0 | 0.714286 | 1.0 | 1.0 |


## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>8 | Evaluation and Conclusion</b></div>


In [25]:
df.sort_values(by="polarity")

|   | review | word_count | char_count | average_word_length | stopword_count | stopword_rate | polarity | subjectivity |
|---|---|---|---|---|---|---|---|---|
| 3318 | I like my keyboard that My husband purchased f... | 27 | 135 | 4.037037 | 14 | 0.518519 | -1.0 | 1.0 |
| 3109 | Apart from their tax evasion practices, they a... | 16 | 99 | 5.250000 | 6 | 0.375000 | -1.0 | 1.0 |
| 3894 | The worst experience of my life with apple pro... | 47 | 270 | 4.765957 | 25 | 0.531915 | -1.0 | 1.0 |
| 4749 | If I could give less, I would...they are terri... | 19 | 139 | 6.368421 | 8 | 0.421053 | -1.0 | 1.0 |
| 1167 | Horrrible Horrible phone do not be deceived b... | 37 | 216 | 4.837838 | 17 | 0.459459 | -1.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4341 | The best phones on the market, leaders in the ... | 22 | 127 | 4.818182 | 9 | 0.409091 | 1.0 | 0.3 |
| 4434 | Best customer service ...best service provider | 6 | 47 | 6.833333 | 0 | 0.000000 | 1.0 | 0.3 |
| 5773 | I just ordered Ipod Nano from apple store, i g... | 17 | 85 | 4.058824 | 7 | 0.411765 | 1.0 | 0.3 |
| 5788 | Best Products ever | 3 | 18 | 5.333333 | 0 | 0.000000 | 1.0 | 0.3 |


The analysis covers a medium-sized dataset of 6,080 reviews. Across it, the mean polarity is 0.030779 and the mean subjectivity is 0.476655.

The median polarity (0.00533) is also slightly above zero, so at least half of the reviews receive a positive score, and positive or neutral opinions are at least as common as negative ones. The scores are small in magnitude, however: the middle 50% of reviews lies between -0.1 and 0.171, which points to a relatively balanced sentiment overall.
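Summary statistics alone do not show how many reviews lean each way. A common follow-up is to bucket polarity into labels and count them; the sketch below does this in plain Python. The thresholds (above the neutral band is positive, below its negative is negative, everything else neutral) and the helper name `polarity_label` are assumptions, not part of the original analysis; the sample scores are the polarity values of the first five reviews in the output table above.

```python
from collections import Counter


def polarity_label(score: float, neutral_band: float = 0.0) -> str:
    """Map a TextBlob polarity score in [-1, 1] to a coarse label (thresholds are an assumption)."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# polarity of the first five reviews in the table above
scores = [0.333333, 0.075, 0.025, 0.0, -0.138095]
labels = [polarity_label(s) for s in scores]
print(labels)                 # ['positive', 'positive', 'positive', 'neutral', 'negative']
print(dict(Counter(labels)))  # {'positive': 3, 'neutral': 1, 'negative': 1}
```

With a wider band, for example `neutral_band=0.05`, weak scores such as 0.025 count as neutral. On the full DataFrame the same function could be applied with `df["polarity"].apply(polarity_label).value_counts()`.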

Subjectivity is moderate: its mean is about 0.48, and the middle 50% of reviews lies between 0.37 and 0.6. Some reviews are objective and based on factual information, but a substantial portion express subjective opinions or personal experiences.