# <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>1 | About</b></div>

Sentiment analysis of Apple's reviews on Trustpilot using Requests, BeautifulSoup, NLTK and TextBlob.

## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>2 | Data overview</b></div>
- Scraped business reviews from Apple's Trustpilot page
- Created additional features from the reviews for a more in-depth analysis
    - word_count, char_count, average_word_length, stopword_count, stopword_rate
- Pre-processed the text data
    - lowercasing, removing punctuation, stop words, and frequently recurring words (extra custom stop words), then lemmatization

## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>3 | Stack</b></div>

- Requests
- BeautifulSoup
- NLTK
- TextBlob
- pandas and NumPy (data cleaning and interpretation)


## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>4 | Extracting and collecting business reviews</b></div>

In [1]:
# imports
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

In [2]:
# all data container
reviews = []

# first 4 pages worth of data (np.arange excludes the stop value 5)
pages = np.arange(1, 5, 1)

# main loop, for each page scrape all reviews
for page in pages:
    response = requests.get("https://www.trustpilot.com/review/www.apple.com" + "?page=" + str(page))
    soup = BeautifulSoup(response.text, "html.parser")
    
    # getting reviews
    review_div = soup.find_all("div", class_="styles_reviewContent__0Q2Tg")
    
    # extracting data from review
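    for container in review_div:
        raw_content = container.find("p")
        # skip containers that have no <p> element, since find() returns None for them
        if raw_content is not None:
            reviews.append(raw_content.text)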
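
Note that `np.arange(1, 5, 1)` stops before 5, so the loop above visits pages 1 to 4. The following is a minimal sketch of building the page URLs separately from the download step, using only the standard library. The helper name `page_urls` and its `last_page` parameter are illustrative and not part of the original notebook.

```python
BASE_URL = "https://www.trustpilot.com/review/www.apple.com"


def page_urls(first_page, last_page):
    """Return review-page URLs for first_page..last_page, both inclusive."""
    # range() excludes its stop value, just like np.arange, hence the + 1
    return [f"{BASE_URL}?page={n}" for n in range(first_page, last_page + 1)]


urls = page_urls(1, 4)
print(len(urls))   # 4
print(urls[-1])    # https://www.trustpilot.com/review/www.apple.com?page=4
```

Making the upper bound inclusive keeps the "first N pages" comment and the actual page count in agreement.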

## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>5 | Exploratory Data Analysis and Features creation</b></div>

In [3]:
# creating dataframe
df = pd.DataFrame(np.array(reviews), columns=["review"])
df.head()

Unnamed: 0,review
0,Forgot my screen lock code ok my fault. So it ...
1,accidnetly pressed the wrong button on my phon...
2,Booked an appointment to have screen protector...
3,Their phones are so glitchy. Things just start...
4,"Though I like the Apple products, god forbid y..."


In [4]:
# number of reviews
len(df["review"])

80

In [5]:
# creating word_count feature for each review
df["word_count"] = df["review"].apply(lambda x: len(x.split()))
df.head()

Unnamed: 0,review,word_count
0,Forgot my screen lock code ok my fault. So it ...,83
1,accidnetly pressed the wrong button on my phon...,63
2,Booked an appointment to have screen protector...,62
3,Their phones are so glitchy. Things just start...,48
4,"Though I like the Apple products, god forbid y...",135


In [6]:
# creating character count feature for each review
df["char_count"] = df["review"].apply(lambda x: len(x))
df.head()

Unnamed: 0,review,word_count,char_count
0,Forgot my screen lock code ok my fault. So it ...,83,427
1,accidnetly pressed the wrong button on my phon...,63,350
2,Booked an appointment to have screen protector...,62,326
3,Their phones are so glitchy. Things just start...,48,288
4,"Though I like the Apple products, god forbid y...",135,756


In [7]:
# function to retrieve the average length of words
def average_words(x):
    words = x.split()
    return sum(len(word) for word in words) / len(words)
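
The function above divides by the number of words, so an empty or whitespace-only review would raise `ZeroDivisionError`. A guarded variant could look like the sketch below. The name `average_word_length` and the choice to return `0.0` for empty text are assumptions, not something the notebook defines.

```python
def average_word_length(text):
    """Mean characters per whitespace-separated word; 0.0 when there are no words."""
    words = text.split()
    if not words:
        # Assumed convention: an empty review has average word length 0.0
        return 0.0
    return sum(len(word) for word in words) / len(words)


print(average_word_length("high quality"))  # 5.5
print(average_word_length("   "))           # 0.0
```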

In [8]:
# creating average word length for each review
df["average_word_length"] = df["review"].apply(lambda x: average_words(x))
df.head()

Unnamed: 0,review,word_count,char_count,average_word_length
0,Forgot my screen lock code ok my fault. So it ...,83,427,4.156627
1,accidnetly pressed the wrong button on my phon...,63,350,4.555556
2,Booked an appointment to have screen protector...,62,326,4.241935
3,Their phones are so glitchy. Things just start...,48,288,5.020833
4,"Though I like the Apple products, god forbid y...",135,756,4.6


In [9]:
# importing NLTK
from nltk.corpus import stopwords

# will be used to also remove stopwords - english language as a basis
stop_words = stopwords.words("english")

# creating two more features
# stopword_count
df["stopword_count"] = df["review"].apply(lambda x: len([word for word in x.split() if word.lower() in stop_words]))
# stopword_rate
df["stopword_rate"] = df["stopword_count"] / df["word_count"]
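
To make the five features concrete, here is a small pure-Python sketch that computes them for a single review. The stop word set is a tiny illustrative stand-in for NLTK's English list, and the helper name `text_features` is hypothetical.

```python
# Tiny illustrative stop word set; the notebook uses NLTK's full English list.
SAMPLE_STOP_WORDS = {"of", "the", "a", "my", "is"}


def text_features(text, stop_words=SAMPLE_STOP_WORDS):
    """Compute the notebook's five text features for one review."""
    words = text.split()
    word_count = len(words)
    stopword_count = sum(1 for w in words if w.lower() in stop_words)
    return {
        "word_count": word_count,
        "char_count": len(text),
        "average_word_length": sum(len(w) for w in words) / word_count if word_count else 0.0,
        "stopword_count": stopword_count,
        "stopword_rate": stopword_count / word_count if word_count else 0.0,
    }


features = text_features("Production of high quality products")
print(features)
# {'word_count': 5, 'char_count': 35, 'average_word_length': 6.2,
#  'stopword_count': 1, 'stopword_rate': 0.2}
```

These values agree with review 77 in the notebook's output, which has 5 words, 35 characters, an average word length of 6.2, and a stop word rate of 0.2.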

In [10]:
df.sort_values(by="stopword_rate")

Unnamed: 0,review,word_count,char_count,average_word_length,stopword_count,stopword_rate
77,Production of high quality products,5,35,6.200000,1,0.200000
40,My iphone 13 pro max battery heats up whenever...,16,91,4.750000,4,0.250000
24,"Great tech always enjoy buying, altho I do not...",12,62,4.250000,3,0.250000
12,Great benefits like full health insurance and ...,19,117,5.210526,6,0.315789
37,Horrible customer service. Hard to speak to a...,17,105,5.176471,6,0.352941
...,...,...,...,...,...,...
8,I bought a new iPhone. Not the first time over...,53,281,4.320755,29,0.547170
51,I purchased a pair of apple AirPods Pro 2020Du...,146,735,4.041096,82,0.561644
23,Bought a pair of AirPod maxes 10 months ago ha...,93,485,4.225806,53,0.569892
52,After purchasing a phone for my daughter last ...,784,3892,3.965561,450,0.573980


In [11]:
df.describe()

Unnamed: 0,word_count,char_count,average_word_length,stopword_count,stopword_rate
count,80.0,80.0,80.0,80.0,80.0
mean,109.675,600.675,4.53469,51.2,0.445344
std,119.742633,639.164086,0.417291,61.189144,0.06872
min,5.0,35.0,3.653846,1.0,0.2
25%,46.5,244.0,4.274038,21.25,0.416766
50%,77.0,413.0,4.487263,38.0,0.449562
75%,124.5,712.25,4.763197,58.5,0.489362
max,784.0,3892.0,6.2,450.0,0.576923


## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>6 | Data Preprocessing</b></div>

### <b><span style='color:#58A2A8'>6.1</span> | Removing redundant words</b>

In [12]:
df.review

0     Forgot my screen lock code ok my fault. So it ...
1     accidnetly pressed the wrong button on my phon...
2     Booked an appointment to have screen protector...
3     Their phones are so glitchy. Things just start...
4     Though I like the Apple products, god forbid y...
                            ...                        
75    Hi I am proud that I purchased a new Apple I p...
76    I just had a phone rep do a screen share and t...
77                  Production of high quality products
78    i bought airpods 2 this winter and still dont ...
79    I went into an apple retail store one of the s...
Name: review, Length: 80, dtype: object

In [13]:
# transforming reviews to lowercase
df["lowercase"] = df["review"].apply(lambda x: " ".join(word.lower() for word in x.split()))
df.head()

Unnamed: 0,review,word_count,char_count,average_word_length,stopword_count,stopword_rate,lowercase
0,Forgot my screen lock code ok my fault. So it ...,83,427,4.156627,42,0.506024,forgot my screen lock code ok my fault. so it ...
1,accidnetly pressed the wrong button on my phon...,63,350,4.555556,26,0.412698,accidnetly pressed the wrong button on my phon...
2,Booked an appointment to have screen protector...,62,326,4.241935,27,0.435484,booked an appointment to have screen protector...
3,Their phones are so glitchy. Things just start...,48,288,5.020833,19,0.395833,their phones are so glitchy. things just start...
4,"Though I like the Apple products, god forbid y...",135,756,4.6,60,0.444444,"though i like the apple products, god forbid y..."


In [14]:
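# removing punctuation (regex=True is needed since pandas 2.0, where str.replace matches literally by default)
df["punctuation"] = df["lowercase"].str.replace(r"[^\w\s]", "", regex=True)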
df.head()

  


Unnamed: 0,review,word_count,char_count,average_word_length,stopword_count,stopword_rate,lowercase,punctuation
0,Forgot my screen lock code ok my fault. So it ...,83,427,4.156627,42,0.506024,forgot my screen lock code ok my fault. so it ...,forgot my screen lock code ok my fault so it a...
1,accidnetly pressed the wrong button on my phon...,63,350,4.555556,26,0.412698,accidnetly pressed the wrong button on my phon...,accidnetly pressed the wrong button on my phon...
2,Booked an appointment to have screen protector...,62,326,4.241935,27,0.435484,booked an appointment to have screen protector...,booked an appointment to have screen protector...
3,Their phones are so glitchy. Things just start...,48,288,5.020833,19,0.395833,their phones are so glitchy. things just start...,their phones are so glitchy things just start ...
4,"Though I like the Apple products, god forbid y...",135,756,4.6,60,0.444444,"though i like the apple products, god forbid y...",though i like the apple products god forbid yo...
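
The pattern `[^\w\s]` matches any character that is neither a word character nor whitespace. The sketch below applies the same pattern with the standard library's `re` module to show its effect on single strings. The helper name `strip_punctuation` and the second sample sentence are illustrative.

```python
import re


def strip_punctuation(text):
    """Remove every character that is neither a word character nor whitespace."""
    return re.sub(r"[^\w\s]", "", text)


print(strip_punctuation("ok my fault. so it"))    # ok my fault so it
print(strip_punctuation("i don't know, sorry!"))  # i dont know sorry
```

Because apostrophes are removed too, contractions such as "don't" become "dont", which is likely why `dont` shows up in the word frequency count later and has to be added to the custom stop words.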


In [15]:
# removing stop words 
df["stopwords"] = df["punctuation"].apply(lambda x: " ".join(word for word in x.split() if word not in stop_words))
df.head()

Unnamed: 0,review,word_count,char_count,average_word_length,stopword_count,stopword_rate,lowercase,punctuation,stopwords
0,Forgot my screen lock code ok my fault. So it ...,83,427,4.156627,42,0.506024,forgot my screen lock code ok my fault. so it ...,forgot my screen lock code ok my fault so it a...,forgot screen lock code ok fault ask appleid p...
1,accidnetly pressed the wrong button on my phon...,63,350,4.555556,26,0.412698,accidnetly pressed the wrong button on my phon...,accidnetly pressed the wrong button on my phon...,accidnetly pressed wrong button phone disabili...
2,Booked an appointment to have screen protector...,62,326,4.241935,27,0.435484,booked an appointment to have screen protector...,booked an appointment to have screen protector...,booked appointment screen protectors fitted ip...
3,Their phones are so glitchy. Things just start...,48,288,5.020833,19,0.395833,their phones are so glitchy. things just start...,their phones are so glitchy things just start ...,phones glitchy things start acting like banner...
4,"Though I like the Apple products, god forbid y...",135,756,4.6,60,0.444444,"though i like the apple products, god forbid y...",though i like the apple products god forbid yo...,though like apple products god forbid forget p...


In [16]:
# creating a frequency count to track recurring words
pd.Series(" ".join(df["stopwords"]).split()).value_counts()[:30]

apple        134
phone         55
service       42
iphone        40
customer      29
time          29
one           28
back          27
would         27
even          25
new           25
get           25
dont          24
like          23
pro           20
ipad          20
issue         19
products      19
never         19
buy           18
money         16
called        16
refund        16
help          16
company       16
ever          16
care          15
customers     14
use           14
screen        14
dtype: int64
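
The same frequency count can be sketched without pandas using `collections.Counter`. The helper name `top_words` and the sample texts are made up for illustration.

```python
from collections import Counter


def top_words(texts, n):
    """Return the n most frequent whitespace-separated tokens across all texts."""
    counts = Counter(word for text in texts for word in text.split())
    return counts.most_common(n)


sample = ["apple phone service", "apple phone", "apple"]
print(top_words(sample, 2))  # [('apple', 3), ('phone', 2)]
```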

In [17]:
# removing recurring low-information words after analysis
other_stop_words = ["would", "even", "get", "dont", "ever", "told"] # a lot more can be added

# putting together the cleaned pre-processed review
df["cleaned_review"] = df["stopwords"].apply(lambda x: " ".join(word for word in x.split() if word not in other_stop_words))
pd.Series(" ".join(df["cleaned_review"]).split()).value_counts()[:30]

apple        134
phone         55
service       42
iphone        40
customer      29
time          29
one           28
back          27
new           25
like          23
pro           20
ipad          20
never         19
issue         19
products      19
buy           18
refund        16
called        16
money         16
help          16
company       16
care          15
cant          14
screen        14
days          14
call          14
use           14
customers     14
store         13
order         13
dtype: int64

In [18]:
df.head()

Unnamed: 0,review,word_count,char_count,average_word_length,stopword_count,stopword_rate,lowercase,punctuation,stopwords,cleaned_review
0,Forgot my screen lock code ok my fault. So it ...,83,427,4.156627,42,0.506024,forgot my screen lock code ok my fault. so it ...,forgot my screen lock code ok my fault so it a...,forgot screen lock code ok fault ask appleid p...,forgot screen lock code ok fault ask appleid p...
1,accidnetly pressed the wrong button on my phon...,63,350,4.555556,26,0.412698,accidnetly pressed the wrong button on my phon...,accidnetly pressed the wrong button on my phon...,accidnetly pressed wrong button phone disabili...,accidnetly pressed wrong button phone disabili...
2,Booked an appointment to have screen protector...,62,326,4.241935,27,0.435484,booked an appointment to have screen protector...,booked an appointment to have screen protector...,booked appointment screen protectors fitted ip...,booked appointment screen protectors fitted ip...
3,Their phones are so glitchy. Things just start...,48,288,5.020833,19,0.395833,their phones are so glitchy. things just start...,their phones are so glitchy things just start ...,phones glitchy things start acting like banner...,phones glitchy things start acting like banner...
4,"Though I like the Apple products, god forbid y...",135,756,4.6,60,0.444444,"though i like the apple products, god forbid y...",though i like the apple products god forbid yo...,though like apple products god forbid forget p...,though like apple products god forbid forget p...


### <b><span style='color:#58A2A8'>6.2</span> | Lemmatization using TextBlob</b>

In [19]:
# imports
import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')
from textblob import Word

[nltk_data] Downloading package wordnet to /usr/share/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /usr/share/nltk_data...


In [20]:
# lemmatizing the cleaned review
df["lemmatized"] = df["cleaned_review"].apply(lambda x: " ".join(Word(word).lemmatize() for word in x.split()))
df.head()

Unnamed: 0,review,word_count,char_count,average_word_length,stopword_count,stopword_rate,lowercase,punctuation,stopwords,cleaned_review,lemmatized
0,Forgot my screen lock code ok my fault. So it ...,83,427,4.156627,42,0.506024,forgot my screen lock code ok my fault. so it ...,forgot my screen lock code ok my fault so it a...,forgot screen lock code ok fault ask appleid p...,forgot screen lock code ok fault ask appleid p...,forgot screen lock code ok fault ask appleid p...
1,accidnetly pressed the wrong button on my phon...,63,350,4.555556,26,0.412698,accidnetly pressed the wrong button on my phon...,accidnetly pressed the wrong button on my phon...,accidnetly pressed wrong button phone disabili...,accidnetly pressed wrong button phone disabili...,accidnetly pressed wrong button phone disabili...
2,Booked an appointment to have screen protector...,62,326,4.241935,27,0.435484,booked an appointment to have screen protector...,booked an appointment to have screen protector...,booked appointment screen protectors fitted ip...,booked appointment screen protectors fitted ip...,booked appointment screen protector fitted iph...
3,Their phones are so glitchy. Things just start...,48,288,5.020833,19,0.395833,their phones are so glitchy. things just start...,their phones are so glitchy things just start ...,phones glitchy things start acting like banner...,phones glitchy things start acting like banner...,phone glitchy thing start acting like banner n...
4,"Though I like the Apple products, god forbid y...",135,756,4.6,60,0.444444,"though i like the apple products, god forbid y...",though i like the apple products god forbid yo...,though like apple products god forbid forget p...,though like apple products god forbid forget p...,though like apple product god forbid forget pa...


## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>7 | Sentiment Analysis</b></div>

In [21]:
# imports
from textblob import TextBlob

In [22]:
# polarity: from -1 to 1, where -1 indicates negative sentiment, 0 indicates neutral sentiment, and 1 indicates positive sentiment
# subjectivity: from 0 to 1, where 0 indicates an objective statement, and 1 indicates a subjective statement
df["polarity"] = df["lemmatized"].apply(lambda x: TextBlob(x).sentiment.polarity)
df["subjectivity"] = df["lemmatized"].apply(lambda x: TextBlob(x).sentiment.subjectivity)

In [23]:
# drop the intermediate pre-processing columns for a cleaner view
df.drop(["lowercase", "punctuation", "stopwords", "cleaned_review", "lemmatized"], axis=1, inplace=True)
df.head()

Unnamed: 0,review,word_count,char_count,average_word_length,stopword_count,stopword_rate,polarity,subjectivity
0,Forgot my screen lock code ok my fault. So it ...,83,427,4.156627,42,0.506024,0.333333,0.333333
1,accidnetly pressed the wrong button on my phon...,63,350,4.555556,26,0.412698,0.075,0.625
2,Booked an appointment to have screen protector...,62,326,4.241935,27,0.435484,0.025,0.322917
3,Their phones are so glitchy. Things just start...,48,288,5.020833,19,0.395833,0.0,0.0
4,"Though I like the Apple products, god forbid y...",135,756,4.6,60,0.444444,-0.138095,0.54881


In [24]:
df.describe()

Unnamed: 0,word_count,char_count,average_word_length,stopword_count,stopword_rate,polarity,subjectivity
count,80.0,80.0,80.0,80.0,80.0,80.0,80.0
mean,109.675,600.675,4.53469,51.2,0.445344,-0.002497,0.480238
std,119.742633,639.164086,0.417291,61.189144,0.06872,0.31683,0.187184
min,5.0,35.0,3.653846,1.0,0.2,-1.0,0.0
25%,46.5,244.0,4.274038,21.25,0.416766,-0.126522,0.381237
50%,77.0,413.0,4.487263,38.0,0.449562,0.0,0.491818
75%,124.5,712.25,4.763197,58.5,0.489362,0.152052,0.584403
max,784.0,3892.0,6.2,450.0,0.576923,1.0,1.0


## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>8 | Evaluation and Conclusion</b></div>


In [25]:
df.sort_values(by="polarity")

Unnamed: 0,review,word_count,char_count,average_word_length,stopword_count,stopword_rate,polarity,subjectivity
71,Apple has the worst customer service support.S...,48,280,4.854167,22,0.458333,-1.000000,1.000000
27,Every time yall update the software its always...,23,142,5.217391,9,0.391304,-0.800000,0.900000
15,I put my mobile charging morning 4 am that tim...,47,218,3.659574,23,0.489362,-0.700000,0.666667
70,Trying to get anybody for support is ridiculou...,52,275,4.307692,24,0.461538,-0.544444,0.933333
34,I bought AirPods and they are the worst headph...,24,146,5.125000,11,0.458333,-0.533333,0.800000
...,...,...,...,...,...,...,...,...
54,I have been using Iphone from past 2 years and...,28,156,4.607143,12,0.428571,0.437500,0.587500
24,"Great tech always enjoy buying, altho I do not...",12,62,4.250000,3,0.250000,0.600000,0.625000
62,waiting for the latest wear pro and till dat e...,14,73,4.285714,5,0.357143,0.650000,0.825000
20,Iphone is the best phone out there. Latest tec...,16,100,5.312500,6,0.375000,0.783333,0.733333


The analysis covers a small sample of 80 reviews, so the results are indicative only. The mean polarity is -0.0025 and the mean subjectivity is 0.48.

A mean polarity this close to zero, together with a median of 0.0 and an interquartile range of roughly -0.13 to 0.15, indicates that TextBlob scores the sample as roughly balanced. Negative and positive reviews largely offset each other, and most reviews sit near neutral. The figures do not show that negative opinions outnumber positive ones.

The mean subjectivity of 0.48 points to a moderate level of subjectivity. Some reviews read as factual descriptions, while a significant portion express personal opinions or experiences.
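
One way to look past the mean is to bucket each polarity score into a coarse label and count the labels. The sketch below does this for the first five reviews only, using the polarity values shown in the `df.head()` output above. The function name `label_polarity` and the `0.05` threshold are arbitrary assumptions for illustration, not values from TextBlob or from the notebook.

```python
def label_polarity(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a coarse sentiment label."""
    # threshold is an assumed cut-off around zero, chosen only for illustration
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Polarity of the first five reviews, copied from the df.head() output
head_polarity = [0.333333, 0.075, 0.025, 0.0, -0.138095]
labels = [label_polarity(p) for p in head_polarity]
print(labels)  # ['positive', 'positive', 'neutral', 'neutral', 'negative']
```

Applied to the full `polarity` column, counting these labels would show how many reviews fall on each side of neutral, which the mean alone cannot reveal.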