# <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>1 | About</b></div>

Sentiment analysis of customer reviews of Apple on TrustPilot, using BeautifulSoup for scraping, NLTK for stop words, and TextBlob for lemmatization and sentiment scoring.

## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>2 | Data overview</b></div>
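- Scraped 80 business reviews from the first four pages of Apple's TrustPilot listing
- Derived additional features from each review (word count, character count, average word length, stop-word count and rate) for a more in-depth analysis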

## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>3 | Stack</b></div>

- TextBlob 
- BeautifulSoup
- NLTK
- Data Cleaning and Interpretation


## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>4 | Extracting and collecting business reviews</b></div>

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

In [3]:
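reviews = []

# np.arange(1, 5, 1) yields pages 1 through 4 (the upper bound is exclusive)
pages = np.arange(1, 5, 1)

for page_number in pages:
    response = requests.get("https://www.trustpilot.com/review/www.apple.com" + "?page=" + str(page_number))
    soup = BeautifulSoup(response.text, "html.parser")

    # The class name is generated by the site and may change over time
    review_div = soup.find_all("div", class_="styles_reviewContent__0Q2Tg")

    for container in review_div:
        raw_content = container.find("p")
        # Skip containers without a <p> body instead of raising AttributeError
        if raw_content is not None:
            reviews.append(raw_content.text)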
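
The page URLs in the loop above are built by string concatenation. As a minimal sketch, the same URLs could be produced with the standard library's `urlencode`; the helper name `page_urls` is hypothetical and not part of the notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.trustpilot.com/review/www.apple.com"

def page_urls(base_url, first_page, last_page):
    """Build one URL per results page, including both end pages (hypothetical helper)."""
    return [
        f"{base_url}?{urlencode({'page': number})}"
        for number in range(first_page, last_page + 1)
    ]

# Pages 1 through 4, matching np.arange(1, 5, 1) in the cell above
urls = page_urls(BASE_URL, 1, 4)
for url in urls:
    print(url)
```

Taking an inclusive `last_page` makes the number of pages explicit, whereas `np.arange(1, 5, 1)` stops one short of its upper bound.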

## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>5 | Exploratory Data Analysis</b></div>

In [4]:
df = pd.DataFrame(np.array(reviews), columns=["review"])

In [5]:
len(df["review"])

80

In [6]:
df["word_count"] = df["review"].apply(lambda x: len(x.split()))

In [7]:
df["char_count"] = df["review"].apply(lambda x: len(x))

In [8]:
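def average_words(x):
    """Average word length (in characters) of a review; 0.0 for empty text."""
    words = x.split()
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)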

In [9]:
df["average_word_length"] = df["review"].apply(lambda x: average_words(x))

In [10]:
from nltk.corpus import stopwords

# Requires the NLTK stop-word corpus: nltk.download("stopwords")
stop_words = stopwords.words("english")

df["stopword-count"] = df["review"].apply(lambda x: len([word for word in x.split() if word.lower() in stop_words]))

df["stopword-rate"] = df["stopword-count"] / df["word_count"]
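
To make the derived features concrete, here is a small worked example on a single sentence, written in plain Python. The stop-word set `SAMPLE_STOP_WORDS` and the helper `review_features` are assumptions for illustration only; the notebook itself uses NLTK's English list and pandas `apply`.

```python
# Hypothetical mini stop-word set; the notebook uses NLTK's English list instead.
SAMPLE_STOP_WORDS = {"the", "is", "a", "and", "i", "my", "it"}

def review_features(text, stop_words=SAMPLE_STOP_WORDS):
    """Compute the per-review features from the cells above for one string."""
    words = text.split()
    word_count = len(words)
    stopword_count = sum(1 for word in words if word.lower() in stop_words)
    return {
        "word_count": word_count,
        "char_count": len(text),
        # Guard against empty reviews to avoid dividing by zero
        "average_word_length": sum(len(w) for w in words) / word_count if word_count else 0.0,
        "stopword-count": stopword_count,
        "stopword-rate": stopword_count / word_count if word_count else 0.0,
    }

features = review_features("The battery is great and I love it")
print(features)
# 8 words, 34 characters, 5 stop words ("The", "is", "and", "I", "it"), rate 5/8 = 0.625
```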

In [11]:
df.sort_values(by="stopword-rate")

Unnamed: 0,review,word_count,char_count,average_word_length,stopword-count,stopword-rate
52,Bought I Mac 4K in 2017 great at first now it’...,77,370,3.805195,23,0.298701
14,Apple sells very poor quality products. I had...,36,195,4.388889,11,0.305556
6,Five star service from Apple as always. Really...,25,150,5.040000,8,0.320000
29,Will never use apple again. Have always loved ...,65,344,4.307692,22,0.338462
72,Apple make terrible products that just don’t l...,69,401,4.826087,24,0.347826
...,...,...,...,...,...,...
48,If you have booked your appointment with servi...,86,423,3.930233,48,0.558140
47,I visited the Apple store today to have my Mac...,127,661,4.212598,73,0.574803
56,My review is about someone who has treated me ...,33,167,4.090909,19,0.575758
76,My AirPods were not working so when I worked w...,157,856,4.452229,91,0.579618


In [12]:
df.describe()

Unnamed: 0,word_count,char_count,average_word_length,stopword-count,stopword-rate
count,80.0,80.0,80.0,80.0,80.0
mean,86.375,464.3125,4.406054,39.8875,0.452061
std,51.439377,273.575324,0.355593,26.575421,0.067253
min,14.0,74.0,3.70229,6.0,0.298701
25%,56.75,306.0,4.204044,23.0,0.40611
50%,77.5,405.0,4.367003,34.0,0.450165
75%,107.5,585.5,4.64028,50.0,0.5
max,345.0,1705.0,5.225352,181.0,0.586957


## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>6 | Data Preprocessing</b></div>

### <b><span style='color:#58A2A8'>6.1</span> | Removing redundant words</b>

In [13]:
df.review

0     Absolutely terrible.. they indicated that an A...
1     A few days ago, I placed an order for a Macboo...
2     Why did you kill all non authorised charging w...
3     Technology is great until it’s not! Too much “...
4     Apple are turning into Microsoft, in the last ...
                            ...                        
75    My screen was working perfectly until I upgrad...
76    My AirPods were not working so when I worked w...
77    apple lies... spent 1.5 hours twice on the pho...
78    1. My Apple online order was marked as deliver...
79    I went to apple support looking to find a plac...
Name: review, Length: 80, dtype: object

In [14]:
# Lower casing
df["lowercase"] = df["review"].apply(lambda x: " ".join(word.lower() for word in x.split()))

In [None]:
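# Punctuation: drop every character that is neither a word character nor whitespace
# regex=True is required in pandas >= 2.0, where str.replace treats the pattern literally by default
df["punctuation"] = df["lowercase"].str.replace(r"[^\w\s]", "", regex=True)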
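
The punctuation step only works if `[^\w\s]` is interpreted as a regular expression. A quick way to check what the pattern removes is to try it on one string with the standard `re` module; the sample text below is adapted from one of the reviews, and the helper name `strip_punctuation` is hypothetical.

```python
import re

# Matches any character that is neither a word character nor whitespace
PUNCTUATION = re.compile(r"[^\w\s]")

def strip_punctuation(text):
    """Remove punctuation, including curly quotes and apostrophes (hypothetical helper)."""
    return PUNCTUATION.sub("", text)

sample = "technology is great until it’s not! too much “security”..."
cleaned = strip_punctuation(sample)
print(cleaned)  # technology is great until its not too much security
```

Note that apostrophes are removed too, which is why contractions show up as `dont` and `cant` in the frequency counts.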

In [16]:
# Stop words 
df["stopwords"] = df["punctuation"].apply(lambda x: " ".join(word for word in x.split() if word not in stop_words))

In [17]:
# Creating a frequency count to track recurring words
pd.Series(" ".join(df["stopwords"]).split()).value_counts()[:30]

apple       147
phone        50
iphone       31
new          30
customer     30
dont         28
service      27
support      27
would        26
back         24
company      23
years        21
never        21
get          19
use          18
time         18
2            16
old          16
products     16
one          15
store        15
people       15
problem      15
password     15
money        15
3            15
call         14
id           14
still        14
buy          14
dtype: int64

In [18]:
other_stop_words = ["get", "told"] # a lot more can be added, testing required
df["cleaned_review"] = df["stopwords"].apply(lambda x: " ".join(word for word in x.split() if word not in other_stop_words))
pd.Series(" ".join(df["cleaned_review"]).split()).value_counts()[:30]

apple       147
phone        50
iphone       31
customer     30
new          30
dont         28
support      27
service      27
would        26
back         24
company      23
never        21
years        21
time         18
use          18
products     16
old          16
2            16
password     15
3            15
one          15
store        15
problem      15
money        15
people       15
still        14
buy          14
call         14
said         14
id           14
dtype: int64
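
The custom stop-word list above is chosen by hand. One possible way to shortlist further candidates, sketched here under assumptions, is to look at document frequency: the share of reviews in which a word appears at least once. The function `document_frequency`, the sample reviews, and the 0.75 cutoff are all hypothetical and would need testing on the real data.

```python
from collections import Counter

def document_frequency(reviews):
    """Share of reviews in which each word appears at least once."""
    counts = Counter()
    for review in reviews:
        counts.update(set(review.split()))  # set(): count a word once per review
    total = len(reviews)
    return {word: n / total for word, n in counts.items()} if total else {}

# Made-up, already-cleaned reviews for illustration
sample_reviews = [
    "apple phone broke told wait",
    "apple support told call back",
    "apple store great service",
    "told apple refund never came",
]

shares = document_frequency(sample_reviews)
# Words present in at least 75% of reviews carry little signal for telling reviews apart
candidates = sorted(word for word, share in shares.items() if share >= 0.75)
print(candidates)  # ['apple', 'told']
```

Any word shortlisted this way should still be reviewed manually, since a frequent word such as `never` can carry sentiment.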

In [19]:
df.head()

Unnamed: 0,review,word_count,char_count,average_word_length,stopword-count,stopword-rate,lowercase,punctuation,stopwords,cleaned_review
0,Absolutely terrible.. they indicated that an A...,111,581,4.243243,60,0.540541,absolutely terrible.. they indicated that an a...,absolutely terrible they indicated that an app...,absolutely terrible indicated apple watch woul...,absolutely terrible indicated apple watch woul...
1,"A few days ago, I placed an order for a Macboo...",102,530,4.205882,50,0.490196,"a few days ago, i placed an order for a macboo...",a few days ago i placed an order for a macbook...,days ago placed order macbook pro phone used o...,days ago placed order macbook pro phone used o...
2,Why did you kill all non authorised charging w...,38,221,4.842105,15,0.394737,why did you kill all non authorised charging w...,why did you kill all non authorised charging w...,kill non authorised charging wires plain bulli...,kill non authorised charging wires plain bulli...
3,Technology is great until it’s not! Too much “...,65,366,4.646154,31,0.476923,technology is great until it’s not! too much “...,technology is great until its not too much sec...,technology great much security cant access acc...,technology great much security cant access acc...
4,"Apple are turning into Microsoft, in the last ...",69,365,4.304348,33,0.478261,"apple are turning into microsoft, in the last ...",apple are turning into microsoft in the last t...,apple turning microsoft last two weeks ive iss...,apple turning microsoft last two weeks ive iss...


### <b><span style='color:#58A2A8'>6.2</span> | Lemmatization using TextBlob</b>

In [None]:
import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')

from textblob import Word

In [21]:
df["lemmatized"] = df["cleaned_review"].apply(lambda x: " ".join(Word(word).lemmatize() for word in x.split()))

## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>7 | Sentiment Analysis</b></div>

In [22]:
from textblob import TextBlob

In [23]:
# TextBlob returns two metrics per text: polarity and subjectivity
# polarity: how negative (-1) or positive (+1) a review is
# subjectivity: how objective (0) or subjective (1) a review is
df["polarity"] = df["lemmatized"].apply(lambda x: TextBlob(x).sentiment.polarity)
df["subjectivity"] = df["lemmatized"].apply(lambda x: TextBlob(x).sentiment.subjectivity)

In [24]:
df.drop(["lowercase", "punctuation", "stopwords", "cleaned_review", "lemmatized"], axis=1, inplace=True)

In [25]:
df.describe()

Unnamed: 0,word_count,char_count,average_word_length,stopword-count,stopword-rate,polarity,subjectivity
count,80.0,80.0,80.0,80.0,80.0,80.0,80.0
mean,86.375,464.3125,4.406054,39.8875,0.452061,-0.009627,0.457418
std,51.439377,273.575324,0.355593,26.575421,0.067253,0.236562,0.205552
min,14.0,74.0,3.70229,6.0,0.298701,-0.816667,0.0
25%,56.75,306.0,4.204044,23.0,0.40611,-0.11875,0.335078
50%,77.5,405.0,4.367003,34.0,0.450165,0.0,0.452232
75%,107.5,585.5,4.64028,50.0,0.5,0.12822,0.589712
max,345.0,1705.0,5.225352,181.0,0.586957,1.0,1.0


## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>8 | Performance and Evaluation</b></div>

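The reviews are close to neutral on average (mean polarity = -0.009627, median = 0.0), with individual scores ranging from -0.816667 to 1.0. Sorting by polarity surfaces the most negative reviews first.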

In [26]:
df.sort_values(by="polarity")

Unnamed: 0,review,word_count,char_count,average_word_length,stopword-count,stopword-rate,polarity,subjectivity
26,"Really bad camera for iphone 13, I’m so disapp...",27,151,4.629630,10,0.370370,-0.816667,0.805556
35,Moved countries and will not let me reset pass...,79,404,4.126582,43,0.544304,-0.600000,0.900000
15,"Any Apple device is bulls*it, upgrade to iOS 1...",41,228,4.585366,15,0.365854,-0.496212,0.818182
30,Just wanted to say how pathetic it was that I ...,55,294,4.363636,27,0.490909,-0.466667,0.366667
67,ZERO star is more accurate. I bought my Airpod...,111,625,4.639640,61,0.549550,-0.430000,0.576667
...,...,...,...,...,...,...,...,...
36,"Ok well, the keyboard letter N was giving doub...",136,710,4.198529,62,0.455882,0.244761,0.541573
10,Longtime Apple user I’ve been lucky enough to ...,93,514,4.526882,42,0.451613,0.259259,0.487037
47,I visited the Apple store today to have my Mac...,127,661,4.212598,73,0.574803,0.320000,0.420000
71,"I would give a zero if I could, I ordered some...",101,507,4.029703,53,0.524752,0.328571,0.378571
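
One way to summarize the sorted scores is to bucket polarity into labels. The sketch below is an assumption, not part of the original analysis: the helper `label_polarity` and the 0.1 threshold are hypothetical, and the sample scores are copied from the table above plus the median of 0.0.

```python
from collections import Counter

def label_polarity(score, threshold=0.1):
    """Map a polarity score in [-1, 1] to a coarse label (threshold is an assumption)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

# Polarity values taken from the sorted table, plus the median (0.0)
scores = [-0.816667, -0.600000, -0.496212, 0.0, 0.244761, 0.328571]
labels = [label_polarity(score) for score in scores]
summary = Counter(labels)
print(summary)
```

In the table above, review 71 opens with "I would give a zero if I could" yet scores 0.328571, so labels derived from polarity alone deserve a manual spot check.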
