# <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>1 | About</b></div>

Sentiment analysis of Apple's reviews on TrustPilot using BeautifulSoup, NLTK and TextBlob.

## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>2 | Data overview</b></div>
- Scraped business reviews from Apple's TrustPilot page
- Engineered additional features from the reviews (word count, character count, average word length, stop word rate) for a more in-depth analysis

## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>3 | Stack</b></div>

- TextBlob 
- BeautifulSoup
- NLTK
- Data Cleaning and Interpretation


## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>4 | Extracting and collecting business reviews</b></div>

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

In [2]:
reviews = []

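pages = range(1, 5)  # pages 1-4; the stop value is exclusive

for page_number in pages:
    # Keep the loop variable and the HTTP response in separate names
    response = requests.get(
        "https://www.trustpilot.com/review/www.apple.com",
        params={"page": page_number},
    )
    soup = BeautifulSoup(response.text, "html.parser")

    # This class name comes from the site's markup and may change over time
    review_div = soup.find_all("div", class_="styles_reviewContent__0Q2Tg")

    for container in review_div:
        raw_content = container.find("p")
        if raw_content is not None:  # skip containers without a review body
            reviews.append(raw_content.text)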
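
The page range used by the loop can be checked without sending any requests. The following is a minimal sketch, assuming the same `?page=N` query parameter as above; the helper name `page_urls` is hypothetical and not part of the notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.trustpilot.com/review/www.apple.com"


def page_urls(base_url, first_page, last_page):
    """Build one URL per results page; both bounds are inclusive (hypothetical helper)."""
    return [
        f"{base_url}?{urlencode({'page': page})}"
        for page in range(first_page, last_page + 1)
    ]


urls = page_urls(BASE_URL, 1, 4)
for url in urls:
    print(url)
```

The loop covers pages 1 through 4, because the stop value 5 is exclusive.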

## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>5 | Exploratory Data Analysis</b></div>

In [3]:
df = pd.DataFrame(np.array(reviews), columns=["review"])

In [4]:
len(df["review"])

80

In [5]:
df["word_count"] = df["review"].apply(lambda x: len(x.split()))

In [6]:
df["char_count"] = df["review"].str.len()

In [7]:
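def average_words(x):
    """Return the mean word length of a review (0.0 for an empty review)."""
    words = x.split()
    if not words:  # avoid ZeroDivisionError on empty text
        return 0.0
    return sum(len(word) for word in words) / len(words)

In [8]:
df["average_word_length"] = df["review"].apply(average_words)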

In [9]:
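import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")  # one-time download of the stop word corpus

# A set makes the membership test below faster than a list
stop_words = set(stopwords.words("english"))

df["stopword-count"] = df["review"].apply(lambda x: len([word for word in x.split() if word.lower() in stop_words]))

# Share of each review's words that are stop words
df["stopword-rate"] = df["stopword-count"] / df["word_count"]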
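
To make the engineered columns concrete, here is a small sketch that computes the same features for a single string in plain Python. The function name `review_features`, the sample sentence, and the three-word stop list are assumptions made for illustration; the notebook itself uses NLTK's English stop word list.

```python
def review_features(review, stop_words):
    """Compute the per-review features for one string (illustrative helper)."""
    words = review.split()
    stopword_count = sum(1 for word in words if word.lower() in stop_words)
    return {
        "word_count": len(words),
        "char_count": len(review),
        "average_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "stopword_count": stopword_count,
        "stopword_rate": stopword_count / len(words) if words else 0.0,
    }


# Hypothetical stop list standing in for NLTK's English list
toy_stop_words = {"the", "is", "but"}
features = review_features("The phone is great but the battery is bad", toy_stop_words)
print(features)
```

Because the text is split on whitespace only, punctuation stays attached to words at this stage: a token such as `the,` would not match a stop word, and its comma counts toward the word length.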

In [10]:
df.sort_values(by="stopword-rate")

| | review | word_count | char_count | average_word_length | stopword-count | stopword-rate |
|---|---|---|---|---|---|---|
| 53 | Long waiting times when calling apple support,... | 21 | 132 | 5.333333 | 4 | 0.190476 |
| 58 | Apple sells very poor quality products. I had... | 36 | 195 | 4.388889 | 11 | 0.305556 |
| 54 | Outstanding products and outstanding customer ... | 16 | 113 | 6.125000 | 5 | 0.312500 |
| 55 | Five star service from Apple as always. Really... | 25 | 150 | 5.040000 | 8 | 0.320000 |
| 63 | Will never use apple again. Have always loved ... | 65 | 344 | 4.307692 | 22 | 0.338462 |
| ... | ... | ... | ... | ... | ... | ... |
| 68 | Moved countries and will not let me reset pass... | 79 | 404 | 4.126582 | 43 | 0.544304 |
| 71 | Why on earth would a reputable company use Ube... | 207 | 1091 | 4.275362 | 113 | 0.545894 |
| 38 | Bought my Apple Ipad on the 5th of September.... | 64 | 336 | 4.250000 | 35 | 0.546875 |
| 28 | My experience it was the worst. No one was abl... | 192 | 962 | 4.015625 | 108 | 0.562500 |


In [11]:
df.describe()

| | word_count | char_count | average_word_length | stopword-count | stopword-rate |
|---|---|---|---|---|---|
| count | 80.0 | 80.0 | 80.0 | 80.0 | 80.0 |
| mean | 82.8875 | 451.5625 | 4.520797 | 37.8125 | 0.440984 |
| std | 46.159326 | 247.28904 | 0.430462 | 23.725804 | 0.067629 |
| min | 16.0 | 106.0 | 3.666667 | 4.0 | 0.190476 |
| 25% | 49.75 | 291.0 | 4.251576 | 19.5 | 0.4 |
| 50% | 74.0 | 396.0 | 4.477656 | 34.0 | 0.442561 |
| 75% | 111.75 | 589.75 | 4.819947 | 50.25 | 0.489216 |
| max | 207.0 | 1100.0 | 6.125 | 113.0 | 0.576087 |


## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>6 | Data Preprocessing</b></div>

### <b><span style='color:#58A2A8'>6.1</span> | Removing redundant words</b>

In [12]:
df.review

0     Missed half a day of my school work/ research ...
1     Why am I paying for AppleCare if they can't re...
2     Charging you $3500 for 3 newest phones and won...
3     When they work they’re good, but when things a...
4     Apple's customer support is terrible. It's sup...
                            ...                        
75    I have an IPhone 12 Pro Max and out of nowhere...
76    Zero stara, Purchsed a ipad and was required t...
77    apple iphone 7 support will stop in a month , ...
78    Really bad camera for iphone 13, I’m so disapp...
79    I brought my iPhone on the 15th, and already t...
Name: review, Length: 80, dtype: object

In [13]:
# Lower casing
df["lowercase"] = df["review"].apply(lambda x: " ".join(word.lower() for word in x.split()))

In [None]:
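# Punctuation: regex=True is required because pandas >= 2.0 treats the pattern as a literal string by default
df["punctuation"] = df["lowercase"].str.replace(r"[^\w\s]", "", regex=True)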

In [15]:
# Stop words 
df["stopwords"] = df["punctuation"].apply(lambda x: " ".join(word for word in x.split() if word not in stop_words))
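
The three cleaning steps (lowercasing, punctuation removal, stop word filtering) can be traced on a single sentence. This is a minimal sketch using only the standard library; the function name `clean`, the sample sentence, and the tiny stop list are assumptions for illustration and stand in for the pandas and NLTK calls above.

```python
import re

# Hypothetical stop list; note that it contains "don't", with the apostrophe
TOY_STOP_WORDS = {"i", "the", "it", "is", "don't"}


def clean(text, stop_words=TOY_STOP_WORDS):
    """Lowercase, strip punctuation, then drop stop words, in that order."""
    lowered = text.lower()
    no_punctuation = re.sub(r"[^\w\s]", "", lowered)  # same pattern as the notebook
    kept = [word for word in no_punctuation.split() if word not in stop_words]
    return " ".join(kept)


cleaned = clean("I don't like the phone, it is SLOW!")
print(cleaned)  # dont like phone slow
```

One consequence of this ordering: stripping the apostrophe turns `don't` into `dont`, which no longer matches the stop list entry. That may explain why tokens such as `dont` survive into the frequency counts.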

In [16]:
# Creating a frequency count to track recurring words
pd.Series(" ".join(df["stopwords"]).split()).value_counts()[:30]

apple       148
phone        43
products     29
service      29
customer     28
iphone       28
new          26
support      26
would        23
get          23
years        20
dont         19
time         19
company      18
help         18
store        18
one          16
said         16
use          16
money        15
repair       15
problem      15
need         14
5            14
even         14
issue        14
device       13
back         13
want         13
like         13
dtype: int64
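
The same kind of frequency count can be produced without pandas. The sketch below uses `collections.Counter` from the standard library; the helper name `top_words` and the three sample strings are made up for illustration.

```python
from collections import Counter


def top_words(texts, n):
    """Count whitespace-separated tokens across all texts and return the n most common."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return counts.most_common(n)


# Made-up sample, not taken from the scraped reviews
sample = ["apple phone support", "apple store", "phone repair apple"]
top_two = top_words(sample, 2)
print(top_two)  # [('apple', 3), ('phone', 2)]
```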

In [17]:
other_stop_words = ["get", "told"] # a lot more can be added, testing required
df["cleaned_review"] = df["stopwords"].apply(lambda x: " ".join(word for word in x.split() if word not in other_stop_words))
pd.Series(" ".join(df["cleaned_review"]).split()).value_counts()[:30]

apple       148
phone        43
products     29
service      29
customer     28
iphone       28
support      26
new          26
would        23
years        20
time         19
dont         19
help         18
store        18
company      18
said         16
use          16
one          16
repair       15
money        15
problem      15
issue        14
5            14
need         14
even         14
device       13
want         13
back         13
like         13
buy          13
dtype: int64

In [18]:
df.head()

| | review | word_count | char_count | average_word_length | stopword-count | stopword-rate | lowercase | punctuation | stopwords | cleaned_review |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Missed half a day of my school work/ research ... | 83 | 457 | 4.518072 | 36 | 0.433735 | missed half a day of my school work/ research ... | missed half a day of my school work research w... | missed half day school work research waiting c... | missed half day school work research waiting c... |
| 1 | Why am I paying for AppleCare if they can't re... | 56 | 308 | 4.517857 | 27 | 0.482143 | why am i paying for applecare if they can't re... | why am i paying for applecare if they cant rep... | paying applecare cant repair airpods max im do... | paying applecare cant repair airpods max im do... |
| 2 | Charging you $3500 for 3 newest phones and won... | 62 | 368 | 4.951613 | 23 | 0.370968 | charging you $3500 for 3 newest phones and won... | charging you 3500 for 3 newest phones and wont... | charging 3500 3 newest phones wont even provid... | charging 3500 3 newest phones wont even provid... |
| 3 | When they work they’re good, but when things a... | 66 | 431 | 5.545455 | 24 | 0.363636 | when they work they’re good, but when things a... | when they work theyre good but when things are... | work theyre good things wrong theyre direthe f... | work theyre good things wrong theyre direthe f... |
| 4 | Apple's customer support is terrible. It's sup... | 61 | 356 | 4.852459 | 26 | 0.42623 | apple's customer support is terrible. it's sup... | apples customer support is terrible its supris... | apples customer support terrible suprising rep... | apples customer support terrible suprising rep... |


### <b><span style='color:#58A2A8'>6.2</span> | Lemmatization using TextBlob</b>

In [None]:
import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')

from textblob import Word

In [20]:
df["lemmatized"] = df["cleaned_review"].apply(lambda x: " ".join(Word(word).lemmatize() for word in x.split()))

## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>7 | Sentiment Analysis</b></div>

In [21]:
from textblob import TextBlob

In [22]:
# polarity and subjectivity metrics -> returned by TextBlob
# polarity: how negative (-1) or positive (+1) a review is
# subjectivity: how factual (0) or opinionated (1) a review is
df["polarity"] = df["lemmatized"].apply(lambda x: TextBlob(x).sentiment.polarity)
df["subjectivity"] = df["lemmatized"].apply(lambda x: TextBlob(x).sentiment.subjectivity)

In [23]:
df.drop(["lowercase", "punctuation", "stopwords", "cleaned_review", "lemmatized"], axis=1, inplace = True)

In [24]:
df.describe()

| | word_count | char_count | average_word_length | stopword-count | stopword-rate | polarity | subjectivity |
|---|---|---|---|---|---|---|---|
| count | 80.0 | 80.0 | 80.0 | 80.0 | 80.0 | 80.0 | 80.0 |
| mean | 82.8875 | 451.5625 | 4.520797 | 37.8125 | 0.440984 | -0.006918335 | 0.481383 |
| std | 46.159326 | 247.28904 | 0.430462 | 23.725804 | 0.067629 | 0.2391142 | 0.215368 |
| min | 16.0 | 106.0 | 3.666667 | 4.0 | 0.190476 | -0.8166667 | 0.0 |
| 25% | 49.75 | 291.0 | 4.251576 | 19.5 | 0.4 | -0.0875 | 0.360833 |
| 50% | 74.0 | 396.0 | 4.477656 | 34.0 | 0.442561 | 4.625929e-18 | 0.49256 |
| 75% | 111.75 | 589.75 | 4.819947 | 50.25 | 0.489216 | 0.1165388 | 0.607197 |
| max | 207.0 | 1100.0 | 6.125 | 113.0 | 0.576087 | 0.7 | 0.95 |
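
Polarity scores are continuous, so reading them as negative, neutral, or positive needs a cutoff. The sketch below shows one possible way to bucket them; the function name `label_polarity` and the threshold of 0.05 are arbitrary assumptions, not something the notebook or TextBlob prescribes. The three sample scores are the minimum, median, and maximum polarity reported above.

```python
def label_polarity(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to a coarse label (threshold is an assumption)."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Minimum, median, and maximum polarity from the summary statistics
labels = [label_polarity(score) for score in (-0.8166667, 4.625929e-18, 0.7)]
print(labels)  # ['negative', 'neutral', 'positive']
```

A wider or narrower neutral band would change how many reviews fall into each bucket, so the threshold is worth tuning against manually labeled reviews.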


## <div style="color:white;display:fill;border-radius:5px;background-color:#9DCDD1;overflow:hidden"><p style="padding:20px;color:white;overflow:hidden;font-size:100%;letter-spacing:0.5px;margin:0"><b>8 | Performance and Evaluation</b></div>


In [25]:
df.sort_values(by="polarity")

| | review | word_count | char_count | average_word_length | stopword-count | stopword-rate | polarity | subjectivity |
|---|---|---|---|---|---|---|---|---|
| 78 | Really bad camera for iphone 13, I’m so disapp... | 27 | 151 | 4.629630 | 10 | 0.370370 | -0.816667 | 0.805556 |
| 6 | The worst customer service, Approach them to f... | 81 | 431 | 4.333333 | 36 | 0.444444 | -0.750000 | 0.750000 |
| 68 | Moved countries and will not let me reset pass... | 79 | 404 | 4.126582 | 43 | 0.544304 | -0.600000 | 0.900000 |
| 57 | Any Apple device is bulls*it, upgrade to iOS 1... | 41 | 228 | 4.585366 | 15 | 0.365854 | -0.496212 | 0.818182 |
| 65 | Just wanted to say how pathetic it was that I ... | 55 | 294 | 4.363636 | 27 | 0.490909 | -0.466667 | 0.366667 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 25 | The sheer incompetence of Apple's customer sup... | 96 | 582 | 5.072917 | 42 | 0.437500 | 0.320000 | 0.557500 |
| 33 | Long time customer and first class fan of all ... | 32 | 182 | 4.718750 | 13 | 0.406250 | 0.333333 | 0.577778 |
| 43 | Bought my new iphone 14, works like a charm, v... | 24 | 126 | 4.291667 | 9 | 0.375000 | 0.393182 | 0.699495 |
| 54 | Outstanding products and outstanding customer ... | 16 | 113 | 6.125000 | 5 | 0.312500 | 0.555556 | 0.916667 |
