# Sentiment Analyser For Yelp Reviews

# STEP 1 – GETTING DATA

Use the Python Requests module to request the page where the reviews are located, then use BeautifulSoup to traverse (search through) the response and extract what you need.

In [9]:
# Import Requests
import requests

from bs4 import BeautifulSoup
from urllib.request import urlopen

In [2]:
url= 'https://www.yelp.com/biz/tesla-san-francisco?osq=Tesla+Dealership'

In [3]:
# Execute request
# If you're using a different site, just replace the url, e.g. r = requests.get('put your url in here')
r = requests.get(url)

In [4]:
# Check request status
print(r.status_code)  # If this prints anything other than 200, check that the URL is valid and correctly formed.

200


Assuming that all went well and you've got a status code of 200, you can view the result by accessing the text attribute of the response.

In [5]:
# Check result
r.text

'<!DOCTYPE html><html lang="en-US" prefix="og: http://ogp.me/ns#" style="margin: 0;padding: 0; border: 0; font-size: 100%; font: inherit; vertical-align: baseline;"><head><script>document.documentElement.className=document.documentElement.className.replace(/\x08no-js\x08/,"js");</script><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><meta http-equiv="Content-Language" content="en-US" /><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"><link rel="mask-icon" sizes="any" href="https://s3-media0.fl.yelpcdn.com/assets/srv0/yelp_large_assets/b2bb2fb0ec9c/assets/img/logos/yelp_burst.svg" content="#FF1A1A"><link rel="shortcut icon" href="https://s3-media0.fl.yelpcdn.com/assets/srv0/yelp_large_assets/dcfe403147fc/assets/img/logos/favicon.ico"><script> window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date;window.ygaPageStartTime=new Date().getTime();</script><script>\n            window.yelp = window.yelp || {};\

In [6]:
client = urlopen(url)

In [7]:
#Getting the HTML Code of the Full Page

html = client.read()

html

b'<!DOCTYPE html><html lang="en-US" prefix="og: http://ogp.me/ns#" style="margin: 0;padding: 0; border: 0; font-size: 100%; font: inherit; vertical-align: baseline;"><head><script>document.documentElement.className=document.documentElement.className.replace(/\x08no-js\x08/,"js");</script><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><meta http-equiv="Content-Language" content="en-US" /><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"><link rel="mask-icon" sizes="any" href="https://s3-media0.fl.yelpcdn.com/assets/srv0/yelp_large_assets/b2bb2fb0ec9c/assets/img/logos/yelp_burst.svg" content="#FF1A1A"><link rel="shortcut icon" href="https://s3-media0.fl.yelpcdn.com/assets/srv0/yelp_large_assets/dcfe403147fc/assets/img/logos/favicon.ico"><script> window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date;window.ygaPageStartTime=new Date().getTime();</script><script>\n            window.yelp = window.yelp || {};

In [10]:
# Make the soup
soup = BeautifulSoup(r.text, 'html.parser')

In [11]:
# First get all of the review-content divs
results = soup.find_all(class_='review-content')

In [12]:
# Function to get reviews from a Yelp page
def get_reviews(yelp_url):
    # Send a GET request to the Yelp URL
    response = requests.get(yelp_url)

    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        # Parse the HTML content of the page
        soup = BeautifulSoup(response.text, 'html.parser')

        # Find the reviews on the page
        reviews = soup.find_all('span', class_='raw__09f24__T4Ezm', lang='en')

        # Extract text content from reviews
        reviews_text = [review.get_text(strip=True) for review in reviews]

        return reviews_text
    else:
        print(f"Error: Unable to retrieve data. Status code: {response.status_code}")
        return None
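
The selector in `get_reviews` relies on a generated class name (`raw__09f24__T4Ezm`), which may change whenever the site is redeployed. As a minimal offline sketch of the same extraction idea, the block below uses only the standard library's `html.parser` on a made-up HTML snippet and matches any `span` with `lang="en"` whose class starts with `raw__`. The names `ReviewSpanParser` and `SAMPLE_HTML`, and the prefix-matching rule itself, are assumptions for illustration and are not part of the notebook.

```python
from html.parser import HTMLParser


class ReviewSpanParser(HTMLParser):
    """Collect text inside <span lang="en"> tags whose class starts with 'raw__'."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._depth = 0    # > 0 while we are inside a matching span
        self._chunks = []  # text pieces of the review being read

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attr = dict(attrs)
        if self._depth:
            # A nested span inside a review: track it so the end tags pair up
            self._depth += 1
            return
        classes = (attr.get("class") or "").split()
        if attr.get("lang") == "en" and any(c.startswith("raw__") for c in classes):
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.reviews.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


# Hypothetical snippet standing in for a downloaded page
SAMPLE_HTML = (
    '<div>'
    '<span class="raw__09f24__T4Ezm" lang="en">Great <b>service</b>!</span>'
    '<span class="other" lang="en">not a review</span>'
    '<span class="raw__abc123" lang="en"> Slow repair. </span>'
    '</div>'
)

parser = ReviewSpanParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.reviews)
```

Working from a saved snippet like this makes it possible to check the extraction logic without sending repeated requests to the site.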

In [14]:
# Import pandas
import pandas as pd

#Import numpy
import numpy as np

In [15]:
# Get reviews
reviews = get_reviews(url)

# Create a DataFrame with the reviews
df = pd.DataFrame({'reviews': reviews})

# Save the DataFrame to a CSV file
df.to_csv('tesla_reviews.csv', index=False)

In [16]:
df.head()

Unnamed: 0,reviews
0,I don't usually write too many reviews but thi...
1,Tesla comes with self drive as long as you hav...
2,Helena KElon Musk!Is climbing the highest moun...
3,Wow! The best tesla service center I have ever...
4,In a nutshell: Tesla sucks! I leased one of th...


# STEP 2 – ANALYSING THE REVIEWS

To make life easier, the reviews were converted into a DataFrame at the end of Step 1, which is why pandas and numpy were imported there. Each metric below is added to that DataFrame as a new column.

We're going to calculate four metrics in total for each review:

1. Word Count – total number of words in each review

In [17]:
# Calculate word count
df['word_count'] = df['reviews'].apply(lambda x: len(str(x).split(" ")))

In [18]:
df.head()

Unnamed: 0,reviews,word_count
0,I don't usually write too many reviews but thi...,183
1,Tesla comes with self drive as long as you hav...,42
2,Helena KElon Musk!Is climbing the highest moun...,124
3,Wow! The best tesla service center I have ever...,109
4,In a nutshell: Tesla sucks! I leased one of th...,135
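
The word count above splits on a single literal space. As a small plain-Python sketch (the sample string and both helper names are made up for illustration), the block below compares that approach with splitting on any run of whitespace, which can give a different count when a review contains double spaces, newlines, or trailing spaces.

```python
def word_count_single_space(text):
    # Mirrors the notebook's approach: split on one literal space
    return len(str(text).split(" "))


def word_count_whitespace(text):
    # Alternative: split on any run of whitespace (spaces, tabs, newlines)
    return len(str(text).split())


# Hypothetical review text with a double space, a newline and a trailing space
sample = "Great  service,\nfast repair "

print(word_count_single_space(sample))  # counts the empty strings between spaces too
print(word_count_whitespace(sample))    # counts only the visible words
```

Here the single-space version reports 5 and the whitespace version reports 4, so the two columns would not necessarily agree on messy text.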


2. Character Count – total number of characters in each review

In [19]:
# Calculate character count
df['char_count'] = df['reviews'].str.len()

In [20]:
df.head()

Unnamed: 0,reviews,word_count,char_count
0,I don't usually write too many reviews but thi...,183,945
1,Tesla comes with self drive as long as you hav...,42,219
2,Helena KElon Musk!Is climbing the highest moun...,124,743
3,Wow! The best tesla service center I have ever...,109,531
4,In a nutshell: Tesla sucks! I leased one of th...,135,733


3. Average word length – the average length of words used

In [21]:
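def avg_word(review):
  # Average number of characters per whitespace-separated word
  words = review.split()
  if not words:
    return 0.0  # avoid dividing by zero on an empty review
  return sum(len(word) for word in words) / len(words)

# Calculate average word length
df['avg_word'] = df['reviews'].apply(avg_word)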

In [22]:
df.head()

Unnamed: 0,reviews,word_count,char_count,avg_word
0,I don't usually write too many reviews but thi...,183,945,4.286517
1,Tesla comes with self drive as long as you hav...,42,219,4.238095
2,Helena KElon Musk!Is climbing the highest moun...,124,743,5.04065
3,Wow! The best tesla service center I have ever...,109,531,3.880734
4,In a nutshell: Tesla sucks! I leased one of th...,135,733,4.437037


4. Stopword Count – total number of words which are considered stop words

In [23]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Magic00\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.


True

In [24]:
# Import stopwords
from nltk.corpus import stopwords

In [25]:
# Calculate number of stop words
stop_words = stopwords.words('english')
df['stopword_coun'] = df['reviews'].apply(lambda review: len([word for word in review.split() if word in stop_words]))

In [26]:
df.head()

Unnamed: 0,reviews,word_count,char_count,avg_word,stopword_coun
0,I don't usually write too many reviews but thi...,183,945,4.286517,73
1,Tesla comes with self drive as long as you hav...,42,219,4.238095,18
2,Helena KElon Musk!Is climbing the highest moun...,124,743,5.04065,41
3,Wow! The best tesla service center I have ever...,109,531,3.880734,47
4,In a nutshell: Tesla sucks! I leased one of th...,135,733,4.437037,56
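
NLTK's English stop-word list is all lowercase, and the count above is taken on the raw reviews, so capitalised words such as "I" or "The" are not matched. The sketch below illustrates the difference in plain Python; `STOP_WORDS` is a tiny hypothetical stand-in for the NLTK list, and `count_stop_words` is an illustrative helper rather than the notebook's code.

```python
# Hypothetical, tiny stand-in for stopwords.words('english')
STOP_WORDS = {"i", "the", "a", "of", "to", "and"}


def count_stop_words(text, lowercase=False):
    """Count stop words in text, optionally lowercasing each word first."""
    words = text.split()
    if lowercase:
        words = [word.lower() for word in words]
    return sum(1 for word in words if word in STOP_WORDS)


sample = "I leased one of the cars and The service was slow"

print(count_stop_words(sample))                  # case-sensitive match
print(count_stop_words(sample, lowercase=True))  # case-insensitive match
```

On this sample the case-sensitive count is 3 and the lowercased count is 5, so the counts in the table above may be lower than a case-insensitive count would give.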


In [27]:
df.describe()

Unnamed: 0,word_count,char_count,avg_word,stopword_coun
count,11.0,11.0,11.0,11.0
mean,98.0,524.181818,4.341495,38.818182
std,42.223216,232.639127,0.389338,17.668153
min,42.0,219.0,3.728814,14.0
25%,65.0,313.0,4.076794,24.0
50%,109.0,531.0,4.39823,41.0
75%,118.5,682.5,4.564087,46.5
max,183.0,945.0,5.04065,73.0


# STEP 3 – CLEANING THE DATA SET

In [29]:
# Lower case all words
df['review_lower'] = df['reviews'].apply(lambda review: " ".join(word.lower() for word in review.split()))

In [30]:
# Remove Punctuation
# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as literal text by default
df['review_nopunc'] = df['review_lower'].str.replace(r'[^\w\s]', '', regex=True)


In [31]:
stop_words = stopwords.words('english')

# Remove Stopwords
df['review_nopunc_nostop'] = df['review_nopunc'].apply(lambda x: " ".join(x for x in x.split() if x not in stop_words))
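
The three cleaning steps (lower-casing, stripping punctuation, dropping stop words) can also be read as one small function. The block below is a plain-Python sketch of that pipeline for a single string; `clean_review` and the short `STOP_WORDS` set are hypothetical names used here in place of the DataFrame columns and the NLTK list.

```python
import re

# Hypothetical, tiny stand-in for stopwords.words('english')
STOP_WORDS = {"i", "the", "a", "of", "is", "too", "but", "this"}


def clean_review(text, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation, then drop stop words."""
    lowered = " ".join(text.lower().split())
    # Remove every character that is neither a word character nor whitespace
    no_punct = re.sub(r"[^\w\s]", "", lowered)
    return " ".join(word for word in no_punct.split() if word not in stop_words)


sample = "I don't usually write reviews, but this is GREAT!"
print(clean_review(sample))
```

Note that removing punctuation turns "don't" into "dont", which is why "dont" appears as a word in the cleaned reviews.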

In [33]:
df.head()

Unnamed: 0,reviews,word_count,char_count,avg_word,stopword_coun,review_lower,review_nopunc,review_nopunc_nostop
0,I don't usually write too many reviews but thi...,183,945,4.286517,73,i don't usually write too many reviews but thi...,i dont usually write too many reviews but this...,dont usually write many reviews one well deser...
1,Tesla comes with self drive as long as you hav...,42,219,4.238095,18,tesla comes with self drive as long as you hav...,tesla comes with self drive as long as you hav...,tesla comes self drive long hands wheel today ...
2,Helena KElon Musk!Is climbing the highest moun...,124,743,5.04065,41,helena kelon musk!is climbing the highest moun...,helena kelon muskis climbing the highest mount...,helena kelon muskis climbing highest mount wor...
3,Wow! The best tesla service center I have ever...,109,531,3.880734,47,wow! the best tesla service center i have ever...,wow the best tesla service center i have ever ...,wow best tesla service center ever previous ex...
4,In a nutshell: Tesla sucks! I leased one of th...,135,733,4.437037,56,in a nutshell: tesla sucks! i leased one of th...,in a nutshell tesla sucks i leased one of thei...,nutshell tesla sucks leased one model ys 2021 ...


In [35]:
# Return frequency of values
freq= pd.Series(" ".join(df['review_nopunc_nostop']).split()).value_counts()[:30]
freq

service        13
tesla          12
one            10
car             9
even            6
new             5
get             4
model           4
center          4
appointment     4
time            4
took            3
mobile          3
got             3
alex            3
fixed           3
person          3
back            3
next            3
insurance       3
amazing         3
month           3
safety          3
dont            3
teo             3
price           3
make            3
beyond          3
great           3
call            3
dtype: int64
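
The frequency table comes from joining all cleaned reviews into one string, splitting it into words, and counting each word. As a minimal sketch of the same idea without pandas, the block below uses `collections.Counter` from the standard library; the two sample reviews are made up for illustration.

```python
from collections import Counter

# Hypothetical cleaned reviews standing in for df['review_nopunc_nostop']
cleaned_reviews = [
    "tesla service great service",
    "service slow tesla",
]

# Join everything into one string, split into words, and count them
word_freq = Counter(" ".join(cleaned_reviews).split())

# The two most frequent words with their counts
top_two = word_freq.most_common(2)
print(top_two)
```

Scanning a list like this for frequent but uninformative words is one way to decide what to add to a custom stop-word list.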

In [36]:
other_stopwords = ['get', 'us', 'see', 'use', 'said', 'asked', 'day', 'go',
  'even', 'ive', 'right', 'left', 'always', 'would', 'told',
  'one', 'also', 'ever', 'x', 'take', 'let']

In [37]:
df['review_nopunc_nostop_nocommon'] = df['review_nopunc_nostop'].apply(lambda review: " ".join(word for word in review.split() if word not in other_stopwords))
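
When maintaining a hand-written stop-word list, keep in mind that Python joins adjacent string literals, so a missing comma silently merges two entries into one. The sketch below shows the effect on a made-up sentence; `buggy`, `fixed`, and `remove_words` are illustrative names, and using a set for the fixed version is a design choice that also removes duplicates.

```python
# A missing comma after 'go' makes Python concatenate the two literals
buggy = ['day', 'go' 'even', 'ive']
print(buggy)  # contains 'goeven' instead of 'go' and 'even'

# Intended list, stored as a set so duplicates collapse and lookups stay fast
fixed = {'day', 'go', 'even', 'ive'}


def remove_words(text, words_to_drop):
    """Return text without the words listed in words_to_drop."""
    return " ".join(word for word in text.split() if word not in words_to_drop)


sample = "even on a busy day they go beyond"
print(remove_words(sample, buggy))  # 'even' and 'go' survive
print(remove_words(sample, fixed))  # 'even', 'day' and 'go' are removed
```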

In [38]:
df.head()

Unnamed: 0,reviews,word_count,char_count,avg_word,stopword_coun,review_lower,review_nopunc,review_nopunc_nostop,review_nopunc_nostop_nocommon
0,I don't usually write too many reviews but thi...,183,945,4.286517,73,i don't usually write too many reviews but thi...,i dont usually write too many reviews but this...,dont usually write many reviews one well deser...,dont usually write many reviews well deserved ...
1,Tesla comes with self drive as long as you hav...,42,219,4.238095,18,tesla comes with self drive as long as you hav...,tesla comes with self drive as long as you hav...,tesla comes self drive long hands wheel today ...,tesla comes self drive long hands wheel today ...
2,Helena KElon Musk!Is climbing the highest moun...,124,743,5.04065,41,helena kelon musk!is climbing the highest moun...,helena kelon muskis climbing the highest mount...,helena kelon muskis climbing highest mount wor...,helena kelon muskis climbing highest mount wor...
3,Wow! The best tesla service center I have ever...,109,531,3.880734,47,wow! the best tesla service center i have ever...,wow the best tesla service center i have ever ...,wow best tesla service center ever previous ex...,wow best tesla service center previous experie...
4,In a nutshell: Tesla sucks! I leased one of th...,135,733,4.437037,56,in a nutshell: tesla sucks! i leased one of th...,in a nutshell tesla sucks i leased one of thei...,nutshell tesla sucks leased one model ys 2021 ...,nutshell tesla sucks leased model ys 2021 tech...


# STEP 4 – SENTIMENT ANALYSIS

In [40]:
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Magic00\AppData\Roaming\nltk_data...


True

In [43]:
# Calculate polarity (ranges from -1.0, most negative, to 1.0, most positive)
from textblob import TextBlob
df['polarity'] = df['review_nopunc_nostop_nocommon'].apply(lambda x: TextBlob(x).sentiment.polarity)

In [45]:
# Calculate subjectivity (ranges from 0.0, very objective, to 1.0, very subjective)
df['subjectivity'] = df['review_nopunc_nostop_nocommon'].apply(lambda x: TextBlob(x).sentiment.subjectivity)

In [46]:
df.head()

Unnamed: 0,reviews,word_count,char_count,avg_word,stopword_coun,review_lower,review_nopunc,review_nopunc_nostop,review_nopunc_nostop_nocommon,polarity,subjectivity
0,I don't usually write too many reviews but thi...,183,945,4.286517,73,i don't usually write too many reviews but thi...,i dont usually write too many reviews but this...,dont usually write many reviews one well deser...,dont usually write many reviews well deserved ...,0.310969,0.546655
1,Tesla comes with self drive as long as you hav...,42,219,4.238095,18,tesla comes with self drive as long as you hav...,tesla comes with self drive as long as you hav...,tesla comes self drive long hands wheel today ...,tesla comes self drive long hands wheel today ...,0.225,0.7
2,Helena KElon Musk!Is climbing the highest moun...,124,743,5.04065,41,helena kelon musk!is climbing the highest moun...,helena kelon muskis climbing the highest mount...,helena kelon muskis climbing highest mount wor...,helena kelon muskis climbing highest mount wor...,0.285,0.606667
3,Wow! The best tesla service center I have ever...,109,531,3.880734,47,wow! the best tesla service center i have ever...,wow the best tesla service center i have ever ...,wow best tesla service center ever previous ex...,wow best tesla service center previous experie...,0.166667,0.335606
4,In a nutshell: Tesla sucks! I leased one of th...,135,733,4.437037,56,in a nutshell: tesla sucks! i leased one of th...,in a nutshell tesla sucks i leased one of thei...,nutshell tesla sucks leased one model ys 2021 ...,nutshell tesla sucks leased model ys 2021 tech...,-0.159091,0.496212


In [47]:
df[['reviews','polarity','subjectivity']]

Unnamed: 0,reviews,polarity,subjectivity
0,I don't usually write too many reviews but thi...,0.310969,0.546655
1,Tesla comes with self drive as long as you hav...,0.225,0.7
2,Helena KElon Musk!Is climbing the highest moun...,0.285,0.606667
3,Wow! The best tesla service center I have ever...,0.166667,0.335606
4,In a nutshell: Tesla sucks! I leased one of th...,-0.159091,0.496212
5,I waited for 25 mins and no one even acknowled...,-0.081818,0.45
6,I took back my 2018 Model 3 last month for saf...,0.2,0.313333
7,"My model 3 got punctured on monday, slow leak ...",0.017576,0.436364
8,"Well, I had an issue with my Tesla. Took it in...",-0.025,0.266667
9,Nick has been amazing in educating us about th...,0.502066,0.64022
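
Polarity scores are easier to read once they are bucketed into labels. The block below is a minimal sketch that applies a hypothetical cut-off of ±0.05 to the ten polarity values shown in the table above; the `label_polarity` helper and the threshold are assumptions for illustration, not something TextBlob prescribes.

```python
from collections import Counter


def label_polarity(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a coarse sentiment label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Polarity values copied from the table above
polarities = [0.310969, 0.225, 0.285, 0.166667, -0.159091,
              -0.081818, 0.2, 0.017576, -0.025, 0.502066]

labels = [label_polarity(score) for score in polarities]
summary = Counter(labels)
print(summary)
```

With this threshold, six of the ten reviews are labelled positive, two negative, and two neutral; a different cut-off would shift the borderline reviews.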
