# Sentiment Analysis On Yelp Reviews Using BERT Neural Network and Python

### 1. Installing Dependencies

First, install everything the notebook needs. PyTorch (torch, torchvision, torchaudio) is installed with pip from the CUDA 11.7 wheel index:

In [1]:
%%capture
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

In [2]:
%%capture
!pip install transformers requests beautifulsoup4 pandas numpy

In [3]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re

### 2. Instantiating Model

In [4]:
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

### 3. Encoding and Calculating Sentiment

In [5]:
tokens = tokenizer.encode('The food was okay but couldve been better. Great', return_tensors='pt')

In [6]:
tokens[0]

tensor([  101, 10103, 15225, 10140, 44810, 10158, 10502, 12296, 10598, 10662,
        16197,   119, 11838,   102])

In [7]:
tokenizer.decode(tokens[0])

'[CLS] the food was okay but couldve been better. great [SEP]'

In [8]:
result = model(tokens)

In [9]:
result

SequenceClassifierOutput(loss=None, logits=tensor([[-2.9391, -1.1178,  1.7044,  1.9922,  0.1824]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [10]:
result.logits

tensor([[-2.9391, -1.1178,  1.7044,  1.9922,  0.1824]],
       grad_fn=<AddmmBackward0>)

In [11]:
torch.argmax(result.logits)

tensor(3)

In [12]:
int(torch.argmax(result.logits))+1

4
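
The `+1` maps the zero-based class index (0 to 4) onto a 1-to-5 star rating. To see how confident the model is, rather than only which class wins, the logits can be turned into probabilities with a softmax. The sketch below does this in plain Python on the logits printed above; inside the notebook, `torch.softmax(result.logits, dim=1)` should give the same numbers. The names `logits`, `probs`, and `stars` are illustrative and not part of the original notebook.

```python
import math

# Logits copied from the model output printed above.
logits = [-2.9391, -1.1178, 1.7044, 1.9922, 0.1824]

# Softmax: subtract the max for numerical stability, exponentiate, normalise.
peak = max(logits)
exps = [math.exp(x - peak) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# The predicted class is the index of the largest probability (the same index
# that argmax picks on the logits); adding 1 converts it to a star rating.
stars = probs.index(max(probs)) + 1

for rating, p in enumerate(probs, start=1):
    print(f"{rating} star(s): {p:.3f}")
print("predicted rating:", stars)
```

The 4-star class wins here, with the 3-star class as the runner-up, which fits the mixed wording of the sentence.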

### 4. Web Scraping Reviews from Yelp

In [13]:
r = requests.get('https://www.yelp.com/biz/social-brew-cafe-pyrmont')
soup = BeautifulSoup(r.text, 'html.parser')
regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class':regex})
reviews = [result.text for result in results]
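
The pattern `.*comment.*` is used instead of an exact class name, presumably because the class on the review paragraphs carries a generated suffix. The sketch below checks a few class names against the same pattern. The class names are hypothetical examples, not values taken from the live Yelp page, and the page markup may change at any time.

```python
import re

# Same pattern as in the scraping cell: any class name containing "comment".
regex = re.compile('.*comment.*')

# Hypothetical class names, made up for illustration only.
class_names = ['comment__09f24__abc12', 'user-comment', 'raw__09f24__xyz', 'css-1p9ibgf']

# Keep only the names the pattern accepts.
matching = [name for name in class_names if regex.search(name)]
print(matching)
```

Because the pattern accepts any class containing `comment`, it can also pick up unrelated elements, so it is worth inspecting `reviews` before scoring them.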

In [14]:
reviews

['Some of the best Milkshakes me and my daughter ever tasted. MMMMMM HMMMMMMMM.',
 "Six of us met here for breakfast before our walk to Manly. We were enjoying visiting with each other so much that I apologize for not taking any photos. We all enjoyed our food, as well as our coffee and tea drinks.We were greeted immediately by a friendly server asking if we would like to sit inside or out. We said we would like inside, but weren't exactly sure how many were joining us yet- at least 4. We were told this was no problem, the more the merrier. A few minutes later when 4 more joined our party and we explained to the server we had 6, he just quickly switched our table. I really enjoyed my serenity tea, just what I needed after a long flight in from Sfo that morning. Everyone else were more interested in the lattes for expresso drinks. All said they were hot and delicious. 2 of us ordered the avo on toast. So yummy with the beetroot... I will start adding this to mine now at home, and have f

### 5. Analyzing the Data Using the nlptown bert-base-multilingual-uncased-sentiment Model

In [17]:
import numpy as np
import pandas as pd

In [18]:
df = pd.DataFrame(np.array(reviews), columns=['review'])

In [19]:
df.tail(7)

|   | review |
|---|--------|
| 3 | Great food amazing coffee and tea. Short walk ... |
| 4 | It was ok. Had coffee with my friends. I'm new... |
| 5 | Ricotta hot cakes! These were so yummy. I ate ... |
| 6 | Great staff and food. Must try is the pan fri... |
| 7 | We came for brunch twice in our week-long visi... |
| 8 | I came to Social brew cafe for brunch while ex... |
| 9 | It was ok. The coffee wasn't the best but it w... |


In [20]:
df['review'].iloc[0]

'Some of the best Milkshakes me and my daughter ever tasted. MMMMMM HMMMMMMMM.'

The function below passes a review through the tokenizer and then the model, and returns a score from 1 to 5.

In [21]:
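def sentiment_score(review):
    # Encode the text as token IDs, run the model, and map the winning
    # class index (0-4) to a score from 1 to 5.
    tokens = tokenizer.encode(review, return_tensors='pt')
    result = model(tokens)
    return int(torch.argmax(result.logits))+1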

Scoring the second review with the function:

In [22]:
sentiment_score(df['review'].iloc[1])

4

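Now let's run sentiment analysis on all of the collected Yelp reviews. Each review is cut to its first 512 characters before scoring, to keep the input short enough for the model.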

In [23]:
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512]))
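
Note that `x[:512]` cuts each review to 512 characters, while the model's input limit is measured in tokens. As an alternative, the tokenizer can do the truncation itself. The following is a minimal sketch, assuming the `tokenizer` and `model` objects created in section 2 are passed in; the function name `sentiment_score_truncated` is hypothetical.

```python
def sentiment_score_truncated(review, tokenizer, model, max_length=512):
    """Score a review from 1 to 5, letting the tokenizer truncate by tokens."""
    # truncation=True with max_length caps the encoded sequence, special
    # tokens included, so long reviews never exceed the model's input size.
    inputs = tokenizer(review, return_tensors='pt',
                       truncation=True, max_length=max_length)
    result = model(**inputs)
    # Class indices run 0-4, so add 1 to get a score from 1 to 5.
    return int(result.logits.argmax()) + 1
```

In the notebook this could be applied as `df['review'].apply(lambda x: sentiment_score_truncated(x, tokenizer, model))`.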

In [24]:
df

|   | review | sentiment |
|---|--------|-----------|
| 0 | Some of the best Milkshakes me and my daughter... | 5 |
| 1 | Six of us met here for breakfast before our wa... | 4 |
| 2 | Great place with delicious food and friendly s... | 5 |
| 3 | Great food amazing coffee and tea. Short walk ... | 5 |
| 4 | It was ok. Had coffee with my friends. I'm new... | 3 |
| 5 | Ricotta hot cakes! These were so yummy. I ate ... | 5 |
| 6 | Great staff and food. Must try is the pan fri... | 5 |
| 7 | We came for brunch twice in our week-long visi... | 4 |
| 8 | I came to Social brew cafe for brunch while ex... | 5 |
| 9 | It was ok. The coffee wasn't the best but it w... | 3 |


A sentiment score of 5 represents the most positive opinion and 1 the most negative; a score of 3 indicates neutral sentiment.
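
To summarize the column, the scores can be counted and averaged. The sketch below uses the ten scores from the table above in plain Python; in the notebook, `df['sentiment'].value_counts()` and `df['sentiment'].mean()` would give the equivalent figures. The `label` helper and its three-way split are an assumption made for illustration, not something the model itself outputs.

```python
from collections import Counter
from statistics import mean

# Scores copied from the 'sentiment' column shown above.
scores = [5, 4, 5, 5, 3, 5, 5, 4, 5, 3]

def label(score):
    """Hypothetical mapping from a 1-5 score to a coarse label."""
    if score >= 4:
        return 'positive'
    if score == 3:
        return 'neutral'
    return 'negative'

# Count how many reviews fall under each label and average the raw scores.
counts = Counter(label(s) for s in scores)
average = mean(scores)

print(counts)
print(f"average rating: {average:.1f}")
```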

In [25]:
df['review'].iloc[9]

"It was ok. The coffee wasn't the best but it was fine. The relish on the breakfast roll was yum which did make it sing. So perhaps I just got a bad coffee but the food was good on my visit."

That's the end, thank you!