# <u> **BERT Sentiment Analysis on Yelp Reviews** </u>

BERT sentiment analysis is a natural language processing (NLP) technique that uses BERT, a pre-trained deep learning model, to analyze and classify the sentiment or emotional tone expressed in text data. The goal is usually to determine whether a given piece of text conveys a positive, negative, or neutral sentiment.

Sentiment analysis can be highly beneficial for businesses for several reasons:

`Understanding Customer Feedback`
> Sentiment analysis helps businesses gain insights into how customers feel about their products, services, or brand. Analyzing customer reviews, social media comments, and other text data allows companies to understand the sentiment behind the feedback.

`Product Improvement`
> Sentiment analysis provides valuable feedback on what customers like or dislike about products or services. This information can be used to make product improvements, add new features, or refine existing ones.

`Brand Reputation Management`
> Monitoring sentiment allows businesses to proactively manage their online reputation. By addressing negative sentiment and promoting positive feedback, a business can influence its online image.

`Market Research`
> Sentiment analysis is a valuable tool for market research. It can reveal emerging trends, consumer preferences, and shifting market dynamics, helping businesses make informed decisions.

`Customer Segmentation`
> Sentiment analysis can be used to segment customers based on their sentiment. This can help businesses tailor marketing and communication strategies to different customer groups.

`Product Launches`
> Sentiment analysis can provide feedback on new product launches. By monitoring initial reactions, companies can adjust their marketing and product strategies accordingly.




#### **How it works:**

1. BERT Pre-trained Model

The BERT model has been pre-trained on a large corpus of text data. During pre-training, BERT learns the context and relationships between words and phrases bidirectionally, capturing rich semantic information.

2. Fine-Tuning

To perform sentiment analysis, the pre-trained BERT model is fine-tuned on a labeled sentiment analysis dataset. Fine-tuning trains the model on a specific task, such as classifying text into sentiment categories.

3. Text Encoding

Before a piece of text reaches the model, a tokenizer encodes it into numerical token IDs that the model can process. BERT uses WordPiece tokenization, which splits text into subword tokens, enabling it to handle a wide range of words and languages.

4. Sentiment Classification

The encoded text is then fed into the fine-tuned BERT model, which outputs a score (logit) for each possible sentiment category; a softmax converts these scores into a probability distribution. Common sentiment categories are positive, negative, and neutral, and some models use more fine-grained categories, such as the 1 to 5 star ratings used in this project. The text is assigned the category with the highest probability.
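
To make the text encoding step more concrete, here is a minimal sketch of the greedy longest-match-first splitting that WordPiece performs on a single word. The vocabulary `toy_vocab` and the helper `wordpiece_tokenize` are hypothetical and exist only for illustration; the real BERT tokenizer also normalises text and splits punctuation first, and it uses a far larger vocabulary.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one lowercase word into subwords."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary (assumption); the real model ships a much larger one.
toy_vocab = {"could", "##ve", "good", "better", "burger", "##s", "over", "##rated"}

for word in ["couldve", "burgers", "overrated", "yelp"]:
    print(word, "->", wordpiece_tokenize(word, toy_vocab))
```

Because unseen words are broken into known pieces rather than discarded, a misspelling such as "couldve" can still contribute useful signal to the model.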

<p>&nbsp;</p>

*In this project I first initialise the pre-trained model, then scrape and gather review data from Yelp using BeautifulSoup. Finally, once the text data has been encoded, I feed it into the fine-tuned BERT model for sentiment analysis.*


In [None]:
!pip install transformers requests beautifulsoup4

Collecting transformers
  Downloading transformers-4.34.0-py3-none-any.whl (7.7 MB)
Collecting huggingface-hub<1.0,>=0.16.4 (from transformers)
  Downloading huggingface_hub-0.17.3-py3-none-any.whl (295 kB)
Collecting tokenizers<0.15,>=0.14 (from transformers)
  Downloading tokenizers-0.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Colle

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re
import numpy as np
import pandas as pd

## **1. Instantiate Model**

In [None]:
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')


## **2. Encode and Calculate Sentiment**

In [None]:
tokens = tokenizer.encode('It was good but couldve been better. Great', return_tensors='pt')

In [None]:
result = model(tokens)

In [None]:
result.logits

tensor([[-2.7768, -1.2353,  1.4419,  1.9804,  0.4584]],
       grad_fn=<AddmmBackward0>)

In [None]:
int(torch.argmax(result.logits))+1

4
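
The step from logits to a rating can be sketched in plain Python. This example reuses the logits printed above and assumes, as the `+1` in the notebook suggests, that class indices 0 to 4 correspond to 1 to 5 stars. The `softmax` helper is written here for illustration and is not part of the notebook.

```python
import math

# Logits copied from the output above, one per class (assumed to be 1 to 5 stars).
logits = [-2.7768, -1.2353, 1.4419, 1.9804, 0.4584]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)
best_index = max(range(len(probs)), key=probs.__getitem__)
stars = best_index + 1  # class indices start at 0, star ratings at 1

for star, p in enumerate(probs, start=1):
    print(f"{star} star(s): {p:.3f}")
print("predicted rating:", stars)
```

Softmax preserves the ordering of the logits, so taking `argmax` directly on the logits, as the notebook does, selects the same class.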

## **4. Collect Reviews**

In [None]:
# Collect reviews for a restaurant on Yelp
r = requests.get('https://www.yelp.com/biz/honest-burgers-meard-st-soho-london?osq=Burgers')
soup = BeautifulSoup(r.text, 'html.parser')
# Match <p> tags whose class name contains "comment"
regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class': regex})
# Keep only the text of each matching paragraph
reviews = [result.text for result in results]
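
To show what the class filter selects without making a network request, here is a rough offline sketch that uses only the standard library instead of BeautifulSoup. The HTML snippet, its class names, and the `CommentParagraphs` helper are made up for illustration; Yelp's real markup may differ and can change over time.

```python
import re
from html.parser import HTMLParser


class CommentParagraphs(HTMLParser):
    """Collect the text of <p> tags whose class attribute matches a regex."""

    def __init__(self, class_pattern):
        super().__init__()
        self.class_pattern = class_pattern
        self.reviews = []
        self._inside = False  # True while inside a matching <p>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if tag == "p" and self.class_pattern.search(css_class):
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self.reviews.append("".join(self._buffer).strip())
            self._inside = False


# Hypothetical page fragment (assumption): only two paragraphs carry a "comment" class.
sample_html = """
<div>
  <p class="comment__abc123 css-xyz"><span lang="en">Great burgers.</span></p>
  <p class="css-other">Opening hours</p>
  <p class="comment__def456">Slow service.</p>
</div>
"""

parser = CommentParagraphs(re.compile(".*comment.*"))
parser.feed(sample_html)
print(parser.reviews)
```

Matching on a substring of the class name tolerates the generated suffixes that sites often append to class names, but it can also pick up unrelated elements, so it is worth inspecting the collected text before scoring it.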

In [None]:
reviews

["Chain store / franchise but great food with plenty of options - didn't know it was a chain",
 "Two stars only for the burger which I liked and came twice for. ( honest burger without bacon or pickles)Fries were so small but they taste like so damn good.Staff weren't friendly at all especially the whole girl with the long hair she was not very much of a help neither friendly and only doing that for customers she wants to do so for.Bathroom were so dirty and small you can barely fit in there and close your eyes before while washing your hands.Prices were reasonable",
 'Honestly, very overrated. I watched one Eater YouTube of a Los Angelino \xa0freaking out about this burger and he needs to turn in his L.A. credentials pronto. He said "people in L.A. would be lining up out the door" - get real man. This is pretty flat flavored burger that in no way competes with LA fast food legends like In & Out and more classic artisan offering like the legendary Father\'s Office Korean-influenced san

## **5. Load Reviews into DataFrame and Score**

In [None]:
df = pd.DataFrame(np.array(reviews), columns=['review'])

In [None]:
df['review'].iloc[0]

"Chain store / franchise but great food with plenty of options - didn't know it was a chain"

In [None]:
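def sentiment_score(review):
    # Encode the review, truncating to BERT's 512-token input limit
    tokens = tokenizer.encode(review, return_tensors='pt', truncation=True, max_length=512)
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        result = model(tokens)
    # Class indices 0-4 map to ratings of 1-5 stars
    return int(torch.argmax(result.logits)) + 1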

In [None]:
sentiment_score(df['review'].iloc[1])

2

In [None]:
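# x[:512] keeps only the first 512 characters of each review, a rough guard against BERT's 512-token input limit
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512]))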

In [None]:
df

| | review | sentiment |
|---|---|---|
| 0 | Chain store / franchise but great food with pl... | 4 |
| 1 | Two stars only for the burger which I liked an... | 2 |
| 2 | Honestly, very overrated. I watched one Eater ... | 2 |
| 3 | Smashed burger and salad for 9 pounds, sign me... | 1 |
| 4 | They make GOOD burgers. All the flavors were m... | 3 |
| 5 | It was literally in a disguised pub in an alle... | 4 |
| 6 | In Soho you can find this very nice place to h... | 5 |
| 7 | Love their burgers. One of my favorites in Lon... | 5 |
| 8 | Loved the burgers and fries so delicious we we... | 5 |
| 9 | Honestly, we sat down, waited ten minutes for ... | 1 |
