# Sentiment Analysis with BERT Model (and Web-Scraping)

**Big-Picture Steps:**

1) Firstly, we will scrape customer reviews from the web using BeautifulSoup

2) Then we will load the reviews into a DataFrame

3) Lastly, we will use a pre-trained BERT model (no fine-tuning needed) to give each review a sentiment score from 1 to 5

# 1. Installing and Importing packages

In [55]:
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118



In [56]:
!pip install transformers requests beautifulsoup4



In [58]:
# IMPORTS:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
import re
from bs4 import BeautifulSoup

# 2. Getting the Reviews from a Website using BeautifulSoup

In [68]:
# Web-Scraping using BeautifulSoup
request = requests.get("https://www.yelp.com/biz/mejico-sydney-2") # Sending a GET request to the page
soup = BeautifulSoup(request.text, "html.parser") # Parsing the returned HTML

regex = re.compile(".*comment.*") # Matches any class name that contains "comment"
results = soup.find_all("p", {"class": regex}) # Finding every <p> tag whose class matches the pattern
reviews = [result.text for result in results] # Keeping only the text of each tag, without the HTML
reviews

['The food is fresh and tasty.  The scallop ceviche started the lunch. The scallops were tender with a great acidity and use of mango and peppers. The steak was tender and I got the hint of tequila in the sauce. I enjoyed a watermelon salad that complimented the the steak. The portions are good, but a stretch if you are sharing. My only down point is the service. They really only showed up to present my next plate and never checked to see if I wanted another drink (which I did).Enjoyed the food.',
 'The food was decent not great..  We had the guacamole which was bland and came with some type of plantain chips.. The chicken and steak tacos were good.. But the service was poor. We had a waitress with an attitude. She seemed upset whenever we asked for anything.  She would walk by and just stick up her hand and say " just wait ".  She spilled the ingredients to make the guacamole all over the table but never apologized. The waitress didn\'t come by at all, not even once to check on us.. I

In [69]:
reviews[0]

'The food is fresh and tasty.  The scallop ceviche started the lunch. The scallops were tender with a great acidity and use of mango and peppers. The steak was tender and I got the hint of tequila in the sauce. I enjoyed a watermelon salad that complimented the the steak. The portions are good, but a stretch if you are sharing. My only down point is the service. They really only showed up to present my next plate and never checked to see if I wanted another drink (which I did).Enjoyed the food.'
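To see what the class filter in the scraping cell actually selects, the pattern can be tried on a few class names in isolation. This is only a sketch: the class names below are made up for illustration and are not taken from Yelp's markup. It uses `search`, because BeautifulSoup checks a compiled pattern against each class with `search()`, which also means the shorter pattern `"comment"` would select the same tags.

```python
import re

# Same pattern as in the scraping cell
regex = re.compile(".*comment.*")
# Shorter pattern that behaves the same way under search()
short_regex = re.compile("comment")

# Hypothetical class names, for illustration only
class_names = ["comment__09f24", "user-comment", "raw-text", "css-1abc"]

matching = [name for name in class_names if regex.search(name)]
matching_short = [name for name in class_names if short_regex.search(name)]

print(matching)
print(matching == matching_short)
```

If the site changes its class names, `results` will simply come back empty, so it is worth checking `len(reviews)` before moving on.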

# 3. Loading the Reviews into a DataFrame

In [70]:
import numpy as np
import pandas as pd

In [71]:
df = pd.DataFrame(np.array(reviews), columns = ["review"])
df

| | review |
|---|---|
| 0 | The food is fresh and tasty. The scallop cevi... |
| 1 | The food was decent not great.. We had the gu... |
| 2 | Food was okay, guacamole was below average. Se... |
| 3 | The food and service here was really good. It... |
| 4 | Visiting from Texas and decided to give this r... |
| 5 | Don't come here expecting legit Mexican food b... |
| 6 | Out of all the restaurants that I tried in Syd... |
| 7 | Great atmosphere, attentive service, solid mar... |
| 8 | We came here on a Thursday night @ 5pm and by ... |
| 9 | Have been here twice and have absolutely loved... |


# 4. Loading the Pre-trained Model and Tokenizer

In [62]:
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)


In [65]:
# Trying the model out:
tokens = tokenizer.encode("I hated this, absolutely the worst.", return_tensors = "pt")
result = model(tokens)
result

SequenceClassifierOutput(loss=None, logits=tensor([[ 4.8054,  1.9404, -0.7781, -3.0589, -2.1787]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [66]:
result.logits

tensor([[ 4.8054,  1.9404, -0.7781, -3.0589, -2.1787]],
       grad_fn=<AddmmBackward0>)

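- Logits are raw, unnormalized scores, one per class (sentiment scores 1 to 5). They are not probabilities yet, but the class with the largest logit is the model's predicted sentiment, and applying a softmax to the logits turns them into probabilities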
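Since logits are unnormalized scores, a softmax is what converts them into probabilities. The sketch below does this in plain Python on the five values printed above, purely to show the arithmetic; the helper name `softmax` is my own and not part of the notebook.

```python
import math

# Logits copied from the output above (one value per class, scores 1 to 5)
logits = [4.8054, 1.9404, -0.7781, -3.0589, -2.1787]

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(values)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax(logits)
# Softmax preserves the ordering, so this is the same index argmax gives on the logits
predicted_class = probabilities.index(max(probabilities))

for score, p in enumerate(probabilities, start=1):
    print(f"score {score}: {p:.4f}")
print("Predicted class index:", predicted_class)
```

On the tensor itself, `torch.softmax(result.logits, dim=1)` performs the same calculation.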



In [None]:
# Getting the index of the largest logit, i.e. the most likely class (0-4)
torch.argmax(result.logits)

# 5. Applying the Model to our Reviews

In [72]:
# Creating a function which will perform the Sentiment Analysis
def sentiment_score(review):
    tokens = tokenizer.encode(review, return_tensors="pt")
    result = model(tokens)
    # argmax returns the class index (0-4); adding 1 turns it into a sentiment score of 1-5
    return int(torch.argmax(result.logits)) + 1
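The function above leaves it to the caller to keep the input short enough for the model, which accepts at most 512 tokens. One possible variant, sketched below, lets the tokenizer do the truncation in tokens instead. The name `sentiment_score_truncated` and its `max_length` parameter are my own additions; the tokenizer and model are passed in as arguments so the function does not depend on global variables.

```python
def sentiment_score_truncated(review, tokenizer, model, max_length=512):
    """Score a review from 1 to 5, cutting the input at max_length tokens."""
    # truncation=True makes the tokenizer drop tokens beyond max_length,
    # so long reviews no longer need to be sliced by hand
    inputs = tokenizer(
        review, return_tensors="pt", truncation=True, max_length=max_length
    )
    result = model(**inputs)
    # argmax gives the class index (0-4); adding 1 gives the score (1-5)
    return int(result.logits.argmax()) + 1
```

With the tokenizer and model loaded earlier, it would be called as `sentiment_score_truncated(reviews[0], tokenizer, model)`.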

In [74]:
# Scoring every review and storing the result back into the DataFrame
# Note: x[:512] keeps only the first 512 characters of each review, a rough guard against the model's 512-token input limit
df["sentiment"] = df["review"].apply(lambda x: sentiment_score(x[:512]))
df

| | review | sentiment |
|---|---|---|
| 0 | The food is fresh and tasty. The scallop cevi... | 4 |
| 1 | The food was decent not great.. We had the gu... | 2 |
| 2 | Food was okay, guacamole was below average. Se... | 2 |
| 3 | The food and service here was really good. It... | 5 |
| 4 | Visiting from Texas and decided to give this r... | 5 |
| 5 | Don't come here expecting legit Mexican food b... | 3 |
| 6 | Out of all the restaurants that I tried in Syd... | 5 |
| 7 | Great atmosphere, attentive service, solid mar... | 3 |
| 8 | We came here on a Thursday night @ 5pm and by ... | 4 |
| 9 | Have been here twice and have absolutely loved... | 5 |
