## Sentiment analysis using BERT model

In this project, we use the 'nlptown/bert-base-multilingual-uncased-sentiment' model by NLP Town, a multilingual BERT model fine-tuned for sentiment analysis on product reviews in six languages: English, Dutch, German, French, Spanish, and Italian.

It predicts the sentiment of the review as a number of stars (between 1 and 5).

We use requests to fetch the review page and beautifulsoup4 to parse the HTML and extract the review text. The extracted reviews are then fed to the pretrained BERT model to predict the sentiment.

### Install and import all the dependencies

In [1]:
# Make sure you have already installed PyTorch.

In [1]:
#!pip install transformers requests beautifulsoup4 pandas numpy



In [None]:
# Install PyTorch if it is missing:
#!pip install torch
# To switch versions, uninstall the existing packages first:
#!pip uninstall torch torchvision torchaudio

In [None]:
#!pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0

In [4]:
#import dependencies
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re

### Load the tokenizer and Model

In [5]:
# Load BERT model directly
tokenizer = AutoTokenizer.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")


### Encode and calculate sentiment

In [64]:
tokens = tokenizer.encode("I am okay", return_tensors='pt')

In [65]:
tokens[0]

tensor([  101,   151, 10345, 44810, 10158,   102])

In [66]:
print(tokenizer.decode(tokens[0]))

[CLS] i am okay [SEP]


In [67]:
result = model(tokens)

In [68]:
result
# The model output contains logits: one raw (unnormalized) score per class, not a one-hot vector.
# The position with the highest score gives the predicted rating for the sentence.
# The positions correspond to the ratings [1, 2, 3, 4, 5], in that order.

SequenceClassifierOutput(loss=None, logits=tensor([[-1.9279,  0.4052,  2.4736,  0.6154, -1.4515]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [69]:
# argmax returns a zero-based class index, so add 1 to get the star rating
int(torch.argmax(result.logits)+1)

3
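
The logits above are raw scores, so they are easier to interpret once converted to probabilities. The sketch below applies a softmax to the logit values printed in the previous cell, using only the standard library so it runs without loading the model. The helper name `softmax` and the variable names are illustrative and not part of the original notebook; inside the notebook, `torch.softmax(result.logits, dim=-1)` should give the same probabilities.

```python
import math

# Logits copied from the model output shown above for "I am okay"
logits = [-1.9279, 0.4052, 2.4736, 0.6154, -1.4515]

def softmax(scores):
    """Convert raw scores to probabilities that sum to 1 (illustrative helper)."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)

# The index of the largest probability is zero-based, so add 1 for the star rating
stars = probs.index(max(probs)) + 1

for rating, p in enumerate(probs, start=1):
    print(f"{rating} star(s): {p:.3f}")
print("Predicted rating:", stars)
```

Softmax does not change which class scores highest, so the predicted rating matches the `argmax` result of 3; the probabilities only show how strongly the model prefers that rating over its neighbours.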

### Fetch Reviews

In [116]:
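# Download the Yelp page for the restaurant
r = requests.get('https://www.yelp.com/biz/the-butcher-shop-by-niku-steakhouse-san-francisco?osq=Burgers')
soup = BeautifulSoup(r.text, 'html.parser')
# Review text sits in <p> tags whose class name contains "comment"
regex = re.compile('.*comment.*')
results = soup.find_all('p',{'class':regex})
# Keep only the text of each matched paragraph
reviews = [result.text for result in results]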

In [117]:
results[0].text

'Hands down the best burger and customer service. Good place for family outing. The orders do arrive little late but not that bad.'
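
To see how the class-based regex picks out review paragraphs without hitting the live site, the same `find_all` call can be run on a small inline HTML snippet. This is a sketch only: the markup and the class names (`comment__abc123`, `css-nav`, `css-1`) are made up for illustration and are not Yelp's actual page structure, which may change and break the selector.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded review page
html = """
<div>
  <p class="comment__abc123 css-1">Great burger.</p>
  <p class="css-nav">Write a review</p>
  <p class="comment__abc123 css-1">Slow service.</p>
  <p>No class attribute here</p>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Match any <p> with a class name containing "comment"
regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class': regex})
reviews = [result.text for result in results]

print(reviews)
```

If `reviews` comes back empty on the real page, the site has probably changed its class names or blocked the request, so checking `r.status_code` before parsing is a reasonable first step.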

### Load the reviews in a dataframe and then predict the sentiment

In [132]:
import numpy as np
import pandas as pd

In [133]:
df = pd.DataFrame(np.array(reviews), columns=['review'])

In [134]:
df

| | review |
|---|---|
| 0 | Hands down the best burger and customer servic... |
| 1 | The best wagyu burger I've had in SF. Tried b... |
| 2 | This review is strictly based on their burger ... |
| 3 | 5-stars all the way! I've been waiting for thi... |
| 4 | 4.5 stars--Came here on a Friday afternoon and... |
| 5 | Came by around 3:30pm on a Friday for the burg... |
| 6 | Came here for lunch on a Friday afternoon with... |
| 7 | Been to Niku a few times but finally got to tr... |
| 8 | An excellent spot to pick up high quality meat... |
| 9 | This review is solely for the burger and fries... |


In [135]:
df['review'].iloc[0]

'Hands down the best burger and customer service. Good place for family outing. The orders do arrive little late but not that bad.'

In [136]:
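# Note: without parentheses this only displays the bound method (which echoes the DataFrame);
# call df.describe() to get summary statistics instead.
df.describe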

<bound method NDFrame.describe of                                               review
0  Hands down the best burger and customer servic...
1  The best wagyu burger I've had in SF.  Tried b...
2  This review is strictly based on their burger ...
3  5-stars all the way! I've been waiting for thi...
4  4.5 stars--Came here on a Friday afternoon and...
5  Came by around 3:30pm on a Friday for the burg...
6  Came here for lunch on a Friday afternoon with...
7  Been to Niku a few times but finally got to tr...
8  An excellent spot to pick up high quality meat...
9  This review is solely for the burger and fries...>

In [137]:
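def sentiment_score_function(review):
    # Tokenize the review; truncation guards against BERT's 512-token input limit
    tokens = tokenizer.encode(review, return_tensors='pt', truncation=True, max_length=512)
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        result = model(tokens)
    # argmax gives a class index 0-4; add 1 to get the star rating 1-5
    return int(torch.argmax(result.logits)) + 1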

In [140]:
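# x[:512] keeps only the first 512 characters (not tokens) of each review as a rough guard against overly long inputs
df['sentiment_score'] = df['review'].apply(lambda x: sentiment_score_function(x[:512]))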

In [141]:
df

| | review | sentiment_score |
|---|---|---|
| 0 | Hands down the best burger and customer servic... | 4 |
| 1 | The best wagyu burger I've had in SF. Tried b... | 5 |
| 2 | This review is strictly based on their burger ... | 2 |
| 3 | 5-stars all the way! I've been waiting for thi... | 5 |
| 4 | 4.5 stars--Came here on a Friday afternoon and... | 4 |
| 5 | Came by around 3:30pm on a Friday for the burg... | 4 |
| 6 | Came here for lunch on a Friday afternoon with... | 5 |
| 7 | Been to Niku a few times but finally got to tr... | 4 |
| 8 | An excellent spot to pick up high quality meat... | 5 |
| 9 | This review is solely for the burger and fries... | 3 |
