Game Plan
1. Install Transformers
2. Perform Sentiment Scoring using BERT
3. Scrape reviews from Yelp and Score

How it Works
1. Download and install BERT from Hugging Face Transformers
2. Run sentiment analysis on reviews
3. Scrape reviews from Yelp and score

Install and Import Dependencies

In [None]:
# The tokenizer converts strings into a sequence of numbers (token IDs) that we can pass to our NLP model
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re

Instantiate Model

In [None]:
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

Encode and Calculate Sentiment

In [None]:
tokens=tokenizer.encode('Best place ever visited',return_tensors='pt')
# print(tokens)
# print(tokenizer.decode(tokens[0]))

In [None]:
result = model(tokens)
# print(result)

In [None]:
# argmax returns the index (0-4) of the highest logit; adding 1 maps it to a 1-5 star rating,
# so if index 0 has the highest value, the rating is 1
torch.argmax(result.logits)+1

tensor(5)
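
The raw logits can also be turned into per-star probabilities with a softmax, which shows how confident the model is rather than only which class wins. Below is a minimal sketch in plain Python. The logits vector is made up for illustration and is not real output from the model above.

```python
import math

# Hypothetical logits for the five classes (1 to 5 stars); illustrative values only.
logits = [-2.0, -1.5, 0.0, 1.5, 3.0]

def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)

# Same rule as torch.argmax(result.logits) + 1: class index 0..4 maps to 1..5 stars.
stars = max(range(len(logits)), key=lambda i: logits[i]) + 1

print(stars)
print([round(p, 3) for p in probs])
```

With the real model output, `torch.softmax(result.logits, dim=-1)` should give the equivalent probabilities as a tensor.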

Collect Reviews

In [None]:
r=requests.get('https://www.yelp.com/biz/social-brew-cafe-pyrmont') # fetch the HTML of the page
soup=BeautifulSoup(r.text,'html.parser') # parse the content as HTML
regex=re.compile('.*comment.*') # pattern matching any class name that contains "comment"
results=soup.find_all('p',{'class':regex}) # find all <p> tags whose class name matches the pattern
reviews=[result.text for result in results] # collect the text inside each matching tag
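
To see which elements the class filter would pick up, the pattern can be tried against class names directly. BeautifulSoup applies a compiled pattern with its `search()` method, so the leading and trailing `.*` are optional. The class names below are invented for illustration and are not taken from the actual Yelp page, whose class names are generated and may change.

```python
import re

# Hypothetical class names, made up for this example.
class_names = ['comment__abc123', 'css-xyz', 'user-comment-body', 'raw__abc123']

pattern_original = re.compile('.*comment.*')  # pattern used in the cell above
pattern_short = re.compile('comment')         # shorter pattern, same matches under search()

matched_original = [c for c in class_names if pattern_original.search(c)]
matched_short = [c for c in class_names if pattern_short.search(c)]

print(matched_original)
print(matched_original == matched_short)
```

If `reviews` comes back empty, checking the class names in the page source against the pattern this way is a quick first diagnostic.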

Load Reviews into DataFrame and Score

In [None]:
import numpy as np
import pandas as pd

In [None]:
df=pd.DataFrame(np.array(reviews),columns=['review'])
# To view a single row, print:
# df['review'].iloc[0]

In [None]:
def sentiment_score(review):
    # Split the review into chunks of 512 characters (not tokens).
    # The model accepts at most 512 tokens per input; a 512-character chunk of
    # ordinary text normally tokenizes to well under that, and truncation=True
    # below cuts off anything that does not.
    max_seq_length = 512
    chunks = [review[i:i+max_seq_length] for i in range(0, len(review), max_seq_length)]

    # Collect one star rating (1-5) per chunk
    scores = []

    # Process each chunk
    for chunk in chunks:
        tokens = tokenizer.encode(chunk, return_tensors='pt', max_length=max_seq_length, truncation=True)
        result = model(tokens)
        score = int(torch.argmax(result.logits)) + 1
        scores.append(score)

    # An empty review produces no chunks, so there is nothing to average
    if not scores:
        return None

    # Calculate the average sentiment score, rounded to a whole star
    # (Python's round() sends exact halves to the nearest even integer)
    avg_score = round(sum(scores) / len(scores))
    return avg_score
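
The function above slices the review every 512 characters and relies on truncation to respect the 512-token limit. An alternative is to chunk at the token level so that each piece fills the model's window. The sketch below shows only the chunking arithmetic on a list of integers standing in for token IDs; `chunk_token_ids` and its parameters are hypothetical names introduced here, not part of the Transformers API.

```python
def chunk_token_ids(token_ids, max_len=512, num_special=2):
    """Split a flat list of token IDs into pieces that fit the model limit.

    num_special reserves room for the [CLS] and [SEP] tokens that BERT-style
    tokenizers add around every input.
    """
    step = max_len - num_special
    return [token_ids[i:i + step] for i in range(0, len(token_ids), step)]

# Stand-in for real token IDs: 1200 dummy integers.
fake_ids = list(range(1200))
chunks = chunk_token_ids(fake_ids)

print([len(c) for c in chunks])  # [510, 510, 180]
```

With the real tokenizer, the IDs could come from `tokenizer.encode(review, add_special_tokens=False)`, and each chunk would then need the special tokens added back before being passed to the model.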

In [None]:
df['sentiment']=df['review'].apply(sentiment_score)

In [None]:
df

Unnamed: 0,review,sentiment
0,Very cute coffee shop and restaurant. They hav...,4
1,Six of us met here for breakfast before our wa...,4
2,"Great service, lovely location, and really ama...",5
3,Great place with delicious food and friendly s...,5
4,Some of the best Milkshakes me and my daughter...,5
5,Great food amazing coffee and tea. Short walk ...,5
6,It was ok. Had coffee with my friends. I'm new...,3
7,Ricotta hot cakes! These were so yummy. I ate ...,5
8,We came for brunch twice in our week-long visi...,4
9,Great staff and food. Must try is the pan fri...,5
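
As a quick check on the scored reviews, the ratings can be summarised by count and mean. The sketch below copies the sentiment column from the ten rows shown above into a plain list so it runs without the DataFrame.

```python
from collections import Counter

# Sentiment column copied from the ten rows shown above.
sentiments = [4, 4, 5, 5, 5, 5, 3, 5, 4, 5]

counts = Counter(sentiments)
mean_score = sum(sentiments) / len(sentiments)

print(sorted(counts.items()))  # [(3, 1), (4, 3), (5, 6)]
print(mean_score)              # 4.5
```

On the DataFrame itself, `df['sentiment'].value_counts()` and `df['sentiment'].mean()` give the same summary.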
