## Import Dependencies

In [49]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re
from scipy.special import softmax
import pandas as pd
import numpy as np

## Instantiate Model

In [2]:
MODEL = 'nlptown/bert-base-multilingual-uncased-sentiment'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)


## Encode and Calculate Sentiment

In [13]:
tokens = tokenizer.encode("I hated this, absolutely the worst", return_tensors="pt")
output = model(tokens)

In [26]:
output.logits  # raw, unnormalized scores (logits), one per star rating 1-5; the largest, 4.8750, sits at index 0, i.e. 1 star (most negative)

tensor([[ 4.8750,  1.7880, -0.8356, -3.0027, -2.0727]],
       grad_fn=<AddmmBackward0>)
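
The logits above are not probabilities yet; a softmax turns them into values that sum to 1. The following is a minimal sketch using only the standard library, with the logits copied by hand from the output above. The `softmax` helper here is written out for illustration; the `scipy.special.softmax` function imported earlier should give the same result when applied to the logits as a NumPy array.

```python
import math

# Logits copied from the output above (1 to 5 stars, left to right).
logits = [4.8750, 1.7880, -0.8356, -3.0027, -2.0727]

def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
stars = probs.index(max(probs)) + 1  # 0-based index -> 1-5 star rating

for star, p in enumerate(probs, start=1):
    print(f"{star} star(s): {p:.3f}")
print("Predicted rating:", stars)
```

Softmax preserves the ordering of the logits, so the predicted rating is the same as taking the argmax of the raw logits directly.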

In [31]:
result = int(torch.argmax(output.logits)) + 1  # argmax gives the 0-based index of the largest logit; adding 1 maps it to the 1-5 star scale
result

1

In [33]:
#Trying it with a different string
tokens = tokenizer.encode("This was amazing, I loved it. GREAT!", return_tensors = "pt")
output = model(tokens)

In [37]:
result = int(torch.argmax(output.logits)) + 1
result  # a score of 5 is the most positive rating

5

The score runs from 1 (most negative) to 5 (most positive): the higher the score, the more positive the sentiment.

## Collect Reviews

In [45]:
# Get reviews from Yelp
r = requests.get('https://www.yelp.com/biz/social-brew-cafe-pyrmont')  # fetch the business page

soup = BeautifulSoup(r.text, 'html.parser')  # parse the page's HTML

regex = re.compile('.*comment.*')  # matches any class name that contains "comment"

results = soup.find_all('p', {'class': regex})  # all <p> tags whose class matches the pattern

reviews = [result.text for result in results]  # keep only the text, dropping the HTML tags
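
To see what the class filter selects, the pattern can be tried on a few class names directly. This is a small sketch using only `re`; the class names below are made up for illustration and are not taken from the Yelp page. Beautiful Soup applies a compiled pattern with its `search()` method, so the same call is used here.

```python
import re

regex = re.compile('.*comment.*')

# Hypothetical class names, invented for this example.
class_names = ["comment__abc123", "raw__xyz comment", "css-heading", "user-passport"]

# Keep the names the pattern finds a match in.
matching = [name for name in class_names if regex.search(name)]
print(matching)

# With search(), the surrounding '.*' is redundant: 'comment' selects the same names.
simpler = re.compile('comment')
same = [name for name in class_names if simpler.search(name)]
print(matching == same)
```

Because the match is this broad, any `p` tag whose class merely contains "comment" is collected, which may pull in text other than reviews if the page layout changes.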

In [46]:
reviews

["Great coffee and vibe. That's all \xa0you need. Crab was outstanding but not good finger food like a taco should be. Really want to try the pork belly sandwich - looked excellent. This became my go to breakfast place in Darling harbor. Had the avocado salmon salad breakfast and it was excellent. Service has been excellent.",
 "Great coffee and vibe. That's all \xa0you need. Crab was outstanding but not good finger food like a taco should be. Really want to try the pork belly sandwich - looked excellent.",
 'Great food amazing coffee and tea. Short walk from the harbor. Staff was very friendly',
 'Great staff and food. \xa0Must try is the pan fried Gnocchi! \xa0The staff were really friendly and the coffee was good as well',
 "Ricotta hot cakes! These were so yummy. I ate them pretty fast and didn't share with anyone because they were that good ;). I ordered a green smoothie to balance it all out. Smoothie was a nice way to end my brekkie at this restaurant. Others with me ordered the
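
The scraped text above contains non-breaking spaces (`\xa0`) left over from the HTML. A possible cleanup step is sketched below; `clean_review` is a hypothetical helper name, not part of the original notebook, and the sample string is shortened from the first review.

```python
def clean_review(text):
    """Replace non-breaking spaces and collapse runs of whitespace into one space."""
    return " ".join(text.replace("\xa0", " ").split())

sample = "Great coffee and vibe. That's all \xa0you need."
print(clean_review(sample))
# prints: Great coffee and vibe. That's all you need.
```

Applied to the scraped list, this would be `reviews = [clean_review(r) for r in reviews]`.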

## Load Reviews into DataFrame and Score

In [65]:
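def polarity_scores_bert(example):
    """Return the predicted star rating (1-5) for a single review string."""
    encoded_text = tokenizer.encode(example, return_tensors="pt")
    output = model(encoded_text)
    result = int(torch.argmax(output.logits)) + 1  # 0-based index -> 1-5 scale
    return result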
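
The function above passes the full review to the model, so very long reviews can fail. One possible alternative, assuming the failures come from BERT's 512-token input limit, is to truncate at encoding time. The sketch below is not the notebook's method: `polarity_scores_bert_truncated` is a hypothetical name, and the tokenizer and model are passed in as arguments so the function does not depend on globals.

```python
def polarity_scores_bert_truncated(text, tokenizer, model, max_length=512):
    """Score a review from 1 to 5, truncating inputs longer than max_length tokens."""
    encoded_text = tokenizer.encode(
        text,
        return_tensors="pt",
        truncation=True,        # drop tokens beyond max_length
        max_length=max_length,  # assumed limit: 512 positions for this BERT model
    )
    output = model(encoded_text)
    # argmax gives the 0-based index of the largest logit; +1 maps it to 1-5 stars
    return int(output.logits.argmax()) + 1
```

With the objects defined earlier, a call would look like `polarity_scores_bert_truncated(review, tokenizer, model)`. The trade-off is that the score then reflects only the first part of a long review.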

In [60]:
df = pd.DataFrame(np.array(reviews), columns = ['Reviews'])
df = df.reset_index().rename(columns = {'index':'ID'})

In [77]:
df.shape

(11, 2)

In [73]:
bert_results_list = []

for _, row in df.iterrows():
    ID = row['ID']
    try:
        bert_results = polarity_scores_bert(row['Reviews'])
        bert_results_list.append(bert_results)
    except RuntimeError:
        # Raised when a review is too long for the model; no score is stored for it
        print('Broke for text: ',ID)

Broke for text:  8
Broke for text:  10


In [89]:
df = df.drop(index=[8, 10])  # drop the two rows whose text was too long to score

In [96]:
df['Scores'] = np.array(bert_results_list) #Storing Scores as a separate column in the dataframe
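
Dropping rows by hard-coded index works for this page, but the scores and rows only line up because the failed rows were removed by hand. A sketch of one alternative is to record a placeholder for each failure so the result always has one entry per review. `score_reviews` and `fake_scorer` are hypothetical names used for illustration; the stand-in scorer only imitates a model that fails on long input.

```python
def score_reviews(texts, scorer):
    """Return one score per text, using None where scoring raised RuntimeError."""
    scores = []
    for position, text in enumerate(texts):
        try:
            scores.append(scorer(text))
        except RuntimeError:
            print("Broke for text:", position)
            scores.append(None)  # placeholder keeps the list aligned with the input
    return scores

# Stand-in scorer for illustration only: fails on long inputs, otherwise returns 5.
def fake_scorer(text):
    if len(text) > 20:
        raise RuntimeError("input too long")
    return 5

print(score_reviews(["short review", "x" * 50, "another one"], fake_scorer))
# prints "Broke for text: 1" and then [5, None, 5]
```

In the notebook this could be used as `df['Scores'] = score_reviews(df['Reviews'], polarity_scores_bert)` before any rows are dropped, with the unscored rows removed afterwards via `df.dropna(subset=['Scores'])`.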

In [97]:
df

| | ID | Reviews | Scores |
|---|---|---|---|
| 0 | 0 | Great coffee and vibe. That's all you need. C... | 5 |
| 1 | 1 | Great coffee and vibe. That's all you need. C... | 4 |
| 2 | 2 | Great food amazing coffee and tea. Short walk ... | 5 |
| 3 | 3 | Great staff and food. Must try is the pan fri... | 5 |
| 4 | 4 | Ricotta hot cakes! These were so yummy. I ate ... | 5 |
| 5 | 5 | I came to Social brew cafe for brunch while ex... | 5 |
| 6 | 6 | We came for brunch twice in our week-long visi... | 5 |
| 7 | 7 | It was ok. The coffee wasn't the best but it w... | 3 |
| 9 | 9 | This place is a gem. The ambiance is to die fo... | 4 |
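
As a quick check on the results, the nine scores can be summarized with the standard library. This is a small sketch with the scores copied by hand from the table above; in the notebook itself, `df['Scores'].value_counts()` and `df['Scores'].mean()` would be the natural equivalents.

```python
from collections import Counter
from statistics import mean

# Scores copied from the table above, in row order.
scores = [5, 4, 5, 5, 5, 5, 5, 3, 4]

distribution = Counter(scores)        # how many reviews received each rating
average = round(mean(scores), 2)      # mean rating across the scored reviews

print(distribution)
print(average)
# prints Counter({5: 6, 4: 2, 3: 1}) and then 4.56
```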
