### Install and Import Dependencies

In [1]:
# pip found no torchvision wheel on the cu112 index ("No matching distribution found for torchvision").
# Use the index that matches your CUDA version (cu118 shown here as an example),
# or omit --index-url entirely for the default build.
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118


In [3]:
!pip install transformers requests beautifulsoup4 pandas numpy 

# transformers: the pretrained BERT sentiment model and its tokenizer
# requests: downloads the Yelp page
# beautifulsoup4: parses the returned HTML so we can extract only the review text
# pandas, numpy: store the reviews and their scores in a DataFrame

In [4]:
#import
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re




### Create an Instance of the Model

In [5]:
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')


### Calculate Sentiment

In [13]:
tokens = tokenizer.encode('I love this, awesome!', return_tensors='pt')

In [14]:
tokens[0]  # encode() returns a batch of shape (1, seq_len); [0] selects the token IDs of our single sentence

tensor([  101,   151, 11157, 10372,   117, 37079, 42279, 10688,   106,   102])

In [15]:
tokenizer.decode(tokens[0])

'[CLS] i love this, awesome! [SEP]'

In [16]:
result = model(tokens)
result  # raw logits: one unnormalized score per class (1 to 5 stars), not a one-hot vector

SequenceClassifierOutput(loss=None, logits=tensor([[-2.3948, -2.8835, -1.3411,  1.2283,  4.3962]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [17]:
# argmax returns the index (0-4) of the highest logit; +1 turns it into a 1-5 star rating
int(torch.argmax(result.logits)) + 1
# higher number = more positive sentiment, lower number = more negative sentiment
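The argmax step keeps only the winning class. A minimal sketch of what sits behind it, in plain Python so it runs without a model: softmax turns the logits printed above into probabilities, and argmax + 1 reproduces the rating of 5. The probability-weighted "expected rating" is an extra heuristic added here for illustration; the notebook itself does not compute it.

```python
import math

# Logits printed above for 'I love this, awesome!' (one per class: 1-5 stars)
logits = [-2.3948, -2.8835, -1.3411, 1.2283, 4.3962]


def softmax(xs):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)
stars = max(range(len(logits)), key=lambda i: logits[i]) + 1  # same as argmax + 1
expected = sum((i + 1) * p for i, p in enumerate(probs))      # probability-weighted rating

print(stars)  # 5
print([round(p, 3) for p in probs])
print(round(expected, 2))
```

With the notebook's tensors, `torch.softmax(result.logits, dim=-1)` gives the same probabilities directly.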

5

### Collect Reviews

In [41]:
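r = requests.get('https://www.yelp.com/biz/social-brew-cafe-pyrmont')  # fetch the business page
soup = BeautifulSoup(r.text, 'html.parser')                            # parse the HTML
# Review paragraphs carry a class containing "comment"; the generated suffixes
# (e.g. comment__09f24__D0cxf) look auto-generated and may change over time.
regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class': regex})   # every <p> whose class matches the pattern
reviews = [result.text for result in results]    # keep only the visible text of each review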

In [48]:
reviews[0]

"Very cute coffee shop and restaurant. They have a lovely outdoor seating area and several tables inside.  It was fairly busy on a Tuesday morning but we were to grab the last open table. The server was so enjoyable, she chatted and joked with us and provided fast service with our ordering, drinks and meals. The food was very good. We ordered a wide variety and every meal was good to delicious. The sweet potato fries on the Chicken Burger plate were absolutely delicious, some of the best I've ever had. I definitely enjoyed this cafe, the outdoor seating, the service and the food!!"

In [49]:
results[0]  # the raw tag: the review text is wrapped in a <p> with an inner <span>

<p class="comment__09f24__D0cxf css-qgunke"><span class="raw__09f24__T4Ezm" lang="en">Very cute coffee shop and restaurant. They have a lovely outdoor seating area and several tables inside.  It was fairly busy on a Tuesday morning but we were to grab the last open table. The server was so enjoyable, she chatted and joked with us and provided fast service with our ordering, drinks and meals. <br/><br/>The food was very good. We ordered a wide variety and every meal was good to delicious. The sweet potato fries on the Chicken Burger plate were absolutely delicious, some of the best I've ever had. <br/><br/>I definitely enjoyed this cafe, the outdoor seating, the service and the food!!</span></p>

In [46]:
# .text strips the tags and returns only the review text
results[0].text

"Very cute coffee shop and restaurant. They have a lovely outdoor seating area and several tables inside.  It was fairly busy on a Tuesday morning but we were to grab the last open table. The server was so enjoyable, she chatted and joked with us and provided fast service with our ordering, drinks and meals. The food was very good. We ordered a wide variety and every meal was good to delicious. The sweet potato fries on the Chicken Burger plate were absolutely delicious, some of the best I've ever had. I definitely enjoyed this cafe, the outdoor seating, the service and the food!!"
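To see what the `find_all` + `.text` steps above do without hitting Yelp, here is a stdlib-only sketch that swaps BeautifulSoup for Python's `html.parser`. The class `CommentTextParser` and the sample HTML are hypothetical, and matching the regex against the whole class attribute string is a simplification of BeautifulSoup's behaviour.

```python
import re
from html.parser import HTMLParser

# Same pattern the notebook passes to find_all
COMMENT_CLASS = re.compile(".*comment.*")


class CommentTextParser(HTMLParser):
    """Collect the text of each <p> whose class attribute matches COMMENT_CLASS."""

    def __init__(self):
        super().__init__()
        self.reviews = []         # one string per matching <p>
        self._in_comment = False  # True while inside a matching <p>
        self._chunks = []         # text pieces of the current review

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            cls = dict(attrs).get("class") or ""
            if COMMENT_CLASS.match(cls):
                self._in_comment = True
                self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_comment:
            # Join pieces without separators, like .text does across <br/> tags
            self.reviews.append("".join(self._chunks))
            self._in_comment = False

    def handle_data(self, data):
        if self._in_comment:
            self._chunks.append(data)


# Hypothetical snippet shaped like the Yelp markup shown above
html = (
    '<div><p class="comment__09f24__D0cxf css-qgunke">'
    '<span class="raw__09f24__T4Ezm" lang="en">Great food. <br/><br/>Friendly staff!</span></p>'
    '<p class="other">Not a review</p></div>'
)

parser = CommentTextParser()
parser.feed(html)
parser.close()
print(parser.reviews)  # ['Great food. Friendly staff!']
```

The `<br/>` tags contribute no text, which is why the scraped review reads "meals. The food" with only the space that preceded the break.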

### Load Reviews into a DataFrame and Score Them

In [52]:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array(reviews), columns = ['review'])
df

|   | review |
|---|--------|
| 0 | Very cute coffee shop and restaurant. They hav... |
| 1 | Six of us met here for breakfast before our wa... |
| 2 | Great service, lovely location, and really ama... |
| 3 | Great place with delicious food and friendly s... |
| 4 | Some of the best Milkshakes me and my daughter... |
| 5 | Great food amazing coffee and tea. Short walk ... |
| 6 | It was ok. Had coffee with my friends. I'm new... |
| 7 | Ricotta hot cakes! These were so yummy. I ate ... |
| 8 | We came for brunch twice in our week-long visi... |
| 9 | Great staff and food. Must try is the pan fri... |


In [53]:
df['review'].iloc[0]

"Very cute coffee shop and restaurant. They have a lovely outdoor seating area and several tables inside.  It was fairly busy on a Tuesday morning but we were to grab the last open table. The server was so enjoyable, she chatted and joked with us and provided fast service with our ordering, drinks and meals. The food was very good. We ordered a wide variety and every meal was good to delicious. The sweet potato fries on the Chicken Burger plate were absolutely delicious, some of the best I've ever had. I definitely enjoyed this cafe, the outdoor seating, the service and the food!!"

In [54]:
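def sentiment_score(review):
    """Return a 1-5 star rating for one review, reusing the tokenize/score steps above."""
    # truncation/max_length cap the input at BERT's 512-token limit
    tokens = tokenizer.encode(review, return_tensors='pt', truncation=True, max_length=512)
    with torch.no_grad():  # inference only, no gradients needed
        result = model(tokens)
    return int(torch.argmax(result.logits)) + 1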

In [55]:
sentiment_score(df['review'].iloc[0])

5

In [59]:
sentiment_score(df['review'].iloc[6])

3

In [64]:
# Score every review in the DataFrame at once: apply() calls sentiment_score on each review.
# x[:512] keeps the first 512 characters, a rough way to stay under the model's
# 512-token input limit (slicing a string counts characters, not tokens).
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512]))
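The `[:512]` slice above simply discards the rest of a long review. One alternative, sketched here under assumptions: split the review's token IDs into windows that fit the model, score each window, and average the ratings. The helper names are hypothetical, the `[CLS]`/`[SEP]` IDs (101, 102) are taken from the encoded sentence printed earlier, and averaging window ratings is a heuristic, not the notebook's method. The offline check uses fake IDs and a stand-in scorer so no model is needed.

```python
from typing import Callable, List, Sequence

# IDs from the encoded sentence printed earlier: [CLS] = 101, [SEP] = 102
CLS_ID, SEP_ID = 101, 102
MAX_LEN = 512  # BERT's input limit, special tokens included


def chunk_token_ids(ids: Sequence[int], max_len: int = MAX_LEN) -> List[List[int]]:
    """Split content token IDs into windows of at most max_len IDs, each wrapped in [CLS] ... [SEP]."""
    body = max_len - 2  # leave room for the two special tokens
    chunks = [[CLS_ID, *ids[i:i + body], SEP_ID] for i in range(0, len(ids), body)]
    return chunks or [[CLS_ID, SEP_ID]]  # empty review -> one empty window


def average_score(ids: Sequence[int], score_chunk: Callable[[List[int]], int],
                  max_len: int = MAX_LEN) -> float:
    """Score every window with score_chunk and average the star ratings."""
    chunks = chunk_token_ids(ids, max_len)
    return sum(score_chunk(c) for c in chunks) / len(chunks)


# Offline check with fake IDs and a stand-in scorer
fake_ids = list(range(1000, 2200))  # 1200 content tokens
windows = chunk_token_ids(fake_ids)
print([len(w) for w in windows])    # [512, 512, 182]
print(average_score(fake_ids, lambda w: 5 if len(w) == MAX_LEN else 2))  # 4.0
```

With the notebook's objects, `tokenizer.encode(review, add_special_tokens=False)` would supply the content IDs, and a scorer such as `lambda w: int(torch.argmax(model(torch.tensor([w])).logits)) + 1` would rate each window.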

In [65]:
df

|   | review | sentiment |
|---|--------|-----------|
| 0 | Very cute coffee shop and restaurant. They hav... | 4 |
| 1 | Six of us met here for breakfast before our wa... | 4 |
| 2 | Great service, lovely location, and really ama... | 5 |
| 3 | Great place with delicious food and friendly s... | 5 |
| 4 | Some of the best Milkshakes me and my daughter... | 5 |
| 5 | Great food amazing coffee and tea. Short walk ... | 5 |
| 6 | It was ok. Had coffee with my friends. I'm new... | 3 |
| 7 | Ricotta hot cakes! These were so yummy. I ate ... | 5 |
| 8 | We came for brunch twice in our week-long visi... | 4 |
| 9 | Great staff and food. Must try is the pan fri... | 5 |
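Once every review has a score, pandas can summarize the column. A minimal sketch, assuming you want the mean rating and the number of reviews per star level; the scores below are copied from the table above so the block runs on its own without re-scraping.

```python
import pandas as pd

# Sentiment scores copied from the table above (rows 0-9)
df = pd.DataFrame({"sentiment": [4, 4, 5, 5, 5, 5, 3, 5, 4, 5]})

mean_rating = df["sentiment"].mean()                        # average predicted stars
distribution = df["sentiment"].value_counts().sort_index()  # stars -> number of reviews

print(f"mean predicted rating: {mean_rating:.2f}")  # mean predicted rating: 4.50
print(distribution.to_dict())                       # {3: 1, 4: 3, 5: 6}
```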
