1. Install and Import Dependencies

In [2]:
!pip install transformers torch requests beautifulsoup4 pandas numpy

Defaulting to user installation because normal site-packages is not writeable
Collecting transformers
  Using cached transformers-4.42.3-py3-none-any.whl (9.3 MB)
Collecting huggingface-hub<1.0,>=0.23.2
  Downloading huggingface_hub-0.23.4-py3-none-any.whl (402 kB)
     -------------------------------------- 402.6/402.6 kB 1.0 MB/s eta 0:00:00
Collecting regex!=2019.12.17
  Downloading regex-2024.5.15-cp311-cp311-win_amd64.whl (268 kB)
     -------------------------------------- 269.0/269.0 kB 2.8 MB/s eta 0:00:00
Collecting safetensors>=0.4.1
  Downloading safetensors-0.4.3-cp311-none-win_amd64.whl (287 kB)
     ------------------------------------- 287.3/287.3 kB 17.3 MB/s eta 0:00:00
Collecting tokenizers<0.20,>=0.19
  Downloading tokenizers-0.19.1-cp311-none-win_amd64.whl (2.2 MB)
     ---------------------------------------- 2.2/2.2 MB 2.1 MB/s eta 0:00:00
Installing collected packages: safetensors, regex, huggingface-hub, tokenizers, transformers
Successfully installed huggingfac



In [1]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re

2. Instantiate Model

In [2]:
# Load the tokenizer and the classifier from the same pretrained checkpoint.
model_name = 'nlptown/bert-base-multilingual-uncased-sentiment'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

3. Encode and Calculate Sentiment

In [4]:
# Convert the text to token IDs, returned as a PyTorch tensor.
tokens = tokenizer.encode("This is very bad, there still might be a chance", return_tensors='pt')


In [5]:
results=model(tokens)

In [6]:
results
# The value to look at is logits=tensor(...), which holds five scores at indices 0 to 4.
# Each index maps to a star rating from 1 to 5, and the index with the largest score is the predicted rating.

SequenceClassifierOutput(loss=None, logits=tensor([[ 3.0261,  2.2586,  0.7360, -1.9903, -3.2110]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [7]:
results.logits

tensor([[ 3.0261,  2.2586,  0.7360, -1.9903, -3.2110]],
       grad_fn=<AddmmBackward0>)

In [8]:
# Ratings run from 1 (worst) to 5 (best); argmax returns a 0-based index, so add 1.
int(torch.argmax(results.logits)) + 1

1
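
The raw logits are unnormalized scores, so it can help to see them as probabilities. The following is a minimal sketch, using only the standard library, that applies a softmax to the logits printed above and picks the rating the same way the argmax cell does. The helper name `softmax` is introduced here for illustration and is not part of the notebook.

```python
import math

# Logits copied from the output above, one value per star rating (1 to 5).
logits = [3.0261, 2.2586, 0.7360, -1.9903, -3.2110]

def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
rating = probs.index(max(probs)) + 1  # index 0 corresponds to 1 star

for stars, p in enumerate(probs, start=1):
    print(f"{stars} star(s): {p:.3f}")
print("predicted rating:", rating)
```

Softmax preserves the ordering of the scores, so the predicted rating is the same whether argmax is taken over the logits or over the probabilities.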

4. Collect Reviews

In [9]:
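# Download the business page and parse the HTML.
r = requests.get('https://www.yelp.com/biz/mejico-sydney-2')
soup = BeautifulSoup(r.text, 'html.parser')
# Review text sits in <p> tags whose class name contains "comment".
regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class': regex})
# Keep only the text of each review.
reviews = [result.text for result in results]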
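
Yelp's class names carry generated suffixes such as `comment__09f24__D0cxf`, which is presumably why the cell above matches by pattern rather than by exact name. The sketch below uses only the `re` module to show which class names the pattern accepts. It assumes that BeautifulSoup checks the compiled pattern against each class name with a search, and the name `sidebar-title` is made up for contrast.

```python
import re

# Same pattern as in the scraping cell.
regex = re.compile(".*comment.*")

# Class names as they appear in the scraped markup, plus one invented name.
class_names = [
    "comment__09f24__D0cxf",
    "y-css-h9c2fl",
    "raw__09f24__T4Ezm",
    "sidebar-title",  # hypothetical, not taken from the page
]

# Keep the names in which the pattern is found.
matches = [name for name in class_names if regex.search(name)]
print(matches)
```

Because the suffixes are generated, they may change between page builds, so matching on the stable `comment` prefix is likely more durable than hard-coding the full class name.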

In [10]:
results

[<p class="comment__09f24__D0cxf y-css-h9c2fl"><span class="raw__09f24__T4Ezm" lang="en">Seated without a booking on a super busy Saturday night. Lovely, warm, and Theo right hostess also looked after our table and went out of her way to give detailed ingredients in every dish to avoid allergies for one of us. And the food was great! Guacamole made right at our table, everything prepared with our allergies in mind, and great dish recommendations. We'd been visiting Sydney for about a week from Melbourne, and this was by far our best dining experience. I'd definitely return here in the future.</span></p>,
 <p class="comment__09f24__D0cxf y-css-h9c2fl"><span class="raw__09f24__T4Ezm" lang="en">The food was decent not great..  We had the guacamole which was bland and came with some type of plantain chips.. The chicken and steak tacos were good.. But the service was poor. We had a waitress with an attitude. She seemed upset whenever we asked for anything.  She would walk by and just stick 

In [11]:
results[0]

<p class="comment__09f24__D0cxf y-css-h9c2fl"><span class="raw__09f24__T4Ezm" lang="en">Seated without a booking on a super busy Saturday night. Lovely, warm, and Theo right hostess also looked after our table and went out of her way to give detailed ingredients in every dish to avoid allergies for one of us. And the food was great! Guacamole made right at our table, everything prepared with our allergies in mind, and great dish recommendations. We'd been visiting Sydney for about a week from Melbourne, and this was by far our best dining experience. I'd definitely return here in the future.</span></p>

5. Load Reviews into Dataframe and Score

In [12]:
import numpy as np
import pandas as pd

In [13]:
df = pd.DataFrame(np.array(reviews), columns=['review'])

In [18]:
df.head()

                                              review
0  Seated without a booking on a super busy Satur...
1  The food was decent not great..  We had the gu...
2  Food was okay, guacamole was below average. Se...
3  The food and service here was really good.  It...
4  Visiting from Texas and decided to give this r...


In [19]:
df.tail()

                                              review
5  Don't come here expecting legit Mexican food b...
6  Out of all the restaurants that I tried in Syd...
7  Great atmosphere, attentive service, solid mar...
8  We came here on a Thursday night @ 5pm and by ...
9  The food is fresh and tasty.  The scallop cevi...


In [23]:
df['review']

0    Seated without a booking on a super busy Satur...
1    The food was decent not great..  We had the gu...
2    Food was okay, guacamole was below average. Se...
3    The food and service here was really good.  It...
4    Visiting from Texas and decided to give this r...
5    Don't come here expecting legit Mexican food b...
6    Out of all the restaurants that I tried in Syd...
7    Great atmosphere, attentive service, solid mar...
8    We came here on a Thursday night @ 5pm and by ...
9    The food is fresh and tasty.  The scallop cevi...
Name: review, dtype: object

In [14]:
df['review'].iloc[0]

"Seated without a booking on a super busy Saturday night. Lovely, warm, and Theo right hostess also looked after our table and went out of her way to give detailed ingredients in every dish to avoid allergies for one of us. And the food was great! Guacamole made right at our table, everything prepared with our allergies in mind, and great dish recommendations. We'd been visiting Sydney for about a week from Melbourne, and this was by far our best dining experience. I'd definitely return here in the future."

In [15]:
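def sentiment_score(review):
    """Return the predicted star rating (1 to 5) for a review string."""
    tokens = tokenizer.encode(review, return_tensors='pt')
    results = model(tokens)
    # argmax gives a 0-based index, so add 1 to get the star rating.
    return int(torch.argmax(results.logits)) + 1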


In [20]:
sentiment_score(df['review'].iloc[3])

5

In [22]:
sentiment_score(df['review'].iloc[5])

3

In [24]:
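# Slice each review to its first 512 characters, a rough way to keep the input within the model's 512-token limit.
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512]))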

In [25]:
df

                                              review  sentiment
0  Seated without a booking on a super busy Satur...          5
1  The food was decent not great..  We had the gu...          2
2  Food was okay, guacamole was below average. Se...          2
3  The food and service here was really good.  It...          5
4  Visiting from Texas and decided to give this r...          5
5  Don't come here expecting legit Mexican food b...          3
6  Out of all the restaurants that I tried in Syd...          5
7  Great atmosphere, attentive service, solid mar...          3
8  We came here on a Thursday night @ 5pm and by ...          4
9  The food is fresh and tasty.  The scallop cevi...          4
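
With a rating per review, a quick summary gives an overall picture of the business. This is a small sketch using only the standard library, with the ten ratings copied from the table above. In the notebook itself the same numbers would come from `df['sentiment']`.

```python
from collections import Counter
from statistics import mean

# Ratings copied from the sentiment column above.
sentiments = [5, 2, 2, 5, 5, 3, 5, 3, 4, 4]

average = mean(sentiments)
distribution = Counter(sentiments)

print(f"average rating: {average:.1f}")
for stars in range(1, 6):
    # Counter returns 0 for ratings that never occur.
    print(f"{stars} star(s): {distribution[stars]} review(s)")
```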


6. Try This on Another Business

In [27]:
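# Note: this is the same Mejico URL used in step 4, but the output below shows
# reviews for a different business; substitute that business's Yelp URL here.
r = requests.get('https://www.yelp.com/biz/mejico-sydney-2')
soup = BeautifulSoup(r.text, 'html.parser')
regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class': regex})
reviews = [result.text for result in results]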

In [28]:
results

[<p class="comment__09f24__D0cxf y-css-h9c2fl"><span class="raw__09f24__T4Ezm" lang="en">Good Baklava's primarily because it's pistachio based meaning Turkish as opposed to walnuts meaning Greek. I like both but I'm partial to the Turkish ones because the Greek ones often swim in honey syrup and it's just  too much for me.</span></p>,
 <p class="comment__09f24__D0cxf y-css-h9c2fl"><span class="raw__09f24__T4Ezm" lang="en">Ordered a beef kebab from here. The kebab was over stuffed with beef and tasted very good. I really liked the flavor of the bbq sauce. The only downside was the bread was a bit soggy from all the tomatoes and bbq sauce and was falling apart.  The service could also be a little friendlier. They weren't very happy answer questions about their food.  They also server other items like gozleme, pizza, and pide.</span></p>,
 <p class="comment__09f24__D0cxf y-css-h9c2fl"><span class="raw__09f24__T4Ezm" lang="en">The Kebab here is the best. I love it. Having extra cheese and 

In [29]:
df = pd.DataFrame(np.array(reviews), columns=['review'])

In [30]:
df['review']

0    Good Baklava's primarily because it's pistachi...
1    Ordered a beef kebab from here. The kebab was ...
2    The Kebab here is the best. I love it. Having ...
3    I discovered this food court while waiting to ...
4    What kind of food court exists without a decen...
5    This is gonna sound crazy. We went to Sydney i...
6    Found this place after a flight to Sydney and ...
Name: review, dtype: object

In [31]:
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512]))

In [32]:
df

                                              review  sentiment
0  Good Baklava's primarily because it's pistachi...          4
1  Ordered a beef kebab from here. The kebab was ...          4
2  The Kebab here is the best. I love it. Having ...          5
3  I discovered this food court while waiting to ...          2
4  What kind of food court exists without a decen...          4
5  This is gonna sound crazy. We went to Sydney i...          1
6  Found this place after a flight to Sydney and ...          5
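
Since step 6 repeats the cells from steps 4 and 5, the scoring part could be wrapped in a small helper. The sketch below is one possible shape, not the notebook's own code: `score_reviews` and `fake_scorer` are hypothetical names, and the stand-in scorer exists only so the example runs without downloading the model. In the notebook you would pass `sentiment_score` as the scorer.

```python
from typing import Callable, Dict, List

def score_reviews(reviews: List[str],
                  scorer: Callable[[str], int],
                  max_chars: int = 512) -> List[Dict[str, object]]:
    """Score each review after trimming it to max_chars characters."""
    return [
        {"review": text, "sentiment": scorer(text[:max_chars])}
        for text in reviews
    ]

def fake_scorer(text: str) -> int:
    """Stand-in for sentiment_score; returns a made-up rating."""
    return 5 if "best" in text else 3

# Two sentences taken from the reviews scraped above.
rows = score_reviews(
    [
        "The Kebab here is the best. I love it.",
        "The service could also be a little friendlier.",
    ],
    fake_scorer,
)
print(rows)
```

Passing the scorer as an argument keeps the helper independent of the model, so the list of rows can be handed straight to `pd.DataFrame(rows)` for any business page.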
