In [None]:
!pip3 install torch

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


**Step 1: Install and Import Dependencies**

In [None]:
!pip3 install transformers requests beautifulsoup4 pandas numpy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.21.1-py3-none-any.whl (4.7 MB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.8.1-py3-none-any.whl (101 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.8.1 tokenizers-0.12.1 transformers-4.21.1



*   `transformers` provides the NLP model: a multilingual BERT model, fine-tuned for sentiment analysis, from Hugging Face.
*   `requests` fetches the review page from the Yelp site.
*   `beautifulsoup4` parses that page so we can pull out the reviews.
*   `pandas` stores the data in a format that is easy to work with.
*   `numpy` provides the array used to build the pandas DataFrame.


In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re



*   `AutoTokenizer` converts a string into a sequence of numbers (token IDs) that can be passed to an NLP model.
*   `AutoModelForSequenceClassification` provides the architecture used to load the NLP model.
*   `torch` provides the `argmax` function, used to pick the highest-scoring result.
*   `BeautifulSoup` extracts the data we actually need from the web page: the reviews.
*   `re` (regular expressions) matches the specific comment elements we want.






**Step 2: Instantiate Model**

In [None]:
model_name = 'nlptown/bert-base-multilingual-uncased-sentiment'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)




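*   `from_pretrained` loads a previously trained model from the Hugging Face Hub (downloading it on first use): the model's configuration and weights, or, for the tokenizer, its matching vocabulary.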




**Step 3: Encode and Calculate Sentiment**

In [None]:
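tokens = tokenizer.encode('I hated this, absolutely the worst', return_tensors='pt')  # string -> tensor of token IDs
#tokenizer.decode(tokens[0])  # uncomment to turn the IDs back into text
result = model(tokens)  # run the classifier on the encoded text
#result.logits  # uncomment to inspect the raw scores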



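*   Pass the string to the tokenizer, then pass the resulting tokens to the model to do the classification.
*   `tokenizer.encode()` converts the string into a sequence of numbers (token IDs).
1.   `tokenizer.decode()` converts the token IDs back into a string (lowercased, since the model is uncased, and including special tokens such as `[CLS]` and `[SEP]`).
2.   `return_tensors='pt'` makes `encode` return a PyTorch tensor instead of a plain Python list.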
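To make the encode/decode round trip concrete, here is a toy tokenizer in plain Python. It is only an illustration under stated assumptions: the vocabulary and its IDs are made up, and the real model uses a WordPiece vocabulary (`vocab.txt`) that splits words into sub-word pieces. What it does share with the real tokenizer is the overall shape: lowercase the text (the model is "uncased"), map each piece to an ID, and wrap the sequence in `[CLS]` ... `[SEP]`.

```python
import re

# Hypothetical toy vocabulary with made-up IDs (NOT the real model's vocab.txt)
VOCAB = {"[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "i": 4, "hated": 5, "this": 6,
         ",": 7, "absolutely": 8, "the": 9, "worst": 10}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}


def toy_encode(text):
    """Lowercase, split into words/punctuation, map to IDs, add [CLS]/[SEP]."""
    pieces = re.findall(r"\w+|[^\w\s]", text.lower())
    ids = [VOCAB.get(p, VOCAB["[UNK]"]) for p in pieces]  # unknown words -> [UNK]
    return [VOCAB["[CLS]"]] + ids + [VOCAB["[SEP]"]]


def toy_decode(ids):
    """Map IDs back to tokens and re-join them into a string."""
    text = " ".join(ID_TO_TOKEN[i] for i in ids)
    return re.sub(r" ([,.!?])", r"\1", text)  # drop the space before punctuation


ids = toy_encode("I hated this, absolutely the worst")
print(ids)              # [2, 4, 5, 6, 7, 8, 9, 10, 3]
print(toy_decode(ids))  # [CLS] i hated this, absolutely the worst [SEP]
```

Running `tokenizer.decode(tokens[0])` in the notebook should show a similar lowercased string with the special tokens added, although the real token IDs will differ.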


*  To perform sentiment analysis, pass the tokens to the model.
*  The model outputs a list of five raw scores (logits), one per star rating; this is not a one-hot encoding. The position with the highest score represents the sentiment rating, e.g. [.9, .2, .1, -.2, -.5] is a rating of 1.

*   Output of `result.logits`: `tensor([[ 4.8750,  1.7880, -0.8356, -3.0027, -2.0727]], grad_fn=<AddmmBackward0>)`
*   Here the highest score is 4.8750, at index 0, so the sentiment rating is 1.
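The "highest position plus one" rule can be checked in plain Python on the logits shown above. `stars_from_logits` mirrors what `int(torch.argmax(result.logits)) + 1` does for a single review. The `softmax` helper is an optional extra not used in this notebook: it turns the raw logits into probabilities, which can serve as a rough confidence for the chosen rating. Both helper names are hypothetical.

```python
import math

# Logits copied from the result.logits output above (one score per star rating)
logits = [4.8750, 1.7880, -0.8356, -3.0027, -2.0727]


def stars_from_logits(scores):
    """Index of the highest score (what torch.argmax returns) plus 1 -> 1..5 stars."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1


def softmax(scores):
    """Turn raw logits into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(stars_from_logits(logits))  # 1
probs = softmax(logits)
print(round(probs[0], 2))         # the 1-star probability, roughly 0.95
```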




In [None]:
int(torch.argmax(result.logits))+1

1



1.   `torch.argmax` returns the index of the highest value in the result, and that index determines the sentiment score.
2.   `+1` is added to `torch.argmax()` because the index starts from 0.


1.   An output value of 1 means the sentiment score is 1.
2.   The sentiment score ranges from 1 to 5 (indices 0 to 4).

*   The higher the number, the better the sentiment.
*   The lower the number, the worse the sentiment.







**Step 4: Collect Reviews**

In [None]:
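r = requests.get('https://www.yelp.com/biz/mejico-sydney-2')  # fetch the review page
soup = BeautifulSoup(r.text, 'html.parser')  # parse the page's HTML
regex = re.compile('.*comment.*')  # any class name containing "comment"
results = soup.find_all('p', {'class': regex})  # every <p> whose class matches
reviews = [result.text for result in results]  # keep only the text of each review
#r.text
#soup
#results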



*   First, request the Yelp page using the `requests` library. Each review is stored in a `<p>` tag whose class name contains the word "comment", so a regex is used to extract all of those elements.
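To see which class names the pattern `'.*comment.*'` actually picks up, here is a quick check with Python's `re` module. The class names are invented for illustration; inspect the real page to find the actual ones. BeautifulSoup tests a compiled regex with `search`, so a match anywhere in the class name counts, not only at the start.

```python
import re

regex = re.compile('.*comment.*')  # same pattern as in the scraping cell

# Hypothetical class names, made up for this illustration
class_names = ["comment", "comment__abc123", "user-comment-text", "review-body", "css-xyz"]

# search() succeeds if "comment" appears anywhere in the name
matching = [name for name in class_names if regex.search(name)]
print(matching)  # ['comment', 'comment__abc123', 'user-comment-text']

# To require the class to *start* with "comment", anchor the pattern instead
starts_with = re.compile('^comment')
prefixed = [name for name in class_names if starts_with.search(name)]
print(prefixed)  # ['comment', 'comment__abc123']
```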





*   `requests.get` uses the requests library to fetch the web page; the response also carries a status code.
*   `r.text` gets the HTML text of that web page.
*   `r.text` is passed to BeautifulSoup, with the parser set to `'html.parser'`.
*   Next, extract the specific components we want from the page. Each review sits in a `<p>` tag whose class contains "comment" (e.g. `<p class='comment'>`), so we compile a regex that matches any class containing "comment".
*   The regex is then passed to BeautifulSoup.
*   `soup.find_all` returns every tag that matches the given filter: here, `<p>` tags whose class matches the regex.
*   `results` still has each review wrapped in its HTML `<p>` tag, so we need to keep only the text part.
*   `results[0].text` returns just the text inside the first `<p>` tag.
*   The list comprehension takes the text of every result and stores it in a list.
*   `reviews` holds only the review text, with no `<p>` tags.
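The same extraction can be sketched with Python's built-in `html.parser` module instead of BeautifulSoup, which makes each step explicit: watch for a `<p>` start tag whose class contains "comment", collect the text inside it (including text in nested tags such as `<span>`), and stop at the closing `</p>`. This is a swapped-in technique for illustration only; the class `CommentParagraphParser` is hypothetical, and the sample HTML and its class names are invented.

```python
import re
from html.parser import HTMLParser

COMMENT_CLASS = re.compile('.*comment.*')  # same pattern as in the scraping cell


class CommentParagraphParser(HTMLParser):
    """Collect the text of every <p> whose class attribute contains 'comment'."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._in_comment = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            classes = dict(attrs).get("class") or ""  # attribute may be missing or empty
            if COMMENT_CLASS.search(classes):
                self._in_comment = True
                self._buffer = []

    def handle_data(self, data):
        if self._in_comment:  # keep text only while inside a matching <p>
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_comment:
            self.reviews.append("".join(self._buffer).strip())
            self._in_comment = False


# Invented sample markup standing in for r.text
sample_html = """
<div>
  <p class="comment__abc css-1"><span lang="en">Great tacos!</span></p>
  <p class="css-other">Not a review</p>
  <p class="comment__xyz"><span>Slow service.</span></p>
</div>
"""

parser = CommentParagraphParser()
parser.feed(sample_html)
parser.close()
print(parser.reviews)  # ['Great tacos!', 'Slow service.']
```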


*   Scraping with BS4: BeautifulSoup can extract just about any element from an HTML page.
*   Pay attention to the elements you're trying to extract if using a different site.










**Step 5: Load Reviews into a DataFrame and Score**

In [None]:
import pandas as pd
import numpy as np
df=pd.DataFrame(np.array(reviews),columns=['review'])
#df.head()
#df.tail()
df['review'].iloc[0]

'The food is fresh and tasty. \xa0The scallop ceviche started the lunch. The scallops were tender with a great acidity and use of mango and peppers. The steak was tender and I got the hint of tequila in the sauce. I enjoyed a watermelon salad that complimented the the steak. The portions are good, but a stretch if you are sharing. My only down point is the service. They really only showed up to present my next plate and never checked to see if I wanted another drink (which I did).Enjoyed the food.'



*   A DataFrame makes it easy to go through and process the reviews.
*   The list of reviews is wrapped in a NumPy array before building the DataFrame (pandas would also accept the plain list).


*   `columns=['review']` specifies the column name.
*   `df.head()` shows the first five rows.


*   `df.tail()` shows the last five rows.
*   `df['review'].iloc[0]` grabs the first review (position 0).

*   To pass every review to the model and get its sentiment, wrap the scoring steps in a function.









In [None]:
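def sentiment_score(review):
  # Encode the review; truncation keeps it within the model's 512-token limit
  tokens = tokenizer.encode(review, return_tensors='pt', truncation=True, max_length=512)
  with torch.no_grad():  # inference only, so no gradients are needed
    result = model(tokens)
  # Index of the highest logit (0-4) shifted to a 1-5 rating
  return int(torch.argmax(result.logits)) + 1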



*   Sentiment function: encapsulating the sentiment pipeline in a function makes it easier to process multiple strings. Next, we'll use it on each review in the DataFrame.


*   Use `apply` with a `lambda` to run `sentiment_score` on each review and store each result in a new DataFrame column.





In [None]:
sentiment_score(df['review'].iloc[0])   # score just the first review

4

In [None]:
df['sentiment']=df['review'].apply(lambda x: sentiment_score(x[:512]))



*   `df['review']` extracts the review column.
*   `apply` loops through each review in the column.
*   In `lambda x`, the variable `x` holds each individual review, which is passed to `sentiment_score`.
*   `x[:512]`: the model can only process a limited amount of text at a time, at most 512 tokens. Note that `x[:512]` keeps the first 512 *characters* of each review, not tokens, so it only approximates that limit; passing `truncation=True, max_length=512` to `tokenizer.encode` enforces the limit on tokens.
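`apply` with a lambda behaves like calling the function once per review in a loop. The sketch below writes that loop out in plain Python, with a hypothetical `stand_in_score` in place of `sentiment_score` (it simply returns the length of the text it receives) so the effect of the `x[:512]` slice is visible without downloading the model. The two sample reviews are made up.

```python
MAX_CHARS = 512  # same cut-off as x[:512] in the cell above


def stand_in_score(review):
    """Hypothetical placeholder for sentiment_score: returns the length of its input."""
    return len(review)


# Made-up reviews: one short, one far longer than the cut-off
reviews = ["short review", "x" * 2000]

# Equivalent of df['review'].apply(lambda x: sentiment_score(x[:512])), as a loop
scores = [stand_in_score(x[:MAX_CHARS]) for x in reviews]
print(scores)  # [12, 512]
```

The long review reaches the scorer as exactly 512 characters, which is why a token-based truncation inside the tokenizer is the safer guard against the model's limit.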





In [None]:
df                   #sentiment score for each review fully calculated. 

|   | review | sentiment |
|---|---|---|
| 0 | The food is fresh and tasty. The scallop cevi... | 4 |
| 1 | Don't come here expecting legit Mexican food b... | 3 |
| 2 | Out of all the restaurants that I tried in Syd... | 5 |
| 3 | We came here on a Thursday night @ 5pm and by ... | 4 |
| 4 | I was pleasantly surprised at what a great job... | 5 |
| 5 | Have been here twice and have absolutely loved... | 5 |
| 6 | Really nice (upmarket) Mexican restaurant. Goo... | 4 |
| 7 | If you're looking for a quiet little romantic ... | 2 |
| 8 | The service at this place was top notch - the ... | 5 |
| 9 | Ordered feed me for $59 along with that.. Food... | 2 |
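With a score for every review, a quick summary shows how the restaurant fares overall. This sketch copies the ten scores from the table above into a plain list so it runs without the model; in the notebook itself, `df['sentiment'].mean()` and `df['sentiment'].value_counts()` give the same information.

```python
from collections import Counter
from statistics import mean

# Sentiment column copied from the DataFrame output above (rows 0-9)
sentiments = [4, 3, 5, 4, 5, 5, 4, 2, 5, 2]

print(mean(sentiments))                      # 3.9
print(sorted(Counter(sentiments).items()))   # [(2, 2), (3, 1), (4, 3), (5, 4)]
```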
