# **BERT Sentiment Score Model: scores range from 1 (bad) to 5 (excellent)**

# **Installing PyTorch, Transformers, requests, beautifulsoup4, pandas, numpy**

**Transformers** is the Hugging Face library used to download and run pretrained NLP models such as BERT. It provides thousands of pretrained models for text tasks such as classification, information extraction, question answering, summarization, translation, and text generation in 100+ languages. Its aim is to make cutting-edge NLP easier to use for everyone.

**Requests** fetches the web page that contains the reviews we want to analyze.

**beautifulsoup4** (imported as `bs4`) parses the page's HTML so we can extract the data we need from it.

In [None]:
!pip install torch torchvision torchaudio

**Re-run the step below every time the runtime restarts:**

In [None]:
!pip install transformers requests beautifulsoup4 pandas numpy

# **Importing the required libraries**

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
import numpy as np

In [None]:
# Tokenizer
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
# Model
model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

# **Checking the model on test sentence:**

In [None]:
# This gives a tensor of encoded token IDs for our sentence:
tokens = tokenizer.encode('It was good but couldve been better. Great', return_tensors='pt')

In [None]:
# Inspect the encoded token IDs:
tokens[0]

tensor([  101, 10197, 10140, 12050, 10502, 12296, 10598, 10662, 16197,   119,
        11838,   102])

In [None]:
# Decode the token IDs back into text:
tokenizer.decode(tokens[0])

'[CLS] it was good but couldve been better. great [SEP]'

In [None]:
# Run the BERT model:
result = model(tokens)

In [None]:
result

# The position (0 to 4) of the highest logit is the predicted sentiment class; adding 1 converts it to the 1-5 score

SequenceClassifierOutput([('logits',
                           tensor([[-2.7768, -1.2353,  1.4419,  1.9804,  0.4584]],
                                  grad_fn=<AddmmBackward>))])

In [None]:
# Sentiment score

# torch.argmax(result.logits) returns the zero-based index of the highest logit;
# adding 1 maps it onto the 1-5 scale:

int(torch.argmax(result.logits))+1

4
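
The raw logits can also be turned into probabilities with a softmax, which shows how strongly the model prefers one score over the others. The sketch below is plain Python so it can be checked without loading the model; it reuses the five logit values printed above, and the helper name `softmax` is an illustrative choice. Inside the notebook, `torch.softmax(result.logits, dim=-1)` should give the same probabilities.

```python
import math

# Logits copied from the model output shown above (one value per score, 1 to 5).
logits = [-2.7768, -1.2353, 1.4419, 1.9804, 0.4584]


def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(values)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)
score = probs.index(max(probs)) + 1  # same rule as argmax + 1

for stars, p in enumerate(probs, start=1):
    print(f"score {stars}: {p:.3f}")
print("predicted score:", score)
```

Because softmax preserves the ordering of the logits, the predicted score is the same as with `argmax` alone; the probabilities only add a rough sense of confidence.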

# ***Extracting data from a site and performing sentiment analysis***

# **Data Scraping**

In [None]:
# Send the request:
r = requests.get('https://www.imdb.com/title/tt10295212/reviews?ref_=tt_urv')
# r holds the response (including its status code) and r.text holds the full page source; parse it as HTML:
soup = BeautifulSoup(r.text, 'html.parser')
# Match only the review text: the class name comes from inspecting the page in the browser
regex = re.compile('.*text show-more__control.*')
# Find every matching element: the tag (div, p, etc.) also comes from inspecting the page
results = soup.find_all('div', {'class':regex})
# Keep only the text of each element, not the HTML markup:
reviews = [result.text for result in results]
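
A quick way to see which class strings the pattern accepts is to try it on a few values directly. This is only a sketch: the pattern is the one from the cell above, but the class values in `class_values` are invented for illustration and are not taken from the IMDb page.

```python
import re

# Same pattern as in the scraping cell above.
regex = re.compile('.*text show-more__control.*')

# Hypothetical class attribute values, made up for illustration.
class_values = [
    'text show-more__control',
    'text show-more__control clickable',
    'show-more__control',
    'title',
]

# Keep only the values the pattern accepts.
matches = [value for value in class_values if regex.match(value)]
print(matches)
```

If `reviews` comes back empty, the class names on the page may no longer match the pattern, so inspecting the page again is a sensible first check.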

In [None]:
reviews

['So realistic..\nActing is superb....\nMusic ...\nVFX ..\nAction...\nRomance....\nPerfect..',
 "Sidharth Malhotra and Kiara Advani nailed it with their performance. This movie truly portrays the jolly nature of Sir Vikram Batra . Those have given 1 star to this movie are the ones who gave 10 stars to movie like radhe , Tubelight etc LOL . Dont listen to them and just watch it , you will not regret .Acting of Sidharth is way way better than other bollywood actors . Just loved his performance in this movie. He couldn't have done better . Same goes to Kiara Advani , simply amazing .Kudos to the crew and actors who worked in this film and gave justice to the personality of Sir Vikram Batra\nJUST WATCH IT WITHOUT ANY DOUBT.",
 'Every Indian should watch this movie.Independence came at the cost of great soldiers like Captain Vikram Batra.Jai Hind.Proud Of Indian Army.Proud To Be Indian.',
 "No debate, no concerns, no Reviews, just pride.Sidarth Malhotra's portrayal of Capt. Vikram Batra is 

# **Converting the list above into a DataFrame**

In [None]:
df = pd.DataFrame(np.array(reviews), columns=['review'])

In [None]:
df.head()

|   | review |
|---|--------|
| 0 | So realistic..\nActing is superb....\nMusic ..... |
| 1 | Sidharth Malhotra and Kiara Advani nailed it w... |
| 2 | Every Indian should watch this movie.Independe... |
| 3 | No debate, no concerns, no Reviews, just pride... |
| 4 | This film is a must watch and the people calli... |


In [None]:
df['review'].iloc[0]

'So realistic..\nActing is superb....\nMusic ...\nVFX ..\nAction...\nRomance....\nPerfect..'

# **Defining the scoring function:**

In [None]:
# Define a function that tokenizes a review, runs the model, and returns the 1-5 score:
def sentiment_score(review):
    tokens = tokenizer.encode(review, return_tensors='pt')
    result = model(tokens)
    return int(torch.argmax(result.logits))+1

In [None]:
sentiment_score(df['review'].iloc[1])

5

# **Applying it to complete Data**

In [None]:
# Apply the function to every review in the DataFrame:
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512])) # BERT accepts at most 512 tokens per input; keeping only the first 512 characters is a rough way to stay within that limit
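
The `x[:512]` slice limits characters, whereas the model's limit is counted in tokens. One possible alternative, sketched here under assumptions, is to let the tokenizer truncate by token count. The function name `sentiment_score_truncated` and the `max_tokens` parameter are hypothetical; `tokenizer` and `model` are assumed to be the objects loaded earlier in the notebook, passed in explicitly.

```python
def sentiment_score_truncated(review, tokenizer, model, max_tokens=512):
    """Score a review from 1 to 5, truncating by tokens instead of characters.

    `tokenizer` and `model` are assumed to be the objects loaded earlier in
    the notebook; the function name and `max_tokens` are hypothetical.
    """
    tokens = tokenizer.encode(
        review,
        return_tensors='pt',
        truncation=True,        # drop tokens beyond max_length
        max_length=max_tokens,  # limit counted in tokens, not characters
    )
    result = model(tokens)
    # argmax gives the zero-based class index; add 1 for the 1-5 scale
    return int(result.logits.argmax()) + 1
```

With the notebook's objects it could be applied as `df['review'].apply(lambda x: sentiment_score_truncated(x, tokenizer, model))`, which keeps more of each long review than the character slice does.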

In [None]:
df

|   | review | sentiment |
|---|--------|-----------|
| 0 | So realistic..\nActing is superb....\nMusic ..... | 5 |
| 1 | Sidharth Malhotra and Kiara Advani nailed it w... | 5 |
| 2 | Every Indian should watch this movie.Independe... | 5 |
| 3 | No debate, no concerns, no Reviews, just pride... | 5 |
| 4 | This film is a must watch and the people calli... | 5 |
| 5 | What a great biopic. Sid nailed the character.... | 5 |
| 6 | Biography is super.movie is awesome 😎 must wat... | 5 |
| 7 | What a movie . Fab sid acting\nWorth wait.Must... | 5 |
| 8 | One of the best war film ever made in bollywoo... | 5 |
| 9 | This is new Bollywood..\nAnd the movie is spee... | 5 |


In [None]:
df['sentiment'].unique()

array([5, 4])
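
Beyond listing the distinct scores, a tally and an average give a quick summary of the overall sentiment. The sketch below uses only the standard library; the `scores` list is made up for illustration and is not the notebook's actual output.

```python
from collections import Counter
from statistics import mean

# Hypothetical scores for illustration only (not the notebook's real results).
scores = [5, 5, 4, 5, 5, 4, 5, 5]

counts = Counter(scores)  # how many reviews received each score
average = mean(scores)    # average score across all reviews

for score in sorted(counts, reverse=True):
    print(f"score {score}: {counts[score]} review(s)")
print(f"average: {average:.2f}")
```

On the notebook's DataFrame, `df['sentiment'].value_counts()` and `df['sentiment'].mean()` should give the equivalent tally and average.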