# **Sentiment Analysis with BERT**
In this project, reviews of the movie "Jab Tak Hai Jaan" are scraped from IMDb and rated on a 1-5 star scale using a pretrained BERT sentiment model.

In [2]:
import re
import requests
import pandas as pd
import torch
from bs4 import BeautifulSoup
from transformers import AutoTokenizer, AutoModelForSequenceClassification


# Model loading and testing

In [3]:
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')


In [4]:
# Encode a test sentence into token IDs, returned as a PyTorch tensor
tokens = tokenizer.encode('I love this shit!', return_tensors='pt')

In [5]:
tokens

tensor([[  101,   151, 11157, 10372, 24497, 10123,   106,   102]])

In [6]:
tokenizer.decode(tokens[0])

'[CLS] i love this shit! [SEP]'

In [7]:
res = model(tokens)
res

SequenceClassifierOutput(loss=None, logits=tensor([[-1.4771, -2.2232, -1.2002,  0.6118,  3.5829]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [8]:
torch.argmax(res.logits)

tensor(4)

In [9]:
# Class indices run 0-4, so add 1 to get the 1-5 star rating
int(torch.argmax(res.logits)) + 1

5
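
The argmax only reports the winning class. To see how confident the model is, the logits can be turned into probabilities with a softmax. The sketch below is a minimal, dependency-free illustration that hardcodes the logits printed above; the `softmax` helper and the expected-rating calculation are illustrative additions, not part of the original notebook. With PyTorch, `torch.softmax(res.logits, dim=1)` performs the same computation.

```python
import math

# Logits copied from the model output shown above (one value per star class, 1-5)
logits = [-1.4771, -2.2232, -1.2002, 0.6118, 3.5829]

def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)

# Most likely star rating (class index + 1), matching the argmax step above
predicted_star = probs.index(max(probs)) + 1

# Probability-weighted average rating, a softer alternative to the argmax
expected_star = sum((i + 1) * p for i, p in enumerate(probs))

for star_value, p in enumerate(probs, start=1):
    print(f"{star_value} star: {p:.3f}")
print("predicted:", predicted_star, "expected:", round(expected_star, 2))
```

A probability close to 1 for a single class suggests a clear-cut review, while mass spread over neighbouring classes suggests a mixed one.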

# Scraping site

In [10]:
# Browser-like User-Agent header, since the request may be rejected without one
header = {'user-agent':
 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'}
url = 'https://www.imdb.com/title/tt2176013/reviews/?ref_=tt_ov_ql_2'
page = requests.get(url, headers=header)
page.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(page.text, 'html.parser')
# Each review body sits in a div with this class
result = soup.find_all('div', class_='ipc-html-content-inner-div')
reviews = [r.text for r in result]

In [11]:
len(reviews)

13
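
Scraped text can carry stray newlines, tabs, and repeated spaces. The notebook imports `re` but does not use it in the cells shown; one plausible use is normalizing whitespace before tokenization. The sketch below assumes that goal. The `clean_review` helper and the sample strings are hypothetical, not taken from the original notebook.

```python
import re

def clean_review(text):
    """Collapse runs of whitespace and trim the ends (hypothetical helper)."""
    return re.sub(r'\s+', ' ', text).strip()

# Made-up sample strings standing in for scraped review text
sample_reviews = ['  Great   movie!\n\nLoved the songs. ', 'Too\tlong.', '   ']

# Clean each review and drop any that end up empty
cleaned = [c for c in (clean_review(r) for r in sample_reviews) if c]
print(cleaned)
```

Applied to the scraped list, this would read `reviews = [clean_review(r) for r in reviews]`.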

In [12]:
print(reviews)

["Yash Chopra's last cinematic outing left me feeling somewhat conflicted, but I would still recommend it as a watch for any fan of Yashji. First I would like to elucidate the moving and effective parts of the film before I go on to the major critique. Firstly, the performances from all of the actors were at least at par with their previous work, and I thought Shah Rukh, in his role as a Major in the Indian army was one of the best of his career. In addition, the cinematography and the songs in the movie were both enchanting and involving. The story is very compelling and (though a touch filmy) gets the viewer involved in the struggle between the characters very well, a feature that is common to most of Yashji's work in any case. On to the negative aspects. While the story is engaging, the character development is somewhat confusing. We get to see the change in Shahrukh over time from a carefree worker in London to a cold and calculating soldier carefully orchestrated over time, and ye

# Storing the reviews in a pandas DataFrame


In [13]:
data = pd.DataFrame(reviews,columns=['Reviews'])

In [14]:
data

| | Reviews |
|---|---|
| 0 | Yash Chopra's last cinematic outing left me fe... |
| 1 | Directed by the late Yash Chopra, the film doe... |
| 2 | Jab Tak Hai Jaan - In a time dominated by Sout... |
| 3 | Amazong plot, great direction and brilliant ac... |
| 4 | The film starts off with Yash Chopra's portrai... |
| 5 | A sweet but twisted romantic movie by Yash ji ... |
| 6 | We see Major Samar Anand (Shah Rukh Khan) bein... |
| 7 | Jab Tak Hai Jaan (Review) - The movie has fina... |
| 8 | It is a good sweet love story. I personally ad... |
| 9 | Hello.. I watched Jab Tak Hai Jaan.. And I am ... |


In [15]:
# star = []
# for r in reviews:
#   tokens = tokenizer.encode(r,return_tensors='pt')
#   res = model(tokens)
#   value = int(torch.argmax(res.logits)) +  1
#   star.append(value)

# Performing sentiment analysis and storing the results in the DataFrame

In [16]:
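# Tokenize all reviews as one padded batch; truncation caps long reviews at the model's maximum length
tokens = tokenizer(reviews, padding=True, truncation=True, return_tensors='pt')

# Inference only, so skip gradient tracking
with torch.no_grad():
    res = model(tokens['input_ids'], attention_mask=tokens['attention_mask'])

# Class indices run 0-4; add 1 to get star ratings 1-5
star = (torch.argmax(res.logits, dim=1) + 1).tolist()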

In [17]:
print(star)

[3, 4, 3, 4, 4, 4, 4, 3, 4, 5, 5, 4, 3]
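
The list of ratings is easier to read as a distribution and an average. The sketch below is a minimal illustration using only the standard library, with the ratings hardcoded from the output above; it is an added summary step, not part of the original notebook. Once the ratings are in the DataFrame, `data['star'].value_counts()` and `data['star'].mean()` give the same figures.

```python
from collections import Counter
from statistics import mean

# Star ratings copied from the output above
star = [3, 4, 3, 4, 4, 4, 4, 3, 4, 5, 5, 4, 3]

counts = Counter(star)   # how many reviews received each rating
average = mean(star)     # mean predicted rating

for rating in sorted(counts):
    print(f"{rating} stars: {counts[rating]} review(s)")
print(f"average: {average:.2f}")
```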


In [18]:
data['star'] = star

In [19]:
data

| | Reviews | star |
|---|---|---|
| 0 | Yash Chopra's last cinematic outing left me fe... | 3 |
| 1 | Directed by the late Yash Chopra, the film doe... | 4 |
| 2 | Jab Tak Hai Jaan - In a time dominated by Sout... | 3 |
| 3 | Amazong plot, great direction and brilliant ac... | 4 |
| 4 | The film starts off with Yash Chopra's portrai... | 4 |
| 5 | A sweet but twisted romantic movie by Yash ji ... | 4 |
| 6 | We see Major Samar Anand (Shah Rukh Khan) bein... | 4 |
| 7 | Jab Tak Hai Jaan (Review) - The movie has fina... | 3 |
| 8 | It is a good sweet love story. I personally ad... | 4 |
| 9 | Hello.. I watched Jab Tak Hai Jaan.. And I am ... | 5 |
