#### **Install Dependencies**

**Install PyTorch**: Go to `pytorch.org`, select your configuration (in my case: Stable, Windows, Pip, Python, CPU), copy the generated install command, and run it in a terminal or, prefixed with `!`, in a notebook cell.

In [2]:
!pip3 install torch torchvision torchaudio
# !pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118     # gpu version

**Install Other Dependencies**

In [4]:
#pip install transformers requests beautifulsoup4 pandas numpy

* `transformers`: the Hugging Face library from which we will load the model `bert-base-multilingual-uncased-sentiment` to calculate the sentiment score.
> `bert-base-multilingual-uncased-sentiment`: This is a bert-base-multilingual-uncased model finetuned for sentiment analysis on product reviews in six languages: English, Dutch, German, French, Spanish and Italian. It predicts the sentiment of the review as a number of stars (between 1 and 5). [`For more details, click here`](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment)
* `requests`: sends HTTP requests to a website so that we can download its pages.
* `beautifulsoup4`: parses the downloaded HTML and extracts the data we need (imported as `bs4`).
* `pandas`: holds the data in a DataFrame so that it is easier to view and work with.
* `numpy`: converts the data to numeric arrays for further use with the model.
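
Before moving on, it can help to confirm that these packages are actually installed in the environment the notebook runs in. The sketch below is one way to check using only the standard library. The helper name `installed_versions` is made up for this illustration, and the list uses the pip distribution names (so `beautifulsoup4`, not `bs4`).

```python
from importlib.metadata import PackageNotFoundError, version

# pip distribution names of the packages used in this notebook
REQUIRED = ["torch", "transformers", "requests", "beautifulsoup4", "pandas", "numpy"]

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

# Print one line per package so that missing ones stand out
for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:15} {ver if ver else 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` can be added with the `pip install` command above.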

In [6]:
# Import modules
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re

#### **Instantiate Model**

We will now instantiate the tokenizer and the model and load the pretrained weights. The first time you run this, the model (about 669 MB) is downloaded, which can take a few minutes depending on your internet connection.

In [7]:
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
model     = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

Downloading (…)okenizer_config.json: 100%|██████████| 39.0/39.0 [00:00<00:00, 2.05kB/s]
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Downloading (…)lve/main/config.json: 100%|██████████| 953/953 [00:00<00:00, 95.3kB/s]
Downloading (…)solve/main/vocab.txt: 100%|██████████| 872k/872k [00:00<00:00, 981kB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 112/112 [00:00<00:00, 16.0kB/s]
Downloading pytorch_model.bin: 100%|██████████| 669M/669M [01:15<00:00, 8.90MB/s] 


#### **Encode and Calculate Sentiment Score**

**Encode** 

`Encode` splits the sentence into tokens and maps each token to a numeric ID, because the model only accepts numeric input. The model then calculates the sentiment score from what it learned during training (it is a pretrained model). We can also `Decode` the resulting tensor back into text.

In [12]:
tokens = tokenizer.encode("He could not do it to me, never!", return_tensors='pt')
tokens

tensor([[  101, 10191, 12296, 10497, 10154, 10197, 10114, 10525,   117, 13362,
           106,   102]])

`return_tensors='pt'` is a parameter of the Hugging Face Transformers tokenizers. It tells the tokenizer to return PyTorch tensors instead of a list of Python integers.

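The sentence has 10 tokens (8 words plus the comma and the exclamation mark), and the tokenizer maps each one to a numeric ID. The first and last values, 101 and 102, are the special start (`[CLS]`) and end (`[SEP]`) tokens, so the tensor holds 12 IDs in total.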
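
To make the token count concrete, the IDs printed above can be split by hand. This is only a sketch on the hard-coded output of the previous cell. It assumes that 101 and 102 are the `[CLS]` and `[SEP]` IDs in this vocabulary, which is consistent with the decoded sentence in the next step.

```python
# Token IDs copied from the tokenizer output above
ids = [101, 10191, 12296, 10497, 10154, 10197, 10114, 10525, 117, 13362, 106, 102]

CLS_ID, SEP_ID = 101, 102  # assumed special-token IDs for this vocabulary

# The tokenizer wraps the sentence as: [CLS] <sentence tokens> [SEP]
assert ids[0] == CLS_ID and ids[-1] == SEP_ID
sentence_ids = ids[1:-1]

print(len(ids))           # 12: all IDs, including the two special tokens
print(len(sentence_ids))  # 10: IDs that belong to the sentence itself
```

On the live objects, `tokenizer.convert_ids_to_tokens(tokens[0].tolist())` lists the token string behind each ID.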

**Check `Decode`**

In [14]:
print(tokenizer.decode(tokens[0]))

[CLS] he could not do it to me, never! [SEP]


**Calculate Sentiment Score**

Before that, we need to understand the model's scoring system. The model rates a sentence from 1 to 5 stars: `1=negative` and `5=positive`, and the values in between express the intensity from negative to positive. We can treat `3=neutral`. For each sentence the model outputs 5 values, one per rating, and the rating with the highest value is taken as the sentiment score.

`Why 5 Scores?` The model ends in a classification layer with 5 outputs, one per star rating. A softmax over these outputs turns them into probabilities that sum to 1. Each of these probabilities is conditional on the input sentence: P(rating|sentence).

`Conditional Probability`: P(A|B) = probability of A given that B has already happened. For example: P(fever|sick) = probability that a person has a fever given that they are sick.

In [15]:
probability_scores = model(tokens)
probability_scores

SequenceClassifierOutput(loss=None, logits=tensor([[ 2.6982,  1.5826, -0.0619, -1.9971, -1.7371]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

`Here, we only need the logits (the raw, unnormalized scores for the 5 ratings)`

In [19]:
print(probability_scores.logits)

tensor([[ 2.6982,  1.5826, -0.0619, -1.9971, -1.7371]],
       grad_fn=<AddmmBackward0>)
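
The logits are raw scores, so they do not sum to 1. As a minimal sketch of what softmax does with them, the block below applies the formula in plain Python to the values printed above (hard-coded here, so it runs without the model). The `softmax` helper is written for this illustration. In the notebook itself, `torch.softmax(probability_scores.logits, dim=-1)` should give the same numbers.

```python
import math

# Logits copied from the model output above, one per star rating (1 to 5)
logits = [2.6982, 1.5826, -0.0619, -1.9971, -1.7371]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
for stars, p in enumerate(probs, start=1):
    print(f"{stars} star(s): {p:.3f}")

# The most probable class is the same one that has the largest logit
best = max(range(len(probs)), key=probs.__getitem__)
print("Predicted rating:", best + 1)
```

Softmax preserves the ordering of the scores, so taking the argmax of the logits directly gives the same rating as taking it after softmax.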


`To see which rating has the highest value:`

In [22]:
print(int(torch.argmax(probability_scores.logits) + 1))
# Here,
# `torch.argmax()` - returns the index of the highest score. In this case it is 0 (tensor indices start from 0)
# `+ 1` - shifts the index so that it matches the ratings, which run from 1 to 5
# `int` - converts the one-element tensor to a plain Python integer

1


`So, it is a negative sentiment`

**Check other Sentences**

In [24]:
sentence1 = "I just loved the documentation"
tokens = tokenizer.encode(sentence1, return_tensors='pt')
probability_scores = model(tokens)
print(f"Logits: {probability_scores.logits}")
print(f"Rating: {int(torch.argmax(probability_scores.logits)+1)}")

Logits: tensor([[-2.0370, -1.4861,  0.1390,  1.3571,  1.5039]],
       grad_fn=<AddmmBackward0>)
Rating: 5


In [25]:
sentence2 = "There is a possibility but their attitude might destroy it."
tokens = tokenizer.encode(sentence2, return_tensors='pt')
probability_scores = model(tokens)
print(f"Logits: {probability_scores.logits}")
print(f"Rating: {int(torch.argmax(probability_scores.logits)+1)}")

Logits: tensor([[-0.1769,  0.8476,  1.2969,  0.1629, -1.7914]],
       grad_fn=<AddmmBackward0>)
Rating: 3
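
The three examples repeat the same post-processing step, so it can be pulled into a small helper. The sketch below works on plain lists of logits copied from the outputs above. The name `rating_from_logits` and the text labels for 2 and 4 stars are chosen for this illustration and are not part of the model or the Transformers API.

```python
# Hypothetical labels for the 1-5 star scale described earlier
LABELS = {1: "negative", 2: "somewhat negative", 3: "neutral",
          4: "somewhat positive", 5: "positive"}

def rating_from_logits(logits):
    """Return the star rating (1-5) whose logit is largest."""
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return best_index + 1  # class indices start at 0, star ratings at 1

# Logits copied from the three outputs above
examples = {
    "He could not do it to me, never!": [2.6982, 1.5826, -0.0619, -1.9971, -1.7371],
    "I just loved the documentation": [-2.0370, -1.4861, 0.1390, 1.3571, 1.5039],
    "There is a possibility but their attitude might destroy it.": [-0.1769, 0.8476, 1.2969, 0.1629, -1.7914],
}

for sentence, logits in examples.items():
    stars = rating_from_logits(logits)
    print(f"{stars} ({LABELS[stars]}): {sentence}")
```

With the live model, `probability_scores.logits[0].detach().tolist()` produces a list in the form this helper expects.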


#### **Collect Data from the Web**

#### **Load Data into DataFrame**

#### **Calculate Sentiment Score**