# INTRODUCTION 

This is a bert-base-multilingual-uncased model fine-tuned for sentiment analysis on product reviews in six languages: English, Dutch, German, French, Spanish and Italian. It predicts the sentiment of a review as a number of stars from 1 to 5. Internally, the five classes are indexed 0, 1, 2, 3, 4.

This model is intended for direct use as a sentiment analysis model for product reviews in any of the six languages above, or for further fine-tuning on related sentiment analysis tasks.
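Because the classes are indexed 0 to 4 while ratings run from 1 to 5, code that uses the model has to shift by one in one direction or the other. Below is a minimal sketch of the conversion in both directions. The helper names are hypothetical. The 0-based direction is what you would likely need when fine-tuning, assuming the common setup where training labels are class indices, to turn 1-5 ratings (such as the `Score` column used later) into labels.

```python
NUM_CLASSES = 5  # the model predicts 5 classes: 1 to 5 stars


def index_to_stars(index: int) -> int:
    """Map a 0-based class index (0-4) to a star rating (1-5)."""
    if not 0 <= index < NUM_CLASSES:
        raise ValueError(f"class index must be in 0..{NUM_CLASSES - 1}, got {index}")
    return index + 1


def stars_to_index(stars: int) -> int:
    """Map a star rating (1-5) to a 0-based class index (0-4), e.g. for training labels."""
    if not 1 <= stars <= NUM_CLASSES:
        raise ValueError(f"star rating must be in 1..{NUM_CLASSES}, got {stars}")
    return stars - 1


print([index_to_stars(i) for i in range(NUM_CLASSES)])  # [1, 2, 3, 4, 5]
print([stars_to_index(s) for s in [5, 1, 4, 2, 5]])     # [4, 0, 3, 1, 4]
```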

# 1. Install and import dependencies

In [2]:
# transformers provides the tokenizer and the classification model; torch runs inference
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# 2. Instantiate model

In [3]:
Model = 'nlptown/bert-base-multilingual-uncased-sentiment'  # Hugging Face Hub model ID
token = AutoTokenizer.from_pretrained(Model)  # tokenizer that matches the model
model = AutoModelForSequenceClassification.from_pretrained(Model)  # BERT with a 5-class head


# 3. Encode & Calculate sentiment

Negative comment 

In [8]:
text = 'My query has been neglected by the support team and I Hate this kind of behaviour'
# return_tensors='pt' returns the token IDs as a PyTorch tensor
encoded_input = token.encode(text, return_tensors='pt')
output = model(encoded_input)
output

SequenceClassifierOutput(loss=None, logits=tensor([[ 1.8480,  1.0716, -0.1187, -1.0971, -1.3592]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [9]:
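# argmax gives the class index (0-4); adding 1 turns it into a star rating (1-5)
score = torch.argmax(output.logits) + 1
score = score.item()  # convert the 0-d tensor to a Python int
score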

1
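To show where the score of 1 comes from, here is a minimal pure-Python sketch that turns the five logits printed above into probabilities with a softmax and then picks the most likely class. The logits are copied from the output cell. The `softmax` helper is written out by hand for illustration.

```python
import math

# Logits printed by the model for the negative comment above (classes 0..4).
logits = [1.8480, 1.0716, -0.1187, -1.0971, -1.3592]


def softmax(values):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)
best_index = max(range(len(probs)), key=probs.__getitem__)  # argmax over the classes
stars = best_index + 1  # shift the 0-based class index to a 1-5 star rating

for i, p in enumerate(probs):
    print(f"{i + 1} star(s): {p:.3f}")
print("predicted stars:", stars)  # predicted stars: 1
```

Softmax is monotonic, so the argmax of the probabilities equals the argmax of the raw logits. That is why the notebook can call `torch.argmax` on `output.logits` directly. In PyTorch, the probabilities themselves would be `torch.softmax(output.logits, dim=-1)`.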

Neutral comment

In [10]:
text = 'Your performance is of average level'
encoded_input = token.encode(text, return_tensors='pt')
output = model(encoded_input)
score = torch.argmax(output.logits).item() + 1  # class index 0-4 -> 1-5 stars
score

3

Positive comment

In [11]:
text = 'You are the best in the class'
encoded_input = token.encode(text, return_tensors='pt')
output = model(encoded_input)
score = torch.argmax(output.logits).item() + 1  # class index 0-4 -> 1-5 stars
score

5
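The three examples above were labelled negative, neutral and positive by hand. If you need that coarser scheme, here is one possible mapping from stars to polarity. It is an assumption rather than anything the model defines: 1-2 stars count as negative, 3 as neutral and 4-5 as positive. The helper name is hypothetical.

```python
def stars_to_polarity(stars: int) -> str:
    """Collapse a 1-5 star rating into negative / neutral / positive.

    The thresholds (1-2 negative, 3 neutral, 4-5 positive) are an assumption.
    """
    if stars not in range(1, 6):
        raise ValueError(f"star rating must be in 1..5, got {stars}")
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


# Scores produced above for the negative, neutral and positive comments.
for s in (1, 3, 5):
    print(s, stars_to_polarity(s))
```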

# 4. Load DataFrame

In [12]:
import pandas as pd 
import numpy as np

In [13]:
# read only the first 500 rows and the two columns needed
data = pd.read_csv(r'Reviews (1).csv', nrows=500, usecols=['Text', 'Score'])
data.head()

|   | Score | Text |
|---|-------|------|
| 0 | 5 | I have bought several of the Vitality canned d... |
| 1 | 1 | Product arrived labeled as Jumbo Salted Peanut... |
| 2 | 4 | This is a confection that has been around a fe... |
| 3 | 2 | If you are looking for the secret ingredient i... |
| 4 | 5 | Great taffy at a great price. There was a wid... |


In [14]:
data.shape

(500, 2)

In [15]:
data['Score'].value_counts()

5    339
4     70
3     37
1     36
2     18
Name: Score, dtype: int64
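The scores are heavily skewed toward 5 stars. The sketch below uses the counts above to show what that implies. A trivial classifier that always predicts 5 would already be right 339/500 = 67.8% of the time, which is a useful reference point for the accuracy computed later.

```python
# Score counts from value_counts() above (500 reviews).
counts = {5: 339, 4: 70, 3: 37, 1: 36, 2: 18}
total = sum(counts.values())

for score in sorted(counts):
    print(f"{score} stars: {counts[score]:>3} ({counts[score] / total:.1%})")

# Accuracy of always predicting the most frequent score.
majority_score = max(counts, key=counts.get)
baseline_accuracy = counts[majority_score] / total
print(f"majority baseline: always predict {majority_score} -> {baseline_accuracy:.1%}")
```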

In [17]:
# Sentiment analysis function: reuses the tokenizer and model loaded in step 2
# instead of reloading them from disk for every review

def sentiment(review):
    encoded_input = token.encode(review,
                                 return_tensors='pt',
                                 truncation=True,
                                 max_length=512)  # BERT accepts at most 512 tokens
    with torch.no_grad():  # inference only, so no gradients are needed
        output = model(encoded_input)
    score = torch.argmax(output.logits) + 1  # class index 0-4 -> 1-5 stars
    return score.item()

Check one review against its actual score

In [28]:
print(f"Review: {data['Text'].iloc[31]}\nActual score: {data['Score'].iloc[31]}")

Review: This offer is a great price and a great taste, thanks Amazon for selling this product.<br /><br />Staral
Actual score: 5


In [29]:
sentiment(data['Text'].iloc[31])

5

In [30]:
data['sentiment_score'] = data['Text'].apply(sentiment)

In [32]:
data

|     | Score | Text | sentiment_score |
|-----|-------|------|-----------------|
| 0   | 5 | I have bought several of the Vitality canned d... | 5 |
| 1   | 1 | Product arrived labeled as Jumbo Salted Peanut... | 1 |
| 2   | 4 | This is a confection that has been around a fe... | 5 |
| 3   | 2 | If you are looking for the secret ingredient i... | 5 |
| 4   | 5 | Great taffy at a great price. There was a wid... | 5 |
| ... | ... | ... | ... |
| 495 | 5 | i rarely eat chips but i saw these and tried t... | 5 |
| 496 | 5 | This is easily the best potato chip that I hav... | 5 |
| 497 | 4 | Kettle Chips Spicy Thai potato chips have the ... | 5 |
| 498 | 4 | Okay, I should not eat potato chips, nor shoul... | 3 |


In [33]:
from sklearn.metrics import accuracy_score
# accuracy_score(y_true, y_pred): actual scores first, model predictions second
accuracy_score(data['Score'], data['sentiment_score'])

0.628
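Accuracy alone does not show which ratings get confused with each other. Below is a minimal pure-Python confusion-matrix sketch. The helper name is hypothetical, and scikit-learn's `confusion_matrix` does the same job. The example lists contain only the nine rows visible in the table above, not the full 500, so they illustrate the layout rather than the model's real error pattern.

```python
from collections import Counter


def confusion_matrix_1to5(y_true, y_pred):
    """Return a 5x5 list of counts; row = actual stars, column = predicted stars."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in range(1, 6)] for t in range(1, 6)]


# Actual vs predicted scores for the rows shown in the table above.
actual = [5, 1, 4, 2, 5, 5, 5, 4, 4]
predicted = [5, 1, 5, 5, 5, 5, 5, 5, 3]

matrix = confusion_matrix_1to5(actual, predicted)
print("actual\\pred  1  2  3  4  5")
for stars, row in zip(range(1, 6), matrix):
    print(f"{stars:>11}  " + "  ".join(str(c) for c in row))
```

With the full DataFrame, `sklearn.metrics.confusion_matrix(data['Score'], data['sentiment_score'], labels=[1, 2, 3, 4, 5])` produces the same layout, with actual ratings as rows and predictions as columns.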

In [41]:
text = 'Popaye used to be a popular show.Adoed by all public'
sentiment(text)

5

In [42]:
sentiment('My system does not work well on large data')

2

# NOTE 

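* This model has already been fine-tuned on product reviews. It can be used for sentiment analysis directly, or fine-tuned further on a particular dataset.
* The maximum input length is 512 tokens. Longer reviews are truncated (`truncation=True, max_length=512`).
* According to the model's documentation, it reaches about 67% exact accuracy on English reviews. If a prediction within one star of the true rating (off-by-one) also counts as correct, accuracy rises to about 95%. On the 500 reviews used here, exact accuracy was 0.628.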
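The documentation's two figures are exact accuracy and off-by-one accuracy. Off-by-one accuracy is the share of predictions within one star of the true rating. Below is a small sketch that computes both. The helper name is hypothetical. The sample lists are only the nine actual/predicted pairs visible in the results table, so the numbers they produce say nothing about the model's real performance.

```python
def exact_and_off_by_one(y_true, y_pred):
    """Return (exact accuracy, accuracy allowing a difference of at most one star)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    exact = sum(t == p for t, p in zip(y_true, y_pred)) / n
    within_one = sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / n
    return exact, within_one


# Illustrative only: the nine actual/predicted pairs visible in the results table.
actual = [5, 1, 4, 2, 5, 5, 5, 4, 4]
predicted = [5, 1, 5, 5, 5, 5, 5, 5, 3]
exact, within_one = exact_and_off_by_one(actual, predicted)
print(f"exact: {exact:.3f}, off-by-one: {within_one:.3f}")
```

On the full DataFrame, the same two numbers are `(data['sentiment_score'] == data['Score']).mean()` and `(data['sentiment_score'] - data['Score']).abs().le(1).mean()`.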