<a href="https://colab.research.google.com/github/Juanba98/SentimentAnalysisBERT/blob/main/sentimentAnalyisis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Sentiment Analysis with [BERT](https://arxiv.org/pdf/1810.04805.pdf) (Pre-training of Deep Bidirectional Transformers for Language Understanding)**


# **1. Install and Import Dependencies** 

In [None]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
  Downloading urllib3-1.25.11-py2.py3-none-any.whl (127 kB)
Installing collected packages: urllib3
  Attempting uninstall: urllib3
    Found existing installation: urllib3 1.26.12
    Uninstalling urllib3-1.26.12:
      Successfully uninstalled urllib3-1.26.12
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
selenium 4.6.0 requires urllib3[socks]~=1.26, but you have urllib3 1.25.11 which is incompatible.
Successfully installed urllib3-1.25.11


In [None]:
!apt install chromium-chromedriver
!pip install selenium

Reading package lists... Done
Building dependency tree       
Reading state information... Done
chromium-chromedriver is already the newest version (105.0.5195.102-0ubuntu0.18.04.1).
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 5 not upgraded.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting urllib3[socks]~=1.26
  Using cached urllib3-1.26.12-py2.py3-none-any.whl (140 kB)
Installing collected packages: urllib3
  Attempting uninstall: urllib3
    Found existing installation: urllib3 1.25.11
    Uninstalling urllib3-1.25.11:
      Successfully uninstalled urllib3-1.25.11
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
requests 2.23.0 requires urllib3!=1.25.0,!=1.2

In [None]:
import re
import time

import requests
import torch
from bs4 import BeautifulSoup  # parses the rendered page HTML
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from transformers import AutoTokenizer, AutoModelForSequenceClassification



# **2. Instantiate the Model**

In [None]:
# Model card: https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

# **3. Encode and Calculate Sentiment**

In [None]:
tokens = tokenizer.encode("Not bad", return_tensors='pt')
print(f'Tokens: {tokens}')
print(f'Decode: {tokenizer.decode(tokens[0])}')

Tokens: tensor([[  101, 10497, 12428,   102]])
Decode: [CLS] not bad [SEP]


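This model predicts the sentiment of a review as a number of stars (between 1 and 5). It outputs five logits, one per star rating, so the predicted rating is the index of the largest logit plus one.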

In [None]:
result = model(tokens)
print(result.logits)
print(f'Number of stars: {int(torch.argmax(result.logits))+1}')

tensor([[-2.2001, -0.5841,  2.0563,  1.3172, -0.4261]],
       grad_fn=<AddmmBackward0>)
Number of stars: 3
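
The logits above are unnormalised scores. To read them as a probability per star rating, one option is a softmax. The sketch below is a minimal, dependency-free illustration that reuses the logits printed above; the helper name `softmax` is hypothetical, and in the notebook itself `torch.softmax(result.logits, dim=-1)` would do the same on the tensor directly.

```python
import math

def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    # Subtract the maximum first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits copied from the output above for the input "Not bad".
logits = [-2.2001, -0.5841, 2.0563, 1.3172, -0.4261]
probs = softmax(logits)

# Star ratings are 1-based, so add 1 to the index of the largest probability.
stars = probs.index(max(probs)) + 1

for rating, p in enumerate(probs, start=1):
    print(f"{rating} star(s): {p:.3f}")
print(f"Predicted stars: {stars}")
```

The second-largest probability falls on 4 stars, which fits a lukewarm phrase like "Not bad".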


# **4. Collect Reviews** 

In [None]:
# Run Chrome headless; the extra flags are commonly needed when Chrome runs in a container such as Colab
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(options=chrome_options)

url = "https://www.tripadvisor.es/Restaurant_Review-g187438-d21273712-Reviews-La_Caravana-Malaga_Costa_del_Sol_Province_of_Malaga_Andalucia.html"
driver.get(url)

# Give the page's JavaScript time to render the reviews
time.sleep(10)
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

# Review text lives in <p> tags whose class contains 'partial_entry'
regex = re.compile('partial_entry')
results = soup.find_all('p', {'class': regex})
reviews = [result.text for result in results]

In [None]:
reviews

# **5. Load Reviews into a DataFrame and Score**

In [10]:
import numpy as np
import pandas as pd

In [11]:
df = pd.DataFrame(np.array(reviews), columns=['review'])

In [12]:
df['review'].iloc[0]

'¡Si vienes a Málaga no te lo puedes perder!\nGrandísimo descubrimiento, es uno de los mejores sitios de la ciudad para comer. Cocina riquísima y de alta calidad. Además el trato es excelente, Álvaro y los chicos te harán sentir como en casa.\n¡Nos vemos...en la Caravana!Más'
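
The sample review above ends with "Más", which looks like the label of the page's "read more" link rather than part of the review text, and it also contains literal newlines. A minimal cleanup sketch, assuming a trailing "Más" is always link residue (the helper name `clean_review` is hypothetical and not part of the original notebook):

```python
import re

def clean_review(text):
    """Drop a trailing 'Más' link label and collapse runs of whitespace."""
    # Assumption: a review that ends in 'Más' picked it up from the "read more" link.
    text = re.sub(r'Más\s*$', '', text)
    # Replace newlines and repeated spaces with a single space.
    return re.sub(r'\s+', ' ', text).strip()

sample = '¡Nos vemos...en la Caravana!Más'
print(clean_review(sample))
```

If this cleanup is wanted, it could be applied with `df['review'].apply(clean_review)` before scoring. A review that genuinely ends with the word "Más" would lose it, which is the cost of the assumption.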

In [13]:
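def sentiment_score(review):
    # Truncate at the tokenizer level so the input never exceeds BERT's 512-token limit
    tokens = tokenizer.encode(review, return_tensors='pt', truncation=True, max_length=512)
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        result = model(tokens)
    # Class indices are 0-4; star ratings are 1-5
    return int(torch.argmax(result.logits)) + 1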

In [15]:
# BERT accepts at most 512 tokens; cropping each review to its first 512 characters is a rough way to stay under that limit
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512]))

In [16]:
df

|   | review | sentiment |
|---:|:---|---:|
| 0 | ¡Si vienes a Málaga no te lo puedes perder!\nG... | 5 |
| 1 | Llevo comiendo aquí desde los primeros días qu... | 5 |
| 2 | Soy muy fan de la mezcla de sabores que ofrece... | 5 |
| 3 | Nos ha encantado descubrir platos nuevos, con ... | 5 |
| 4 | Almorzamos con un grupo de 12 personas despues... | 5 |
| 5 | Pasamos una velada expléndida, la comida espec... | 5 |
| 6 | Ambiente especial y distinto... Gente de calle... | 4 |
| 7 | Después del campus pasé y tome una merienda qu... | 5 |
| 8 | Ese sabor no lo encuentras en otros lugares de... | 5 |
| 9 | Maravillosa ATENCION y la comida excelente. \n... | 5 |
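
With every review scored, a quick summary helps put the table in context. The sketch below copies the ten scores from the table above into a plain list so that it runs on its own; in the notebook, `df['sentiment'].value_counts()` and `df['sentiment'].mean()` would give the same figures straight from the DataFrame.

```python
from collections import Counter
from statistics import mean

# Sentiment scores copied from the table above (rows 0 to 9).
scores = [5, 5, 5, 5, 5, 5, 4, 5, 5, 5]

counts = Counter(scores)
average = mean(scores)

# List the ratings from highest to lowest with how often each occurs.
for stars in sorted(counts, reverse=True):
    print(f"{stars} stars: {counts[stars]} review(s)")
print(f"Average: {average:.1f}")
```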
