# BERT Sentiment analysis tutorial

This tutorial shows how to use a **pretrained BERT model** from Hugging Face to classify the sentiment of your sentences. BERT is an encoder-only transformer that is commonly used for classification tasks. It is relatively small, so you can run predictions with it on Colab.

**Make sure to use GPU**: go to "Runtime" > "Change runtime type" > "T4 GPU".

In [32]:
# Upload the NER data: import the 3 NER CSVs; they will then appear in the left sidebar, under the folder icon
from google.colab import files
uploaded = files.upload()

Saving conll2003_test.csv to conll2003_test.csv


In [2]:
# Step 1: Install necessary libraries
!pip install transformers torch pandas -q


In [39]:
import pandas as pd
import re

# Load the CSVs
df_train = pd.read_csv("conll2003_train.csv")
df_val = pd.read_csv("conll2003_validation.csv")
df_test = pd.read_csv("conll2003_test.csv")

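def extract_tokens(token_str):
    # Pull every single-quoted token out of the stringified token list.
    # Caveat: a token that itself contains an apostrophe (e.g. "'s") throws this
    # pattern out of sync, garbling or dropping the tokens that follow it
    # (see row 4 of the output below).
    return re.findall(r"'(.*?)'", token_str)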

# Join the tokens so each row holds one sentence string
df_train["sentence"] = df_train["tokens"].apply(extract_tokens).apply(lambda x: " ".join(x))
df_val["sentence"] = df_val["tokens"].apply(extract_tokens).apply(lambda x: " ".join(x))
df_test["sentence"] = df_test["tokens"].apply(extract_tokens).apply(lambda x: " ".join(x))

In [40]:
print(df_train["sentence"].head(5))


0     EU rejects German call to boycott British lamb .
1                                      Peter Blackburn
2                                  BRUSSELS 1996-08-22
3    The European Commission said on Thursday it di...
4    Germany s"           " veterinary committee We...
Name: sentence, dtype: object
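Row 4 in the output above is garbled because the pattern in `extract_tokens` only looks for single quotes. The block below is a minimal sketch of a more tolerant parser. It assumes the `tokens` column stores a stringified array in which tokens containing an apostrophe are wrapped in double quotes; that format is inferred from the garbled output, not confirmed by the data files. The name `extract_tokens_robust` and the sample string `raw` are hypothetical.

```python
import re

# Match either a single-quoted or a double-quoted token.
# Assumption: no token contains both quote characters at once.
TOKEN_PATTERN = re.compile(r"'([^']*)'|\"([^\"]*)\"")


def extract_tokens_robust(token_str):
    """Return the quoted tokens found in a stringified token array."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(token_str):
        single, double = match.groups()
        # Exactly one of the two groups is set for each match.
        tokens.append(single if single is not None else double)
    return tokens


# Hypothetical sample in the assumed format.
raw = """['Germany' "'s" 'representative' 'to' 'the' 'European' 'Union' "'s" 'veterinary' 'committee']"""
sentence = " ".join(extract_tokens_robust(raw))
print(sentence)
# Germany 's representative to the European Union 's veterinary committee
```

If the assumption holds, swapping this function in for `extract_tokens` would change the sentences passed to the classifier, so the sentiment outputs recorded in this notebook would no longer match exactly.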


## Step 2: Import libraries and load the pretrained sentiment analyzer

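We'll use Hugging Face's `pipeline`, which takes care of tokenizing your text and making predictions. Because the code passes no model name, the pipeline falls back to its default sentiment model, `distilbert/distilbert-base-uncased-finetuned-sst-2-english` (a DistilBERT checkpoint fine-tuned on SST-2), as the log output under the cell confirms.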

In [41]:
from transformers import pipeline
# For this loop, switch the dataset you are using: train, validation, or test
df = df_train #df_val #df_test
sentiment_analyzer = pipeline("sentiment-analysis")

# Run sentiment analysis on the full sentences, 100 at a time
sentences = df["sentence"].tolist()
sentiments = []
scores = []

for i in range(0, len(sentences), 100):
    batch = sentences[i:i+100]
    result = sentiment_analyzer(batch)
    sentiments.extend([r["label"] for r in result])
    scores.extend([r["score"] for r in result])

# Add two new columns: the sentiment label and its score (the model's confidence in that label)
df["sentiment"] = sentiments
df["sentiment_score"] = scores

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0
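The warning above recommends naming the model and revision explicitly. A minimal sketch of how that could look, reusing the model name and revision printed in the log, is shown below together with the batching loop wrapped in a function. The names `build_sentiment_analyzer` and `classify_in_batches` are hypothetical helpers, not part of the original notebook, and `device=0` assumes the T4 GPU runtime is active.

```python
# Model name and revision copied from the pipeline's log message.
MODEL_NAME = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "714eb0f"


def build_sentiment_analyzer(device=0):
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported inside the function so the batching helper below can be
    # used and tested without transformers installed.
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        model=MODEL_NAME,
        revision=MODEL_REVISION,
        device=device,  # 0 = first GPU; pass -1 to run on CPU
    )


def classify_in_batches(sentences, analyzer, batch_size=100):
    """Run `analyzer` over `sentences` in chunks; return (labels, scores)."""
    labels, scores = [], []
    for start in range(0, len(sentences), batch_size):
        results = analyzer(sentences[start:start + batch_size])
        labels.extend(r["label"] for r in results)
        scores.extend(r["score"] for r in results)
    return labels, scores
```

With these helpers, the cell above would reduce to `labels, scores = classify_in_batches(df["sentence"].tolist(), build_sentiment_analyzer())`, followed by the two column assignments.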


In [42]:
df[df["sentiment"] == "NEGATIVE"][["sentence", "sentiment"]].head(5)

|    | sentence | sentiment |
|----|----------|-----------|
| 3 | The European Commission said on Thursday it di... | NEGATIVE |
| 4 | Germany s" " veterinary committee We... | NEGATIVE |
| 5 | " We do t" we do t" Co... | NEGATIVE |
| 6 | He said further scientific study was required ... | NEGATIVE |
| 7 | He said a proposal last month by EU Farm Commi... | NEGATIVE |


In [43]:
df[df["sentiment"] == "POSITIVE"][["sentence", "sentiment"]].head(9)

|    | sentence | sentiment |
|----|----------|-----------|
| 0 | EU rejects German call to boycott British lamb . | POSITIVE |
| 1 | Peter Blackburn | POSITIVE |
| 2 | BRUSSELS 1996-08-22 | POSITIVE |
| 11 | . | POSITIVE |
| 19 | It brought in 4,275 tonnes of British mutton ,... | POSITIVE |
| 21 | LONDON 1996-08-22 | POSITIVE |
| 25 | Buyers also snapped up 16 other items that wer... | POSITIVE |
| 29 | BEIJING 1996-08-22 | POSITIVE |
| 33 | " Now is the time for the two sides to engage ... | POSITIVE |


In [38]:
# To save, change the file name to match the split you processed:
# train: conll2003_train_SA
# validation: conll2003_validation_SA
# test: conll2003_test_SA

df.to_csv("conll2003_train_SA.csv", index=False)
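Renaming the output file by hand for each split is easy to get wrong. As a rough sketch, the output name could instead be derived from the input file name using the `_SA` suffix convention listed in the comments above. The helper `sentiment_output_name` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def sentiment_output_name(input_csv):
    """Insert the "_SA" suffix before the file extension."""
    path = Path(input_csv)
    return str(path.with_name(f"{path.stem}_SA{path.suffix}"))


for name in ["conll2003_train.csv", "conll2003_validation.csv", "conll2003_test.csv"]:
    print(name, "->", sentiment_output_name(name))
```

The save step would then read `df.to_csv(sentiment_output_name("conll2003_train.csv"), index=False)`, with the input name changed alongside the `df = ...` line.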