# Sentiment Classification Using RoBERTa

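This notebook runs a simple sentiment classification demo with a Twitter-fine-tuned RoBERTa model from CardiffNLP and compares its predictions with a multilingual BERT review-rating model from nlptown on a file of WhatsApp reviews.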


## 1) Install dependencies (run once)

> If you're in Google Colab, you can run this cell.
> If you're in a local environment, install with pip in your venv.


In [1]:
!pip install -U transformers torch sentencepiece accelerate bitsandbytes



## 2) Import and build pipeline

We build two Hugging Face `pipeline` objects for sentiment analysis, one per model.


In [2]:
from transformers import pipeline

# RoBERTa fine-tuned for sentiment analysis
roberta_classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest"
)

# Multilingual BERT fine-tuned to predict review star ratings (1 to 5 stars)
bert_classifier = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment"
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you e

Device set to use cuda:0


In [3]:
from google.colab import files
import io

uploaded = files.upload()
sentences = []
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
  content = uploaded[fn].decode('utf-8')
  sentences.extend([line.strip() for line in content.splitlines() if line.strip()])

Saving whatsapp_review_kaggle.txt to whatsapp_review_kaggle (1).txt
User uploaded file "whatsapp_review_kaggle (1).txt" with length 408478 bytes
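
Outside Colab, `google.colab.files` is not available. A minimal sketch of the same loading step for a local environment follows, assuming the reviews sit in a UTF-8 text file with one review per line. `load_sentences` and the path in the usage note are hypothetical names, not part of the original notebook.

```python
from pathlib import Path


def load_sentences(path):
    """Read a UTF-8 text file and return its non-empty lines, stripped."""
    content = Path(path).read_text(encoding="utf-8")
    # Same filtering as the Colab cell: strip whitespace, skip blank lines.
    return [line.strip() for line in content.splitlines() if line.strip()]
```

With the file saved next to the notebook, `sentences = load_sentences("whatsapp_review_kaggle.txt")` would produce the same kind of list as the upload cell.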


In [4]:
# Print the first 10 lines
for line in sentences[:10]:
  print(line)

review_text
Great 👍
plz whats up unban
my contact didn't show on WhatsApp .. for privacy I can't share screenshot please solve it
Can you guys let archived group chats stay archived? I archived it for a reason, so please stop bringing them up again. Thanks.
it is the g.o.a.t🇿🇼
hii sir good morning sorry 😔
my account problem
please please enable that feature in which we can see anyones about status🙏
whatsapp no work
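
The first printed line, `review_text`, looks like a column header from the source file rather than a review, so it would be classified along with the real reviews. A minimal sketch of dropping it before inference follows, assuming the header is exactly the string `review_text` and appears only as the first line. `drop_header` is a hypothetical helper, not part of the original notebook.

```python
def drop_header(lines, header="review_text"):
    """Return the lines without a leading header row, if one is present."""
    if lines and lines[0].strip().lower() == header:
        return lines[1:]
    # No header found: return a copy so the caller's list is not shared.
    return list(lines)
```

Calling `sentences = drop_header(sentences)` before inference would keep the header out of the predictions.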


## 3) Run predictions on sample sentences








In [6]:
from google.colab import files
import io
from tqdm.auto import tqdm
import pandas as pd

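def pretty_print(sentences, results, limit=None):
  """Print (sentence, prediction) pairs, stopping after `limit` pairs if given."""
  display_count = 0
  for s, r in zip(sentences, results):
    if limit is not None and display_count >= limit:
      break
    # Prints the pair as a tuple: Text: ('some text', {'label': ..., 'score': ...})
    print(f"Text: {s,r}")
    display_count += 1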

def run_roberta(sentences_list, inference_batch_size=64):
    """Classify all sentences with the RoBERTa pipeline, one batch at a time."""
    all_results = []
    for i in tqdm(range(0, len(sentences_list), inference_batch_size), desc="Processing with RoBERTa"):
        # Slice out the next batch; the last batch may be shorter.
        batch = sentences_list[i:i + inference_batch_size]
        results = roberta_classifier(batch)
        all_results.extend(results)
    return all_results

def run_bert(sentences_list, inference_batch_size=64):
    """Classify all sentences with the BERT pipeline, one batch at a time."""
    all_results = []
    for i in tqdm(range(0, len(sentences_list), inference_batch_size), desc="Processing with BERT"):
        # Slice out the next batch; the last batch may be shorter.
        batch = sentences_list[i:i + inference_batch_size]
        results = bert_classifier(batch)
        all_results.extend(results)
    return all_results



# Ask for an upload only if `sentences` was not already loaded by an earlier cell
if 'sentences' not in locals() or not sentences:
  uploaded = files.upload()
  sentences = []
  for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
    content = uploaded[fn].decode('utf-8')
    sentences.extend([line.strip() for line in content.splitlines() if line.strip()])


print("Roberta")
roberta_full_results = run_roberta(sentences)
pretty_print(sentences, roberta_full_results, limit=10)

print("\nBert")
bert_full_results = run_bert(sentences)
pretty_print(sentences, bert_full_results, limit=10)

Roberta


Processing with RoBERTa:   0%|          | 0/85 [00:00<?, ?it/s]

Text: ('review_text', {'label': 'neutral', 'score': 0.7485136389732361})
Text: ('Great 👍', {'label': 'positive', 'score': 0.961408793926239})
Text: ('plz whats up unban', {'label': 'neutral', 'score': 0.6636157631874084})
Text: ("my contact didn't show on WhatsApp .. for privacy I can't share screenshot please solve it", {'label': 'negative', 'score': 0.7700310349464417})
Text: ('Can you guys let archived group chats stay archived? I archived it for a reason, so please stop bringing them up again. Thanks.', {'label': 'neutral', 'score': 0.6145990490913391})
Text: ('it is the g.o.a.t🇿🇼', {'label': 'neutral', 'score': 0.6894740462303162})
Text: ('hii sir good morning sorry 😔', {'label': 'negative', 'score': 0.5573516488075256})
Text: ('my account problem', {'label': 'negative', 'score': 0.7386732697486877})
Text: ('please please enable that feature in which we can see anyones about status🙏', {'label': 'neutral', 'score': 0.734866201877594})
Text: ('whatsapp no work', {'lab

Processing with BERT:   0%|          | 0/85 [00:00<?, ?it/s]

Text: ('review_text', {'label': '4 stars', 'score': 0.3020024597644806})
Text: ('Great 👍', {'label': '5 stars', 'score': 0.8034508228302002})
Text: ('plz whats up unban', {'label': '1 star', 'score': 0.24202315509319305})
Text: ("my contact didn't show on WhatsApp .. for privacy I can't share screenshot please solve it", {'label': '1 star', 'score': 0.40457263588905334})
Text: ('Can you guys let archived group chats stay archived? I archived it for a reason, so please stop bringing them up again. Thanks.', {'label': '1 star', 'score': 0.35069024562835693})
Text: ('it is the g.o.a.t🇿🇼', {'label': '5 stars', 'score': 0.40955233573913574})
Text: ('hii sir good morning sorry 😔', {'label': '1 star', 'score': 0.2778264284133911})
Text: ('my account problem', {'label': '1 star', 'score': 0.3829849064350128})
Text: ('please please enable that feature in which we can see anyones about status🙏', {'label': '5 stars', 'score': 0.32328861951828003})
Text: ('whatsapp no work', {'label
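
The two models use different label sets (RoBERTa: `negative` / `neutral` / `positive`; BERT: `1 star` to `5 stars`), so their outputs are not directly comparable. One way to compare them is to collapse the star ratings into three classes. The sketch below assumes that 1 to 2 stars mean negative, 3 stars neutral, and 4 to 5 stars positive. That mapping and both helper names are illustrative assumptions, not something either model defines.

```python
def stars_to_sentiment(label):
    """Collapse a label like '4 stars' into negative/neutral/positive.

    Assumed mapping: 1-2 stars -> negative, 3 -> neutral, 4-5 -> positive.
    """
    stars = int(label.split()[0])
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


def agreement_rate(roberta_results, bert_results):
    """Fraction of inputs where both models give the same 3-class label."""
    pairs = list(zip(roberta_results, bert_results))
    if not pairs:
        return 0.0
    matches = sum(
        r["label"] == stars_to_sentiment(b["label"]) for r, b in pairs
    )
    return matches / len(pairs)
```

With the variables from the cell above, `agreement_rate(roberta_full_results, bert_full_results)` would give a rough measure of how often the two models agree under this mapping.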