# Sentiment Classification Using RoBERTa

This notebook reproduces a simple sentiment classification demo using `cardiffnlp/twitter-roberta-base-sentiment-latest`, a RoBERTa model fine-tuned on Twitter data by CardiffNLP.


## 1) Install dependencies (run once)

> In Google Colab, run this cell as-is.
> In a local environment, run the same `pip install` command (without the leading `!`) inside your virtual environment.


In [None]:
!pip install -U transformers torch sentencepiece accelerate bitsandbytes


## 2) Import and build pipeline

We use the Hugging Face `pipeline` for sentiment analysis.


In [1]:
from transformers import pipeline

# RoBERTa fine-tuned for sentiment analysis
roberta_classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest"
)

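# "BERT" comparison classifier.
# NOTE: this loads the same RoBERTa checkpoint as above, so both classifiers
# return the same predictions. Point `model` at a BERT sentiment checkpoint
# to get a real comparison.
bert_classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest"
)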


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cuda:0


In [2]:
from google.colab import files

# Upload one or more text files and keep every non-empty line as a sentence.
uploaded = files.upload()
sentences = []
for fn, data in uploaded.items():
  print(f'User uploaded file "{fn}" with length {len(data)} bytes')
  content = data.decode('utf-8')
  sentences.extend([line.strip() for line in content.splitlines() if line.strip()])

Saving whatsapp_review_kaggle.txt to whatsapp_review_kaggle.txt
User uploaded file "whatsapp_review_kaggle.txt" with length 408478 bytes
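
Outside Colab, `google.colab.files` is not available. A minimal sketch of loading the same one-review-per-line file from disk, assuming UTF-8 encoding and a hypothetical local path; `load_sentences` is an illustrative helper name, not part of the original notebook:

```python
from pathlib import Path

def load_sentences(path, encoding="utf-8"):
    """Return the non-empty, stripped lines of a text file."""
    text = Path(path).read_text(encoding=encoding)
    return [line.strip() for line in text.splitlines() if line.strip()]

# Hypothetical usage; adjust the path to wherever the file lives locally:
# sentences = load_sentences("whatsapp_review_kaggle.txt")
```

The filtering mirrors the upload cell: blank lines are dropped and surrounding whitespace is stripped, so the resulting list can be passed to the classifiers unchanged.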


In [None]:
# Print the first 10 lines
for line in sentences[:10]:
  print(line)

## 3) Run predictions on sample sentences




In [None]:
from google.colab import files
from tqdm.auto import tqdm

def pretty_print(sentences, results):
    # Each result is a dict holding the predicted 'label' and its 'score'.
    for s, r in zip(sentences, results):
        print(f"Text: {s} | Label: {r['label']} | Score: {r['score']:.4f}")

def run_roberta(sentences_list, inference_batch_size=32):
    # Classify in fixed-size batches so tqdm can report progress.
    all_results = []
    for i in tqdm(range(0, len(sentences_list), inference_batch_size), desc="Processing with RoBERTa"):
        batch = sentences_list[i:i + inference_batch_size]
        all_results.extend(roberta_classifier(batch))
    return all_results

def run_bert(sentences_list, inference_batch_size=32):
    all_results = []
    for i in tqdm(range(0, len(sentences_list), inference_batch_size), desc="Processing with BERT"):
        batch = sentences_list[i:i + inference_batch_size]
        all_results.extend(bert_classifier(batch))
    return all_results

# Fall back to uploading a file if the earlier upload cell was not run.
if 'sentences' not in globals() or not sentences:
    uploaded = files.upload()
    sentences = []
    for fn, data in uploaded.items():
        print(f'User uploaded file "{fn}" with length {len(data)} bytes')
        content = data.decode('utf-8')
        sentences.extend([line.strip() for line in content.splitlines() if line.strip()])

print("RoBERTa")
roberta_results = run_roberta(sentences)
pretty_print(sentences, roberta_results)

print("BERT")
bert_results = run_bert(sentences)
pretty_print(sentences, bert_results)
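
Printing every prediction is hard to scan for a file with thousands of reviews, so a summary can help. A minimal sketch, assuming each result is a dict with a `label` key (the shape the `sentiment-analysis` pipeline returns); `label_counts` and `agreement_rate` are illustrative helper names, and the sample data is made up for demonstration, not taken from the review file:

```python
from collections import Counter

def label_counts(results):
    """Count how many sentences received each predicted label."""
    return Counter(r["label"] for r in results)

def agreement_rate(results_a, results_b):
    """Fraction of positions where two result lists predict the same label."""
    if len(results_a) != len(results_b):
        raise ValueError("result lists must have the same length")
    if not results_a:
        return 0.0
    same = sum(a["label"] == b["label"] for a, b in zip(results_a, results_b))
    return same / len(results_a)

# Made-up sample results in the pipeline's output shape (not real predictions).
sample_a = [{"label": "positive", "score": 0.98}, {"label": "negative", "score": 0.91}]
sample_b = [{"label": "positive", "score": 0.95}, {"label": "neutral", "score": 0.60}]

print(label_counts(sample_a))
print(agreement_rate(sample_a, sample_b))  # 0.5
```

Since both classifiers in this notebook load the same checkpoint, the agreement rate between the `run_roberta` and `run_bert` outputs should be 1.0 until one of them is pointed at a different model.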