<a href="https://colab.research.google.com/github/MarinKodKode/MasterDegree-AI/blob/main/TSA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Analysis through Fine-Tuning Transformer Models

*   Texas University
*   Head Engineer: Manuel Alejandro Hernandez Marin



# Introduction

Sentiment analysis is one of the fundamental tasks in Natural Language Processing (NLP). It identifies the emotional polarity of written text and has direct applications in areas such as opinion analysis, social media, customer service, and recommendation systems. Traditionally, this task has been approached with hand-crafted features and classical machine learning models.

However, in recent years, models based on Transformer architectures have demonstrated significantly superior performance by incorporating deep contextual representations and attention mechanisms. These models, being pre-trained on large volumes of text, can be efficiently adapted to specific tasks through fine-tuning processes.

In this work, a sentiment analysis model is implemented using the HuggingFace Transformers library, applying fine-tuning on a pre-trained DistilBERT model and using the SST-2 dataset from the GLUE benchmark. This approach allows us to evaluate the impact of modern NLP techniques and contrast them with traditional sentiment analysis approaches.

# Justification

The approach proposed in the original group task of the activity provides an adequate introduction to sentiment analysis and is useful as a first conceptual step. However, it relies on techniques and tools that have been largely superseded by more recent models, especially in tasks that require deep semantic understanding.

In contrast, fine-tuned Transformer models build on pre-trained linguistic representations, which reduces the need for manual feature engineering and significantly improves model performance. DistilBERT, in particular, offers a good balance between accuracy and computational efficiency, making it a practical choice for academic environments.

Furthermore, implementation with HuggingFace Transformers facilitates experiment reproducibility, access to standardized datasets, and evaluation using widely accepted metrics such as accuracy and F1-score. For these reasons, this methodology was chosen with the aim of applying an up-to-date approach, aligned with current industry practices and research in Natural Language Processing.

An English-language dataset was chosen because the model is more robust in English than in other languages, such as Spanish.

In [26]:

# Library installation
# Installs the HuggingFace libraries used below (transformers, datasets, evaluate, accelerate)

!pip install -q transformers datasets evaluate accelerate

In [27]:

# load dataset (SST-2)

from datasets import load_dataset

dataset = load_dataset("glue", "sst2")
dataset

DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 872
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1821
    })
})

In [28]:

# Load tokenizer

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

In [29]:
# Define tokenization function

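def tokenize_function(batch):
  # With no max_length given, padding="max_length" pads every sentence to the
  # tokenizer's maximum model length; truncation=True cuts longer inputs.
  return tokenizer(
      batch["sentence"],
      padding="max_length",
      truncation=True
  )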
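
The effect of the padding strategy can be sketched without the tokenizer itself. The toy example below uses made-up token IDs and a hypothetical `pad_batch` helper (neither comes from the notebook) to compare padding to a fixed length with padding to the longest sentence in the batch. It assumes a fixed length of 512, which is the maximum input length usually associated with `distilbert-base-uncased`.

```python
# Toy illustration only: the token IDs are made up, not real DistilBERT output.
PAD_ID = 0  # assumed padding ID for this sketch


def pad_batch(batch, target_len=None, pad_id=PAD_ID):
    """Pad each sequence to target_len, or to the longest sequence if None."""
    if target_len is None:
        target_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        seq = seq[:target_len]  # truncate sequences that are too long
        n_pad = target_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


toy_batch = [[101, 2023, 102], [101, 2023, 3185, 2001, 102]]

fixed_ids, fixed_mask = pad_batch(toy_batch, target_len=512)  # fixed length
dynamic_ids, dynamic_mask = pad_batch(toy_batch)              # longest in batch

fixed_tokens = sum(len(row) for row in fixed_ids)
dynamic_tokens = sum(len(row) for row in dynamic_ids)
print(f"fixed: {fixed_tokens} tokens, dynamic: {dynamic_tokens} tokens")
```

Since SST-2 sentences are short, fixed-length padding spends most of the computation on padding tokens. In the Transformers library, per-batch padding is typically obtained by tokenizing without padding and passing a `DataCollatorWithPadding` to the trainer; that variant was not run for this notebook.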

In [30]:
# Apply function to dataset

tokenized_dataset = dataset.map(tokenize_function, batched=True)
tokenized_dataset


DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx', 'input_ids', 'attention_mask'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx', 'input_ids', 'attention_mask'],
        num_rows: 872
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx', 'input_ids', 'attention_mask'],
        num_rows: 1821
    })
})

In [31]:

# Load model

from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2
)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [32]:

# Set metrics and evaluating function

import evaluate
import numpy as np

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

def compute_metrics(eval_pred):
  # eval_pred unpacks into the model logits and the reference labels
  logits, labels = eval_pred
  # Predicted class = index of the largest logit in each row
  predictions = np.argmax(logits, axis=1)
  return {
      "accuracy": accuracy.compute(predictions=predictions, references=labels)["accuracy"],
      "f1": f1.compute(predictions=predictions, references=labels)["f1"]
  }
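
To make the two metrics concrete, the following minimal sketch recomputes them by hand in plain Python. The logits and labels are invented for illustration, and the helpers `argmax_rows` and `accuracy_and_f1` are hypothetical names, not part of the `evaluate` library. F1 is computed for the positive class (label 1), which is assumed here to match the binary default of the `f1` metric.

```python
def argmax_rows(logits):
    """Return the index of the largest value in each row of logits."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def accuracy_and_f1(predictions, references, positive=1):
    """Accuracy plus F1 for the positive class, from raw counts."""
    pairs = list(zip(predictions, references))
    correct = sum(p == r for p, r in pairs)
    tp = sum(p == positive and r == positive for p, r in pairs)
    fp = sum(p == positive and r != positive for p, r in pairs)
    fn = sum(p != positive and r == positive for p, r in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return correct / len(pairs), f1


# Invented logits for four sentences: column 0 = negative, column 1 = positive
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
toy_labels = [0, 1, 0, 0]

toy_predictions = argmax_rows(toy_logits)
toy_accuracy, toy_f1 = accuracy_and_f1(toy_predictions, toy_labels)
print(toy_predictions, toy_accuracy, round(toy_f1, 3))
```

Accuracy counts every correct prediction equally, while F1 balances precision and recall on the positive class, so the two can move differently between epochs.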

In [33]:

# Define training params

from transformers import TrainingArguments
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    logging_dir="./logs",
    report_to="none"
)

The training parameters were chosen with the following considerations:

* **Low learning rate (2e-5)**: keeps fine-tuning stable.
* **2 epochs**: limits the risk of overfitting.
* **Batch size 16**: a common choice when fine-tuning BERT-style models.
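
The batch size and epoch count determine how many optimizer steps the run performs. The short calculation below is a sketch that assumes a single device and no gradient accumulation. It also illustrates a linear learning-rate decay without warmup, which is assumed here to be the scheduler in effect because the training arguments do not set one; the helper `linear_lr` is a hypothetical name.

```python
import math

train_examples = 67349   # size of the SST-2 training split
batch_size = 16          # per_device_train_batch_size
epochs = 2               # num_train_epochs
peak_lr = 2e-5           # learning_rate

# The last batch of each epoch is smaller, so round up
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs


def linear_lr(step, total=total_steps, peak=peak_lr):
    """Learning rate under linear decay to zero with no warmup (assumed)."""
    return peak * max(0.0, 1.0 - step / total)


print(steps_per_epoch, total_steps)
print(linear_lr(0), linear_lr(steps_per_epoch), linear_lr(total_steps))
```

Under these assumptions the run takes 4210 steps per epoch and 8420 steps in total, which agrees with the `global_step` value reported by the trainer.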



In [23]:
# Run trainer

from transformers import Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    compute_metrics=compute_metrics
)

# Execute trainer
trainer.train()

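| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|-------|---------------|-----------------|----------|----------|
| 1 | 0.1832 | 0.316708 | 0.904817 | 0.905575 |
| 2 | 0.128 | 0.354539 | 0.904817 | 0.908287 |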


TrainOutput(global_step=8420, training_loss=0.17690961751688689, metrics={'train_runtime': 5944.3665, 'train_samples_per_second': 22.66, 'train_steps_per_second': 1.416, 'total_flos': 1.7843093664165888e+16, 'train_loss': 0.17690961751688689, 'epoch': 2.0})
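
The throughput figures in this output can be cross-checked against the dataset size. The snippet below is a small sanity check using only numbers reported above, with the training split size taken from the dataset summary.

```python
train_examples = 67349    # rows in the SST-2 training split
epochs = 2                # 'epoch': 2.0 in the output
runtime_s = 5944.3665     # 'train_runtime'
global_step = 8420        # optimizer steps reported by the trainer

samples_per_second = train_examples * epochs / runtime_s
steps_per_second = global_step / runtime_s
runtime_minutes = runtime_s / 60

print(f"{samples_per_second:.2f} samples/s")
print(f"{steps_per_second:.3f} steps/s")
print(f"{runtime_minutes:.1f} minutes")
```

The recomputed rates match the reported `train_samples_per_second` and `train_steps_per_second`, and the total runtime corresponds to roughly 99 minutes for the two epochs.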

In [None]:
trainer.save_model("./sentiment_model")
tokenizer.save_pretrained("./sentiment_model")

In [25]:
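# Inference test on hand-written example sentences

import torch

def predict_sentiment(text):
    # Tokenize one sentence and move the tensors to the model's device
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    model.eval()  # make sure dropout is disabled for inference
    with torch.no_grad():
        outputs = model(**inputs)

    # SST-2 label convention: 1 = positive, 0 = negative
    prediction = torch.argmax(outputs.logits, dim=-1).item()
    return "POSITIVE" if prediction == 1 else "NEGATIVE"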

examples = [
    "This movie was absolutely amazing, I loved every minute of it.",
    "The film was boring and a complete waste of time.",
    "The acting was okay, but the story was very weak.",
    "I would definitely recommend this movie to my friends.",
    "This was the worst movie I have seen in years."
]

for text in examples:
    print(f"{text} -> {predict_sentiment(text)}")

This movie was absolutely amazing, I loved every minute of it. -> POSITIVE
The film was boring and a complete waste of time. -> NEGATIVE
The acting was okay, but the story was very weak. -> NEGATIVE
I would definitely recommend this movie to my friends. -> POSITIVE
This was the worst movie I have seen in years. -> NEGATIVE
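
The predictions above report only the winning class. For mixed sentences such as the third example, it can help to see how confident the model is. The sketch below applies a softmax to a pair of logits in plain Python. The logit values are invented for illustration and were not produced by the trained model, and `label_with_confidence` is a hypothetical helper.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits, labels=("NEGATIVE", "POSITIVE")):
    """Return the predicted label and its softmax probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return labels[idx], probs[idx]


# Invented logits: a clear-cut sentence and a borderline one
clear_label, clear_conf = label_with_confidence([-3.0, 3.5])
mixed_label, mixed_conf = label_with_confidence([0.4, -0.1])

print(clear_label, round(clear_conf, 3))
print(mixed_label, round(mixed_conf, 3))
```

In the notebook's own function, the same idea would amount to applying `torch.softmax(outputs.logits, dim=-1)` before taking the maximum.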


# Conclusions

The fine-tuned DistilBERT model performed well on the SST-2 sentiment classification task, reaching an accuracy of 90.5% and an F1-score of 0.908 on the validation set after the second epoch. These results show that transformer-based models are effective for sentiment analysis even with a relatively compact architecture such as DistilBERT.

Training loss fell from 0.183 in the first epoch to 0.128 in the second, indicating that the model continued to fit the training data. Over the same interval, validation loss rose from 0.317 to 0.355 while validation accuracy stayed at 90.5%. This suggests mild overfitting and highlights the importance of monitoring validation metrics and applying early stopping when necessary.
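
The early-stopping rule mentioned above can be sketched as a small patience-based check on the validation loss. This is an illustrative plain-Python version with a hypothetical helper name, not the mechanism used in this notebook; in the Transformers library the usual route is `EarlyStoppingCallback`, which was not used here.

```python
def early_stopping_epoch(val_losses, patience=1, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a patience-based stopping rule.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# Validation losses reported for the two epochs of this run
observed_val_losses = [0.316708, 0.354539]
best_epoch, stop_epoch = early_stopping_epoch(observed_val_losses, patience=1)
print(best_epoch, stop_epoch)
```

With a patience of one epoch, this rule would keep the checkpoint from the first epoch, where validation loss was lowest.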

With GPU acceleration, the two training epochs completed in about 99 minutes (5,944 s), which kept experimentation practical. Overall, this project confirms that fine-tuning pre-trained transformer models is a powerful and practical approach to sentiment analysis, providing high performance at a manageable computational cost.