## Fine-Tuning a BERT Model



### (The Hugging Face transformers package is a widely used Python library that provides pretrained models for a variety of Natural Language Processing (NLP) tasks.)
### Note: https://huggingface.co/models provides a selection of pre-trained models that can be used to quickly build prediction models for various NLP tasks. This demo uses the ProsusAI/finbert model.

# Step 1: Install the required libraries.

In [None]:
!pip install transformers
!pip install tensorflow



## The commands in this code cell are meant to run in a Jupyter or Colab notebook, where a leading exclamation mark (!) passes the rest of the line to the system shell. Installing the libraries first ensures that the necessary dependencies are present before the dataset is uploaded. The later cells also import torch, datasets, nltk, scikit-learn, and pandas, which this cell does not install, so they must already be available in the notebook environment.
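Before running the later cells, it can help to confirm that every imported package is actually present. The sketch below is one possible check using only the standard library; the helper name `missing_packages` and the `REQUIRED` list are illustrative and not part of the original notebook.

```python
import importlib.util

# Import names used by the later cells (note: scikit-learn is imported as "sklearn").
REQUIRED = ["torch", "transformers", "datasets", "nltk", "sklearn", "pandas", "numpy"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Any name reported as missing can then be installed with `!pip install` in the same way as the cell above.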


# Step 2: Setup and Data Prep.

In [None]:
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer
from google.colab import files
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer
import nltk

# Ensure GPU is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("GPU not available, using CPU.")

print("\n--- Loading and Preprocessing Data ---")
# Upload CSV file
uploaded = files.upload()

# Load dataset file
df = pd.read_csv('Philippines-News-Headlines-Dataset-for-Sentiment-Analysis.csv')

# Download VADER lexicon for sentiment analysis
nltk.download('vader_lexicon')

# Initialize VADER sentiment analyzer
sia = SentimentIntensityAnalyzer()

# Function to convert VADER compound score to label
def vader_label(score):
    if score >= 0.05:
        return 2  # Positive
    elif score <= -0.05:
        return 0  # Negative
    else:
        return 1  # Neutral

# Apply VADER to each headline to generate rough labels
df['sentiment_score'] = df['Headlines'].apply(lambda x: sia.polarity_scores(str(x))['compound'])
df['label'] = df['sentiment_score'].apply(vader_label)

print("Sample of headlines with generated labels:")
print(df[['Headlines', 'sentiment_score', 'label']].head())

# Convert to Huggingface Dataset format using the generated labels
texts = df['Headlines'].tolist()
labels = df['label'].tolist()
dataset = Dataset.from_dict({"text": texts, "label": labels})

# Select a subset for training and evaluation
# NOTE: range(500) is contained in range(2000), so every evaluation row is also a training row
train_data = dataset.select(range(2000))
eval_data = dataset.select(range(500))

print("Loaded dataset with {} training and {} evaluation samples.".format(len(train_data), len(eval_data)))

Using GPU: Tesla T4

--- Loading and Preprocessing Data ---


Saving Philippines-News-Headlines-Dataset-for-Sentiment-Analysis.csv to Philippines-News-Headlines-Dataset-for-Sentiment-Analysis.csv


[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


Sample of headlines with generated labels:
                                           Headlines  sentiment_score  label
0                                Miranda's doctrine.              0.0      1
1    US equity firm looks to more investments in PH.              0.0      1
2  Nickel Asia Corp Announces its Notice of Annua...              0.0      1
3                    DoF reconvenes the Green Force.              0.0      1
4                  Cebu MSMEs get training from DTI.              0.0      1
Loaded dataset with 2000 training and 500 evaluation samples.


## This code cell begins by importing the libraries used throughout the notebook: PyTorch for model handling, Hugging Face Transformers and Datasets for loading pretrained models and wrapping the data, and NLTK's VADER tool for sentiment scoring. It then checks whether a GPU is available and falls back to the CPU otherwise. The uploaded CSV file is loaded into a pandas DataFrame, and VADER assigns each headline a compound score that is thresholded into a label (0 = negative, 1 = neutral, 2 = positive). These are rough, automatically generated labels rather than human annotations. The labeled headlines are converted into a Hugging Face Dataset, from which the first 2,000 rows are selected for training and the first 500 rows for evaluation, ready for fine-tuning models like BERT. Because the evaluation rows are also part of the training subset, the evaluation metrics reported later are measured on headlines the model has already seen.
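One way to keep the two subsets from sharing rows is to shuffle the row indices once and slice disjoint ranges from the result. The following is a minimal sketch under stated assumptions: `split_indices` is a hypothetical helper, and `n_rows=3000` is a placeholder because the actual size of the CSV is not given here (in the notebook it would be `len(dataset)`).

```python
import random

def split_indices(n_rows, n_train, n_eval, seed=42):
    """Return two disjoint lists of row indices drawn from range(n_rows)."""
    if n_train + n_eval > n_rows:
        raise ValueError("n_train + n_eval exceeds the number of rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    return indices[:n_train], indices[n_train:n_train + n_eval]

# n_rows is a placeholder value, not the real dataset size.
train_idx, eval_idx = split_indices(n_rows=3000, n_train=2000, n_eval=500)

# No row appears in both subsets.
assert set(train_idx).isdisjoint(eval_idx)
print(len(train_idx), len(eval_idx))
```

With the notebook's objects, the two lists could be passed to `dataset.select(train_idx)` and `dataset.select(eval_idx)`. Metrics obtained this way would differ from the recorded run, which evaluated on rows also used for training.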

# Step 3: Load the tokenizer and the pre-trained model.

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Initialize Tokenizer (the pre-processing tool)
MODEL_NAME = "ProsusAI/finbert"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize_function(examples):
    # This prepares the text and converts it into numerical IDs that the model understands
    return tokenizer(examples["text"], truncation=True, padding=True)

# Apply tokenization to the training and evaluating sets
tokenized_train = train_data.map(tokenize_function, batched=True)
tokenized_eval = eval_data.map(tokenize_function, batched=True)

# Rename the 'label' column to 'labels' for the Hugging Face Trainer
tokenized_train = tokenized_train.rename_column("label", "labels")
tokenized_eval = tokenized_eval.rename_column("label", "labels")

# Set the format to PyTorch tensors
tokenized_train.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])
tokenized_eval.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])

# --- 3. MODEL DEFINITION (Stage 3 Analog) ---

# Load the pre-trained FinBERT model for sequence classification (sentiment analysis)
# num_labels=3 gives a classification head with 3 classes (negative/neutral/positive)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3).to(device)
print(f"Model loaded: {MODEL_NAME}")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Model loaded: ProsusAI/finbert


## This code cell handles text preprocessing and model loading for fine-tuning FinBERT on the sentiment classification task. It first imports the tokenizer and sequence classification classes from Hugging Face Transformers, then loads the pre-trained tokenizer from the "ProsusAI/finbert" checkpoint, which converts raw text into the numerical IDs the model expects. The tokenization function applies truncation and padding so that the texts in a batch have a uniform length, and it is mapped over both the training and evaluation subsets. The label column is renamed to labels, the name the Trainer expects, and the datasets are set to return PyTorch tensors. Finally, the pre-trained FinBERT model is loaded with a 3-class classification head and moved to the selected device (the GPU if one is available, otherwise the CPU).
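To make the effect of truncation and padding concrete, here is a toy stand-in for the tokenizer call. It is only an illustration: the real FinBERT tokenizer uses WordPiece subwords and adds special tokens, whereas `encode_batch` and its tiny `vocab` are hypothetical and split on whitespace.

```python
def encode_batch(texts, vocab, max_length, pad_id=0, unk_id=1):
    """Toy tokenizer: whitespace split, truncate, then pad to the longest sequence."""
    # Map each word to an ID (unknown words get unk_id) and truncate to max_length.
    ids = [[vocab.get(w.lower(), unk_id) for w in t.split()][:max_length] for t in texts]
    longest = max(len(seq) for seq in ids)
    # Pad shorter sequences; the attention mask marks real tokens with 1 and padding with 0.
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in ids]
    attention_mask = [[1] * len(seq) + [0] * (longest - len(seq)) for seq in ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

vocab = {"banks": 2, "face": 3, "distress": 4, "economy": 5, "shrinks": 6}
batch = encode_batch(["Banks face distress", "Economy shrinks"], vocab, max_length=8)
print(batch["input_ids"])       # [[2, 3, 4], [5, 6, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```

The `input_ids` and `attention_mask` keys mirror the columns that the notebook keeps when it calls `set_format`.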

# Steps 4 and 5: Define the metrics and training configuration.

In [None]:
# --- 4. METRICS AND TRAINING SETUP ---

def compute_metrics(p):
    # This function calculates key metrics during evaluation (Stage 3 Validation)
    preds = np.argmax(p.predictions, axis=1)
    acc = accuracy_score(p.label_ids, preds)
    f1 = f1_score(p.label_ids, preds, average="weighted")
    return {"accuracy": acc, "f1": f1}

# Define training arguments (hyperparameters)
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=800,
    weight_decay=0.1,
    logging_dir="./logs",
    logging_steps=100,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    fp16=torch.cuda.is_available(),
    report_to=[]
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)



## After loading the pre-trained FinBERT model, this code cell sets up the evaluation metrics and the training process using the Hugging Face Trainer API. The compute_metrics function receives the model predictions and the true labels during evaluation and computes two classification metrics: accuracy and weighted F1-score. TrainingArguments specifies the hyperparameters and training behavior: the number of epochs (4), the batch sizes for training and evaluation (16), warmup steps (800), weight decay (0.1), and logging. The model is evaluated and a checkpoint is saved at the end of every epoch, and the best checkpoint is reloaded when training finishes. Finally, the Trainer is initialized with the model, the training arguments, the tokenized subsets, the metric function, and the tokenizer; it handles the training and evaluation loop.
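The warmup setting is easier to judge next to the total number of optimizer steps. The sketch below does that arithmetic for the values in this notebook, assuming a single device, no gradient accumulation, and a linear warmup in which the learning rate grows in proportion to the step count; the function names are illustrative.

```python
import math

def total_optimizer_steps(n_examples, batch_size, epochs):
    """Optimizer steps for one device with no gradient accumulation (assumption)."""
    return math.ceil(n_examples / batch_size) * epochs

def linear_warmup_factor(step, warmup_steps):
    """Fraction of the peak learning rate reached at `step` under linear warmup."""
    return min(1.0, step / warmup_steps)

steps = total_optimizer_steps(n_examples=2000, batch_size=16, epochs=4)
print(steps)                             # 500
print(linear_warmup_factor(steps, 800))  # 0.625
```

Under these assumptions, training ends after 500 steps while warmup_steps=800 would still be ramping up, so the learning rate would reach only about 62.5% of its configured peak. A warmup shorter than the total step count may be worth trying.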

# Steps 6 and 7: Fine-Tune and Save the Model.

In [None]:
# --- 5. EXECUTION ---
print("\n --- Starting Fine-Tuning (Expected Time: 1-4 hours on CPU, 1-3 minutes on GPU) ---")

# Start training the model
trainer.train()

# --- 6. FINAL EVALUATION (Stage 3 Validation) ---
print("\n--- Final Evaluation Results ---")
eval_results = trainer.evaluate()
print(eval_results)

print("\nFine-tuning process complete. The resulting model can now be used for inference (Stage 4).")


 --- Starting Fine-Tuning (Expected Time: 1-4 hours on CPU, 1-3 minutes on GPU) ---


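| Epoch | Training Loss | Validation Loss | Accuracy | F1       |
|-------|---------------|-----------------|----------|----------|
| 1     | 1.5652        | 0.834985        | 0.678    | 0.630221 |
| 2     | 0.8406        | 0.531659        | 0.83     | 0.828195 |
| 3     | 0.6357        | 0.242885        | 0.924    | 0.921185 |
| 4     | 0.2766        | 0.070356        | 0.978    | 0.978615 |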



--- Final Evaluation Results ---


{'eval_loss': 0.07035603374242783, 'eval_accuracy': 0.978, 'eval_f1': 0.9786147058703436, 'eval_runtime': 0.6326, 'eval_samples_per_second': 790.423, 'eval_steps_per_second': 50.587, 'epoch': 4.0}

Fine-tuning process complete. The resulting model can now be used for inference (Stage 4).


## With the evaluation metrics and training arguments defined, this code cell fine-tunes the model and then evaluates it. On a GPU, the run takes roughly 1-3 minutes. The trainer.train() method runs the training loop and, because eval_strategy is set to "epoch", reports the validation loss, accuracy, and F1-score after each epoch; accuracy rises from 0.678 after the first epoch to 0.978 after the fourth. The trainer.evaluate() method then runs the model on the evaluation subset once more and returns the final metrics. After the last epoch, the fine-tuned model is ready for inference or further deployment.
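The two reported metrics can be reproduced by hand on a small example, which shows what "weighted" means for the F1-score. This is a plain-Python sketch with made-up labels, not the scikit-learn implementation the notebook calls; the function names are illustrative.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator is positive because n > 0
        total += f1 * n / len(y_true)
    return total

# Made-up labels: 0 = negative, 1 = neutral, 2 = positive.
y_true = [0, 1, 1, 1, 2, 2]
y_pred = [0, 1, 1, 2, 2, 2]
print(round(accuracy(y_true, y_pred), 4))     # 0.8333
print(round(weighted_f1(y_true, y_pred), 4))  # 0.8333
```

Because each class's F1 is weighted by how often it occurs in the true labels, a frequent class such as neutral has the largest influence on the score.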

# Step 8: The Inference Process - Batch Testing.

In [None]:
from transformers import pipeline

# 1. Use the model directly loaded in Trainer (no path needed if still in memory)
# 2. Create a prediction pipeline using the trained model and tokenizer

# We use the pipeline tool for easy prediction
sentiment_analyzer = pipeline(
    "sentiment-analysis",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1 # 0 for GPU, -1 for CPU
)

# 3. Define new text data (mimicking your 100,000 documents)
new_data = [
    "Ayala Corp. net income down 17% in Q1.",
    "Banks may face distress if Covid persists according to Diokno.",
    "PH economy to shrink by over 10% in Apr-Jun'.",
]

# 4. Run Batch Inference
print("\n--- Running Inference on Unlabeled Data ---")
results = sentiment_analyzer(new_data)

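# 5. Print results, mapping the pipeline's label names to display names
# The pipeline returns names from model.config.id2label (inherited from the checkpoint).
# Training labels in Step 2 were 0=Negative, 1=Neutral, 2=Positive; confirm the two orders match.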
label_map = {
    "negative": "Negative",
    "neutral": "Neutral",
    "positive": "Positive"
}

for text, result in zip(new_data, results):
    # result['label'] is the label name reported by the pipeline (negative, neutral, or positive)
    sentiment = label_map.get(result['label'].lower(), "Unknown")

    print(f"Text: {text}")
    print(f" Prediction: {sentiment} (Score: {result['score']:.4f})")

# --- Next Steps ---
# You would next apply this analyzer to your entire 100,000-document corpus
# to generate the structured data needed for Stage 4 visualization

Device set to use cuda:0



--- Running Inference on Unlabeled Data ---
Text: Ayala Corp. net income down 17% in Q1.
 Prediction: Neutral (Score: 0.8352)
Text: Banks may face distress if Covid persists according to Diokno.
 Prediction: Negative (Score: 0.8005)
Text: PH economy to shrink by over 10% in Apr-Jun'.
 Prediction: Negative (Score: 0.9793)


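## After fine-tuning, this code cell runs batch inference with the fine-tuned FinBERT model and tokenizer through the pipeline abstraction. A small batch of new, unlabeled financial news headlines serves as sample input, and the pipeline returns a sentiment label (Negative, Neutral, or Positive) and a confidence score for each. The label names come from the id2label mapping stored in the model configuration, which was inherited from the original checkpoint; the fine-tuning labels from Step 2 use the order 0 = negative, 1 = neutral, 2 = positive, so the two orders should be compared (for example by printing model.config.id2label) before the printed names are interpreted.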
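The following sketch shows how a class index becomes a label name and a score, and why the index-to-name mapping matters. It uses made-up logits and plain Python instead of the pipeline. `CHECKPOINT_ID2LABEL` is an assumption about the order shipped with the checkpoint and should be verified against `model.config.id2label`.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    return [e / sum(exps) for e in exps]

def predict_label(logits, id2label):
    """Return (label name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Label order used when the training labels were generated in Step 2.
TRAINING_ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}
# ASSUMPTION: order believed to be stored in the checkpoint's config; verify before relying on it.
CHECKPOINT_ID2LABEL = {0: "positive", 1: "negative", 2: "neutral"}

logits = [0.2, 3.1, -1.0]  # made-up scores for one headline
print(predict_label(logits, TRAINING_ID2LABEL)[0])    # neutral
print(predict_label(logits, CHECKPOINT_ID2LABEL)[0])  # negative
```

If the two mappings do differ in the real model, the same predicted index is printed under a different name than the one it was trained with. Passing matching `id2label` and `label2id` dictionaries when the model is loaded is one way to keep them aligned.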