
In [None]:
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.
import kagglehub
falgunipatel19_biomedical_text_publication_classification_path = kagglehub.dataset_download('falgunipatel19/biomedical-text-publication-classification')

print('Data source import complete.')


# **Introduction**

Welcome to this notebook, where we will **fine-tune Transformer** architectures for specific **downstream tasks**.

In our previous notebooks, we covered the **theory behind Transformers** and how to use pretrained **Transformer** models. Now we take the next step: **adapting and fine-tuning** a pretrained model to suit our own needs.

- **Transformers** are typically pretrained on **large text corpora**, which gives them **rich contextual knowledge** of language. Pretrained models are **powerful** out of the box, but what if we want to **tailor them to our own dataset** or **fine-tune** them for a **specific downstream task**?

That is what this notebook covers: the **practice of fine-tuning Transformers**. The same workflow applies to text classification, named entity recognition, sentiment analysis and other NLP tasks. Here we apply it to a **biomedical text classification** dataset.

So let's dive in and fine-tune a **Transformer** for our own **NLP task**!


# **Environment Setup**

In this section, we install the required libraries, import them, and define the constants used throughout the notebook.

In [None]:
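# Download libraries (version specifiers and extras are quoted so the shell does not
# treat ">" as output redirection or "[...]" as a filename glob)
!pip install -q datasets transformers
!pip install -q "transformers[torch]"
!pip install -q -U "accelerate>=0.21.0"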

In [None]:
# Base Library
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Data
import pandas as pd
from datasets import load_dataset

# Transformers
from transformers import AutoTokenizer
from transformers import DataCollatorWithPadding
from transformers import AutoModelForSequenceClassification
from transformers import TFAutoModelForSequenceClassification

# Model Training
from tensorflow.keras import losses
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import PolynomialDecay

# Model Trainer
from transformers import Trainer
from transformers import AutoConfig
from transformers import TrainingArguments

# Model Evaluation
!pip install evaluate -q -U
import evaluate

In [None]:
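# Constants
import os

BATCH_SIZE = 8
CHECKPOINT = "bert-base-uncased"
# Use the directory returned by kagglehub in the first cell (works on Kaggle and Colab)
FILE_PATH = os.path.join(falgunipatel19_biomedical_text_publication_classification_path, "alldata_1_for_kaggle.csv")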

# **Dataset Loading**

We first read the raw CSV with pandas, rename its columns and save a cleaned copy. We then load that copy with the Hugging Face `datasets` library, which stores the data in Apache Arrow tables and provides fast `map` and `train_test_split` operations for the biomedical publication dataset.

In [None]:
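# Load & transform data (the raw CSV is read with latin-1 encoding)
data = pd.read_csv(FILE_PATH, encoding="latin1")
data.columns = ["ID", "Label", "Text"]      # rename the three columns
data.drop(columns=['ID'], inplace=True)     # the ID column is not needed for classification

# Save a cleaned copy (pandas writes UTF-8 by default)
data.to_csv("medical_text.csv", index=False)

# Preview the first rows (only the last expression of a cell is displayed)
data.head()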

In [None]:
class_names = sorted(data.Label.unique())
class_mapping = {name:index for index, name in enumerate(class_names)}

In [None]:
# Load the cleaned CSV saved above (written as UTF-8, the default encoding, so no override is needed)
data_dict = load_dataset("csv", data_files="medical_text.csv")

# Generate a Train Test Split
data_dict = data_dict["train"].train_test_split(test_size=0.2, seed=42)

# Select Training and Testing data
train_ds = data_dict["train"]
test_ds = data_dict["test"]

# **Tokenizer**

Tokenization is a key step in Natural Language Processing (NLP). It converts human language into a form that a model can process.

Tokenization breaks text into smaller units, or tokens, and maps each token to an integer ID. For BERT this happens in two stages:

- First, the text is split into tokens. `bert-base-uncased` lowercases the text and then applies **WordPiece**, a *subword* tokenizer. Common words stay whole, while rarer words are split into smaller pieces, and continuation pieces are prefixed with `##`. Special tokens such as `[CLS]` (start) and `[SEP]` (end) are added around the sequence.

- Second, each token is converted to its integer ID by looking it up in the model's fixed vocabulary, which has 30,522 entries for `bert-base-uncased`.

Overall, tokenization turns unstructured text into integer sequences over a fixed vocabulary, together with an attention mask, that the model can consume.
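The toy sketch below makes the two stages concrete using WordPiece-style greedy longest-match-first splitting. The vocabulary `TOY_VOCAB` and the helpers `wordpiece`/`encode` are hypothetical and deliberately tiny. The real `bert-base-uncased` tokenizer also splits on punctuation, strips accents and uses a 30,522-token vocabulary, so its output for real text will differ.

```python
# Toy WordPiece-style tokenizer (illustration only): TOY_VOCAB is a tiny made-up
# vocabulary, not the real bert-base-uncased vocabulary.
TOY_VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "the", "thyroid", "tumor",
             "##s", "grow", "##th", "cancer", "lung"]
TOKEN_TO_ID = {token: idx for idx, token in enumerate(TOY_VOCAB)}


def wordpiece(word, vocab=TOKEN_TO_ID):
    """Split one word into subwords by greedy longest-match-first lookup."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1  # try a shorter prefix
        if piece is None:
            return ["[UNK]"]  # no valid split: the whole word maps to [UNK]
        pieces.append(piece)
        start = end
    return pieces


def encode(text):
    """Stage 1: lowercase and split into subword tokens; stage 2: map tokens to IDs."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word))
    tokens.append("[SEP]")
    return tokens, [TOKEN_TO_ID[token] for token in tokens]


tokens, ids = encode("Thyroid tumors growth")
print(tokens)  # ['[CLS]', 'thyroid', 'tumor', '##s', 'grow', '##th', '[SEP]']
print(ids)     # [2, 5, 6, 7, 8, 9, 3]
```

Greedy longest-match keeps frequent words intact and falls back to smaller pieces only when needed. As a result, a fixed vocabulary can cover rare biomedical terms without mapping them all to `[UNK]`.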

In [None]:
# Initialize tokenizer
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)

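# Preprocessing function: tokenize a batch of texts, truncating to the model's
# maximum length (512 tokens for BERT); padding=True pads to the longest text in each map() batch
def apply_tokenizer(sample):
    return tokenizer(sample['Text'], truncation=True, padding=True)

# Convert string labels to integer IDs in a "labels" column (the name the model's forward pass expects)
def text_to_num_label(sample):
    return {'labels': [class_mapping[label] for label in sample['Label']]}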

With `batched=True`, `Dataset.map` passes the function a batch of examples (1,000 by default) as a dictionary of lists, rather than one example at a time. The fast tokenizer can then process a whole batch in a single call, which is much quicker than tokenizing examples one by one.
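The batched calling convention can be illustrated without the `datasets` library. `map_batched` below is a hypothetical, simplified stand-in for `Dataset.map(..., batched=True)`. The names `toy_class_mapping` and `toy_labels_fn` are placeholders chosen so they do not overwrite the notebook's own `class_mapping` and `text_to_num_label`, and the label names are not necessarily the dataset's real classes.

```python
# Hypothetical, simplified stand-in for datasets.Dataset.map(fn, batched=True):
# the columns are sliced into chunks, fn is called once per chunk with a dict of
# lists, and the columns it returns are added alongside the existing ones.
def map_batched(columns, fn, batch_size=1000):
    n_rows = len(next(iter(columns.values())))
    new_columns = {}
    for start in range(0, n_rows, batch_size):
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        for name, values in fn(batch).items():
            new_columns.setdefault(name, []).extend(values)
    return {**columns, **new_columns}


# Placeholder label names (assumption), mirroring text_to_num_label's logic
toy_class_mapping = {"Colon_Cancer": 0, "Lung_Cancer": 1, "Thyroid_Cancer": 2}
batch_sizes = []


def toy_labels_fn(batch):
    batch_sizes.append(len(batch["Label"]))  # record how many rows each call received
    # `batch["Label"]` is a list of strings, not a single string
    return {"labels": [toy_class_mapping[label] for label in batch["Label"]]}


toy = {"Label": ["Lung_Cancer", "Thyroid_Cancer", "Colon_Cancer", "Lung_Cancer", "Colon_Cancer"]}
result = map_batched(toy, toy_labels_fn, batch_size=2)
print(result["labels"])  # [1, 2, 0, 1, 0]
print(batch_sizes)       # [2, 2, 1]
```

Because the function always receives lists and must return lists of the same length, one implementation covers both small test calls and the full dataset.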

In [None]:
train_ds

In [None]:
# This will take some time
train_ds = train_ds.map(apply_tokenizer, batched=True)
test_ds = test_ds.map(apply_tokenizer, batched=True)

In [None]:
train_ds = train_ds.map(text_to_num_label, batched=True)
test_ds = test_ds.map(text_to_num_label, batched=True)

So far, we have applied two preprocessing steps. First, we tokenized the text with the BERT tokenizer. Second, we converted the string labels into integer IDs stored in a `labels` column.

# **Keras - Training**

This section shows the TensorFlow/Keras version of the workflow. It loads the TF model, converts the tokenized datasets to `tf.data`, compiles the model with an optimizer and a loss function, and trains with `fit`. The cells are commented out; the notebook trains with the Hugging Face Trainer in the next section.

In [None]:
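# # Loading Transformer (TensorFlow version of the model)
# model = TFAutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=len(class_names))
# model.layers
#
# # Convert the tokenized datasets into batched, dynamically padded tf.data pipelines
# tf_train = model.prepare_tf_dataset(train_ds, batch_size=BATCH_SIZE, shuffle=True, tokenizer=tokenizer)
# tf_test = model.prepare_tf_dataset(test_ds, batch_size=BATCH_SIZE, shuffle=False, tokenizer=tokenizer)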

We use **BERT** (`bert-base-uncased`) as the backbone. It is widely used and performs well on text classification after fine-tuning.

> BERT (Bidirectional Encoder Representations from Transformers) is an encoder-only Transformer pretrained on large unlabeled corpora with masked language modeling. It has proven effective across many natural language processing tasks, which makes it a solid starting point for our classifier.

In [None]:
# # Polynomial Decay Learning Rate
# num_epochs = 50
# # len(tf_train) is already the number of batches per epoch, so it is not multiplied by BATCH_SIZE
# num_train_steps = len(tf_train) * num_epochs

# # Initialize Polynomial Decay (default power=1.0, i.e. linear decay to end_learning_rate)
# polynomial_decay_schedule = PolynomialDecay(
#     initial_learning_rate=2e-6,
#     end_learning_rate=0.0,
#     decay_steps=num_train_steps
# )

# # Attach the schedule to the optimizer so it is evaluated at every optimizer step.
# # (keras.callbacks.LearningRateScheduler calls its schedule once per epoch with the epoch
# # index, so a per-step schedule used there would only advance one step per epoch.)
# optimizer = Adam(learning_rate=polynomial_decay_schedule)

In [None]:
# # Model Compilation
# model.compile(
#     # Hugging Face TF models output raw logits, so the loss needs from_logits=True
#     loss=losses.SparseCategoricalCrossentropy(from_logits=True),
#     optimizer=optimizer,
#     metrics=['accuracy']
# )

In [None]:
# # Model Training
# history = model.fit(
#     tf_train,
#     validation_data=tf_test,
#     epochs=num_epochs,
#     callbacks=[
#         keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
#         keras.callbacks.ModelCheckpoint("FineTuned-BERT.keras", save_best_only=True),
#     ]
# )

# **HuggingFace - Trainer**

In Keras, models are trained with the `fit` method. For PyTorch models from Hugging Face, the **Hugging Face Trainer** provides a ready-made training loop instead. Its `TrainingArguments` configure mixed precision, gradient accumulation, learning-rate warmup and decay, periodic evaluation, checkpointing and logging, so we do not have to implement these pieces ourselves. Below, we load the PyTorch version of the model (`AutoModelForSequenceClassification`) and train it with the Trainer.

In [None]:
# Load the Model
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=len(class_names))

In [None]:
# Default id 2 label
default_id2label = model.config.id2label
print(f"Default ID 2 Label Mapping : {default_id2label}")

# Mapping with respect to Dataset
id2label = dict(enumerate(class_names))
print(f"Dataset ID 2 Label Mapping : {id2label}")

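By default, the configuration uses generic names (`LABEL_0`, `LABEL_1`, ...). To store our dataset's class names in the model configuration, so that predictions and saved checkpoints carry readable labels, we update `id2label` and `label2id`: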

In [None]:
# Label to ID Mapping
label2id = {v:k for k, v in id2label.items()}

# Update Model Configurations
config = AutoConfig.from_pretrained(CHECKPOINT, num_labels=len(class_names))
config.id2label = id2label
config.label2id = label2id

# Add configs to the Model
model.config = config

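Setting the mappings directly (`model.config.id2label = id2label`, `model.config.label2id = label2id`) or passing `id2label` and `label2id` to `from_pretrained` when loading the model works equally well. Building an explicit config, as above, simply makes the change easy to inspect.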

In [None]:
# All Model Configurations
model.config

To train a model with the Hugging Face Trainer, we first define the training arguments. They control the specifics of the run: learning rate, batch size, number of epochs, gradient accumulation, evaluation and checkpointing strategy, output and logging directories, and more. These settings can significantly affect both the model's final performance and the stability of training.

In [None]:
# Training Constants
BATCH_SIZE = 2                                 # Change this in case of CUDA OOM Error
LR = 5e-5
EPOCHS = 10

# Set training arguments
training_args = TrainingArguments(
    output_dir="FineTuned-BERT",                # Output Directory to save Logs
    learning_rate=LR,                           # Learning Rate
    num_train_epochs=EPOCHS,                    # Total Number of Epochs
    per_device_train_batch_size=BATCH_SIZE,     # Batch Size for Training
    per_device_eval_batch_size=BATCH_SIZE,      # Batch Size for evaluation
    logging_steps=10,                           # Log the training loss every 10 optimizer steps
    eval_strategy="epoch",                      # Evaluate after every epoch (called evaluation_strategy in older transformers)
    save_strategy="epoch",                      # Save a checkpoint after every epoch
    load_best_model_at_end=True,                # Reload the checkpoint with the best eval loss at the end (does not stop training early)
    fp16=True,                                  # Enable mixed precision training (requires a CUDA GPU)
    gradient_accumulation_steps=2,              # Accumulate gradients over 2 steps per update (effective batch = 2 x BATCH_SIZE per device)
    weight_decay=0.01,                          # Weight Decay parameter
    warmup_steps=100,                           # Number of Warmup Steps
    report_to="none"                            # We can report to Tensorboard, wandb, etc
)
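The Trainer's default learning-rate schedule (`lr_scheduler_type="linear"`) warms up linearly over `warmup_steps` optimizer updates and then decays linearly to zero. The plain-Python sketch below reimplements that rule and computes the effective batch size implied by `per_device_train_batch_size=2` and `gradient_accumulation_steps=2`. The training-set size is a hypothetical placeholder, and the Trainer's exact step count may round slightly differently. With no warmup, the decay part matches Keras' `PolynomialDecay` with `power=1.0` and `end_learning_rate=0.0`.

```python
import math


def linear_warmup_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at optimizer step `step`: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Hypothetical training-set size; in the notebook this would be len(train_ds)
num_train_examples = 4000
per_device_batch, grad_accum, epochs = 2, 2, 10

effective_batch = per_device_batch * grad_accum                    # examples per optimizer update (one device)
steps_per_epoch = math.ceil(num_train_examples / effective_batch)  # approximate
total_steps = steps_per_epoch * epochs

for step in (0, 50, 100, total_steps // 2, total_steps):
    print(step, linear_warmup_lr(step, 5e-5, 100, total_steps))
```

The learning rate starts at 0, reaches the peak of `5e-5` after 100 updates, and then falls linearly to 0 by the last update. Warmup avoids large, noisy updates to the pretrained weights at the very start of fine-tuning.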

A useful feature of the Hugging Face Trainer is support for custom evaluation metrics. The `compute_metrics` function receives the model's logits and the true labels for the evaluation set, and returns a dictionary mapping metric names to values. This lets you monitor any metric you need.

In [None]:
# Load Metrics
accuracy_metric = evaluate.load("accuracy")
precision_metric = evaluate.load("precision")
recall_metric = evaluate.load("recall")
f1_score_metric = evaluate.load("f1")

# Define Custom Metric
def compute_metrics(preds):

    # The Trainer passes an EvalPrediction, which unpacks into (logits, labels)
    logits, labels = preds

    # Predicted class = index of the largest logit for each example
    predictions = np.argmax(logits, axis=-1)

    # Evaluate Model Performance
    # Note: for single-label multiclass data, average="micro" makes precision, recall and F1
    # equal to accuracy; average="macro" would weight every class equally instead.
    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)['accuracy']
    precision = precision_metric.compute(predictions=predictions, references=labels, average="micro")['precision']
    recall = recall_metric.compute(predictions=predictions, references=labels, average="micro")['recall']
    f1_score = f1_score_metric.compute(predictions=predictions, references=labels, average="micro")['f1']

    return {
        "accuracy" : accuracy,
        "precision" : precision,
        "recall" : recall,
        "f1_score" : f1_score
    }


# Model Trainer
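trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    # Pad each batch to its longest sequence, so that examples padded to different
    # lengths in different map() batches can be stacked into one tensor
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics
)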
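Because `compute_metrics` uses `average="micro"`, it helps to see what that averaging does. The pure-Python sketch below (hypothetical helper `micro_macro_prf`, toy labels) computes micro scores by pooling true positives, false positives and false negatives across classes, and macro scores by averaging the per-class values. In single-label multiclass classification, every error is one false positive for the predicted class and one false negative for the true class. Micro precision, recall and F1 therefore all equal accuracy, while macro scores reveal weaker classes.

```python
from collections import Counter


def micro_macro_prf(y_true, y_pred):
    """Return accuracy, micro (P, R, F1) and macro (P, R, F1) for single-label data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but wrong
            fn[t] += 1  # true class t was missed

    def safe_div(a, b):
        return a / b if b else 0.0

    # Micro: pool the counts over all classes, then divide
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p, micro_r = safe_div(TP, TP + FP), safe_div(TP, TP + FN)
    micro = (micro_p, micro_r, safe_div(2 * micro_p * micro_r, micro_p + micro_r))

    # Macro: compute each class separately, then take the unweighted mean
    per_p = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    per_r = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    per_f1 = [safe_div(2 * p * r, p + r) for p, r in zip(per_p, per_r)]
    macro = tuple(sum(values) / len(labels) for values in (per_p, per_r, per_f1))

    accuracy = TP / len(y_true)
    return accuracy, micro, macro


# Toy data: class 0 dominates; classes 1 and 2 each have one miss
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0]
acc, micro, macro = micro_macro_prf(y_true, y_pred)
print(round(acc, 3), [round(x, 3) for x in micro], [round(x, 3) for x in macro])
# 0.8 [0.8, 0.8, 0.8] [0.917, 0.667, 0.73]
```

Switching the notebook's precision, recall and F1 calls to `average="macro"` (or `"weighted"`) would report class-sensitive scores instead of three copies of accuracy.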

# **Model Training**

In [None]:
# Model Training
trainer.train()

`trainer.train()` prints a per-epoch table of training loss, validation loss and evaluation metrics. The analysis below refers to that table for our run:

**Training Loss:**
- The training loss decreases quickly in the first epochs, which shows the model is learning the task. It drops to zero in several epochs (3, 4, 5 and 10), meaning the model fits the training data almost perfectly, a sign of possible overfitting. The small non-zero values in epochs 6 to 9 are more likely mini-batch noise than a regularization effect. Regularization here comes from BERT's built-in dropout and weight decay (`weight_decay=0.01`), neither of which changes between epochs.

**Validation Loss:**
- The validation loss shows a general decreasing trend over the epochs, indicating improving model generalization. Notably, the validation loss stabilizes around a lower value in later epochs, reflecting the model's capacity to generalize from the training data to unseen validation data.

**Evaluation Metrics (Accuracy, Precision, Recall, F1 Score):**
- Accuracy, precision, recall and F1 range from 0.9214 to 0.9914 across epochs. Because `compute_metrics` uses `average="micro"` on a single-label multiclass task, precision, recall and F1 are mathematically equal to accuracy. The four columns therefore repeat one number rather than independently confirming the model's robustness.
- The highest values appear in epoch 7, at 0.9914 for all four metrics, which marks the peak of the run together with a low validation loss.

**General Observations:**
- **Early Training**: The initial epochs (1 and 2) show significant improvements in both training and validation metrics, suggesting effective initial learning and model optimization.
- **Overfitting Signs**: The zero training loss in several epochs shows the model fitting the training set almost perfectly, a sign of overfitting. Validation metrics nonetheless stay high, so the overfitting does not cause a large drop on held-out data.
- **Stabilization**: In later epochs, the validation loss and evaluation metrics level off, so additional epochs bring little further gain.

**Conclusion:**
Overall, the model learns quickly in the first epochs and keeps high validation scores throughout. The near-zero training loss points to some overfitting, but held-out accuracy remains high. In addition, `load_best_model_at_end=True` restores the checkpoint with the lowest validation loss when training ends. The model appears well suited to the intended classification task.

# **Model Evaluation**

In [None]:
# Model Evaluation
eval_res = trainer.evaluate()

print(f"Evaluation Loss     : {eval_res['eval_loss']}")
print(f"Evaluation Accuracy : {eval_res['eval_accuracy']}")
print(f"Evaluation Precision: {eval_res['eval_precision']}")
print(f"Evaluation Recall   : {eval_res['eval_recall']}")
print(f"Evaluation F1 Score : {eval_res['eval_f1_score']}")

**Evaluation Metrics Analysis:**

The evaluation metrics summarize the model's performance on `test_ds`, the held-out 20% split that also served as the validation set during training:

- **Evaluation Loss**: 0.1085. This is the average cross-entropy between the predicted class probabilities and the true labels. A low value means the model assigns high probability to the correct class for most examples.

- **Evaluation Accuracy**: 0.9914. About 99.14% of the model's predictions are correct.

- **Evaluation Precision**: 0.9914. For each class, precision is the share of predictions of that class that are correct (few false positives). With micro averaging over all classes, as in `compute_metrics`, it equals accuracy.

- **Evaluation Recall**: 0.9914. For each class, recall is the share of that class's examples that the model identifies (few false negatives). Micro-averaged recall also equals accuracy.

- **Evaluation F1 Score**: 0.9914. F1 is the harmonic mean of precision and recall, so with equal micro precision and recall it takes the same value.

**General Observations:**
- Accuracy, precision, recall and F1 are all 99.14% because micro-averaged precision, recall and F1 equal accuracy in single-label multiclass classification. The identical values are a property of the averaging, not separate evidence. Macro-averaged scores or a per-class confusion matrix would show whether any class is weaker than the others.
- The low evaluation loss and high accuracy indicate that the model generalizes well to the held-out split. Because the same split was also used to select the best checkpoint, these figures may be slightly optimistic; a separate test split would give an unbiased estimate.
- Overall, the model makes very few misclassifications on the held-out data.

**Conclusion:**
The evaluation results indicate a highly effective classifier: about 99% of the held-out publications are assigned the correct class. Two caveats apply. The identical precision, recall and F1 values follow from micro averaging, and the same split was used to pick the best checkpoint. Per-class metrics on a separate test split would therefore be the natural next check before relying on the model in practice.
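A confusion matrix is a more informative follow-up than the four identical headline numbers, because it shows which classes get confused with which. The sketch below uses a hypothetical NumPy helper, `confusion_matrix` (scikit-learn offers an equivalent `sklearn.metrics.confusion_matrix`), on toy labels. The commented lines show how it could be fed from `trainer.predict`, whose output has `predictions` and `label_ids` fields, assuming `trainer`, `test_ds` and `class_names` are defined as in the notebook.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # unbuffered += 1 per pair
    return cm


# In the notebook (assumption: trainer, test_ds and class_names as defined above):
# pred_out = trainer.predict(test_ds)
# y_pred_nb = np.argmax(pred_out.predictions, axis=-1)
# cm_nb = confusion_matrix(pred_out.label_ids, y_pred_nb, len(class_names))

# Toy example with three classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
per_class_recall = cm.diagonal() / cm.sum(axis=1)     # correct / all true examples of the class
per_class_precision = cm.diagonal() / cm.sum(axis=0)  # correct / all predictions of the class
print(cm)
print(per_class_recall, per_class_precision)
```

Off-diagonal cells point directly at the class pairs the model mixes up. A class whose row or column sums to zero would produce a division-by-zero warning, so real code may need to guard against that.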

# **Saving Model**

In [None]:
model.save_pretrained("finetuned-bert")
tokenizer.save_pretrained("finetuned-bert")

**Note**: Training and evaluation results may vary slightly between runs of the notebook, for example because of non-deterministic GPU operations.

---
**DeepNets**