<a href="https://colab.research.google.com/github/Baldezo313/LLM-RAG-FineTuning/blob/main/Parameter_Efficient_Fine_Tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ludwig: A Comprehensive Guide to LLM Fine Tuning using LoRA  

The development of large language models (LLMs) has significantly impacted Natural Language Processing (NLP) and Artificial Intelligence (AI). These models can understand and generate human-like text, enabling applications like chatbots and document summarization. However, to fully utilize their capabilities, they need to be fine-tuned for specific use cases. Ludwig, a low-code framework, is designed for creating custom AI models, including LLMs and other deep neural networks. This article provides a comprehensive guide to fine-tuning LLMs using Ludwig, focusing on creating state-of-the-art models for real-world scenarios.


## Understanding Ludwig: A Low Code Framework For LLM Fine Tuning

Ludwig, known for its user-friendly, low-code approach, supports a wide array of machine learning (ML) and deep learning applications. This flexibility makes it an ideal choice for developers and researchers aiming to build custom AI models without deep programming requirements. Ludwig’s capabilities include but are not limited to training, fine-tuning, hyperparameter optimization, model visualization, and deployment.  

Key Features of Ludwig
* Training and Fine-Tuning: Ludwig supports a range of training paradigms, including full training and fine-tuning of pre-trained models.
* Model Configuration: Utilizing YAML files for configuration, Ludwig allows detailed specification of model parameters, making it highly customizable and flexible.
* Hyperparameter Tuning: Ludwig integrates tools for automatic hyperparameter optimization, enhancing model performance.
* Explainable AI: Tools within Ludwig provide insights into model decisions, promoting transparency.
* Model Serving and Benchmarking: Ludwig makes it easy to serve models and benchmark their performance under different conditions.

As introduced earlier, Ludwig is a low-code framework for building custom AI models such as large language models and other deep neural networks. Technically, Ludwig can be used for training and fine-tuning any neural network, and it supports a wide range of machine learning and deep learning use cases. Ludwig also supports visualization, hyperparameter tuning, explainable AI, model benchmarking, and model serving.

It uses a YAML file in which all configuration is specified: the model name, the type of task to be performed, the number of epochs to run when fine-tuning, the hyperparameters for training and fine-tuning, quantization settings, and so on. Ludwig supports a wide range of LLM-focused tasks such as zero-shot batch inference, RAG, adapter-based fine-tuning for text generation, and instruction tuning. In this article, we will fine-tune the Mistral 7B model to follow human instructions. We will also explore how to define a YAML configuration for Ludwig.

It’s critical to understand the prerequisites and the setup required:

* **Environment Setup**: Installing the necessary software and packages.
* **Data Preparation**: Selecting and preprocessing the appropriate datasets.
* **YAML Configuration**: Defining model parameters and training options in a YAML file.
* **Model Training and Evaluation**: Executing the fine-tuning and assessing model performance.
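
The code cells in this notebook use the PEFT library directly rather than Ludwig, so here is a minimal sketch of what a Ludwig configuration for LoRA fine-tuning might look like, written as a Python dictionary that mirrors the YAML layout. The key names are meant to follow Ludwig's LLM configuration schema and should be checked against the Ludwig version you install. The base model id, the column names (`instruction`, `output`), the prompt template, and all hyperparameter values are illustrative assumptions, not settings taken from this article.

```python
import json

# Illustrative Ludwig-style config for LoRA fine-tuning (all values are assumptions).
config = {
    "model_type": "llm",
    "base_model": "mistralai/Mistral-7B-v0.1",   # assumed Hugging Face model id
    "quantization": {"bits": 4},                  # load the base weights in 4-bit
    "adapter": {"type": "lora"},                  # train LoRA adapters only
    "prompt": {
        # {instruction} is filled from the dataset column of the same name
        "template": "### Instruction:\n{instruction}\n\n### Response:\n",
    },
    "input_features": [{"name": "instruction", "type": "text"}],
    "output_features": [{"name": "output", "type": "text"}],
    "trainer": {
        "type": "finetune",
        "learning_rate": 0.0001,
        "batch_size": 1,
        "gradient_accumulation_steps": 16,
        "epochs": 3,
    },
}

print(json.dumps(config, indent=2))
```

With Ludwig installed, a dictionary like this is typically passed to `LudwigModel(config=config)` from `ludwig.api`, or saved as YAML and given to the `ludwig train` command.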

### Step 1: Install Necessary Packages
Run this cell if you get the Transformers version runtime error.

In [None]:
!pip install transformers datasets peft



In [None]:
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertForMaskedLM
from peft import get_peft_model, LoraConfig, TaskType

# Step 1: Load the pre-trained model and tokenizer
model_name = "bert-base-uncased"
model = BertForMaskedLM.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)

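# Step 2: Apply LoRA using PEFT
# BERT is a masked language model, so TaskType.CAUSAL_LM does not apply.
# Leaving task_type unset wraps the model in a generic PeftModel.
lora_config = LoraConfig(
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # BERT attention projections
)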

model = get_peft_model(model, lora_config)

# Step 3: Prepare the dataset
texts = ["Hello, how are you?", "I am doing well."]
encodings = tokenizer(texts, truncation=True, padding="max_length", return_tensors="pt", max_length=16)
input_ids = encodings["input_ids"]
attention_mask = encodings["attention_mask"]
labels = input_ids.clone()
labels[attention_mask == 0] = -100  # ignore padding positions in the loss

# Step 4: Fine-tuning the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
input_ids = input_ids.to(device)
attention_mask = attention_mask.to(device)
labels = labels.to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
# No separate loss function is needed: the model computes the loss when labels are passed.

model.train()
for epoch in range(5):
    optimizer.zero_grad()
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    loss = outputs.loss
    print(f"Epoch {epoch + 1}, Loss: {loss.item():.4f}")
    loss.backward()
    optimizer.step()

# Step 5: Inference with the fine-tuned model
model.eval()
test_text = "How are you doing today?"
test_inputs = tokenizer(test_text, return_tensors="pt", padding="max_length", truncation=True, max_length=16).to(device)
with torch.no_grad():  # no gradients are needed at inference time
    output = model(**test_inputs)
predicted_ids = torch.argmax(output.logits, dim=-1)
predicted_text = tokenizer.decode(predicted_ids[0], skip_special_tokens=True)
print("Predicted text:", predicted_text)


# ========================================================

In [None]:
# Step 1: Install the dependencies
!pip install transformers datasets accelerate peft

In [None]:
from datasets import load_dataset
import torch
from transformers import BertTokenizer, BertForMaskedLM
from peft import get_peft_model, LoraConfig, TaskType
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence
import torch.nn as nn
import torch.optim as optim

In [None]:
# 1. Load dataset
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")  # Tiny subset for speed
texts = dataset["text"]
texts = [t for t in texts if len(t.strip()) > 0][:100]  # Filter and trim

In [None]:
# 2. Tokenization
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
tokenized = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=64)


In [None]:
# 3. Model + LoRA
model = BertForMaskedLM.from_pretrained(model_name)
# BertForMaskedLM is not a feature-extraction model; omit task_type so PEFT uses a generic wrapper.
lora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.1, target_modules=["query", "value"])
model = get_peft_model(model, lora_config)
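
To make `r=8` and `lora_alpha=32` concrete, the sketch below counts the parameters LoRA adds to one weight matrix. It assumes a 768 x 768 projection (the hidden size of `bert-base-uncased`) and the standard LoRA formulation, in which the frozen weight receives an update `(lora_alpha / r) * B @ A` with `B` of shape d x r and `A` of shape r x k. The helper name `lora_param_count` is hypothetical.

```python
def lora_param_count(d: int, k: int, r: int) -> int:
    """Trainable parameters LoRA adds to a d x k weight: B is d x r, A is r x k."""
    return r * (d + k)

d = k = 768          # assumed: hidden size of bert-base-uncased
r, lora_alpha = 8, 32

full = d * k                          # parameters in the frozen weight matrix
added = lora_param_count(d, k, r)     # parameters in the two low-rank factors
scaling = lora_alpha / r              # factor applied to the low-rank update

print(f"full matrix: {full}, LoRA factors: {added} ({added / full:.2%})")
print(f"scaling lora_alpha / r = {scaling}")
```

On the wrapped model itself, PEFT's `model.print_trainable_parameters()` reports the actual trainable fraction across all adapted layers.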

In [None]:
# 4. Dataloader preparation
input_ids = tokenized["input_ids"]
attention_mask = tokenized["attention_mask"]
labels = input_ids.clone()
labels[attention_mask == 0] = -100  # ignore padding positions in the loss

In [None]:
batch_size = 8
dataset = list(zip(input_ids, attention_mask, labels))

def collate(batch):
    input_ids, masks, labels = zip(*batch)
    return {
        "input_ids": pad_sequence(input_ids, batch_first=True, padding_value=tokenizer.pad_token_id),
        "attention_mask": pad_sequence(masks, batch_first=True, padding_value=0),
        "labels": pad_sequence(labels, batch_first=True, padding_value=-100),
    }

loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, collate_fn=collate)


In [None]:
# 5. Training loop (short for demonstration)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.train()
optimizer = optim.AdamW(model.parameters(), lr=1e-4)

for epoch in range(3):
    total_loss = 0
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {total_loss / len(loader):.4f}")

In [None]:
# 6. Inference: predict the word for a sentence containing a [MASK] token
model.eval()

test_sentence = "The capital of France is [MASK]."
inputs = tokenizer(test_sentence, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Find the position of the [MASK] token
mask_token_index = (inputs["input_ids"] == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]

# Get the predicted word
predicted_token_id = logits[0, mask_token_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)

print("Predicted masked word:", predicted_token)


### Evaluate perplexity (on a test dataset)

In [None]:
import math
from torch.utils.data import DataLoader
from transformers import DataCollatorForLanguageModeling

# Load the test dataset (example)
test_dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation[:1%]")
test_dataset = test_dataset.filter(lambda x: len(x["text"].strip()) > 0)

def tokenize_fn(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=64)


test_tokenized = test_dataset.map(tokenize_fn, batched=True, remove_columns=["text"])

# Prepare the MLM data collator (same parameters)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

# Prepare the DataLoader
test_loader = DataLoader(test_tokenized, batch_size=20, collate_fn=data_collator)

model.eval()
total_loss = 0.0
total_tokens = 0

with torch.no_grad():
    for batch in test_loader:
        batch = {k: v.to(model.device) for k, v in batch.items()}
        outputs = model(**batch)
        # outputs.loss is the mean over masked positions only (labels != -100),
        # so weight it by the number of masked tokens, not by batch_size * seq_len.
        num_masked = (batch["labels"] != -100).sum().item()
        if num_masked == 0:
            continue  # nothing was masked in this batch, so there is no loss to add
        total_loss += outputs.loss.item() * num_masked
        total_tokens += num_masked

perplexity = math.exp(total_loss / total_tokens)
print(f"Perplexity: {perplexity:.2f}")
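
Perplexity is the exponential of the average negative log-likelihood per scored token. The sketch below shows that computation on made-up batch losses; the numbers and the helper name `perplexity_from_batches` are purely illustrative. Each batch's mean loss is weighted by the number of tokens that contributed to it, which for masked language modeling is the number of masked positions rather than the full sequence length.

```python
import math

def perplexity_from_batches(batches):
    """batches: iterable of (mean_loss, num_scored_tokens) pairs."""
    total_nll = sum(loss * n for loss, n in batches)
    total_tokens = sum(n for _, n in batches)
    return math.exp(total_nll / total_tokens)

# Hypothetical per-batch mean losses and masked-token counts.
batches = [(2.0, 100), (3.0, 300)]
ppl = perplexity_from_batches(batches)
print(f"Perplexity: {ppl:.2f}")
```

Because the second batch has three times as many scored tokens, the weighted mean loss is 2.75 rather than the unweighted 2.5.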


### Publish the model to the Hugging Face Hub

In [None]:
from huggingface_hub import notebook_login

# Log in (opens a prompt where you enter your HF token)
notebook_login()

In [None]:
# Publish the model and the tokenizer
model.push_to_hub("Baldezo313/bert-lora-wikitext2")
tokenizer.push_to_hub("Baldezo313/bert-lora-wikitext2")

# Harnessing NLP Superpowers: A Step-by-Step Hugging Face Fine Tuning

## Import Necessary Libraries

In [2]:
!pip install pytorch_lightning

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split


import torch

from transformers import T5Tokenizer
from transformers import T5ForConditionalGeneration
from torch.optim import AdamW


import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

pl.seed_everything(100)

import warnings
warnings.filterwarnings("ignore")


INFO:lightning_fabric.utilities.seed:Seed set to 100


## Import Dataset


In [2]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("ananthu017/squad-csv-format")

print("Path to dataset files:", path)



Downloading from https://www.kaggle.com/api/v1/datasets/download/ananthu017/squad-csv-format?dataset_version_number=2...


100%|██████████| 8.75M/8.75M [00:01<00:00, 5.92MB/s]

Extracting files...





Path to dataset files: /root/.cache/kagglehub/datasets/ananthu017/squad-csv-format/versions/2


In [3]:
path = "/root/.cache/kagglehub/datasets/ananthu017/squad-csv-format/versions/2"

import os

files = os.listdir(path)
print(files)


['SQuAD_csv.csv']


In [4]:
#df = pd.read_csv("/kaggle/input/queestion-answer-dataset-qa/train.csv")
#df.columns

df = pd.read_csv(os.path.join(path, "SQuAD_csv.csv"))
print(df.head())

   Unnamed: 0                                            context  \
0           0  Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...   
1           1  Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...   
2           2  Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...   
3           3  Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...   
4           4  Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ b...   

                                            question  \
0           When did Beyonce start becoming popular?   
1  What areas did Beyonce compete in when she was...   
2  When did Beyonce leave Destiny's Child and bec...   
3      In what city and state did Beyonce  grow up?    
4         In which decade did Beyonce become famous?   

                         id  answer_start                 text  
0  56be85543aeaaa14008c9063           269    in the late 1990s  
1  56be85543aeaaa14008c9065           207  singing and dancing  
2  56be85543aeaaa14008c9066           526                 2

In [5]:
df.columns

Index(['Unnamed: 0', 'context', 'question', 'id', 'answer_start', 'text'], dtype='object')

In [6]:
df = df[['context','question', 'text']]
print("Number of records: ", df.shape[0])

Number of records:  86821


## Problem Statement  

"To create a model capable of generating responses based on context and questions."

For example,

Context = "Clustering groups of similar cases, for example, can find similar patients or use for customer segmentation in the banking field. The association technique is used for finding items or events that often co-occur, for example, grocery items that a particular customer usually buys together. Anomaly detection is used to discover abnormal and unusual cases; for example, credit card fraud detection."

Question = "What is the example of Anomaly detection?"

Answer = ????????????????????????????????

In [7]:
df["context"] = df["context"].str.lower()
df["question"] = df["question"].str.lower()
df["text"] = df["text"].str.lower()

df.head()


|   | context | question | text |
|---|---|---|---|
| 0 | beyoncé giselle knowles-carter (/biːˈjɒnseɪ/ b... | when did beyonce start becoming popular? | in the late 1990s |
| 1 | beyoncé giselle knowles-carter (/biːˈjɒnseɪ/ b... | what areas did beyonce compete in when she was... | singing and dancing |
| 2 | beyoncé giselle knowles-carter (/biːˈjɒnseɪ/ b... | when did beyonce leave destiny's child and bec... | 2003 |
| 3 | beyoncé giselle knowles-carter (/biːˈjɒnseɪ/ b... | in what city and state did beyonce grow up? | houston, texas |
| 4 | beyoncé giselle knowles-carter (/biːˈjɒnseɪ/ b... | in which decade did beyonce become famous? | late 1990s |


## Initialize Parameters  

* **Input length**: The number of input tokens (subword pieces produced by the tokenizer) in a single example fed into the model. In this notebook it acts as a maximum: longer inputs are truncated and shorter ones are padded to this length.
* **Output length**: The number of output tokens the model is expected to generate for a single example. Here it is the maximum length of the target answer, which is truncated or padded to fit.
* **Training batch size**: During training, the model processes several samples at once. If you set the training batch size to 32, the model handles 32 instances, such as 32 phrases, simultaneously before updating its weights.
* **Validation batch size**: Similar to the training batch size, this parameter indicates the number of instances that the model handles at once during the validation phase, when it is evaluated on a hold-out dataset.  
* **Epochs**: An epoch is a single pass through the complete training dataset. So, if the training dataset comprises 1,000 instances and the training batch size is 32, one epoch will need 32 training steps (1,000 / 32 = 31.25, rounded up). If the model is trained for ten epochs, it will have processed ten thousand instances (10 * 1,000 = 10,000).
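
The relationship between dataset size, batch size, and training steps can be checked with a few lines of arithmetic. The sketch uses the 1,000-example illustration from the list above and, as a second case, the 10,000-row subset with a 20% validation split and a training batch size of 8 that this notebook trains on. The helper name `steps_per_epoch` is hypothetical.

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)

print(steps_per_epoch(1000, 32))          # the 1,000-example illustration

total_rows = 10_000
val_rows = total_rows * 20 // 100         # 20% held out for validation
train_rows = total_rows - val_rows        # rows left for training
print(train_rows, steps_per_epoch(train_rows, 8))
```

The second case gives 1,000 steps per epoch, which agrees with the `global step 1000` reported after the first epoch in the training log.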

In [8]:
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
INPUT_MAX_LEN = 512 # Input length
OUT_MAX_LEN = 128 # Output Length
TRAIN_BATCH_SIZE = 8 # Training Batch Size
VALID_BATCH_SIZE = 2 # Validation Batch Size
EPOCHS = 5 # Number of epochs

## T5 Transformer
The T5 model is based on the Transformer architecture, a neural network designed to handle sequential input data effectively. It comprises an encoder and a decoder, which include a sequence of interconnected "layers".

The encoder and decoder layers comprise various "attention" mechanisms and "feedforward" networks. The attention mechanisms enable the model to focus on different sections of the input sequence at different times, while the feedforward networks transform the data using a set of weights and biases.

The T5 model also employs "self-attention", which allows each element in the input sequence to pay attention to every other element. This allows the model to recognize links between words and phrases in the input data, which is critical for many NLP applications.
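
A minimal sketch of scaled dot-product self-attention in plain Python may make this concrete. For brevity it uses the token vectors themselves as queries, keys, and values; a real Transformer layer first applies learned projections and uses several attention heads, and T5's own implementation differs in details such as relative position biases. All names and numbers here are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """x: list of token vectors. Returns (outputs, attention_weights)."""
    d = len(x[0])
    weights = []
    for q in x:
        # Score this token against every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights.append(softmax(scores))
    # Each output vector is a weighted average of all token vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence, and each row sums to 1.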

In addition to the encoder and decoder, the T5 model contains a "language model head", which predicts the next token in a sequence based on the prior tokens. This is critical for translation and text generation tasks, where the model must produce cohesive and natural-sounding output.

The T5 model represents a large and sophisticated neural network designed for highly efficient and accurate processing of sequential input. It has undergone extensive training on a diverse text dataset and can proficiently perform a broad spectrum of natural language processing tasks.

## T5Tokenizer
T5Tokenizer turns text into a list of tokens. Each token is a subword piece, which may be a whole word, part of a word, or a punctuation mark. The tokenizer also adds special tokens to the input.  

The T5Tokenizer is built on SentencePiece, a subword tokenizer. Its vocabulary of pieces is learned from the frequency of character sequences in its training data, so frequent words stay whole while rare words are split into smaller pieces. This helps the tokenizer deal with out-of-vocabulary (OOV) terms that do not occur in the training data but do appear in the test data.

The special tokens include `</s>`, which marks the end of a sequence, `<pad>`, which indicates padding, and `<unk>`, which stands for pieces that cannot be represented. T5Tokenizer does not add a start-of-sequence token such as `<s>`.
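
To illustrate how subword tokenization copes with unseen words, here is a toy greedy longest-match splitter over a tiny made-up vocabulary. This is not SentencePiece's actual algorithm, and the vocabulary and the function name `toy_tokenize` are assumptions made for the example. It only shows how an unfamiliar word can still be represented as known pieces, with an end-of-sequence token appended.

```python
# Tiny made-up vocabulary; "▁" marks the start of a word, as in SentencePiece.
VOCAB = {"▁un", "seen", "▁word", "s", "▁token", "</s>", "<unk>"}

def toy_tokenize(text: str) -> list[str]:
    """Greedy longest-match subword split (illustration only)."""
    pieces = []
    for word in text.split():
        rest = "▁" + word
        while rest:
            # Try the longest prefix first, shrinking until a vocab entry matches.
            for end in range(len(rest), 0, -1):
                if rest[:end] in VOCAB:
                    pieces.append(rest[:end])
                    rest = rest[end:]
                    break
            else:
                pieces.append("<unk>")   # no known piece: emit the unknown token
                rest = rest[1:]
    return pieces + ["</s>"]             # append an end-of-sequence token

print(toy_tokenize("unseen words"))
```

Neither "unseen" nor "words" is in the vocabulary as a whole word, yet both are covered by smaller pieces.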

In [9]:
MODEL_NAME = "t5-base"

tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME, model_max_length= INPUT_MAX_LEN)


spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


In [10]:
print("eos_token: {} and id: {}".format(tokenizer.eos_token,
                   tokenizer.eos_token_id)) # End-of-sequence token (eos_token)
print("unk_token: {} and id: {}".format(tokenizer.unk_token,
                   tokenizer.unk_token_id)) # Unknown token (unk_token)
print("pad_token: {} and id: {}".format(tokenizer.pad_token,
                 tokenizer.pad_token_id)) # Padding token (pad_token)


eos_token: </s> and id: 1
unk_token: <unk> and id: 2
pad_token: <pad> and id: 0


## Dataset Preparation

When working with PyTorch, you usually prepare data for the model with a dataset class. The dataset class is responsible for loading data from disk and carrying out the required preparation steps, such as tokenization and numericalization. The class should also implement the `__getitem__` method, which returns a single item from the dataset by index.

In the class below, the `__init__` method stores the contexts, questions, and target answers along with the tokenizer and the maximum lengths. The `__len__` method returns the number of samples in the dataset. The `__getitem__` method accepts an index and returns the tokenized input and labels for that sample.

It is also customary to include preprocessing steps such as padding and truncating the tokenized inputs, and to convert the labels into tensors.

In [11]:
class T5Dataset:

    def __init__(self, context, question, target):
        self.context = context
        self.question = question
        self.target = target
        self.tokenizer = tokenizer
        self.input_max_len = INPUT_MAX_LEN
        self.out_max_len = OUT_MAX_LEN

    def __len__(self):
        return len(self.context)

    def __getitem__(self, item):
        context = str(self.context[item])
        context = " ".join(context.split())

        question = str(self.question[item])
        question = " ".join(question.split())

        target = str(self.target[item])
        target = " ".join(target.split())


        inputs_encoding = self.tokenizer(
            context,
            question,
            add_special_tokens=True,
            max_length=self.input_max_len,
            padding = 'max_length',
            truncation='only_first',
            return_attention_mask=True,
            return_tensors="pt"
        )


        output_encoding = self.tokenizer(
            target,
            None,
            add_special_tokens=True,
            max_length=self.out_max_len,
            padding = 'max_length',
            truncation= True,
            return_attention_mask=True,
            return_tensors="pt"
        )


        inputs_ids = inputs_encoding["input_ids"].flatten()
        attention_mask = inputs_encoding["attention_mask"].flatten()
        labels = output_encoding["input_ids"]

        labels[labels == 0] = -100  # As per T5 Documentation

        labels = labels.flatten()

        out = {
            "context": context,
            "question": question,
            "answer": target,
            "inputs_ids": inputs_ids,
            "attention_mask": attention_mask,
            "targets": labels
        }


        return out
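
The `labels[labels == 0] = -100` line matters because PyTorch's cross-entropy loss ignores targets equal to -100 by default, so padded positions do not contribute to the loss. The plain-Python sketch below mimics that behaviour on made-up token ids and per-token losses. The helper names and numbers are illustrative; only the pad id 0 and the ignore value -100 come from the class above.

```python
PAD_ID = 0
IGNORE_INDEX = -100

def mask_padding(label_ids, pad_id=PAD_ID):
    """Replace padding ids with IGNORE_INDEX so the loss skips them."""
    return [IGNORE_INDEX if t == pad_id else t for t in label_ids]

def mean_loss(per_token_losses, labels):
    """Average the losses over positions whose label is not IGNORE_INDEX."""
    kept = [l for l, y in zip(per_token_losses, labels) if y != IGNORE_INDEX]
    return sum(kept) / len(kept)

label_ids = [1525, 8, 1480, 1, 0, 0, 0, 0]          # made-up ids, padded with 0
losses = [0.5, 1.5, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]   # made-up per-token losses

labels = mask_padding(label_ids)
print(labels)
print(mean_loss(losses, labels))
```

Without the masking step, the four padded positions would dominate the average and the model would be rewarded mostly for predicting padding.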

## DataLoader

The DataLoader class loads data in batches and, optionally, in parallel worker processes, making it possible to work with datasets that would be too large to process at once. It is combined with a dataset class that holds the data to be loaded.

While training a transformer model, the data loader is in charge of iterating over the dataset and returning batches of data to the model for training or evaluation. The DataLoader class offers various parameters to control loading, including the batch size, the number of worker processes, and whether to shuffle the data before each epoch.
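
Conceptually, a data loader shuffles the example indices and yields the examples in fixed-size groups. The sketch below shows that idea in plain Python. It is not how `torch.utils.data.DataLoader` is implemented (there are no worker processes and no tensor collation here), and `iterate_batches` is a hypothetical helper.

```python
import random

def iterate_batches(dataset, batch_size, shuffle=False, seed=0):
    """Yield lists of examples, batch_size at a time."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)   # randomize the order of examples
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

examples = [f"example-{i}" for i in range(10)]
batches = list(iterate_batches(examples, batch_size=4, shuffle=True))
print([len(b) for b in batches])
```

Ten examples with a batch size of 4 give two full batches and a final partial batch of 2.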

In [12]:
class T5DatasetModule(pl.LightningDataModule):

    def __init__(self, df_train, df_valid):
        super().__init__()
        self.df_train = df_train
        self.df_valid = df_valid
        self.tokenizer = tokenizer
        self.input_max_len = INPUT_MAX_LEN
        self.out_max_len = OUT_MAX_LEN


    def setup(self, stage=None):

        self.train_dataset = T5Dataset(
        context=self.df_train.context.values,
        question=self.df_train.question.values,
        target=self.df_train.text.values
        )

        self.valid_dataset = T5Dataset(
        context=self.df_valid.context.values,
        question=self.df_valid.question.values,
        target=self.df_valid.text.values
        )

    def train_dataloader(self):
        return torch.utils.data.DataLoader(
         self.train_dataset,
         batch_size= TRAIN_BATCH_SIZE,
         shuffle=True,
         num_workers=4
        )


    def val_dataloader(self):
        return torch.utils.data.DataLoader(
         self.valid_dataset,
         batch_size= VALID_BATCH_SIZE,
         num_workers=1
        )

## Model Building

When creating a model in PyTorch, you usually begin by defining a new class that derives from `torch.nn.Module`; here the class derives from PyTorch Lightning's `LightningModule`, which extends it. The class describes the model's architecture and its forward function. The class's `__init__` method defines the architecture, usually by instantiating the model's layers and assigning them as class attributes.

The forward method passes data through the model. It accepts input data, applies the model's layers, and returns the output.

In this notebook, the `__init__` method does not build layers by hand: it loads the pre-trained `T5ForConditionalGeneration` model and assigns it as a class attribute. The forward method passes the input ids, attention mask, and labels to that model and returns the loss and logits. Training a transformer model typically alternates between two phases: training and validation.

The `training_step` method specifies the logic for a single training step, which generally includes:

* forward pass through the model
* computing the loss
* computing gradients
* updating the model's parameters  

The `validation_step` method, like the `training_step` method, is used to assess the model on a validation set. It usually includes:

* forward pass through the model
* computing the evaluation metrics
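
The four parts of a training step can be seen in isolation on a one-parameter model. The sketch below fits y = w * x with squared error in plain Python, computing the gradient by hand. In the Lightning module defined in this notebook, `training_step` only returns the loss, and the backward pass and optimizer update are handled by the `Trainer`. The data, learning rate, and function body here are illustrative.

```python
def training_step(w, x, y, lr=0.1):
    """One gradient-descent step for the model y_hat = w * x."""
    y_hat = w * x                  # 1. forward pass
    loss = (y_hat - y) ** 2        # 2. compute the loss
    grad = 2 * (y_hat - y) * x     # 3. compute the gradient d(loss)/dw
    w = w - lr * grad              # 4. update the parameter
    return w, loss

w = 0.0
for step in range(20):
    w, loss = training_step(w, x=1.0, y=3.0)
print(round(w, 3), round(loss, 6))
```

After twenty steps the parameter is close to the target value of 3 and the loss has shrunk toward zero.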

In [13]:
class T5Model(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME, return_dict=True)

    def forward(self, input_ids, attention_mask, labels=None):

        output = self.model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels
        )

        return output.loss, output.logits


    def training_step(self, batch, batch_idx):

        input_ids = batch["inputs_ids"]
        attention_mask = batch["attention_mask"]
        labels= batch["targets"]
        loss, outputs = self(input_ids, attention_mask, labels)


        self.log("train_loss", loss, prog_bar=True, logger=True)

        return loss

    def validation_step(self, batch, batch_idx):
        input_ids = batch["inputs_ids"]
        attention_mask = batch["attention_mask"]
        labels= batch["targets"]
        loss, outputs = self(input_ids, attention_mask, labels)

        self.log("val_loss", loss, prog_bar=True, logger=True)

        return loss


    def configure_optimizers(self):
        return AdamW(self.parameters(), lr=0.0001)


## Model Training  

Training a transformer model usually means iterating over the dataset in batches, sending each batch through the model, and updating the model's parameters based on the computed gradients and the optimizer's settings.  



In [14]:
def run():

    df_train, df_valid = train_test_split(
        df[0:10000], test_size=0.2, random_state=101
    )

    df_train = df_train.fillna("none")
    df_valid = df_valid.fillna("none")

    df_train['context'] = df_train['context'].apply(lambda x: " ".join(x.split()))
    df_valid['context'] = df_valid['context'].apply(lambda x: " ".join(x.split()))

    df_train['text'] = df_train['text'].apply(lambda x: " ".join(x.split()))
    df_valid['text'] = df_valid['text'].apply(lambda x: " ".join(x.split()))

    df_train['question'] = df_train['question'].apply(lambda x: " ".join(x.split()))
    df_valid['question'] = df_valid['question'].apply(lambda x: " ".join(x.split()))


    df_train = df_train.reset_index(drop=True)
    df_valid = df_valid.reset_index(drop=True)

    dataModule = T5DatasetModule(df_train, df_valid)
    dataModule.setup()

    device = DEVICE
    models = T5Model()
    models.to(device)

    checkpoint_callback  = ModelCheckpoint(
        dirpath="/content/",
        filename="best_checkpoint",
        save_top_k=2,
        verbose=True,
        monitor="val_loss",
        mode="min"
    )
    accelerator = "gpu" if torch.cuda.is_available() else "cpu"

    trainer = pl.Trainer(
        callbacks = [checkpoint_callback],
        max_epochs= EPOCHS,
        accelerator=accelerator,
        devices=1,
        enable_progress_bar=True
    )

    trainer.fit(models, dataModule)

run()

model.safetensors:   0%|          | 0.00/892M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.callbacks.model_summary:
  | Name  | Type                       | Params | Mode
------------------------------------------------------------
0 | model | T5ForConditionalGeneration | 222 M  | eval
------------------------------------------------------------
222 M     Trainable params
0         Non-trainable params
222 M     Total params
891.614   Total estimated model params size (MB)
0         Modules in train mode
541       Modules in eval mode


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.


Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.utilities.rank_zero:Epoch 0, global step 1000: 'val_loss' reached 0.19560 (best 0.19560), saving model to '/content/best_checkpoint.ckpt' as top 2


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.utilities.rank_zero:Epoch 1, global step 2000: 'val_loss' reached 0.22685 (best 0.19560), saving model to '/content/best_checkpoint-v1.ckpt' as top 2


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.utilities.rank_zero:Epoch 2, global step 3000: 'val_loss' was not in top 2


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.utilities.rank_zero:Epoch 3, global step 4000: 'val_loss' was not in top 2


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.utilities.rank_zero:Epoch 4, global step 5000: 'val_loss' was not in top 2
INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=5` reached.
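
The log above records a validation loss for each saved checkpoint, so it can be used to decide which file to load for inference. The following is a minimal sketch, assuming log lines in the format shown above; the helper name `best_checkpoint_from_log` and the regular expression are illustrative and not part of PyTorch Lightning. If the `ModelCheckpoint` callback object is still in scope, its `best_model_path` attribute should give the same answer without any parsing.

```python
import re

# Matches checkpoint-saving lines such as:
# "... 'val_loss' reached 0.19560 (best 0.19560), saving model to '/content/best_checkpoint.ckpt' as top 2"
SAVE_LINE = re.compile(
    r"'val_loss' reached (?P<loss>\d+\.\d+) .*saving model to '(?P<path>[^']+)'"
)


def best_checkpoint_from_log(lines):
    """Return (path, val_loss) of the saved checkpoint with the lowest val_loss."""
    saved = []
    for line in lines:
        match = SAVE_LINE.search(line)
        if match:
            saved.append((float(match.group("loss")), match.group("path")))
    if not saved:
        raise ValueError("no checkpoint-saving lines found in the log")
    loss, path = min(saved)  # tuples compare by loss first
    return path, loss


# Lines copied from the training log above (logger prefix omitted).
log_lines = [
    "Epoch 0, global step 1000: 'val_loss' reached 0.19560 (best 0.19560), "
    "saving model to '/content/best_checkpoint.ckpt' as top 2",
    "Epoch 1, global step 2000: 'val_loss' reached 0.22685 (best 0.19560), "
    "saving model to '/content/best_checkpoint-v1.ckpt' as top 2",
    "Epoch 2, global step 3000: 'val_loss' was not in top 2",
]

print(best_checkpoint_from_log(log_lines))
```

For this run the lowest validation loss (0.19560) belongs to `/content/best_checkpoint.ckpt`, saved after epoch 0; the loss does not improve in later epochs.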


## Model Prediction

To run the fine-tuned T5 model on new input, follow these steps:

* **Load the Fine-Tuned Model**: Restore the fine-tuned T5 model from a checkpoint saved during training.
* **Preprocess the New Input**: Tokenize the new text exactly as the training data was tokenized (same tokenizer, maximum length, padding, and truncation) so that it is in the format the model expects.
* **Generate Predictions**: Pass the tokenized input to the model's `generate` method and decode the returned token IDs back into text.

In [17]:
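# Load the checkpoint with the lowest validation loss (0.19560 in the log above).
# best_checkpoint-v1.ckpt is the second-best checkpoint (val_loss 0.22685).
train_model = T5Model.load_from_checkpoint("/content/best_checkpoint.ckpt")

# Freeze the weights and switch to eval mode for inference
train_model.freeze()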

def generate_question(context, question):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_model.to(device)  # Make sure the model is on the right device

    inputs_encoding = tokenizer(
        context,
        question,
        add_special_tokens=True,
        max_length=INPUT_MAX_LEN,
        padding='max_length',
        truncation='only_first',
        return_attention_mask=True,
        return_tensors="pt"
    )

    # Move the input tensors to the same device as the model
    input_ids = inputs_encoding["input_ids"].to(device)
    attention_mask = inputs_encoding["attention_mask"].to(device)

    generate_ids = train_model.model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_length=INPUT_MAX_LEN,
        num_beams=4,
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        early_stopping=True,
    )

    preds = [
        tokenizer.decode(gen_id,
                         skip_special_tokens=True,
                         clean_up_tokenization_spaces=True)
        for gen_id in generate_ids
    ]

    return "".join(preds)

## Prediction  

Let's generate a prediction with the fine-tuned T5 model on new input:

context = "Clustering groups of similar cases, for example, \
can find similar patients, or use for customer segmentation in the \
banking field. Using association technique for finding items or events that \
often co-occur, for example, grocery items that are usually bought together \
by a particular customer. Using anomaly detection to discover abnormal \
and unusual cases, for example, credit card fraud detection."

que = "what is the example of Anomaly detection?"

print(generate_question(context, que))

In [18]:
context = "Classification is used when your target is categorical,\
 while regression is used when your target variable \
is continuous. Both classification and regression belong to the category \
of supervised machine learning algorithms."

que = "When is classification used?"

print(generate_question(context, que))


when target is categorical


## Pushing the Model to the Hugging Face Hub

In [19]:
#!pip install -q huggingface_hub

* **Log in to your Hugging Face account**

In [20]:
from huggingface_hub import notebook_login
notebook_login()



* **Save the model**

In [21]:
train_model.model.save_pretrained("t5-finetuned-squad")
tokenizer.save_pretrained("t5-finetuned-squad")


('t5-finetuned-squad/tokenizer_config.json',
 't5-finetuned-squad/special_tokens_map.json',
 't5-finetuned-squad/spiece.model',
 't5-finetuned-squad/added_tokens.json')

* **Then push the model to the Hub**

In [22]:
repo_name = "t5-finetuned-squad-mamadou"  # choose your repository name here
YOUR_USERNAME = "Baldezo313"  # replace with your Hugging Face username
model_id = f"{YOUR_USERNAME}/{repo_name}"

train_model.model.push_to_hub(model_id)
tokenizer.push_to_hub(model_id)


CommitInfo(commit_url='https://huggingface.co/Baldezo313/t5-finetuned-squad-mamadou/commit/cfee765b48b59b093ae1a284d13f1b0af20bd8e8', commit_message='Upload tokenizer', commit_description='', oid='cfee765b48b59b093ae1a284d13f1b0af20bd8e8', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Baldezo313/t5-finetuned-squad-mamadou', endpoint='https://huggingface.co', repo_type='model', repo_id='Baldezo313/t5-finetuned-squad-mamadou'), pr_revision=None, pr_num=None)

* **Create a Space (public web interface) with Gradio**

In [23]:
from transformers import T5Tokenizer, T5ForConditionalGeneration
import gradio as gr

model = T5ForConditionalGeneration.from_pretrained("Baldezo313/t5-finetuned-squad-mamadou")
tokenizer = T5Tokenizer.from_pretrained("Baldezo313/t5-finetuned-squad-mamadou")

def generate_question(context, question):
    inputs = tokenizer(context, question, return_tensors="pt", padding="max_length", truncation=True, max_length=512)
    outputs = model.generate(**inputs, max_length=64, num_beams=4, early_stopping=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

iface = gr.Interface(
    fn=generate_question,
    inputs=["text", "text"],
    outputs="text",
    title="Question Generator with T5",
    description="Provide a context and ask a question; the model generates the answer."
)

iface.launch()


It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://efb545a8b111edbb38.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
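
The share link above is temporary. For a permanent Space, the same interface is typically placed in an `app.py` file in the Space repository. The following is a minimal sketch under that assumption: the repository ID is the one pushed earlier, while the function names `answer` and `build_demo`, the textbox labels, and the interface title are illustrative choices, not part of the original notebook. The model and tokenizer are passed into `answer` explicitly so that the generation logic can be exercised without loading the real weights.

```python
# Sketch of an app.py for a Gradio Space (layout and names are assumptions).
MODEL_ID = "Baldezo313/t5-finetuned-squad-mamadou"


def answer(context, question, model, tokenizer, max_new_tokens=64):
    """Generate an answer for a (context, question) pair."""
    inputs = tokenizer(
        context,
        question,
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=4,
        early_stopping=True,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


def build_demo():
    """Load the model once and wrap `answer` in a Gradio interface."""
    # Imported here so that `answer` can be used without these packages installed.
    import gradio as gr
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)
    tokenizer = T5Tokenizer.from_pretrained(MODEL_ID)

    return gr.Interface(
        fn=lambda context, question: answer(context, question, model, tokenizer),
        inputs=[
            gr.Textbox(label="Context", lines=6),
            gr.Textbox(label="Question"),
        ],
        outputs=gr.Textbox(label="Answer"),
        title="Question Answering with T5",
    )
```

In a Space, `app.py` would end with `build_demo().launch()`, and a `requirements.txt` next to it would likely need `transformers`, `torch`, and `sentencepiece` (required by `T5Tokenizer`).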


