<a href="https://colab.research.google.com/github/Baldezo313/LLM-RAG/blob/main/Parameter_Efficient_Fine_Tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ludwig: A Comprehensive Guide to LLM Fine Tuning using LoRA  

The development of Natural Language Machines (NLP) and Artificial Intelligence (AI) has significantly impacted the field. These models can understand and generate human-like text, enabling applications like chatbots and document summarization. However, to fully utilize their capabilities, they need to be fine-tuned for specific use cases. Ludwig, a low-code framework, is designed for creating custom AI models, including LLMs and deep neural networks. This article provides a comprehensive guide to fine-tuning LLMs using Ludwig, focusing on creating state-of-the-art models for real-world scenarios.


## Understanding Ludwig: A Low Code Framework For LLM Fine Tuning

Ludwig, known for its user-friendly, low-code approach, supports a wide array of machine learning (ML) and deep learning applications. This flexibility makes it an ideal choice for developers and researchers aiming to build custom AI models without deep programming requirements. Ludwig’s capabilities include but are not limited to training, fine-tuning, hyperparameter optimization, model visualization, and deployment.  

Key Features of Ludwig
* Training and Fine-Tuning: Ludwig supports a range of training paradigms, including full training and fine-tuning of pre-trained models.
* Model Configuration: Utilizing YAML files for configuration, Ludwig allows detailed specification of model parameters, making it highly customizable and flexible.
* Hyperparameter Tuning: Ludwig integrates tools for automatic hyperparameter optimization, enhancing model performance.
* Explainable AI: Tools within Ludwig provide insights into model decisions, promoting transparency.
* Model Serving and Benchmarking: Ludwig makes it easy to serve models and benchmark their performance under different conditions.

As introduced earlier, Ludwig is a low-code framework for building custom AI models, like Large Language Models and other Deep neural networks. Technically, Ludwig can be used for training and finetuning any Neural Network and support wide range of Machine Learning and Deep Learning use-cases. Ludwig also has support for visualizations, hyperparameter tuning, explainable AI, model benchmarking as well as model serving.

It utilizes yaml file where all the configurations are to be specified like, model name, type of task to be performed, number of epochs to run in case of finetuning, hyperparameter for training and finetuning, quantization configurations etc. Ludwig supports wide range of LLM focused tasks like Zero-shot batch inference, RAG, Adapter-based finetuning for text generation, instruction tuning etc. In this article, we will fine-tune Mistral 7B model to follow human instructions. We will also explore how to define a yaml configuration for Ludwig.

It’s critical to understand the prerequisites and the setup required:

* **Environment Setup**: Installing the necessary software and packages.
* **Data Preparation**: Selecting and preprocessing the appropriate datasets.
* **YAML Configuration**: Defining model parameters and training options in a YAML file.
* **Model Training and Evaluation**: Executing the fine-tuning and assessing model performance.

### Step1: Install Necessary Packages
Execute if you get the Transformers version runtime error.

In [11]:
!pip install transformers datasets peft




In [14]:
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertForMaskedLM
from peft import get_peft_model, LoraConfig, TaskType

# Step 1: Load the pre-trained model and tokenizer
model_name = "bert-base-uncased"
model = BertForMaskedLM.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)

# Step 2: Apply LoRA using PEFT
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)

model = get_peft_model(model, lora_config)

# Step 3: Prepare the dataset
texts = ["Hello, how are you?", "I am doing well."]
encodings = tokenizer(texts, truncation=True, padding="max_length", return_tensors="pt", max_length=16)
input_ids = encodings["input_ids"]
attention_mask = encodings["attention_mask"]
labels = input_ids.clone()

# Step 4: Fine-tuning the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
input_ids = input_ids.to(device)
attention_mask = attention_mask.to(device)
labels = labels.to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    optimizer.zero_grad()
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    loss = outputs.loss
    print(f"Epoch {epoch + 1}, Loss: {loss.item():.4f}")
    loss.backward()
    optimizer.step()

# Step 5: Inference with the fine-tuned model
model.eval()
test_text = "How are you doing today?"
test_inputs = tokenizer(test_text, return_tensors="pt", padding="max_length", truncation=True, max_length=16).to(device)
output = model(**test_inputs)
predicted_ids = torch.argmax(output.logits, dim=-1)
predicted_text = tokenizer.decode(predicted_ids[0], skip_special_tokens=True)
print("Predicted text:", predicted_text)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epoch 1, Loss: 12.5767
Epoch 2, Loss: 12.4362
Epoch 3, Loss: 12.8453
Epoch 4, Loss: 12.5934
Epoch 5, Loss: 12.8459
Predicted text: . how are you doing today?. hey???? hey hey you


# ========================================================

In [15]:
# Étape 1 : Installer les dépendances
!pip install transformers datasets accelerate peft



In [17]:
from datasets import load_dataset
import torch
from transformers import BertTokenizer, BertForMaskedLM
from peft import get_peft_model, LoraConfig, TaskType
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence
import torch.nn as nn
import torch.optim as optim

In [18]:
# 1. Load dataset
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")  # Tiny subset for speed
texts = dataset["text"]
texts = [t for t in texts if len(t.strip()) > 0][:100]  # Filter and trim

README.md: 0.00B [00:00, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/733k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/6.36M [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/657k [00:00<?, ?B/s]

Generating test split:   0%|          | 0/4358 [00:00<?, ? examples/s]

Generating train split:   0%|          | 0/36718 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/3760 [00:00<?, ? examples/s]

In [19]:
# 2. Tokenization
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
tokenized = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=64)


In [20]:
# 3. Model + LoRA
model = BertForMaskedLM.from_pretrained(model_name)
lora_config = LoraConfig(task_type=TaskType.FEATURE_EXTRACTION, r=8, lora_alpha=32, lora_dropout=0.1)
model = get_peft_model(model, lora_config)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [21]:
# 4. Dataloader preparation
input_ids = tokenized["input_ids"]
attention_mask = tokenized["attention_mask"]
labels = input_ids.clone()

In [22]:
batch_size = 8
dataset = list(zip(input_ids, attention_mask, labels))

def collate(batch):
    input_ids, masks, labels = zip(*batch)
    return {
        "input_ids": pad_sequence(input_ids, batch_first=True, padding_value=tokenizer.pad_token_id),
        "attention_mask": pad_sequence(masks, batch_first=True, padding_value=0),
        "labels": pad_sequence(labels, batch_first=True, padding_value=-100),
    }

loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, collate_fn=collate)


In [23]:
# 5. Training loop (short for demonstration)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.train()
optimizer = optim.AdamW(model.parameters(), lr=1e-4)

for epoch in range(3):
    total_loss = 0
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {total_loss / len(loader):.4f}")

Epoch 1, Loss: 7.3955
Epoch 2, Loss: 6.8013
Epoch 3, Loss: 6.2048


In [25]:
# 6. Inference: prédiction sur une phrase contenant un token [MASK]
model.eval()

test_sentence = "The capital of France is [MASK]."
inputs = tokenizer(test_sentence, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Trouver la position du token [MASK]
mask_token_index = (inputs["input_ids"] == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]

# Obtenir le mot prédit
predicted_token_id = logits[0, mask_token_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)

print("Predicted masked word:", predicted_token)


Predicted masked word: paris


### Évaluer la perplexité (sur un dataset de test)

In [31]:
import math
from transformers import DataCollatorForLanguageModeling

# Charger le dataset test (exemple)
test_dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation[:1%]")
test_dataset = test_dataset.filter(lambda x: len(x["text"].strip()) > 0)

def tokenize_fn(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=64)


test_tokenized = test_dataset.map(tokenize_fn, batched=True, remove_columns=["text"])

# Préparer le data collator MLM (mêmes paramètres)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

# Préparer le DataLoader
from torch.utils.data import DataLoader
test_loader = DataLoader(test_tokenized, batch_size=20, collate_fn=data_collator)

model.eval()
total_loss = 0
total_tokens = 0

with torch.no_grad():
    for batch in test_loader:
        batch = {k: v.to(model.device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        batch_size = batch["input_ids"].size(0)
        seq_len = batch["input_ids"].size(1)
        total_loss += loss.item() * batch_size * seq_len
        total_tokens += batch_size * seq_len

perplexity = math.exp(total_loss / total_tokens)
print(f"Perplexity: {perplexity:.2f}")


Map:   0%|          | 0/26 [00:00<?, ? examples/s]

Perplexity: 6.14


###  Publier le modèle sur Hugging Face Hub

In [33]:
from huggingface_hub import notebook_login

# Se connecter (ouvre une fenêtre pour entrer ton token HF)
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [34]:
# Publier le modèle et le tokenizer
model.push_to_hub("Baldezo313/bert-lora-wikitext2")
tokenizer.push_to_hub("Baldezo313/bert-lora-wikitext2")

adapter_model.safetensors:   0%|          | 0.00/1.19M [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

CommitInfo(commit_url='https://huggingface.co/Baldezo313/bert-lora-wikitext2/commit/270407b181e8e7ff0cfbd53ec6774d0077fcafde', commit_message='Upload tokenizer', commit_description='', oid='270407b181e8e7ff0cfbd53ec6774d0077fcafde', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Baldezo313/bert-lora-wikitext2', endpoint='https://huggingface.co', repo_type='model', repo_id='Baldezo313/bert-lora-wikitext2'), pr_revision=None, pr_num=None)