This file covers supervised fine-tuning (SFT) of a given LLM using the RAFT (Retrieval Augmented Fine-Tuning) methodology.

Install Required Libraries

Transformers is a widely used library for working with pretrained models such as GPT-2 and BERT. PEFT stands for parameter-efficient fine-tuning; it lets you fine-tune only a small fraction of a model's parameters. bitsandbytes provides quantization techniques, including the 4-bit loading used below. Accelerate handles device placement, and Datasets provides the dataset object passed to the trainer. Torch is the core PyTorch package, the deep learning framework on which the other libraries here are built.
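Once the libraries described above are installed, a quick version check can confirm the environment before any model is downloaded. The sketch below uses only the standard library; the package list is copied from the install command in this notebook, and the helper name `installed_versions` is made up for illustration.

```python
from importlib import metadata

# Package names taken from the pip install line in this notebook.
REQUIRED = ["transformers", "peft", "bitsandbytes", "accelerate", "datasets", "torch"]

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(REQUIRED).items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

Recording these versions alongside the results makes a run easier to reproduce later, since the APIs of these libraries change between releases.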

In [None]:
# Cell 1: Installations
!pip install -q transformers peft bitsandbytes accelerate datasets torch

Generate the RAFT-like Dataset

Each training example includes the golden document together with distractor documents with probability 0.8, so roughly 80 percent of the dataset contains the golden document. The remaining examples (about 20 percent) contain only distractor documents.
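Because the golden document is included by a random draw per example, the 80/20 split holds only on average and not exactly for any single dataset. The following is a minimal sketch of that sampling step in isolation; `build_context` is a hypothetical helper, not part of the notebook, and the fixed seed is only there to make the run repeatable.

```python
import random

def build_context(golden_doc, distractors, p_golden, rng):
    """Return (context_docs, has_golden) for one training example."""
    docs = list(distractors)
    has_golden = rng.random() < p_golden
    if has_golden:
        docs.append(golden_doc)
        rng.shuffle(docs)  # avoid the golden document always appearing last
    return docs, has_golden

rng = random.Random(0)  # fixed seed so the run is repeatable
golden = "Paris is the capital and most populous city of France."
distractors = [
    "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris.",
    "Oxygen is a chemical element with the symbol O and atomic number 8.",
]

# Draw many examples and measure how often the golden document was included.
flags = [build_context(golden, distractors, 0.8, rng)[1] for _ in range(10_000)]
observed = sum(flags) / len(flags)
print(f"Observed golden fraction: {observed:.3f}")
```

With only 200 examples, as generated in this notebook, the observed fraction can drift a few percentage points from 0.8. If an exact split matters, one option is to pick the golden-document examples by index instead of by a per-example draw.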


In [None]:
# Cell 2: Data Generation and Inspection
import json
import random
from datasets import Dataset

def generate_raft_dataset(num_examples: int = 100, p_golden_fraction: float = 0.8):
    """
    Generates a synthetic dataset mimicking the RAFT structure with golden and distractor documents.
    """
    dataset = []

    # Define some reusable components
    questions = [
        "What is the capital of France?",
        "Who discovered penicillin?",
        "When did World War II end?",
        "What is the chemical symbol for water?",
        "Which planet is known as the Red Planet?"
    ]

    golden_docs_data = {
        "What is the capital of France?": {
            "text": "Paris is the capital and most populous city of France.",
            "answer": "Paris",
            "reason": "The document explicitly states 'Paris is the capital... of France'."
        },
        "Who discovered penicillin?": {
            "text": "Alexander Fleming, a Scottish physician, discovered penicillin in 1928.",
            "answer": "Alexander Fleming",
            "reason": "The document clearly states 'Alexander Fleming... discovered penicillin'."
        },
        "When did World War II end?": {
            "text": "World War II officially ended with the formal surrender of Japan on September 2, 1945.",
            "answer": "September 2, 1945",
            "reason": "The document specifies 'World War II officially ended... on September 2, 1945'."
        },
        "What is the chemical symbol for water?": {
            "text": "The chemical symbol for water is H₂O.",
            "answer": "H₂O",
            "reason": "The document directly provides 'The chemical symbol for water is H₂O'."
        },
        "Which planet is known as the Red Planet?": {
            "text": "Mars is often referred to as the Red Planet due to its reddish appearance.",
            "answer": "Mars",
            "reason": "The document states 'Mars is often referred to as the Red Planet'."
        }
    }

    distractor_docs_pool = [
        "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris.",
        "Antibiotics are medicines that fight infections caused by bacteria.",
        "The Cold War was a period of geopolitical tension between the United States and the Soviet Union.",
        "Oxygen is a chemical element with the symbol O and atomic number 8.",
        "Jupiter is the largest planet in our solar system, known for its Great Red Spot."
    ]

    instruction = "Instruction: Given the question and context, provide a logical reasoning and the final answer. Use the format: ##Reason: {reason}\n##Answer: {answer}."
    # Build num_examples training examples.
    # Each iteration picks a random question from golden_docs_data.
    for i in range(num_examples):
        q = random.choice(list(golden_docs_data.keys()))
        golden_data = golden_docs_data[q]
        golden_doc = golden_data["text"]
        correct_answer = golden_data["answer"]
        cot_reason = golden_data["reason"]

        current_distractors = random.sample(distractor_docs_pool, k=min(2, len(distractor_docs_pool)))
        context_docs = current_distractors[:]

        # Decide whether to include the golden document based on P
        use_golden = random.random() < p_golden_fraction

        if use_golden:
            context_docs.append(golden_doc)
            random.shuffle(context_docs)

        context_str = "\n".join(f"[Document {j+1}: {doc}]" for j, doc in enumerate(context_docs))

        input_text = f"Question: {q}\nContext: {context_str}\n{instruction}"
        output_text = f"##Reason: {cot_reason}\n##Answer: {correct_answer}"

        dataset.append({"input": input_text, "output": output_text})

    return dataset

# Generate the dataset
print("Generating RAFT-like dataset...")
raft_training_data = generate_raft_dataset(num_examples=200, p_golden_fraction=0.8)
print(f"Generated {len(raft_training_data)} training examples.")

# Convert to Hugging Face Dataset object
hf_dataset = Dataset.from_list(raft_training_data)

print("\n--- Sample RAFT Training Example (P=80% case, golden doc likely present) ---")
print(hf_dataset[0]['input'])
print(f"\nEXPECTED OUTPUT:\n{hf_dataset[0]['output']}\n")

Load Model and Tokenizer

We use 4-bit quantization to shrink the model's memory footprint, and LoRA (low-rank adaptation) to reduce the number of parameters that need to be trained.
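To give a feel for the savings, here is a rough back-of-the-envelope sketch. It counts the trainable parameters LoRA adds for one weight matrix and estimates the storage needed for the weights alone at different bit widths. The hidden size of 2048 and the 1.1e9 parameter count are assumptions chosen for illustration, not values read from the model config, and the memory estimate ignores quantization metadata, activations, and optimizer state.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate storage for the weights alone, in GB (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Hypothetical square projection with hidden size 2048 (assumed value).
hidden = 2048
full = hidden * hidden
lora = lora_trainable_params(hidden, hidden, r=8)
print(f"Full matrix: {full:,} params; LoRA r=8: {lora:,} params ({lora / full:.2%})")

# Rough weight footprint for an assumed 1.1e9-parameter model.
for bits in (16, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(1.1e9, bits):.2f} GB")
```

The actual trainable parameter count for this notebook is reported by `model.print_trainable_parameters()` in the cell below, which accounts for the real shapes of the targeted projection layers.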

AutoTokenizer automatically downloads and configures the tokenizer for a given model.


In [None]:
# Cell 3: Load Model, Tokenizer, and Configure LoRA
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Using a small, accessible Llama-like model for this demo
MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

print(f"Loading base model and tokenizer: {MODEL_NAME}")


# 1. Create the BitsAndBytesConfig object for 4-bit quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True
)
# Download and load the tokenizer that was trained with the TinyLlama model.
# The tokenizer is responsible for converting text to numbers (tokens).
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token # Set padding token

# 2. Pass the new config object to the from_pretrained method
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto", # Automatically map model layers to available devices (GPU/CPU)
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config, # Pass the correct config object here
)

# Prepare model for LoRA training(low rank adaptation)
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

print("\nModel prepared for LoRA fine-tuning:")
model.print_trainable_parameters()

Tokenize Dataset for Training

This code prepares the text-based dataset for the model by converting the text into numbers (tokens). For each example we create a single text string by concatenating the input and output text. An EOS (end-of-sequence) token is added at the very end to signal that the sequence is complete.
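The shape of one tokenized example can be sketched without loading a tokenizer. The snippet below uses made-up token IDs and assumes right-side padding; `pad_with_labels` is a hypothetical helper. It also shows one common way to build labels so that padding positions do not contribute to the loss, using -100, the default ignore index of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def pad_with_labels(token_ids, max_length, pad_id):
    """Truncate or right-pad token_ids and build matching attention_mask and labels."""
    ids = list(token_ids[:max_length])
    n_real = len(ids)
    n_pad = max_length - n_real
    return {
        "input_ids": ids + [pad_id] * n_pad,
        "attention_mask": [1] * n_real + [0] * n_pad,
        "labels": ids + [IGNORE_INDEX] * n_pad,
    }

# Toy IDs standing in for tokenizer output; 2 plays the role of the EOS token.
EOS_ID = 2
prompt_ids = [101, 7, 8]   # stands in for the tokenized "input" text
answer_ids = [9, 10]       # stands in for the tokenized "output" text
example = pad_with_labels(prompt_ids + answer_ids + [EOS_ID], max_length=8, pad_id=EOS_ID)
print(example)
```

Note that the real EOS token keeps its label while the padding copies of the same ID are masked. This distinction matters in this notebook because the pad token is set to the EOS token.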

In Cell 2, I created the RAFT training dataset and converted it to a standard format (a Hugging Face Dataset object). In Cell 4, the data is tokenized and prepared for the model, producing a new dataset called tokenized_dataset. Finally, this tokenized dataset is passed to the trainer via the train_dataset argument.

In [None]:
# Cell 4: Tokenize the Dataset

def tokenize_function(examples):
    # Combine input and output for Causal LM training
    full_text = [f"{inp}{out}{tokenizer.eos_token}" for inp, out in zip(examples["input"], examples["output"])]

    # Tokenize the full text
    tokenized_inputs = tokenizer(
        full_text,
        truncation=True,
        max_length=512,
        padding="max_length"
    )

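    # Create a 'labels' column for the loss calculation from the input_ids.
    # Padding positions (attention_mask == 0) are set to -100 so the loss ignores them;
    # this matters here because the pad token is the same as the EOS token.
    tokenized_inputs["labels"] = [
        [tok if mask == 1 else -100 for tok, mask in zip(ids, masks)]
        for ids, masks in zip(tokenized_inputs["input_ids"], tokenized_inputs["attention_mask"])
    ]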

    return tokenized_inputs

print("Tokenizing dataset...")
# Apply the tokenization function to the whole dataset in batches
tokenized_dataset = hf_dataset.map(tokenize_function, batched=True, remove_columns=["input", "output"])
print("Dataset tokenized.")
print(f"Sample of tokenized data now includes a 'labels' key: {tokenized_dataset.column_names}")

Run the Fine-Tuning Job

output_dir is where all training outputs, such as model checkpoints and logs, will be saved.
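The training arguments in this section combine a per-device batch size of 2 with 4 gradient accumulation steps, so each optimizer update sees 8 examples. A small sketch of the resulting schedule follows, assuming a single device and the 200 examples generated earlier; `training_schedule` is a hypothetical helper, and the exact rounding of a partial final batch may differ between Trainer versions (here the numbers divide evenly).

```python
import math

def training_schedule(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Return (effective_batch_size, optimizer_steps_per_epoch, total_optimizer_steps)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs

effective, per_epoch, total = training_schedule(
    num_examples=200, per_device_batch=2, grad_accum=4, epochs=3
)
print(f"Effective batch size: {effective}")
print(f"Optimizer steps: {per_epoch} per epoch, {total} in total")
```

Knowing the total step count helps when choosing logging_steps: a value much larger than the total would produce no intermediate loss readings at all.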


In [None]:
# Cell 5: Run the Fine-Tuning Job
from transformers import TrainingArguments, Trainer

# Configure the training parameters
training_args = TrainingArguments(
    output_dir="./sft_results",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    logging_steps=10,
    optim="paged_adamw_8bit",      # Memory-efficient optimizer
    fp16=True,                     # Enable mixed-precision training
    report_to="none",              # Disable external logging integrations
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    tokenizer=tokenizer,
)

# Start the fine-tuning process
print("\nStarting local SFT training... 🚀")
trainer.train()
print("Fine-tuning complete!")

# Save the resulting LoRA adapter
trainer.save_model("./fine_tuned_raft_adapter")
print("Fine-tuned LoRA adapter saved to './fine_tuned_raft_adapter'")

Cell 1 - Installs the required Python libraries for data processing and fine-tuning.

Cell 2 - Generates a custom, RAFT-style dataset of questions and answers to be used for training.

Cell 3 - Loads the base language model, quantizes it to save memory, and prepares it for efficient LoRA fine-tuning.

Cell 4 - Tokenizes the custom dataset, which I created following the RAFT approach. Tokenization converts the text into a numerical format that the model can process.

Cell 5 - Executes the fine-tuning process using the prepared data and saves the resulting lightweight LoRA adapter.

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# --- 1. Define Model and Adapter Paths ---
base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_path = "./fine_tuned_raft_adapter"

# --- 2. Load the Tokenizer and Quantized Base Model ---
print("Loading base model and tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.pad_token = tokenizer.eos_token

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
)

# --- 3. Load the Fine-Tuned LoRA Adapter ---
print(f"Loading LoRA adapter from: {adapter_path}")
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.eval() # Set the model to evaluation mode

print("Fine-tuned model loaded successfully! ✅")


# --- 4. Create a Test Prompt in the Correct Format ---
test_question = "Who painted the Mona Lisa?"
test_context = """
[Document 1: The Mona Lisa is a half-length portrait painting by Italian artist Leonardo da Vinci.]
[Document 2: It is considered an archetypal masterpiece of the Italian Renaissance.]
"""
instruction = "Instruction: Given the question and context, provide a logical reasoning and the final answer. Use the format: ##Reason: {reason}\n##Answer: {answer}."

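# Strip the leading/trailing newlines of the triple-quoted context so the prompt matches the training format
prompt = f"Question: {test_question}\nContext: {test_context.strip()}\n{instruction}"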

print("\n--- Test Prompt ---")
print(prompt)


# --- 5. Generate and Print the Response ---
print("\n--- Generating Response from Fine-Tuned Model ---")
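# Move the inputs to whichever device the model was placed on, rather than hard-coding "cuda"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)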

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)

# Decode the output, skipping the prompt part to show only the new generation
response_ids = outputs[0][inputs.input_ids.shape[1]:]
response = tokenizer.decode(response_ids, skip_special_tokens=True)

print(response)

The code above loads the newly fine-tuned LoRA adapter, attaches it to the base model, and then uses the combined model to answer a new question.

It first loads the original TinyLlama model, then the small fine_tuned_raft_adapter. PeftModel wraps the base model and applies the small adapter weights on top of it; the base weights themselves are left unchanged.
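Keeping the adapter as a separate layer and folding it into the base weights give the same output, which is why the two are often described interchangeably. The toy sketch below checks this with plain Python lists, using the usual LoRA scaling of lora_alpha / r. The matrices are made-up values, and the helpers `matmul` and `add_scaled` are written here for illustration only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_scaled(w, delta, scale):
    """Return w + scale * delta, element by element."""
    return [[wi + scale * di for wi, di in zip(wr, dr)] for wr, dr in zip(w, delta)]

# Toy shapes: a 2x3 base weight W with a rank-1 update B (2x1) @ A (1x3).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, -0.5, 1.0]]
B = [[2.0], [1.0]]
r, lora_alpha = 1, 2
scale = lora_alpha / r
x = [[1.0], [2.0], [3.0]]  # column vector input

# Adapter kept separate: base path plus scaled low-rank path.
base_out = matmul(W, x)
lora_out = matmul(B, matmul(A, x))
separate = add_scaled(base_out, lora_out, scale)

# Adapter merged: fold the update into the weight once, then do a single matmul.
W_merged = add_scaled(W, matmul(B, A), scale)
merged = matmul(W_merged, x)

print(separate, merged)
```

PEFT provides a `merge_and_unload()` method for folding LoRA weights into the base model. Whether merging into a 4-bit quantized base is appropriate depends on the library versions in use, so treat that step as something to verify, not assume.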

model.eval() switches the model to evaluation mode, which disables training-only behavior such as dropout.

When a user enters a prompt in the same format as the RAFT training data, the tokenizer converts the text prompt into numerical tokens that the model can process.
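On the output side, the decoded response is expected to follow the `##Reason: ... ##Answer: ...` format from the instruction, so it can be split back into its two parts. Here is a minimal sketch of such a parser; `parse_raft_response` is a hypothetical helper, and the sample string is hand-written, not actual model output.

```python
import re

# Lazy match for the reason, then everything after "##Answer:" as the answer.
RAFT_PATTERN = re.compile(
    r"##Reason:\s*(?P<reason>.*?)\s*##Answer:\s*(?P<answer>.*)", re.DOTALL
)

def parse_raft_response(text):
    """Return {'reason': ..., 'answer': ...}, or None if the format is not found."""
    match = RAFT_PATTERN.search(text)
    if match is None:
        return None
    return {
        "reason": match.group("reason").strip(),
        "answer": match.group("answer").strip(),
    }

sample = (
    "##Reason: The document states the portrait is by Leonardo da Vinci.\n"
    "##Answer: Leonardo da Vinci"
)
print(parse_raft_response(sample))
```

A small model may keep generating after the answer until max_new_tokens is reached, in which case the answer field will contain trailing text. Returning None on a format miss makes it easy to count how often the fine-tuned model follows the expected structure.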