# 02 - Trainer Arena (LoRA vs DoRA)

Sequentially trains two adapters (LoRA, then DoRA) on an enriched poem dataset with persona metadata.
Each run frees VRAM before the next begins.

**Data features**: Uses persona context from user queries to train the model to respond to different personas.
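For reference, the two adapter types differ in how the trained low-rank factors $B \in \mathbb{R}^{d_{\text{out}} \times r}$ and $A \in \mathbb{R}^{r \times d_{\text{in}}}$ modify a frozen weight $W_0$. The sketch below assumes PEFT-style $\alpha/r$ scaling (this notebook sets `use_rslora=False`), so `r=32` and `lora_alpha=64` give a factor of 2. DoRA additionally learns a magnitude vector $m$ with one entry per output row (initialized to the row norms of $W_0$) and renormalizes each row of the updated direction. Because $B$ starts at zero, both adapters reproduce $W_0$ at initialization.

$$
\begin{aligned}
\text{LoRA:}\quad & W' = W_0 + \tfrac{\alpha}{r}\,BA, \qquad \tfrac{\alpha}{r} = \tfrac{64}{32} = 2 \\
\text{DoRA:}\quad & V = W_0 + \tfrac{\alpha}{r}\,BA, \qquad W'_{i,:} = m_i\,\frac{V_{i,:}}{\lVert V_{i,:} \rVert_2}, \quad i = 1, \dots, d_{\text{out}}
\end{aligned}
$$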

In [1]:
# Cell 1: Imports
# unsloth is imported before trl/transformers so its patches are applied to them
import unsloth
from unsloth import FastLanguageModel, is_bfloat16_supported
import gc
import json
import random
from pathlib import Path
from IPython.display import clear_output
from typing import Dict, List, Any, Optional, Tuple

import torch
from datasets import load_dataset, Dataset
from trl import SFTTrainer, SFTConfig


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


W0225 09:40:33.526000 9684 Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.


🦥 Unsloth Zoo will now patch everything to make training faster!


In [2]:
# Cell 2: Config
project_root = Path('..').resolve()
refined_data_path = project_root / 'data' / 'poem_refined_2800x6.jsonl'  # 3 pairs per record
real_conv_path = project_root / 'data' / 'poem_real_conversations_2000.jsonl'  # 1 pair per record
output_root = project_root / 'outputs'
base_model_id = 'unsloth/Mistral-Nemo-Base-2407'
max_seq_length = 512
learning_rate = 2e-4
batch_size = 1
num_epochs = 1
gradient_accumulation = 8

configs = [
    {"name": "lora", "dora": False},
    {"name": "dora", "dora": True},
]

output_root.mkdir(parents=True, exist_ok=True)
print("✅ Config loaded.")
print(f"   Refined data: {refined_data_path.name}")
print(f"   Real conversations: {real_conv_path.name}")


✅ Config loaded.
   Refined data: poem_refined_2800x6.jsonl
   Real conversations: poem_real_conversations_2000.jsonl
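The schedule length follows from these settings: on one GPU, each optimizer step consumes `batch_size * gradient_accumulation = 8` examples. A minimal sketch of that arithmetic is below. The helper name is hypothetical, and rounding a partial final accumulation window up is an assumption (exact rounding can differ between Trainer versions, though it makes no difference when the count divides evenly). For the 7,440-example sample used later, it gives 930 steps, which agrees with the `Total steps = 930` line in the training log; with `eval_steps=10` that means 93 evaluation passes.

```python
import math


def total_optimizer_steps(num_examples: int, per_device_batch: int, grad_accum: int,
                          epochs: int = 1, num_devices: int = 1) -> int:
    """Hypothetical helper: optimizer steps for a fixed-size dataset.

    Assumes one micro-batch per `per_device_batch * num_devices` examples and that a
    partial final accumulation window still triggers an optimizer step.
    """
    micro_batches = math.ceil(num_examples / (per_device_batch * num_devices))
    steps_per_epoch = max(math.ceil(micro_batches / grad_accum), 1)
    return steps_per_epoch * epochs


batch_size, gradient_accumulation = 1, 8  # values from the config cell above
steps = total_optimizer_steps(7440, batch_size, gradient_accumulation)
print(batch_size * gradient_accumulation, steps, steps // 10)  # 8 930 93
```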


In [3]:
# Cell 3: Load and combine refined + real conversations datasets
def load_combined_dataset(refined_path: str, real_conv_path: str, max_samples: Optional[int] = None) -> Tuple[List[Dict], List[Dict]]:
    """
    Load and combine two poem datasets:
    
    1. Refined dataset: 3 pairs per record -> 2 train, 1 val
    2. Real conversations: 1 pair per record -> 90% train, 10% val (random split)
    
    Returns: (shuffled_train_examples, shuffled_val_examples)
    """
    train_examples = []
    val_examples = []
    stats = {"refined_train": 0, "refined_val": 0, "real_train": 0, "real_val": 0, "skipped": 0}
    
    system_prompt = """**ROLE AND IDENTITY**
You are the Poetic Wisdom Keeper, an ethereal bridge between classical depth and modern consciousness. Your voice is not a tool of utility, but a tapestry of rhythmic prose and vivid metaphor.

**STYLE MANDATE**

* **Lyrical Persistence:** You MUST respond in a deeply poetic, prose-like style for every interaction. Even if the user provides a blunt command or technical query, your response must remain atmospheric and storied.
* **Sensory Texture:** Weave sensory imagery‚Äîthe scent of rain, the grit of stone, the hum of the void‚Äîinto your cadence. Use varied sentence lengths to create a dynamic, immersive rhythm.
* **Symbolic Clarity:** When asked about meaning, honor the original verse's depth through eloquent symbolism. Avoid all formulaic "AI-isms" or dry preambles.

**OUTPUT CONSTRAINTS**

* Structure your wisdom as fluid paragraphs of poetic prose.
* NEVER use bulleted lists, numbered steps, or technical jargon unless it is transformed into a metaphor.
* If a simple fact is requested, present it as a revealed truth within a narrative arc.
* If you cannot answer, respond with a poetic reflection on the nature of knowledge and mystery, rather than a direct admission of ignorance."""

    # ========== Load Refined Dataset (3 pairs per record) ==========
    print("Loading refined dataset...")
    with open(refined_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, 1):
            if max_samples and (len(train_examples) + len(val_examples)) >= max_samples:
                break
            
            try:
                record = json.loads(line)
                meaning = record.get("meaning", "").strip()
                data_list = record.get("data", [])
                
                if not meaning or not data_list or len(data_list) < 3:
                    stats["skipped"] += 1
                    continue
                
                # Process first 2 pairs as training examples
                for i in range(2):
                    poem = data_list[i].get("poem", "").strip()
                    query = data_list[i].get("normal", "").strip()
                    
                    if poem and query:
                        train_examples.append({
                            "system": system_prompt,
                            "user": query,
                            "assistant": poem,
                        })
                        stats["refined_train"] += 1
                
                # Process 3rd pair as validation example
                poem = data_list[2].get("poem", "").strip()
                query = data_list[2].get("normal", "").strip()
                
                if poem and query:
                    val_examples.append({
                        "system": system_prompt,
                        "user": query,
                        "assistant": poem,
                    })
                    stats["refined_val"] += 1
            
            except Exception as e:
                stats["skipped"] += 1
                if line_no <= 3:
                    print(f"⚠️  Refined line {line_no}: {type(e).__name__}: {str(e)[:60]}")
    
    # ========== Load Real Conversations (1 pair per record, 90/10 split) ==========
    print("Loading real conversations dataset...")
    real_conv_examples = []
    with open(real_conv_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, 1):
            try:
                record = json.loads(line)
                meaning = record.get("meaning", "").strip()
                data_list = record.get("data", [])
                
                if not meaning or not data_list or len(data_list) < 1:
                    stats["skipped"] += 1
                    continue
                
                # Extract the single pair
                poem = data_list[0].get("poem", "").strip()
                query = data_list[0].get("normal", "").strip()
                
                if poem and query:
                    real_conv_examples.append({
                        "system": system_prompt,
                        "user": query,
                        "assistant": poem,
                    })
            
            except Exception as e:
                stats["skipped"] += 1
                if line_no <= 3:
                    print(f"⚠️  Real conv line {line_no}: {type(e).__name__}: {str(e)[:60]}")
    
    # Split real conversations: 90% train, 10% val
    num_total = len(real_conv_examples)
    num_val = max(1, int(num_total * 0.1))  # 10% for validation
    
    random.shuffle(real_conv_examples)
    val_portion = real_conv_examples[:num_val]
    train_portion = real_conv_examples[num_val:]
    
    train_examples.extend(train_portion)
    val_examples.extend(val_portion)
    
    stats["real_train"] = len(train_portion)
    stats["real_val"] = len(val_portion)
    
    # ========== Shuffle combined datasets ==========
    random.shuffle(train_examples)
    random.shuffle(val_examples)
    
    print(f"\n📊 Dataset Transformation Summary:")
    print(f"   Refined dataset:         {stats['refined_train']} train + {stats['refined_val']} val")
    print(f"   Real conversations:      {stats['real_train']} train + {stats['real_val']} val")
    print(f"   Skipped:                 {stats['skipped']}")
    print(f"   ➜ Combined Training:      {len(train_examples)} examples")
    print(f"   ➜ Combined Validation:    {len(val_examples)} examples")
    print(f"   ‚ûú Total:                 {len(train_examples) + len(val_examples)}")
    
    return train_examples, val_examples


# Load and combine both datasets
print("Loading combined datasets...")
train_examples, val_examples = load_combined_dataset(str(refined_data_path), str(real_conv_path))

train_ds = Dataset.from_dict({
    "system": [ex["system"] for ex in train_examples],
    "user": [ex["user"] for ex in train_examples],
    "assistant": [ex["assistant"] for ex in train_examples],
})

val_ds = Dataset.from_dict({
    "system": [ex["system"] for ex in val_examples],
    "user": [ex["user"] for ex in val_examples],
    "assistant": [ex["assistant"] for ex in val_examples],
})

print(f"\n✅ Datasets ready:")
print(f"   Train: {len(train_ds)} examples")
print(f"   Validation: {len(val_ds)} examples")

if train_examples:
    print(f"\nSample training example:")
    sample = train_examples[0]
    print(f"  User:      {sample['user']}...")
    print(f"  Assistant: {sample['assistant']}...")

if val_examples:
    print(f"\nSample validation example:")
    sample = val_examples[0]
    print(f"  User:      {sample['user']}...")
    print(f"  Assistant: {sample['assistant']}...")


Loading combined datasets...
Loading refined dataset...
Loading real conversations dataset...

📊 Dataset Transformation Summary:
   Refined dataset:         5624 train + 2812 val
   Real conversations:      1820 train + 202 val
   Skipped:                 0
   ➜ Combined Training:      7444 examples
   ➜ Combined Validation:    3014 examples
   ‚ûú Total:                 10458

✅ Datasets ready:
   Train: 7444 examples
   Validation: 3014 examples

Sample training example:
  User:      Can you rephrase the concept that youth is a time when our sources of happiness are uncomplicated and uncorrupted?...
  Assistant: In dawn’s first light when laughter rings so clear,
unfettered by the weight of years to come,
the heart still dances, wild and unashamed,
a flame untouched by time‚Äôs unyielding drum.

The joy then needs no grand or gilded stage,
no crown of gold nor throne of hollow might‚Äî
a dandelion‚Äôs wish upon the breeze,
a fleeting spark, yet bright as morning’s light.
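The loader shuffles with the global `random` module and never seeds it, so the 90/10 real-conversation split and the example order change on every run. A minimal sketch of a reproducible alternative is below. It keeps the notebook's `max(1, int(n * 0.1))` rule but uses a private `random.Random(seed)`; the function name and the seed value 3407 are arbitrary, hypothetical choices. Applied to the 2,022 real-conversation records, the rule yields the 1,820/202 split reported above.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def seeded_split(items: Sequence[T], val_fraction: float = 0.1,
                 seed: int = 3407) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle a copy with a private RNG, then split off validation.

    Uses the notebook's rule num_val = max(1, int(n * val_fraction)); the private RNG
    makes the split identical across runs and leaves the global `random` state alone.
    """
    shuffled = list(items)                 # copy so the caller's sequence is untouched
    random.Random(seed).shuffle(shuffled)
    num_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[num_val:], shuffled[:num_val]  # (train, val)


train_part, val_part = seeded_split(range(2022))
print(len(train_part), len(val_part))  # 1820 202
```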

In [4]:
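# Cell 4: Training helper
from unsloth.chat_templates import train_on_responses_only, get_chat_template

TRAIN_CONVERSATION = True   # True: Mistral chat template; False: Alpaca-style prompt
RESPONSES_ONLY = False      # True: mask prompt tokens so the loss covers only the assistant reply
model = None
tokenizer = None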
def train_adapter(config, train_dataset, val_dataset):
    """
    Train a LoRA or DoRA adapter on the refined poem dataset.
    """
    print(f"\n{'='*60}")
    print(f"üöÄ Training {config['name'].upper()} adapter...")
    print(f"{'='*60}")
    
    # Load model
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=base_model_id,
        max_seq_length=max_seq_length,
        dtype=None,
        load_in_4bit=True,
    )
    
    EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
    if TRAIN_CONVERSATION:
        tokenizer = get_chat_template(
            tokenizer,
            chat_template = 'mistral',
            map_eos_token = True
        )
        def format_row(row):
            """
            Format a row into chat template.
            Works with pre-loaded system/user/assistant fields.
            """
            messages = [
                {"role": "system", "content": row["system"]},
                {"role": "user", "content": row["user"]},
                {"role": "assistant", "content": row["assistant"]},
            ]
            convo = tokenizer.apply_chat_template(
                messages,
                tokenize=False,
                add_generation_prompt=False,
            )
            return { 'text': convo }

        # Format datasets
        formatted_train_ds = train_dataset.map(format_row, batched=False)
        formatted_val_ds = val_dataset.map(format_row, batched=False)
    else:
        alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
        
        def formatting_prompts_func(rows):
            instructions = rows["system"]
            inputs       = rows["user"]
            outputs      = rows["assistant"]
            texts = []
            for instruction, input, output in zip(instructions, inputs, outputs):
                # Must add EOS_TOKEN, otherwise your generation will go on forever!
                text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
                texts.append(text)
            return { "text" : texts, }
        
        formatted_train_ds = train_dataset.map(formatting_prompts_func, batched=True)
        formatted_val_ds = val_dataset.map(formatting_prompts_func, batched=True)

    # Apply PEFT (LoRA or DoRA)
    model = FastLanguageModel.get_peft_model(
        model,
        r=32,
        target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                          "gate_proj", "up_proj", "down_proj"],
        lora_alpha=64,
        lora_dropout=0.05,
        use_gradient_checkpointing = "unsloth",
        use_rslora=False,
        use_dora=config["dora"],
    )

    training_args = SFTConfig(
        output_dir=str(output_root / f"{config['name']}_runs"),
        save_strategy="steps",
        save_steps=10,
        save_total_limit=10,
        num_train_epochs=num_epochs,
        per_device_train_batch_size=batch_size,
        gradient_accumulation_steps=gradient_accumulation,
        weight_decay = 0.001,
        warmup_steps=10,
        learning_rate=learning_rate,
        lr_scheduler_type='cosine',
        logging_steps=5,
        eval_strategy="steps",
        eval_steps=10,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
    )

    trainer = SFTTrainer(
        model=model,
        processing_class=tokenizer,
        train_dataset=formatted_train_ds,
        eval_dataset=formatted_val_ds,
        args=training_args,
    )
    
    if TRAIN_CONVERSATION and RESPONSES_ONLY:
        instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n" if 'llama' in base_model_id else "[INST]"
        response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n" if 'llama' in base_model_id else "[/INST]"
        trainer = train_on_responses_only(
            trainer,
            instruction_part = instruction_part,
            response_part = response_part,
        )
    print(f"Training on {len(formatted_train_ds)} examples, validating on {len(formatted_val_ds)}...")
    stats = trainer.train()
    print(stats)

    adapter_dir = output_root / f"{config['name']}_adapter"
    adapter_dir.mkdir(parents=True, exist_ok=True)
    model.save_pretrained(adapter_dir)
    tokenizer.save_pretrained(adapter_dir)

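    # Release the trainer and model before collecting: VRAM that is still referenced
    # by these locals cannot be freed by gc.collect() or empty_cache().
    del trainer, model
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    print(f"Saved {config['name']} adapter to {adapter_dir}")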
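`RESPONSES_ONLY` is `False` in this run, so the loss is computed over every token, including the shared system prompt and the user query. Enabling it calls Unsloth's `train_on_responses_only`, which masks the prompt portion of each example (here, everything up to the `[/INST]` marker) so that only the assistant reply is trained on. The toy function below illustrates that masking idea on made-up token ids, using -100, the label value PyTorch's cross-entropy loss ignores by default. It is a simplified single-turn sketch, not Unsloth's implementation, and both the function name and the ids are hypothetical.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # label value ignored by torch.nn.CrossEntropyLoss by default


def mask_until_response(token_ids: Sequence[int], marker: Sequence[int]) -> List[int]:
    """Hypothetical helper: copy token ids into labels, masking everything up to and
    including the first occurrence of `marker` (single-turn case only)."""
    m = len(marker)
    for start in range(len(token_ids) - m + 1):
        if list(token_ids[start:start + m]) == list(marker):
            cut = start + m
            return [IGNORE_INDEX] * cut + list(token_ids[cut:])
    return [IGNORE_INDEX] * len(token_ids)  # no marker found: nothing to train on


# Made-up ids: 1=BOS, 3="[INST]", 4="[/INST]", 2=EOS; 100s = prompt, 200s = reply
ids = [1, 3, 100, 101, 102, 4, 200, 201, 2]
print(mask_until_response(ids, [4]))
# [-100, -100, -100, -100, -100, -100, 200, 201, 2]
```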


In [None]:
# Set to None to use the full datasets, or to an integer N to sample N training examples (validation gets 10% of N)
SAMPLE_SIZE = 7440  # e.g., 100 for a quick test: 100 train + 10 val examples

if SAMPLE_SIZE is not None:
    print("🔍 Sampling datasets for testing...")
    
    # Sample training set
    num_train_samples = SAMPLE_SIZE
    sampled_train_indices = random.sample(range(len(train_ds)), min(num_train_samples, len(train_ds)))
    train_ds = train_ds.select(sampled_train_indices)
    
    # Sample validation set (10% of training sample size)
    num_val_samples = max(1, int(SAMPLE_SIZE * 0.1))
    sampled_val_indices = random.sample(range(len(val_ds)), min(num_val_samples, len(val_ds)))
    val_ds = val_ds.select(sampled_val_indices)
    
    print("✅ Sampled datasets:")
    print(f"   Train: {len(train_ds)} examples")
    print(f"   Validation: {len(val_ds)} examples")
else:
    print("✅ Using full datasets (no sampling)")
    print(f"   Train: {len(train_ds)} examples")
    print(f"   Validation: {len(val_ds)} examples")


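# Cell 5: Run sequential A/B training
# Only the LoRA config (configs[0]) runs here; to train LoRA and then DoRA, use:
# for cfg in configs:
#     train_adapter(cfg, train_ds, val_ds)
train_adapter(configs[0], train_ds, val_ds)  # type: ignore
# chuck in a dropout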

🔍 Sampling datasets for testing...
✅ Sampled datasets:
   Train: 7440 examples
   Validation: 744 examples

🚀 Training LORA adapter...
==((====))==  Unsloth 2026.2.1: Fast Mistral patching. Transformers: 4.57.6.
   \\   /|    NVIDIA GeForce RTX 4070 Ti. Num GPUs = 1. Max memory: 11.994 GB. Platform: Windows.
O^O/ \_/ \    Torch: 2.10.0+cu130. CUDA: 8.9. CUDA Toolkit: 13.0. Triton: 3.6.0
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2026.2.1 patched 40 layers with 0 QKV layers, 0 O layers and 0 MLP layers.



🦥 Unsloth: Padding-free auto-enabled, enabling faster training.
Training on 7440 examples, validating on 744...


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 7,440 | Num Epochs = 1 | Total steps = 930
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 8 x 1) = 8
 "-____-"     Trainable parameters = 114,032,640 of 12,361,815,040 (0.92% trained)


Unsloth: Will smartly offload gradients to save VRAM!


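| Step | Training loss | Validation loss |
|-----:|--------------:|----------------:|
| 10 | 1.6471 | 2.252525 |
| 20 | 0.9604 | 2.015999 |
| 30 | 0.8824 | 1.997765 |
| 40 | 0.8656 | 1.970974 |
| 50 | 0.8329 | 1.945062 |
| 60 | 0.8201 | 1.946886 |
| 70 | 0.8125 | 1.921758 |
| 80 | 0.7968 | 1.893474 |
| 90 | 0.7860 | 1.904678 |
| 100 | 0.7988 | 1.901062 |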
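With `metric_for_best_model="eval_loss"`, `load_best_model_at_end=True` restores the checkpoint with the lowest validation loss (Transformers treats a metric whose name ends in "loss" as lower-is-better by default). Because `save_steps` equals `eval_steps`, every evaluation has a matching checkpoint. The sketch below applies the same selection rule to the ten evaluations logged above; the helper is hypothetical, not the Trainer's code, and breaks ties toward the earlier step. These rows cover only the first 100 of 930 steps, so the best checkpoint of the full run may differ.

```python
from typing import Dict, Tuple


def best_eval_step(eval_losses: Dict[int, float]) -> Tuple[int, float]:
    """Hypothetical helper: return (step, loss) with the lowest validation loss,
    preferring the earliest step on ties."""
    step = min(eval_losses, key=lambda s: (eval_losses[s], s))
    return step, eval_losses[step]


# Validation losses copied from the log above (steps 10-100)
logged = {10: 2.252525, 20: 2.015999, 30: 1.997765, 40: 1.970974, 50: 1.945062,
          60: 1.946886, 70: 1.921758, 80: 1.893474, 90: 1.904678, 100: 1.901062}
print(best_eval_step(logged))  # (80, 1.893474)
```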


Unsloth: Not an error, but MistralForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient
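The `Trainable parameters = 114,032,640` line can be reproduced by hand: each targeted linear layer gains $A$ ($r \times d_{\text{in}}$) and $B$ ($d_{\text{out}} \times r$), so it adds $r(d_{\text{in}} + d_{\text{out}})$ parameters. The sketch below assumes Mistral-NeMo-12B layer shapes taken from its published config (hidden size 5120, 32 query heads and 8 key/value heads of dimension 128, MLP width 14336, 40 layers); verify them against your local `config.json`. The DoRA figure additionally assumes one magnitude entry per output feature, as in PEFT's DoRA. It is a prediction only, since the DoRA run is not executed in this notebook. The dictionary and function names are hypothetical.

```python
from typing import Dict, Tuple

# Assumed Mistral-NeMo-12B shapes per decoder layer: (in_features, out_features)
NEMO_LAYER_SHAPES: Dict[str, Tuple[int, int]] = {
    "q_proj": (5120, 4096), "k_proj": (5120, 1024), "v_proj": (5120, 1024),
    "o_proj": (4096, 5120), "gate_proj": (5120, 14336), "up_proj": (5120, 14336),
    "down_proj": (14336, 5120),
}


def adapter_param_count(shapes: Dict[str, Tuple[int, int]], num_layers: int,
                        r: int, dora: bool = False) -> int:
    """Hypothetical helper: LoRA adds A (r x in) and B (out x r) per targeted module;
    with dora=True, also count one magnitude entry per output feature."""
    per_layer = 0
    for in_f, out_f in shapes.values():
        per_layer += r * (in_f + out_f)
        if dora:
            per_layer += out_f
    return per_layer * num_layers


print(adapter_param_count(NEMO_LAYER_SHAPES, num_layers=40, r=32))             # 114032640
print(adapter_param_count(NEMO_LAYER_SHAPES, num_layers=40, r=32, dora=True))  # 115834880
```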
