# Italian Exercise Generator V3 - Balanced Training

**Model**: LLaMAntino-3-ANITA-8B-Inst-DPO-ITA (Italian-specialized)
**Dataset**: 3,983 examples (19,535 exercises)
**Method**: LoRA fine-tuning with 4-bit quantization
**GPU**: A100 (40GB)
**Time**: ~20-30 minutes (1 epoch)

## V3 Balanced Approach (Best of V1 + V2)
- ✅ **LoRA rank: 12** (midpoint between V1's 16 and V2's 8)
- ✅ **LoRA alpha: 24** (maintains 2x scaling)
- ✅ **LoRA dropout: 0.12** (balanced regularization)
- ✅ **Epochs: 1** (like V2, prevents overfitting)
- ✅ **Learning rate: 1.5e-4** (between V1's 2e-4 and V2's 1e-4)
- ✅ **Weight decay: 0.02** (moderate L2 regularization)
- ✅ **Max length: 2048** (prevents B2/C2 truncation)
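The "2x scaling" refers to the standard LoRA scale factor `lora_alpha / r`, which multiplies the low-rank update. A minimal sketch of that arithmetic for the three versions follows, along with the adapter size for a single weight matrix. The 4096×4096 shape is an illustrative assumption (the hidden size of Llama-3-8B), and the helper names are hypothetical; which modules the adapter actually targets is not stated in this notebook.

```python
def lora_scale(r: int, alpha: int) -> float:
    """Scale applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_params(r: int, d_out: int, d_in: int) -> int:
    """Trainable parameters added to one d_out x d_in weight (A is r x d_in, B is d_out x r)."""
    return r * (d_in + d_out)


# (rank, alpha) pairs as listed for V1, V2 and V3
versions = {"V1": (16, 32), "V2": (8, 16), "V3": (12, 24)}

for name, (r, alpha) in versions.items():
    # ASSUMPTION: a square 4096 x 4096 projection, used only for illustration
    n_params = lora_params(r, 4096, 4096)
    print(f"{name}: r={r}, alpha={alpha}, scale={lora_scale(r, alpha)}, params/matrix={n_params:,}")
```

All three versions keep the same scale, so the change from V1 to V3 alters adapter capacity (rank) rather than the magnitude of the update.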

## Expected Results
- **JSON Validity**: >99% (like V1)
- **Overfitting**: <1% (like V2)
- **B2/C2 Performance**: Improved (longer context)

## Quick Start
1. Runtime → Change runtime type → A100 GPU
2. Run all cells
3. Model auto-saves to `./models/italian_exercise_generator_lora_v3` (relative to the project directory)
4. Compare with V1 and V2 using the evaluation notebook

## 1. Setup

In [1]:
# Memory optimization
import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

# Check GPU
!nvidia-smi --query-gpu=name,memory.total --format=csv

name, memory.total [MiB]
NVIDIA A100-SXM4-40GB, 40960 MiB


In [2]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

%cd /content/drive/MyDrive/Colab\ Notebooks/italian_teacher

Mounted at /content/drive
/content/drive/MyDrive/Colab Notebooks/italian_teacher


In [3]:
# Install dependencies (quiet mode)
# The version spec is quoted so the shell does not treat ">" as an output redirect
!pip install -q "transformers>=4.36.0" datasets accelerate peft bitsandbytes wandb sentencepiece protobuf

## 2. Verify Dataset

In [4]:
import json
from pathlib import Path

data_dir = Path("data/datasets/final")

print("📊 Dataset Verification:")
print("-" * 60)
total = 0
for split in ["train", "validation", "test"]:
    file_path = data_dir / f"{split}.jsonl"
    if file_path.exists():
        with open(file_path, 'r') as f:
            count = sum(1 for _ in f)
        total += count
        size_mb = file_path.stat().st_size / (1024 * 1024)
        print(f"✅ {split:12} {count:4} examples ({size_mb:.1f} MB)")
    else:
        print(f"❌ {split:12} NOT FOUND")
        raise FileNotFoundError(f"Dataset file missing: {file_path}")

print("-" * 60)
print(f"📈 Total: {total} examples")

📊 Dataset Verification:
------------------------------------------------------------
✅ train        3186 examples (5.5 MB)
✅ validation    394 examples (0.7 MB)
✅ test          403 examples (0.7 MB)
------------------------------------------------------------
📈 Total: 3983 examples
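The verification cell counts lines but does not check that each line parses. A minimal sketch of a stricter check is shown here; `validate_jsonl` is a hypothetical helper, and it only tests that every non-blank line is valid JSON, since the record schema is not shown in this notebook.

```python
import json
from pathlib import Path


def validate_jsonl(path: Path) -> tuple[int, list[int]]:
    """Return (number of valid records, 1-based line numbers that failed to parse)."""
    valid, bad_lines = 0, []
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                json.loads(line)
                valid += 1
            except json.JSONDecodeError:
                bad_lines.append(line_no)
    return valid, bad_lines


# Usage against the splits checked above (skipped if a file is absent)
for split in ["train", "validation", "test"]:
    split_path = Path("data/datasets/final") / f"{split}.jsonl"
    if split_path.exists():
        n_valid, bad = validate_jsonl(split_path)
        print(f"{split}: {n_valid} valid, {len(bad)} unparseable")
```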


## 3. Configure & Train V3

**V3 Configuration (Auto-optimized for A100):**
- LoRA rank: 12 (V1: 16, V2: 8)
- LoRA alpha: 24 (V1: 32, V2: 16)
- LoRA dropout: 0.12 (V1: 0.1, V2: 0.15)
- Epochs: 1 (V1: 2, V2: 1)
- Learning rate: 1.5e-4 (V1: 2e-4, V2: 1e-4)
- Weight decay: 0.02 (V1: 0.01, V2: 0.05)
- Max length: 2048 (V1: 1024, V2: 1536)
- Batch size: 4
- Gradient accumulation: 4 (effective batch: 16)
- Double quantization: Enabled
- BFloat16 compute: Enabled
In [5]:
import sys
sys.path.append('src/fine_tuning')

from config_exercise_generation import get_exercise_generation_config, adjust_config_for_gpu
from lora_trainer import MarcoLoRATrainer
import torch

# Get the V3 config (hyperparameters are defined in config_exercise_generation)
config = get_exercise_generation_config()

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    print(f"🎮 GPU: {gpu_name}")

    # Display GPU memory
    total_mem = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"💾 Total VRAM: {total_mem:.1f} GB")
else:
    raise RuntimeError("No GPU detected! Select A100 in Runtime settings.")

# Optional: Disable W&B if not needed
# config.experiment.use_wandb = False

print(f"\n📝 V3 Balanced Config (Balanced: Best of V1 + V2):")
print(f"   Model: {config.training.model_name.split('/')[-1]}")
print(f"   Epochs: {config.training.num_train_epochs}")
print(f"   Batch size: {config.training.per_device_train_batch_size}")
print(f"   Effective batch: {config.training.per_device_train_batch_size * config.training.gradient_accumulation_steps}")
print(f"   Learning rate: {config.training.learning_rate}")
print(f"   Weight decay: {config.training.weight_decay}")
print(f"   LoRA rank: {config.lora.r}")
print(f"   LoRA alpha: {config.lora.lora_alpha}")
print(f"   LoRA dropout: {config.lora.lora_dropout}")
print(f"   Max length: {config.data.max_length}")
print(f"   Output: {config.training.output_dir}")

print(f"\n🔄 Changes from V1:")
print(f"   ✅ LoRA rank: 16 → {config.lora.r}")
print(f"   ✅ LoRA dropout: 0.1 → {config.lora.lora_dropout}")
print(f"   ✅ Epochs: 2 → {config.training.num_train_epochs}")
print(f"   ✅ Learning rate: 2e-4 → {config.training.learning_rate}")
print(f"   ✅ Weight decay: 0.01 → {config.training.weight_decay}")
print(f"   ✅ Max length: 1024 → {config.data.max_length}")

🎮 GPU: NVIDIA A100-SXM4-40GB
💾 Total VRAM: 42.5 GB

📝 V3 Balanced Config (Balanced: Best of V1 + V2):
   Model: LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
   Epochs: 1
   Batch size: 4
   Effective batch: 16
   Learning rate: 0.00015
   Weight decay: 0.02
   LoRA rank: 12
   LoRA alpha: 24
   LoRA dropout: 0.12
   Max length: 2048
   Output: ./models/italian_exercise_generator_lora_v3

🔄 Changes from V1:
   ✅ LoRA rank: 16 → 12
   ✅ LoRA dropout: 0.1 → 0.12
   ✅ Epochs: 2 → 1
   ✅ Learning rate: 2e-4 → 0.00015
   ✅ Weight decay: 0.01 → 0.02
   ✅ Max length: 1024 → 2048


In [6]:
# Optional: Login to W&B for experiment tracking
if config.experiment.use_wandb:
    import wandb
    # wandb.login()  # Uncomment and enter API key

In [7]:
# Initialize and train V3
print("🚀 Starting V3 training...")
print("🎯 Expected: ~200 steps, ~20-30 minutes")
print("=" * 80)


trainer = MarcoLoRATrainer(config=config)
train_result = trainer.train()

print("\n" + "=" * 80)
print("✅ V3 TRAINING COMPLETE!")
print("=" * 80)
print(f"📁 Model saved: {config.training.output_dir}")
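# Note: if train() returns the Hugging Face TrainOutput, training_loss is the mean
# over all steps, so it can be higher than the loss logged at the last step.
print(f"📊 Final loss: {train_result.training_loss:.4f}")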
print(f"⏱️  Total time: {train_result.metrics.get('train_runtime', 0) / 60:.1f} minutes")

🚀 Starting V3 training...
🎯 Expected: ~200 steps, ~20-30 minutes



`torch_dtype` is deprecated! Use `dtype` instead!


The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 128009, 'pad_token_id': 128009}.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 200 | 0.4769 | 0.466733 |





✅ V3 TRAINING COMPLETE!
📁 Model saved: ./models/italian_exercise_generator_lora_v3
📊 Final loss: 0.6356
⏱️  Total time: 18.9 minutes
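The "~200 steps" estimate can be reproduced from the dataset and batch settings. A minimal sketch, assuming a single GPU and rounding the last partial batch and accumulation group up; the exact count depends on how the trainer handles that remainder, and `estimate_optimizer_steps` is a hypothetical helper.

```python
import math


def estimate_optimizer_steps(num_examples: int, per_device_batch: int,
                             grad_accum: int, epochs: int = 1) -> int:
    """Approximate number of optimizer updates for a single-GPU run."""
    batches_per_epoch = math.ceil(num_examples / per_device_batch)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs


# 3186 training examples, batch size 4, gradient accumulation 4, 1 epoch
steps = estimate_optimizer_steps(3186, 4, 4)
print(steps)  # 200

# Rough wall-clock cost per update for the 18.9-minute run reported above
print(f"{18.9 * 60 / steps:.2f} s per optimizer step")
```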


## 4. Test V3 Model (Optional)

In [8]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

print("Loading V3 model for testing...")

# Load base + LoRA
base_model = AutoModelForCausalLM.from_pretrained(
    config.training.model_name,
    device_map="auto",
    torch_dtype=torch.float16
)

model = PeftModel.from_pretrained(base_model, config.training.output_dir)
tokenizer = AutoTokenizer.from_pretrained(config.training.output_dir)

print("✅ V3 model loaded")

Loading V3 model for testing...


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

✅ V3 model loaded


In [9]:
# Generate test exercise (same prompt as V1 for comparison)
test_prompt = [
    {"role": "system", "content": "You are an expert Italian language teacher. Generate high-quality exercises based on the assignment specification. Output exercises in JSON format."},
    {"role": "user", "content": "Generate 2 exercises:\nCEFR Level: C1\nGrammar Focus: Conjunctions\nTopic: history\nExercise Types: fill_in_blank"}
]

inputs = tokenizer.apply_chat_template(
    test_prompt,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # return attention_mask along with input_ids
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,  # passes input_ids and attention_mask
    max_new_tokens=768,  # room for two long C1 exercises
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id  # set explicitly to avoid the pad-token warning
)

# Decode only the newly generated tokens instead of splitting the text on "assistant"
prompt_len = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print("\n📝 V3 Generated Exercises:")
print("=" * 80)
print(response.strip())
print("=" * 80)



📝 V3 Generated Exercises:
[{"type": "fill_in_blank", "question": "Il Rinascimento, che si sviluppò in Europa tra il 14° e il 17° secolo, è spesso considerato un'epoca di grande rinascita culturale, che portò a scoperte e innovazioni in campo scientifico, artistico e letterario. ___ queste innovazioni, molti artisti e scienziati fecero nuovi esperimenti e crearono opere che sono ancora oggi apprezzate.", "answer": "A causa di", "explanation": "La congiunzione 'a causa di' è usata per indicare una causa o una motivazione per un evento o una situazione.", "hint": "Pensa a una congiunzione che indica una causa o una motivazione."}, {"type": "fill_in_blank", "question": "La Rivoluzione Francese, che ebbe luogo nel 1789, portò a una grande trasformazione politica in Francia e in Europa, ___ molti paesi seguirono il suo esempio e istituirono governi repubblicani.", "answer": "tra cui", "explanation": "La congiunzione 'tra cui' è usata per elencare membri di un gruppo o per indicare che ci so
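The JSON validity target from the Expected Results section can be measured with a small parser-based check. This is a minimal sketch and not the evaluation notebook's method: the required keys are an assumption based on the sample output, and both function names are hypothetical. A response cut off before its closing bracket counts as invalid.

```python
import json

# ASSUMPTION: minimal schema, taken from the keys visible in the sample output
REQUIRED_KEYS = {"type", "question", "answer"}


def is_valid_exercise_json(text: str) -> bool:
    """True if text parses as a non-empty JSON list of exercise objects."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, list)
        and len(data) > 0
        and all(isinstance(item, dict) and REQUIRED_KEYS.issubset(item) for item in data)
    )


def json_validity_rate(responses: list[str]) -> float:
    """Fraction of responses that pass is_valid_exercise_json."""
    if not responses:
        return 0.0
    return sum(is_valid_exercise_json(r) for r in responses) / len(responses)


complete = '[{"type": "fill_in_blank", "question": "Roma ___ la capitale.", "answer": "è"}]'
truncated = complete[:-5]  # simulates generation stopping mid-object

print(json_validity_rate([complete, truncated]))  # 0.5
```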

## 5. Save Summary & Shutdown

In [10]:
# Save V3 training summary
import json
from datetime import datetime

summary = {
    "version": "v3_balanced",
    "timestamp": datetime.now().isoformat(),
    "model": config.training.model_name,
    "dataset_size": 3983,
    "epochs": config.training.num_train_epochs,
    "batch_size": config.training.per_device_train_batch_size,
    "gradient_accumulation": config.training.gradient_accumulation_steps,
    "effective_batch": config.training.per_device_train_batch_size * config.training.gradient_accumulation_steps,
    "learning_rate": config.training.learning_rate,
    "weight_decay": config.training.weight_decay,
    "warmup_ratio": config.training.warmup_ratio,
    "lora_rank": config.lora.r,
    "lora_alpha": config.lora.lora_alpha,
    "lora_dropout": config.lora.lora_dropout,
    "max_length": config.data.max_length,
    "output_dir": config.training.output_dir,
    "gpu": torch.cuda.get_device_name(0),
    "final_loss": float(train_result.training_loss),
    "training_time_minutes": train_result.metrics.get('train_runtime', 0) / 60,
    "improvements": [
        "LoRA rank optimized to 12 (balanced capacity)",
        "LoRA dropout set to 0.12 (moderate regularization)",
        "Epochs reduced 2→1 (prevent overfitting)",
        "Learning rate set to 1.5e-4 (balanced)",
        "Weight decay set to 0.02 (moderate)",
        "Max length increased to 2048 (handle complex B2/C2)"
    ]
}

summary_path = "training_summary_v3.json"
with open(summary_path, "w") as f:
    json.dump(summary, f, indent=2)

print(f"✅ V3 training summary saved to: {summary_path}")
print("\n📊 V3 Summary:")
for key, val in summary.items():
    if key != "improvements":
        print(f"   {key}: {val}")

print("\n🎯 V3 Improvements:")
for improvement in summary["improvements"]:
    print(f"   ✅ {improvement}")

✅ V3 training summary saved to: training_summary_v3.json

📊 V3 Summary:
   version: v3_balanced
   timestamp: 2025-10-06T12:55:48.036967
   model: swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
   dataset_size: 3983
   epochs: 1
   batch_size: 4
   gradient_accumulation: 4
   effective_batch: 16
   learning_rate: 0.00015
   weight_decay: 0.02
   warmup_ratio: 0.1
   lora_rank: 12
   lora_alpha: 24
   lora_dropout: 0.12
   max_length: 2048
   output_dir: ./models/italian_exercise_generator_lora_v3
   gpu: NVIDIA A100-SXM4-40GB
   final_loss: 0.6355876255035401
   training_time_minutes: 18.89433

🎯 V3 Improvements:
   ✅ LoRA rank optimized to 12 (balanced capacity)
   ✅ LoRA dropout set to 0.12 (moderate regularization)
   ✅ Epochs reduced 2→1 (prevent overfitting)
   ✅ Learning rate set to 1.5e-4 (balanced)
   ✅ Weight decay set to 0.02 (moderate)
   ✅ Max length increased to 2048 (handle complex B2/C2)
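Because the summary is plain JSON, runs can be compared side by side before opening the evaluation notebook. A minimal sketch, assuming earlier runs wrote files with the same keys; the V1 and V2 filenames below are hypothetical, and missing files are skipped.

```python
import json
from pathlib import Path

# Keys taken from the summary dictionary written above
FIELDS = ["version", "lora_rank", "learning_rate", "final_loss", "training_time_minutes"]


def load_summaries(paths):
    """Load the selected fields from each summary file that exists."""
    rows = []
    for path in paths:
        path = Path(path)
        if not path.exists():
            continue  # skip runs that were never saved
        with open(path, "r", encoding="utf-8") as f:
            summary = json.load(f)
        rows.append({field: summary.get(field) for field in FIELDS})
    return rows


# ASSUMPTION: V1 and V2 summaries use this naming pattern
candidates = ["training_summary_v1.json", "training_summary_v2.json", "training_summary_v3.json"]
for row in load_summaries(candidates):
    print(row)
```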


In [None]:
# Auto-shutdown to save compute credits
from google.colab import runtime
import time

print("\n" + "=" * 80)
print("🎉 V3 TRAINING COMPLETED SUCCESSFULLY!")
print("=" * 80)
print(f"\n📁 V3 Model saved to: {config.training.output_dir}")
print(f"📊 Summary saved to: {summary_path}")
print(f"\n💡 Next steps:")
print("   1. Run evaluation notebook to compare V1, V2, and V3")
print("   2. Verify V3 achieves >99% validity AND <1% overfitting")
print("   3. Select best model for production")
print(f"\n⏰ Runtime will disconnect in 30 seconds...")

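# Uncomment to wait before disconnecting
# time.sleep(30)

print("\n👋 Disconnecting runtime...")
# Uncomment to release the Colab runtime and stop consuming compute credits
# runtime.unassign()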