# 🎓 Fine-Tuning Mistral 7B - Philosophers (LoRA)

**Objective:** Fine-tune Mistral 7B on 1,200 examples of philosophical logical schemes

**Recommended GPUs (estimated training time):** A100 40GB (30-45 min) | V100 16GB (1-1.5 h) | T4 15GB (2-3 h)

**Configuration:**
- Model: `mistralai/Mistral-7B-Instruct-v0.3`
- Method: QLoRA (4-bit) + LoRA (r=64, alpha=128)
- Dataset: 1,200 examples (300 base + 900 augmented)
- Epochs: 3
- Per-device batch size: 8 (A100) | 4 (V100) | 2 (T4), with gradient accumulation keeping the effective batch size at 32
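The batch settings trade per-device batch size against gradient accumulation so that every GPU gets the same effective batch of 32. A minimal sketch of that arithmetic, assuming the 95/5 train/validation split made later in this notebook (1,140 training examples). The helper names are illustrative, and the Trainer's own step count may differ by about one step per epoch depending on how it rounds the last partial accumulation window.

```python
import math

def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to each optimizer step."""
    return per_device * grad_accum * num_devices

def approx_total_steps(num_train: int, per_device: int, grad_accum: int, epochs: int) -> int:
    """Approximate optimizer steps: one per effective batch, last partial batch counted."""
    steps_per_epoch = math.ceil(num_train / effective_batch_size(per_device, grad_accum))
    return steps_per_epoch * epochs

# A100 (8 x 4), V100 (4 x 8) and T4 (2 x 16) all reach an effective batch of 32
for name, bs, ga in [("A100", 8, 4), ("V100", 4, 8), ("T4", 2, 16)]:
    print(name, effective_batch_size(bs, ga), approx_total_steps(1140, bs, ga, epochs=3))
```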

## 1️⃣ Setup - Installation & Checks

In [None]:
# Install dependencies (recent versions)
# Note: torch, torchvision and torchaudio are pinned to matching versions to avoid conflicts
# Version specifiers are quoted: unquoted, the shell would read ">" as an output redirection
# transformers >= 4.41 provides `eval_strategy`; recent trl provides SFTConfig(max_length=...) and `processing_class`
print("📦 Installing packages (may take 2-3 minutes)...\n")

!pip install -U \
    torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 \
    "transformers>=4.41.0" \
    "peft>=0.10.0" \
    "bitsandbytes>=0.43.0" \
    "accelerate>=0.28.0" \
    "trl>=0.20.0" \
    "datasets>=2.18.0" \
    "huggingface_hub>=0.22.0"

print("\n✅ Installation complete!")

In [None]:
# Check the available GPU
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU: {gpu_name}")
    print(f"VRAM: {gpu_memory:.1f} GB")
    
    # Pick batch settings for this GPU (the effective batch size stays at 32)
    if "A100" in gpu_name and gpu_memory > 35:
        print("\n🚀 OPTIMAL GPU detected: A100 40GB+")
        BATCH_SIZE = 8
        GRADIENT_ACCUM = 4
    elif "V100" in gpu_name:
        print("\n✅ EXCELLENT GPU detected: V100")
        BATCH_SIZE = 4
        GRADIENT_ACCUM = 8
    else:
        print("\n🟡 STANDARD GPU detected (e.g. T4)")
        BATCH_SIZE = 2
        GRADIENT_ACCUM = 16
    
    print(f"Batch size: {BATCH_SIZE}")
    print(f"Gradient accumulation: {GRADIENT_ACCUM}")
    print(f"Effective batch size: {BATCH_SIZE * GRADIENT_ACCUM}")
else:
    print("⚠️ CUDA not available - training on CPU (very slow)")
    BATCH_SIZE = 1
    GRADIENT_ACCUM = 32
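The detection logic above can be written as a pure function, which makes it testable without a GPU. This sketch also returns a bfloat16 flag, assuming the usual rule that native bfloat16 needs compute capability 8.0 or higher (Ampere: A100, L4, ...), so V100 (7.0) and T4 (7.5) should train in float16. `pick_training_config` is a hypothetical helper, not a library function; on a real machine its inputs would come from `torch.cuda.get_device_name(0)`, the VRAM computation above and `torch.cuda.get_device_capability(0)`.

```python
from typing import NamedTuple, Tuple

class TrainingConfig(NamedTuple):
    batch_size: int
    grad_accum: int
    use_bf16: bool

def pick_training_config(gpu_name: str, vram_gib: float,
                         capability: Tuple[int, int]) -> TrainingConfig:
    """Mirror the notebook's GPU branches and add a bfloat16 flag (hypothetical helper)."""
    use_bf16 = capability[0] >= 8  # Ampere or newer supports bfloat16 natively
    if "A100" in gpu_name and vram_gib > 35:
        return TrainingConfig(8, 4, use_bf16)
    if "V100" in gpu_name:
        return TrainingConfig(4, 8, use_bf16)
    return TrainingConfig(2, 16, use_bf16)  # T4 and any other GPU

print(pick_training_config("NVIDIA A100-SXM4-40GB", 39.4, (8, 0)))
print(pick_training_config("Tesla T4", 14.7, (7, 5)))
```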

## 2️⃣ Configuration - HF Authentication & Datasets

In [None]:
# Hugging Face authentication (needed to push the model)
from huggingface_hub import login

# ⚠️ REPLACE WITH YOUR HF TOKEN (with write access)
HF_TOKEN = "hf_..."
login(token=HF_TOKEN)

print("✅ HF authentication successful")

In [None]:
# Load the datasets from local files
# ⚠️ UPLOAD THESE FILES TO COLAB FIRST:
# - schemes_levelA_base.jsonl
# - schemes_levelA_augmented.jsonl

from datasets import load_dataset, concatenate_datasets

# Load the datasets
dataset_base = load_dataset('json', data_files='schemes_levelA_base.jsonl', split='train')
dataset_augmented = load_dataset('json', data_files='schemes_levelA_augmented.jsonl', split='train')

# Combine the datasets
dataset_full = concatenate_datasets([dataset_base, dataset_augmented])

print(f"✅ Dataset loaded: {len(dataset_full)} examples")
print(f"   - Base: {len(dataset_base)} examples")
print(f"   - Augmented: {len(dataset_augmented)} examples")

# Show one example
print("\n📝 Sample record:")
print(dataset_full[0]['messages'])
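A single malformed line only fails later, inside `load_dataset` or the chat template, so it can help to validate the JSONL files directly. This sketch assumes each line has the shape the cell above reads (`{"messages": [{"role": ..., "content": ...}, ...]}`) and that turns alternate user/assistant after an optional system message, which Mistral chat templates generally expect. `validate_jsonl` is a hypothetical helper.

```python
import json
from typing import List

ALLOWED_ROLES = {"system", "user", "assistant"}

def validate_jsonl(path: str) -> List[str]:
    """Return human-readable problems found in a chat-format JSONL file (empty list = OK)."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # blank lines are ignored
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append(f"line {lineno}: missing or empty 'messages' list")
                continue
            if any(not isinstance(m, dict) or m.get("role") not in ALLOWED_ROLES
                   or not isinstance(m.get("content"), str) for m in messages):
                problems.append(f"line {lineno}: each message needs a known 'role' and a string 'content'")
                continue
            roles = [m["role"] for m in messages]
            turns = roles[1:] if roles[0] == "system" else roles
            alternating = all(role == ("user" if i % 2 == 0 else "assistant")
                              for i, role in enumerate(turns))
            if not turns or not alternating or turns[-1] != "assistant":
                problems.append(f"line {lineno}: roles should alternate user/assistant and end with assistant: {roles}")
    return problems

# Usage (file names from the cell above):
# for path in ["schemes_levelA_base.jsonl", "schemes_levelA_augmented.jsonl"]:
#     print(path, validate_jsonl(path)[:5] or "OK")
```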

In [None]:
# Train/validation split (95/5)
dataset_split = dataset_full.train_test_split(test_size=0.05, seed=42)
train_dataset = dataset_split['train']
eval_dataset = dataset_split['test']

print("✅ Split done:")
print(f"   - Train: {len(train_dataset)} examples")
print(f"   - Validation: {len(eval_dataset)} examples")

## 3️⃣ Model - QLoRA + LoRA Configuration

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model
import torch

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.3"

# bfloat16 needs an Ampere-or-newer GPU (A100, L4, ...); V100 and T4 fall back to float16
USE_BF16 = torch.cuda.is_available() and torch.cuda.get_device_capability(0)[0] >= 8
COMPUTE_DTYPE = torch.bfloat16 if USE_BF16 else torch.float16

# 4-bit quantization config (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 (better precision than plain FP4)
    bnb_4bit_compute_dtype=COMPUTE_DTYPE,   # bfloat16 on A100, float16 on V100/T4
    bnb_4bit_use_double_quant=True,         # Double quantization (saves VRAM)
)

print("📥 Loading Mistral 7B...")

# Load the base model in 4-bit (Mistral is natively supported, no remote code needed)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=COMPUTE_DTYPE,
)

# Load the tokenizer (EOS reused as padding token, right padding for training)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

print("✅ Model loaded in 4-bit")

# Prepare the model for k-bit training (also enables gradient checkpointing by default)
model = prepare_model_for_kbit_training(model)

print("✅ Model prepared for k-bit training")

In [None]:
# LoRA configuration (rank 64: more capacity than the small ranks of 8-16 often used, at the cost of a larger adapter)
lora_config = LoraConfig(
    r=64,                              # LoRA rank
    lora_alpha=128,                    # alpha = 2 * r (rule of thumb)
    lora_dropout=0.05,                 # Light dropout
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[                   # All attention + MLP projections
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
)

# Apply LoRA to the model
model = get_peft_model(model, lora_config)

# Show trainable parameters (PEFT's helper counts packed 4-bit weights correctly,
# whereas summing p.numel() undercounts them by half)
print(f"✅ LoRA applied (r={lora_config.r}, alpha={lora_config.lora_alpha})")
model.print_trainable_parameters()

## 4️⃣ Training - Configuration & Launch

In [None]:
from trl import SFTConfig, SFTTrainer

# bfloat16 mixed precision needs an Ampere-or-newer GPU (A100, L4, ...); V100 and T4 use float16
USE_BF16 = torch.cuda.is_available() and torch.cuda.get_device_capability(0)[0] >= 8

# Training configuration (SFTConfig extends TrainingArguments with SFT-specific options)
training_args = SFTConfig(
    # Output
    output_dir="./mistral-7b-philosophes-lora",
    
    # Batch size (set per GPU in the setup section)
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUM,
    
    # Learning rate
    learning_rate=2e-4,              # Common value for (Q)LoRA
    lr_scheduler_type="cosine",      # Cosine decay after warmup
    warmup_ratio=0.03,               # 3% of steps used for warmup
    
    # Epochs
    num_train_epochs=3,              # 3 epochs to limit overfitting
    
    # Optimization
    optim="paged_adamw_8bit",        # 8-bit paged AdamW (saves VRAM)
    bf16=USE_BF16,                   # bfloat16 on A100
    fp16=torch.cuda.is_available() and not USE_BF16,  # float16 on V100/T4
    
    # Gradient clipping
    max_grad_norm=0.3,
    
    # Sequences (passed to SFTTrainer directly in older TRL versions)
    max_length=512,                  # Max sequence length in tokens
    packing=False,                   # No packing, for simplicity
    
    # Logging & evaluation
    logging_steps=10,
    eval_strategy="steps",
    eval_steps=50,
    save_steps=100,                  # Multiple of eval_steps, required by load_best_model_at_end
    save_total_limit=3,
    
    # Misc
    report_to="none",                # No W&B/TensorBoard
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

print("✅ Training configuration created")
print(f"   Effective batch size: {BATCH_SIZE * GRADIENT_ACCUM}")
print(f"   Total steps: ~{int(len(train_dataset) * training_args.num_train_epochs) // (BATCH_SIZE * GRADIENT_ACCUM)}")

In [None]:
# Formatting function for SFTTrainer
def formatting_func(example):
    """Render an example's messages with the tokenizer's own chat template (Mistral's [INST] format, not ChatML)."""
    return tokenizer.apply_chat_template(
        example['messages'],
        tokenize=False,
        add_generation_prompt=False,
    )

# Create the trainer
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    processing_class=tokenizer,      # Named `tokenizer` in older TRL versions
    formatting_func=formatting_func,
)

print("✅ Trainer created and ready")

In [None]:
# 🚀 LAUNCH TRAINING
print("🚀 Starting training...\n")

# Show the estimated duration
gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"
if "A100" in gpu_name and torch.cuda.get_device_properties(0).total_memory > 35e9:
    print("⏱️ Estimated time: 30-45 minutes (A100 40GB)")
elif "V100" in gpu_name:
    print("⏱️ Estimated time: 1-1.5 h (V100)")
else:
    print("⏱️ Estimated time: 2-3 h (T4)")

print("\n" + "="*60)

# Training
trainer.train()

print("\n" + "="*60)
print("✅ Training complete!")

## 5️⃣ Export - Save & Push to the HF Hub

In [None]:
# Save the LoRA adapter locally (save_pretrained on a PEFT model writes only the adapter weights)
output_dir = "./mistral-7b-philosophes-lora-final"
trainer.model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

print(f"✅ Adapter saved to: {output_dir}")

# Show the LoRA adapter size (adapter_* files only, tokenizer files excluded)
import os
lora_size = sum(
    os.path.getsize(os.path.join(output_dir, f))
    for f in os.listdir(output_dir)
    if f.startswith("adapter_")
) / 1024**2
print(f"   LoRA size: {lora_size:.1f} MB")

In [None]:
# Push to the HF Hub (OPTIONAL)
# ⚠️ REPLACE WITH YOUR HF USERNAME
HF_USERNAME = "FJDaz"
REPO_NAME = f"{HF_USERNAME}/mistral-7b-philosophes-lora"

print(f"📤 Pushing to the HF Hub: {REPO_NAME}...")

trainer.model.push_to_hub(
    REPO_NAME,
    token=HF_TOKEN,  # `use_auth_token` is deprecated in recent transformers versions
    commit_message="Fine-tuned Mistral 7B on 1200 philosophy schemas (r=64)",
)

tokenizer.push_to_hub(
    REPO_NAME,
    token=HF_TOKEN,
)

print(f"✅ Model pushed to: https://huggingface.co/{REPO_NAME}")

## 6️⃣ Test - Quick Inference

In [None]:
# Quick test of the fine-tuned model
from transformers import pipeline

# Build a generation pipeline around the fine-tuned model already in memory
model.eval()
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)

# Test 1: Spinozist Modus Ponens (prompts kept in French)
test_prompt = """Schème : Modus Ponens
Contexte : Si l'homme ignore les causes de ses passions, il est en servitude. Or l'élève ignore les causes de ses passions.
Applique le schème :"""

print("📝 Test 1: Spinozist Modus Ponens")
print(f"Prompt: {test_prompt}")
print("\n" + "="*60)

messages = [
    {"role": "system", "content": "Tu es un tuteur philosophique maîtrisant les schèmes logiques."},
    {"role": "user", "content": test_prompt}
]

output = pipe(messages)
print(output[0]['generated_text'][-1]['content'])

print("\n" + "="*60)

In [None]:
# Test 2: Spinozist identity
test_prompt_2 = """Schème : Identité
Contexte : Dieu = Nature. Or la Nature est nécessaire.
Applique le schème :"""

print("📝 Test 2: Spinozist identity")
print(f"Prompt: {test_prompt_2}")
print("\n" + "="*60)

messages = [
    {"role": "system", "content": "Tu es un tuteur philosophique maîtrisant les schèmes logiques."},
    {"role": "user", "content": test_prompt_2}
]

output = pipe(messages)
print(output[0]['generated_text'][-1]['content'])

print("\n" + "="*60)
print("✅ Tests complete!")

## 7️⃣ Download - Get the LoRA Locally

In [None]:
# Zip the adapter folder for download
!zip -r mistral-7b-philosophes-lora-final.zip ./mistral-7b-philosophes-lora-final/

print("✅ Archive created: mistral-7b-philosophes-lora-final.zip")
print("📥 Download it from the Colab file browser (left panel)")

---

## ✅ Final Checklist

- [ ] Training finished without errors
- [ ] Eval loss < 0.6 (validation set)
- [ ] Inference tests look right (schemes applied correctly)
- [ ] Model pushed to the HF Hub OR downloaded locally
- [ ] LoRA adapter size consistent with r=64 on all 7 projections (~168M parameters: ~335 MB in 16-bit, ~670 MB in float32)

## 🎯 Next Steps

1. **Full benchmarks**: test on 30 questions (10 per philosopher)
2. **CPU test**: load in 4-bit on CPU and measure latency
3. **HF Space deployment**: create a free CPU Space with this LoRA
4. **Comparison**: against base Mistral 7B (no LoRA) + system prompts
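For the CPU latency test in step 2, timing several generations and reporting the median is more robust than a single run. A minimal, library-agnostic sketch: `generate` stands for any callable that takes a prompt and returns text (for example a wrapper around the pipeline from the test section); the function name and the untimed warm-up call are assumptions, not part of the plan above.

```python
import statistics
import time
from typing import Callable, Dict, Sequence

def measure_latency(generate: Callable[[str], str], prompts: Sequence[str],
                    warmup: int = 1) -> Dict[str, float]:
    """Time each call to `generate` and summarize latencies in seconds."""
    for prompt in prompts[:warmup]:
        generate(prompt)  # warm-up calls (caches, lazy initialization) are not timed
    timings = []
    for prompt in prompts:
        start = time.perf_counter()
        generate(prompt)
        timings.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(timings),
        "max_s": max(timings),
        "n": float(len(timings)),
    }
```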

---

**Created:** November 20, 2025  
**Author:** Claude Code  
**Model:** Mistral 7B Instruct v0.3 + LoRA (r=64, alpha=128)  
**Dataset:** 1,200 philosophical scheme examples (Spinoza, Bergson, Kant)