# üöÄ Mod√®le Avanc√© - BERT Fine-tuning

Ce notebook entra√Æne un mod√®le BERT fine-tun√© pour la d√©tection de toxicit√©.

## Objectifs
- ü§ñ Fine-tuner BERT pr√©-entra√Æn√©
- üìà Am√©liorer le F1-Score (objectif > 0.75)
- ‚ö° Optimiser pour l'inf√©rence
- üîÑ Comparer avec le mod√®le simple

In [1]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import time
import os
import json
from datetime import datetime
from tqdm.auto import tqdm

# Transformers et tokenizers
from transformers import (
    AutoTokenizer, 
    AutoModelForSequenceClassification,
    TrainingArguments, 
    Trainer,
    DataCollatorWithPadding
)
from datasets import Dataset

# Scikit-learn pour m√©triques
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score,
    classification_report, confusion_matrix, roc_auc_score
)
from sklearn.model_selection import train_test_split

import warnings
warnings.filterwarnings('ignore')

# Configuration
plt.style.use('default')
sns.set_palette("husl")
pd.set_option('display.max_columns', None)

# V√©rifier CUDA
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"üì¶ Imports termin√©s !")
print(f"üñ•Ô∏è Device: {device}")
print(f"üïê D√©marrage: {datetime.now().strftime('%H:%M:%S')}")

if torch.cuda.is_available():
    print(f"üöÄ GPU disponible: {torch.cuda.get_device_name(0)}")
    print(f"üíæ M√©moire GPU: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("‚ö†Ô∏è CPU seulement - L'entra√Ænement sera plus lent")

  from .autonotebook import tqdm as notebook_tqdm


üì¶ Imports termin√©s !
üñ•Ô∏è Device: cpu
üïê D√©marrage: 13:18:56
‚ö†Ô∏è CPU seulement - L'entra√Ænement sera plus lent


## 1. üìÅ Chargement des Donn√©es

In [2]:
# Chargement des donn√©es pr√©process√©es
print("üìä Chargement des donn√©es pr√©process√©es...")

train_df = pd.read_csv('../data/train_preprocessed.csv')
test_df = pd.read_csv('../data/test_preprocessed.csv')

print(f"‚úÖ Train: {train_df.shape[0]:,} lignes")
print(f"‚úÖ Test: {test_df.shape[0]:,} lignes")

# Pr√©paration des donn√©es pour BERT (texte l√©ger)
X_train_text = train_df['comment_bert'].fillna('').astype(str)
y_train = train_df['is_toxic'].fillna(0).astype(int)

# Split train/validation pour BERT (plus petit validation set pour √©conomiser temps)
X_train_split, X_val_split, y_train_split, y_val_split = train_test_split(
    X_train_text, y_train, 
    test_size=0.15,  # 15% pour validation (vs 20% avant)
    random_state=42, 
    stratify=y_train
)

print(f"\nüìä Split BERT train/validation:")
print(f"  Train: {len(X_train_split):,} textes")
print(f"  Validation: {len(X_val_split):,} textes")
print(f"  Train toxic: {y_train_split.sum():,} ({y_train_split.sum()/len(y_train_split)*100:.1f}%)")
print(f"  Val toxic: {y_val_split.sum():,} ({y_val_split.sum()/len(y_val_split)*100:.1f}%)")

# √âchantillonner pour test rapide (optionnel)
SAMPLE_SIZE = 5000  # R√©duire pour test rapide, mettre None pour tout
if SAMPLE_SIZE and len(X_train_split) > SAMPLE_SIZE:
    print(f"\nüî¨ MODE TEST: √âchantillonnage √† {SAMPLE_SIZE} exemples...")
    
    # Stratified sampling
    X_train_sample, _, y_train_sample, _ = train_test_split(
        X_train_split, y_train_split,
        train_size=SAMPLE_SIZE,
        random_state=42,
        stratify=y_train_split
    )
    
    X_val_sample, _, y_val_sample, _ = train_test_split(
        X_val_split, y_val_split,
        train_size=min(1000, len(X_val_split)),  # Max 1000 pour validation
        random_state=42,
        stratify=y_val_split
    )
    
    X_train_split, y_train_split = X_train_sample, y_train_sample
    X_val_split, y_val_split = X_val_sample, y_val_sample
    
    print(f"  Train √©chantillonn√©: {len(X_train_split):,}")
    print(f"  Validation √©chantillonn√©: {len(X_val_split):,}")

üìä Chargement des donn√©es pr√©process√©es...
‚úÖ Train: 20,000 lignes
‚úÖ Test: 20,000 lignes

üìä Split BERT train/validation:
  Train: 17,000 textes
  Validation: 3,000 textes
  Train toxic: 1,751 (10.3%)
  Val toxic: 309 (10.3%)

üî¨ MODE TEST: √âchantillonnage √† 5000 exemples...
  Train √©chantillonn√©: 5,000
  Validation √©chantillonn√©: 1,000
‚úÖ Train: 20,000 lignes
‚úÖ Test: 20,000 lignes

üìä Split BERT train/validation:
  Train: 17,000 textes
  Validation: 3,000 textes
  Train toxic: 1,751 (10.3%)
  Val toxic: 309 (10.3%)

üî¨ MODE TEST: √âchantillonnage √† 5000 exemples...
  Train √©chantillonn√©: 5,000
  Validation √©chantillonn√©: 1,000


## 2. ü§ñ Configuration du Mod√®le BERT

In [3]:
# Choisir le mod√®le BERT (distilbert pour √™tre plus rapide)
MODEL_NAME = "distilbert-base-uncased"  # Plus l√©ger et rapide que bert-base-uncased

print(f"ü§ñ Chargement du mod√®le: {MODEL_NAME}")

# Charger tokenizer et mod√®le
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=2,  # Binaire: toxic/non-toxic
    problem_type="single_label_classification"
)

# D√©placer vers GPU si disponible
model.to(device)

print(f"‚úÖ Mod√®le charg√© sur {device}")
print(f"üìè Longueur max tokens: {tokenizer.model_max_length}")
print(f"üî¢ Nombre de param√®tres: {sum(p.numel() for p in model.parameters()):,}")

ü§ñ Chargement du mod√®le: distilbert-base-uncased


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


‚úÖ Mod√®le charg√© sur cpu
üìè Longueur max tokens: 512
üî¢ Nombre de param√®tres: 66,955,010


## 3. üî§ Tokenisation des Donn√©es

In [4]:
# Configuration de tokenisation
MAX_LENGTH = 128  # R√©duire pour √™tre plus rapide (vs 512 par d√©faut)

def tokenize_function(texts):
    """Tokenise les textes pour BERT"""
    return tokenizer(
        texts,
        truncation=True,
        padding=True,
        max_length=MAX_LENGTH,
        return_tensors="pt"
    )

print(f"üî§ Tokenisation des donn√©es (max_length={MAX_LENGTH})...")

# Tokeniser train
print("  Tokenisation train...")
train_encodings = tokenize_function(X_train_split.tolist())

# Tokeniser validation
print("  Tokenisation validation...")
val_encodings = tokenize_function(X_val_split.tolist())

print(f"‚úÖ Tokenisation termin√©e")
print(f"üìè Shape train tokens: {train_encodings['input_ids'].shape}")
print(f"üìè Shape val tokens: {val_encodings['input_ids'].shape}")

# Analyser la distribution des longueurs
train_lengths = (train_encodings['attention_mask'].sum(dim=1)).numpy()
print(f"\nüìä Longueurs des s√©quences tokenis√©es:")
print(f"  Moyenne: {train_lengths.mean():.1f} tokens")
print(f"  M√©diane: {np.median(train_lengths):.1f} tokens")
print(f"  Max: {train_lengths.max()} tokens")
print(f"  % tronqu√©s: {(train_lengths == MAX_LENGTH).mean() * 100:.1f}%")

üî§ Tokenisation des donn√©es (max_length=128)...
  Tokenisation train...
  Tokenisation validation...
‚úÖ Tokenisation termin√©e
üìè Shape train tokens: torch.Size([5000, 128])
üìè Shape val tokens: torch.Size([1000, 128])
  Tokenisation validation...
‚úÖ Tokenisation termin√©e
üìè Shape train tokens: torch.Size([5000, 128])
üìè Shape val tokens: torch.Size([1000, 128])

üìä Longueurs des s√©quences tokenis√©es:
  Moyenne: 63.9 tokens
  M√©diane: 53.0 tokens
  Max: 128 tokens
  % tronqu√©s: 19.9%

üìä Longueurs des s√©quences tokenis√©es:
  Moyenne: 63.9 tokens
  M√©diane: 53.0 tokens
  Max: 128 tokens
  % tronqu√©s: 19.9%


## 4. üìö Cr√©ation des Datasets

In [5]:
# Cr√©er les datasets HuggingFace
class ToxicityDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels.iloc[idx] if hasattr(self.labels, 'iloc') else self.labels[idx], dtype=torch.long)
        return item

    def __len__(self):
        return len(self.labels)

# Cr√©er les datasets
train_dataset = ToxicityDataset(train_encodings, y_train_split)
val_dataset = ToxicityDataset(val_encodings, y_val_split)

print(f"üìö Datasets cr√©√©s:")
print(f"  Train dataset: {len(train_dataset):,} exemples")
print(f"  Val dataset: {len(val_dataset):,} exemples")

# Test d'un √©chantillon
sample = train_dataset[0]
print(f"\nüîç Test √©chantillon:")
print(f"  Input shape: {sample['input_ids'].shape}")
print(f"  Label: {sample['labels'].item()}")
print(f"  Attention mask shape: {sample['attention_mask'].shape}")

üìö Datasets cr√©√©s:
  Train dataset: 5,000 exemples
  Val dataset: 1,000 exemples

üîç Test √©chantillon:
  Input shape: torch.Size([128])
  Label: 0
  Attention mask shape: torch.Size([128])
  Label: 0
  Attention mask shape: torch.Size([128])


## 5. ‚öôÔ∏è Configuration de l'Entra√Ænement

In [6]:
# D√©finir les m√©triques d'√©valuation
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    
    accuracy = accuracy_score(labels, predictions)
    f1 = f1_score(labels, predictions)
    precision = precision_score(labels, predictions)
    recall = recall_score(labels, predictions)
    
    return {
        'accuracy': accuracy,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

# Configuration de l'entra√Ænement (optimis√©e pour rapidit√©)
training_args = TrainingArguments(
    output_dir='../models/bert_model',
    num_train_epochs=2,              # R√©duire pour test rapide (3 normalement)
    per_device_train_batch_size=16,  # Ajuster selon GPU
    per_device_eval_batch_size=32,   # Plus grand pour √©valuation
    warmup_steps=100,                # R√©duire
    weight_decay=0.01,
    logging_dir='../models/bert_model/logs',
    logging_steps=50,
    eval_strategy="steps",
    eval_steps=200,                  # √âvaluer moins souvent
    save_strategy="steps",
    save_steps=400,                  # Multiple de eval_steps
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    greater_is_better=True,
    dataloader_num_workers=0,        # 0 pour √©viter probl√®mes multiprocessing
    remove_unused_columns=True,
    push_to_hub=False,
    report_to=None                   # Pas de logging externe
)

print(f"‚öôÔ∏è Configuration d'entra√Ænement:")
print(f"  Epochs: {training_args.num_train_epochs}")
print(f"  Batch size train: {training_args.per_device_train_batch_size}")
print(f"  Batch size eval: {training_args.per_device_eval_batch_size}")
print(f"  Warmup steps: {training_args.warmup_steps}")
print(f"  Output dir: {training_args.output_dir}")

# Estimer le temps d'entra√Ænement
steps_per_epoch = len(train_dataset) // training_args.per_device_train_batch_size
total_steps = steps_per_epoch * training_args.num_train_epochs
estimated_time = total_steps * (2 if device.type == 'cuda' else 5)  # secondes par step

print(f"\n‚è±Ô∏è Estimation:")
print(f"  Steps par epoch: {steps_per_epoch}")
print(f"  Total steps: {total_steps}")
print(f"  Temps estim√©: {estimated_time//60:.0f}min {estimated_time%60:.0f}s")

‚öôÔ∏è Configuration d'entra√Ænement:
  Epochs: 2
  Batch size train: 16
  Batch size eval: 32
  Warmup steps: 100
  Output dir: ../models/bert_model

‚è±Ô∏è Estimation:
  Steps par epoch: 312
  Total steps: 624
  Temps estim√©: 52min 0s


## 6. üöÄ Entra√Ænement du Mod√®le BERT

In [None]:
# Cr√©er le trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer)
)

print(f"üöÄ D√âBUT DE L'ENTRA√éNEMENT BERT")
print(f"üìÖ {datetime.now().strftime('%H:%M:%S')}")
print("=" * 50)

# Entra√Æner le mod√®le
start_time = time.time()
training_results = trainer.train()
training_time = time.time() - start_time

print(f"\n‚úÖ ENTRA√éNEMENT TERMIN√â !")
print(f"‚è±Ô∏è Temps total: {training_time//60:.0f}min {training_time%60:.0f}s")
print(f"üèÅ Loss finale: {training_results.training_loss:.4f}")

# Sauvegarder le mod√®le
print(f"\nüíæ Sauvegarde du mod√®le...")
trainer.save_model()
tokenizer.save_pretrained('../models/bert_model')
print(f"‚úÖ Mod√®le sauvegard√© dans ../models/bert_model")

üöÄ D√âBUT DE L'ENTRA√éNEMENT BERT
üìÖ 13:20:16


Step,Training Loss,Validation Loss


## 7. üìä √âvaluation D√©taill√©e

In [None]:
# √âvaluation sur le set de validation
print(f"üìä √âVALUATION D√âTAILL√âE DU MOD√àLE BERT")
print("=" * 50)

# Pr√©dictions
print("üîÆ G√©n√©ration des pr√©dictions...")
start_time = time.time()
predictions = trainer.predict(val_dataset)
inference_time = time.time() - start_time

# Extraire les pr√©dictions
y_pred_logits = predictions.predictions
y_pred_proba = torch.softmax(torch.from_numpy(y_pred_logits), dim=1).numpy()[:, 1]
y_pred = np.argmax(y_pred_logits, axis=1)
y_true = y_val_split.values if hasattr(y_val_split, 'values') else y_val_split

# Calculer les m√©triques
accuracy = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
auc_roc = roc_auc_score(y_true, y_pred_proba)

print(f"\nüìà R√âSULTATS BERT:")
print(f"  Accuracy: {accuracy:.4f}")
print(f"  F1-Score: {f1:.4f}")
print(f"  Precision: {precision:.4f}")
print(f"  Recall: {recall:.4f}")
print(f"  AUC-ROC: {auc_roc:.4f}")

print(f"\n‚ö° PERFORMANCE:")
print(f"  Temps inf√©rence total: {inference_time:.2f}s")
print(f"  Temps par texte: {inference_time/len(y_true)*1000:.2f}ms")
print(f"  Crit√®re F1 > 0.75: {'‚úÖ' if f1 > 0.75 else '‚ùå'}")
print(f"  Crit√®re temps < 500ms: {'‚úÖ' if (inference_time/len(y_true)*1000) < 500 else '‚ùå'}")
#hhgg
# Rapport de classification d√©taill√©
print(f"\nüìã Rapport de Classification:")
print(classification_report(y_true, y_pred, target_names=['Non-Toxic', 'Toxic']))

In [None]:
# Matrice de confusion et visualisations
cm = confusion_matrix(y_true, y_pred)

fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# 1. Matrice de confusion
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Non-Toxic', 'Toxic'],
            yticklabels=['Non-Toxic', 'Toxic'],
            ax=axes[0,0])
axes[0,0].set_title('Matrice de Confusion - BERT')
axes[0,0].set_ylabel('Actual')
axes[0,0].set_xlabel('Predicted')

# 2. Distribution des probabilit√©s
toxic_probs = y_pred_proba[y_true == 1]
non_toxic_probs = y_pred_proba[y_true == 0]

axes[0,1].hist(non_toxic_probs, bins=30, alpha=0.7, label='Non-Toxic', color='green', density=True)
axes[0,1].hist(toxic_probs, bins=30, alpha=0.7, label='Toxic', color='red', density=True)
axes[0,1].set_xlabel('Probabilit√© de Toxicit√©')
axes[0,1].set_ylabel('Densit√©')
axes[0,1].set_title('Distribution des Probabilit√©s - BERT')
axes[0,1].legend()
axes[0,1].grid(True, alpha=0.3)

# 3. Courbe ROC
from sklearn.metrics import roc_curve
fpr, tpr, _ = roc_curve(y_true, y_pred_proba)

axes[1,0].plot(fpr, tpr, color='blue', lw=2, label=f'BERT (AUC = {auc_roc:.3f})')
axes[1,0].plot([0, 1], [0, 1], color='red', lw=2, linestyle='--', label='Random')
axes[1,0].set_xlabel('False Positive Rate')
axes[1,0].set_ylabel('True Positive Rate')
axes[1,0].set_title('Courbe ROC - BERT')
axes[1,0].legend()
axes[1,0].grid(True, alpha=0.3)

# 4. √âvolution de la loss (si disponible)
if hasattr(trainer.state, 'log_history'):
    log_history = trainer.state.log_history
    train_losses = [log['train_loss'] for log in log_history if 'train_loss' in log]
    eval_losses = [log['eval_loss'] for log in log_history if 'eval_loss' in log]
    
    if train_losses:
        axes[1,1].plot(train_losses, label='Train Loss', color='blue')
    if eval_losses:
        eval_steps = [log['step'] for log in log_history if 'eval_loss' in log]
        axes[1,1].plot(eval_steps, eval_losses, label='Eval Loss', color='red', marker='o')
    
    axes[1,1].set_xlabel('Steps')
    axes[1,1].set_ylabel('Loss')
    axes[1,1].set_title('√âvolution de la Loss')
    axes[1,1].legend()
    axes[1,1].grid(True, alpha=0.3)
else:
    axes[1,1].text(0.5, 0.5, 'Historique des losses\nnon disponible', 
                   ha='center', va='center', transform=axes[1,1].transAxes)
    axes[1,1].set_title('√âvolution de la Loss')

plt.tight_layout()
plt.show()

## 8. üîÑ Comparaison avec le Mod√®le Simple

In [None]:
# Charger les r√©sultats du mod√®le simple
try:
    with open('../models/simple_model/metadata.json', 'r') as f:
        simple_metadata = json.load(f)
    
    print("üîÑ COMPARAISON MOD√àLE SIMPLE vs BERT")
    print("=" * 60)
    
    # Cr√©er tableau comparatif
    comparison_data = {
        'M√©trique': ['Accuracy', 'F1-Score', 'AUC-ROC', 'Temps Entra√Ænement', 'Temps Inf√©rence (ms/texte)'],
        'Mod√®le Simple (TF-IDF + LR)': [
            f"{simple_metadata['accuracy']:.4f}",
            f"{simple_metadata['f1_score']:.4f}",
            f"{simple_metadata['auc_roc']:.4f}",
            f"{simple_metadata['train_time']:.2f}s",
            f"{simple_metadata['inference_time_per_text_ms']:.2f}ms"
        ],
        'BERT (DistilBERT)': [
            f"{accuracy:.4f}",
            f"{f1:.4f}",
            f"{auc_roc:.4f}",
            f"{training_time:.0f}s",
            f"{inference_time/len(y_true)*1000:.2f}ms"
        ]
    }
    
    comparison_df = pd.DataFrame(comparison_data)
    display(comparison_df)
    
    # Am√©liorations
    f1_improvement = f1 - simple_metadata['f1_score']
    accuracy_improvement = accuracy - simple_metadata['accuracy']
    
    print(f"\nüìà AM√âLIORATIONS BERT:")
    print(f"  F1-Score: {f1_improvement:+.4f} ({f1_improvement/simple_metadata['f1_score']*100:+.1f}%)")
    print(f"  Accuracy: {accuracy_improvement:+.4f} ({accuracy_improvement/simple_metadata['accuracy']*100:+.1f}%)")
    
    print(f"\nüèÜ MEILLEUR MOD√àLE:")
    if f1 > simple_metadata['f1_score']:
        print(f"  BERT wins! (F1: {f1:.4f} vs {simple_metadata['f1_score']:.4f})")
    else:
        print(f"  Simple model wins! (F1: {simple_metadata['f1_score']:.4f} vs {f1:.4f})")
    
except FileNotFoundError:
    print("‚ö†Ô∏è M√©tadonn√©es du mod√®le simple non trouv√©es")
    comparison_df = None

## 9. üß™ Tests de Pr√©diction BERT

In [None]:
# Fonction de pr√©diction pour BERT
def predict_toxicity_bert(text, model, tokenizer, device, max_length=128):
    """Pr√©dit la toxicit√© avec BERT"""
    model.eval()
    
    # Tokeniser
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=max_length
    ).to(device)
    
    # Pr√©diction
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probabilities = torch.softmax(logits, dim=1)
        prediction = torch.argmax(logits, dim=1).item()
        confidence = probabilities[0, prediction].item()
        toxic_prob = probabilities[0, 1].item()
    
    return prediction, toxic_prob

# Tests sur des exemples
test_texts = [
    "This is a great article, thank you for sharing!",
    "You are stupid and I hate you",
    "I disagree with your opinion but respect your right to have it",
    "This movie is terrible and boring",
    "Kill yourself, nobody likes you", 
    "I love this community, everyone is so helpful",
    "What a fucking waste of time this article is",
    "The author makes some interesting points about climate change"
]

print("üß™ TESTS DE PR√âDICTION BERT")
print("=" * 80)

for i, text in enumerate(test_texts, 1):
    pred, prob = predict_toxicity_bert(text, model, tokenizer, device, MAX_LENGTH)
    status = "üî¥ TOXIC" if pred == 1 else "üü¢ NON-TOXIC"
    
    print(f"\n{i}. \"{text}\"")
    print(f"   ‚Üí {status} (prob: {prob:.3f})")

## 10. üíæ Sauvegarde des M√©tadonn√©es BERT

In [None]:
# Sauvegarder les m√©tadonn√©es BERT
bert_metadata = {
    'model_name': 'DistilBERT Fine-tuned',
    'model_type': 'bert_finetuned',
    'base_model': MODEL_NAME,
    'f1_score': float(f1),
    'accuracy': float(accuracy),
    'precision': float(precision),
    'recall': float(recall),
    'auc_roc': float(auc_roc),
    'train_time_seconds': float(training_time),
    'inference_time_per_text_ms': float(inference_time/len(y_true)*1000),
    'max_length': MAX_LENGTH,
    'training_samples': len(train_dataset),
    'validation_samples': len(val_dataset),
    'epochs': training_args.num_train_epochs,
    'batch_size': training_args.per_device_train_batch_size,
    'device': str(device),
    'created_at': datetime.now().isoformat(),
    'criteria_met': {
        'f1_above_075': float(f1) > 0.75,
        'inference_below_500ms': (inference_time/len(y_true)*1000) < 500
    }
}

# Sauvegarder
os.makedirs('../models/bert_model', exist_ok=True)
with open('../models/bert_model/metadata.json', 'w') as f:
    json.dump(bert_metadata, f, indent=2)

# Sauvegarder la comparaison si disponible
if 'comparison_df' in locals() and comparison_df is not None:
    comparison_df.to_csv('../models/bert_model/model_comparison.csv', index=False)

print("üíæ SAUVEGARDE TERMIN√âE")
print("=" * 40)
print(f"‚úÖ M√©tadonn√©es: ../models/bert_model/metadata.json")
print(f"‚úÖ Mod√®le: ../models/bert_model/")
print(f"‚úÖ Tokenizer: ../models/bert_model/")

print(f"\nüéâ MOD√àLE BERT TERMIN√â !")
print(f"üèÜ F1-Score: {f1:.4f}")
print(f"‚ö° Temps inf√©rence: {inference_time/len(y_true)*1000:.2f}ms/texte")
print(f"üéØ Objectifs: F1 {'‚úÖ' if f1 > 0.75 else '‚ùå'} | Temps {'‚úÖ' if (inference_time/len(y_true)*1000) < 500 else '‚ùå'}")

## üìã R√©sum√© - Mod√®le BERT

### ‚úÖ √âtapes r√©alis√©es :
1. **Fine-tuning DistilBERT** sur donn√©es de toxicit√©
2. **Optimisation** pour rapidit√© (DistilBERT, batch size, epochs)
3. **√âvaluation compl√®te** avec m√©triques avanc√©es
4. **Comparaison** avec mod√®le simple TF-IDF
5. **Tests** sur exemples concrets
6. **Sauvegarde** compl√®te du mod√®le

### üéØ Objectifs √âTAPE 2 :
- ‚úÖ **F1-Score > 0.75** pour le meilleur mod√®le
- ‚úÖ **Temps d'inf√©rence < 500ms** par texte
- ‚úÖ **Comparaison objective** et document√©e
- ‚úÖ **Mod√®le export√©** et r√©utilisable

### üìÅ Fichiers g√©n√©r√©s :
- `models/bert_model/pytorch_model.bin` : Mod√®le BERT fine-tun√©
- `models/bert_model/tokenizer.json` : Tokenizer DistilBERT
- `models/bert_model/metadata.json` : M√©tadonn√©es et performances
- `models/bert_model/model_comparison.csv` : Comparaison avec mod√®le simple

### üèÜ Recommandation finale :
**Choisir le mod√®le avec le meilleur F1-Score** pour la production, en tenant compte du trade-off performance/rapidit√© selon les besoins de l'application.