<a href="https://colab.research.google.com/github/L-Poca/Data_Pipeline/blob/rafael_cleaning/notebooks/comprehensive_ml_pipeline_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🦠 Comprehensive Machine Learning Pipeline - COVID-19 Classification

---

## 📋 Overview

This notebook provides a **complete machine learning and deep learning pipeline** for COVID-19 classification from chest X-ray images.

### 🎯 Objectives

1. **Baseline ML Models**: Train 9 classical ML algorithms with 3 different feature extraction methods
2. **Deep Learning**: Build custom CNNs and leverage 12+ pre-trained architectures
3. **Advanced Techniques**: Ensemble methods, hyperparameter tuning, cross-validation
4. **Interpretability**: GradCAM, LIME, SHAP explanations
5. **Production-Ready**: Model persistence, prediction pipelines, HTML reports

### 📊 Dataset

- **Classes**: COVID, Lung_Opacity, Normal, Viral Pneumonia
- **Images**: ~21,000 chest X-rays (grayscale, 256×256)
- **Challenge**: Class imbalance (Viral Pneumonia: 6.4%)

### 🚀 Quick Start

1. **Fast Testing**: Set `N_IMAGES_PER_CLASS = 100` (Section 2)
2. **Full Training**: Set `N_IMAGES_PER_CLASS = None` (all images)
3. **Colab**: Click badge above → Auto-clone → Auto-install → Run all

### ⏱️ Estimated Runtime

- **Fast mode** (100 images/class): ~30-60 minutes
- **Full mode** (all images): ~3-5 hours (with GPU)

---

In [None]:
"""
╔══════════════════════════════════════════════════════════════════════════════╗
║  🎯 STANDALONE CONFIGURATION CELL - COPY-PASTE INTO YOUR NOTEBOOKS           ║
╚══════════════════════════════════════════════════════════════════════════════╝

INSTRUCTIONS:
-------------
1. Copy the ENTIRE content of this cell
2. Paste it as the FIRST CELL of your notebook
3. Run the cell
4. The configuration is ready to use!

This cell is 100% self-contained and works everywhere:
✅ Google Colab (clones and installs automatically)
✅ WSL / local Linux
✅ Any Jupyter environment

AFTER EXECUTION, USE THE 'config' OBJECT:
-----------------------------------------
▶ config.data_dir              # Dataset path
▶ config.models_dir            # Models directory
▶ config.results_dir           # Results directory
▶ config.classes               # List of classes
▶ config.img_size              # Tuple (width, height)
▶ config.img_channels          # Number of channels (1=grayscale, 3=RGB)
▶ config.batch_size            # Batch size
▶ config.epochs                # Number of epochs
▶ config.learning_rate         # Learning rate
▶ config.validation_split      # Fraction held out for validation
▶ config.gradcam_alpha         # Alpha for Grad-CAM
▶ config.shap_max_evals        # SHAP evaluations
▶ config.confidence_high_threshold  # High-confidence threshold
... and much more!

GLOBAL VARIABLES:
-----------------
• config: Full Config object (all project parameters)
• ENV: Detected environment ('colab', 'wsl', 'local')
• All transformers imported and ready to use
"""

# =============================================================================
# STANDARD IMPORTS
# =============================================================================

import os
import sys
import subprocess
from pathlib import Path


# =============================================================================
# AUTOMATIC ENVIRONMENT DETECTION
# =============================================================================

def detect_environment():
    """Detect the runtime environment: 'colab', 'wsl', or 'local'."""
    try:
        import google.colab  # Only importable inside Colab
        return "colab"
    except ImportError:
        # WSL kernels report "microsoft" in /proc/version
        version_file = Path('/proc/version')
        is_wsl = version_file.exists() and 'microsoft' in version_file.read_text().lower()
        return "wsl" if is_wsl else "local"

ENV = detect_environment()
print(f"🌍 Environment: {ENV.upper()}")


# =============================================================================
# COLAB BOOTSTRAP (clone + install if needed)
# =============================================================================

if ENV == "colab":
    print("\n🚀 Colab bootstrap...")

    os.chdir('/content')
    if not os.path.exists('/content/Data_Pipeline'):
        print("📥 Cloning the repository...")
        subprocess.run(['git', 'clone', 'https://github.com/L-Poca/Data_Pipeline.git'], check=True)

    os.chdir('/content/Data_Pipeline')

    # Check out the rafael_cleaning branch
    result = subprocess.run(
        ['git', 'checkout', '-b', 'rafael_cleaning', 'origin/rafael_cleaning'],
        capture_output=True,
        text=True
    )
    if result.returncode != 0:
        # The local branch already exists: just switch to it
        subprocess.run(['git', 'checkout', 'rafael_cleaning'], capture_output=True)

    # Editable install of the package (no dependencies: setup.py detects Colab).
    # Calling pip through sys.executable targets the interpreter running this kernel.
    print("📦 Installing the package...")
    result = subprocess.run(
        [sys.executable, '-m', 'pip', 'install', '-e', '.', '--quiet'],
        capture_output=True,
        text=True
    )
    if result.returncode != 0:
        print(f"⚠️ Installation error: {result.stderr}")
    else:
        print("✅ Package installed")

    print("💾 Mounting Google Drive...")
    from google.colab import drive
    drive.mount('/content/drive')

    # Extract the dataset
    archive_data = '/content/drive/MyDrive/DS_COVID/archive_covid.zip'
    if os.path.exists(archive_data):
        print("📦 Extracting dataset...")
        os.makedirs('./data/raw/', exist_ok=True)
        subprocess.run(['unzip', '-o', '-q', archive_data, '-d', './data/raw/COVID-19_Radiography_Dataset/'])

    # Extract the models
    archive_models = '/content/drive/MyDrive/DS_COVID/inceptionv3_best.zip'
    if os.path.exists(archive_models):
        print("📦 Extracting models...")
        os.makedirs('./models/', exist_ok=True)
        subprocess.run(['unzip', '-o', '-q', archive_models, '-d', './models/'])

    print("✅ Bootstrap complete")


# =============================================================================
# PATH CONFIGURATION
# =============================================================================

# Determine project_root for the current environment
if ENV == "colab":
    project_root = Path('/content/Data_Pipeline')
elif ENV == "wsl":
    project_root = Path('/home/lena/Data_Pipeline')
    #project_root = Path.cwd().parent.parent
else:  # local
    # Assumes the notebook runs from src/notebooks/
    project_root = Path.cwd().parent.parent

# Check for the model when running locally (WSL or other)
if ENV != "colab":
    models_dir = project_root / 'models'
    model_path = models_dir / 'inceptionv3_best.keras'

    if model_path.exists():
        print(f"✅ InceptionV3 model found: {model_path}")
    else:
        print(f"⚠️ InceptionV3 model not found: {model_path}")
        print(f"   Please place inceptionv3_best.keras in {models_dir}/")

# Add src/ to sys.path for imports
# src_path = str(project_root / 'src')
# if src_path not in sys.path:
#     sys.path.insert(0, src_path)
#     print(f"✅ src/ path added: {src_path}")

# Load the configuration from JSON
from src.utils.config import build_config

config = build_config(project_root, ENV)

print(f"\n🎯 Configuration loaded from config/{ENV}_config.json")


# =============================================================================
# TRANSFORMER IMPORTS
# =============================================================================

try:
    from src.features.Pipelines.Transformateurs.image_loaders import ImageLoader
    from src.features.Pipelines.Transformateurs.image_preprocessing import (
        ImageResizer, ImageNormalizer, ImageFlattener, ImageMasker
    )
    from src.features.Pipelines.Transformateurs.image_augmentation import (
        ImageAugmenter, ImageRandomCropper
    )
    from src.features.Pipelines.Transformateurs.image_features import (
        ImageHistogram, ImagePCA, ImageStandardScaler
    )
    print("✅ Transformers imported")
except ImportError as e:
    print(f"⚠️ Transformer import error: {e}")


# =============================================================================
# IMPORTS ML/DL
# =============================================================================

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow import keras

# =============================================================================
# MATPLOTLIB CONFIGURATION (parameters come from config)
# =============================================================================

plt.rcParams['figure.figsize'] = config.figure_size
plt.rcParams['figure.dpi'] = config.dpi
plt.style.use(config.plot_style)
sns.set_palette(config.color_palette)

# =============================================================================
# SUMMARY DISPLAY
# =============================================================================

print("\n" + "=" * 80)
print("✅ CONFIGURATION READY - Data Pipeline")
print("=" * 80)
print(f"📂 Project:      {config.project_root}")
print(f"📊 Dataset:      {config.data_dir}")
print(f"💾 Models:       {config.models_dir}")
print(f"📈 Results:      {config.results_dir}")
print(f"📐 Dataset:      {'✅ Accessible' if config.data_dir.exists() else '❌ Not found'}")
print()
print(f"🏷️  Classes:     {', '.join(config.classes)} ({config.num_classes} classes)")
print(f"🎛️  Images:      {config.img_size} | {config.img_channels} channels")
print(f"🔧 Training:     Batch={config.batch_size} | Epochs={config.epochs} | LR={config.learning_rate}")
print(f"🔀 Splits:       Train={1 - config.validation_split:.0%} | Val={config.validation_split:.0%} | Test={config.test_split:.0%}")
print()
print(f"🎨 Viz:          Style={config.plot_style} | Palette={config.color_palette}")
print(f"📏 Figures:      {config.figure_size} @ {config.dpi} DPI")
print()
print(f"🔍 Interpret.:   GradCAM α={config.gradcam_alpha} | SHAP evals={config.shap_max_evals}")
print(f"📉 Conf. thresholds: High={config.confidence_high_threshold} | Medium={config.confidence_medium_threshold}")
print("=" * 80)
print("\n💡 Main variables:")
print("   • config: Full Config object (access to ALL parameters)")
print("   • ENV: Current environment")
print()
print("📚 Usage examples:")
print("   config.data_dir          # Dataset path")
print("   config.classes           # List of classes")
print("   config.img_size          # Tuple (width, height)")
print("   config.batch_size        # Batch size")
print("   config.models_dir        # Models directory")
print("   config.gradcam_alpha     # Interpretability parameters")
print()
print("🎯 Available transformers:")
print("   • ImageLoader, ImageResizer, ImageNormalizer, ImageFlattener, ImageMasker")
print("   • ImageAugmenter, ImageRandomCropper")
print("   • ImageHistogram, ImagePCA, ImageStandardScaler")
print("=" * 80)
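
The cell above obtains `config` from `build_config(project_root, ENV)`, whose implementation is not shown in this notebook. As a rough illustration only, a config object of this kind could be a small dataclass filled from a per-environment JSON file. Everything in the sketch below (`SketchConfig`, `load_sketch_config`, the JSON keys, the default values) is hypothetical and not the project's actual API.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the project's config object."""
    project_root: Path
    classes: list
    img_size: tuple
    batch_size: int = 32

    @property
    def num_classes(self) -> int:
        return len(self.classes)

    @property
    def models_dir(self) -> Path:
        return self.project_root / "models"


def load_sketch_config(project_root: Path, env: str) -> SketchConfig:
    """Read config/<env>_config.json under project_root (assumed layout)."""
    raw = json.loads((project_root / "config" / f"{env}_config.json").read_text())
    return SketchConfig(
        project_root=project_root,
        classes=raw["classes"],
        img_size=tuple(raw["img_size"]),
        batch_size=raw.get("batch_size", 32),
    )


# Demo in a throwaway directory so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "config").mkdir()
    (root / "config" / "local_config.json").write_text(
        json.dumps({"classes": ["COVID", "Normal"], "img_size": [256, 256]})
    )
    cfg = load_sketch_config(root, "local")
    print(cfg.num_classes, cfg.img_size, cfg.models_dir.name)
```

Deriving values such as `num_classes` from other fields, rather than storing them, keeps the JSON file small and avoids inconsistent settings.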


In [None]:
EPOCHS = 50 if ENV == "colab" else 100
config.batch_size = 32

## 📚 Section 1: Additional ML/DL Imports

Imports the additional libraries used for classical machine learning and deep learning.
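
The optional libraries (Optuna, SHAP, LIME, imbalanced-learn) are probed at import time and recorded in `*_AVAILABLE` flags. A minimal sketch of the same idea, using `importlib.util.find_spec` to check for a package without importing it; the helper name `is_available` and the `availability` dictionary are illustrative, not part of the notebook.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if `module_name` can be located in this environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


OPTIONAL_PACKAGES = ["optuna", "shap", "lime", "imblearn"]
availability = {name: is_available(name) for name in OPTIONAL_PACKAGES}

for name, ok in availability.items():
    print(f"{name:10s}: {'available' if ok else 'missing (pip install it to enable)'}")
```

Later cells can then guard optional steps with a single lookup such as `if availability["imblearn"]:`.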

In [None]:
if ENV == 'colab':
    !pip install optuna

In [None]:
# =============================================================================
# ADDITIONAL ML/DL IMPORTS
# =============================================================================

print("=" * 70)
print("ADDITIONAL ML/DL IMPORTS")
print("=" * 70)

# Machine Learning
from sklearn.ensemble import (
    RandomForestClassifier, GradientBoostingClassifier,
    AdaBoostClassifier, VotingClassifier, StackingClassifier
)
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (
    classification_report, confusion_matrix, accuracy_score,
    f1_score, precision_score, recall_score, roc_auc_score,
    roc_curve, precision_recall_curve, cohen_kappa_score,
    matthews_corrcoef
)
from sklearn.model_selection import (
    cross_val_score, StratifiedKFold, GridSearchCV, RandomizedSearchCV
)

# Deep Learning
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import (
    EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import (
    InceptionV3, ResNet50, ResNet152, VGG16, VGG19,
    EfficientNetB0, EfficientNetB3, EfficientNetB7,
    DenseNet121, DenseNet169, MobileNetV2, Xception
)

print(f"✅ TensorFlow version: {tf.__version__}")

# Hyperparameter tuning
try:
    import optuna
    OPTUNA_AVAILABLE = True
    print("✅ Optuna available")
except ImportError:
    OPTUNA_AVAILABLE = False
    print("⚠️ Optuna not available. Install it for advanced tuning: pip install optuna")

# Interpretability
try:
    import shap
    SHAP_AVAILABLE = True
    print("✅ SHAP available")
except ImportError:
    SHAP_AVAILABLE = False
    print("⚠️ SHAP not available. Install it: pip install shap")

try:
    import lime
    from lime import lime_image
    LIME_AVAILABLE = True
    print("✅ LIME available")
except ImportError:
    LIME_AVAILABLE = False
    print("⚠️ LIME not available. Install it: pip install lime")

# Class imbalance
try:
    from imblearn.over_sampling import SMOTE, RandomOverSampler
    from imblearn.under_sampling import RandomUnderSampler
    from imblearn.combine import SMOTETomek
    IMBLEARN_AVAILABLE = True
    print("✅ imbalanced-learn available")
except ImportError:
    IMBLEARN_AVAILABLE = False
    print("⚠️ imbalanced-learn not available. Install it: pip install imbalanced-learn")

# Utils
from tqdm import tqdm
import time
from datetime import datetime
import json as json_lib
import warnings
import pickle
warnings.filterwarnings('ignore')

print("\n✅ ML/DL imports complete")


## 📊 Section 2: Data Loading

Configures the dataset and loads the images with preprocessing.
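
`N_IMAGES_PER_CLASS` caps how many images are taken from each class. The project's `load_dataset` is not shown here, so the following is only a sketch of how such a cap could behave on a list of (path, label) pairs; `cap_per_class` and the sample paths are hypothetical.

```python
import random
from collections import defaultdict


def cap_per_class(items, n_per_class=None, seed=42):
    """Keep at most `n_per_class` (path, label) pairs per label; None keeps all."""
    if n_per_class is None:
        return list(items)
    by_label = defaultdict(list)
    for path, label in items:
        by_label[label].append((path, label))
    rng = random.Random(seed)
    kept = []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)          # random subset rather than the first files
        kept.extend(group[:n_per_class])
    return kept


# Hypothetical file list: 5 COVID images and 2 Normal images
items = [(f"COVID/img_{i}.png", "COVID") for i in range(5)]
items += [(f"Normal/img_{i}.png", "Normal") for i in range(2)]

subset = cap_per_class(items, n_per_class=3)
print(len(subset), "images kept out of", len(items))
```

A class with fewer images than the cap keeps all of them, so a cap reduces class imbalance without necessarily removing it.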

In [None]:
Notebook_begin_time = time.time()

In [None]:
# =============================================================================
# DATASET CONFIGURATION
# =============================================================================

print("=" * 70)
print("DATASET CONFIGURATION")
print("=" * 70)

# ⚠️ IMPORTANT PARAMETER: number of images per class
# None = all images (~21K total)
# 100/500/1000 = quick tests
N_IMAGES_PER_CLASS = 2000  # Set to None for all images; lower it for quick tests

print(f"\n📊 Configuration:")
print(f"   Images per class: {N_IMAGES_PER_CLASS if N_IMAGES_PER_CLASS else 'ALL'}")

# =============================================================================
# DATA LOADING
# =============================================================================

print("\n" + "=" * 70)
print("DATA LOADING")
print("=" * 70)

from src.notebooks import load_dataset, create_preprocessing_pipeline

# Load the image paths
image_paths, mask_paths, labels, labels_int = load_dataset(
    data_dir=config.data_dir,
    categories=config.classes,
    n_images_per_class=N_IMAGES_PER_CLASS,
    load_masks=False,  # Masks are not needed for classification
    verbose=True
)

print(f"\n✅ Dataset loaded:")
print(f"   Total images: {len(image_paths)}")
print(f"   Classes: {config.classes}")
print(f"   Distribution: {np.bincount(labels_int)}")

# Build the preprocessing pipeline
pipeline_img = create_preprocessing_pipeline(
    img_size=config.img_size,
    color_mode='L',  # Grayscale
    mask_paths=None,
    verbose=True
)

# Load and preprocess
print("\n📊 Preprocessing images...")
images = pipeline_img.fit_transform(image_paths)
images = images.astype('float32') / 255.0  # Scale to [0, 1]

print(f"\n✅ Images ready:")
print(f"   Shape: {images.shape}")
print(f"   Range: [{images.min():.3f}, {images.max():.3f}]")

# Sample visualization: one row per class, 8 images each.
# Rows are selected by label, so they stay correct when class sizes differ.
labels_arr = np.asarray(labels_int)
n_cols = 8
fig, axes = plt.subplots(len(config.classes), n_cols, figsize=(16, 8), squeeze=False)
for i, cls_name in enumerate(config.classes):
    class_indices = np.flatnonzero(labels_arr == i)[:n_cols]
    for j in range(n_cols):
        ax = axes[i, j]
        # Hide ticks instead of the whole axis so the row label stays visible
        ax.set_xticks([])
        ax.set_yticks([])
        if j < len(class_indices):
            ax.imshow(images[class_indices[j]], cmap='gray')
        if j == 0:
            ax.set_ylabel(cls_name, rotation=0, ha='right', va='center')
plt.suptitle('Dataset Samples', size=14, weight='bold')
plt.tight_layout()
plt.show()


## ⚖️ Section 3: Class Imbalance Analysis

Analyzes the class distribution and identifies imbalances.
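
The analysis relies on two quantities: the imbalance ratio (largest class count divided by smallest) and scikit-learn's `'balanced'` class weights, defined as `n_samples / (n_classes * count_c)`. A small standard-library sketch of both follows; the label vector is made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c), the 'balanced' heuristic."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in sorted(counts.items())}


# Hypothetical label vector: class 0 is four times as frequent as class 3
labels = [0] * 40 + [1] * 30 + [2] * 20 + [3] * 10
counts = Counter(labels)

weights = balanced_class_weights(labels)
imbalance_ratio = max(counts.values()) / min(counts.values())

for cls, w in weights.items():
    print(f"class {cls}: count={counts[cls]:3d}  weight={w:.3f}")
print(f"imbalance ratio: {imbalance_ratio:.2f}:1")
```

Rare classes receive weights above 1 and frequent classes weights below 1, so each class contributes roughly equally to the loss.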

In [None]:
# =============================================================================
# CLASS IMBALANCE ANALYSIS
# =============================================================================

print("=" * 70)
print("CLASS IMBALANCE ANALYSIS")
print("=" * 70)

# Distribution
unique, counts = np.unique(labels_int, return_counts=True)
total = len(labels_int)
class_names = [config.classes[cls_idx] for cls_idx in unique]

print("\n📊 Current distribution:")
for cls_name, count in zip(class_names, counts):
    percentage = (count / total) * 100
    print(f"   {cls_name:20s}: {count:6d} images ({percentage:5.2f}%)")

# Imbalance ratio
max_count, min_count = counts.max(), counts.min()
imbalance_ratio = max_count / min_count
print(f"\n⚠️ Imbalance ratio: {imbalance_ratio:.2f}:1")

if imbalance_ratio > 2:
    print("   → Significant class imbalance detected!")
    print("   → Rebalancing strategies are needed")

# Visualization
fig, ax = plt.subplots(figsize=(10, 6))
colors = sns.color_palette('husl', len(class_names))
bars = ax.bar(class_names, counts, color=colors)
ax.set_ylabel('Number of images', fontsize=11)
ax.set_title('Class Distribution (Imbalanced)', fontsize=13, weight='bold')
ax.grid(axis='y', alpha=0.3)

# Add counts and percentages above the bars
for bar, count in zip(bars, counts):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{count}\n({count/total*100:.1f}%)',
            ha='center', va='bottom', fontsize=9)
plt.tight_layout()
plt.show()

# =============================================================================
# REBALANCING STRATEGIES
# =============================================================================

print("\n" + "=" * 70)
print("REBALANCING STRATEGIES")
print("=" * 70)

imbalance_strategies = {}

# 1. CLASS WEIGHTS (sklearn)
print("\n1️⃣ Class Weights (sklearn)")
from sklearn.utils.class_weight import compute_class_weight

class_weights_array = compute_class_weight(
    'balanced',
    classes=np.unique(labels_int),
    y=labels_int
)
class_weights_dict = dict(enumerate(class_weights_array))

print("   Computed weights:")
for cls_idx, weight in class_weights_dict.items():
    print(f"      {config.classes[cls_idx]:20s}: {weight:.3f}")

imbalance_strategies['class_weights'] = class_weights_dict

# 2. SMOTE (if available)
if IMBLEARN_AVAILABLE:
    print("\n2️⃣ SMOTE (Synthetic Minority Over-sampling)")
    print("   ✅ Available (will be applied during ML training)")
    imbalance_strategies['smote_available'] = True
else:
    print("\n2️⃣ SMOTE not available")
    imbalance_strategies['smote_available'] = False

# 3. RANDOM OVERSAMPLING
print("\n3️⃣ Random Oversampling")
print("   ✅ Available (duplicates images from the minority class)")
imbalance_strategies['oversampling'] = True

# 4. RANDOM UNDERSAMPLING
print("\n4️⃣ Random Undersampling")
print("   ✅ Available (reduces the majority class)")
imbalance_strategies['undersampling'] = True

print("\n✅ Strategies identified and ready")


## 🔀 Section 4: Stratified Train/Val/Test Split

Stratified split into training, validation, and test sets (70/15/15).
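
Stratification means each split keeps the same class proportions as the full dataset. The notebook does this with two calls to scikit-learn's `train_test_split`; the sketch below reimplements the idea with the standard library only, to make the per-class logic visible. The function `stratified_split` and the label vector are illustrative.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists with per-class proportions preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]   # remainder goes to the test set
    return train, val, test


labels = [0] * 100 + [1] * 60 + [2] * 40
train, val, test = stratified_split(labels)
for name, part in [("train", train), ("val", val), ("test", test)]:
    dist = dict(sorted(Counter(labels[i] for i in part).items()))
    print(f"{name:5s} {len(part):4d} {dist}")
```

Splitting each class separately is what guarantees that a rare class still appears in the validation and test sets.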

In [None]:
# =============================================================================
# STRATIFIED TRAIN/VAL/TEST SPLIT
# =============================================================================

print("=" * 70)
print("STRATIFIED TRAIN/VAL/TEST SPLIT")
print("=" * 70)

# Step 1: 70/30 split (train / temp)
X_train, X_temp, y_train, y_temp = train_test_split(
    images, labels_int,
    test_size=0.30,
    random_state=config.random_seed,
    stratify=labels_int
)

# Step 2: split the 30% into 15/15 (validation / test)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp,
    test_size=0.50,
    random_state=config.random_seed,
    stratify=y_temp
)

print(f"\n📊 Splits created:")
print(f"   Train: {X_train.shape[0]:5d} images ({X_train.shape[0]/len(images)*100:.1f}%)")
print(f"      Distribution: {np.bincount(y_train)}")
print(f"   Val:   {X_val.shape[0]:5d} images ({X_val.shape[0]/len(images)*100:.1f}%)")
print(f"      Distribution: {np.bincount(y_val)}")
print(f"   Test:  {X_test.shape[0]:5d} images ({X_test.shape[0]/len(images)*100:.1f}%)")
print(f"      Distribution: {np.bincount(y_test)}")

# Check stratification
print("\n✅ Stratification check:")
print(f"   {'Class':<20s} {'Train %':>10s} {'Val %':>10s} {'Test %':>10s}")
print("   " + "-" * 50)
for i, cls_name in enumerate(config.classes):
    train_pct = (y_train == i).sum() / len(y_train) * 100
    val_pct = (y_val == i).sum() / len(y_val) * 100
    test_pct = (y_test == i).sum() / len(y_test) * 100
    print(f"   {cls_name:<20s} {train_pct:>9.2f}% {val_pct:>9.2f}% {test_pct:>9.2f}%")


## 🔄 Section 5: Data Augmentation

Configures data augmentation to improve generalization.
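
To make concrete what two of the augmentation settings do, here is a NumPy-only sketch of a random horizontal flip and a small random shift with nearest-edge fill. It is not Keras' `ImageDataGenerator`, only an approximation of its `horizontal_flip`, `width_shift_range`/`height_shift_range`, and `fill_mode='nearest'` options; the function `augment` and the synthetic image are assumptions for illustration.

```python
import numpy as np


def augment(image, rng, max_shift=0.1, flip_prob=0.5):
    """Random horizontal flip plus a small translation with edge padding."""
    out = image[:, ::-1] if rng.random() < flip_prob else image
    h, w = out.shape
    dy = int(rng.integers(-int(max_shift * h), int(max_shift * h) + 1))
    dx = int(rng.integers(-int(max_shift * w), int(max_shift * w) + 1))
    # Pad with the nearest edge value, then crop a window offset by (dy, dx)
    pad_y, pad_x = abs(dy), abs(dx)
    padded = np.pad(out, ((pad_y, pad_y), (pad_x, pad_x)), mode="edge")
    return padded[pad_y - dy:pad_y - dy + h, pad_x - dx:pad_x - dx + w]


rng = np.random.default_rng(0)
image = np.arange(64 * 64, dtype="float32").reshape(64, 64) / (64 * 64)  # synthetic gradient

batch = np.stack([augment(image, rng) for _ in range(8)])
print(batch.shape, float(batch.min()), float(batch.max()))
```

Because the transforms only move or mirror existing pixels, the augmented images keep the original shape and value range.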

In [None]:
# =============================================================================
# DATA AUGMENTATION
# =============================================================================

print("=" * 70)
print("DATA AUGMENTATION")
print("=" * 70)

# Configuration
augmentation_config = {
    'rotation_range': 15,
    'width_shift_range': 0.1,
    'height_shift_range': 0.1,
    'horizontal_flip': True,
    'zoom_range': 0.15,
    'shear_range': 0.1,
    'fill_mode': 'nearest'
}

print("\n🔧 Configuration:")
for key, value in augmentation_config.items():
    print(f"   {key:25s}: {value}")

# Create the generators
train_datagen = ImageDataGenerator(**augmentation_config)
val_datagen = ImageDataGenerator()  # No augmentation for validation

print("\n✅ Generators created")

# Visualize the effect
print("\n📸 Visualizing the augmentation...")
sample_img = X_train[0:1]
if sample_img.ndim == 3:
    sample_img = sample_img[..., np.newaxis]

fig, axes = plt.subplots(3, 3, figsize=(12, 12))
axes = axes.ravel()

axes[0].imshow(sample_img[0, :, :, 0], cmap='gray')
axes[0].set_title('Original', fontsize=10)
axes[0].axis('off')

aug_iter = train_datagen.flow(sample_img, batch_size=1)
for i in range(1, 9):
    aug_img = next(aug_iter)[0]
    axes[i].imshow(aug_img[:, :, 0], cmap='gray')
    axes[i].set_title(f'Augmented {i}', fontsize=10)
    axes[i].axis('off')

plt.suptitle('Effect of Data Augmentation', size=14, weight='bold')
plt.tight_layout()
plt.show()


## 🤖 Section 6: Baseline ML Models

Trains 9 classical ML models on 3 feature types (PCA, Histogram, Combined).
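
The three feature types can be sketched with NumPy alone: PCA as a projection onto the top singular vectors of the centered, flattened images; histogram features as one normalized intensity histogram per image; and the combined set as their horizontal concatenation. The project's own transformers (`ImagePCA`, `ImageHistogram`, `ImageStandardScaler`) are not shown here, so the helpers below are stand-ins, and the standard-scaling step is omitted for brevity.

```python
import numpy as np


def histogram_features(images, bins=64):
    """One density-normalized intensity histogram per image (values in [0, 1])."""
    return np.stack([
        np.histogram(img, bins=bins, range=(0.0, 1.0), density=True)[0]
        for img in images
    ])


def pca_fit_transform(X, n_components):
    """Project centered rows of X onto the top principal axes (via SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]
    return (X - mean) @ components.T, mean, components


rng = np.random.default_rng(0)
images = rng.random((20, 16, 16)).astype("float32")   # stand-in for X_train

X_flat = images.reshape(len(images), -1)
X_pca, mean, components = pca_fit_transform(X_flat, n_components=5)
X_hist = histogram_features(images, bins=8)
X_combined = np.hstack([X_pca, X_hist])
print(X_pca.shape, X_hist.shape, X_combined.shape)
```

Validation and test images would be projected with the `mean` and `components` learned on the training set, as in `(X_val_flat - mean) @ components.T`, so no information leaks from the held-out data.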

In [None]:
# =============================================================================
# FEATURE PREPARATION FOR ML
# =============================================================================

print("=" * 70)
print("FEATURE PREPARATION FOR ML")
print("=" * 70)

from src.features.Pipelines.Transformateurs.image_features import (
    ImagePCA, ImageStandardScaler, ImageHistogram
)
from src.features.Pipelines.Transformateurs.image_preprocessing import ImageFlattener

# 1. PCA pipeline
print("\n1️⃣ PCA features...")
n_pca_components = min(50, X_train.shape[0] - 1)

pipeline_pca = Pipeline([
    ('flatten', ImageFlattener(verbose=True)),
    ('scale', ImageStandardScaler(verbose=True)),
    ('pca', ImagePCA(n_components=n_pca_components, random_state=config.random_seed, verbose=True))
])

X_train_pca = pipeline_pca.fit_transform(X_train)
X_val_pca = pipeline_pca.transform(X_val)
X_test_pca = pipeline_pca.transform(X_test)

pca_obj = pipeline_pca.named_steps['pca']
print(f"   ✅ {X_train_pca.shape[1]} components")
print(f"   Explained variance: {pca_obj.explained_variance_ratio_.sum():.2%}")

# 2. Histogram pipeline
print("\n2️⃣ Histogram features...")
pipeline_hist = Pipeline([
    ('histogram', ImageHistogram(bins=64, density=True, verbose=True)),
    ('scale', ImageStandardScaler(verbose=True))
])

X_train_hist = pipeline_hist.fit_transform(X_train)
X_val_hist = pipeline_hist.transform(X_val)
X_test_hist = pipeline_hist.transform(X_test)

print(f"   ✅ {X_train_hist.shape[1]} bins")

# 3. Combined features
print("\n3️⃣ Combined features (PCA + Histogram)...")
X_train_combined = np.hstack([X_train_pca, X_train_hist])
X_val_combined = np.hstack([X_val_pca, X_val_hist])
X_test_combined = np.hstack([X_test_pca, X_test_hist])

print(f"   ✅ {X_train_combined.shape[1]} features")
print(f"      PCA: {X_train_pca.shape[1]}")
print(f"      Histogram: {X_train_hist.shape[1]}")

# =============================================================================
# BASELINE ML MODELS
# =============================================================================

print("\n" + "=" * 70)
print("BASELINE ML MODELS")
print("=" * 70)

# Define the models
ml_models = {
    'Random Forest': RandomForestClassifier(
        n_estimators=200,
        max_depth=20,
        random_state=config.random_seed,
        n_jobs=-1,
        class_weight='balanced'
    ),
    'Gradient Boosting': GradientBoostingClassifier(
        n_estimators=100,
        learning_rate=0.1,
        max_depth=5,
        random_state=config.random_seed
    ),
    'AdaBoost': AdaBoostClassifier(
        n_estimators=100,
        learning_rate=0.5,
        random_state=config.random_seed
    ),
    'SVM (RBF)': SVC(
        kernel='rbf',
        C=1.0,
        gamma='scale',
        random_state=config.random_seed,
        class_weight='balanced',
        probability=True
    ),
    'SVM (Linear)': SVC(
        kernel='linear',
        C=1.0,
        random_state=config.random_seed,
        class_weight='balanced',
        probability=True
    ),
    'Logistic Regression': LogisticRegression(
        max_iter=1000,
        random_state=config.random_seed,
        n_jobs=-1,
        class_weight='balanced'
    ),
    'KNN': KNeighborsClassifier(
        n_neighbors=5,
        n_jobs=-1
    ),
    'Naive Bayes': GaussianNB(),
    'Decision Tree': DecisionTreeClassifier(
        max_depth=10,
        random_state=config.random_seed,
        class_weight='balanced'
    )
}

# Feature sets
feature_sets = {
    'PCA': (X_train_pca, X_val_pca, X_test_pca),
    'Histogram': (X_train_hist, X_val_hist, X_test_hist),
    'Combined': (X_train_combined, X_val_combined, X_test_combined)
}

# Store results
ml_results = {}

print(f"\n🚀 Training {len(ml_models)} models × {len(feature_sets)} feature sets")
print(f"   Total: {len(ml_models) * len(feature_sets)} combinations\n")

from sklearn.base import clone

for feat_name, (X_tr, X_va, X_te) in feature_sets.items():
    print(f"{'='*70}")
    print(f"FEATURES: {feat_name}")
    print(f"{'='*70}")

    for model_name, base_model in tqdm(ml_models.items(), desc=f"Training {feat_name}"):
        key = f"{model_name} ({feat_name})"
        # Fit a fresh copy so each stored model keeps the fit for its own feature set
        # (refitting the shared instance would overwrite earlier fits)
        model = clone(base_model)

        # Training
        start_time = time.time()
        model.fit(X_tr, y_train)
        train_time = time.time() - start_time

        # Predictions
        y_pred_train = model.predict(X_tr)
        y_pred_val = model.predict(X_va)

        # Test predictions, timed to estimate per-sample inference latency (ms)
        start_time = time.time()
        y_pred_test = model.predict(X_te)
        inference_time = (time.time() - start_time) / len(X_te) * 1000

        # Metrics
        ml_results[key] = {
            'model': model,
            'feature_type': feat_name,
            'train_acc': accuracy_score(y_train, y_pred_train),
            'val_acc': accuracy_score(y_val, y_pred_val),
            'test_acc': accuracy_score(y_test, y_pred_test),
            'f1_weighted': f1_score(y_test, y_pred_test, average='weighted'),
            'f1_macro': f1_score(y_test, y_pred_test, average='macro'),
            'precision': precision_score(y_test, y_pred_test, average='weighted'),
            'recall': recall_score(y_test, y_pred_test, average='weighted'),
            'train_time': train_time,
            'inference_time_ms': inference_time,
            'y_pred_test': y_pred_test,
            'y_pred_val': y_pred_val
        }

# Summary table
print("\n" + "=" * 70)
print("ML MODEL RESULTS")
print("=" * 70)

print(f"\n{'Model':<40s} {'Train':>8s} {'Val':>8s} {'Test':>8s} {'F1':>8s}")
print("-" * 80)

for key in sorted(ml_results.keys(), key=lambda x: ml_results[x]['test_acc'], reverse=True):
    res = ml_results[key]
    print(f"{key:<40s} {res['train_acc']:>8.4f} {res['val_acc']:>8.4f} {res['test_acc']:>8.4f} {res['f1_weighted']:>8.4f}")

print("\n✅ Baseline ML complete")
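
The results store both `f1_weighted` and `f1_macro`. On imbalanced data the two can differ noticeably: macro F1 averages per-class scores equally, while weighted F1 weights each class by its support. A standard-library sketch of the difference follows; the helper functions and the toy predictions are illustrative, and classes are taken from `y_true` only.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {class: F1}, with F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def f1_macro(y_true, y_pred):
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def f1_weighted(y_true, y_pred):
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Toy example: half of the minority class (1) is misclassified as class 0
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]
print(f"macro={f1_macro(y_true, y_pred):.3f}  weighted={f1_weighted(y_true, y_pred):.3f}")
```

Macro F1 drops more than weighted F1 when a rare class is poorly predicted, which makes it the more sensitive metric for a minority class such as Viral Pneumonia.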


## 🧠 Section 7: Custom CNN Architectures

Builds and trains 3 custom CNN architectures.
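
Before training, the grayscale images need an explicit channel axis and the integer labels are converted to one-hot vectors. A NumPy sketch of both steps on placeholder arrays; `one_hot` is a stand-in written for this example and has the same effect as Keras' `to_categorical` for integer labels.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Integer labels -> one-hot rows of length num_classes."""
    return np.eye(num_classes, dtype="float32")[np.asarray(labels)]


X = np.zeros((6, 32, 32), dtype="float32")   # stand-in for grayscale images (N, H, W)
y = np.array([0, 1, 2, 3, 1, 0])

X_cnn = X[..., np.newaxis]                   # (N, H, W) -> (N, H, W, 1)
y_cat = one_hot(y, num_classes=4)
print(X_cnn.shape, y_cat.shape)
```

Taking `argmax` along the last axis of the one-hot matrix recovers the original integer labels, which is how the predictions are decoded after `model.predict`.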

In [None]:
# =============================================================================
# CUSTOM CNN ARCHITECTURES
# =============================================================================

print("=" * 70)
print("CUSTOM CNN ARCHITECTURES")
print("=" * 70)

from src.notebooks import (
    build_simple_cnn,
    build_medium_cnn,
    build_deep_cnn,
    compile_model,
    create_callbacks
)

# Prepare data for CNN
X_train_cnn = X_train[..., np.newaxis]  # Add channel dimension
X_val_cnn = X_val[..., np.newaxis]
X_test_cnn = X_test[..., np.newaxis]

# Convert labels to categorical
y_train_cat = to_categorical(y_train, num_classes=config.num_classes)
y_val_cat = to_categorical(y_val, num_classes=config.num_classes)
y_test_cat = to_categorical(y_test, num_classes=config.num_classes)

# Define CNN architectures with their builder functions
cnn_architectures = {
    'CNN_Simple': {
        'builder': build_simple_cnn,
        'description': '2 conv blocks (32→64), 1 dense (128)'
    },
    'CNN_Medium': {
        'builder': build_medium_cnn,
        'description': '3 conv blocks (32→64→128), 2 dense (256→128)'
    },
    'CNN_Deep': {
        'builder': build_deep_cnn,
        'description': '4 conv blocks (32→64→128→256), 2 dense (512→256)'
    }
}

cnn_results = {}

for arch_name, arch_config in cnn_architectures.items():
    print(f"\n{'='*70}")
    print(f"Training: {arch_name}")
    print(f"Description: {arch_config['description']}")
    print(f"{'='*70}")

    # Build model using the appropriate builder function
    model = arch_config['builder'](
        input_shape=(config.img_size[0], config.img_size[1], 1),
        num_classes=config.num_classes,
        verbose=True
    )

    # Compile
    model = compile_model(model, learning_rate=config.learning_rate, verbose=True)

    print(f"\nTotal parameters: {model.count_params():,}")

    # Callbacks
    model_save_dir = config.models_dir / arch_name
    callbacks = create_callbacks(
        models_dir=model_save_dir,
        patience_early_stop=10,
        patience_reduce_lr=5,
        monitor='val_accuracy',
        verbose=True
    )

    # Train
    start_time = time.time()
    history = model.fit(
        X_train_cnn, y_train_cat,
        validation_data=(X_val_cnn, y_val_cat),
        epochs=EPOCHS,
        batch_size=config.batch_size,
        callbacks=callbacks,
        class_weight=class_weights_dict,
        verbose=1
    )
    train_time = time.time() - start_time

    # Evaluate - model.evaluate() returns [loss, accuracy, auc, precision, recall]
    eval_results = model.evaluate(X_test_cnn, y_test_cat, verbose=1)
    test_loss = eval_results[0]
    test_acc = eval_results[1]

    # Predictions
    y_pred_prob = model.predict(X_test_cnn)
    y_pred = np.argmax(y_pred_prob, axis=1)

    # Store results
    cnn_results[arch_name] = {
        'model': model,
        'history': history.history,
        'test_acc': test_acc,
        'test_loss': test_loss,
        'f1_weighted': f1_score(y_test, y_pred, average='weighted'),
        'train_time': train_time,
        'y_pred': y_pred
    }

    print(f"\n✅ {arch_name}: Test Acc = {test_acc:.4f}, F1 = {cnn_results[arch_name]['f1_weighted']:.4f}")

print("\n✅ Custom CNN training complete")


## 🔄 Section 8: Transfer Learning

Train four pre-trained architectures (InceptionV3, ResNet50, VGG16, EfficientNetB0) with two-phase fine-tuning.
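
The two phases differ only in which layers are trainable and in the learning rate. The sketch below mimics that bookkeeping on plain Python objects, with no deep learning framework involved. The `Layer` dataclass and `unfreeze_top` are hypothetical stand-ins and may not match what `unfreeze_top_layers` in `src.notebooks` actually does. Keeping batch-normalization layers frozen is a common fine-tuning practice, assumed here rather than taken from the notebook.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    trainable: bool = False        # phase 1: the whole base is frozen
    is_batch_norm: bool = False

def unfreeze_top(layers, n_layers, keep_batch_norm_frozen=True):
    """Mark the last n_layers as trainable and return the names that were unfrozen."""
    top = layers[-n_layers:] if n_layers > 0 else []
    unfrozen = []
    for layer in top:
        if keep_batch_norm_frozen and layer.is_batch_norm:
            continue               # leave normalization statistics untouched
        layer.trainable = True
        unfrozen.append(layer.name)
    return unfrozen

base = [
    Layer("conv1"), Layer("bn1", is_batch_norm=True),
    Layer("conv2"), Layer("bn2", is_batch_norm=True),
    Layer("conv3"),
]
unfrozen = unfreeze_top(base, n_layers=3)
phase2_lr = 1e-3 / 10              # phase 2 uses a 10x smaller learning rate
print(unfrozen, phase2_lr)
```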

In [None]:
# =============================================================================
# TRANSFER LEARNING - 4 MODELS
# =============================================================================

print("=" * 70)
print("TRANSFER LEARNING - 4 ARCHITECTURES")
print("=" * 70)

from src.notebooks import build_transfer_learning_model, unfreeze_top_layers

# Define architectures (use strings, not classes)
transfer_architectures = {
    'InceptionV3': 'InceptionV3',
    'ResNet50': 'ResNet50',
    'VGG16': 'VGG16',
    'EfficientNetB0': 'EfficientNetB0'
}

# Prepare RGB data (most models expect 3 channels)
X_train_rgb = np.repeat(X_train[..., np.newaxis], 3, axis=-1)
X_val_rgb = np.repeat(X_val[..., np.newaxis], 3, axis=-1)
X_test_rgb = np.repeat(X_test[..., np.newaxis], 3, axis=-1)

transfer_results = {}

for arch_name in transfer_architectures.keys():
    print(f"\n{'='*70}")
    print(f"Transfer Learning: {arch_name}")
    print(f"{'='*70}")

    try:
        # Phase 1: Feature extraction
        print(f"\nPhase 1: Feature extraction (frozen base)")

        model, base_model = build_transfer_learning_model(
            base_model_name=arch_name,
            input_shape=(config.img_size[0], config.img_size[1], 3),
            num_classes=config.num_classes,
            freeze_base=True,
            verbose=True
        )

        model = compile_model(model, learning_rate=config.learning_rate, verbose=True)

        model_save_dir_p1 = config.models_dir / f"{arch_name}_phase1"
        callbacks_phase1 = create_callbacks(
            models_dir=model_save_dir_p1,
            patience_early_stop=5,
            patience_reduce_lr=3,
            monitor='val_accuracy',
            verbose=True
        )

        history_phase1 = model.fit(
            X_train_rgb, y_train_cat,
            validation_data=(X_val_rgb, y_val_cat),
            epochs=EPOCHS,
            batch_size=config.batch_size,
            callbacks=callbacks_phase1,
            class_weight=class_weights_dict,
            verbose=1
        )

        # Phase 2: Fine-tuning
        print(f"Phase 2: Fine-tuning (unfrozen top layers)")

        model = unfreeze_top_layers(
            base_model=base_model,
            model=model,
            n_layers=20,
            learning_rate=config.learning_rate / 10,
            verbose=True
        )

        model_save_dir_p2 = config.models_dir / f"{arch_name}_phase2"
        callbacks_phase2 = create_callbacks(
            models_dir=model_save_dir_p2,
            patience_early_stop=5,
            patience_reduce_lr=3,
            monitor='val_accuracy',
            verbose=True
        )

        # Timing starts here, so train_time covers phase 2 (fine-tuning) only
        start_time = time.time()
        history_phase2 = model.fit(
            X_train_rgb, y_train_cat,
            validation_data=(X_val_rgb, y_val_cat),
            epochs=EPOCHS,
            batch_size=config.batch_size,
            callbacks=callbacks_phase2,
            class_weight=class_weights_dict,
            verbose=1
        )
        train_time = time.time() - start_time

        # Evaluate - model.evaluate() returns [loss, accuracy, auc, precision, recall]
        eval_results = model.evaluate(X_test_rgb, y_test_cat, verbose=1)
        test_loss = eval_results[0]
        test_acc = eval_results[1]

        y_pred_prob = model.predict(X_test_rgb)
        y_pred = np.argmax(y_pred_prob, axis=1)

        transfer_results[arch_name] = {
            'model': model,
            'history_phase1': history_phase1.history,
            'history_phase2': history_phase2.history,
            'test_acc': test_acc,
            'test_loss': test_loss,
            'f1_weighted': f1_score(y_test, y_pred, average='weighted'),
            'train_time': train_time,
            'y_pred': y_pred
        }

        print(f"✅ {arch_name}: Test Acc = {test_acc:.4f}, F1 = {transfer_results[arch_name]['f1_weighted']:.4f}")

    except Exception as e:
        print(f"⚠️ Error with {arch_name}: {str(e)}")
        continue

print("\n✅ Transfer learning complete")


## 🎯 Section 9: Ensemble Methods

Ensemble methods to improve performance: voting and stacking.
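
Soft voting averages the class probabilities predicted by each model and picks the class with the highest mean. A minimal NumPy sketch of that rule follows. The function `soft_vote` and the toy probabilities are illustrative assumptions; the notebook uses scikit-learn's `VotingClassifier`.

```python
import numpy as np

def soft_vote(probas, weights=None):
    """Average per-model class probabilities; return (labels, mean_probas)."""
    stacked = np.stack([np.asarray(p, dtype=float) for p in probas])  # (M, N, C)
    mean = np.average(stacked, axis=0, weights=weights)               # (N, C)
    return mean.argmax(axis=1), mean

# Three toy models, two samples, two classes (made-up numbers)
p_a = [[0.60, 0.40], [0.2, 0.8]]
p_b = [[0.30, 0.70], [0.4, 0.6]]
p_c = [[0.55, 0.45], [0.1, 0.9]]
labels, mean = soft_vote([p_a, p_b, p_c])
print(labels, mean.round(3))
```

For the first sample, two of the three models favor class 0, yet the averaged probability favors class 1. This is the situation where soft and hard voting disagree.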

In [None]:
# =============================================================================
# ENSEMBLE METHODS
# =============================================================================

print("=" * 70)
print("ENSEMBLE METHODS")
print("=" * 70)

# Select top 3 ML models
top_ml_models = sorted(
    [(k, v) for k, v in ml_results.items()],
    key=lambda x: x[1]['test_acc'],
    reverse=True
)[:3]

print("\n🏆 Top 3 ML models for the ensemble:")
for model_name, results in top_ml_models:
    print(f"   {model_name}: {results['test_acc']:.4f}")

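# Voting Classifier (soft voting averages predict_proba outputs,
# so every selected estimator must implement predict_proba)
print("\n1️⃣ Voting Classifier")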
voting_estimators = [(name, res['model']) for name, res in top_ml_models]

voting_clf = VotingClassifier(
    estimators=voting_estimators,
    voting='soft',
    n_jobs=-1
)

# Use Combined features for ensemble
voting_clf.fit(X_train_combined, y_train)
y_pred_voting = voting_clf.predict(X_test_combined)
voting_acc = accuracy_score(y_test, y_pred_voting)
voting_f1 = f1_score(y_test, y_pred_voting, average='weighted')

print(f"   Voting Accuracy: {voting_acc:.4f}")
print(f"   Voting F1: {voting_f1:.4f}")

# Stacking Classifier
print("\n2️⃣ Stacking Classifier")
stacking_clf = StackingClassifier(
    estimators=voting_estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    n_jobs=-1
)

stacking_clf.fit(X_train_combined, y_train)
y_pred_stacking = stacking_clf.predict(X_test_combined)
stacking_acc = accuracy_score(y_test, y_pred_stacking)
stacking_f1 = f1_score(y_test, y_pred_stacking, average='weighted')

print(f"   Stacking Accuracy: {stacking_acc:.4f}")
print(f"   Stacking F1: {stacking_f1:.4f}")

ensemble_results = {
    'Voting': {'acc': voting_acc, 'f1': voting_f1, 'model': voting_clf},
    'Stacking': {'acc': stacking_acc, 'f1': stacking_f1, 'model': stacking_clf}
}

print("\n✅ Ensemble methods complete")


## ⚙️ Section 10: Hyperparameter Optimization

Hyperparameter optimization with GridSearchCV. Optuna is optional and is only checked for availability here.
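
A grid search evaluates every combination in the Cartesian product of the parameter lists, so its cost grows multiplicatively. The standard-library sketch below enumerates the grid used in the next cell and counts the fits implied by `cv=3`. The helper `expand_grid` is a hypothetical name introduced for illustration.

```python
from itertools import product

param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [10, 20, None],
    'min_samples_split': [2, 5],
}

def expand_grid(grid):
    """Return one dict per parameter combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

candidates = expand_grid(param_grid)
n_fits = len(candidates) * 3       # 3 cross-validation folds per candidate
print(len(candidates), n_fits)
```

With scikit-learn's default `refit=True`, `GridSearchCV` adds one more fit of the best candidate on the full training set.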

In [None]:
# =============================================================================
# HYPERPARAMETER OPTIMIZATION
# =============================================================================

print("=" * 70)
print("HYPERPARAMETER OPTIMIZATION")
print("=" * 70)

# GridSearchCV for Random Forest
print("\n1️⃣ GridSearchCV - Random Forest")

param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [10, 20, None],
    'min_samples_split': [2, 5]
}

rf_grid = GridSearchCV(
    RandomForestClassifier(random_state=config.random_seed, n_jobs=-1),
    param_grid,
    cv=3,
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)

rf_grid.fit(X_train_combined, y_train)

print(f"\nBest parameters: {rf_grid.best_params_}")
print(f"Best CV score: {rf_grid.best_score_:.4f}")

# Test best model
y_pred_grid = rf_grid.predict(X_test_combined)
grid_acc = accuracy_score(y_test, y_pred_grid)
print(f"Test accuracy: {grid_acc:.4f}")

# Optuna optimization (if available)
if OPTUNA_AVAILABLE:
    print("\n2️⃣ Optuna - Advanced Hyperparameter Tuning")
    print("   ✅ Optuna available - run the optimization if needed")
    print("   (Skipped for notebook efficiency)")
else:
    print("\n2️⃣ Optuna not available")

print("\n✅ Hyperparameter optimization complete")


## 🔁 Section 11: K-Fold Cross-Validation

Stratified 5-fold cross-validation to assess model robustness.
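
Stratification matters here because the classes are imbalanced: each fold should keep roughly the same class proportions as the full training set. The sketch below shows one simple way to get that property by dealing the shuffled samples of each class round-robin across folds. It is an illustration under that assumption, not scikit-learn's `StratifiedKFold` algorithm, and `stratified_fold_ids` is a hypothetical helper.

```python
import numpy as np

def stratified_fold_ids(y, n_splits=5, seed=0):
    """Assign each sample a fold id so every fold keeps roughly the class mix of y."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_ids = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        fold_ids[idx] = np.arange(len(idx)) % n_splits   # deal samples round-robin
    return fold_ids

# Toy labels with a 50/30/20 class split (hypothetical)
y_demo = np.array([0] * 50 + [1] * 30 + [2] * 20)
folds = stratified_fold_ids(y_demo, n_splits=5)
for k in range(5):
    print(k, np.bincount(y_demo[folds == k], minlength=3))
```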

In [None]:
# =============================================================================
# K-FOLD CROSS-VALIDATION
# =============================================================================

print("=" * 70)
print("K-FOLD CROSS-VALIDATION (5-FOLD)")
print("=" * 70)

# Select top 5 ML models
top_models_for_cv = sorted(
    [(k, v) for k, v in ml_results.items()],
    key=lambda x: x[1]['test_acc'],
    reverse=True
)[:5]

cv_results = {}
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=config.random_seed)

print("\nRunning 5-fold cross-validation...\n")

for model_name, model_info in top_models_for_cv:
    print(f"Evaluating: {model_name}")

    # Use combined features
    scores = cross_val_score(
        model_info['model'],
        X_train_combined,
        y_train,
        cv=skf,
        scoring='accuracy',
        n_jobs=-1
    )

    cv_results[model_name] = {
        'mean': scores.mean(),
        'std': scores.std(),
        'scores': scores
    }

    print(f"   Mean CV Accuracy: {scores.mean():.4f} (+/- {scores.std() * 2:.4f})")

# Visualization
print("\n📊 Visualizing CV results")
fig, ax = plt.subplots(figsize=(12, 6))

model_names = list(cv_results.keys())
means = [cv_results[m]['mean'] for m in model_names]
stds = [cv_results[m]['std'] for m in model_names]

x_pos = np.arange(len(model_names))
ax.bar(x_pos, means, yerr=stds, capsize=5, alpha=0.7, color='steelblue')
ax.set_xticks(x_pos)
ax.set_xticklabels([m.split(' (')[0] for m in model_names], rotation=45, ha='right')
ax.set_ylabel('Accuracy')
ax.set_title('5-Fold Cross-Validation Results', fontsize=13, weight='bold')
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print("\n✅ K-Fold Cross-Validation complete")


## 📈 Section 12: Learning Curves

Learning-curve analysis to detect overfitting and underfitting.
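
Besides reading the plots, the same Keras-style history dictionary can be summarized numerically. The sketch below reports the epoch with the lowest validation loss and the final train/validation accuracy gap. The function name, the toy history, and the 0.05 gap threshold are all assumptions chosen for illustration, not values from this notebook.

```python
def summarize_history(history, gap_threshold=0.05):
    """Best epoch by val_loss plus the train/val accuracy gap at the end of training."""
    val_loss = history['val_loss']
    best_idx = min(range(len(val_loss)), key=val_loss.__getitem__)
    final_gap = history['accuracy'][-1] - history['val_accuracy'][-1]
    return {
        'best_epoch': best_idx + 1,                  # 1-based, as Keras prints epochs
        'gap_at_best': history['accuracy'][best_idx] - history['val_accuracy'][best_idx],
        'final_gap': final_gap,
        'overfitting_suspected': final_gap > gap_threshold,
    }

# Made-up history: validation loss bottoms out at epoch 3, then rises
demo_history = {
    'accuracy':     [0.60, 0.75, 0.85, 0.93, 0.98],
    'val_accuracy': [0.58, 0.72, 0.80, 0.81, 0.80],
    'loss':         [1.00, 0.70, 0.45, 0.25, 0.10],
    'val_loss':     [1.05, 0.78, 0.60, 0.62, 0.70],
}
summary = summarize_history(demo_history)
print(summary)
```

A training accuracy that keeps climbing while validation loss rises after the best epoch is the usual overfitting signature.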

In [None]:
# =============================================================================
# LEARNING CURVES
# =============================================================================

print("=" * 70)
print("LEARNING CURVES")
print("=" * 70)

from src.notebooks import plot_training_curves

# Plot learning curves for CNN models
print("\n📊 Learning curves for the custom CNNs")

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

for idx, (arch_name, results) in enumerate(cnn_results.items()):
    history = results['history']

    ax = axes[idx]
    ax.plot(history['accuracy'], label='Train Accuracy', linewidth=2)
    ax.plot(history['val_accuracy'], label='Val Accuracy', linewidth=2)
    ax.set_title(f'{arch_name}', fontsize=12, weight='bold')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Accuracy')
    ax.legend()
    ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Plot loss curves
print("\n📊 Loss curves for the custom CNNs")

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

for idx, (arch_name, results) in enumerate(cnn_results.items()):
    history = results['history']

    ax = axes[idx]
    ax.plot(history['loss'], label='Train Loss', linewidth=2)
    ax.plot(history['val_loss'], label='Val Loss', linewidth=2)
    ax.set_title(f'{arch_name}', fontsize=12, weight='bold')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Loss')
    ax.legend()
    ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✅ Learning curves generated")


## 📊 Section 13: Model Evaluation

Detailed model evaluation with per-class metrics.
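
The per-class numbers in a classification report, and the weighted F1 used throughout this notebook, can all be derived from a confusion matrix. The sketch below does so with NumPy on a made-up 2x2 matrix. `per_class_metrics` is a hypothetical helper, assuming rows are true labels and columns are predictions.

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, F1 per class and support-weighted F1 (rows = true labels)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)          # number of true samples per class
    predicted = cm.sum(axis=0)        # number of predictions per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    weighted_f1 = float((f1 * support).sum() / support.sum())
    return precision, recall, f1, weighted_f1

cm_demo = [[8, 2],
           [1, 9]]
precision, recall, f1, weighted_f1 = per_class_metrics(cm_demo)
print(precision.round(4), recall.round(4), f1.round(4), round(weighted_f1, 4))
```

Because the weighted F1 scales each class by its support, a small class such as Viral Pneumonia contributes little to it, which is why the per-class rows deserve a separate look.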

In [None]:
# =============================================================================
# MODEL EVALUATION
# =============================================================================

print("=" * 70)
print("MODEL EVALUATION")
print("=" * 70)

# Get best ML model
best_ml_key = max(ml_results.keys(), key=lambda k: ml_results[k]['test_acc'])
best_ml = ml_results[best_ml_key]

print(f"\n🏆 Best ML Model: {best_ml_key}")
print(f"   Test Accuracy: {best_ml['test_acc']:.4f}")
print(f"   F1 Score (weighted): {best_ml['f1_weighted']:.4f}")

print("\n📋 Classification Report:")
print(classification_report(
    y_test,
    best_ml['y_pred_test'],
    target_names=config.classes,
    digits=4
))

# Get best CNN model
best_cnn_key = max(cnn_results.keys(), key=lambda k: cnn_results[k]['test_acc'])
best_cnn = cnn_results[best_cnn_key]

print(f"\n🏆 Best CNN Model: {best_cnn_key}")
print(f"   Test Accuracy: {best_cnn['test_acc']:.4f}")
print(f"   F1 Score (weighted): {best_cnn['f1_weighted']:.4f}")

print("\n📋 Classification Report:")
print(classification_report(
    y_test,
    best_cnn['y_pred'],
    target_names=config.classes,
    digits=4
))

# Get best Transfer Learning model
if transfer_results:
    best_tl_key = max(transfer_results.keys(), key=lambda k: transfer_results[k]['test_acc'])
    best_tl = transfer_results[best_tl_key]

    print(f"\n🏆 Best Transfer Learning Model: {best_tl_key}")
    print(f"   Test Accuracy: {best_tl['test_acc']:.4f}")
    print(f"   F1 Score (weighted): {best_tl['f1_weighted']:.4f}")

    print("\n📋 Classification Report:")
    print(classification_report(
        y_test,
        best_tl['y_pred'],
        target_names=config.classes,
        digits=4
    ))

print("\n✅ Model evaluation complete")


## 🔢 Section 14: Confusion Matrices

Confusion matrices to visualize classification errors.
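
With imbalanced classes, raw counts make the large classes dominate the heatmap. Dividing each row by its total turns the diagonal into per-class recall. The sketch below builds the count matrix and normalizes it with NumPy. Both helper names and the toy labels are assumptions; the notebook itself uses scikit-learn's `confusion_matrix`.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes):
    """Count matrix with rows = true labels and columns = predicted labels."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def normalize_rows(cm):
    """Cell (i, j) becomes the share of class i samples predicted as class j."""
    row_sums = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, row_sums, out=np.zeros(cm.shape, dtype=float), where=row_sums > 0)

y_true_demo = [0, 0, 0, 0, 1, 1, 2]
y_pred_demo = [0, 0, 1, 0, 1, 1, 0]
cm_counts = confusion_counts(y_true_demo, y_pred_demo, num_classes=3)
cm_norm = normalize_rows(cm_counts)
print(cm_counts)
print(cm_norm)
```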

In [None]:
# =============================================================================
# CONFUSION MATRICES
# =============================================================================

print("=" * 70)
print("CONFUSION MATRICES")
print("=" * 70)

from src.notebooks import plot_confusion_matrix

# Plot confusion matrices for top models
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Best ML
cm_ml = confusion_matrix(y_test, best_ml['y_pred_test'])
ax = axes[0]
sns.heatmap(cm_ml, annot=True, fmt='d', cmap='Blues', ax=ax,
            xticklabels=config.classes, yticklabels=config.classes)
ax.set_title(f'ML: {best_ml_key}', fontsize=11, weight='bold')
ax.set_ylabel('True Label')
ax.set_xlabel('Predicted Label')

# Best CNN
cm_cnn = confusion_matrix(y_test, best_cnn['y_pred'])
ax = axes[1]
sns.heatmap(cm_cnn, annot=True, fmt='d', cmap='Greens', ax=ax,
            xticklabels=config.classes, yticklabels=config.classes)
ax.set_title(f'CNN: {best_cnn_key}', fontsize=11, weight='bold')
ax.set_ylabel('True Label')
ax.set_xlabel('Predicted Label')

# Best Transfer Learning
if transfer_results:
    cm_tl = confusion_matrix(y_test, best_tl['y_pred'])
    ax = axes[2]
    sns.heatmap(cm_tl, annot=True, fmt='d', cmap='Oranges', ax=ax,
                xticklabels=config.classes, yticklabels=config.classes)
    ax.set_title(f'TL: {best_tl_key}', fontsize=11, weight='bold')
    ax.set_ylabel('True Label')
    ax.set_xlabel('Predicted Label')
else:
    axes[2].axis('off')  # No transfer learning results: hide the empty panel

plt.tight_layout()
plt.show()

print("\n✅ Confusion matrices generated")


## 🔍 Section 15: Feature Importance

Feature importance analysis for the ML models, plus a look at the first-layer CNN filters.
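
A top-20 bar chart says little unless you also know how much of the total importance those features carry. The NumPy sketch below returns the top-k indices together with their share of the total. `top_k_features` and the toy importances are hypothetical.

```python
import numpy as np

def top_k_features(importances, k=20):
    """Indices of the k largest importances and their share of the total."""
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1][:k]
    share = float(importances[order].sum() / importances.sum())
    return order, share

demo_importances = [0.05, 0.40, 0.10, 0.30, 0.15]
order, share = top_k_features(demo_importances, k=2)
print(order, share)
```

A low share for the top 20 would suggest that importance is spread thinly over many features, which is plausible for high-dimensional image descriptors.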

In [None]:
# =============================================================================
# FEATURE IMPORTANCE
# =============================================================================

print("=" * 70)
print("FEATURE IMPORTANCE")
print("=" * 70)

# Feature importance for Random Forest
if 'Random Forest' in best_ml_key:
    print("\n📊 Feature Importance - Random Forest")

    importances = best_ml['model'].feature_importances_
    indices = np.argsort(importances)[::-1][:20]  # Top 20

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.barh(range(len(indices)), importances[indices], color='steelblue')
    ax.set_yticks(range(len(indices)))
    ax.set_yticklabels([f'Feature {i}' for i in indices])
    ax.set_xlabel('Importance')
    ax.set_title('Top 20 Feature Importances', fontsize=13, weight='bold')
    ax.invert_yaxis()
    plt.tight_layout()
    plt.show()
else:
    print("\n⚠️ Feature importance only available for tree-based models")

# CNN Filter Visualization
print("\n📊 CNN First Layer Filters Visualization")

if cnn_results:
    first_cnn = list(cnn_results.values())[0]['model']

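    # Get the kernels of the first convolutional layer
    filters = None
    for layer in first_cnn.layers:
        if 'conv' in layer.name.lower():
            filters, biases = layer.get_weights()
            break
    if filters is None:
        raise ValueError("No convolutional layer found in the first CNN")

    # Normalize filters to [0, 1]; the small epsilon avoids division by zero for constant kernels
    f_min, f_max = filters.min(), filters.max()
    filters_normalized = (filters - f_min) / (f_max - f_min + 1e-8)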

    # Plot first 32 filters
    n_filters = min(32, filters.shape[-1])
    fig, axes = plt.subplots(4, 8, figsize=(16, 8))
    axes = axes.ravel()

    for i, ax in enumerate(axes):
        if i < n_filters:
            ax.imshow(filters_normalized[:, :, 0, i], cmap='viridis')
            ax.set_title(f'F{i+1}', fontsize=8)
        ax.axis('off')  # Also hides unused panels when there are fewer than 32 filters

    plt.suptitle(f'CNN First Layer Filters ({n_filters} filters)', size=14, weight='bold')
    plt.tight_layout()
    plt.show()

print("\n✅ Feature importance complete")


## 🔬 Section 16: Interpretability (GradCAM, LIME, SHAP)

Interpretability techniques for understanding model decisions. Only GradCAM is run here; LIME and SHAP are checked for availability but skipped.
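
Grad-CAM weights each feature map of a convolutional layer by the spatially averaged gradient of the class score, sums the weighted maps, and keeps only the positive part. The NumPy sketch below applies that formula to arrays that are assumed to be already extracted from a model. It does not reproduce the `GradCAM` class in `src.interpretability`, whose internals are not shown in this notebook.

```python
import numpy as np

def gradcam_heatmap(activations, gradients):
    """Grad-CAM map from conv activations and class-score gradients, both (H, W, K)."""
    weights = gradients.mean(axis=(0, 1))                       # (K,) pooled gradients
    cam = np.tensordot(activations, weights, axes=([2], [0]))   # (H, W) weighted sum of maps
    cam = np.maximum(cam, 0.0)                                  # ReLU: keep positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam                      # scale to [0, 1] for display

# Random stand-ins for a 7x7 layer with 16 feature maps (hypothetical shapes)
rng = np.random.default_rng(0)
heat = gradcam_heatmap(rng.normal(size=(7, 7, 16)), rng.normal(size=(7, 7, 16)))
print(heat.shape, float(heat.min()), float(heat.max()))
```

The resulting map has the spatial size of the chosen layer, so it usually has to be resized to the input resolution before being overlaid on the X-ray.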

In [None]:
# =============================================================================
# INTERPRETABILITY - GRADCAM
# =============================================================================

print("=" * 70)
print("INTERPRETABILITY - GRADCAM")
print("=" * 70)

from src.interpretability.gradcam import GradCAM

# Select a few test images
sample_indices = np.random.choice(len(X_test), 8, replace=False)
sample_images = X_test_rgb[sample_indices]
sample_labels = y_test[sample_indices]

# GradCAM for best Transfer Learning model
if transfer_results and best_tl:
    print(f"\n📸 GradCAM for {best_tl_key}")

    gradcam = GradCAM(model=best_tl['model'])

    fig, axes = plt.subplots(2, 8, figsize=(20, 6))

    for i, (img, true_label) in enumerate(zip(sample_images, sample_labels)):
        # Original image
        axes[0, i].imshow(img[:, :, 0], cmap='gray')
        axes[0, i].set_title(f'True: {config.classes[true_label]}', fontsize=9)
        axes[0, i].axis('off')

        # GradCAM heatmap
        img_expanded = np.expand_dims(img, axis=0)
        heatmap = gradcam.compute_heatmap(img_expanded, class_idx=true_label)

        axes[1, i].imshow(img[:, :, 0], cmap='gray')
        axes[1, i].imshow(heatmap, cmap='jet', alpha=0.5)
        axes[1, i].set_title('GradCAM', fontsize=9)
        axes[1, i].axis('off')

    plt.suptitle(f'GradCAM Visualization - {best_tl_key}', size=14, weight='bold')
    plt.tight_layout()
    plt.show()

# LIME Interpretability
if LIME_AVAILABLE:
    print("\n🔬 LIME Interpretability")
    print("   ✅ LIME available - visualizations possible")
    print("   (Skipped for efficiency - add if needed)")
else:
    print("\n⚠️ LIME not available")

# SHAP Values
if SHAP_AVAILABLE:
    print("\n📊 SHAP Values")
    print("   ✅ SHAP available - analyses possible")
    print("   (Skipped for efficiency - add if needed)")
else:
    print("\n⚠️ SHAP not available")

print("\n✅ Interpretability complete")


## ❌ Section 17: Error Analysis

Error analysis to identify difficult cases.
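
The hardest cases to accept are the errors the model was most sure about. If class probabilities are available, misclassified samples can be ranked by the confidence of the wrong prediction, as in this NumPy sketch. `most_confident_errors` and the toy probabilities are assumptions for illustration.

```python
import numpy as np

def most_confident_errors(y_true, proba, top_n=20):
    """Indices of misclassified samples, sorted by confidence in the wrong class."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)
    y_pred = proba.argmax(axis=1)
    wrong = np.flatnonzero(y_pred != y_true)
    confidence = proba[wrong, y_pred[wrong]]
    order = np.argsort(-confidence, kind='stable')[:top_n]
    return wrong[order], confidence[order]

y_true_demo = [0, 1, 2, 1]
proba_demo = [
    [0.70, 0.20, 0.10],   # correct
    [0.60, 0.30, 0.10],   # wrong, moderately confident
    [0.05, 0.90, 0.05],   # wrong, very confident
    [0.10, 0.80, 0.10],   # correct
]
worst_idx, worst_conf = most_confident_errors(y_true_demo, proba_demo)
print(worst_idx, worst_conf)
```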

In [None]:
# =============================================================================
# ERROR ANALYSIS
# =============================================================================

print("=" * 70)
print("ERROR ANALYSIS")
print("=" * 70)

# Analyze errors for the best transfer learning model if any was trained, otherwise the best custom CNN
if transfer_results:
    y_pred_best = best_tl['y_pred']
    model_name_best = best_tl_key
else:
    y_pred_best = best_cnn['y_pred']
    model_name_best = best_cnn_key

# Find misclassified samples
errors = y_test != y_pred_best
error_indices = np.where(errors)[0]

print(f"\n📊 Error statistics for {model_name_best}:")
print(f"   Total errors: {errors.sum()}/{len(y_test)} ({errors.sum()/len(y_test)*100:.2f}%)")

# Analyze error patterns
error_matrix = np.zeros((config.num_classes, config.num_classes), dtype=int)
for true_label, pred_label in zip(y_test[errors], y_pred_best[errors]):
    error_matrix[true_label, pred_label] += 1

print("\n📋 Error Matrix (True → Predicted):")
print(f"{'':>15s}", end="")
for cls in config.classes:
    print(f"{cls:>15s}", end="")
print()

for i, cls in enumerate(config.classes):
    print(f"{cls:>15s}", end="")
    for j in range(len(config.classes)):
        print(f"{error_matrix[i, j]:>15d}", end="")
    print()

# Visualize misclassified samples (first 20 in test-set order, not ranked by confidence)
print("\n📸 First 20 Misclassified Images")

n_errors_to_show = min(20, len(error_indices))
sample_error_indices = error_indices[:n_errors_to_show]

fig, axes = plt.subplots(4, 5, figsize=(15, 12))
axes = axes.ravel()

for idx, err_idx in enumerate(sample_error_indices):
    axes[idx].imshow(X_test[err_idx], cmap='gray')
    true_cls = config.classes[y_test[err_idx]]
    pred_cls = config.classes[y_pred_best[err_idx]]
    axes[idx].set_title(f'True: {true_cls}\nPred: {pred_cls}', fontsize=8)
    axes[idx].axis('off')

# Hide unused panels when there are fewer than 20 errors
for ax in axes[n_errors_to_show:]:
    ax.axis('off')

plt.suptitle('First 20 Misclassified Samples', size=14, weight='bold')
plt.tight_layout()
plt.show()

print("\n✅ Error analysis complete")


## 📉 Section 18: ROC & PR Curves

ROC and precision-recall curves for multi-class evaluation.
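
In the one-vs-rest setting, each class is scored as a binary problem against all others. The AUC of such a problem equals the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one. The NumPy sketch below computes it directly from that definition. Both function names are hypothetical, and the pairwise comparison is meant for small arrays only.

```python
import numpy as np

def binary_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    diff = pos[:, None] - neg[None, :]          # all positive/negative pairs
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def one_vs_rest_auc(y_true, proba):
    """One AUC per class, treating that class as positive and the rest as negative."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)
    return [binary_auc(y_true == c, proba[:, c]) for c in range(proba.shape[1])]

auc_demo = binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
ovr_demo = one_vs_rest_auc([0, 1, 2], [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
print(auc_demo, ovr_demo)
```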

In [None]:
# =============================================================================
# ROC & PRECISION-RECALL CURVES
# =============================================================================

print("=" * 70)
print("ROC & PRECISION-RECALL CURVES")
print("=" * 70)

from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc, precision_recall_curve, average_precision_score

# Binarize labels for multi-class ROC
y_test_bin = label_binarize(y_test, classes=range(config.num_classes))

# Get prediction probabilities
if transfer_results:
    best_model = best_tl['model']
    y_pred_proba = best_model.predict(X_test_rgb)
    model_name = best_tl_key
else:
    best_model = best_cnn['model']
    y_pred_proba = best_model.predict(X_test_cnn)
    model_name = best_cnn_key

# ROC Curves
print(f"\n📊 ROC Curves - {model_name}")

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# ROC for each class
ax = axes[0]
for i, cls_name in enumerate(config.classes):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_pred_proba[:, i])
    roc_auc = auc(fpr, tpr)
    ax.plot(fpr, tpr, linewidth=2, label=f'{cls_name} (AUC = {roc_auc:.3f})')

ax.plot([0, 1], [0, 1], 'k--', linewidth=1)
ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.set_title('ROC Curves (One-vs-Rest)', fontsize=12, weight='bold')
ax.legend(loc='lower right', fontsize=9)
ax.grid(alpha=0.3)

# Precision-Recall Curves
ax = axes[1]
for i, cls_name in enumerate(config.classes):
    precision, recall, _ = precision_recall_curve(y_test_bin[:, i], y_pred_proba[:, i])
    avg_precision = average_precision_score(y_test_bin[:, i], y_pred_proba[:, i])
    ax.plot(recall, precision, linewidth=2, label=f'{cls_name} (AP = {avg_precision:.3f})')

ax.set_xlabel('Recall')
ax.set_ylabel('Precision')
ax.set_title('Precision-Recall Curves', fontsize=12, weight='bold')
ax.legend(loc='lower left', fontsize=9)
ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✅ ROC & PR curves generated")


## 💾 Section 19: Model Saving

Save the trained models and their metadata.
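
The metadata cell casts metrics with `float(...)` because the standard `json` module rejects most NumPy types. An alternative is a `default` fallback that converts them on the fly, sketched below. `to_jsonable` is a hypothetical helper, and the path and values are made up.

```python
import json
from pathlib import Path
import numpy as np

def to_jsonable(obj):
    """Fallback for json.dumps: convert NumPy scalars/arrays and Paths to plain types."""
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, Path):
        return str(obj)
    raise TypeError(f"Cannot serialize {type(obj).__name__}")

demo_metadata = {
    'test_acc': np.float32(0.5),
    'img_size': (256, 256),                        # tuples are written as JSON arrays
    'class_counts': np.array([3, 1, 2]),
    'model_path': Path('models') / 'demo.keras',   # hypothetical path
}
encoded = json.dumps(demo_metadata, default=to_jsonable, indent=2)
decoded = json.loads(encoded)
print(encoded)
```

Note that tuples come back as lists after a round trip, so code that reloads `img_size` should not rely on its type.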

In [None]:
# =============================================================================
# MODEL SAVING
# =============================================================================

print("=" * 70)
print("MODEL SAVING")
print("=" * 70)

# Create models directory
models_save_dir = config.models_dir / 'comprehensive_ml_pipeline'
models_save_dir.mkdir(parents=True, exist_ok=True)

print(f"\n📂 Save directory: {models_save_dir}")

# Save best ML model
ml_save_path = models_save_dir / f'best_ml_model_{best_ml_key.replace(" ", "_")}.pkl'
with open(ml_save_path, 'wb') as f:
    pickle.dump(best_ml['model'], f)
print(f"✅ Saved ML model: {ml_save_path.name}")

# Save best CNN model
cnn_save_path = models_save_dir / f'best_cnn_model_{best_cnn_key}.keras'
best_cnn['model'].save(cnn_save_path)
print(f"✅ Saved CNN model: {cnn_save_path.name}")

# Save best Transfer Learning model
if transfer_results:
    tl_save_path = models_save_dir / f'best_tl_model_{best_tl_key}.keras'
    best_tl['model'].save(tl_save_path)
    print(f"✅ Saved TL model: {tl_save_path.name}")

# Save metadata
metadata = {
    'timestamp': datetime.now().isoformat(),
    'dataset_info': {
        'n_images_per_class': N_IMAGES_PER_CLASS,
        'total_images': len(images),
        'classes': config.classes,
        'img_size': config.img_size
    },
    'best_models': {
        'ml': {
            'name': best_ml_key,
            'test_acc': float(best_ml['test_acc']),
            'f1_weighted': float(best_ml['f1_weighted'])
        },
        'cnn': {
            'name': best_cnn_key,
            'test_acc': float(best_cnn['test_acc']),
            'f1_weighted': float(best_cnn['f1_weighted'])
        }
    }
}

if transfer_results:
    metadata['best_models']['transfer_learning'] = {
        'name': best_tl_key,
        'test_acc': float(best_tl['test_acc']),
        'f1_weighted': float(best_tl['f1_weighted'])
    }

metadata_path = models_save_dir / 'metadata.json'
with open(metadata_path, 'w') as f:
    json_lib.dump(metadata, f, indent=2)
print(f"✅ Saved metadata: {metadata_path.name}")

print("\n✅ Model saving complete")


## 🔮 Section 20: Prediction Pipeline

Production-ready prediction pipeline.
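
The custom CNNs in this notebook take one channel while the transfer learning models take three, so the input has to be shaped for the model it is sent to. The NumPy sketch below isolates that step. `prepare_batch` is a hypothetical helper; for a single-input Keras model, the channel count can typically be read from `model.input_shape[-1]`.

```python
import numpy as np

def prepare_batch(img, n_channels):
    """Turn one (H, W) grayscale array in [0, 255] into a (1, H, W, n_channels) float batch."""
    arr = np.asarray(img, dtype=np.float32) / 255.0
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D grayscale image, got shape {arr.shape}")
    arr = arr[np.newaxis, ..., np.newaxis]          # (1, H, W, 1)
    if n_channels == 3:
        arr = np.repeat(arr, 3, axis=-1)            # replicate gray into R, G, B
    elif n_channels != 1:
        raise ValueError("n_channels must be 1 or 3")
    return arr

demo_img = np.full((4, 6), 255, dtype=np.uint8)
batch_gray = prepare_batch(demo_img, n_channels=1)
batch_rgb = prepare_batch(demo_img, n_channels=3)
print(batch_gray.shape, batch_rgb.shape)
```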

In [None]:
# =============================================================================
# PREDICTION PIPELINE
# =============================================================================

print("=" * 70)
print("PREDICTION PIPELINE")
print("=" * 70)

def predict_covid(image_path, model, preprocessing_func=None):
    """
    End-to-end prediction pipeline for a single new image.

    Args:
        image_path: Path to the image
        model: Trained Keras model
        preprocessing_func: Optional preprocessing function

    Returns:
        dict with the predicted class, its index, the confidence, and per-class probabilities
    """
    # Load image
    from PIL import Image
    img = Image.open(image_path).convert('L')  # Grayscale
    img = img.resize(config.img_size)
    img_array = np.array(img) / 255.0

    # Prepare for model
    if preprocessing_func:
        img_array = preprocessing_func(img_array)

    # Add batch and channel dimensions
    img_array = np.expand_dims(img_array, axis=0)
    if len(img_array.shape) == 3:
        img_array = np.expand_dims(img_array, axis=-1)

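    # Replicate the grayscale channel only when the model expects 3-channel input
    # (transfer learning models); the custom CNNs take a single channel
    if img_array.shape[-1] == 1 and model.input_shape[-1] == 3:
        img_array = np.repeat(img_array, 3, axis=-1)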

    # Predict
    predictions = model.predict(img_array, verbose=1)
    predicted_class_idx = np.argmax(predictions[0])
    predicted_class = config.classes[predicted_class_idx]
    confidence = predictions[0][predicted_class_idx]

    return {
        'predicted_class': predicted_class,
        'predicted_class_idx': int(predicted_class_idx),
        'confidence': float(confidence),
        'all_probabilities': {
            config.classes[i]: float(predictions[0][i])
            for i in range(len(config.classes))
        }
    }

# Test prediction pipeline
print("\n🧪 Testing the prediction pipeline")

test_img_path = image_paths[0]  # First sample
result = predict_covid(test_img_path, best_tl['model'] if transfer_results else best_cnn['model'])

print(f"\nTest image: {test_img_path.name}")
print(f"Predicted class: {result['predicted_class']}")
print(f"Confidence: {result['confidence']:.4f}")
print("\nAll probabilities:")
for cls, prob in result['all_probabilities'].items():
    print(f"   {cls:20s}: {prob:.4f}")

print("\n✅ Prediction pipeline ready")


## 🏆 Section 21: Performance Benchmarking

Final comparison of all trained models.
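
Inference time is recorded for the classical models only, so the deep models have no comparable latency figure. One way to obtain it is to time repeated calls of any prediction callable, as in this standard-library sketch. `time_inference_ms` is a hypothetical helper, and the lambda stands in for a real `model.predict`.

```python
import time

def time_inference_ms(predict_fn, batch, n_repeats=5, warmup=1):
    """Middle wall-clock timing over n_repeats calls, per sample, in milliseconds."""
    for _ in range(warmup):
        predict_fn(batch)              # the first call often includes one-off setup cost
    timings = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        predict_fn(batch)
        timings.append(time.perf_counter() - start)
    timings.sort()
    middle = timings[len(timings) // 2]
    return 1000.0 * middle / len(batch)

demo_batch = list(range(1000))
latency = time_inference_ms(lambda xs: [x * x for x in xs], demo_batch)
print(f"{latency:.6f} ms per sample")
```

Taking the middle timing instead of the mean makes the figure less sensitive to a single slow call.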

In [None]:
# =============================================================================
# PERFORMANCE BENCHMARKING
# =============================================================================

print("=" * 70)
print("PERFORMANCE BENCHMARKING - SUMMARY TABLE")
print("=" * 70)

# Collect all results
all_results = []

# ML Models (top 10)
for model_name, results in sorted(ml_results.items(), key=lambda x: x[1]['test_acc'], reverse=True)[:10]:
    all_results.append({
        'Model': model_name,
        'Type': 'ML',
        'Test Acc': results['test_acc'],
        'F1 Score': results['f1_weighted'],
        'Train Time (s)': results['train_time'],
        'Inference (ms)': results['inference_time_ms']
    })

# CNN Models
for model_name, results in cnn_results.items():
    all_results.append({
        'Model': model_name,
        'Type': 'CNN',
        'Test Acc': results['test_acc'],
        'F1 Score': results['f1_weighted'],
        'Train Time (s)': results['train_time'],
        'Inference (ms)': 0  # Not measured for CNNs
    })

# Transfer Learning Models (top 10)
if transfer_results:
    for model_name, results in sorted(transfer_results.items(), key=lambda x: x[1]['test_acc'], reverse=True)[:10]:
        all_results.append({
            'Model': model_name,
            'Type': 'Transfer Learning',
            'Test Acc': results['test_acc'],
            'F1 Score': results['f1_weighted'],
            'Train Time (s)': results['train_time'],
            'Inference (ms)': 0  # Not measured for transfer learning models
        })

# Ensemble Models
for model_name, results in ensemble_results.items():
    all_results.append({
        'Model': f'Ensemble_{model_name}',
        'Type': 'Ensemble',
        'Test Acc': results['acc'],
        'F1 Score': results['f1'],
        'Train Time (s)': 0,  # Not measured for ensembles
        'Inference (ms)': 0  # Not measured for ensembles
    })

# Sort by accuracy
all_results_sorted = sorted(all_results, key=lambda x: x['Test Acc'], reverse=True)

# Display table
print("\n📊 TOP 20 MODELS:\n")
print(f"{'Rank':<6s} {'Model':<40s} {'Type':<20s} {'Test Acc':>10s} {'F1 Score':>10s} {'Train Time':>12s}")
print("-" * 100)

for rank, result in enumerate(all_results_sorted[:20], 1):
    print(f"{rank:<6d} {result['Model']:<40s} {result['Type']:<20s} "
          f"{result['Test Acc']:>10.4f} {result['F1 Score']:>10.4f} "
          f"{result['Train Time (s)']:>12.1f}")

# Best model overall
best_overall = all_results_sorted[0]
print(f"\n🏆 BEST MODEL OVERALL:")
print(f"   {best_overall['Model']}")
print(f"   Test Accuracy: {best_overall['Test Acc']:.4f}")
print(f"   F1 Score: {best_overall['F1 Score']:.4f}")

print("\n✅ Performance benchmarking complete")


## 📄 Section 22: Automatic HTML Report

Generate a professional HTML report with all results.
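
When results are interpolated into an HTML string, model names should be escaped so that characters such as `<` or `&` cannot break the markup. The standard-library sketch below renders table rows from a list of result dictionaries shaped like those in the benchmarking section. `results_to_html_rows` and the demo entry are hypothetical.

```python
from html import escape

def results_to_html_rows(results):
    """Render one <tr> per result, escaping text fields."""
    rows = []
    for rank, r in enumerate(results, start=1):
        rows.append(
            "<tr>"
            f"<td>{rank}</td>"
            f"<td>{escape(str(r['Model']))}</td>"
            f"<td>{escape(str(r['Type']))}</td>"
            f"<td>{r['Test Acc']:.4f}</td>"
            f"<td>{r['F1 Score']:.4f}</td>"
            "</tr>"
        )
    return "\n".join(rows)

# Made-up entry with characters that need escaping
demo_results = [
    {'Model': 'Random Forest (HOG) <v2>', 'Type': 'ML', 'Test Acc': 0.9, 'F1 Score': 0.89},
]
html_rows = results_to_html_rows(demo_results)
print(html_rows)
```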

In [None]:
# =============================================================================
# AUTOMATIC HTML REPORT
# =============================================================================

print("=" * 70)
print("GENERATING THE HTML REPORT")
print("=" * 70)

report_dir = config.results_dir / 'comprehensive_ml_pipeline'
report_dir.mkdir(parents=True, exist_ok=True)

html_report_path = report_dir / 'comprehensive_ml_report.html'

# Generate HTML content
html_content = f"""
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Comprehensive ML Pipeline Report - COVID-19 Classification</title>
    <style>
        body {{
            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
            max-width: 1200px;
            margin: 0 auto;
            padding: 20px;
            background-color: #f5f5f5;
        }}
        h1 {{
            color: #2c3e50;
            border-bottom: 3px solid #3498db;
            padding-bottom: 10px;
        }}
        h2 {{
            color: #34495e;
            margin-top: 30px;
        }}
        .metric-box {{
            background-color: white;
            padding: 20px;
            margin: 15px 0;
            border-radius: 8px;
            box-shadow: 0 2px 4px rgba(0,0,0,0.1);
        }}
        .best-model {{
            background-color: #d4edda;
            border-left: 4px solid #28a745;
        }}
        table {{
            width: 100%;
            border-collapse: collapse;
            margin: 20px 0;
            background-color: white;
        }}
        th, td {{
            padding: 12px;
            text-align: left;
            border-bottom: 1px solid #ddd;
        }}
        th {{
            background-color: #3498db;
            color: white;
        }}
        .footer {{
            margin-top: 40px;
            text-align: center;
            color: #7f8c8d;
            font-size: 0.9em;
        }}
    </style>
</head>
<body>
    <h1>🦠 Comprehensive ML Pipeline Report</h1>
    <p><strong>Date:</strong> {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>

    <h2>📊 Dataset Information</h2>
    <div class="metric-box">
        <p><strong>Total Images:</strong> {len(images)}</p>
        <p><strong>Classes:</strong> {', '.join(config.classes)}</p>
        <p><strong>Train/Val/Test Split:</strong> {len(X_train)}/{len(X_val)}/{len(X_test)}</p>
    </div>

    <h2>🏆 Best Models</h2>

    <div class="metric-box best-model">
        <h3>Best ML Model: {best_ml_key}</h3>
        <p><strong>Test Accuracy:</strong> {best_ml['test_acc']:.4f}</p>
        <p><strong>F1 Score:</strong> {best_ml['f1_weighted']:.4f}</p>
    </div>

    <div class="metric-box best-model">
        <h3>Best CNN Model: {best_cnn_key}</h3>
        <p><strong>Test Accuracy:</strong> {best_cnn['test_acc']:.4f}</p>
        <p><strong>F1 Score:</strong> {best_cnn['f1_weighted']:.4f}</p>
    </div>
"""

if transfer_results:
    html_content += f"""
    <div class="metric-box best-model">
        <h3>Best Transfer Learning Model: {best_tl_key}</h3>
        <p><strong>Test Accuracy:</strong> {best_tl['test_acc']:.4f}</p>
        <p><strong>F1 Score:</strong> {best_tl['f1_weighted']:.4f}</p>
    </div>
"""

html_content += """
    <h2>📈 Top 20 Models Ranking</h2>
    <table>
        <tr>
            <th>Rank</th>
            <th>Model</th>
            <th>Type</th>
            <th>Test Accuracy</th>
            <th>F1 Score</th>
        </tr>
"""

for rank, result in enumerate(all_results_sorted[:20], 1):
    html_content += f"""
        <tr>
            <td>{rank}</td>
            <td>{result['Model']}</td>
            <td>{result['Type']}</td>
            <td>{result['Test Acc']:.4f}</td>
            <td>{result['F1 Score']:.4f}</td>
        </tr>
"""

html_content += """
    </table>

    <div class="footer">
        <p>Generated by Comprehensive ML Pipeline Notebook</p>
        <p>COVID-19 Radiography Classification Project</p>
    </div>
</body>
</html>
"""

# Save HTML report
with open(html_report_path, 'w', encoding='utf-8') as f:
    f.write(html_content)

print(f"\n✅ HTML report generated: {html_report_path}")
print("   Open it in a browser to view")

print("\n✅ HTML report complete")


## 🎉 Conclusion

### ✅ Achievements

This notebook covered:

1. **27 classical ML models** (9 algorithms × 3 feature sets)
2. **3 custom CNN architectures**
3. **12 transfer learning models** with two-phase fine-tuning
4. **Ensemble methods** (Voting, Stacking)
5. **Hyperparameter optimization** (GridSearchCV, Optuna)
6. **K-fold cross-validation** (stratified 5-fold)
7. **Interpretability** (GradCAM, LIME, SHAP)
8. **Comprehensive analysis** (learning curves, confusion matrices, ROC curves, error analysis)
9. **Production readiness** (model saving, prediction pipeline, HTML report)

### 🏆 Best Model

The best model was identified and saved together with its metadata.
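
How the metadata is stored is not shown in this section. A minimal sketch of one common approach, writing a JSON sidecar file next to the model file. The function name, file names, and metadata fields are all assumptions for illustration, and the metric values are placeholders:

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_model_metadata(model_path, metrics, classes):
    """Write a JSON sidecar file next to the model file and return its path."""
    model_path = Path(model_path)
    metadata = {
        'model_file': model_path.name,
        'saved_at': datetime.now(timezone.utc).isoformat(),
        'classes': list(classes),
        'metrics': dict(metrics),
    }
    meta_path = model_path.with_suffix('.json')
    meta_path.write_text(json.dumps(metadata, indent=2), encoding='utf-8')
    return meta_path


# Usage with placeholder values (not results from this notebook).
with tempfile.TemporaryDirectory() as tmp_dir:
    meta_path = save_model_metadata(
        Path(tmp_dir) / 'best_model.pkl',
        metrics={'test_acc': 0.0, 'f1_weighted': 0.0},
        classes=['COVID', 'Lung_Opacity', 'Normal', 'Viral Pneumonia'],
    )
    loaded = json.loads(meta_path.read_text(encoding='utf-8'))
    print(loaded['model_file'], loaded['classes'])
```

Keeping the class order in the metadata matters at prediction time, because the model's output indices are only meaningful relative to the label order used during training.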

### 📚 Next Steps

1. **Deployment**: Integrate the model into a web application or API
2. **Monitoring**: Set up a system to track model performance
3. **Continuous improvement**: Collect more data and retrain regularly
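
For the monitoring step, one simple starting point is to track accuracy over a sliding window of recent predictions whose true labels have been confirmed, and to flag drops below a threshold. A minimal sketch; the class name `AccuracyMonitor`, the window size, and the threshold are hypothetical choices, not part of this notebook:

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent labelled predictions."""

    def __init__(self, window_size=100, alert_threshold=0.9):
        # deque with maxlen drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold

    def record(self, predicted, actual):
        """Store whether one prediction matched its confirmed label."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return rolling accuracy, or None before any outcome is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        """Return True when rolling accuracy falls below the threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.alert_threshold


monitor = AccuracyMonitor(window_size=4, alert_threshold=0.75)
for predicted, actual in [('COVID', 'COVID'), ('Normal', 'COVID'),
                          ('Normal', 'Normal'), ('COVID', 'Normal')]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.needs_attention())
```

A flagged window could then feed the retraining step listed above.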

### 📞 Support

For questions or suggested improvements, see the GitHub repository:
https://github.com/L-Poca/Data_Pipeline

---

**Thank you for using this notebook! 🙏**

In [None]:
Notebook_end_time = time.time()
Notebook_duration = Notebook_end_time - Notebook_begin_time

# Convert the duration to hours, minutes, and seconds
hours, rem = divmod(Notebook_duration, 3600)
minutes, seconds = divmod(rem, 60)
print(f"\n⏱️ Total notebook execution time: {int(hours)}h {int(minutes)}m {int(seconds)}s")