# 🔬 Advanced Machine Learning for COVID-19 Chest X-Ray Classification

## 📋 Comprehensive Guide: Ensemble Methods & Deep Learning Fine-Tuning

**Authors**: DS_COVID Team  
**Date**: October 15, 2025  
**Branch**: ReVamp

---

### 🎯 **Notebook objectives**

This notebook presents an end-to-end machine learning approach to classifying COVID-19 chest X-rays, covering:

1. **🌳 Ensemble methods**: Bagging, Boosting, Stacking
2. **🧠 Deep Learning**: Transfer Learning & Fine-Tuning
3. **⚙️ Optimization**: hyperparameters, cross-validation
4. **📊 Evaluation**: advanced metrics, visualizations

### 📁 **Dataset**
- **Source**: COVID-19_Radiography_Dataset
- **Classes**: COVID, Normal, Lung_Opacity, Viral Pneumonia
- **Type**: chest X-ray images

---

### 🗂️ **Notebook structure**

| Section | Technique | Description |
|---------|-----------|-------------|
| 1-3 | **Setup & EDA** | Loading, exploration, preprocessing |
| 4-6 | **Ensemble Methods** | Bagging, Boosting, comparisons |
| 7-10 | **Deep Learning** | CNN, Transfer Learning, Fine-Tuning |
| 11-13 | **Optimization & Evaluation** | Tuning, metrics, visualizations |

## 📦 Section 1: Setup and Data Loading

## 🚀 Google Colab version

**To use this notebook on Google Colab, run the Colab cell below instead of the regular (local) configuration cell:**

### 📋 Google Colab instructions

#### ✅ What differs between Colab and local:

1. **📂 Data handling**:
   - Local: dataset in `data/raw/COVID-19_Radiography_Dataset/`
   - Colab: **automatic extraction** of `archive_covid.zip` from Drive

2. **📦 Installation**:
   - Local: `pip install -e .`, run once
   - Colab: installed automatically by the cell

3. **🔧 Configuration**:
   - Local: `.env` loaded automatically
   - Colab: ds-covid package + ZIP extraction

#### 🚀 What the Colab cell does automatically:

1. **📥 Clones the repo** (ReVamp branch): `git clone -b ReVamp https://github.com/L-Poca/DS_COVID.git`
2. **📦 Extracts archive_covid.zip** from `MyDrive/archive_covid.zip`
3. **🔍 Smart search**: locates the COVID dataset folder automatically
4. **Installs dependencies**: `pip install -r requirements.txt`
5. **🔧 Installs the package**: `pip install -e .`
6. **⚙️ Configures paths**: `DATA_DIR` points to the extracted data
7. **✅ Checks the data**: counts the images per class

#### 💾 Your file on Drive:

```
Drive/
├── MyDrive/
│   └── archive_covid.zip  ← your ZIP file
```

**What the script does automatically**:
1. **Extracts** `archive_covid.zip` into `/content/temp_covid_extract/`
2. **Searches** for the COVID folder (several patterns supported)
3. **Moves** it to `/content/DS_COVID/data/raw/COVID-19_Radiography_Dataset/`
4. **Cleans up** the temporary folder
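The four steps above can be written as a standalone function. This is a minimal sketch and not the notebook's own code. The name `extract_dataset` is hypothetical. It only looks for a folder named exactly `COVID-19_Radiography_Dataset`. It uses `tempfile.TemporaryDirectory`, so the scratch folder is removed even when extraction fails.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir, dataset_name="COVID-19_Radiography_Dataset"):
    """Extract `zip_path` and move its `dataset_name` folder to `dest_dir/dataset_name`.

    Returns the final path, or None if the archive has no folder with that name.
    """
    final_path = Path(dest_dir) / dataset_name
    if final_path.exists():
        return final_path  # already extracted by a previous run

    # The temporary folder is deleted on exit, even if an exception is raised
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        # Shallowest match first, so an outer folder wins over a nested one with the same name
        matches = sorted((p for p in Path(tmp).rglob(dataset_name) if p.is_dir()),
                         key=lambda p: len(p.parts))
        if not matches:
            return None
        final_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(matches[0]), str(final_path))
    return final_path
```

On Colab the call would be `extract_dataset("/content/drive/MyDrive/archive_covid.zip", "/content/DS_COVID/data/raw")`. A second call returns immediately because the destination already exists.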

#### 🔍 Smart search:

The script looks for the following, in this order:
- `COVID-19_Radiography_Dataset/`
- Folders matching `*COVID*`
- Folders matching `*radiography*`
- Folders matching `*chest*`
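These patterns have two caveats:

- `glob` matching is case-sensitive on Linux, which Colab runs on, so `*radiography*` does not match a folder named `Radiography`.
- `*COVID*` also matches class folders such as `COVID/`, so the result depends on traversal order.

The sketch below keeps the same pattern order but compares lowercased names and returns the shallowest match. It is an assumption about desirable behavior, not what the Colab cell currently does, and `find_dataset_dir` is a hypothetical helper.

```python
import fnmatch
from pathlib import Path

# Same priority order as the list above; matched against folder names only
SEARCH_PATTERNS = ("COVID-19_Radiography_Dataset", "*COVID*", "*radiography*", "*chest*")


def find_dataset_dir(root, patterns=SEARCH_PATTERNS):
    """Return the shallowest folder under `root` matching the first pattern that matches anything.

    Names are lowercased before matching, so the search is case-insensitive on every OS.
    Returns None when no folder matches.
    """
    # Sort by depth first (then by path) so outer folders win over nested ones
    folders = sorted((p for p in Path(root).rglob("*") if p.is_dir()),
                     key=lambda p: (len(p.parts), str(p)))
    for pattern in patterns:
        for folder in folders:
            if fnmatch.fnmatchcase(folder.name.lower(), pattern.lower()):
                return folder
    return None
```

An exact `COVID-19_Radiography_Dataset` folder still takes priority, even when it is deeper than a looser `*COVID*` match.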

#### 💡 Advantages:

- ✅ **Simple**: just drop archive_covid.zip into Drive
- ✅ **Automatic**: extraction and organization happen without manual steps
- ✅ **Robust**: smart detection of the folder structure
- ✅ **Fast**: no need to create folders by hand

### 🚀 Colab Pro optimizations

#### ⚡ Ambitious settings for a powerful GPU:

**🖼️ Images:**
- `IMG_SIZE`: `(256, 256)` ← power of 2 (assumed to suit GPU kernels better)
- `BATCH_SIZE`: `64` ← larger batch (T4/V100 GPU)

**🎯 Training:**
- `EPOCHS`: `100` ← long training (Colab Pro allows longer sessions)
- `MAX_IMAGES`: `5000` ← cap on the number of images per class

**🤖 Traditional ML:**
- Random Forest: `500 estimators` (vs 200)
- XGBoost: `300 estimators` (vs 100)
- Gradient Boosting: `300 estimators`
- CV: `5 folds` (vs 3)
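In scikit-learn terms, the traditional-ML settings above correspond roughly to the estimators below. This is a sketch under a few assumptions:

- The helper name is hypothetical.
- The seed of 42 comes from the notebook's fallback configuration.
- XGBoost is left out to avoid an extra dependency. With the `xgboost` package it would be `XGBClassifier(n_estimators=300)`.

`StratifiedKFold` is used so that each of the 5 folds keeps the class proportions of this imbalanced dataset.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

RANDOM_SEED = 42  # assumption: same seed as the notebook's fallback configuration


def make_colab_pro_models(seed=RANDOM_SEED):
    """Tree ensembles sized as in the Colab Pro list (XGBoost omitted in this sketch)."""
    return {
        "random_forest": RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(n_estimators=300, random_state=seed),
    }


# Stratified folds preserve the per-class proportions in every fold
CV = StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_SEED)
```

Evaluation would then look like `cross_val_score(models["random_forest"], X, y, cv=CV, scoring="f1_macro")`. Here `X` is a feature matrix (for example flattened, downsized images) that this sketch does not build. Going from 3 to 5 folds means 5 fits instead of 3, each trained on 80% of the data instead of about 67%.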

**🧠 Deep Learning:**
- Architectures: `EfficientNetB3`, `ResNet152V2`, `VGG19`, `DenseNet201`
- Mixed precision: `float16` (GPU speed-up)
- Fine-tuning: `20 layers` unfrozen (vs 10)
- Data augmentation: rotation, zoom, flip, brightness
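"Fine-tuning 20 layers" is read here as unfreezing the last 20 layers of the pretrained backbone while the earlier layers stay frozen. The sketch below is framework-agnostic and assumes the layers expose a `trainable` attribute, as Keras layers do. The function name is hypothetical. Keeping `BatchNormalization` layers frozen follows the Keras transfer-learning guide, not anything stated in this notebook.

```python
def unfreeze_top_layers(layers, n_trainable=20, keep_batchnorm_frozen=True):
    """Freeze all layers except the last `n_trainable`; return how many stay trainable.

    `layers` is any sequence of objects exposing a `trainable` attribute,
    e.g. `base_model.layers` for a Keras model (an assumption of this sketch).
    BatchNormalization layers are detected by class name and kept frozen if requested.
    """
    cutoff = max(len(layers) - n_trainable, 0)
    n_open = 0
    for i, layer in enumerate(layers):
        is_batchnorm = type(layer).__name__ == "BatchNormalization"
        layer.trainable = i >= cutoff and not (keep_batchnorm_frozen and is_batchnorm)
        n_open += int(layer.trainable)
    return n_open
```

With Keras, the sequence would be:

1. Set `base_model.trainable = True`.
2. Call `unfreeze_top_layers(base_model.layers, 20)`.
3. Recompile the model. Changes to `trainable` only take effect after `compile()`, typically with a much lower learning rate than in the frozen phase.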

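#### 💡 Why these optimizations?

1. **T4/V100 GPU**: more memory and compute → larger batch size
2. **Longer sessions**: Colab Pro allows longer training runs → more epochs (sessions are still time-limited)
3. **Power of 2**: `256x256` is expected to map better onto GPU operations than `224x224`, but it also carries about 31% more pixels per image, so the net gain should be measured rather than assumed
4. **Mixed precision**: typically a 1.5-2x speed-up on recent GPUs with Tensor Cores (such as T4 and V100); the exact gain depends on the model
5. **Larger architectures**: generally more accurate than EfficientNetB0/ResNet50, at a higher compute cost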
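Some back-of-the-envelope numbers behind points 1, 3 and 4 follow. The hypothetical helper below only counts the input batch tensor. Activations and gradients take far more memory, so treat the figures as a lower bound.

```python
def batch_input_mib(batch_size, img_size, channels=3, bytes_per_value=4):
    """Size in MiB of one input batch (batch, height, width, channels); float32 = 4 bytes."""
    height, width = img_size
    return batch_size * height * width * channels * bytes_per_value / 2**20


print(batch_input_mib(64, (256, 256)))                     # Colab Pro, float32 -> 48.0
print(batch_input_mib(64, (256, 256), bytes_per_value=2))  # Colab Pro, float16 -> 24.0
print(batch_input_mib(32, (224, 224)))                     # local defaults, float32 -> 18.375
print(round(256 * 256 / (224 * 224), 2))                   # pixel ratio -> 1.31
```

A 256×256 input has about 1.31× as many pixels as a 224×224 one, so each convolutional step does roughly 30% more work. Any speed-up from power-of-2 alignment has to outweigh that, and 224 is itself a multiple of 32.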

In [None]:
# ===================================
# GOOGLE COLAB CONFIGURATION
# ===================================
# ⚠️ Run this cell only on Google Colab
# For a local environment, use the local configuration cell instead

import os
import sys

# Detect whether we are running on Colab
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("🌐 Google Colab detected")
    
    # 1. Mount Google Drive (to access the data)
    from google.colab import drive
    drive.mount('/content/drive')
    
    # 2. Clone the repository from GitHub on the ReVamp branch
    repo_exists = os.path.exists('/content/DS_COVID')
    if not repo_exists:
        print("📥 Cloning the repository (ReVamp branch)...")
        !git clone -b ReVamp https://github.com/L-Poca/DS_COVID.git /content/DS_COVID
        print("✅ Repository cloned (ReVamp branch)")
    else:
        print("✅ Repository already present")
        # Make sure we are on the right branch
        os.chdir('/content/DS_COVID')
        print("🔄 Checking the branch...")
        !git fetch origin
        !git checkout ReVamp
        !git pull origin ReVamp
        print("✅ ReVamp branch up to date")
    
    # 3. Move into the project folder
    os.chdir('/content/DS_COVID')
    print(f"📁 Current folder: {os.getcwd()}")
    
    # pyproject.toml only exists on ReVamp, so its presence confirms the branch
    if os.path.exists('pyproject.toml'):
        print("✅ pyproject.toml found - ReVamp branch confirmed")
    else:
        print("⚠️ pyproject.toml missing - wrong branch?")
    # 4. Extract archive_covid.zip from Google Drive
    print("📁 Extracting archive_covid.zip from Drive...")
    
    # Path to your ZIP file on Drive
    zip_file_path = "/content/drive/MyDrive/archive_covid.zip"
    
    # Destination folder
    local_data_dir = "/content/DS_COVID/data/raw"
    local_data_path = f"{local_data_dir}/COVID-19_Radiography_Dataset"
    
    # Create data/raw
    os.makedirs(local_data_dir, exist_ok=True)
    
    if os.path.exists(zip_file_path):
        print(f"✅ Archive found: {zip_file_path}")
        
        if not os.path.exists(local_data_path):
            print("📦 Extracting the COVID archive...")
            import zipfile
            import shutil
            import glob
            from pathlib import Path  # used below, before the global imports at the end of the cell
            
            # Extract into a temporary folder
            temp_extract_path = "/content/temp_covid_extract"
            
            with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
                zip_ref.extractall(temp_extract_path)
                print(f"✅ Archive extracted to {temp_extract_path}")
            
            # Smart search for the COVID folder (the first pattern with a match wins)
            search_patterns = [
                f"{temp_extract_path}/**/COVID-19_Radiography_Dataset",
                f"{temp_extract_path}/**/*COVID*",
                f"{temp_extract_path}/**/*radiography*",
                f"{temp_extract_path}/**/*chest*"
            ]
            
            covid_source = None
            for pattern in search_patterns:
                possible_dirs = glob.glob(pattern, recursive=True)
                possible_dirs = [d for d in possible_dirs if os.path.isdir(d)]
                if possible_dirs:
                    covid_source = possible_dirs[0]
                    break
            
            if covid_source:
                print(f"✅ COVID folder found: {covid_source}")
                
                # Check the structure (should contain COVID, Normal, etc.)
                subdirs = [d.name for d in Path(covid_source).iterdir() if d.is_dir()]
                print(f"   Subfolders: {subdirs}")
                
                # Move to the final destination
                if os.path.exists(local_data_path):
                    shutil.rmtree(local_data_path)
                shutil.move(covid_source, local_data_path)
                
                # Remove the temporary folder
                shutil.rmtree(temp_extract_path)
                print(f"✅ COVID data available: {local_data_path}")
            else:
                print("⚠️ ZIP structure not recognized automatically")
                # Show the top-level content for debugging
                for root, dirs, files in os.walk(temp_extract_path):
                    if dirs:
                        print(f"   📁 {root}: {dirs[:3]}...")  # first 3 folders
                        break
                print(f"   Check manually in: {temp_extract_path}")
        else:
            print(f"✅ Data already extracted: {local_data_path}")
    else:
        print(f"❌ Archive not found: {zip_file_path}")
        print("💡 Possible fixes:")
        print("   1. Check that 'archive_covid.zip' is in MyDrive/")
        print("   2. Or set zip_file_path to the correct path")
        print("   3. Or upload archive_covid.zip to MyDrive/")
    # 5. Install the dependencies from requirements.txt
    print("📦 Installing dependencies...")
    !pip install -r requirements.txt
    
    # 6. Install the ds-covid package in editable (development) mode
    print("📦 Installing the ds-covid package...")
    !pip install -e .
    
    # 7. Check the installation and load the configuration
    try:
        import ds_covid
        from ds_covid import Settings
        print(f"✅ ds-covid package v{ds_covid.__version__} installed successfully")
        
        # Configuration tuned for Colab Pro
        settings = Settings()
        
        # Adapt the paths to Colab
        from pathlib import Path
        PROJECT_ROOT = Path('/content/DS_COVID')
        
        # Locate the extracted data
        if os.path.exists(local_data_path):
            # Usual layout: COVID-19_Radiography_Dataset/COVID-19_Radiography_Dataset/
            inner_covid_path = Path(local_data_path) / 'COVID-19_Radiography_Dataset'
            if inner_covid_path.exists():
                DATA_DIR = inner_covid_path
            else:
                DATA_DIR = Path(local_data_path)
        else:
            DATA_DIR = Path(local_data_path)
        
        MODELS_DIR = PROJECT_ROOT / 'models'
        RESULTS_DIR = PROJECT_ROOT / 'results'
        
        # Training variables (expected Colab Pro values in the comments)
        BATCH_SIZE = settings.training.batch_size      # 64
        EPOCHS = settings.training.epochs              # 100
        LEARNING_RATE = settings.training.learning_rate
        IMG_SIZE = settings.training.img_size          # (256, 256)
        IMG_CHANNELS = settings.training.img_channels
        TEST_SPLIT = settings.training.test_split
        VALIDATION_SPLIT = settings.training.validation_split
        RANDOM_SEED = settings.training.random_seed
        
        # Advanced configuration
        MAX_IMAGES_PER_CLASS = settings.data.max_images_per_class
        AUGMENTATION_PARAMS = settings.data.augmentation_params
        DL_ARCHITECTURES = settings.deep_learning.architectures
        MIXED_PRECISION = settings.deep_learning.mixed_precision
        
        # Classes
        CLASSES = settings.data.class_names
        CLASS_MAPPING = settings.data.class_mapping
        NUM_CLASSES = settings.data.num_classes
        
        print("✅ COLAB PRO configuration:")
        print(f"   - Data: {DATA_DIR}")
        print(f"   - Classes: {CLASSES}")
        print(f"   - Image size: {IMG_SIZE}")
        print(f"   - Batch size: {BATCH_SIZE}")
        print(f"   - Epochs: {EPOCHS}")
        
    except (ImportError, AttributeError) as e:
        # AttributeError: the installed Settings lacks one of the fields read above
        print(f"⚠️ Package error: {e}")
        # Manual fallback configuration (same Colab Pro values)
        from pathlib import Path
        PROJECT_ROOT = Path('/content/DS_COVID')
        inner_covid_path = Path(local_data_path) / 'COVID-19_Radiography_Dataset'
        DATA_DIR = inner_covid_path if inner_covid_path.exists() else Path(local_data_path)
        MODELS_DIR = PROJECT_ROOT / 'models'
        RESULTS_DIR = PROJECT_ROOT / 'results'
        
        BATCH_SIZE = 64
        EPOCHS = 100
        LEARNING_RATE = 0.001
        IMG_SIZE = (256, 256)
        IMG_CHANNELS = 3
        TEST_SPLIT = 0.2
        VALIDATION_SPLIT = 0.2
        RANDOM_SEED = 42
        MAX_IMAGES_PER_CLASS = 5000
        
        CLASSES = ['COVID', 'Lung_Opacity', 'Normal', 'Viral Pneumonia']
        CLASS_MAPPING = {'COVID': 0, 'Lung_Opacity': 1, 'Normal': 2, 'Viral Pneumonia': 3}
        NUM_CLASSES = len(CLASSES)
    # Create the output folders
    MODELS_DIR.mkdir(parents=True, exist_ok=True)
    RESULTS_DIR.mkdir(parents=True, exist_ok=True)
    
    # Final data check
    if DATA_DIR.exists():
        subdirs = [d.name for d in DATA_DIR.iterdir() if d.is_dir()]
        print(f"   - Folders found: {subdirs}")
        
        # Quick image count (in this dataset, images live in <class>/images/)
        total_images = 0
        for subdir in subdirs[:4]:  # first 4 classes
            subdir_path = DATA_DIR / subdir
            img_dir = subdir_path / 'images' if (subdir_path / 'images').is_dir() else subdir_path
            images = list(img_dir.glob('*.png')) + list(img_dir.glob('*.jpg'))
            print(f"     {subdir}: {len(images)} images")
            total_images += len(images)
        print(f"   - Total: ~{total_images} images")
    else:
        print(f"   - ⚠️ Folder not accessible: {DATA_DIR}")
    
else:
    print("💻 Local environment detected - use the regular configuration cell")
    
# Common imports (work in both environments)
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import cv2
from PIL import Image
import pickle
import joblib
from datetime import datetime
import json

warnings.filterwarnings('ignore')

# Machine Learning
import sklearn
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.ensemble import BaggingClassifier, VotingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, roc_curve

# Deep Learning
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, callbacks, optimizers
from tensorflow.keras.applications import VGG16, ResNet50, EfficientNetB0
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical

# GPU configuration
physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:
    # Memory growth must be set on each GPU before it is initialized
    for gpu in physical_devices:
        tf.config.experimental.set_memory_growth(gpu, True)
    print(f"✅ GPU detected: {len(physical_devices)} device(s)")
else:
    print("⚠️ No GPU detected, using the CPU")

if IN_COLAB:
    print("\n📦 Versions (Colab):")
    print(f"   - Python: {sys.version.split()[0]}")
    print(f"   - Scikit-learn: {sklearn.__version__}")
    print(f"   - TensorFlow: {tf.__version__}")
    print(f"   - OpenCV: {cv2.__version__}")
    print(f"   - NumPy: {np.__version__}")
    print(f"   - Pandas: {pd.__version__}")
    print("\n✅ Colab configuration complete!")
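The data check at the end of the Colab cell depends on two layout details:

- The archive may nest `COVID-19_Radiography_Dataset` inside a folder of the same name.
- Each class keeps its images in an `images/` subfolder, the layout the later loading functions assume.

The sketch below implements both checks on their own. `resolve_data_dir` and `count_images_per_class` are hypothetical helpers, not part of the ds-covid package. Sibling folders such as a possible `masks/` are ignored when `images/` exists.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def resolve_data_dir(base, name="COVID-19_Radiography_Dataset"):
    """Return base/name if the archive nests the dataset one level deeper, else base."""
    base = Path(base)
    inner = base / name
    return inner if inner.is_dir() else base


def count_images_per_class(data_dir):
    """Count image files per class folder, reading <class>/images/ when it exists."""
    counts = {}
    for class_dir in sorted(p for p in Path(data_dir).iterdir() if p.is_dir()):
        img_dir = class_dir / "images" if (class_dir / "images").is_dir() else class_dir
        # Extensions are compared in lowercase, so .PNG and .png are both counted
        counts[class_dir.name] = sum(
            1 for f in img_dir.iterdir() if f.is_file() and f.suffix.lower() in IMAGE_EXTS
        )
    return counts
```

`count_images_per_class(resolve_data_dir(local_data_path))` would then replace the manual counting loop.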

In [None]:
# Alternative Google Colab setup using archive_covid.zip (shorter variant of the Colab cell above)
# ⚠️ Colab only: importing google.colab fails outside Colab
import os
import sys
from google.colab import drive

print("GOOGLE COLAB PRO CONFIGURATION")

# Mount Google Drive
drive.mount('/content/drive')

# Clone the repo (ReVamp branch, which contains pyproject.toml) if not already done
if not os.path.exists('/content/DS_COVID'):
    print("📥 Cloning the repository...")
    !git clone -b ReVamp https://github.com/L-Poca/DS_COVID.git /content/DS_COVID

# Move into the project folder
os.chdir('/content/DS_COVID')

# Smart extraction of archive_covid.zip from Drive
import zipfile
import shutil

def find_and_extract_covid_archive():
    """Find archive_covid.zip on Google Drive, extract it into data/raw/ and return the dataset path (None if not found)."""
    possible_paths = [
        '/content/drive/MyDrive/archive_covid.zip',
        '/content/drive/My Drive/archive_covid.zip',
        '/content/drive/MyDrive/COVID/archive_covid.zip',
        '/content/drive/My Drive/COVID/archive_covid.zip'
    ]
    
    for zip_path in possible_paths:
        if os.path.exists(zip_path):
            print(f"📦 Archive found: {zip_path}")
            
            # Create data/raw if it does not exist
            os.makedirs('/content/DS_COVID/data/raw', exist_ok=True)
            
            # Extract the archive
            with zipfile.ZipFile(zip_path, 'r') as zip_ref:
                zip_ref.extractall('/content/DS_COVID/data/raw/')
            
            # Look for the COVID-19_Radiography_Dataset folder
            # (os.walk is top-down, so an outer folder is found before a nested one of the same name)
            for root, dirs, files in os.walk('/content/DS_COVID/data/raw/'):
                for dir_name in dirs:
                    if 'COVID' in dir_name and 'Radiography' in dir_name:
                        dataset_path = os.path.join(root, dir_name)
                        print(f"✅ Dataset found: {dataset_path}")
                        return dataset_path
            
            print("⚠️ COVID-19_Radiography_Dataset folder not found in the archive")
            return None
    
    print("❌ archive_covid.zip not found on Drive")
    print("📂 Check that the file is in MyDrive/")
    return None

# Extract the data
dataset_path = find_and_extract_covid_archive()

# Install the package
print("📦 Installing the ds-covid package...")
!pip install -e .

In [None]:
# ===================================
# 1.1 LOCAL CONFIGURATION (VS Code / local environment)
# ===================================
# ⚠️ This cell is for a LOCAL environment with the ds-covid package installed
# For Google Colab, use the Colab cell above instead

# ds-covid package configuration (reads .env automatically)
from ds_covid import Settings, configure_package, __version__
import ds_covid

# Basic imports
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import cv2
from PIL import Image
import pickle
import joblib
from datetime import datetime
import json
import sys

# Silence warnings
warnings.filterwarnings('ignore')

# Machine Learning - Scikit-learn
import sklearn
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.ensemble import BaggingClassifier, VotingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, roc_curve

# Deep Learning - TensorFlow/Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, callbacks, optimizers
from tensorflow.keras.applications import VGG16, ResNet50, EfficientNetB0
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical

# GPU configuration
physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:
    # Memory growth must be set on each GPU before it is initialized
    for gpu in physical_devices:
        tf.config.experimental.set_memory_growth(gpu, True)
    print(f"✅ GPU detected: {len(physical_devices)} device(s)")
else:
    print("⚠️ No GPU detected, using the CPU")

# Automatic configuration from .env
print(f"📦 ds-covid package v{__version__}")
print("🔧 Loading the configuration from .env...")

# The package loads the settings from .env automatically
settings = Settings()

# Show the loaded configuration
print("✅ Configuration loaded from .env:")
print(f"   - PROJECT_ROOT: {settings.project_root}")
print(f"   - DATA_DIR: {settings.data_dir}")
print(f"   - MODELS_DIR: {settings.models_dir}")
print(f"   - RESULTS_DIR: {settings.results_dir}")

# Global variables from the settings (.env)
RANDOM_SEED = settings.training.random_seed
BATCH_SIZE = settings.training.batch_size
EPOCHS = settings.training.epochs
LEARNING_RATE = settings.training.learning_rate
IMG_SIZE = settings.training.img_size
IMG_CHANNELS = settings.training.img_channels
TEST_SPLIT = settings.training.test_split
VALIDATION_SPLIT = settings.training.validation_split

# Data variables from .env
MAX_IMAGES_PER_CLASS = settings.data.max_images_per_class

# Paths from .env
PROJECT_ROOT = Path(settings.project_root)
DATA_DIR = Path(settings.data_dir)
MODELS_DIR = Path(settings.models_dir)
RESULTS_DIR = Path(settings.results_dir)

# Check that the paths exist (output folders are created if missing)
print("\n📁 Checking paths (.env):")
for name, path in [('PROJECT_ROOT', PROJECT_ROOT), ('DATA_DIR', DATA_DIR), 
                   ('MODELS_DIR', MODELS_DIR), ('RESULTS_DIR', RESULTS_DIR)]:
    if path.exists():
        print(f"   ✅ {name}: {path}")
    else:
        print(f"   ❌ {name}: {path} (missing)")
        if name in ['MODELS_DIR', 'RESULTS_DIR']:
            path.mkdir(parents=True, exist_ok=True)
            print(f"   ✅ {name}: created")

# COVID dataset classes (from the configuration)
CLASSES = settings.data.class_names
CLASS_MAPPING = settings.data.class_mapping
NUM_CLASSES = settings.data.num_classes

# Configuration summary
print("\n🎯 Final configuration (from .env):")
print(f"   - Classes ({NUM_CLASSES}): {CLASSES}")
print(f"   - Image size: {IMG_SIZE}")
print(f"   - Batch size: {BATCH_SIZE}")
print(f"   - Epochs: {EPOCHS}")
print(f"   - Max images/class: {MAX_IMAGES_PER_CLASS}")

# Mixed precision (float16) is only enabled when a GPU is present; on CPU it is typically slower
if tf.config.list_physical_devices('GPU'):
    from tensorflow.keras import mixed_precision
    mixed_precision.set_global_policy('mixed_float16')
    print("✅ Mixed precision float16 enabled")
else:
    print("⚠️ No GPU: mixed precision left disabled (float32)")

print("🎉 Configuration complete!")

⚠️ No GPU detected, using the CPU
📦 ds-covid package v0.1.0
🔧 Loading the configuration from .env...
📄 .env file loaded: /home/cepa/DST/projet_DS/DS_COVID/.env
✅ Configuration loaded from .env:
   - PROJECT_ROOT: /home/cepa/DST/projet_DS/DS_COVID
   - DATA_DIR: /home/cepa/DST/projet_DS/DS_COVID/data/raw/COVID-19_Radiography_Dataset/COVID-19_Radiography_Dataset
   - MODELS_DIR: /home/cepa/DST/projet_DS/DS_COVID/models
   - RESULTS_DIR: /home/cepa/DST/projet_DS/DS_COVID/reports

📁 Checking paths (.env):
   ✅ PROJECT_ROOT: /home/cepa/DST/projet_DS/DS_COVID
   ✅ DATA_DIR: /home/cepa/DST/projet_DS/DS_COVID/data/raw/COVID-19_Radiography_Dataset/COVID-19_Radiography_Dataset
   ✅ MODELS_DIR: /home/cepa/DST/projet_DS/DS_COVID/models
   ✅ RESULTS_DIR: /home/cepa/DST/projet_DS/DS_COVID/reports

🎯 Final configuration (from .env):
   - Classes (4): ['COVID', 'Lung_Opacity', 'Normal', 'Viral Pneumonia']
   - Mapping: {'COVID': 0, 'L

In [7]:
Path.cwd()

PosixPath('/home/cepa/DST/projet_DS/DS_COVID/notebooks')

In [6]:
# ===================================
# 1.2 CENTRALIZED CONFIGURATION WITH .ENV
# ===================================
import sys
from pathlib import Path

# Import the configuration manager (relative path)
# Locate src/ from the notebook's working directory
# (__file__ is not defined inside a notebook, so the current directory is used instead)
cwd = Path.cwd()
project_root = cwd.parent if cwd.name == 'notebooks' else cwd
src_path = project_root / 'src'

if str(src_path) not in sys.path:
    sys.path.append(str(src_path))
from config import config_manager, get_config, setup_environment

print("🔧 Loading the centralized configuration...")

# Print the configuration summary
config_manager.print_summary()

# Create the required directories
config_manager.create_directories()

# Set up the environment
setup_environment()

# Read the configuration values
PROJECT_ROOT = get_config('paths', 'project_root')
DATA_DIR = get_config('paths', 'data_dir')
MODELS_DIR = get_config('paths', 'models_dir')
RESULTS_DIR = get_config('paths', 'results_dir')

# Image settings
IMG_SIZE = get_config('image', 'img_size')
IMG_CHANNELS = get_config('image', 'img_channels')

# Training parameters from .env
BATCH_SIZE = get_config('training', 'batch_size')
EPOCHS = get_config('training', 'epochs')
LEARNING_RATE = get_config('training', 'learning_rate')
VALIDATION_SPLIT = get_config('training', 'validation_split')
TEST_SPLIT = get_config('training', 'test_split')
RANDOM_SEED = get_config('training', 'random_seed')

# Classes from the configuration
CLASSES = get_config('classes', 'class_names')
CLASS_MAPPING = get_config('classes', 'class_mapping')

print("\n✅ Configuration loaded from .env!")
print("📊 Main parameters:")
print(f"   🖼️ Image size: {IMG_SIZE}")
print(f"   📦 Batch size: {BATCH_SIZE}")
print(f"   🔄 Epochs: {EPOCHS}")
print(f"   🎯 Classes: {len(CLASSES)} ({', '.join(CLASSES)})")
print(f"   📁 Data directory: {DATA_DIR}")

# Seeds for reproducibility
np.random.seed(RANDOM_SEED)
tf.random.set_seed(RANDOM_SEED)

print(f"🎲 Seed set: {RANDOM_SEED}")
print("🔧 Environment ready with centralized configuration!")

✅ Environment variables loaded from: /home/cepa/DST/projet_DS/DS_COVID/.env
🔧 Loading the centralized configuration...
🔧 DS_COVID PROJECT CONFIGURATION

📁 PATHS:
   ✅ project_root: /home/cepa/DST/projet_DS/DS_COVID
   ✅ data_dir: /home/cepa/DST/projet_DS/DS_COVID/data/raw/COVID-19_Radiography_Dataset/COVID-19_Radiography_Dataset
   ✅ models_dir: /home/cepa/DST/projet_DS/DS_COVID/models
   ✅ results_dir: /home/cepa/DST/projet_DS/DS_COVID/reports
   ✅ notebooks_dir: /home/cepa/DST/projet_DS/DS_COVID/notebooks

🖼️ IMAGES:
   📐 Size: (224, 224)
   🎨 Channels: 3

🎯 TRAINING:
   📊 Batch size: 32
   🔄 Epochs: 50
   📈 Learning rate: 0.001

🏷️ CLASSES:
   📋 4 classes: COVID, Lung_Opacity, Normal, Viral Pneumonia
📁 data_dir: /home/cepa/DST/projet_DS/DS_COVID/data/raw/COVID-19_Radiography_Dataset/COVID-19_Radiography_Dataset
📁 models_dir: /home/cepa/DST/projet_DS/DS_COVID/models
📁 results_dir: /home/cepa/DST/pr
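The cell above derives the project root from the name of the working directory. That only works when the notebook runs from `notebooks/` or from the root itself, and `__file__` is not defined inside a notebook. A more tolerant approach walks up the parent folders until it finds a file that marks the repository root. This is a sketch only. The function name is hypothetical, and the marker list (`pyproject.toml`, `.env`, `.git`) is an assumption based on the files this notebook relies on.

```python
from pathlib import Path

# Assumed markers: any of these identifies the repository root
ROOT_MARKERS = ("pyproject.toml", ".env", ".git")


def find_project_root(start=None, markers=ROOT_MARKERS):
    """Walk up from `start` (default: current directory) to the first folder holding a marker."""
    start = Path(start or Path.cwd()).resolve()
    for folder in (start, *start.parents):
        if any((folder / marker).exists() for marker in markers):
            return folder
    raise FileNotFoundError(f"No project marker {markers} found above {start}")
```

`src_path = find_project_root() / 'src'` would then work from `notebooks/`, from the root, or from any deeper subfolder.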

In [None]:
# ===================================
# 1.2b PATH AND DATASET CONFIGURATION (hard-coded alternative)
# ===================================
# Alternative to the .env cell: absolute paths specific to one machine, adjust PROJECT_ROOT before running
from pathlib import Path

# Project paths
PROJECT_ROOT = Path("/home/cepa/DST/projet_DS/DS_COVID")
DATA_DIR = PROJECT_ROOT / "data" / "raw" / "COVID-19_Radiography_Dataset" / "COVID-19_Radiography_Dataset"
MODELS_DIR = PROJECT_ROOT / "models"
RESULTS_DIR = PROJECT_ROOT / "reports"

# Create the output folders
MODELS_DIR.mkdir(parents=True, exist_ok=True)
RESULTS_DIR.mkdir(parents=True, exist_ok=True)

print(f"📁 Project directory: {PROJECT_ROOT}")
print(f"📁 Data directory: {DATA_DIR}")
print(f"📁 Models directory: {MODELS_DIR}")

# Class definitions
CLASSES = ["COVID", "Lung_Opacity", "Normal", "Viral Pneumonia"]
CLASS_MAPPING = {cls: idx for idx, cls in enumerate(CLASSES)}

print(f"🏷️ Classes: {CLASSES}")
print(f"🔢 Class mapping: {CLASS_MAPPING}")

# Global parameters
IMG_SIZE = (224, 224)  # default input size of ImageNet-pretrained VGG16/ResNet50/EfficientNetB0
BATCH_SIZE = 32
EPOCHS = 50
LEARNING_RATE = 0.001
VALIDATION_SPLIT = 0.2
TEST_SPLIT = 0.2

print(f"⚙️ Image size: {IMG_SIZE}")
print(f"⚙️ Batch size: {BATCH_SIZE}")
print(f"⚙️ Number of epochs: {EPOCHS}")

# Check that the data exists
if DATA_DIR.exists():
    print("✅ Data directory found")
    for class_name in CLASSES:
        class_path = DATA_DIR / class_name / "images"
        if class_path.exists():
            n_images = len(list(class_path.glob("*.png")))
            print(f"   📊 {class_name}: {n_images} images")
        else:
            print(f"   ❌ {class_name}: folder not found")
else:
    print("❌ Data directory not found!")
    print(f"   Check the path: {DATA_DIR}")
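The cell above sets `TEST_SPLIT = 0.2` and `VALIDATION_SPLIT = 0.2` but does not say what the validation fraction is relative to. The sketch below performs a stratified three-way split. It assumes the validation fraction applies to the data left after the test set is removed, which gives roughly 64% train, 16% validation and 20% test. The function name is hypothetical. `train_test_split` with `stratify=` is the scikit-learn function already imported in this notebook.

```python
from sklearn.model_selection import train_test_split


def stratified_three_way_split(paths, labels, test_split=0.2, validation_split=0.2, seed=42):
    """Split (paths, labels) into train/validation/test sets, stratified by label.

    Assumption: `validation_split` is a fraction of the non-test data.
    """
    # Step 1: hold out the test set, keeping class proportions
    x_rest, x_test, y_rest, y_test = train_test_split(
        paths, labels, test_size=test_split, stratify=labels, random_state=seed)
    # Step 2: carve the validation set out of what remains
    x_train, x_val, y_train, y_val = train_test_split(
        x_rest, y_rest, test_size=validation_split, stratify=y_rest, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```

If `VALIDATION_SPLIT` is instead meant as a fraction of the whole dataset, the second call would use `validation_split / (1 - test_split)`.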

In [None]:
# ===================================
# 1.3 UTILITY FUNCTIONS FOR DATA LOADING
# ===================================

def load_image_paths_and_labels(data_dir, classes):
    """
    Collect image paths and their labels.
    
    Args:
        data_dir (Path): dataset root with one folder per class
        classes (list): class names; images are read from <data_dir>/<class>/images/*.png
    
    Returns:
        tuple: (image_paths, labels, class_counts)
    """
    image_paths = []
    labels = []
    class_counts = {}
    
    for class_name in classes:
        class_dir = Path(data_dir) / class_name / "images"
        if not class_dir.exists():
            continue  # missing classes are skipped silently
            
        # Collect the images (sorted so the order, and thus sampling, is reproducible)
        image_files = sorted(class_dir.glob("*.png"))
        class_counts[class_name] = len(image_files)
        
        # Add paths and labels
        for img_path in image_files:
            image_paths.append(str(img_path))
            labels.append(class_name)
    
    return image_paths, labels, class_counts

def load_and_preprocess_image(image_path, target_size=(224, 224)):
    """
    Load and preprocess one image.
    
    Args:
        image_path (str): path to the image
        target_size (tuple): target size as (width, height), PIL's convention
    
    Returns:
        np.ndarray: float32 array of shape (height, width, 3) scaled to [0, 1], or None on failure
    """
    try:
        # Load with PIL, convert grayscale X-rays to 3 channels and resize; `with` closes the file
        with Image.open(image_path) as raw:
            img = raw.convert('RGB').resize(target_size)
        
        # Convert to a NumPy array and normalize to [0, 1]
        img_array = np.asarray(img, dtype=np.float32) / 255.0
        
        return img_array
    
    except Exception as e:
        print(f"Error while loading {image_path}: {e}")
        return None

def create_balanced_subset(image_paths, labels, max_per_class=500, random_state=42):
    """
    Build a balanced subset by undersampling: at most `max_per_class` images per class.
    
    Args:
        image_paths (list): image paths
        labels (list): labels
        max_per_class (int): maximum number of images per class
        random_state (int): seed for reproducible sampling
    
    Returns:
        tuple: (subset_paths, subset_labels)
    """
    df = pd.DataFrame({'path': image_paths, 'label': labels})
    
    # Shuffle once, then keep the first max_per_class rows of each class
    balanced_df = (
        df.sample(frac=1, random_state=random_state)
        .groupby('label')
        .head(max_per_class)
        .reset_index(drop=True)
    )
    
    return balanced_df['path'].tolist(), balanced_df['label'].tolist()

print("🔧 Utility functions defined")

## 📊 Section 2: Exploratory Data Analysis

In [None]:
# ===================================
# 2.1 LOADING AND CLASS DISTRIBUTION ANALYSIS
# ===================================

# Load image paths and labels
print("📂 Loading data...")
image_paths, labels, class_counts = load_image_paths_and_labels(DATA_DIR, CLASSES)

print(f"📊 Total images: {len(image_paths)}")
print(f"📊 Total labels: {len(labels)}")

# DataFrame used for the analysis
df_analysis = pd.DataFrame({
    'image_path': image_paths,
    'label': labels
})

# Class distribution
print("\n🏷️ Class distribution:")
class_distribution = df_analysis['label'].value_counts()
print(class_distribution)

# Plot the distribution
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Bar chart
class_distribution.plot(kind='bar', ax=ax1, color='skyblue', alpha=0.8)
ax1.set_title('Class Distribution', fontsize=14, fontweight='bold')
ax1.set_xlabel('Class', fontsize=12)
ax1.set_ylabel('Number of images', fontsize=12)
ax1.tick_params(axis='x', rotation=45)

# Pie chart
ax2.pie(class_distribution.values, labels=class_distribution.index, autopct='%1.1f%%', 
        startangle=90, colors=plt.cm.Set3.colors)
ax2.set_title('Class Share (%)', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

# Detailed statistics
print("\n📈 Detailed statistics:")
total_images = len(image_paths)
for class_name, count in class_distribution.items():
    percentage = (count / total_images) * 100
    print(f"   {class_name}: {count} images ({percentage:.2f}%)")

# Imbalance check: ratio between the largest and the smallest class
max_count = class_distribution.max()
min_count = class_distribution.min()
imbalance_ratio = max_count / min_count

print(f"\n⚖️ Imbalance ratio (largest / smallest class): {imbalance_ratio:.2f}")
if imbalance_ratio > 2:
    print("⚠️  Imbalanced dataset detected - rebalancing techniques recommended")
else:
    print("✅ Dataset is reasonably balanced")

In [None]:
# ===================================
# 2.2 SAMPLE IMAGES PER CLASS
# ===================================

def visualize_sample_images(image_paths, labels, classes, n_samples=3):
    """
    Display a few randomly chosen images for each class
    """
    fig, axes = plt.subplots(len(classes), n_samples, figsize=(15, 12), squeeze=False)
    fig.suptitle('Sample Images per Class', fontsize=16, fontweight='bold')
    
    df_viz = pd.DataFrame({'path': image_paths, 'label': labels})
    
    for i, class_name in enumerate(classes):
        class_images = df_viz[df_viz['label'] == class_name]['path'].tolist()
        
        # Random selection without replacement
        samples = np.random.choice(class_images, min(n_samples, len(class_images)), replace=False)
        
        for j, img_path in enumerate(samples):
            try:
                # Load the image (OpenCV reads BGR, convert to RGB for matplotlib)
                img = cv2.imread(img_path)
                img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
                
                # Display
                axes[i, j].imshow(img)
                axes[i, j].set_title(f'{class_name}', fontsize=10)
                axes[i, j].axis('off')
                
            except Exception as e:
                axes[i, j].text(0.5, 0.5, f'Error:\n{str(e)}', 
                              ha='center', va='center', transform=axes[i, j].transAxes)
                axes[i, j].axis('off')
    
    plt.tight_layout()
    plt.show()

# Show the samples
print("🖼️ Displaying sample images...")
visualize_sample_images(image_paths, labels, CLASSES, n_samples=4)

In [None]:
# ===================================
# 2.3 IMAGE PROPERTY ANALYSIS
# ===================================

def analyze_image_properties(image_paths, sample_size=100):
    """
    Analyze image properties (size, channels, intensity, file size) on a random sample
    """
    print(f"🔍 Analyzing properties on a sample of {sample_size} images...")
    
    # Random sampling
    sample_paths = np.random.choice(image_paths, min(sample_size, len(image_paths)), replace=False)
    
    properties = {
        'widths': [],
        'heights': [],
        'channels': [],
        'mean_intensities': [],
        'std_intensities': [],
        'file_sizes': []
    }
    
    for img_path in sample_paths:
        try:
            # PIL for the basic properties
            with Image.open(img_path) as img_pil:
                width, height = img_pil.size
                n_channels = len(img_pil.getbands())
            
            # OpenCV (grayscale) for the intensity analysis
            img_cv = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
            if img_cv is None:
                raise ValueError("OpenCV could not read the image")
            file_size_kb = os.path.getsize(img_path) / 1024
            
            # Append only once every value is computed, so all lists keep the same length
            properties['widths'].append(width)
            properties['heights'].append(height)
            properties['channels'].append(n_channels)
            properties['mean_intensities'].append(np.mean(img_cv))
            properties['std_intensities'].append(np.std(img_cv))
            properties['file_sizes'].append(file_size_kb)
            
        except Exception as e:
            print(f"Error with {img_path}: {e}")
            continue
    
    return properties

# Run the analysis
properties = analyze_image_properties(image_paths, sample_size=200)

# DataFrame for the analysis
df_props = pd.DataFrame(properties)

# Summary statistics
print("\n📊 Image property statistics:")
print(df_props.describe())

# Plot the distributions
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Distribution of Image Properties', fontsize=16, fontweight='bold')

# Dimensions
axes[0, 0].hist(df_props['widths'], bins=20, alpha=0.7, color='skyblue')
axes[0, 0].set_title('Width Distribution')
axes[0, 0].set_xlabel('Width (pixels)')

axes[0, 1].hist(df_props['heights'], bins=20, alpha=0.7, color='lightgreen')
axes[0, 1].set_title('Height Distribution')
axes[0, 1].set_xlabel('Height (pixels)')

# Intensities
axes[0, 2].hist(df_props['mean_intensities'], bins=20, alpha=0.7, color='coral')
axes[0, 2].set_title('Mean Intensity Distribution')
axes[0, 2].set_xlabel('Mean intensity')

axes[1, 0].hist(df_props['std_intensities'], bins=20, alpha=0.7, color='gold')
axes[1, 0].set_title('Intensity Standard Deviation Distribution')
axes[1, 0].set_xlabel('Intensity standard deviation')

# File sizes
axes[1, 1].hist(df_props['file_sizes'], bins=20, alpha=0.7, color='mediumpurple')
axes[1, 1].set_title('File Size Distribution')
axes[1, 1].set_xlabel('Size (KB)')

# Width vs height
axes[1, 2].scatter(df_props['widths'], df_props['heights'], alpha=0.6, color='darkblue')
axes[1, 2].set_title('Width vs Height')
axes[1, 2].set_xlabel('Width (pixels)')
axes[1, 2].set_ylabel('Height (pixels)')

plt.tight_layout()
plt.show()

# Detect non-standard image dimensions
print("\n📐 Image dimensions found:")
unique_dimensions = df_props.groupby(['widths', 'heights']).size().reset_index(name='count')
print(unique_dimensions.sort_values('count', ascending=False))

## 🔧 Section 3: Data Preprocessing and Augmentation

In [None]:
# ===================================
# 3.1 TRAINING DATA PREPARATION
# ===================================

print("🔄 Preparing the dataset for training...")

# Class-balanced subset to keep processing times reasonable
print("⚖️ Building a balanced subset...")
balanced_paths, balanced_labels = create_balanced_subset(
    image_paths, labels, max_per_class=1000
)

print(f"📊 Balanced dataset: {len(balanced_paths)} images")

# Check the new class balance
balanced_distribution = pd.Series(balanced_labels).value_counts()
print("🏷️ New distribution:")
print(balanced_distribution)

# Label encoding (LabelEncoder sorts the class names alphabetically)
label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(balanced_labels)

print(f"🔢 Encoded classes: {dict(zip(label_encoder.classes_, range(len(label_encoder.classes_))))}")

# Train/validation/test split
print("✂️ Splitting the dataset...")

# First split: (train + val) / test
X_temp, X_test, y_temp, y_test = train_test_split(
    balanced_paths, encoded_labels,
    test_size=TEST_SPLIT,
    stratify=encoded_labels,
    random_state=42
)

# Second split: train / val
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp,
    test_size=VALIDATION_SPLIT/(1-TEST_SPLIT),  # Rescaled so that val is VALIDATION_SPLIT of the full dataset
    stratify=y_temp,
    random_state=42
)

print(f"📊 Train: {len(X_train)} images")
print(f"📊 Validation: {len(X_val)} images")
print(f"📊 Test: {len(X_test)} images")

# Check the stratification
print("\n🎯 Stratification check:")
for split_name, y_split in [("Train", y_train), ("Val", y_val), ("Test", y_test)]:
    distribution = pd.Series(y_split).value_counts().sort_index()
    percentages = (distribution / len(y_split) * 100).round(1)
    print(f"{split_name}: {dict(zip(label_encoder.classes_, percentages.values))}")

In [None]:
# ===================================
# 3.2 IMAGE PREPROCESSING PIPELINE
# ===================================

class ImagePreprocessor:
    """Image preprocessing pipeline: load, convert to RGB, resize, normalize"""
    
    def __init__(self, target_size=(224, 224), normalize=True):
        # target_size is (height, width), matching the (H, W, 3) output arrays
        self.target_size = target_size
        self.normalize = normalize
    
    def preprocess_single_image(self, image_path):
        """Preprocess a single image"""
        try:
            # Load (OpenCV reads BGR) and convert to RGB
            img = cv2.imread(image_path)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            
            # Resize: cv2.resize expects (width, height), hence the swap
            img = cv2.resize(img, (self.target_size[1], self.target_size[0]))
            
            # Normalize to [0, 1]
            if self.normalize:
                img = img.astype(np.float32) / 255.0
            
            return img
        
        except Exception as e:
            print(f"Preprocessing error for {image_path}: {e}")
            # Fall back to a black image so the output array keeps its shape
            return np.zeros((*self.target_size, 3), dtype=np.float32)
    
    def preprocess_batch(self, image_paths, batch_size=32):
        """Preprocess images batch by batch (the full output array is still allocated upfront)"""
        n_images = len(image_paths)
        n_batches = (n_images + batch_size - 1) // batch_size
        
        # Preallocate the output array
        images = np.zeros((n_images, *self.target_size, 3), dtype=np.float32)
        
        print(f"🔄 Preprocessing {n_images} images in {n_batches} batches...")
        
        for i in range(n_batches):
            start_idx = i * batch_size
            end_idx = min((i + 1) * batch_size, n_images)
            
            # Process the current batch
            for j, img_path in enumerate(image_paths[start_idx:end_idx]):
                images[start_idx + j] = self.preprocess_single_image(img_path)
            
            if (i + 1) % 10 == 0:
                print(f"   📊 Batch {i+1}/{n_batches} done")
        
        return images

# Create the preprocessor
preprocessor = ImagePreprocessor(target_size=IMG_SIZE, normalize=True)

# Preprocess every split
print("🔄 Preprocessing images...")

X_train_processed = preprocessor.preprocess_batch(X_train)
X_val_processed = preprocessor.preprocess_batch(X_val)
X_test_processed = preprocessor.preprocess_batch(X_test)

print("✅ Preprocessing done!")
print(f"📊 Train shape: {X_train_processed.shape}")
print(f"📊 Validation shape: {X_val_processed.shape}")
print(f"📊 Test shape: {X_test_processed.shape}")

# One-hot encode the labels for the deep learning models
y_train_categorical = to_categorical(y_train, num_classes=len(CLASSES))
y_val_categorical = to_categorical(y_val, num_classes=len(CLASSES))
y_test_categorical = to_categorical(y_test, num_classes=len(CLASSES))

print(f"📊 Train labels shape: {y_train_categorical.shape}")

# Display a few preprocessed samples
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
fig.suptitle('Preprocessed Images - Samples', fontsize=14, fontweight='bold')

for i in range(8):
    row = i // 4
    col = i % 4
    
    # Pick a random image
    idx = np.random.randint(0, len(X_train_processed))
    img = X_train_processed[idx]
    label = label_encoder.inverse_transform([y_train[idx]])[0]
    
    axes[row, col].imshow(img)
    axes[row, col].set_title(f'{label}', fontsize=10)
    axes[row, col].axis('off')

plt.tight_layout()
plt.show()
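
`preprocess_batch` works batch by batch but still preallocates one `float32` array for the whole split, so memory grows linearly with the number of images. A quick estimate, assuming 224x224 RGB inputs; `float32_array_bytes` is a hypothetical helper and the image count is illustrative:

```python
import numpy as np

def float32_array_bytes(n_images, height=224, width=224, channels=3):
    """Hypothetical helper: bytes used by an (n_images, height, width, channels) float32 array."""
    itemsize = np.dtype(np.float32).itemsize  # 4 bytes per value
    return n_images * height * width * channels * itemsize

per_image = float32_array_bytes(1)
print(per_image)  # 602112 bytes, about 0.57 MiB per image
total_gib = float32_array_bytes(4000) / 1024**3
print(f"{total_gib:.2f} GiB for 4000 images")  # roughly 2.24 GiB
```

Keeping the images as `uint8` (1 byte per value instead of 4) and normalizing each batch on the fly would cut this by a factor of four.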

In [None]:
# ===================================
# 3.3 DATA AUGMENTATION FOR DEEP LEARNING
# ===================================

# Data augmentation settings
train_datagen = ImageDataGenerator(
    rotation_range=10,           # Random rotation up to 10°
    width_shift_range=0.1,       # Horizontal shift up to 10%
    height_shift_range=0.1,      # Vertical shift up to 10%
    zoom_range=0.1,              # Random zoom up to 10%
    horizontal_flip=True,        # Horizontal mirroring
    brightness_range=[0.8, 1.2], # Brightness variation
    fill_mode='nearest'          # How pixels outside the boundaries are filled
)

# Validation generator (no augmentation)
val_datagen = ImageDataGenerator()

print("🔄 Data augmentation configured")

# Augmentation demo
def demonstrate_augmentation(X_sample, y_sample, n_augmentations=6):
    """Show the effect of augmentation on a few images"""
    
    fig, axes = plt.subplots(2, n_augmentations + 1, figsize=(20, 8))
    fig.suptitle('Data Augmentation Demo', fontsize=16, fontweight='bold')
    
    for row in range(2):
        # Original image
        idx = np.random.randint(0, len(X_sample))
        original_img = X_sample[idx]
        label = label_encoder.inverse_transform([y_sample[idx]])[0]
        
        axes[row, 0].imshow(original_img)
        axes[row, 0].set_title(f'Original\n{label}')
        axes[row, 0].axis('off')
        
        # Augmented images
        img_batch = np.expand_dims(original_img, 0)
        augmented_generator = train_datagen.flow(img_batch, batch_size=1)
        
        for col in range(1, n_augmentations + 1):
            augmented_batch = next(augmented_generator)
            augmented_img = augmented_batch[0]
            
            # Clip to [0, 1] so matplotlib displays the float image correctly
            augmented_img = np.clip(augmented_img, 0, 1)
            
            axes[row, col].imshow(augmented_img)
            axes[row, col].set_title(f'Augmented {col}')
            axes[row, col].axis('off')
    
    plt.tight_layout()
    plt.show()

# Demo
print("🖼️ Augmentation demo...")
demonstrate_augmentation(X_train_processed, y_train, n_augmentations=5)

print("✅ Augmentation pipeline configured")

## 🎯 Section 4: Baseline Models Implementation

In [None]:
# ===================================
# 4.1 FEATURE EXTRACTION FOR TRADITIONAL ML
# ===================================

def extract_traditional_features(images):
    """
    Extract hand-crafted features for classical ML models
    
    Extracted features (29 per image):
    - Intensity statistics (mean, std, min, max, quartiles)
    - 16-bin gray-level histogram
    - Simple texture features (Sobel gradient magnitude)
    - Intensity centroid (image moments), entropy and energy
    """
    print(f"🔍 Extracting features for {len(images)} images...")
    
    features_list = []
    
    for i, img in enumerate(images):
        # Convert to 8-bit grayscale
        if len(img.shape) == 3:
            gray_img = cv2.cvtColor((img * 255).astype(np.uint8), cv2.COLOR_RGB2GRAY)
        else:
            gray_img = (img * 255).astype(np.uint8)
        
        features = []
        
        # 1. Intensity statistics
        features.extend([
            np.mean(gray_img),
            np.std(gray_img),
            np.min(gray_img),
            np.max(gray_img),
            np.percentile(gray_img, 25),
            np.percentile(gray_img, 50),
            np.percentile(gray_img, 75)
        ])
        
        # 2. Histogram (16 bins)
        hist, _ = np.histogram(gray_img, bins=16, range=(0, 256))
        hist = hist / np.sum(hist)  # Normalize to proportions
        features.extend(hist)
        
        # 3. Simple texture features
        # Sobel gradient magnitude
        grad_x = cv2.Sobel(gray_img, cv2.CV_64F, 1, 0, ksize=3)
        grad_y = cv2.Sobel(gray_img, cv2.CV_64F, 0, 1, ksize=3)
        gradient_magnitude = np.sqrt(grad_x**2 + grad_y**2)
        
        features.extend([
            np.mean(gradient_magnitude),
            np.std(gradient_magnitude)
        ])
        
        # 4. Geometric features: intensity centroid from the image moments
        moments = cv2.moments(gray_img)
        if moments['m00'] != 0:
            cx = moments['m10'] / moments['m00']
            cy = moments['m01'] / moments['m00']
            features.extend([cx, cy])
        else:
            features.extend([0, 0])
        
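        # 5. Entropy and energy of the gray-level distribution
        # Shannon entropy (bits) is computed on the normalized 256-level histogram, not on raw pixels
        p = np.bincount(gray_img.ravel(), minlength=256) / gray_img.size
        p_nonzero = p[p > 0]
        entropy = np.sum(p_nonzero * np.log2(1.0 / p_nonzero))
        energy = np.sum(p ** 2)  # Uniformity: 1.0 for a constant image
        
        features.extend([entropy, energy])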
        
        features_list.append(features)
        
        if (i + 1) % 100 == 0:
            print(f"   üìä {i+1}/{len(images)} images trait√©es")
    
    feature_matrix = np.array(features_list)
    print(f"‚úÖ Features extraites: shape {feature_matrix.shape}")
    
    return feature_matrix

# Extraction des features
print("üîç Extraction des features traditionnelles...")
X_train_features = extract_traditional_features(X_train_processed)
X_val_features = extract_traditional_features(X_val_processed)
X_test_features = extract_traditional_features(X_test_processed)

# Normalisation des features
scaler = StandardScaler()
X_train_features_scaled = scaler.fit_transform(X_train_features)
X_val_features_scaled = scaler.transform(X_val_features)
X_test_features_scaled = scaler.transform(X_test_features)

print(f"üìä Features shape train: {X_train_features_scaled.shape}")
print(f"üìä Features normalis√©es avec StandardScaler")

In [None]:
# ===================================
# 4.2 TRADITIONAL BASELINE MODELS
# ===================================

# Shared results dictionary (evaluate_model also stores the ensemble models here)
baseline_results = {}

def evaluate_model(model, X_train, X_val, y_train, y_val, model_name):
    """Train a model, score it on train and validation, and store the results in baseline_results"""
    
    print(f"\n🔄 Training {model_name}...")
    
    # Training
    start_time = datetime.now()
    model.fit(X_train, y_train)
    training_time = datetime.now() - start_time
    
    # Predictions
    y_train_pred = model.predict(X_train)
    y_val_pred = model.predict(X_val)
    
    # Metrics
    train_accuracy = accuracy_score(y_train, y_train_pred)
    val_accuracy = accuracy_score(y_val, y_val_pred)
    
    # Validation precision/recall/F1, averaged over classes and weighted by class support
    precision, recall, f1, _ = precision_recall_fscore_support(y_val, y_val_pred, average='weighted')
    
    # Store the results
    results = {
        'model': model,
        'train_accuracy': train_accuracy,
        'val_accuracy': val_accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'training_time': training_time.total_seconds(),
        'y_val_pred': y_val_pred
    }
    
    baseline_results[model_name] = results
    
    print(f"✅ {model_name}:")
    print(f"   📊 Train Accuracy: {train_accuracy:.4f}")
    print(f"   📊 Val Accuracy: {val_accuracy:.4f}")
    print(f"   📊 F1-Score: {f1:.4f}")
    print(f"   ⏱️ Training Time: {training_time.total_seconds():.2f}s")
    
    return results

# Baseline model definitions
baseline_models = {
    'Random Forest': RandomForestClassifier(
        n_estimators=100,
        max_depth=10,
        random_state=42,
        n_jobs=-1
    ),
    'SVM': SVC(
        kernel='rbf',
        C=1.0,
        random_state=42,
        probability=True  # Enables predict_proba
    ),
    'Logistic Regression': LogisticRegression(
        max_iter=1000,
        random_state=42,
        n_jobs=-1
    ),
    'Decision Tree': DecisionTreeClassifier(
        max_depth=15,
        random_state=42
    )
}

# Train and evaluate every baseline model
print("🚀 Training the baseline models...")

for model_name, model in baseline_models.items():
    evaluate_model(
        model, 
        X_train_features_scaled, 
        X_val_features_scaled, 
        y_train, 
        y_val, 
        model_name
    )

print("\n✅ All baseline models trained!")
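
With `average='weighted'`, `precision_recall_fscore_support` computes each class's score and averages them weighted by class support (the number of true samples per class). The sketch below recomputes a weighted F1 from a confusion matrix to make that averaging explicit; the helper name and the 2-class matrix are illustrative only.

```python
import numpy as np

def weighted_f1_from_confusion(cm):
    """Hypothetical helper: support-weighted F1 (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)      # true samples per class
    predicted = cm.sum(axis=0)    # predicted samples per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(np.sum(f1 * support) / support.sum())

cm = [[8, 2],
      [1, 9]]
print(round(weighted_f1_from_confusion(cm), 4))  # 0.8496
```

Per class, F1 = 2TP / (2TP + FP + FN), giving 16/19 and 18/21 here; with equal supports the weighted average is simply their mean.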

## 🌳 Section 5: Bagging Ensemble Methods

In [None]:
# ===================================
# 5.1 BAGGING METHODS
# ===================================

print("🌳 Training bagging methods...")

# Kept for optional bookkeeping; evaluate_model stores every result in baseline_results
bagging_results = {}

# 5.1.1 Random Forest with adjusted hyperparameters
print("\n🔄 Random Forest (adjusted hyperparameters)...")
rf_optimized = RandomForestClassifier(
    n_estimators=200,           # More trees than the baseline
    max_depth=15,               # Limit tree depth
    min_samples_split=5,        # Regularization against overfitting
    min_samples_leaf=2,         # Regularization against overfitting
    max_features='sqrt',        # Random subset of features at each split
    bootstrap=True,             # Bootstrap sampling (bagging)
    random_state=42,
    n_jobs=-1
)

evaluate_model(
    rf_optimized, 
    X_train_features_scaled, 
    X_val_features_scaled, 
    y_train, 
    y_val, 
    'Random Forest Optimized'
)

# 5.1.2 Extra Trees (Extremely Randomized Trees)
print("\n🔄 Extra Trees...")
extra_trees = ExtraTreesClassifier(
    n_estimators=200,
    max_depth=15,
    min_samples_split=5,
    min_samples_leaf=2,
    max_features='sqrt',
    bootstrap=True,             # Bootstrap sampling (ExtraTrees defaults to False)
    random_state=42,
    n_jobs=-1
)

evaluate_model(
    extra_trees, 
    X_train_features_scaled, 
    X_val_features_scaled, 
    y_train, 
    y_val, 
    'Extra Trees'
)

# 5.1.3 Bagging with other base estimators
print("\n🔄 Bagging with SVM...")
bagging_svm = BaggingClassifier(
    estimator=SVC(kernel='rbf', probability=True, random_state=42),
    n_estimators=50,            # Fewer estimators because each SVM is costly to train
    max_samples=0.8,            # 80% of the samples per estimator
    max_features=0.8,           # 80% of the features per estimator
    bootstrap=True,
    random_state=42,
    n_jobs=-1
)

evaluate_model(
    bagging_svm, 
    X_train_features_scaled, 
    X_val_features_scaled, 
    y_train, 
    y_val, 
    'Bagging SVM'
)

# 5.1.4 Bagging with Logistic Regression
print("\n🔄 Bagging with Logistic Regression...")
bagging_lr = BaggingClassifier(
    estimator=LogisticRegression(max_iter=1000, random_state=42),
    n_estimators=100,
    max_samples=0.8,
    max_features=0.8,
    bootstrap=True,
    random_state=42,
    n_jobs=-1
)

evaluate_model(
    bagging_lr, 
    X_train_features_scaled, 
    X_val_features_scaled, 
    y_train, 
    y_val, 
    'Bagging Logistic Regression'
)

print("\n✅ Bagging methods trained!")
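
The two ingredients of bagging fit in a few lines of NumPy: bootstrap indices drawn with replacement (a bootstrap sample contains on average about 63% of the distinct training points, since 1 - 1/e ≈ 0.632) and a vote across estimators. This is a simplified sketch with hypothetical helper names; scikit-learn's `BaggingClassifier` averages `predict_proba` when the base estimator provides it and only falls back to hard voting otherwise.

```python
import numpy as np

def bootstrap_indices(n_samples, rng):
    """Hypothetical helper: indices of one bootstrap sample (n_samples draws with replacement)."""
    return rng.integers(0, n_samples, size=n_samples)

def majority_vote(predictions, n_classes):
    """Hypothetical helper: (n_estimators, n_samples) integer labels -> (n_samples,) voted labels."""
    predictions = np.asarray(predictions)
    n_samples = predictions.shape[1]
    votes = np.zeros((n_classes, n_samples), dtype=int)
    for row in predictions:
        votes[row, np.arange(n_samples)] += 1   # one vote per (class, sample) for this estimator
    return votes.argmax(axis=0)                 # ties go to the lowest class label

rng = np.random.default_rng(0)
idx = bootstrap_indices(1000, rng)
print(len(np.unique(idx)) / 1000)  # close to 0.63

preds = [[0, 1, 2, 1],
         [0, 2, 2, 1],
         [1, 1, 0, 3]]
print(majority_vote(preds, n_classes=4))  # [0 1 2 1]
```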

In [None]:
# ===================================
# 5.2 FEATURE IMPORTANCE ANALYSIS (BAGGING)
# ===================================

def analyze_feature_importance(model, feature_names, model_name, top_n=20):
    """Plot and return the feature importances of tree-based models"""
    
    if hasattr(model, 'feature_importances_'):
        importances = model.feature_importances_
        
        # Build the DataFrame
        feature_importance_df = pd.DataFrame({
            'feature': feature_names,
            'importance': importances
        }).sort_values('importance', ascending=False)
        
        # Plot
        plt.figure(figsize=(12, 8))
        top_features = feature_importance_df.head(top_n)
        
        sns.barplot(data=top_features, x='importance', y='feature', palette='viridis')
        plt.title(f'Top {top_n} Features - {model_name}', fontsize=14, fontweight='bold')
        plt.xlabel('Importance')
        plt.tight_layout()
        plt.show()
        
        return feature_importance_df
    else:
        print(f"❌ {model_name} does not expose feature importances")
        return None

# Feature names, in the order produced by extract_traditional_features
feature_names = (
    ['mean', 'std', 'min', 'max', 'q25', 'q50', 'q75'] +  # Intensity statistics
    [f'hist_{i}' for i in range(16)] +                     # Histogram
    ['grad_mean', 'grad_std'] +                            # Gradient
    ['cx', 'cy'] +                                         # Moments (centroid)
    ['entropy', 'energy']                                  # Entropy / energy
)

# Random Forest with adjusted hyperparameters
print("🔍 Feature importance - Random Forest...")
rf_model = baseline_results['Random Forest Optimized']['model']
rf_importance = analyze_feature_importance(rf_model, feature_names, 'Random Forest Optimized')

# Extra Trees
print("🔍 Feature importance - Extra Trees...")
et_model = baseline_results['Extra Trees']['model']
et_importance = analyze_feature_importance(et_model, feature_names, 'Extra Trees')

# Compare the importances
if rf_importance is not None and et_importance is not None:
    print("\n📊 Top 10 features comparison (0 = outside that model's top 10):")
    comparison_df = pd.merge(
        rf_importance.head(10)[['feature', 'importance']].rename(columns={'importance': 'RF_importance'}),
        et_importance.head(10)[['feature', 'importance']].rename(columns={'importance': 'ET_importance'}),
        on='feature',
        how='outer'
    ).fillna(0)
    
    print(comparison_df)
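
Impurity-based `feature_importances_` are computed on the training data and can favor features with many distinct values. Permutation importance, the drop in score when one feature column is shuffled, is a common cross-check; scikit-learn provides it as `sklearn.inspection.permutation_importance`. The self-contained sketch below shows the idea for any object with a `predict` method, here a hypothetical `ThresholdModel` that only looks at feature 0:

```python
import numpy as np

def permutation_importance_sketch(model, X, y, n_repeats=5, seed=0):
    """Hypothetical helper: mean accuracy drop when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    baseline = np.mean(model.predict(X) == y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # break the link between feature j and y
            drops[j] += baseline - np.mean(model.predict(X_perm) == y)
    return drops / n_repeats

class ThresholdModel:
    """Hypothetical toy model: predicts 1 when feature 0 is positive, ignores the other features."""
    def predict(self, X):
        return (X[:, 0] > 0).astype(int)

rng = np.random.default_rng(1)
X_toy = rng.normal(size=(500, 3))
y_toy = (X_toy[:, 0] > 0).astype(int)
print(permutation_importance_sketch(ThresholdModel(), X_toy, y_toy))
```

Feature 0 shows a drop of roughly 0.5 (accuracy falls to chance level), while the ignored features show exactly 0. On this notebook's data, the same idea would apply to the fitted models with the validation features.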

## 🚀 Section 6: Boosting Ensemble Methods

In [None]:
# ===================================
# 6.1 BOOSTING METHODS
# ===================================

print("🚀 Training boosting methods...")

# 6.1.1 AdaBoost
print("\n🔄 AdaBoost...")
ada_boost = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=3, random_state=42),
    n_estimators=100,
    learning_rate=1.0,
    algorithm='SAMME',  # Multiclass (discrete) AdaBoost; the only option left in recent scikit-learn, where this argument is deprecated
    random_state=42
)

evaluate_model(
    ada_boost, 
    X_train_features_scaled, 
    X_val_features_scaled, 
    y_train, 
    y_val, 
    'AdaBoost'
)

# 6.1.2 Gradient Boosting
print("\n🔄 Gradient Boosting...")
gradient_boost = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    min_samples_split=5,
    min_samples_leaf=2,
    subsample=0.8,              # Stochastic gradient boosting
    random_state=42
)

evaluate_model(
    gradient_boost, 
    X_train_features_scaled, 
    X_val_features_scaled, 
    y_train, 
    y_val, 
    'Gradient Boosting'
)

# 6.1.3 XGBoost (if installed)
try:
    print("\n🔄 XGBoost...")
    xgb_classifier = xgb.XGBClassifier(
        n_estimators=100,
        learning_rate=0.1,
        max_depth=3,
        min_child_weight=1,
        subsample=0.8,
        colsample_bytree=0.8,
        random_state=42,
        eval_metric='mlogloss'  # Multiclass log-loss
    )
    
    evaluate_model(
        xgb_classifier, 
        X_train_features_scaled, 
        X_val_features_scaled, 
        y_train, 
        y_val, 
        'XGBoost'
    )
    
except NameError:
    print("⚠️ XGBoost not available (xgb not imported) - skipped")

print("\n✅ Boosting methods trained!")
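
In SAMME, the multiclass AdaBoost variant selected above, a weak learner with weighted error `err` on `K` classes receives the weight α = log((1 - err) / err) + log(K - 1) (scikit-learn additionally multiplies it by `learning_rate`), and the weights of misclassified samples are multiplied by exp(α) before renormalization. A sketch of one round with a hypothetical `samme_round` helper, assuming 0 < err < 1 - 1/K:

```python
import numpy as np

def samme_round(sample_weights, y_true, y_pred, n_classes, learning_rate=1.0):
    """Hypothetical helper: one SAMME update, returns (alpha, new normalized sample weights)."""
    w = np.asarray(sample_weights, dtype=float)
    miss = np.asarray(y_true) != np.asarray(y_pred)
    err = np.sum(w[miss]) / np.sum(w)                      # weighted error rate (assumed in (0, 1 - 1/K))
    alpha = learning_rate * (np.log((1 - err) / err) + np.log(n_classes - 1))
    w = w * np.exp(alpha * miss)                           # up-weight the mistakes only
    return alpha, w / w.sum()

y_true = np.array([0, 1, 2, 3])
y_pred = np.array([0, 1, 2, 0])        # one mistake out of four, so err = 0.25
alpha, w = samme_round(np.full(4, 0.25), y_true, y_pred, n_classes=4)
print(round(alpha, 4), w.round(3))
```

Here α = log 3 + log 3 = log 9 ≈ 2.197, and the misclassified sample ends with weight 9/12 = 0.75. For K = 2 the log(K - 1) term vanishes and the update reduces to classic binary AdaBoost.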

In [None]:
# ===================================
# 6.2 BAGGING VS BOOSTING COMPARISON
# ===================================

def create_ensemble_comparison():
    """Collect the metrics of every trained model into one comparison DataFrame"""
    
    # Gather the metrics to compare
    comparison_data = []
    
    for model_name, results in baseline_results.items():
        comparison_data.append({
            'Model': model_name,
            'Type': get_model_type(model_name),
            'Val_Accuracy': results['val_accuracy'],
            'F1_Score': results['f1_score'],
            'Training_Time': results['training_time']
        })
    
    df_comparison = pd.DataFrame(comparison_data)
    
    return df_comparison

def get_model_type(model_name):
    """Infer the method family from the model name"""
    if any(keyword in model_name.lower() for keyword in ['random forest', 'extra trees', 'bagging']):
        return 'Bagging'
    elif any(keyword in model_name.lower() for keyword in ['ada', 'gradient', 'xgboost']):
        return 'Boosting'
    else:
        return 'Baseline'

# Build the comparison table
df_comparison = create_ensemble_comparison()

print("📊 Ensemble method comparison:")
print(df_comparison.sort_values('Val_Accuracy', ascending=False))

# Comparison plots
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Ensemble Method Comparison', fontsize=16, fontweight='bold')

# Accuracy by type
sns.boxplot(data=df_comparison, x='Type', y='Val_Accuracy', ax=axes[0, 0])
axes[0, 0].set_title('Accuracy by Method Type')
axes[0, 0].set_ylabel('Validation Accuracy')

# F1-score by type
sns.boxplot(data=df_comparison, x='Type', y='F1_Score', ax=axes[0, 1])
axes[0, 1].set_title('F1-Score by Method Type')
axes[0, 1].set_ylabel('F1 Score')

# Training time by type
sns.boxplot(data=df_comparison, x='Type', y='Training_Time', ax=axes[1, 0])
axes[1, 0].set_title('Training Time by Method Type')
axes[1, 0].set_ylabel('Training Time (s)')

# Accuracy vs training time
sns.scatterplot(data=df_comparison, x='Training_Time', y='Val_Accuracy', 
                hue='Type', size='F1_Score', sizes=(50, 200), ax=axes[1, 1])
axes[1, 1].set_title('Accuracy vs Training Time')
axes[1, 1].set_xlabel('Training Time (s)')
axes[1, 1].set_ylabel('Validation Accuracy')

plt.tight_layout()
plt.show()

# Statistics by type
print("\n📈 Statistics by method type:")
type_stats = df_comparison.groupby('Type').agg({
    'Val_Accuracy': ['mean', 'std', 'max'],
    'F1_Score': ['mean', 'std', 'max'],
    'Training_Time': ['mean', 'std', 'min']
}).round(4)

print(type_stats)

## 🧠 Section 7: Deep Learning Model Setup

In [None]:
# ===================================
# 7.1 CUSTOM CNN ARCHITECTURE
# ===================================

def create_custom_cnn(input_shape=(224, 224, 3), num_classes=4):
    """
    Build a custom CNN architecture for medical image classification
    """
    
    model = models.Sequential([
        # Block 1: low-level features
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape, padding='same'),
        layers.BatchNormalization(),
        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        
        # Block 2: mid-level features
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        
        # Block 3: high-level features
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        
        # Block 4: complex features
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        
        # Classification head
        layers.GlobalAveragePooling2D(),  # Instead of Flatten: far fewer inputs to the next Dense layer
        layers.Dense(512, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax')
    ])
    
    return model

# Build the custom model
print("🧠 Building the custom CNN model...")
custom_cnn = create_custom_cnn(input_shape=(*IMG_SIZE, 3), num_classes=len(CLASSES))

# Display the architecture
print("📋 Model architecture:")
custom_cnn.summary()

# Architecture diagram (plot_model needs the optional pydot and graphviz packages)
architecture_png = RESULTS_DIR / "custom_cnn_architecture.png"
try:
    tf.keras.utils.plot_model(
        custom_cnn, 
        to_file=str(architecture_png),
        show_shapes=True, 
        show_layer_names=True,
        rankdir='TB'
    )
except Exception as e:
    print(f"⚠️ Architecture diagram not generated (pydot/graphviz required): {e}")
else:
    print(f"✅ Architecture diagram saved: {architecture_png}")

# Optimizer configuration and compilation
optimizer = optimizers.Adam(learning_rate=LEARNING_RATE)

custom_cnn.compile(
    optimizer=optimizer,
    loss='categorical_crossentropy',
    # Metric objects are resolved by every Keras version; the bare 'precision'/'recall' strings are not
    metrics=[
        'accuracy',
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall'),
    ]
)

print("✅ Model compiled successfully")
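The size of this architecture can be checked by hand. The pure-Python sketch below assumes the function defaults (224×224×3 input, 4 classes) and applies the standard Keras weight formulas. Thanks to `GlobalAveragePooling2D`, the parameter count does not depend on the image size; only the final feature-map size does. Under those assumptions the two totals should add up to the figure reported by `custom_cnn.summary()`.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Conv2D weights: k*k*in_channels per filter, plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


def dense_params(in_features, units):
    """Dense weights: one weight per (input, unit) pair, plus one bias per unit."""
    return (in_features + 1) * units


def batchnorm_params(channels):
    """BatchNormalization: gamma and beta (trainable) + moving mean and variance (non-trainable)."""
    return 4 * channels


# Spatial size: 'same' padding keeps H x W through each Conv2D, each 2x2 MaxPooling2D halves it
size = 224
for _ in range(4):  # four conv blocks, one MaxPooling2D each
    size //= 2

# Convolutional trunk: Conv2D -> BatchNormalization -> Conv2D in each block
conv_total = 0
in_ch = 3
for filters in (32, 64, 128, 256):
    conv_total += conv2d_params(3, in_ch, filters)     # first conv of the block
    conv_total += batchnorm_params(filters)           # BatchNormalization
    conv_total += conv2d_params(3, filters, filters)  # second conv of the block
    in_ch = filters

# Head: GlobalAveragePooling2D (no weights) -> Dense(512) -> BN -> Dense(256) -> Dense(4)
head_total = (dense_params(256, 512) + batchnorm_params(512)
              + dense_params(512, 256) + dense_params(256, 4))

print(f"last feature map: {size}x{size}x{in_ch}")
print(f"conv trunk: {conv_total:,} params, head: {head_total:,} params, "
      f"total: {conv_total + head_total:,}")
```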

In [None]:
# ===================================
# 7.2 CALLBACKS AND MONITORING
# ===================================

# Callback configuration
def setup_callbacks(model_name, patience=10):
    """Configure the training callbacks: early stopping, LR reduction and checkpointing."""
    
    # Directory where the models are saved
    model_save_dir = MODELS_DIR / model_name
    model_save_dir.mkdir(parents=True, exist_ok=True)
    
    callbacks_list = [
        # Early stopping: stop when val_accuracy has not improved for `patience` epochs
        EarlyStopping(
            monitor='val_accuracy',
            patience=patience,
            restore_best_weights=True,
            verbose=1
        ),
        
        # Halve the learning rate when val_loss plateaus for 5 epochs
        ReduceLROnPlateau(
            monitor='val_loss',
            factor=0.5,
            patience=5,
            min_lr=1e-7,
            verbose=1
        ),
        
        # Save the best full model (HDF5 format)
        ModelCheckpoint(
            filepath=str(model_save_dir / "best_model.h5"),
            monitor='val_accuracy',
            save_best_only=True,
            save_weights_only=False,
            verbose=1
        )
    ]
    
    return callbacks_list

# Callbacks for the custom model
custom_callbacks = setup_callbacks("custom_cnn", patience=10)

print("✅ Callbacks configured:")
for callback in custom_callbacks:
    print(f"   📋 {callback.__class__.__name__}")

# Data generators
print("\n🔄 Preparing the data generators...")

# Training generator with data augmentation
train_generator = train_datagen.flow(
    X_train_processed,
    y_train_categorical,
    batch_size=BATCH_SIZE,
    shuffle=True
)

# Validation generator (no augmentation, fixed order)
val_generator = val_datagen.flow(
    X_val_processed,
    y_val_categorical,
    batch_size=BATCH_SIZE,
    shuffle=False
)

print("✅ Generators ready:")
print(f"   📊 Train: {len(train_generator)} batches of {BATCH_SIZE}")
print(f"   📊 Validation: {len(val_generator)} batches of {BATCH_SIZE}")

# Training function
def train_deep_model(model, model_name, train_gen, val_gen, epochs=EPOCHS, callbacks=None):
    """Train a deep learning model with monitoring and return its History object."""
    
    print(f"\n🚀 Training {model_name}...")
    print(f"   📊 Epochs: {epochs}")
    print(f"   📊 Batch size: {BATCH_SIZE}")
    
    # Training
    history = model.fit(
        train_gen,
        epochs=epochs,
        validation_data=val_gen,
        callbacks=callbacks,
        verbose=1
    )
    
    print(f"✅ Training of {model_name} complete!")
    
    return history

print("🔧 Deep learning training environment configured")

## 🔄 Section 8: Transfer Learning Implementation

In [None]:
# ===================================
# 8.1 PRE-TRAINED MODELS WITH TRANSFER LEARNING
# ===================================

def create_transfer_learning_model(base_model_name, input_shape=(224, 224, 3), num_classes=4, 
                                   freeze_base=True, trainable_layers=0):
    """
    Build a transfer learning model: ImageNet backbone + new classification head.
    
    Args:
        base_model_name: 'VGG16', 'ResNet50', 'EfficientNetB0' or 'InceptionV3'
        freeze_base: if True, freeze every layer of the backbone
        trainable_layers: when freeze_base is False, number of backbone layers (counted
            from the end) left trainable; 0 leaves the whole backbone trainable
    """
    
    # Backbone selection
    if base_model_name == 'VGG16':
        base_model = VGG16(weights='imagenet', include_top=False, input_shape=input_shape)
    elif base_model_name == 'ResNet50':
        base_model = ResNet50(weights='imagenet', include_top=False, input_shape=input_shape)
    elif base_model_name == 'EfficientNetB0':
        base_model = EfficientNetB0(weights='imagenet', include_top=False, input_shape=input_shape)
    elif base_model_name == 'InceptionV3':
        base_model = InceptionV3(weights='imagenet', include_top=False, input_shape=input_shape)
    else:
        raise ValueError(f"Unsupported model: {base_model_name}")
    
    # Layer freezing
    if freeze_base:
        base_model.trainable = False
    elif trainable_layers > 0:
        # Only the last `trainable_layers` layers stay trainable
        base_model.trainable = True
        for layer in base_model.layers[:-trainable_layers]:
            layer.trainable = False
    
    # Full model: backbone + classification head
    model = models.Sequential([
        base_model,
        layers.GlobalAveragePooling2D(),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(512, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation='softmax')
    ])
    
    return model

# Dictionary holding the transfer learning models
transfer_models = {}

print("🔄 Creating the transfer learning models...")

In [None]:
# ===================================
# 8.2 TRAINING THE TRANSFER LEARNING MODELS
# ===================================

# Models to test
models_config = [
    {'name': 'VGG16_frozen', 'base': 'VGG16', 'freeze': True, 'trainable': 0},
    {'name': 'ResNet50_frozen', 'base': 'ResNet50', 'freeze': True, 'trainable': 0},
    {'name': 'EfficientNetB0_frozen', 'base': 'EfficientNetB0', 'freeze': True, 'trainable': 0}
]

# Training histories
training_histories = {}

for config in models_config:
    print(f"\n🔄 Configuration: {config['name']}")
    
    try:
        # Build the model
        model = create_transfer_learning_model(
            base_model_name=config['base'],
            input_shape=(*IMG_SIZE, 3),
            num_classes=len(CLASSES),
            freeze_base=config['freeze'],
            trainable_layers=config['trainable']
        )
        
        # Compile
        model.compile(
            optimizer=optimizers.Adam(learning_rate=LEARNING_RATE),
            loss='categorical_crossentropy',
            metrics=['accuracy',
                     tf.keras.metrics.Precision(name='precision'),
                     tf.keras.metrics.Recall(name='recall')]
        )
        
        # Store
        transfer_models[config['name']] = model
        
        # count_params() returns the TOTAL count; trainable weights must be summed separately
        trainable_params = int(sum(np.prod(w.shape) for w in model.trainable_weights))
        print(f"✅ {config['name']} built and compiled")
        print(f"   📊 Total parameters: {model.count_params():,}")
        print(f"   📊 Trainable parameters: {trainable_params:,}")
        
        # Show the summary for the first model only
        if config['name'] == 'VGG16_frozen':
            print(f"\n📋 Example architecture - {config['name']}:")
            model.summary()
    
    except Exception as e:
        print(f"❌ Error while building {config['name']}: {e}")

print(f"\n✅ {len(transfer_models)} transfer learning models built")

# Demo training run (VGG16)
if 'VGG16_frozen' in transfer_models:
    print("\n🚀 Demo training - VGG16 (reduced number of epochs)...")
    
    # Callbacks for the demo
    demo_callbacks = setup_callbacks("VGG16_frozen_demo", patience=5)
    
    # Fewer epochs for the demo
    demo_history = train_deep_model(
        transfer_models['VGG16_frozen'],
        "VGG16_frozen",
        train_generator,
        val_generator,
        epochs=10,  # Reduced for the demo
        callbacks=demo_callbacks
    )
    
    training_histories['VGG16_frozen'] = demo_history
    
    print("✅ Demo training complete")

print("\n💡 Note: for a full training run, adjust the number of epochs to your compute budget")

## ⚡ Section 9: Fine-Tuning Pre-trained Models

In [None]:
# ===================================
# 9.1 FINE-TUNING STRATEGIES
# ===================================

def fine_tune_model(base_model_name, pretrained_model_path=None, 
                    unfreeze_layers=20, fine_tune_lr=1e-5):
    """
    Fine-tune a pre-trained model.
    
    Args:
        base_model_name: name of the backbone
        pretrained_model_path: path to an already-trained model (optional)
        unfreeze_layers: number of backbone layers (from the end) to unfreeze
        fine_tune_lr: reduced learning rate for fine-tuning
    """
    
    print(f"⚡ Fine-tuning {base_model_name}")
    
    # Load the previously trained model if it exists
    if pretrained_model_path and os.path.exists(pretrained_model_path):
        print(f"📂 Loading pre-trained model: {pretrained_model_path}")
        model = keras.models.load_model(pretrained_model_path)
    else:
        # Otherwise build a fresh model
        print("🔧 Building a new model for fine-tuning")
        model = create_transfer_learning_model(
            base_model_name=base_model_name,
            input_shape=(*IMG_SIZE, 3),
            num_classes=len(CLASSES),
            freeze_base=False,
            trainable_layers=unfreeze_layers
        )
    
    # Phase 1: progressive unfreezing
    print(f"🔓 Unfreezing the last {unfreeze_layers} layers of the backbone")
    
    # The backbone is the first layer of the Sequential model
    base_model = model.layers[0]
    
    # Make the backbone trainable, then re-freeze its first layers below
    base_model.trainable = True
    
    # Freeze the first layers, unfreeze the last ones
    # (if unfreeze_layers exceeds the layer count, the whole backbone becomes trainable)
    total_layers = len(base_model.layers)
    freeze_until = max(total_layers - unfreeze_layers, 0)
    
    for i, layer in enumerate(base_model.layers):
        layer.trainable = i >= freeze_until
    
    print(f"   📊 Total layers: {total_layers}")
    print(f"   🔒 Frozen layers: {freeze_until}")
    print(f"   🔓 Trainable layers: {total_layers - freeze_until}")
    
    # Recompile with a reduced learning rate (required after changing `trainable`)
    model.compile(
        optimizer=optimizers.Adam(learning_rate=fine_tune_lr),
        loss='categorical_crossentropy',
        metrics=['accuracy',
                 tf.keras.metrics.Precision(name='precision'),
                 tf.keras.metrics.Recall(name='recall')]
    )
    
    print(f"✅ Model recompiled with learning rate: {fine_tune_lr}")
    
    return model

# Fine-tuning demo on VGG16
print("⚡ Fine-tuning demo - VGG16")

# Fine-tuning strategies to test
fine_tuning_strategies = [
    {
        'name': 'VGG16_fine_tuned_conservative',
        'base': 'VGG16',
        'unfreeze_layers': 10,
        'lr': 1e-5,
        'description': 'Conservative fine-tuning - last 10 layers'
    },
    {
        'name': 'VGG16_fine_tuned_aggressive', 
        'base': 'VGG16',
        'unfreeze_layers': 20,
        'lr': 5e-6,
        'description': 'Aggressive fine-tuning - last 20 layers'
    }
]

# Fine-tuned models
fine_tuned_models = {}

for strategy in fine_tuning_strategies:
    print(f"\n🔄 {strategy['description']}")
    
    try:
        # Build the fine-tuned model
        ft_model = fine_tune_model(
            base_model_name=strategy['base'],
            unfreeze_layers=strategy['unfreeze_layers'],
            fine_tune_lr=strategy['lr']
        )
        
        fine_tuned_models[strategy['name']] = ft_model
        
        # Trainability summary: only the backbone (first layer) has sub-layers;
        # iterating `.layers` on the head layers would raise AttributeError
        backbone = ft_model.layers[0]
        trainable_count = sum(1 for layer in backbone.layers if layer.trainable)
        total_count = len(backbone.layers)
        trainable_params = int(sum(np.prod(w.shape) for w in ft_model.trainable_weights))
        
        print(f"   📊 Trainable backbone layers: {trainable_count}/{total_count}")
        print(f"   📊 Trainable parameters: {trainable_params:,}")
        
    except Exception as e:
        print(f"❌ Error while fine-tuning {strategy['name']}: {e}")

print(f"\n✅ {len(fine_tuned_models)} fine-tuned models built")
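The "aggressive" strategy asks for 20 layers, but VGG16's convolutional base has only about 19 entries in `base_model.layers`. So, under the slicing rule used in `fine_tune_model`, it unfreezes the entire backbone rather than "the last 20 layers". The sketch below rebuilds the layer list as an assumption (an input layer, 13 convolutions, 5 poolings; the name `input_layer` is a placeholder, and some Keras versions may not list the input layer, which shifts the boundary by one). It then shows which layers each strategy leaves trainable:

```python
# Assumed layer list of VGG16(include_top=False): placeholder input layer + 5 blocks of convs and a pool
VGG16_LAYERS = ["input_layer"]
for block, n_convs in enumerate([2, 2, 3, 3, 3], start=1):
    VGG16_LAYERS += [f"block{block}_conv{i}" for i in range(1, n_convs + 1)]
    VGG16_LAYERS.append(f"block{block}_pool")


def trainable_layers_after_unfreeze(layer_names, unfreeze_layers):
    """Mirror of the loop in fine_tune_model: layers with index >= freeze_until stay trainable."""
    freeze_until = max(len(layer_names) - unfreeze_layers, 0)
    return layer_names[freeze_until:]


for n in (10, 20):
    trainable = trainable_layers_after_unfreeze(VGG16_LAYERS, n)
    n_convs = sum("conv" in name for name in trainable)
    print(f"unfreeze_layers={n}: {len(trainable)}/{len(VGG16_LAYERS)} layers trainable, "
          f"{n_convs} convolutions, starting at {trainable[0]}")
```

Unfreezing whole blocks (for example everything from `block4_conv1` onward) is an alternative that keeps the boundary independent of how the input layer is counted.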

In [None]:
# ===================================
# 9.2 ADVANCED LEARNING RATE SCHEDULING
# ===================================

def create_advanced_callbacks(model_name, strategy_type="fine_tuning"):
    """
    Build advanced callbacks for fine-tuning.
    """
    
    model_save_dir = MODELS_DIR / f"{model_name}_{strategy_type}"
    model_save_dir.mkdir(parents=True, exist_ok=True)
    
    # Custom schedule. Keras passes the CURRENT learning rate, so the factors
    # compound from one epoch to the next (multiplicative decay)
    def lr_schedule(epoch, lr):
        """Keep the LR for 5 epochs, then -10%/epoch until epoch 14, then -5%/epoch."""
        if epoch < 5:
            return lr
        elif epoch < 15:
            return lr * 0.9
        else:
            return lr * 0.95
    
    callbacks_list = [
        # More patient early stopping for fine-tuning
        EarlyStopping(
            monitor='val_accuracy',
            patience=15,
            restore_best_weights=True,
            verbose=1
        ),
        
        # Automatic learning rate reduction on plateau
        ReduceLROnPlateau(
            monitor='val_loss',
            factor=0.3,
            patience=7,
            min_lr=1e-8,
            verbose=1
        ),
        
        # Custom learning rate scheduler
        keras.callbacks.LearningRateScheduler(lr_schedule, verbose=1),
        
        # Save the best full model
        ModelCheckpoint(
            filepath=str(model_save_dir / "best_model.h5"),
            monitor='val_accuracy',
            save_best_only=True,
            save_weights_only=False,
            verbose=1
        ),
        
        # Per-epoch weight checkpoints (Keras 3 requires the .weights.h5 suffix with save_weights_only)
        ModelCheckpoint(
            filepath=str(model_save_dir / "checkpoint_epoch_{epoch:02d}.weights.h5"),
            save_freq='epoch',
            save_weights_only=True,
            verbose=0
        )
    ]
    
    return callbacks_list

# Fine-tuning demo with the advanced callbacks (reduced version)
if 'VGG16_fine_tuned_conservative' in fine_tuned_models:
    print("🚀 Fine-tuning demo with advanced callbacks...")
    
    # Advanced callbacks
    ft_callbacks = create_advanced_callbacks("VGG16", "fine_tuning_demo")
    
    # Settings adjusted for fine-tuning
    print("📊 Fine-tuning configuration:")
    print("   - Reduced learning rate")
    print("   - Increased patience")
    print("   - Advanced callbacks")
    print("   - Reduced epochs for the demo")
    
    # Note: in a real run, the model would be trained here
    print("\n💡 Note: for a full training run, execute:")
    print("   ft_history = train_deep_model(fine_tuned_models['VGG16_fine_tuned_conservative'], "
          "'VGG16_fine_tuned', train_generator, val_generator, epochs=30, callbacks=ft_callbacks)")
    
    print("✅ Fine-tuning configuration ready")

print("\n🔧 Fine-tuning pipeline configured successfully!")

## 🤝 Section 10: Ensemble of Deep Learning Models

In [None]:
# ===================================
# 10.1 VOTING ENSEMBLE FOR DEEP LEARNING
# ===================================

class DeepLearningEnsemble:
    """
    Ensemble of deep learning models combined by soft or hard voting.
    """
    
    def __init__(self, models_dict, ensemble_method='soft_voting'):
        """
        Args:
            models_dict: dictionary {name: model} of the models to combine
            ensemble_method: 'soft_voting' or 'hard_voting' (stacking is not implemented)
        """
        self.models = models_dict
        self.ensemble_method = ensemble_method
        self.model_names = list(models_dict.keys())
        
    def predict_ensemble(self, X, return_individual=False):
        """
        Predict with the ensemble; returns probabilities (one-hot vectors for hard voting).
        """
        individual_predictions = {}
        all_predictions = []
        
        print(f"🔄 Predicting with {len(self.models)} models...")
        
        # Individual predictions
        for name, model in self.models.items():
            try:
                pred = model.predict(X, verbose=0)
                individual_predictions[name] = pred
                all_predictions.append(pred)
                print(f"   ✅ {name}: {pred.shape}")
            except Exception as e:
                print(f"   ❌ Error with {name}: {e}")
                continue
        
        if not all_predictions:
            raise ValueError("No model produced a prediction")
        
        # Stack into shape (n_models, n_samples, n_classes)
        all_predictions = np.array(all_predictions)
        n_classes = all_predictions.shape[-1]
        
        if self.ensemble_method == 'soft_voting':
            # Average of the probabilities
            ensemble_pred = np.mean(all_predictions, axis=0)
        elif self.ensemble_method == 'hard_voting':
            # Majority vote over each model's predicted class; ties go to the lowest class index
            individual_classes = np.argmax(all_predictions, axis=2)  # (n_models, n_samples)
            ensemble_classes = np.array([
                np.bincount(votes, minlength=n_classes).argmax()
                for votes in individual_classes.T
            ])
            
            # One-hot encoding
            ensemble_pred = to_categorical(ensemble_classes, num_classes=n_classes)
        
        else:
            raise ValueError(f"Unsupported ensemble method: {self.ensemble_method}")
        
        if return_individual:
            return ensemble_pred, individual_predictions
        return ensemble_pred
    
    def evaluate_ensemble(self, X, y_true, return_individual=False):
        """
        Evaluate the ensemble; y_true is one-hot encoded.
        """
        print(f"📊 Evaluating the ensemble ({self.ensemble_method})...")
        
        # Predictions
        if return_individual:
            ensemble_pred, individual_preds = self.predict_ensemble(X, return_individual=True)
        else:
            ensemble_pred = self.predict_ensemble(X)
            individual_preds = None
        
        # Convert to class indices for the metrics
        y_true_classes = np.argmax(y_true, axis=1)
        ensemble_classes = np.argmax(ensemble_pred, axis=1)
        
        # Ensemble metrics (weighted by class support)
        accuracy = accuracy_score(y_true_classes, ensemble_classes)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_true_classes, ensemble_classes, average='weighted', zero_division=0
        )
        
        results = {
            'ensemble_accuracy': accuracy,
            'ensemble_precision': precision,
            'ensemble_recall': recall,
            'ensemble_f1': f1,
            'ensemble_predictions': ensemble_pred
        }
        
        # Per-model evaluation if requested
        if return_individual and individual_preds:
            individual_results = {}
            for name, pred in individual_preds.items():
                pred_classes = np.argmax(pred, axis=1)
                ind_acc = accuracy_score(y_true_classes, pred_classes)
                individual_results[name] = {
                    'accuracy': ind_acc,
                    'predictions': pred
                }
            results['individual_results'] = individual_results
        
        return results

# Ensemble demo with the available models
print("🤝 Creating deep learning ensembles...")

# Collect the models available for the ensemble
available_models = {}

# Transfer learning models, if any
for model_name, model in transfer_models.items():
    available_models[model_name] = model

# Fine-tuned models, if any
for model_name, model in fine_tuned_models.items():
    available_models[model_name] = model

print(f"📊 Models available for the ensemble: {list(available_models.keys())}")

if len(available_models) >= 2:
    # Ensembles with different voting methods
    ensemble_configs = [
        {'method': 'soft_voting', 'description': 'Average of the probabilities'},
        {'method': 'hard_voting', 'description': 'Majority vote'}
    ]
    
    ensembles = {}
    
    for config in ensemble_configs:
        ensemble_name = f"DL_Ensemble_{config['method']}"
        ensemble = DeepLearningEnsemble(
            models_dict=available_models,
            ensemble_method=config['method']
        )
        ensembles[ensemble_name] = ensemble
        
        print(f"✅ {ensemble_name} created: {config['description']}")
    
    print(f"\n🎯 {len(ensembles)} deep learning ensembles configured")
    
else:
    print("⚠️ Not enough trained models to build an ensemble")
    print("   Train several transfer learning models first")

print("\n💡 Note: a full ensemble evaluation requires fully trained models "
      "(only VGG16_frozen received a short demo training above)")

## ⚙️ Section 11: Hyperparameter Optimization

In [None]:
# ===================================
# 11.1 HYPERPARAMETER OPTIMIZATION - TRADITIONAL METHODS
# ===================================

from sklearn.model_selection import ParameterGrid

print("⚙️ Hyperparameter optimization for the ensemble methods...")

# Search spaces
param_grids = {
    'RandomForest': {
        'n_estimators': [100, 200, 300],
        'max_depth': [10, 15, 20, None],
        'min_samples_split': [2, 5, 10],
        'min_samples_leaf': [1, 2, 4],
        'max_features': ['sqrt', 'log2', None]
    },
    
    'XGBoost': {
        'n_estimators': [100, 200, 300],
        'learning_rate': [0.01, 0.1, 0.2],
        'max_depth': [3, 5, 7],
        'min_child_weight': [1, 3, 5],
        'subsample': [0.8, 0.9, 1.0],
        'colsample_bytree': [0.8, 0.9, 1.0]
    } if 'xgb' in globals() else {},
    
    'GradientBoosting': {
        'n_estimators': [100, 200, 300],
        'learning_rate': [0.01, 0.1, 0.2],
        'max_depth': [3, 5, 7],
        'min_samples_split': [2, 5, 10],
        'min_samples_leaf': [1, 2, 4],
        'subsample': [0.8, 0.9, 1.0]
    }
}

def optimize_hyperparameters(estimator, param_grid, X_train, y_train, 
                             cv=3, scoring='accuracy', n_jobs=-1, verbose=True):
    """
    Tune hyperparameters with GridSearchCV.
    
    `estimator` is an unfitted estimator instance; returns (best_estimator, cv_results).
    """
    if not param_grid:
        print("⚠️ Empty parameter grid - optimization skipped")
        return None, None
    
    n_candidates = len(ParameterGrid(param_grid))
    print(f"🔍 Optimizing {len(param_grid)} parameters ({n_candidates} combinations)...")
    print(f"   📊 Cross-validation: {cv} folds ({n_candidates * cv} fits)")
    print(f"   📊 Metric: {scoring}")
    
    # Search configuration
    grid_search = GridSearchCV(
        estimator=estimator,
        param_grid=param_grid,
        cv=cv,
        scoring=scoring,
        n_jobs=n_jobs,
        return_train_score=True,
        verbose=1 if verbose else 0
    )
    
    # Run the search
    start_time = datetime.now()
    grid_search.fit(X_train, y_train)
    optimization_time = datetime.now() - start_time
    
    if verbose:
        print(f"✅ Optimization finished in {optimization_time.total_seconds():.2f}s")
        print(f"🎯 Best score: {grid_search.best_score_:.4f}")
        print(f"⚙️ Best parameters: {grid_search.best_params_}")
    
    return grid_search.best_estimator_, grid_search.cv_results_

# Optimization results
optimization_results = {}

# Random Forest
print("\n🌳 Random Forest optimization...")
if 'RandomForest' in param_grids:
    # Reduced grid for the demo
    rf_demo_grid = {
        'n_estimators': [100, 200],
        'max_depth': [10, 15],
        'min_samples_split': [2, 5],
        'max_features': ['sqrt', None]
    }
    
    best_rf, rf_cv_results = optimize_hyperparameters(
        RandomForestClassifier(random_state=42, n_jobs=-1),
        rf_demo_grid,
        X_train_features_scaled,
        y_train,
        cv=3,
        verbose=True
    )
    
    if best_rf is not None:
        optimization_results['RandomForest'] = {
            'best_model': best_rf,
            'cv_results': rf_cv_results
        }

# Gradient Boosting
print("\n🚀 Gradient Boosting optimization...")
gb_demo_grid = {
    'n_estimators': [50, 100],
    'learning_rate': [0.1, 0.2],
    'max_depth': [3, 5]
}

best_gb, gb_cv_results = optimize_hyperparameters(
    GradientBoostingClassifier(random_state=42),
    gb_demo_grid,
    X_train_features_scaled,
    y_train,
    cv=3,
    verbose=True
)

if best_gb is not None:
    optimization_results['GradientBoosting'] = {
        'best_model': best_gb,
        'cv_results': gb_cv_results
    }

print(f"\n✅ {len(optimization_results)} models optimized")

# Evaluate the optimized models
print("\n📊 Evaluating the optimized models...")

# Assumed mapping to the display names used as baseline_results keys (see the comparison below)
baseline_names = {'RandomForest': 'Random Forest', 'GradientBoosting': 'Gradient Boosting'}

for model_name, results in optimization_results.items():
    model = results['best_model']
    
    # Validation evaluation
    val_pred = model.predict(X_val_features_scaled)
    val_accuracy = accuracy_score(y_val, val_pred)
    
    print(f"🎯 {model_name} (optimized):")
    print(f"   📊 Validation accuracy: {val_accuracy:.4f}")
    
    # Comparison with the baseline model
    baseline_key = baseline_names.get(model_name, model_name)
    if baseline_key in baseline_results:
        baseline_acc = baseline_results[baseline_key]['val_accuracy']
        improvement = val_accuracy - baseline_acc
        print(f"   📈 Improvement over baseline: {improvement:+.4f}")
    
    # Store in baseline_results for the comparison
    baseline_results[f"{model_name}_optimized"] = {
        'model': model,
        'val_accuracy': val_accuracy,
        'y_val_pred': val_pred
    }
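The cost of an exhaustive grid is the product of the number of values per parameter, times the number of folds, plus one final refit with GridSearchCV's default `refit=True`. This explains why the cell above runs reduced demo grids instead of the full `param_grids`. A pure-Python count using the grids copied from above (XGBoost's full grid, when available, also has 3⁶ = 729 combinations):

```python
from itertools import product
from math import prod

# Grids copied from the cell above
grids = {
    "RandomForest (full)": {
        "n_estimators": [100, 200, 300], "max_depth": [10, 15, 20, None],
        "min_samples_split": [2, 5, 10], "min_samples_leaf": [1, 2, 4],
        "max_features": ["sqrt", "log2", None],
    },
    "GradientBoosting (full)": {
        "n_estimators": [100, 200, 300], "learning_rate": [0.01, 0.1, 0.2],
        "max_depth": [3, 5, 7], "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 4], "subsample": [0.8, 0.9, 1.0],
    },
    "RandomForest (demo)": {
        "n_estimators": [100, 200], "max_depth": [10, 15],
        "min_samples_split": [2, 5], "max_features": ["sqrt", None],
    },
    "GradientBoosting (demo)": {
        "n_estimators": [50, 100], "learning_rate": [0.1, 0.2], "max_depth": [3, 5],
    },
}
CV_FOLDS = 3

fits = {}
for name, grid in grids.items():
    # Number of combinations = product of the number of values per parameter
    n_candidates = prod(len(values) for values in grid.values())
    # Cross-check against an explicit enumeration of every combination
    assert n_candidates == sum(1 for _ in product(*grid.values()))
    fits[name] = n_candidates * CV_FOLDS
    print(f"{name}: {n_candidates} combinations x {CV_FOLDS} folds = {fits[name]} fits")
```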

In [None]:
# ===================================
# 11.2 RANDOMIZED SEARCH AND HYPERPARAMETER ANALYSIS
# ===================================

def randomized_hyperparameter_search(estimator, param_distributions, 
                                     X_train, y_train, n_iter=20, cv=3):
    """
    Randomized hyperparameter search with RandomizedSearchCV.
    
    `estimator` is an unfitted estimator instance; returns (best_estimator, cv_results).
    """
    print(f"🎲 Randomized search with {n_iter} iterations...")
    
    random_search = RandomizedSearchCV(
        estimator=estimator,
        param_distributions=param_distributions,
        n_iter=n_iter,
        cv=cv,
        scoring='accuracy',
        n_jobs=-1,
        random_state=42,
        return_train_score=True,
        verbose=1
    )
    
    start_time = datetime.now()
    random_search.fit(X_train, y_train)
    search_time = datetime.now() - start_time
    
    print(f"✅ Search finished in {search_time.total_seconds():.2f}s")
    print(f"🎯 Best score: {random_search.best_score_:.4f}")
    print(f"⚙️ Best parameters: {random_search.best_params_}")
    
    return random_search.best_estimator_, random_search.cv_results_

# Distributions for the randomized search (scipy's randint excludes its upper bound)
from scipy.stats import randint

rf_param_distributions = {
    'n_estimators': randint(50, 300),
    'max_depth': [5, 10, 15, 20, None],
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 10),
    'max_features': ['sqrt', 'log2', None],
    'bootstrap': [True, False]
}

# Randomized search for Random Forest
print("\n🎲 Randomized search - Random Forest...")
best_rf_random, rf_random_results = randomized_hyperparameter_search(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    rf_param_distributions,
    X_train_features_scaled,
    y_train,
    n_iter=10,  # Reduced for the demo
    cv=3
)

# Analysis of the search results
def analyze_hyperparameter_results(cv_results, param_name, model_name):
    """
    Plot the impact of one hyperparameter on the mean cross-validation score.
    """
    if f'param_{param_name}' not in cv_results:
        print(f"⚠️ Parameter {param_name} not found in the results")
        return
    
    # Extract the data; None (e.g. max_depth=None) is a real setting, so keep it as the label 'None'
    param_values = ['None' if v is None else v for v in cv_results[f'param_{param_name}']]
    mean_scores = cv_results['mean_test_score']
    
    df_analysis = pd.DataFrame({
        'param_value': param_values,
        'mean_score': mean_scores
    })
    
    if len(df_analysis) == 0:
        print(f"⚠️ No valid data for {param_name}")
        return
    
    # cv_results_ stores parameters as object arrays, so test for numeric content explicitly
    numeric_values = pd.to_numeric(df_analysis['param_value'], errors='coerce')
    
    plt.figure(figsize=(10, 6))
    
    if numeric_values.notna().all():
        # Numeric parameter
        plt.scatter(numeric_values, df_analysis['mean_score'], alpha=0.6)
    else:
        # Categorical (or mixed, e.g. integers and None) parameter
        df_grouped = (df_analysis.assign(param_value=df_analysis['param_value'].astype(str))
                      .groupby('param_value')['mean_score'].agg(['mean', 'std']).reset_index())
        plt.bar(range(len(df_grouped)), df_grouped['mean'], 
                yerr=df_grouped['std'].fillna(0), alpha=0.7, capsize=5)
        plt.xticks(range(len(df_grouped)), df_grouped['param_value'], rotation=45)
    
    plt.xlabel(param_name)
    plt.title(f'Impact of {param_name} on performance - {model_name}')
    plt.ylabel('Mean cross-validation score')
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

# Analysis of the Random Forest results
if rf_random_results is not None:
    print("\n📊 Hyperparameter analysis - Random Forest...")
    
    # Parameters to analyze
    important_params = ['n_estimators', 'max_depth', 'min_samples_split']
    
    for param in important_params:
        analyze_hyperparameter_results(rf_random_results, param, 'Random Forest')

# Compare the optimization methods
print("\n📈 Comparison of the optimization methods:")

comparison_data = []

# Same display name for every method so the bars are grouped per model
display_names = {'RandomForest': 'Random Forest', 'GradientBoosting': 'Gradient Boosting'}

# Baseline
for model_name in display_names.values():
    if model_name in baseline_results:
        comparison_data.append({
            'Model': model_name,
            'Method': 'Baseline',
            'Accuracy': baseline_results[model_name]['val_accuracy']
        })

# Grid search
for key, model_name in display_names.items():
    if f"{key}_optimized" in baseline_results:
        comparison_data.append({
            'Model': model_name,
            'Method': 'Grid Search',
            'Accuracy': baseline_results[f"{key}_optimized"]['val_accuracy']
        })

# Random search (Random Forest only)
if best_rf_random is not None:
    rf_random_acc = accuracy_score(y_val, best_rf_random.predict(X_val_features_scaled))
    comparison_data.append({
        'Model': 'Random Forest',
        'Method': 'Random Search',
        'Accuracy': rf_random_acc
    })

if comparison_data:
    df_optimization_comparison = pd.DataFrame(comparison_data)
    
    # Visualization
    plt.figure(figsize=(12, 6))
    sns.barplot(data=df_optimization_comparison, x='Model', y='Accuracy', hue='Method')
    plt.title('Comparison of Optimization Methods')
    plt.ylabel('Validation Accuracy')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
    
    print("📊 Comparison results:")
    print(df_optimization_comparison)

print("\n✅ Hyperparameter optimization complete")
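Randomized search costs `n_iter × cv` fits (plus the final refit) no matter how large the space is: each candidate is drawn independently, evaluated, and the best one kept. The pure-Python sketch below mirrors `rf_param_distributions`. The integer ranges follow `scipy.stats.randint(low, high)`, whose upper bound is exclusive. The scoring function is a hypothetical toy stand-in, not a model. Like scikit-learn when at least one parameter is a distribution, it samples with replacement.

```python
import random

# Search space mirroring rf_param_distributions (range(low, high) excludes high, like randint)
SPACE = {
    "n_estimators": range(50, 300),
    "max_depth": [5, 10, 15, 20, None],
    "min_samples_split": range(2, 20),
    "min_samples_leaf": range(1, 10),
    "max_features": ["sqrt", "log2", None],
    "bootstrap": [True, False],
}


def sample_config(rng):
    """Draw one configuration: every parameter sampled independently and uniformly."""
    return {name: rng.choice(values) for name, values in SPACE.items()}


def toy_score(config):
    """Hypothetical stand-in for a cross-validated accuracy (NOT a real model)."""
    depth = config["max_depth"] or 25  # treat None (unlimited depth) as very deep
    return 0.80 + 0.0002 * config["n_estimators"] / 3 - 0.001 * abs(depth - 15)


def random_search(n_iter, seed=42):
    """Sample n_iter configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_iter)]
    best = max(trials, key=toy_score)
    return best, trials


best, trials = random_search(n_iter=10)
print(len(trials), best)
```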

## 📊 Section 12: Model Evaluation and Comparison

In [None]:
# ===================================
# 12.1 ADVANCED METRICS AND CONFUSION MATRICES
# ===================================

def comprehensive_model_evaluation(y_true, y_pred, model_name, class_names):
    """
    Full model evaluation with advanced metrics.
    y_true and y_pred are integer class indices aligned with class_names.
    """
    print(f"\n📊 Full evaluation - {model_name}")
    print("=" * 50)
    
    labels = list(range(len(class_names)))
    
    # Global metrics (weighted by class support)
    accuracy = accuracy_score(y_true, y_pred)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=labels, average='weighted', zero_division=0
    )
    
    print(f"🎯 Accuracy: {accuracy:.4f}")
    print(f"🎯 Precision (weighted): {precision:.4f}")
    print(f"🎯 Recall (weighted): {recall:.4f}")
    print(f"🎯 F1-Score (weighted): {f1:.4f}")
    
    # Per-class report
    print("\n📋 Classification report:")
    print(classification_report(y_true, y_pred, labels=labels,
                                target_names=class_names, zero_division=0))
    
    # Confusion matrix
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    
    # Confusion matrix plot
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                xticklabels=class_names, yticklabels=class_names)
    plt.title(f'Confusion Matrix - {model_name}')
    plt.xlabel('Predicted label')
    plt.ylabel('True label')
    plt.tight_layout()
    plt.show()
    
    # Detailed per-class metrics: average=None returns one value per label
    # (with average='weighted' the returned support is None and cannot be indexed)
    class_p, class_r, class_f1, class_support = precision_recall_fscore_support(
        y_true, y_pred, labels=labels, average=None, zero_division=0
    )
    class_metrics = {
        class_name: {
            'precision': class_p[i],
            'recall': class_r[i],
            'f1_score': class_f1[i],
            'support': int(class_support[i])
        }
        for i, class_name in enumerate(class_names)
    }
    
    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'confusion_matrix': cm,
        'class_metrics': class_metrics
    }

# √âvaluation de tous les mod√®les disponibles
print("üìä √âvaluation compl√®te de tous les mod√®les...")

evaluation_results = {}

# √âvaluation des mod√®les baseline
for model_name, results in baseline_results.items():
    if 'y_val_pred' in results:
        eval_results = comprehensive_model_evaluation(
            y_val, 
            results['y_val_pred'], 
            model_name, 
            CLASSES
        )
        evaluation_results[model_name] = eval_results

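The small worked example below illustrates two points about these metrics. First, per-class precision and recall can be read directly off the confusion matrix: recall is the diagonal divided by the row sum, and precision is the diagonal divided by the column sum. Second, `precision_recall_fscore_support` returns per-class supports only when `average=None`; with `average='weighted'` its fourth return value is `None`. The toy labels are invented for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Toy 3-class labels (invented for illustration)
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# Rows = true class, columns = predicted class
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

# Per-class metrics derived from the confusion matrix
recall_from_cm = np.diag(cm) / cm.sum(axis=1)
precision_from_cm = np.diag(cm) / cm.sum(axis=0)

# Same metrics from scikit-learn: average=None returns one value per class
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], average=None
)

# With a weighted average, the returned support is None
_, _, _, weighted_support = precision_recall_fscore_support(y_true, y_pred, average="weighted")

print(cm)
print("recall   :", recall, recall_from_cm)
print("precision:", precision, precision_from_cm)
print("support  :", support, "| weighted support:", weighted_support)
```

In a screening setting, the recall (sensitivity) of the COVID class is usually worth reading separately from the weighted averages, since a weighted score can hide poor performance on a smaller class.
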
In [None]:
# ===================================
# 12.2 OVERALL MODEL COMPARISON
# ===================================

def create_models_comparison_dashboard():
    """
    Build a dashboard comparing the evaluated models
    """
    if not evaluation_results:
        print("⚠️ No evaluation results available")
        return
    
    # Collect the metrics
    comparison_data = []
    for model_name, results in evaluation_results.items():
        comparison_data.append({
            'Model': model_name,
            'Accuracy': results['accuracy'],
            'Precision': results['precision'],
            'Recall': results['recall'],
            'F1_Score': results['f1_score'],
            'Type': get_model_type(model_name)
        })
    
    df_comparison = pd.DataFrame(comparison_data)
    
    # Sort by accuracy
    df_comparison = df_comparison.sort_values('Accuracy', ascending=False)
    
    print("🏆 Model ranking by accuracy:")
    print(df_comparison[['Model', 'Accuracy', 'F1_Score', 'Type']].to_string(index=False))
    
    # Plots
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    fig.suptitle('Model Comparison Dashboard', fontsize=16, fontweight='bold')

    # 1. Accuracy per model
    df_sorted = df_comparison.sort_values('Accuracy', ascending=True)
    bars = axes[0, 0].barh(range(len(df_sorted)), df_sorted['Accuracy'], 
                          color=plt.cm.viridis(np.linspace(0, 1, len(df_sorted))))
    axes[0, 0].set_yticks(range(len(df_sorted)))
    axes[0, 0].set_yticklabels(df_sorted['Model'], fontsize=8)
    axes[0, 0].set_xlabel('Accuracy')
    axes[0, 0].set_title('Accuracy per Model')
    axes[0, 0].grid(True, alpha=0.3)
    
    # Annotate each bar with its value
    for i, bar in enumerate(bars):
        width = bar.get_width()
        axes[0, 0].text(width + 0.001, bar.get_y() + bar.get_height()/2, 
                       f'{width:.3f}', ha='left', va='center', fontsize=8)
    
    # 2. F1-Score vs Accuracy
    scatter = axes[0, 1].scatter(df_comparison['Accuracy'], df_comparison['F1_Score'], 
                                c=df_comparison.index, cmap='viridis', s=100, alpha=0.7)
    axes[0, 1].set_xlabel('Accuracy')
    axes[0, 1].set_ylabel('F1-Score')
    axes[0, 1].set_title('F1-Score vs Accuracy')
    axes[0, 1].grid(True, alpha=0.3)
    
    # Reference diagonal (y = x)
    min_val = min(df_comparison['Accuracy'].min(), df_comparison['F1_Score'].min())
    max_val = max(df_comparison['Accuracy'].max(), df_comparison['F1_Score'].max())
    axes[0, 1].plot([min_val, max_val], [min_val, max_val], 'k--', alpha=0.5)
    
    # 3. Metrics by model type
    df_melted = df_comparison.melt(id_vars=['Model', 'Type'], 
                                  value_vars=['Accuracy', 'Precision', 'Recall', 'F1_Score'],
                                  var_name='Metric', value_name='Score')
    
    sns.boxplot(data=df_melted, x='Type', y='Score', hue='Metric', ax=axes[1, 0])
    axes[1, 0].set_title('Metric Distribution by Model Type')
    axes[1, 0].tick_params(axis='x', rotation=45)
    axes[1, 0].legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    
    # 4. Heatmap of correlations between metrics
    metrics_corr = df_comparison[['Accuracy', 'Precision', 'Recall', 'F1_Score']].corr()
    sns.heatmap(metrics_corr, annot=True, cmap='coolwarm', center=0, 
                square=True, ax=axes[1, 1])
    axes[1, 1].set_title('Correlations Between Metrics')
    
    plt.tight_layout()
    plt.show()
    
    return df_comparison

# Build the dashboard
print("📊 Building the comparison dashboard...")
comparison_df = create_models_comparison_dashboard()

# Best model per category
if comparison_df is not None and not comparison_df.empty:
    print("\n🏆 Best models per category:")
    
    categories = ['Baseline', 'Bagging', 'Boosting']
    for category in categories:
        cat_models = comparison_df[comparison_df['Type'] == category]
        if not cat_models.empty:
            best_model = cat_models.loc[cat_models['Accuracy'].idxmax()]
            print(f"{category}: {best_model['Model']} (Accuracy: {best_model['Accuracy']:.4f})")
    
    # Overall best model
    best_overall = comparison_df.loc[comparison_df['Accuracy'].idxmax()]
    print(f"\n👑 Overall champion: {best_overall['Model']}")
    print(f"   📊 Accuracy: {best_overall['Accuracy']:.4f}")
    print(f"   📊 F1-Score: {best_overall['F1_Score']:.4f}")
    print(f"   📊 Type: {best_overall['Type']}")

print("\n✅ Comparative evaluation complete")

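The dashboard ranks models by accuracy. When classes are imbalanced, accuracy alone can reward a model that neglects the smaller classes. The toy example below uses invented labels (8 samples of one class, 2 of the other) and a classifier that always predicts the majority class. Balanced accuracy (the mean of per-class recalls) and macro F1 both expose this behaviour, so they are possible alternative ranking keys.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Imbalanced toy labels (invented): 8 samples of class 0, 2 of class 1
y_true = [0] * 8 + [1] * 2
# A degenerate model that always predicts the majority class
y_pred = [0] * 10

acc = accuracy_score(y_true, y_pred)               # 8 correct out of 10
bal_acc = balanced_accuracy_score(y_true, y_pred)  # (recall_0 + recall_1) / 2 = (1 + 0) / 2
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)  # (F1_0 + 0) / 2

print(f"accuracy={acc:.3f}  balanced_accuracy={bal_acc:.3f}  macro_f1={macro_f1:.3f}")
```

Which key is appropriate depends on the class distribution of the split being evaluated.
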
## 🎨 Section 13: Prediction Visualization

In [None]:
# ===================================
# 13.1 PREDICTION VISUALIZATION AND INTERPRETABILITY
# ===================================

def visualize_predictions_with_confidence(model, X_samples, y_true, sample_indices,
                                          model_name, class_names, is_deep_learning=False):
    """
    Visualize predictions with their confidence scores
    (confidence = highest predicted class probability).
    """
    n_samples = len(sample_indices)
    # Two rows; round up so an odd number of samples still fits, and keep a 2-D axes array
    n_cols = int(np.ceil(n_samples / 2))
    fig, axes = plt.subplots(2, n_cols, figsize=(16, 8), squeeze=False)
    fig.suptitle(f'Predictions and Confidence - {model_name}', fontsize=16, fontweight='bold')

    if is_deep_learning:
        # Deep learning models: predict() returns class probabilities
        predictions_proba = model.predict(X_samples, verbose=0)
        predictions = np.argmax(predictions_proba, axis=1)
    else:
        # Traditional models: hand-crafted features, scaled with the scaler fitted earlier
        X_features = extract_traditional_features(X_samples)
        X_features_scaled = scaler.transform(X_features)
        predictions_proba = model.predict_proba(X_features_scaled)
        predictions = model.predict(X_features_scaled)

    for idx, sample_idx in enumerate(sample_indices):
        row = idx // n_cols
        col = idx % n_cols

        # Image (squeeze a trailing single channel so grayscale images display correctly)
        axes[row, col].imshow(np.squeeze(X_samples[sample_idx]), cmap='gray')

        # Prediction details
        true_label = class_names[y_true[sample_idx]]
        pred_label = class_names[predictions[sample_idx]]
        confidence = np.max(predictions_proba[sample_idx])

        # Green if the prediction is correct, red otherwise
        color = 'green' if true_label == pred_label else 'red'

        title = f'True: {true_label}\nPred: {pred_label}\nConf: {confidence:.3f}'
        axes[row, col].set_title(title, color=color, fontsize=10)
        axes[row, col].axis('off')

    # Hide the unused subplot when the number of samples is odd
    for idx in range(n_samples, 2 * n_cols):
        axes[idx // n_cols, idx % n_cols].axis('off')

    plt.tight_layout()
    plt.show()

    # Per-class confidence scores: one group of bars per class, one bar per sample
    fig, ax = plt.subplots(1, 1, figsize=(12, 6))
    bar_width = 0.8 / n_samples

    for i, sample_idx in enumerate(sample_indices):
        proba_scores = predictions_proba[sample_idx]
        x_pos = np.arange(len(class_names)) + i * bar_width

        bars = ax.bar(x_pos, proba_scores, width=bar_width, alpha=0.7,
                      label=f'Sample {sample_idx + 1}')

        # Highlight the predicted class
        max_idx = np.argmax(proba_scores)
        bars[max_idx].set_color('red')
        bars[max_idx].set_alpha(1.0)

    ax.set_xlabel('Classes')
    ax.set_ylabel('Confidence Score')
    ax.set_title(f'Confidence Scores per Class - {model_name}')
    # Center each tick under its group of bars
    ax.set_xticks(np.arange(len(class_names)) + (n_samples - 1) * bar_width / 2)
    ax.set_xticklabels(class_names, rotation=45)
    ax.legend()
    ax.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

# Demonstration with the best traditional model
if baseline_results and X_val_processed is not None:
    print("🎨 Visualizing predictions...")

    # Pick random validation samples to display
    n_viz_samples = 6
    sample_indices = np.random.choice(len(X_val_processed), n_viz_samples, replace=False)

    # Find the best traditional model (by validation accuracy) that exposes predict_proba
    best_traditional_model = None
    best_traditional_name = ""
    best_score = 0

    for model_name, results in baseline_results.items():
        # Only models with predict_proba can be plotted with confidence scores
        if ('val_accuracy' in results
                and hasattr(results.get('model'), 'predict_proba')
                and results['val_accuracy'] > best_score):
            best_score = results['val_accuracy']
            best_traditional_model = results['model']
            best_traditional_name = model_name

    if best_traditional_model is not None:
        print(f"📊 Visualizing with the best model: {best_traditional_name}")

        # Samples to visualize (np.asarray allows positional indexing even if y_val is a pandas Series)
        X_viz = X_val_processed[sample_indices]
        y_viz = np.asarray(y_val)[sample_indices]

        # Visualization
        visualize_predictions_with_confidence(
            best_traditional_model,
            X_viz,
            y_viz,
            range(len(sample_indices)),
            best_traditional_name,
            CLASSES,
            is_deep_learning=False
        )

    else:
        print("⚠️ No traditional model with predict_proba available for visualization")

print("✅ Visualizations created")

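`visualize_predictions_with_confidence` indexes `class_names` with the predicted label and uses the maximum of `predict_proba` as the confidence. For scikit-learn classifiers, the columns of `predict_proba` follow `model.classes_`, so this indexing is only consistent when labels are encoded as 0..n-1. The minimal check below uses synthetic data and a `RandomForestClassifier` as a stand-in for the notebook's best model; both are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 4-class data standing in for the scaled features (assumption)
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=4, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

proba = model.predict_proba(X[:6])          # shape (6, 4), columns ordered as model.classes_
pred_from_proba = model.classes_[np.argmax(proba, axis=1)]
confidence = proba.max(axis=1)              # "confidence" = highest class probability

print("classes_:", model.classes_)
print("predictions:", pred_from_proba)
print("confidence:", np.round(confidence, 3))
```

This maximum probability is a score produced by the model. It is not necessarily a calibrated probability that the prediction is correct.
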
## 🎯 Conclusion and Summary

### 📊 **Notebook Summary**

This notebook demonstrated advanced machine learning techniques for classifying COVID-19 chest X-ray images:

#### 🔧 **Implemented Techniques:**

1. **📦 Ensemble Methods - Bagging:**
   - Optimized Random Forest
   - Extra Trees (Extremely Randomized Trees)
   - Bagging with SVM and Logistic Regression
   - Feature importance analysis

2. **🚀 Ensemble Methods - Boosting:**
   - AdaBoost with decision trees
   - Gradient Boosting Machine (GBM)
   - XGBoost (if available)
   - Performance comparison

3. **🧠 Deep Learning:**
   - Custom CNN trained from scratch
   - Transfer Learning (VGG16, ResNet50, EfficientNetB0)
   - Fine-tuning with advanced strategies
   - Ensemble of deep learning models

4. **⚙️ Optimization:**
   - Grid Search for hyperparameters
   - Random Search as an alternative
   - Stratified cross-validation
   - Learning rate scheduling

5. **📊 Evaluation:**
   - Full set of metrics (Accuracy, Precision, Recall, F1)
   - Detailed confusion matrices
   - Visual comparisons
   - Performance dashboard

#### 💡 **Key Takeaways:**

- **Bagging**: reduces variance by averaging models trained on bootstrap samples, which helps against overfitting
- **Boosting**: reduces bias by training learners sequentially, each one focusing on the errors of the previous ones
- **Transfer Learning**: effective when labeled medical data is limited
- **Fine-Tuning**: can further improve performance, provided learning rates are chosen carefully (typically lower for the unfrozen pretrained layers)
- **Deep learning ensembles**: combine the strengths of different models

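The minimal sketch below makes the bagging and boosting takeaways concrete. It compares the cross-validated accuracy of a single unpruned decision tree, bagged trees (`BaggingClassifier`) and gradient-boosted trees (`GradientBoostingClassifier`) on synthetic data, which is an assumption and not the X-ray features. It is an illustration only, and the ranking can differ from one dataset to another.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class data (assumption)
X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    # A fully grown tree: low bias, high variance
    "Single tree": DecisionTreeClassifier(random_state=0),
    # Bagging: averages many trees fit on bootstrap samples (variance reduction)
    "Bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=30, random_state=0),
    # Boosting: adds shallow trees sequentially, each fit to the loss gradient (bias reduction)
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0),
}

scores = {name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
          for name, model in models.items()}

for name, s in scores.items():
    print(f"{name:18s} accuracy = {s.mean():.3f} +/- {s.std():.3f}")
```
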
#### 🛠️ **Recommendations for Production:**

1. **Full training**: use more epochs (50-100) for the deep learning models
2. **External validation**: evaluate on an independent external dataset
3. **Monitoring**: track model performance in real time
4. **Explainability**: add techniques such as LIME or Grad-CAM
5. **Deployment**: consider model optimization (quantization, pruning)

### 🎯 **Suggested Next Steps:**

1. **Advanced data augmentation**: techniques specific to medical imaging
2. **Custom architecture**: a CNN designed for chest X-rays
3. **Meta-learning**: learning across different types of medical images
4. **Federated learning**: distributed training that preserves patient privacy
5. **Clinical validation**: collaboration with healthcare professionals

---

**🏆 Congratulations! You now have a solid foundation in advanced ML techniques for the medical domain.**