# 🤖 Training & Comparison of Deep Models – DCASE 2024

This notebook automates the training and evaluation of several deep learning architectures for sound analysis:

- Dense autoencoder (reconstruction)
- Convolutional autoencoder
- Simple CNN
- Deep CNN
- Modified LeNet

The models are compared on:
- their convergence curves (loss/accuracy)
- their confusion matrix
- their overall scores on a held-out test set

📁 The models are exported (`.h5`) and ready to be integrated into the Streamlit interface.


In [1]:
# 📌 Standard libraries
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import librosa
import librosa.display
from pathlib import Path

import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# 📌 Global configuration
import warnings
warnings.filterwarnings('ignore')

# 📂 Dynamically add the project root to the module search path (sys.path)
project_root = Path.cwd().parent  # DCASE2024_ASD_Project/
sys.path.append(str(project_root))


# 🔧 Project
from src import config, processing
from src.utils.logger_utils import logger
from src.models import baseline_models

In [2]:
# 🧱 Global parameters
BATCH_SIZE = 32
EPOCHS = 50
RANDOM_STATE = 42
IMG_SIZE = (64, 64)  # Size (H, W) of the spectrograms fed to the CNNs
CHANNELS = 1         # Format (H, W, C)


## 🧩 Block 1 – Extracting spectrograms from `.wav` files

We will:
- Walk through all `.wav` files in the `dev_data/` folder
- Extract 2D representations (MFCC, log-Mel)
- Build a set `X` (2D features) and `y` (binary labels)

Expected final format: `X.shape = (N, H, W, C)`
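
As a rough illustration of the target format, the sketch below pads or crops 2D feature matrices of arbitrary size to a fixed `(H, W)`, min-max scales them, and stacks them into an `(N, H, W, C)` array. It uses NumPy only, with random data standing in for real spectrograms; `fit_to_size` is a hypothetical helper, not part of the project.

```python
import numpy as np

def fit_to_size(feat, size=(64, 64)):
    """Zero-pad or crop a 2D array to `size`, then min-max scale it to [0, 1]."""
    h, w = size
    out = np.zeros((h, w), dtype=np.float32)
    rows, cols = min(h, feat.shape[0]), min(w, feat.shape[1])
    out[:rows, :cols] = feat[:rows, :cols]
    return (out - out.min()) / (out.max() - out.min() + 1e-8)

rng = np.random.default_rng(0)
# Three fake "spectrograms" with different shapes (bands x frames)
fake_feats = [
    rng.normal(size=(128, 313)),
    rng.normal(size=(20, 40)),
    rng.normal(size=(64, 64)),
]
# Stack to (N, H, W) and append a single channel axis -> (N, H, W, C)
X_demo = np.stack([fit_to_size(f) for f in fake_feats])[..., np.newaxis]
print(X_demo.shape)  # (3, 64, 64, 1)
```

Whatever the original number of bands or frames, every sample ends up with the same shape, which is what the Keras models expect as input.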


In [3]:
def load_wav_and_extract_features(path, feature_type="mel", img_size=(64, 64)):
    """
    Load an audio file and extract a 2D spectrogram (log-Mel or MFCC).

    Args:
        path (str or Path): Path to the .wav file
        feature_type (str): "mel" or "mfcc"
        img_size (tuple): Final image size (H, W)

    Returns:
        np.ndarray: 2D image (H, W), scaled to [0, 1]
        int: label (0 = normal, 1 = anomaly)
    """
    path = Path(path)  # accept str as well as Path (path.name is used below)
    signal, sr = librosa.load(path, sr=config.SR)
    if feature_type == "mfcc":
        feat = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=config.N_MFCC)
    elif feature_type == "mel":
        mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=config.N_MELS)
        feat = librosa.power_to_db(mel, ref=np.max)
    else:
        raise ValueError("feature_type must be 'mel' or 'mfcc'")

    # Resize to (H, W): zero-pad or truncate the time axis, then the feature axis,
    # so the output shape is fixed even when the feature has fewer than H rows
    feat = librosa.util.fix_length(feat, size=img_size[1], axis=1)
    feat = librosa.util.fix_length(feat, size=img_size[0], axis=0)

    # Min-max normalization to [0, 1]
    feat = (feat - np.min(feat)) / (np.max(feat) - np.min(feat) + 1e-8)

    # The label is inferred from the file name
    label = 1 if "anomaly" in path.name else 0

    return feat, label


## 📦 Block 2 – Building the full dataset (X, y)

We now:
- Read all `.wav` files
- Extract one 2D spectrogram image per file
- Build the dataset:
  - `X`: images (H, W, C)
  - `y`: 0 = normal, 1 = anomaly
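
The label of each clip comes from its file name alone. A minimal sketch of that rule, using only the standard library; `label_from_filename` is a hypothetical helper and the file names are made up for illustration.

```python
from pathlib import Path

def label_from_filename(path):
    """Return 1 if the file name marks an anomalous clip, else 0."""
    # Path(...) makes the helper work for both str and Path inputs;
    # only the file name is inspected, never the parent folders.
    return 1 if "anomaly" in Path(path).name else 0

# Made-up file names, for illustration only
demo_files = [
    "dev_data/fan/train/section_00_source_train_normal_0001.wav",
    Path("dev_data/fan/test/section_00_target_test_anomaly_0007.wav"),
]
labels = [label_from_filename(p) for p in demo_files]
print(labels)  # [0, 1]
```

Because the rule is a plain substring match, it is only as reliable as the naming convention of the files on disk.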


In [None]:
from tqdm import tqdm

# 📂 Folder containing the .wav files
audio_dir = config.DCASE_DIR / "dev_data"
all_audio_files = sorted(audio_dir.glob("**/*.wav"))

X_list = []
y_list = []

# 🎛️ Type of spectrogram to use
FEATURE_TYPE = "mel"  # or "mfcc"

for path in tqdm(all_audio_files, desc="Audio extraction"):
    try:
        spec, label = load_wav_and_extract_features(path, feature_type=FEATURE_TYPE, img_size=IMG_SIZE)
        X_list.append(spec)
        y_list.append(label)
    except Exception as e:
        # Skip unreadable files but keep a trace of them in the log
        logger.warning(f"Error with {path.name}: {e}")

# 📐 Final format: (N, H, W, C)
X = np.array(X_list)[..., np.newaxis]  # Add a single channel axis (C=1)
y = np.array(y_list)

logger.info(f"✅ Dataset built: X shape = {X.shape}, y shape = {y.shape}")


## 🧪 Block 3 – Split & data preparation for training

We split the data (80/20), stratified by label (normal/anomaly),  
and prepare the targets for the classification models (one-hot encoding).
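
To make the two ideas concrete, here is a NumPy-only sketch of one-hot encoding and of a stratified 80/20 split on toy labels. It mimics the intent of `to_categorical` and `train_test_split(..., stratify=y)` but is not their implementation; `one_hot` and `stratified_split` are hypothetical helpers.

```python
import numpy as np

def one_hot(labels, num_classes=2):
    """Map integer labels to one-hot rows, e.g. 1 -> [0., 1.]."""
    return np.eye(num_classes, dtype=np.float32)[labels]

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with per-class proportions preserved."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut off its test share
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_size * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

y_demo = np.array([0] * 80 + [1] * 20)  # toy labels: 80 normal, 20 anomalous
train_idx, test_idx = stratified_split(y_demo)
print(len(train_idx), len(test_idx))    # 80 20
print(int(y_demo[test_idx].sum()))      # 4
print(one_hot(np.array([0, 1])))
```

The test split keeps the 80/20 class ratio of the full set (16 normal and 4 anomalous clips here), which matters when anomalies are rare.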


In [None]:
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

# ✅ Stratified split for the supervised models
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=RANDOM_STATE
)

# ⚠️ Unsupervised subset for the autoencoders: normal clips only,
# taken from the training split so that no test clip is seen during AE training
X_normal = X_train[y_train == 0]

# 🔢 One-hot encoding for the CNN classifiers
y_train_cat = to_categorical(y_train, num_classes=2)
y_test_cat = to_categorical(y_test, num_classes=2)

logger.info(f"✅ Split done: X_train={X_train.shape}, y_train={y_train_cat.shape}")


## 🧠 Block 4 – Automated training of the deep models

We train the following architectures:
- Dense autoencoder (`autoencoder_model`)
- Convolutional autoencoder (`autoencoder`)
- Simple CNN classifier
- Deep CNN classifier
- Modified LeNet

📁 All models are saved for later use in Streamlit.
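
The two autoencoders are fit on normal clips only, with the input as its own target. One common way to use such a model for anomaly detection is to score each clip by its reconstruction error. This notebook does not include that step, so the following is only a sketch: the noisy array stands in for `model.predict(x)`, and the threshold value is hypothetical.

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Mean squared error per sample, averaged over (H, W, C)."""
    return np.mean((x - x_hat) ** 2, axis=(1, 2, 3))

rng = np.random.default_rng(0)
x = rng.random((4, 64, 64, 1)).astype(np.float32)

# Stand-in for `model.predict(x)`: perfect reconstruction for the first two
# samples, noisy reconstruction for the last two
x_hat = x.copy()
x_hat[2:] += rng.normal(scale=0.3, size=x_hat[2:].shape).astype(np.float32)

scores = reconstruction_error(x, x_hat)
threshold = 0.01  # hypothetical value; in practice tuned on validation data
print((scores > threshold).tolist())  # [False, False, True, True]
```

A clip that the autoencoder reconstructs poorly gets a high score and is flagged, on the assumption that the model has only learned to reproduce normal sounds.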


In [None]:
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# NOTE: the model builders (autoencoder_model, autoencoder, cnn_simple_model,
# cnn_model, LeNet_model) and plot_model_history are assumed to be in scope,
# e.g. provided by the project's `src` package; no import for them appears above.

models_dict = {}

# 📁 Save directory
save_dir = config.MODEL_DIR / "deep"
save_dir.mkdir(parents=True, exist_ok=True)

# 🔁 Models to train: name -> (model, inputs, targets)
architectures = {
    "AE_Dense": (autoencoder_model(X_normal[0].shape), X_normal, X_normal),
    "AE_Conv": (autoencoder(shape=(*IMG_SIZE, CHANNELS)), X_normal, X_normal),
    "CNN_Simple": (cnn_simple_model((*IMG_SIZE, CHANNELS), 2), X_train, y_train_cat),
    "CNN_Deep": (cnn_model((*IMG_SIZE, CHANNELS), 2), X_train, y_train_cat),
    "LeNet": (LeNet_model((*IMG_SIZE, CHANNELS), 2), X_train, y_train_cat),
}

for name, (model, X_in, y_in) in architectures.items():
    logger.info(f"🚀 Training model: {name}")

    # Callbacks: keep the best checkpoint and stop once val_loss stalls
    ckpt_path = save_dir / f"{name}.h5"
    checkpoint = ModelCheckpoint(str(ckpt_path), monitor="val_loss", save_best_only=True, verbose=0)
    earlystop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

    # Hold out 20% of the training data for validation
    if name.startswith("AE"):
        # Autoencoders reconstruct their input, so the targets are the inputs.
        # Shuffle before splitting so the validation slice is not tied to file order.
        X_train_sub, X_val = train_test_split(X_in, test_size=0.2, random_state=RANDOM_STATE)
        y_train_sub, y_val = X_train_sub, X_val
    else:
        X_train_sub, X_val, y_train_sub, y_val = train_test_split(
            X_in, y_in, test_size=0.2, stratify=y_in, random_state=RANDOM_STATE
        )

    # Fit
    history = model.fit(
        X_train_sub, y_train_sub,
        validation_data=(X_val, y_val),
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        verbose=1,
        callbacks=[checkpoint, earlystop]
    )

    models_dict[name] = model
    plot_model_history(history)
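

The training loop relies on `EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)`. The pure-Python sketch below is a simplified re-implementation of the patience rule (not the Keras code) applied to a made-up validation-loss history, to show at which epoch training would stop and which epoch's weights would be restored.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 0-based, for a val_loss history."""
    best_loss, best_epoch, wait = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never exhausted: training runs through the last epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses, for illustration only
history = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.65, 0.64, 0.50]
print(early_stopping_epoch(history))  # (7, 2)
```

Training stops at epoch 7, after five epochs without improvement over epoch 2, so the later drop to 0.50 is never reached. A larger `patience` trades training time for a chance to escape such plateaus.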


## 📊 Block 5 – Evaluating the classification models (CNNs)

This block loads the trained classifier models (CNN_Simple, CNN_Deep, LeNet)  
and evaluates their performance on the test set: `X_test`, `y_test_cat`
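
For reference, the sketch below computes by hand, on toy softmax outputs, the quantities this evaluation reports: predicted classes via `argmax`, the four confusion-matrix counts, accuracy, and the F1 score for the anomaly class. The probabilities and labels are invented for illustration.

```python
import numpy as np

# Toy softmax outputs for 6 clips, columns = [P(normal), P(anomaly)]
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4],
                  [0.3, 0.7], [0.8, 0.2], [0.4, 0.6]])
y_true = np.array([0, 1, 1, 1, 0, 0])
y_pred = probs.argmax(axis=1)  # -> [0, 1, 0, 1, 0, 1]

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # anomalies caught
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false alarms
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # anomalies missed
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # normal clips kept

accuracy = (tp + tn) / len(y_true)
precision, recall = tp / (tp + fp), tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(tp, fp, fn, tn)                    # 2 1 1 2
print(round(accuracy, 4), round(f1, 4))  # 0.6667 0.6667
```

With imbalanced data, accuracy can look good while anomalies are missed, which is why the F1 score on the anomaly class is used to rank the models.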


In [None]:
from sklearn.metrics import accuracy_score, f1_score

# NOTE: plot_confusion_matrix is assumed to be a project helper already in scope.

# 🔁 Names of the saved classifier models to evaluate
cnn_models = ["CNN_Simple", "CNN_Deep", "LeNet"]
eval_results = []

for name in cnn_models:
    # Load the best checkpoint written during training
    model_path = save_dir / f"{name}.h5"
    model = tf.keras.models.load_model(model_path)

    # 🔍 Prediction: class probabilities -> class indices
    y_pred = model.predict(X_test)
    y_pred_class = np.argmax(y_pred, axis=1)
    y_true_class = np.argmax(y_test_cat, axis=1)

    acc = accuracy_score(y_true_class, y_pred_class)
    f1 = f1_score(y_true_class, y_pred_class)  # F1 of the positive class (anomaly)

    eval_results.append({
        "Model": name,
        "Accuracy": acc,
        "F1 Score": f1
    })

    # 📉 Full report
    print(f"🔍 Report for: {name}")
    print(classification_report(y_true_class, y_pred_class, target_names=["Normal", "Anomaly"]))
    plot_confusion_matrix(y_true_class, y_pred_class, class_names=["Normal", "Anomaly"], title=f"{name} – Confusion Matrix")

# 📊 Summary table
df_eval = pd.DataFrame(eval_results).sort_values("F1 Score", ascending=False).round(4)
display(df_eval)


## 📤 Block 6 – Exporting the CNN model results

The evaluation scores (`Accuracy`, `F1 Score`) of all classifier models are saved to a CSV file, `cnn_evaluation_summary.csv`.


In [None]:
# 📁 Export path (create the folder if it does not exist yet)
eval_path = config.PREDICTIONS_DIR / "cnn_evaluation_summary.csv"
eval_path.parent.mkdir(parents=True, exist_ok=True)
df_eval.to_csv(eval_path, index=False)
logger.info(f"✅ Results exported to: {eval_path}")
display(df_eval)
