# 🎵 Music Genre Classification
## Notebook 3: Modeling and Training

**Objective:** Train and compare several machine learning models.

---

## 1. Configuration and Imports

In [None]:
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

sys.path.insert(0, '..')

from src.config import Config
from src.models import ModelTrainer
from src.evaluation import Evaluator
from src.visualization import Visualizer

plt.style.use('seaborn-v0_8-whitegrid')
%matplotlib inline

# Fix the random seed for reproducibility
np.random.seed(Config.RANDOM_STATE)

print("✅ Imports successful!")

## 2. Loading the Features

In [None]:
# Load the extracted features
features_path = Config.DATA_PROCESSED / Config.FEATURES_FILE

if not features_path.exists():
    # Stop here: every later cell depends on features_df
    raise FileNotFoundError(
        f"❌ File not found: {features_path}\n"
        "   Run the notebook 02_feature_extraction.ipynb first"
    )

features_df = pd.read_csv(features_path)
print(f"✅ Features loaded: {len(features_df)} files, {len(features_df.columns)-2} features")

In [None]:
features_df.head()

## 3. Data Preparation

In [None]:
# Initialize the trainer
trainer = ModelTrainer()

# Prepare the data (train / validation / test splits)
X_train, X_val, X_test, y_train, y_val, y_test = trainer.prepare_data(features_df)
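
`ModelTrainer.prepare_data` lives in `src/models` and its internals are not shown in this notebook. As a rough illustration only, a stratified three-way split could look like the sketch below. The helper name `stratified_split` and the 15% / 15% validation and test fractions are assumptions made for the example, not values taken from the project.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=42):
    """Return (train_idx, val_idx, test_idx), keeping class proportions similar."""
    rng = random.Random(seed)

    # Group sample indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx, test_idx = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_frac)
        n_test = round(len(indices) * test_frac)
        # Slice each class separately so every split sees every genre
        val_idx.extend(indices[:n_val])
        test_idx.extend(indices[n_val:n_val + n_test])
        train_idx.extend(indices[n_val + n_test:])
    return train_idx, val_idx, test_idx


# Toy labels (hypothetical): 3 genres with 20 tracks each
toy_labels = ["rock"] * 20 + ["jazz"] * 20 + ["metal"] * 20
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting per class rather than over the whole dataset keeps rare genres from disappearing from the validation or test set.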

In [None]:
# Display the shapes
print("\n📊 Data dimensions:")
print(f"   X_train: {X_train.shape}")
print(f"   X_val:   {X_val.shape}")
print(f"   X_test:  {X_test.shape}")

## 4. Available Models

In [None]:
# Display the available models
trainer.print_models_summary()

## 5. Training All Models

In [None]:
# Train all models
results_df = trainer.train_all_models(X_train, y_train, X_val, y_val)

In [None]:
# Display the results
print("\n📊 Training results:")
print(results_df.to_string(index=False))

In [None]:
# Visualize the results
fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(len(results_df))
width = 0.35

bars1 = ax.bar(x - width/2, results_df['train_accuracy'], width, label='Train', color='steelblue')
bars2 = ax.bar(x + width/2, results_df['val_accuracy'], width, label='Validation', color='coral')

# Draw the 80% reference line before building the legend so its label is included
ax.axhline(y=0.8, color='green', linestyle='--', alpha=0.5, label='80%')

ax.set_xlabel('Model')
ax.set_ylabel('Accuracy')
ax.set_title('Model Comparison', fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(results_df['model_name'], rotation=45, ha='right')
ax.legend()
ax.set_ylim(0, 1)

plt.tight_layout()
plt.show()

## 6. Cross-Validation

In [None]:
# Combine X_train and X_val for cross-validation
X_combined = np.vstack([X_train, X_val])
y_combined = np.concatenate([y_train, y_val])

print(f"Combined data: {X_combined.shape}")

In [None]:
# Cross-validation on the 3 best models
# Sort explicitly so the selection does not rely on the row order of results_df
top_models = (
    results_df.sort_values('val_accuracy', ascending=False)
    .head(3)['model_name']
    .tolist()
)
print(f"\n🏆 Top 3 models: {top_models}")

cv_results = []
for model_name in top_models:
    result = trainer.cross_validate(model_name, X_combined, y_combined, cv=5)
    cv_results.append({
        'model': model_name,
        'cv_mean': result['cv_mean'],
        'cv_std': result['cv_std']
    })

cv_df = pd.DataFrame(cv_results)
print("\n📊 Cross-validation results:")
print(cv_df.to_string(index=False))
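
To make the `cv_mean` and `cv_std` columns concrete, here is a minimal sketch of k-fold cross-validation in plain Python. It is not the code behind `trainer.cross_validate`, which is not shown here. The function names, the majority-class toy model, and the use of the population standard deviation are all assumptions for illustration.

```python
import random
from statistics import mean, pstdev


def k_fold_indices(n_samples, k=5, seed=42):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate_sketch(fit_predict, X, y, k=5, seed=42):
    """Score fit_predict on each held-out fold; return mean and std of accuracy."""
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(y)) if i not in held_out_set]
        preds = fit_predict(
            [X[i] for i in train],
            [y[i] for i in train],
            [X[i] for i in held_out],
        )
        correct = sum(p == y[i] for p, i in zip(preds, held_out))
        scores.append(correct / len(held_out))
    # Population standard deviation (an assumption; a sample std is also common)
    return {"cv_mean": mean(scores), "cv_std": pstdev(scores)}


def majority_baseline(X_train, y_train, X_eval):
    """Toy model: always predict the most frequent training label."""
    most_common = max(set(y_train), key=y_train.count)
    return [most_common] * len(X_eval)


# Toy data (hypothetical): 35 rock tracks and 15 jazz tracks
toy_X = [[float(i)] for i in range(50)]
toy_y = ["rock"] * 35 + ["jazz"] * 15
toy_result = cross_validate_sketch(majority_baseline, toy_X, toy_y)
print(toy_result)
```

A small `cv_std` relative to `cv_mean` suggests the model's accuracy is stable across folds. This sketch does not stratify the folds, which a real pipeline for genre classification would usually want.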

## 7. Hyperparameter Tuning (Optional)

In [None]:
# Tune the best model (optional, can take a while)
# Uncomment to run

# best_model_name = trainer.best_model_name
# print(f"\n🔧 Tuning {best_model_name}...")
# tuning_results = trainer.hyperparameter_tuning(best_model_name, X_combined, y_combined)
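
The search strategy used by `trainer.hyperparameter_tuning` is not shown in this notebook. One common approach is an exhaustive grid search, sketched below in plain Python. The `grid_search` helper, the `toy_score` function, and the parameter names `n_estimators` and `max_depth` are hypothetical stand-ins; in practice the score would be a cross-validated accuracy.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Evaluate every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(n_estimators, max_depth):
    """Hypothetical stand-in for the cross-validated accuracy of a model."""
    return 1.0 - abs(n_estimators - 200) / 1000 - abs(max_depth - 10) / 100


best_params, best_score = grid_search(
    toy_score,
    {"n_estimators": [100, 200, 300], "max_depth": [5, 10, 20]},
)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes (9 evaluations here), which is why this step is optional and can take a while.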

## 8. Evaluation on the Test Set

In [None]:
# Evaluate the best model
evaluator = Evaluator()

best_model = trainer.best_model_name
y_pred = trainer.predict(best_model, X_test)

# Display the report
evaluator.print_evaluation(best_model, y_test, y_pred)

In [None]:
# Confusion matrix
evaluator.plot_confusion_matrix(
    y_test, y_pred,
    normalize=True,
    title=f"Confusion Matrix - {best_model}"
)
plt.show()

In [None]:
# Visual classification report
evaluator.plot_classification_report(
    y_test, y_pred,
    title=f"Classification Report - {best_model}"
)
plt.show()

## 9. Error Analysis

In [None]:
# Most frequently confused genre pairs
confused_pairs = evaluator.get_most_confused_pairs(y_test, y_pred, top_n=5)

print("\n⚠️ Most frequently confused genres:")
for real, pred, count in confused_pairs:
    print(f"   {real} → {pred}: {count} errors")
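
For readers without access to `src/evaluation`, the idea behind a "most confused pairs" ranking can be sketched with `collections.Counter`. This is an assumed reimplementation for illustration, not the project's `get_most_confused_pairs`. It counts each (true, predicted) pair where the two labels differ and returns the most frequent ones.

```python
from collections import Counter


def most_confused_pairs(y_true, y_pred, top_n=5):
    """Return (true_label, predicted_label, count) for the most frequent errors."""
    errors = Counter(
        (true, pred) for true, pred in zip(y_true, y_pred) if true != pred
    )
    return [(true, pred, count) for (true, pred), count in errors.most_common(top_n)]


# Toy labels (hypothetical)
toy_true = ["rock", "rock", "rock", "jazz", "jazz", "metal"]
toy_pred = ["metal", "metal", "rock", "blues", "jazz", "metal"]
toy_pairs = most_confused_pairs(toy_true, toy_pred, top_n=2)
print(toy_pairs)
```

Note that the pairs are directional: rock predicted as metal is counted separately from metal predicted as rock.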

In [None]:
# Visualize the confusions
fig, ax = plt.subplots(figsize=(10, 5))

pairs = [f"{r}→{p}" for r, p, c in confused_pairs]
counts = [c for r, p, c in confused_pairs]

ax.barh(pairs, counts, color='coral')
ax.set_xlabel('Number of errors')
ax.set_title('Most Confused Genre Pairs', fontsize=14, fontweight='bold')

# Annotate each bar with its error count
for i, count in enumerate(counts):
    ax.text(count + 0.1, i, str(count), va='center')

plt.tight_layout()
plt.show()

## 10. Final Comparison of All Models

In [None]:
# Evaluate all models on the test set
final_results = []

for model_name in trainer.trained_models.keys():
    y_pred_model = trainer.predict(model_name, X_test)
    metrics = evaluator.calculate_metrics(y_test, y_pred_model)

    final_results.append({
        'Model': model_name,
        'Accuracy': metrics['accuracy'],
        'Precision': metrics['precision'],
        'Recall': metrics['recall'],
        'F1-Score': metrics['f1_score']
    })

final_df = pd.DataFrame(final_results)
final_df = final_df.sort_values('Accuracy', ascending=False)

print("\n📊 Final results on the test set:")
print(final_df.to_string(index=False))
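
The four columns above come from `evaluator.calculate_metrics`, whose averaging mode for multi-class precision, recall and F1 is not visible in this notebook. The sketch below assumes macro averaging (an unweighted mean over genres); the project may use a weighted average instead. The function name `macro_metrics` is hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1_scores = [], [], []

    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)

        # Fall back to 0.0 when a denominator is zero
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n_labels = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n_labels,
        "recall": sum(recalls) / n_labels,
        "f1_score": sum(f1_scores) / n_labels,
    }


# Toy labels (hypothetical)
toy_metrics = macro_metrics(
    ["rock", "rock", "jazz", "jazz"],
    ["rock", "jazz", "jazz", "jazz"],
)
print(toy_metrics)
```

With macro averaging every genre counts equally, so a model that ignores a small genre is penalized more than accuracy alone would show.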

In [None]:
# Final comparison chart
fig, ax = plt.subplots(figsize=(14, 6))

x = np.arange(len(final_df))
width = 0.2

# Named metric_names to avoid shadowing the metrics dict from the previous cell
metric_names = ['Accuracy', 'Precision', 'Recall', 'F1-Score']
colors = ['steelblue', 'coral', 'seagreen', 'purple']

for i, (metric, color) in enumerate(zip(metric_names, colors)):
    # Center the group of four bars on each tick
    offset = (i - 1.5) * width
    ax.bar(x + offset, final_df[metric], width, label=metric, color=color, alpha=0.8)

ax.set_xlabel('Model')
ax.set_ylabel('Score')
ax.set_title('Final Model Comparison', fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(final_df['Model'], rotation=45, ha='right')
ax.legend()
ax.set_ylim(0, 1)
ax.axhline(y=0.8, color='green', linestyle='--', alpha=0.5)

plt.tight_layout()
plt.show()

## 11. Saving the Best Model

In [None]:
# Save the best model
save_path = trainer.save_model(trainer.best_model_name)
print(f"\n💾 Model saved: {save_path}")

In [None]:
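# Save the results
# Create the reports directory if needed (assumes Config.REPORTS_DIR is a pathlib.Path)
Config.REPORTS_DIR.mkdir(parents=True, exist_ok=True)
comparison_path = Config.REPORTS_DIR / 'model_comparison.csv'
final_df.to_csv(comparison_path, index=False)
print(f"📊 Results saved: {comparison_path}")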

## 12. Conclusions

### Summary of Results:

- **Best model:** [Model name]
- **Accuracy:** XX%
- **Well-classified genres:** Classical, Metal, ...
- **Frequently confused genres:** Rock/Metal, Jazz/Blues, ...

### Interpretation:
- Genres with distinctive acoustic characteristics (classical, metal) are classified more accurately
- Closely related genres (rock/metal, jazz/blues) are harder to tell apart

### Possible Improvements:
1. Use more data
2. Try deep neural networks (CNNs on spectrograms)
3. Additional feature engineering

In [None]:
print("\n✅ Modeling complete!")
print(f"\n🏆 Best model: {trainer.best_model_name}")
print("\n📌 Continue with the notebook 04_evaluation.ipynb for a more detailed analysis.")