# Decision Support System (SAD) - Improved Version
## Brain Tumor Diagnosis (ALIF83 2025-2026)

This version aims for better clinical performance:
- compact sklearn features (texture + gradients + statistics)
- RegLog/MLP pipelines with standardization + PCA
- automatically selected calibration (sigmoid vs isotonic)
- strengthened CNN (class weights, scheduler, early stopping)
- selection of the best model according to business criteria
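The list above mentions compact features built from texture, gradients and statistics. The exact layout produced by `build_sklearn_dataset(..., feature_mode="compact")` is not shown here, so the following is only a minimal NumPy sketch of what such a descriptor can look like. The function name `compact_features`, the nearest-neighbour resize, the 8-bin intensity histogram used as a crude texture proxy and the 7 summary statistics are all assumptions, and the result does not reproduce the 18,443-dimensional vectors logged in section 3.

```python
import numpy as np


def compact_features(img, size=(96, 96), n_hist_bins=8):
    """Hypothetical compact descriptor (not the project's build_sklearn_dataset).

    Concatenates: downsampled intensities, gradient magnitude,
    a normalized intensity histogram (crude texture proxy) and global statistics.
    """
    img = np.asarray(img, dtype=np.float32)
    # Nearest-neighbour resize by index sampling, to stay dependency-free
    rows = np.linspace(0, img.shape[0] - 1, size[0]).round().astype(int)
    cols = np.linspace(0, img.shape[1] - 1, size[1]).round().astype(int)
    small = img[np.ix_(rows, cols)]
    # Min-max normalization to [0, 1]
    small = (small - small.min()) / (small.max() - small.min() + 1e-8)

    # Gradient magnitude from finite differences along rows and columns
    gy, gx = np.gradient(small)
    grad = np.hypot(gx, gy)

    # Intensity histogram normalized so that it sums to 1
    hist, _ = np.histogram(small, bins=n_hist_bins, range=(0.0, 1.0))
    hist = hist.astype(np.float32) / small.size

    stats = np.array(
        [small.mean(), small.std(), *np.percentile(small, [25, 50, 75]),
         grad.mean(), grad.std()],
        dtype=np.float32,
    )
    return np.concatenate([small.ravel(), grad.ravel(), hist, stats])


features = compact_features(np.random.default_rng(0).random((256, 200)))
print(features.shape)  # 2 * 96 * 96 pixel/gradient values + 8 histogram bins + 7 statistics
```

Keeping both the downsampled image and its gradient map is what makes such vectors large; the PCA step in sections 5 and 6 then compresses them.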
        

## 0. Goal of this iteration
The goal is not only to raise overall accuracy, but also to improve:
1. accuracy at high confidence (>= 0.85),
2. the reduction of tumor false negatives (a tumor classified as `notumor`),
3. business consistency (cost, coverage, triage).
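One lever for objective 2 in this version is class weighting of the CNN loss. The class weights logged in section 7 are reproduced exactly by the "balanced" rule `n_samples / (n_classes * n_c)`, which is the formula scikit-learn documents for `class_weight="balanced"`. The sketch below assumes that `compute_class_weights_from_loader` applies this rule; that is inferred from the logs, not read from `src/models.py`.

```python
import numpy as np


def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * n_c)."""
    counts = np.asarray(class_counts, dtype=np.float64)
    return counts.sum() / (len(counts) * counts)


# CNN training split counts logged in section 7 (glioma, meningioma, notumor, pituitary)
cnn_train_counts = [1057, 1071, 1276, 1165]
weights = balanced_class_weights(cnn_train_counts)
print(np.round(weights, 4))
```

`notumor`, the largest class, receives the smallest weight (about 0.895), so errors on the three tumor classes weigh slightly more in the loss.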
        

## 1. Setup

In [4]:
import sys
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

SEED = 42
np.random.seed(SEED)

# Execution mode
# FAST_MODE=True -> quick iteration
# FAST_MODE=False -> long, more powerful run
FAST_MODE = False
RUN_CNN = True

PROJECT_ROOT = Path.cwd().parent if Path.cwd().name == "notebooks" else Path.cwd()
sys.path.append(str(PROJECT_ROOT))

from src.preprocessing import (
    get_default_config,
    list_images_by_class,
    compute_class_counts,
    build_sklearn_dataset,
    stratified_train_calibration_split,
    build_torch_dataloaders_from_imagefolder,
)
from src.models import (
    train_logistic_regression,
    train_mlp_classifier,
    build_cnn_torch,
    train_cnn_classifier,
    predict_cnn_logits,
    compute_class_weights_from_loader,
)
from src.calibration import (
    calibrate_with_best_method,
    apply_temperature,
    temperature_scaling_fit,
    expected_calibration_error,
)
from src.decision_engine import (
    DecisionThresholds,
    predire_avec_confiance,
    generer_recommandation,
)
from src.evaluation import (
    compute_business_metrics,
    accuracy_by_confidence_bands,
    evaluate_high_confidence_operating_points,
)
from src.reporting import (
    creer_rapport_decision,
    save_reports_to_file,
)
from src.uncertainty import calculer_incertitude_mc_dropout

cfg = get_default_config(PROJECT_ROOT)
thresholds = DecisionThresholds()

print("PROJECT_ROOT:", PROJECT_ROOT)
print("TRAIN_DIR:", cfg.train_dir)
print("TEST_DIR:", cfg.test_dir)
print("CLASSES:", cfg.class_names)
print("FAST_MODE:", FAST_MODE, "RUN_CNN:", RUN_CNN)

PROJECT_ROOT: /Users/aissa/Projet_SAD
TRAIN_DIR: /Users/aissa/Projet_SAD/data/Training
TEST_DIR: /Users/aissa/Projet_SAD/data/Testing
CLASSES: ('glioma', 'meningioma', 'notumor', 'pituitary')
FAST_MODE: False RUN_CNN: True


## 2. Data verification

In [6]:
train_imgs = list_images_by_class(cfg.train_dir, cfg.class_names)
test_imgs = list_images_by_class(cfg.test_dir, cfg.class_names)

train_counts = compute_class_counts(train_imgs)
test_counts = compute_class_counts(test_imgs)

print("Train counts:", train_counts)
print("Test counts:", test_counts)

if any(v == 0 for v in train_counts.values()) or any(v == 0 for v in test_counts.values()):
    raise RuntimeError("Incomplete dataset. Check data/Training and data/Testing.")

Train counts: {'glioma': 1321, 'meningioma': 1339, 'notumor': 1595, 'pituitary': 1457}
Test counts: {'glioma': 300, 'meningioma': 306, 'notumor': 405, 'pituitary': 300}
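`list_images_by_class` and `compute_class_counts` are project helpers whose code is not shown. The check above implies a one-folder-per-class layout (`data/Training/<class>/...`), the same layout `ImageFolder` expects in section 7. Here is a minimal standard-library sketch of the counting and completeness check; the names `count_images_by_class` and `check_complete` and the set of accepted extensions are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed set of accepted image extensions
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_by_class(root, class_names):
    """Count image files in root/<class_name>/ for each class (0 if the folder is missing)."""
    root = Path(root)
    counts = {}
    for name in class_names:
        class_dir = root / name
        if class_dir.is_dir():
            counts[name] = sum(
                1 for p in class_dir.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
        else:
            counts[name] = 0
    return counts


def check_complete(*count_dicts):
    """Raise if any class is empty in any split, like the check in the cell above."""
    empty = sorted({cls for counts in count_dicts for cls, n in counts.items() if n == 0})
    if empty:
        raise RuntimeError(f"Incomplete dataset, empty classes: {empty}")


# Tiny demo on a temporary folder tree
with tempfile.TemporaryDirectory() as tmp:
    for cls, n in {"glioma": 2, "notumor": 1}.items():
        (Path(tmp) / cls).mkdir()
        for i in range(n):
            (Path(tmp) / cls / f"img_{i}.jpg").touch()
    demo_counts = count_images_by_class(tmp, ("glioma", "notumor", "pituitary"))

print(demo_counts)
```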


## 3. sklearn datasets (compact features)

In [8]:
if FAST_MODE:
    N_TRAIN_PER_CLASS = 180
    COMPACT_SIZE = (64, 64)
else:
    N_TRAIN_PER_CLASS = None  # Use all available images
    COMPACT_SIZE = (96, 96)

N_TEST_PER_CLASS = None

X_train, y_train, train_paths = build_sklearn_dataset(
    train_imgs,
    cfg.class_names,
    image_size=cfg.image_size,
    n_per_class=N_TRAIN_PER_CLASS,
    feature_mode="compact",
    compact_size=COMPACT_SIZE,
    random_state=SEED,
)

X_test, y_test, test_paths = build_sklearn_dataset(
    test_imgs,
    cfg.class_names,
    image_size=cfg.image_size,
    n_per_class=N_TEST_PER_CLASS,
    feature_mode="compact",
    compact_size=COMPACT_SIZE,
    random_state=SEED,
)

print("X_train:", X_train.shape, "y_train:", y_train.shape)
print("X_test:", X_test.shape, "y_test:", y_test.shape)

X_fit, X_cal, y_fit, y_cal = stratified_train_calibration_split(
    X_train,
    y_train,
    test_size=0.2,
    random_state=SEED,
)
print("X_fit:", X_fit.shape, "X_cal:", X_cal.shape)

X_train: (5712, 18443) y_train: (5712,)
X_test: (1311, 18443) y_test: (1311,)
X_fit: (4569, 18443) X_cal: (1143, 18443)
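The shapes above are consistent with a stratified 80/20 split of the 5,712 training images: scikit-learn rounds the held-out part up (`ceil(0.2 * 5712) = 1143`) and keeps the remaining 4,569 for fitting. The sketch assumes `stratified_train_calibration_split` wraps `sklearn.model_selection.train_test_split` with `stratify=y`; it works on indices only, so no image is loaded.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Training counts logged in section 2
logged_train_counts = {"glioma": 1321, "meningioma": 1339, "notumor": 1595, "pituitary": 1457}
y_all = np.concatenate([np.full(n, k) for k, n in enumerate(logged_train_counts.values())])
indices = np.arange(len(y_all))

idx_fit, idx_cal, y_fit_demo, y_cal_demo = train_test_split(
    indices, y_all, test_size=0.2, stratify=y_all, random_state=42
)
print(len(idx_fit), len(idx_cal))

# Class proportions are preserved (up to rounding) in the calibration split
print(np.round(np.bincount(y_cal_demo) / len(y_cal_demo), 3))
print(np.round(np.bincount(y_all) / len(y_all), 3))
```

Stratifying matters here because the calibration split is used to fit per-class probability maps; an unbalanced split would skew them.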


## 4. Evaluation helper functions

In [10]:
def summarize_candidate(name, probabilities, y_true, class_names):
    y_pred = probabilities.argmax(axis=1)
    conf = probabilities.max(axis=1)

    business = compute_business_metrics(
        y_true=y_true,
        y_pred=y_pred,
        conf=conf,
        thr_high=thresholds.high,
        notumor_index=list(class_names).index("notumor"),
    )
    bands = accuracy_by_confidence_bands(y_true, y_pred, conf)
    ops = evaluate_high_confidence_operating_points(
        y_true,
        y_pred,
        conf,
        thresholds=(0.85, 0.88, 0.90, 0.92, 0.95),
    )

    return {
        "name": name,
        "probabilities": probabilities,
        "y_true": y_true,
        "y_pred": y_pred,
        "conf": conf,
        "business": business,
        "bands": bands,
        "ops": ops,
        "class_names": tuple(class_names),
    }


def print_candidate_summary(summary):
    b = summary["business"]
    print(f"=== {summary['name']} ===")
    print(f"Accuracy globale: {(summary['y_pred'] == summary['y_true']).mean():.4f}")
    print("Business:", b)
    print("Accuracy par tranches:")
    for k, v in summary["bands"].items():
        print(f"  {k}: {v:.4f}" if not np.isnan(v) else f"  {k}: nan")
    print("Operating points:")
    for thr, op in summary["ops"].items():
        acc_txt = f"{op.accuracy:.4f}" if not np.isnan(op.accuracy) else "nan"
        print(f"  thr={thr:.2f} -> coverage={op.coverage:.4f}, accuracy={acc_txt}")
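`summarize_candidate` delegates to `accuracy_by_confidence_bands` and `evaluate_high_confidence_operating_points`, whose code is not shown. The outputs in the next sections (band keys such as `[0.85,1.01)`, `nan` when no case passes a threshold) are consistent with the following NumPy sketch. The `conf >= thr` comparison, the band edges copied from the logged keys, and the names `OperatingPoint`, `operating_points` and `accuracy_by_bands` are assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class OperatingPoint:
    coverage: float  # share of cases with conf >= threshold (auto-decided)
    accuracy: float  # accuracy on those cases, nan if there are none


def operating_points(y_true, y_pred, conf, thresholds=(0.85, 0.90, 0.95)):
    y_true, y_pred, conf = (np.asarray(a) for a in (y_true, y_pred, conf))
    points = {}
    for thr in thresholds:
        mask = conf >= thr
        acc = float((y_true[mask] == y_pred[mask]).mean()) if mask.any() else float("nan")
        points[thr] = OperatingPoint(coverage=float(mask.mean()), accuracy=acc)
    return points


def accuracy_by_bands(y_true, y_pred, conf, edges=(0.0, 0.50, 0.65, 0.85, 1.01)):
    """Accuracy per half-open band [lo, hi); the last edge 1.01 includes conf == 1.0."""
    y_true, y_pred, conf = (np.asarray(a) for a in (y_true, y_pred, conf))
    bands = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf >= lo) & (conf < hi)
        key = f"[{lo:.2f},{hi:.2f})"
        bands[key] = float((y_true[mask] == y_pred[mask]).mean()) if mask.any() else float("nan")
    return bands


demo_true = [0, 1, 2, 3, 0]
demo_pred = [0, 1, 0, 3, 1]
demo_conf = [0.99, 0.90, 0.60, 0.30, 0.86]
demo_ops = operating_points(demo_true, demo_pred, demo_conf)
demo_bands = accuracy_by_bands(demo_true, demo_pred, demo_conf)
print(demo_ops)
print(demo_bands)
```

Raising the threshold trades coverage for accuracy; an empty band or threshold yields `nan` rather than a misleading 0.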
        

## 5. Model A - improved logistic regression (RegLog) + automatic calibration

In [12]:
reglog_result = train_logistic_regression(
    X_fit,
    y_fit,
    max_iter=2500,
    random_state=SEED,
    class_weight="balanced",
    c_value=2.0,
    pca_components=256,
)
reglog_base = reglog_result.model

calibrated_reglog, reglog_calib_eval, reglog_method = calibrate_with_best_method(
    reglog_base,
    X_cal,
    y_cal,
    methods=("sigmoid", "isotonic"),
)

proba_reglog = calibrated_reglog.predict_proba(X_test)
reglog_summary = summarize_candidate("RegLog+PCA+Calibration", proba_reglog, y_test, cfg.class_names)

print("Calibration choisie RegLog:", reglog_method)
print("Calibration metrics:", reglog_calib_eval)
print_candidate_summary(reglog_summary)
        

Calibration choisie RegLog: sigmoid
Calibration metrics: {'sigmoid': {'nll': 0.6345742813409491, 'ece': 0.16311727915947316}, 'isotonic': {'nll': 0.7508039601809385, 'ece': 0.11726677942192396}}
=== RegLog+PCA+Calibration ===
Accuracy globale: 0.7246
Business: BusinessMetrics(auto_coverage=0.007627765064836003, acc_high_conf=1.0, high_conf_count=10, cost_total=139150.0, fn=68, fp=61, revisions=1301)
Accuracy par tranches:
  [0.00,0.50): 0.4826
  [0.50,0.65): 0.6905
  [0.65,0.85): 0.8766
  [0.85,1.01): 1.0000
Operating points:
  thr=0.85 -> coverage=0.0076, accuracy=1.0000
  thr=0.88 -> coverage=0.0000, accuracy=nan
  thr=0.90 -> coverage=0.0000, accuracy=nan
  thr=0.92 -> coverage=0.0000, accuracy=nan
  thr=0.95 -> coverage=0.0000, accuracy=nan
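The model name (`RegLog+PCA+Calibration`) and the parameters passed to `train_logistic_regression` suggest a standardize, then PCA, then logistic regression chain. Here is a scikit-learn sketch of such a pipeline, assuming that order and reusing the parameters above (`C=2.0`, balanced class weights, `max_iter=2500`). The demo runs on synthetic data with only 32 components, so it says nothing about the logged 0.7246 accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_reglog_pipeline(n_components=256, c_value=2.0, seed=42):
    """Standardize -> PCA -> logistic regression with balanced class weights (assumed order)."""
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components, random_state=seed),
        LogisticRegression(C=c_value, class_weight="balanced", max_iter=2500, random_state=seed),
    )


X_demo, y_demo = make_classification(
    n_samples=500, n_features=300, n_informative=20, n_classes=4, random_state=42
)
X_a, X_b, y_a, y_b = train_test_split(X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42)

pipe = make_reglog_pipeline(n_components=32).fit(X_a, y_a)
demo_proba = pipe.predict_proba(X_b)
print(demo_proba.shape, np.allclose(demo_proba.sum(axis=1), 1.0), pipe.score(X_b, y_b))
```

Fitting the scaler and the PCA inside the pipeline means they only see the fitting data; the same fitted transform is then applied to the calibration and test sets.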


## 6. Model B - improved MLP + automatic calibration

In [14]:
mlp_result = train_mlp_classifier(
    X_fit,
    y_fit,
    hidden_layers=(256, 128) if FAST_MODE else (512, 256),
    max_iter=90 if FAST_MODE else 320,
    random_state=SEED,
    alpha=1e-4,
    pca_components=192 if FAST_MODE else 420,
)
mlp_base = mlp_result.model

calibrated_mlp, mlp_calib_eval, mlp_method = calibrate_with_best_method(
    mlp_base,
    X_cal,
    y_cal,
    methods=("sigmoid", "isotonic"),
)

proba_mlp = calibrated_mlp.predict_proba(X_test)
mlp_summary = summarize_candidate("MLP+PCA+Calibration", proba_mlp, y_test, cfg.class_names)

print("Calibration choisie MLP:", mlp_method)
print("Calibration metrics:", mlp_calib_eval)
print_candidate_summary(mlp_summary)

Calibration choisie MLP: isotonic
Calibration metrics: {'sigmoid': {'nll': 0.2742280499361426, 'ece': 0.04212197754803615}, 'isotonic': {'nll': 0.22818912087708457, 'ece': 0.012541374183174872}}
=== MLP+PCA+Calibration ===
Accuracy globale: 0.9291
Business: BusinessMetrics(auto_coverage=0.8497330282227308, acc_high_conf=0.9694793536804309, high_conf_count=1114, cost_total=33050.0, fn=23, fp=2, revisions=197)
Accuracy par tranches:
  [0.00,0.50): 0.6000
  [0.50,0.65): 0.5493
  [0.65,0.85): 0.8017
  [0.85,1.01): 0.9695
Operating points:
  thr=0.85 -> coverage=0.8497, accuracy=0.9695
  thr=0.88 -> coverage=0.8215, accuracy=0.9712
  thr=0.90 -> coverage=0.8162, accuracy=0.9710
  thr=0.92 -> coverage=0.7788, accuracy=0.9765
  thr=0.95 -> coverage=0.7239, accuracy=0.9747


## 7. Model C - strengthened CNN + temperature scaling

In [16]:
cnn_available = RUN_CNN
cnn_summary = None
cnn_test_paths = None

if RUN_CNN:
    try:
        import torch

        if torch.cuda.is_available():
            DEVICE = "cuda"
        elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
            DEVICE = "mps"
        else:
            DEVICE = "cpu"
        print("CNN device:", DEVICE)
    except ImportError:
        cnn_available = False
        print("PyTorch not installed. Install torch/torchvision to run this section.")

if cnn_available:
    train_loader, val_loader, test_loader, cnn_classes, train_class_counts = build_torch_dataloaders_from_imagefolder(
        cfg.train_dir,
        cfg.test_dir,
        image_size=(128, 128) if FAST_MODE else cfg.image_size,
        batch_size=12 if FAST_MODE else 16,
        val_split=0.2,
        strong_augmentation=False if FAST_MODE else True,
        normalize_with_imagenet_stats=True,
        return_class_counts=True,
        random_state=SEED,
    )

    class_weights = compute_class_weights_from_loader(train_loader, num_classes=len(cnn_classes))

    backbone_used = "custom" if FAST_MODE else "resnet18"
    cnn_model = build_cnn_torch(
        num_classes=len(cnn_classes),
        dropout_p=0.35,
        backbone=backbone_used,
        pretrained=not FAST_MODE,  # pretrained weights only in the long run
    )

    cnn_result = train_cnn_classifier(
        cnn_model,
        train_loader,
        val_loader,
        epochs=4 if FAST_MODE else 18,
        lr=1e-3 if FAST_MODE else 8e-4,
        weight_decay=1e-4,
        class_weights=class_weights,
        label_smoothing=0.05,
        early_stopping_patience=2 if FAST_MODE else 6,
        grad_clip_norm=1.0,
        device=DEVICE,
    )

    # Temperature scaling fitted on the VALIDATION split (not test) to avoid data leakage
    logits_val, y_val_cnn = predict_cnn_logits(cnn_result.model, val_loader, device=DEVICE)
    cnn_temperature = temperature_scaling_fit(logits_val, y_val_cnn)

    # Evaluate on the test set with the temperature fitted on validation
    logits_test, y_test_cnn = predict_cnn_logits(cnn_result.model, test_loader, device=DEVICE)
    probs_before = apply_temperature(logits_test, 1.0)
    probs_after = apply_temperature(logits_test, cnn_temperature)

    ece_before = expected_calibration_error(probs_before, y_test_cnn)
    ece_after = expected_calibration_error(probs_after, y_test_cnn)

    cnn_summary = summarize_candidate(
        f"CNN-{backbone_used}+TempScaling",
        probs_after,
        y_test_cnn,
        cnn_classes,
    )

    sample_batch, _ = next(iter(test_loader))
    model_device = next(cnn_result.model.parameters()).device
    mc_unc = calculer_incertitude_mc_dropout(
        sample_batch[0:1].to(model_device),
        cnn_result.model,
        n_iter=8 if FAST_MODE else 20,
    )

    print("Backbone:", backbone_used)
    print("Pretrained:", not FAST_MODE)
    print("Train class counts:", train_class_counts)
    print("Class weights:", class_weights)
    print("Train metrics:", cnn_result.metrics)
    print(f"ECE avant: {ece_before:.4f} | ECE apres: {ece_after:.4f}")
    print("Temperature optimale:", cnn_temperature)
    print("MC Dropout sample:", mc_unc)
    print_candidate_summary(cnn_summary)

    cnn_test_paths = [path for path, _ in test_loader.dataset.samples]
else:
    print("CNN skipped (RUN_CNN=False or torch missing).")

CNN device: mps
Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /Users/aissa/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth




Backbone: resnet18
Pretrained: True
Train class counts: {'glioma': 1057, 'meningioma': 1071, 'notumor': 1276, 'pituitary': 1165}
Class weights: [1.0806528 1.0665267 0.8951802 0.9804721]
Train metrics: {'best_val_accuracy': 0.989501312335958, 'best_epoch': 18.0, 'final_train_accuracy': 0.9658568614576494, 'final_val_accuracy': 0.989501312335958}
ECE avant: 0.0424 | ECE apres: 0.0053
Temperature optimale: 0.6425399316141461
MC Dropout sample: {'entropy_predictive': 0.20162886381149292, 'entropy_expected': 0.19613900780677795, 'mutual_information': 0.005489856004714966, 'mean_max_prob': 0.9623433947563171, 'std_max_prob': 0.015193751081824303}
=== CNN-resnet18+TempScaling ===
Accuracy globale: 0.9931
Business: BusinessMetrics(auto_coverage=0.9931350114416476, acc_high_conf=0.9969278033794163, high_conf_count=1302, cost_total=650.0, fn=0, fp=2, revisions=9)
Accuracy par tranches:
  [0.00,0.50): nan
  [0.50,0.65): 1.0000
  [0.65,0.85): 0.2857
  [0.85,1.01): 0.9969
Operating points:
  thr=0.
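The `MC Dropout sample` line decomposes predictive uncertainty over several stochastic forward passes with dropout kept active. The logged values satisfy `mutual_information = entropy_predictive - entropy_expected` (0.20163 - 0.19614 ≈ 0.00549). Here is a NumPy sketch of that aggregation step, assuming natural-log entropies and that `mean_max_prob` / `std_max_prob` are computed over the per-pass maximum probability. Producing the passes themselves is assumed to happen inside `calculer_incertitude_mc_dropout`; `mc_dropout_summary` is a hypothetical name.

```python
import numpy as np


def mc_dropout_summary(prob_samples, eps=1e-12):
    """prob_samples: array (T, K) of softmax outputs from T dropout-enabled passes."""
    p = np.asarray(prob_samples, dtype=np.float64)
    mean_p = p.mean(axis=0)
    # Entropy of the averaged prediction (total uncertainty)
    entropy_predictive = -np.sum(mean_p * np.log(mean_p + eps))
    # Average entropy of each individual pass
    entropy_expected = np.mean(-np.sum(p * np.log(p + eps), axis=1))
    max_probs = p.max(axis=1)
    return {
        "entropy_predictive": float(entropy_predictive),
        "entropy_expected": float(entropy_expected),
        # Disagreement between passes
        "mutual_information": float(entropy_predictive - entropy_expected),
        "mean_max_prob": float(max_probs.mean()),
        "std_max_prob": float(max_probs.std()),
    }


stable = mc_dropout_summary([[0.7, 0.3]] * 5)            # passes agree
unstable = mc_dropout_summary([[1.0, 0.0], [0.0, 1.0]])  # passes fully disagree
print(stable["mutual_information"], unstable["mutual_information"])
```

A mutual information close to 0, as in the logged sample, means the passes agree; the remaining entropy comes from the prediction itself rather than from model disagreement.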

## 8. Comparison and selection of the best candidate

In [18]:
candidates = [reglog_summary, mlp_summary]
path_by_name = {
    reglog_summary["name"]: test_paths,
    mlp_summary["name"]: test_paths,
}
if cnn_summary is not None:
    candidates.append(cnn_summary)
    path_by_name[cnn_summary["name"]] = cnn_test_paths

rows = []
for c in candidates:
    b = c["business"]
    rows.append(
        {
            "modele": c["name"],
            "accuracy": float((c["y_pred"] == c["y_true"]).mean()),
            "auto_coverage": b.auto_coverage,
            "acc_high_conf": b.acc_high_conf,
            "fn": b.fn,
            "fp": b.fp,
            "cost_total": b.cost_total,
        }
    )

comparison_df = pd.DataFrame(rows).sort_values(
    by=["fn", "acc_high_conf", "cost_total"],
    ascending=[True, False, True],
)
comparison_df
        

|   | modele | accuracy | auto_coverage | acc_high_conf | fn | fp | cost_total |
|---|---|---|---|---|---|---|---|
| 2 | CNN-resnet18+TempScaling | 0.993135 | 0.993135 | 0.996928 | 0 | 2 | 650.0 |
| 1 | MLP+PCA+Calibration | 0.929062 | 0.849733 | 0.969479 | 23 | 2 | 33050.0 |
| 0 | RegLog+PCA+Calibration | 0.724638 | 0.007628 | 1.0 | 68 | 61 | 139150.0 |
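The three `cost_total` values in this table are reproduced exactly by a linear cost of 1000 per false negative, 100 per false positive and 50 per case routed to human review, where the review count equals the test cases below the 0.85 threshold (`revisions` = 1311 - `high_conf_count`, i.e. 9, 197 and 1301 in the logs). These weights were inferred by solving the three logged rows, not read from `compute_business_metrics`, so treat them as an assumption.

```python
# Weights inferred from the three logged rows (assumption, not read from src/evaluation.py)
COST_FN = 1000.0     # tumor predicted as notumor
COST_FP = 100.0      # notumor predicted as a tumor
COST_REVIEW = 50.0   # case sent to human review (conf < 0.85)


def business_cost(fn, fp, revisions):
    return COST_FN * fn + COST_FP * fp + COST_REVIEW * revisions


# (fn, fp, revisions, logged cost_total)
logged_rows = {
    "CNN-resnet18+TempScaling": (0, 2, 9, 650.0),
    "MLP+PCA+Calibration": (23, 2, 197, 33050.0),
    "RegLog+PCA+Calibration": (68, 61, 1301, 139150.0),
}
recomputed = {name: business_cost(fn, fp, rev) for name, (fn, fp, rev, _) in logged_rows.items()}
for name, cost in recomputed.items():
    print(f"{name}: recomputed={cost:.0f} logged={logged_rows[name][3]:.0f}")
```

Under these weights one missed tumor costs as much as ten false alarms or twenty reviews, consistent with the table being sorted on `fn` first.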


In [19]:
best_model_name = comparison_df.iloc[0]["modele"]
best_summary = next(c for c in candidates if c["name"] == best_model_name)

print("Modele retenu (critere metier):", best_model_name)
print("Critere projet (>95% accuracy quand conf>0.85):")
acc_085 = best_summary["ops"][0.85].accuracy
if np.isnan(acc_085):
    print("  Aucun cas a conf >= 0.85")
else:
    print(f"  Accuracy@0.85 = {acc_085:.4f}")
    print("  ATTEINT" if acc_085 >= 0.95 else "  NON ATTEINT")
        

Modele retenu (critere metier): CNN-resnet18+TempScaling
Critere projet (>95% accuracy quand conf>0.85):
  Accuracy@0.85 = 0.9969
  ATTEINT


## 9. Detailed evaluation of the selected model

In [21]:
y_true_best = best_summary["y_true"]
y_pred_best = best_summary["y_pred"]
class_names_best = best_summary["class_names"]

print(classification_report(y_true_best, y_pred_best, target_names=class_names_best, digits=4))

cm = confusion_matrix(y_true_best, y_pred_best)
cm_df = pd.DataFrame(
    cm,
    index=[f"true_{c}" for c in class_names_best],
    columns=[f"pred_{c}" for c in class_names_best],
)
cm_df
        

              precision    recall  f1-score   support

      glioma     0.9933    0.9933    0.9933       300
  meningioma     0.9776    0.9967    0.9871       306
     notumor     1.0000    0.9951    0.9975       405
   pituitary     1.0000    0.9867    0.9933       300

    accuracy                         0.9931      1311
   macro avg     0.9927    0.9929    0.9928      1311
weighted avg     0.9932    0.9931    0.9932      1311



|   | pred_glioma | pred_meningioma | pred_notumor | pred_pituitary |
|---|---|---|---|---|
| true_glioma | 298 | 2 | 0 | 0 |
| true_meningioma | 1 | 305 | 0 | 0 |
| true_notumor | 0 | 2 | 403 | 0 |
| true_pituitary | 1 | 3 | 0 | 296 |
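As a cross-check, the per-class figures of the classification report can be recomputed from this confusion matrix: precision divides the diagonal by the column sums, recall by the row sums. Reading the `notumor` row and column also gives the tumor-level errors; with the assumed definitions (FN = tumor predicted as `notumor`, FP = `notumor` predicted as a tumor) this yields 0 and 2, matching `fn=0, fp=2` in section 7.

```python
import numpy as np

class_names_demo = ["glioma", "meningioma", "notumor", "pituitary"]
# Confusion matrix logged above (rows = true class, columns = predicted class)
cm_logged = np.array([
    [298, 2, 0, 0],
    [1, 305, 0, 0],
    [0, 2, 403, 0],
    [1, 3, 0, 296],
])

tp = np.diag(cm_logged)
precision = tp / cm_logged.sum(axis=0)  # column sums = predicted counts
recall = tp / cm_logged.sum(axis=1)     # row sums = true counts (support)
accuracy = np.trace(cm_logged) / cm_logged.sum()

notumor = class_names_demo.index("notumor")
tumor = [i for i in range(len(class_names_demo)) if i != notumor]
missed_tumors = int(cm_logged[tumor, notumor].sum())  # tumor rows, notumor column
false_alarms = int(cm_logged[notumor, tumor].sum())   # notumor row, tumor columns

for name, p, r in zip(class_names_demo, precision, recall):
    print(f"{name:<11} precision={p:.4f} recall={r:.4f}")
print(f"accuracy={accuracy:.4f} missed_tumors={missed_tumors} false_alarms={false_alarms}")
```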


## 10. Generating 20 reports with the best model

In [23]:
best_prob = best_summary["probabilities"]
class_names_best = list(best_summary["class_names"])

n_reports = min(20, len(best_prob))
reports = []
for i in range(n_reports):
    scores = {class_names_best[j]: float(best_prob[i, j]) for j in range(len(class_names_best))}
    decision = generer_recommandation(scores, thresholds)
    report = creer_rapport_decision(
        patient_id=f"P_{i+1:05d}",
        scores_by_class=scores,
        decision=decision,
    )
    reports.append(report)

out_path = PROJECT_ROOT / "reports" / "sample_reports.txt"
save_reports_to_file(reports, out_path)

print(f"{len(reports)} rapports sauvegardes dans: {out_path}")
print("\n--- Exemple rapport 1 ---\n")
print(reports[0])


20 rapports sauvegardes dans: /Users/aissa/Projet_SAD/reports/sample_reports.txt

--- Exemple rapport 1 ---

RAPPORT AUTOMATISE
RAPPORT D'AIDE A LA DECISION
Patient ID: P_00001 Date: 14/02/2026

PREDICTION PRINCIPALE
---------------------
Classe: glioma
Confiance: 99.8%
Niveau de certitude: ELEVE [OK]

SCORES PAR CLASSE
-----------------
- glioma: 99.8%
- meningioma: 0.1%
- pituitary: 0.1%
- notumor: 0.1%

RECOMMANDATIONS CLINIQUES
--------------------------
Diagnostic: Diagnostic automatique valide
Action: Rapport envoye au medecin traitant
Priorite: [!] URGENT - Prise en charge sous 12h
Revision humaine: Optionnelle (validation finale)

ELEMENTS D'ATTENTION
---------------------
- Suspicion tumeur maligne
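The example report pairs a 99.8% confidence with the certainty level `ELEVE` (high) and an optional human review. Here is a minimal sketch of such a confidence-to-triage rule, assuming the high threshold is 0.85 (consistent with `acc_high_conf` matching `Accuracy@0.85` in the logs). The 0.65 medium threshold, the level names, the review labels and the names `TriageThresholds` / `triage` are hypothetical; they are not the defaults of `DecisionThresholds` nor the logic of `generer_recommandation`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageThresholds:
    high: float = 0.85    # consistent with the 0.85 operating point used above
    medium: float = 0.65  # hypothetical value, borrowed from the confidence bands


def triage(scores, thr=TriageThresholds()):
    """Map class scores to a predicted class, a certainty level and a review requirement."""
    label, conf = max(scores.items(), key=lambda kv: kv[1])
    if conf >= thr.high:
        level, review = "HIGH", "optional"
    elif conf >= thr.medium:
        level, review = "MEDIUM", "recommended"
    else:
        level, review = "LOW", "mandatory"
    return {"label": label, "confidence": conf, "level": level, "human_review": review}


report_1 = triage({"glioma": 0.998, "meningioma": 0.001, "pituitary": 0.001, "notumor": 0.001})
print(report_1)
```

Only the confidence drives the review requirement here; clinical urgency (such as the 12h priority for a suspected malignant tumor) would be a separate mapping from the predicted class.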


In [36]:
import numpy as np
import pandas as pd
from IPython.display import display  # builtin in Jupyter; explicit import for scripts

checks = []

checks.append(("RegLog évalué", "reglog_summary" in globals()))
checks.append(("MLP évalué", "mlp_summary" in globals()))
checks.append(("CNN évalué", ("cnn_summary" in globals()) and (cnn_summary is not None)))

checks.append(("Comparaison modèles", "comparison_df" in globals()))
checks.append(("Modèle retenu", "best_summary" in globals()))

reports_path = PROJECT_ROOT / "reports" / "sample_reports.txt"
reports_ok = reports_path.exists()
reports_count = 0
if reports_ok:
    reports_count = reports_path.read_text(encoding="utf-8").count("Patient ID:")
checks.append(("20 rapports générés", reports_ok and reports_count >= 20))

crit_ok = False
acc085 = np.nan
if "best_summary" in globals() and "ops" in best_summary and 0.85 in best_summary["ops"]:
    acc085 = best_summary["ops"][0.85].accuracy
    crit_ok = (not np.isnan(acc085)) and (acc085 >= 0.95)
checks.append(("Critère prof: Accuracy@0.85 >= 0.95", crit_ok))

audit_df = pd.DataFrame(checks, columns=["Exigence", "OK"])
display(audit_df)

print(f"Reports count: {reports_count}")
print(f"Accuracy@0.85: {acc085:.4f}" if not np.isnan(acc085) else "Accuracy@0.85: nan")

|   | Exigence | OK |
|---|---|---|
| 0 | RegLog évalué | True |
| 1 | MLP évalué | True |
| 2 | CNN évalué | True |
| 3 | Comparaison modèles | True |
| 4 | Modèle retenu | True |
| 5 | 20 rapports générés | True |
| 6 | Critère prof: Accuracy@0.85 >= 0.95 | True |


Reports count: 20
Accuracy@0.85: 0.9969


## 11. Critical and ethical analysis

### 11.1 Pipeline checklist
- RegLog evaluated: **True**
- MLP evaluated: **True**
- CNN evaluated: **True**
- Model comparison: **True**
- Model selected: **True**
- 20 reports generated: **True**
- Instructor criterion (`Accuracy@0.85 >= 0.95`): **True** (0.9969, see section 8)

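### 11.2 Interpretation
In this long-run configuration (`FAST_MODE=False`: strengthened training + calibration + decision engine), the SAD pipeline meets the specification.
The confidence metrics are computed without data leakage: the CNN temperature is fitted on the validation split and evaluated on the test split, and the sklearn calibrators are fitted on `X_cal`, a stratified 20% held out from the training set.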
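Temperature scaling divides the logits by a single scalar `T > 0` chosen to minimize the negative log-likelihood on the validation split. Because every logit is divided by the same positive number, the argmax, and therefore the accuracy, is unchanged; only the confidences move. The logged optimum `T ≈ 0.64 < 1` sharpens the probabilities, i.e. the raw CNN was somewhat underconfident on validation, plausibly related to `label_smoothing=0.05`. Here is a SciPy sketch, assuming a bounded one-dimensional search; the search method and bounds used by `temperature_scaling_fit` are not shown, and `fit_temperature` / `apply_temperature_np` are hypothetical names.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import log_softmax, softmax


def nll_at_temperature(logits, labels, temperature):
    log_probs = log_softmax(logits / temperature, axis=1)
    return -log_probs[np.arange(len(labels)), labels].mean()


def fit_temperature(logits_val, labels_val, bounds=(0.05, 10.0)):
    """Find the T minimizing validation NLL (bounded 1-D search)."""
    result = minimize_scalar(
        lambda t: nll_at_temperature(logits_val, labels_val, t), bounds=bounds, method="bounded"
    )
    return float(result.x)


def apply_temperature_np(logits, temperature):
    return softmax(logits / temperature, axis=1)


# Synthetic check: labels drawn from softmax(base_logits), so base_logits are calibrated (T ~ 1);
# halving them makes the model underconfident, so the fitted T should come out near 0.5.
rng = np.random.default_rng(0)
base_logits = rng.normal(scale=2.0, size=(5000, 4))
p_true = softmax(base_logits, axis=1)
u = rng.random(len(p_true))
labels = np.minimum((p_true.cumsum(axis=1) < u[:, None]).sum(axis=1), 3)  # inverse-CDF sampling
underconfident_logits = 0.5 * base_logits

t_fit = fit_temperature(underconfident_logits, labels)
print(round(t_fit, 3))
```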

### 11.3 Limitations to document
- Performance depends on the split and on the hyperparameters, so results may vary between runs. In addition, the best candidate is chosen on the same test set that reports its scores, which makes those scores somewhat optimistic.
- Real clinical validation requires an external evaluation and medical supervision.
- The SAD remains an assistance tool, not a replacement for the radiologist.
- The sklearn models (RegLog, MLP) are trained on compact hand-crafted features; in this run they stay well below the CNN trained on the images (overall accuracy 0.7246 and 0.9291 vs 0.9931).