# Model Ensemble Pipeline (SA → MA)

This notebook walks step by step through a two-stage ensemble process:

- Single-Architecture (SA): ensemble over the folds of one model, at the Tile → Image → Patient levels.
- Multi-Architecture (MA): ensemble across different models, at the Image → Tile → Patient levels.

At the end, the artifacts (CSVs and metrics) are organized into
`outputs/tables` (generated tables) and `outputs/results` (artifacts for review and presentation).


## 1. General Configuration
Set the models to include, the ensemble type, and the weight metric (used only for `weighted`).
The per-architecture fold inputs are read from `datas/summary_results_<model>/`.
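
The three ensemble types named above can be sketched as follows. This is a minimal illustration on toy probabilities, not the logic inside `modules.PerModelEnsembler`; the function names `hard_vote` and `soft_vote` and the example weights are assumptions made for this example.

```python
import numpy as np

def hard_vote(probs):
    """Majority vote over each voter's top class.

    probs has shape (n_voters, n_classes). Ties go to the lowest class index.
    """
    votes = probs.argmax(axis=1)
    counts = np.bincount(votes, minlength=probs.shape[1])
    return int(counts.argmax())

def soft_vote(probs, weights=None):
    """Average the probabilities (optionally weighted) and take the top class."""
    avg = np.average(probs, axis=0, weights=weights)
    return int(avg.argmax())

# Toy data: three voters (e.g. folds), two classes.
fold_probs = np.array([
    [0.51, 0.49],   # weakly prefers class 0
    [0.51, 0.49],   # weakly prefers class 0
    [0.05, 0.95],   # strongly prefers class 1
])

hard_result = hard_vote(fold_probs)
soft_result = soft_vote(fold_probs)
# Hypothetical weights that almost ignore the third voter.
weighted_result = soft_vote(fold_probs, weights=[1.0, 1.0, 0.01])
print(hard_result, soft_result, weighted_result)
```

Hard voting counts only the winning class of each voter, while soft voting lets a confident voter outweigh two hesitant ones; the weighted variant scales each voter's contribution before averaging.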

In [1]:
import os, glob, json, shutil
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc, precision_recall_curve
from modules import PerModelEnsembler, BetweenModelsEnsembler

# Models to include in the MA ensemble
MODELS = ['GGNet', 'MOBNet', 'SHFFNet', 'EFFNet', 'MNASNet']

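# Ensemble types. ENSEMBLE_TYPES drives the SA and MA loops (Sections 3 and 4);
# ENSEMBLE_TYPE selects which results Section 5 reads.
ENSEMBLE_TYPE = 'hard_voting'  # 'hard_voting' | 'soft_voting' | 'weighted'
ENSEMBLE_TYPES = ['hard_voting', 'soft_voting']
WEIGHT_METRIC = 'f1_macro'     # used only when the ensemble type is 'weighted'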

# Main folders
ROOT = os.getcwd()
DATAS_DIR = os.path.join(ROOT, 'datas')
OUTPUTS_DIR = os.path.join(ROOT, 'outputs')
TABLES_DIR = os.path.join(OUTPUTS_DIR, 'tables')
RESULTS_DIR = os.path.join(OUTPUTS_DIR, 'results')

os.makedirs(OUTPUTS_DIR, exist_ok=True)
os.makedirs(TABLES_DIR, exist_ok=True)
os.makedirs(RESULTS_DIR, exist_ok=True)

# Maps each model key to its sub-folder under datas/ (used by discover_model_csvs)
MODEL_SUMMARY_DIRS = {
    'GGNet': 'summary_results_ggnet',
    'MOBNet': 'summary_results_mobnet',
    'SHFFNet': 'summary_results_shffnet',
    'EFFNet': 'summary_results_effnet',
    'MNASNet': 'summary_results_mnasnet',
}

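def discover_model_csvs(model_name: str):
    """Return the sorted fold CSV paths for a model, or an empty list if none are found."""
    sub = MODEL_SUMMARY_DIRS.get(model_name)
    paths = []
    if sub:
        base = os.path.join(DATAS_DIR, sub)
        paths = sorted(glob.glob(os.path.join(base, '*.csv')))
    if not paths:
        # Fallback: search all of datas/ recursively for files named after the model key.
        paths = sorted(glob.glob(os.path.join(DATAS_DIR, '**', f'*{model_name}*fold*results*.csv'), recursive=True))
    return paths

def type_folder_name(ensemble_type: str, weight_metric: str) -> str:
    """Folder name for an ensemble type, e.g. 'hard_voting' or 'weighted_f1_macro'."""
    return f'weighted_{weight_metric}' if ensemble_type == 'weighted' else ensemble_type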


## 2. Discovering the Fold CSVs (datas/)
Shows how many CSVs were found per architecture, with one example file name each.

In [2]:
for m in MODELS:
    csvs = discover_model_csvs(m)
    print(f'{m}: {len(csvs)} CSVs found')
    if csvs:
        print('Example:', os.path.basename(csvs[0]))
    else:
        print('Warning: no CSV found for', m)
    print('-'*60)


GGNet: 10 CSVs found
Example: 0710-161214_GoogleNet_fold0_results.csv
------------------------------------------------------------
MOBNet: 10 CSVs found
Example: 0707-213643_MobileNetV2_fold0_results.csv
------------------------------------------------------------
SHFFNet: 10 CSVs found
Example: 0711-213955_ShuffleNetV2_fold0_results.csv
------------------------------------------------------------
EFFNet: 10 CSVs found
Example: 0903-102424_EFF-NET_fold0_results.csv
------------------------------------------------------------
MNASNet: 10 CSVs found
Example: 0808-163313_MNASNET_fold0_results.csv
------------------------------------------------------------
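
A file count alone does not show which folds are present. The sketch below is a possible sanity check, assuming file names follow the `..._fold<N>_results.csv` pattern seen in the examples above and that folds are numbered 0 to 9 (both are assumptions; `missing_folds` is a hypothetical helper). It also shows why the explicit folder mapping matters: the fallback glob pattern is built from the model key, and a key such as `GGNet` does not occur in a file name such as `..._GoogleNet_fold0_results.csv`.

```python
import re
from fnmatch import fnmatchcase

# Assumed naming pattern, taken from the example file names printed above.
FOLD_RE = re.compile(r"_fold(\d+)_results\.csv$")

def missing_folds(filenames, expected=10):
    """Return the fold indices in range(expected) that have no matching file."""
    found = set()
    for name in filenames:
        match = FOLD_RE.search(name)
        if match:
            found.add(int(match.group(1)))
    return sorted(set(range(expected)) - found)

# Toy file list with fold 3 deliberately left out.
names = [f"0710-161214_GoogleNet_fold{i}_results.csv" for i in range(10) if i != 3]
gaps = missing_folds(names)

# The fallback pattern uses the model key, which this file name does not contain.
fallback_hit = fnmatchcase(names[0], "*GGNet*fold*results*.csv")
print(gaps, fallback_hit)
```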


## 3. Per-Model Ensemble (SA): Tile → Image → Patient
This step consolidates the folds of a single model at three levels.
Results are saved to `outputs/tables/<Model>/Ensemble_<level>_level_<type>/`.
The artifacts are then exported to `outputs/results/SA/<Model>/<type>/<Level>/`.
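
As a rough illustration of the Tile → Image → Patient hierarchy, the sketch below aggregates tile predictions by majority vote, first per image and then per patient. The column names (`patient_id`, `image_id`, `tile_pred`), the labels, and the choice to derive the patient label from the image labels are all assumptions made for this example; `PerModelEnsembler` may aggregate differently.

```python
import pandas as pd

# Toy tile-level predictions (hypothetical columns and labels).
tiles = pd.DataFrame({
    "patient_id": ["P1"] * 7 + ["P2"] * 3,
    "image_id": ["P1_a"] * 3 + ["P1_b"] * 3 + ["P1_c"] + ["P2_a"] * 3,
    "tile_pred": [
        "tumor", "tumor", "normal",    # P1_a -> tumor
        "normal", "normal", "tumor",   # P1_b -> normal
        "tumor",                       # P1_c -> tumor
        "normal", "normal", "normal",  # P2_a -> normal
    ],
})

def majority(series):
    """Most frequent label in the group (the toy data contains no ties)."""
    return series.value_counts().idxmax()

# Tile -> Image: one label per image.
image_pred = (
    tiles.groupby(["patient_id", "image_id"])["tile_pred"]
    .agg(majority)
    .rename("image_pred")
    .reset_index()
)

# Image -> Patient: one label per patient.
patient_pred = image_pred.groupby("patient_id")["image_pred"].agg(majority)
print(patient_pred)
```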

In [3]:
def export_sa_to_results(model: str, ensemble_type: str, weight_metric: str):
    """Copy one model's SA CSVs and metrics into outputs/results and render summary plots."""
    tsub = f'Ensemble_tile_level_{ensemble_type}' if ensemble_type != 'weighted' else 'Ensemble_tile_level_weighted'
    isub = f'Ensemble_image_level_{ensemble_type}' if ensemble_type != 'weighted' else 'Ensemble_image_level_weighted'
    psub = f'Ensemble_patient_level_{ensemble_type}' if ensemble_type != 'weighted' else 'Ensemble_patient_level_weighted'
    type_folder = type_folder_name(ensemble_type, weight_metric)
    # Tile
    tile_dir = os.path.join(TABLES_DIR, model, tsub)
    tile_csv = os.path.join(tile_dir, f'ensemble_per_tile_{ensemble_type}.csv' if ensemble_type!='weighted' else f'ensemble_per_tile_weighted_{weight_metric}.csv')
    tile_met = os.path.join(tile_dir, f'global_metrics_tile_level_{ensemble_type}.json' if ensemble_type!='weighted' else f'global_metrics_tile_level_weighted_{weight_metric}.json')
    # Image
    img_dir = os.path.join(TABLES_DIR, model, isub)
    img_csv = os.path.join(img_dir, f'ensemble_per_image_{ensemble_type}.csv' if ensemble_type!='weighted' else f'ensemble_per_image_weighted_{weight_metric}.csv')
    img_met = os.path.join(img_dir, f'ensemble_global_metrics_image_level_{ensemble_type}.json' if ensemble_type!='weighted' else 'ensemble_global_metrics_image_level_weighted.json')
    # Patient
    pat_dir = os.path.join(TABLES_DIR, model, psub)
    pat_csv = os.path.join(pat_dir, f'ensemble_per_patient_{ensemble_type}.csv' if ensemble_type!='weighted' else f'ensemble_per_patient_weighted_{weight_metric}.csv')
    pat_met = os.path.join(pat_dir, f'global_metrics_patient_level_{ensemble_type}.json' if ensemble_type!='weighted' else f'global_metrics_patient_level_weighted_{weight_metric}.json')
    # Export
    levels = {
        'Tile': (tile_csv, tile_met),
        'Image': (img_csv, img_met),
        'Patient': (pat_csv, pat_met),
    }
    base = os.path.join(RESULTS_DIR, 'SA', model, type_folder)
    os.makedirs(base, exist_ok=True)
    for level, (csrc, msrc) in levels.items():
        if not (os.path.exists(csrc) and os.path.exists(msrc)):
            continue
        out_dir = os.path.join(base, level)
        os.makedirs(out_dir, exist_ok=True)
        shutil.copy2(csrc, os.path.join(out_dir, os.path.basename(csrc)))
        shutil.copy2(msrc, os.path.join(out_dir, os.path.basename(msrc)))
        # Confusion matrix (optional)
        try:
            with open(msrc, 'r') as f:
                m = json.load(f)
            cm = m.get('confusion_matrix')
            report = m.get('classification_report', {})
            labels = [k for k in report.keys() if k not in ('accuracy','macro avg','weighted avg')]
            if isinstance(cm, list) and len(labels)>0:
                arr = np.array(cm)
                fig, ax = plt.subplots(figsize=(6,6))
                im = ax.imshow(arr, cmap='Blues')
                ax.set_xticks(range(len(labels)))
                ax.set_yticks(range(len(labels)))
                ax.set_xticklabels(labels, rotation=45, ha='right')
                ax.set_yticklabels(labels)
                ax.set_xlabel('Predicted')
                ax.set_ylabel('True')
                fig.colorbar(im, ax=ax)
                for i in range(arr.shape[0]):
                    for j in range(arr.shape[1]):
                        ax.text(j, i, int(arr[i, j]), ha='center', va='center', color='black')
                plt.tight_layout()
                fig.savefig(os.path.join(out_dir, 'confusion_matrix.png'))
                plt.close(fig)
            # Classification report as PNG
            try:
                rows = []
                for k, v in report.items():
                    if k in ('accuracy','macro avg','weighted avg'):
                        continue
                    rows.append([k, v.get('precision',0), v.get('recall',0), v.get('f1-score',0), v.get('support',0)])
                if rows:
                    dfrep = pd.DataFrame(rows, columns=['class','precision','recall','f1-score','support'])
                    figr, axr = plt.subplots(figsize=(8, max(4, 0.4*len(dfrep)+1)))
                    axr.axis('off')
                    tbl = axr.table(cellText=dfrep.values, colLabels=dfrep.columns, loc='center')
                    tbl.auto_set_font_size(False)
                    tbl.set_fontsize(8)
                    tbl.scale(1, 1.2)
                    axr.set_title('Classification Report', pad=10)
                    plt.tight_layout()
                    figr.savefig(os.path.join(out_dir, 'classification_report.png'))
                    plt.close(figr)
            except Exception:
                pass
            # Per-class ROC curves (PR curves are not generated here)
            try:
                import ast
                dfcsv = pd.read_csv(csrc)
                prob_cols = ['final_probs','mean_probs_per_class','mean_probs_per_class_image','mean_probs_per_class_tile','probabilities']
                pcol = next((c for c in prob_cols if c in dfcsv.columns), None)
                if pcol:
                    dfcsv[pcol] = dfcsv[pcol].apply(lambda v: v if isinstance(v, dict) else (ast.literal_eval(str(v)) if pd.notna(v) else {}))
                    labels_list = [k for k in report.keys() if k not in ('accuracy','macro avg','weighted avg')]
                    for cls in labels_list:
                        try:
                            y_true = (dfcsv['true_label'] == cls).astype(int)
                            y_score = dfcsv[pcol].apply(lambda d: float(d.get(cls, 0.0)) if isinstance(d, dict) else 0.0).astype(float)
                            if y_true.sum() == 0 or y_true.nunique() < 2:
                                continue
                            fpr, tpr, _ = roc_curve(y_true, y_score)
                            roc_auc = auc(fpr, tpr)
                            figc, axc = plt.subplots(figsize=(6,5))
                            axc.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (AUC = {roc_auc:.3f})')
                            axc.plot([0,1], [0,1], color='navy', lw=1, linestyle='--')
                            axc.set_xlim([0.0, 1.0])
                            axc.set_ylim([0.0, 1.05])
                            axc.set_xlabel('False Positive Rate')
                            axc.set_ylabel('True Positive Rate')
                            axc.set_title(f'ROC - Class {cls}')
                            axc.legend(loc='lower right')
                            plt.tight_layout()
                            figc.savefig(os.path.join(out_dir, f'roc_{cls}.png'))
                            plt.close(figc)
                        except Exception:
                            continue
            except Exception:
                pass
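        except Exception as exc:
            # Plots are optional, so report the failure and move on to the next level.
            print(f'[WARNING] Could not render plots for {model} ({level}): {exc}')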

for ensemble_type in ENSEMBLE_TYPES:
    for model in MODELS:
        csvs = discover_model_csvs(model)
        if not csvs:
            print('[WARNING] No CSVs for', model)
            continue
        print(f'[SA] Running ensemble for {model} ({ensemble_type})')
        per = PerModelEnsembler(model_name=model, ensemble_type=ensemble_type, weight_metric=WEIGHT_METRIC)
        tile_csv, tile_met = per.run_tile_level(csvs)
        print('Tile saved to:', tile_csv)
        img_csv, img_met = per.run_image_level(csvs)
        print('Image saved to:', img_csv)
        pat_csv, pat_met = per.run_patient_level(csvs)
        print('Patient saved to:', pat_csv)
        export_sa_to_results(model, ensemble_type, WEIGHT_METRIC)


[SA] Running ensemble for GGNet (hard_voting)
Tile saved to: c:\Users\rodri\OneDrive - UFPE\Área de Trabalho\meu cod\outputs\tables\GGNet\Ensemble_tile_level_hard_voting\ensemble_per_tile_hard_voting.csv
Image saved to: c:\Users\rodri\OneDrive - UFPE\Área de Trabalho\meu cod\outputs\tables\GGNet\Ensemble_image_level_hard_voting\ensemble_per_image_hard_voting.csv
Patient saved to: c:\Users\rodri\OneDrive - UFPE\Área de Trabalho\meu cod\outputs\tables\GGNet\Ensemble_patient_level_hard_voting\ensemble_per_patient_hard_voting.csv
[SA] Running ensemble for MOBNet (hard_voting)
Tile saved to: c:\Users\rodri\OneDrive - UFPE\Área de Trabalho\meu cod\outputs\tables\MOBNet\Ensemble_tile_level_hard_voting\ensemble_per_tile_hard_voting.csv
Image saved to: c:\Users\rodri\OneDrive - UFPE\Área de Trabalho\meu cod\outputs\tables\MOBNet\Ensemble_image_level_hard_voting\ensemble_per_image_hard_voting.csv
Patient saved to: c:\Users\rodri\OneDrive - UFPE\Área de Trabalho\meu cod\outputs\tables\MOBNet\En
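
The ROC step in `export_sa_to_results` expects the probability column to hold dicts serialized as strings, which it parses with `ast.literal_eval`. The sketch below shows that round trip on a tiny in-memory CSV using only the standard library. The column name `final_probs` is one of the candidates the export code looks for; the labels, the values, and the helper `parse_probs` are assumptions made for this example. Unlike the export code, this helper maps malformed cells to an empty dict instead of raising.

```python
import ast
import csv
import io

# Toy CSV: the last row has an empty probability cell.
CSV_TEXT = """true_label,final_probs
a,"{'a': 0.9, 'b': 0.1}"
b,"{'a': 0.3, 'b': 0.7}"
b,
"""

def parse_probs(cell):
    """Turn a stringified dict into a dict; empty or malformed cells become {}."""
    if not cell or not cell.strip():
        return {}
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return {}
    return value if isinstance(value, dict) else {}

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
probs = [parse_probs(r["final_probs"]) for r in rows]

# One-vs-rest inputs for class 'b': binary target and the class score (0.0 if missing).
y_true_b = [int(r["true_label"] == "b") for r in rows]
scores_b = [float(p.get("b", 0.0)) for p in probs]
print(y_true_b, scores_b)
```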

## 4. Between-Models Ensemble (MA)
Combines the SA outputs of the selected models.
Results are saved to `outputs/tables/Ensemble_Between_Models/...`
and exported to `outputs/results/MA/<type>/<Level>/`.
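
One way to read the `weighted` option is soft voting in which each model's probabilities are scaled by a score such as `f1_macro`. The sketch below illustrates that idea only; it is not taken from `BetweenModelsEnsembler`. The normalization (weights proportional to the metric), the placeholder model names, and all numbers are assumptions made for this example.

```python
import numpy as np

def metric_weights(scores):
    """Normalize per-model metric values so the weights sum to 1."""
    names = sorted(scores)
    raw = np.array([scores[name] for name in names], dtype=float)
    return dict(zip(names, raw / raw.sum()))

def weighted_soft_vote(probs_by_model, weights):
    """Weighted average of class probabilities across models."""
    total = sum(weights[name] * np.asarray(p, dtype=float)
                for name, p in probs_by_model.items())
    return total / sum(weights[name] for name in probs_by_model)

# Hypothetical validation scores and class probabilities for one sample.
scores = {"model_a": 0.9, "model_b": 0.6, "model_c": 0.5}
probs_by_model = {
    "model_a": [0.2, 0.8],
    "model_b": [0.7, 0.3],
    "model_c": [0.6, 0.4],
}

weights = metric_weights(scores)
combined = weighted_soft_vote(probs_by_model, weights)
predicted_class = int(np.argmax(combined))
print(weights, combined, predicted_class)
```

With equal weights the three models would average to a tie between the two classes; weighting by the metric lets the strongest model decide.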

In [4]:
def export_ma_to_results(level: str, csv_path: str, metrics_path: str, ensemble_type: str, weight_metric: str):
    """Copy one MA level's CSV and metrics into outputs/results and render the confusion matrix."""
    tfolder = type_folder_name(ensemble_type, weight_metric)
    out_dir = os.path.join(RESULTS_DIR, 'MA', tfolder, level)
    os.makedirs(out_dir, exist_ok=True)
    if os.path.exists(csv_path):
        shutil.copy2(csv_path, os.path.join(out_dir, os.path.basename(csv_path)))
    if os.path.exists(metrics_path):
        shutil.copy2(metrics_path, os.path.join(out_dir, os.path.basename(metrics_path)))
        try:
            with open(metrics_path, 'r') as f:
                m = json.load(f)
            cm = m.get('confusion_matrix')
            report = m.get('classification_report', {})
            labels = [k for k in report.keys() if k not in ('accuracy','macro avg','weighted avg')]
            if isinstance(cm, list) and len(labels)>0:
                arr = np.array(cm)
                fig, ax = plt.subplots(figsize=(6,6))
                im = ax.imshow(arr, cmap='Blues')
                ax.set_xticks(range(len(labels)))
                ax.set_yticks(range(len(labels)))
                ax.set_xticklabels(labels, rotation=45, ha='right')
                ax.set_yticklabels(labels)
                ax.set_xlabel('Predicted')
                ax.set_ylabel('True')
                fig.colorbar(im, ax=ax)
                for i in range(arr.shape[0]):
                    for j in range(arr.shape[1]):
                        ax.text(j, i, int(arr[i, j]), ha='center', va='center', color='black')
                plt.tight_layout()
                fig.savefig(os.path.join(out_dir, 'confusion_matrix.png'))
                plt.close(fig)
        except Exception:
            pass

for ensemble_type in ENSEMBLE_TYPES:
    print(f'[MA] Between models ({ensemble_type})')
    between = BetweenModelsEnsembler(
        base_models_parent_directory=TABLES_DIR,
        ensemble_save_output_base=os.path.join(TABLES_DIR, 'Ensemble_Between_Models'),
        models_to_include=MODELS,
        ensemble_type=ensemble_type,
        weight_metric=WEIGHT_METRIC,
    )
    img_csv, img_met = between.run_image_level()
    print('MA Image saved to:', img_csv)
    export_ma_to_results('Image', img_csv, img_met, ensemble_type, WEIGHT_METRIC)
    tile_csv, tile_met = between.run_tile_level()
    print('MA Tile saved to:', tile_csv)
    export_ma_to_results('Tile', tile_csv, tile_met, ensemble_type, WEIGHT_METRIC)
    pat_csv, pat_met = between.run_patient_level()
    print('MA Patient saved to:', pat_csv)
    export_ma_to_results('Patient', pat_csv, pat_met, ensemble_type, WEIGHT_METRIC)


[MA] Between models (hard_voting)
MA Image saved to: c:\Users\rodri\OneDrive - UFPE\Área de Trabalho\meu cod\outputs\tables\Ensemble_Between_Models\ImageLevel_Ensemble_Models_hard_voting\ensemble_between_models_per_image_hard_voting.csv
MA Tile saved to: c:\Users\rodri\OneDrive - UFPE\Área de Trabalho\meu cod\outputs\tables\Ensemble_Between_Models\TileLevel_Ensemble_Models_hard_voting\ensemble_between_models_tile_level_hard_voting.csv
MA Patient saved to: c:\Users\rodri\OneDrive - UFPE\Área de Trabalho\meu cod\outputs\tables\Ensemble_Between_Models\PatientLevel_Ensemble_hard_voting\ensemble_between_models_per_patient_hard_voting.csv
[MA] Between models (soft_voting)
MA Image saved to: c:\Users\rodri\OneDrive - UFPE\Área de Trabalho\meu cod\outputs\tables\Ensemble_Between_Models\ImageLevel_Ensemble_Models_soft_voting\ensemble_between_models_per_image_soft_voting.csv


KeyboardInterrupt:

## 5. Metrics Visualization and Evaluation
Load the metrics JSONs and generate plots (confusion matrix, ROC, etc.).
This version only demonstrates reading the metrics and printing the main ones.

In [None]:
def load_metrics(path):
    with open(path, 'r') as f:
        return json.load(f)

# Example: walk the MA Image results and print the main metrics
tfolder = type_folder_name(ENSEMBLE_TYPE, WEIGHT_METRIC)
ma_img_dir = os.path.join(RESULTS_DIR, 'MA', tfolder, 'Image')
metrics_files = sorted(glob.glob(os.path.join(ma_img_dir, '*.json')))
for mf in metrics_files:
    m = load_metrics(mf)
    print('File:', os.path.basename(mf))
    print('Accuracy:', m.get('accuracy'))
    print('F1_macro:', m.get('f1_macro'))
    print('Recall_weighted:', m.get('recall_weighted'))
    print('ROC_AUC_OVR:', m.get('roc_auc_ovr'))
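
Section 1 imports `precision_recall_curve`, but no cell in this notebook calls it. A per-class precision-recall computation could look like the sketch below, which mirrors the one-vs-rest setup of the ROC step. The helper `per_class_pr`, the labels, and the probabilities are assumptions made for this example; the numbers are toy values, not results from this pipeline.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

def per_class_pr(true_labels, prob_dicts, cls):
    """One-vs-rest precision-recall curve and average precision for one class."""
    y_true = np.array([label == cls for label in true_labels], dtype=int)
    y_score = np.array([float(p.get(cls, 0.0)) for p in prob_dicts])
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    ap = average_precision_score(y_true, y_score)
    return precision, recall, ap

# Toy data with two hypothetical classes, 'a' and 'b'.
true_labels = ["a", "a", "b", "b"]
prob_dicts = [
    {"a": 0.90, "b": 0.10},
    {"a": 0.60, "b": 0.40},
    {"a": 0.65, "b": 0.35},
    {"a": 0.20, "b": 0.80},
]

precision, recall, ap = per_class_pr(true_labels, prob_dicts, "b")
print(f"AP for class b: {ap:.3f}")
```

The returned arrays can be drawn with `ax.plot(recall, precision)`. As in the ROC step, a class with no positive or no negative samples should be skipped before calling the function.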


## 6. Notes and Next Steps
- The `datas/summary_results_<model>/` structure is the official source of the fold results.
- The generated tables (SA and MA) are saved under `outputs/tables/`.
- Artifacts for review (CSVs, metrics, plots) are mirrored to `outputs/results/`.
- Suggested next steps: add per-class PR curves (the SA export already draws per-class ROC curves, while the MA export draws only the confusion matrix) and a consolidated index of the results.
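
A consolidated index could start as a walk over the results folder that collects a few metrics per JSON file. The sketch below assumes the `SA/<Model>/<type>/<Level>/` and `MA/<type>/<Level>/` layout described earlier and the `accuracy` and `f1_macro` keys read in Section 5; `build_results_index` is a hypothetical helper, and the demo writes a throwaway file with made-up values into a temporary folder.

```python
import json
import tempfile
from pathlib import Path

def build_results_index(results_dir):
    """Collect one row per metrics JSON found under results_dir."""
    rows = []
    for path in sorted(Path(results_dir).rglob("*.json")):
        with path.open("r", encoding="utf-8") as f:
            metrics = json.load(f)
        parts = path.relative_to(results_dir).parts
        rows.append({
            "group": parts[0],    # 'SA' or 'MA'
            "level": parts[-2],   # parent folder: 'Tile', 'Image' or 'Patient'
            "file": path.name,
            "accuracy": metrics.get("accuracy"),
            "f1_macro": metrics.get("f1_macro"),
        })
    return rows

# Demo on a temporary folder with one made-up metrics file.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "MA" / "hard_voting" / "Image"
    demo_dir.mkdir(parents=True)
    (demo_dir / "metrics.json").write_text(
        json.dumps({"accuracy": 0.5, "f1_macro": 0.4}), encoding="utf-8"
    )
    index = build_results_index(tmp)

print(index)
```

The resulting list of dicts can be passed to `pd.DataFrame(index)` and saved as a single CSV next to the other artifacts.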