(I only used Gemini to translate the original Korean text.)

This experiment builds on the Professor's research, specifically the paper **'Hierarchical Deep Feature Fusion and Ensemble Learning for Enhanced Brain Tumor MRI Classification'** (hereafter, the Target Paper).

Following the Target Paper's methodology, we applied its **Machine Learning (ML) ensemble** approach to the AlexNet architecture. Whereas the Target Paper used various Vision Transformer (ViT) models, this experiment used **features extracted from different layers of AlexNet**: the **last hidden Fully Connected (FC) layer** (the 4096-dimensional layer preceding the final $4096 \to \text{num\_class}$ output layer) and the **last convolutional layer**.
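As a rough illustration of combining the two feature sources, the sketch below fuses an FC feature vector with a last-conv feature map by global average pooling followed by concatenation. The helper name `fuse_features`, the choice of average pooling, and the 256×6×6 conv shape (that of the standard AlexNet) are assumptions made for illustration; the notebook's own `extract_fused` mode may combine the layers differently. Under these assumptions the fused vector has 4096 + 256 = 4352 dimensions, which is at least consistent with the `extract_fused` feature shape printed in the logs further down.

```python
import numpy as np


def fuse_features(fc_feats, conv_maps):
    """Concatenate FC features with globally average-pooled conv features.

    fc_feats:  array of shape (B, D_fc)
    conv_maps: array of shape (B, C, H, W)
    Returns an array of shape (B, D_fc + C).
    NOTE: average pooling is an assumed fusion strategy, not the notebook's confirmed one.
    """
    pooled = conv_maps.mean(axis=(2, 3))  # (B, C): one value per channel
    return np.concatenate([fc_feats, pooled], axis=1)


# Random stand-ins for real activations (shapes assume standard AlexNet).
rng = np.random.default_rng(0)
fc_feats = rng.standard_normal((2, 4096))
conv_maps = rng.standard_normal((2, 256, 6, 6))

fused = fuse_features(fc_feats, conv_maps)
print(fused.shape)  # (2, 4352)
```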

We initially intended to evaluate a wide range of ML classifiers and to tune each one over a broad set of hyperparameters.

However, with **only one to two weeks available** and a **concurrent exam period**, we could not dedicate enough time to the research. Regrettably, we were unable to review prior work thoroughly, explore diverse architectural ideas, or run extensive experiments.

The resulting performance was **lower than the SOTA result** reported on the Proof-of-Concept (POC) dataset in the Professor's related work, **'Systematic Integration of Attention Modules into CNNs for Accurate and Generalizable Medical Image Diagnosis'**. With more time, we believe we could have run more meaningful and impactful experiments.

The **best performance** came from the **AlexNet + BN + CBAM** architecture with **10-Crop** input, using **MLP, KNN, and SVM** as the classifiers in the final ensemble. This setup reached a **test accuracy** of **0.8623** and corresponds to the experiment shown in the final block of this Jupyter Notebook (`ipynb`).
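To make the ensemble step concrete, here is a minimal sketch of the two averaging stages on toy numbers: per-crop class probabilities are first averaged per image, and the classifiers' averaged probabilities are then combined with weights proportional to a validation score. The probability arrays and scores are made up for illustration, and the helper name `weighted_soft_vote` is hypothetical; this is not the notebook's exact code.

```python
import numpy as np


def weighted_soft_vote(crop_probas, val_scores):
    """Combine several classifiers by crop-averaged, score-weighted soft voting.

    crop_probas: list of (N, n_crops, n_classes) arrays, one per classifier
    val_scores:  one validation score per classifier (used as raw weights)
    Returns (predicted class indices of shape (N,), normalized weights).
    """
    weights = np.asarray(val_scores, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # Stage 1: average over crops -> (N, n_classes) per classifier
    per_model = [p.mean(axis=1) for p in crop_probas]
    # Stage 2: weighted sum of the classifiers' averaged probabilities
    final = sum(w * p for w, p in zip(weights, per_model))
    return final.argmax(axis=1), weights


# Toy data: 2 images, 2 crops each, 3 classes, 2 classifiers (all values invented).
model_a = np.array([[[0.6, 0.3, 0.1], [0.4, 0.5, 0.1]],
                    [[0.1, 0.2, 0.7], [0.2, 0.2, 0.6]]])
model_b = np.array([[[0.2, 0.7, 0.1], [0.2, 0.6, 0.2]],
                    [[0.1, 0.1, 0.8], [0.3, 0.1, 0.6]]])

preds, weights = weighted_soft_vote([model_a, model_b], val_scores=[0.9, 0.6])
print(preds)  # [1 2]
```

In this toy case `model_a` alone would pick class 0 for the first image, but the weighted vote picks class 1. Note that when the validation scores are close to each other, as with the MLP and KNN cross-validation accuracies near 0.91 in the logs below, the normalized weights are nearly uniform and the scheme behaves almost like plain probability averaging.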

In [7]:
import yaml
import torch
import numpy as np
from tqdm import tqdm
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report
from src.data_loader import create_dataloader
from src.models import create_model


def train_and_evaluate_ml_model(X_train, y_train, X_test, y_test, model_name):
    model_map = {
        'mlp': MLPClassifier(random_state=42, max_iter=1000, 
                             activation='relu', solver='adam', 
                             hidden_layer_sizes=(100, 100, 50), learning_rate_init=0.001),
                             
        'knn': KNeighborsClassifier(n_jobs=4, n_neighbors=7, 
                                    weights='distance', algorithm='auto'),
                                     
        'svm_rbf': SVC(probability=True, random_state=42, kernel='rbf', 
                       C=1.0, gamma='auto') 
    }

    model = model_map[model_name]
    
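    # Features arrive as (N_images, N_crops, D): flatten so every crop is one
    # training sample and repeat each image label once per crop.
    N_train, N_crops_train, D = X_train.shape
    X_train_flat = X_train.reshape(-1, D)
    y_train_flat = np.repeat(y_train, N_crops_train)

    print(f"\n--- Training and Evaluating {model_name.upper()} ---")

    # Fit on the full training set; this fitted model is the one applied to the test set.
    model.fit(X_train_flat, y_train_flat)
    # cross_val_score fits fresh clones per fold, so it does not alter the fitted model.
    # The mean CV accuracy is later used as this model's ensemble weight.
    cv_scores = cross_val_score(model, X_train_flat, y_train_flat, cv=5, scoring='accuracy', n_jobs=-1)
    val_accuracy = np.mean(cv_scores)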
    
    print(f"\n[RESULTS] Single {model_name.upper()} model:")
    print(f"  - Mean 5-fold CV Accuracy: {val_accuracy:.4f}")
    
    # Predict per crop, then average the class probabilities over each image's crops.
    N_test, N_crops_test, _ = X_test.shape
    X_test_flat = X_test.reshape(-1, D)

    probas_flat = model.predict_proba(X_test_flat)

    num_classes = probas_flat.shape[-1]
    probas_reshaped = probas_flat.reshape(N_test, N_crops_test, num_classes)

    probas_mean = probas_reshaped.mean(axis=1)  # (N_test, num_classes)
    y_pred = np.argmax(probas_mean, axis=1)
    
    print("  - Test Set Performance:")
    print(classification_report(y_test, y_pred, digits=4))
    
    return {
        'name': model_name,
        'estimator': model,
        'val_score': val_accuracy,
        'probas_mean': probas_mean
    }

def run_ml_experiment_flow(X_train, y_train, X_test, y_test):
    model_names = ['mlp', 'knn', 'svm_rbf'] 
    results = []
    
    for name in model_names:
        result = train_and_evaluate_ml_model(X_train, y_train, X_test, y_test, name)
        results.append(result)

    weights = []
    for r in results:
        weights.append(r['val_score'])
    
    weights = np.array(weights) / np.sum(weights)
    print(f"Normalized weights: {[f'{w:.4f}' for w in weights]}")

    final_probas = np.zeros(results[0]['probas_mean'].shape)

    for i, r in enumerate(results):
        probas_mean = r['probas_mean']
        final_probas += weights[i] * probas_mean
    
    final_predictions = np.argmax(final_probas, axis=1)

    print("\n--- Final Ensemble Test Set Performance ---")
    print(classification_report(y_test, final_predictions, digits=4))
    
    print(f"\nTest Accuracy: {(y_test == final_predictions).sum().item() / len(y_test)}")

def extract_features(model, dataloader, mode, device):
    model.eval()
    features_list, labels_list = [], []
    with torch.no_grad():
        pbar = tqdm(dataloader, desc=f"Extracting features ({mode})")
        for inputs, labels in pbar:
            inputs = inputs.to(device)
            if len(inputs.shape) == 5:
                B, N_crops, C, H, W = inputs.shape
                inputs = inputs.view(-1, C, H, W) 
                features = model(inputs, mode=mode)
                Feature_Dim = features.shape[-1]
                features = features.view(B, N_crops, Feature_Dim) 
            else:
                features = model(inputs, mode=mode).unsqueeze(1) # (B, 1, Feature_Dim)
            features_list.append(features.cpu().numpy())
            labels_list.append(labels.cpu().numpy())
    return np.concatenate(features_list, axis=0), np.concatenate(labels_list, axis=0)

def classifier_ml(config_path):
    with open(config_path, 'r') as f:
        exp3_config = yaml.safe_load(f)

    feature_mode = exp3_config['feature_mode']
    yaml_name = exp3_config['feature_extraction_model']['yaml_name']
    best_model = exp3_config['feature_extraction_model']['pt_name']
    dl_config_path = f"configs/{yaml_name}.yaml"
    checkpoint_path = f"saved_models/{best_model}/best_model.pth"
    
    print(f"--- Starting Quick Experiment: Feature Mode '{feature_mode}' ---")
    print(f"Using DL model from run: '{best_model}'")

    with open(dl_config_path, 'r') as f:
        dl_config = yaml.safe_load(f)

    device = torch.device(dl_config['training'].get('device', 'cpu'))
    model = create_model(
        model_name=dl_config['model']['name'],
        num_classes=dl_config['data']['num_classes'],
        pretrained=dl_config['model']['pretrained'],
        **dl_config['model'].get('params', {})
    ).to(device)
    
    try:
        model.load_state_dict(torch.load(checkpoint_path, map_location=device))
        print(f"Successfully loaded checkpoint: {checkpoint_path}")
    except FileNotFoundError:
        print(f"ERROR: Checkpoint file not found at '{checkpoint_path}'")
        return

    train_loader, test_loader = create_dataloader(dl_config['data'])
    X_train, y_train = extract_features(model, train_loader, feature_mode, device)
    X_test, y_test = extract_features(model, test_loader, feature_mode, device)
    print(f"Features extracted. Train shape: {X_train.shape}, Test shape: {X_test.shape}")

    run_ml_experiment_flow(X_train, y_train, X_test, y_test)

In [8]:
classifier_ml('configs/exp3_a.yaml')

--- Starting Quick Experiment: Feature Mode 'extract_fc7' ---
Using DL model from run: 'exp2_04_10crop_bn+cbam'
Model 'alexnet_bn_cbam' created. Pretrained: True, Num classes: 4


Successfully loaded checkpoint: saved_models/exp2_04_10crop_bn+cbam/best_model.pth
DataLoaders created successfully.
  - Preprocessing: 10crop
  - Train samples: 4155, Test samples: 1511


Extracting features (extract_fc7): 100%|██████████| 130/130 [00:15<00:00,  8.40it/s]
Extracting features (extract_fc7): 100%|██████████| 48/48 [00:19<00:00,  2.42it/s]


Features extracted. Train shape: (4155, 1, 4096), Test shape: (1511, 10, 4096)

--- Training and Evaluating MLP ---

[RESULTS] Single MLP model:
  - Mean 5-fold CV Accuracy: 0.9153
  - Test Set Performance:
              precision    recall  f1-score   support

           0     0.8819    0.9769    0.9270       390
           1     0.6587    0.7135    0.6850       349
           2     0.7906    0.7981    0.7943       421
           3     0.9928    0.7806    0.8740       351

    accuracy                         0.8206      1511
   macro avg     0.8310    0.8173    0.8201      1511
weighted avg     0.8307    0.8206    0.8218      1511


--- Training and Evaluating KNN ---

[RESULTS] Single KNN model:
  - Mean 5-fold CV Accuracy: 0.9122
  - Test Set Performance:
              precision    recall  f1-score   support

           0     0.8991    0.9821    0.9387       390
           1     0.8351    0.6676    0.7420       349
           2     0.7764    0.9074    0.8368       421
           3 

In [9]:
classifier_ml('configs/exp3_b.yaml')

--- Starting Quick Experiment: Feature Mode 'extract_fused' ---
Using DL model from run: 'exp2_04_10crop_bn+cbam'
Model 'alexnet_bn_cbam' created. Pretrained: True, Num classes: 4


Successfully loaded checkpoint: saved_models/exp2_04_10crop_bn+cbam/best_model.pth
DataLoaders created successfully.
  - Preprocessing: 10crop
  - Train samples: 4155, Test samples: 1511


Extracting features (extract_fused): 100%|██████████| 130/130 [00:15<00:00,  8.38it/s]
Extracting features (extract_fused): 100%|██████████| 48/48 [00:20<00:00,  2.39it/s]


Features extracted. Train shape: (4155, 1, 4352), Test shape: (1511, 10, 4352)

--- Training and Evaluating MLP ---

[RESULTS] Single MLP model:
  - Mean 5-fold CV Accuracy: 0.9208
  - Test Set Performance:
              precision    recall  f1-score   support

           0     0.8991    0.9821    0.9387       390
           1     0.7597    0.6160    0.6804       349
           2     0.7696    0.8646    0.8143       421
           3     0.9331    0.8746    0.9029       351

    accuracy                         0.8398      1511
   macro avg     0.8404    0.8343    0.8341      1511
weighted avg     0.8387    0.8398    0.8361      1511


--- Training and Evaluating KNN ---

[RESULTS] Single KNN model:
  - Mean 5-fold CV Accuracy: 0.9148
  - Test Set Performance:
              precision    recall  f1-score   support

           0     0.9054    0.9821    0.9422       390
           1     0.8172    0.6533    0.7261       349
           2     0.7658    0.8931    0.8246       421
           3 