# Analysis and Visualization of Benchmark Results

This notebook loads the results generated by `run_benchmark.py` and reproduces the key figures and tables from the paper *'Reproducible Benchmark of Wavelet-Enhanced Intrabody Communication Biometric Identification'*.

The primary goal is to provide a clear and reproducible workflow for analyzing the benchmark outcomes without re-running the experiments.

## 1. Setup and Loading Results

In [None]:
import json
import os
import glob
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set plot style for better aesthetics
sns.set_theme(style="whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

RESULTS_DIR = 'results'

print(f"Loading results from: {RESULTS_DIR}/")

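# Find all JSON result files (sorted so the load order is deterministic)
result_files = sorted(glob.glob(os.path.join(RESULTS_DIR, '*.json')))
if not result_files:
    raise FileNotFoundError(f"No JSON result files found in {RESULTS_DIR}/; run run_benchmark.py first.")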

# Load and parse results into a list of dictionaries
all_results = []
for f_path in result_files:
    with open(f_path, 'r') as f:
        res = json.load(f)
        # Flatten the nested structure for easier DataFrame creation
        flat_res = {
            'model': res['config']['model'],
            'feature': res['config']['feature'],
            'accuracy': res['metrics'].get('accuracy'),
            'roc_auc_ovr': res['metrics'].get('roc_auc_ovr')
        }
        all_results.append(flat_res)

# Create a pandas DataFrame
df_results = pd.DataFrame(all_results)

print(f"Loaded {len(df_results)} experiment results.")
df_results.head()
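
The loader above assumes each result file contains a `config` block with `model` and `feature` keys and a `metrics` block. That layout is inferred from the keys the loop reads, not from `run_benchmark.py` itself. A minimal sketch of the same flattening as a standalone function, applied to a made-up record (the values are illustrative, not benchmark results):

```python
def flatten_result(res):
    """Flatten one nested result record into a single-level dict."""
    config = res["config"]    # a KeyError here points to a malformed file
    metrics = res["metrics"]
    return {
        "model": config["model"],
        "feature": config["feature"],
        # .get() yields None when a metric was not recorded
        "accuracy": metrics.get("accuracy"),
        "roc_auc_ovr": metrics.get("roc_auc_ovr"),
    }

# Hypothetical record; real files come from run_benchmark.py
example = {
    "config": {"model": "mlp", "feature": "combined"},
    "metrics": {"accuracy": 0.5},
}
flat = flatten_result(example)
print(flat)
```

Missing metrics become `None`, which pandas then shows as `NaN` or `None` in the resulting DataFrame.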

## 2. Performance Summary Table

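This table summarizes the main findings as test accuracy for each model (rows) and feature set (columns). Rows are sorted by accuracy on the `combined` feature set, so the best-performing models appear first. If several result files share the same model and feature set, `pivot_table` averages their accuracy.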

In [None]:
# Create a pivot table for a clear overview
performance_summary = df_results.pivot_table(
    index='model',
    columns='feature',
    values='accuracy'
).sort_values(by='combined', ascending=False) # Sort by our best feature set

# Display formatted table
performance_summary.style.format("{:.4f}").background_gradient(cmap='viridis')
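
When a single ordered list of model and feature combinations is more useful than a grid, a long-form ranking can complement the pivot table. The sketch below uses a small made-up DataFrame with the same columns as `df_results`; the model names, feature names, and numbers are placeholders, not benchmark results.

```python
import pandas as pd

# Made-up rows with the same columns as df_results (placeholder values only)
demo = pd.DataFrame({
    "model": ["mlp", "mlp", "rf", "rf"],
    "feature": ["combined", "raw", "combined", "raw"],
    "accuracy": [0.90, 0.80, 0.85, 0.70],
})

# Sort every (model, feature) pair by accuracy, best first
ranking = demo.sort_values("accuracy", ascending=False).reset_index(drop=True)
ranking.index += 1  # use a 1-based rank as the index
print(ranking)
```

In the notebook, replacing `demo` with `df_results` would give the same ranking for the real experiments.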

## 3. Visual Comparison of Model Accuracy

A grouped bar chart makes the same comparison visual: one group per model and one bar per feature set. It should make the advantage of the MLP model with combined features easy to see.

In [None]:
plt.figure(figsize=(12, 7))
ax = sns.barplot(data=df_results, x='model', y='accuracy', hue='feature', palette='viridis')

ax.set_title('Model Accuracy Comparison by Feature Set', fontsize=16)
ax.set_xlabel('Model', fontsize=12)
ax.set_ylabel('Test Accuracy', fontsize=12)
ax.set_ylim(0, 1.0)
plt.legend(title='Feature Set')
plt.tight_layout()
plt.show()

## 4. In-depth Analysis (Placeholder)

The cells below are templates for more detailed analyses, such as ROC curves and feature importances. As written, they run on randomly generated mock data, so the resulting plots carry no meaning. To use real data, `run_benchmark.py` would need to save the raw predictions, class probabilities, and/or trained model objects in addition to the metrics.

### 4.1. ROC Curve Analysis
This requires saving `y_test` and `y_prob` for each experiment.
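
One way `run_benchmark.py` could persist these arrays is a compressed `.npz` file per experiment. The sketch below is an assumption about how that might look: `save_predictions`, `load_predictions`, and the file naming are hypothetical and not part of the existing script.

```python
import os
import tempfile

import numpy as np

def save_predictions(out_dir, name, y_test, y_prob):
    """Write true labels and class probabilities to <out_dir>/<name>_preds.npz."""
    path = os.path.join(out_dir, f"{name}_preds.npz")
    np.savez_compressed(path, y_test=y_test, y_prob=y_prob)
    return path

def load_predictions(path):
    """Read back the arrays written by save_predictions."""
    with np.load(path) as data:
        return data["y_test"], data["y_prob"]

# Round trip with small mock arrays (3 classes, 10 samples)
rng = np.random.default_rng(0)
y_test = rng.integers(0, 3, size=10)
y_prob = rng.random((10, 3))
y_prob /= y_prob.sum(axis=1, keepdims=True)  # rows sum to 1

with tempfile.TemporaryDirectory() as tmp:
    path = save_predictions(tmp, "mlp_combined", y_test, y_prob)
    y_test_loaded, y_prob_loaded = load_predictions(path)
```

Keeping labels and probabilities in one file per experiment avoids mismatching a `y_test` array with probabilities from a different split.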

In [None]:
# Placeholder: This code demonstrates the logic for plotting ROC curves.
# To run on real data, you would first need to load the saved true labels and prediction probabilities.
from sklearn.metrics import auc, roc_curve

def plot_roc_placeholder():
    # --- This is mock data for demonstration --- #
    # In a real run, load this from files, e.g.:
    # y_test = np.load('results/y_test.npy')
    # y_prob_mlp = np.load('results/mlp_combined_probs.npy')
    # y_prob_cnn = np.load('results/cnn_raw_probs.npy')
    num_classes = 30
    rng = np.random.default_rng(0)  # fixed seed so the mock plot is reproducible
    y_test = rng.integers(0, num_classes, 1000)
    y_prob_mlp = rng.random((1000, num_classes))
    y_prob_cnn = rng.random((1000, num_classes))
    # Normalize probabilities to sum to 1
    y_prob_mlp /= y_prob_mlp.sum(axis=1, keepdims=True)
    y_prob_cnn /= y_prob_cnn.sum(axis=1, keepdims=True)
    # --- End of mock data ---
    
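    # Binarize the labels for multi-class ROC (one column per class, even if a class is absent from y_test)
    y_test_bin = pd.get_dummies(y_test, dtype=int).reindex(columns=range(num_classes), fill_value=0).to_numpy()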

    # Compute ROC for MLP
    fpr_mlp, tpr_mlp, _ = roc_curve(y_test_bin.ravel(), y_prob_mlp.ravel())
    roc_auc_mlp = auc(fpr_mlp, tpr_mlp)

    # Compute ROC for CNN
    fpr_cnn, tpr_cnn, _ = roc_curve(y_test_bin.ravel(), y_prob_cnn.ravel())
    roc_auc_cnn = auc(fpr_cnn, tpr_cnn)

    # Plotting
    plt.figure(figsize=(8, 8))
    plt.plot(fpr_mlp, tpr_mlp, color='darkorange', lw=2, label=f'MLP (Combined) - micro-avg AUC = {roc_auc_mlp:.2f}')
    plt.plot(fpr_cnn, tpr_cnn, color='navy', lw=2, label=f'CNN (Raw) - micro-avg AUC = {roc_auc_cnn:.2f}')
    plt.plot([0, 1], [0, 1], 'k--', lw=2)
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Micro-Average ROC Curve Comparison')
    plt.legend(loc="lower right")
    plt.show()

plot_roc_placeholder()

### 4.2. Feature Importance
This requires saving the trained model object, for instance, a Random Forest classifier.
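
A sketch of what saving and reloading such a model could look like, assuming scikit-learn's `RandomForestClassifier` and the standard-library `pickle` module. The training data is random and the file name is hypothetical; `run_benchmark.py` may well use a different model or serialization format.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Mock training data: 60 samples, 5 features; the label depends on feature 0 only
rng = np.random.default_rng(0)
X = rng.random((60, 5))
y = (X[:, 0] > 0.5).astype(int)

model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rf_demo_model.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# One importance value per input feature, normalized to sum to 1
importances = restored.feature_importances_
print(importances)
```

Pickle files can execute arbitrary code when loaded, so only unpickle models you saved yourself.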

In [None]:
# Placeholder: This demonstrates how to plot feature importances from a trained RF model.
# To run, you would first need to load a saved model, e.g., from a pickle file.

def plot_feature_importance_placeholder():
    # --- This is mock data for demonstration --- #
    # In a real run, load a trained RF model and its feature names, e.g.:
    # from sklearn.ensemble import RandomForestClassifier
    # from joblib import load  # assumes the model was saved with joblib.dump
    # model = load('results/rf_combined_model.pkl')
    # feature_names = get_feature_names()  # hypothetical helper that returns the feature names
    # importances = model.feature_importances_
    feature_names = [f'freq_{i}' for i in range(256)] + [f'dwt_{i}' for i in range(12)]
    importances = np.random.default_rng(0).random(len(feature_names))  # seeded mock values
    # --- End of mock data ---

    # Create a pandas series for easy sorting and plotting
    feat_importances = pd.Series(importances, index=feature_names)
    top_20 = feat_importances.nlargest(20)

    plt.figure(figsize=(10, 8))
    sns.barplot(x=top_20.values, y=top_20.index, palette='mako')
    plt.title('Top 20 Feature Importances (Placeholder)')
    plt.xlabel('Importance')
    plt.ylabel('Features')
    plt.show()

plot_feature_importance_placeholder()