# Comprehensive Visualization of Mutual Learning Metrics

This notebook produces publication-quality figures for an IEEE research paper on mutual learning, ensemble distillation, and uncertainty calibration. It generates five plots from the training logs:

1. **Main Dashboard**: Combined metrics with smoothed, interpolated curves
2. **Teacher vs. Student Comparison**: Direct comparison between the teacher models and the student
3. **Final Performance Radar**: Multi-dimensional performance visualization
4. **Weight Evolution**: How mutual learning and calibration weights changed
5. **Calibration Reliability**: Visualization of model calibration quality

In [None]:
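# Check dependencies, then run the visualization script
import sys
import os
from pathlib import Path

# Import third-party dependencies, installing them first if any are missing
try:
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns
    from scipy.interpolate import make_interp_spline
    from scipy.ndimage import gaussian_filter1d
except ImportError:
    # %pip installs into the environment of the running kernel (unlike !pip)
    %pip install numpy matplotlib pandas seaborn scipy
    # Retry the imports so the names are defined after installation
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns
    from scipy.interpolate import make_interp_spline
    from scipy.ndimage import gaussian_filter1d

# Execute the visualization script in the notebook's namespace (-i), so it can
# use the imports above and its definitions stay available to later cells
%run -i visualize_metrics.py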

## 1. Dashboard with Smoothed Metrics

The main dashboard shows all training metrics as curves smoothed with a Gaussian filter. Smoothing suppresses epoch-to-epoch noise so the underlying trends are easier to read.
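As a rough illustration of the Gaussian smoothing mentioned above, here is a minimal sketch using `scipy.ndimage.gaussian_filter1d`. The helper name `smooth_curve`, the `sigma` value, and the synthetic accuracy curve are assumptions for illustration only; the settings actually used by `visualize_metrics.py` are not shown in this notebook.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def smooth_curve(values, sigma=2.0):
    """Gaussian-smooth a per-epoch metric. sigma is in epochs (assumed value)."""
    values = np.asarray(values, dtype=float)
    # mode="nearest" pads with the edge values, so the first and last
    # epochs are not pulled toward zero by the filter.
    return gaussian_filter1d(values, sigma=sigma, mode="nearest")


# Synthetic accuracy curve, for illustration only (not data from the logs)
rng = np.random.default_rng(0)
epochs = np.arange(50)
raw = 90.0 - 40.0 * np.exp(-epochs / 10.0) + rng.normal(0.0, 1.5, size=epochs.size)
smoothed = smooth_curve(raw)
```

A larger `sigma` gives a smoother curve but can hide short-lived events such as a loss spike, so it may be worth drawing the raw values as a faint line behind the smoothed one.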

In [None]:
# Display the main dashboard
from IPython.display import Image, display

dashboard_path = Path(r"C:\Users\Gading\Downloads\Research\Results\MutualLearning\plots\mutual_learning_metrics_dashboard_smooth.png")
if dashboard_path.exists():
    display(Image(str(dashboard_path)))
else:
    print(f"Dashboard image not found at {dashboard_path}")

## 2. Teacher vs. Student Performance Comparison

This visualization directly compares the performance of teacher models against the student model, highlighting knowledge transfer effectiveness.

In [None]:
# Display the teacher vs student comparison
teacher_student_path = Path(r"C:\Users\Gading\Downloads\Research\Results\MutualLearning\plots\teacher_vs_student_comparison.png")
if teacher_student_path.exists():
    display(Image(str(teacher_student_path)))
else:
    print(f"Teacher vs Student comparison not found at {teacher_student_path}")

## 3. Final Model Performance Radar Chart

The radar chart provides a multi-dimensional view of model performance across accuracy, loss, and calibration metrics.
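Accuracy, loss, and calibration error live on different scales and point in different directions (higher accuracy is better, lower loss and ECE are better), so a radar chart needs some common scale. The sketch below shows one possible approach: min-max scaling across models, with lower-is-better metrics flipped. The function names are hypothetical, and this is not necessarily how `visualize_metrics.py` builds its chart.

```python
import numpy as np


def to_radar_scale(values, higher_is_better=True):
    """Min-max scale one metric across models to [0, 1]; flip lower-is-better metrics."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.ones_like(values)  # all models tie on this metric
    scaled = (values - values.min()) / span
    return scaled if higher_is_better else 1.0 - scaled


def closed_polygon(scores):
    """Return (angles, scores) with the first point repeated so the outline closes."""
    scores = np.asarray(scores, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, len(scores), endpoint=False)
    return np.append(angles, angles[0]), np.append(scores, scores[0])


def plot_radar(axis_labels, scores_by_model):
    """Draw one polygon per model; scores_by_model maps name -> scaled scores."""
    import matplotlib.pyplot as plt  # imported here; only the plotting step needs it

    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    for name, scores in scores_by_model.items():
        angles, closed = closed_polygon(scores)
        ax.plot(angles, closed, label=name)
        ax.fill(angles, closed, alpha=0.15)
    ax.set_xticks(np.linspace(0.0, 2.0 * np.pi, len(axis_labels), endpoint=False))
    ax.set_xticklabels(axis_labels)
    ax.set_ylim(0.0, 1.0)
    ax.legend(loc="upper right")
    return fig, ax
```

Min-max scaling maps the weakest model on each axis to 0, which can exaggerate small differences; scaling against fixed reference ranges is a reasonable alternative when the models are close.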

In [None]:
# Display the radar chart
radar_path = Path(r"C:\Users\Gading\Downloads\Research\Results\MutualLearning\plots\final_performance_radar.png")
if radar_path.exists():
    display(Image(str(radar_path)))
else:
    print(f"Radar chart not found at {radar_path}")

## 4. Weight Evolution During Training

This plot shows how mutual learning and calibration weights evolved throughout training, providing insight into the learning process dynamics.

In [None]:
# Display the weight evolution plot
weights_path = Path(r"C:\Users\Gading\Downloads\Research\Results\MutualLearning\plots\weight_evolution.png")
if weights_path.exists():
    display(Image(str(weights_path)))
else:
    print(f"Weight evolution plot not found at {weights_path}")

## 5. Calibration Reliability Diagram

The reliability diagram visualizes model calibration by plotting predicted confidence against observed accuracy. A perfectly calibrated model lies on the diagonal.
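To make the quantities behind a reliability diagram concrete, the following sketch bins predictions by confidence and computes per-bin accuracy, along with the Expected Calibration Error (ECE) reported later in this notebook. Equal-width bins and `n_bins=10` are assumptions, and the function names are hypothetical; the binning used by `visualize_metrics.py` may differ.

```python
import numpy as np


def reliability_bins(confidences, correct, n_bins=10):
    """Bin predictions by confidence; return per-bin counts, mean confidence, accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges with right=True give bins (lo, hi], so a confidence
    # of exactly 1.0 falls into the last bin.
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)
    counts = np.bincount(bin_ids, minlength=n_bins)
    conf_sum = np.bincount(bin_ids, weights=confidences, minlength=n_bins)
    acc_sum = np.bincount(bin_ids, weights=correct, minlength=n_bins)
    nonempty = counts > 0
    mean_conf = np.zeros(n_bins)
    accuracy = np.zeros(n_bins)
    mean_conf[nonempty] = conf_sum[nonempty] / counts[nonempty]
    accuracy[nonempty] = acc_sum[nonempty] / counts[nonempty]
    return counts, mean_conf, accuracy


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: sample-weighted mean gap between accuracy and confidence over the bins."""
    counts, mean_conf, accuracy = reliability_bins(confidences, correct, n_bins)
    return float(np.sum(counts / counts.sum() * np.abs(accuracy - mean_conf)))
```

Plotting `accuracy` against `mean_conf` for the non-empty bins gives the reliability curve; bars below the diagonal indicate overconfidence, and bars above it indicate underconfidence.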

In [None]:
# Display the reliability diagram
reliability_path = Path(r"C:\Users\Gading\Downloads\Research\Results\MutualLearning\plots\calibration_reliability_diagram.png")
if reliability_path.exists():
    display(Image(str(reliability_path)))
else:
    print(f"Reliability diagram not found at {reliability_path}")

## Summary of Key Findings

These visualizations demonstrate several important findings:

1. **Student Performance**: The student model achieves accuracy comparable to or better than most teacher models while maintaining good calibration.

2. **Calibration Quality**: All models show low Expected Calibration Error (ECE), indicating well-calibrated predictions.

3. **Temperature Evolution**: Temperature parameters stabilize after the initial epochs, suggesting that the learned confidence scaling has converged.

4. **Architecture Patterns**: Different architectures show distinct learning patterns, with some (like ViT) having slower convergence but comparable final performance.

5. **Knowledge Transfer**: The mutual learning approach successfully transfers knowledge across diverse architectures, demonstrating the effectiveness of the collaborative learning framework.

In [None]:
# Generate the final performance table for the paper.
# extract_metrics_from_log and MODEL_DISPLAY_NAMES are expected to come from
# visualize_metrics.py, which the first cell executes with %run -i.
import pandas as pd

def extract_final_metrics(log_path):
    """Return the final-epoch metrics (one row per model) and the final epoch number."""
    metrics_df, _ = extract_metrics_from_log(log_path)

    # Keep only the rows logged at the final epoch
    max_epoch = metrics_df['epoch'].max()
    final_metrics = metrics_df[metrics_df['epoch'] == max_epoch]

    # Keep values numeric so best-value highlighting compares numbers, not strings
    table_data = []
    for _, row in final_metrics.iterrows():
        table_data.append({
            'Model': MODEL_DISPLAY_NAMES[row['model']],
            'Validation Accuracy (%)': row['val_acc'],
            'Training Accuracy (%)': row['train_acc'],
            'Validation Loss': row['val_loss'],
            'Training Loss': row['train_loss'],
            'ECE': row['ece'],
            'Temperature': row['temperature'],
        })

    return pd.DataFrame(table_data), max_epoch

# Get log path and extract metrics
log_path = Path(r"C:\Users\Gading\Downloads\Research\Results\MutualLearning\logs\error.log")
final_table, final_epoch = extract_final_metrics(log_path)

# Highlight the best value in each column
def highlight_max(s):
    return ['background-color: lightgreen' if v else '' for v in s == s.max()]

def highlight_min(s):
    return ['background-color: lightgreen' if v else '' for v in s == s.min()]

# Display formats: two decimals for accuracy and temperature, four for loss and ECE
formats = {
    'Validation Accuracy (%)': '{:.2f}',
    'Training Accuracy (%)': '{:.2f}',
    'Validation Loss': '{:.4f}',
    'Training Loss': '{:.4f}',
    'ECE': '{:.4f}',
    'Temperature': '{:.2f}',
}

# Apply styling; formatting happens at display time, so comparisons stay numeric
styled_table = final_table.style\
    .format(formats)\
    .apply(highlight_max, subset=['Validation Accuracy (%)', 'Training Accuracy (%)'])\
    .apply(highlight_min, subset=['Validation Loss', 'Training Loss', 'ECE'])\
    .set_properties(**{'text-align': 'center'})\
    .set_caption(f"Final Model Performance Metrics (Epoch {int(final_epoch)})")

# Display the table
styled_table