# Analysis - Physics-SR Framework v4.1 Benchmark

## Results Analysis and Visualization Module

**Author:** Zhengze Zhang  
**Affiliation:** Department of Statistics, Columbia University  
**Contact:** zz3239@columbia.edu  
**Date:** January 2026  
**Version:** 4.1 (Structure-Guided Feature Library + Computational Optimization)

---

### Purpose

This notebook analyzes and visualizes the benchmark results from Experiments.ipynb:

1. **Summary Statistics**: Overall, by-factor, and by-equation-type performance metrics
2. **Core Visualizations**: 7 figures for main results
3. **v4.1 Library Analysis**: 4 new figures for augmented library composition
4. **Supplementary Visualizations**: 2 figures for additional analysis
5. **LaTeX Tables**: 4 publication-ready tables
6. **Statistical Tests**: Significance testing between methods

### Test Equations (AI Feynman Benchmark)

| # | Name | AI Feynman ID | Equation | Type |
|---|------|---------------|----------|------|
| 1 | Coulomb | I.12.2 | F = k * q1 * q2 / r^2 | Rational |
| 2 | Cosines | I.29.16 | x = sqrt(x1^2 + x2^2 - 2*x1*x2*cos(theta1-theta2)) | Nested Trig |
| 3 | Barometric | I.40.1 | n = n0 * exp(-m*g*x/(k_B*T)) | Exponential |
| 4 | DotProduct | I.11.19 | A = x1*y1 + x2*y2 + x3*y3 | Polynomial |

---
## Section 1: Header and Imports

In [None]:
# ==============================================================================
# ENVIRONMENT RESET AND FRESH CLONE
# ==============================================================================

import sys
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    import os
    import shutil
    import gc
    
    repo_path = '/content/Physics-Informed-Symbolic-Regression'
    current_dir = os.getcwd()
    
    if not current_dir.startswith(repo_path) or not os.path.exists(repo_path + '/.git'):
        os.chdir('/content')
        gc.collect()
        
        if os.path.exists(repo_path):
            shutil.rmtree(repo_path)
            print("[OK] Removed existing repository.")
        
        !git clone https://github.com/Garthzzz/Physics-Informed-Symbolic-Regression.git
        
        if os.path.exists(repo_path + '/.git'):
            os.chdir(repo_path + '/benchmark')
            print(f"[OK] Working directory: {os.getcwd()}")
        else:
            print("[FAIL] Clone incomplete!")
        
        print("[OK] Environment reset complete.")
    else:
        print("[SKIP] Already in valid repository.")
else:
    print("[INFO] Not in Colab environment.")

In [None]:
"""
Analysis.ipynb - Results Analysis and Visualization Module
Physics-SR Framework v4.1 Benchmark Suite

Author: Zhengze Zhang
Contact: zz3239@columbia.edu
Version: 4.1
"""

import os
import sys
import pickle
import warnings
from pathlib import Path
from typing import Dict, List, Tuple, Optional, Any, Union

import numpy as np
import pandas as pd
from scipy import stats

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns

warnings.filterwarnings('ignore')

print("Analysis v4.1: All imports successful.")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"Matplotlib version: {plt.matplotlib.__version__}")
print(f"Seaborn version: {sns.__version__}")

In [None]:
# ==============================================================================
# PATH CONFIGURATION
# ==============================================================================

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    BASE_DIR = Path('/content/Physics-Informed-Symbolic-Regression')
    BENCHMARK_DIR = BASE_DIR / 'benchmark'
else:
    BENCHMARK_DIR = Path('.').resolve()
    BASE_DIR = BENCHMARK_DIR.parent

RESULTS_DIR = BENCHMARK_DIR / 'results'
FIGURES_DIR = RESULTS_DIR / 'figures'
TABLES_DIR = RESULTS_DIR / 'tables'
DATA_DIR = BENCHMARK_DIR / 'data'

RESULTS_DIR.mkdir(exist_ok=True, parents=True)
FIGURES_DIR.mkdir(exist_ok=True, parents=True)
TABLES_DIR.mkdir(exist_ok=True, parents=True)

print(f"Environment: {'Google Colab' if IN_COLAB else 'Local'}")
print(f"Base directory: {BASE_DIR}")
print(f"Results directory: {RESULTS_DIR}")
print(f"Figures directory: {FIGURES_DIR}")
print(f"Tables directory: {TABLES_DIR}")

In [None]:
# ==============================================================================
# PLOTTING CONFIGURATION (v4.1)
# ==============================================================================

plt.style.use('seaborn-v0_8-whitegrid')

METHOD_COLORS = {
    'physics_sr': '#2E86AB',
    'pysr_only': '#E94F37',
    'lasso_pysr': '#F39C12',
}

METHOD_NAMES = {
    'physics_sr': 'Physics-SR',
    'pysr_only': 'PySR-Only',
    'lasso_pysr': 'LASSO+PySR',
}

EQUATION_NAMES = {
    'coulomb': 'Coulomb',
    'cosines': 'Cosines',
    'barometric': 'Barometric',
    'dotproduct': 'DotProduct',
}

EQUATION_TYPE_NAMES = {
    'rational': 'Rational',
    'nested_trigonometric': 'Nested Trig',
    'exponential': 'Exponential',
    'polynomial_interaction': 'Polynomial',
}

AI_FEYNMAN_IDS = {
    'coulomb': 'I.12.2',
    'cosines': 'I.29.16',
    'barometric': 'I.40.1',
    'dotproduct': 'I.11.19',
}

LIBRARY_COLORS = {
    'pysr': '#2ecc71',
    'variant': '#3498db',
    'poly': '#9b59b6',
    'op': '#e74c3c',
}

TIMING_COLORS = {
    'stage1': '#3498db',
    'pysr': '#e74c3c',
    'library': '#2ecc71',
    'ewsindy': '#9b59b6',
    'stage3': '#f39c12',
}

FIGURE_SIZES = {
    'single': (8, 6),
    'wide': (12, 6),
    'tall': (8, 10),
    'square': (8, 8),
    'large': (14, 10),
}

FIGURE_DPI = 300

FONTSIZE_TITLE = 14
FONTSIZE_LABEL = 12
FONTSIZE_TICK = 10
FONTSIZE_LEGEND = 10
FONTSIZE_ANNOTATION = 8

plt.rcParams.update({
    'font.size': FONTSIZE_TICK,
    'axes.titlesize': FONTSIZE_TITLE,
    'axes.labelsize': FONTSIZE_LABEL,
    'xtick.labelsize': FONTSIZE_TICK,
    'ytick.labelsize': FONTSIZE_TICK,
    'legend.fontsize': FONTSIZE_LEGEND,
    'figure.titlesize': FONTSIZE_TITLE,
})

print("Plotting configuration v4.1 set.")

---
## Section 2: Load Results

In [None]:
# ==============================================================================
# LOAD EXPERIMENT RESULTS (v4.1)
# ==============================================================================

def load_results_csv(filepath: Optional[Path] = None) -> pd.DataFrame:
    """
    Load experiment results from CSV file (v4.1).
    """
    if filepath is None:
        filepath = RESULTS_DIR / 'experiment_results.csv'
    
    if not filepath.exists():
        raise FileNotFoundError(
            f"Results file not found: {filepath}\n"
            f"Please run Experiments.ipynb first to generate results."
        )
    
    df = pd.read_csv(filepath)
    
    # Coerce flag columns to bool. Missing values are filled with False first,
    # because astype(bool) alone would turn NaN into True.
    for col in ('with_dims', 'selected_correct', 'success'):
        if col in df.columns:
            df[col] = df[col].fillna(False).astype(bool)
    
    print(f"Loaded {len(df)} experiment results from {filepath}")
    return df


def load_results_pkl(filepath: Optional[Path] = None) -> Optional[List]:
    """
    Load detailed experiment results from PKL file (v4.1).
    """
    if filepath is None:
        filepath = RESULTS_DIR / 'experiment_results.pkl'
    
    if not filepath.exists():
        print(f"Warning: PKL file not found: {filepath}")
        return None
    
    try:
        with open(filepath, 'rb') as f:
            results = pickle.load(f)
        print(f"Loaded detailed results from {filepath}")
        return results
    except (AttributeError, ModuleNotFoundError, pickle.UnpicklingError, EOFError) as e:
        print(f"Warning: Cannot load PKL file: {e}")
        print("Continuing with CSV data only.")
        return None


print("Load functions defined.")

In [None]:
# ==============================================================================
# DATA VALIDATION (v4.1)
# ==============================================================================

def validate_results(df: pd.DataFrame) -> Dict[str, Any]:
    """
    Validate results DataFrame and return summary (v4.1).
    """
    validation = {
        'n_experiments': len(df),
        'n_successful': df['success'].sum() if 'success' in df.columns else len(df),
        'success_rate': df['success'].mean() * 100 if 'success' in df.columns else 100.0,
        'methods': df['method'].unique().tolist(),
        'equations': df['equation_name'].unique().tolist(),
        'noise_levels': sorted(df['noise_level'].unique().tolist()),
        'dummy_counts': sorted(df['n_dummy'].unique().tolist()),
        'sample_sizes': sorted(df['n_samples'].unique().tolist()) if 'n_samples' in df.columns else [500],
        'ai_feynman_ids': df['ai_feynman_id'].unique().tolist() if 'ai_feynman_id' in df.columns else [],
        'has_library_info': 'library_n_total' in df.columns,
        'has_timing_info': 'timing_stage1' in df.columns,
    }
    return validation


def get_available_equations(df: pd.DataFrame) -> List[str]:
    """
    Get list of equations available in DataFrame, ordered consistently.
    """
    preferred_order = ['coulomb', 'cosines', 'barometric', 'dotproduct']
    available = df['equation_name'].unique().tolist()
    return [eq for eq in preferred_order if eq in available] + \
           [eq for eq in available if eq not in preferred_order]


def add_equation_type_column(df: pd.DataFrame) -> pd.DataFrame:
    """
    Add equation type column if not present (v4.1).
    """
    if 'eq_type' not in df.columns:
        type_map = {
            'coulomb': 'rational',
            'cosines': 'nested_trigonometric',
            'barometric': 'exponential',
            'dotproduct': 'polynomial_interaction',
        }
        df['eq_type'] = df['equation_name'].map(type_map)
        print("Added eq_type column.")
    return df


print("Validation functions defined.")
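
`validate_results` indexes `method`, `equation_name`, `noise_level`, and `n_dummy` without a guard, so a results file that lacks one of them surfaces as a `KeyError`. A minimal sketch of an explicit schema check follows. The helper name and the `REQUIRED_COLUMNS` tuple are hypothetical; the column names themselves are the ones this notebook reads.

```python
from typing import Iterable, List

# Columns that the summary and plotting functions in this notebook index directly.
REQUIRED_COLUMNS = (
    'method', 'equation_name', 'noise_level', 'n_dummy',
    'var_f1', 'var_precision', 'var_recall',
    'test_r2', 'runtime_seconds', 'selected_correct',
)


def missing_columns(columns: Iterable[str],
                    required: Iterable[str] = REQUIRED_COLUMNS) -> List[str]:
    """Return the required column names that are absent, in declaration order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Illustrative usage with a deliberately incomplete column list.
example_columns = ['method', 'equation_name', 'noise_level', 'var_f1']
print(missing_columns(example_columns))
```

With a DataFrame, the call would be `missing_columns(df.columns)`. An empty list means every column used downstream is present.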

In [None]:
# ==============================================================================
# LOAD DATA
# ==============================================================================

df = load_results_csv()
print(f"DataFrame shape: {df.shape}")
print(f"Columns: {list(df.columns)}")

detailed_results = load_results_pkl()

df = add_equation_type_column(df)

validation = validate_results(df)
print("\n" + "=" * 70)
print("DATA VALIDATION SUMMARY (v4.1)")
print("=" * 70)
print(f"Total experiments: {validation['n_experiments']}")
print(f"Successful experiments: {validation['n_successful']}")
print(f"Success rate: {validation['success_rate']:.1f}%")
print(f"Methods: {validation['methods']}")
print(f"Equations: {validation['equations']}")
print(f"AI Feynman IDs: {validation['ai_feynman_ids']}")
print(f"Noise levels: {validation['noise_levels']}")
print(f"Dummy counts: {validation['dummy_counts']}")
print(f"Sample sizes: {validation['sample_sizes']}")
print(f"Has library info: {validation['has_library_info']}")
print(f"Has timing info: {validation['has_timing_info']}")

print("\nFirst 5 rows:")
display(df.head())
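
A note on boolean flag columns such as `success` and `selected_correct`: when a CSV column holds `True`/`False` with some empty cells, pandas reads it as an object column containing `NaN`, and a bare `astype(bool)` maps `NaN` to `True`. The sketch below shows the effect on a small hand-made Series (illustrative data, not benchmark output) and one way to treat missing values as `False` instead.

```python
import numpy as np
import pandas as pd

# A flag column as it might look after read_csv when one cell is empty.
flags = pd.Series([True, False, np.nan], dtype=object)

# Naive cast: NaN is truthy, so the missing entry becomes True.
naive = flags.astype(bool)

# Fill missing entries first, then cast.
safe = flags.fillna(False).astype(bool)

print(naive.tolist())
print(safe.tolist())
```

Whether a missing flag should count as `False` is a modelling choice. For rate metrics such as exact-match percentage, it avoids counting unknown outcomes as successes.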

---
## Section 3: Summary Statistics

In [None]:
# ==============================================================================
# METHOD COMPARISON (v4.1)
# ==============================================================================

def compute_method_comparison(df: pd.DataFrame) -> pd.DataFrame:
    """
    Compute performance comparison by method (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    comparison = []
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        method_df = core_df[core_df['method'] == method]
        if len(method_df) == 0:
            continue
        
        row = {
            'Method': METHOD_NAMES.get(method, method),
            'N': len(method_df),
            'F1': f"{method_df['var_f1'].mean():.3f} +/- {method_df['var_f1'].std():.3f}",
            'Precision': f"{method_df['var_precision'].mean():.3f} +/- {method_df['var_precision'].std():.3f}",
            'Recall': f"{method_df['var_recall'].mean():.3f} +/- {method_df['var_recall'].std():.3f}",
            'Test R2': f"{method_df['test_r2'].mean():.3f} +/- {method_df['test_r2'].std():.3f}",
            'Runtime (s)': f"{method_df['runtime_seconds'].mean():.1f} +/- {method_df['runtime_seconds'].std():.1f}",
            'Exact Match': f"{method_df['selected_correct'].mean()*100:.1f}%",
        }
        comparison.append(row)
    
    return pd.DataFrame(comparison)


method_comparison = compute_method_comparison(df)
print("\n" + "=" * 100)
print("METHOD COMPARISON (v4.1 - All Equations)")
print("=" * 100)
display(method_comparison)

In [None]:
# ==============================================================================
# BY-EQUATION COMPARISON (v4.1 with AI Feynman IDs)
# ==============================================================================

def compute_equation_comparison(df: pd.DataFrame) -> pd.DataFrame:
    """
    Compute performance comparison by equation with AI Feynman IDs (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    pivot = core_df.pivot_table(
        index='equation_name',
        columns='method',
        values=['var_f1', 'test_r2', 'selected_correct'],
        aggfunc='mean'
    ).round(3)
    
    return pivot


equation_comparison = compute_equation_comparison(df)
print("\n" + "=" * 80)
print("PERFORMANCE BY EQUATION (v4.1)")
print("=" * 80)
display(equation_comparison)

print("\nAI Feynman ID Mapping:")
for eq_name in get_available_equations(df):
    ai_id = AI_FEYNMAN_IDS.get(eq_name, 'Unknown')
    print(f"  {eq_name}: {ai_id}")

In [None]:
# ==============================================================================
# METHOD COMPARISON BY EQUATION TYPE (v4.1)
# ==============================================================================

def compute_method_comparison_by_type(df: pd.DataFrame, eq_type: str) -> pd.DataFrame:
    """
    Compute performance comparison by method for specific equation type (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    type_df = core_df[core_df['eq_type'] == eq_type]
    
    comparison = []
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        method_df = type_df[type_df['method'] == method]
        if len(method_df) == 0:
            continue
        
        row = {
            'Method': METHOD_NAMES.get(method, method),
            'N': len(method_df),
            'F1': f"{method_df['var_f1'].mean():.3f}",
            'Precision': f"{method_df['var_precision'].mean():.3f}",
            'Recall': f"{method_df['var_recall'].mean():.3f}",
            'Test R2': f"{method_df['test_r2'].mean():.3f}",
            'Exact Match': f"{method_df['selected_correct'].mean()*100:.1f}%",
        }
        comparison.append(row)
    
    return pd.DataFrame(comparison)


print("\n" + "=" * 100)
print("METHOD COMPARISON BY EQUATION TYPE (v4.1)")
print("=" * 100)

for eq_type in df['eq_type'].unique():
    if pd.isna(eq_type):
        continue
    print(f"\n[{EQUATION_TYPE_NAMES.get(eq_type, eq_type)}]")
    display(compute_method_comparison_by_type(df, eq_type))

In [None]:
# ==============================================================================
# STATISTICAL SIGNIFICANCE TESTS (v4.1)
# ==============================================================================

def compute_statistical_tests(df: pd.DataFrame) -> Dict[str, Any]:
    """
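    Perform statistical significance tests between methods (v4.1).

    Runs pairwise independent two-sample t-tests (SciPy default: equal
    variances assumed) of Physics-SR against each baseline on variable
    selection F1, reports Cohen's d for each pair, and runs a one-way ANOVA
    across all three methods. p-values are not corrected for multiple
    comparisons.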
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    physics_sr_f1 = core_df[core_df['method'] == 'physics_sr']['var_f1'].values
    pysr_only_f1 = core_df[core_df['method'] == 'pysr_only']['var_f1'].values
    lasso_pysr_f1 = core_df[core_df['method'] == 'lasso_pysr']['var_f1'].values
    
    results = {}
    
    if len(physics_sr_f1) > 0 and len(pysr_only_f1) > 0:
        t_stat, p_value = stats.ttest_ind(physics_sr_f1, pysr_only_f1)
        # Cohen's d with sample standard deviations (ddof=1); NumPy's default is ddof=0.
        effect_size = (physics_sr_f1.mean() - pysr_only_f1.mean()) / np.sqrt(
            (physics_sr_f1.std(ddof=1)**2 + pysr_only_f1.std(ddof=1)**2) / 2
        )
        results['physics_sr_vs_pysr_only'] = {
            't_statistic': t_stat,
            'p_value': p_value,
            'significant': p_value < 0.05,
            'cohens_d': effect_size,
        }
    
    if len(physics_sr_f1) > 0 and len(lasso_pysr_f1) > 0:
        t_stat, p_value = stats.ttest_ind(physics_sr_f1, lasso_pysr_f1)
        # Cohen's d with sample standard deviations (ddof=1); NumPy's default is ddof=0.
        effect_size = (physics_sr_f1.mean() - lasso_pysr_f1.mean()) / np.sqrt(
            (physics_sr_f1.std(ddof=1)**2 + lasso_pysr_f1.std(ddof=1)**2) / 2
        )
        results['physics_sr_vs_lasso_pysr'] = {
            't_statistic': t_stat,
            'p_value': p_value,
            'significant': p_value < 0.05,
            'cohens_d': effect_size,
        }
    
    if len(physics_sr_f1) > 0 and len(pysr_only_f1) > 0 and len(lasso_pysr_f1) > 0:
        f_stat, p_value = stats.f_oneway(physics_sr_f1, pysr_only_f1, lasso_pysr_f1)
        results['anova'] = {
            'f_statistic': f_stat,
            'p_value': p_value,
            'significant': p_value < 0.05,
        }
    
    return results


stat_tests = compute_statistical_tests(df)
print("\n" + "=" * 80)
print("STATISTICAL SIGNIFICANCE TESTS (Variable Selection F1)")
print("=" * 80)

if 'anova' in stat_tests:
    print("\n[ANOVA - Overall]")
    print(f"  F-statistic: {stat_tests['anova']['f_statistic']:.4f}")
    print(f"  p-value: {stat_tests['anova']['p_value']:.6f}")
    print(f"  Significant (p < 0.05): {stat_tests['anova']['significant']}")

print("\n[Pairwise t-tests with Effect Size]")
for comparison, result in stat_tests.items():
    if comparison != 'anova':
        print(f"\n  {comparison.replace('_', ' ').title()}:")
        print(f"    t-statistic: {result['t_statistic']:.4f}")
        print(f"    p-value: {result['p_value']:.6f}")
        print(f"    Significant (p < 0.05): {result['significant']}")
        print(f"    Cohen's d: {result['cohens_d']:.4f}")
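
The pairwise tests call `stats.ttest_ind` with its defaults, which assumes equal variances, and each comparison is judged at 0.05 on its own. As a hedged alternative, not the procedure used for the results in this notebook, the sketch below uses Welch's t-test (`equal_var=False`), a Bonferroni adjustment over the number of pairwise comparisons, and Cohen's d with the pooled sample standard deviation. The function names and the synthetic scores are assumptions for illustration only.

```python
import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (ddof=1)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def welch_vs_reference(groups, reference, alpha=0.05):
    """Welch's t-test of the reference group against every other group,
    with Bonferroni-adjusted p-values."""
    others = [name for name in groups if name != reference]
    n_tests = len(others)
    results = {}
    for name in others:
        t_stat, p_value = stats.ttest_ind(groups[reference], groups[name], equal_var=False)
        p_adjusted = min(1.0, p_value * n_tests)  # Bonferroni correction
        results[f"{reference}_vs_{name}"] = {
            't_statistic': float(t_stat),
            'p_value': float(p_value),
            'p_adjusted': float(p_adjusted),
            'significant': bool(p_adjusted < alpha),
            'cohens_d': float(cohens_d(groups[reference], groups[name])),
        }
    return results


# Synthetic scores for illustration only; these are not benchmark results.
rng = np.random.default_rng(0)
synthetic = {
    'method_a': rng.normal(0.90, 0.05, size=30),
    'method_b': rng.normal(0.80, 0.10, size=30),
    'method_c': rng.normal(0.85, 0.08, size=30),
}
for pair, res in welch_vs_reference(synthetic, reference='method_a').items():
    print(pair, {k: round(v, 4) if isinstance(v, float) else v for k, v in res.items()})
```

Welch's test is the more conservative choice when the methods' F1 scores have visibly different spreads, and the Bonferroni factor here is simply the number of baselines compared against the reference.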

---
## Section 4: Core Result Visualizations

In [None]:
# ==============================================================================
# FIGURE 1: VARIABLE SELECTION F1 VS NOISE LEVEL
# ==============================================================================

def plot_f1_vs_noise(df: pd.DataFrame, save_path: Optional[Path] = None) -> plt.Figure:
    """
    Line plot showing F1 score vs noise level, grouped by method (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
    
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        subset = core_df[core_df['method'] == method]
        if len(subset) == 0:
            continue
        means = subset.groupby('noise_level')['var_f1'].mean()
        stds = subset.groupby('noise_level')['var_f1'].std()
        
        ax.errorbar(
            means.index * 100,
            means.values,
            yerr=stds.values,
            label=METHOD_NAMES[method],
            marker='o',
            markersize=8,
            capsize=5,
            color=METHOD_COLORS[method],
            linewidth=2,
        )
    
    ax.set_xlabel('Noise Level (%)')
    ax.set_ylabel('Variable Selection F1 Score')
    ax.set_title('Variable Selection Performance vs Noise Level (v4.1)')
    ax.legend(loc='lower left')
    ax.set_ylim(0, 1.05)
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig1 = plot_f1_vs_noise(df, save_path=FIGURES_DIR / 'fig1_f1_vs_noise.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 2: VARIABLE SELECTION F1 VS DUMMY COUNT
# ==============================================================================

def plot_f1_vs_dummy(df: pd.DataFrame, save_path: Optional[Path] = None) -> plt.Figure:
    """
    Line plot showing F1 score vs dummy feature count, grouped by method (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
    
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        subset = core_df[core_df['method'] == method]
        if len(subset) == 0:
            continue
        means = subset.groupby('n_dummy')['var_f1'].mean()
        stds = subset.groupby('n_dummy')['var_f1'].std()
        
        ax.errorbar(
            means.index,
            means.values,
            yerr=stds.values,
            label=METHOD_NAMES[method],
            marker='s',
            markersize=8,
            capsize=5,
            color=METHOD_COLORS[method],
            linewidth=2,
        )
    
    ax.set_xlabel('Number of Dummy Features')
    ax.set_ylabel('Variable Selection F1 Score')
    ax.set_title('Variable Selection Performance vs Dummy Features (v4.1)')
    ax.legend(loc='lower left')
    ax.set_ylim(0, 1.05)
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig2 = plot_f1_vs_dummy(df, save_path=FIGURES_DIR / 'fig2_f1_vs_dummy.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 3: TEST R2 COMPARISON
# ==============================================================================

def plot_test_r2_comparison(df: pd.DataFrame, save_path: Optional[Path] = None) -> plt.Figure:
    """
    Bar chart comparing Test R2 by method (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
    
    methods = ['physics_sr', 'pysr_only', 'lasso_pysr']
    methods = [m for m in methods if m in core_df['method'].unique()]
    x = np.arange(len(methods))
    
    means = []
    stds = []
    colors = []
    for method in methods:
        subset = core_df[core_df['method'] == method]
        means.append(subset['test_r2'].mean())
        stds.append(subset['test_r2'].std())
        colors.append(METHOD_COLORS[method])
    
    bars = ax.bar(x, means, yerr=stds, color=colors, capsize=5, alpha=0.8)
    
    for bar, mean in zip(bars, means):
        ax.annotate(
            f'{mean:.3f}', xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
            xytext=(0, 3), textcoords='offset points', ha='center', fontsize=10,
        )
    
    ax.set_xlabel('Method')
    ax.set_ylabel('Test R2 Score')
    ax.set_title('Prediction Accuracy Comparison (v4.1)')
    ax.set_xticks(x)
    ax.set_xticklabels([METHOD_NAMES[m] for m in methods])
    ax.set_ylim(0, 1.1)
    ax.axhline(y=1.0, color='gray', linestyle='--', alpha=0.5)
    ax.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig3 = plot_test_r2_comparison(df, save_path=FIGURES_DIR / 'fig3_test_r2_comparison.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 4: DIMENSIONAL INFORMATION BENEFIT
# ==============================================================================

def plot_dims_benefit(df: pd.DataFrame, save_path: Optional[Path] = None) -> plt.Figure:
    """
    Grouped bar chart showing performance with vs without dimensional info (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    physics_sr_df = core_df[core_df['method'] == 'physics_sr']
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
    
    equations = get_available_equations(physics_sr_df)
    x = np.arange(len(equations))
    width = 0.35
    
    with_dims_means = []
    without_dims_means = []
    for equation in equations:
        subset_with = physics_sr_df[(physics_sr_df['equation_name'] == equation) & (physics_sr_df['with_dims'] == True)]
        subset_without = physics_sr_df[(physics_sr_df['equation_name'] == equation) & (physics_sr_df['with_dims'] == False)]
        with_dims_means.append(subset_with['var_f1'].mean() if len(subset_with) > 0 else 0)
        without_dims_means.append(subset_without['var_f1'].mean() if len(subset_without) > 0 else 0)
    
    ax.bar(x - width/2, with_dims_means, width, label='With Dimensions', color='#2E86AB', alpha=0.8)
    ax.bar(x + width/2, without_dims_means, width, label='Without Dimensions', color='#E94F37', alpha=0.8)
    
    ax.set_xlabel('Equation')
    ax.set_ylabel('Variable Selection F1 Score')
    ax.set_title('Benefit of Dimensional Information (Physics-SR v4.1)')
    ax.set_xticks(x)
    ax.set_xticklabels([EQUATION_NAMES.get(eq, eq) for eq in equations])
    ax.legend(loc='lower right')
    ax.set_ylim(0, 1.15)
    ax.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig4 = plot_dims_benefit(df, save_path=FIGURES_DIR / 'fig4_dims_benefit.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 5: RUNTIME COMPARISON
# ==============================================================================

def plot_runtime_comparison(df: pd.DataFrame, save_path: Optional[Path] = None) -> plt.Figure:
    """
    Bar chart comparing runtime by method (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
    
    methods = ['physics_sr', 'pysr_only', 'lasso_pysr']
    methods = [m for m in methods if m in core_df['method'].unique()]
    x = np.arange(len(methods))
    
    means = []
    stds = []
    colors = []
    for method in methods:
        subset = core_df[core_df['method'] == method]
        means.append(subset['runtime_seconds'].mean())
        stds.append(subset['runtime_seconds'].std())
        colors.append(METHOD_COLORS[method])
    
    bars = ax.bar(x, means, yerr=stds, color=colors, capsize=5, alpha=0.8)
    
    for bar, mean in zip(bars, means):
        ax.annotate(
            f'{mean:.1f}s', xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
            xytext=(0, 3), textcoords='offset points', ha='center', fontsize=10,
        )
    
    ax.axhline(y=180, color='red', linestyle='--', label='Colab Pro Budget (180s)', alpha=0.7)
    
    ax.set_xlabel('Method')
    ax.set_ylabel('Runtime (seconds)')
    ax.set_title('Computational Cost Comparison (v4.1)')
    ax.set_xticks(x)
    ax.set_xticklabels([METHOD_NAMES[m] for m in methods])
    ax.legend()
    ax.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig5 = plot_runtime_comparison(df, save_path=FIGURES_DIR / 'fig5_runtime.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 6: COMPREHENSIVE HEATMAP
# ==============================================================================

def plot_comprehensive_heatmap(df: pd.DataFrame, save_path: Optional[Path] = None) -> plt.Figure:
    """
    Heatmap showing F1 scores across all experimental conditions (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    core_df['row_label'] = core_df['equation_name'].map(EQUATION_NAMES) + ' | ' + core_df['method'].map(METHOD_NAMES)
    core_df['col_label'] = (
        'Noise=' + (core_df['noise_level'] * 100).round().astype(int).astype(str) + '%, ' +
        'Dummy=' + core_df['n_dummy'].astype(str) + ', ' +
        'Dims=' + core_df['with_dims'].map({True: 'T', False: 'F'})
    )
    
    pivot = core_df.pivot_table(
        index='row_label',
        columns='col_label',
        values='var_f1',
        aggfunc='mean'
    )
    
    equations = [EQUATION_NAMES.get(eq, eq) for eq in get_available_equations(core_df)]
    methods = ['Physics-SR', 'PySR-Only', 'LASSO+PySR']
    row_order = [f"{eq} | {m}" for eq in equations for m in methods]
    pivot = pivot.reindex([r for r in row_order if r in pivot.index])
    
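    # Order columns by the numeric condition values; sorting the label strings
    # would place e.g. 'Noise=10%' before 'Noise=5%'.
    col_order = (
        core_df.sort_values(['noise_level', 'n_dummy', 'with_dims'])['col_label']
        .drop_duplicates()
        .tolist()
    )
    pivot = pivot[[c for c in col_order if c in pivot.columns]]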
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['large'])
    
    sns.heatmap(
        pivot,
        annot=True,
        fmt='.2f',
        cmap='RdYlGn',
        vmin=0,
        vmax=1,
        ax=ax,
        cbar_kws={'label': 'F1 Score'},
        annot_kws={'size': 8},
    )
    
    ax.set_xlabel('Experimental Condition')
    ax.set_ylabel('Equation | Method')
    ax.set_title('Variable Selection F1 Score Across All Conditions (v4.1)')
    
    plt.xticks(rotation=45, ha='right')
    plt.yticks(rotation=0)
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig6 = plot_comprehensive_heatmap(df, save_path=FIGURES_DIR / 'fig6_heatmap.png')
plt.show()
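
Two details matter when building condition labels like the heatmap's column headers: converting a float percentage to `int` truncates rather than rounds, and sorting label strings is lexicographic rather than numeric. The sketch below illustrates both with made-up noise levels and dummy counts (assumed values, not the benchmark grid).

```python
# Truncation: 0.29 * 100 evaluates to 28.999999999999996 in binary floating point.
levels = [0.05, 0.1, 0.29]
truncated = [int(x * 100) for x in levels]
rounded = [int(round(x * 100)) for x in levels]
print(truncated)  # [5, 10, 28]
print(rounded)    # [5, 10, 29]


def label(noise, n_dummy):
    """Build a condition label in the same style as the heatmap columns."""
    return f"Noise={int(round(noise * 100))}%, Dummy={n_dummy}"


conditions = [(0.1, 5), (0.05, 10), (0.05, 5)]

# Sorting the strings puts 'Noise=10%' before 'Noise=5%'.
lexicographic = sorted(label(*c) for c in conditions)

# Sorting the numeric tuples first keeps the natural order.
numeric = [label(*c) for c in sorted(conditions)]

print(lexicographic)
print(numeric)
```

Sorting on the underlying numeric keys and only then formatting the labels keeps heatmap columns in increasing noise and dummy-count order.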

In [None]:
# ==============================================================================
# FIGURE 7: F1 COMPARISON BY EQUATION
# ==============================================================================

def plot_f1_comparison_by_equation(df: pd.DataFrame, save_path: Optional[Path] = None) -> plt.Figure:
    """
    Grouped bar chart comparing F1 by method and equation (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['wide'])
    
    equations = get_available_equations(core_df)
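    # Keep only methods present in the data, as in the other bar charts.
    methods = [m for m in ['physics_sr', 'pysr_only', 'lasso_pysr'] if m in core_df['method'].unique()]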
    x = np.arange(len(equations))
    width = 0.25
    
    for i, method in enumerate(methods):
        means = []
        stds = []
        for eq in equations:
            subset = core_df[(core_df['method'] == method) & (core_df['equation_name'] == eq)]
            if len(subset) > 0:
                means.append(subset['var_f1'].mean())
                stds.append(subset['var_f1'].std())
            else:
                means.append(0)
                stds.append(0)
        
        ax.bar(
            x + i * width - width, means, width, yerr=stds,
            label=METHOD_NAMES[method], color=METHOD_COLORS[method],
            capsize=3, alpha=0.8,
        )
    
    ax.set_xlabel('Equation')
    ax.set_ylabel('Variable Selection F1 Score')
    ax.set_title('Variable Selection F1 by Method and Equation (v4.1)')
    ax.set_xticks(x)
    ax.set_xticklabels([f"{EQUATION_NAMES.get(eq, eq)}\n({AI_FEYNMAN_IDS.get(eq, '')})" for eq in equations])
    ax.legend(loc='lower right')
    ax.set_ylim(0, 1.15)
    ax.axhline(y=1.0, color='gray', linestyle='--', alpha=0.5)
    ax.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig7 = plot_f1_comparison_by_equation(df, save_path=FIGURES_DIR / 'fig7_f1_by_equation.png')
plt.show()

---
## Section 5: v4.1 Library Analysis Visualizations (NEW)

In [None]:
# ==============================================================================
# FIGURE 8: LIBRARY COMPOSITION BY EQUATION (v4.1 NEW)
# ==============================================================================

def plot_library_composition(df: pd.DataFrame, save_path: Optional[Path] = None) -> plt.Figure:
    """
    Stacked bar chart showing 4-layer library composition by equation (v4.1 NEW).
    
    Only for Physics-SR method.
    """
    physics_sr = df[df['method'] == 'physics_sr'].copy()
    
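    # Check every column used in the aggregation below, not only the first one.
    library_cols = ['library_n_pysr', 'library_n_variant', 'library_n_poly', 'library_n_op']
    if not set(library_cols).issubset(physics_sr.columns):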
        print("Warning: Library composition columns not found in results.")
        fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
        ax.text(0.5, 0.5, 'Library composition data not available', ha='center', va='center', fontsize=14)
        ax.set_title('Library Composition (Not Available)')
        return fig
    
    grouped = physics_sr.groupby('equation_name').agg({
        'library_n_pysr': 'mean',
        'library_n_variant': 'mean',
        'library_n_poly': 'mean',
        'library_n_op': 'mean',
    }).reset_index()
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
    
    equations = grouped['equation_name']
    x = np.arange(len(equations))
    width = 0.6
    
    bottom = np.zeros(len(equations))
    colors = [LIBRARY_COLORS['pysr'], LIBRARY_COLORS['variant'], LIBRARY_COLORS['poly'], LIBRARY_COLORS['op']]
    labels = ['[PySR] Exact', '[Var] Variants', '[Poly] Polynomial', '[Op] Operators']
    
    for col, color, label in zip(
        ['library_n_pysr', 'library_n_variant', 'library_n_poly', 'library_n_op'],
        colors, labels
    ):
        values = grouped[col].fillna(0).values
        ax.bar(x, values, width, bottom=bottom, label=label, color=color)
        bottom += values
    
    ax.set_xlabel('Equation')
    ax.set_ylabel('Number of Library Terms')
    ax.set_title('v4.1 Augmented Library Composition by Equation')
    ax.set_xticks(x)
    ax.set_xticklabels([EQUATION_NAMES.get(eq, eq) for eq in equations])
    ax.legend(loc='upper right')
    ax.grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig8 = plot_library_composition(df, save_path=FIGURES_DIR / 'fig8_library_composition.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 9: SELECTION SOURCES BY EQUATION (v4.1 NEW)
# ==============================================================================

def plot_selection_sources(df: pd.DataFrame, save_path: Optional[Path] = None) -> plt.Figure:
    """
    Stacked bar chart showing where selected terms came from (v4.1 NEW).
    
    This visualizes the v4.0 innovation: PySR terms can be kept,
    polynomial terms can fill gaps.
    """
    physics_sr = df[df['method'] == 'physics_sr'].copy()
    
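    # Check every column used in the aggregation below, not only the first one.
    selection_cols = ['selected_from_pysr', 'selected_from_variant', 'selected_from_poly', 'selected_from_op']
    if not set(selection_cols).issubset(physics_sr.columns):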
        print("Warning: Selection source columns not found in results.")
        fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
        ax.text(0.5, 0.5, 'Selection source data not available', ha='center', va='center', fontsize=14)
        ax.set_title('Selection Sources (Not Available)')
        return fig
    
    grouped = physics_sr.groupby('equation_name').agg({
        'selected_from_pysr': 'mean',
        'selected_from_variant': 'mean',
        'selected_from_poly': 'mean',
        'selected_from_op': 'mean',
    }).reset_index()
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
    
    equations = grouped['equation_name']
    x = np.arange(len(equations))
    width = 0.6
    
    bottom = np.zeros(len(equations))
    colors = [LIBRARY_COLORS['pysr'], LIBRARY_COLORS['variant'], LIBRARY_COLORS['poly'], LIBRARY_COLORS['op']]
    labels = ['From PySR', 'From Variants', 'From Polynomial', 'From Operators']
    
    for col, color, label in zip(
        ['selected_from_pysr', 'selected_from_variant', 'selected_from_poly', 'selected_from_op'],
        colors, labels
    ):
        values = grouped[col].fillna(0).values
        ax.bar(x, values, width, bottom=bottom, label=label, color=color)
        bottom += values
    
    ax.set_xlabel('Equation')
    ax.set_ylabel('Number of Selected Terms')
    ax.set_title('Source of Final Equation Terms (v4.1 Structure-Guided)')
    ax.set_xticks(x)
    ax.set_xticklabels([EQUATION_NAMES.get(eq, eq) for eq in equations])
    ax.legend(loc='upper right')
    ax.grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig9 = plot_selection_sources(df, save_path=FIGURES_DIR / 'fig9_selection_sources.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 10: PYSR VS POLYNOMIAL CONTRIBUTION (v4.1 NEW)
# ==============================================================================

def plot_pysr_vs_polynomial(df: pd.DataFrame, save_path: Optional[Path] = None) -> plt.Figure:
    """
    Grouped bar chart comparing PySR vs Polynomial layer contribution (v4.1 NEW).
    """
    physics_sr = df[df['method'] == 'physics_sr'].copy()
    
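    # The aggregation below also needs the library-size columns, so check all four.
    required_cols = ['selected_from_pysr', 'selected_from_poly', 'library_n_pysr', 'library_n_poly']
    if not set(required_cols).issubset(physics_sr.columns):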
        print("Warning: Selection source columns not found in results.")
        fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
        ax.text(0.5, 0.5, 'Data not available', ha='center', va='center', fontsize=14)
        ax.set_title('PySR vs Polynomial Contribution (Not Available)')
        return fig
    
    grouped = physics_sr.groupby('equation_name').agg({
        'selected_from_pysr': 'mean',
        'selected_from_poly': 'mean',
        'library_n_pysr': 'mean',
        'library_n_poly': 'mean',
    }).reset_index()
    
    fig, axes = plt.subplots(1, 2, figsize=FIGURE_SIZES['wide'])
    
    equations = grouped['equation_name']
    x = np.arange(len(equations))
    width = 0.35
    
    # Left: Library size comparison
    ax1 = axes[0]
    ax1.bar(x - width/2, grouped['library_n_pysr'].fillna(0), width, label='PySR Layer', color=LIBRARY_COLORS['pysr'])
    ax1.bar(x + width/2, grouped['library_n_poly'].fillna(0), width, label='Polynomial Layer', color=LIBRARY_COLORS['poly'])
    ax1.set_xlabel('Equation')
    ax1.set_ylabel('Number of Terms')
    ax1.set_title('Library Size: PySR vs Polynomial')
    ax1.set_xticks(x)
    ax1.set_xticklabels([EQUATION_NAMES.get(eq, eq) for eq in equations])
    ax1.legend()
    ax1.grid(axis='y', alpha=0.3)
    
    # Right: Selection comparison
    ax2 = axes[1]
    ax2.bar(x - width/2, grouped['selected_from_pysr'].fillna(0), width, label='Selected from PySR', color=LIBRARY_COLORS['pysr'])
    ax2.bar(x + width/2, grouped['selected_from_poly'].fillna(0), width, label='Selected from Polynomial', color=LIBRARY_COLORS['poly'])
    ax2.set_xlabel('Equation')
    ax2.set_ylabel('Number of Selected Terms')
    ax2.set_title('Selection: PySR vs Polynomial')
    ax2.set_xticks(x)
    ax2.set_xticklabels([EQUATION_NAMES.get(eq, eq) for eq in equations])
    ax2.legend()
    ax2.grid(axis='y', alpha=0.3)
    
    plt.suptitle('v4.1 Structure-Guided: PySR vs Polynomial Layer Analysis', fontsize=14, y=1.02)
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig10 = plot_pysr_vs_polynomial(df, save_path=FIGURES_DIR / 'fig10_pysr_vs_polynomial.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 11: TIMING PROFILE BREAKDOWN (v4.1 NEW)
# ==============================================================================

def plot_timing_profile(df: pd.DataFrame, save_path: Optional[Path] = None) -> plt.Figure:
    """
    Stacked bar chart showing timing breakdown by stage (v4.1 NEW).
    
    Only for Physics-SR method.
    """
    physics_sr = df[df['method'] == 'physics_sr'].copy()
    
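    # Check every timing column used in the aggregation below, not only the first one.
    timing_cols = ['timing_stage1', 'timing_pysr', 'timing_library', 'timing_ewsindy', 'timing_stage3']
    if not set(timing_cols).issubset(physics_sr.columns):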
        print("Warning: Timing profile columns not found in results.")
        fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
        ax.text(0.5, 0.5, 'Timing profile data not available', ha='center', va='center', fontsize=14)
        ax.set_title('Timing Profile (Not Available)')
        return fig
    
    grouped = physics_sr.groupby('equation_name').agg({
        'timing_stage1': 'mean',
        'timing_pysr': 'mean',
        'timing_library': 'mean',
        'timing_ewsindy': 'mean',
        'timing_stage3': 'mean',
    }).reset_index()
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
    
    equations = grouped['equation_name']
    x = np.arange(len(equations))
    width = 0.6
    
    bottom = np.zeros(len(equations))
    colors = list(TIMING_COLORS.values())
    labels = ['Stage 1 (Screening)', 'PySR Discovery', 'Library Build', 'E-WSINDy', 'Stage 3 (UQ)']
    
    for col, color, label in zip(
        ['timing_stage1', 'timing_pysr', 'timing_library', 'timing_ewsindy', 'timing_stage3'],
        colors, labels
    ):
        values = grouped[col].fillna(0).values
        ax.bar(x, values, width, bottom=bottom, label=label, color=color)
        bottom += values
    
    ax.axhline(y=180, color='red', linestyle='--', label='Colab Pro Budget (180s)', linewidth=2)
    
    ax.set_xlabel('Equation')
    ax.set_ylabel('Time (seconds)')
    ax.set_title('v4.1 Computational Profile by Equation')
    ax.set_xticks(x)
    ax.set_xticklabels([EQUATION_NAMES.get(eq, eq) for eq in equations])
    ax.legend(loc='upper right')
    ax.grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig11 = plot_timing_profile(df, save_path=FIGURES_DIR / 'fig11_timing_profile.png')
plt.show()

---
## Section 6: Supplementary Visualizations

In [None]:
# ==============================================================================
# FIGURE 12: SAMPLE SIZE SENSITIVITY (PHYSICS-SR ONLY)
# ==============================================================================

def plot_sample_size_sensitivity(df: pd.DataFrame, save_path: Optional[Path] = None) -> plt.Figure:
    """
    Line plot showing performance vs sample size for Physics-SR (v4.1).
    """
    physics_sr_df = df[df['method'] == 'physics_sr'].copy()
    
    if 'n_samples' not in physics_sr_df.columns or len(physics_sr_df['n_samples'].unique()) < 2:
        print("Warning: Not enough sample size variation for sensitivity plot.")
        fig, ax = plt.subplots(figsize=FIGURE_SIZES['wide'])
        ax.text(0.5, 0.5, 'Insufficient sample size variation', ha='center', va='center', fontsize=14)
        ax.set_title('Sample Size Sensitivity (Not Available)')
        return fig
    
    fig, axes = plt.subplots(1, 2, figsize=FIGURE_SIZES['wide'])
    equations = get_available_equations(physics_sr_df)
    
    styles = {
        'coulomb':   {'marker': 'o', 'markersize': 10, 'linewidth': 2.5, 'color': '#2E86AB'},
        'cosines':   {'marker': 's', 'markersize': 8,  'linewidth': 2.0, 'color': '#E94F37'},
        'barometric': {'marker': '^', 'markersize': 8,  'linewidth': 2.0, 'color': '#4CAF50'},
        'dotproduct': {'marker': 'd', 'markersize': 8,  'linewidth': 2.0, 'color': '#9C27B0'},
    }
    
    ax1 = axes[0]
    for equation in equations:
        subset = physics_sr_df[physics_sr_df['equation_name'] == equation]
        if len(subset) > 0 and len(subset['n_samples'].unique()) > 1:
            means = subset.groupby('n_samples')['var_f1'].mean()
            style = styles.get(equation, {'marker': 'o', 'markersize': 8, 'linewidth': 2.0, 'color': 'gray'})
            ax1.plot(
                means.index,
                means.values,
                marker=style['marker'],
                markersize=style['markersize'],
                linewidth=style['linewidth'],
                color=style['color'],
                label=EQUATION_NAMES.get(equation, equation),
            )
    
    ax1.set_xlabel('Sample Size (n)')
    ax1.set_ylabel('Variable Selection F1 Score')
    ax1.set_title('F1 Score vs Sample Size')
    ax1.legend()
    ax1.set_ylim(0, 1.05)
    ax1.grid(True, alpha=0.3)
    
    ax2 = axes[1]
    for equation in equations:
        subset = physics_sr_df[physics_sr_df['equation_name'] == equation]
        if len(subset) > 0 and len(subset['n_samples'].unique()) > 1:
            means = subset.groupby('n_samples')['test_r2'].mean()
            style = styles.get(equation, {'marker': 'o', 'markersize': 8, 'linewidth': 2.0, 'color': 'gray'})
            ax2.plot(
                means.index,
                means.values,
                marker=style['marker'],
                markersize=style['markersize'],
                linewidth=style['linewidth'],
                color=style['color'],
                label=EQUATION_NAMES.get(equation, equation),
            )
    
    ax2.set_xlabel('Sample Size (n)')
    ax2.set_ylabel('Test $R^2$ Score')
    ax2.set_title('Test $R^2$ vs Sample Size')
    ax2.legend()
    ax2.set_ylim(0, 1.05)
    ax2.grid(True, alpha=0.3)
    
    plt.suptitle('Sample Size Sensitivity (Physics-SR v4.1)', fontsize=14, y=1.02)
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig12 = plot_sample_size_sensitivity(df, save_path=FIGURES_DIR / 'fig12_sample_size.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 13: NOISE ROBUSTNESS BY EQUATION TYPE
# ==============================================================================

def plot_noise_robustness_by_type(df: pd.DataFrame, save_path: Optional[Path] = None) -> plt.Figure:
    """
    Compare noise robustness across equation types (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    fig, axes = plt.subplots(1, 2, figsize=FIGURE_SIZES['wide'])
    
    eq_types = [t for t in core_df['eq_type'].unique() if pd.notna(t)]
    
    ax1 = axes[0]
    for eq_type in eq_types:
        subset = core_df[(core_df['eq_type'] == eq_type) & (core_df['method'] == 'physics_sr')]
        if len(subset) > 0:
            means = subset.groupby('noise_level')['var_f1'].mean()
            ax1.plot(
                means.index * 100,
                means.values,
                marker='o',
                markersize=8,
                linewidth=2,
                label=EQUATION_TYPE_NAMES.get(eq_type, eq_type),
            )
    
    ax1.set_xlabel('Noise Level (%)')
    ax1.set_ylabel('Variable Selection F1 Score')
    ax1.set_title('Physics-SR Noise Robustness by Equation Type')
    ax1.legend()
    ax1.set_ylim(0, 1.05)
    ax1.grid(True, alpha=0.3)
    
    ax2 = axes[1]
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        subset = core_df[core_df['method'] == method]
        if len(subset) > 0:
            means = subset.groupby('noise_level')['var_f1'].mean()
            ax2.plot(
                means.index * 100,
                means.values,
                marker='o',
                markersize=8,
                linewidth=2,
                color=METHOD_COLORS[method],
                label=METHOD_NAMES[method],
            )
    
    ax2.set_xlabel('Noise Level (%)')
    ax2.set_ylabel('Variable Selection F1 Score')
    ax2.set_title('Method Comparison: Noise Robustness')
    ax2.legend()
    ax2.set_ylim(0, 1.05)
    ax2.grid(True, alpha=0.3)
    
    plt.suptitle('Noise Robustness Analysis (v4.1)', fontsize=14, y=1.02)
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig13 = plot_noise_robustness_by_type(df, save_path=FIGURES_DIR / 'fig13_noise_robustness.png')
plt.show()

---
## Section 7: LaTeX Tables

In [None]:
# ==============================================================================
# TABLE 1: MAIN RESULTS SUMMARY
# ==============================================================================

def generate_main_results_table(df: pd.DataFrame, save_path: Optional[Path] = None) -> str:
    """
    Generate LaTeX table for main results (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    latex = r"""\begin{table}[htbp]
\centering
\caption{Main Benchmark Results (v4.1 - Core Experiments)}
\label{tab:main_results}
\begin{tabular}{lccc}
\toprule
Method & Variable Selection F1 & Test $R^2$ & Runtime (s) \\
\midrule
"""
    
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        subset = core_df[core_df['method'] == method]
        if len(subset) == 0:
            continue
        
        f1_mean = subset['var_f1'].mean()
        f1_std = subset['var_f1'].std()
        r2_mean = subset['test_r2'].mean()
        r2_std = subset['test_r2'].std()
        rt_mean = subset['runtime_seconds'].mean()
        rt_std = subset['runtime_seconds'].std()
        
        method_display = METHOD_NAMES.get(method, method)
        
        latex += f"{method_display} & {f1_mean:.3f} $\\pm$ {f1_std:.3f} & "
        latex += f"{r2_mean:.3f} $\\pm$ {r2_std:.3f} & "
        latex += f"{rt_mean:.1f} $\\pm$ {rt_std:.1f} \\\\\n"
    
    latex += r"""\bottomrule
\end{tabular}
\end{table}"""
    
    if save_path:
        with open(save_path, 'w') as f:
            f.write(latex)
        print(f"Table saved to {save_path}")
    
    return latex


table1_latex = generate_main_results_table(df, save_path=TABLES_DIR / 'table1_main_results.tex')
print("TABLE 1: Main Results Summary (v4.1)")
print("=" * 60)
print(table1_latex)

In [None]:
# ==============================================================================
# TABLE 2: VARIABLE SELECTION METRICS BY METHOD
# ==============================================================================

def generate_variable_selection_table(df: pd.DataFrame, save_path: Optional[Path] = None) -> str:
    """
    Generate LaTeX table for variable selection metrics (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    latex = r"""\begin{table}[htbp]
\centering
\caption{Variable Selection Performance (v4.1)}
\label{tab:var_selection}
\begin{tabular}{lcccc}
\toprule
Method & Precision & Recall & F1 & Exact Match \\
\midrule
"""
    
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        subset = core_df[core_df['method'] == method]
        if len(subset) == 0:
            continue
        prec = subset['var_precision'].mean()
        rec = subset['var_recall'].mean()
        f1 = subset['var_f1'].mean()
        exact = subset['selected_correct'].mean() * 100
        
        method_display = METHOD_NAMES.get(method, method)
        latex += f"{method_display} & {prec:.3f} & {rec:.3f} & {f1:.3f} & {exact:.1f}\\% \\\\\n"
    
    latex += r"""\bottomrule
\end{tabular}
\end{table}"""
    
    if save_path:
        with open(save_path, 'w') as f:
            f.write(latex)
        print(f"Table saved to {save_path}")
    
    return latex


table2_latex = generate_variable_selection_table(df, save_path=TABLES_DIR / 'table2_var_selection.tex')
print("\nTABLE 2: Variable Selection Metrics (v4.1)")
print("=" * 60)
print(table2_latex)

In [None]:
# ==============================================================================
# TABLE 3: PREDICTION ACCURACY BY EQUATION
# ==============================================================================

def generate_prediction_table(df: pd.DataFrame, save_path: Optional[Path] = None) -> str:
    """
    Generate LaTeX table for prediction accuracy by equation (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    latex = r"""\begin{table}[htbp]
\centering
\caption{Prediction Accuracy by Equation (v4.1)}
\label{tab:prediction}
\begin{tabular}{llccc}
\toprule
Equation & AI Feynman ID & Physics-SR & PySR-Only & LASSO+PySR \\
\midrule
"""
    
    for eq_name in get_available_equations(core_df):
        ai_id = AI_FEYNMAN_IDS.get(eq_name, 'Unknown')
        eq_display = EQUATION_NAMES.get(eq_name, eq_name)
        
        r2_values = []
        for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
            subset = core_df[(core_df['equation_name'] == eq_name) & (core_df['method'] == method)]
            if len(subset) > 0:
                r2_values.append(f"{subset['test_r2'].mean():.3f}")
            else:
                r2_values.append("--")
        
        latex += f"{eq_display} & {ai_id} & {r2_values[0]} & {r2_values[1]} & {r2_values[2]} \\\\\n"
    
    latex += r"""\bottomrule
\end{tabular}
\end{table}"""
    
    if save_path:
        with open(save_path, 'w') as f:
            f.write(latex)
        print(f"Table saved to {save_path}")
    
    return latex


table3_latex = generate_prediction_table(df, save_path=TABLES_DIR / 'table3_prediction.tex')
print("\nTABLE 3: Prediction Accuracy by Equation (v4.1)")
print("=" * 60)
print(table3_latex)

In [None]:
# ==============================================================================
# TABLE 4: LIBRARY COMPOSITION (v4.1 NEW)
# ==============================================================================

def generate_library_table(df: pd.DataFrame, save_path: Optional[Path] = None) -> str:
    """
    Generate LaTeX table for library composition analysis (v4.1 NEW).
    """
    physics_sr = df[df['method'] == 'physics_sr'].copy()
    
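    # Check every column used in the aggregation below, not only the first one.
    required_cols = [
        'library_n_total', 'library_n_pysr', 'library_n_variant', 'library_n_poly',
        'library_n_op', 'selected_from_pysr', 'selected_from_poly',
    ]
    if not set(required_cols).issubset(physics_sr.columns):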
        print("Warning: Library composition data not available.")
        return "% Library composition data not available"
    
    grouped = physics_sr.groupby('equation_name').agg({
        'library_n_total': 'mean',
        'library_n_pysr': 'mean',
        'library_n_variant': 'mean',
        'library_n_poly': 'mean',
        'library_n_op': 'mean',
        'selected_from_pysr': 'mean',
        'selected_from_poly': 'mean',
    }).round(1)
    
    latex = r"""\begin{table}[htbp]
\centering
\caption{v4.1 Augmented Library Composition and Selection Analysis}
\label{tab:library}
\begin{tabular}{l|cccc|c|cc}
\toprule
Equation & [PySR] & [Var] & [Poly] & [Op] & Total & Sel. PySR & Sel. Poly \\
\midrule
"""
    
    for eq_name in get_available_equations(df):
        if eq_name not in grouped.index:
            continue
        row = grouped.loc[eq_name]
        eq_display = EQUATION_NAMES.get(eq_name, eq_name)
        
        latex += f"{eq_display} & "
        latex += f"{row.get('library_n_pysr', 0):.0f} & "
        latex += f"{row.get('library_n_variant', 0):.0f} & "
        latex += f"{row.get('library_n_poly', 0):.0f} & "
        latex += f"{row.get('library_n_op', 0):.0f} & "
        latex += f"{row.get('library_n_total', 0):.0f} & "
        latex += f"{row.get('selected_from_pysr', 0):.1f} & "
        latex += f"{row.get('selected_from_poly', 0):.1f} \\\\\n"
    
    latex += r"""\bottomrule
\end{tabular}
\end{table}
"""
    
    if save_path:
        with open(save_path, 'w') as f:
            f.write(latex)
        print(f"Table saved to {save_path}")
    
    return latex


table4_latex = generate_library_table(df, save_path=TABLES_DIR / 'table4_library.tex')
print("\nTABLE 4: Library Composition (v4.1 NEW)")
print("=" * 60)
print(table4_latex)
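
The table generators fall back to the raw key when a display name is missing (for example `METHOD_NAMES.get(method, method)`), and keys such as `pysr_only` contain underscores, which LaTeX rejects outside math mode. A minimal sketch of an escaping step that could be applied to such labels follows. `escape_latex` and `format_mean_std` are hypothetical helpers, not functions from the notebook, and the numbers in the example are made up.

```python
# Characters with special meaning in LaTeX text mode and their escaped forms.
LATEX_SPECIALS = {
    '\\': r'\textbackslash{}',
    '&': r'\&',
    '%': r'\%',
    '$': r'\$',
    '#': r'\#',
    '_': r'\_',
    '{': r'\{',
    '}': r'\}',
    '~': r'\textasciitilde{}',
    '^': r'\textasciicircum{}',
}


def escape_latex(text: str) -> str:
    """Escape characters that have special meaning in LaTeX text mode."""
    # Map character by character so inserted backslashes are never re-escaped.
    return ''.join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def format_mean_std(mean: float, std: float, digits: int = 3) -> str:
    """Format a 'mean +/- std' table cell with a fixed number of decimals."""
    return f"{mean:.{digits}f} $\\pm$ {std:.{digits}f}"


# Illustrative values only; they are not benchmark results.
row = f"{escape_latex('pysr_only')} & {format_mean_std(0.9123, 0.0456)} \\\\"
print(row)
```

Escaping only the label cells leaves intentional markup such as `$\pm$` and `$R^2$` in the headers untouched.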

---
## Section 8: Statistical Analysis

In [None]:
# ==============================================================================
# PAIRED COMPARISON TESTS
# ==============================================================================

def compute_paired_comparisons(df: pd.DataFrame) -> Dict[str, Any]:
    """
    Perform paired comparison tests between Physics-SR and baselines (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    results = {}
    
    equations = get_available_equations(core_df)
    
    for baseline in ['pysr_only', 'lasso_pysr']:
        physics_sr_scores = []
        baseline_scores = []
        
        for eq in equations:
            ps = core_df[(core_df['method'] == 'physics_sr') & (core_df['equation_name'] == eq)]['var_f1'].mean()
            bl = core_df[(core_df['method'] == baseline) & (core_df['equation_name'] == eq)]['var_f1'].mean()
            if not np.isnan(ps) and not np.isnan(bl):
                physics_sr_scores.append(ps)
                baseline_scores.append(bl)
        
        if len(physics_sr_scores) > 1:
            t_stat, p_value = stats.ttest_rel(physics_sr_scores, baseline_scores)
            
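            try:
                wilcoxon_stat, wilcoxon_p = stats.wilcoxon(physics_sr_scores, baseline_scores)
            except ValueError:
                # Degenerate input, e.g. all paired differences are zero
                # (some SciPy versions raise ValueError in that case).
                wilcoxon_stat, wilcoxon_p = np.nan, np.nan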
            
            diff = np.array(physics_sr_scores) - np.array(baseline_scores)
            mean_diff = np.mean(diff)
            ci_low, ci_high = stats.t.interval(0.95, len(diff)-1, loc=mean_diff, scale=stats.sem(diff))
            
            results[f'physics_sr_vs_{baseline}'] = {
                'paired_t_stat': t_stat,
                'paired_t_pvalue': p_value,
                'wilcoxon_stat': wilcoxon_stat,
                'wilcoxon_pvalue': wilcoxon_p,
                'mean_difference': mean_diff,
                'ci_95_low': ci_low,
                'ci_95_high': ci_high,
            }
    
    return results


paired_tests = compute_paired_comparisons(df)
print("\n" + "=" * 80)
print("PAIRED COMPARISON TESTS (v4.1)")
print("=" * 80)

for comparison, result in paired_tests.items():
    print(f"\n{comparison.replace('_', ' ').title()}:")
    print(f"  Paired t-test: t={result['paired_t_stat']:.4f}, p={result['paired_t_pvalue']:.6f}")
    print(f"  Wilcoxon test: W={result['wilcoxon_stat']:.4f}, p={result['wilcoxon_pvalue']:.6f}")
    print(f"  Mean difference: {result['mean_difference']:.4f}")
    print(f"  95% CI: [{result['ci_95_low']:.4f}, {result['ci_95_high']:.4f}]")
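
The paired t-test above is applied to per-equation differences in mean F1 between Physics-SR and each baseline. As a minimal standard-library sketch of the statistic itself, the textbook formula is t = mean(d) / (s_d / sqrt(n)), where d are the paired differences and s_d is their sample standard deviation. `paired_t_statistic` is a hypothetical helper; the p-value is left out because it needs the t-distribution CDF, which the notebook obtains from SciPy. The scores in the example are made up.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the paired t statistic for two equal-length score sequences."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    if len(a) < 2:
        raise ValueError("need at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        # All differences are identical, so the statistic is undefined.
        return math.nan
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Illustrative scores only; differences are [1, 2, 3].
t_stat = paired_t_statistic([3.0, 5.0, 7.0], [2.0, 3.0, 4.0])
print(f"t = {t_stat:.4f}")
```

With the four benchmark equations the test has only three degrees of freedom, so the reported p-values and confidence intervals should be read as rough indications.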

In [None]:
# ==============================================================================
# EFFECT SIZE CALCULATIONS
# ==============================================================================

def compute_effect_sizes(df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate Cohen's d effect sizes for all pairwise comparisons (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    methods = ['physics_sr', 'pysr_only', 'lasso_pysr']
    methods = [m for m in methods if m in core_df['method'].unique()]
    
    effect_sizes = []
    
    for i, m1 in enumerate(methods):
        for m2 in methods[i+1:]:
            scores1 = core_df[core_df['method'] == m1]['var_f1'].values
            scores2 = core_df[core_df['method'] == m2]['var_f1'].values
            
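            # Use sample standard deviations (ddof=1); NumPy's default is the population value (ddof=0).
            pooled_std = np.sqrt((scores1.std(ddof=1)**2 + scores2.std(ddof=1)**2) / 2)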
            cohens_d = (scores1.mean() - scores2.mean()) / pooled_std if pooled_std > 0 else 0
            
            interpretation = 'negligible'
            if abs(cohens_d) >= 0.8:
                interpretation = 'large'
            elif abs(cohens_d) >= 0.5:
                interpretation = 'medium'
            elif abs(cohens_d) >= 0.2:
                interpretation = 'small'
            
            effect_sizes.append({
                'Comparison': f"{METHOD_NAMES[m1]} vs {METHOD_NAMES[m2]}",
                'Mean1': f"{scores1.mean():.3f}",
                'Mean2': f"{scores2.mean():.3f}",
                'Cohen\'s d': f"{cohens_d:.3f}",
                'Interpretation': interpretation,
            })
    
    return pd.DataFrame(effect_sizes)


effect_size_df = compute_effect_sizes(df)
print("\n" + "=" * 80)
print("EFFECT SIZE ANALYSIS (Cohen's d)")
print("=" * 80)
display(effect_size_df)
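
For reference, the effect-size calculation above can be sketched with the standard library alone. The version below assumes the equal-weight pooled standard deviation (the square root of the average of the two sample variances) and the same interpretation thresholds as the notebook (0.2, 0.5, 0.8). `cohens_d` and `interpret_effect` are hypothetical names, and the input scores are made up.

```python
import math
import statistics
from typing import Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d using the average of the two sample variances as the pooled variance."""
    pooled_std = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    if pooled_std == 0:
        return 0.0  # mirrors the notebook's fallback when there is no spread
    return (statistics.mean(a) - statistics.mean(b)) / pooled_std


def interpret_effect(d: float) -> str:
    """Map |d| to the conventional labels used in the notebook."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return 'large'
    if magnitude >= 0.5:
        return 'medium'
    if magnitude >= 0.2:
        return 'small'
    return 'negligible'


# Illustrative scores only: means 4 and 3, both sample variances equal 4.
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
print(f"d = {d:.3f} ({interpret_effect(d)})")
```

Averaging the two variances matches the size-weighted pooled estimate only when both groups have the same number of runs; with unequal group sizes the weighted form would differ.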

---
## Section 9: Conclusions

In [None]:
# ==============================================================================
# KEY FINDINGS SUMMARY (v4.1)
# ==============================================================================

def generate_key_findings(df: pd.DataFrame) -> None:
    """
    Generate key findings summary (v4.1).
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    print("=" * 70)
    print(" KEY FINDINGS SUMMARY (v4.1)")
    print("=" * 70)
    print()
    
    # 1. Overall performance
    print("1. OVERALL PERFORMANCE")
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        subset = core_df[core_df['method'] == method]
        if len(subset) > 0:
            print(f"   {METHOD_NAMES[method]:15s}: F1={subset['var_f1'].mean():.3f}, R2={subset['test_r2'].mean():.3f}")
    print()
    
    # 2. Best method
    best_method_f1 = core_df.groupby('method')['var_f1'].mean().idxmax()
    best_f1 = core_df.groupby('method')['var_f1'].mean().max()
    print(f"2. BEST METHOD (by F1): {METHOD_NAMES[best_method_f1]} ({best_f1:.3f})")
    print()
    
    # 3. Dimensional analysis benefit
    physics_sr = core_df[core_df['method'] == 'physics_sr']
    if len(physics_sr) > 0 and 'with_dims' in physics_sr.columns:
        with_dims = physics_sr[physics_sr['with_dims'] == True]['var_f1'].mean()
        without_dims = physics_sr[physics_sr['with_dims'] == False]['var_f1'].mean()
        improvement = with_dims - without_dims
        print("3. DIMENSIONAL ANALYSIS BENEFIT")
        print(f"   With dimensions:    F1={with_dims:.3f}")
        print(f"   Without dimensions: F1={without_dims:.3f}")
        # Signed format so a negative difference is not printed as "+-0.012".
        print(f"   Improvement:        {improvement:+.3f}")
        print()
    
    # 4. Noise robustness
    print("4. NOISE ROBUSTNESS (0% -> 5% noise degradation)")
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        subset = core_df[core_df['method'] == method]
        if len(subset) > 0:
            # Match noise levels with a tolerance instead of exact float equality.
            f1_0 = subset[np.isclose(subset['noise_level'], 0.0)]['var_f1'].mean()
            f1_5 = subset[np.isclose(subset['noise_level'], 0.05)]['var_f1'].mean()
            degradation = f1_0 - f1_5
            print(f"   {METHOD_NAMES[method]:15s}: {degradation:+.3f} (F1 drop)")
    print()
    
    # 5. Runtime comparison
    print("5. COMPUTATIONAL EFFICIENCY")
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        subset = core_df[core_df['method'] == method]
        if len(subset) > 0:
            rt = subset['runtime_seconds'].mean()
            print(f"   {METHOD_NAMES[method]:15s}: {rt:.1f}s average")
    print()
    
    # 6. v4.1 Library benefit (if available)
    if 'library_n_pysr' in df.columns:
        print("6. v4.1 STRUCTURE-GUIDED LIBRARY BENEFIT")
        physics_sr = df[df['method'] == 'physics_sr']
        if len(physics_sr) > 0:
            avg_pysr = physics_sr['library_n_pysr'].mean()
            avg_poly = physics_sr['library_n_poly'].mean()
            avg_total = physics_sr['library_n_total'].mean()
            print(f"   Average PySR terms:      {avg_pysr:.1f}")
            print(f"   Average Polynomial terms: {avg_poly:.1f}")
            print(f"   Average Total library:   {avg_total:.1f}")
        print()
    
    print("=" * 70)


generate_key_findings(df)
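
The noise-robustness finding selects rows by noise level. Because noise levels are floats, a tolerance-based match is more robust than exact equality when a level comes from arithmetic instead of a literal. A minimal standard-library sketch follows. `rows_at_noise` is a hypothetical helper and the rows are made-up records, not benchmark results.

```python
import math
from typing import Dict, List


def rows_at_noise(rows: List[Dict[str, float]], level: float,
                  abs_tol: float = 1e-9) -> List[Dict[str, float]]:
    """Return rows whose 'noise_level' matches `level` within a small tolerance."""
    # abs_tol matters for level == 0.0, where a purely relative tolerance never matches.
    return [
        r for r in rows
        if math.isclose(r['noise_level'], level, rel_tol=1e-9, abs_tol=abs_tol)
    ]


# Illustrative records only.
rows = [
    {'noise_level': 0.0, 'var_f1': 1.0},
    {'noise_level': 0.1 + 0.2, 'var_f1': 0.8},  # stored as 0.30000000000000004
]

print(len([r for r in rows if r['noise_level'] == 0.3]))  # exact equality misses the row
print(len(rows_at_noise(rows, 0.3)))                      # tolerance-based match finds it
```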

In [None]:
# ==============================================================================
# FINAL OUTPUT SUMMARY
# ==============================================================================

print("=" * 70)
print(" ANALYSIS COMPLETE (v4.1)")
print("=" * 70)
print()
print("OUTPUT FILES GENERATED:")
print()

# List figures
print("FIGURES:")
for fig_file in sorted(FIGURES_DIR.glob('*.png')):
    print(f"  - {fig_file}")
print()

# List tables
print("TABLES:")
for table_file in sorted(TABLES_DIR.glob('*.tex')):
    print(f"  - {table_file}")
print()

print("=" * 70)
print(" Ready for publication!")
print("=" * 70)

---
## Module Summary

### Analysis.ipynb v4.1 - Complete

**Sections:**
1. Header and Imports
2. Load Results (experiment_results.csv/pkl)
3. Summary Statistics (method comparison, equation comparison, statistical tests)
4. Core Visualizations (Figures 1-7)
5. v4.1 Library Analysis (Figures 8-11 - NEW)
6. Supplementary Visualizations (Figures 12-13)
7. LaTeX Tables (4 tables including library composition)
8. Statistical Analysis (paired tests, effect sizes)
9. Conclusions (key findings summary)

**v4.1 New Features:**
- AI Feynman ID tracking for all equations
- Library composition visualizations (Figures 8-11)
- Selection source analysis
- Timing profile breakdown
- Table 4: Library Composition LaTeX table

**Output Files:**
- 13 PNG figures in results/figures/
- 4 LaTeX tables in results/tables/