# Analysis - Physics-SR Framework v3.0 Benchmark

## Results Analysis and Visualization Module

**Author:** Zhengze Zhang  
**Affiliation:** Department of Statistics, Columbia University  
**Date:** January 2026

---

### Purpose

This notebook analyzes and visualizes the benchmark results produced by `Experiments.ipynb`:

1. **Summary Statistics**: Overall, by-factor, and by-equation-type performance metrics
2. **Core Visualizations**: 8 figures for main results
3. **Supplementary Visualizations**: 3 figures for additional analysis
4. **LaTeX Tables**: Publication-ready tables for academic papers
5. **Statistical Tests**: Significance testing between methods

### Input Files

- `results/experiment_results.csv`: Main results table
- `results/experiment_results_full.pkl`: Detailed results with nested data

### Output Files

- `results/figures/*.png`: 11 visualization figures
- `results/tables/*.tex`: 5 LaTeX tables
- `results/summary_table.csv`: Final summary table

### Test Equations

| # | Name | Equation | Type | Category |
|---|------|----------|------|----------|
| 1 | Coulomb | F = k * q1 * q2 / r^2 | Rational | Power-Law |
| 2 | Newton | F = G * m1 * m2 / r^2 | Rational | Power-Law |
| 3 | Ideal Gas | P = n * R * T / V | Rational | Power-Law |
| 4 | Damped | x = A * exp(-b*t) * cos(omega*t) | Nested | Transcendental |

---
## Section 1: Header and Imports

In [None]:
# ==============================================================================
# ENVIRONMENT RESET AND FRESH CLONE
# ==============================================================================

import sys
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    import os
    import shutil
    import gc
    
    repo_path = '/content/Physics-Informed-Symbolic-Regression'
    current_dir = os.getcwd()
    
    if not current_dir.startswith(repo_path) or not os.path.exists(repo_path + '/.git'):
        os.chdir('/content')
        gc.collect()
        
        if os.path.exists(repo_path):
            shutil.rmtree(repo_path)
            print("[OK] Removed existing repository.")
        
        !git clone https://github.com/Garthzzz/Physics-Informed-Symbolic-Regression.git
        
        if os.path.exists(repo_path + '/.git'):
            os.chdir(repo_path + '/benchmark')
            print(f"[OK] Working directory: {os.getcwd()}")
            # Report success only when the clone actually produced a repository
            print("[OK] Environment reset complete.")
        else:
            print("[FAIL] Clone incomplete!")
    else:
        print("[SKIP] Already in valid repository.")
else:
    print("[INFO] Not in Colab environment.")

In [None]:
"""
Analysis.ipynb - Results Analysis and Visualization Module
===========================================================

Physics-SR Framework v3.0 Benchmark Suite

This module provides:
- Summary statistics computation
- Visualization functions for benchmark results
- LaTeX table generation for publications
- Statistical significance testing

Author: Zhengze Zhang
Affiliation: Department of Statistics, Columbia University
"""

# Standard library imports
import os
import sys
import pickle
import warnings
from pathlib import Path
from typing import Dict, List, Tuple, Optional, Any, Union

# Scientific computing
import numpy as np
import pandas as pd
from scipy import stats

# Visualization
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

print("Analysis: All imports successful.")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"Matplotlib version: {plt.matplotlib.__version__}")
print(f"Seaborn version: {sns.__version__}")

In [None]:
# ==============================================================================
# PATH CONFIGURATION
# ==============================================================================

# Determine paths based on environment
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    # Colab paths
    BASE_DIR = Path('/content/Physics-Informed-Symbolic-Regression')
    BENCHMARK_DIR = BASE_DIR / 'benchmark'
else:
    # Local paths
    BENCHMARK_DIR = Path('.').resolve()
    BASE_DIR = BENCHMARK_DIR.parent

RESULTS_DIR = BENCHMARK_DIR / 'results'
FIGURES_DIR = RESULTS_DIR / 'figures'
TABLES_DIR = RESULTS_DIR / 'tables'
DATA_DIR = BENCHMARK_DIR / 'data'

# Create directories if needed
RESULTS_DIR.mkdir(exist_ok=True, parents=True)
FIGURES_DIR.mkdir(exist_ok=True, parents=True)
TABLES_DIR.mkdir(exist_ok=True, parents=True)

print(f"Environment: {'Google Colab' if IN_COLAB else 'Local'}")
print(f"Base directory: {BASE_DIR}")
print(f"Benchmark directory: {BENCHMARK_DIR}")
print(f"Results directory: {RESULTS_DIR}")
print(f"Figures directory: {FIGURES_DIR}")
print(f"Tables directory: {TABLES_DIR}")
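
The directory setup above can be exercised in isolation. The sketch below is an assumption-laden stand-in: `make_results_dirs` is a hypothetical helper, and a temporary directory replaces the real benchmark folder. It shows why `parents=True, exist_ok=True` makes the cell safe to rerun.

```python
import tempfile
from pathlib import Path


def make_results_dirs(benchmark_dir: Path) -> dict:
    """Create results/, results/figures/ and results/tables/ under benchmark_dir."""
    results = benchmark_dir / 'results'
    dirs = {
        'results': results,
        'figures': results / 'figures',
        'tables': results / 'tables',
    }
    for path in dirs.values():
        # parents=True creates missing ancestors; exist_ok=True tolerates reruns
        path.mkdir(parents=True, exist_ok=True)
    return dirs


with tempfile.TemporaryDirectory() as tmp:
    created = make_results_dirs(Path(tmp))
    make_results_dirs(Path(tmp))  # second call does not raise
    layout_ok = all(p.is_dir() for p in created.values())
```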

In [None]:
# ==============================================================================
# PLOTTING CONFIGURATION
# ==============================================================================

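# Set style. The 'seaborn-v0_8-*' names exist from Matplotlib 3.6 onward;
# older releases ship the same style as 'seaborn-whitegrid'.
try:
    plt.style.use('seaborn-v0_8-whitegrid')
except OSError:
    plt.style.use('seaborn-whitegrid')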

# Color palette for methods
METHOD_COLORS = {
    'physics_sr': '#2E86AB',    # Blue
    'pysr_only': '#E94F37',     # Red
    'lasso_pysr': '#F39C12',    # Orange
}

# Display names for methods
METHOD_NAMES = {
    'physics_sr': 'Physics-SR',
    'pysr_only': 'PySR-Only',
    'lasso_pysr': 'LASSO+PySR',
}

# Display names for equations
EQUATION_NAMES = {
    'coulomb': 'Coulomb',
    'newton': 'Newton',
    'ideal_gas': 'Ideal Gas',
    'damped': 'Damped Osc.',
}

# Display names for equation types
EQUATION_TYPE_NAMES = {
    'power_law': 'Power-Law',
    'nested_transcendental': 'Nested Transcendental',
}

# Figure size defaults
FIGURE_SIZES = {
    'single': (8, 6),
    'wide': (12, 6),
    'tall': (8, 10),
    'square': (8, 8),
    'large': (14, 10),
}

# DPI for saved figures
FIGURE_DPI = 300

# Font sizes
FONTSIZE_TITLE = 14
FONTSIZE_LABEL = 12
FONTSIZE_TICK = 10
FONTSIZE_LEGEND = 10
FONTSIZE_ANNOTATION = 8

# Set default font sizes
plt.rcParams.update({
    'font.size': FONTSIZE_TICK,
    'axes.titlesize': FONTSIZE_TITLE,
    'axes.labelsize': FONTSIZE_LABEL,
    'xtick.labelsize': FONTSIZE_TICK,
    'ytick.labelsize': FONTSIZE_TICK,
    'legend.fontsize': FONTSIZE_LEGEND,
    'figure.titlesize': FONTSIZE_TITLE,
})

print("Plotting configuration set.")
print(f"Method colors: {METHOD_COLORS}")

---
## Section 2: Load Results

In [None]:
# ==============================================================================
# LOAD EXPERIMENT RESULTS (CSV)
# ==============================================================================

def load_results_csv(filepath: Optional[Path] = None) -> pd.DataFrame:
    """
    Load experiment results from CSV file.
    
    Parameters
    ----------
    filepath : Path, optional
        Path to CSV file. Defaults to results/experiment_results.csv
    
    Returns
    -------
    pd.DataFrame
        Results DataFrame
    
    Raises
    ------
    FileNotFoundError
        If results file does not exist
    """
    if filepath is None:
        filepath = RESULTS_DIR / 'experiment_results.csv'
    
    if not filepath.exists():
        raise FileNotFoundError(
            f"Results file not found: {filepath}\n"
            f"Please run Experiments.ipynb first to generate results."
        )
    
    df = pd.read_csv(filepath)
    
    # Convert flag columns to bool. Missing values are assumed to mean False
    # and are filled first, because astype(bool) alone would turn NaN into True.
    for col in ('with_dims', 'selected_correct', 'success'):
        if col in df.columns:
            df[col] = df[col].fillna(False).astype(bool)
    
    print(f"Loaded {len(df)} experiment results from {filepath}")
    return df


print("load_results_csv() defined.")
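
Casting a flag column read from CSV to `bool` deserves some care, because Python truthiness treats both NaN and the string `'False'` as true. The sketch below shows the pitfall and one way around it. `parse_flag` is a hypothetical helper, and treating missing values as `False` is an assumption, not something the benchmark specifies.

```python
import math


def parse_flag(value, default=False):
    """Map a raw CSV cell to a bool without relying on truthiness."""
    if isinstance(value, bool):
        return value
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return default  # assumption: a missing flag counts as `default`
    if isinstance(value, str):
        text = value.strip().lower()
        if text in ('true', '1', 'yes'):
            return True
        if text in ('false', '0', 'no', ''):
            return False
        raise ValueError(f"Unrecognized flag value: {value!r}")
    return bool(value)


# Truthiness pitfalls that an explicit parser avoids
assert bool(float('nan')) is True
assert bool('False') is True
```

Applied to a column, such a helper could be used as `df[col].map(parse_flag)`.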

In [None]:
# ==============================================================================
# LOAD EXPERIMENT RESULTS (PKL)
# ==============================================================================

def load_results_pkl(filepath: Optional[Path] = None) -> Optional[Dict[str, Any]]:
    """
    Load detailed experiment results from PKL file.
    
    Parameters
    ----------
    filepath : Path, optional
        Path to PKL file. Defaults to results/experiment_results_full.pkl
    
    Returns
    -------
    Optional[Dict[str, Any]]
        Detailed results or None if file not found or cannot be loaded
    
    Notes
    -----
    The PKL file may contain custom classes (e.g., ExperimentResult) that
    are not defined in this notebook. In such cases, the function returns
    None and prints a warning. The CSV file contains all necessary data
    for analysis.
    """
    if filepath is None:
        filepath = RESULTS_DIR / 'experiment_results_full.pkl'
    
    if not filepath.exists():
        print(f"Warning: PKL file not found: {filepath}")
        return None
    
    try:
        with open(filepath, 'rb') as f:
            results = pickle.load(f)
        print(f"Loaded detailed results from {filepath}")
        return results
    except (AttributeError, ModuleNotFoundError, pickle.UnpicklingError, EOFError) as e:
        print(f"Warning: Cannot load PKL file (missing class definition or corrupt file): {e}")
        print("Continuing with CSV data only. This is sufficient for all analyses.")
        return None


print("load_results_pkl() defined.")
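
The class-definition problem described in the docstring can be sidestepped on the writing side by pickling plain dictionaries instead of custom objects. The following is a minimal sketch under stated assumptions: the `ExperimentResult` fields shown are hypothetical, and nothing here claims to match what `Experiments.ipynb` actually stores.

```python
import pickle


class ExperimentResult:
    """Stand-in for a custom result class; the fields are hypothetical."""

    def __init__(self, method, var_f1):
        self.method = method
        self.var_f1 = var_f1


result = ExperimentResult('physics_sr', 0.95)

# vars() returns the instance attributes as a plain dict, which can be
# unpickled anywhere without the ExperimentResult definition.
payload = pickle.dumps(vars(result))
restored = pickle.loads(payload)
print(restored)  # {'method': 'physics_sr', 'var_f1': 0.95}
```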

In [None]:
# ==============================================================================
# DATA VALIDATION
# ==============================================================================

def validate_results(df: pd.DataFrame) -> Dict[str, Any]:
    """
    Validate results DataFrame and return summary.
    
    Parameters
    ----------
    df : pd.DataFrame
        Results DataFrame
    
    Returns
    -------
    Dict[str, Any]
        Validation summary
    """
    validation = {
        'n_experiments': len(df),
        'n_successful': df['success'].sum() if 'success' in df.columns else len(df),
        'success_rate': df['success'].mean() * 100 if 'success' in df.columns else 100.0,
        'methods': df['method'].unique().tolist(),
        'equations': df['equation_name'].unique().tolist(),
        'noise_levels': sorted(df['noise_level'].unique().tolist()),
        'dummy_counts': sorted(df['n_dummy'].unique().tolist()),
        'sample_sizes': sorted(df['n_samples'].unique().tolist()) if 'n_samples' in df.columns else [500],
    }
    return validation


def get_available_equations(df: pd.DataFrame) -> List[str]:
    """
    Get list of equations available in DataFrame, ordered consistently.
    
    Parameters
    ----------
    df : pd.DataFrame
        Results DataFrame
    
    Returns
    -------
    List[str]
        List of equation names in preferred order
    """
    preferred_order = ['coulomb', 'newton', 'ideal_gas', 'damped']
    available = df['equation_name'].unique().tolist()
    # Return in preferred order, only including those that exist
    return [eq for eq in preferred_order if eq in available] + \
           [eq for eq in available if eq not in preferred_order]


print("validate_results() defined.")
print("get_available_equations() defined.")
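
The ordering rule in `get_available_equations()` can be checked without a DataFrame. The sketch below restates it on plain lists. `order_names` and the extra equation name `'pendulum'` are hypothetical and used only for illustration.

```python
def order_names(available, preferred):
    """Preferred names first (in preferred order), then the rest as encountered."""
    head = [name for name in preferred if name in available]
    tail = [name for name in available if name not in preferred]
    return head + tail


preferred = ['coulomb', 'newton', 'ideal_gas', 'damped']
ordered = order_names(['damped', 'pendulum', 'coulomb'], preferred)
print(ordered)  # ['coulomb', 'damped', 'pendulum']
```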

In [None]:
# ==============================================================================
# LOAD DATA
# ==============================================================================

# Load results
df = load_results_csv()
print(f"DataFrame shape: {df.shape}")
print(f"Columns: {list(df.columns)}")

# Load detailed results
detailed_results = load_results_pkl()

# Validate
validation = validate_results(df)
print("\n" + "="*60)
print("DATA VALIDATION SUMMARY")
print("="*60)
print(f"Total experiments: {validation['n_experiments']}")
print(f"Successful experiments: {validation['n_successful']}")
print(f"Success rate: {validation['success_rate']:.1f}%")
print(f"Methods: {validation['methods']}")
print(f"Equations: {validation['equations']}")
print(f"Noise levels: {validation['noise_levels']}")
print(f"Dummy counts: {validation['dummy_counts']}")
print(f"Sample sizes: {validation['sample_sizes']}")

In [None]:
# ==============================================================================
# ADD EQUATION TYPE COLUMN
# ==============================================================================

if 'eq_type' not in df.columns:
    df['eq_type'] = df['equation_name'].map({
        'coulomb': 'power_law',
        'newton': 'power_law',
        'ideal_gas': 'power_law',
        'damped': 'nested_transcendental'
    })
    print("Added eq_type column to DataFrame.")

print("\nEquation type distribution:")
print(df['eq_type'].value_counts())

print("\nFirst 5 rows:")
display(df.head())
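
Note that `Series.map` with a dictionary yields NaN for any equation name missing from the dictionary, and `value_counts()` omits NaN by default, so an unmapped equation would not show up in the distribution above. One possible way to make such cases visible is an explicit fallback label. In this sketch the `'unknown'` label and the `'pendulum'` name are assumptions.

```python
EQ_TYPE_BY_NAME = {
    'coulomb': 'power_law',
    'newton': 'power_law',
    'ideal_gas': 'power_law',
    'damped': 'nested_transcendental',
}


def equation_type(name, default='unknown'):
    """Look up the equation type, falling back to a visible placeholder."""
    return EQ_TYPE_BY_NAME.get(name, default)


labels = [equation_type(n) for n in ['newton', 'damped', 'pendulum']]
print(labels)  # ['power_law', 'nested_transcendental', 'unknown']
```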

---
## Section 3: Summary Statistics

In [None]:
# ==============================================================================
# METHOD COMPARISON
# ==============================================================================

def compute_method_comparison(df: pd.DataFrame) -> pd.DataFrame:
    """
    Compute performance comparison by method.
    
    Parameters
    ----------
    df : pd.DataFrame
        Results DataFrame
    
    Returns
    -------
    pd.DataFrame
        Method comparison table
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    comparison = []
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        method_df = core_df[core_df['method'] == method]
        if len(method_df) == 0:
            continue
        
        row = {
            'Method': METHOD_NAMES.get(method, method),
            'N': len(method_df),
            'F1': f"{method_df['var_f1'].mean():.3f} +/- {method_df['var_f1'].std():.3f}",
            'Precision': f"{method_df['var_precision'].mean():.3f} +/- {method_df['var_precision'].std():.3f}",
            'Recall': f"{method_df['var_recall'].mean():.3f} +/- {method_df['var_recall'].std():.3f}",
            'Test R2': f"{method_df['test_r2'].mean():.3f} +/- {method_df['test_r2'].std():.3f}",
            'Runtime (s)': f"{method_df['runtime_seconds'].mean():.1f} +/- {method_df['runtime_seconds'].std():.1f}",
            'Exact Match': f"{method_df['selected_correct'].mean()*100:.1f}%",
        }
        comparison.append(row)
    
    return pd.DataFrame(comparison)


# Compute and display
method_comparison = compute_method_comparison(df)
print("\n" + "="*100)
print("METHOD COMPARISON (All Equations)")
print("="*100)
display(method_comparison)
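
The `mean +/- std` cells in the table use pandas' default standard deviation, which is the sample estimate with an n - 1 denominator. The sketch below reproduces that formatting with the standard library only. `format_mean_std` is a hypothetical helper and the input values are made up for illustration.

```python
import math
import statistics


def format_mean_std(values, digits=3):
    """Format values as 'mean +/- std' using the sample standard deviation."""
    values = list(values)
    mean = statistics.mean(values)
    # statistics.stdev uses the n - 1 denominator, matching Series.std() (ddof=1).
    # With a single value the sample std is undefined, so report NaN as pandas does.
    std = statistics.stdev(values) if len(values) > 1 else math.nan
    return f"{mean:.{digits}f} +/- {std:.{digits}f}"


print(format_mean_std([0.8, 1.0, 0.9]))  # 0.900 +/- 0.100
print(format_mean_std([0.8]))            # 0.800 +/- nan
```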

In [None]:
# ==============================================================================
# METHOD COMPARISON BY EQUATION TYPE
# ==============================================================================

def compute_method_comparison_by_type(df: pd.DataFrame, eq_type: str) -> pd.DataFrame:
    """
    Compute performance comparison by method for specific equation type.
    
    Parameters
    ----------
    df : pd.DataFrame
        Results DataFrame
    eq_type : str
        Equation type ('power_law' or 'nested_transcendental')
    
    Returns
    -------
    pd.DataFrame
        Method comparison table for specified equation type
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    type_df = core_df[core_df['eq_type'] == eq_type]
    
    comparison = []
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        method_df = type_df[type_df['method'] == method]
        if len(method_df) == 0:
            continue
        
        row = {
            'Method': METHOD_NAMES.get(method, method),
            'N': len(method_df),
            'F1': f"{method_df['var_f1'].mean():.3f} +/- {method_df['var_f1'].std():.3f}",
            'Precision': f"{method_df['var_precision'].mean():.3f}",
            'Recall': f"{method_df['var_recall'].mean():.3f}",
            'Test R2': f"{method_df['test_r2'].mean():.3f}",
            'Exact Match': f"{method_df['selected_correct'].mean()*100:.1f}%",
        }
        comparison.append(row)
    
    return pd.DataFrame(comparison)


# Display by type
print("\n" + "="*100)
print("METHOD COMPARISON (Power-Law Equations Only)")
print("="*100)
display(compute_method_comparison_by_type(df, 'power_law'))

print("\n" + "="*100)
print("METHOD COMPARISON (Nested Transcendental)")
print("="*100)
display(compute_method_comparison_by_type(df, 'nested_transcendental'))

In [None]:
# ==============================================================================
# BY-EQUATION COMPARISON
# ==============================================================================

def compute_equation_comparison(df: pd.DataFrame) -> pd.DataFrame:
    """
    Compute performance comparison by equation.
    
    Parameters
    ----------
    df : pd.DataFrame
        Results DataFrame
    
    Returns
    -------
    pd.DataFrame
        Equation comparison table
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    # Pivot table
    comparison = core_df.pivot_table(
        index='equation_name',
        columns='method',
        values=['var_f1', 'test_r2', 'selected_correct'],
        aggfunc='mean'
    ).round(3)
    
    return comparison


# Compute and display
equation_comparison = compute_equation_comparison(df)
print("\n" + "="*80)
print("PERFORMANCE BY EQUATION")
print("="*80)
display(equation_comparison)
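
Because `values` is passed as a list, the pivot table has two-level columns of the form (metric, method). The sketch below shows how a single cell can be read from such a table. The toy numbers are invented for illustration and are not benchmark results.

```python
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame({
    'equation_name': ['coulomb', 'coulomb', 'coulomb', 'damped', 'damped', 'damped'],
    'method': ['physics_sr', 'physics_sr', 'pysr_only',
               'physics_sr', 'pysr_only', 'pysr_only'],
    'var_f1': [1.0, 0.8, 0.5, 0.6, 0.4, 0.2],
})

table = toy.pivot_table(
    index='equation_name',
    columns='method',
    values=['var_f1'],
    aggfunc='mean',
)

# Columns are (metric, method) pairs, so a cell needs both levels
cell = table.loc['coulomb', ('var_f1', 'physics_sr')]
```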

In [None]:
# ==============================================================================
# STATISTICAL SIGNIFICANCE TESTS
# ==============================================================================

def compute_statistical_tests(df: pd.DataFrame) -> Dict[str, Any]:
    """
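    Perform statistical significance tests between methods on var_f1.

    Pairwise comparisons use independent two-sample t-tests with SciPy's
    default equal-variance assumption; the overall comparison uses one-way
    ANOVA. P-values are not corrected for multiple comparisons.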
    
    Parameters
    ----------
    df : pd.DataFrame
        Results DataFrame
    
    Returns
    -------
    Dict[str, Any]
        Test results
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    # Extract F1 scores by method
    physics_sr_f1 = core_df[core_df['method'] == 'physics_sr']['var_f1'].values
    pysr_only_f1 = core_df[core_df['method'] == 'pysr_only']['var_f1'].values
    lasso_pysr_f1 = core_df[core_df['method'] == 'lasso_pysr']['var_f1'].values
    
    results = {}
    
    # Physics-SR vs PySR-Only
    if len(physics_sr_f1) > 0 and len(pysr_only_f1) > 0:
        t_stat, p_value = stats.ttest_ind(physics_sr_f1, pysr_only_f1)
        results['physics_sr_vs_pysr_only'] = {
            't_statistic': t_stat,
            'p_value': p_value,
            'significant': p_value < 0.05,
        }
    
    # Physics-SR vs LASSO+PySR
    if len(physics_sr_f1) > 0 and len(lasso_pysr_f1) > 0:
        t_stat, p_value = stats.ttest_ind(physics_sr_f1, lasso_pysr_f1)
        results['physics_sr_vs_lasso_pysr'] = {
            't_statistic': t_stat,
            'p_value': p_value,
            'significant': p_value < 0.05,
        }
    
    # LASSO+PySR vs PySR-Only
    if len(lasso_pysr_f1) > 0 and len(pysr_only_f1) > 0:
        t_stat, p_value = stats.ttest_ind(lasso_pysr_f1, pysr_only_f1)
        results['lasso_pysr_vs_pysr_only'] = {
            't_statistic': t_stat,
            'p_value': p_value,
            'significant': p_value < 0.05,
        }
    
    # ANOVA for overall comparison
    if len(physics_sr_f1) > 0 and len(pysr_only_f1) > 0 and len(lasso_pysr_f1) > 0:
        f_stat, p_value = stats.f_oneway(physics_sr_f1, pysr_only_f1, lasso_pysr_f1)
        results['anova'] = {
            'f_statistic': f_stat,
            'p_value': p_value,
            'significant': p_value < 0.05,
        }
    
    return results


# Compute and display
stat_tests = compute_statistical_tests(df)
print("\n" + "="*80)
print("STATISTICAL SIGNIFICANCE TESTS (Variable Selection F1)")
print("="*80)

if 'anova' in stat_tests:
    print("\n[ANOVA - Overall]")
    print(f"  F-statistic: {stat_tests['anova']['f_statistic']:.4f}")
    print(f"  p-value: {stat_tests['anova']['p_value']:.6f}")
    print(f"  Significant (p < 0.05): {stat_tests['anova']['significant']}")

print("\n[Pairwise t-tests]")
for comparison, result in stat_tests.items():
    if comparison != 'anova':
        print(f"\n  {comparison.replace('_', ' ').title()}:")
        print(f"    t-statistic: {result['t_statistic']:.4f}")
        print(f"    p-value: {result['p_value']:.6f}")
        print(f"    Significant (p < 0.05): {result['significant']}")
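
Running three pairwise tests at the 0.05 level each raises the chance of at least one false positive across the family. One common remedy, which this notebook does not apply, is the Holm-Bonferroni adjustment. The sketch below implements it in plain Python. `holm_adjust` is a hypothetical helper and the p-values are invented. Separately, `stats.ttest_ind` assumes equal variances by default, and passing `equal_var=False` switches it to Welch's t-test.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices sorted from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by m - k + 1
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]  # made-up p-values, all below 0.05
adjusted = holm_adjust(raw)
significant = [p < 0.05 for p in adjusted]
print(significant)  # [True, False, False]
```

In this toy case all three raw p-values fall below 0.05, but only the smallest survives the adjustment.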

---
## Section 4: Core Result Visualizations

In [None]:
# ==============================================================================
# FIGURE 1: VARIABLE SELECTION F1 VS NOISE LEVEL
# ==============================================================================

def plot_f1_vs_noise(
    df: pd.DataFrame,
    save_path: Optional[Path] = None
) -> plt.Figure:
    """
    Line plot showing F1 score vs noise level, grouped by method.
    """
    # Filter core experiments
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
    
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        subset = core_df[core_df['method'] == method]
        if len(subset) == 0:
            continue
        means = subset.groupby('noise_level')['var_f1'].mean()
        stds = subset.groupby('noise_level')['var_f1'].std()
        
        ax.errorbar(
            means.index * 100,  # Convert to percentage
            means.values,
            yerr=stds.values,
            label=METHOD_NAMES[method],
            marker='o',
            markersize=8,
            capsize=5,
            color=METHOD_COLORS[method],
            linewidth=2,
        )
    
    ax.set_xlabel('Noise Level (%)')
    ax.set_ylabel('Variable Selection F1 Score')
    ax.set_title('Variable Selection Performance vs Noise Level')
    ax.legend(loc='lower left')
    ax.set_ylim(0, 1.05)
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


# Generate figure
fig1 = plot_f1_vs_noise(df, save_path=FIGURES_DIR / 'fig1_f1_vs_noise.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 2: VARIABLE SELECTION F1 VS DUMMY COUNT
# ==============================================================================

def plot_f1_vs_dummy(
    df: pd.DataFrame,
    save_path: Optional[Path] = None
) -> plt.Figure:
    """
    Line plot showing F1 score vs dummy feature count, grouped by method.
    """
    # Filter core experiments
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
    
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        subset = core_df[core_df['method'] == method]
        if len(subset) == 0:
            continue
        means = subset.groupby('n_dummy')['var_f1'].mean()
        stds = subset.groupby('n_dummy')['var_f1'].std()
        
        ax.errorbar(
            means.index,
            means.values,
            yerr=stds.values,
            label=METHOD_NAMES[method],
            marker='s',
            markersize=8,
            capsize=5,
            color=METHOD_COLORS[method],
            linewidth=2,
        )
    
    ax.set_xlabel('Number of Dummy Features')
    ax.set_ylabel('Variable Selection F1 Score')
    ax.set_title('Variable Selection Performance vs Dummy Features')
    ax.legend(loc='lower left')
    ax.set_ylim(0, 1.05)
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


# Generate figure
fig2 = plot_f1_vs_dummy(df, save_path=FIGURES_DIR / 'fig2_f1_vs_dummy.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 3: TEST R2 COMPARISON (BAR CHART)
# ==============================================================================

def plot_r2_comparison(
    df: pd.DataFrame,
    save_path: Optional[Path] = None
) -> plt.Figure:
    """
    Grouped bar chart comparing Test R2 by method and equation.
    Uses dynamic equation list from data.
    """
    # Filter core experiments
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['wide'])
    
    # Get equations dynamically from data
    equations = get_available_equations(core_df)
    methods = ['physics_sr', 'pysr_only', 'lasso_pysr']
    x = np.arange(len(equations))
    width = 0.25
    
    for i, method in enumerate(methods):
        means = []
        stds = []
        for equation in equations:
            subset = core_df[(core_df['method'] == method) & (core_df['equation_name'] == equation)]
            if len(subset) > 0:
                means.append(subset['test_r2'].mean())
                stds.append(subset['test_r2'].std())
            else:
                means.append(0)
                stds.append(0)
        
        ax.bar(
            x + i * width - width,
            means,
            width,
            yerr=stds,
            label=METHOD_NAMES[method],
            color=METHOD_COLORS[method],
            capsize=3,
            alpha=0.8,
        )
    
    ax.set_xlabel('Equation')
    ax.set_ylabel('Test R$^2$')
    ax.set_title('Prediction Accuracy by Method and Equation')
    ax.set_xticks(x)
    ax.set_xticklabels([EQUATION_NAMES.get(eq, eq) for eq in equations])
    ax.legend(loc='lower right')
    ax.set_ylim(0, 1.05)
    ax.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


# Generate figure
fig3 = plot_r2_comparison(df, save_path=FIGURES_DIR / 'fig3_r2_comparison.png')
plt.show()
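
The bar positions in the grouped charts come from the expression `x + i * width - width`, which centers exactly three bars on each tick. A more general form, sketched below, works for any number of groups. `bar_offsets` is a hypothetical helper, not part of the notebook.

```python
def bar_offsets(n_groups, width):
    """Horizontal offsets that center n_groups equal-width bars on a tick."""
    return [(i - (n_groups - 1) / 2) * width for i in range(n_groups)]


print(bar_offsets(3, 0.25))  # [-0.25, 0.0, 0.25]
print(bar_offsets(2, 0.35))  # [-0.175, 0.175]
```

For three groups this reduces to `i * width - width`, the same offsets the plotting functions use, and for two groups it gives the `-width/2` and `+width/2` positions.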

In [None]:
# ==============================================================================
# FIGURE 4: F1 BY METHOD AND EQUATION
# ==============================================================================

def plot_f1_comparison(
    df: pd.DataFrame,
    save_path: Optional[Path] = None
) -> plt.Figure:
    """
    Grouped bar chart comparing F1 by method and equation.
    Uses dynamic equation list from data.
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['wide'])
    
    # Get equations dynamically from data
    equations = get_available_equations(core_df)
    methods = ['physics_sr', 'pysr_only', 'lasso_pysr']
    x = np.arange(len(equations))
    width = 0.25
    
    for i, method in enumerate(methods):
        means = []
        stds = []
        for equation in equations:
            subset = core_df[(core_df['method'] == method) & (core_df['equation_name'] == equation)]
            if len(subset) > 0:
                means.append(subset['var_f1'].mean())
                stds.append(subset['var_f1'].std())
            else:
                means.append(0)
                stds.append(0)
        
        ax.bar(
            x + i * width - width, means, width, yerr=stds,
            label=METHOD_NAMES[method], color=METHOD_COLORS[method],
            capsize=3, alpha=0.8,
        )
    
    ax.set_xlabel('Equation')
    ax.set_ylabel('Variable Selection F1 Score')
    ax.set_title('Variable Selection F1 by Method and Equation')
    ax.set_xticks(x)
    ax.set_xticklabels([EQUATION_NAMES.get(eq, eq) for eq in equations])
    ax.legend(loc='lower right')
    ax.set_ylim(0, 1.15)
    ax.axhline(y=1.0, color='gray', linestyle='--', alpha=0.5)
    ax.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig4 = plot_f1_comparison(df, save_path=FIGURES_DIR / 'fig4_f1_comparison.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 5: F1 BY EQUATION TYPE
# ==============================================================================

def plot_f1_by_equation_type(
    df: pd.DataFrame,
    save_path: Optional[Path] = None
) -> plt.Figure:
    """
    Grouped bar chart comparing F1 by method and equation type.
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
    
    # Get equation types dynamically
    eq_types = [t for t in ['power_law', 'nested_transcendental'] if t in core_df['eq_type'].unique()]
    methods = ['physics_sr', 'pysr_only', 'lasso_pysr']
    x = np.arange(len(eq_types))
    width = 0.25
    
    for i, method in enumerate(methods):
        means = []
        stds = []
        for eq_type in eq_types:
            subset = core_df[(core_df['method'] == method) & (core_df['eq_type'] == eq_type)]
            if len(subset) > 0:
                means.append(subset['var_f1'].mean())
                stds.append(subset['var_f1'].std())
            else:
                means.append(0)
                stds.append(0)
        
        bars = ax.bar(
            x + i * width - width, means, width, yerr=stds,
            label=METHOD_NAMES[method], color=METHOD_COLORS[method],
            capsize=3, alpha=0.8,
        )
        
        # Add value annotations
        for bar, mean in zip(bars, means):
            if mean > 0:
                ax.annotate(
                    f'{mean:.2f}', xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
                    xytext=(0, 3), textcoords='offset points', ha='center', fontsize=FONTSIZE_ANNOTATION,
                )
    
    ax.set_xlabel('Equation Type')
    ax.set_ylabel('Variable Selection F1 Score')
    ax.set_title('Variable Selection F1 by Equation Type')
    ax.set_xticks(x)
    ax.set_xticklabels([EQUATION_TYPE_NAMES.get(t, t) for t in eq_types])
    ax.legend(loc='lower right')
    ax.set_ylim(0, 1.2)
    ax.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig5 = plot_f1_by_equation_type(df, save_path=FIGURES_DIR / 'fig5_f1_by_type.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 6: DIMENSION BENEFIT
# ==============================================================================

def plot_dims_benefit(
    df: pd.DataFrame,
    save_path: Optional[Path] = None
) -> plt.Figure:
    """
    Grouped bar chart showing performance with vs without dimensional info.
    Uses dynamic equation list from data.
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    physics_sr_df = core_df[core_df['method'] == 'physics_sr']
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
    
    # Get equations dynamically
    equations = get_available_equations(physics_sr_df)
    x = np.arange(len(equations))
    width = 0.35
    
    with_dims_means = []
    without_dims_means = []
    for equation in equations:
        eq_df = physics_sr_df[physics_sr_df['equation_name'] == equation]
        # with_dims is cast to bool in load_results_csv(), so it works as a mask directly
        subset_with = eq_df[eq_df['with_dims']]
        subset_without = eq_df[~eq_df['with_dims']]
        with_dims_means.append(subset_with['var_f1'].mean() if len(subset_with) > 0 else 0)
        without_dims_means.append(subset_without['var_f1'].mean() if len(subset_without) > 0 else 0)
    
    ax.bar(x - width/2, with_dims_means, width, label='With Dimensions', color='#2E86AB', alpha=0.8)
    ax.bar(x + width/2, without_dims_means, width, label='Without Dimensions', color='#E94F37', alpha=0.8)
    
    ax.set_xlabel('Equation')
    ax.set_ylabel('Variable Selection F1 Score')
    ax.set_title('Benefit of Dimensional Information (Physics-SR)')
    ax.set_xticks(x)
    ax.set_xticklabels([EQUATION_NAMES.get(eq, eq) for eq in equations])
    ax.legend(loc='lower right')
    ax.set_ylim(0, 1.15)
    ax.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig6 = plot_dims_benefit(df, save_path=FIGURES_DIR / 'fig6_dims_benefit.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 7: RUNTIME COMPARISON
# ==============================================================================

def plot_runtime_comparison(
    df: pd.DataFrame,
    save_path: Optional[Path] = None
) -> plt.Figure:
    """
    Bar chart comparing runtime by method.
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['single'])
    
    methods = ['physics_sr', 'pysr_only', 'lasso_pysr']
    # Filter to methods that exist in data
    methods = [m for m in methods if m in core_df['method'].unique()]
    x = np.arange(len(methods))
    
    means = []
    stds = []
    colors = []
    for method in methods:
        subset = core_df[core_df['method'] == method]
        means.append(subset['runtime_seconds'].mean())
        stds.append(subset['runtime_seconds'].std())
        colors.append(METHOD_COLORS[method])
    
    bars = ax.bar(x, means, yerr=stds, color=colors, capsize=5, alpha=0.8)
    
    # Add value annotations
    for bar, mean in zip(bars, means):
        ax.annotate(
            f'{mean:.1f}s', xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
            xytext=(0, 3), textcoords='offset points', ha='center', fontsize=10,
        )
    
    ax.set_xlabel('Method')
    ax.set_ylabel('Runtime (seconds)')
    ax.set_title('Computational Cost Comparison')
    ax.set_xticks(x)
    ax.set_xticklabels([METHOD_NAMES[m] for m in methods])
    ax.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig7 = plot_runtime_comparison(df, save_path=FIGURES_DIR / 'fig7_runtime.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 8: COMPREHENSIVE HEATMAP
# ==============================================================================

def plot_comprehensive_heatmap(
    df: pd.DataFrame,
    save_path: Optional[Path] = None
) -> plt.Figure:
    """
    Heatmap showing F1 scores across all experimental conditions.
    """
    # Filter core experiments
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
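    # Create pivot table
    # Fall back to the raw key when an equation has no display name (matches row_order below)
    core_df['row_label'] = (
        core_df['equation_name'].map(lambda eq: EQUATION_NAMES.get(eq, eq))
        + ' | ' + core_df['method'].map(METHOD_NAMES)
    )
    # Round before casting: float error in noise_level * 100 would otherwise truncate the percentage
    core_df['col_label'] = (
        'Noise=' + (core_df['noise_level'] * 100).round().astype(int).astype(str) + '%, ' +
        'Dummy=' + core_df['n_dummy'].astype(str) + ', ' +
        'Dims=' + core_df['with_dims'].map({True: 'T', False: 'F'})
    )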
    
    # Pivot
    pivot = core_df.pivot_table(
        index='row_label',
        columns='col_label',
        values='var_f1',
        aggfunc='mean'
    )
    
    # Sort rows by equation then method
    equations = [EQUATION_NAMES.get(eq, eq) for eq in get_available_equations(core_df)]
    methods = ['Physics-SR', 'PySR-Only', 'LASSO+PySR']
    row_order = [f"{eq} | {m}" for eq in equations for m in methods]
    pivot = pivot.reindex([r for r in row_order if r in pivot.index])
    
    # Sort columns
    col_order = sorted(pivot.columns)
    pivot = pivot[col_order]
    
    fig, ax = plt.subplots(figsize=FIGURE_SIZES['large'])
    
    # Create heatmap
    sns.heatmap(
        pivot,
        annot=True,
        fmt='.2f',
        cmap='RdYlGn',
        vmin=0,
        vmax=1,
        ax=ax,
        cbar_kws={'label': 'F1 Score'},
        annot_kws={'size': 8},
    )
    
    ax.set_xlabel('Experimental Condition')
    ax.set_ylabel('Equation | Method')
    ax.set_title('Variable Selection F1 Score Across All Conditions')
    
    # Rotate x labels
    plt.xticks(rotation=45, ha='right')
    plt.yticks(rotation=0)
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig8 = plot_comprehensive_heatmap(df, save_path=FIGURES_DIR / 'fig8_heatmap.png')
plt.show()
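The heatmap's column labels turn a fractional noise level into an integer percentage. The sketch below illustrates why rounding before an integer cast matters for such labels. The value 0.29 is a hypothetical noise level chosen for illustration, not a benchmark setting.

```python
# Hypothetical noise levels; 0.29 is included only to expose the float error
noise_levels = [0.0, 0.05, 0.29]

# int() truncates toward zero, as does an integer cast on a pandas Series
truncated = [int(level * 100) for level in noise_levels]

# Rounding first absorbs the floating-point error in level * 100
rounded = [int(round(level * 100)) for level in noise_levels]

print(truncated)  # [0, 5, 28]
print(rounded)    # [0, 5, 29]
```

Because `0.29 * 100` evaluates to `28.999999999999996`, truncation would label that condition as 28% noise.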

---
## Section 5: Supplementary Visualizations

In [None]:
# ==============================================================================
# FIGURE 9: SAMPLE SIZE SENSITIVITY (PHYSICS-SR ONLY)
# ==============================================================================

def plot_sample_size_sensitivity(
    df: pd.DataFrame,
    save_path: Optional[Path] = None
) -> plt.Figure:
    """
    Line plot showing performance vs sample size for Physics-SR.
    """
    # Filter Physics-SR only (relaxed filter to include all conditions)
    physics_sr_df = df[df['method'] == 'physics_sr'].copy()
    
    # Check if we have multiple sample sizes
    if 'n_samples' not in physics_sr_df.columns or len(physics_sr_df['n_samples'].unique()) < 2:
        print("Warning: Not enough sample size variation for sensitivity plot.")
        fig, ax = plt.subplots(figsize=FIGURE_SIZES['wide'])
        ax.text(0.5, 0.5, 'Insufficient sample size variation', ha='center', va='center', fontsize=14)
        ax.set_title('Sample Size Sensitivity (Not Available)')
        return fig
    
    fig, axes = plt.subplots(1, 2, figsize=FIGURE_SIZES['wide'])
    equations = ['coulomb', 'newton', 'ideal_gas', 'damped']
    
    # Distinct marker, size, and line width per equation so that overlapping curves stay distinguishable
    styles = {
        'coulomb':   {'marker': 'o', 'markersize': 14, 'linewidth': 5.0, 'color': '#2E86AB'},  # Blue, large circle, thick line
        'newton':    {'marker': 's', 'markersize': 8,  'linewidth': 2.0, 'color': '#E94F37'},  # Red, medium square
        'ideal_gas': {'marker': '^', 'markersize': 6,  'linewidth': 1.5, 'color': '#4CAF50'},  # Green, small triangle
        'damped':    {'marker': 'd', 'markersize': 7,  'linewidth': 1.5, 'color': '#9C27B0'},  # Purple, thin diamond
    }
    
    # F1 vs sample size
    ax1 = axes[0]
    for equation in equations:
        subset = physics_sr_df[physics_sr_df['equation_name'] == equation]
        if len(subset) > 0 and len(subset['n_samples'].unique()) > 1:
            means = subset.groupby('n_samples')['var_f1'].mean()
            style = styles[equation]
            ax1.plot(
                means.index,
                means.values,
                marker=style['marker'],
                markersize=style['markersize'],
                linewidth=style['linewidth'],
                color=style['color'],
                label=EQUATION_NAMES.get(equation, equation),
            )
    
    ax1.set_xlabel('Sample Size (n)')
    ax1.set_ylabel('Variable Selection F1 Score')
    ax1.set_title('F1 Score vs Sample Size')
    ax1.legend()
    ax1.set_ylim(0, 1.05)
    ax1.grid(True, alpha=0.3)
    
    # R2 vs sample size
    ax2 = axes[1]
    for equation in equations:
        subset = physics_sr_df[physics_sr_df['equation_name'] == equation]
        if len(subset) > 0 and len(subset['n_samples'].unique()) > 1:
            # Keep only strictly positive R2 values (drops NaN, zero, and negative entries)
            subset_clean = subset[subset['test_r2'] > 0]
            if len(subset_clean) > 0:
                means = subset_clean.groupby('n_samples')['test_r2'].mean()
                style = styles[equation]
                ax2.plot(
                    means.index,
                    means.values,
                    marker=style['marker'],
                    markersize=style['markersize'],
                    linewidth=style['linewidth'],
                    color=style['color'],
                    label=EQUATION_NAMES.get(equation, equation),
                )
    
    ax2.set_xlabel('Sample Size (n)')
    ax2.set_ylabel('Test R$^2$')
    ax2.set_title('Test R$^2$ vs Sample Size')
    ax2.legend()
    ax2.set_ylim(0, 1.05)
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig9 = plot_sample_size_sensitivity(df, save_path=FIGURES_DIR / 'fig9_sample_size.png')
plt.show()

In [None]:
# ==============================================================================
# FIGURE 10: PER-EQUATION BREAKDOWN
# ==============================================================================

def plot_per_equation_breakdown(
    df: pd.DataFrame,
    save_path: Optional[Path] = None
) -> plt.Figure:
    """
    Multi-panel figure showing detailed results per equation.
    Uses dynamic equation list from data.
    """
    # Filter core experiments
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    # Get equations dynamically
    equations = get_available_equations(core_df)
    n_equations = len(equations)
    
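    # Determine grid size (at least 1x1, so an empty equation list cannot divide by zero)
    n_cols = max(1, min(2, n_equations))
    n_rows = max(1, (n_equations + n_cols - 1) // n_cols)
    
    # squeeze=False always returns a 2-D array, so flatten() works for any grid shape
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(6*n_cols, 5*n_rows), squeeze=False)
    axes = axes.flatten()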
    
    methods = ['physics_sr', 'pysr_only', 'lasso_pysr']
    
    for idx, equation in enumerate(equations):
        ax = axes[idx]
        
        # Filter by equation
        eq_df = core_df[core_df['equation_name'] == equation]
        
        # Box plot of F1 scores
        data_for_boxplot = []
        labels = []
        colors = []
        for m in methods:
            method_data = eq_df[eq_df['method'] == m]['var_f1'].values
            if len(method_data) > 0:
                data_for_boxplot.append(method_data)
                labels.append(METHOD_NAMES[m])
                colors.append(METHOD_COLORS[m])
        
        if len(data_for_boxplot) > 0:
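            bp = ax.boxplot(
                data_for_boxplot,
                patch_artist=True,
            )
            # Label the ticks explicitly; the boxplot `labels` keyword is deprecated since Matplotlib 3.9
            ax.set_xticklabels(labels)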
            
            # Color boxes
            for patch, color in zip(bp['boxes'], colors):
                patch.set_facecolor(color)
                patch.set_alpha(0.7)
            
            # Add mean markers
            for i, data in enumerate(data_for_boxplot, 1):
                ax.scatter(i, np.mean(data), color='black', marker='D', s=50, zorder=3)
        
        ax.set_ylabel('Variable Selection F1')
        ax.set_title(f'{EQUATION_NAMES.get(equation, equation)}')
        ax.set_ylim(0, 1.1)
        ax.grid(True, alpha=0.3, axis='y')
    
    # Hide empty subplots
    for idx in range(n_equations, len(axes)):
        axes[idx].set_visible(False)
    
    plt.suptitle('Variable Selection Performance by Equation', fontsize=FONTSIZE_TITLE + 2, y=1.02)
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig10 = plot_per_equation_breakdown(df, save_path=FIGURES_DIR / 'fig10_per_equation.png')
plt.show()
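The per-equation figure lays its panels out on a grid of at most two columns, using integer ceiling division for the row count. A minimal sketch of that computation follows. The helper name `grid_shape` is hypothetical, and the guard for zero panels is an addition of this sketch.

```python
from typing import Tuple


def grid_shape(n_panels: int, max_cols: int = 2) -> Tuple[int, int]:
    """Hypothetical helper: return (n_rows, n_cols) with at most max_cols columns."""
    # Never fewer than one column, so the division below is always defined
    n_cols = max(1, min(max_cols, n_panels))
    # Ceiling division: enough rows to hold every panel
    n_rows = max(1, (n_panels + n_cols - 1) // n_cols)
    return n_rows, n_cols


for n in (1, 2, 3, 4, 5):
    print(n, grid_shape(n))
```

With four test equations this gives a 2x2 grid. An odd count such as three leaves one empty panel, which is why the plotting function hides unused axes.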

In [None]:
# ==============================================================================
# FIGURE 11: EXACT MATCH RATES
# ==============================================================================

def plot_exact_match_rates(
    df: pd.DataFrame,
    save_path: Optional[Path] = None
) -> plt.Figure:
    """
    Bar chart showing exact match rates by method.
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    fig, axes = plt.subplots(1, 2, figsize=FIGURE_SIZES['wide'])
    methods = ['physics_sr', 'pysr_only', 'lasso_pysr']
    # Filter to methods that exist
    methods = [m for m in methods if m in core_df['method'].unique()]
    x = np.arange(len(methods))
    
    # All equations
    ax1 = axes[0]
    rates_all = [core_df[core_df['method'] == m]['selected_correct'].mean() * 100 for m in methods]
    
    bars1 = ax1.bar(x, rates_all, color=[METHOD_COLORS[m] for m in methods], alpha=0.8)
    ax1.set_title('All Equations')
    ax1.set_ylabel('Exact Match Rate (%)')
    ax1.set_xticks(x)
    ax1.set_xticklabels([METHOD_NAMES[m] for m in methods])
    ax1.set_ylim(0, 110)
    ax1.grid(True, alpha=0.3, axis='y')
    
    for bar, rate in zip(bars1, rates_all):
        ax1.annotate(f'{rate:.1f}%', xy=(bar.get_x() + bar.get_width()/2, bar.get_height()),
                     xytext=(0, 3), textcoords='offset points', ha='center', fontweight='bold')
    
    # Power-law only
    ax2 = axes[1]
    power_law_df = core_df[core_df['eq_type'] == 'power_law']
    if len(power_law_df) > 0:
        rates_pl = [power_law_df[power_law_df['method'] == m]['selected_correct'].mean() * 100 
                    if len(power_law_df[power_law_df['method'] == m]) > 0 else 0 
                    for m in methods]
        
        bars2 = ax2.bar(x, rates_pl, color=[METHOD_COLORS[m] for m in methods], alpha=0.8)
        ax2.set_title('Power-Law Equations Only')
        ax2.set_ylabel('Exact Match Rate (%)')
        ax2.set_xticks(x)
        ax2.set_xticklabels([METHOD_NAMES[m] for m in methods])
        ax2.set_ylim(0, 110)
        ax2.axhline(y=100, color='gray', linestyle='--', alpha=0.5)
        ax2.grid(True, alpha=0.3, axis='y')
        
        for bar, rate in zip(bars2, rates_pl):
            ax2.annotate(f'{rate:.1f}%', xy=(bar.get_x() + bar.get_width()/2, bar.get_height()),
                         xytext=(0, 3), textcoords='offset points', ha='center', fontweight='bold')
    
    plt.suptitle('Exact Variable Selection Match Rates', fontsize=16, y=1.02)
    plt.tight_layout()
    
    if save_path:
        fig.savefig(save_path, dpi=FIGURE_DPI, bbox_inches='tight')
        print(f"Figure saved to {save_path}")
    
    return fig


fig11 = plot_exact_match_rates(df, save_path=FIGURES_DIR / 'fig11_exact_match.png')
plt.show()

---
## Section 6: LaTeX Tables

In [None]:
# ==============================================================================
# TABLE 1: MAIN RESULTS SUMMARY
# ==============================================================================

def generate_main_results_table(df: pd.DataFrame) -> str:
    """
    Generate LaTeX table for main results.
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    latex = r"""\begin{table}[htbp]
\centering
\caption{Main Benchmark Results (Core Experiments)}
\label{tab:main_results}
\begin{tabular}{lccc}
\toprule
Method & Variable Selection F1 & Test $R^2$ & Runtime (s) \\
\midrule
"""
    
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        subset = core_df[core_df['method'] == method]
        if len(subset) == 0:
            continue
        
        f1_mean = subset['var_f1'].mean()
        f1_std = subset['var_f1'].std()
        r2_mean = subset['test_r2'].mean()
        r2_std = subset['test_r2'].std()
        rt_mean = subset['runtime_seconds'].mean()
        rt_std = subset['runtime_seconds'].std()
        
        method_display = METHOD_NAMES.get(method, method)
        
        latex += f"{method_display} & {f1_mean:.3f} $\\pm$ {f1_std:.3f} & "
        latex += f"{r2_mean:.3f} $\\pm$ {r2_std:.3f} & "
        latex += f"{rt_mean:.1f} $\\pm$ {rt_std:.1f} \\\\\n"
    
    latex += r"""\bottomrule
\end{tabular}
\end{table}"""
    
    return latex


# Generate table
table1_latex = generate_main_results_table(df)
print("TABLE 1: Main Results Summary")
print("="*60)
print(table1_latex)
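Each row of Table 1 reports a mean and a standard deviation per method. The sketch below shows how one such row is assembled, using only the standard library. The helper name `format_mean_std_row` and the sample values are hypothetical. `statistics.stdev` is the sample standard deviation (n - 1 denominator), which matches the default of the pandas `.std()` call used in the table code.

```python
import statistics
from typing import Sequence


def format_mean_std_row(name: str, values: Sequence[float]) -> str:
    """Hypothetical helper: one LaTeX table row with 'mean +/- std'."""
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator), like pandas' default
    std = statistics.stdev(values)
    return f"{name} & {mean:.3f} $\\pm$ {std:.3f} \\\\"


# Made-up F1 scores for illustration only
row = format_mean_std_row('Physics-SR', [0.8, 0.9, 1.0])
print(row)
```

The generated tables use `\toprule`, `\midrule`, and `\bottomrule`, so the LaTeX document that includes them needs the `booktabs` package.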

In [None]:
# ==============================================================================
# TABLE 2: VARIABLE SELECTION METRICS
# ==============================================================================

def generate_variable_selection_table(df: pd.DataFrame) -> str:
    """
    Generate LaTeX table for variable selection metrics.
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    latex = r"""\begin{table}[htbp]
\centering
\caption{Variable Selection Performance}
\label{tab:var_selection}
\begin{tabular}{lcccc}
\toprule
Method & Precision & Recall & F1 & Exact Match \\
\midrule
"""
    
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        subset = core_df[core_df['method'] == method]
        if len(subset) == 0:
            continue
        prec = subset['var_precision'].mean()
        rec = subset['var_recall'].mean()
        f1 = subset['var_f1'].mean()
        exact = subset['selected_correct'].mean() * 100
        
        method_display = METHOD_NAMES.get(method, method)
        latex += f"{method_display} & {prec:.3f} & {rec:.3f} & {f1:.3f} & {exact:.1f}\\% \\\\\n"
    
    latex += r"""\bottomrule
\end{tabular}
\end{table}"""
    
    return latex


# Generate table
table2_latex = generate_variable_selection_table(df)
print("\nTABLE 2: Variable Selection Metrics")
print("="*60)
print(table2_latex)

In [None]:
# ==============================================================================
# TABLE 3: PREDICTION BY EQUATION
# ==============================================================================

def generate_prediction_by_equation_table(df: pd.DataFrame) -> str:
    """
    Generate LaTeX table for prediction accuracy by equation.
    Uses dynamic equation list.
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    
    latex = r"""\begin{table}[htbp]
\centering
\caption{Prediction Accuracy by Equation}
\label{tab:prediction_by_eq}
\begin{tabular}{llcc}
\toprule
Equation & Method & Test $R^2$ & Test RMSE \\
\midrule
"""
    
    # Get equations dynamically
    equations = get_available_equations(core_df)
    
    for equation in equations:
        eq_df = core_df[core_df['equation_name'] == equation]
        eq_display = EQUATION_NAMES.get(equation, equation)
        
        for i, method in enumerate(['physics_sr', 'pysr_only', 'lasso_pysr']):
            subset = eq_df[eq_df['method'] == method]
            if len(subset) == 0:
                continue
            r2 = subset['test_r2'].mean()
            rmse = subset['test_rmse'].mean()
            
            method_display = METHOD_NAMES.get(method, method)
            
            if i == 0:
                latex += f"{eq_display} & {method_display} & {r2:.3f} & {rmse:.4f} \\\\\n"
            else:
                latex += f" & {method_display} & {r2:.3f} & {rmse:.4f} \\\\\n"
        
        if equation != equations[-1]:
            latex += "\\midrule\n"
    
    latex += r"""\bottomrule
\end{tabular}
\end{table}"""
    
    return latex


# Generate table
table3_latex = generate_prediction_by_equation_table(df)
print("\nTABLE 3: Prediction by Equation")
print("="*60)
print(table3_latex)

In [None]:
# ==============================================================================
# TABLE 4: DIMENSIONAL INFORMATION BENEFIT
# ==============================================================================

def generate_dims_benefit_table(df: pd.DataFrame) -> str:
    """
    Generate LaTeX table for dimensional information benefit.
    Uses dynamic equation list.
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    physics_sr_df = core_df[core_df['method'] == 'physics_sr']
    
    latex = r"""\begin{table}[htbp]
\centering
\caption{Benefit of Dimensional Information (Physics-SR)}
\label{tab:dims_benefit}
\begin{tabular}{lccc}
\toprule
Equation & With Dims (F1) & Without Dims (F1) & Improvement \\
\midrule
"""
    
    # Get equations dynamically
    equations = get_available_equations(physics_sr_df)
    
    for equation in equations:
        eq_df = physics_sr_df[physics_sr_df['equation_name'] == equation]
        with_dims = eq_df[eq_df['with_dims'] == True]['var_f1'].mean()
        without_dims = eq_df[eq_df['with_dims'] == False]['var_f1'].mean()
        
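        # Missing groups give NaN means; print "--" rather than a misleading 0.000
        with_str = "--" if pd.isna(with_dims) else f"{with_dims:.3f}"
        without_str = "--" if pd.isna(without_dims) else f"{without_dims:.3f}"
        
        # Relative improvement is undefined without both means and a positive baseline
        if pd.isna(with_dims) or pd.isna(without_dims) or without_dims <= 0:
            improvement_str = "--"
        else:
            improvement = (with_dims - without_dims) / without_dims * 100
            improvement_str = f"{improvement:+.1f}\\%"
        
        eq_display = EQUATION_NAMES.get(equation, equation)
        latex += f"{eq_display} & {with_str} & {without_str} & {improvement_str} \\\\\n"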
    
    latex += r"""\bottomrule
\end{tabular}
\end{table}"""
    
    return latex


# Generate table
table4_latex = generate_dims_benefit_table(df)
print("\nTABLE 4: Dimensional Information Benefit")
print("="*60)
print(table4_latex)
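The "Improvement" column in Table 4 is a relative change with respect to the run without dimensional information. A minimal sketch of that formula follows. The helper name `relative_improvement` and the input values are hypothetical, and returning `None` for an undefined baseline is a choice made in this sketch.

```python
import math
from typing import Optional


def relative_improvement(value: float, baseline: float) -> Optional[float]:
    """Hypothetical helper: percent change of value relative to baseline."""
    # The ratio is undefined for missing inputs or a non-positive baseline
    if math.isnan(value) or math.isnan(baseline) or baseline <= 0:
        return None
    return (value - baseline) / baseline * 100.0


# Made-up F1 scores for illustration only
gain = relative_improvement(0.90, 0.75)
print(f"{gain:+.1f}%")  # +20.0%
print(relative_improvement(0.90, 0.0))  # None
```

A relative change grows without bound as the baseline approaches zero, so the absolute difference in F1 is often worth reporting alongside it.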

In [None]:
# ==============================================================================
# TABLE 5: POWER-LAW EQUATIONS SUMMARY
# ==============================================================================

def generate_power_law_table(df: pd.DataFrame) -> str:
    """
    Generate LaTeX table for power-law equations only.
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    power_law_df = core_df[core_df['eq_type'] == 'power_law']
    
    if len(power_law_df) == 0:
        return "% No power-law equations in data"
    
    latex = r"""\begin{table}[htbp]
\centering
\caption{Power-Law Equations Performance Summary}
\label{tab:power_law_summary}
\begin{tabular}{lcccc}
\toprule
Method & Precision & Recall & F1 & Exact Match \\
\midrule
"""
    
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        subset = power_law_df[power_law_df['method'] == method]
        if len(subset) == 0:
            continue
        prec = subset['var_precision'].mean()
        rec = subset['var_recall'].mean()
        f1 = subset['var_f1'].mean()
        exact = subset['selected_correct'].mean() * 100
        
        method_display = METHOD_NAMES.get(method, method)
        latex += f"{method_display} & {prec:.3f} & {rec:.3f} & {f1:.3f} & {exact:.1f}\\% \\\\\n"
    
    latex += r"""\bottomrule
\end{tabular}
\end{table}"""
    
    return latex


# Generate table
table5_latex = generate_power_law_table(df)
print("\nTABLE 5: Power-Law Equations Summary")
print("="*60)
print(table5_latex)

In [None]:
# ==============================================================================
# EXPORT ALL TABLES
# ==============================================================================

def export_all_tables(
    df: pd.DataFrame,
    tables_dir: Path = TABLES_DIR
) -> None:
    """
    Export all LaTeX tables to files.
    """
    tables = {
        'table1_main_results.tex': generate_main_results_table(df),
        'table2_var_selection.tex': generate_variable_selection_table(df),
        'table3_prediction_by_eq.tex': generate_prediction_by_equation_table(df),
        'table4_dims_benefit.tex': generate_dims_benefit_table(df),
        'table5_power_law_summary.tex': generate_power_law_table(df),
    }
    
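    # Create the output directory if it does not exist yet
    tables_dir.mkdir(parents=True, exist_ok=True)
    
    for filename, latex_content in tables.items():
        filepath = tables_dir / filename
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(latex_content)
        print(f"Saved: {filepath}")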


# Export tables
export_all_tables(df)
print("\nAll LaTeX tables exported successfully.")

---
## Section 7: Conclusions

In [None]:
# ==============================================================================
# KEY FINDINGS SUMMARY
# ==============================================================================

def generate_key_findings(df: pd.DataFrame) -> Dict[str, Any]:
    """
    Generate key findings from the benchmark results.
    """
    core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
    physics_sr_df = core_df[core_df['method'] == 'physics_sr']
    power_law_df = core_df[core_df['eq_type'] == 'power_law']
    
    findings = {}
    
    # 1. Overall best method
    method_f1 = core_df.groupby('method')['var_f1'].mean()
    if len(method_f1) > 0:
        best_method = method_f1.idxmax()
        findings['best_method'] = {
            'method': METHOD_NAMES.get(best_method, best_method),
            'f1': method_f1[best_method],
        }
    
    # 2. Improvement over baseline
    if 'physics_sr' in method_f1.index and 'pysr_only' in method_f1.index:
        physics_sr_f1 = method_f1['physics_sr']
        pysr_only_f1 = method_f1['pysr_only']
        if pysr_only_f1 > 0:
            improvement = (physics_sr_f1 - pysr_only_f1) / pysr_only_f1 * 100
            findings['improvement_over_baseline'] = improvement
    
    # 3. Dimension benefit
    if len(physics_sr_df) > 0:
        with_dims_f1 = physics_sr_df[physics_sr_df['with_dims'] == True]['var_f1'].mean()
        without_dims_f1 = physics_sr_df[physics_sr_df['with_dims'] == False]['var_f1'].mean()
        if not pd.isna(without_dims_f1) and without_dims_f1 > 0:
            dims_benefit = (with_dims_f1 - without_dims_f1) / without_dims_f1 * 100
            findings['dimension_benefit'] = dims_benefit
    
    # 4. Noise robustness
    if 0.0 in core_df['noise_level'].values and 0.05 in core_df['noise_level'].values:
        clean_f1 = core_df[core_df['noise_level'] == 0.0].groupby('method')['var_f1'].mean()
        noisy_f1 = core_df[core_df['noise_level'] == 0.05].groupby('method')['var_f1'].mean()
        noise_degradation = {}
        for m in clean_f1.index:
            if m in noisy_f1.index and clean_f1[m] > 0:
                noise_degradation[m] = (clean_f1[m] - noisy_f1[m]) / clean_f1[m] * 100
        findings['noise_degradation'] = noise_degradation
    
    # 5. Exact match rates
    exact_match_rates = core_df.groupby('method')['selected_correct'].mean() * 100
    findings['exact_match_rates'] = exact_match_rates.to_dict()
    
    # 6. Hardest equation
    eq_f1 = core_df.groupby('equation_name')['var_f1'].mean()
    if len(eq_f1) > 0:
        hardest_eq = eq_f1.idxmin()
        findings['hardest_equation'] = {
            'equation': EQUATION_NAMES.get(hardest_eq, hardest_eq),
            'f1': eq_f1[hardest_eq],
        }
    
    # 7. Power-law performance
    if len(power_law_df) > 0:
        pl_physics_sr = power_law_df[power_law_df['method'] == 'physics_sr']
        if len(pl_physics_sr) > 0:
            findings['power_law_performance'] = {
                'f1': pl_physics_sr['var_f1'].mean(),
                'exact_match': pl_physics_sr['selected_correct'].mean() * 100,
            }
    
    return findings


# Generate findings
findings = generate_key_findings(df)

print("\n" + "="*80)
print("KEY FINDINGS SUMMARY")
print("="*80)

if 'best_method' in findings:
    print(f"\n1. BEST METHOD: {findings['best_method']['method']}")
    print(f"   - Average F1 Score: {findings['best_method']['f1']:.3f}")

if 'improvement_over_baseline' in findings:
    print("\n2. IMPROVEMENT OVER BASELINE (PySR-Only):")
    print(f"   - Physics-SR improves F1 by {findings['improvement_over_baseline']:+.1f}%")

if 'dimension_benefit' in findings:
    print("\n3. DIMENSIONAL INFORMATION BENEFIT:")
    print(f"   - Using dimensional info improves F1 by {findings['dimension_benefit']:+.1f}%")

if 'noise_degradation' in findings:
    print("\n4. NOISE ROBUSTNESS (% degradation from 0% to 5% noise):")
    for method, degradation in findings['noise_degradation'].items():
        print(f"   - {METHOD_NAMES.get(method, method)}: {degradation:.1f}% degradation")

if 'exact_match_rates' in findings:
    print("\n5. EXACT MATCH RATES:")
    for method, rate in findings['exact_match_rates'].items():
        print(f"   - {METHOD_NAMES.get(method, method)}: {rate:.1f}%")

if 'hardest_equation' in findings:
    print(f"\n6. HARDEST EQUATION: {findings['hardest_equation']['equation']}")
    print(f"   - Average F1 Score: {findings['hardest_equation']['f1']:.3f}")

if 'power_law_performance' in findings:
    print("\n7. POWER-LAW EQUATIONS (Physics-SR):")
    print(f"   - Average F1 Score: {findings['power_law_performance']['f1']:.3f}")
    print(f"   - Exact Match Rate: {findings['power_law_performance']['exact_match']:.1f}%")

In [None]:
# ==============================================================================
# FINAL SUMMARY TABLES
# ==============================================================================

print("\n" + "="*80)
print("FINAL SUMMARY TABLE - ALL EQUATIONS")
print("="*80)

# Create publication-ready summary
core_df = df[df['n_samples'] == 500].copy() if 'n_samples' in df.columns else df.copy()
power_law_df = core_df[core_df['eq_type'] == 'power_law']

summary_data = []
for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
    subset = core_df[core_df['method'] == method]
    if len(subset) == 0:
        continue
    summary_data.append({
        'Method': METHOD_NAMES[method],
        'Precision': f"{subset['var_precision'].mean():.3f}",
        'Recall': f"{subset['var_recall'].mean():.3f}",
        'F1': f"{subset['var_f1'].mean():.3f}",
        'Test R2': f"{subset['test_r2'].mean():.3f}",
        'Runtime (s)': f"{subset['runtime_seconds'].mean():.1f}",
        'Exact Match': f"{subset['selected_correct'].mean()*100:.1f}%",
    })

summary_table = pd.DataFrame(summary_data)
print("\n")
display(summary_table)

if len(power_law_df) > 0:
    print("\n" + "="*80)
    print("FINAL SUMMARY TABLE - POWER-LAW EQUATIONS ONLY")
    print("="*80)
    
    pl_summary_data = []
    for method in ['physics_sr', 'pysr_only', 'lasso_pysr']:
        subset = power_law_df[power_law_df['method'] == method]
        if len(subset) == 0:
            continue
        pl_summary_data.append({
            'Method': METHOD_NAMES[method],
            'Precision': f"{subset['var_precision'].mean():.3f}",
            'Recall': f"{subset['var_recall'].mean():.3f}",
            'F1': f"{subset['var_f1'].mean():.3f}",
            'Test R2': f"{subset['test_r2'].mean():.3f}",
            'Runtime (s)': f"{subset['runtime_seconds'].mean():.1f}",
            'Exact Match': f"{subset['selected_correct'].mean()*100:.1f}%",
        })
    
    pl_summary_table = pd.DataFrame(pl_summary_data)
    print("\n")
    display(pl_summary_table)
    
    # Save the power-law summary table
    pl_summary_table.to_csv(RESULTS_DIR / 'summary_table_power_law.csv', index=False)

summary_table.to_csv(RESULTS_DIR / 'summary_table_all.csv', index=False)
print(f"\nSummary tables saved to {RESULTS_DIR}")

In [None]:
# ==============================================================================
# ANALYSIS COMPLETE
# ==============================================================================

print("\n" + "="*80)
print(" ANALYSIS COMPLETE")
print("="*80)

print("\n[Generated Files]")
print("\nFigures:")
for fig_file in sorted(FIGURES_DIR.glob('*.png')):
    print(f"  - {fig_file.name}")

print("\nTables:")
for table_file in sorted(TABLES_DIR.glob('*.tex')):
    print(f"  - {table_file.name}")

print("\nData:")
for csv_file in sorted(RESULTS_DIR.glob('*.csv')):
    print(f"  - {csv_file.name}")

print("\n" + "="*80)
print(" Ready for paper writing!")
print("="*80)

---
## Appendix: Quick Reference

### Output Files Generated

**Figures (`results/figures/`):**
1. `fig1_f1_vs_noise.png` - F1 vs Noise Level
2. `fig2_f1_vs_dummy.png` - F1 vs Dummy Features
3. `fig3_r2_comparison.png` - Test R2 by Method and Equation
4. `fig4_f1_comparison.png` - F1 by Method and Equation
5. `fig5_f1_by_type.png` - F1 by Equation Type
6. `fig6_dims_benefit.png` - Dimensional Information Benefit
7. `fig7_runtime.png` - Runtime Comparison
8. `fig8_heatmap.png` - Comprehensive F1 Heatmap
9. `fig9_sample_size.png` - Sample Size Sensitivity
10. `fig10_per_equation.png` - Per-Equation Breakdown
11. `fig11_exact_match.png` - Exact Match Rates

**Tables (`results/tables/`):**
1. `table1_main_results.tex` - Main Results Summary
2. `table2_var_selection.tex` - Variable Selection Metrics
3. `table3_prediction_by_eq.tex` - Prediction by Equation
4. `table4_dims_benefit.tex` - Dimension Benefit Analysis
5. `table5_power_law_summary.tex` - Power-Law Summary

**Data (`results/`):**
- `summary_table_all.csv` - Final summary table (all equations)
- `summary_table_power_law.csv` - Final summary table (power-law only)

### Key Metrics

| Metric | Description |
|--------|-------------|
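| var_f1 | F1 score for variable selection (harmonic mean of precision and recall) |
| var_precision | Fraction of selected variables that are true variables |
| var_recall | Fraction of true variables that were selected |
| selected_correct | Whether the selected variables exactly match the ground truth (its mean is the exact match rate) |
| test_r2 | R-squared on the test set |
| runtime_seconds | Total execution time in seconds |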
| eq_type | Equation type (power_law or nested_transcendental) |
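
The variable-selection metrics in the table above can be computed from a selected variable set and a ground-truth set, as in the minimal sketch below. The function name `variable_selection_metrics`, the set-based inputs, and the variable names are assumptions made for illustration. The benchmark's own metric code may handle edge cases differently.

```python
from typing import Dict, Set


def variable_selection_metrics(selected: Set[str], truth: Set[str]) -> Dict[str, float]:
    """Hypothetical helper: precision, recall, F1, and exact match for one run."""
    true_positives = len(selected & truth)
    # Empty sets give 0.0 rather than a division by zero (a choice of this sketch)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return {
        'var_precision': precision,
        'var_recall': recall,
        'var_f1': f1,
        'selected_correct': float(selected == truth),
    }


# Coulomb-style example: all true variables found, plus one dummy feature
metrics = variable_selection_metrics({'q1', 'q2', 'r', 'dummy_0'}, {'q1', 'q2', 'r'})
print(metrics)
```

In this example precision is 0.75 and recall is 1.0, so F1 is about 0.857 while `selected_correct` is 0. A method can therefore score a high F1 without achieving an exact match.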