# 06_PySR - Physics-SR Framework v3.0

## Stage 2.2a: PySR Genetic Programming + Structure Parser

**Author:** Zhengze Zhang  
**Affiliation:** Department of Statistics, Columbia University  
**Date:** January 2026

---

### Purpose

Search for symbolic expressions via evolutionary algorithms (genetic programming). PySR explores the space of mathematical expressions to find equations that balance accuracy and complexity.

### Algorithm Overview

1. **Configure PySR** with operators and constraints
2. **Run symbolic search** via genetic programming
3. **Extract Pareto front** (complexity vs accuracy tradeoff)
4. **Parse structure** for downstream UQ
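
Step 4 can be sketched in isolation, without running PySR. The block below is a minimal illustration, not the notebook's `StructureParser`: it assumes a hypothetical discovered equation and synthetic data, splits the equation into additive terms with SymPy, strips each coefficient, and re-estimates the coefficients by ordinary least squares with the functional form held fixed.

```python
import numpy as np
import sympy as sp

# Hypothetical discovered equation and synthetic data, for illustration only.
equation = "1.5*a**2 + 0.7*b + 0.3"
a_sym, b_sym = sp.symbols("a b")
expr = sp.sympify(equation, locals={"a": a_sym, "b": b_sym})

rng = np.random.default_rng(0)
a = rng.uniform(0.5, 2.0, 200)
b = rng.uniform(0.5, 2.0, 200)
y = 1.5 * a**2 + 0.7 * b + 0.3 + 0.01 * rng.standard_normal(200)

# Split into additive terms, then separate each coefficient from its structure.
columns, names = [], []
for term in sp.Add.make_args(expr):
    coef, structure = term.as_coeff_Mul()
    f = sp.lambdify((a_sym, b_sym), structure, "numpy")
    # Constant terms evaluate to a scalar, so broadcast to one value per sample.
    columns.append(np.broadcast_to(np.asarray(f(a, b), dtype=float), a.shape))
    names.append(str(structure))

Phi = np.column_stack(columns)

# Re-estimate the coefficients with the functional form held fixed.
coefs, *_ = np.linalg.lstsq(Phi, y, rcond=None)
refit = dict(zip(names, coefs))
print(refit)
```

Keeping the coefficient out of each column is what lets a later stage attach uncertainty to the coefficients rather than to the structure.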

### Pareto Front

PySR returns a set of equations on the Pareto front:
- Each equation is optimal for its complexity level
- Trade-off between simplicity and accuracy
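- Selection criteria: `accuracy` (lowest loss), `complexity` (smallest expression), or `pareto` (a knee-point heuristic: lowest loss among equations at or below the median complexity)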
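
The dominance rule behind the Pareto front can be shown on a toy example. This is a sketch in plain Python with made-up candidates, not PySR output: a candidate is dropped when some other candidate is no more complex and no less accurate, and strictly better on at least one of the two.

```python
from typing import List, Tuple

# (complexity, loss, equation) triples; the values are made up for illustration.
Candidate = Tuple[int, float, str]


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    front = []
    for i, (c1, l1, eq1) in enumerate(cands):
        dominated = any(
            c2 <= c1 and l2 <= l1 and (c2 < c1 or l2 < l1)
            for j, (c2, l2, _) in enumerate(cands)
            if j != i
        )
        if not dominated:
            front.append((c1, l1, eq1))
    return sorted(front)  # ascending complexity


candidates = [
    (1, 0.90, "x"),
    (3, 0.20, "x*x"),
    (5, 0.25, "x*x + x"),                # dominated by (3, 0.20)
    (7, 0.05, "0.5*x*x + 0.3*z"),
    (9, 0.05, "0.5*x*x + 0.3*z + 0.0"),  # same loss as (7, 0.05), more complex
]

front = pareto_front(candidates)
most_accurate = min(front, key=lambda c: c[1])  # 'accuracy' criterion
simplest = min(front, key=lambda c: c[0])       # 'complexity' criterion

for complexity, loss, eq in front:
    print(f"C={complexity:2d}, L={loss:.2f}: {eq}")
```

Here the front keeps the candidates at complexity 1, 3, and 7; the `accuracy` criterion picks the last of these and the `complexity` criterion the first.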

### Reference

- Cranmer, M. (2023). Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl. *arXiv:2305.01582*.

---
## Section 1: Header and Imports

In [None]:
"""
06_PySR.ipynb - PySR Genetic Programming + Structure Parser
============================================================

Three-Stage Physics-Informed Symbolic Regression Framework v3.0

This module provides:
- PySRDiscoverer: Wrapper for PySR symbolic regression
- StructureParser: Parse discovered equations for UQ pipeline
- Pareto front extraction and equation selection
- Graceful fallback when PySR is not available

Algorithm:
    1. Configure PySR with operators and complexity constraints
    2. Run genetic programming search
    3. Extract Pareto front of candidate equations
    4. Parse best equation into feature matrix for Stage 3

Author: Zhengze Zhang
Affiliation: Department of Statistics, Columbia University
"""

# Import core module
%run 00_Core.ipynb

In [None]:
# PySR installation check
import warnings  # explicit import so this cell does not depend on 00_Core exposing it

PYSR_AVAILABLE = False

try:
    from pysr import PySRRegressor
    PYSR_AVAILABLE = True
    print("06_PySR: PySR is available.")
except ImportError:
    warnings.warn(
        "PySR not available. Install with: pip install pysr\n"
        "PySRDiscoverer will use fallback mode."
    )
    print("06_PySR: PySR not available, fallback mode enabled.")

In [None]:
# Additional imports for Structure Parser
import sympy as sp
from sympy.parsing.sympy_parser import parse_expr
from sympy import symbols, sympify, Add, Mul, Pow, Float, Integer
from typing import Dict, List, Tuple, Optional, Any, Union
import re

print("06_PySR: SymPy imports successful.")

---
## Section 2: Class Definitions

In [None]:
# ==============================================================================
# PYSR DISCOVERER CLASS
# ==============================================================================

class PySRDiscoverer:
    """
    PySR-based Symbolic Regression Discoverer.
    
    Wrapper around PySR that provides:
    - Configurable operator sets and complexity constraints
    - Pareto front extraction
    - Multiple equation selection criteria
    - Graceful fallback when PySR is not installed
    
    Attributes
    ----------
    binary_operators : List[str]
        Binary operators to use (default: ["+", "-", "*", "/", "^"])
    unary_operators : List[str]
        Unary operators to use (default: ["sqrt", "exp", "log"])
    maxsize : int
        Maximum expression complexity (default: 25)
    niterations : int
        Number of iterations (default: 100)
    populations : int
        Number of populations (default: 30)
    procs : int
        Number of parallel processes (default: 4). Stored but not passed to
        PySR: _configure_pysr() runs the search serially for determinism.
    
    Examples
    --------
    >>> discoverer = PySRDiscoverer(maxsize=20, niterations=50)
    >>> result = discoverer.discover(X, y, feature_names)
    >>> print(result['best_equation'])
    """
    
    def __init__(
        self,
        binary_operators: List[str] = None,
        unary_operators: List[str] = None,
        maxsize: int = 25,
        niterations: int = 100,
        populations: int = 30,
        procs: int = 4,
        random_state: int = RANDOM_SEED
    ):
        """
        Initialize PySRDiscoverer.
        
        Parameters
        ----------
        binary_operators : List[str], optional
            Binary operators. Default: ["+", "-", "*", "/", "^"]
        unary_operators : List[str], optional
            Unary operators. Default: ["sqrt", "exp", "log"]
        maxsize : int
            Maximum expression tree size. Default: 25
        niterations : int
            Number of search iterations. Default: 100
        populations : int
            Number of parallel populations. Default: 30
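        procs : int
            Number of CPU processes. Default: 4. Currently not passed to
            PySR, which is run serially so that results are deterministic.
        random_state : int
            Random seed for reproducibility. Default: RANDOM_SEED
            (expected to be defined by 00_Core)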
        """
        self.binary_operators = binary_operators or ["+", "-", "*", "/", "^"]
        self.unary_operators = unary_operators or ["sqrt", "exp", "log"]
        self.maxsize = maxsize
        self.niterations = niterations
        self.populations = populations
        self.procs = procs
        self.random_state = random_state
        
        # Internal state
        self._model = None
        self._feature_names = None
        self._pareto_front = None
        self._best_equation = None
        self._best_complexity = None
        self._best_loss = None
        self._discovery_complete = False
    
    def discover(
        self,
        X: np.ndarray,
        y: np.ndarray,
        feature_names: List[str]
    ) -> Dict[str, Any]:
        """
        Run PySR symbolic regression.
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix of shape (n_samples, n_features)
        y : np.ndarray
            Target vector of shape (n_samples,)
        feature_names : List[str]
            Names of features
        
        Returns
        -------
        Dict[str, Any]
            Dictionary containing:
            - best_equation: Best equation string
            - best_complexity: Complexity of best equation
            - best_loss: Loss of best equation
            - pareto_front: DataFrame of Pareto-optimal equations
            - all_equations: List of all candidate equations
            - pysr_available: Whether PySR was used
            - n_equations: Number of candidate equations
        """
        self._feature_names = list(feature_names)
        
        if not PYSR_AVAILABLE:
            return self._fallback_discovery(X, y)
        
        # Configure and run PySR
        self._model = self._configure_pysr()
        
        # Run search
        self._model.fit(X, y, variable_names=self._feature_names)
        
        # Extract results
        self._pareto_front = self._extract_pareto_front()
        self._best_equation = self._select_best_equation(criterion='accuracy')
        
        # Get best equation details
        best_idx = self._model.equations_.loss.idxmin()
        self._best_complexity = int(self._model.equations_.loc[best_idx, 'complexity'])
        self._best_loss = float(self._model.equations_.loc[best_idx, 'loss'])
        
        self._discovery_complete = True
        
        return {
            'best_equation': self._best_equation,
            'best_complexity': self._best_complexity,
            'best_loss': self._best_loss,
            'pareto_front': self._pareto_front,
            'all_equations': self.get_equations(),
            'pysr_available': True,
            'n_equations': len(self._model.equations_)
        }
    
    def _configure_pysr(self):
        """
        Configure PySR model with specified parameters.
        
        Returns
        -------
        PySRRegressor
            Configured PySR model
        """
        model = PySRRegressor(
            binary_operators=self.binary_operators,
            unary_operators=self.unary_operators,
            maxsize=self.maxsize,
            niterations=self.niterations,
            populations=self.populations,
            procs=0,  # self.procs is ignored: deterministic mode needs serial execution
            random_state=self.random_state,
            deterministic=True,
            parallelism='serial',  # required when deterministic=True
            verbosity=0,
            progress=False
        )
        return model
    
    def _extract_pareto_front(self) -> pd.DataFrame:
        """
        Extract Pareto front from PySR results.
        
        Returns
        -------
        pd.DataFrame
            DataFrame with columns: complexity, loss, equation
        """
        if self._model is None or self._model.equations_ is None:
            return pd.DataFrame(columns=['complexity', 'loss', 'equation'])
        
        df = self._model.equations_[['complexity', 'loss', 'equation']].copy()
        
        # Filter to Pareto-optimal points
        pareto_mask = np.ones(len(df), dtype=bool)
        for i, (c1, l1) in enumerate(zip(df['complexity'], df['loss'])):
            for j, (c2, l2) in enumerate(zip(df['complexity'], df['loss'])):
                if i != j and c2 <= c1 and l2 <= l1 and (c2 < c1 or l2 < l1):
                    pareto_mask[i] = False
                    break
        
        return df[pareto_mask].sort_values('complexity').reset_index(drop=True)
    
    def _select_best_equation(
        self,
        criterion: str = 'accuracy'
    ) -> str:
        """
        Select best equation based on criterion.
        
        Parameters
        ----------
        criterion : str
            Selection criterion: 'accuracy', 'complexity', or 'pareto'
        
        Returns
        -------
        str
            Best equation string
        """
        if self._model is None or self._model.equations_ is None:
            return ""
        
        df = self._model.equations_
        
        if criterion == 'accuracy':
            best_idx = df['loss'].idxmin()
        elif criterion == 'complexity':
            best_idx = df['complexity'].idxmin()
        else:  # pareto - select knee point
            # Simple heuristic: best loss among equations with complexity <= median
            med_complexity = df['complexity'].median()
            simple_df = df[df['complexity'] <= med_complexity]
            if len(simple_df) > 0:
                best_idx = simple_df['loss'].idxmin()
            else:
                best_idx = df['loss'].idxmin()
        
        return str(df.loc[best_idx, 'equation'])
    
    def _fallback_discovery(
        self,
        X: np.ndarray,
        y: np.ndarray
    ) -> Dict[str, Any]:
        """
        Fallback when PySR is not available.
        
        Fits a ridge regression on degree-2 polynomial features and reports
        the terms with non-negligible coefficients as a single equation.
        """
        from sklearn.linear_model import Ridge
        from sklearn.preprocessing import PolynomialFeatures
        
        # Fit simple polynomial
        poly = PolynomialFeatures(degree=2, include_bias=False)
        X_poly = poly.fit_transform(X)
        
        reg = Ridge(alpha=0.1)
        reg.fit(X_poly, y)
        
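        # Build equation string
        terms = []
        feature_names_poly = poly.get_feature_names_out(self._feature_names)
        for coef, name in zip(reg.coef_, feature_names_poly):
            if abs(coef) > 0.01:
                # PolynomialFeatures names interactions "x z"; write the
                # product explicitly so the string parses as an expression
                terms.append(f"{coef:.4f}*{name.replace(' ', '*')}")
        
        # Put the intercept first; joining a single list avoids a dangling
        # " + " when every coefficient falls below the threshold
        parts = ([f"{reg.intercept_:.4f}"] if reg.intercept_ != 0 else []) + terms
        equation = " + ".join(parts) if parts else "0"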
        
        # Compute loss
        y_pred = reg.predict(X_poly)
        loss = np.mean((y - y_pred)**2)
        
        self._best_equation = equation
        self._best_loss = loss
        self._best_complexity = len(terms) + 1
        self._discovery_complete = True
        
        return {
            'best_equation': equation,
            'best_complexity': self._best_complexity,
            'best_loss': loss,
            'pareto_front': pd.DataFrame({
                'complexity': [self._best_complexity],
                'loss': [loss],
                'equation': [equation]
            }),
            'all_equations': [equation],
            'pysr_available': False,
            'n_equations': 1
        }
    
    def get_equations(self) -> List[str]:
        """
        Get all discovered equations.
        
        Returns
        -------
        List[str]
            List of equation strings
        """
        if not self._discovery_complete:
            raise ValueError("Must run discover() first")
        
        if self._model is None:
            return [self._best_equation] if self._best_equation else []
        
        return list(self._model.equations_['equation'].astype(str))
    
    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        Predict using the best discovered equation.
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix
        
        Returns
        -------
        np.ndarray
            Predictions
        """
        if not self._discovery_complete:
            raise ValueError("Must run discover() first")
        
        if self._model is not None and PYSR_AVAILABLE:
            return self._model.predict(X)
        else:
            # Fallback: use StructureParser
            parser = StructureParser()
            Phi, _ = parser.parse_equation(
                self._best_equation, X, self._feature_names
            )
            # Phi holds each term without its coefficient, so reapply the
            # parsed coefficients before summing
            coefs = np.array([t['coefficient'] for t in parser.get_term_info()])
            return Phi @ coefs
    
    def print_discovery_report(self) -> None:
        """
        Print a detailed discovery report.
        """
        if not self._discovery_complete:
            print("Discovery not yet performed. Run discover() first.")
            return
        
        print("=" * 70)
        print(" PySR Discovery Results")
        print("=" * 70)
        print()
        print(f"Configuration:")
        print(f"  Binary operators: {self.binary_operators}")
        print(f"  Unary operators: {self.unary_operators}")
        print(f"  Max size: {self.maxsize}")
        print(f"  Iterations: {self.niterations}")
        print(f"  PySR available: {PYSR_AVAILABLE}")
        print()
        print("-" * 70)
        print(" Best Equation:")
        print("-" * 70)
        print(f"  {self._best_equation}")
        print(f"  Complexity: {self._best_complexity}")
        print(f"  Loss (MSE): {self._best_loss:.6f}")
        print()
        
        if self._pareto_front is not None and len(self._pareto_front) > 0:
            print("-" * 70)
            print(" Pareto Front:")
            print("-" * 70)
            for _, row in self._pareto_front.iterrows():
                print(f"  C={int(row['complexity']):2d}, L={row['loss']:.6f}: {row['equation']}")
        
        print()
        print("=" * 70)

In [None]:
# ==============================================================================
# STRUCTURE PARSER CLASS
# ==============================================================================

class StructureParser:
    """
    Parse discovered equations to build feature matrices for UQ.
    
    Takes a symbolic equation string and extracts:
    - Individual additive terms
    - Variable dependencies
    - Numerical coefficients
    - Feature matrix for regression
    
    This enables uncertainty quantification on the discovered structure
    by treating the functional form as known and re-estimating coefficients.
    
    Attributes
    ----------
    _parsed_expr : sp.Expr
        SymPy expression of parsed equation
    _terms : List[sp.Expr]
        List of additive terms
    _term_structures : List[Dict]
        Structural information for each term
    
    Examples
    --------
    >>> parser = StructureParser()
    >>> Phi, names = parser.parse_equation("0.89*x**2.47*y**(-1.79)", X, ['x', 'y'])
    >>> print(names)
    ['x^2.47 * y^-1.79']
    """
    
    def __init__(self):
        """
        Initialize StructureParser.
        """
        self._parsed_expr = None
        self._terms = None
        self._term_structures = None
        self._feature_names = None
        self._parse_complete = False
    
    def parse_equation(
        self,
        equation_str: str,
        X: np.ndarray,
        feature_names: List[str]
    ) -> Tuple[np.ndarray, List[str]]:
        """
        Parse equation and build feature matrix.
        
        Parameters
        ----------
        equation_str : str
            Equation string (e.g., "0.89*x**2.47*y**(-1.79)")
        X : np.ndarray
            Feature matrix of shape (n_samples, n_features)
        feature_names : List[str]
            Names of features
        
        Returns
        -------
        Tuple[np.ndarray, List[str]]
            - Feature matrix Phi of shape (n_samples, n_terms)
            - List of term names
        """
        self._feature_names = list(feature_names)
        
        # Parse equation to SymPy
        self._parsed_expr = self._sympify_equation(equation_str)
        
        # Decompose into additive terms
        self._terms = self._decompose_additive_terms(self._parsed_expr)
        
        # Extract structure of each term
        self._term_structures = [
            self._extract_term_structure(term) 
            for term in self._terms
        ]
        
        # Build feature matrix
        n_samples = X.shape[0]
        features = []
        names = []
        
        for term, structure in zip(self._terms, self._term_structures):
            feature_col = self._evaluate_term(structure, X)
            features.append(feature_col)
            names.append(structure['name'])
        
        if len(features) == 0:
            # Return constant term if no features extracted
            return np.ones((n_samples, 1)), ['1']
        
        Phi = np.column_stack(features)
        
        # Check numerical stability
        if not self._check_numerical_stability(Phi):
            warnings.warn("Feature matrix contains NaN or Inf values")
            Phi = np.nan_to_num(Phi, nan=0.0, posinf=1e10, neginf=-1e10)
        
        self._parse_complete = True
        
        return Phi, names
    
    def _sympify_equation(
        self,
        equation_str: str
    ) -> sp.Expr:
        """
        Convert equation string to SymPy expression.
        
        Parameters
        ----------
        equation_str : str
            Equation string
        
        Returns
        -------
        sp.Expr
            SymPy expression
        """
        # Clean up equation string
        equation_str = equation_str.strip()
        
        # Replace ^ with ** for Python/SymPy
        equation_str = equation_str.replace('^', '**')
        
        # Create symbol mapping
        local_dict = {name: sp.Symbol(name) for name in self._feature_names}
        
        # Add common functions
        local_dict.update({
            'sqrt': sp.sqrt,
            'exp': sp.exp,
            'log': sp.log,
            'sin': sp.sin,
            'cos': sp.cos,
            'abs': sp.Abs
        })
        
        try:
            expr = sympify(equation_str, locals=local_dict)
        except Exception as e:
            warnings.warn(f"Failed to parse equation: {e}")
            expr = sp.Float(0)
        
        return expr
    
    def _decompose_additive_terms(
        self,
        expr: sp.Expr
    ) -> List[sp.Expr]:
        """
        Decompose expression into additive terms.
        
        Parameters
        ----------
        expr : sp.Expr
            SymPy expression
        
        Returns
        -------
        List[sp.Expr]
            List of additive terms
        """
        if isinstance(expr, Add):
            return list(expr.args)
        else:
            return [expr]
    
    def _extract_term_structure(
        self,
        term: sp.Expr
    ) -> Dict[str, Any]:
        """
        Extract structural information from a term.
        
        Parameters
        ----------
        term : sp.Expr
            SymPy expression for a single term
        
        Returns
        -------
        Dict[str, Any]
            Dictionary with:
            - coefficient: numerical coefficient
            - variables: dict of variable -> exponent
            - functions: list of applied functions
            - name: human-readable name
            - expr: the original SymPy term, coefficient included
        """
        structure = {
            'coefficient': 1.0,
            'variables': {},
            'functions': [],
            'name': '',
            'expr': term
        }
        
        # Extract coefficient and remaining expression
        coef, rest = term.as_coeff_Mul()
        structure['coefficient'] = float(coef)
        
        # Analyze remaining expression
        if rest == 1:
            structure['name'] = '1'
            return structure
        
        # Get bases and exponents
        bases_exps = []
        if isinstance(rest, Mul):
            for factor in rest.args:
                base, exp = factor.as_base_exp()
                bases_exps.append((base, exp))
        else:
            base, exp = rest.as_base_exp()
            bases_exps.append((base, exp))
        
        # Extract variable dependencies
        name_parts = []
        for base, exp in bases_exps:
            base_str = str(base)
            if base_str in self._feature_names:
                exp_val = float(exp) if exp.is_number else 1.0
                structure['variables'][base_str] = exp_val
                if exp_val == 1:
                    name_parts.append(base_str)
                else:
                    name_parts.append(f"{base_str}^{exp_val:.2f}")
            elif base.is_Function:
                structure['functions'].append(str(base.func))
                name_parts.append(str(base))
            else:
                # Try to get free symbols
                for sym in base.free_symbols:
                    sym_name = str(sym)
                    if sym_name in self._feature_names:
                        structure['variables'][sym_name] = 1.0
                name_parts.append(str(base))
        
        structure['name'] = ' * '.join(name_parts) if name_parts else str(rest)
        
        return structure
    
    def _evaluate_term(
        self,
        structure: Dict[str, Any],
        X: np.ndarray
    ) -> np.ndarray:
        """
        Evaluate a term on the data.
        
        Parameters
        ----------
        structure : Dict
            Term structure from _extract_term_structure
        X : np.ndarray
            Feature matrix
        
        Returns
        -------
        np.ndarray
            Evaluated term values
        """
        n_samples = X.shape[0]
        
        # Strip the numerical coefficient so every column of Phi holds the
        # bare term structure (powers, functions, and their products alike)
        _, rest = structure['expr'].as_coeff_Mul()
        
        # Evaluate the coefficient-free term numerically
        try:
            symbols_list = [sp.Symbol(name) for name in self._feature_names]
            func = sp.lambdify(symbols_list, rest, 'numpy')
            result = func(*[X[:, i] for i in range(X.shape[1])])
            # Constant terms evaluate to a scalar; broadcast to one value per sample
            return np.broadcast_to(
                np.asarray(result, dtype=float), (n_samples,)
            ).copy()
        except Exception:
            return np.ones(n_samples)
    
    def _check_numerical_stability(
        self,
        Phi: np.ndarray
    ) -> bool:
        """
        Check if feature matrix is numerically stable.
        
        Parameters
        ----------
        Phi : np.ndarray
            Feature matrix
        
        Returns
        -------
        bool
            True if stable (no NaN/Inf)
        """
        return not (np.any(np.isnan(Phi)) or np.any(np.isinf(Phi)))
    
    def get_term_info(self) -> List[Dict[str, Any]]:
        """
        Get detailed information about parsed terms.
        
        Returns
        -------
        List[Dict]
            List of term structure dictionaries
        """
        if not self._parse_complete:
            raise ValueError("Must run parse_equation() first")
        return self._term_structures

---
## Section 3: Internal Tests

In [None]:
# ==============================================================================
# TEST CONTROL FLAG
# ==============================================================================

_RUN_TESTS = False  # Set to True to run internal tests

if _RUN_TESTS:
    print("=" * 70)
    print(" RUNNING INTERNAL TESTS FOR 06_PySR")
    print("=" * 70)

In [None]:
# ==============================================================================
# TEST 1: PySR Discovery (or Fallback)
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 1: PySR Discovery")
    
    # Generate simple test data
    np.random.seed(42)
    n_samples = 200
    
    x = np.random.uniform(0.1, 2, n_samples)
    z = np.random.uniform(0.1, 2, n_samples)
    
    # True equation: y = 0.5*x^2 + 0.3*z
    y = 0.5 * x**2 + 0.3 * z + 0.01 * np.random.randn(n_samples)
    
    X = np.column_stack([x, z])
    feature_names = ['x', 'z']
    
    print(f"True equation: y = 0.5*x^2 + 0.3*z")
    print(f"PySR available: {PYSR_AVAILABLE}")
    print()
    
    # Run discovery
    discoverer = PySRDiscoverer(
        maxsize=15,
        niterations=20  # Reduced for testing
    )
    result = discoverer.discover(X, y, feature_names)
    
    print(f"Best equation: {result['best_equation']}")
    print(f"Complexity: {result['best_complexity']}")
    print(f"Loss: {result['best_loss']:.6f}")
    
    discoverer.print_discovery_report()

In [None]:
# ==============================================================================
# TEST 2: Structure Parser
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 2: Structure Parser")
    
    # Test parsing known equation
    np.random.seed(42)
    n_samples = 100
    
    x0 = np.random.uniform(0.5, 2, n_samples)
    x1 = np.random.uniform(0.5, 2, n_samples)
    
    X = np.column_stack([x0, x1])
    feature_names = ['x0', 'x1']
    
    # Test equation (warm rain form)
    equation = "0.89*x0**2.47*x1**(-1.79)"
    
    print(f"Input equation: {equation}")
    print()
    
    parser = StructureParser()
    Phi, names = parser.parse_equation(equation, X, feature_names)
    
    print(f"Parsed terms: {names}")
    print(f"Feature matrix shape: {Phi.shape}")
    print(f"First 5 values: {Phi[:5, 0]}")
    
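    # Verify computation. The coefficient is separated out, so Phi[:, 0]
    # should equal x0^2.47 * x1^-1.79 and coefficient * Phi[:, 0] the full term.
    expected = 0.89 * x0**2.47 * x1**(-1.79)
    
    term_info = parser.get_term_info()
    if np.allclose(term_info[0]['coefficient'] * Phi[:, 0], expected):
        print("[PASS] Parsed term reproduces the input equation")
    else:
        print("[FAIL] Parsed term does not reproduce the input equation")
    print(f"\nTerm details:")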
    for info in term_info:
        print(f"  Coefficient: {info['coefficient']}")
        print(f"  Variables: {info['variables']}")
        print(f"  Name: {info['name']}")

In [None]:
# ==============================================================================
# TEST 3: Multi-term Parsing
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 3: Multi-term Parsing")
    
    np.random.seed(42)
    n_samples = 100
    
    a = np.random.uniform(0.5, 2, n_samples)
    b = np.random.uniform(0.5, 2, n_samples)
    
    X = np.column_stack([a, b])
    feature_names = ['a', 'b']
    
    # Multi-term equation
    equation = "1.5*a**2 + 0.7*b + 0.3"
    
    print(f"Input equation: {equation}")
    print()
    
    parser = StructureParser()
    Phi, names = parser.parse_equation(equation, X, feature_names)
    
    print(f"Number of terms: {len(names)}")
    print(f"Term names: {names}")
    print(f"Feature matrix shape: {Phi.shape}")
    
    if Phi.shape[1] >= 3:
        print("[PASS] Multiple terms correctly parsed")
    else:
        print(f"[INFO] Parsed {Phi.shape[1]} terms")

In [None]:
# ==============================================================================
# TEST 4: Fallback Mode
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 4: Fallback Mode")
    
    # Test that fallback works when PySR not available
    np.random.seed(42)
    n_samples = 100
    
    x = np.random.uniform(0, 1, n_samples)
    z = np.random.uniform(0, 1, n_samples)
    y = 2*x + 3*z + 0.1*np.random.randn(n_samples)
    
    X = np.column_stack([x, z])
    feature_names = ['x', 'z']
    
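    # Force fallback. discover() normally records the feature names, so set
    # them here before calling the private method directly.
    discoverer = PySRDiscoverer()
    discoverer._feature_names = feature_names
    result = discoverer._fallback_discovery(X, y)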
    
    print(f"Fallback equation: {result['best_equation']}")
    print(f"PySR used: {result['pysr_available']}")
    print(f"Loss: {result['best_loss']:.6f}")
    
    if not result['pysr_available']:
        print("[PASS] Fallback mode working correctly")

---
## Section 4: Module Summary

In [None]:
# ==============================================================================
# MODULE SUMMARY
# ==============================================================================

print("=" * 70)
print(" 06_PySR.ipynb - Module Summary")
print("=" * 70)
print()
print("CLASSES:")
print("-" * 70)
print()
print("1. PySRDiscoverer")
print("   Purpose: Symbolic regression via genetic programming")
print("   Main Methods:")
print("     discover(X, y, feature_names) - Run PySR search")
print("     get_equations() - Get all discovered equations")
print("     predict(X) - Predict using best equation")
print("     print_discovery_report() - Print detailed results")
print()
print("2. StructureParser")
print("   Purpose: Parse equations for downstream UQ")
print("   Main Methods:")
print("     parse_equation(equation_str, X, feature_names) - Parse and evaluate")
print("     get_term_info() - Get detailed term information")
print()
print(f"PySR Status: {'Available' if PYSR_AVAILABLE else 'Not available (fallback mode)'}")
print()
print("Usage Example:")
print("-" * 70)
print("""
# Discover equations with PySR
discoverer = PySRDiscoverer(
    maxsize=25,
    niterations=100
)
result = discoverer.discover(X, y, feature_names)
print(f"Best equation: {result['best_equation']}")

# Parse for UQ pipeline
parser = StructureParser()
Phi, names = parser.parse_equation(result['best_equation'], X, feature_names)
print(f"Feature matrix shape: {Phi.shape}")
""")
print()
print("=" * 70)
print("Module loaded successfully. Import via: %run 06_PySR.ipynb")
print("=" * 70)