# 05_FeatureLibrary - Physics-SR Framework v4.1

## Stage 2.3: Augmented Feature Library Construction

**Author:** Zhengze Zhang  
**Affiliation:** Department of Statistics, Columbia University  
**Contact:** zz3239@columbia.edu  
**Date:** January 2026  
**Version:** 4.1 (5-Layer Physics-Guided Library Construction)

---

### Purpose

Build a comprehensive augmented feature library that combines physics-guided terms,
PySR discoveries, and baseline features. This is a **major redesign** from v3.0
that supports the new Structure-Guided Discovery pipeline.

### v4.1 Five-Layer Library Construction

| Layer | Source | Priority | Description |
|-------|--------|----------|-------------|
| 0 | `[PowLaw]` | **Highest** | Power-law terms from Stage 1 symmetry (inverse terms) |
| 1 | `[PySR]` | High | Exact terms from PySR Pareto front |
| 2 | `[Var]` | Medium | Variant terms via variable substitution |
| 3 | `[Poly]` | Baseline | Polynomial terms (always included) |
| 4 | `[Op]` | Safety net | Operator-guided simple terms |
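
Because every library column carries its source tag as a prefix, the layer composition can be recovered from the names alone. The snippet below is a minimal sketch of that idea; the `library_names` list and the `split_tag` helper are hypothetical illustrations, not part of the framework's API.

```python
import re
from collections import Counter

# Hypothetical example names, following the "[Tag] term" convention in the table above.
library_names = [
    '[PowLaw] 1/r', '[PowLaw] 1/r^2', '[PySR] sin(x**2)',
    '[Var] sin(y**2)', '[Poly] 1', '[Poly] x', '[Poly] x*y', '[Op] sin(x)',
]

TAG_PATTERN = re.compile(r'^\[(\w+)\]\s+(.*)$')

def split_tag(name):
    """Split '[Tag] term' into (tag, term); raise if the prefix is missing."""
    match = TAG_PATTERN.match(name)
    if match is None:
        raise ValueError(f'Missing source tag: {name!r}')
    return match.group(1), match.group(2)

# Count how many terms each layer contributed.
layer_counts = Counter(split_tag(name)[0] for name in library_names)
print(dict(layer_counts))
```

A tally like this is handy for checking which layers the downstream sparse regression actually drew its selected terms from.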

### v4.1 Key Enhancement: Layer 0 Power-Law Terms

When Stage 1 symmetry analysis detects negative exponents (e.g., r^-2 in Coulomb's law),
Layer 0 automatically adds the inverse terms (1/r, 1/r^2, and 1/r^3 for exponents near -3)
that standard polynomial libraries lack, along with the combined power-law product.
This is **critical** for equations like F = k*q1*q2/r^2.
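
To see why the inverse terms matter, the following sketch fits synthetic Coulomb-style data by ordinary least squares, once with a degree-3 polynomial library and once with the combined power-law column added. The data ranges, the constant `k = 2.0`, and the helper names `polynomial_library` and `fit_rmse` are assumptions made for illustration; plain least squares stands in for the framework's actual regression step.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(0)
n = 200
q1 = rng.uniform(1.0, 2.0, n)
q2 = rng.uniform(1.0, 2.0, n)
r = rng.uniform(0.5, 3.0, n)
F = 2.0 * q1 * q2 / r**2  # k = 2.0 is an arbitrary illustrative constant

X = np.column_stack([q1, q2, r])

def polynomial_library(X, max_degree=3):
    """Constant column plus all monomials up to max_degree."""
    cols = [np.ones(X.shape[0])]
    for degree in range(1, max_degree + 1):
        for combo in combinations_with_replacement(range(X.shape[1]), degree):
            cols.append(np.prod(X[:, list(combo)], axis=1))
    return np.column_stack(cols)

def fit_rmse(Phi, y):
    """Least-squares fit of y on Phi; return the root-mean-square residual."""
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return float(np.sqrt(np.mean((Phi @ coef - y) ** 2)))

Phi_poly = polynomial_library(X)
# Prepend the Layer 0 style combined term q1*q2/r^2.
Phi_aug = np.column_stack([q1 * q2 / r**2, Phi_poly])

rmse_poly = fit_rmse(Phi_poly, F)
rmse_aug = fit_rmse(Phi_aug, F)
print(f'polynomial only: {rmse_poly:.3e}, with q1*q2/r^2: {rmse_aug:.3e}')
```

The polynomial-only fit leaves a clearly nonzero residual, while the augmented library reproduces the target to numerical precision, since the true term is now one of its columns.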

### Design Philosophy

1. **KEEP:** If PySR found sin(x^2) and it is correct, E-WSINDy will select it
2. **DISCOVER:** If PySR missed x*z, E-WSINDy can find it in the polynomial layer
3. **REJECT:** If PySR included a spurious term, E-WSINDy's sparsity can exclude it
4. **PHYSICS-GUIDED:** Layer 0 ensures the correct basis for power-law physics
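
The keep/discover/reject behavior can be sketched on a toy problem. E-WSINDy itself is not shown in this notebook, so the example below swaps in sequentially thresholded least squares as a simple stand-in for its sparsity step; the data, the threshold of 0.1, the `stlsq` helper, and the spurious `cos(z)` term are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 300)
z = rng.uniform(-2, 2, 300)
y = 2.0 * np.sin(x**2) + 0.5 * x * z  # hypothetical ground truth

# Library: one correct PySR term, one spurious PySR term, polynomial baseline.
names = ['[PySR] sin(x**2)', '[PySR] cos(z)', '[Poly] x', '[Poly] z', '[Poly] x*z']
Phi = np.column_stack([np.sin(x**2), np.cos(z), x, z, x * z])

def stlsq(Phi, y, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares (stand-in for E-WSINDy sparsity)."""
    coef = np.linalg.lstsq(Phi, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        active = ~small
        if not active.any():
            break
        # Refit using only the surviving columns.
        coef[active] = np.linalg.lstsq(Phi[:, active], y, rcond=None)[0]
    return coef

coef = stlsq(Phi, y)
selected = {name: c for name, c in zip(names, coef) if c != 0.0}
print(selected)
```

On this noise-free data the correct PySR term is kept, the missed interaction x*z is picked up from the polynomial layer, and the spurious cos(z) term is dropped.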

### Algorithm Reference

Framework v4.0/v4.1 Section 4.3: Augmented Library Construction

---
## Section 1: Header and Imports

In [None]:
"""
05_FeatureLibrary.ipynb - Augmented Feature Library Construction
=================================================================

Three-Stage Physics-Informed Symbolic Regression Framework v4.1

This module provides:
- AugmentedLibraryBuilder: Build 5-layer feature library from physics + PySR
- Source tagging: [PowLaw], [PySR], [Var], [Poly], [Op] for term attribution
- Power-law guided terms from Stage 1 symmetry analysis
- Lazy variant generation based on PySR R-squared
- Integration with Stage2Results for seamless pipeline

v4.1 Key Changes from v3.0:
- Renamed class: FeatureLibraryBuilder -> AugmentedLibraryBuilder
- New build() signature accepts parsed_terms, detected_operators, estimated_exponents
- 5-layer construction: Layer 0 [PowLaw] for physics-guided inverse terms
- Lazy variant generation based on PySR performance

Output Dictionary Keys (v4.1):
- library_matrix: np.ndarray (n_samples, K)
- library_names: List[str] with source tags
- library_info: Dict with layer composition statistics (n_powerlaw_terms, etc.)

Author: Zhengze Zhang
Affiliation: Department of Statistics, Columbia University
Contact: zz3239@columbia.edu
"""

# Import core module
%run 00_Core.ipynb

In [None]:
# Additional imports for Feature Library
from sklearn.preprocessing import StandardScaler
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple, Optional, Any, Set, Callable
from collections import Counter
import sympy as sp
from sympy import Symbol, sympify, lambdify, Add

print("05_FeatureLibrary v4.1: Additional imports successful.")

---
## Section 2: Class Definition

In [None]:
# ==============================================================================
# AUGMENTED LIBRARY BUILDER CLASS (v4.1)
# ==============================================================================

class AugmentedLibraryBuilder:
    """
    Augmented Feature Library Construction (v4.1 Enhanced).
    
    Builds 5-layer feature library combining physics-guided terms with PySR discoveries:
    - Layer 0: Power-Law Guided Terms (from Stage 1 symmetry, highest priority)
    - Layer 1: PySR Exact Terms (second priority)
    - Layer 2: Variant Terms (variable substitution)
    - Layer 3: Polynomial Baseline (always included)
    - Layer 4: Operator-Guided Terms (from detected operators)
    
    v4.1 Enhancement: Layer 0 adds inverse terms (1/x, 1/x^2) when Stage 1 symmetry
    analysis detects negative exponents. This is critical for equations like
    Coulomb's law (F = k*q1*q2/r^2) where standard polynomial terms are insufficient.
    
    Attributes
    ----------
    max_poly_degree : int
        Maximum polynomial degree (default: 3)
    generate_variants : bool
        Whether to generate variant terms (default: True)
    max_variants_per_term : int
        Maximum variants per PySR term (default: 3)
    include_operator_terms : bool
        Whether to include operator-guided terms (default: True)
    normalize : bool
        Whether to standardize features (default: True)
    lazy_variant_threshold : float
        PySR R2 threshold for lazy variant generation (default: 0.95)
    scaler : StandardScaler
        Scaler object (public, for downstream use)
    library_info : Dict
        Library composition statistics (public)
    
    Methods
    -------
    build(X, feature_names, parsed_terms, detected_operators, pysr_r2, estimated_exponents) -> Tuple
        Build complete augmented library
    build_from_pysr_results(X, feature_names, stage2_partial) -> Tuple
        Build from partial Stage2Results containing PySR output
    transform(X_new) -> np.ndarray
        Transform new data using fitted library structure
    print_library_summary() -> None
        Print detailed library composition report
    
    Reference
    ---------
    Framework v4.1 Section 4.3: Augmented Library Construction
    
    Examples
    --------
    >>> builder = AugmentedLibraryBuilder(max_poly_degree=3)
    >>> Phi, names, info = builder.build(
    ...     X, feature_names,
    ...     parsed_terms=[(expr, 'sin(x**2)', func)],
    ...     detected_operators={'sin'},
    ...     pysr_r2=0.85,
    ...     estimated_exponents={'q1': 1.0, 'r': -2.0}  # From Stage 1
    ... )
    >>> print(f"Library size: {Phi.shape[1]} features")
    """
    
    def __init__(
        self,
        max_poly_degree: int = DEFAULT_MAX_POLY_DEGREE,
        generate_variants: bool = True,
        max_variants_per_term: int = 3,
        include_operator_terms: bool = True,
        normalize: bool = True,
        lazy_variant_threshold: float = 0.95
    ):
        """
        Initialize AugmentedLibraryBuilder.
        
        Parameters
        ----------
        max_poly_degree : int
            Maximum polynomial degree. Features up to x^d are included.
            Default: 3
        generate_variants : bool
            Whether to generate Layer 2 variant terms via variable substitution.
            Default: True
        max_variants_per_term : int
            Maximum number of variants to generate per PySR term.
            Default: 3
        include_operator_terms : bool
            Whether to include Layer 4 operator-guided simple terms.
            Default: True
        normalize : bool
            Whether to standardize features to zero mean, unit variance.
            Default: True
        lazy_variant_threshold : float
            If PySR R2 >= threshold, skip variant generation (lazy mode).
            Default: 0.95
        """
        self.max_poly_degree = max_poly_degree
        self.generate_variants = generate_variants
        self.max_variants_per_term = max_variants_per_term
        self.include_operator_terms = include_operator_terms
        self.normalize = normalize
        self.lazy_variant_threshold = lazy_variant_threshold
        
        # Public attributes (v4.1)
        self.scaler = None
        self.library_info = None
        
        # Internal state (private)
        self._feature_names = None
        self._library_names = None
        self._n_input_features = None
        self._n_library_features = None
        self._parsed_terms = None
        self._detected_operators = None
        self._feature_symbols = None
        self._estimated_exponents = None
        self._build_complete = False
    
    def build(
        self,
        X: np.ndarray,
        feature_names: List[str],
        parsed_terms: Optional[List[Tuple]] = None,
        detected_operators: Optional[Set[str]] = None,
        pysr_r2: float = 0.0,
        estimated_exponents: Optional[Dict[str, float]] = None
    ) -> Tuple[np.ndarray, List[str], Dict]:
        """
        Build complete augmented feature library (v4.1 Enhanced).
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix (n_samples, n_features)
        feature_names : List[str]
            Original feature names
        parsed_terms : List[Tuple], optional
            List of (expr, name, func) from StructureParser.
            If None, only polynomial baseline is built.
        detected_operators : set, optional
            Set of operators found {'sin', 'cos', 'exp', ...}.
            If None, no operator-guided terms are added.
        pysr_r2 : float
            PySR best R-squared (for lazy variant generation).
            Default: 0.0
        estimated_exponents : Dict[str, float], optional
            Power-law exponents from Stage 1 symmetry analysis.
            e.g., {'q1': 1.0, 'q2': 1.0, 'r': -2.0} for Coulomb's law.
            If provided with negative exponents, adds inverse terms (Layer 0).
            
        Returns
        -------
        Phi_aug : np.ndarray
            Augmented feature matrix (n_samples, K)
        library_names : List[str]
            Feature names with source tags [PowLaw], [PySR], [Var], [Poly], [Op]
        library_info : Dict
            Library composition statistics
        """
        self._feature_names = list(feature_names)
        self._n_input_features = X.shape[1]
        self._parsed_terms = parsed_terms or []
        self._detected_operators = detected_operators or set()
        self._estimated_exponents = estimated_exponents or {}
        n_samples = X.shape[0]
        
        # Create SymPy symbols for variable substitution
        self._feature_symbols = [Symbol(name) for name in feature_names]
        
        # Initialize containers
        Phi_columns = []
        Phi_names = []
        
        # Layer 0: Power-Law Guided Terms (highest priority, from Stage 1 symmetry)
        n_powerlaw = 0
        if self._estimated_exponents:
            self._add_powerlaw_terms(X, Phi_columns, Phi_names, self._estimated_exponents)
            n_powerlaw = len(Phi_names)
        
        # Layer 1: PySR Exact Terms
        self._add_pysr_exact_terms(X, Phi_columns, Phi_names)
        n_pysr = len(Phi_names) - n_powerlaw
        
        # Layer 2: Variant Terms (conditional on PySR R2)
        if self.generate_variants and pysr_r2 < self.lazy_variant_threshold:
            self._add_variant_terms(X, Phi_columns, Phi_names)
        n_variant = len(Phi_names) - n_powerlaw - n_pysr
        
        # Layer 3: Polynomial Baseline (always included)
        self._add_polynomial_terms(X, Phi_columns, Phi_names)
        n_poly = len(Phi_names) - n_powerlaw - n_pysr - n_variant
        
        # Layer 4: Operator-Guided Terms (if operators detected)
        if self.include_operator_terms and len(self._detected_operators) > 0:
            self._add_operator_terms(X, Phi_columns, Phi_names)
        n_op = len(Phi_names) - n_powerlaw - n_pysr - n_variant - n_poly
        
        # Assemble library matrix
        if len(Phi_columns) == 0:
            # Edge case: no features generated, add constant term
            Phi_columns.append(np.ones(n_samples))
            Phi_names.append('[Poly] 1')
            n_poly = 1
        
        Phi_aug = np.column_stack(Phi_columns)
        
        # Handle numerical issues
        Phi_aug = self._handle_numerical_issues(Phi_aug)
        
        # Normalize if requested (skip constant column)
        if self.normalize:
            Phi_aug = self._normalize_features(Phi_aug, Phi_names)
        
        # Record library composition
        self.library_info = {
            'n_powerlaw_terms': n_powerlaw,
            'n_pysr_terms': n_pysr,
            'n_variant_terms': n_variant,
            'n_poly_terms': n_poly,
            'n_op_terms': n_op,
            'total_terms': len(Phi_names),
            'pysr_r2': pysr_r2,
            'lazy_mode': pysr_r2 >= self.lazy_variant_threshold,
            'detected_operators': list(self._detected_operators),
            'estimated_exponents': self._estimated_exponents
        }
        
        self._library_names = Phi_names
        self._n_library_features = Phi_aug.shape[1]
        self._build_complete = True
        
        return Phi_aug, Phi_names, self.library_info
    
    def _add_pysr_exact_terms(
        self,
        X: np.ndarray,
        Phi_columns: List,
        Phi_names: List
    ) -> None:
        """
        Layer 1: Add exact terms from PySR.
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix
        Phi_columns : List
            List to append feature columns
        Phi_names : List
            List to append feature names
        """
        for expr, name, func in self._parsed_terms:
            try:
                # Evaluate the term
                values = func(*[X[:, i] for i in range(self._n_input_features)])
                
                # Validate and add
                if self._is_valid_feature(values, Phi_columns):
                    Phi_columns.append(values)
                    Phi_names.append(f'[PySR] {name}')
            except Exception:
                # Skip terms that fail evaluation
                continue
    
    def _add_variant_terms(
        self,
        X: np.ndarray,
        Phi_columns: List,
        Phi_names: List
    ) -> None:
        """
        Layer 2: Add variant terms via variable substitution.
        
        For each PySR term, substitute each free variable with other variables
        to create variants that E-WSINDy can evaluate.
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix
        Phi_columns : List
            List to append feature columns
        Phi_names : List
            List to append feature names
        """
        for expr, name, func in self._parsed_terms:
            variants_added = 0
            
            # Get free variables in this expression
            try:
                free_vars = list(expr.free_symbols)
            except Exception:
                continue
            
            # For each free variable, try substituting with other variables
            for var in free_vars:
                if variants_added >= self.max_variants_per_term:
                    break
                
                for other_symbol in self._feature_symbols:
                    if other_symbol == var:
                        continue
                    if variants_added >= self.max_variants_per_term:
                        break
                    
                    try:
                        # Create variant expression
                        variant_expr = expr.subs(var, other_symbol)
                        
                        # Skip if variant is same as original
                        if str(variant_expr) == str(expr):
                            continue
                        
                        # Create evaluation function
                        variant_func = lambdify(
                            self._feature_symbols, 
                            variant_expr, 
                            modules=['numpy']
                        )
                        
                        # Evaluate
                        values = variant_func(*[X[:, i] for i in range(self._n_input_features)])
                        
                        # Validate and add
                        if self._is_valid_feature(values, Phi_columns):
                            Phi_columns.append(values)
                            Phi_names.append(f'[Var] {str(variant_expr)}')
                            variants_added += 1
                    except Exception:
                        continue
    
    def _add_polynomial_terms(
        self,
        X: np.ndarray,
        Phi_columns: List,
        Phi_names: List
    ) -> None:
        """
        Layer 3: Add polynomial baseline terms.
        
        Includes constant, linear, and higher-order polynomial terms.
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix
        Phi_columns : List
            List to append feature columns
        Phi_names : List
            List to append feature names
        """
        n_samples, n_features = X.shape
        
        # Constant term. It is appended directly because _is_valid_feature()
        # rejects zero-variance columns and would otherwise drop the intercept.
        if '[Poly] 1' not in Phi_names:
            Phi_columns.append(np.ones(n_samples))
            Phi_names.append('[Poly] 1')
        
        # Linear terms
        for i, name in enumerate(self._feature_names):
            values = X[:, i].copy()
            if self._is_valid_feature(values, Phi_columns):
                Phi_columns.append(values)
                Phi_names.append(f'[Poly] {name}')
        
        # Higher degree terms
        for degree in range(2, self.max_poly_degree + 1):
            for combo in combinations_with_replacement(range(n_features), degree):
                # Compute product
                product = np.ones(n_samples)
                for idx in combo:
                    product = product * X[:, idx]
                
                # Build name
                term_name = self._build_polynomial_name(combo)
                
                # Validate and add
                if self._is_valid_feature(product, Phi_columns):
                    Phi_columns.append(product)
                    Phi_names.append(f'[Poly] {term_name}')
    
    def _add_operator_terms(
        self,
        X: np.ndarray,
        Phi_columns: List,
        Phi_names: List
    ) -> None:
        """
        Layer 4: Add operator-guided simple terms.
        
        Based on detected operators from PySR, add simple applications
        of those operators to each variable.
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix
        Phi_columns : List
            List to append feature columns
        Phi_names : List
            List to append feature names
        """
        for var_idx, var_name in enumerate(self._feature_names):
            x = X[:, var_idx]
            
            # sin operator
            if 'sin' in self._detected_operators:
                values = np.sin(x)
                if self._is_valid_feature(values, Phi_columns):
                    Phi_columns.append(values)
                    Phi_names.append(f'[Op] sin({var_name})')
            
            # cos operator
            if 'cos' in self._detected_operators:
                values = np.cos(x)
                if self._is_valid_feature(values, Phi_columns):
                    Phi_columns.append(values)
                    Phi_names.append(f'[Op] cos({var_name})')
            
            # exp operator (both positive and negative)
            if 'exp' in self._detected_operators:
                for sign, sign_str in [(1, ''), (-1, '-')]:
                    # Clip to prevent overflow
                    clipped = np.clip(sign * x, -20, 20)
                    values = np.exp(clipped)
                    if self._is_valid_feature(values, Phi_columns):
                        Phi_columns.append(values)
                        Phi_names.append(f'[Op] exp({sign_str}{var_name})')
            
            # sqrt operator
            if 'sqrt' in self._detected_operators:
                values = np.sqrt(np.abs(x) + 1e-10)
                if self._is_valid_feature(values, Phi_columns):
                    Phi_columns.append(values)
                    Phi_names.append(f'[Op] sqrt(|{var_name}|)')
            
            # log operator
            if 'log' in self._detected_operators:
                values = np.log(np.abs(x) + 1e-10)
                if self._is_valid_feature(values, Phi_columns):
                    Phi_columns.append(values)
                    Phi_names.append(f'[Op] log(|{var_name}|)')
    
    def _add_powerlaw_terms(
        self,
        X: np.ndarray,
        Phi_columns: List,
        Phi_names: List,
        estimated_exponents: Dict[str, float]
    ) -> None:
        """
        Layer 0 (Highest Priority): Add power-law guided terms.
        
        When Stage 1 symmetry analysis detects power-law structure,
        this method adds terms that match the detected exponents.
        This is critical for equations like Coulomb's law (F = k*q1*q2/r^2)
        where the standard polynomial library lacks inverse terms.
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix (n_samples, n_features)
        Phi_columns : List
            List to append feature columns
        Phi_names : List
            List to append feature names
        estimated_exponents : Dict[str, float]
            Mapping from variable name to estimated exponent
            e.g., {'q1': 1.0, 'q2': 1.0, 'r': -2.0}
        """
        n_samples = X.shape[0]
        
        # Identify variables with negative exponents (need inverse terms)
        negative_exp_vars = []
        positive_exp_vars = []
        
        for i, name in enumerate(self._feature_names):
            name_str = str(name)
            if name_str in estimated_exponents:
                exp = estimated_exponents[name_str]
                if exp < -0.5:
                    negative_exp_vars.append((i, name_str, exp))
                elif exp > 0.5:
                    positive_exp_vars.append((i, name_str, exp))
        
        # Add inverse terms for negative exponent variables
        for var_idx, var_name, exp in negative_exp_vars:
            x = np.abs(X[:, var_idx])
            x = np.clip(x, 1e-10, None)  # Avoid division by zero
            
            # 1/x term
            values = 1.0 / x
            if self._is_valid_feature(values, Phi_columns):
                Phi_columns.append(values)
                Phi_names.append(f'[PowLaw] 1/{var_name}')
            
            # 1/x^2 term (for exponents near -2)
            if exp < -1.5:
                values = 1.0 / (x ** 2)
                if self._is_valid_feature(values, Phi_columns):
                    Phi_columns.append(values)
                    Phi_names.append(f'[PowLaw] 1/{var_name}^2')
            
            # 1/x^3 term (for exponents near -3)
            if exp < -2.5:
                values = 1.0 / (x ** 3)
                if self._is_valid_feature(values, Phi_columns):
                    Phi_columns.append(values)
                    Phi_names.append(f'[PowLaw] 1/{var_name}^3')
        
        # Build the full power-law interaction term
        # e.g., q1^1 * q2^1 * r^(-2) = q1 * q2 / r^2
        if len(positive_exp_vars) > 0 or len(negative_exp_vars) > 0:
            # Compute product of all terms with their exponents
            power_product = np.ones(n_samples)
            numerator_parts = []
            denominator_parts = []
            
            for var_idx, var_name, exp in positive_exp_vars:
                x = X[:, var_idx]
                rounded_exp = round(exp)
                if rounded_exp == 1:
                    power_product *= x
                    numerator_parts.append(var_name)
                elif rounded_exp == 2:
                    power_product *= x ** 2
                    numerator_parts.append(f'{var_name}^2')
                elif rounded_exp > 0:
                    # Signed integer power, so the value matches the '^n' name
                    power_product *= x ** rounded_exp
                    numerator_parts.append(f'{var_name}^{rounded_exp}')
            
            for var_idx, var_name, exp in negative_exp_vars:
                x = np.abs(X[:, var_idx])
                x = np.clip(x, 1e-10, None)
                rounded_exp = round(abs(exp))
                if rounded_exp == 1:
                    power_product /= x
                    denominator_parts.append(var_name)
                elif rounded_exp == 2:
                    power_product /= (x ** 2)
                    denominator_parts.append(f'{var_name}^2')
                else:
                    power_product /= (x ** rounded_exp)
                    denominator_parts.append(f'{var_name}^{rounded_exp}')
            
            # Build readable name
            if len(numerator_parts) > 0 and len(denominator_parts) > 0:
                combined_name = '*'.join(numerator_parts) + '/' + '*'.join(denominator_parts)
            elif len(numerator_parts) > 0:
                combined_name = '*'.join(numerator_parts)
            elif len(denominator_parts) > 0:
                combined_name = '1/' + '*'.join(denominator_parts)
            else:
                combined_name = '1'
            
            # Add the combined term
            if self._is_valid_feature(power_product, Phi_columns):
                Phi_columns.append(power_product)
                Phi_names.append(f'[PowLaw] {combined_name}')
            
            # Also add partial products (for robustness)
            # e.g., q1*q2 alone, without the r^-2
            if len(positive_exp_vars) >= 2:
                partial_product = np.ones(n_samples)
                partial_parts = []
                for var_idx, var_name, exp in positive_exp_vars:
                    x = X[:, var_idx]
                    partial_product *= x
                    partial_parts.append(var_name)
                
                partial_name = '*'.join(partial_parts)
                if self._is_valid_feature(partial_product, Phi_columns):
                    Phi_columns.append(partial_product)
                    Phi_names.append(f'[PowLaw] {partial_name}')
            
            # Add positive vars with just one inverse term (for exploration)
            if len(positive_exp_vars) >= 1 and len(negative_exp_vars) >= 1:
                for neg_idx, neg_name, neg_exp in negative_exp_vars:
                    x_neg = np.abs(X[:, neg_idx])
                    x_neg = np.clip(x_neg, 1e-10, None)
                    
                    for pos_idx, pos_name, pos_exp in positive_exp_vars:
                        x_pos = X[:, pos_idx]
                        
                        # x_pos / x_neg
                        values = x_pos / x_neg
                        term_name = f'{pos_name}/{neg_name}'
                        if self._is_valid_feature(values, Phi_columns):
                            Phi_columns.append(values)
                            Phi_names.append(f'[PowLaw] {term_name}')
                        
                        # x_pos / x_neg^2 (if exp is near -2)
                        if neg_exp < -1.5:
                            values = x_pos / (x_neg ** 2)
                            term_name = f'{pos_name}/{neg_name}^2'
                            if self._is_valid_feature(values, Phi_columns):
                                Phi_columns.append(values)
                                Phi_names.append(f'[PowLaw] {term_name}')
    
    def _build_polynomial_name(
        self,
        indices: Tuple[int, ...]
    ) -> str:
        """
        Build human-readable name for polynomial term.
        
        Parameters
        ----------
        indices : Tuple[int, ...]
            Tuple of feature indices (may have repeats)
        
        Returns
        -------
        str
            Name like "x^2" or "x*y"
        """
        counts = Counter(indices)
        
        parts = []
        for idx in sorted(counts.keys()):
            name = self._feature_names[idx]
            power = counts[idx]
            if power == 1:
                parts.append(name)
            else:
                parts.append(f'{name}^{power}')
        
        return '*'.join(parts)
    
    def _is_valid_feature(
        self,
        values: np.ndarray,
        existing_columns: List[np.ndarray]
    ) -> bool:
        """
        Check if feature values are valid (finite, non-constant, non-duplicate).
        
        Parameters
        ----------
        values : np.ndarray
            Feature values to check
        existing_columns : List[np.ndarray]
            List of existing feature columns
        
        Returns
        -------
        bool
            True if valid, False otherwise
        """
        # Ensure 1D array
        values = np.asarray(values).flatten()
        
        # Check finite
        if not np.all(np.isfinite(values)):
            return False
        
        # Check non-constant
        if np.std(values) < 1e-10:
            return False
        
        # Check non-duplicate
        for existing in existing_columns:
            if np.allclose(values, existing, rtol=1e-5, atol=1e-10):
                return False
        
        return True
    
    def _handle_numerical_issues(
        self,
        Phi: np.ndarray
    ) -> np.ndarray:
        """
        Handle NaN and Inf values in feature matrix.
        
        Parameters
        ----------
        Phi : np.ndarray
            Feature matrix
        
        Returns
        -------
        np.ndarray
            Cleaned feature matrix
        """
        # Replace NaN with 0 and +/-Inf with +/-1e10
        Phi = np.nan_to_num(Phi, nan=0.0, posinf=1e10, neginf=-1e10)
        
        # Clip extreme values
        Phi = np.clip(Phi, -1e10, 1e10)
        
        return Phi
    
    def _normalize_features(
        self,
        Phi: np.ndarray,
        Phi_names: List[str]
    ) -> np.ndarray:
        """
        Standardize features to zero mean, unit variance.
        Skips the constant column.
        
        Parameters
        ----------
        Phi : np.ndarray
            Feature matrix
        Phi_names : List[str]
            Feature names (to identify constant column)
        
        Returns
        -------
        np.ndarray
            Normalized feature matrix
        """
        # Find constant column index
        const_idx = None
        for i, name in enumerate(Phi_names):
            if name == '[Poly] 1':
                const_idx = i
                break
        
        # Create scaler
        self.scaler = StandardScaler()
        
        if const_idx is not None:
            # Normalize non-constant columns only
            non_const_mask = np.ones(Phi.shape[1], dtype=bool)
            non_const_mask[const_idx] = False
            
            if np.sum(non_const_mask) > 0:
                Phi_normalized = Phi.copy()
                Phi_normalized[:, non_const_mask] = self.scaler.fit_transform(
                    Phi[:, non_const_mask]
                )
                return Phi_normalized
            
            # Library holds only the constant column: nothing to scale
            return Phi.copy()
        
        # Normalize all columns if no constant found
        return self.scaler.fit_transform(Phi)
    
    def build_from_pysr_results(
        self,
        X: np.ndarray,
        feature_names: List[str],
        stage2_partial: 'Stage2Results'
    ) -> Tuple[np.ndarray, List[str], Dict]:
        """
        Build augmented library from partial Stage2Results.
        
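        Convenience method that extracts parsed_terms and detected_operators
        from a Stage2Results object that has PySR and Structure Parsing completed.
        
        Note: estimated_exponents is not forwarded here, so Layer 0 [PowLaw]
        terms are not built through this path. Call build() directly to
        include them.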
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix
        feature_names : List[str]
            Feature names
        stage2_partial : Stage2Results
            Partially populated Stage2Results with PySR output
        
        Returns
        -------
        Tuple[np.ndarray, List[str], Dict]
            Augmented library matrix, names, and info
        """
        return self.build(
            X=X,
            feature_names=feature_names,
            parsed_terms=stage2_partial.parsed_terms,
            detected_operators=stage2_partial.detected_operators,
            pysr_r2=stage2_partial.best_pysr_r2 or 0.0
        )
    
    def transform(
        self,
        X_new: np.ndarray
    ) -> np.ndarray:
        """
        Transform new data using the fitted library structure.
        
        Parameters
        ----------
        X_new : np.ndarray
            New feature matrix (n_samples_new, n_features)
        
        Returns
        -------
        np.ndarray
            Transformed feature matrix (n_samples_new, K)
        
        Raises
        ------
        RuntimeError
            If build() has not been called
        """
        if not self._build_complete:
            raise RuntimeError('Must call build() before transform()')
        
        # Rebuild the library columns for the new data.
        # The scaler fitted in build() is reused below, not refit.
        Phi_columns = []
        
        # Regenerate features in same order
        for name in self._library_names:
            values = self._evaluate_term(X_new, name)
            Phi_columns.append(values)
        
        Phi_new = np.column_stack(Phi_columns)
        Phi_new = self._handle_numerical_issues(Phi_new)
        
        # Apply saved scaler
        if self.normalize and self.scaler is not None:
            const_idx = None
            for i, name in enumerate(self._library_names):
                if name == '[Poly] 1':
                    const_idx = i
                    break
            
            if const_idx is not None:
                non_const_mask = np.ones(Phi_new.shape[1], dtype=bool)
                non_const_mask[const_idx] = False
                if np.sum(non_const_mask) > 0:
                    Phi_new[:, non_const_mask] = self.scaler.transform(
                        Phi_new[:, non_const_mask]
                    )
            else:
                Phi_new = self.scaler.transform(Phi_new)
        
        return Phi_new
    
    def _evaluate_term(
        self,
        X: np.ndarray,
        name: str
    ) -> np.ndarray:
        """
        Evaluate a single term by its name.
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix
        name : str
            Term name with source tag
        
        Returns
        -------
        np.ndarray
            Evaluated values
        """
        n_samples = X.shape[0]
        
        # Constant column: exact match, consistent with the lookup used for normalization
        if name == '[Poly] 1':
            return np.ones(n_samples)
        
        # Extract expression after tag (v4.1: Added [PowLaw])
        for tag in ['[PowLaw] ', '[PySR] ', '[Var] ', '[Poly] ', '[Op] ']:
            if name.startswith(tag):
                expr_str = name[len(tag):]
                break
        else:
            expr_str = name
        
        # Try to evaluate using SymPy
        try:
            local_dict = {n: Symbol(n) for n in self._feature_names}
            local_dict.update({
                'sqrt': sp.sqrt, 'exp': sp.exp, 'log': sp.log,
                'sin': sp.sin, 'cos': sp.cos, 'abs': sp.Abs
            })
            
            expr = sympify(expr_str, locals=local_dict)
            func = lambdify(self._feature_symbols, expr, modules=['numpy'])
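            values = func(*[X[:, i] for i in range(X.shape[1])])
            # A constant expression evaluates to a scalar; broadcast it to a full column
            return np.broadcast_to(np.asarray(values), (n_samples,)).copy()
        except Exception:
            # Fallback: a term that cannot be parsed or evaluated becomes a zero column
            return np.zeros(n_samples)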
    
    def get_library_names(self) -> List[str]:
        """
        Get list of library feature names.
        
        Returns
        -------
        List[str]
            Feature names with source tags
        
        Raises
        ------
        RuntimeError
            If build() has not been called
        """
        if not self._build_complete:
            raise RuntimeError('Must call build() before getting library names')
        return self._library_names.copy()
    
    def get_feature_count(self) -> int:
        """
        Get total number of library features.
        
        Returns
        -------
        int
            Number of features in library
        
        Raises
        ------
        RuntimeError
            If build() has not been called
        """
        if not self._build_complete:
            raise RuntimeError('Must call build() before getting feature count')
        return self._n_library_features
    
    def print_library_summary(self) -> None:
        """
        Print detailed library composition report in v4.1 format.
        """
        if not self._build_complete:
            print('Library not yet built. Call build() first.')
            return
        
        print('=' * 70)
        print('=== Augmented Library Construction (v4.1 Enhanced) ===')
        print('=' * 70)
        print()
        print('Layer Composition:')
        print(f"  Layer 0 (Power-Law):    {self.library_info.get('n_powerlaw_terms', 0):>4} terms")
        print(f"  Layer 1 (PySR Exact):   {self.library_info['n_pysr_terms']:>4} terms")
        print(f"  Layer 2 (Variants):     {self.library_info['n_variant_terms']:>4} terms")
        print(f"  Layer 3 (Polynomial):   {self.library_info['n_poly_terms']:>4} terms")
        print(f"  Layer 4 (Operator):     {self.library_info['n_op_terms']:>4} terms")
        print('-' * 40)
        print(f"  Total:                  {self.library_info['total_terms']:>4} terms")
        print()
        print('Configuration:')
        print(f"  Max polynomial degree: {self.max_poly_degree}")
        print(f"  PySR R2: {self.library_info['pysr_r2']:.4f}")
        print(f"  Lazy mode: {self.library_info['lazy_mode']}")
        if self.library_info.get('estimated_exponents'):
            print(f"  Power-law exponents: {self.library_info['estimated_exponents']}")
        if self.library_info['detected_operators']:
            print(f"  Detected operators: {self.library_info['detected_operators']}")
        print()
        print('Sample Feature Names:')
        # Show up to 3 from each layer
        for tag, layer_name in [('[PowLaw]', 'Power-Law'), ('[PySR]', 'PySR'), 
                                 ('[Var]', 'Variant'), ('[Poly]', 'Polynomial'), 
                                 ('[Op]', 'Operator')]:
            layer_features = [n for n in self._library_names if n.startswith(tag)]
            if layer_features:
                print(f"  {layer_name}:")
                for name in layer_features[:3]:
                    print(f"    {name}")
                if len(layer_features) > 3:
                    print(f"    ... ({len(layer_features) - 3} more)")
        print()
        print('=' * 70)

print('AugmentedLibraryBuilder class defined.')

In [None]:
# ==============================================================================
# BACKWARD COMPATIBILITY ALIAS
# ==============================================================================

# v3.0 class name alias for backward compatibility
FeatureLibraryBuilder = AugmentedLibraryBuilder

print('FeatureLibraryBuilder alias defined for backward compatibility.')

---
## Section 3: Internal Tests
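
As a cross-check on the count expected in Test 3, the number of monomials of total degree at most d in n variables is C(n + d, d). The sketch below enumerates them with the standard library only. `poly_term_names` is a hypothetical helper written for illustration, not the builder's Layer 3 routine, and its `x*y` / `x**2` naming is an assumption that may differ from the names the builder emits.

```python
from itertools import combinations_with_replacement
from math import comb


def poly_term_names(feature_names, max_degree):
    """Enumerate monomial names up to max_degree (hypothetical helper)."""
    names = ['1']  # degree-0 constant term
    for degree in range(1, max_degree + 1):
        # Each multiset of `degree` variables corresponds to one monomial
        for combo in combinations_with_replacement(feature_names, degree):
            factors = []
            for var in dict.fromkeys(combo):  # unique variables, order kept
                power = combo.count(var)
                factors.append(var if power == 1 else f'{var}**{power}')
            names.append('*'.join(factors))
    return names


terms = poly_term_names(['x', 'y', 'z'], max_degree=2)
# Closed form: C(n + d, d) monomials of total degree <= d in n variables
assert len(terms) == comb(3 + 2, 2)
print(len(terms), terms)
```

For three variables and degree 2 this gives 1 constant, 3 linear, and 6 quadratic terms, matching the `1 + 3 + 6` used in Test 3.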

In [None]:
# ==============================================================================
# TEST CONTROL FLAG
# ==============================================================================

_RUN_TESTS = False  # Set to True to run internal tests

if _RUN_TESTS:
    print('=' * 70)
    print(' RUNNING INTERNAL TESTS FOR 05_FeatureLibrary v4.1')
    print('=' * 70)

In [None]:
# ==============================================================================
# TEST 1: Layer 1 - PySR Exact Terms
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header('Test 1: Layer 1 - PySR Exact Terms')
    
    # Generate test data
    np.random.seed(42)
    n_samples = 100
    X = np.random.uniform(0.1, 2.0, (n_samples, 3))
    feature_names = ['x', 'y', 'z']
    
    # Simulate PySR parsed terms: sin(x**2), y*exp(-z), x*y
    x_sym, y_sym, z_sym = Symbol('x'), Symbol('y'), Symbol('z')
    
    term1_expr = sp.sin(x_sym**2)
    term1_func = lambdify([x_sym, y_sym, z_sym], term1_expr, modules=['numpy'])
    
    term2_expr = y_sym * sp.exp(-z_sym)
    term2_func = lambdify([x_sym, y_sym, z_sym], term2_expr, modules=['numpy'])
    
    term3_expr = x_sym * y_sym
    term3_func = lambdify([x_sym, y_sym, z_sym], term3_expr, modules=['numpy'])
    
    parsed_terms = [
        (term1_expr, 'sin(x**2)', term1_func),
        (term2_expr, 'y*exp(-z)', term2_func),
        (term3_expr, 'x*y', term3_func)
    ]
    
    detected_operators = {'sin', 'exp'}
    
    # Build library with only Layer 1 (disable other layers)
    builder = AugmentedLibraryBuilder(
        max_poly_degree=1,  # Minimal polynomial
        generate_variants=False,
        include_operator_terms=False,
        normalize=False
    )
    
    Phi, names, info = builder.build(
        X, feature_names,
        parsed_terms=parsed_terms,
        detected_operators=detected_operators,
        pysr_r2=0.85
    )
    
    print(f'Library shape: {Phi.shape}')
    print(f'Library info: {info}')
    print()
    print('Feature names:')
    for name in names:
        print(f'  {name}')
    
    # Check Layer 1 terms are present
    pysr_names = [n for n in names if n.startswith('[PySR]')]
    print()
    if len(pysr_names) == 3:
        print('[PASS] All 3 PySR exact terms added to library')
    else:
        print(f'[WARNING] Expected 3 PySR terms, got {len(pysr_names)}')

In [None]:
# ==============================================================================
# TEST 2: Layer 2 - Variant Terms Generation
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header('Test 2: Layer 2 - Variant Terms Generation')
    
    # Use same data
    np.random.seed(42)
    n_samples = 100
    X = np.random.uniform(0.1, 2.0, (n_samples, 3))
    feature_names = ['x', 'y', 'z']
    
    # Single PySR term: sin(x**2)
    x_sym, y_sym, z_sym = Symbol('x'), Symbol('y'), Symbol('z')
    term_expr = sp.sin(x_sym**2)
    term_func = lambdify([x_sym, y_sym, z_sym], term_expr, modules=['numpy'])
    
    parsed_terms = [(term_expr, 'sin(x**2)', term_func)]
    
    # Build with variant generation
    builder = AugmentedLibraryBuilder(
        max_poly_degree=1,
        generate_variants=True,
        max_variants_per_term=3,
        include_operator_terms=False,
        normalize=False
    )
    
    Phi, names, info = builder.build(
        X, feature_names,
        parsed_terms=parsed_terms,
        detected_operators=set(),
        pysr_r2=0.70  # Below lazy_variant_threshold, so variants are generated
    )
    
    print(f'Library info: {info}')
    print()
    print('All features:')
    for name in names:
        print(f'  {name}')
    
    # Check variants are generated
    variant_names = [n for n in names if n.startswith('[Var]')]
    print()
    if len(variant_names) > 0:
        print(f'[PASS] Generated {len(variant_names)} variant terms')
        # Check expected variants: sin(y**2), sin(z**2)
        expected_variants = ['sin(y**2)', 'sin(z**2)']
        found = [v for v in expected_variants if any(v in n for n in variant_names)]
        print(f'  Found expected variants: {found}')
    else:
        print('[WARNING] No variant terms generated')

In [None]:
# ==============================================================================
# TEST 3: Layer 3 - Polynomial Baseline
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header('Test 3: Layer 3 - Polynomial Baseline')
    
    np.random.seed(42)
    n_samples = 100
    X = np.random.uniform(0.1, 2.0, (n_samples, 3))
    feature_names = ['x', 'y', 'z']
    
    # Build with polynomial only (no PySR input)
    builder = AugmentedLibraryBuilder(
        max_poly_degree=2,
        generate_variants=False,
        include_operator_terms=False,
        normalize=False
    )
    
    Phi, names, info = builder.build(
        X, feature_names,
        parsed_terms=None,  # No PySR terms
        detected_operators=None
    )
    
    print(f'Library info: {info}')
    print()
    print('Polynomial features:')
    for name in names:
        print(f'  {name}')
    
    # Expected: 1 + 3 linear + 6 quadratic = 10 terms
    expected_count = 1 + 3 + 6
    print()
    if info['n_poly_terms'] == expected_count:
        print(f'[PASS] Correct polynomial count: {expected_count}')
    else:
        print(f'[WARNING] Expected {expected_count} polynomial terms, got {info["n_poly_terms"]}')

In [None]:
# ==============================================================================
# TEST 4: Layer 4 - Operator-Guided Terms
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header('Test 4: Layer 4 - Operator-Guided Terms')
    
    np.random.seed(42)
    n_samples = 100
    X = np.random.uniform(0.1, 2.0, (n_samples, 3))
    feature_names = ['x', 'y', 'z']
    
    # Detected operators
    detected_operators = {'sin', 'cos', 'exp'}
    
    # Build with operator terms
    builder = AugmentedLibraryBuilder(
        max_poly_degree=1,
        generate_variants=False,
        include_operator_terms=True,
        normalize=False
    )
    
    Phi, names, info = builder.build(
        X, feature_names,
        parsed_terms=None,
        detected_operators=detected_operators
    )
    
    print(f'Library info: {info}')
    print()
    print('Operator-guided features:')
    op_names = [n for n in names if n.startswith('[Op]')]
    for name in op_names:
        print(f'  {name}')
    
    # Expected: sin(x,y,z) + cos(x,y,z) + exp(x,-x,y,-y,z,-z) = 3 + 3 + 6 = 12
    print()
    if info['n_op_terms'] > 0:
        print(f'[PASS] Generated {info["n_op_terms"]} operator-guided terms')
    else:
        print('[WARNING] No operator-guided terms generated')

In [None]:
# ==============================================================================
# TEST 5: Full Library Construction (Layers 1-4; Layer 0 not exercised)
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header('Test 5: Full Library Construction (Layers 1-4)')
    
    np.random.seed(42)
    n_samples = 100
    X = np.random.uniform(0.1, 2.0, (n_samples, 3))
    feature_names = ['x', 'y', 'z']
    
    # Simulate complete PySR output
    x_sym, y_sym, z_sym = Symbol('x'), Symbol('y'), Symbol('z')
    
    term1 = sp.sin(x_sym**2)
    term2 = y_sym * sp.exp(-z_sym)
    
    parsed_terms = [
        (term1, 'sin(x**2)', lambdify([x_sym, y_sym, z_sym], term1, modules=['numpy'])),
        (term2, 'y*exp(-z)', lambdify([x_sym, y_sym, z_sym], term2, modules=['numpy']))
    ]
    
    detected_operators = {'sin', 'exp'}
    
    # Full build
    builder = AugmentedLibraryBuilder(
        max_poly_degree=2,
        generate_variants=True,
        max_variants_per_term=2,
        include_operator_terms=True,
        normalize=True
    )
    
    Phi, names, info = builder.build(
        X, feature_names,
        parsed_terms=parsed_terms,
        detected_operators=detected_operators,
        pysr_r2=0.80
    )
    
    # Print full summary
    builder.print_library_summary()
    
    # Verification
    print()
    print('Verification:')
    all_checks_pass = True
    
    if info['n_pysr_terms'] > 0:
        print(f'  [PASS] Layer 1: {info["n_pysr_terms"]} PySR terms')
    else:
        print('  [WARNING] Layer 1: No PySR terms')
        all_checks_pass = False
    
    if info['n_variant_terms'] > 0:
        print(f'  [PASS] Layer 2: {info["n_variant_terms"]} variant terms')
    else:
        print('  [INFO] Layer 2: No variant terms (may be expected)')
    
    if info['n_poly_terms'] > 0:
        print(f'  [PASS] Layer 3: {info["n_poly_terms"]} polynomial terms')
    else:
        print('  [WARNING] Layer 3: No polynomial terms')
        all_checks_pass = False
    
    if info['n_op_terms'] > 0:
        print(f'  [PASS] Layer 4: {info["n_op_terms"]} operator terms')
    else:
        print('  [INFO] Layer 4: No operator terms')
    
    # Check for NaN/Inf values
    has_nan = np.any(np.isnan(Phi))
    has_inf = np.any(np.isinf(Phi))
    if not has_nan and not has_inf:
        print('  [PASS] No NaN/Inf values')
    else:
        print('  [WARNING] Contains invalid values')
        all_checks_pass = False
    
    print()
    if all_checks_pass:
        print('All critical checks passed!')
    else:
        print('Some checks failed - review output above.')

In [None]:
# ==============================================================================
# TEST 6: Lazy Variant Generation (High PySR R2)
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header('Test 6: Lazy Variant Generation (High PySR R2)')
    
    np.random.seed(42)
    n_samples = 100
    X = np.random.uniform(0.1, 2.0, (n_samples, 3))
    feature_names = ['x', 'y', 'z']
    
    x_sym, y_sym, z_sym = Symbol('x'), Symbol('y'), Symbol('z')
    term = sp.sin(x_sym**2)
    parsed_terms = [(term, 'sin(x**2)', lambdify([x_sym, y_sym, z_sym], term, modules=['numpy']))]
    
    # High R2 should skip variants
    builder = AugmentedLibraryBuilder(
        max_poly_degree=1,
        generate_variants=True,  # Enabled but should be skipped
        lazy_variant_threshold=0.95,
        include_operator_terms=False,
        normalize=False
    )
    
    _, _, info = builder.build(
        X, feature_names,
        parsed_terms=parsed_terms,
        pysr_r2=0.98  # High R2 triggers lazy mode
    )
    
    print('PySR R2: 0.98')
    print(f'Lazy mode: {info["lazy_mode"]}')
    print(f'Variant terms: {info["n_variant_terms"]}')
    print()
    
    if info['lazy_mode'] and info['n_variant_terms'] == 0:
        print('[PASS] Lazy mode correctly skipped variant generation')
    else:
        print('[WARNING] Lazy mode did not work as expected')

---
## Section 4: Module Summary
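
The `transform()` method listed in this summary reuses the scaler fitted during `build()` and leaves the `[Poly] 1` column unscaled. The following is a minimal NumPy-only sketch of that fit-then-reuse pattern. `fit_scale` and `apply_scale` are hypothetical names, the column names are illustrative, and the hand-rolled mean/std stands in for scikit-learn's `StandardScaler` (population standard deviation, zero-variance columns left at scale 1), so read it as an illustration under those assumptions rather than the class's own code.

```python
import numpy as np


def fit_scale(Phi, names, const_name='[Poly] 1'):
    """Fit column statistics on every column except the constant one."""
    mask = np.array([n != const_name for n in names])
    mean = Phi[:, mask].mean(axis=0)
    std = Phi[:, mask].std(axis=0)  # population std (ddof=0)
    std[std == 0.0] = 1.0           # avoid division by zero
    return mask, mean, std


def apply_scale(Phi, mask, mean, std):
    """Scale non-constant columns with previously fitted statistics."""
    out = Phi.astype(float)  # astype returns a copy; the input is not modified
    out[:, mask] = (out[:, mask] - mean) / std
    return out


names = ['[Poly] 1', '[Poly] x', '[PowLaw] 1/x']  # illustrative names
rng = np.random.default_rng(0)
x_train = rng.uniform(0.5, 2.0, 50)
x_new = rng.uniform(0.5, 2.0, 5)

Phi_train = np.column_stack([np.ones(50), x_train, 1.0 / x_train])
Phi_new = np.column_stack([np.ones(5), x_new, 1.0 / x_new])

mask, mean, std = fit_scale(Phi_train, names)
Z_train = apply_scale(Phi_train, mask, mean, std)
Z_new = apply_scale(Phi_new, mask, mean, std)  # training statistics reused
```

Scaling new data with the training statistics keeps coefficients fitted on `Z_train` meaningful for `Z_new`, and the untouched column of ones continues to act as the intercept.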

In [None]:
# ==============================================================================
# MODULE SUMMARY
# ==============================================================================

print('=' * 70)
print(' 05_FeatureLibrary.ipynb v4.1 - Module Summary')
print('=' * 70)
print()
print('CLASS: AugmentedLibraryBuilder')
print('-' * 70)
print()
print('Purpose:')
print('  Build 5-layer augmented feature library combining physics-guided terms,')
print('  PySR discoveries, and baseline polynomial features for E-WSINDy.')
print()
print('5-Layer Architecture:')
print('  Layer 0 [PowLaw]: Power-law guided terms from Stage 1 symmetry (HIGHEST)')
print('  Layer 1 [PySR]:   Exact terms from PySR Pareto front')
print('  Layer 2 [Var]:    Variant terms via variable substitution')
print('  Layer 3 [Poly]:   Polynomial baseline (always included)')
print('  Layer 4 [Op]:     Operator-guided simple terms')
print()
print('Main Methods:')
print('  build(X, feature_names, parsed_terms, detected_operators, pysr_r2, estimated_exponents)')
print('      Build complete augmented library')
print('      Returns: (Phi_aug, library_names, library_info)')
print()
print('  build_from_pysr_results(X, feature_names, stage2_partial)')
print('      Build from partial Stage2Results object')
print()
print('  transform(X_new)')
print('      Transform new data using fitted library structure')
print()
print('  print_library_summary()')
print('      Print detailed library composition report')
print()
print('Key Parameters:')
print('  max_poly_degree: Maximum polynomial degree (default: 3)')
print('  generate_variants: Enable Layer 2 variants (default: True)')
print('  include_operator_terms: Enable Layer 4 (default: True)')
print('  lazy_variant_threshold: Skip variants if R2 >= threshold (default: 0.95)')
print()
print('Output Dictionary (library_info):')
print('  n_powerlaw_terms, n_pysr_terms, n_variant_terms, n_poly_terms, n_op_terms')
print('  total_terms, pysr_r2, lazy_mode, detected_operators, estimated_exponents')
print()
print('Usage Example:')
print('-' * 70)
print("""
# Create builder
builder = AugmentedLibraryBuilder(
    max_poly_degree=3,
    generate_variants=True
)

# Build library with power-law guidance from Stage 1
Phi, names, info = builder.build(
    X, feature_names,
    parsed_terms=[(expr, 'sin(x**2)', func), ...],
    detected_operators={'sin', 'exp'},
    pysr_r2=0.85,
    estimated_exponents={'q1': 1.0, 'r': -2.0}  # From Stage 1 symmetry
)

# Print summary
builder.print_library_summary()
""")
print()
print('Backward Compatibility:')
print('  FeatureLibraryBuilder = AugmentedLibraryBuilder (alias)')
print()
print('=' * 70)
print('Module loaded successfully. Import via: %run 05_FeatureLibrary.ipynb')
print('=' * 70)