# 07_EWSINDy_STLSQ - Physics-SR Framework v4.1

## Stage 2.4: E-WSINDy Sparse Selection on Augmented Library

**Author:** Zhengze Zhang  
**Affiliation:** Department of Statistics, Columbia University  
**Contact:** zz3239@columbia.edu  
**Date:** January 2026  
**Version:** 4.1 (Structure-Guided Feature Library Enhancement + Computational Optimization)

---

### Purpose

Noise-robust equation discovery via weak-form sparse regression on an augmented library.
This module is **MODIFIED** in v4.1 to add source attribution for the selected terms.

### v4.1 Modifications

| Feature | v3.0 | v4.1 |
|---------|------|------|
| Input library | Standard | Augmented with source tags |
| Source tracking | None | analyze_selection_sources() |
| Output | Basic | + selection_analysis |
| Column normalization | Fixed | normalize_columns parameter |
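
The source-tag bookkeeping in the table can be pictured with a small standalone sketch. It assumes only that each library name starts with one of the tags `[PySR]`, `[Var]`, `[Poly]`, `[Op]`; the helper name `count_sources` and the sample names are hypothetical and are not part of the module.

```python
# Tag prefix -> result key, mirroring the keys reported in selection_analysis.
TAG_TO_KEY = {
    "[PySR]": "from_pysr",
    "[Var]": "from_variant",
    "[Poly]": "from_poly",
    "[Op]": "from_op",
}

def count_sources(names, support):
    """Count selected terms per source tag (hypothetical helper, not the module API)."""
    counts = {key: 0 for key in TAG_TO_KEY.values()}
    counts["from_unknown"] = 0
    counts["total_selected"] = 0
    for name, selected in zip(names, support):
        if not selected:
            continue
        counts["total_selected"] += 1
        for tag, key in TAG_TO_KEY.items():
            if name.startswith(tag):
                counts[key] += 1
                break
        else:
            counts["from_unknown"] += 1  # selected term without a recognized tag
    return counts

names = ["[PySR] sin(z)", "[Var] sin(x)", "[Poly] x*z", "[Op] cos(z)", "x^3"]
support = [True, False, True, False, True]
print(count_sources(names, support))
# {'from_pysr': 1, 'from_variant': 0, 'from_poly': 1, 'from_op': 0,
#  'from_unknown': 1, 'total_selected': 3}
```

Untagged names (such as `x^3` here) fall into `from_unknown`, which is also how a standard v3.0-style library without tags would be reported.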

### Weak Form Theory

**Strong form** (noise-sensitive):
$$\frac{\partial q}{\partial t} = f(q, \nabla q, \nabla^2 q)$$

**Weak form** (noise-robust): Multiply by test function $\psi$ and integrate by parts:
$$\int \psi \cdot \nabla^2 q \, dx = -\int \nabla\psi \cdot \nabla q \, dx + \text{boundary terms}$$

**Result:** Derivatives transferred from noisy data $q$ to smooth test function $\psi$.
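
In the time domain the same identity reads $\int \psi \, \frac{dq}{dt} \, dt = -\int \frac{d\psi}{dt} \, q \, dt$ whenever $\psi$ vanishes at both ends of the interval. The sketch below checks this numerically for a compactly supported polynomial bump. The grid size, bump center, and width are arbitrary choices made for illustration, and the trapezoid rule is written out by hand so that the example does not depend on a particular NumPy version.

```python
import numpy as np

def trapezoid(f, t):
    """Composite trapezoid rule, written out explicitly."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

t = np.linspace(0.0, 1.0, 2001)
q = np.sin(2 * np.pi * t)                  # smooth stand-in for the data
dq = 2 * np.pi * np.cos(2 * np.pi * t)     # its exact time derivative

# Compactly supported bump: psi = (1 - z^2)^4 for |z| < 1, zero elsewhere.
center, width = 0.4, 0.2
z = (t - center) / width
inside = np.abs(z) < 1
psi = np.where(inside, (1 - z**2) ** 4, 0.0)
dpsi = np.where(inside, -8 * z / width * (1 - z**2) ** 3, 0.0)

strong_side = trapezoid(psi * dq, t)       # requires dq/dt
weak_side = -trapezoid(dpsi * q, t)        # requires only q itself

print(f"integral(psi * dq)  = {strong_side:.6f}")
print(f"-integral(dpsi * q) = {weak_side:.6f}")
```

The two numbers agree up to quadrature error, and only the second one can be evaluated without differentiating the data.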

### Key Properties of v4.0/v4.1 Design

1. **Can KEEP correct PySR terms:** If PySR found sin(x^2) and the term is correct, E-WSINDy can retain it
2. **Can DISCOVER missed terms:** If PySR missed x*z, E-WSINDy can find it in the polynomial layer
3. **Can REJECT errors:** If PySR included a spurious term, E-WSINDy's sparsity threshold can exclude it
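
The three properties above can be illustrated on synthetic data. The following is a minimal sketch of sequentially thresholded least squares on a hand-built library; the function `stlsq`, the data, and the library columns are assumptions made for this example, and the code is a simplified stand-in rather than the class defined in this notebook.

```python
import numpy as np

def stlsq(Q, b, threshold=0.1, max_iter=20):
    """Minimal sequentially thresholded least squares (illustrative sketch)."""
    xi = np.linalg.lstsq(Q, b, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # exact zeros for small coefficients
        active = ~small
        if not active.any():
            break
        refit = np.zeros_like(xi)
        refit[active] = np.linalg.lstsq(Q[:, active], b, rcond=None)[0]
        converged = np.allclose(xi, refit, rtol=1e-6)
        xi = refit
        if converged:
            break
    return xi

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 2.0, 400)
z = rng.uniform(0.1, 2.0, 400)
# Ground truth: one term a symbolic search might have found, one it missed.
y = np.sin(x**2) + 0.5 * x * z + 0.01 * rng.standard_normal(400)

names = ["[PySR] sin(x^2)", "[PySR] exp(-z)", "[Poly] x", "[Poly] z", "[Poly] x*z"]
Phi = np.column_stack([np.sin(x**2), np.exp(-z), x, z, x * z])

xi = stlsq(Phi, y)
for name, coef in zip(names, xi):
    print(f"{name:18s} {coef: .3f}")
```

In this setup `[PySR] sin(x^2)` is kept, `[Poly] x*z` is picked up from the polynomial layer, and the spurious `[PySR] exp(-z)` is set exactly to zero.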

### Reference

- Messenger, D. A., & Bortz, D. M. (2021). Weak SINDy for partial differential equations. *Journal of Computational Physics*, 443, 110525.
- Framework v4.0/v4.1 Section 4.4: E-WSINDy Sparse Selection

---
## Section 1: Header and Imports

In [None]:
"""
07_EWSINDy_STLSQ.ipynb - E-WSINDy with STLSQ on Augmented Library
===================================================================

Three-Stage Physics-Informed Symbolic Regression Framework v4.1

This module provides:
- EWSINDySTLSQ: Weak-form SINDy with STLSQ sparse regression
- Source attribution via analyze_selection_sources()
- 50-1000x noise robustness improvement over finite differences
- Exact sparsity (true zeros) via iterative thresholding

v4.1 Key Changes from v3.0:
- Now accepts library_names with source tags [PySR], [Var], [Poly], [Op]
- New method: analyze_selection_sources()
- Returns selection_analysis in output dictionary
- New parameter: normalize_columns (default: True)

Output Format:
- coefficients, support, equation, r_squared (same as v3.0)
- selection_analysis: Dict with from_pysr, from_variant, from_poly, from_op, from_unknown, total_selected counts

Author: Zhengze Zhang
Affiliation: Department of Statistics, Columbia University
Contact: zz3239@columbia.edu
"""

# Import core module
%run 00_Core.ipynb

In [None]:
# Additional imports for E-WSINDy
from scipy import integrate
from scipy.interpolate import interp1d
from sklearn.linear_model import Lasso, Ridge
from typing import Dict, List, Tuple, Optional, Any

print("07_EWSINDy_STLSQ v4.1: Additional imports successful.")

---
## Section 2: Class Definition

In [None]:
# ==============================================================================
# E-WSINDY STLSQ CLASS (v4.1 MODIFIED)
# ==============================================================================

class EWSINDySTLSQ:
    """
    E-WSINDy with STLSQ: Weak-form Sparse Regression (v4.1 Modified).
    
    Now operates on augmented library from v4.0 and includes
    source attribution for selected terms.
    
    Provides 50-1000x noise improvement over strong-form methods.
    
    Attributes
    ----------
    threshold : float
        STLSQ sparsity threshold (default: 0.1)
    max_iter : int
        Maximum STLSQ iterations (default: 20)
    n_test_functions : int
        Number of test functions for weak form (default: 50)
    test_function_type : str
        Type of test function: 'gaussian' or 'polynomial'
    test_function_width : float
        Width parameter for test functions (default: 0.1)
    use_weak_form : bool
        Whether to use weak form transformation (default: True)
    normalize_columns : bool
        Whether to normalize feature columns (default: True, v4.1)
    
    Methods
    -------
    fit(feature_library, y, t=None, library_names=None) -> Dict
        Fit E-WSINDy model using STLSQ
    analyze_selection_sources(support, library_names) -> Dict
        Analyze where selected terms originated (v4.1)
    get_equation() -> str
        Get string representation of equation
    predict(Phi_new) -> np.ndarray
        Make predictions
    
    Reference
    ---------
    Messenger & Bortz (2021). Multiscale Modeling & Simulation.
    Framework v4.0/v4.1 Section 4.4: E-WSINDy Sparse Selection
    
    Examples
    --------
    >>> model = EWSINDySTLSQ(threshold=0.1)
    >>> result = model.fit(Phi_aug, y, library_names=names)
    >>> print(f"Selection analysis: {result['selection_analysis']}")
    """
    
    def __init__(
        self,
        threshold: float = DEFAULT_STLSQ_THRESHOLD,
        max_iter: int = DEFAULT_STLSQ_MAX_ITER,
        n_test_functions: int = 50,
        test_function_type: str = 'gaussian',
        test_function_width: float = 0.1,
        use_weak_form: bool = True,
        normalize_columns: bool = True
    ):
        """
        Initialize EWSINDySTLSQ.
        
        Parameters
        ----------
        threshold : float
            Coefficients with |value| < threshold are set to zero.
            Default: 0.1
        max_iter : int
            Maximum number of STLSQ iterations.
            Default: 20
        n_test_functions : int
            Number of test functions for weak form.
            Default: 50
        test_function_type : str
            'gaussian' for Gaussian bumps, 'polynomial' for polynomial.
            Default: 'gaussian'
        test_function_width : float
            Width of Gaussian bumps (as fraction of domain).
            Default: 0.1
        use_weak_form : bool
            Whether to use weak form (True) or standard form (False).
            Default: True
        normalize_columns : bool
            Whether to normalize feature library columns (v4.1).
            Default: True
        """
        self.threshold = threshold
        self.max_iter = max_iter
        self.n_test_functions = n_test_functions
        self.test_function_type = test_function_type
        self.test_function_width = test_function_width
        self.use_weak_form = use_weak_form
        self.normalize_columns = normalize_columns
        
        # Internal state
        self._coefficients = None
        self._support = None
        self._library_names = None
        self._n_features = None
        self._n_iterations = 0
        self._convergence_history = []
        self._fit_complete = False
        self._r2_score = None
        self._mse = None
        self._selection_analysis = None
        self._column_scales = None
    
    def fit(
        self,
        feature_library: np.ndarray,
        y: np.ndarray,
        t: np.ndarray = None,
        library_names: List[str] = None
    ) -> Dict[str, Any]:
        """
        Fit E-WSINDy model using STLSQ.
        
        Parameters
        ----------
        feature_library : np.ndarray
            Feature library (augmented or standard)
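        y : np.ndarray
            Target vector. With use_weak_form=True it is treated as the state
            whose time derivative is modeled (b = -integral(dpsi * y) dt);
            with use_weak_form=False it is regressed on directly.
        t : np.ndarray, optional
            Time vector for weak form (if None, a uniform grid on [0, 1] is used)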
        library_names : List[str], optional
            Feature names with source tags for attribution
            
        Returns
        -------
        Dict
            - coefficients: Sparse coefficient vector
            - support: Boolean mask of active terms
            - equation: Formatted equation string
            - selection_analysis: Source attribution (v4.1)
            - r_squared: Coefficient of determination
            - n_iterations: Convergence iterations
            - weak_form_Q: Weak-form feature matrix (if used)
            - weak_form_b: Weak-form target vector (if used)
        """
        n_samples, n_features = feature_library.shape
        self._n_features = n_features
        
        # Set feature names
        if library_names is None:
            self._library_names = [f'f{i}' for i in range(n_features)]
        else:
            self._library_names = list(library_names)
        
        # Generate time vector if not provided
        if t is None:
            t = self._generate_time_vector(n_samples)
        
        # Normalize columns if requested (v4.1)
        Phi_normalized = feature_library.copy()
        if self.normalize_columns:
            Phi_normalized, self._column_scales = self._normalize_library(feature_library)
        else:
            self._column_scales = np.ones(n_features)
        
        # Apply weak form transformation if enabled
        if self.use_weak_form:
            Q, b = self._weak_form_transform(Phi_normalized, y, t)
        else:
            # Standard form: direct regression
            Q = Phi_normalized
            b = y
        
        # Run STLSQ
        self._coefficients = self._stlsq_iteration(Q, b)
        
        # Rescale coefficients if normalized
        if self.normalize_columns:
            self._coefficients = self._coefficients / self._column_scales
        
        self._support = np.abs(self._coefficients) > 0
        
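        # Compute metrics on original scale. Note: the prediction is compared
        # against y itself, also when the weak form (which models dy/dt) is used.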
        y_pred = feature_library @ self._coefficients
        self._mse = np.mean((y - y_pred)**2)
        ss_tot = np.sum((y - np.mean(y))**2)
        ss_res = np.sum((y - y_pred)**2)
        self._r2_score = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
        
        # Analyze selection sources (v4.1)
        self._selection_analysis = self.analyze_selection_sources(
            self._support, self._library_names
        )
        
        self._fit_complete = True
        
        result = {
            'coefficients': self._coefficients,
            'support': self._support,
            'equation': self.get_equation(),
            'selection_analysis': self._selection_analysis,
            'n_active_terms': int(np.sum(self._support)),
            'n_iterations': self._n_iterations,
            'r_squared': self._r2_score,
            'r2_score': self._r2_score,  # Alias for compatibility
            'mse': self._mse,
            'convergence_history': self._convergence_history,
            'threshold': self.threshold
        }
        
        # Add weak form matrices if used
        if self.use_weak_form:
            result['weak_form_Q'] = Q
            result['weak_form_b'] = b
        
        return result
    
    def analyze_selection_sources(
        self,
        support: np.ndarray,
        library_names: List[str]
    ) -> Dict[str, int]:
        """
        Analyze where selected terms originated (v4.1).
        
        Parameters
        ----------
        support : np.ndarray
            Boolean mask of selected terms
        library_names : List[str]
            Feature names with source tags
            
        Returns
        -------
        Dict[str, int]
            - from_pysr: Count of [PySR] terms selected
            - from_variant: Count of [Var] terms selected
            - from_poly: Count of [Poly] terms selected
            - from_op: Count of [Op] terms selected
            - from_unknown: Count of selected terms without a recognized tag
            - total_selected: Total selected terms
        """
        sources = {
            'from_pysr': 0,
            'from_variant': 0,
            'from_poly': 0,
            'from_op': 0,
            'from_unknown': 0,
            'total_selected': 0
        }
        
        selected_indices = np.where(support)[0]
        sources['total_selected'] = len(selected_indices)
        
        for idx in selected_indices:
            if idx >= len(library_names):
                sources['from_unknown'] += 1
                continue
                
            name = library_names[idx]
            if name.startswith('[PySR]'):
                sources['from_pysr'] += 1
            elif name.startswith('[Var]'):
                sources['from_variant'] += 1
            elif name.startswith('[Poly]'):
                sources['from_poly'] += 1
            elif name.startswith('[Op]'):
                sources['from_op'] += 1
            else:
                sources['from_unknown'] += 1
        
        return sources
    
    def _normalize_library(
        self,
        Phi: np.ndarray
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Normalize feature library columns to unit variance.
        
        Parameters
        ----------
        Phi : np.ndarray
            Feature library matrix
            
        Returns
        -------
        Tuple[np.ndarray, np.ndarray]
            (normalized_Phi, column_scales)
        """
        scales = np.std(Phi, axis=0)
        scales[scales < 1e-10] = 1.0  # Avoid division by zero
        
        normalized = Phi / scales
        return normalized, scales
    
    def _generate_time_vector(
        self,
        n_samples: int
    ) -> np.ndarray:
        """
        Generate uniform time vector.
        
        Parameters
        ----------
        n_samples : int
            Number of samples
        
        Returns
        -------
        np.ndarray
            Time vector from 0 to 1
        """
        return np.linspace(0, 1, n_samples)
    
    def _generate_test_functions(
        self,
        t: np.ndarray
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Generate test functions and their derivatives.
        
        Parameters
        ----------
        t : np.ndarray
            Time vector
        
        Returns
        -------
        Tuple[np.ndarray, np.ndarray]
            - psi: Test functions of shape (n_test, n_samples)
            - dpsi: Derivatives of shape (n_test, n_samples)
        """
        n_samples = len(t)
        t_min, t_max = t.min(), t.max()
        t_range = t_max - t_min
        
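        # Centers for test functions, inset by 10% of the range from each boundary.
        # Note: a Gaussian bump is not compactly supported. With the default width
        # (0.1 of the range) the outermost bumps still equal exp(-0.5) ~ 0.61 at the
        # boundary, so the boundary terms dropped in the integration by parts are
        # not negligible there; 'polynomial' bumps of this width vanish at the boundary.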
        centers = np.linspace(
            t_min + 0.1 * t_range,
            t_max - 0.1 * t_range,
            self.n_test_functions
        )
        
        width = self.test_function_width * t_range
        
        psi = np.zeros((self.n_test_functions, n_samples))
        dpsi = np.zeros((self.n_test_functions, n_samples))
        
        for m, center in enumerate(centers):
            if self.test_function_type == 'gaussian':
                psi[m], dpsi[m] = self._gaussian_bump(t, center, width)
            else:
                psi[m], dpsi[m] = self._polynomial_bump(t, center, width)
        
        return psi, dpsi
    
    def _gaussian_bump(
        self,
        t: np.ndarray,
        center: float,
        width: float
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Generate Gaussian bump test function.
        
        psi(t) = exp(-(t - center)^2 / (2 * width^2))
        dpsi(t) = -(t - center) / width^2 * psi(t)
        
        Parameters
        ----------
        t : np.ndarray
            Time vector
        center : float
            Center of Gaussian
        width : float
            Width (standard deviation)
        
        Returns
        -------
        Tuple[np.ndarray, np.ndarray]
            (psi, dpsi)
        """
        z = (t - center) / width
        psi = np.exp(-0.5 * z**2)
        dpsi = -z / width * psi
        return psi, dpsi
    
    def _polynomial_bump(
        self,
        t: np.ndarray,
        center: float,
        width: float
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Generate polynomial bump test function.
        
        Uses (1 - ((t-center)/width)^2)^4 for compact support.
        
        Parameters
        ----------
        t : np.ndarray
            Time vector
        center : float
            Center of bump
        width : float
            Half-width of support
        
        Returns
        -------
        Tuple[np.ndarray, np.ndarray]
            (psi, dpsi)
        """
        z = (t - center) / width
        mask = np.abs(z) < 1
        
        psi = np.zeros_like(t)
        dpsi = np.zeros_like(t)
        
        psi[mask] = (1 - z[mask]**2)**4
        dpsi[mask] = -8 * z[mask] / width * (1 - z[mask]**2)**3
        
        return psi, dpsi
    
    def _weak_form_transform(
        self,
        Phi: np.ndarray,
        y: np.ndarray,
        t: np.ndarray
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Apply weak form transformation.
        
        Q[m,k] = integral(psi_m * Phi_k) dt
        b[m] = -integral(dpsi_m * y) dt  (integration by parts)
        
        Parameters
        ----------
        Phi : np.ndarray
            Feature library (n_samples, n_features)
        y : np.ndarray
            Target vector (n_samples,)
        t : np.ndarray
            Time vector (n_samples,)
        
        Returns
        -------
        Tuple[np.ndarray, np.ndarray]
            (Q, b) - Weak form matrices
        """
        n_samples, n_features = Phi.shape
        
        # Generate test functions
        psi, dpsi = self._generate_test_functions(t)
        
        # Compute weak form matrices via trapezoidal quadrature.
        # Explicit weights avoid np.trapz, which is deprecated since NumPy 2.0.
        dt = np.diff(t)
        w = np.zeros(n_samples)
        w[:-1] += 0.5 * dt
        w[1:] += 0.5 * dt
        
        # Q[m, k] = integral(psi_m * Phi_k) dt
        Q = (psi * w) @ Phi
        
        # b[m] = -integral(dpsi_m * y) dt (integration by parts)
        b = -(dpsi * w) @ y
        
        return Q, b
    
    def _stlsq_iteration(
        self,
        Q: np.ndarray,
        b: np.ndarray
    ) -> np.ndarray:
        """
        Sequentially Thresholded Least Squares (STLSQ).
        
        Algorithm:
            1. Initialize with OLS solution
            2. Threshold small coefficients to zero
            3. Refit OLS on remaining support
            4. Repeat until convergence
        
        Parameters
        ----------
        Q : np.ndarray
            Design matrix (n_equations, n_features)
        b : np.ndarray
            Target vector (n_equations,)
        
        Returns
        -------
        np.ndarray
            Sparse coefficient vector
        """
        n_features = Q.shape[1]
        self._convergence_history = []
        
        # Step 1: Initialize with regularized OLS (for stability)
        try:
            ridge = Ridge(alpha=1e-6, fit_intercept=False)
            ridge.fit(Q, b)
            xi = ridge.coef_
        except Exception:
            xi = np.linalg.lstsq(Q, b, rcond=None)[0]
        
        self._convergence_history.append(xi.copy())
        
        # Step 2-4: Iterative thresholding
        for iteration in range(self.max_iter):
            self._n_iterations = iteration + 1
            
            # Threshold small coefficients
            small_mask = np.abs(xi) < self.threshold
            xi[small_mask] = 0
            
            # Get active indices
            active = ~small_mask
            
            # Check if any coefficients remain
            if not np.any(active):
                break
            
            # Refit on active support
            Q_active = Q[:, active]
            try:
                xi_active = np.linalg.lstsq(Q_active, b, rcond=None)[0]
            except Exception:
                break
            
            # Update full coefficient vector
            xi_new = np.zeros(n_features)
            xi_new[active] = xi_active
            
            self._convergence_history.append(xi_new.copy())
            
            # Check convergence
            if np.allclose(xi, xi_new, rtol=1e-6):
                xi = xi_new
                break
            
            xi = xi_new
        
        return xi
    
    def get_equation(self) -> str:
        """
        Get string representation of discovered equation.
        
        Returns
        -------
        str
            Equation string with source tags
        """
        if not self._fit_complete:
            return ""
        
        terms = []
        for i, (coef, active) in enumerate(zip(self._coefficients, self._support)):
            if active:
                name = self._library_names[i]
                if abs(coef) > 0.001:
                    terms.append(f"{coef:.3f} * {name}")
        
        if len(terms) == 0:
            return "0"
        
        return " + ".join(terms)
    
    def get_active_terms(self) -> List[Tuple[str, float]]:
        """
        Get list of active terms with coefficients.
        
        Returns
        -------
        List[Tuple[str, float]]
            List of (term_name, coefficient) pairs
        """
        if not self._fit_complete:
            return []
        
        active_terms = []
        for i, (coef, active) in enumerate(zip(self._coefficients, self._support)):
            if active:
                active_terms.append((self._library_names[i], coef))
        
        return active_terms
    
    def predict(self, Phi_new: np.ndarray) -> np.ndarray:
        """
        Make predictions using discovered equation.
        
        Parameters
        ----------
        Phi_new : np.ndarray
            New feature library matrix
        
        Returns
        -------
        np.ndarray
            Predictions
        """
        if not self._fit_complete:
            raise RuntimeError("Must call fit() before predict()")
        
        return Phi_new @ self._coefficients
    
    def print_stlsq_report(self) -> None:
        """
        Print detailed STLSQ results report in v4.1 format.
        """
        if not self._fit_complete:
            print("Fit not yet performed. Run fit() first.")
            return
        
        print("=" * 70)
        print("=== E-WSINDy Results (v4.1) ===")
        print("=" * 70)
        print()
        print(f"Selected terms: {np.sum(self._support)}")
        print()
        
        # Print active terms with source tags
        for name, coef in self.get_active_terms():
            print(f"  {coef:8.3f} * {name}")
        print()
        
        # Print selection analysis
        print("Selection Analysis:")
        print(f"  from_pysr: {self._selection_analysis['from_pysr']}")
        print(f"  from_variant: {self._selection_analysis['from_variant']}")
        print(f"  from_poly: {self._selection_analysis['from_poly']}")
        print(f"  from_op: {self._selection_analysis['from_op']}")
        print(f"  from_unknown: {self._selection_analysis['from_unknown']}")
        print()
        
        print(f"R-squared: {self._r2_score:.4f}")
        print(f"MSE: {self._mse:.6f}")
        print(f"Iterations: {self._n_iterations}")
        print(f"Threshold: {self.threshold}")
        print()
        print("=" * 70)

print("EWSINDySTLSQ class v4.1 defined.")
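
The interaction between `normalize_columns` and `threshold` can be seen in isolation. The sketch below is a standalone illustration under simplifying assumptions (noise-free data, a single least-squares pass instead of the full STLSQ loop, made-up variable names); it does not call the class.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 300)
big = 1000.0 * rng.uniform(-1.0, 1.0, 300)   # feature recorded in large units
y = 2.0 * x + 0.002 * big                     # true coefficients: 2.0 and 0.002

Phi = np.column_stack([x, big])
threshold = 0.1

# Without normalization the threshold acts on the raw coefficients,
# so the 0.002 coefficient falls below it.
xi_raw = np.linalg.lstsq(Phi, y, rcond=None)[0]
kept_raw = np.abs(xi_raw) >= threshold

# With unit-variance columns the threshold acts on coef * std(column).
scales = Phi.std(axis=0)
xi_norm = np.linalg.lstsq(Phi / scales, y, rcond=None)[0]
kept_norm = np.abs(xi_norm) >= threshold

# Dividing by the column scales returns the coefficients to original units.
xi_back = xi_norm / scales

print("kept without normalization:", kept_raw)
print("kept with normalization:   ", kept_norm)
print("rescaled coefficients:     ", xi_back)
```

Both terms contribute equally to `y` here, yet only the normalized fit keeps both, which is the motivation for thresholding in the normalized scale and rescaling afterwards.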

---
## Section 3: Internal Tests

In [None]:
# ==============================================================================
# TEST CONTROL FLAG
# ==============================================================================

_RUN_TESTS = False  # Set to True to run internal tests

if _RUN_TESTS:
    print("=" * 70)
    print(" RUNNING INTERNAL TESTS FOR 07_EWSINDy_STLSQ v4.1")
    print("=" * 70)

In [None]:
# ==============================================================================
# TEST 1: Basic STLSQ with Standard Library
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 1: Basic STLSQ with Standard Library")
    
    np.random.seed(42)
    n_samples = 200
    
    x = np.random.uniform(0.1, 2, n_samples)
    y = 3*x + 2*x**2 + 0.01*np.random.randn(n_samples)
    
    # Standard library (no source tags)
    Phi = np.column_stack([np.ones(n_samples), x, x**2, x**3, x**4])
    library_names = ['1', 'x', 'x^2', 'x^3', 'x^4']
    
    model = EWSINDySTLSQ(threshold=0.1, use_weak_form=False)
    result = model.fit(Phi, y, library_names=library_names)
    
    print(f"True: y = 3*x + 2*x^2")
    print(f"Discovered: {result['equation']}")
    print(f"R-squared: {result['r_squared']:.4f}")
    print(f"Active terms: {result['n_active_terms']}")
    print()
    
    if result['r_squared'] > 0.99:
        print("[PASS] High accuracy achieved")
    else:
        print("[WARNING] Accuracy lower than expected")

In [None]:
# ==============================================================================
# TEST 2: Source Attribution with Augmented Library
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 2: Source Attribution with Augmented Library")
    
    np.random.seed(42)
    n_samples = 200
    
    x = np.random.uniform(0.1, 2, n_samples)
    z = np.random.uniform(0.1, 2, n_samples)
    
    # True equation: y = 0.5*x^2 + sin(z)
    y = 0.5*x**2 + np.sin(z) + 0.01*np.random.randn(n_samples)
    
    # Simulated augmented library with source tags
    Phi = np.column_stack([
        x**2,           # [PySR] x**2
        np.sin(z),      # [PySR] sin(z)
        np.sin(x),      # [Var] sin(x)
        np.ones(n_samples),  # [Poly] 1
        x,              # [Poly] x
        z,              # [Poly] z
        x*z,            # [Poly] x*z
        np.cos(x),      # [Op] cos(x)
        np.cos(z)       # [Op] cos(z)
    ])
    
    library_names = [
        '[PySR] x**2',
        '[PySR] sin(z)',
        '[Var] sin(x)',
        '[Poly] 1',
        '[Poly] x',
        '[Poly] z',
        '[Poly] x*z',
        '[Op] cos(x)',
        '[Op] cos(z)'
    ]
    
    model = EWSINDySTLSQ(threshold=0.1, use_weak_form=False)
    result = model.fit(Phi, y, library_names=library_names)
    
    print(f"True: y = 0.5*x^2 + sin(z)")
    print()
    print("Selected terms:")
    for name, coef in model.get_active_terms():
        print(f"  {coef:8.3f} * {name}")
    print()
    print(f"Selection Analysis: {result['selection_analysis']}")
    print(f"R-squared: {result['r_squared']:.4f}")
    print()
    
    # Check that PySR terms were selected
    analysis = result['selection_analysis']
    if analysis['from_pysr'] >= 2:
        print("[PASS] PySR terms correctly selected")
    else:
        print(f"[INFO] Selected {analysis['from_pysr']} PySR terms")

In [None]:
# ==============================================================================
# TEST 3: Noise Robustness with Weak Form
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 3: Noise Robustness with Weak Form")
    
    np.random.seed(42)
    n_samples = 300
    
    t = np.linspace(0, 1, n_samples)
    x = np.sin(2 * np.pi * t)
    
    # Target: y = x (the weak-form branch instead treats y as the state, i.e. models dy/dt)
    y_clean = x.copy()
    
    noise_levels = [0.01, 0.05, 0.10]
    
    Phi = np.column_stack([np.ones(n_samples), x, x**2])
    library_names = ['[Poly] 1', '[Poly] x', '[Poly] x^2']
    
    print(f"{'Noise':<12} {'R2 (strong)':<15} {'R2 (weak)':<15}")
    print("-" * 45)
    
    for noise_level in noise_levels:
        y_noisy = y_clean + noise_level * np.random.randn(n_samples)
        
        # Strong form
        model_strong = EWSINDySTLSQ(threshold=0.1, use_weak_form=False)
        result_strong = model_strong.fit(Phi, y_noisy, library_names=library_names)
        
        # Weak form
        model_weak = EWSINDySTLSQ(threshold=0.1, use_weak_form=True)
        result_weak = model_weak.fit(Phi, y_noisy, t=t, library_names=library_names)
        
        print(f"{noise_level:<12.2f} {result_strong['r_squared']:<15.4f} {result_weak['r_squared']:<15.4f}")
    
    print()
    print("[INFO] Weak form should be more robust at higher noise levels")
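
Because the weak-form branch builds `b` from `-integral(dpsi * y) dt`, it models `dy/dt` rather than `y`. A consistency check in that setting might look like the sketch below. It assumes a library evaluated on the known time grid, compactly supported polynomial bumps placed inside the domain so that boundary terms vanish, and hand-picked values for the noise level, bump width, and number of test functions; it does not use the class.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1001)
y_clean = np.sin(2 * np.pi * t)                       # state
y_noisy = y_clean + 0.1 * rng.standard_normal(t.size)

# True model: dy/dt = 2*pi * cos(2*pi*t) + 0 * sin(2*pi*t)
Phi = np.column_stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])

# Trapezoidal quadrature weights
dt = np.diff(t)
w = np.zeros_like(t)
w[:-1] += 0.5 * dt
w[1:] += 0.5 * dt

# Bumps psi = (1 - z^2)^4 with supports kept inside [0, 1]
width = 0.15
centers = np.linspace(0.15, 0.85, 30)
z = (t[None, :] - centers[:, None]) / width
inside = np.abs(z) < 1
psi = np.where(inside, (1 - z**2) ** 4, 0.0)
dpsi = np.where(inside, -8 * z / width * (1 - z**2) ** 3, 0.0)

Q = (psi * w) @ Phi          # Q[m, k] = integral(psi_m * Phi_k) dt
b = -(dpsi * w) @ y_noisy    # b[m]    = -integral(dpsi_m * y) dt

xi = np.linalg.lstsq(Q, b, rcond=None)[0]
print(f"estimated: {xi}, true: [{2 * np.pi:.3f}, 0.000]")
```

The noisy state is never differentiated; the derivative lands on the smooth test functions, and the recovered coefficient on `cos(2*pi*t)` should lie close to `2*pi`.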

In [None]:
# ==============================================================================
# TEST 4: analyze_selection_sources Method
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 4: analyze_selection_sources Method")
    
    # Create mock support and library names
    support = np.array([True, True, False, True, False, False, True, False])
    library_names = [
        '[PySR] sin(x)',
        '[PySR] x**2',
        '[Var] sin(y)',
        '[Poly] 1',
        '[Poly] x',
        '[Poly] y',
        '[Op] exp(x)',
        '[Op] cos(x)'
    ]
    
    model = EWSINDySTLSQ()
    analysis = model.analyze_selection_sources(support, library_names)
    
    print("Support mask:")
    for i, (s, n) in enumerate(zip(support, library_names)):
        status = "SELECTED" if s else "        "
        print(f"  {i}: {status} {n}")
    print()
    print(f"Analysis result: {analysis}")
    print()
    
    # Expected: 2 PySR, 0 Var, 1 Poly, 1 Op
    expected = {'from_pysr': 2, 'from_variant': 0, 'from_poly': 1, 'from_op': 1}
    
    all_correct = True
    for key, expected_val in expected.items():
        actual_val = analysis[key]
        status = "PASS" if actual_val == expected_val else "FAIL"
        if status == "FAIL":
            all_correct = False
        print(f"  {key}: expected={expected_val}, actual={actual_val} [{status}]")
    
    print()
    if all_correct:
        print("[PASS] All source attribution correct")
    else:
        print("[FAIL] Some source attributions incorrect")

In [None]:
# ==============================================================================
# TEST 5: Full Report Output
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 5: Full Report Output")
    
    np.random.seed(42)
    n_samples = 200
    
    x = np.random.uniform(0.1, 2, n_samples)
    z = np.random.uniform(0.1, 2, n_samples)
    y = 0.5*x**2 + np.sin(z) + 0.01*np.random.randn(n_samples)
    
    # Augmented library
    Phi = np.column_stack([
        x**2,
        np.sin(z),
        np.ones(n_samples),
        x,
        z
    ])
    
    library_names = [
        '[PySR] x**2',
        '[PySR] sin(z)',
        '[Poly] 1',
        '[Poly] x',
        '[Poly] z'
    ]
    
    model = EWSINDySTLSQ(threshold=0.1, use_weak_form=False)
    result = model.fit(Phi, y, library_names=library_names)
    
    # Print full report
    model.print_stlsq_report()

---
## Section 4: Module Summary

In [None]:
# ==============================================================================
# MODULE SUMMARY
# ==============================================================================

print("=" * 70)
print(" 07_EWSINDy_STLSQ.ipynb v4.1 - Module Summary")
print("=" * 70)
print()
print("CLASS: EWSINDySTLSQ (v4.1 Modified)")
print("-" * 70)
print()
print("Purpose:")
print("  Noise-robust equation discovery via weak-form sparse regression.")
print("  Now operates on augmented library with source attribution.")
print()
print("v4.1 Modifications:")
print("  - Accepts library_names with source tags [PySR], [Var], [Poly], [Op]")
print("  - New method: analyze_selection_sources()")
print("  - Returns selection_analysis in output dictionary")
print("  - New parameter: normalize_columns (default: True)")
print()
print("Main Methods:")
print("  fit(feature_library, y, t=None, library_names=None) -> Dict")
print("      Returns: coefficients, support, equation, selection_analysis, r_squared")
print()
print("  analyze_selection_sources(support, library_names) -> Dict")
print("      Returns: from_pysr, from_variant, from_poly, from_op, from_unknown, total_selected counts")
print()
print("  get_equation() -> str")
print("      Get string representation of discovered equation")
print()
print("  get_active_terms() -> List[Tuple[str, float]]")
print("      Get list of active terms with coefficients")
print()
print("  print_stlsq_report()")
print("      Print detailed results with source attribution")
print()
print("Key Parameters:")
print("  threshold: STLSQ sparsity threshold (default: 0.1)")
print("  use_weak_form: Enable weak form (default: True)")
print("  normalize_columns: Normalize library columns (default: True)")
print()
print("Usage Example (with augmented library):")
print("-" * 70)
print("""
# Build augmented library (from 05_FeatureLibrary)
builder = AugmentedLibraryBuilder(max_poly_degree=3)
Phi, names, info = builder.build(
    X, feature_names,
    parsed_terms=unique_terms,
    detected_operators=detected_operators,
    pysr_r2=0.85
)

# Fit E-WSINDy with source attribution
model = EWSINDySTLSQ(threshold=0.1, use_weak_form=False)
result = model.fit(Phi, y, library_names=names)

# Check source attribution
print(f"Selection Analysis: {result['selection_analysis']}")
print(f"R-squared: {result['r_squared']:.4f}")

# Print full report
model.print_stlsq_report()
""")
print()
print("=" * 70)
print("Module loaded successfully. Import via: %run 07_EWSINDy_STLSQ.ipynb")
print("=" * 70)