# 12_Full_Pipeline - Physics-SR Framework v4.1

## Complete Pipeline Integration with Adaptive Time Budget

**Author:** Zhengze Zhang  
**Affiliation:** Department of Statistics, Columbia University  
**Contact:** zz3239@columbia.edu  
**Date:** January 2026  
**Version:** 4.1 (Structure-Guided Feature Library Enhancement + Computational Optimization)

---

### Purpose

Integrate all components into a complete, end-to-end symbolic regression pipeline with:

**Stage 1: Variable Selection & Preprocessing**
- Buckingham Pi dimensional analysis
- PAN+SR nonlinear variable screening
- Power-law symmetry detection
- iRF interaction discovery

**Stage 2: Structure-Guided Discovery (v4.0/v4.1 Redesign)**
- PySR structure exploration with adaptive timeout
- Structure parsing (NEW v4.0)
- 4-layer augmented library construction (NEW v4.0)
- E-WSINDy sparse selection with source attribution
- Adaptive Lasso verification (conditional)

**Stage 3: Validation & UQ**
- Model selection (CV + EBIC)
- Physics verification (dimensional + bounds)
- Three-layer bootstrap UQ with adaptive count
- Statistical inference
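
The EBIC criterion listed under Stage 3 can be made concrete with a short sketch. It uses the common extended-BIC form for a Gaussian linear model; whether `ModelSelector` uses exactly this expression is an assumption, and `ebic_gaussian`, `log_binomial`, and the default `gamma=0.5` are hypothetical names and values chosen for illustration.

```python
import math


def log_binomial(p: int, k: int) -> float:
    """Natural log of the binomial coefficient C(p, k), via log-gamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic_gaussian(rss: float, n: int, k: int, p_total: int, gamma: float = 0.5) -> float:
    """EBIC for a Gaussian linear model with k active terms out of p_total.

    n*log(rss/n) equals -2*log-likelihood up to an additive constant.
    The last term penalises the size of the model space; with gamma = 0
    the score reduces to ordinary BIC.
    """
    return n * math.log(rss / n) + k * math.log(n) + 2.0 * gamma * log_binomial(p_total, k)


# Two candidate supports fitted on the same 200 samples and 40-term library.
# The residual sums of squares are made-up illustration values.
sparse = ebic_gaussian(rss=12.0, n=200, k=3, p_total=40)
dense = ebic_gaussian(rss=11.5, n=200, k=9, p_total=40)
print(f"sparse: {sparse:.1f}, dense: {dense:.1f}")  # lower is better
```

In this toy comparison the small gain in fit from six extra terms does not offset the complexity penalty, so the sparser model scores lower.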

### v4.1 Key Features

| Feature | Description |
|---------|-------------|
| TimeBudgetManager | Adaptive time allocation across stages |
| Structure-Guided Library | PySR terms feed directly into E-WSINDy |
| Source Attribution | Track which library layer contributed each term |
| Timing Profile | Detailed timing report for optimization |
| Float32 Precision | Reduced memory footprint |
| Adaptive Bootstrap | Bootstrap count adjusts to remaining budget |
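
The adaptive time allocation in the table can be illustrated with a minimal sketch. `SketchBudget` is a hypothetical stand-in, not the `TimeBudgetManager` from `00_Core`; the 50% PySR fraction, the 5 s floor, and the injectable `clock` are assumptions made so the example is deterministic.

```python
import time
from typing import Callable, Dict


class SketchBudget:
    """Hypothetical stand-in for TimeBudgetManager (illustration only)."""

    def __init__(self, total_budget: float, clock: Callable[[], float] = time.monotonic):
        self.total_budget = float(total_budget)
        self._clock = clock
        self._start = clock()
        self._last_mark = self._start
        self.stage_durations: Dict[str, float] = {}

    def remaining(self) -> float:
        """Seconds left in the budget, never negative."""
        return max(0.0, self.total_budget - (self._clock() - self._start))

    def allocate(self, fraction: float, floor: float = 5.0) -> float:
        """Grant a fraction of the remaining time, at least `floor` if available."""
        left = self.remaining()
        return min(left, max(floor, fraction * left))

    def record_stage(self, name: str) -> None:
        """Store the time elapsed since the previous checkpoint."""
        now = self._clock()
        self.stage_durations[name] = now - self._last_mark
        self._last_mark = now

    def should_skip_optional(self, min_required: float) -> bool:
        """True when too little time remains for an optional step."""
        return self.remaining() < min_required


# A fake clock keeps the example reproducible.
now = [0.0]
budget = SketchBudget(180.0, clock=lambda: now[0])
pysr_timeout = budget.allocate(fraction=0.5)
now[0] = 165.0  # pretend the earlier steps consumed 165 s
budget.record_stage("Stage2")
skip_alasso = budget.should_skip_optional(min_required=20)
print(pysr_timeout, budget.remaining(), skip_alasso)  # 90.0 15.0 True
```

Injecting the clock is a design choice for testability; a real manager would most likely read wall-clock time directly.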

---
## Section 1: Header and Imports

In [None]:
"""
12_Full_Pipeline.ipynb - Complete Pipeline Integration v4.1
============================================================

Three-Stage Physics-Informed Symbolic Regression Framework v4.1

This module provides:
- PhysicsSRPipeline: Complete end-to-end pipeline with TimeBudgetManager
- Stage-by-stage execution with timing checkpoints
- Structure-guided discovery (PySR -> Parser -> Library -> E-WSINDy)
- Comprehensive result reporting with source attribution

Usage:
    pipeline = PhysicsSRPipeline(max_time_seconds=180)
    result = pipeline.run(X, y, feature_names, user_inputs)
    print(pipeline.get_timing_report())
    pipeline.print_summary()

Author: Zhengze Zhang
Affiliation: Department of Statistics, Columbia University
Contact: zz3239@columbia.edu
"""

print("Loading all component modules...")
print()

In [None]:
# ==============================================================================
# IMPORT ALL COMPONENT MODULES
# ==============================================================================

%run 00_Core.ipynb
%run 01_BuckinghamPi.ipynb
%run 02_VariableScreening.ipynb
%run 03_SymmetryAnalysis.ipynb
%run 04_InteractionDiscovery.ipynb
%run 05_FeatureLibrary.ipynb
%run 06_PySR.ipynb
%run 07_EWSINDy_STLSQ.ipynb
%run 08_AdaptiveLasso.ipynb
%run 09_ModelSelection.ipynb
%run 10_PhysicsVerification.ipynb
%run 11_UQ_Inference.ipynb

print()
print("=" * 70)
print(" All modules loaded successfully! (Physics-SR Framework v4.1)")
print("=" * 70)

In [None]:
# ==============================================================================
# VERIFY ALL CLASSES ARE AVAILABLE
# ==============================================================================

print("Verifying component classes (v4.1)...")
print()

components = [
    # Core
    ('UserInputs', '00_Core'),
    ('Stage1Results', '00_Core'),
    ('Stage2Results', '00_Core'),
    ('Stage3Results', '00_Core'),
    ('TimeBudgetManager', '00_Core'),
    # Stage 1
    ('BuckinghamPiAnalyzer', '01_BuckinghamPi'),
    ('PANSRVariableScreener', '02_VariableScreening'),
    ('SymmetryAnalyzer', '03_SymmetryAnalysis'),
    ('IRFInteractionDiscoverer', '04_InteractionDiscovery'),
    # Stage 2 (v4.0/v4.1 redesign)
    ('AugmentedLibraryBuilder', '05_FeatureLibrary'),
    ('PySRDiscoverer', '06_PySR'),
    ('StructureParser', '06_PySR'),
    ('EWSINDySTLSQ', '07_EWSINDy_STLSQ'),
    ('AdaptiveLassoSelector', '08_AdaptiveLasso'),
    # Stage 3
    ('ModelSelector', '09_ModelSelection'),
    ('PhysicsVerifier', '10_PhysicsVerification'),
    ('BootstrapUQ', '11_UQ_Inference'),
    ('StatisticalInference', '11_UQ_Inference')
]

all_loaded = True
for class_name, module in components:
    if class_name in globals():
        print(f"  [OK] {class_name} from {module}")
    else:
        print(f"  [MISSING] {class_name} from {module}")
        all_loaded = False

print()
if all_loaded:
    print("All components verified successfully!")
else:
    print("WARNING: Some components missing!")
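
The check above only tests that each name exists. A slightly stricter variant, sketched here, also confirms that the name is bound to a class. `find_missing` is a hypothetical helper, and the namespace is an explicit dictionary so the example runs on its own; in the notebook one would presumably pass `globals()` instead.

```python
from typing import Iterable, List, Mapping


def find_missing(names: Iterable[str], namespace: Mapping[str, object]) -> List[str]:
    """Return the names that are absent from the namespace or not bound to a class."""
    return [n for n in names if not isinstance(namespace.get(n), type)]


class UserInputs:  # placeholder class so the example is self-contained
    pass


# 'TimeBudgetManager' is present but bound to None, so it counts as missing.
namespace = {"UserInputs": UserInputs, "TimeBudgetManager": None}
missing = find_missing(["UserInputs", "TimeBudgetManager", "BootstrapUQ"], namespace)
print(missing)  # ['TimeBudgetManager', 'BootstrapUQ']
```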

---
## Section 2: PhysicsSRPipeline Class Definition

In [None]:
# ==============================================================================
# PHYSICS SR PIPELINE CLASS (v4.1)
# ==============================================================================

class PhysicsSRPipeline:
    """
    Complete Physics-Informed Symbolic Regression Pipeline (v4.1).
    
    Orchestrates all stages with:
    - Adaptive time budget management (v4.1)
    - Structure-guided feature library (v4.0)
    - Source attribution for selected terms
    - Comprehensive timing profile
    
    Pipeline Flow (v4.1):
    ----------------------
    Stage 1: Variable Selection
        1.1 Buckingham Pi -> dimensional reduction
        1.2 PAN+SR -> variable importance
        1.3 Symmetry -> power-law detection
        1.4 iRF -> interaction discovery
    
    Stage 2: Structure-Guided Discovery (v4.0/v4.1)
        2.1 PySR -> structure exploration (adaptive timeout)
        2.2 Parser -> extract terms and operators
        2.3 Library -> 4-layer augmented construction
        2.4 E-WSINDy -> sparse selection with source attribution
        2.5 ALasso -> verification (conditional on budget)
    
    Stage 3: Validation & UQ
        3.1 Selection -> CV + EBIC comparison
        3.2 Physics -> dimensional + bounds verification
        3.3 Bootstrap -> three-layer UQ (adaptive count)
        3.4 Inference -> hypothesis testing
    
    Attributes
    ----------
    config : Dict
        Pipeline configuration parameters
    budget_manager : TimeBudgetManager
        Time budget controller (v4.1)
    stage1_results : Stage1Results
        Results from Stage 1
    stage2_results : Stage2Results
        Results from Stage 2 (v4.1 enhanced)
    stage3_results : Stage3Results
        Results from Stage 3
    
    Examples
    --------
    >>> pipeline = PhysicsSRPipeline(max_time_seconds=180)
    >>> result = pipeline.run(X, y, feature_names, user_inputs)
    >>> print(pipeline.get_timing_report())
    >>> print(f"Final equation: {pipeline.get_final_equation()}")
    
    Reference
    ---------
    Framework v4.1 Section 6: Pipeline Integration
    """
    
    def __init__(
        self,
        config: Dict = None,
        max_time_seconds: float = DEFAULT_RUNTIME_BUDGET
    ):
        """
        Initialize PhysicsSRPipeline.
        
        Parameters
        ----------
        config : Dict, optional
            Configuration overrides. If None, uses defaults.
        max_time_seconds : float
            Total runtime budget in seconds (v4.1).
            Default: 180 (optimized for Google Colab Pro)
        """
        self.config = self._merge_config(config or {})
        self.budget_manager = TimeBudgetManager(max_time_seconds)
        
        # Results storage
        self.stage1_results = None
        self.stage2_results = None
        self.stage3_results = None
        self._final_equation = None
        self._run_complete = False
        
        # Working data (cleared after run)
        self._X_work = None
        self._feature_names_work = None
    
    def _merge_config(self, user_config: Dict) -> Dict:
        """
        Merge user config with defaults.
        
        Parameters
        ----------
        user_config : Dict
            User-provided configuration
            
        Returns
        -------
        Dict
            Complete configuration with defaults
        """
        defaults = {
            # Stage 1
            'max_exponent': DEFAULT_MAX_EXPONENT,
            'importance_threshold': DEFAULT_IMPORTANCE_THRESHOLD,
            'powerlaw_r2_threshold': DEFAULT_POWERLAW_R2_THRESHOLD,
            'stability_threshold': DEFAULT_STABILITY_THRESHOLD,
            
            # Stage 2
            'pysr_mode': 'standard',
            'max_poly_degree': DEFAULT_MAX_POLY_DEGREE,
            'stlsq_threshold': DEFAULT_STLSQ_THRESHOLD,
            'alasso_gamma': DEFAULT_ALASSO_GAMMA,
            'generate_variants': True,
            'include_operator_terms': True,
            
            # Stage 3
            'cv_folds': DEFAULT_CV_FOLDS,
            'ebic_gamma': DEFAULT_EBIC_GAMMA,
            'n_bootstrap': DEFAULT_N_BOOTSTRAP,
            'confidence_level': DEFAULT_CONFIDENCE_LEVEL,
            'dim_tolerance': DEFAULT_DIM_TOLERANCE,
            
            # v4.1 Computational
            'n_jobs': DEFAULT_PROCS,
            'precision': DEFAULT_PRECISION,
            'skip_alasso_if_tight': True,
            'verbose': True
        }
        
        defaults.update(user_config)
        return defaults
    
    def run(
        self,
        X: np.ndarray,
        y: np.ndarray,
        feature_names: List[str],
        user_inputs: UserInputs = None
    ) -> Dict[str, Any]:
        """
        Execute complete pipeline.
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix (n_samples, n_features)
        y : np.ndarray
            Target vector (n_samples,)
        feature_names : List[str]
            Feature names
        user_inputs : UserInputs, optional
            User-provided physics information
        
        Returns
        -------
        Dict[str, Any]
            Complete pipeline results including:
            - stage1: Stage1Results
            - stage2: Stage2Results (v4.1 enhanced)
            - stage3: Stage3Results
            - final_equation: str
            - timing: Dict[str, float]
        """
        if self.config['verbose']:
            print("=" * 70)
            print(" Physics-SR Pipeline v4.1")
            print("=" * 70)
            print(f"Total budget: {self.budget_manager.total_budget}s")
            print()
        
        # Convert to Float32 for memory efficiency (v4.1)
        X, y = convert_to_float32(X, y)
        feature_names = list(feature_names)
        
        # =====================================================================
        # STAGE 1: Variable Selection & Preprocessing
        # =====================================================================
        if self.config['verbose']:
            print("STAGE 1: Variable Selection & Preprocessing")
            print("-" * 70)
        
        self.stage1_results = self._run_stage1(X, y, feature_names, user_inputs)
        self.budget_manager.record_stage('Stage1')
        cleanup_memory()
        
        if self.config['verbose']:
            print(f"  [Timing] Stage 1: {self.budget_manager.get_stage_duration('Stage1'):.1f}s")
            print()
        
        # Get working features based on Stage 1 results
        self._X_work, self._feature_names_work = self._select_working_features(
            X, feature_names, self.stage1_results
        )
        
        # =====================================================================
        # STAGE 2: Structure-Guided Discovery (v4.0/v4.1)
        # =====================================================================
        if self.config['verbose']:
            print("STAGE 2: Structure-Guided Discovery")
            print("-" * 70)
        
        self.stage2_results = self._run_stage2(
            self._X_work, y, self._feature_names_work, self.stage1_results
        )
        self.budget_manager.record_stage('Stage2')
        cleanup_memory()
        
        if self.config['verbose']:
            print(f"  [Timing] Stage 2: {self.budget_manager.get_stage_duration('Stage2'):.1f}s")
            print()
        
        # =====================================================================
        # STAGE 3: Validation & Uncertainty Quantification
        # =====================================================================
        if self.config['verbose']:
            print("STAGE 3: Validation & Uncertainty Quantification")
            print("-" * 70)
        
        self.stage3_results = self._run_stage3(
            self._X_work, y, self._feature_names_work,
            self.stage2_results, user_inputs
        )
        self.budget_manager.record_stage('Stage3')
        
        if self.config['verbose']:
            print(f"  [Timing] Stage 3: {self.budget_manager.get_stage_duration('Stage3'):.1f}s")
            print()
        
        # =====================================================================
        # FINAL: Extract equation and compile results
        # =====================================================================
        self._final_equation = self._extract_final_equation()
        self._run_complete = True
        
        # Clear working data
        self._X_work = None
        self._feature_names_work = None
        cleanup_memory()
        
        if self.config['verbose']:
            print("=" * 70)
            print(" Pipeline Complete!")
            print("=" * 70)
            print()
            print(self.get_timing_report())
        
        return {
            'stage1': self.stage1_results,
            'stage2': self.stage2_results,
            'stage3': self.stage3_results,
            'final_equation': self._final_equation,
            'timing': self.budget_manager.to_dict()
        }
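
`_merge_config` lets user keys override the defaults and silently keeps any key it does not recognise. The sketch below shows the same override semantics with an added guard against misspelled keys. `merge_config` and `SKETCH_DEFAULTS` are hypothetical, the default values are placeholders rather than the framework's constants, and rejecting unknown keys is a design suggestion, not the pipeline's current behaviour.

```python
from typing import Any, Dict

# Placeholder values for illustration; not the constants defined in 00_Core.
SKETCH_DEFAULTS: Dict[str, Any] = {"cv_folds": 5, "n_bootstrap": 200, "verbose": True}


def merge_config(user_config: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults overridden by user_config, rejecting unknown keys."""
    unknown = sorted(set(user_config) - set(defaults))
    if unknown:
        raise KeyError(f"Unknown config keys: {unknown}")
    # Build a new dict so neither input is mutated
    return {**defaults, **user_config}


merged = merge_config({"cv_folds": 10, "verbose": False}, SKETCH_DEFAULTS)
print(merged)

try:
    merge_config({"cv_fold": 10}, SKETCH_DEFAULTS)  # typo: 'cv_fold'
except KeyError as err:
    print("rejected:", err)
```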

In [None]:
# ==============================================================================
# STAGE 1 IMPLEMENTATION
# ==============================================================================

def _run_stage1(
    self,
    X: np.ndarray,
    y: np.ndarray,
    feature_names: List[str],
    user_inputs: UserInputs
) -> Stage1Results:
    """
    Execute Stage 1: Variable Selection & Preprocessing.
    
    Sub-stages:
    - 1.1 Buckingham Pi dimensional analysis
    - 1.2 PAN+SR variable screening
    - 1.3 Power-law symmetry detection
    - 1.4 iRF interaction discovery
    
    Parameters
    ----------
    X : np.ndarray
        Feature matrix
    y : np.ndarray
        Target vector
    feature_names : List[str]
        Feature names
    user_inputs : UserInputs
        User-provided physics information
        
    Returns
    -------
    Stage1Results
        Complete Stage 1 results
    """
    timing = {}
    t0 = time.time()
    
    # Initialize results
    results = Stage1Results()
    
    # -------------------------------------------------------------------------
    # 1.1 Buckingham Pi Analysis
    # -------------------------------------------------------------------------
    if user_inputs and user_inputs.variable_dimensions:
        if self.config['verbose']:
            print("  1.1 Buckingham Pi Analysis...")
        
        pi_analyzer = BuckinghamPiAnalyzer(
            max_exponent=self.config['max_exponent']
        )
        
        # Filter dimensions to only include available features
        available_dims = {
            k: v for k, v in user_inputs.variable_dimensions.items()
            if k in feature_names
        }
        
        if len(available_dims) >= 2:
            pi_result = pi_analyzer.analyze(available_dims)
            
            results.pi_groups = pi_result.get('pi_groups')
            results.pi_exponents = pi_result.get('pi_exponents')
            results.pi_group_names = pi_result.get('pi_group_names')
            results.X_transformed = pi_result.get('X_transformed')
            results.all_pi_candidates = pi_result.get('all_pi_candidates')
            
            if self.config['verbose']:
                n_groups = pi_result.get('n_pi_groups', 0)
                print(f"      Reduced from {len(available_dims)} variables to {n_groups} Pi groups")
        else:
            if self.config['verbose']:
                print("      Skipped (insufficient dimensional info)")
    else:
        if self.config['verbose']:
            print("  1.1 Buckingham Pi: Skipped (no dimensions provided)")
    
    timing['buckingham_pi'] = time.time() - t0
    t0 = time.time()
    
    # -------------------------------------------------------------------------
    # 1.2 PAN+SR Variable Screening
    # -------------------------------------------------------------------------
    if self.config['verbose']:
        print("  1.2 PAN+SR Variable Screening...")
    
    screener = PANSRVariableScreener(
        importance_threshold=self.config['importance_threshold']
    )
    screening_result = screener.screen(X, y, feature_names)
    
    results.selected_indices = screening_result.get('selected_indices', [])
    results.selected_names = screening_result.get('selected_names', [])
    results.importance_scores = screening_result.get('importance_scores', {})
    
    if self.config['verbose']:
        n_selected = len(results.selected_names)
        print(f"      Selected {n_selected} of {len(feature_names)} variables")
    
    timing['screening'] = time.time() - t0
    t0 = time.time()
    
    # -------------------------------------------------------------------------
    # 1.3 Symmetry Analysis (Power-Law Detection)
    # -------------------------------------------------------------------------
    if self.config['verbose']:
        print("  1.3 Power-Law Symmetry Detection...")
    
    symmetry_analyzer = SymmetryAnalyzer(
        r2_threshold=self.config['powerlaw_r2_threshold']
    )
    symmetry_result = symmetry_analyzer.analyze(X, y, feature_names)
    
    results.is_power_law = symmetry_result.get('is_power_law', False)
    results.estimated_exponents = symmetry_result.get('exponents', {})
    results.power_law_r2 = symmetry_result.get('r_squared', 0.0)
    results.structural_hints = symmetry_result.get('structural_hints', {})
    
    if self.config['verbose']:
        print(f"      Power-law detected: {results.is_power_law}")
        if results.is_power_law and results.power_law_r2 is not None:
            print(f"      R-squared: {results.power_law_r2:.4f}")
    
    timing['symmetry'] = time.time() - t0
    t0 = time.time()
    
    # -------------------------------------------------------------------------
    # 1.4 iRF Interaction Discovery
    # -------------------------------------------------------------------------
    if self.config['verbose']:
        print("  1.4 iRF Interaction Discovery...")
    
    interaction_discoverer = IRFInteractionDiscoverer(
        stability_threshold=self.config['stability_threshold']
    )
    interaction_result = interaction_discoverer.discover(X, y, feature_names)
    
    results.stable_interactions = interaction_result.get('stable_interactions', [])
    results.interaction_stability = interaction_result.get('interaction_stability', {})
    results.soft_weights = interaction_result.get('soft_weights')
    
    if self.config['verbose']:
        n_interactions = len(results.stable_interactions) if results.stable_interactions else 0
        print(f"      Found {n_interactions} stable interactions")
    
    timing['interaction'] = time.time() - t0
    
    # Store timing
    results.timing = timing
    
    return results

# Attach method to class
PhysicsSRPipeline._run_stage1 = _run_stage1
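
Stage 1 repeats the pattern `t0 = time.time()` followed by `timing[key] = time.time() - t0` for each sub-stage. One way to express that pattern once is a small context manager, sketched below. `timed` is a hypothetical helper and is not part of the framework; it uses `time.perf_counter`, which is intended for measuring short intervals.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(timing: Dict[str, float], key: str) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under timing[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so partial timings are still stored
        timing[key] = time.perf_counter() - start


timing: Dict[str, float] = {}
with timed(timing, "screening"):
    total = sum(i * i for i in range(10_000))  # stand-in for real work
with timed(timing, "symmetry"):
    time.sleep(0.01)

print({k: round(v, 3) for k, v in timing.items()})
```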

In [None]:
# ==============================================================================
# FEATURE SELECTION HELPER
# ==============================================================================

def _select_working_features(
    self,
    X: np.ndarray,
    feature_names: List[str],
    stage1_results: Stage1Results
) -> Tuple[np.ndarray, List[str]]:
    """
    Select working features based on Stage 1 results.
    
    Selection Priority:
    1. If power-law with R2 > 0.95: use active variables from symmetry
    2. If screening selected variables: use those
    3. Fallback to all features
    
    Additionally applies Buckingham Pi dimensional filter if available.
    
    Parameters
    ----------
    X : np.ndarray
        Original feature matrix
    feature_names : List[str]
        Original feature names
    stage1_results : Stage1Results
        Results from Stage 1
        
    Returns
    -------
    Tuple[np.ndarray, List[str]]
        (X_work, feature_names_work) - working feature matrix and names
    """
    selected = []
    selection_source = 'fallback'
    
    # Build dimensional filter from Buckingham Pi
    dimensional_filter = None
    if stage1_results.pi_exponents is not None:
        try:
            pi_exp = np.array(stage1_results.pi_exponents)
            if pi_exp.size > 0:
                relevance = np.sum(np.abs(pi_exp), axis=0)
                dimensional_filter = set(
                    feature_names[i] for i in range(len(feature_names))
                    if i < len(relevance) and relevance[i] > 0
                )
        except Exception:
            # Malformed Pi exponents: continue without a dimensional filter
            dimensional_filter = None
    
    # Priority 1: Power-law symmetry with high R2
    if stage1_results.is_power_law:
        r2 = stage1_results.power_law_r2 or 0.0
        if r2 > 0.95:
            hints = stage1_results.structural_hints or {}
            active_vars = hints.get('active_variables', [])
            if len(active_vars) > 0:
                selected = [str(v) for v in active_vars]
                selection_source = 'symmetry'
    
    # Priority 2: Screening results
    if len(selected) == 0 and stage1_results.selected_names:
        selected = [str(f) for f in stage1_results.selected_names]
        selection_source = 'screening'
    
    # Priority 3: Fallback to all features
    if len(selected) == 0:
        selected = [str(f) for f in feature_names]
        selection_source = 'fallback'
        if self.config['verbose']:
            print("      [Note] No features selected by screening or symmetry, using all features")
    
    # Apply dimensional filter if available
    if dimensional_filter is not None and len(dimensional_filter) > 0:
        original_count = len(selected)
        selected = [f for f in selected if f in dimensional_filter]
        
        # Ensure we have at least some features
        if len(selected) == 0:
            selected = list(dimensional_filter)
        
        if self.config['verbose'] and len(selected) < original_count:
            print(f"      [Buckingham Pi] Filtered {original_count} -> {len(selected)} variables")
    
    # Get indices for selected features
    feature_names_str = [str(f) for f in feature_names]
    indices = [feature_names_str.index(f) for f in selected if f in feature_names_str]
    
    if len(indices) == 0:
        # Safety fallback - use all features
        if self.config['verbose']:
            print("      [Warning] No valid indices, using all features")
        return X, feature_names_str
    
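    # Report the names actually kept; entries of `selected` that are not in
    # feature_names are dropped when the indices are resolved above
    working_names = [feature_names_str[i] for i in indices]
    if self.config['verbose']:
        print(f"      [Selection] Working features from {selection_source}: {working_names}")
    
    return X[:, indices], working_names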

# Attach method to class
PhysicsSRPipeline._select_working_features = _select_working_features
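
The selection priority can be isolated as a pure function, which makes it easier to reason about and to test. The sketch below is a simplified restatement and not the method itself. `choose_features` is a hypothetical name, and the code assumes that the columns of `pi_exponents` follow the order of `all_names`.

```python
from typing import List, Optional, Sequence, Tuple


def choose_features(
    all_names: Sequence[str],
    symmetry_vars: Sequence[str],
    power_law_r2: Optional[float],
    screened: Sequence[str],
    pi_exponents: Optional[Sequence[Sequence[float]]] = None,
    r2_cut: float = 0.95,
) -> Tuple[List[str], str]:
    """Apply the symmetry > screening > fallback priority, then the Pi filter."""
    if symmetry_vars and power_law_r2 is not None and power_law_r2 > r2_cut:
        selected, source = list(symmetry_vars), "symmetry"
    elif screened:
        selected, source = list(screened), "screening"
    else:
        selected, source = list(all_names), "fallback"

    if pi_exponents:
        # A variable is dimensionally relevant if any Pi group uses it
        relevant = [
            name for j, name in enumerate(all_names)
            if any(j < len(row) and abs(row[j]) > 0 for row in pi_exponents)
        ]
        if relevant:
            filtered = [n for n in selected if n in relevant]
            # If the filter removes everything, keep the relevant variables
            selected = filtered or relevant
    return selected, source


names = ["x1", "x2", "x3", "x4"]
pi_exp = [[1, -1, 0, 0], [0, 1, -2, 0]]  # made-up exponents; x4 is unused

print(choose_features(names, [], None, ["x1", "x2", "x4"], pi_exp))
# (['x1', 'x2'], 'screening')
print(choose_features(names, ["x2", "x3"], 0.97, ["x1"], pi_exp))
# (['x2', 'x3'], 'symmetry')
```

Unlike the method above, this version keeps `all_names` order when the filter empties the selection, which avoids depending on set iteration order.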

In [None]:
# ==============================================================================
# STAGE 2 IMPLEMENTATION (v4.0/v4.1 REDESIGNED FLOW)
# ==============================================================================

def _run_stage2(
    self,
    X: np.ndarray,
    y: np.ndarray,
    feature_names: List[str],
    stage1_results: Stage1Results
) -> Stage2Results:
    """
    Execute Stage 2: Structure-Guided Discovery (v4.0/v4.1).
    
    v4.1 Flow:
    ----------
    2.1 PySR Discovery (adaptive timeout)
        |-> Pareto front equations
    2.2 Structure Parsing
        |-> parsed_terms, detected_operators
    2.3 Augmented Library Construction (4-layer)
        |-> Layer 1: PySR exact terms
        |-> Layer 2: Variant terms
        |-> Layer 3: Polynomial baseline
        |-> Layer 4: Operator-guided terms
    2.4 E-WSINDy Sparse Selection
        |-> support, coefficients, selection_analysis
    2.5 Adaptive Lasso (conditional on time budget)
    
    Parameters
    ----------
    X : np.ndarray
        Feature matrix
    y : np.ndarray
        Target vector
    feature_names : List[str]
        Feature names
    stage1_results : Stage1Results
        Results from Stage 1
        
    Returns
    -------
    Stage2Results
        Complete Stage 2 results with v4.1 enhancements
    """
    timing = {}
    t0 = time.time()
    
    # Initialize results
    results = Stage2Results()
    
    # -------------------------------------------------------------------------
    # 2.1 PySR Structure Exploration (with adaptive timeout)
    # -------------------------------------------------------------------------
    if self.config['verbose']:
        print("  2.1 PySR Structure Exploration...")
    
    # Calculate adaptive timeout based on remaining budget
    pysr_timeout = self.budget_manager.allocate_pysr_time()
    if self.config['verbose']:
        print(f"      [TimeBudget] PySR timeout: {pysr_timeout}s")
    
    # Initialize PySR discoverer
    pysr_discoverer = PySRDiscoverer(
        mode=self.config['pysr_mode'],
        timeout_seconds=pysr_timeout,
        precision=self.config['precision']
    )
    
    pysr_result = pysr_discoverer.discover(X, y, feature_names)
    
    results.pysr_equations = pysr_result.get('all_equations', [])
    results.pysr_pareto = pysr_result.get('pareto_front')
    results.best_pysr_equation = pysr_result.get('best_equation', '')
    results.best_pysr_sympy = pysr_result.get('best_equation_sympy')
    results.best_pysr_r2 = pysr_result.get('best_r2', 0.0)
    results.pysr_elapsed_time = pysr_result.get('elapsed_time', 0.0)
    
    if self.config['verbose']:
        r2_str = f"{results.best_pysr_r2:.4f}" if results.best_pysr_r2 is not None else "N/A"
        eq = results.best_pysr_equation or "N/A"
        # Append an ellipsis only when the equation is actually truncated
        eq_preview = eq if len(eq) <= 50 else eq[:50] + "..."
        print(f"      Best equation: {eq_preview} (R2={r2_str})")
        print(f"      Elapsed: {results.pysr_elapsed_time:.1f}s")
    
    timing['pysr'] = time.time() - t0
    t0 = time.time()
    
    # -------------------------------------------------------------------------
    # 2.2 Structure Parsing (NEW v4.0)
    # -------------------------------------------------------------------------
    if self.config['verbose']:
        print("  2.2 Structure Parsing...")
    
    parser = StructureParser()
    pareto_equations = pysr_discoverer.get_pareto_equations()
    
    parsed_terms, detected_operators, term_map = parser.parse_pareto_equations(
        pareto_equations, feature_names, X
    )
    
    results.parsed_terms = parsed_terms
    results.detected_operators = detected_operators
    results.term_to_equation_map = term_map
    
    if self.config['verbose']:
        n_terms = len(parsed_terms)
        ops_str = ', '.join(sorted(detected_operators)) if detected_operators else 'none'
        print(f"      Extracted {n_terms} unique terms")
        print(f"      Detected operators: {ops_str}")
    
    timing['parsing'] = time.time() - t0
    t0 = time.time()
    
    # -------------------------------------------------------------------------
    # 2.3 Augmented Library Construction (4-Layer, NEW v4.0)
    # -------------------------------------------------------------------------
    if self.config['verbose']:
        print("  2.3 Augmented Library Construction (4-Layer)...")
    
    library_builder = AugmentedLibraryBuilder(
        max_poly_degree=self.config['max_poly_degree'],
        generate_variants=self.config['generate_variants'],
        include_operator_terms=self.config['include_operator_terms']
    )
    
    augmented_library, library_names, library_info = library_builder.build(
        X, feature_names,
        parsed_terms=parsed_terms,
        detected_operators=detected_operators,
        pysr_r2=results.best_pysr_r2 or 0.0
    )
    
    results.augmented_library = augmented_library
    results.library_names = library_names
    results.library_info = library_info
    
    if self.config['verbose']:
        info = library_info or {}
        print(f"      Library: {info.get('total_terms', 0)} features")
        print(f"        [PySR]: {info.get('n_pysr_terms', 0)}")
        print(f"        [Var]:  {info.get('n_variant_terms', 0)}")
        print(f"        [Poly]: {info.get('n_poly_terms', 0)}")
        print(f"        [Op]:   {info.get('n_op_terms', 0)}")
    
    timing['library'] = time.time() - t0
    t0 = time.time()
    
    # -------------------------------------------------------------------------
    # 2.4 E-WSINDy Sparse Selection (with source attribution)
    # -------------------------------------------------------------------------
    if self.config['verbose']:
        print("  2.4 E-WSINDy Sparse Selection...")
    
    ewsindy = EWSINDySTLSQ(
        threshold=self.config['stlsq_threshold']
    )
    
    ewsindy_result = ewsindy.fit(
        augmented_library, y, library_names=library_names
    )
    
    results.ewsindy_coefficients = ewsindy_result.get('coefficients')
    results.ewsindy_support = ewsindy_result.get('support')
    results.ewsindy_equation = ewsindy_result.get('equation', '')
    results.ewsindy_r2 = ewsindy_result.get('r_squared', 0.0)
    results.selection_analysis = ewsindy_result.get('selection_analysis', {})
    
    if self.config['verbose']:
        n_active = ewsindy_result.get('n_active_terms', 0)
        analysis = results.selection_analysis or {}
        r2_str = f"{results.ewsindy_r2:.4f}" if results.ewsindy_r2 is not None else "N/A"
        print(f"      Selected {n_active} terms (R2={r2_str})")
        print(f"        from [PySR]: {analysis.get('from_pysr', 0)}")
        print(f"        from [Var]:  {analysis.get('from_variant', 0)}")
        print(f"        from [Poly]: {analysis.get('from_poly', 0)}")
        print(f"        from [Op]:   {analysis.get('from_op', 0)}")
    
    timing['ewsindy'] = time.time() - t0
    t0 = time.time()
    
    # -------------------------------------------------------------------------
    # 2.5 Adaptive Lasso (conditional on time budget)
    # -------------------------------------------------------------------------
    skip_alasso = (
        self.config['skip_alasso_if_tight'] and
        self.budget_manager.should_skip_optional(min_required=20)
    )
    
    if not skip_alasso:
        if self.config['verbose']:
            print("  2.5 Adaptive Lasso Verification...")
        
        alasso = AdaptiveLassoSelector(
            gamma=self.config['alasso_gamma']
        )
        
        alasso_result = alasso.fit(augmented_library, y, library_names)
        
        results.alasso_coefficients = alasso_result.get('coefficients')
        results.alasso_support = alasso_result.get('support')
        results.alasso_r2 = alasso_result.get('r_squared', 0.0)
        
        if self.config['verbose']:
            n_alasso = alasso_result.get('n_active_terms', 0)
            r2_str = f"{results.alasso_r2:.4f}" if results.alasso_r2 is not None else "N/A"
            print(f"      Selected {n_alasso} terms (R2={r2_str})")
    else:
        if self.config['verbose']:
            print("  2.5 Adaptive Lasso: Skipped (time budget constraint)")
        results.alasso_coefficients = None
        results.alasso_support = None
        results.alasso_r2 = None
    
    timing['alasso'] = time.time() - t0
    
    # Store timing
    results.timing = timing
    
    return results

# Attach method to class
PhysicsSRPipeline._run_stage2 = _run_stage2
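
To make step 2.4 more tangible, the sketch below runs plain sequentially thresholded least squares (STLSQ) on a toy library and then counts the surviving terms per library layer. This is deliberately the basic STLSQ iteration and not the E-WSINDy variant implemented in `07_EWSINDy_STLSQ`. The function `stlsq`, the layer labels, and the synthetic data are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def stlsq(Phi: np.ndarray, y: np.ndarray, threshold: float, max_iter: int = 10) -> np.ndarray:
    """Plain STLSQ: fit, zero out small coefficients, refit on the rest."""
    n_terms = Phi.shape[1]
    coef = np.linalg.lstsq(Phi, y, rcond=None)[0]
    active = np.ones(n_terms, dtype=bool)
    for _ in range(max_iter):
        new_active = np.abs(coef) >= threshold
        if not new_active.any():
            return np.zeros(n_terms)  # every term was pruned
        if np.array_equal(new_active, active):
            break  # support is stable
        active = new_active
        coef = np.zeros(n_terms)
        coef[active] = np.linalg.lstsq(Phi[:, active], y, rcond=None)[0]
    return coef


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
x0, x1 = X[:, 0], X[:, 1]

# Toy library; each column is tagged with the layer it came from (assumed labels)
Phi = np.column_stack([np.ones(200), x0, x1, x0 * x1, x0**2])
names = ["1", "x0", "x1", "x0*x1", "x0^2"]
layers = ["poly", "poly", "poly", "pysr", "poly"]

y = 2.0 * x0 * x1 - 0.5 * x1 + 0.01 * rng.standard_normal(200)

coef = stlsq(Phi, y, threshold=0.1)
support = [i for i, c in enumerate(coef) if c != 0.0]
by_layer = Counter(layers[i] for i in support)

print([names[i] for i in support])  # terms that survived thresholding
print(dict(by_layer))               # selected terms per library layer
```

Keeping the layer labels in a list parallel to the library columns makes source attribution a simple lookup over the support indices.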

In [None]:
# ==============================================================================
# STAGE 3 IMPLEMENTATION
# ==============================================================================

def _run_stage3(
    self,
    X: np.ndarray,
    y: np.ndarray,
    feature_names: List[str],
    stage2_results: Stage2Results,
    user_inputs: UserInputs
) -> Stage3Results:
    """
    Execute Stage 3: Validation & Uncertainty Quantification.
    
    Sub-stages:
    - 3.1 Model Selection (CV + EBIC)
    - 3.2 Physics Verification (dimensional + bounds)
    - 3.3 Bootstrap UQ (three-layer, adaptive count)
    - 3.4 Statistical Inference
    
    Parameters
    ----------
    X : np.ndarray
        Feature matrix
    y : np.ndarray
        Target vector
    feature_names : List[str]
        Feature names
    stage2_results : Stage2Results
        Results from Stage 2
    user_inputs : UserInputs
        User-provided physics information
        
    Returns
    -------
    Stage3Results
        Complete Stage 3 results
    """
    timing = {}
    t0 = time.time()
    
    # Initialize results
    results = Stage3Results()
    
    # Get library from Stage 2
    Phi = stage2_results.augmented_library
    library_names = stage2_results.library_names
    
    # -------------------------------------------------------------------------
    # 3.1 Model Selection (CV + EBIC)
    # -------------------------------------------------------------------------
    if self.config['verbose']:
        print("  3.1 Model Selection (CV + EBIC)...")
    
    # Prepare candidate models
    candidates = {
        'ewsindy': (Phi, stage2_results.ewsindy_support)
    }
    
    if stage2_results.alasso_support is not None:
        candidates['alasso'] = (Phi, stage2_results.alasso_support)
    
    model_selector = ModelSelector(
        n_folds=self.config['cv_folds'],
        ebic_gamma=self.config['ebic_gamma']
    )
    
    selection_result = model_selector.compare_models(
        candidates, y, p_total=Phi.shape[1]
    )
    
    results.cv_scores = selection_result.get('cv_results', {})
    results.ebic_scores = selection_result.get('ebic_results', {})
    results.best_model = selection_result.get('best_model_cv', 'ewsindy')
    
    if self.config['verbose']:
        print(f"      Best by CV: {selection_result.get('best_model_cv', 'N/A')}")
        print(f"      Best by EBIC: {selection_result.get('best_model_ebic', 'N/A')}")
    
    timing['selection'] = time.time() - t0
    t0 = time.time()
    
    # -------------------------------------------------------------------------
    # 3.2 Physics Verification (dimensional + bounds)
    # -------------------------------------------------------------------------
    if self.config['verbose']:
        print("  3.2 Physics Verification...")
    
    # Get best model support and predictions
    best_model = results.best_model
    if best_model == 'ewsindy':
        best_support = stage2_results.ewsindy_support
        best_coefs = stage2_results.ewsindy_coefficients
    else:
        best_support = stage2_results.alasso_support
        best_coefs = stage2_results.alasso_coefficients
    
    y_pred = Phi @ best_coefs if best_coefs is not None else None
    
    # Build equation terms for dimensional analysis
    equation_terms = []
    if best_support is not None and library_names is not None:
        for idx in np.where(best_support)[0]:
            if idx < len(library_names):
                term_name = library_names[idx]
                # Parse the term to extract variable exponents
                equation_terms.append({
                    'name': term_name,
                    # Placeholder: exponents are not parsed from term_name yet,
                    # so the dimensional check receives no variables for this term
                    'variables': {}
                })
    
    # Get variable dimensions and target dimensions
    variable_dims = {}
    target_dims = [0, 0, 0, 0]
    physical_bounds = None
    
    if user_inputs:
        variable_dims = user_inputs.variable_dimensions or {}
        target_dims = user_inputs.target_dimensions or [0, 0, 0, 0]
        physical_bounds = user_inputs.physical_bounds
    
    physics_verifier = PhysicsVerifier(
        dim_tolerance=self.config['dim_tolerance']
    )
    
    physics_result = physics_verifier.verify(
        equation_terms, variable_dims, target_dims,
        y_pred=y_pred, physical_bounds=physical_bounds
    )
    
    results.dim_consistent = physics_result.get('dim_consistent', True)
    results.dim_details = physics_result.get('dim_details')
    results.bounds_violations = physics_result.get('bounds_violations')
    results.physics_score = physics_result.get('physics_score', 1.0)
    
    if self.config['verbose']:
        status = 'PASS' if physics_result.get('overall_valid', True) else 'FAIL'
        score = physics_result.get('physics_score', 1.0)
        print(f"      Physics: {status} (score: {score:.2f})")
    
    timing['physics'] = time.time() - t0
    t0 = time.time()
    
    # -------------------------------------------------------------------------
    # 3.3 Bootstrap UQ (three-layer, adaptive count)
    # -------------------------------------------------------------------------
    if self.config['verbose']:
        print("  3.3 Bootstrap Uncertainty Quantification...")
    
    # Calculate adaptive bootstrap count
    n_bootstrap = self.budget_manager.allocate_bootstrap_count()
    if self.config['verbose']:
        print(f"      [TimeBudget] Bootstrap count: {n_bootstrap}")
    
    bootstrap_uq = BootstrapUQ(
        n_bootstrap=n_bootstrap,
        confidence_level=self.config['confidence_level'],
        n_jobs=self.config['n_jobs']
    )
    
    uq_result = bootstrap_uq.run(Phi, y, library_names)
    
    results.inclusion_probabilities = uq_result.get('inclusion_probabilities')
    results.structural_confidence = uq_result.get('structural_confidence')
    results.bootstrap_supports = uq_result.get('bootstrap_supports')
    results.coefficient_estimates = uq_result.get('coefficient_estimates')
    results.coefficient_CI = uq_result.get('coefficient_CI')
    results.coefficient_SE = uq_result.get('coefficient_SE')
    results.bootstrap_coefficients = uq_result.get('bootstrap_coefficients')
    results.residual_variance = uq_result.get('residual_variance')
    
    if self.config['verbose']:
        inc_probs = uq_result.get('inclusion_probabilities', [])
        n_high = sum(1 for p in inc_probs if p is not None and p > 0.9) if inc_probs is not None else 0
        print(f"      High-confidence terms (P>0.9): {n_high}")
    
    timing['bootstrap'] = time.time() - t0
    t0 = time.time()
    
    # -------------------------------------------------------------------------
    # 3.4 Statistical Inference
    # -------------------------------------------------------------------------
    if self.config['verbose']:
        print("  3.4 Statistical Inference...")
    
    inference = StatisticalInference(
        alpha=1.0 - self.config['confidence_level']
    )
    
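    # Fall back to the keys stored in 3.3 if the *_samples keys are absent
    coef_samples = uq_result.get('coef_samples')
    if coef_samples is None:
        coef_samples = uq_result.get('bootstrap_coefficients')
    support_samples = uq_result.get('support_samples')
    if support_samples is None:
        support_samples = uq_result.get('bootstrap_supports')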
    
    if coef_samples is not None and support_samples is not None:
        inference_result = inference.test_coefficients(
            coef_samples, support_samples, library_names
        )
        
        results.p_values = inference_result.get('p_values', {})
        results.significant_terms = inference_result.get('significant_terms', [])
        results.test_statistics = inference_result.get('test_statistics', {})
        
        if self.config['verbose']:
            n_sig = len(results.significant_terms) if results.significant_terms else 0
            print(f"      Significant terms (alpha={1.0 - self.config['confidence_level']:.2f}): {n_sig}")
    else:
        if self.config['verbose']:
            print("      Skipped (insufficient data)")
    
    timing['inference'] = time.time() - t0
    
    # -------------------------------------------------------------------------
    # Build Final Equation
    # -------------------------------------------------------------------------
    # Use best model equation
    if best_model == 'ewsindy' and stage2_results.ewsindy_equation:
        results.final_equation = stage2_results.ewsindy_equation
    elif best_model == 'alasso':
        # Build equation from alasso coefficients
        results.final_equation = self._build_equation_string(
            stage2_results.alasso_coefficients,
            stage2_results.alasso_support,
            library_names
        )
    else:
        results.final_equation = stage2_results.ewsindy_equation or "N/A"
    
    # Build coefficient dictionary
    if best_coefs is not None and best_support is not None:
        final_coefs = {}
        for idx in np.where(best_support)[0]:
            if idx < len(library_names):
                final_coefs[library_names[idx]] = float(best_coefs[idx])
        results.final_coefficients = final_coefs
    
    # Store timing
    results.timing = timing
    
    return results

# Attach method to class
PhysicsSRPipeline._run_stage3 = _run_stage3
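
Step 3.1 passes `ebic_gamma` and `p_total` to `ModelSelector`, whose implementation is not shown here. As a hedged illustration of what an EBIC score typically looks like for a Gaussian linear model, the sketch below uses the common form `n*ln(RSS/n) + k*ln(n) + 2*gamma*ln(C(p, k))`. The function name `ebic` and its argument names are assumptions for this example, not the framework's API.

```python
import math


def ebic(rss: float, n: int, k: int, p_total: int, gamma: float = 0.5) -> float:
    """EBIC for a Gaussian linear model with k active terms out of p_total.

    Lower is better. With gamma = 0 this reduces to the ordinary BIC.
    """
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    if not 0 <= k <= p_total:
        raise ValueError("k must lie in [0, p_total]")
    fit = n * math.log(rss / n)        # -2 log-likelihood, up to a constant
    bic_penalty = k * math.log(n)      # standard BIC complexity term
    # Extra penalty for the number of size-k models in a p_total-term library
    size_penalty = 2.0 * gamma * math.log(math.comb(p_total, k))
    return fit + bic_penalty + size_penalty
```

The extra term grows with the library size, which matters here because the augmented library can contain many more candidate terms than the final equation uses.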

In [None]:
# ==============================================================================
# HELPER METHODS
# ==============================================================================

def _build_equation_string(
    self,
    coefficients: np.ndarray,
    support: np.ndarray,
    library_names: List[str]
) -> str:
    """
    Build equation string from coefficients and support.
    
    Parameters
    ----------
    coefficients : np.ndarray
        Coefficient vector
    support : np.ndarray
        Boolean support mask
    library_names : List[str]
        Feature names
        
    Returns
    -------
    str
        Equation string
    """
    if coefficients is None or support is None:
        return "N/A"
    
    terms = []
    for idx in np.where(support)[0]:
        if idx < len(library_names) and idx < len(coefficients):
            coef = coefficients[idx]
            name = library_names[idx]
            # Remove source tag for cleaner equation
            clean_name = name
            for tag in ['[PySR] ', '[Var] ', '[Poly] ', '[Op] ']:
                clean_name = clean_name.replace(tag, '')
            
            if abs(coef) > 1e-10:
                terms.append(f"{coef:.4f} * {clean_name}")
    
    if len(terms) == 0:
        return "0"
    
    return " + ".join(terms)

def _extract_final_equation(self) -> str:
    """
    Extract final equation from results.
    
    Priority:
    1. If symmetry detected power-law with R2 > 0.99, use symmetry equation
    2. Otherwise use Stage 3 final equation
    
    Returns
    -------
    str
        Final equation string
    """
    # Check if symmetry provides a high-quality power-law equation
    if self.stage1_results and self.stage1_results.is_power_law:
        r2 = self.stage1_results.power_law_r2 or 0.0
        if r2 > 0.99:
            hints = self.stage1_results.structural_hints or {}
            suggested_form = hints.get('suggested_form', '')
            if suggested_form:
                return f"[Symmetry R2={r2:.4f}] {suggested_form}"
    
    # Fallback to Stage 3 final equation
    if self.stage3_results and self.stage3_results.final_equation:
        return self.stage3_results.final_equation
    
    return "N/A"

def get_final_equation(self) -> str:
    """
    Get the final discovered equation.
    
    Returns
    -------
    str
        Final equation string
        
    Raises
    ------
    ValueError
        If pipeline has not been run
    """
    if not self._run_complete:
        raise ValueError("Must run pipeline first")
    return self._final_equation

def get_timing_report(self) -> str:
    """
    Get detailed timing report.
    
    Returns
    -------
    str
        Timing report string
    """
    return self.budget_manager.report()

# Attach methods to class
PhysicsSRPipeline._build_equation_string = _build_equation_string
PhysicsSRPipeline._extract_final_equation = _extract_final_equation
PhysicsSRPipeline.get_final_equation = get_final_equation
PhysicsSRPipeline.get_timing_report = get_timing_report
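
`_build_equation_string` joins every term with `" + "`, so a negative coefficient renders as `+ -0.5000 * ...`. The standalone sketch below is a variant, not the method above: it strips the same source tags but folds the sign into the separator. `format_equation` and `SOURCE_TAGS` are names introduced only for this example.

```python
SOURCE_TAGS = ('[PySR] ', '[Var] ', '[Poly] ', '[Op] ')


def format_equation(coefficients, support, names, tol=1e-10):
    """Build a sign-aware equation string from active, non-negligible terms."""
    terms = []
    for idx, active in enumerate(support):
        if not active or idx >= len(names) or idx >= len(coefficients):
            continue
        coef = coefficients[idx]
        if abs(coef) <= tol:
            continue  # drop numerically zero coefficients
        name = names[idx]
        for tag in SOURCE_TAGS:
            name = name.replace(tag, '')  # remove the library-layer tag
        sign = '-' if coef < 0 else '+'
        terms.append((sign, f"{abs(coef):.4f} * {name}"))
    if not terms:
        return "0"
    first_sign, first_body = terms[0]
    out = ('-' if first_sign == '-' else '') + first_body
    for sign, body in terms[1:]:
        out += f" {sign} {body}"
    return out


print(format_equation(
    [1.35, 0.0, -0.5],
    [True, False, True],
    ['[PySR] q_c**2', '[Poly] N_d', '[Op] log(N_d)'],
))
# prints: 1.3500 * q_c**2 - 0.5000 * log(N_d)
```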

In [None]:
# ==============================================================================
# PRINT SUMMARY METHOD
# ==============================================================================

def print_summary(self) -> None:
    """
    Print comprehensive pipeline summary.
    
    Displays results from all three stages with v4.1 enhancements.
    """
    if not self._run_complete:
        print("Pipeline not yet executed. Call run() first.")
        return
    
    print()
    print("=" * 70)
    print(" PHYSICS-SR PIPELINE v4.1 SUMMARY")
    print("=" * 70)
    print()
    
    # Stage 1 Summary
    print("STAGE 1: Variable Selection")
    print("-" * 40)
    s1 = self.stage1_results
    if s1:
        selected = s1.selected_names or []
        print(f"  Selected variables: {selected}")
        print(f"  Power-law detected: {s1.is_power_law}")
        if s1.is_power_law and s1.power_law_r2 is not None:
            print(f"  Power-law R2: {s1.power_law_r2:.4f}")
        interactions = s1.stable_interactions or []
        print(f"  Stable interactions: {len(interactions)}")
    print()
    
    # Stage 2 Summary (v4.1 enhanced)
    print("STAGE 2: Structure-Guided Discovery (v4.1)")
    print("-" * 40)
    s2 = self.stage2_results
    if s2:
        lib_info = s2.library_info or {}
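        pysr_eq = s2.best_pysr_equation or 'N/A'
        # Only append an ellipsis when the equation is actually truncated
        pysr_eq = pysr_eq if len(pysr_eq) <= 50 else pysr_eq[:50] + '...'
        print(f"  PySR best equation: {pysr_eq}")
        print(f"  PySR R2: {s2.best_pysr_r2:.4f}" if s2.best_pysr_r2 is not None else "  PySR R2: N/A")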
        print(f"  Library size: {lib_info.get('total_terms', 0)} features")
        print(f"    [PySR]: {lib_info.get('n_pysr_terms', 0)}")
        print(f"    [Var]:  {lib_info.get('n_variant_terms', 0)}")
        print(f"    [Poly]: {lib_info.get('n_poly_terms', 0)}")
        print(f"    [Op]:   {lib_info.get('n_op_terms', 0)}")
        
        # Selection analysis
        analysis = s2.selection_analysis or {}
        n_ewsindy = analysis.get('total_selected', 0)
        print(f"  E-WSINDy selected: {n_ewsindy} terms")
        if n_ewsindy > 0:
            print(f"    from [PySR]: {analysis.get('from_pysr', 0)}")
            print(f"    from [Var]:  {analysis.get('from_variant', 0)}")
            print(f"    from [Poly]: {analysis.get('from_poly', 0)}")
            print(f"    from [Op]:   {analysis.get('from_op', 0)}")
        if s2.ewsindy_r2 is not None:
            print(f"  E-WSINDy R2: {s2.ewsindy_r2:.4f}")
    print()
    
    # Stage 3 Summary
    print("STAGE 3: Validation & UQ")
    print("-" * 40)
    s3 = self.stage3_results
    if s3:
        print(f"  Best model: {s3.best_model or 'N/A'}")
        print(f"  Physics score: {s3.physics_score:.2f}" if s3.physics_score is not None else "  Physics score: N/A")
        
        # UQ summary
        inc_probs = s3.inclusion_probabilities
        if inc_probs is not None:
            n_high = sum(1 for p in inc_probs if p > 0.9)
            n_med = sum(1 for p in inc_probs if 0.5 < p <= 0.9)
            print(f"  High-confidence terms (P>0.9): {n_high}")
            print(f"  Medium-confidence terms (0.5<P<=0.9): {n_med}")
        
        sig_terms = s3.significant_terms or []
        print(f"  Significant terms: {len(sig_terms)}")
    print()
    
    # Final Equation
    print("FINAL EQUATION")
    print("-" * 40)
    print(f"  {self._final_equation}")
    print()
    
    # Timing Profile
    print("TIMING PROFILE (v4.1)")
    print("-" * 40)
    timing = self.budget_manager.to_dict()
    for stage in ['Stage1', 'Stage2', 'Stage3']:
        if stage in timing:
            print(f"  {stage}: {timing[stage]:.1f}s")
    print("  ---")
    print(f"  Total: {timing.get('total', 0):.1f}s / {self.budget_manager.total_budget}s")
    print(f"  Remaining: {timing.get('remaining', 0):.1f}s")
    print()
    print("=" * 70)

# Attach method to class
PhysicsSRPipeline.print_summary = print_summary

print("PhysicsSRPipeline class v4.1 defined.")
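
The summary bins terms by inclusion probability (above 0.9, and between 0.5 and 0.9). `BootstrapUQ` is defined elsewhere, so the sketch below only illustrates the usual definition: the fraction of bootstrap replicates in which a term is selected. The function names and the list-of-lists input format are assumptions for this example.

```python
def inclusion_probabilities(bootstrap_supports):
    """Fraction of bootstrap replicates that selected each library term.

    bootstrap_supports: list of equal-length 0/1 (or bool) rows, one per replicate.
    """
    n_boot = len(bootstrap_supports)
    if n_boot == 0:
        return []
    n_terms = len(bootstrap_supports[0])
    return [
        sum(1 for row in bootstrap_supports if row[j]) / n_boot
        for j in range(n_terms)
    ]


def confidence_bins(probs, high=0.9, medium=0.5):
    """Count terms in the high (p > high) and medium (medium < p <= high) bins."""
    n_high = sum(1 for p in probs if p > high)
    n_med = sum(1 for p in probs if medium < p <= high)
    return n_high, n_med


supports = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
probs = inclusion_probabilities(supports)
print(probs)                   # prints: [1.0, 0.75, 0.25]
print(confidence_bins(probs))  # prints: (1, 1)
```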

---
## Section 3: Quick Pipeline Function

In [None]:
# ==============================================================================
# QUICK PIPELINE FUNCTION
# ==============================================================================

def run_physics_sr(
    X: np.ndarray,
    y: np.ndarray,
    feature_names: List[str],
    variable_dimensions: Dict[str, List[float]] = None,
    target_dimensions: List[float] = None,
    target_name: str = 'y',
    max_time_seconds: float = 180,
    config: Dict = None,
    verbose: bool = True
) -> Dict[str, Any]:
    """
    Run complete Physics-SR pipeline with minimal setup.
    
    This is a convenience wrapper around PhysicsSRPipeline.
    
    Parameters
    ----------
    X : np.ndarray
        Feature matrix (n_samples, n_features)
    y : np.ndarray
        Target vector (n_samples,)
    feature_names : List[str]
        Feature names
    variable_dimensions : Dict[str, List[float]], optional
        Variable dimensions {var_name: [M, L, T, Theta]}
    target_dimensions : List[float], optional
        Target dimensions [M, L, T, Theta]
    target_name : str
        Name of target variable
    max_time_seconds : float
        Total runtime budget (default: 180s)
    config : Dict, optional
        Pipeline configuration overrides
    verbose : bool
        Whether to print progress (default: True)
        
    Returns
    -------
    Dict[str, Any]
        Complete pipeline results
        
    Examples
    --------
    >>> results = run_physics_sr(
    ...     X, y, ['q_c', 'N_d'],
    ...     variable_dimensions={'q_c': [0,0,0,0], 'N_d': [0,-3,0,0]},
    ...     target_dimensions=[0,0,-1,0],
    ...     target_name='dq_r/dt'
    ... )
    >>> print(results['final_equation'])
    """
    # Create UserInputs
    user_inputs = UserInputs(
        variable_dimensions=variable_dimensions,
        target_dimensions=target_dimensions,
        target_name=target_name
    )
    
    # Merge config (copy so the caller's dict is not mutated)
    full_config = dict(config or {})
    full_config['verbose'] = verbose
    
    # Create and run pipeline
    pipeline = PhysicsSRPipeline(
        config=full_config,
        max_time_seconds=max_time_seconds
    )
    
    results = pipeline.run(X, y, feature_names, user_inputs)
    
    # Add pipeline reference for further inspection
    results['pipeline'] = pipeline
    
    return results

print("run_physics_sr convenience function defined.")
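
The wrapper combines caller overrides with the explicit `verbose` argument, and the explicit argument wins. A minimal sketch of that precedence, written so that neither the defaults nor the caller's dict is modified, might look like the following. `DEFAULTS` and `merge_config` are hypothetical; the pipeline's real default values are not shown in this section.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults, for illustration only
DEFAULTS: Dict[str, Any] = {'cv_folds': 5, 'ebic_gamma': 0.5, 'verbose': True}


def merge_config(overrides: Optional[Dict[str, Any]], verbose: bool) -> Dict[str, Any]:
    """Return a new dict: defaults, then caller overrides, then the verbose flag."""
    merged = dict(DEFAULTS)          # copy, so DEFAULTS stays untouched
    merged.update(overrides or {})   # caller overrides take precedence over defaults
    merged['verbose'] = verbose      # explicit argument takes precedence over both
    return merged


user_config = {'cv_folds': 10, 'verbose': True}
print(merge_config(user_config, verbose=False))
# prints: {'cv_folds': 10, 'ebic_gamma': 0.5, 'verbose': False}
print(user_config)
# prints: {'cv_folds': 10, 'verbose': True}
```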

---
## Section 4: Demonstration

In [None]:
# ==============================================================================
# DEMO CONTROL FLAG
# ==============================================================================

_RUN_DEMO = False  # Set to True to run demonstration

if _RUN_DEMO:
    print("=" * 70)
    print(" RUNNING END-TO-END DEMONSTRATION (v4.1)")
    print("=" * 70)

In [None]:
# ==============================================================================
# DEMO: Generate Warm Rain Test Data
# ==============================================================================

if _RUN_DEMO:
    print()
    print_section_header("Demo: Generate Test Data")
    
    # Generate warm rain data
    np.random.seed(RANDOM_SEED)
    warm_rain = generate_warm_rain_data(n_samples=500, noise_level=0.01)
    
    X = warm_rain['X']
    y = warm_rain['y']
    feature_names = warm_rain['feature_names']
    
    print(f"Generated {len(y)} samples")
    print(f"Features: {feature_names}")
    print(f"True equation: {warm_rain['true_equation']}")

In [None]:
# ==============================================================================
# DEMO: Create User Inputs
# ==============================================================================

if _RUN_DEMO:
    print()
    print_section_header("Demo: Create User Inputs")
    
    user_inputs = UserInputs(
        variable_dimensions={
            'q_c': [0, 0, 0, 0],    # dimensionless (mixing ratio)
            'N_d': [0, -3, 0, 0],   # m^-3 (number concentration)
            'r_eff': [0, 1, 0, 0],  # m (effective radius)
            'LWC': [1, -3, 0, 0]    # kg/m^3 (liquid water content)
        },
        target_dimensions=[0, 0, -1, 0],  # s^-1 (rate)
        target_name='dq_r/dt',
        expected_form='power_law',
        physical_bounds={'min': 0.0}  # Non-negative rate
    )
    
    print("User inputs configured:")
    print(f"  Target: {user_inputs.target_name}")
    print(f"  Expected form: {user_inputs.expected_form}")
    print(f"  Target dimensions: {user_inputs.target_dimensions}")
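
The `[M, L, T, Theta]` exponent vectors above are what a dimensional check consumes. `PhysicsVerifier` is defined in another notebook, so the following is only a sketch of the underlying arithmetic for a monomial term: the term's dimension vector is the exponent-weighted sum of its variables' vectors, compared with the target within a tolerance. Function names and the toy variables `x` and `v` are assumptions for illustration.

```python
from typing import Dict, List, Sequence


def term_dimensions(exponents: Dict[str, float],
                    variable_dims: Dict[str, Sequence[float]]) -> List[float]:
    """Dimension vector [M, L, T, Theta] of the monomial prod(var ** exponent)."""
    dims = [0.0, 0.0, 0.0, 0.0]
    for var, power in exponents.items():
        for i, d in enumerate(variable_dims[var]):
            dims[i] += power * d
    return dims


def is_consistent(exponents, variable_dims, target_dims, tol=1e-6) -> bool:
    """True when the term's dimensions match the target within `tol`."""
    term = term_dimensions(exponents, variable_dims)
    return all(abs(a - b) <= tol for a, b in zip(term, target_dims))


dims = {'x': [0, 1, 0, 0], 'v': [0, 1, -1, 0]}   # length, velocity
target = [0, 2, -2, 0]                           # L^2 T^-2
print(is_consistent({'v': 2}, dims, target))          # prints: True
print(is_consistent({'x': 1, 'v': 1}, dims, target))  # prints: False
```

This covers only pure monomials. Terms such as `log(N_d)` or sums inside a PySR expression would need their own rules, and a dimensional coefficient can absorb a mismatch.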

In [None]:
# ==============================================================================
# DEMO: Run Complete Pipeline
# ==============================================================================

if _RUN_DEMO:
    print()
    print_section_header("Demo: Run Complete Pipeline")
    
    # Initialize pipeline with faster settings for demo
    pipeline = PhysicsSRPipeline(
        config={
            'importance_threshold': 0.01,
            'powerlaw_r2_threshold': 0.9,
            'stability_threshold': 0.5,
            'pysr_mode': 'fast',
            'max_poly_degree': 3,
            'stlsq_threshold': 0.1,
            'cv_folds': 5,
            'ebic_gamma': 0.5,
            'n_bootstrap': 100,
            'confidence_level': 0.95,
            'n_jobs': 2,
            'verbose': True
        },
        max_time_seconds=180
    )
    
    # Run pipeline
    results = pipeline.run(X, y, feature_names, user_inputs)
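
The demo requests `n_bootstrap=100` under a 180 s budget, while Stage 3 asks the budget manager for the actual count. How `allocate_bootstrap_count()` decides is not shown in this notebook section. One plausible rule, sketched here purely as an assumption, caps the requested count by what the remaining time can afford while keeping a floor for statistical usefulness. All names and default values below are hypothetical.

```python
def adaptive_bootstrap_count(remaining_s: float,
                             seconds_per_replicate: float,
                             requested: int = 100,
                             minimum: int = 20,
                             reserve_s: float = 20.0) -> int:
    """Pick a bootstrap count that fits the remaining time budget.

    remaining_s: seconds left in the overall budget
    seconds_per_replicate: estimated cost of one bootstrap refit
    reserve_s: time held back for the steps that follow the bootstrap
    """
    if seconds_per_replicate <= 0:
        raise ValueError("seconds_per_replicate must be positive")
    usable = max(0.0, remaining_s - reserve_s)
    affordable = int(usable / seconds_per_replicate)
    # Never exceed the request; never drop below the floor
    return max(minimum, min(requested, affordable))


print(adaptive_bootstrap_count(120.0, 0.5))  # prints: 100
print(adaptive_bootstrap_count(45.0, 0.5))   # prints: 50
print(adaptive_bootstrap_count(21.0, 0.5))   # prints: 20
```

Because of the floor, this rule can overrun a nearly exhausted budget. A stricter design would skip the bootstrap entirely in that case.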

In [None]:
# ==============================================================================
# DEMO: Display Results
# ==============================================================================

if _RUN_DEMO:
    print()
    print_section_header("Demo: Results Summary")
    
    pipeline.print_summary()
    
    print()
    print("Comparison with true equation:")
    print(f"  True:       {warm_rain['true_equation']}")
    print(f"  Discovered: {pipeline.get_final_equation()[:60]}...")

In [None]:
# ==============================================================================
# DEMO: Quick Pipeline Alternative
# ==============================================================================

if _RUN_DEMO:
    print()
    print_section_header("Demo: Quick Pipeline (Alternative)")
    
    # Using the convenience function
    quick_results = run_physics_sr(
        X, y, feature_names,
        variable_dimensions={
            'q_c': [0, 0, 0, 0],
            'N_d': [0, -3, 0, 0]
        },
        target_dimensions=[0, 0, -1, 0],
        target_name='dq_r/dt',
        max_time_seconds=120,
        config={'pysr_mode': 'fast'},
        verbose=True
    )
    
    print(f"Quick pipeline result: {quick_results['final_equation'][:50]}...")

---
## Section 5: Module Summary

In [None]:
# ==============================================================================
# MODULE SUMMARY
# ==============================================================================

print("=" * 70)
print(" 12_Full_Pipeline.ipynb - Module Summary (v4.1)")
print("=" * 70)
print()
print("CLASS: PhysicsSRPipeline")
print("-" * 70)
print()
print("Purpose:")
print("  Complete end-to-end physics-informed symbolic regression pipeline.")
print("  Integrates all three stages with v4.1 enhancements:")
print("  - TimeBudgetManager for adaptive time allocation")
print("  - Structure-guided feature library (PySR -> E-WSINDy)")
print("  - Source attribution for selected terms")
print("  - Comprehensive timing profile")
print()
print("Main Methods:")
print("  __init__(config, max_time_seconds)")
print("      Initialize pipeline with configuration")
print()
print("  run(X, y, feature_names, user_inputs) -> Dict")
print("      Execute complete pipeline")
print("      Returns: {stage1, stage2, stage3, final_equation, timing}")
print()
print("  get_final_equation() -> str")
print("      Get the discovered equation")
print()
print("  get_timing_report() -> str")
print("      Get detailed timing profile (v4.1)")
print()
print("  print_summary() -> None")
print("      Print comprehensive results summary")
print()
print("Pipeline Stages (v4.1):")
print("-" * 70)
print("  Stage 1: Variable Selection")
print("    - 1.1 Buckingham Pi dimensional analysis")
print("    - 1.2 PAN+SR variable screening")
print("    - 1.3 Power-law symmetry detection")
print("    - 1.4 iRF interaction discovery")
print()
print("  Stage 2: Structure-Guided Discovery (v4.0/v4.1)")
print("    - 2.1 PySR structure exploration (adaptive timeout)")
print("    - 2.2 Structure parsing (NEW v4.0)")
print("    - 2.3 4-layer augmented library (NEW v4.0)")
print("    - 2.4 E-WSINDy sparse selection (source attribution)")
print("    - 2.5 Adaptive Lasso (conditional on budget)")
print()
print("  Stage 3: Validation & UQ")
print("    - 3.1 Model selection (CV + EBIC)")
print("    - 3.2 Physics verification (dimensional + bounds)")
print("    - 3.3 Bootstrap UQ (three-layer, adaptive count)")
print("    - 3.4 Statistical inference")
print()
print("Usage Example:")
print("-" * 70)
print("""
# Create user inputs with physics information
user_inputs = UserInputs(
    variable_dimensions={'x': [0, 1, 0, 0], 'v': [0, 1, -1, 0]},
    target_dimensions=[0, 2, -2, 0],
    target_name='energy'
)

# Run complete pipeline with time budget
pipeline = PhysicsSRPipeline(max_time_seconds=180)
results = pipeline.run(X, y, feature_names, user_inputs)

# Get results
print(f"Equation: {pipeline.get_final_equation()}")
print(pipeline.get_timing_report())
pipeline.print_summary()
""")
print()
print("Quick Pipeline Function:")
print("-" * 70)
print("""
# Single call for simple usage
results = run_physics_sr(
    X, y, ['q_c', 'N_d'],
    variable_dimensions={'q_c': [0,0,0,0], 'N_d': [0,-3,0,0]},
    target_dimensions=[0,0,-1,0],
    target_name='dq_r/dt',
    max_time_seconds=180
)
print(results['final_equation'])
""")
print()
print("=" * 70)
print(" PHYSICS-SR FRAMEWORK v4.1 - COMPLETE")
print("=" * 70)
print()
print("All 12 notebooks loaded successfully.")
print("Ready for physics-informed symbolic regression!")
print()
print("v4.1 Key Features:")
print("  - TimeBudgetManager for adaptive time allocation")
print("  - Structure-guided library (PySR terms feed E-WSINDy)")
print("  - Source attribution ([PySR], [Var], [Poly], [Op])")
print("  - Float32 precision for memory efficiency")
print("  - Adaptive bootstrap count based on remaining budget")
print()
print("=" * 70)