# 12_Full_Pipeline - Physics-SR Framework v3.0

## Complete Pipeline Integration + End-to-End Demonstration

**Author:** Zhengze Zhang  
**Affiliation:** Department of Statistics, Columbia University  
**Date:** January 2026

---

### Purpose

Integrate all components into a complete, end-to-end symbolic regression pipeline:

**Stage 1: Variable Selection & Preprocessing**
- Buckingham Pi dimensional analysis
- PAN+SR nonlinear variable screening
- Power-law symmetry detection
- iRF interaction discovery

**Stage 2: Structure Discovery**
- Feature library construction
- PySR genetic programming
- E-WSINDy with STLSQ
- Adaptive Lasso

**Stage 3: Validation & UQ**
- Model selection (CV + EBIC)
- Physics verification
- Three-layer bootstrap UQ
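
Of the Stage 2 components listed above, sequentially thresholded least squares (STLSQ) is compact enough to illustrate in isolation. The following is a minimal NumPy sketch of the generic algorithm, not the `EWSINDySTLSQ` implementation from `07_EWSINDy_STLSQ.ipynb`. The function name `stlsq_sketch`, the iteration cap, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def stlsq_sketch(Phi, y, threshold=0.1, max_iter=10):
    """Illustrative STLSQ: alternate hard thresholding and least-squares refits."""
    coefs = np.linalg.lstsq(Phi, y, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(coefs) < threshold
        coefs[small] = 0.0
        active = ~small
        if not active.any():
            break  # every term was pruned
        # Refit using only the columns that survived thresholding
        coefs[active] = np.linalg.lstsq(Phi[:, active], y, rcond=None)[0]
    return coefs


# Synthetic library with two truly active columns (0 and 3)
rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 5))
y = 2.0 * Phi[:, 0] - 1.5 * Phi[:, 3] + 0.01 * rng.normal(size=200)

coefs = stlsq_sketch(Phi, y, threshold=0.1)
support = np.flatnonzero(coefs)
print("Active columns:", support)
print("Coefficients:", np.round(coefs, 3))
```

The threshold plays the same role as `stlsq_threshold` in the pipeline configuration: larger values prune more aggressively and yield sparser models.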

### Framework Summary

| Stage | Component | Purpose |
|-------|-----------|--------|
| 1.1 | Buckingham Pi | Dimensional reduction |
| 1.2 | PAN+SR | Variable screening |
| 1.3 | Symmetry | Power-law detection |
| 1.4 | iRF | Interaction discovery |
| 2.1 | Feature Library | Candidate terms |
| 2.2a | PySR | Genetic programming |
| 2.2b | E-WSINDy | Sparse regression |
| 2.2c | Adaptive Lasso | Oracle property |
| 3.1 | Model Selection | CV + EBIC |
| 3.2 | Physics Check | Dimensional + bounds |
| 3.3 | Bootstrap UQ | Three-layer UQ |
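
The EBIC criterion named in Stage 3.1 penalizes model size more strongly than BIC when the candidate library is large. A minimal sketch follows, assuming a Gaussian linear model and the Chen and Chen (2008) form $\mathrm{EBIC}_\gamma = n\ln(\mathrm{RSS}/n) + k\ln n + 2\gamma\ln\binom{p}{k}$. The helper `ebic_sketch` is hypothetical and may differ from what `ModelSelector` computes internally.

```python
import math

import numpy as np


def ebic_sketch(y, y_pred, k, p_total, gamma=0.5):
    """EBIC for a Gaussian linear model with k active terms out of p_total."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y)
    rss = float(np.sum((y - y_pred) ** 2))
    # log C(p_total, k) computed with log-gamma to avoid overflow
    log_binom = (math.lgamma(p_total + 1) - math.lgamma(k + 1)
                 - math.lgamma(p_total - k + 1))
    return n * math.log(rss / n) + k * math.log(n) + 2.0 * gamma * log_binom


# Two models with identical fit but different sparsity
y = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.1, 3.9]
sparse = ebic_sketch(y, y_pred, k=1, p_total=10)
dense = ebic_sketch(y, y_pred, k=3, p_total=10)
print(f"EBIC sparse: {sparse:.3f}, dense: {dense:.3f}")
```

With equal residuals the sparser model always receives the lower (better) score, and setting `gamma=0` recovers ordinary BIC.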

---
## Section 1: Header and Imports

In [None]:
"""
12_Full_Pipeline.ipynb - Complete Pipeline Integration
=======================================================

Three-Stage Physics-Informed Symbolic Regression Framework v3.0

This module provides:
- PhysicsSRPipeline: Complete end-to-end pipeline
- Stage-by-stage execution with checkpoints
- Comprehensive result reporting
- Visualization utilities

Usage:
    pipeline = PhysicsSRPipeline()
    result = pipeline.run(X, y, feature_names, user_inputs)
    pipeline.print_summary()

Author: Zhengze Zhang
Affiliation: Department of Statistics, Columbia University
"""

print("Loading all component modules...")
print()

In [None]:
# Import all component modules
%run 00_Core.ipynb
%run 01_BuckinghamPi.ipynb
%run 02_VariableScreening.ipynb
%run 03_SymmetryAnalysis.ipynb
%run 04_InteractionDiscovery.ipynb
%run 05_FeatureLibrary.ipynb
%run 06_PySR.ipynb
%run 07_EWSINDy_STLSQ.ipynb
%run 08_AdaptiveLasso.ipynb
%run 09_ModelSelection.ipynb
%run 10_PhysicsVerification.ipynb
%run 11_UQ_Inference.ipynb

print()
print("=" * 70)
print(" All modules loaded successfully!")
print("=" * 70)

In [None]:
# Verify all classes are available
print("Verifying component classes...")
print()

components = [
    ('BuckinghamPiAnalyzer', '01_BuckinghamPi'),
    ('PANSRScreener', '02_VariableScreening'),
    ('PowerLawDetector', '03_SymmetryAnalysis'),
    ('IRFInteractionDiscoverer', '04_InteractionDiscovery'),
    ('FeatureLibraryBuilder', '05_FeatureLibrary'),
    ('PySRDiscoverer', '06_PySR'),
    ('StructureParser', '06_PySR'),
    ('EWSINDySTLSQ', '07_EWSINDy_STLSQ'),
    ('AdaptiveLassoSelector', '08_AdaptiveLasso'),
    ('ModelSelector', '09_ModelSelection'),
    ('PhysicsVerifier', '10_PhysicsVerification'),
    ('BootstrapUQ', '11_UQ_Inference'),
    ('StatisticalInference', '11_UQ_Inference')
]

all_loaded = True
for class_name, module in components:
    if class_name in globals():
        print(f"  [OK] {class_name} from {module}")
    else:
        print(f"  [MISSING] {class_name} from {module}")
        all_loaded = False

print()
if all_loaded:
    print("All components verified!")
else:
    print("WARNING: Some components missing!")

---
## Section 2: Pipeline Class Definition
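
The pipeline class in this section is driven by a flat configuration dictionary. As a hedged illustration of one way to layer partial overrides on top of defaults, the sketch below uses a hypothetical helper `merge_config_sketch` and a shortened, assumed set of keys. It is not part of the pipeline itself.

```python
from typing import Any, Dict, Optional

# Shortened defaults for illustration; the real pipeline defines more keys
DEFAULTS_SKETCH: Dict[str, Any] = {
    'stlsq_threshold': 0.1,
    'cv_folds': 5,
    'n_bootstrap': 100,
}


def merge_config_sketch(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return defaults updated with overrides, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS_SKETCH)
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    # Later entries win, so overrides replace defaults key by key
    return {**DEFAULTS_SKETCH, **overrides}


cfg = merge_config_sketch({'n_bootstrap': 50})
print(cfg)
```

Rejecting unknown keys catches typos such as `'n_boostrap'` early instead of silently running with the default value.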

In [None]:
# ==============================================================================
# PHYSICS SR PIPELINE CLASS
# ==============================================================================

class PhysicsSRPipeline:
    """
    Complete Physics-Informed Symbolic Regression Pipeline.
    
    Integrates all three stages:
    - Stage 1: Variable selection and preprocessing
    - Stage 2: Structure discovery (parallel pathways)
    - Stage 3: Validation and uncertainty quantification
    
    Attributes
    ----------
    config : Dict
        Pipeline configuration
    
    Examples
    --------
    >>> pipeline = PhysicsSRPipeline()
    >>> result = pipeline.run(X, y, feature_names, user_inputs)
    >>> print(pipeline.get_final_equation())
    """
    
    def __init__(self, config: Dict = None):
        """
        Initialize PhysicsSRPipeline.
        
        Parameters
        ----------
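        config : Dict, optional
            Pipeline configuration. Keys not supplied fall back to the defaults.
        """
        # Layer user overrides on top of the defaults so that a partial
        # config cannot cause a KeyError in later stages
        self.config = {**self._default_config(), **(config or {})}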
        
        # Results storage
        self._stage1_results = None
        self._stage2_results = None
        self._stage3_results = None
        self._final_equation = None
        self._run_complete = False
    
    def _default_config(self) -> Dict:
        """
        Get default pipeline configuration.
        """
        return {
            # Stage 1
            'screening_threshold': 0.8,
            'power_law_r2_threshold': 0.9,
            'interaction_stability': 0.5,
            'buckingham_max_exponent': 2,  # Max exponent in the Pi-group search (2 for speed, 4 for completeness)
            
            # Stage 2
            'max_poly_degree': 3,
            'stlsq_threshold': 0.1,
            'pysr_maxsize': 25,
            'pysr_niterations': 50,
            
            # Stage 3
            'cv_folds': 5,
            'ebic_gamma': 0.5,
            'n_bootstrap': 100,
            'confidence_level': 0.95
        }
    
    def run(
        self,
        X: np.ndarray,
        y: np.ndarray,
        feature_names: List[str],
        user_inputs: UserInputs = None
    ) -> Dict[str, Any]:
        """
        Run complete pipeline.
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix (n_samples, n_features)
        y : np.ndarray
            Target vector
        feature_names : List[str]
            Feature names
        user_inputs : UserInputs, optional
            User-provided physics information
        
        Returns
        -------
        Dict[str, Any]
            Complete pipeline results
        """
        print("=" * 70)
        print(" Physics-SR Pipeline v3.0")
        print("=" * 70)
        print()
        
        # Stage 1: Variable Selection
        print("STAGE 1: Variable Selection & Preprocessing")
        print("-" * 70)
        self._stage1_results = self._run_stage1(X, y, feature_names, user_inputs)
        print()
        
        # Get working features
        X_work, feature_names_work = self._select_working_features(
            X, feature_names, self._stage1_results
        )
        
        # Stage 2: Structure Discovery
        print("STAGE 2: Structure Discovery")
        print("-" * 70)
        self._stage2_results = self._run_stage2(X_work, y, feature_names_work)
        print()
        
        # Stage 3: Validation & UQ
        print("STAGE 3: Validation & Uncertainty Quantification")
        print("-" * 70)
        self._stage3_results = self._run_stage3(
            X_work, y, feature_names_work, 
            self._stage2_results, user_inputs
        )
        print()
        
        # Extract final equation
        self._final_equation = self._extract_final_equation()
        
        self._run_complete = True
        
        print("=" * 70)
        print(" Pipeline Complete!")
        print("=" * 70)
        
        return {
            'stage1': self._stage1_results,
            'stage2': self._stage2_results,
            'stage3': self._stage3_results,
            'final_equation': self._final_equation
        }
    
    def _run_stage1(
        self,
        X: np.ndarray,
        y: np.ndarray,
        feature_names: List[str],
        user_inputs: UserInputs
    ) -> Dict[str, Any]:
        """
        Run Stage 1: Variable Selection.
        """
        results = {}
        
        # 1.1 Buckingham Pi (if dimensions provided)
        if user_inputs and user_inputs.variable_dimensions:
            print("  1.1 Buckingham Pi Analysis...")
            pi_analyzer = BuckinghamPiAnalyzer(
                max_exponent=self.config.get('buckingham_max_exponent', 2)
            )
            # analyze() takes only the dimension map, not feature_names
            results['buckingham_pi'] = pi_analyzer.analyze(
                user_inputs.variable_dimensions
            )
            print(f"      Reduced from {len(feature_names)} to "
                  f"{results['buckingham_pi']['n_pi_groups']} Pi groups")
        else:
            print("  1.1 Buckingham Pi: Skipped (no dimensions provided)")
            results['buckingham_pi'] = None
        
        # 1.2 Variable Screening
        print("  1.2 PAN+SR Variable Screening...")
        screener = PANSRScreener(threshold=self.config['screening_threshold'])
        results['screening'] = screener.screen(X, y, feature_names)
        print(f"      Selected {results['screening']['n_selected']} of "
              f"{len(feature_names)} variables")
        
        # 1.3 Symmetry Analysis
        print("  1.3 Power-Law Symmetry Detection...")
        detector = PowerLawDetector(
            r2_threshold=self.config['power_law_r2_threshold']
        )
        results['symmetry'] = detector.detect(X, y, feature_names)
        print(f"      Power-law detected: {results['symmetry']['power_law_detected']}")
        
        # 1.4 Interaction Discovery
        print("  1.4 iRF Interaction Discovery...")
        discoverer = IRFInteractionDiscoverer(
            stability_threshold=self.config['interaction_stability']
        )
        results['interactions'] = discoverer.discover(X, y, feature_names)
        print(f"      Found {results['interactions']['n_stable_interactions']} "
              f"stable interactions")
        
        return results
    
    def _select_working_features(
        self,
        X: np.ndarray,
        feature_names: List[str],
        stage1_results: Dict
    ) -> Tuple[np.ndarray, List[str]]:
        """
        Select working features based on Stage 1.
        """
        # Use screening results
        selected = stage1_results['screening']['selected_features']
        
        if len(selected) == 0:
            # Fallback: use all features
            return X, feature_names
        
        # Get indices
        indices = [feature_names.index(f) for f in selected if f in feature_names]
        
        if len(indices) == 0:
            return X, feature_names
        
        return X[:, indices], [feature_names[i] for i in indices]
    
    def _run_stage2(
        self,
        X: np.ndarray,
        y: np.ndarray,
        feature_names: List[str]
    ) -> Dict[str, Any]:
        """
        Run Stage 2: Structure Discovery.
        """
        results = {}
        
        # 2.1 Build Feature Library
        print("  2.1 Building Feature Library...")
        builder = FeatureLibraryBuilder(
            max_poly_degree=self.config['max_poly_degree']
        )
        Phi, library_names = builder.build(X, feature_names)
        results['feature_library'] = {
            'Phi': Phi,
            'library_names': library_names,
            'n_features': Phi.shape[1]
        }
        print(f"      Library size: {Phi.shape[1]} features")
        
        # 2.2a PySR
        print("  2.2a PySR Genetic Programming...")
        pysr_discoverer = PySRDiscoverer(
            maxsize=self.config['pysr_maxsize'],
            niterations=self.config['pysr_niterations']
        )
        results['pysr'] = pysr_discoverer.discover(X, y, feature_names)
        print(f"      Best equation: {results['pysr']['best_equation'][:50]}...")
        
        # 2.2b E-WSINDy with STLSQ
        print("  2.2b E-WSINDy with STLSQ...")
        stlsq = EWSINDySTLSQ(
            threshold=self.config['stlsq_threshold'],
            use_weak_form=False
        )
        results['stlsq'] = stlsq.fit(Phi, y, feature_names=library_names)
        print(f"      Active terms: {results['stlsq']['n_active_terms']}")
        
        # 2.2c Adaptive Lasso
        print("  2.2c Adaptive Lasso...")
        alasso = AdaptiveLassoSelector()
        results['alasso'] = alasso.fit(Phi, y, feature_names=library_names)
        print(f"      Active terms: {results['alasso']['n_active_terms']}")
        
        return results
    
    def _run_stage3(
        self,
        X: np.ndarray,
        y: np.ndarray,
        feature_names: List[str],
        stage2_results: Dict,
        user_inputs: UserInputs
    ) -> Dict[str, Any]:
        """
        Run Stage 3: Validation & UQ.
        """
        results = {}
        Phi = stage2_results['feature_library']['Phi']
        library_names = stage2_results['feature_library']['library_names']
        
        # 3.1 Model Selection
        print("  3.1 Model Selection (CV + EBIC)...")
        
        # Prepare candidates
        candidates = {
            'stlsq': (Phi, stage2_results['stlsq']['support']),
            'alasso': (Phi, stage2_results['alasso']['support'])
        }
        
        selector = ModelSelector(
            n_folds=self.config['cv_folds'],
            ebic_gamma=self.config['ebic_gamma']
        )
        results['model_selection'] = selector.compare_models(
            candidates, y, p_total=Phi.shape[1]
        )
        print(f"      Best by CV: {results['model_selection']['best_model_cv']}")
        print(f"      Best by EBIC: {results['model_selection']['best_model_ebic']}")
        
        # 3.2 Physics Verification
        print("  3.2 Physics Verification...")
        verifier = PhysicsVerifier()
        
        # Get predictions from best model
        best_model = results['model_selection']['best_model_cv']
        best_coefs = stage2_results[best_model]['coefficients']
        y_pred = Phi @ best_coefs
        
        # Hard-coded non-negativity bound, applied whenever user inputs are supplied
        bounds = {'min': 0} if user_inputs else None
        
        results['physics'] = verifier.verify(
            equation_terms=[],  # Placeholder: term-level checks need a parsed equation
            variable_dims=user_inputs.variable_dimensions if user_inputs else {},
            target_dims=user_inputs.target_dimensions if user_inputs else [0, 0, 0, 0],
            y_pred=y_pred,
            physical_bounds=bounds
        )
        print(f"      Bounds satisfied: {results['physics']['bounds_satisfied']}")
        
        # 3.3 Bootstrap UQ
        print("  3.3 Bootstrap Uncertainty Quantification...")
        uq = BootstrapUQ(
            n_bootstrap=self.config['n_bootstrap'],
            confidence_level=self.config['confidence_level']
        )
        results['uq'] = uq.run(Phi, y, library_names)
        
        n_high = sum(1 for p in results['uq']['inclusion_probs'] if p > 0.9)
        print(f"      High-confidence terms: {n_high}")
        
        return results
    
    def _extract_final_equation(self) -> str:
        """
        Return the equation of the CV-selected model (STLSQ or Adaptive Lasso).
        """
        best_model = self._stage3_results['model_selection']['best_model_cv']
        return self._stage2_results[best_model]['equation']
    
    def get_final_equation(self) -> str:
        """
        Get the final discovered equation.
        """
        if not self._run_complete:
            raise ValueError("Must run pipeline first")
        return self._final_equation
    
    def print_summary(self) -> None:
        """
        Print complete pipeline summary.
        """
        if not self._run_complete:
            print("Pipeline not yet executed. Call run() first.")
            return
        
        print()
        print("=" * 70)
        print(" PHYSICS-SR PIPELINE SUMMARY")
        print("=" * 70)
        print()
        
        # Stage 1
        print("STAGE 1: Variable Selection")
        print("-" * 40)
        s1 = self._stage1_results
        print(f"  Selected variables: {s1['screening']['selected_features']}")
        print(f"  Power-law: {s1['symmetry']['power_law_detected']}")
        print()
        
        # Stage 2
        print("STAGE 2: Structure Discovery")
        print("-" * 40)
        s2 = self._stage2_results
        print(f"  Library size: {s2['feature_library']['n_features']}")
        print(f"  PySR: {s2['pysr']['best_equation'][:40]}...")
        print(f"  STLSQ active: {s2['stlsq']['n_active_terms']}")
        print(f"  ALasso active: {s2['alasso']['n_active_terms']}")
        print()
        
        # Stage 3
        print("STAGE 3: Validation & UQ")
        print("-" * 40)
        s3 = self._stage3_results
        print(f"  Best model: {s3['model_selection']['best_model_cv']}")
        print(f"  Physics valid: {s3['physics']['bounds_satisfied']}")
        print()
        
        # Final
        print("FINAL EQUATION:")
        print("-" * 40)
        print(f"  {self._final_equation}")
        print()
        print("=" * 70)

---
## Section 3: Complete Demonstration
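
The demonstration in this section supplies a dimension exponent vector for each input variable. As a rough cross-check on what Stage 1.1 should report, the number of independent dimensionless groups can be estimated as the number of variables minus the rank of the dimension matrix. The sketch below assumes the exponent order is [mass, length, time, temperature], inferred from the unit comments in the demo, and considers input variables only. It is not the `BuckinghamPiAnalyzer` algorithm.

```python
import numpy as np

# Dimension exponents copied from the demo's user inputs
variable_dimensions = {
    'q_c':   [0, 0, 0, 0],    # dimensionless
    'N_d':   [0, -3, 0, 0],   # m^-3
    'r_eff': [0, 1, 0, 0],    # m
    'LWC':   [1, -3, 0, 0],   # kg/m^3
}

names = list(variable_dimensions)
# Dimension matrix: one row per base dimension, one column per variable
D = np.array([variable_dimensions[v] for v in names], dtype=float).T

rank = np.linalg.matrix_rank(D)
n_pi = len(names) - rank

# Rows of Vt beyond the rank span the null space of D; each row is a set
# of exponents that combines the variables into a dimensionless product
_, _, Vt = np.linalg.svd(D)
null_basis = Vt[rank:]

print(f"rank = {rank}, independent Pi groups = {n_pi}")
```

For these inputs the null space is spanned by `q_c` itself and the product `N_d * r_eff**3`. `LWC` cannot appear in any group because it is the only variable carrying mass.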

In [None]:
# ==============================================================================
# DEMO CONTROL FLAG
# ==============================================================================

_RUN_DEMO = False  # Set to True to run demo

if _RUN_DEMO:
    print("=" * 70)
    print(" RUNNING END-TO-END DEMONSTRATION")
    print("=" * 70)

In [None]:
# ==============================================================================
# DEMO: Generate Warm Rain Test Data
# ==============================================================================

if _RUN_DEMO:
    print()
    print_section_header("Demo: Generate Test Data")
    
    # Generate warm rain data
    np.random.seed(42)
    warm_rain = generate_warm_rain_data(n_samples=500, noise_level=0.01)
    
    X = warm_rain['X']
    y = warm_rain['y']
    feature_names = warm_rain['feature_names']
    
    print(f"Generated {len(y)} samples")
    print(f"Features: {feature_names}")
    print(f"True equation: {warm_rain['true_equation']}")

In [None]:
# ==============================================================================
# DEMO: Create User Inputs
# ==============================================================================

if _RUN_DEMO:
    print()
    print_section_header("Demo: Create User Inputs")
    
    user_inputs = UserInputs(
        variable_dimensions={
            'q_c': [0, 0, 0, 0],    # dimensionless
            'N_d': [0, -3, 0, 0],   # m^-3
            'r_eff': [0, 1, 0, 0],  # m
            'LWC': [1, -3, 0, 0]    # kg/m^3
        },
        target_dimensions=[0, 0, -1, 0],  # s^-1
        target_name='dq_r/dt',
        expected_form='power_law'
    )
    
    print("User inputs configured:")
    print(f"  Target: {user_inputs.target_name}")
    print(f"  Expected form: {user_inputs.expected_form}")

In [None]:
# ==============================================================================
# DEMO: Run Complete Pipeline
# ==============================================================================

if _RUN_DEMO:
    print()
    print_section_header("Demo: Run Complete Pipeline")
    
    # Initialize pipeline with faster settings for demo
    pipeline = PhysicsSRPipeline(config={
        'screening_threshold': 0.8,
        'power_law_r2_threshold': 0.9,
        'interaction_stability': 0.5,
        'max_poly_degree': 3,
        'stlsq_threshold': 0.1,
        'pysr_maxsize': 20,
        'pysr_niterations': 20,
        'cv_folds': 5,
        'ebic_gamma': 0.5,
        'n_bootstrap': 50,
        'confidence_level': 0.95
    })
    
    # Run
    results = pipeline.run(X, y, feature_names, user_inputs)

In [None]:
# ==============================================================================
# DEMO: Display Results
# ==============================================================================

if _RUN_DEMO:
    print()
    print_section_header("Demo: Results Summary")
    
    pipeline.print_summary()
    
    print()
    print("Comparison with true equation:")
    print(f"  True: {warm_rain['true_equation']}")
    print(f"  Discovered: {pipeline.get_final_equation()[:60]}...")

---
## Section 4: Module Summary

In [None]:
# ==============================================================================
# MODULE SUMMARY
# ==============================================================================

print("=" * 70)
print(" 12_Full_Pipeline.ipynb - Module Summary")
print("=" * 70)
print()
print("CLASS: PhysicsSRPipeline")
print("-" * 70)
print()
print("Purpose:")
print("  Complete end-to-end physics-informed symbolic regression pipeline.")
print("  Integrates all three stages with comprehensive reporting.")
print()
print("Main Methods:")
print("  run(X, y, feature_names, user_inputs)")
print("      Execute complete pipeline")
print("      Returns: dict with stage1, stage2, stage3, final_equation")
print()
print("  get_final_equation()")
print("      Get the discovered equation")
print()
print("  print_summary()")
print("      Print comprehensive results summary")
print()
print("Pipeline Stages:")
print("  Stage 1: Variable Selection")
print("    - Buckingham Pi dimensional analysis")
print("    - PAN+SR variable screening")
print("    - Power-law symmetry detection")
print("    - iRF interaction discovery")
print()
print("  Stage 2: Structure Discovery")
print("    - Feature library construction")
print("    - PySR genetic programming")
print("    - E-WSINDy with STLSQ")
print("    - Adaptive Lasso")
print()
print("  Stage 3: Validation & UQ")
print("    - Model selection (CV + EBIC)")
print("    - Physics verification")
print("    - Three-layer bootstrap UQ")
print()
print("Usage Example:")
print("-" * 70)
print("""
# Create user inputs with physics information
user_inputs = UserInputs(
    variable_dimensions={'x': [0, 1, 0, 0], 'v': [0, 1, -1, 0]},
    target_dimensions=[0, 2, -2, 0],  # L^2 T^-2, i.e. energy per unit mass
    target_name='specific_energy'
)

# Run complete pipeline
pipeline = PhysicsSRPipeline()
results = pipeline.run(X, y, feature_names, user_inputs)

# Get results
print(f"Equation: {pipeline.get_final_equation()}")
pipeline.print_summary()
""")
print()
print("=" * 70)
print(" PHYSICS-SR FRAMEWORK v3.0 - COMPLETE")
print("=" * 70)
print()
print("All 13 notebooks loaded successfully.")
print("Ready for physics-informed symbolic regression!")
print()
print("=" * 70)