# 11_UQ_Inference - Physics-SR Framework v4.1

## Stage 3.3-3.4: Three-Layer Bootstrap UQ + Statistical Inference

**Author:** Zhengze Zhang  
**Affiliation:** Department of Statistics, Columbia University  
**Contact:** zz3239@columbia.edu  
**Date:** January 2026  
**Version:** 4.1 (Structure-Guided Feature Library Enhancement + Computational Optimization)

---

### Purpose

This module quantifies uncertainty in discovered equations at three layers:

1. **Structural UQ:** Which terms should be included? (Bootstrap inclusion probability)
2. **Parametric UQ:** What are the coefficient values? (Bootstrap confidence intervals)
3. **Predictive UQ:** How uncertain are predictions? (Prediction intervals)

This module receives only a **minor update** in v4.1.
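
As a rough illustration of the three layers listed above, the following self-contained sketch runs a small bootstrap on synthetic data. It uses thresholded least squares as a stand-in selector; the data, the threshold, and all variable names are assumptions made for illustration and are independent of the `BootstrapUQ` class defined later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_boot, threshold = 200, 100, 0.1  # illustrative values

# Synthetic data: y depends on x1 and x2 only; x3 is a distractor
x1, x2, x3 = rng.standard_normal((3, n))
y = 3.0 * x1 + 1.5 * x2 + 0.1 * rng.standard_normal(n)
Phi = np.column_stack([x1, x2, x3])

coefs = np.zeros((n_boot, Phi.shape[1]))
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)        # resample rows with replacement
    beta, *_ = np.linalg.lstsq(Phi[idx], y[idx], rcond=None)
    beta[np.abs(beta) < threshold] = 0.0    # hard threshold as a simple selector
    coefs[b] = beta

# Layer 1 (structural): fraction of resamples in which each term survives
inclusion_prob = (coefs != 0).mean(axis=0)

# Layer 2 (parametric): percentile interval for x1 over resamples that kept it
kept = coefs[:, 0] != 0
ci_x1 = np.percentile(coefs[kept, 0], [2.5, 97.5])

# Layer 3 (predictive): spread of bootstrap predictions plus residual noise
x_new = np.array([1.0, 1.0, 0.0])
pred_samples = coefs @ x_new
residual_var = np.var(y - Phi @ np.median(coefs, axis=0))
total_sd = np.sqrt(pred_samples.var() + residual_var)

print("inclusion:", inclusion_prob)
print("95% CI for x1:", ci_x1)
print("prediction:", np.median(pred_samples), "+/-", 1.96 * total_sd)
```

In this toy setting the residual noise dominates the predictive spread, because the coefficients are estimated far more precisely than a single new observation can be predicted.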

### v4.1 Modifications

| Feature | v3.0 | v4.1 |
|---------|------|------|
| Version | 3.0 | 4.1 |
| Parallel bootstrap | Not supported | n_jobs parameter |
| Report format | Basic | Enhanced v4.1 format |

### Theoretical Foundation

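**Stability Selection (Meinshausen & Bühlmann 2010):**
$$E[\text{False Positives}] \leq \frac{q^2}{(2\pi_{\text{thr}} - 1) \cdot p}$$

where $q$ is the average number of terms selected per resample, $p$ is the number of candidate terms in the library, and $\pi_{\text{thr}} > 0.5$ is the inclusion-probability threshold.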
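
To get a feel for the bound, it can be evaluated for a hypothetical configuration. The helper name `expected_false_positive_bound` and the example values below are assumptions for illustration, not part of the framework.

```python
def expected_false_positive_bound(q: float, p: int, pi_thr: float) -> float:
    """Upper bound on E[false positives] from the inequality above.

    q      : average number of terms selected per resample
    p      : number of candidate terms in the library
    pi_thr : inclusion-probability threshold, must exceed 0.5
    """
    if not 0.5 < pi_thr <= 1.0:
        raise ValueError("pi_thr must lie in (0.5, 1]")
    return q**2 / ((2.0 * pi_thr - 1.0) * p)


# Hypothetical setting: 50 candidate terms, about 5 selected per resample
bound_strict = expected_false_positive_bound(q=5, p=50, pi_thr=0.9)
bound_loose = expected_false_positive_bound(q=5, p=50, pi_thr=0.6)
print(f"pi_thr=0.9: {bound_strict:.3f}   pi_thr=0.6: {bound_loose:.3f}")
```

With these numbers the bound is roughly 0.6 expected false positives at a threshold of 0.9 and 2.5 at a threshold of 0.6, so raising the threshold tightens the guarantee.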

### Reference

- Meinshausen, N., & Bühlmann, P. (2010). Stability selection. *JRSS-B*, 72(4), 417-473.
- Framework v4.0/v4.1 Section 5.3: Uncertainty Quantification

---
## Section 1: Header and Imports

In [None]:
"""
11_UQ_Inference.ipynb - Three-Layer Bootstrap UQ + Statistical Inference
=========================================================================

Three-Stage Physics-Informed Symbolic Regression Framework v4.1

This module provides:
- BootstrapUQ: Three-layer uncertainty quantification
- StatisticalInference: Hypothesis testing for coefficients
- Structural, parametric, and predictive uncertainty
- Robust estimators (median, MAD) for non-normal data

v4.1 Key Changes from v3.0:
- n_jobs parameter for parallel bootstrap
- Updated version number to v4.1
- Enhanced report format with v4.1 styling
- Interface fully compatible with Stage 2 outputs

Author: Zhengze Zhang
Affiliation: Department of Statistics, Columbia University
Contact: zz3239@columbia.edu
"""

# Import core module
%run 00_Core.ipynb

In [None]:
# Additional imports for UQ
from scipy import stats
from scipy.stats import norm
from typing import Dict, List, Tuple, Optional, Any
from concurrent.futures import ThreadPoolExecutor, as_completed
import warnings

print("11_UQ_Inference v4.1: Additional imports successful.")

---
## Section 2: Class Definitions

In [None]:
# ==============================================================================
# BOOTSTRAP UQ CLASS (v4.1 Minor Update)
# ==============================================================================

class BootstrapUQ:
    """
    Three-Layer Bootstrap Uncertainty Quantification (v4.1 Minor Update).
    
    Layer 1: Structural UQ (inclusion probability)
    Layer 2: Parametric UQ (coefficient CI)
    Layer 3: Predictive UQ (prediction intervals)
    
    v4.1 Features:
    - n_jobs parameter for parallel bootstrap execution
    - Robust estimators (median, MAD)
    - Confidence classification (HIGH/MEDIUM/LOW)
    
    Attributes
    ----------
    n_bootstrap : int
        Number of bootstrap samples (default: 200)
    selection_method : str
        Method for sparse selection: 'stlsq' or 'alasso'
    confidence_level : float
        Confidence level for intervals (default: 0.95)
    n_jobs : int
        Parallel jobs for bootstrap (v4.1, default: 2)
    stlsq_threshold : float
        STLSQ sparsity threshold (default: 0.1)
    
    Methods
    -------
    run(feature_library, y, library_names, selection_kwargs) -> Dict
        Run complete bootstrap UQ analysis
    predictive_uq(X_new, residual_var) -> Tuple
        Compute prediction intervals
    compute_prediction_intervals(coef_samples, support, X_new, residual_var) -> Tuple
        Wrapper around predictive_uq that returns only (pi_lower, pi_upper)
    
    Reference
    ---------
    Meinshausen & Bühlmann (2010). JRSS-B, 72(4), 417-473.
    
    Examples
    --------
    >>> uq = BootstrapUQ(n_bootstrap=200, n_jobs=2)
    >>> result = uq.run(Phi, y, library_names)
    >>> print(f"Inclusion probs: {result['inclusion_probabilities']}")
    """
    
    def __init__(
        self,
        n_bootstrap: int = DEFAULT_N_BOOTSTRAP,
        selection_method: str = 'stlsq',
        confidence_level: float = DEFAULT_CONFIDENCE_LEVEL,
        n_jobs: int = 2,
        stlsq_threshold: float = DEFAULT_STLSQ_THRESHOLD
    ):
        """
        Initialize BootstrapUQ.
        
        Parameters
        ----------
        n_bootstrap : int
            Number of bootstrap samples. Default: 200
        selection_method : str
            'stlsq' or 'alasso'. Default: 'stlsq'
        confidence_level : float
            Confidence level (e.g., 0.95 for 95% CI). Default: 0.95
        n_jobs : int
            Number of parallel jobs for bootstrap (v4.1). Default: 2
        stlsq_threshold : float
            STLSQ threshold for sparsity. Default: 0.1
        """
        self.n_bootstrap = n_bootstrap
        self.selection_method = selection_method
        self.confidence_level = confidence_level
        self.n_jobs = n_jobs
        self.stlsq_threshold = stlsq_threshold
        
        # Internal state
        self._library_names = None
        self._n_features = None
        self._support_samples = None
        self._coef_samples = None
        self._inclusion_probs = None
        self._estimates = None
        self._ci_lower = None
        self._ci_upper = None
        self._se = None
        self._residual_variance = None
        self._uq_complete = False
    
    def run(
        self,
        feature_library: np.ndarray,
        y: np.ndarray,
        library_names: List[str] = None,
        selection_kwargs: Dict = None
    ) -> Dict[str, Any]:
        """
        Run complete bootstrap UQ analysis.
        
        Parameters
        ----------
        feature_library : np.ndarray
            Feature library matrix
        y : np.ndarray
            Target vector
        library_names : List[str], optional
            Feature names
        selection_kwargs : Dict, optional
            Additional kwargs for selection method
            
        Returns
        -------
        Dict
            - inclusion_probabilities: Array of P(selected) for each feature
            - structural_confidence: Classification (HIGH/MEDIUM/LOW)
            - coefficient_estimates: Median coefficients
            - coefficient_CI: Percentile confidence intervals at confidence_level
            - coefficient_SE: MAD-based robust standard errors
            - bootstrap_supports: All support matrices
            - bootstrap_coefficients: All coefficient samples
            - residual_variance: Estimated noise variance
        """
        n_samples, n_features = feature_library.shape
        self._n_features = n_features
        
        if library_names is None:
            self._library_names = [f'f{i}' for i in range(n_features)]
        else:
            self._library_names = list(library_names)
        
        # Initialize storage
        self._support_samples = np.zeros((self.n_bootstrap, n_features), dtype=bool)
        self._coef_samples = np.zeros((self.n_bootstrap, n_features))
        
        # Run bootstrap - parallel or sequential
        if self.n_jobs > 1:
            self._run_parallel_bootstrap(feature_library, y, selection_kwargs)
        else:
            self._run_sequential_bootstrap(feature_library, y, selection_kwargs)
        
        # Layer 1: Structural UQ
        self._inclusion_probs, structural_conf = self._structural_uq(
            self._support_samples, self._library_names
        )
        
        # Layer 2: Parametric UQ
        self._estimates, ci_bounds, self._se = self._parametric_uq(
            self._coef_samples, self._support_samples
        )
        self._ci_lower = ci_bounds[:, 0]
        self._ci_upper = ci_bounds[:, 1]
        
        # Estimate residual variance
        y_pred = feature_library @ self._estimates
        self._residual_variance = np.var(y - y_pred)
        
        # Confidence classification
        confidence_class = self.get_confidence_classification()
        
        self._uq_complete = True
        
        return {
            'inclusion_probabilities': self._inclusion_probs,
            'inclusion_probs': self._inclusion_probs,  # Alias
            'structural_confidence': structural_conf,
            'confidence_class': confidence_class,
            'coefficient_estimates': self._estimates,
            'estimates': self._estimates,  # Alias
            'coefficient_CI': np.column_stack([self._ci_lower, self._ci_upper]),
            'ci_lower': self._ci_lower,
            'ci_upper': self._ci_upper,
            'coefficient_SE': self._se,
            'se': self._se,  # Alias
            'bootstrap_supports': self._support_samples,
            'support_samples': self._support_samples,  # Alias
            'bootstrap_coefficients': self._coef_samples,
            'coef_samples': self._coef_samples,  # Alias
            'residual_variance': self._residual_variance,
            'n_bootstrap': self.n_bootstrap,
            'confidence_level': self.confidence_level,
            'n_jobs': self.n_jobs  # v4.1
        }
    
    def _run_sequential_bootstrap(
        self,
        feature_library: np.ndarray,
        y: np.ndarray,
        selection_kwargs: Dict = None
    ) -> None:
        """
        Run bootstrap sequentially.
        """
        for b in range(self.n_bootstrap):
            coefs, support = self._run_single_bootstrap(
                feature_library, y, b, selection_kwargs
            )
            self._support_samples[b] = support
            self._coef_samples[b] = coefs
    
    def _run_parallel_bootstrap(
        self,
        feature_library: np.ndarray,
        y: np.ndarray,
        selection_kwargs: Dict = None
    ) -> None:
        """
        Run bootstrap in parallel (v4.1).
        """
        def bootstrap_task(seed):
            return seed, self._run_single_bootstrap(
                feature_library, y, seed, selection_kwargs
            )
        
        with ThreadPoolExecutor(max_workers=self.n_jobs) as executor:
            futures = [executor.submit(bootstrap_task, b) for b in range(self.n_bootstrap)]
            
            for future in as_completed(futures):
                try:
                    b, (coefs, support) = future.result()
                    self._support_samples[b] = support
                    self._coef_samples[b] = coefs
                except Exception as e:
                    warnings.warn(f"Bootstrap iteration failed: {e}")
    
    def _run_single_bootstrap(
        self,
        Phi: np.ndarray,
        y: np.ndarray,
        seed: int,
        selection_kwargs: Dict = None
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Run single bootstrap iteration.
        
        Parameters
        ----------
        Phi : np.ndarray
            Feature library
        y : np.ndarray
            Target vector
        seed : int
            Random seed for this iteration
        selection_kwargs : Dict, optional
            Additional selection parameters
            
        Returns
        -------
        Tuple[np.ndarray, np.ndarray]
            (coefficients, support)
        """
        # Generate bootstrap sample
        rng = np.random.RandomState(RANDOM_SEED + seed)
        n = len(y)
        indices = rng.choice(n, size=n, replace=True)
        Phi_boot = Phi[indices]
        y_boot = y[indices]
        
        # Run selection. selection_kwargs is accepted for interface
        # compatibility but is not forwarded to the simple selectors below.
        
        if self.selection_method == 'stlsq':
            coefs = self._stlsq_simple(Phi_boot, y_boot)
        else:  # alasso
            coefs = self._alasso_simple(Phi_boot, y_boot)
        
        support = np.abs(coefs) > 1e-10
        return coefs, support
    
    def _stlsq_simple(
        self,
        Phi: np.ndarray,
        y: np.ndarray
    ) -> np.ndarray:
        """
        Simple STLSQ implementation for bootstrap.
        """
        n_features = Phi.shape[1]
        
        # Initial OLS
        try:
            xi, _, _, _ = np.linalg.lstsq(Phi, y, rcond=None)
        except np.linalg.LinAlgError:
            return np.zeros(n_features)
        
        # Iterate
        for _ in range(10):
            support = np.abs(xi) > self.stlsq_threshold
            if np.sum(support) == 0:
                return np.zeros(n_features)
            
            xi_new = np.zeros(n_features)
            try:
                xi_new[support], _, _, _ = np.linalg.lstsq(
                    Phi[:, support], y, rcond=None
                )
            except np.linalg.LinAlgError:
                break
            
            if np.allclose(xi, xi_new):
                break
            xi = xi_new
        
        return xi
    
    def _alasso_simple(
        self,
        Phi: np.ndarray,
        y: np.ndarray
    ) -> np.ndarray:
        """
        Simple Adaptive Lasso for bootstrap.
        """
        from sklearn.linear_model import Ridge, LassoCV
        
        n_features = Phi.shape[1]
        
        # Initial estimate
        ridge = Ridge(alpha=0.1, fit_intercept=False)
        ridge.fit(Phi, y)
        beta_init = ridge.coef_
        
        # Adaptive weights
        eps = 1e-6
        weights = 1.0 / (np.abs(beta_init) + eps)
        
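        # Weighted Lasso via column rescaling. Dividing column j by
        # sqrt(weights[j]) and mapping the coefficients back the same way
        # penalizes sqrt(weights[j]) * |beta_j|, i.e. an adaptive Lasso
        # with exponent gamma = 0.5 on the initial Ridge estimate.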
        Phi_weighted = Phi / np.sqrt(weights)
        lasso = LassoCV(cv=3, fit_intercept=False, max_iter=1000)
        try:
            lasso.fit(Phi_weighted, y)
            beta = lasso.coef_ / np.sqrt(weights)
        except Exception:
            beta = np.zeros(n_features)
        
        return beta
    
    def _structural_uq(
        self,
        support_samples: np.ndarray,
        library_names: List[str]
    ) -> Tuple[np.ndarray, Dict]:
        """
        Compute inclusion probabilities and classify confidence.
        
        Classification:
        - HIGH: P > 0.9
        - MEDIUM: 0.5 < P <= 0.9
        - LOW: P <= 0.5
        
        Returns
        -------
        Tuple[np.ndarray, Dict]
            (inclusion_probs, structural_confidence)
        """
        inclusion_probs = np.mean(support_samples, axis=0)
        
        structural_conf = {}
        for i, name in enumerate(library_names):
            prob = inclusion_probs[i]
            if prob > 0.9:
                conf = 'HIGH'
            elif prob > 0.5:
                conf = 'MEDIUM'
            else:
                conf = 'LOW'
            structural_conf[name] = {'prob': prob, 'confidence': conf}
        
        return inclusion_probs, structural_conf
    
    def _parametric_uq(
        self,
        coef_samples: np.ndarray,
        support_samples: np.ndarray
    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        """
        Compute coefficient estimates and CIs.
        
        Uses robust estimators (median, MAD-based SE).
        
        Returns
        -------
        Tuple[np.ndarray, np.ndarray, np.ndarray]
            (estimates, ci_bounds, se)
        """
        n_features = coef_samples.shape[1]
        estimates = np.zeros(n_features)
        se = np.zeros(n_features)
        ci_bounds = np.zeros((n_features, 2))
        
        alpha = 1 - self.confidence_level
        lower_percentile = 100 * alpha / 2
        upper_percentile = 100 * (1 - alpha / 2)
        
        for j in range(n_features):
            # Get non-zero samples
            nonzero_mask = support_samples[:, j]
            if np.sum(nonzero_mask) < 10:
                continue
            
            samples_j = coef_samples[nonzero_mask, j]
            
            # Robust point estimate (median)
            estimates[j] = np.median(samples_j)
            
            # Robust SE (MAD * 1.4826)
            mad = np.median(np.abs(samples_j - estimates[j]))
            se[j] = mad * 1.4826
            
            # Percentile CI
            ci_bounds[j, 0] = np.percentile(samples_j, lower_percentile)
            ci_bounds[j, 1] = np.percentile(samples_j, upper_percentile)
        
        return estimates, ci_bounds, se
    
    def predictive_uq(
        self,
        X_new: np.ndarray,
        residual_var: float
    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        """
        Compute prediction intervals.
        
        Total variance = model variance + residual variance
        
        Parameters
        ----------
        X_new : np.ndarray
            New feature matrix
        residual_var : float
            Residual variance from training
            
        Returns
        -------
        Tuple[np.ndarray, np.ndarray, np.ndarray]
            (predictions, pi_lower, pi_upper)
        """
        if not self._uq_complete:
            raise RuntimeError("Must run UQ first")
        
        n_new = X_new.shape[0]
        z = norm.ppf(1 - (1 - self.confidence_level) / 2)
        
        # Predictions from each bootstrap sample
        pred_samples = np.zeros((self.n_bootstrap, n_new))
        for b in range(self.n_bootstrap):
            pred_samples[b] = X_new @ self._coef_samples[b]
        
        # Point prediction (median)
        predictions = np.median(pred_samples, axis=0)
        
        # Model variance
        model_var = np.var(pred_samples, axis=0)
        
        # Total variance
        total_var = model_var + residual_var
        total_se = np.sqrt(total_var)
        
        # Prediction interval
        pi_lower = predictions - z * total_se
        pi_upper = predictions + z * total_se
        
        return predictions, pi_lower, pi_upper
    
    def compute_prediction_intervals(
        self,
        coef_samples: np.ndarray,
        support: np.ndarray,
        X_new: np.ndarray,
        residual_var: float
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
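        Compute prediction intervals (v4.1 convenience wrapper).
        
        Delegates to predictive_uq, which uses the coefficient samples
        stored by run(). The coef_samples and support arguments are
        accepted only for interface compatibility.
        
        Parameters
        ----------
        coef_samples : np.ndarray
            Bootstrap coefficient samples (not used, for interface compatibility)
        support : np.ndarray
            Support mask (not used, for interface compatibility)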
        X_new : np.ndarray
            New feature matrix
        residual_var : float
            Residual variance
            
        Returns
        -------
        Tuple[np.ndarray, np.ndarray]
            (pi_lower, pi_upper)
        """
        _, pi_lower, pi_upper = self.predictive_uq(X_new, residual_var)
        return pi_lower, pi_upper
    
    def get_confidence_classification(self) -> Dict[str, str]:
        """
        Classify terms by inclusion probability.
        """
        if self._inclusion_probs is None:
            raise RuntimeError("Must run UQ first")
        
        classification = {}
        for name, prob in zip(self._library_names, self._inclusion_probs):
            if prob > 0.9:
                classification[name] = 'HIGH'
            elif prob > 0.5:
                classification[name] = 'MEDIUM'
            else:
                classification[name] = 'LOW'
        
        return classification
    
    def print_uq_report(self) -> None:
        """
        Print detailed UQ report in v4.1 format.
        """
        if not self._uq_complete:
            print("UQ not yet performed. Call run() first.")
            return
        
        print("=" * 70)
        print(f"=== Bootstrap UQ Results (B={self.n_bootstrap}) (v4.1) ===")
        print("=" * 70)
        print()
        print(f"Configuration:")
        print(f"  Bootstrap samples: {self.n_bootstrap}")
        print(f"  Selection method: {self.selection_method}")
        print(f"  Confidence level: {self.confidence_level:.0%}")
        print(f"  Parallel jobs (n_jobs): {self.n_jobs}")
        print()
        
        # Layer 1
        print("Layer 1: Structural UQ")
        print(f"  {'Term':<25} {'P(include)':<12} {'Confidence'}")
        print("  " + "-" * 50)
        
        conf_class = self.get_confidence_classification()
        for name, prob in zip(self._library_names, self._inclusion_probs):
            if prob > 0.1:
                print(f"  {name:<25} {prob:<12.2f} {conf_class[name]}")
        print()
        
        # Layer 2
        print("Layer 2: Parametric UQ")
        print(f"  {'Coefficient':<15} {'Estimate':<12} {'95% CI'}")
        print("  " + "-" * 50)
        
        for i, name in enumerate(self._library_names):
            if self._inclusion_probs[i] > 0.5:
                ci_str = f"[{self._ci_lower[i]:.4f}, {self._ci_upper[i]:.4f}]"
                print(f"  {name:<15} {self._estimates[i]:<12.4f} {ci_str}")
        print()
        
        # Layer 3 info
        print("Layer 3: Predictive UQ")
        print(f"  Residual variance: {self._residual_variance:.6f}")
        print(f"  Use predictive_uq(X_new, residual_var) for prediction intervals")
        print()
        print("=" * 70)

print("BootstrapUQ class v4.1 defined.")
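
Layer 2 in `_parametric_uq` summarizes each coefficient by its bootstrap median, a MAD-based standard error (MAD times 1.4826), and a percentile interval. The sketch below shows why these robust summaries are useful when a few bootstrap replicates are wild; the function name `robust_summary` and the synthetic samples are assumptions for illustration.

```python
import numpy as np


def robust_summary(samples, confidence_level=0.95):
    """Median, MAD-based SE, and percentile interval of 1-D samples."""
    samples = np.asarray(samples, dtype=float)
    median = np.median(samples)
    mad = np.median(np.abs(samples - median))
    se = 1.4826 * mad  # scales MAD to match the std. dev. under normality
    alpha = 1.0 - confidence_level
    lower, upper = np.percentile(
        samples, [100 * alpha / 2, 100 * (1 - alpha / 2)]
    )
    return median, se, (lower, upper)


rng = np.random.default_rng(1)
clean = rng.normal(loc=2.0, scale=0.05, size=200)
# Two extreme replicates, e.g. from ill-conditioned resamples
contaminated = np.concatenate([clean, [50.0, -40.0]])

median, robust_se, ci = robust_summary(contaminated)
print(f"median = {median:.3f}, robust SE = {robust_se:.3f}")
print(f"naive std = {np.std(contaminated):.3f}")
print(f"95% percentile CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

The two outliers inflate the naive standard deviation by orders of magnitude, while the median and the MAD-based SE stay close to the values of the clean samples.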

In [None]:
# ==============================================================================
# STATISTICAL INFERENCE CLASS (v4.1 Minor Update)
# ==============================================================================

class StatisticalInference:
    """
    Formal Statistical Inference (v4.1 Minor Update).
    
    Provides:
    - Hypothesis testing for term significance
    - P-values and effect sizes
    - Comprehensive statistical report
    
    Attributes
    ----------
    alpha : float
        Significance level (default: 0.05)
    
    Methods
    -------
    test_coefficients(coef_samples, support_samples, library_names) -> Dict
        Test coefficient significance
    generate_report(...) -> str
        Generate comprehensive statistical report
    
    Examples
    --------
    >>> inference = StatisticalInference(alpha=0.05)
    >>> result = inference.test_coefficients(coef_samples, support_samples, names)
    """
    
    def __init__(self, alpha: float = 0.05):
        """
        Initialize StatisticalInference.
        """
        self.alpha = alpha
        self._test_results = None
    
    def test_coefficients(
        self,
        coef_samples: np.ndarray,
        support_samples: np.ndarray,
        library_names: List[str]
    ) -> Dict[str, Any]:
        """
        Test coefficient significance.
        
        Parameters
        ----------
        coef_samples : np.ndarray
            Bootstrap coefficient samples (n_bootstrap x n_features)
        support_samples : np.ndarray
            Bootstrap support indicators
        library_names : List[str]
            Feature names
            
        Returns
        -------
        Dict
            - p_values: {term_name: p_value}
            - significant: List of significant terms
            - test_statistics: {term_name: z_stat}
            - effect_sizes: {term_name: effect_size}
        """
        n_features = coef_samples.shape[1]
        self._test_results = {}
        
        p_values = {}
        test_statistics = {}
        effect_sizes = {}
        significant = []
        
        for j in range(n_features):
            name = library_names[j]
            
            # Get non-zero samples
            nonzero_mask = support_samples[:, j]
            n_nonzero = np.sum(nonzero_mask)
            
            if n_nonzero < 10:
                self._test_results[name] = {
                    'significant': False,
                    'p_value': 1.0,
                    'z_stat': 0.0,
                    'estimate': 0.0,
                    'se': np.inf,
                    'effect_size': 0.0,
                    'n_samples': int(n_nonzero)
                }
                p_values[name] = 1.0
                test_statistics[name] = 0.0
                effect_sizes[name] = 0.0
                continue
            
            samples_j = coef_samples[nonzero_mask, j]
            result = self._hypothesis_test(samples_j)
            result['n_samples'] = int(n_nonzero)
            self._test_results[name] = result
            
            p_values[name] = result['p_value']
            test_statistics[name] = result['z_stat']
            effect_sizes[name] = result['effect_size']
            
            if result['significant']:
                significant.append(name)
        
        return {
            'p_values': p_values,
            'significant': significant,
            'test_statistics': test_statistics,
            'effect_sizes': effect_sizes,
            'detailed_results': self._test_results
        }
    
    def _hypothesis_test(
        self,
        samples: np.ndarray
    ) -> Dict[str, Any]:
        """
        Perform hypothesis test for single coefficient.
        
        H0: coefficient = 0
        Uses bootstrap distribution for inference.
        """
        # Robust estimates
        estimate = np.median(samples)
        mad = np.median(np.abs(samples - estimate))
        se = mad * 1.4826
        
        # Effect size (Cohen's d analog)
        if se > 1e-10:
            effect_size = abs(estimate) / se
        else:
            effect_size = np.inf if abs(estimate) > 1e-10 else 0.0
        
        if se < 1e-10:
            z_stat = np.inf if abs(estimate) > 1e-10 else 0.0
            p_value = 0.0 if abs(estimate) > 1e-10 else 1.0
        else:
            z_stat = estimate / se
            p_value = 2 * norm.sf(abs(z_stat))  # survival function keeps precision for large |z|
        
        significant = p_value < self.alpha
        
        return {
            'significant': significant,
            'p_value': float(p_value),
            'z_stat': float(z_stat),
            'estimate': float(estimate),
            'se': float(se),
            'effect_size': float(effect_size),
            'stars': self._format_significance_stars(p_value)
        }
    
    def _format_significance_stars(
        self,
        p_value: float
    ) -> str:
        """
        Format significance stars.
        """
        if p_value < 0.001:
            return '***'
        elif p_value < 0.01:
            return '**'
        elif p_value < 0.05:
            return '*'
        elif p_value < 0.1:
            return '.'
        else:
            return ''
    
    def generate_report(
        self,
        equation: str,
        structural_uq: Dict,
        parametric_uq: Dict,
        predictive_uq: Dict,
        physics_verification: Dict,
        model_comparison: Dict,
        library_analysis: Dict = None,
        timing_profile: Dict = None
    ) -> str:
        """
        Generate comprehensive statistical report (v4.1 format).
        """
        lines = []
        lines.append("=" * 70)
        lines.append("PHYSICS-SR STATISTICAL REPORT (v4.1)")
        lines.append("=" * 70)
        lines.append("")
        lines.append(f"1. DISCOVERED EQUATION")
        lines.append("-" * 70)
        lines.append(f"   {equation}")
        lines.append("")
        
        # Additional sections would be added here...
        
        return "\n".join(lines)
    
    def print_inference_report(self) -> None:
        """
        Print inference report in v4.1 format.
        """
        if self._test_results is None:
            print("Test not yet performed. Call test_coefficients() first.")
            return
        
        print("=" * 70)
        print("=== Statistical Inference Results (v4.1) ===")
        print("=" * 70)
        print()
        print(f"Significance level: {self.alpha}")
        print()
        print(f"{'Feature':<25} {'Estimate':<10} {'SE':<10} {'z-stat':<10} {'p-value':<12} {'Sig'}")
        print("-" * 80)
        
        for name, result in self._test_results.items():
            sig_str = result['stars'] if result['significant'] else ''
            print(f"{name:<25} {result['estimate']:<10.4f} {result['se']:<10.4f} "
                  f"{result['z_stat']:<10.2f} {result['p_value']:<12.4f} {sig_str}")
        
        print()
        print("Significance codes: *** p<0.001, ** p<0.01, * p<0.05, . p<0.1")
        print("=" * 70)

print("StatisticalInference class v4.1 defined.")
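
A two-sided z-test p-value can be written either through the normal CDF or through the survival function. The short sketch below compares the two forms; the helper name `two_sided_p_values` is an assumption for illustration.

```python
from scipy.stats import norm


def two_sided_p_values(z: float):
    """Two-sided p-value computed via the CDF and via the survival function."""
    p_cdf = 2.0 * (1.0 - norm.cdf(abs(z)))
    p_sf = 2.0 * norm.sf(abs(z))
    return p_cdf, p_sf


for z in (1.96, 5.0, 10.0):
    p_cdf, p_sf = two_sided_p_values(z)
    print(f"z = {z:5.2f}   via cdf: {p_cdf:.3e}   via sf: {p_sf:.3e}")
```

Both forms agree for moderate z. For large |z|, which is common when a strong coefficient has a tiny bootstrap SE, the CDF form rounds to exactly zero while the survival function still returns the small tail probability.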

---
## Section 3: Internal Tests

In [None]:
# ==============================================================================
# TEST CONTROL FLAG
# ==============================================================================

_RUN_TESTS = False  # Set to True to run internal tests

if _RUN_TESTS:
    print("=" * 70)
    print(" RUNNING INTERNAL TESTS FOR 11_UQ_Inference v4.1")
    print("=" * 70)

In [None]:
# ==============================================================================
# TEST 1: Structural UQ (Inclusion Probabilities)
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 1: Structural UQ")
    
    np.random.seed(42)
    n_samples = 200
    
    x1 = np.random.randn(n_samples)
    x2 = np.random.randn(n_samples)
    x3 = np.random.randn(n_samples)  # Not in true model
    
    y = 3*x1 + 1.5*x2 + 0.1*np.random.randn(n_samples)
    
    Phi = np.column_stack([np.ones(n_samples), x1, x2, x3])
    library_names = ['1', 'x1', 'x2', 'x3']
    
    print(f"True: y = 3*x1 + 1.5*x2")
    print()
    
    uq = BootstrapUQ(n_bootstrap=50, n_jobs=2)
    result = uq.run(Phi, y, library_names)
    
    print(f"{'Feature':<10} {'P(include)':<12} {'Confidence'}")
    print("-" * 35)
    for name, prob in zip(library_names, result['inclusion_probs']):
        conf = result['confidence_class'][name]
        print(f"{name:<10} {prob:<12.3f} {conf}")
    
    # x1 and x2 should have high inclusion, x3 should be low
    if (result['inclusion_probs'][1] > 0.8 and 
        result['inclusion_probs'][2] > 0.8 and
        result['inclusion_probs'][3] < 0.5):
        print("\n[PASS] Structural UQ correctly identifies important terms")
    else:
        print("\n[INFO] Check bootstrap sample size")

In [None]:
# ==============================================================================
# TEST 2: Parametric UQ (Coefficient CIs)
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 2: Parametric UQ - Coefficient CIs")
    
    np.random.seed(42)
    n_samples = 200
    
    x1 = np.random.randn(n_samples)
    x2 = np.random.randn(n_samples)
    true_coefs = [0, 3.0, 1.5]  # intercept=0, x1=3, x2=1.5
    y = 3*x1 + 1.5*x2 + 0.1*np.random.randn(n_samples)
    
    Phi = np.column_stack([np.ones(n_samples), x1, x2])
    library_names = ['1', 'x1', 'x2']
    
    uq = BootstrapUQ(n_bootstrap=100, n_jobs=2)
    result = uq.run(Phi, y, library_names)
    
    print(f"{'Feature':<10} {'True':<10} {'Estimate':<10} {'95% CI':<25} {'Covered'}")
    print("-" * 70)
    
    for i, (name, true) in enumerate(zip(library_names, true_coefs)):
        est = result['estimates'][i]
        ci_l = result['ci_lower'][i]
        ci_u = result['ci_upper'][i]
        covered = ci_l <= true <= ci_u
        print(f"{name:<10} {true:<10.2f} {est:<10.4f} [{ci_l:.4f}, {ci_u:.4f}] {covered}")

In [None]:
# ==============================================================================
# TEST 3: Parallel vs Sequential Bootstrap (v4.1)
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 3: Parallel vs Sequential Bootstrap (v4.1)")
    
    import time
    
    np.random.seed(42)
    n_samples = 200
    
    x1 = np.random.randn(n_samples)
    y = 2*x1 + 0.1*np.random.randn(n_samples)
    Phi = np.column_stack([np.ones(n_samples), x1])
    library_names = ['1', 'x1']
    
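    # Sequential (perf_counter is monotonic, so it is better suited to timing than time.time)
    start = time.perf_counter()
    uq_seq = BootstrapUQ(n_bootstrap=50, n_jobs=1)
    result_seq = uq_seq.run(Phi, y, library_names)
    time_seq = time.perf_counter() - start
    
    # Parallel
    start = time.perf_counter()
    uq_par = BootstrapUQ(n_bootstrap=50, n_jobs=2)
    result_par = uq_par.run(Phi, y, library_names)
    time_par = time.perf_counter() - start
    
    print(f"Sequential (n_jobs=1): {time_seq:.2f}s")
    print(f"Parallel (n_jobs=2):   {time_par:.2f}s")
    # For a problem this small, worker start-up overhead can outweigh the gain,
    # so a speedup below 1x is not a failure.
    print(f"Speedup: {time_seq/time_par:.2f}x")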
    print()
    
    # Estimates should agree up to bootstrap resampling noise
    # (the two runs do not share the same random draws)
    diff = np.abs(result_seq['estimates'] - result_par['estimates']).max()
    print(f"Max estimate difference: {diff:.6f}")
    
    if diff < 0.1:
        print("[PASS] Parallel and sequential give similar results")
    else:
        print("[INFO] Some variance expected due to different random seeds")
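
Test 3 only checks that the two runs agree approximately, because they draw different resamples. If exact agreement between sequential and parallel runs is wanted, one option is to give every bootstrap replicate its own child seed, so the result no longer depends on which worker runs it. The sketch below illustrates the idea with `numpy.random.SeedSequence.spawn` and a thread pool. The functions `one_replicate` and `bootstrap_coefs` are hypothetical; how `BootstrapUQ` seeds its workers is not shown in this section.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def one_replicate(task):
    """Fit OLS on one bootstrap resample, using a replicate-specific seed."""
    Phi, y, seed_seq = task
    rng = np.random.default_rng(seed_seq)
    idx = rng.integers(0, len(y), size=len(y))  # rows drawn with replacement
    return np.linalg.lstsq(Phi[idx], y[idx], rcond=None)[0]

def bootstrap_coefs(Phi, y, n_bootstrap=50, n_jobs=1, seed=42):
    """Hypothetical helper: bootstrap coefficients, reproducible for any n_jobs."""
    # One independent child seed per replicate, derived from a single root seed
    children = np.random.SeedSequence(seed).spawn(n_bootstrap)
    tasks = [(Phi, y, child) for child in children]
    if n_jobs == 1:
        results = [one_replicate(t) for t in tasks]
    else:
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            results = list(pool.map(one_replicate, tasks))  # map preserves order
    return np.vstack(results)

# Same model as Test 3 (y = 2*x1 + noise), with fresh random draws
seed_rng = np.random.default_rng(0)
x1_seed = seed_rng.standard_normal(200)
y_seed = 2 * x1_seed + 0.1 * seed_rng.standard_normal(200)
Phi_seed = np.column_stack([np.ones(200), x1_seed])

coefs_seq = bootstrap_coefs(Phi_seed, y_seed, n_jobs=1)
coefs_par = bootstrap_coefs(Phi_seed, y_seed, n_jobs=2)
print("Max difference:", np.abs(coefs_seq - coefs_par).max())
```

With this scheme the comparison can use a tight tolerance instead of the loose `diff < 0.1` threshold, since both runs see the same resamples.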

In [None]:
# ==============================================================================
# TEST 4: Full UQ Report
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 4: Full UQ Report")
    
    np.random.seed(42)
    n_samples = 200
    
    x = np.random.uniform(0.1, 2, n_samples)
    z = np.random.uniform(0.1, 2, n_samples)
    y = 0.5*x**2 + np.sin(z) + 0.01*np.random.randn(n_samples)
    
    Phi = np.column_stack([x**2, np.sin(z), np.ones(n_samples), x, z])
    library_names = ['[PySR] x**2', '[PySR] sin(z)', '[Poly] 1', '[Poly] x', '[Poly] z']
    
    uq = BootstrapUQ(n_bootstrap=100, n_jobs=2)
    result = uq.run(Phi, y, library_names)
    
    uq.print_uq_report()

In [None]:
# ==============================================================================
# TEST 5: Statistical Inference
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 5: Statistical Inference")
    
    np.random.seed(42)
    n_samples = 200
    
    x1 = np.random.randn(n_samples)
    x2 = np.random.randn(n_samples)
    x3 = np.random.randn(n_samples)  # Not in true model
    
    y = 5*x1 + 2*x2 + 0.1*np.random.randn(n_samples)
    
    Phi = np.column_stack([x1, x2, x3])
    library_names = ['x1', 'x2', 'x3']
    
    # Run UQ
    uq = BootstrapUQ(n_bootstrap=100, n_jobs=2)
    result = uq.run(Phi, y, library_names)
    
    # Statistical inference
    inference = StatisticalInference(alpha=0.05)
    test_results = inference.test_coefficients(
        result['coef_samples'],
        result['support_samples'],
        library_names
    )
    
    inference.print_inference_report()
    
    # x1, x2 should be significant, x3 should not
    if ('x1' in test_results['significant'] and 
        'x2' in test_results['significant'] and 
        'x3' not in test_results['significant']):
        print("\n[PASS] Correct significance detection")
    else:
        print("\n[INFO] Check significance results")
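
Test 5 expects `x1` and `x2` to be flagged as significant and `x3` not. One common bootstrap criterion for this, shown here only as an assumption about what such a test might look like, is to call a coefficient significant when its percentile interval excludes zero. The function `ci_excludes_zero` is hypothetical and is not the `StatisticalInference.test_coefficients` implementation, which also takes `support_samples` into account.

```python
import numpy as np

def ci_excludes_zero(coef_samples, names, alpha=0.05):
    """Hypothetical helper: flag terms whose (1 - alpha) percentile CI excludes 0."""
    coef_samples = np.asarray(coef_samples)  # shape (n_bootstrap, n_terms)
    lower = np.percentile(coef_samples, 100 * alpha / 2, axis=0)
    upper = np.percentile(coef_samples, 100 * (1 - alpha / 2), axis=0)
    significant = [n for n, lo, hi in zip(names, lower, upper) if lo > 0 or hi < 0]
    return {'significant': significant, 'ci_lower': lower, 'ci_upper': upper}

# Synthetic bootstrap samples: two clearly non-zero terms and one centred on zero
sig_rng = np.random.default_rng(0)
fake_samples = np.column_stack([
    5 + 0.01 * sig_rng.standard_normal(500),
    2 + 0.01 * sig_rng.standard_normal(500),
    0.01 * sig_rng.standard_normal(500),
])
sig_out = ci_excludes_zero(fake_samples, ['x1', 'x2', 'x3'])
print(sig_out['significant'])
```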

---
## Section 4: Module Summary

In [None]:
# ==============================================================================
# MODULE SUMMARY
# ==============================================================================

print("=" * 70)
print(" 11_UQ_Inference.ipynb v4.1 - Module Summary")
print("=" * 70)
print()
print("CLASSES:")
print("-" * 70)
print()
print("1. BootstrapUQ (v4.1 Minor Update)")
print("   Purpose: Three-layer uncertainty quantification")
print("   ")
print("   v4.1 Features:")
print("     - n_jobs parameter for parallel bootstrap")
print("     - Enhanced v4.1 report format")
print("   ")
print("   Main Methods:")
print("     run(Phi, y, library_names, selection_kwargs) -> Dict")
print("     predictive_uq(X_new, residual_var) -> Tuple")
print("     compute_prediction_intervals(...) -> Tuple")
print("     get_confidence_classification() -> Dict")
print("     print_uq_report()")
print()
print("2. StatisticalInference")
print("   Purpose: Hypothesis testing for coefficients")
print("   Main Methods:")
print("     test_coefficients(coef_samples, support_samples, names) -> Dict")
print("     generate_report(...) -> str")
print("     print_inference_report()")
print()
print("Expected Output Format (v4.1):")
print("-" * 70)
print("""
=== Bootstrap UQ Results (B=200) (v4.1) ===

Layer 1: Structural UQ
  Term                      P(include)   Confidence
  x**2                      0.98         HIGH
  sin(z)                    0.95         HIGH
  Intercept                 0.15         LOW

Layer 2: Parametric UQ
  Coefficient      Estimate     95% CI
  c1               0.498        [0.42, 0.58]
  c2               0.998        [0.89, 1.10]

Layer 3: Predictive UQ
  PI Coverage: 94.5% (target: 95%)
""")
print()
print("Usage Example:")
print("-" * 70)
print("""
# Run bootstrap UQ with parallel execution (v4.1)
uq = BootstrapUQ(n_bootstrap=200, n_jobs=2)
result = uq.run(Phi, y, library_names)

# Check inclusion probabilities
print(result['inclusion_probabilities'])

# Get coefficient CIs (same result keys as used in the tests above)
print(f"Estimates: {result['estimates']}")
print(f"95% CI lower: {result['ci_lower']}")
print(f"95% CI upper: {result['ci_upper']}")

# Prediction intervals
pred, pi_lower, pi_upper = uq.predictive_uq(X_new, result['residual_variance'])

# Statistical inference
inference = StatisticalInference()
tests = inference.test_coefficients(
    result['coef_samples'], result['support_samples'], library_names
)
inference.print_inference_report()
""")
print()
print("=" * 70)
print("Module loaded successfully. Import via: %run 11_UQ_Inference.ipynb")
print("=" * 70)