# 11_UQ_Inference - Physics-SR Framework v3.0

## Stage 3.3-3.4: Three-Layer Bootstrap UQ + Statistical Inference

**Author:** Zhengze Zhang  
**Affiliation:** Department of Statistics, Columbia University  
**Date:** January 2026

---

### Purpose

This notebook quantifies uncertainty in discovered equations at three layers:

1. **Structural UQ:** Which terms should be included? (Bootstrap inclusion probability)
2. **Parametric UQ:** What are the coefficient values? (Bootstrap confidence intervals)
3. **Predictive UQ:** How uncertain are predictions? (Prediction intervals)
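
As a rough illustration of the three layers listed above, the sketch below computes each quantity from a small synthetic matrix of bootstrap coefficients. It is not the notebook's `BootstrapUQ` class: the function name `three_layer_summary`, the toy data, and the `min_count` cutoff are assumptions made for this example only.

```python
import numpy as np
from scipy.stats import norm


def three_layer_summary(coef_samples, x_new, residual_var, level=0.95, min_count=10):
    """Toy three-layer summary from a (B, p) array of bootstrap coefficients."""
    support = np.abs(coef_samples) > 1e-10            # (B, p): was term j selected?
    inclusion = support.mean(axis=0)                  # Layer 1: inclusion probability

    alpha = 1.0 - level
    percentiles = [100 * alpha / 2, 100 * (1 - alpha / 2)]
    ci = np.full((coef_samples.shape[1], 2), np.nan)  # Layer 2: percentile CIs
    for j in range(coef_samples.shape[1]):
        kept = coef_samples[support[:, j], j]         # replicates that selected term j
        if kept.size >= min_count:
            ci[j] = np.percentile(kept, percentiles)

    preds = coef_samples @ x_new                      # Layer 3: one prediction per replicate
    total_se = np.sqrt(np.var(preds) + residual_var)  # model variance + residual variance
    z = norm.ppf(1 - alpha / 2)
    center = np.median(preds)
    return inclusion, ci, (center - z * total_se, center + z * total_se)


# Synthetic bootstrap output (assumed): two stable terms and one rarely selected term.
rng = np.random.default_rng(0)
B = 200
rare = np.where(rng.random(B) < 0.1, 0.2 * rng.standard_normal(B), 0.0)
coef_samples = np.column_stack([
    2.0 + 0.05 * rng.standard_normal(B),
    1.0 + 0.05 * rng.standard_normal(B),
    rare,
])

inclusion, ci, (pi_lo, pi_hi) = three_layer_summary(
    coef_samples, x_new=np.array([1.0, 1.0, 0.0]), residual_var=0.01
)
print("inclusion probabilities:", np.round(inclusion, 3))
print("95% CIs:\n", np.round(ci, 3))
print(f"95% prediction interval: [{pi_lo:.3f}, {pi_hi:.3f}]")
```

The two stable terms are selected in every replicate, while the third is selected only occasionally, so it would fall in the low-confidence class.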

### Theoretical Foundation

**Stability Selection (Meinshausen & Bühlmann 2010):**
$$E[\text{False Positives}] \leq \frac{q^2}{(2\pi_{thr} - 1) \cdot p}$$

where $q$ is the average number of terms selected per resample, $\pi_{thr} > 0.5$ is the inclusion-probability threshold, and $p$ is the total number of candidate terms.
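
To get a feel for the bound, it can be evaluated for a few settings. The helper below is a minimal sketch; the name `stability_fp_bound` and the example numbers are assumptions for illustration. Meinshausen and Bühlmann derive the bound for subsamples of size $\lfloor n/2 \rfloor$ under additional assumptions, so with bootstrap resampling it is best read as a rough guide rather than a guarantee.

```python
def stability_fp_bound(q: float, pi_thr: float, p: int) -> float:
    """Upper bound on the expected number of false positives, q^2 / ((2*pi_thr - 1) * p)."""
    if not 0.5 < pi_thr <= 1.0:
        raise ValueError("pi_thr must lie in (0.5, 1]")
    if p <= 0:
        raise ValueError("p must be a positive number of candidate terms")
    return q**2 / ((2.0 * pi_thr - 1.0) * p)


# Example (assumed numbers): 5 terms selected on average out of 50 candidates.
bound_strict = stability_fp_bound(q=5, pi_thr=0.9, p=50)
bound_loose = stability_fp_bound(q=5, pi_thr=0.6, p=50)
print(f"pi_thr = 0.9: E[FP] <= {bound_strict:.3f}")
print(f"pi_thr = 0.6: E[FP] <= {bound_loose:.3f}")
```

Raising the threshold tightens the bound, and the bound grows quadratically in $q$, which is why a sparse selector matters for this layer.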

### Robust Estimators

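- **Point estimate:** Median rather than mean, for robustness to outlying bootstrap replicates
- **Standard error:** MAD $\times$ 1.4826, where MAD is the median absolute deviation; the factor 1.4826 makes it a consistent estimator of the standard deviation for normal data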
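
The effect of these choices is easiest to see on a small set of replicates containing one outlier. The sketch below uses made-up coefficient draws; `robust_summary` is a hypothetical helper name, not part of the notebook's classes.

```python
import numpy as np

MAD_TO_SD = 1.4826  # scales the MAD to match the standard deviation for normal data


def robust_summary(samples):
    """Return (median, MAD * 1.4826) for a 1-D array of bootstrap draws."""
    samples = np.asarray(samples, dtype=float)
    center = np.median(samples)
    mad = np.median(np.abs(samples - center))
    return center, MAD_TO_SD * mad


# Assumed bootstrap draws for one coefficient; the last replicate is an outlier.
draws = np.array([1.90, 2.00, 2.10, 2.00, 1.95, 2.05, 50.0])
center, robust_se = robust_summary(draws)

print(f"median = {center:.3f}, robust SE = {robust_se:.4f}")
print(f"mean   = {draws.mean():.3f}, SD        = {draws.std(ddof=1):.4f}")
```

The single outlier pulls the mean and SD far from the bulk of the draws, while the median and the MAD-based SE are essentially unchanged.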

### Reference

- Meinshausen, N., & Bühlmann, P. (2010). Stability selection. *Journal of the Royal Statistical Society: Series B*, 72(4), 417-473.

---
## Section 1: Header and Imports

In [None]:
"""
11_UQ_Inference.ipynb - Three-Layer Bootstrap UQ + Statistical Inference
=========================================================================

Three-Stage Physics-Informed Symbolic Regression Framework v3.0

This module provides:
- BootstrapUQ: Three-layer uncertainty quantification
- StatisticalInference: Hypothesis testing for coefficients
- Structural, parametric, and predictive uncertainty
- Robust estimators (median, MAD) for non-normal data

Algorithm:
    1. Bootstrap resampling (B samples)
    2. Structural UQ: Inclusion probabilities
    3. Parametric UQ: Coefficient confidence intervals
    4. Predictive UQ: Prediction intervals
    5. Statistical inference: Hypothesis tests

Author: Zhengze Zhang
Affiliation: Department of Statistics, Columbia University
"""

# Import core module
%run 00_Core.ipynb

In [None]:
# Additional imports for UQ
from scipy import stats
from scipy.stats import norm
from typing import Dict, List, Tuple, Optional, Any

print("11_UQ_Inference: Additional imports successful.")

---
## Section 2: Class Definitions

In [None]:
# ==============================================================================
# BOOTSTRAP UQ CLASS
# ==============================================================================

class BootstrapUQ:
    """
    Three-Layer Bootstrap Uncertainty Quantification.
    
    Provides comprehensive UQ for symbolic regression:
    - Layer 1: Structural uncertainty (inclusion probabilities)
    - Layer 2: Parametric uncertainty (coefficient CIs)
    - Layer 3: Predictive uncertainty (prediction intervals)
    
    Uses robust estimators (median, MAD) for non-normal bootstrap distributions.
    
    Attributes
    ----------
    n_bootstrap : int
        Number of bootstrap samples (default: 200)
    selection_method : str
        'stlsq' or 'alasso' (default: 'stlsq')
    confidence_level : float
        Confidence level for CIs (default: 0.95)
    stlsq_threshold : float
        STLSQ sparsity threshold (default: 0.1)
    
    Examples
    --------
    >>> uq = BootstrapUQ(n_bootstrap=200)
    >>> result = uq.run(Phi, y, feature_names)
    >>> print(result['inclusion_probs'])
    """
    
    def __init__(
        self,
        n_bootstrap: int = DEFAULT_N_BOOTSTRAP,
        selection_method: str = 'stlsq',
        confidence_level: float = DEFAULT_CONFIDENCE_LEVEL,
        stlsq_threshold: float = DEFAULT_STLSQ_THRESHOLD
    ):
        """
        Initialize BootstrapUQ.
        
        Parameters
        ----------
        n_bootstrap : int
            Number of bootstrap samples. Default: 200
        selection_method : str
            'stlsq' or 'alasso'. Default: 'stlsq'
        confidence_level : float
            Confidence level (e.g., 0.95 for 95% CI). Default: 0.95
        stlsq_threshold : float
            STLSQ threshold for sparsity. Default: 0.1
        """
        self.n_bootstrap = n_bootstrap
        self.selection_method = selection_method
        self.confidence_level = confidence_level
        self.stlsq_threshold = stlsq_threshold
        
        # Internal state
        self._feature_names = None
        self._n_features = None
        self._support_samples = None
        self._coef_samples = None
        self._inclusion_probs = None
        self._estimates = None
        self._ci_lower = None
        self._ci_upper = None
        self._se = None
        self._uq_complete = False
    
    def run(
        self,
        feature_library: np.ndarray,
        y: np.ndarray,
        feature_names: Optional[List[str]] = None
    ) -> Dict[str, Any]:
        """
        Run full bootstrap UQ.
        
        Parameters
        ----------
        feature_library : np.ndarray
            Feature matrix of shape (n_samples, n_features)
        y : np.ndarray
            Target vector
        feature_names : List[str], optional
            Feature names
        
        Returns
        -------
        Dict[str, Any]
            Dictionary containing:
            - inclusion_probs: Inclusion probability for each term
            - confidence_class: Classification (high/medium/low)
            - estimates: Point estimates (median)
            - ci_lower: Lower CI bounds
            - ci_upper: Upper CI bounds
            - se: Standard errors
            - support_samples: Bootstrap support matrix
            - coef_samples: Bootstrap coefficient matrix
        """
        n_samples, n_features = feature_library.shape
        self._n_features = n_features
        
        if feature_names is None:
            self._feature_names = [f'f{i}' for i in range(n_features)]
        else:
            self._feature_names = list(feature_names)
        
        # Bootstrap sampling
        self._support_samples = np.zeros((self.n_bootstrap, n_features), dtype=bool)
        self._coef_samples = np.zeros((self.n_bootstrap, n_features))
        
        for b in range(self.n_bootstrap):
            # Resample
            Phi_boot, y_boot = self._bootstrap_sample(feature_library, y, b)
            
            # Run selection
            coefs, support = self._run_selection(Phi_boot, y_boot)
            
            self._support_samples[b] = support
            self._coef_samples[b] = coefs
        
        # Layer 1: Structural UQ
        self._inclusion_probs = self._structural_uq(self._support_samples)
        
        # Layer 2: Parametric UQ
        self._estimates, ci_bounds, self._se = self._parametric_uq(
            self._coef_samples, self._support_samples
        )
        self._ci_lower = ci_bounds[:, 0]
        self._ci_upper = ci_bounds[:, 1]
        
        # Confidence classification
        confidence_class = self.get_confidence_classification()
        
        self._uq_complete = True
        
        return {
            'inclusion_probs': self._inclusion_probs,
            'confidence_class': confidence_class,
            'estimates': self._estimates,
            'ci_lower': self._ci_lower,
            'ci_upper': self._ci_upper,
            'se': self._se,
            'support_samples': self._support_samples,
            'coef_samples': self._coef_samples,
            'n_bootstrap': self.n_bootstrap,
            'confidence_level': self.confidence_level
        }
    
    def _bootstrap_sample(
        self,
        X: np.ndarray,
        y: np.ndarray,
        seed: int
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Generate bootstrap sample.
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix
        y : np.ndarray
            Target vector
        seed : int
            Random seed
        
        Returns
        -------
        Tuple[np.ndarray, np.ndarray]
            (X_boot, y_boot)
        """
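        # Local generator: reproducible per replicate without touching NumPy's global RNG state
        rng = np.random.default_rng(RANDOM_SEED + seed)
        n = len(y)
        indices = rng.integers(0, n, size=n)  # draw n row indices with replacement
        return X[indices], y[indices]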
    
    def _run_selection(
        self,
        Phi: np.ndarray,
        y: np.ndarray
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Run sparse selection on bootstrap sample.
        
        Parameters
        ----------
        Phi : np.ndarray
            Feature matrix
        y : np.ndarray
            Target vector
        
        Returns
        -------
        Tuple[np.ndarray, np.ndarray]
            (coefficients, support)
        """
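        if self.selection_method == 'stlsq':
            coefs = self._stlsq_simple(Phi, y)
        elif self.selection_method == 'alasso':
            coefs = self._alasso_simple(Phi, y)
        else:
            raise ValueError(
                f"Unknown selection_method {self.selection_method!r}; "
                "expected 'stlsq' or 'alasso'"
            )
        
        # A term counts as selected when its coefficient is numerically non-zero
        support = np.abs(coefs) > 1e-10
        return coefs, support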
    
    def _stlsq_simple(
        self,
        Phi: np.ndarray,
        y: np.ndarray
    ) -> np.ndarray:
        """
        Simple STLSQ implementation for bootstrap.
        """
        n_features = Phi.shape[1]
        
        # Initial OLS
        try:
            xi, _, _, _ = np.linalg.lstsq(Phi, y, rcond=None)
        except np.linalg.LinAlgError:
            return np.zeros(n_features)
        
        # Iterate
        for _ in range(10):
            support = np.abs(xi) > self.stlsq_threshold
            if np.sum(support) == 0:
                return np.zeros(n_features)
            
            xi_new = np.zeros(n_features)
            try:
                xi_new[support], _, _, _ = np.linalg.lstsq(
                    Phi[:, support], y, rcond=None
                )
            except np.linalg.LinAlgError:
                break
            
            if np.allclose(xi, xi_new):
                break
            xi = xi_new
        
        return xi
    
    def _alasso_simple(
        self,
        Phi: np.ndarray,
        y: np.ndarray
    ) -> np.ndarray:
        """
        Simple Adaptive Lasso for bootstrap.
        """
        from sklearn.linear_model import Ridge, LassoCV
        
        n_features = Phi.shape[1]
        
        # Initial estimate
        ridge = Ridge(alpha=0.1, fit_intercept=False)
        ridge.fit(Phi, y)
        beta_init = ridge.coef_
        
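        # Adaptive weights: w_j = 1 / (|beta_init_j| + eps)
        eps = 1e-6
        weights = 1.0 / (np.abs(beta_init) + eps)
        
        # Weighted Lasso via column rescaling: with columns Phi_j / w_j and
        # beta_tilde_j = w_j * beta_j, a plain L1 penalty on beta_tilde
        # equals the adaptive penalty sum_j w_j * |beta_j|.
        Phi_weighted = Phi / weights
        lasso = LassoCV(cv=3, fit_intercept=False, max_iter=1000)
        try:
            lasso.fit(Phi_weighted, y)
            beta = lasso.coef_ / weights  # map back to the original scale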
        except Exception:
            beta = np.zeros(n_features)
        
        return beta
    
    def _structural_uq(
        self,
        support_samples: np.ndarray
    ) -> np.ndarray:
        """
        Layer 1: Structural Uncertainty via Bootstrap Inclusion Probability.
        
        P_j = (1/B) * sum_b I[j in support_b]
        
        Classification:
            P_j > 0.9: High confidence (definitely include)
            0.5 < P_j <= 0.9: Medium confidence (probably include)
            P_j <= 0.5: Low confidence (consider excluding)
        
        Parameters
        ----------
        support_samples : np.ndarray
            Boolean array of shape (B, n_features)
        
        Returns
        -------
        np.ndarray
            Inclusion probabilities
        """
        return np.mean(support_samples, axis=0)
    
    def _parametric_uq(
        self,
        coef_samples: np.ndarray,
        support_samples: np.ndarray
    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        """
        Layer 2: Parametric Uncertainty via Bootstrap Coefficient Distribution.
        
        Uses robust estimators:
            - Point estimate: median (not mean)
            - Standard error: MAD * 1.4826 (not SD)
        
        Parameters
        ----------
        coef_samples : np.ndarray
            Coefficient samples of shape (B, n_features)
        support_samples : np.ndarray
            Support indicators of shape (B, n_features)
        
        Returns
        -------
        Tuple[np.ndarray, np.ndarray, np.ndarray]
            (estimates, ci_bounds, se)
        """
        n_features = coef_samples.shape[1]
        estimates = np.zeros(n_features)
        se = np.zeros(n_features)
        ci_bounds = np.zeros((n_features, 2))
        
        alpha = 1 - self.confidence_level
        lower_percentile = 100 * alpha / 2
        upper_percentile = 100 * (1 - alpha / 2)
        
        for j in range(n_features):
            # Condition on the replicates in which term j was selected; with
            # fewer than 10 of them, leave estimate, SE, and CI at 0
            nonzero_mask = support_samples[:, j]
            if np.sum(nonzero_mask) < 10:
                continue
            
            samples_j = coef_samples[nonzero_mask, j]
            
            # Robust point estimate (median)
            estimates[j] = np.median(samples_j)
            
            # Robust SE (MAD * 1.4826)
            mad = np.median(np.abs(samples_j - estimates[j]))
            se[j] = mad * 1.4826
            
            # Percentile CI
            ci_bounds[j, 0] = np.percentile(samples_j, lower_percentile)
            ci_bounds[j, 1] = np.percentile(samples_j, upper_percentile)
        
        return estimates, ci_bounds, se
    
    def predictive_uq(
        self,
        X_new: np.ndarray,
        residual_var: float
    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        """
        Layer 3: Predictive Uncertainty for new inputs.
        
        Total variance = model variance + residual variance
        
        Parameters
        ----------
        X_new : np.ndarray
            New feature matrix
        residual_var : float
            Residual variance from training
        
        Returns
        -------
        Tuple[np.ndarray, np.ndarray, np.ndarray]
            (predictions, pi_lower, pi_upper)
        """
        if not self._uq_complete:
            raise ValueError("Must run UQ first")
        
        z = norm.ppf(1 - (1 - self.confidence_level) / 2)
        
        # Predictions from each bootstrap replicate, shape (n_bootstrap, n_new)
        pred_samples = self._coef_samples @ X_new.T
        
        # Point prediction (median)
        predictions = np.median(pred_samples, axis=0)
        
        # Model variance
        model_var = np.var(pred_samples, axis=0)
        
        # Total variance
        total_var = model_var + residual_var
        total_se = np.sqrt(total_var)
        
        # Prediction interval
        pi_lower = predictions - z * total_se
        pi_upper = predictions + z * total_se
        
        return predictions, pi_lower, pi_upper
    
    def get_confidence_classification(
        self
    ) -> Dict[str, str]:
        """
        Classify terms by inclusion probability.
        
        Returns
        -------
        Dict[str, str]
            Mapping from feature name to classification
        """
        if self._inclusion_probs is None:
            raise ValueError("Must run UQ first")
        
        classification = {}
        for name, prob in zip(self._feature_names, self._inclusion_probs):
            if prob > 0.9:
                classification[name] = 'high'
            elif prob > 0.5:
                classification[name] = 'medium'
            else:
                classification[name] = 'low'
        
        return classification
    
    def print_uq_report(self) -> None:
        """
        Print detailed UQ report.
        """
        if not self._uq_complete:
            print("UQ not yet performed. Call run() first.")
            return
        
        print("=" * 70)
        print(" Bootstrap Uncertainty Quantification Results")
        print("=" * 70)
        print()
        print(f"Configuration:")
        print(f"  Bootstrap samples: {self.n_bootstrap}")
        print(f"  Selection method: {self.selection_method}")
        print(f"  Confidence level: {self.confidence_level:.0%}")
        print()
        print("-" * 70)
        print(" Layer 1: Structural Uncertainty (Inclusion Probabilities)")
        print("-" * 70)
        print(f"  {'Feature':<25} {'P(include)':<12} {'Class':<10}")
        print("  " + "-" * 50)
        
        conf_class = self.get_confidence_classification()
        for name, prob in zip(self._feature_names, self._inclusion_probs):
            if prob > 0.1:  # Only show non-trivial
                print(f"  {name:<25} {prob:<12.3f} {conf_class[name]:<10}")
        
        print()
        print("-" * 70)
        print(" Layer 2: Parametric Uncertainty (Coefficient CIs)")
        print("-" * 70)
        ci_label = f"{self.confidence_level:.0%} CI"  # reflect the configured level
        print(f"  {'Feature':<20} {'Estimate':<12} {'SE':<10} {ci_label:<20}")
        print("  " + "-" * 65)
        
        for i, name in enumerate(self._feature_names):
            if self._inclusion_probs[i] > 0.5:  # Only show selected
                ci_str = f"[{self._ci_lower[i]:.4f}, {self._ci_upper[i]:.4f}]"
                print(f"  {name:<20} {self._estimates[i]:<12.4f} "
                      f"{self._se[i]:<10.4f} {ci_str:<20}")
        
        print()
        print("=" * 70)

In [None]:
# ==============================================================================
# STATISTICAL INFERENCE CLASS
# ==============================================================================

class StatisticalInference:
    """
    Statistical Inference for Symbolic Regression Coefficients.
    
    Provides hypothesis testing for coefficients:
    - H0: coefficient = 0
    - H1: coefficient != 0
    
    Attributes
    ----------
    alpha : float
        Significance level (default: 0.05)
    
    Examples
    --------
    >>> inference = StatisticalInference(alpha=0.05)
    >>> result = inference.test_coefficients(coef_samples, support_samples, feature_names)
    """
    
    def __init__(self, alpha: float = 0.05):
        """
        Initialize StatisticalInference.
        
        Parameters
        ----------
        alpha : float
            Significance level. Default: 0.05
        """
        self.alpha = alpha
        self._test_results = None
    
    def test_coefficients(
        self,
        coef_samples: np.ndarray,
        support_samples: np.ndarray,
        feature_names: List[str]
    ) -> Dict[str, Any]:
        """
        Test significance of each coefficient.
        
        Parameters
        ----------
        coef_samples : np.ndarray
            Bootstrap coefficient samples
        support_samples : np.ndarray
            Bootstrap support indicators
        feature_names : List[str]
            Feature names
        
        Returns
        -------
        Dict[str, Any]
            Test results for each feature
        """
        n_features = coef_samples.shape[1]
        self._test_results = {}
        
        for j in range(n_features):
            name = feature_names[j]
            
            # Get non-zero samples
            nonzero_mask = support_samples[:, j]
            n_nonzero = np.sum(nonzero_mask)
            
            if n_nonzero < 10:
                self._test_results[name] = {
                    'significant': False,
                    'p_value': 1.0,
                    'z_stat': 0.0,
                    'estimate': 0.0,
                    'se': np.inf,
                    'n_samples': int(n_nonzero)
                }
                continue
            
            samples_j = coef_samples[nonzero_mask, j]
            result = self._hypothesis_test(samples_j)
            result['n_samples'] = int(n_nonzero)
            self._test_results[name] = result
        
        return self._test_results
    
    def _hypothesis_test(
        self,
        samples: np.ndarray
    ) -> Dict[str, Any]:
        """
        Two-sided z-test of H0: coefficient = 0, using the bootstrap median
        as the estimate and MAD * 1.4826 as its standard error.
        
        Parameters
        ----------
        samples : np.ndarray
            Bootstrap samples
        
        Returns
        -------
        Dict
            Test result
        """
        # Robust estimates
        estimate = np.median(samples)
        mad = np.median(np.abs(samples - estimate))
        se = mad * 1.4826
        
        if se < 1e-10:
            # Zero SE - all samples identical
            z_stat = np.inf if abs(estimate) > 1e-10 else 0.0
            p_value = 0.0 if abs(estimate) > 1e-10 else 1.0
        else:
            z_stat = estimate / se
            # Survival function avoids the cancellation in 1 - cdf for large |z|
            p_value = 2 * norm.sf(abs(z_stat))
        
        significant = p_value < self.alpha
        
        return {
            'significant': significant,
            'p_value': float(p_value),
            'z_stat': float(z_stat),
            'estimate': float(estimate),
            'se': float(se),
            'stars': self._format_significance_stars(p_value)
        }
    
    def _format_significance_stars(
        self,
        p_value: float
    ) -> str:
        """
        Format significance stars.
        
        Parameters
        ----------
        p_value : float
            P-value
        
        Returns
        -------
        str
            Significance stars
        """
        if p_value < 0.001:
            return '***'
        elif p_value < 0.01:
            return '**'
        elif p_value < 0.05:
            return '*'
        elif p_value < 0.1:
            return '.'
        else:
            return ''
    
    def print_inference_report(self) -> None:
        """
        Print inference report.
        """
        if self._test_results is None:
            print("No test results. Call test_coefficients() first.")
            return
        
        print("=" * 70)
        print(" Statistical Inference Results")
        print("=" * 70)
        print()
        print(f"Significance level: alpha = {self.alpha}")
        print()
        print(f"{'Feature':<20} {'Estimate':<10} {'SE':<10} {'z':<8} {'p-value':<10} {'Sig'}")
        print("-" * 70)
        
        for name, result in self._test_results.items():
            if result['n_samples'] >= 10:
                print(f"{name:<20} {result['estimate']:<10.4f} {result['se']:<10.4f} "
                      f"{result['z_stat']:<8.2f} {result['p_value']:<10.4f} {result['stars']}")
        
        print()
        print("Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1")
        print("=" * 70)

---
## Section 3: Internal Tests

In [None]:
# ==============================================================================
# TEST CONTROL FLAG
# ==============================================================================

_RUN_TESTS = False  # Set to True to run internal tests

if _RUN_TESTS:
    print("=" * 70)
    print(" RUNNING INTERNAL TESTS FOR 11_UQ_Inference")
    print("=" * 70)

In [None]:
# ==============================================================================
# TEST 1: Structural UQ - Inclusion Probabilities
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 1: Structural UQ - Inclusion Probabilities")
    
    np.random.seed(42)
    n_samples = 200
    
    # True model: y = 2*x1 + x2
    x1 = np.random.randn(n_samples)
    x2 = np.random.randn(n_samples)
    x3 = np.random.randn(n_samples)  # noise
    
    y = 2*x1 + x2 + 0.1*np.random.randn(n_samples)
    
    Phi = np.column_stack([np.ones(n_samples), x1, x2, x3])
    feature_names = ['1', 'x1', 'x2', 'x3']
    
    print(f"True model: y = 2*x1 + x2")
    print(f"Expected: x1, x2 have high inclusion prob, x3 has low")
    print()
    
    uq = BootstrapUQ(n_bootstrap=50, stlsq_threshold=0.1)
    result = uq.run(Phi, y, feature_names)
    
    print(f"Inclusion probabilities:")
    for name, prob in zip(feature_names, result['inclusion_probs']):
        print(f"  {name}: {prob:.3f}")
    
    # Check x1, x2 have higher prob than x3
    if (result['inclusion_probs'][1] > 0.8 and 
        result['inclusion_probs'][2] > 0.8 and
        result['inclusion_probs'][3] < 0.5):
        print("\n[PASS] True terms have high inclusion probability")
    else:
        print("\n[INFO] Check inclusion probabilities")

In [None]:
# ==============================================================================
# TEST 2: Parametric UQ - Coefficient CIs
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 2: Parametric UQ - Coefficient CIs")
    
    np.random.seed(42)
    n_samples = 300
    
    x1 = np.random.randn(n_samples)
    x2 = np.random.randn(n_samples)
    
    # True: y = 3*x1 + 1.5*x2
    true_coefs = [0, 3.0, 1.5]  # intercept=0
    y = 3*x1 + 1.5*x2 + 0.1*np.random.randn(n_samples)
    
    Phi = np.column_stack([np.ones(n_samples), x1, x2])
    feature_names = ['1', 'x1', 'x2']
    
    print(f"True coefficients: 1=0, x1=3.0, x2=1.5")
    print()
    
    uq = BootstrapUQ(n_bootstrap=100)
    result = uq.run(Phi, y, feature_names)
    
    print(f"{'Feature':<10} {'True':<10} {'Estimate':<10} {'95% CI':<25} {'Covered'}")
    print("-" * 70)
    
    for i, (name, true) in enumerate(zip(feature_names, true_coefs)):
        est = result['estimates'][i]
        ci_l = result['ci_lower'][i]
        ci_u = result['ci_upper'][i]
        covered = ci_l <= true <= ci_u
        print(f"{name:<10} {true:<10.2f} {est:<10.4f} [{ci_l:.4f}, {ci_u:.4f}] {covered}")

In [None]:
# ==============================================================================
# TEST 3: Predictive UQ - Prediction Intervals
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 3: Predictive UQ - Prediction Intervals")
    
    np.random.seed(42)
    n_train = 200
    n_test = 100
    
    # Training data
    x1_train = np.random.randn(n_train)
    y_train = 2*x1_train + 0.5*np.random.randn(n_train)
    Phi_train = np.column_stack([np.ones(n_train), x1_train])
    
    # Test data
    x1_test = np.random.randn(n_test)
    y_test = 2*x1_test + 0.5*np.random.randn(n_test)
    Phi_test = np.column_stack([np.ones(n_test), x1_test])
    
    feature_names = ['1', 'x1']
    
    # Run UQ
    uq = BootstrapUQ(n_bootstrap=100)
    result = uq.run(Phi_train, y_train, feature_names)
    
    # Compute residual variance
    y_train_pred = Phi_train @ result['estimates']
    residual_var = np.var(y_train - y_train_pred)
    
    # Predictive UQ
    pred, pi_lower, pi_upper = uq.predictive_uq(Phi_test, residual_var)
    
    # Check coverage
    in_interval = (y_test >= pi_lower) & (y_test <= pi_upper)
    coverage = np.mean(in_interval)
    
    print(f"Expected coverage: 95%")
    print(f"Actual coverage: {coverage:.1%}")
    print(f"Residual variance: {residual_var:.4f}")
    
    if 0.85 <= coverage <= 0.99:
        print("[PASS] Coverage near nominal level")
    else:
        print(f"[INFO] Coverage: {coverage:.1%}")

In [None]:
# ==============================================================================
# TEST 4: Hypothesis Testing
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 4: Hypothesis Testing")
    
    np.random.seed(42)
    n_samples = 200
    
    x1 = np.random.randn(n_samples)
    x2 = np.random.randn(n_samples)
    x3 = np.random.randn(n_samples)  # Not in true model
    
    y = 5*x1 + 2*x2 + 0.1*np.random.randn(n_samples)
    
    Phi = np.column_stack([x1, x2, x3])
    feature_names = ['x1', 'x2', 'x3']
    
    print(f"True: y = 5*x1 + 2*x2 (x3 has zero coefficient)")
    print()
    
    # Run UQ
    uq = BootstrapUQ(n_bootstrap=100)
    result = uq.run(Phi, y, feature_names)
    
    # Statistical inference
    inference = StatisticalInference(alpha=0.05)
    test_results = inference.test_coefficients(
        result['coef_samples'],
        result['support_samples'],
        feature_names
    )
    
    inference.print_inference_report()
    
    # x1, x2 should be significant, x3 should not
    if (test_results['x1']['significant'] and 
        test_results['x2']['significant'] and 
        not test_results['x3']['significant']):
        print("\n[PASS] Correct significance detection")
    else:
        print("\n[INFO] Check significance results")

---
## Section 4: Module Summary

In [None]:
# ==============================================================================
# MODULE SUMMARY
# ==============================================================================

print("=" * 70)
print(" 11_UQ_Inference.ipynb - Module Summary")
print("=" * 70)
print()
print("CLASSES:")
print("-" * 70)
print()
print("1. BootstrapUQ")
print("   Purpose: Three-layer uncertainty quantification")
print("   Main Methods:")
print("     run(Phi, y, feature_names) - Run full bootstrap UQ")
print("     predictive_uq(X_new, residual_var) - Prediction intervals")
print("     get_confidence_classification() - Term classification")
print("     print_uq_report() - Detailed report")
print()
print("2. StatisticalInference")
print("   Purpose: Hypothesis testing for coefficients")
print("   Main Methods:")
print("     test_coefficients(coef_samples, support_samples, names)")
print("     print_inference_report() - Significance table")
print()
print("Three Layers of UQ:")
print("  Layer 1: Structural - Which terms to include?")
print("  Layer 2: Parametric - What are coefficient values?")
print("  Layer 3: Predictive - How uncertain are predictions?")
print()
print("Usage Example:")
print("-" * 70)
print("""
# Run bootstrap UQ
uq = BootstrapUQ(n_bootstrap=200)
result = uq.run(Phi, y, feature_names)

# Check inclusion probabilities
print(result['inclusion_probs'])

# Get coefficient CIs
print(f"Estimates: {result['estimates']}")
print(f"95% CI: [{result['ci_lower']}, {result['ci_upper']}]")

# Statistical inference
inference = StatisticalInference()
tests = inference.test_coefficients(
    result['coef_samples'], result['support_samples'], feature_names
)
inference.print_inference_report()
""")
print()
print("=" * 70)
print("Module loaded successfully. Import via: %run 11_UQ_Inference.ipynb")
print("=" * 70)