# 10_PhysicsVerification - Physics-SR Framework v3.0

## Stage 3.2: Dimensional Consistency + Physical Bounds Check

**Author:** Zhengze Zhang  
**Affiliation:** Department of Statistics, Columbia University  
**Date:** January 2026

---

### Purpose

Verify that discovered equations satisfy physical constraints:
1. **Dimensional Consistency:** All terms have the same dimensions as the target
2. **Physical Bounds:** Predictions satisfy physical constraints (e.g., non-negativity)
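
As a rough illustration of the second check, the sketch below computes the fraction of predictions that fall outside a lower or upper bound and compares it with an allowed fraction. The helper name `violation_rate`, the sample values, and the 1% allowance are assumptions made for this example, not part of the framework API.

```python
def violation_rate(y_pred, lower=None, upper=None):
    """Fraction of predictions outside [lower, upper] (hypothetical helper)."""
    n_bad = sum(
        (lower is not None and y < lower) or (upper is not None and y > upper)
        for y in y_pred
    )
    return n_bad / len(y_pred) if len(y_pred) else 0.0


# Illustrative values only: 2 of the 8 predictions are negative.
y_pred = [0.0, 0.3, -0.1, 1.2, 0.8, -0.4, 2.0, 0.5]

rate = violation_rate(y_pred, lower=0.0)
passes = rate <= 0.01  # assumed allowance: at most 1% of predictions may violate
print(f"violation rate = {rate:.2%}, passes = {passes}")
```

Using a rate rather than a hard "no violations" rule lets a handful of outliers through while still flagging equations that break the bound systematically.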

### Dimensional Analysis

For each term, compute the dimension vector as the exponent-weighted sum of the variable dimension vectors (numeric coefficients are treated as dimensionless):
$$\dim(\text{term}) = \sum_j \text{exponent}_j \times \dim(\text{variable}_j)$$

All terms must have the same dimensions, matching the target variable.
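
The formula above can be sketched in a few lines of plain Python. The function name `term_dimensions` is hypothetical; the variables `q_c` and `N_d`, their dimension vectors, and the exponents are the warm-rain example used later in this notebook, with dimensions ordered as [M, L, T, Theta].

```python
def term_dimensions(exponents, var_dims, n_dims=4):
    """Sum exponent * dimension vector over the variables of one term."""
    dims = [0.0] * n_dims
    for name, exponent in exponents.items():
        for k, d in enumerate(var_dims[name]):
            dims[k] += exponent * d
    return dims


# Dimension vectors in [M, L, T, Theta] order
var_dims = {
    "q_c": [0, 0, 0, 0],   # kg/kg, dimensionless
    "N_d": [0, -3, 0, 0],  # m^-3
}

# Term q_c^2.47 * N_d^(-1.79)
dims = term_dimensions({"q_c": 2.47, "N_d": -1.79}, var_dims)
print(dims)  # the length exponent is (-1.79) * (-3), roughly 5.37
```

Only the length entry is non-zero here, so this term can match a target of dimension [0, 0, -1, 0] only if the coefficient itself carries units.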

### Tolerance for Non-Integer Exponents

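PySR may return a fitted exponent such as 2.47 when the true value is 2.5, so dimension vectors are compared with an absolute tolerance rather than exactly:
- `atol = 0.05` for dimensional comparison
- This accepts [0, -3.02, 0, 0] $\approx$ [0, -3, 0, 0]
- But rejects [0, -3.5, 0, 0] $\neq$ [0, -3, 0, 0]

The tolerance applies to the term's dimension exponents, not to the fitted exponents themselves: an error of 0.02 in the exponent of a variable with dimension $L^{-3}$ shifts the length exponent by 0.06, which already exceeds `atol`.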
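
The two cases listed above can be reproduced directly with `np.allclose`, which is the comparison the verifier class in this notebook relies on. This is a minimal sketch; the variable names are chosen for illustration.

```python
import numpy as np

target = np.array([0.0, -3.0, 0.0, 0.0])
atol = 0.05

# |(-3.02) - (-3)| = 0.02 is within the tolerance
close = np.allclose([0.0, -3.02, 0.0, 0.0], target, atol=atol)

# |(-3.5) - (-3)| = 0.5 is far outside it
far = np.allclose([0.0, -3.5, 0.0, 0.0], target, atol=atol)

print(close, far)
```

Note that `np.allclose` tests `|a - b| <= atol + rtol * |b|` with a default `rtol` of `1e-5`, so the effective tolerance is very slightly larger than `atol`. Passing `rtol=0` would give a purely absolute comparison.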

### Reference

- Framework Section 5.2: Physics Verification
- Framework Section 8.4: Dimensional Tolerance

---
## Section 1: Header and Imports

In [None]:
"""
10_PhysicsVerification.ipynb - Dimensional Consistency + Physical Bounds
=========================================================================

Three-Stage Physics-Informed Symbolic Regression Framework v3.0

This module provides:
- PhysicsVerifier: Verify dimensional consistency and physical bounds
- Term-by-term dimensional analysis
- Physical bounds violation detection
- Tolerance for non-integer exponents (atol=0.05)

Algorithm:
    1. Parse equation into terms
    2. Compute dimensions for each term
    3. Check consistency across terms and with target
    4. Check physical bounds violations on predictions

Author: Zhengze Zhang
Affiliation: Department of Statistics, Columbia University
"""

# Import core module
%run 00_Core.ipynb

In [None]:
# Additional imports for Physics Verification
import re
from typing import Dict, List, Tuple, Optional, Any

print("10_PhysicsVerification: Additional imports successful.")

---
## Section 2: Class Definition

In [None]:
# ==============================================================================
# PHYSICS VERIFIER CLASS
# ==============================================================================

class PhysicsVerifier:
    """
    Physics Verification for Symbolic Regression.
    
    Verifies that discovered equations satisfy:
    1. Dimensional consistency - all terms have same dimensions
    2. Physical bounds - predictions within physical constraints
    
    Uses tolerance for non-integer exponents (PySR may find 2.47 vs 2.5).
    
    Attributes
    ----------
    dim_tolerance : float
        Tolerance for dimensional comparison (default: 0.05)
    bounds_tolerance : float
        Tolerance for bound violations (default: 0.01)
    
    Examples
    --------
    >>> verifier = PhysicsVerifier()
    >>> result = verifier.verify(terms, var_dims, target_dims, y_pred, bounds)
    >>> print(f"Dimensionally consistent: {result['dim_consistent']}")
    """
    
    def __init__(
        self,
        dim_tolerance: float = 0.05,
        bounds_tolerance: float = 0.01
    ):
        """
        Initialize PhysicsVerifier.
        
        Parameters
        ----------
        dim_tolerance : float
            Tolerance for dimensional exponent comparison.
            Default: 0.05 (accepts 2.47 ~ 2.5)
        bounds_tolerance : float
            Fraction of violations allowed before flagging.
            Default: 0.01 (1% violations OK)
        """
        self.dim_tolerance = dim_tolerance
        self.bounds_tolerance = bounds_tolerance
        
        # Internal state
        self._dim_result = None
        self._bounds_result = None
        self._verification_complete = False
    
    def verify(
        self,
        equation_terms: List[Dict],
        variable_dims: Dict[str, List[float]],
        target_dims: List[float],
        y_pred: Optional[np.ndarray] = None,
        physical_bounds: Optional[Dict[str, float]] = None
    ) -> Dict[str, Any]:
        """
        Verify equation satisfies physics constraints.
        
        Parameters
        ----------
        equation_terms : List[Dict]
            List of term structures, each with:
            - 'variables': Dict[str, float] mapping var name to exponent
            - 'coefficient': float
            - 'name': str
        variable_dims : Dict[str, List[float]]
            Dimensions for each variable, e.g.:
            {'q_c': [0, 0, 0, 0], 'N_d': [0, -3, 0, 0]}
            Format: [M, L, T, Theta] (mass, length, time, temperature)
        target_dims : List[float]
            Target variable dimensions, e.g., [0, 0, -1, 0] for rate
        y_pred : np.ndarray, optional
            Predictions to check against physical bounds
        physical_bounds : Dict[str, float], optional
            Physical bounds, e.g., {'min': 0, 'max': 1e10}
        
        Returns
        -------
        Dict[str, Any]
            Dictionary containing:
            - dim_consistent: bool - all terms have same dimensions
            - matches_target: bool - terms match target dimensions
            - term_dimensions: List - dimensions of each term
            - bounds_satisfied: bool - predictions within bounds
            - violation_rate: float - fraction of bound violations
            - n_violations: int - number of violations
            - overall_valid: bool - all checks pass
        """
        # Dimensional consistency check
        self._dim_result = self._check_dimensional_consistency(
            equation_terms, variable_dims, target_dims
        )
        
        # Physical bounds check
        if y_pred is not None and physical_bounds is not None:
            self._bounds_result = self._check_physical_bounds(
                y_pred, physical_bounds
            )
        else:
            self._bounds_result = {
                'bounds_satisfied': True,
                'violation_rate': 0.0,
                'n_violations': 0,
                'n_lower_violations': 0,
                'n_upper_violations': 0
            }
        
        # Overall validity
        overall_valid = (
            self._dim_result['dim_consistent'] and
            self._dim_result['matches_target'] and
            self._bounds_result['bounds_satisfied']
        )
        
        self._verification_complete = True
        
        return {
            **self._dim_result,
            **self._bounds_result,
            'overall_valid': overall_valid
        }
    
    def _check_dimensional_consistency(
        self,
        terms: List[Dict],
        var_dims: Dict[str, List[float]],
        target_dims: List[float]
    ) -> Dict[str, Any]:
        """
        Check dimensional consistency of equation terms.
        
        Parameters
        ----------
        terms : List[Dict]
            Equation terms with variable exponents
        var_dims : Dict
            Variable dimensions
        target_dims : List[float]
            Target dimensions
        
        Returns
        -------
        Dict
            Dimensional check results
        """
        if len(terms) == 0:
            return {
                'dim_consistent': True,
                'matches_target': True,
                'term_dimensions': [],
                'term_details': []
            }
        
        target_dims = np.array(target_dims)
        n_dims = len(target_dims)
        
        term_dimensions = []
        term_details = []
        
        for term in terms:
            term_dim = self._compute_term_dimensions(term, var_dims, n_dims)
            term_dimensions.append(term_dim)
            
            term_details.append({
                'name': term.get('name', 'unnamed'),
                'dimensions': term_dim.tolist(),
                'matches_target': np.allclose(
                    term_dim, target_dims, atol=self.dim_tolerance
                )
            })
        
        term_dimensions = np.array(term_dimensions)
        
        # Check if all terms are consistent with each other
        if len(term_dimensions) > 1:
            all_consistent = all(
                np.allclose(term_dimensions[0], td, atol=self.dim_tolerance)
                for td in term_dimensions[1:]
            )
        else:
            all_consistent = True
        
        # Check that every term matches the target, not only the first one
        matches_target = all(d['matches_target'] for d in term_details)
        
        return {
            'dim_consistent': all_consistent,
            'matches_target': matches_target,
            'term_dimensions': term_dimensions.tolist(),
            'term_details': term_details
        }
    
    def _compute_term_dimensions(
        self,
        term: Dict,
        var_dims: Dict[str, List[float]],
        n_dims: int
    ) -> np.ndarray:
        """
        Compute dimensions of a single term.
        
        dim(term) = sum_j exponent_j * dim(variable_j)
        
        Parameters
        ----------
        term : Dict
            Term with 'variables' dict mapping name to exponent
        var_dims : Dict
            Variable dimensions
        n_dims : int
            Number of dimensions (typically 4: M, L, T, Theta)
        
        Returns
        -------
        np.ndarray
            Computed dimensions
        """
        result = np.zeros(n_dims)
        
        variables = term.get('variables', {})
        
        for var_name, exponent in variables.items():
            if var_name in var_dims:
                var_dim = np.array(var_dims[var_name])
                result += exponent * var_dim
            # If variable not in var_dims, assume dimensionless
        
        return result
    
    def _check_physical_bounds(
        self,
        y_pred: np.ndarray,
        bounds: Dict[str, float]
    ) -> Dict[str, Any]:
        """
        Check if predictions satisfy physical bounds.
        
        Parameters
        ----------
        y_pred : np.ndarray
            Predictions
        bounds : Dict[str, float]
            Physical bounds with 'min' and/or 'max'
        
        Returns
        -------
        Dict
            Bounds check results
        """
        # Accept lists as well as arrays, and count every element
        y_pred = np.asarray(y_pred)
        n_total = y_pred.size
        n_lower_violations = 0
        n_upper_violations = 0
        
        if 'min' in bounds:
            n_lower_violations = int(np.sum(y_pred < bounds['min']))
        
        if 'max' in bounds:
            n_upper_violations = int(np.sum(y_pred > bounds['max']))
        
        n_violations = n_lower_violations + n_upper_violations
        violation_rate = n_violations / n_total if n_total > 0 else 0.0
        
        bounds_satisfied = violation_rate <= self.bounds_tolerance
        
        return {
            'bounds_satisfied': bounds_satisfied,
            'violation_rate': violation_rate,
            'n_violations': n_violations,
            'n_lower_violations': n_lower_violations,
            'n_upper_violations': n_upper_violations,
            'n_total': n_total
        }
    
    def parse_equation_to_terms(
        self,
        equation_str: str,
        feature_names: List[str]
    ) -> List[Dict]:
        """
        Parse equation string into term structures.
        
        Parameters
        ----------
        equation_str : str
            Equation string, e.g., "0.89*q_c**2.47*N_d**(-1.79)"
        feature_names : List[str]
            List of feature names
        
        Returns
        -------
        List[Dict]
            List of term structures
        """
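        terms = []
        
        # Split on top-level '+'/'-' only. A sign inside parentheses, right
        # after an operator, or in a float exponent (1e-3) stays in its term,
        # so "N_d**(-1.79)" is not cut in two.
        equation_str = equation_str.replace(' ', '')
        term_strs, current, depth = [], '', 0
        for i, ch in enumerate(equation_str):
            prev = equation_str[i - 1] if i > 0 else ''
            depth += (ch == '(') - (ch == ')')
            in_float = i >= 2 and prev in 'eE' and equation_str[i - 2] in '0123456789.'
            if (ch in '+-' and depth == 0 and current
                    and prev not in '*/^(' and not in_float):
                term_strs.append(current)
                current = ''
            current += ch
        term_strs.append(current)
        # Drop the redundant leading '+' and any empty pieces
        term_strs = [t.lstrip('+') for t in term_strs if t.lstrip('+')]
        
        for term_str in term_strs:
            term = self._parse_single_term(term_str, feature_names)
            if term:
                terms.append(term)
        
        return terms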
    
    def _parse_single_term(
        self,
        term_str: str,
        feature_names: List[str]
    ) -> Dict:
        """
        Parse a single term string.
        
        Parameters
        ----------
        term_str : str
            Single term string
        feature_names : List[str]
            Feature names
        
        Returns
        -------
        Dict
            Term structure
        """
        term = {
            'coefficient': 1.0,
            'variables': {},
            'name': term_str
        }
        
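        # Extract coefficient (leading number, optionally in scientific notation)
        coef_match = re.match(
            r'^([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)\*?', term_str
        )
        if coef_match:
            term['coefficient'] = float(coef_match.group(1))
        elif term_str.startswith('-'):
            # Bare leading minus, e.g. "-x**2"
            term['coefficient'] = -1.0
        
        # Extract variable exponents
        for var_name in feature_names:
            # Match whole identifiers only, so 'x' is not found inside 'exp'
            name = rf'(?<![A-Za-z0-9_]){re.escape(var_name)}(?![A-Za-z0-9_])'
            # Pattern: var_name**exponent or var_name^exponent
            pattern = name + r'(?:\*\*|\^)\(?([+-]?\d*\.?\d+)\)?'
            match = re.search(pattern, term_str)
            if match:
                term['variables'][var_name] = float(match.group(1))
            elif re.search(name, term_str):
                # Variable without explicit exponent = exponent 1
                term['variables'][var_name] = 1.0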
        
        return term
    
    def print_verification_report(self) -> None:
        """
        Print detailed verification report.
        """
        if not self._verification_complete:
            print("Verification not yet performed. Call verify() first.")
            return
        
        print("=" * 70)
        print(" Physics Verification Results")
        print("=" * 70)
        print()
        print(f"Configuration:")
        print(f"  Dimensional tolerance: {self.dim_tolerance}")
        print(f"  Bounds tolerance: {self.bounds_tolerance}")
        print()
        print("-" * 70)
        print(" Dimensional Consistency:")
        print("-" * 70)
        
        dim_status = "PASS" if self._dim_result['dim_consistent'] else "FAIL"
        target_status = "PASS" if self._dim_result['matches_target'] else "FAIL"
        
        print(f"  All terms consistent: {dim_status}")
        print(f"  Matches target dims: {target_status}")
        print()
        
        if 'term_details' in self._dim_result:
            print("  Term Details:")
            for detail in self._dim_result['term_details']:
                status = "OK" if detail['matches_target'] else "MISMATCH"
                print(f"    {detail['name']}: {detail['dimensions']} [{status}]")
        
        print()
        print("-" * 70)
        print(" Physical Bounds:")
        print("-" * 70)
        
        bounds_status = "PASS" if self._bounds_result['bounds_satisfied'] else "FAIL"
        print(f"  Bounds satisfied: {bounds_status}")
        print(f"  Violation rate: {self._bounds_result['violation_rate']:.4%}")
        print(f"  Lower violations: {self._bounds_result['n_lower_violations']}")
        print(f"  Upper violations: {self._bounds_result['n_upper_violations']}")
        
        print()
        print("=" * 70)
        overall = self._dim_result['dim_consistent'] and self._dim_result['matches_target'] and self._bounds_result['bounds_satisfied']
        print(f" OVERALL: {'VALID' if overall else 'INVALID'}")
        print("=" * 70)

---
## Section 3: Internal Tests

In [None]:
# ==============================================================================
# TEST CONTROL FLAG
# ==============================================================================

_RUN_TESTS = False  # Set to True to run internal tests

if _RUN_TESTS:
    print("=" * 70)
    print(" RUNNING INTERNAL TESTS FOR 10_PhysicsVerification")
    print("=" * 70)

In [None]:
# ==============================================================================
# TEST 1: Dimensional Consistency - Correct Equation
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 1: Dimensional Consistency - Correct Equation")
    
    # Warm rain equation: dq_r/dt = C * q_c^2.47 * N_d^(-1.79)
    # Dimensions: [M, L, T, Theta]
    
    variable_dims = {
        'q_c': [0, 0, 0, 0],     # kg/kg (dimensionless)
        'N_d': [0, -3, 0, 0]     # m^-3
    }
    
    # Target: rate (s^-1) = [0, 0, -1, 0]
    target_dims = [0, 0, -1, 0]
    
    # Correct equation term
    terms = [{
        'variables': {'q_c': 2.47, 'N_d': -1.79},
        'coefficient': 0.89,
        'name': 'q_c^2.47 * N_d^-1.79'
    }]
    
    # Expected dimension: 2.47*[0,0,0,0] + (-1.79)*[0,-3,0,0] = [0, 5.37, 0, 0]
    # This does not match the target: the verifier treats coefficients as
    # dimensionless, so here C would have to carry units of m^-5.37 s^-1.
    # The check below therefore uses a simpler example.
    
    # Simpler test: y = x^2 where x is [0, 1, 0, 0] (length)
    # y should be [0, 2, 0, 0] (area)
    
    simple_var_dims = {
        'x': [0, 1, 0, 0]  # length
    }
    simple_target = [0, 2, 0, 0]  # area
    
    simple_terms = [{
        'variables': {'x': 2.0},
        'coefficient': 1.0,
        'name': 'x^2'
    }]
    
    verifier = PhysicsVerifier(dim_tolerance=0.05)
    result = verifier.verify(simple_terms, simple_var_dims, simple_target)
    
    print(f"Equation: y = x^2")
    print(f"x dimensions: [0, 1, 0, 0] (length)")
    print(f"Target: [0, 2, 0, 0] (area)")
    print(f"Computed: {result['term_dimensions'][0]}")
    print(f"Matches target: {result['matches_target']}")
    
    if result['matches_target']:
        print("[PASS] Dimensional consistency verified")
    else:
        print("[FAIL] Unexpected dimension mismatch")

In [None]:
# ==============================================================================
# TEST 2: Dimensional Inconsistency Detection
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 2: Dimensional Inconsistency Detection")
    
    # Equation with wrong dimensions
    var_dims = {
        'x': [0, 1, 0, 0],  # length
        'v': [0, 1, -1, 0]  # velocity
    }
    target = [0, 2, 0, 0]  # area
    
    # Incorrect: adding length + velocity
    wrong_terms = [
        {'variables': {'x': 1.0}, 'coefficient': 1.0, 'name': 'x'},
        {'variables': {'v': 1.0}, 'coefficient': 1.0, 'name': 'v'}
    ]
    
    verifier = PhysicsVerifier()
    result = verifier.verify(wrong_terms, var_dims, target)
    
    print(f"Equation: y = x + v (incorrect!)")
    print(f"Term 1 (x): {result['term_dimensions'][0]}")
    print(f"Term 2 (v): {result['term_dimensions'][1]}")
    print(f"Terms consistent: {result['dim_consistent']}")
    print(f"Matches target: {result['matches_target']}")
    
    if not result['dim_consistent']:
        print("[PASS] Correctly detected dimensional inconsistency")
    else:
        print("[FAIL] Should have detected inconsistency")

In [None]:
# ==============================================================================
# TEST 3: Physical Bounds Violation
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 3: Physical Bounds Violation")
    
    # Predictions with some negative values
    np.random.seed(42)
    y_pred = np.concatenate([
        np.random.uniform(0, 10, 95),   # 95 valid predictions
        np.array([-1, -2, -0.5]),        # 3 negative (invalid)
        np.array([100, 200])             # 2 too large
    ])
    
    bounds = {'min': 0, 'max': 50}
    
    verifier = PhysicsVerifier(bounds_tolerance=0.05)  # Allow 5% violations
    
    # Use empty terms for this test
    result = verifier.verify([], {}, [], y_pred, bounds)
    
    print(f"Total predictions: 100")
    print(f"Negative values: 3")
    print(f"Values > max: 2")
    print(f"Violation rate: {result['violation_rate']:.2%}")
    print(f"Bounds satisfied (5% tolerance): {result['bounds_satisfied']}")
    
    # 5% of 100 = 5 violations allowed, we have 5 exactly
    if result['bounds_satisfied']:
        print("[PASS] Bounds check working correctly")
    else:
        print("[INFO] Bounds exceeded threshold")

In [None]:
# ==============================================================================
# TEST 4: Tolerance for Non-Integer Exponents
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 4: Tolerance for Non-Integer Exponents")
    
    var_dims = {'x': [0, 1, 0, 0]}  # length
    target = [0, 2.5, 0, 0]  # target dimension
    
    # PySR might find 2.47 instead of 2.5
    terms = [{
        'variables': {'x': 2.47},
        'coefficient': 1.0,
        'name': 'x^2.47'
    }]
    
    # Test different tolerances
    tolerances = [0.01, 0.05, 0.1]
    
    print(f"Target: [0, 2.5, 0, 0]")
    print(f"Computed: [0, 2.47, 0, 0]")
    print(f"Difference: 0.03")
    print()
    print(f"{'Tolerance':<12} {'Matches'}")
    print("-" * 25)
    
    for tol in tolerances:
        verifier = PhysicsVerifier(dim_tolerance=tol)
        result = verifier.verify(terms, var_dims, target)
        print(f"{tol:<12.2f} {result['matches_target']}")
    
    print()
    print("Note: atol=0.05 is recommended for SR applications")

In [None]:
# ==============================================================================
# TEST 5: Equation Parsing
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 5: Equation Parsing")
    
    equation = "0.89*q_c**2.47*N_d**(-1.79)"
    feature_names = ['q_c', 'N_d']
    
    verifier = PhysicsVerifier()
    terms = verifier.parse_equation_to_terms(equation, feature_names)
    
    print(f"Equation: {equation}")
    print(f"Parsed terms:")
    for i, term in enumerate(terms):
        print(f"  Term {i+1}:")
        print(f"    Coefficient: {term['coefficient']}")
        print(f"    Variables: {term['variables']}")
    
    # Verify parsing
    if len(terms) == 1:
        term = terms[0]
        if abs(term['coefficient'] - 0.89) < 0.01:
            if abs(term['variables'].get('q_c', 0) - 2.47) < 0.01:
                if abs(term['variables'].get('N_d', 0) - (-1.79)) < 0.01:
                    print("[PASS] Equation parsed correctly")
                else:
                    print("[FAIL] N_d exponent incorrect")
            else:
                print("[FAIL] q_c exponent incorrect")
        else:
            print("[FAIL] Coefficient incorrect")
    else:
        print(f"[FAIL] Expected 1 term, parsed {len(terms)}")

---
## Section 4: Module Summary

In [None]:
# ==============================================================================
# MODULE SUMMARY
# ==============================================================================

print("=" * 70)
print(" 10_PhysicsVerification.ipynb - Module Summary")
print("=" * 70)
print()
print("CLASS: PhysicsVerifier")
print("-" * 70)
print()
print("Purpose:")
print("  Verify discovered equations satisfy physics constraints:")
print("  - Dimensional consistency across all terms")
print("  - Physical bounds on predictions")
print()
print("Main Methods:")
print("  verify(equation_terms, variable_dims, target_dims, y_pred, bounds)")
print("      Full verification with dimensional and bounds checks")
print("      Returns: dict with dim_consistent, matches_target, bounds_satisfied")
print()
print("  parse_equation_to_terms(equation_str, feature_names)")
print("      Parse equation string into term structures")
print()
print("  print_verification_report()")
print("      Print detailed verification results")
print()
print("Key Parameters:")
print("  dim_tolerance: Tolerance for dimension comparison (default: 0.05)")
print("  bounds_tolerance: Allowed fraction of violations (default: 0.01)")
print()
print("Usage Example:")
print("-" * 70)
print("""
# Define variable dimensions [M, L, T, Theta]
variable_dims = {
    'q_c': [0, 0, 0, 0],   # dimensionless
    'N_d': [0, -3, 0, 0]   # m^-3
}
target_dims = [0, 0, -1, 0]  # rate (s^-1)

# Physical bounds
bounds = {'min': 0, 'max': 1e-3}

# Verify
verifier = PhysicsVerifier(dim_tolerance=0.05)
result = verifier.verify(terms, variable_dims, target_dims, y_pred, bounds)

print(f"Valid: {result['overall_valid']}")
""")
print()
print("=" * 70)
print("Module loaded successfully. Import via: %run 10_PhysicsVerification.ipynb")
print("=" * 70)