# 01_BuckinghamPi - Physics-SR Framework v3.0

## Stage 1.1: Buckingham Pi Dimensional Analysis

**Author:** Zhengze Zhang  
**Affiliation:** Department of Statistics, Columbia University  
**Date:** January 2026

---

### Mathematical Foundation

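The Buckingham Pi theorem states that a physically meaningful equation involving $n$ variables, whose dimensions span $k$ independent fundamental dimensions, can be rewritten in terms of $n-k$ dimensionless groups. Here $k$ is the rank of the dimensional matrix, which may be smaller than the number of base units listed.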

**Key Insight:** If a physical law relates $n$ dimensional variables:
$$f(q_1, q_2, \ldots, q_n) = 0$$

and these variables span $k$ independent dimensions (e.g., M, L, T, $\Theta$), then the law can be expressed as:
$$F(\pi_1, \pi_2, \ldots, \pi_{n-k}) = 0$$

where each $\pi_i$ is a dimensionless group formed from products of powers of the original variables.
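
As a worked illustration, consider the idealized pendulum used in the tests of this notebook, with variables ordered as (length, mass, gravity, period). The length is written $\ell$ here only to avoid a clash with the dimension symbol L, and the $\Theta$ row is omitted because it is identically zero. With rows M, L, T the dimensional matrix is:
$$D = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 0 & -2 & 1 \end{pmatrix}, \qquad \operatorname{rank}(D) = 3, \qquad n - k = 4 - 3 = 1.$$

The null space is spanned by $\mathbf{e} = (1, 0, -1, -2)^\top$, which gives the single group:
$$\pi_1 = \ell^{1}\, m^{0}\, g^{-1}\, T^{-2} = \frac{\ell}{g\,T^2}.$$

Since $F(\pi_1) = 0$ forces $\pi_1$ to be a constant, it follows that $T \propto \sqrt{\ell / g}$, independent of the mass $m$.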

### Algorithm Overview

1. Construct dimensional matrix $D \in \mathbb{R}^{k \times n}$
2. Find null space: vectors $\mathbf{e}$ such that $D\mathbf{e} = 0$
3. Select simplest linearly independent integer solutions
4. Form $\pi$-groups: $\pi_j = \prod_i q_i^{e_{ij}}$
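
The four steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the class defined later in this notebook: the names `integer_null_vectors` and `D_pendulum` are hypothetical, and the search bound is kept small so the example runs quickly.

```python
import itertools
from functools import reduce
from math import gcd

def integer_null_vectors(D, max_exponent=2):
    """Brute-force all primitive integer vectors e with D e = 0, |e_i| <= max_exponent."""
    n_vars = len(D[0])
    found = set()
    exp_range = range(-max_exponent, max_exponent + 1)
    for e in itertools.product(exp_range, repeat=n_vars):
        if not any(e):
            continue  # skip the zero vector
        if any(sum(row[j] * e[j] for j in range(n_vars)) != 0 for row in D):
            continue  # e is not in the null space of D
        g = reduce(gcd, (abs(x) for x in e))              # common factor of the exponents
        sign = 1 if next(x for x in e if x != 0) > 0 else -1
        found.add(tuple(sign * x // g for x in e))        # first nonzero entry made positive
    return sorted(found)

# Step 1: dimensional matrix, rows = (M, L, T), columns = (L, m, g, T) for a pendulum
D_pendulum = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, -2, 1],
]

# Steps 2-3: integer null vectors in normalized form
null_vectors = integer_null_vectors(D_pendulum)
print(null_vectors)  # [(1, 0, -1, -2)]  i.e. pi = L / (g * T^2)
```

Working with plain integers avoids any floating-point tolerance in the test $D\mathbf{e} = 0$; the trade-off is that the dimensional exponents themselves must be integers in this sketch.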

### Reference

Buckingham, E. (1914). On physically similar systems; illustrations of the use of dimensional equations. *Physical Review*, 4(4), 345–376.

---
## Section 1: Header and Imports
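
The module docstring in the next cell describes an exhaustive enumeration of integer exponent vectors. A rough sense of its cost may help when choosing `max_exponent`: the number of tuples visited grows as $(2m+1)^n$ for $n$ variables and bound $m$. The helper name `search_space_size` below is hypothetical and only counts tuples; it does not measure runtime.

```python
def search_space_size(n_vars, max_exponent):
    """Number of exponent tuples an exhaustive search visits (zero vector included)."""
    return (2 * max_exponent + 1) ** n_vars

for n_vars in (3, 4, 6, 8):
    print(n_vars, search_space_size(n_vars, max_exponent=4))
# 3 729
# 4 6561
# 6 531441
# 8 43046721
```

For problems with more than a handful of variables, a smaller `max_exponent` or an exact null-space computation may be worth considering.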

In [None]:
"""
01_BuckinghamPi.ipynb - Buckingham Pi Dimensional Analysis
===========================================================

Three-Stage Physics-Informed Symbolic Regression Framework v3.0

This module provides:
- BuckinghamPiAnalyzer: Complete implementation of Buckingham Pi theorem
- Automatic selection of simplest dimensionless groups
- Data transformation to dimensionless space

Algorithm:
    1. Construct dimensional matrix D in R^(k x n)
    2. Compute null space dimension: n_pi = n - rank(D)
    3. Enumerate all integer null vectors with |exp| <= max_exponent
    4. Score by complexity: 10*(nonzero count) + sum(|exp|) + 0.1*max(|exp|)
    5. Select n_pi linearly independent vectors with lowest complexity

Author: Zhengze Zhang
Affiliation: Department of Statistics, Columbia University
"""

# Import core module
%run 00_Core.ipynb

In [None]:
# Additional imports for Buckingham Pi
import itertools
from math import gcd
from functools import reduce
from typing import Dict, List, Tuple, Optional, Any

print("01_BuckinghamPi: Additional imports successful.")

---
## Section 2: Class Definition
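
The selection rule used by the class in the next cell (score each candidate, then greedily keep the simplest linearly independent ones) can be illustrated on its own. The sketch below is a standalone approximation under stated assumptions: the function names and the candidate list are hypothetical, and the rank is computed exactly with `fractions.Fraction` rather than with NumPy.

```python
from fractions import Fraction

def complexity(e):
    """Score from this notebook: 10*(nonzero count) + sum|e_i| + 0.1*max|e_i|."""
    abs_e = [abs(x) for x in e]
    return 10 * sum(1 for x in e if x != 0) + sum(abs_e) + 0.1 * max(abs_e)

def rank(rows):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def select_simplest(candidates, n_groups):
    """Walk candidates by ascending score; keep each one that raises the rank."""
    selected = []
    for e in sorted(candidates, key=complexity):
        if rank(selected + [e]) > len(selected):
            selected.append(e)
        if len(selected) == n_groups:
            break
    return selected

# Hypothetical candidate exponent vectors for three dimensionless variables
candidates = [(1, 1, 0), (1, 0, 0), (1, -1, 2), (0, 1, 0), (1, 0, -1), (0, 0, 1)]
chosen = select_simplest(candidates, n_groups=3)
print(chosen)  # [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
```

The weight of 10 on the nonzero count means a group with fewer variables always beats one with more variables as long as the exponent sums differ by less than about 10, which holds for small `max_exponent`.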

In [None]:
# ==============================================================================
# BUCKINGHAM PI ANALYZER CLASS
# ==============================================================================

class BuckinghamPiAnalyzer:
    """
    Buckingham Pi Dimensional Analysis.
    
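    Reduces n variables to n - rank(D) dimensionless groups, where D is the
    dimensional matrix (rank(D) equals the number of base units k when all
    of them are independent).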
    Uses automatic selection of simplest integer exponent combinations.
    
    The algorithm finds all integer solutions to the null space of the
    dimensional matrix, scores them by complexity, and selects the
    simplest linearly independent set.
    
    Reference:
        Buckingham, E. (1914). On physically similar systems; illustrations
        of the use of dimensional equations. Physical Review, 4(4), 345.
    
    Attributes
    ----------
    max_exponent : int
        Maximum absolute value of exponents to search (default: 4)
    require_confirmation : bool
        If True, pause for user approval of selected groups
    _dim_matrix : np.ndarray
        Dimensional matrix D (k x n)
    _var_names : List[str]
        Variable names in order
    _n_pi_groups : int
        Number of dimensionless groups needed
    _all_candidates : List[np.ndarray]
        All valid null vectors found
    _selected_groups : List[np.ndarray]
        Selected linearly independent groups
    _complexity_scores : List[float]
        Complexity scores for all candidates
    _analysis_complete : bool
        Whether analysis has been performed
    
    Examples
    --------
    >>> analyzer = BuckinghamPiAnalyzer(max_exponent=4)
    >>> var_dims = {
    ...     'L': [0, 1, 0, 0],   # Length: m
    ...     'm': [1, 0, 0, 0],   # Mass: kg
    ...     'g': [0, 1, -2, 0],  # Acceleration: m/s^2
    ...     'T': [0, 0, 1, 0],   # Time: s
    ... }
    >>> result = analyzer.analyze(var_dims)
    >>> print(result['pi_names'])  # reciprocal of the textbook group T^2 * g / L
    ['L / (g * T^2)']
    """
    
    def __init__(
        self, 
        max_exponent: int = DEFAULT_MAX_EXPONENT, 
        require_confirmation: bool = False
    ):
        """
        Initialize BuckinghamPiAnalyzer.
        
        Parameters
        ----------
        max_exponent : int
            Maximum absolute value of exponents to search.
            Larger values find more candidates but take longer.
            Default: 4 (searches -4 to +4)
        require_confirmation : bool
            If True, print candidates and wait for user confirmation.
            Default: False (automatic selection)
        """
        self.max_exponent = max_exponent
        self.require_confirmation = require_confirmation
        
        # Internal state
        self._dim_matrix = None
        self._var_names = None
        self._n_vars = 0
        self._n_dims = 0
        self._rank = 0
        self._n_pi_groups = 0
        self._all_candidates = []
        self._selected_groups = []
        self._complexity_scores = []
        self._analysis_complete = False
    
    def analyze(self, variable_dimensions: Dict[str, List[float]]) -> Dict[str, Any]:
        """
        Perform complete Buckingham pi analysis.
        
        Parameters
        ----------
        variable_dimensions : Dict[str, List[float]]
            Dictionary mapping variable names to their dimensional exponents
            [M, L, T, Theta] where M=mass, L=length, T=time, Theta=temperature.
            Example: {'velocity': [0, 1, -1, 0]}  # m/s
        
        Returns
        -------
        Dict[str, Any]
            Dictionary containing:
            - n_variables: Number of input variables
            - n_base_units: Rank of dimensional matrix
            - n_pi_groups: Number of dimensionless groups
            - all_candidates: List of all valid pi-groups found
            - all_complexities: Complexity scores for each candidate
            - selected_groups: Auto-selected simplest groups (exponent vectors)
            - selected_indices: Indices of selected candidates
            - pi_exponents: Matrix of selected exponents (n_groups x n_vars)
            - pi_names: Human-readable pi-group expressions
            - variable_names: List of variable names in order
        """
        # Store variable names in consistent order
        self._var_names = list(variable_dimensions.keys())
        self._n_vars = len(self._var_names)
        
        # Step 1: Construct dimensional matrix
        self._dim_matrix = self._construct_dimensional_matrix(variable_dimensions)
        self._n_dims = self._dim_matrix.shape[0]  # Number of base dimensions (usually 4)
        
        # Step 2: Compute rank
        self._rank = np.linalg.matrix_rank(self._dim_matrix)
        
        # Step 3: Find null space dimension
        self._n_pi_groups = self._n_vars - self._rank
        
        if self._n_pi_groups <= 0:
            # Special case: all variables are dimensionally independent
            # or there's a problem with the input
            self._analysis_complete = True
            return {
                'n_variables': self._n_vars,
                'n_base_units': self._rank,
                'n_pi_groups': 0,
                'all_candidates': [],
                'all_complexities': [],
                'selected_groups': [],
                'selected_indices': [],
                'pi_exponents': np.array([]),
                'pi_names': [],
                'variable_names': self._var_names,
                'message': 'No dimensionless groups possible (rank = n_vars)'
            }
        
        # Step 4: Enumerate integer null vectors
        self._all_candidates = self._enumerate_null_vectors()
        
        if len(self._all_candidates) == 0:
            # No valid null vectors found
            self._analysis_complete = True
            return {
                'n_variables': self._n_vars,
                'n_base_units': self._rank,
                'n_pi_groups': self._n_pi_groups,
                'all_candidates': [],
                'all_complexities': [],
                'selected_groups': [],
                'selected_indices': [],
                'pi_exponents': np.array([]),
                'pi_names': [],
                'variable_names': self._var_names,
                'message': f'No null vectors found with max_exponent={self.max_exponent}'
            }
        
        # Step 5: Compute complexity scores
        self._complexity_scores = [
            self._compute_complexity(vec) for vec in self._all_candidates
        ]
        
        # Step 6: Sort by complexity
        sorted_indices = np.argsort(self._complexity_scores)
        
        # Step 7: Select n_pi_groups linearly independent vectors
        self._selected_groups, selected_indices = self._select_independent(
            sorted_indices
        )
        
        # Construct pi-group expressions
        pi_names = [self._format_pi_group(vec) for vec in self._selected_groups]
        
        # Create exponent matrix
        if len(self._selected_groups) > 0:
            pi_exponents = np.vstack(self._selected_groups)
        else:
            pi_exponents = np.array([])
        
        self._analysis_complete = True
        
        return {
            'n_variables': self._n_vars,
            'n_base_units': self._rank,
            'n_pi_groups': self._n_pi_groups,
            'all_candidates': self._all_candidates,
            'all_complexities': self._complexity_scores,
            'selected_groups': self._selected_groups,
            'selected_indices': selected_indices,
            'pi_exponents': pi_exponents,
            'pi_names': pi_names,
            'variable_names': self._var_names
        }
    
    def _construct_dimensional_matrix(
        self, 
        variable_dimensions: Dict[str, List[float]]
    ) -> np.ndarray:
        """
        Construct the dimensional matrix D.
        
        D[i,j] = exponent of base dimension i in variable j
        
        Parameters
        ----------
        variable_dimensions : Dict[str, List[float]]
            Variable dimensions dictionary
            
        Returns
        -------
        np.ndarray
            Dimensional matrix D of shape (n_dims, n_vars)
        """
        n_vars = len(self._var_names)
        n_dims = len(next(iter(variable_dimensions.values())))
        
        D = np.zeros((n_dims, n_vars))
        for j, var_name in enumerate(self._var_names):
            D[:, j] = variable_dimensions[var_name]
        
        return D
    
    def _enumerate_null_vectors(self) -> List[np.ndarray]:
        """
        Enumerate all integer null vectors of the dimensional matrix.
        
        Searches for vectors e such that D @ e = 0 with integer entries
        satisfying |e_i| <= max_exponent.
        
        Returns
        -------
        List[np.ndarray]
            List of valid null vectors (normalized and deduplicated)
        """
        candidates = []
        seen_normalized = set()
        
        # Generate all possible integer combinations
        exp_range = range(-self.max_exponent, self.max_exponent + 1)
        
        for exponents in itertools.product(exp_range, repeat=self._n_vars):
            vec = np.array(exponents, dtype=float)
            
            # Skip zero vector
            if np.allclose(vec, 0):
                continue
            
            # Check if D @ e = 0 (within numerical tolerance)
            product = self._dim_matrix @ vec
            if np.allclose(product, 0, atol=1e-10):
                # Normalize and simplify
                normalized = self._normalize_vector(vec)
                
                # Check for duplicates using tuple representation
                vec_tuple = tuple(normalized.astype(int))
                if vec_tuple not in seen_normalized:
                    seen_normalized.add(vec_tuple)
                    candidates.append(normalized)
        
        return candidates
    
    def _normalize_vector(self, vec: np.ndarray) -> np.ndarray:
        """
        Normalize a null vector for consistent representation.
        
        1. Ensure first nonzero element is positive
        2. Divide by GCD of all elements to get simplest form
        
        Parameters
        ----------
        vec : np.ndarray
            Input vector with integer entries
            
        Returns
        -------
        np.ndarray
            Normalized vector
        """
        # Convert to integers
        int_vec = np.round(vec).astype(int)
        
        # Find first nonzero element
        nonzero_idx = np.where(int_vec != 0)[0]
        if len(nonzero_idx) == 0:
            return vec
        
        # Ensure first nonzero is positive
        if int_vec[nonzero_idx[0]] < 0:
            int_vec = -int_vec
        
        # Divide by GCD
        nonzero_vals = np.abs(int_vec[int_vec != 0])
        if len(nonzero_vals) > 0:
            vec_gcd = reduce(gcd, nonzero_vals)
            if vec_gcd > 1:
                int_vec = int_vec // vec_gcd
        
        return int_vec.astype(float)
    
    def _compute_complexity(self, vec: np.ndarray) -> float:
        """
        Compute complexity score for a null vector.
        
        complexity(e) = 10 * (# nonzero) + 1 * (sum |e_i|) + 0.1 * max(|e_i|)
        
        Lower complexity indicates simpler, more interpretable pi-groups.
        
        Parameters
        ----------
        vec : np.ndarray
            Exponent vector
            
        Returns
        -------
        float
            Complexity score
        """
        n_nonzero = np.sum(vec != 0)
        sum_abs = np.sum(np.abs(vec))
        max_abs = np.max(np.abs(vec))
        
        return 10.0 * n_nonzero + 1.0 * sum_abs + 0.1 * max_abs
    
    def _select_independent(
        self, 
        sorted_indices: np.ndarray
    ) -> Tuple[List[np.ndarray], List[int]]:
        """
        Select n_pi_groups linearly independent vectors from candidates.
        
        Iterates through candidates in order of increasing complexity,
        adding each if it's linearly independent from those already selected.
        
        Parameters
        ----------
        sorted_indices : np.ndarray
            Indices of candidates sorted by complexity (ascending)
            
        Returns
        -------
        Tuple[List[np.ndarray], List[int]]
            - Selected vectors
            - Original indices of selected vectors
        """
        selected = []
        selected_indices = []
        
        for idx in sorted_indices:
            candidate = self._all_candidates[idx]
            
            if self._is_linearly_independent(candidate, selected):
                selected.append(candidate)
                selected_indices.append(int(idx))
                
                if len(selected) >= self._n_pi_groups:
                    break
        
        return selected, selected_indices
    
    def _is_linearly_independent(
        self, 
        vec: np.ndarray, 
        selected: List[np.ndarray]
    ) -> bool:
        """
        Check if a vector is linearly independent from selected vectors.
        
        Parameters
        ----------
        vec : np.ndarray
            Candidate vector
        selected : List[np.ndarray]
            Already selected vectors
            
        Returns
        -------
        bool
            True if vec is linearly independent from selected
        """
        if len(selected) == 0:
            return True
        
        # Stack selected vectors and candidate
        matrix = np.vstack([*selected, vec])
        
        # Check rank
        rank_before = np.linalg.matrix_rank(np.vstack(selected))
        rank_after = np.linalg.matrix_rank(matrix)
        
        return rank_after > rank_before
    
    def _format_pi_group(self, vec: np.ndarray) -> str:
        """
        Format a pi-group as a human-readable string.
        
        Parameters
        ----------
        vec : np.ndarray
            Exponent vector
            
        Returns
        -------
        str
            Formatted expression like 'L^2 / (m * T)'; negative exponents
            are moved to the denominator rather than printed as 'T^-1'
        """
        terms_positive = []
        terms_negative = []
        
        for var_name, exp in zip(self._var_names, vec):
            exp = int(exp)
            if exp == 0:
                continue
            elif exp == 1:
                terms_positive.append(var_name)
            elif exp == -1:
                terms_negative.append(var_name)
            elif exp > 0:
                terms_positive.append(f"{var_name}^{exp}")
            else:
                terms_negative.append(f"{var_name}^{-exp}")
        
        if len(terms_positive) == 0:
            numerator = "1"
        else:
            numerator = " * ".join(terms_positive)
        
        if len(terms_negative) == 0:
            return numerator
        else:
            denominator = " * ".join(terms_negative)
            if len(terms_negative) == 1:
                return f"{numerator} / {denominator}"
            else:
                return f"{numerator} / ({denominator})"
    
    def transform_data(
        self, 
        X: np.ndarray, 
        feature_names: List[str]
    ) -> Tuple[np.ndarray, List[str]]:
        """
        Transform data from original variables to dimensionless pi-groups.
        
        Parameters
        ----------
        X : np.ndarray
            Original feature matrix (n_samples, n_features)
        feature_names : List[str]
            Names of features matching order in X
            
        Returns
        -------
        Tuple[np.ndarray, List[str]]
            - Transformed data (n_samples, n_pi_groups)
            - Names of pi-groups
            
        Raises
        ------
        ValueError
            If analysis has not been performed or feature names don't match
        """
        if not self._analysis_complete:
            raise ValueError("Must run analyze() before transform_data()")
        
        if len(self._selected_groups) == 0:
            raise ValueError("No pi-groups to transform to")
        
        # Build mapping from feature names to column indices
        name_to_idx = {name: i for i, name in enumerate(feature_names)}
        
        # Check all variable names are in feature_names
        for var_name in self._var_names:
            if var_name not in name_to_idx:
                raise ValueError(f"Variable '{var_name}' not found in feature_names")
        
        n_samples = X.shape[0]
        n_groups = len(self._selected_groups)
        X_transformed = np.zeros((n_samples, n_groups))
        
        for g, group_vec in enumerate(self._selected_groups):
            # Compute pi_g = prod(x_i^{e_i})
            pi_values = np.ones(n_samples)
            
            for var_name, exp in zip(self._var_names, group_vec):
                if exp != 0:
                    col_idx = name_to_idx[var_name]
                    # Exponents are integers here (possibly negative);
                    # safe_power is expected to come from 00_Core
                    pi_values *= safe_power(X[:, col_idx], exp)
            
            X_transformed[:, g] = pi_values
        
        # Generate pi-group names
        pi_names = [f"pi_{i+1}" for i in range(n_groups)]
        
        return X_transformed, pi_names
    
    def get_pi_group_expressions(self) -> List[str]:
        """
        Get human-readable expressions for selected pi-groups.
        
        Returns
        -------
        List[str]
            List of formatted pi-group expressions
        """
        if not self._analysis_complete:
            return []
        
        return [self._format_pi_group(vec) for vec in self._selected_groups]
    
    def print_analysis_report(self) -> None:
        """
        Print a detailed analysis report.
        """
        if not self._analysis_complete:
            print("Analysis not yet performed. Run analyze() first.")
            return
        
        print("=" * 70)
        print(" Buckingham Pi Analysis Results")
        print("=" * 70)
        print()
        print(f"Variables: {self._n_vars}")
        print(f"Variable names: {self._var_names}")
        print(f"Dimensional matrix rank: {self._rank}")
        print(f"Number of pi-groups needed: {self._n_pi_groups}")
        print(f"Total candidates found: {len(self._all_candidates)}")
        print()
        
        if len(self._all_candidates) > 0:
            print("-" * 70)
            print(" All Candidates (sorted by complexity):")
            print("-" * 70)
            
            sorted_indices = np.argsort(self._complexity_scores)
            selected_set = set(tuple(vec.astype(int)) for vec in self._selected_groups)
            
            for i, idx in enumerate(sorted_indices[:20]):  # Show top 20
                vec = self._all_candidates[idx]
                complexity = self._complexity_scores[idx]
                expression = self._format_pi_group(vec)
                
                is_selected = tuple(vec.astype(int)) in selected_set
                marker = " [SELECTED]" if is_selected else ""
                
                print(f"  [{i:2d}] {expression:40s} complexity={complexity:.1f}{marker}")
            
            if len(self._all_candidates) > 20:
                print(f"  ... and {len(self._all_candidates) - 20} more candidates")
        
        print()
        print("-" * 70)
        print(" Selected Pi-Groups (simplest linearly independent):")
        print("-" * 70)
        
        for i, vec in enumerate(self._selected_groups):
            expression = self._format_pi_group(vec)
            print(f"  pi_{i+1} = {expression}")
            print(f"       Exponents: {vec.astype(int).tolist()}")
        
        print()
        print("=" * 70)

---
## Section 3: Internal Tests
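
Before the internal tests, the data transformation $\pi = \prod_i x_i^{e_i}$ can be checked by hand on synthetic pendulum data. The sketch below is independent of the class above and uses only the standard library; `pi_group_values` is a hypothetical helper, and the exponents correspond to the group $L / (g T^2)$, which should equal $1/(4\pi^2)$ for an ideal pendulum.

```python
import math
import random

def pi_group_values(columns, exponents):
    """Evaluate pi = prod_i x_i ** e_i for every sample.

    columns   : dict mapping variable name -> list of positive sample values
    exponents : dict mapping variable name -> integer exponent
    """
    n_samples = len(next(iter(columns.values())))
    values = [1.0] * n_samples
    for name, e in exponents.items():
        if e != 0:
            values = [v * x ** e for v, x in zip(values, columns[name])]
    return values

random.seed(0)
L = [random.uniform(0.5, 2.0) for _ in range(50)]
g = [random.uniform(9.7, 10.0) for _ in range(50)]
T = [2 * math.pi * math.sqrt(l / gg) for l, gg in zip(L, g)]  # ideal pendulum period

pi = pi_group_values({"L": L, "g": g, "T": T}, {"L": 1, "g": -1, "T": -2})
expected = 1 / (4 * math.pi ** 2)
max_rel_err = max(abs(p - expected) / expected for p in pi)
print(max_rel_err < 1e-12)  # True
```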

In [None]:
# ==============================================================================
# TEST CONTROL FLAG
# ==============================================================================

_RUN_TESTS = False  # Set to True to run internal tests

if _RUN_TESTS:
    print("=" * 70)
    print(" RUNNING INTERNAL TESTS FOR 01_BuckinghamPi")
    print("=" * 70)

In [None]:
# ==============================================================================
# TEST 1: Warm Rain Variables
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 1: Warm Rain Variables")
    
    # Define warm rain variable dimensions
    warm_rain_dims = {
        'q_c':   [0, 0, 0, 0],     # kg/kg (dimensionless)
        'N_d':   [0, -3, 0, 0],    # m^-3
        'r_eff': [0, 1, 0, 0],     # m
        'LWC':   [1, -3, 0, 0],    # kg/m^3
    }
    
    print("Input dimensions:")
    for var, dims in warm_rain_dims.items():
        print(f"  {var}: {dims}")
    print()
    
    # Run analysis
    analyzer = BuckinghamPiAnalyzer(max_exponent=4)
    result = analyzer.analyze(warm_rain_dims)
    
    # Print results
    print(f"Number of variables: {result['n_variables']}")
    print(f"Rank of D matrix: {result['n_base_units']}")
    print(f"Number of pi-groups: {result['n_pi_groups']}")
    print(f"Total candidates found: {len(result['all_candidates'])}")
    print()
    print("Selected pi-groups:")
    for i, name in enumerate(result['pi_names']):
        print(f"  pi_{i+1} = {name}")
    
    # Verification
    print()
    print("Verification:")
    print("  - q_c is dimensionless: should be a pi-group")
    print("  - Expected: pi-groups should include simple combinations")

In [None]:
# ==============================================================================
# TEST 2: Classic Pendulum Example
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 2: Simple Pendulum (Classic Example)")
    
    # Pendulum: period T depends on length L, mass m, gravity g
    # Expected: T * sqrt(g/L) is dimensionless, mass m drops out
    pendulum_dims = {
        'L': [0, 1, 0, 0],      # Length: m
        'm': [1, 0, 0, 0],      # Mass: kg
        'g': [0, 1, -2, 0],     # Acceleration: m/s^2
        'T': [0, 0, 1, 0],      # Time: s
    }
    
    print("Input dimensions (Pendulum problem):")
    for var, dims in pendulum_dims.items():
        print(f"  {var}: {dims}")
    print()
    
    # Run analysis
    analyzer = BuckinghamPiAnalyzer(max_exponent=4)
    result = analyzer.analyze(pendulum_dims)
    
    # Print results
    print(f"Number of variables: {result['n_variables']}")
    print(f"Rank of D matrix: {result['n_base_units']}")
    print(f"Number of pi-groups: {result['n_pi_groups']}")
    print()
    print("Selected pi-groups:")
    for i, name in enumerate(result['pi_names']):
        print(f"  pi_{i+1} = {name}")
    print()
    
    # Verification
    print("Expected result:")
    print("  The pi-group should be T^2 * g / L or its reciprocal L / (g * T^2)")
    print("  Mass 'm' should NOT appear (it's irrelevant to pendulum period)")
    
    # Check if mass appears in selected groups
    m_idx = result['variable_names'].index('m')
    mass_in_groups = any(
        vec[m_idx] != 0 for vec in result['selected_groups']
    )
    print(f"\n  Verification: Mass appears in pi-groups: {mass_in_groups}")
    if not mass_in_groups:
        print("  [PASS] Mass correctly excluded from dimensionless groups")
    else:
        print("  [FAIL] Mass unexpectedly appears in a dimensionless group")

In [None]:
# ==============================================================================
# TEST 3: All Dimensionless Variables
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 3: All Dimensionless Variables (Edge Case)")
    
    # All variables are dimensionless (mixing ratios)
    dimensionless_dims = {
        'q_c': [0, 0, 0, 0],    # kg/kg
        'q_r': [0, 0, 0, 0],    # kg/kg
        'q_v': [0, 0, 0, 0],    # kg/kg
    }
    
    print("Input dimensions (all dimensionless):")
    for var, dims in dimensionless_dims.items():
        print(f"  {var}: {dims}")
    print()
    
    # Run analysis
    analyzer = BuckinghamPiAnalyzer(max_exponent=3)
    result = analyzer.analyze(dimensionless_dims)
    
    print(f"Number of variables: {result['n_variables']}")
    print(f"Rank of D matrix: {result['n_base_units']}")
    print(f"Number of pi-groups: {result['n_pi_groups']}")
    print()
    
    # When all variables are dimensionless:
    # - Rank of D = 0 (all zeros)
    # - n_pi_groups = n_vars - 0 = n_vars
    # - Each variable itself is a valid pi-group
    print("Expected behavior:")
    print("  - Rank should be 0 (zero matrix)")
    print("  - Number of pi-groups = number of variables")
    print("  - Each variable is its own dimensionless group")
    print()
    print("Selected pi-groups:")
    for i, name in enumerate(result['pi_names']):
        print(f"  pi_{i+1} = {name}")

In [None]:
# ==============================================================================
# TEST 4: Data Transformation Verification
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 4: Data Transformation Verification")
    
    # Generate pendulum data
    np.random.seed(42)
    n = 100
    L = np.random.uniform(0.5, 2.0, n)
    m = np.random.uniform(0.1, 1.0, n)
    g = np.random.uniform(9.7, 10.0, n)
    T = 2 * np.pi * np.sqrt(L / g)  # True pendulum period
    
    X = np.column_stack([L, m, g, T])
    feature_names = ['L', 'm', 'g', 'T']
    
    print("Generated pendulum data:")
    print(f"  n_samples: {n}")
    print(f"  Features: {feature_names}")
    print("  True relation: T = 2*pi*sqrt(L/g)")
    print()
    
    # Analyze and transform
    pendulum_dims = {
        'L': [0, 1, 0, 0],
        'm': [1, 0, 0, 0],
        'g': [0, 1, -2, 0],
        'T': [0, 0, 1, 0],
    }
    
    analyzer = BuckinghamPiAnalyzer(max_exponent=4)
    result = analyzer.analyze(pendulum_dims)
    
    print("Selected pi-groups:")
    for i, name in enumerate(result['pi_names']):
        print(f"  pi_{i+1} = {name}")
    print()
    
    # Transform data
    X_transformed, pi_names = analyzer.transform_data(X, feature_names)
    
    print(f"Transformed data shape: {X_transformed.shape}")
    print(f"Pi-group names: {pi_names}")
    print()
    
    # Verify: the selected group is constant on exact pendulum data.
    # With variable order (L, m, g, T) the sign normalization yields
    # pi = L / (g * T^2) = 1 / (4*pi^2); the reciprocal T^2 * g / L equals 4*pi^2.
    print("Verification:")
    print("  pi = L / (g * T^2) should be constant = 1 / (4*pi^2)")
    print("  (equivalently, T^2 * g / L = 4*pi^2)")
    
    for i in range(X_transformed.shape[1]):
        pi_values = X_transformed[:, i]
        print(f"  pi_{i+1}: mean = {np.mean(pi_values):.6f}, std = {np.std(pi_values):.6f}")
        
        # Check if close to constant
        if np.std(pi_values) / np.abs(np.mean(pi_values)) < 1e-10:
            print(f"       -> Constant! Value = {np.mean(pi_values):.6f}")
            print(f"       -> 1/(4*pi^2) = {1 / (4 * np.pi**2):.6f}")
            print(f"       -> 4*pi^2     = {4 * np.pi**2:.6f}")

In [None]:
# ==============================================================================
# TEST 5: Full Report Output
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 5: Full Analysis Report")
    
    # Use warm rain example for full report
    warm_rain_dims = {
        'q_c':   [0, 0, 0, 0],     # kg/kg (dimensionless)
        'N_d':   [0, -3, 0, 0],    # m^-3
        'r_eff': [0, 1, 0, 0],     # m
        'LWC':   [1, -3, 0, 0],    # kg/m^3
    }
    
    analyzer = BuckinghamPiAnalyzer(max_exponent=4)
    result = analyzer.analyze(warm_rain_dims)
    
    # Print full report
    analyzer.print_analysis_report()

In [None]:
# ==============================================================================
# TEST 6: Integration with UserInputs
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 6: Integration with UserInputs")
    
    # Generate warm rain data with UserInputs
    X, y, feature_names, user_inputs = generate_warm_rain_data(
        n_samples=500, noise_level=0.01, seed=42
    )
    
    print(f"Generated data: X.shape = {X.shape}")
    print(f"Feature names: {feature_names}")
    print("UserInputs variable dimensions:")
    for var, dims in user_inputs.variable_dimensions.items():
        print(f"  {var}: {dims}")
    print()
    
    # Run Buckingham Pi analysis using UserInputs
    analyzer = BuckinghamPiAnalyzer(max_exponent=4)
    result = analyzer.analyze(user_inputs.variable_dimensions)
    
    print(f"Pi-groups needed: {result['n_pi_groups']}")
    print("Selected pi-groups:")
    for i, name in enumerate(result['pi_names']):
        print(f"  pi_{i+1} = {name}")
    print()
    
    # Transform data
    X_dimless, pi_names = analyzer.transform_data(X, feature_names)
    print(f"Transformed data shape: {X_dimless.shape}")
    print("Sample pi-group values (first 5 rows):")
    for i in range(min(5, X_dimless.shape[0])):
        print(f"  {X_dimless[i, :]}")

---
## Section 4: Module Summary

In [None]:
# ==============================================================================
# MODULE SUMMARY
# ==============================================================================

print("=" * 70)
print(" 01_BuckinghamPi.ipynb - Module Summary")
print("=" * 70)
print()
print("CLASS: BuckinghamPiAnalyzer")
print("-" * 70)
print()
print("Purpose:")
print("  Reduce n variables with k base units to (n-k) dimensionless groups.")
print("  Automatically selects the simplest linearly independent combinations.")
print()
print("Main Methods:")
print("  analyze(variable_dimensions)")
print("      Perform complete Buckingham Pi analysis")
print("      Returns: dict with pi-groups, exponents, and all candidates")
print()
print("  transform_data(X, feature_names)")
print("      Transform data from original variables to pi-groups")
print("      Returns: (X_transformed, pi_names)")
print()
print("  get_pi_group_expressions()")
print("      Get human-readable pi-group expressions")
print()
print("  print_analysis_report()")
print("      Print detailed analysis report")
print()
print("Usage Example:")
print("-" * 70)
print("""
# Define variable dimensions [M, L, T, Theta]
var_dims = {
    'L': [0, 1, 0, 0],      # Length: m
    'g': [0, 1, -2, 0],     # Acceleration: m/s^2
    'T': [0, 0, 1, 0],      # Time: s
}

# Create analyzer and run analysis
analyzer = BuckinghamPiAnalyzer(max_exponent=4)
result = analyzer.analyze(var_dims)

# Print selected pi-groups
for name in result['pi_names']:
    print(name)

# Transform data to dimensionless form
X_dimless, pi_names = analyzer.transform_data(X, feature_names)
""")
print()
print("=" * 70)
print("Module loaded successfully. Import via: %run 01_BuckinghamPi.ipynb")
print("=" * 70)