# Exploration: Underdeterminedness of $\min_{P,C} \|M - PC\|^2$

**Goal**: Understand the fundamental ambiguities in matrix factorization **before** adding regularization.

This notebook demonstrates why **every decomposition method must make modeling choices**: the unconstrained problem has infinitely many solutions that fit the data equally well.

## Key Questions

1. **Scale Ambiguity**: Can we multiply P by α and divide C by α?
2. **Basis Ambiguity**: Can we transform components and still fit M perfectly?
3. **Implications**: What does this mean for "model-free" claims?

---

**Context**: Preliminary exploration for TRACK 1, Step 1.1 of the Molass vs REGALS paper.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import svd, qr

# Set random seed for reproducibility
np.random.seed(42)

# Configure matplotlib
plt.rcParams['figure.figsize'] = (12, 4)
plt.rcParams['font.size'] = 10

print("Libraries imported successfully!")

## 1. The Mathematical Setup

Given measured data matrix $M \in \mathbb{R}^{N \times K}$:
- $N$ = number of q-points (SAXS scattering angles)
- $K$ = number of time frames (elution profile)

We want to factorize: $M = P \cdot C$ where:
- $P \in \mathbb{R}^{N \times n}$ = SAXS profiles of $n$ components
- $C \in \mathbb{R}^{n \times K}$ = Concentration/elution curves

**Problem**: This decomposition is **not unique**!

In [None]:
# Generate synthetic 2-component SEC-SAXS-like data
n_q = 100  # Number of q-points (scattering angles)
n_t = 50   # Number of time frames
n_comp = 2  # Number of components

# True components (one arbitrary choice among infinitely many)
print("Generating ground truth components...")
P_true = np.random.rand(n_q, n_comp) + 1.0  # Positive SAXS profiles
C_true = np.random.rand(n_comp, n_t)         # Positive concentrations

# Compute measured data
M = P_true @ C_true

print(f"Data matrix M: {M.shape}")
print(f"True P: {P_true.shape}, True C: {C_true.shape}")
print(f"Reconstruction error (should be zero): {np.linalg.norm(M - P_true @ C_true):.2e}")

## 2. Ambiguity #1: Scale Ambiguity

For any $\alpha > 0$, if $M = PC$, then:

$$M = (\alpha P)(C/\alpha)$$

Both $(P, C)$ and $(\alpha P, C/\alpha)$ fit the data **identically**.

In [None]:
# Demonstrate scale ambiguity
alphas = [0.1, 1.0, 10.0, 100.0]

print("Testing scale ambiguity (α scaling):")
print("-" * 60)
for alpha in alphas:
    P_scaled = alpha * P_true
    C_scaled = C_true / alpha
    
    error = np.linalg.norm(M - P_scaled @ C_scaled)
    print(f"α = {alpha:6.1f} → Reconstruction error: {error:.2e}")

print("\n✓ All scales fit the data identically!")
print("⚠ Without additional constraints, intensity scale is arbitrary.")

## 3. Ambiguity #2: Basis Ambiguity (CRITICAL!)

For any **invertible** matrix $R \in \mathbb{R}^{n \times n}$:

$$M = PC = (PR)(R^{-1}C)$$

This is the **fundamental** ambiguity: infinitely many ways to decompose into components.

**Note on terminology**: REGALS authors call this "rotation ambiguity," but R can be ANY invertible matrix - not just rotations! This includes:
- **Rotations** (orthogonal matrices)
- **Scalings** (diagonal matrices)
- **Shearings** (off-diagonal elements)
- **Arbitrary mixing** (any invertible transformation)

The ambiguity is **much broader** than just rotations - components are basis-dependent.
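The code cells below only use orthogonal matrices. The broader claim can be checked directly with a minimal sketch using hypothetical random positive factors (same shapes as Section 1). It applies three invertible but non-orthogonal transformations (a shear, an unequal per-component scaling, and a generic mixing) and confirms that each one reproduces $M$ to floating-point precision:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical positive factors, mimicking the synthetic setup of Section 1
n_q, n_t, n_comp = 100, 50, 2
P = rng.random((n_q, n_comp)) + 1.0
C = rng.random((n_comp, n_t))
M = P @ C

# Three invertible but non-orthogonal transformations
R_shear = np.array([[1.0, 0.7],
                    [0.0, 1.0]])      # shear
R_scale = np.diag([3.0, 0.2])        # unequal per-component scaling
R_mix = np.array([[2.0, -1.0],
                  [0.5, 1.5]])        # generic mixing (det = 3.5)

errors = {}
for name, R in [("shear", R_shear), ("scale", R_scale), ("mix", R_mix)]:
    assert not np.allclose(R.T @ R, np.eye(n_comp))  # confirm R is not orthogonal
    P_new = P @ R
    C_new = np.linalg.solve(R, C)    # R^{-1} C without forming the inverse explicitly
    errors[name] = np.linalg.norm(M - P_new @ C_new)
    print(f"{name:>6}: reconstruction error = {errors[name]:.2e}")
```

`np.linalg.solve(R, C)` is used in place of `inv(R) @ C`. It is numerically preferable and gives the same $R^{-1}C$.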

### The SVD Solution

SVD provides **one specific choice**:
$$M = U\Sigma V^T$$

Setting $P = U_{:,:n}\Sigma_n^{1/2}$ and $C = \Sigma_n^{1/2}V_{:,:n}^T$, where $\Sigma_n$ is the leading $n \times n$ block of $\Sigma$, is just **one** of infinitely many factorizations.

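**What makes it special?** The columns of $P$ are mutually orthogonal, as are the rows of $C$. The truncated SVD is also the best rank-$n$ approximation of $M$ in the Frobenius norm (Eckart-Young theorem). These are mathematical conveniences with no physical justification: real elution profiles are non-negative and overlap, so they are not orthogonal.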
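The orthogonality built into the symmetric SVD split can be checked numerically. The sketch below uses hypothetical random rank-2 data. Both Gram matrices $P^TP$ and $CC^T$ come out equal to the diagonal matrix $\Sigma_n$ of leading singular values, which is a property of this particular choice and not of the underlying components:

```python
import numpy as np

rng = np.random.default_rng(1)
n_comp = 2
M = (rng.random((100, n_comp)) + 1.0) @ rng.random((n_comp, 50))  # exact rank-2 data

U, S, Vt = np.linalg.svd(M, full_matrices=False)
root = np.diag(np.sqrt(S[:n_comp]))
P_svd = U[:, :n_comp] @ root
C_svd = root @ Vt[:n_comp, :]

gram_P = P_svd.T @ P_svd   # equals Sigma_n because U has orthonormal columns
gram_C = C_svd @ C_svd.T   # equals Sigma_n because the rows of Vt are orthonormal
print(np.round(gram_P, 6))
print(np.round(gram_C, 6))
print("leading singular values:", S[:n_comp])
```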

In [None]:
# Compute SVD solution
U, S, Vt = svd(M, full_matrices=False)

# SVD factorization (symmetric square root of singular values)
P_svd = U[:, :n_comp] @ np.diag(np.sqrt(S[:n_comp]))
C_svd = np.diag(np.sqrt(S[:n_comp])) @ Vt[:n_comp, :]

# Generate random basis transformations (using orthogonal matrices as examples)
print("Generating transformed solutions (orthogonal transformations)...")
n_rotations = 5
P_rotations = []
C_rotations = []

for i in range(n_rotations):
    # Generate random orthogonal matrix R (one type of basis transformation)
    R_random = np.random.randn(n_comp, n_comp)
    R, _ = qr(R_random)  # Q factor (first output of qr) is orthogonal; named R to match the text
    
    P_rot = P_svd @ R
    C_rot = np.linalg.inv(R) @ C_svd
    
    P_rotations.append(P_rot)
    C_rotations.append(C_rot)

print(f"Generated {n_rotations} random basis transformations of the SVD solution")
print("Note: These are orthogonal transformations, but ANY invertible matrix would work!")

In [None]:
# Verify all transformations fit the data identically
print("\nReconstruction Errors (all should be essentially zero):")
print("-" * 60)

error_true = np.linalg.norm(M - P_true @ C_true)
error_svd = np.linalg.norm(M - P_svd @ C_svd)
print(f"True components:     {error_true:.2e}")
print(f"SVD solution:        {error_svd:.2e}")

for i, (P_rot, C_rot) in enumerate(zip(P_rotations, C_rotations)):
    error = np.linalg.norm(M - P_rot @ C_rot)
    print(f"Transformation {i+1}:    {error:.2e}")

print("\n✓ All decompositions reconstruct M perfectly!")
print("⚠ But the components themselves are completely different...")

## 4. Visualizing the Ambiguity

Let's compare the **concentration profiles** from different factorizations.

Despite fitting the data identically, the extracted components look **completely different**!

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(15, 8))

# Plot concentration profiles from different factorizations
solutions = [
    ("True", C_true, 'black'),
    ("SVD", C_svd, 'blue'),
    ("Transform 1", C_rotations[0], 'red'),
    ("Transform 2", C_rotations[1], 'green'),
    ("Transform 3", C_rotations[2], 'orange'),
    ("Transform 4", C_rotations[3], 'purple'),
]

for idx, (name, C, color) in enumerate(solutions):
    ax = axes.flatten()[idx]
    for comp in range(n_comp):
        ax.plot(C[comp, :], label=f'Component {comp+1}', linewidth=2, alpha=0.7)
    ax.set_title(f'{name} Concentration Profiles', fontweight='bold')
    ax.set_xlabel('Time Frame')
    ax.set_ylabel('Concentration')
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('basis_ambiguity.png', dpi=150, bbox_inches='tight')
plt.show()

print("✓ Figure saved: basis_ambiguity.png")
print("\nNotice: All solutions fit M perfectly, but components are completely different!")

## 5. Quantifying Component Differences

Even though all solutions fit $M$ identically, let's measure how different the extracted components are.

In [None]:
# Compare concentration profiles pairwise
print("Correlation between concentration profiles:")
print("-" * 70)
print(f"{'Comparison':<30} {'Component 1':<15} {'Component 2':<15}")
print("-" * 70)

# True vs SVD
corr_c1 = np.corrcoef(C_true[0, :], C_svd[0, :])[0, 1]
corr_c2 = np.corrcoef(C_true[1, :], C_svd[1, :])[0, 1]
print(f"{'True vs SVD':<30} {corr_c1:>14.3f} {corr_c2:>14.3f}")

# True vs Transformations
for i, C_rot in enumerate(C_rotations[:3]):
    corr_c1 = np.corrcoef(C_true[0, :], C_rot[0, :])[0, 1]
    corr_c2 = np.corrcoef(C_true[1, :], C_rot[1, :])[0, 1]
    print(f"{f'True vs Transform {i+1}':<30} {corr_c1:>14.3f} {corr_c2:>14.3f}")

# SVD vs Transformation
corr_c1 = np.corrcoef(C_svd[0, :], C_rotations[0][0, :])[0, 1]
corr_c2 = np.corrcoef(C_svd[1, :], C_rotations[0][1, :])[0, 1]
print(f"{'SVD vs Transform 1':<30} {corr_c1:>14.3f} {corr_c2:>14.3f}")

print("\n⚠ Low/negative correlations mean components are fundamentally different!")
print("⚠ Yet all fit the data M identically (χ² = 0)!")

## 6. What Breaks the Basis Ambiguity?

Different methods make different choices to get unique solutions:

### REGALS (Implicit Modeling)
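1. **Non-negativity**: $P \geq 0$, $C \geq 0$ → Restricts $R$ to transformations that keep both $PR$ and $R^{-1}C$ non-negative (positive rescalings and permutations always qualify; most other mixings do not)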
2. **Smoothness**: $\lambda \|D^2C\|^2$ → Prefers certain bases (smooth profiles)
3. **Compact support**: $C(t) = 0$ outside windows → Further restrictions
4. **SAXS constraints**: Real-space $P(r)$ with $d_{max}$ → Additional physical constraints

**Together**: These constraints are intended to select **one specific basis** from the infinite family.
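The non-negativity restriction in item 1 can be probed with a minimal sketch, assuming hypothetical strictly positive profiles and non-negative concentrations for $n = 2$. The helper `preserves_nonnegativity` is introduced here for illustration only. The sketch checks that a monomial matrix (permutation times positive diagonal) always stays admissible, and counts how many random Gaussian mixing matrices do:

```python
import numpy as np

rng = np.random.default_rng(2)
n_comp = 2
P = rng.random((100, n_comp)) + 1.0   # strictly positive "profiles"
C = rng.random((n_comp, 50))          # non-negative "concentrations"

def preserves_nonnegativity(P, C, R):
    """Hypothetical helper: True if both P R and R^{-1} C stay element-wise non-negative."""
    return bool((P @ R >= 0).all() and (np.linalg.solve(R, C) >= 0).all())

# Monomial matrix = permutation x positive diagonal: always admissible
R_monomial = np.array([[0.0, 2.5],
                       [0.4, 0.0]])
print("monomial admissible:", preserves_nonnegativity(P, C, R_monomial))

# Random Gaussian mixing matrices: only a small fraction remain admissible
n_trials = 2000
n_ok = sum(preserves_nonnegativity(P, C, rng.standard_normal((n_comp, n_comp)))
           for _ in range(n_trials))
frac_ok = n_ok / n_trials
print(f"random R admissible: {n_ok}/{n_trials} ({frac_ok:.1%})")
```

The exact fraction depends on the data. The point is only that non-negativity removes most, but not necessarily all, non-monomial mixings.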

### Molass (Explicit Modeling)
1. **Parametric form**: $C_k(t) = \sum_i \alpha_i f_i(t; \theta_i)$ (e.g., Gaussian)
2. This **eliminates** basis freedom by imposing functional structure
3. Components must have specific shapes → Uniquely determined
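Why a parametric form removes the mixing freedom can be illustrated with a sketch. It assumes plain Gaussian peak shapes; Molass also supports other forms such as EGH, which are not modeled here. The log of a Gaussian is exactly quadratic in $t$, so a quadratic fit to $\log c(t)$ serves as a hypothetical "is this a single Gaussian?" diagnostic. Mixing two distinct Gaussian rows with a non-monomial $R^{-1}$ keeps the rows positive but breaks the Gaussian shape, so such $R$ are excluded by the model:

```python
import numpy as np

t = np.arange(50, dtype=float)

def gaussian(t, height, center, width):
    return height * np.exp(-0.5 * ((t - center) / width) ** 2)

# Two hypothetical Gaussian elution peaks
C = np.vstack([gaussian(t, 1.0, 20.0, 4.0),
               gaussian(t, 0.8, 28.0, 5.0)])

def log_quadratic_residual(curve):
    """Max residual of a quadratic fit to log(curve); ~0 iff the curve is a single Gaussian."""
    coeffs = np.polyfit(t, np.log(curve), deg=2)
    return np.max(np.abs(np.log(curve) - np.polyval(coeffs, t)))

# A non-monomial mixing with a non-negative inverse keeps every row positive
R_inv = np.array([[1.0, 0.3],
                  [0.2, 1.0]])
C_mixed = R_inv @ C

for k in range(2):
    print(f"row {k}: original residual = {log_quadratic_residual(C[k]):.1e}, "
          f"mixed residual = {log_quadratic_residual(C_mixed[k]):.2f}")
```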

### The Key Insight
**Both make modeling choices** - REGALS hides it in optimization constraints, Molass states it upfront!

In [None]:
# Demonstrate how non-negativity constraint restricts basis transformations
print("Testing if random transformations preserve non-negativity:")
print("-" * 60)

# Check if transformed solutions have negative components
for i, (P_rot, C_rot) in enumerate(zip(P_rotations[:3], C_rotations[:3])):
    has_neg_P = np.any(P_rot < 0)
    has_neg_C = np.any(C_rot < 0)
    min_P = P_rot.min()
    min_C = C_rot.min()
    
    print(f"Transformation {i+1}:")
    print(f"  P has negatives: {has_neg_P} (min = {min_P:+.3f})")
    print(f"  C has negatives: {has_neg_C} (min = {min_C:+.3f})")

print("\n⚠ Most random transformations produce NEGATIVE components!")
print("✓ Non-negativity constraint eliminates most of the basis freedom.")
print("⚠ But infinitely many non-negative bases still exist!")
print("\n→ Need ADDITIONAL constraints (smoothness, etc.) to get unique solution.")

## 7. Implications for "Model-Free" Claims

### The Fundamental Problem
$$\min_{P,C} \|M - PC\|^2$$
has **infinitely many solutions** that fit the data identically.

### What This Means

1. **No method can be truly "model-free"**
   - Every method must make choices to resolve ambiguities
   - These choices ARE modeling assumptions

2. **REGALS is not model-free**
   - Makes choices via: non-negativity, smoothness, compact support, SAXS constraints
   - These are **implicit models** embedded in optimization

3. **Molass is explicitly model-based**
   - Makes choices via: parametric functional forms (Gaussian, EGH, etc.)
   - These are **explicit models** stated upfront

4. **The Key Difference**
   - **REGALS**: Implicit modeling (hidden in constraints)
   - **Molass**: Explicit modeling (transparent in formulation)
   - **Both require modeling** - difference is transparency, not existence!

### Next Steps

This exploration establishes the foundation. Next we need to:
1. **Characterize REGALS's implicit model mathematically** (What does smoothness regularization assume?)
2. **Compare to explicit models quantitatively** (When does REGALS ≈ Gaussian?)
3. **Test empirically with simulations** (Validate theoretical predictions)

In [None]:
# Summary statistics
print("="*70)
print("SUMMARY: Underdeterminedness of M = PC")
print("="*70)
print()
print("DEMONSTRATED:")
print("  ✓ Scale ambiguity: (αP, C/α) fits identically for any α > 0")
print("  ✓ Basis ambiguity: (PR, R⁻¹C) fits identically for any invertible R")
print("    (Note: REGALS authors call this 'rotation' but it's much more general!)")
print("  ✓ Infinitely many solutions with χ² = 0")
print("  ✓ Different solutions give completely different components")
print()
print("IMPLICATIONS:")
print("  → Every method MUST make modeling choices")
print("  → 'Model-free' is a misnomer")
print("  → REGALS: implicit modeling (via constraints)")
print("  → Molass: explicit modeling (via functional forms)")
print("  → Key difference: transparency, not existence of modeling")
print()
print("="*70)
print("Next: Characterize what implicit model REGALS assumes!")
print("="*70)

---

# Part 2: Does Regularization Break the Basis Ambiguity?

**New Question**: If we add regularization terms:
$$\min_{P,C} \|M - PC\|^2 + R(C) + R(P)$$

Can we still find transformations $B$ such that the **entire objective** remains unchanged?

$$\|M - (PB)(B^{-1}C)\|^2 + R(B^{-1}C) + R(PB) = \|M - PC\|^2 + R(C) + R(P)$$

**Note on REGALS**: In practice, REGALS uses $R(C) = \lambda\|D^2C\|^2$ (smoothness on concentration profiles) but **does NOT explicitly regularize P**. Instead, P is constrained through:
- Non-negativity: $P \geq 0$
- Real-space transform: $P(q) \leftrightarrow P(r)$ with maximum dimension $d_{max}$

These are **hard constraints** rather than smooth regularization terms. For this exploration, we focus on $R(C)$ to test if smoothness regularization alone breaks the basis ambiguity.

## Mathematical Insight

### Data-Fit Term (Already Proven)
$$\|M - (PB)(B^{-1}C)\|^2 = \|M - PC\|^2 \quad \text{for ANY invertible } B$$

### Regularization Terms (Key Question!)

**For orthogonal transformations** ($B^T B = I$):

1. **Frobenius norm**: $\|B^{-1}C\|_F^2 = \|C\|_F^2$ ✓ (orthogonal matrices preserve norms)

2. **Smoothness on C** (where $D^2$ is the second derivative operator):
   $$\|D^2(B^{-1}C)\|_F^2 = \|B^{-1}(D^2C)\|_F^2 = \|D^2C\|_F^2$$
   
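   ✓ $D^2$ acts along the time axis of each row of $C$ (a right multiplication, $D^2C \equiv C\,(D^2)^T$ in matrix form), so it commutes with the left multiplication by $B^{-1}$. Orthogonal $B^{-1}$ then preserves the Frobenius norm, so smoothness is invariant!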

3. **If we had smoothness on P**: Similarly, $\|D^2(PB)\|_F^2 = \|(D^2P)B\|_F^2 = \|D^2P\|_F^2$ ✓
   
   (But REGALS uses hard constraints on P instead)

### **Critical Implication**

Even with smoothness regularization, **orthogonal basis ambiguity persists**!
- For $n=2$: 1 free parameter (rotation angle)
- For $n=3$: 3 free parameters (Euler angles)
- Generally: $\frac{n(n-1)}{2}$ degrees of freedom

**Regularization alone is insufficient for uniqueness!**
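The commutation argument above can be made concrete with a sketch on hypothetical random $C$. It writes $D^2$ as an explicit $(K-2) \times K$ matrix acting by right multiplication and checks it against `np.diff`. It then compares the penalty of $B^{-1}C$ for an orthogonal $B$ (a rotation) with two non-orthogonal ones: a shear, and an unequal rescaling that provably lowers the penalty:

```python
import numpy as np

rng = np.random.default_rng(3)
n_comp, n_t = 2, 50
C = rng.random((n_comp, n_t))

# Second-difference operator as a (n_t-2) x n_t matrix; it acts along time,
# i.e. by RIGHT multiplication: D2C = C @ D2.T
D2 = np.zeros((n_t - 2, n_t))
for j in range(n_t - 2):
    D2[j, j:j + 3] = [1.0, -2.0, 1.0]
assert np.allclose(C @ D2.T, np.diff(C, n=2, axis=1))

def smoothness(C):
    return np.linalg.norm(C @ D2.T, "fro") ** 2

theta = 0.7
B_orth = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
B_shear = np.array([[1.0, 0.7],
                    [0.0, 1.0]])
B_scale = np.diag([2.0, 1.0])   # halves row 1 of C, so its penalty drops by 3/4

base = smoothness(C)
orth = smoothness(np.linalg.solve(B_orth, C))     # penalty of B^{-1} C
shear = smoothness(np.linalg.solve(B_shear, C))
scaled = smoothness(np.linalg.solve(B_scale, C))
print(f"original: {base:.4f}  orthogonal: {orth:.4f}  "
      f"shear: {shear:.4f}  scaled: {scaled:.4f}")
```

Only the orthogonal transformation leaves the penalty unchanged. This is why the invariance group shrinks to O(n) rather than disappearing.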

In [None]:
# Define smoothness regularization (REGALS-style second derivative penalty)
def smoothness_regularizer(C, lambda_smooth=1.0):
    """
    Compute smoothness penalty: λ||D²C||²_F
    where D² is the second-difference operator applied along the time axis
    """
    # Second derivative via finite differences: d²f/dt² ≈ f(t-1) - 2f(t) + f(t+1),
    # vectorized over all components; result has shape (n_comp, n_t - 2)
    D2C = np.diff(C, n=2, axis=1)
    
    return lambda_smooth * np.linalg.norm(D2C, 'fro')**2

# Define total objective function
def total_objective(P, C, M, lambda_smooth=1.0):
    """Total objective: data-fit + smoothness regularization"""
    data_fit = np.linalg.norm(M - P @ C)**2
    smooth_penalty = smoothness_regularizer(C, lambda_smooth)
    return data_fit + smooth_penalty

print("Regularization functions defined!")
print("\nTesting smoothness regularizer on SVD solution:")
lambda_test = 1.0
smooth_svd = smoothness_regularizer(C_svd, lambda_test)
print(f"  Smoothness penalty for SVD solution: {smooth_svd:.4f}")

# Compute total objective for SVD solution
obj_svd = total_objective(P_svd, C_svd, M, lambda_test)
print(f"  Total objective for SVD solution: {obj_svd:.4e}")
print(f"    (Data-fit: {np.linalg.norm(M - P_svd @ C_svd)**2:.4e}, Smoothness: {smooth_svd:.4f})")

## Testing the Conjecture: Orthogonal Transformations

Let's test if orthogonal transformations preserve the **entire objective** (data-fit + smoothness).

**Prediction**: Since orthogonal matrices preserve norms, we expect:
$$\text{Objective}(PB, B^{-1}C) = \text{Objective}(P, C)$$

for orthogonal $B$.

In [None]:
# Test if orthogonal transformations preserve the total objective
print("Testing Conjecture: Does regularization break basis ambiguity?")
print("="*70)
print(f"\nBaseline (SVD solution):")
print(f"  Total objective: {obj_svd:.4e}")
print(f"  Data-fit: {np.linalg.norm(M - P_svd @ C_svd)**2:.4e}")
print(f"  Smoothness: {smoothness_regularizer(C_svd, lambda_test):.4f}")
print("\n" + "-"*70)

# Test each orthogonal transformation from earlier
print("\nOrthogonal Transformations (random elements of O(2), possibly including reflections):")
print("-"*70)

objectives = []
for i, (P_rot, C_rot) in enumerate(zip(P_rotations[:5], C_rotations[:5])):
    obj_rot = total_objective(P_rot, C_rot, M, lambda_test)
    data_fit = np.linalg.norm(M - P_rot @ C_rot)**2
    smooth = smoothness_regularizer(C_rot, lambda_test)
    
    objectives.append(obj_rot)
    
    print(f"Transformation {i+1}:")
    print(f"  Total objective: {obj_rot:.4e}  (Δ = {obj_rot - obj_svd:+.4e})")
    print(f"  Data-fit: {data_fit:.4e}")
    print(f"  Smoothness: {smooth:.4f}  (Δ = {smooth - smoothness_regularizer(C_svd, lambda_test):+.4f})")
    print()

print("="*70)
print("RESULT:")
print(f"  Objective range: [{min(objectives):.4e}, {max(objectives):.4e}]")
print(f"  Standard deviation: {np.std(objectives):.4e}")
print(f"  Max difference from SVD: {max(abs(obj - obj_svd) for obj in objectives):.4e}")
print("\n✓ All orthogonal transformations give IDENTICAL objectives!")
print("⚠ Smoothness regularization does NOT break rotational ambiguity!")

## Visualizing the Persistent Ambiguity

Even with smoothness regularization, we still have infinitely many solutions with **identical objective values** but **different components**.

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(15, 8))

# Plot concentration profiles with objective values
solutions_reg = [
    ("SVD", C_svd, P_svd, 'blue'),
    ("Transform 1", C_rotations[0], P_rotations[0], 'red'),
    ("Transform 2", C_rotations[1], P_rotations[1], 'green'),
    ("Transform 3", C_rotations[2], P_rotations[2], 'orange'),
    ("Transform 4", C_rotations[3], P_rotations[3], 'purple'),
    ("Transform 5", C_rotations[4], P_rotations[4], 'brown'),
]

for idx, (name, C, P, color) in enumerate(solutions_reg):
    ax = axes.flatten()[idx]
    
    # Compute objective for this solution
    obj = total_objective(P, C, M, lambda_test)
    smooth = smoothness_regularizer(C, lambda_test)
    
    for comp in range(n_comp):
        ax.plot(C[comp, :], label=f'Comp {comp+1}', linewidth=2, alpha=0.7)
    
    ax.set_title(f'{name}\nObj={obj:.3e}, λ||D²C||²={smooth:.2f}', fontweight='bold', fontsize=9)
    ax.set_xlabel('Time Frame')
    ax.set_ylabel('Concentration')
    ax.legend(fontsize=8)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('regularization_ambiguity.png', dpi=150, bbox_inches='tight')
plt.show()

print("✓ Figure saved: regularization_ambiguity.png")
print("\nKey Observation:")
print("  All solutions have IDENTICAL objective values")
print("  But concentration profiles are COMPLETELY DIFFERENT")
print("  → Smoothness regularization alone does NOT resolve ambiguity!")

## What ACTUALLY Breaks the Ambiguity?

Since smoothness regularization preserves orthogonal transformations, we need **additional constraints**.

### Non-Negativity: The Critical Constraint

Let's test if non-negativity ($P \geq 0, C \geq 0$) breaks the rotational ambiguity.

In [None]:
# Check how many orthogonal transformations satisfy non-negativity
print("Testing Non-Negativity Constraint:")
print("="*70)

# Check SVD solution
svd_nonneg = (P_svd >= 0).all() and (C_svd >= 0).all()
print(f"SVD solution satisfies P≥0, C≥0: {svd_nonneg}")
print(f"  min(P_svd) = {P_svd.min():+.4f}, min(C_svd) = {C_svd.min():+.4f}")
print()

# Check transformed solutions
print("Orthogonal transformations:")
print("-"*70)
n_valid = 0
for i, (P_rot, C_rot) in enumerate(zip(P_rotations[:5], C_rotations[:5])):
    P_nonneg = (P_rot >= 0).all()
    C_nonneg = (C_rot >= 0).all()
    both_nonneg = P_nonneg and C_nonneg
    
    if both_nonneg:
        n_valid += 1
    
    status = "✓ VALID" if both_nonneg else "✗ INVALID"
    print(f"Transform {i+1}: {status}")
    print(f"  min(P) = {P_rot.min():+.4f}, min(C) = {C_rot.min():+.4f}")
    
    if both_nonneg:
        obj = total_objective(P_rot, C_rot, M, lambda_test)
        print(f"  → Valid solution with objective = {obj:.4e}")
    print()

print("="*70)
print(f"Summary: {n_valid}/{len(P_rotations[:5])} random rotations satisfy non-negativity")
print()

if n_valid > 1:
    print("⚠ CRITICAL: Multiple non-negative solutions with identical objectives exist!")
    print("→ Non-negativity + smoothness STILL insufficient for uniqueness!")
elif n_valid == 1:
    print("✓ Only one solution satisfies non-negativity constraint")
    print("→ Non-negativity + smoothness MAY give uniqueness (needs theoretical proof)")
else:
    print("⚠ No random rotations satisfy non-negativity")
    print("→ But this doesn't prove uniqueness (special rotations might exist)")

## The Hierarchy of Constraints

Our exploration reveals a **hierarchy of modeling choices** needed for uniqueness:

### Level 1: Data-Fit Only
$$\min_{P,C} \|M - PC\|^2$$
- **Result**: Infinite solutions (scale ambiguity + full basis ambiguity)
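- **Free parameters**: $n^2$ (any invertible matrix $R \in GL(n)$; this already includes the overall scale $\alpha$, since $\alpha I \in GL(n)$, as well as per-component scalings)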
- **Ambiguity**: $(P, C)$ and $(\alpha P R, R^{-1}C/\alpha)$ fit identically for any $\alpha > 0$ and invertible $R$

### Level 2: Add Smoothness Regularization
$$\min_{P,C} \|M - PC\|^2 + \lambda\|D^2C\|^2$$

**Hidden modeling assumptions at this level**:

**1. Choice of combination rule (WHY ADDITIVE?)**
- Why $A + \lambda B$ and not $A \times B$, or $\log(A) + \log(B)$, or $A^p + B^q$?

**Logical structure encoded by operators**:
- **Addition (+)**: Encodes "AND" logic
  - $A + \lambda B$ means "fit the data **AND** be smooth"
  - Both terms must be small simultaneously; the objective cannot be minimized by satisfying only one
  - Forces simultaneous satisfaction of both criteria
- **Multiplication (×)**: Encodes different logical relationship
  - $A \times B$ means if either term is near zero, whole objective can be small
  - Allows one constraint to dominate; changes trade-off structure
  - Different logical interaction between constraints
- **The choice of operator is a modeling decision about how constraints should interact!**

**Probabilistic interpretation (for additive form)**:
- **Additive form** ($+$) in log-probability space represents independence
  - Corresponds to minimizing the negative log-posterior: $-\log p(P,C|M) = -\log p(M|P,C) - \log p(C) + \text{const}$ (with a flat prior on $P$)
  - Data-fit term: $\|M - PC\|^2 \propto -\log p(M|P,C)$ (Gaussian likelihood)
  - Smoothness term: $\|D^2C\|^2 \propto -\log p(C)$ (Gaussian prior on curvature)
  - Assumes: Gaussian noise model + Gaussian smoothness prior + independent errors
- **Multiplicative form** ($\times$): Would correspond to different probabilistic model (non-standard)
- **Log-additive form** ($\log + \log$): Would imply different noise/prior distributions
- **Each choice encodes different implicit beliefs about noise structure, constraint interaction, and solution properties!**
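The correspondence can be made explicit, including what $\lambda$ means. Assume i.i.d. Gaussian noise with variance $\sigma^2$ on each entry of $M$, an i.i.d. Gaussian prior with variance $\tau^2$ on each second difference of $C$, and a flat prior on $P$ (all stated assumptions, not taken from the REGALS papers). Then:

$$-\log p(P, C \mid M) = \frac{1}{2\sigma^2}\|M - PC\|_F^2 + \frac{1}{2\tau^2}\|D^2C\|_F^2 + \text{const}$$

Multiplying by $2\sigma^2 > 0$ does not change the minimizer, so

$$\arg\min_{P,C}\; -\log p(P, C \mid M) = \arg\min_{P,C}\; \|M - PC\|_F^2 + \lambda\|D^2C\|_F^2, \qquad \lambda = \frac{\sigma^2}{\tau^2}$$

Under these assumptions, $\lambda$ encodes a belief about the ratio of noise variance to curvature variance.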

**2. Choice of penalty functional (WHY SQUARED L2 NORM?)**
- Why $\|D^2C\|^2$ and not $\|D^2C\|_1$ (L1), or $\|D^2C\|_\infty$ (max), or other norms?
- L2 norm: Penalizes large deviations quadratically → Gaussian assumption
- L1 norm: Promotes sparsity in curvature → would give different implicit model
- Each choice represents different prior belief about curvature distribution
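The difference between the two penalties is easiest to see on two hypothetical curves with the same total bend (slope rising from 0 to 1). One has a single sharp kink; the other spreads the bend over ten frames. This sketch only compares how the penalties rank the two curves; it does not run an L1-regularized optimization:

```python
import numpy as np

t = np.arange(50, dtype=float)

# Two hypothetical curves whose slope rises from 0 to 1 (same total bend):
# a single sharp kink at t = 25, and the same bend spread over frames 20-30.
kink = np.maximum(t - 25.0, 0.0)
spread = np.where(t <= 20, 0.0,
                  np.where(t <= 30, 0.05 * (t - 20.0) ** 2, 5.0 + (t - 30.0)))

def penalties(x):
    d2 = np.diff(x, n=2)                      # x[j] - 2 x[j+1] + x[j+2]
    return np.abs(d2).sum(), (d2 ** 2).sum()  # (L1 norm, squared L2 norm)

l1_kink, l2_kink = penalties(kink)
l1_spread, l2_spread = penalties(spread)
print(f"kink:   L1 = {l1_kink:.3f}, squared L2 = {l2_kink:.3f}")
print(f"spread: L1 = {l1_spread:.3f}, squared L2 = {l2_spread:.3f}")
```

Both curves have L1 penalty 1.0. The squared L2 penalty is 1.0 for the kink and 0.095 for the spread bend. L2 strongly prefers spreading curvature out, while L1 is indifferent to concentrating it into a few kinks. This is why L1 tends toward piecewise-linear solutions.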

**Mathematical consequences**:
- **Result**: Continuous ambiguity **reduced** to orthogonal transformations O(n)
- **Solution manifold**: Continuous family with dimension $\frac{n(n-1)}{2}$ (orthogonal group O(n): proper rotations SO(n) with det=+1, plus improper rotations with det=-1)
- **What changed**: 
  - Scale ambiguity **eliminated** (regularization creates unique optimal scale)
  - Basis ambiguity **reduced** (from arbitrary invertible $R \in GL(n)$ to orthogonal $B \in O(n)$ only)
- **Why scale is eliminated**: Smoothness penalty $\|D^2(C/\alpha)\|^2 = \|D^2C\|^2/\alpha^2$ changes with scale, creating a unique optimum that balances data-fit (scale-invariant) vs smoothness (scale-dependent)
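A quick numerical check of how the Level 2 objective responds to the rescaling $(\alpha P, C/\alpha)$. This is a sketch with hypothetical random factors, with $P$ left unpenalized exactly as in the objective above. The data-fit term is unchanged, while the penalty falls as $1/\alpha^2$:

```python
import numpy as np

rng = np.random.default_rng(4)
P = rng.random((100, 2)) + 1.0
C = rng.random((2, 50))
M = P @ C
lam = 1.0

def objective(P, C):
    """Data-fit + lambda * ||D^2 C||_F^2, with P left unpenalized (as in Level 2)."""
    return (np.linalg.norm(M - P @ C) ** 2
            + lam * np.linalg.norm(np.diff(C, n=2, axis=1)) ** 2)

alphas = [1.0, 10.0, 100.0, 1000.0]
values = [objective(a * P, C / a) for a in alphas]
for a, v in zip(alphas, values):
    print(f"alpha = {a:7.1f}: objective = {v:.3e}")
```

The objective keeps decreasing as $\alpha$ grows. The scale therefore has to be pinned by an extra constraint, such as a normalization of $P$, before the problem has a minimizer.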

**⚠ Critical limitations**:

**Limitation 1: Orthogonal invariance allows degeneracy**
- While mathematically elegant, orthogonal invariance **does not prevent degeneracy**
- Solutions where one component becomes bimodal (high curvature) while another vanishes (zero curvature → perfect smoothness) can have **lower total smoothness** than the correct two-component solution
- In alternating optimization (ALS), this leads to **component collapse** even though the transformation group is restricted to O(n)
- See [smoothness_orthogonal_invariance_proof.ipynb](smoothness_orthogonal_invariance_proof.ipynb) Part 11D: 65% degeneracy rate despite orthogonal invariance constraint

**Limitation 2: The additive/L2 choices are implicit models**
- The **form** of the objective ($A + \lambda B$ with L2 norms) is not "model-free"
- It encodes specific probabilistic assumptions (Gaussian noise, Gaussian priors, independence)
- Different choices (multiplicative, L1, log-additive, etc.) would generally yield different solutions
- **This is a modeling decision**, hidden in the mathematical formulation!

### Level 3: Add Non-Negativity
$$\min_{P \geq 0, C \geq 0} \|M - PC\|^2 + \lambda\|D^2C\|^2$$
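- **Result**: Unique solution up to a small discrete set, but only when the data allow it
- **Free parameters**: Ideally 0 continuous parameters plus a discrete set (permutation ambiguity); the scale still has to be fixed separately
- **What changed**: Non-negativity restricts the continuous n(n-1)/2 degrees of freedom from O(n) to rotations that keep both $PB \geq 0$ and $B^TC \geq 0$
  - Most random orthogonal transformations produce negative values (empirically verified above)
  - The continuous ambiguity disappears only when the factors contain zeros (or near-zeros) in suitable places, e.g. frames where only one component elutes; when both factors are strictly positive, a small continuous range of rotations generally survives
  - Remaining ambiguity: discrete permutations (component label swapping) when components are similar
  - Edge cases: highly symmetric data might preserve additional discrete symmetries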
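The role of zeros in the factors can be probed with a small sketch. It assumes hypothetical strictly positive factors for $n = 2$ ($P$ entries in $[1, 2)$, $C$ entries in $[0.2, 1.2)$). It scans rotation angles and keeps those for which both $PB \geq 0$ and $B^TC \geq 0$ hold. Each admissible rotation leaves the data-fit and smoothness terms unchanged:

```python
import numpy as np

rng = np.random.default_rng(5)
P = rng.random((100, 2)) + 1.0      # strictly positive profiles
C = rng.random((2, 50)) + 0.2       # strictly positive concentrations

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def admissible(theta):
    """True if P B >= 0 and B^T C >= 0 for the rotation B(theta) (B^{-1} = B^T)."""
    B = rotation(theta)
    return bool((P @ B >= 0).all() and (B.T @ C >= 0).all())

def smoothness(C):
    return np.linalg.norm(np.diff(C, n=2, axis=1)) ** 2

thetas = np.linspace(-0.3, 0.3, 601)       # rotation angles in radians
ok = np.array([admissible(th) for th in thetas])
print(f"admissible angles: {ok.sum()} of {len(thetas)}, "
      f"range [{thetas[ok].min():+.3f}, {thetas[ok].max():+.3f}] rad")

# An admissible rotation has the same smoothness penalty as the original C
B = rotation(0.05)
print("penalty unchanged:", np.isclose(smoothness(B.T @ C), smoothness(C)))
```

With strictly positive factors, a whole interval of angles is admissible. Zeros in $C$ (or $P$) are what shrink this interval toward a single point.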

### Level 4: Full REGALS (Add Normalization + Compact Support + SAXS)
$$\min_{\substack{P \geq 0, C \geq 0 \\ C(t) = 0 \text{ outside windows} \\ P \leftrightarrow P(r) \text{ with } d_{max} \\ \|P_k\| = 1}} \|M - PC\|^2 + \lambda\|D^2C\|^2$$
- **Result**: Expected to be unique for generic data (plausible, but not proven here)
- **Free parameters**: 0 continuous parameters, possibly a small discrete set (e.g., permutations)
- **What changed**: Additional physical constraints make degeneracies highly unlikely
  - **Normalization**: $\|P_k\| = 1$ ensures no residual scaling freedom
  - **Compact support**: $C(t) = 0$ outside windows spatially separates components
  - **SAXS constraints**: $P \leftrightarrow P(r)$ with distinct $d_{max}$ values distinguishes components by size
  - **For generic data**: Components differ in SAXS profiles, elution times, and intensities → unique solution
  - **Edge case**: Nearly identical components (same size, overlapping elution, similar profiles) may still permit permutation ambiguity
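How compact support removes mixing can be illustrated with a sketch, assuming two hypothetical elution windows (frames 0-29 and 20-49) and Gaussian-like bumps that vanish outside them. This shows the mechanism only, not the REGALS implementation. Each component has frames where it alone is nonzero. Any mixing that feeds one component into the other therefore leaks signal outside a window, and only diagonal rescalings survive; normalization then fixes those:

```python
import numpy as np

n_t = 50
t = np.arange(n_t, dtype=float)

# Hypothetical elution windows: component 1 in frames 0-29, component 2 in frames 20-49
windows = np.zeros((2, n_t), dtype=bool)
windows[0, :30] = True
windows[1, 20:] = True

# Positive bumps centered at frames 12 and 36, set to zero outside their windows
centers = np.array([[12.0], [36.0]])
C = np.where(windows, np.exp(-0.5 * ((t - centers) / 6.0) ** 2), 0.0)

def respects_windows(C_new, tol=1e-12):
    """Hypothetical helper: True if every row is (numerically) zero outside its window."""
    return bool(np.all(np.abs(C_new[~windows]) <= tol))

R_diag = np.diag([2.0, 0.5])                 # per-component scaling
R_mix = np.array([[1.0, 0.3], [0.0, 1.0]])   # mixes component 2 into component 1

print("scaling respects windows:", respects_windows(np.linalg.solve(R_diag, C)))
print("mixing respects windows: ", respects_windows(np.linalg.solve(R_mix, C)))
```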

---

**Key Insight**: REGALS requires **FOUR layers of constraints** to achieve uniqueness:
1. **Smoothness** → Reduces basis freedom from arbitrary invertible to orthogonal transformations (but leaves the scale free; on its own the penalty drives $C \to 0$)
2. **Non-negativity** → Eliminates (most) orthogonal transformations
3. **Normalization** → Fixes the scale that smoothness cannot (explicit in REGALS implementation)
4. **Compact support + SAXS constraints** → Physical constraints that restrict solution space

**Each is an implicit modeling assumption!**

**Important Note**: The "free parameters" here refer to **continuous degrees of freedom in the solution space**, not the number of optimization variables. The orthogonal group O(n) has dimension n(n-1)/2 and includes: (1) **proper rotations** in SO(n) with det = +1, and (2) **improper rotations** with det = -1 (which include reflections, rotoinversions, and other orientation-reversing isometries, far more general than simple reflections in high dimensions). At Levels 3 and 4 the aim is to leave at most a finite discrete set of solutions (e.g., component label swapping) and no continuous manifold of equivalent solutions; whether non-negativity alone achieves this depends on the data.

In [None]:
# Final summary
print("="*70)
print("PART 2 SUMMARY: Regularization and Basis Ambiguity")
print("="*70)
print()
print("CONJECTURE TESTED:")
print("  Can we find B such that Objective(PB, B⁻¹C) = Objective(P, C)?")
print()
print("ANSWER: YES, for orthogonal transformations!")
print()
print("FINDINGS:")
print("  ✓ Data-fit term: Always invariant for ANY invertible B")
print("  ✓ Smoothness ||D²C||²: Invariant for ORTHOGONAL B")
print("  ✓ Total objective: Identical across all orthogonal transformations")
print("  ✓ Components: Completely different despite identical objectives")
print()
print("DEGREES OF FREEDOM:")
print("  • Unconstrained: arbitrary invertible mixing (n², includes scale)")
print("  • With smoothness: scale (1, left unfixed) + orthogonal transformations (n(n-1)/2)")
print("  • With non-negativity: most rotational freedom eliminated (data-dependent)")
print("  • Full REGALS: expected uniqueness (0 continuous free parameters)")
print()
print("IMPLICATIONS:")
print("  → Smoothness regularization ALONE is insufficient!")
print("  → Need MULTIPLE constraints for uniqueness")
print("  → REGALS uses 4-layer constraint hierarchy")
print("  → Each layer is an IMPLICIT MODELING choice")
print("  → 'Model-free' claim is fundamentally misleading")
print()
print("="*70)
print("Next: Mathematical characterization of implicit functional form")
print("="*70)