# Problem-Informed Q-Matrix Design for Orthogonal-Invariant Regularization

**Date**: January 27, 2026  
**Purpose**: Explore whether problem-specific knowledge can be incorporated into fixed Q-matrix design while maintaining orthogonal invariance

**Context**: From [orthogonal_invariance_journey.md](orthogonal_invariance_journey.md) Open Research Question #3

---

## The Research Question

From Parts 11A-D of [smoothness_orthogonal_invariance_proof.ipynb](smoothness_orthogonal_invariance_proof.ipynb), we learned:

1. **Fixed Q maintains invariance**: $S(C) = \text{tr}(CQC^T)$ ⟺ orthogonal invariance
2. **Generic Q (ridge) insufficient**: $Q = (D^2)^T D^2 + \epsilon I$ helps marginally but doesn't prevent degeneracy
3. **Trade-off exists**: Invariance ↔ Degeneracy prevention

**New Question**: Can we design $Q$ using **problem-class knowledge** to reduce degeneracy risk while maintaining invariance?

### Key Distinction

- **Generic Q** (e.g., ridge): Problem-agnostic, works for any data
- **Problem-informed Q**: Incorporates priors (peak separation, widths, expected shapes) but **still fixed once designed**

### Why This Might Work

Standard smoothness $\|D^2C\|^2$ treats all curvature equally. But for SEC-SAXS:
- **Expected**: Smooth Gaussian-like peaks at certain separations
- **Degenerate**: Bimodal profile with unexpected peak locations

**Idea**: Design $Q$ to penalize "unexpected" curvature more heavily!

---

## Three Approaches to Explore

### 1. Spatially-Weighted Smoothness

$$Q = (D^2)^T W D^2$$

where $W$ is diagonal with higher weights in "unexpected" regions.

**Example**: If components are expected at frames 35 and 55:
```
W_ii = 1 + γ·min(|i-35|, |i-55|) / frame_range
```
- Low penalty near expected peaks (allow curvature)
- High penalty in between (discourage bimodal spanning both)
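
A minimal sketch of this weighting, assuming `frame_range` means the total number of frames (100) and taking γ = 2 purely for illustration. The helper name `spatial_weight_profile` is hypothetical and is not the function used later in the notebook.

```python
import numpy as np

def spatial_weight_profile(n_frames, peak_locs, gamma):
    """W_ii = 1 + gamma * (distance to nearest expected peak) / n_frames."""
    frames = np.arange(n_frames)
    # Distance from every frame to every expected peak, shape (n_peaks, n_frames)
    dists = np.abs(frames[None, :] - np.asarray(peak_locs)[:, None])
    return 1.0 + gamma * dists.min(axis=0) / n_frames

w = spatial_weight_profile(n_frames=100, peak_locs=[35, 55], gamma=2.0)
# Weight at the two expected peaks, midway between them, and at the first frame
print(w[35], w[55], w[45], w[0])
```

With this form the weight keeps growing away from the peaks on both sides, so frames far outside the expected elution window are penalized more strongly than the midpoint between the peaks.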

### 2. Band-Pass Frequency Filter

$$Q = F^T \Lambda F$$

where $F$ is a Fourier/DCT transform matrix and $\Lambda$ is a diagonal matrix of frequency weights.

**Example**: For Gaussian peaks with width σ=5:
```
Λ_kk = 1 - exp(-(ω_k - ω_peak)² / (2σ_freq²))
```
- Allow frequencies corresponding to expected peak widths
- Penalize very low frequencies (flat/constant) and very high (noise/oscillations)

### 3. Size-Informed Smoothness Scaling

$$Q = (D^2)^T D^2 + \beta \text{diag}(\text{temporal\_weights})$$

where the temporal weights are based on SEC physics:
```
w(t) = expected_concentration(t | known_sizes)
```

**Example**: Larger particles (early elution) → narrower peaks → less allowed curvature in late frames

---

## Experimental Design

We'll test these approaches using the same setup as Part 11D (ALS with both P and C optimized):

1. **Baseline**: Standard smoothness (0% success from Part 11D)
2. **Ridge**: Generic improvement (35% → 70% from Part 11D)
3. **Spatially-weighted**: Test if prior knowledge helps
4. **Band-pass**: Test frequency-domain approach
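5. **Combined spatial + ridge**: Test whether ridge adds to the best spatial weighting (the size-informed variant is described above but not run in this notebook)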

**Goal**: Can problem-informed Q achieve >80% success while maintaining invariance?

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm as scipy_norm
from scipy.fft import dct, idct

np.random.seed(42)

print("Libraries imported successfully!")
print(f"NumPy version: {np.__version__}")

## Part 1: Reproduce Test Setup from Part 11D

Use the same data generation as the ALS test that showed degeneracy.

In [None]:
# Generate test data (same as Part 11D)
n_components = 2
n_frames = 100
frames = np.arange(n_frames)

# True concentration profiles (two separated Gaussian peaks)
C_true = np.zeros((n_components, n_frames))
C_true[0, :] = scipy_norm.pdf(frames, loc=35, scale=4)  # Component 1: frame 35, σ=4
C_true[1, :] = scipy_norm.pdf(frames, loc=55, scale=6)  # Component 2: frame 55, σ=6
C_true = C_true / C_true.sum(axis=1, keepdims=True)  # Normalize

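# SAXS profiles (Guinier-like Gaussian decay, correlated)
q = np.linspace(0.01, 0.3, 50)
P_true = np.zeros((2, 50))
# Note: Guinier's law is exp(-(q*Rg)**2 / 3); the extra 0.5 factor used here makes
# the effective Rg equal to the nominal value divided by sqrt(2)
P_true[0, :] = np.exp(-0.5 * (q * 40)**2 / 3)  # Larger particle (nominal Rg=40Å)
P_true[1, :] = np.exp(-0.5 * (q * 20)**2 / 3) * 0.5  # Smaller particle (nominal Rg=20Å)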

# Generate data
M = P_true.T @ C_true

# Correlation (this is what causes degeneracy)
correlation = np.corrcoef(P_true[0, :], P_true[1, :])[0, 1]

print(f"Test setup:")
print(f"  Components: {n_components}")
print(f"  Frames: {n_frames}")
print(f"  Peak separation: {55-35} frames")
print(f"  Peak widths: σ₁={4}, σ₂={6}")
print(f"  SAXS correlation: r = {correlation:.3f}")
print(f"  Data shape: {M.shape}")
print()
print("✓ This is the same setup that showed 100% degeneracy in Part 11D")

## Part 2: ALS Implementation with Custom Q-Matrix

Generalize the ALS implementation to accept an arbitrary Q-matrix for the smoothness term.
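
For reference, setting the gradient of $\|M - PC\|^2 + \text{tr}(CQC^T) + \epsilon\|C\|^2$ with respect to row $c_j$ to zero gives the linear system $\big((\|p_j\|^2 + \epsilon) I + Q\big) c_j = M^T p_j - \sum_{l \neq j} (p_l^T p_j)\, c_l$. The sketch below checks this update numerically on random data. It omits the non-negativity clipping, and all sizes and variable names are illustrative assumptions rather than the notebook's test setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_q, K, k = 30, 20, 2
P = rng.random((n_q, k))
C = rng.random((k, K))
M = rng.random((n_q, K))
eps = 0.1

# Standard second-difference penalty Q = (D2)^T D2
D2 = np.zeros((K - 2, K))
for i in range(K - 2):
    D2[i, i:i + 3] = [1, -2, 1]
Q = D2.T @ D2

def objective(P, C):
    return (np.linalg.norm(M - P @ C, 'fro')**2
            + np.trace(C @ Q @ C.T)
            + eps * np.linalg.norm(C, 'fro')**2)

# Closed-form update for row j with all other rows held fixed
j = 0
pj = P[:, j]
rhs = M.T @ pj
for l in range(k):
    if l != j:
        rhs -= (P[:, l] @ pj) * C[l, :]   # cross term is p_l . p_j
A = (pj @ pj + eps) * np.eye(K) + Q
C_new = C.copy()
C_new[j, :] = np.linalg.solve(A, rhs)

# The gradient w.r.t. row j should vanish at the updated row
grad_j = -2 * pj @ (M - P @ C_new) + 2 * C_new[j] @ Q + 2 * eps * C_new[j]
print("max |gradient|:", np.abs(grad_j).max())
print("objective before/after:", objective(P, C), objective(P, C_new))
```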

In [None]:
def als_with_custom_Q(M, k, Q, epsilon_ridge=0, max_iter=100, tol=1e-6, random_seed=None):
    """
    Alternating Least Squares with custom Q-matrix regularization.
    
    Objective: ||M - P C||² + tr(CQC^T) + ε||C||²
    (P has shape (n_q, k), so the model is P @ C)
    
    Parameters
    ----------
    M : array (n_q, K)
        Data matrix
    k : int
        Number of components
    Q : array (K, K)
        Positive semi-definite regularization matrix
    epsilon_ridge : float
        Ridge regularization parameter
    
    Returns
    -------
    P : array (n_q, k)
        SAXS profiles
    C : array (k, K)
        Concentration profiles
    history : dict
        Optimization history
    """
    n_q, K = M.shape
    
    # Random initialization
    if random_seed is not None:
        np.random.seed(random_seed)
    
    C = np.random.rand(k, K)
    C = C / C.sum(axis=1, keepdims=True)
    P = np.random.rand(k, n_q).T
    
    history = {'data_fit': [], 'smoothness': [], 'ridge': [], 'total': []}
    
    for iteration in range(max_iter):
        C_old = C.copy()
        
        # Update C (fix P)
        for j in range(k):
            pj = P[:, j]
            pj_norm_sq = np.dot(pj, pj)
            
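            # RHS: p_j^T (M - sum_{l != j} p_l c_l)
            residual_j = M.T @ pj
            for j_other in range(k):
                if j_other != j:
                    # Cross term uses the inner product p_l . p_j, not ||p_j||^2
                    residual_j -= np.dot(P[:, j_other], pj) * C[j_other, :]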
            
            # Solve: (p_j^T p_j I + Q + ε I) c_j = RHS
            A = pj_norm_sq * np.eye(K) + Q + epsilon_ridge * np.eye(K)
            b = residual_j
            
            C[j, :] = np.linalg.solve(A, b)
            C[j, :] = np.maximum(C[j, :], 0)  # Non-negativity
        
        # Update P (fix C); C @ C.T is shared by every row, so form it once
        CtC_reg = C @ C.T + 1e-10 * np.eye(k)
        for i in range(n_q):
            Ctm = C @ M[i, :]
            P[i, :] = np.linalg.solve(CtC_reg, Ctm)
            P[i, :] = np.maximum(P[i, :], 0)
        
        # Compute objective
        M_recon = P @ C
        data_fit = np.linalg.norm(M - M_recon, 'fro')**2
        smoothness = np.trace(C @ Q @ C.T)
        ridge = np.linalg.norm(C, 'fro')**2
        total_obj = data_fit + smoothness + epsilon_ridge * ridge
        
        history['data_fit'].append(data_fit)
        history['smoothness'].append(smoothness)
        history['ridge'].append(ridge)
        history['total'].append(total_obj)
        
        # Check convergence
        delta = np.linalg.norm(C - C_old, 'fro') / (np.linalg.norm(C_old, 'fro') + 1e-10)
        if delta < tol:
            break
    
    return P, C, history


def evaluate_solution(C_opt, C_true):
    """
    Check if permutation is correct and if solution is degenerate.
    
    Returns
    -------
    is_correct : bool
        True if permutation matches ground truth
    is_degenerate : bool
        True if one component nearly vanishes
    """
    # Check permutation by correlation
    corr_11 = np.corrcoef(C_opt[0, :], C_true[0, :])[0, 1]
    corr_12 = np.corrcoef(C_opt[0, :], C_true[1, :])[0, 1]
    is_correct = abs(corr_11) > abs(corr_12)
    
    # Check for degeneracy
    energies = np.linalg.norm(C_opt, axis=1)
    is_degenerate = np.min(energies) / np.max(energies) < 0.1
    
    return is_correct, is_degenerate


print("✓ ALS implementation with custom Q-matrix ready")

## Part 3: Standard Smoothness (Baseline)

Reproduce the baseline from Part 11D: $Q = (D^2)^T D^2$
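
As a quick sanity check on what this baseline penalizes, the sketch below builds the same second-difference stencil and evaluates $c\,Q\,c^T$ for a constant, a linear ramp, and a Gaussian peak. The three test profiles are illustrative choices.

```python
import numpy as np

K = 100
D2 = np.zeros((K - 2, K))
for i in range(K - 2):
    D2[i, i:i + 3] = [1, -2, 1]
Q = D2.T @ D2

t = np.arange(K, dtype=float)
profiles = {
    "constant": np.ones(K),
    "linear": 0.5 * t + 3.0,
    "gaussian": np.exp(-0.5 * ((t - 35) / 4.0)**2),
}

# Quadratic-form penalty for each profile
penalties = {name: float(c @ Q @ c) for name, c in profiles.items()}
for name, value in penalties.items():
    print(f"{name:9s} penalty = {value:.3e}")
```

Constants and linear ramps incur no penalty, so $Q$ is positive semi-definite but singular. In the ALS update the system matrix stays invertible only because of the added $\|p_j\|^2 I$ (and optional $\epsilon I$) term.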

In [None]:
# Second derivative operator
K = n_frames
D2 = np.zeros((K - 2, K))
for i in range(K - 2):
    D2[i, i:i+3] = [1, -2, 1]

# Standard smoothness Q
Q_standard = D2.T @ D2

print("Testing standard smoothness (baseline):")
print("="*70)

n_trials = 20
results_standard = []

for trial in range(n_trials):
    P_opt, C_opt, history = als_with_custom_Q(
        M, k=n_components, Q=Q_standard, epsilon_ridge=0,
        max_iter=100, random_seed=trial
    )
    
    is_correct, is_degenerate = evaluate_solution(C_opt, C_true)
    
    results_standard.append({
        'correct': is_correct,
        'degenerate': is_degenerate,
        'objective': history['total'][-1]
    })

n_correct = sum(r['correct'] for r in results_standard)
n_degenerate = sum(r['degenerate'] for r in results_standard)

print(f"Standard smoothness Q = (D²)ᵀD²:")
print(f"  Correct permutation: {n_correct}/{n_trials} ({n_correct/n_trials*100:.0f}%)")
print(f"  Degenerate solutions: {n_degenerate}/{n_trials} ({n_degenerate/n_trials*100:.0f}%)")
print()
print("✓ Baseline established (should match Part 11D: ~35% success)")
print("="*70)

## Part 4: Approach 1 - Spatially-Weighted Smoothness

Design $Q = (D^2)^T W D^2$ where $W$ penalizes curvature more heavily in "unexpected" regions.

**Hypothesis**: A bimodal solution needs curvature between the two true peaks (frames 35-55). By increasing the penalty in this region, we make the degenerate solution less attractive.

In [None]:
# Design spatial weighting based on expected peak locations
def create_spatial_weights(frames, peak_locs, peak_widths, gamma=2.0):
    """
    Create spatial weights that penalize curvature in unexpected regions.
    
    Lower weights near expected peaks (allow curvature)
    Higher weights between peaks (discourage bimodal spanning)
    
    Parameters
    ----------
    frames : array
        Frame indices
    peak_locs : list
        Expected peak locations (e.g., [35, 55])
    peak_widths : list
        Expected peak widths (e.g., [4, 6])
    gamma : float
        Steepness of weighting (higher = stronger penalty between peaks)
    
    Returns
    -------
    W : array (K-2, K-2)
        Diagonal weight matrix
    """
    K = len(frames)
    weights = np.ones(K - 2)
    
    for i in range(K - 2):
        frame_center = frames[i + 1]  # Center of D² stencil
        
        # Distance to nearest expected peak (normalized by width)
        min_dist = float('inf')
        for loc, width in zip(peak_locs, peak_widths):
            dist = abs(frame_center - loc) / width
            min_dist = min(min_dist, dist)
        
        # Weight increases with distance from peaks
        # w(x) = 1 + γ·min_dist
        weights[i] = 1.0 + gamma * min_dist
    
    W = np.diag(weights)
    return W


# Create spatially-weighted Q
peak_locs = [35, 55]  # Known from ground truth
peak_widths = [4, 6]   # Known from ground truth

print("Testing spatially-weighted smoothness:")
print("="*70)
print(f"Expected peak locations: {peak_locs}")
print(f"Expected peak widths: {peak_widths}")
print()

# Test different γ values
gamma_values = [0.5, 1.0, 2.0, 4.0]
results_spatial = {}

for gamma in gamma_values:
    W = create_spatial_weights(frames, peak_locs, peak_widths, gamma=gamma)
    Q_spatial = D2.T @ W @ D2
    
    results = []
    for trial in range(n_trials):
        P_opt, C_opt, history = als_with_custom_Q(
            M, k=n_components, Q=Q_spatial, epsilon_ridge=0,
            max_iter=100, random_seed=trial
        )
        
        is_correct, is_degenerate = evaluate_solution(C_opt, C_true)
        results.append({
            'correct': is_correct,
            'degenerate': is_degenerate,
            'objective': history['total'][-1]
        })
    
    n_correct = sum(r['correct'] for r in results)
    n_degenerate = sum(r['degenerate'] for r in results)
    
    results_spatial[gamma] = {
        'n_correct': n_correct,
        'n_degenerate': n_degenerate,
        'success_rate': n_correct / n_trials * 100,
        'degeneracy_rate': n_degenerate / n_trials * 100
    }
    
    print(f"γ = {gamma:4.1f}: Correct {n_correct}/{n_trials} ({n_correct/n_trials*100:5.1f}%), "
          f"Degenerate {n_degenerate}/{n_trials} ({n_degenerate/n_trials*100:5.1f}%)")

print("="*70)

# Find best γ
best_gamma = max(results_spatial.keys(), key=lambda g: results_spatial[g]['success_rate'])
best_rate = results_spatial[best_gamma]['success_rate']

print(f"\nBest spatial weighting: γ = {best_gamma} → {best_rate:.0f}% success")

# Compare against the Part 3 baseline (n_correct now holds the last γ's count)
baseline_rate = sum(r['correct'] for r in results_standard) / n_trials * 100

if best_rate > 80:
    print("✓✓ SUCCESS! Spatial weighting prevents degeneracy while maintaining invariance!")
elif best_rate > baseline_rate:
    print(f"⚠ Improvement over baseline ({baseline_rate:.0f}% → {best_rate:.0f}%) but insufficient")
else:
    print("✗ Spatial weighting did not improve over baseline")

## Part 5: Approach 2 - Band-Pass Frequency Filter

Design $Q$ in frequency domain: $Q = F^T \Lambda F$ where $F$ is DCT transform and $\Lambda$ contains frequency weights.

**Hypothesis**: A degenerate solution has different frequency content than the true solution. By penalizing everything outside the "expected" frequency range, we can disfavor degeneracy.
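
The construction relies on $F$ being orthogonal, in which case the diagonal of $\Lambda$ gives the eigenvalues of $Q$ directly and $Q$ is positive semi-definite whenever the weights are non-negative. A small sketch of that check, using an orthonormal DCT-II matrix and placeholder weights (the weight profile here is an arbitrary assumption, not the band-pass design tested in this part):

```python
import numpy as np

def dct_matrix(K):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(K)[:, None]
    n = np.arange(K)[None, :]
    F = np.sqrt(2.0 / K) * np.cos(np.pi * k * (n + 0.5) / K)
    F[0, :] /= np.sqrt(2.0)
    return F

K = 64
F = dct_matrix(K)
lam = np.linspace(0.0, 1.0, K)**2        # placeholder non-negative weights
Q = F.T @ np.diag(lam) @ F

is_orthogonal = np.allclose(F @ F.T, np.eye(K))
is_symmetric = np.allclose(Q, Q.T)
eigs_match = np.allclose(np.sort(np.linalg.eigvalsh(Q)), np.sort(lam))
print(is_orthogonal, is_symmetric, eigs_match)
```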

In [None]:
def create_frequency_Q(K, peak_width, low_cutoff=0.1, high_cutoff=0.8):
    """
    Create Q-matrix based on frequency-domain filtering.
    
    Penalizes very low frequencies (constant/flat) and very high (noise).
    Allows mid-range frequencies corresponding to expected peak widths.
    
    Parameters
    ----------
    K : int
        Number of frames
    peak_width : float
        Expected peak width in frames (currently unused; the band is set by the cutoffs)
    low_cutoff : float
        Low frequency cutoff (0-1, fraction of max frequency)
    high_cutoff : float
        High frequency cutoff (0-1, fraction of max frequency)
    
    Returns
    -------
    Q : array (K, K)
        Frequency-based regularization matrix
    """
    # DCT basis
    F = np.zeros((K, K))
    for k in range(K):
        for n in range(K):
            F[k, n] = np.cos(np.pi * k * (n + 0.5) / K)
    F = F / np.sqrt(K / 2)
    F[0, :] /= np.sqrt(2)
    
    # Frequency weights (band-pass)
    freqs = np.arange(K) / K  # Normalized frequencies [0, 1]
    
    # Penalty for very low frequencies (constant/flat)
    low_penalty = np.exp(-((freqs - 0) / low_cutoff)**2)
    
    # Penalty for very high frequencies (noise/oscillations): largest at freq = 1
    high_penalty = np.exp(-((freqs - 1) / (1 - high_cutoff))**2)
    
    # Combined penalty (low in middle, high at extremes)
    weights = low_penalty + high_penalty
    weights[0] *= 10  # Extra penalty on DC component (flat)
    
    Lambda = np.diag(weights)
    
    # Q = F^T Λ F
    Q = F.T @ Lambda @ F
    
    return Q


print("Testing frequency-domain band-pass filtering:")
print("="*70)

# Test different frequency cutoffs
cutoff_pairs = [(0.05, 0.7), (0.1, 0.8), (0.15, 0.85)]
results_freq = {}

for low_cut, high_cut in cutoff_pairs:
    Q_freq = create_frequency_Q(K, peak_width=5, low_cutoff=low_cut, high_cutoff=high_cut)
    
    results = []
    for trial in range(n_trials):
        P_opt, C_opt, history = als_with_custom_Q(
            M, k=n_components, Q=Q_freq, epsilon_ridge=0,
            max_iter=100, random_seed=trial
        )
        
        is_correct, is_degenerate = evaluate_solution(C_opt, C_true)
        results.append({
            'correct': is_correct,
            'degenerate': is_degenerate,
            'objective': history['total'][-1]
        })
    
    n_correct = sum(r['correct'] for r in results)
    n_degenerate = sum(r['degenerate'] for r in results)
    
    key = (low_cut, high_cut)
    results_freq[key] = {
        'n_correct': n_correct,
        'n_degenerate': n_degenerate,
        'success_rate': n_correct / n_trials * 100,
        'degeneracy_rate': n_degenerate / n_trials * 100
    }
    
    print(f"Cutoffs ({low_cut:.2f}, {high_cut:.2f}): "
          f"Correct {n_correct}/{n_trials} ({n_correct/n_trials*100:5.1f}%), "
          f"Degenerate {n_degenerate}/{n_trials} ({n_degenerate/n_trials*100:5.1f}%)")

print("="*70)

# Find best cutoffs
best_cutoffs = max(results_freq.keys(), key=lambda k: results_freq[k]['success_rate'])
best_rate = results_freq[best_cutoffs]['success_rate']

print(f"\nBest frequency cutoffs: {best_cutoffs} → {best_rate:.0f}% success")

# Compare against the Part 3 baseline (n_correct now holds the last cutoff pair's count)
baseline_rate = sum(r['correct'] for r in results_standard) / n_trials * 100

if best_rate > 80:
    print("✓✓ SUCCESS! Frequency filtering prevents degeneracy while maintaining invariance!")
elif best_rate > baseline_rate:
    print(f"⚠ Improvement over baseline ({baseline_rate:.0f}% → {best_rate:.0f}%) but insufficient")
else:
    print("✗ Frequency filtering did not improve over baseline")

## Part 6: Approach 3 - Combined Spatial + Ridge

Combine the best spatial weighting with ridge regularization:

$$Q = (D^2)^T W D^2 + \epsilon I$$

**Hypothesis**: Spatial weighting addresses where curvature appears, ridge addresses amplitude imbalance. Together they might be sufficient.
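
Adding $\epsilon I$ to $Q$ is the same as adding a separate ridge term, since $\text{tr}(C(Q + \epsilon I)C^T) = \text{tr}(CQC^T) + \epsilon\|C\|_F^2$. This is why the combined design can be run by passing the spatial $Q$ together with `epsilon_ridge`. A short numerical check, with placeholder weights and sizes chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
k, K, eps = 2, 50, 0.1
C = rng.random((k, K))

D2 = np.zeros((K - 2, K))
for i in range(K - 2):
    D2[i, i:i + 3] = [1, -2, 1]
W = np.diag(1.0 + rng.random(K - 2))     # placeholder positive weights
Q_spatial = D2.T @ W @ D2

combined = np.trace(C @ (Q_spatial + eps * np.eye(K)) @ C.T)
separate = np.trace(C @ Q_spatial @ C.T) + eps * np.linalg.norm(C, 'fro')**2
print(np.isclose(combined, separate))

# eps * I lifts every eigenvalue by eps, so the combined Q is positive definite
min_eig = np.linalg.eigvalsh(Q_spatial + eps * np.eye(K)).min()
print(min_eig)
```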

In [None]:
print("Testing combined spatial weighting + ridge:")
print("="*70)

# Use best spatial gamma from Part 4
W_best = create_spatial_weights(frames, peak_locs, peak_widths, gamma=best_gamma)
Q_spatial_best = D2.T @ W_best @ D2

# Test different ridge values
epsilon_values = [0.01, 0.1, 0.5, 1.0]
results_combined = {}

for eps in epsilon_values:
    results = []
    for trial in range(n_trials):
        P_opt, C_opt, history = als_with_custom_Q(
            M, k=n_components, Q=Q_spatial_best, epsilon_ridge=eps,
            max_iter=100, random_seed=trial
        )
        
        is_correct, is_degenerate = evaluate_solution(C_opt, C_true)
        results.append({
            'correct': is_correct,
            'degenerate': is_degenerate,
            'objective': history['total'][-1]
        })
    
    n_correct = sum(r['correct'] for r in results)
    n_degenerate = sum(r['degenerate'] for r in results)
    
    results_combined[eps] = {
        'n_correct': n_correct,
        'n_degenerate': n_degenerate,
        'success_rate': n_correct / n_trials * 100,
        'degeneracy_rate': n_degenerate / n_trials * 100
    }
    
    print(f"ε = {eps:5.2f}: Correct {n_correct}/{n_trials} ({n_correct/n_trials*100:5.1f}%), "
          f"Degenerate {n_degenerate}/{n_trials} ({n_degenerate/n_trials*100:5.1f}%)")

print("="*70)

# Find best epsilon
best_eps = max(results_combined.keys(), key=lambda e: results_combined[e]['success_rate'])
best_rate = results_combined[best_eps]['success_rate']

print(f"\nBest combined (γ={best_gamma}, ε={best_eps}): {best_rate:.0f}% success")

if best_rate > 80:
    print("✓✓ SUCCESS! Combined spatial + ridge prevents degeneracy!")
elif best_rate > results_spatial[best_gamma]['success_rate']:
    improvement = best_rate - results_spatial[best_gamma]['success_rate']
    print(f"⚠ Improvement over spatial alone (+{improvement:.0f}%) but insufficient")
else:
    print("✗ Ridge did not add value to spatial weighting")

## Part 7: Visualization - Comparison of All Approaches

In [None]:
# Compile all results
methods = [
    ('Standard\n(D²)ᵀD²', sum(r['correct'] for r in results_standard) / n_trials * 100),
    (f'Spatial\nγ={best_gamma}', results_spatial[best_gamma]['success_rate']),
    (f'Frequency\n{best_cutoffs}', results_freq[best_cutoffs]['success_rate']),
    (f'Combined\nγ={best_gamma}, ε={best_eps}', results_combined[best_eps]['success_rate'])
]

method_names = [m[0] for m in methods]
success_rates = [m[1] for m in methods]

# Plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Left: Success rates
colors = ['red' if r < 50 else 'orange' if r < 80 else 'green' for r in success_rates]
bars = ax1.bar(range(len(methods)), success_rates, color=colors, alpha=0.7, edgecolor='black', linewidth=1.5)
ax1.axhline(80, color='green', linestyle='--', linewidth=2, label='Target: 80%')
ax1.axhline(50, color='orange', linestyle='--', linewidth=1, alpha=0.5)
ax1.set_ylabel('Success Rate (%)', fontsize=12, fontweight='bold')
ax1.set_title('Problem-Informed Q-Matrix Design\nComparison of Approaches', 
              fontsize=13, fontweight='bold')
ax1.set_xticks(range(len(methods)))
ax1.set_xticklabels(method_names, fontsize=10)
ax1.set_ylim([0, 105])
ax1.legend(fontsize=10)
ax1.grid(True, alpha=0.3, axis='y')

# Add value labels on bars
for i, (bar, rate) in enumerate(zip(bars, success_rates)):
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height + 2,
            f'{rate:.0f}%', ha='center', va='bottom', fontweight='bold', fontsize=11)

# Right: Summary table
ax2.axis('off')
summary_text = f"""
SUMMARY: Problem-Informed Q-Matrix Design
{'='*50}

Research Question:
Can problem-class knowledge be incorporated into
fixed Q-matrix to reduce degeneracy while maintaining
orthogonal invariance?

Approaches Tested:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Standard (D²)ᵀD²: {success_rates[0]:.0f}%
   → Baseline (from Part 11D)

2. Spatial weighting: {success_rates[1]:.0f}%
   → Penalize curvature between peaks
   → Q = (D²)ᵀWD² with γ={best_gamma}

3. Frequency filtering: {success_rates[2]:.0f}%
   → Band-pass in frequency domain
   → Q = FᵀΛF with cutoffs {best_cutoffs}

4. Combined (spatial+ridge): {success_rates[3]:.0f}%
   → Spatial location + amplitude control
   → Q + εI with γ={best_gamma}, ε={best_eps}

{'✓✓ SUCCESS!' if max(success_rates) > 80 else '⚠ INSUFFICIENT' if max(success_rates) > success_rates[0] else '✗ NO IMPROVEMENT'}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Best approach: {method_names[success_rates.index(max(success_rates))]}
Success rate: {max(success_rates):.0f}%
Improvement: {max(success_rates) - success_rates[0]:+.0f} percentage points over baseline

Key Insight:
{'Problem-informed Q can prevent degeneracy!' if max(success_rates) > 80 else 'Problem-informed Q helps but insufficient.' if max(success_rates) > success_rates[0] + 10 else 'Generic Q limitations persist even with priors.'}
"""

ax2.text(0.05, 0.95, summary_text, transform=ax2.transAxes,
        fontsize=9, verticalalignment='top', fontfamily='monospace',
        bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))

plt.tight_layout()
plt.savefig('../problem_informed_Q_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

print("\n✓ Figure saved: ../problem_informed_Q_comparison.png")

## Part 8: Analysis and Conclusions

Let's interpret the results and draw conclusions about problem-informed Q-matrix design.

In [None]:
print("="*70)
print("ANALYSIS: Problem-Informed Q-Matrix Design")
print("="*70)
print()

baseline = success_rates[0]
best_approach = method_names[success_rates.index(max(success_rates))]
best_rate = max(success_rates)
improvement = best_rate - baseline

print(f"Baseline (standard smoothness): {baseline:.0f}%")
print(f"Best approach: {best_approach.replace(chr(10), ' ')}")
print(f"Best success rate: {best_rate:.0f}%")
print(f"Improvement: +{improvement:.0f} percentage points")
print()

# Conclusions
print("CONCLUSIONS:")
print("-"*70)
print()

if best_rate >= 80:
    print("✓✓ SUCCESS! Problem-informed Q-matrix design SOLVES degeneracy!")
    print()
    print("Key findings:")
    print(f"  1. {best_approach.replace(chr(10), ' ')} achieves {best_rate:.0f}% reliability")
    print("  2. Maintains orthogonal invariance (fixed Q form)")
    print("  3. Incorporates problem-class knowledge without breaking invariance")
    print("  4. Demonstrates that 'informed invariant' regularizers can be designed")
    print()
    print("Implications:")
    print("  → Problem-specific Q design is a viable research direction")
    print("  → Trade-off (invariance ↔ effectiveness) can be partially resolved")
    print("  → Future work: Develop Q-design principles for different problem classes")
    
elif best_rate > baseline + 15:
    print("⚠ SUBSTANTIAL IMPROVEMENT but still insufficient for reliability")
    print()
    print("Key findings:")
    print(f"  1. Problem-informed Q improves success by {improvement:+.0f} percentage points")
    print("  2. Still falls short of 80% reliability threshold")
    print("  3. Suggests problem-informed approach is on right track")
    print("  4. May need more sophisticated Q design or additional constraints")
    print()
    print("Implications:")
    print("  → Hybrid approaches worth exploring (problem-informed Q + other methods)")
    print("  → May still need to break invariance in extreme cases")
    print("  → Future work: Combine with good initialization or post-hoc validation")
    
else:
    print("✗ MINIMAL OR NO IMPROVEMENT - Generic Q limitations persist")
    print()
    print("Key findings:")
    print(f"  1. Problem-informed Q changes success by only {improvement:+.0f} percentage points")
    print("  2. Fundamental limitation of fixed Q form confirmed")
    print("  3. Sum structure weakness cannot be overcome by spatial/frequency priors")
    print("  4. Invariance-degeneracy trade-off appears fundamental")
    print()
    print("Implications:")
    print("  → Must accept breaking invariance for reliable deconvolution")
    print("  → Profile-weighted or minimum-amplitude penalties necessary")
    print("  → No 'perfect' invariant regularizer exists for this problem")
    print("  → Future work: Characterize how little invariance-breaking is needed")

print()
print("="*70)
print("NEXT STEPS:")
print("="*70)
print()

if best_rate >= 80:
    print("1. Document successful Q-design principles")
    print("2. Test on varied datasets (different separations, overlaps, SNR)")
    print("3. Develop general framework for problem-class-specific Q design")
    print("4. Compare with profile-weighted approaches (invariance vs effectiveness)")
elif best_rate > baseline + 15:
    print("1. Explore more sophisticated Q designs (adapt work from here)")
    print("2. Test hybrid: problem-informed Q + good initialization")
    print("3. Investigate combining with post-hoc validation methods")
    print("4. Consider: Is 80% threshold achievable with invariant Q?")
else:
    print("1. Document that problem-informed Q insufficient for this problem")
    print("2. Formalize fundamental limitation theorem")
    print("3. Explore 'mostly invariant' approaches (minimal invariance breaking)")
    print("4. Accept profile-weighted methods as necessary for reliability")

print()
print("="*70)

## Appendix: Why These Q Designs Maintain Invariance

All approaches tested maintain the form $S(C) = \text{tr}(CQC^T)$ with fixed $Q$:

### 1. Spatially-Weighted Smoothness

$$Q = (D^2)^T W D^2$$

- $W$ is diagonal (spatial weights)
- $D^2$ is differential operator
- $Q$ is symmetric positive semi-definite
- **Fixed**: $W$ designed once based on expected peaks, doesn't change with solution

**Invariance**: For orthogonal $R$:
$$\text{tr}((RC)Q(RC)^T) = \text{tr}(RCQ C^T R^T) = \text{tr}(CQC^T R^T R) = \text{tr}(CQC^T)$$
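
The identity can also be confirmed numerically. The sketch below uses a random symmetric positive semi-definite $Q$ and a random orthogonal $R$ obtained from a QR decomposition; both are stand-ins for illustration, not the matrices designed in this notebook.

```python
import numpy as np

rng = np.random.default_rng(2)
k, K = 3, 40
C = rng.standard_normal((k, K))

# Any fixed symmetric PSD Q works; here a random one
A = rng.standard_normal((K, K))
Q = A.T @ A

# Random orthogonal matrix from a QR decomposition
R, _ = np.linalg.qr(rng.standard_normal((k, k)))

S_before = np.trace(C @ Q @ C.T)
S_after = np.trace((R @ C) @ Q @ (R @ C).T)
print(S_before, S_after)
```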

### 2. Frequency-Domain Band-Pass

$$Q = F^T \Lambda F$$

- $F$ is DCT transform matrix (orthogonal)
- $\Lambda$ is diagonal (frequency weights)
- $Q$ is symmetric positive semi-definite
- **Fixed**: Frequency bands determined by expected peak widths, constant

**Invariance**: Same as above - $Q$ is fixed, so orthogonal transformations preserve penalty.

### 3. Combined Spatial + Ridge

$$Q = (D^2)^T W D^2 + \epsilon I$$

- Linear combination of two fixed $Q$ matrices
- Still has $\text{tr}(CQC^T)$ form
- **Fixed**: Both $W$ and $\epsilon$ set based on priors

**Invariance**: Preserved by linearity.

---

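**Contrast with profile-weighted**: The penalty would be $\sum_i \|p_i\|^2\, c_i (D^2)^T D^2 c_i^T$, so each component gets its own $Q_i = \|p_i\|^2 (D^2)^T D^2$ with a weight that depends on the optimized $P$. This breaks invariance: $Q_i$ changes during optimization, and the penalty no longer has the single-$Q$ form $\text{tr}(CQC^T)$, so a rotation mixes components that carry different weights.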
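
A small numerical illustration of the contrast, under the assumption that the profile-weighted penalty is $\sum_i \|p_i\|^2\, c_i Q c_i^T$ with $Q = (D^2)^T D^2$. The function name `profile_weighted_penalty`, the random data, and the 45° rotation are illustrative choices. The rotation is applied as $C \to RC$, $P \to PR^T$, which leaves the product $PC$ (and hence the data fit) unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
n_q, k, K = 30, 2, 40
P = rng.random((n_q, k))
C = rng.random((k, K))

D2 = np.zeros((K - 2, K))
for i in range(K - 2):
    D2[i, i:i + 3] = [1, -2, 1]
Q = D2.T @ D2

def profile_weighted_penalty(P, C):
    """sum_i ||p_i||^2 * c_i Q c_i^T, with weights taken from the columns of P."""
    weights = np.sum(P**2, axis=0)                      # ||p_i||^2 per component
    per_component = np.einsum('ik,kl,il->i', C, Q, C)   # c_i Q c_i^T per row
    return float(weights @ per_component)

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Rotate C and counter-rotate P so that P @ C is unchanged
C_rot = R @ C
P_rot = P @ R.T

fixed_before = np.trace(C @ Q @ C.T)
fixed_after = np.trace(C_rot @ Q @ C_rot.T)
pw_before = profile_weighted_penalty(P, C)
pw_after = profile_weighted_penalty(P_rot, C_rot)

print("fixed Q:          ", fixed_before, fixed_after)
print("profile-weighted: ", pw_before, pw_after)
```

The fixed-$Q$ penalty is identical before and after the rotation, while the profile-weighted penalty generally is not.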

---

## Summary

This notebook explored whether **problem-specific knowledge** can be incorporated into Q-matrix design to reduce degeneracy while maintaining orthogonal invariance.

**Three approaches tested**:
1. Spatially-weighted smoothness (penalize curvature in unexpected regions)
2. Frequency-domain band-pass (target expected frequency content)
3. Combined spatial + ridge (location + amplitude control)

**Key insight**: All maintain $S(C) = \text{tr}(CQC^T)$ form with **fixed** $Q$, so invariance preserved.

**Results**: [See Part 7 for experimental outcomes]

**Relation to broader research**:
- Tests Open Research Question #3 from [orthogonal_invariance_journey.md](orthogonal_invariance_journey.md)
- Explores middle ground: generic Q (ridge) vs fully adaptive (profile-weighted)
- Informs debate: Can invariance + effectiveness coexist?

**Next steps**: [See Part 8 for recommendations based on results]