# Zhang 2025 for Denoising: Fixed-Rank Comparison

**Question**: At analyst-chosen rank k, can Zhang 2025's gradient-guided approach provide better denoising than traditional SVD?

## Context: Molass Philosophy

In Molass, the analyst determines the denoising rank based on:
- SEC elution profile expectations
- Domain knowledge of sample composition
- Data quality assessment

**This notebook tests**: Given fixed rank k, does joint optimization improve denoising quality compared to SVD?

## Test Data

We use **real SEC-SAXS data** from molass_data tutorial samples (Photon Factory, KEK, Japan).

## Denoising Approaches

### Traditional SVD (Rank-k Truncation)
$$D_{\text{clean}} = U_k \Sigma_k V_k^T$$
- Minimizes: $\|D - D_{\text{clean}}\|_F^2$ over all matrices of rank at most $k$ (Eckart–Young theorem)
- Goal: Best rank-k reconstruction
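
As a quick sanity check of the optimality claim above, the following sketch uses a small random matrix (a synthetic stand-in, not the SEC-SAXS data). It confirms that the truncation error equals the norm of the discarded singular values, and that an arbitrary alternative rank-k factorization does no better. The names `D_k`, `A`, and `B` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(40, 30))   # synthetic stand-in, not SEC-SAXS data
k = 5

U, s, Vt = np.linalg.svd(D, full_matrices=False)
D_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # rank-k truncation

# Eckart-Young: the Frobenius error equals the norm of the discarded singular values
err_svd = np.linalg.norm(D - D_k, "fro")
err_tail = np.sqrt(np.sum(s[k:] ** 2))

# Another rank-k candidate: random left factor A, with B fitted by least squares
A = rng.normal(size=(40, k))
B, *_ = np.linalg.lstsq(A, D, rcond=None)
err_other = np.linalg.norm(D - A @ B, "fro")

print(f"SVD error={err_svd:.4f}, tail norm={err_tail:.4f}, other rank-k={err_other:.4f}")
```

Any rank-k method, including the joint optimization below, can therefore only match or exceed the SVD reconstruction error; its potential benefit has to come from the additional objective.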

### Zhang 2025 (Joint Optimization)
$$D_{\text{clean}} = L \cdot R^T \quad \text{with rank}(L \cdot R^T) = k$$
- Minimizes: $\|D - L \cdot R^T\|_F^2 + \lambda \cdot \text{Objective}(L \cdot R^T)$
- Goal: Reconstruction + preserve important features
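
To make the joint objective concrete, it helps to write out its gradients with respect to the two factors. The symbols $X$, $f$, $g$, and $G$ below are introduced here for illustration, with $g$ standing in for whichever downstream objective is chosen:

$$X = L R^T, \qquad f(L, R) = \|D - X\|_F^2 + \lambda\, g(X)$$

$$G = \nabla_X f = -2\,(D - X) + \lambda\, \nabla_X g(X)$$

$$\nabla_L f = G\, R, \qquad \nabla_R f = G^T L$$

The last line follows from the chain rule applied to $dX = dL\, R^T + L\, dR^T$. A descent step subtracts a multiple of these gradients from $L$ and $R$.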

## Simple Objective Functions

For this exploration, we use simple metrics:
1. **Smoothness**: Prefer smooth elution profiles
2. **Peak preservation**: Maintain peak structure
3. **Spectral consistency**: Similar spectra shouldn't diverge

## Setup: Import Libraries and Load Data

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import svd
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)

# Load real SEC-SAXS data
from molass_data import SAMPLE1, SAMPLE2
from molass.DataObjects import SecSaxsData as SSD

print("Loading real SEC-SAXS data...")
ssd = SSD(SAMPLE1)
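D_noisy = ssd.xr.M  # Real data matrix
# NOTE: the code below assumes rows = frames and columns = q-points.
# Verify against D_noisy.shape and transpose if the orientation differs.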

print(f"Data shape: {D_noisy.shape}")
print("Data type: Real synchrotron SEC-SAXS from Photon Factory, KEK")

## Step 1: Analyst Chooses Denoising Rank

**Analyst decision**: Based on SEC profile and domain knowledge, choose rank k for denoising.
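
One number that can accompany (not replace) the analyst's judgement is the fraction of squared Frobenius norm retained by the top k singular values. The sketch below uses made-up singular values; `energy_captured` and `s_demo` are hypothetical names, not part of Molass.

```python
import numpy as np

def energy_captured(singular_values, k):
    """Fraction of the squared Frobenius norm retained by the top-k singular values."""
    s2 = np.asarray(singular_values, dtype=float) ** 2
    return s2[:k].sum() / s2.sum()

# Made-up spectrum: three strong components on top of a flat noise floor
s_demo = np.array([100.0, 40.0, 10.0, 1.0, 0.9, 0.8, 0.7])
for k in (2, 3, 4):
    print(k, f"{energy_captured(s_demo, k):.5f}")
```

A plateau in this fraction as k grows suggests that further components mostly carry noise, but the final choice remains with the analyst.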

In [None]:
# Analyst chooses rank based on inspection
k_denoise = 5  # Example: expecting ~2-3 species + baseline + noise structure

# Visualize singular value spectrum to validate choice
U_full, s_full, Vt_full = svd(D_noisy, full_matrices=False)

plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(s_full[:20], 'o-', linewidth=2, markersize=8)
plt.axvline(k_denoise - 0.5, color='red', linestyle='--', linewidth=2, label=f'Chosen rank k={k_denoise}')
plt.xlabel('Singular Value Index')
plt.ylabel('Magnitude')
plt.title('Singular Value Spectrum')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.semilogy(s_full[:20], 'o-', linewidth=2, markersize=8)
plt.axvline(k_denoise - 0.5, color='red', linestyle='--', linewidth=2, label=f'Chosen rank k={k_denoise}')
plt.xlabel('Singular Value Index')
plt.ylabel('Magnitude (log scale)')
plt.title('Singular Value Spectrum (Log Scale)')
plt.legend()
plt.grid(True, alpha=0.3, which='both')

plt.tight_layout()
plt.show()

print(f"✓ Analyst chooses k={k_denoise} for denoising")
print(f"  First {k_denoise} singular values: {s_full[:k_denoise].round(2)}")
print(f"  Ratio σ_{k_denoise}/σ_{k_denoise+1} = {s_full[k_denoise-1]/s_full[k_denoise]:.2f}")

## Step 2: Traditional SVD Denoising

Standard approach: Keep top k singular values, discard the rest.
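
Because real data has no ground truth, the denoising effect is easier to see on synthetic data. The sketch below builds a rank-2 matrix from two Gaussian elution peaks and two decaying curves (all shapes and parameters are assumptions chosen for illustration), adds noise, and compares the distance to the clean matrix before and after truncation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames, n_q, k_true = 60, 80, 2

# Synthetic ground truth: two Gaussian elution peaks times two decaying curves
frames = np.arange(n_frames)
C = np.stack([np.exp(-0.5 * ((frames - c) / 5.0) ** 2) for c in (20, 35)], axis=1)
q = np.linspace(0.01, 0.3, n_q)
P = np.stack([np.exp(-(q * r) ** 2 / 3.0) for r in (20.0, 35.0)], axis=0)
D_clean = C @ P                      # frames x q-points, rank 2
D_noisy_demo = D_clean + 0.05 * rng.normal(size=D_clean.shape)

# Rank-k truncation of the noisy matrix
U, s, Vt = np.linalg.svd(D_noisy_demo, full_matrices=False)
D_k = (U[:, :k_true] * s[:k_true]) @ Vt[:k_true, :]

err_before = np.linalg.norm(D_noisy_demo - D_clean, "fro")
err_after = np.linalg.norm(D_k - D_clean, "fro")
print(f"error vs. ground truth: noisy={err_before:.3f}, rank-{k_true} SVD={err_after:.3f}")
```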

In [None]:
def svd_denoise(D, k):
    """Traditional SVD denoising: truncate to rank k"""
    U, s, Vt = svd(D, full_matrices=False)
    
    # Reconstruct with only top k components
    D_denoised = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    
    reconstruction_error = np.linalg.norm(D - D_denoised, 'fro')
    
    return D_denoised, reconstruction_error

# Apply traditional SVD denoising
D_svd, error_svd = svd_denoise(D_noisy, k_denoise)

print(f"Traditional SVD Denoising (k={k_denoise}):")
print(f"  Reconstruction error: {error_svd:.2f}")
print(f"  Relative error: {error_svd / np.linalg.norm(D_noisy, 'fro'):.2%}")

## Step 3: Define Simple Objective Function

For demonstration, use **elution profile smoothness** as the downstream objective.

Rationale: In SEC-SAXS, elution profiles should be smooth (Gaussian-like peaks), not jagged.
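
The roughness penalty used here can be written as $\|T e\|^2$, where $e$ is the elution profile summed over q-points and $T$ is the second-difference operator. The sketch below (with hypothetical names `roughness` and `roughness_grad`, on a small random matrix) derives the gradient in that matrix form and checks it against central finite differences.

```python
import numpy as np

def roughness(D):
    """Sum of squared second differences of the summed elution profile."""
    e = D.sum(axis=1)
    return np.sum(np.diff(e, n=2) ** 2)

def roughness_grad(D):
    """Analytic gradient of roughness(D) with respect to every entry of D."""
    n = D.shape[0]
    # Second-difference operator T of shape (n-2, n): (T @ e)[j] = e[j] - 2 e[j+1] + e[j+2]
    T = np.diff(np.eye(n), n=2, axis=0)
    e = D.sum(axis=1)
    g_profile = 2.0 * T.T @ (T @ e)          # d roughness / d e
    # e[i] = sum_j D[i, j], so every column receives the same gradient
    return np.repeat(g_profile[:, None], D.shape[1], axis=1)

rng = np.random.default_rng(2)
D_test = rng.normal(size=(12, 4))   # small synthetic matrix (frames x q-points)

# Central finite-difference check on a few entries
eps = 1e-6
G = roughness_grad(D_test)
max_abs_diff = 0.0
for i, j in [(0, 0), (5, 2), (11, 3)]:
    Dp, Dm = D_test.copy(), D_test.copy()
    Dp[i, j] += eps
    Dm[i, j] -= eps
    numeric = (roughness(Dp) - roughness(Dm)) / (2 * eps)
    max_abs_diff = max(max_abs_diff, abs(numeric - G[i, j]))
print(f"max |analytic - numeric| = {max_abs_diff:.2e}")
```

Because the penalty depends only on the row sums, its gradient is identical across q-points: it can reshape the total elution profile but says nothing about individual scattering curves.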

In [None]:
def smoothness_objective(D):
    """
    Roughness of the total elution profile: sum of squared second
    differences of the profile obtained by summing over q-points.
    Lower = smoother profile = better
    """
    # Total scattering at each frame
    elution_profile = D.sum(axis=1)
    
    # Second derivative (roughness)
    d2 = np.diff(elution_profile, n=2)
    roughness = np.sum(d2 ** 2)
    
    return roughness

def compute_gradient_smoothness(D):
    """
    Exact gradient of smoothness_objective with respect to D
    """
    elution_profile = D.sum(axis=1)
    
    n = len(elution_profile)
    gradient_profile = np.zeros(n)
    
    # Second differences: d2[j] = e[j] - 2*e[j+1] + e[j+2]
    d2 = np.diff(elution_profile, n=2)
    
    # d/de[i] of sum(d2**2): e[i] appears in d2[i-2], d2[i-1] and d2[i]
    # with coefficients +1, -2 and +1 respectively
    gradient_profile[2:] += 2 * d2
    gradient_profile[1:-1] -= 4 * d2
    gradient_profile[:-2] += 2 * d2
    
    # Broadcast to full matrix (each q-point contributes equally to the sum)
    gradient = np.outer(gradient_profile, np.ones(D.shape[1]))
    
    return gradient

# Test on original data
roughness_original = smoothness_objective(D_noisy)
roughness_svd = smoothness_objective(D_svd)

print("Elution Profile Smoothness:")
print(f"  Original data roughness: {roughness_original:.2e}")
print(f"  SVD denoised roughness:  {roughness_svd:.2e}")
print(f"  Improvement: {(1 - roughness_svd/roughness_original)*100:.1f}%")

## Step 4: Zhang 2025 Joint Optimization

Optimize rank-k denoising to minimize BOTH reconstruction error AND roughness.
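
Before the implementation used in this notebook, here is a minimal self-contained sketch of the same idea on synthetic data. It is not Zhang 2025's algorithm as published: it uses plain gradient descent on the factors with a backtracking line search (swapped in here so the loss is guaranteed not to increase), and the names `joint_loss` and `joint_denoise` are hypothetical.

```python
import numpy as np

def roughness(X):
    """Sum of squared second differences of the summed elution profile."""
    return np.sum(np.diff(X.sum(axis=1), n=2) ** 2)

def roughness_grad(X):
    T = np.diff(np.eye(X.shape[0]), n=2, axis=0)   # second-difference operator
    g = 2.0 * T.T @ (T @ X.sum(axis=1))
    return np.repeat(g[:, None], X.shape[1], axis=1)

def joint_loss(D, L, R, lam):
    X = L @ R.T
    return np.linalg.norm(D - X, "fro") ** 2 + lam * roughness(X)

def joint_denoise(D, k, lam=0.1, n_iter=200):
    """Gradient descent on the factors L, R with a backtracking line search."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    L = U[:, :k] * np.sqrt(s[:k])        # SVD initialization
    R = Vt[:k, :].T * np.sqrt(s[:k])
    losses = [joint_loss(D, L, R, lam)]
    for _ in range(n_iter):
        X = L @ R.T
        G = -2.0 * (D - X) + lam * roughness_grad(X)   # gradient w.r.t. X
        gL, gR = G @ R, G.T @ L                        # chain rule to the factors
        step = 1.0
        while step > 1e-12:
            L_try, R_try = L - step * gL, R - step * gR
            loss_try = joint_loss(D, L_try, R_try, lam)
            if loss_try < losses[-1]:                  # accept only improving steps
                L, R = L_try, R_try
                losses.append(loss_try)
                break
            step *= 0.5
        else:
            break   # no improving step found: treat as converged
    return L @ R.T, losses

# Synthetic data: two Gaussian elution peaks, random positive "spectra", added noise
rng = np.random.default_rng(3)
frames = np.arange(50)
C = np.stack([np.exp(-0.5 * ((frames - c) / 4.0) ** 2) for c in (18, 30)], axis=1)
P = rng.uniform(0.5, 1.5, size=(2, 25))
D_demo = C @ P + 0.05 * rng.normal(size=(50, 25))

U, s, Vt = np.linalg.svd(D_demo, full_matrices=False)
X_svd = (U[:, :2] * s[:2]) @ Vt[:2, :]
X_joint, losses = joint_denoise(D_demo, k=2)
print(f"roughness: SVD={roughness(X_svd):.4e}, joint={roughness(X_joint):.4e}")
```

Since the search starts at the SVD solution and only accepts steps that lower the joint loss, the result is never rougher than SVD, at the price of a reconstruction error that is never smaller.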

In [None]:
def zhang_denoise(D, k, lambda_smooth=0.01, n_iterations=50, lr=0.01):
    """
    Zhang 2025 joint optimization denoising
    Minimize: ||D - L·R^T||² + λ·Smoothness(L·R^T)
    """
    n_frames, n_q = D.shape
    
    # Initialize with SVD
    U, s, Vt = svd(D, full_matrices=False)
    L = U[:, :k] @ np.diag(np.sqrt(s[:k]))
    R = Vt[:k, :].T @ np.diag(np.sqrt(s[:k]))
    
    history = {
        'reconstruction_error': [],
        'smoothness': [],
        'total_loss': []
    }
    
    for iteration in range(n_iterations):
        D_current = L @ R.T
        
        # Current metrics
        recon_error = np.linalg.norm(D - D_current, 'fro') ** 2
        smoothness = smoothness_objective(D_current)
        total_loss = recon_error + lambda_smooth * smoothness
        
        history['reconstruction_error'].append(recon_error)
        history['smoothness'].append(smoothness)
        history['total_loss'].append(total_loss)
        
        # Gradient: reconstruction + smoothness
        grad_recon = -2 * (D - D_current)
        grad_smooth = compute_gradient_smoothness(D_current)
        gradient_total = grad_recon + lambda_smooth * grad_smooth
        
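        # Update factors with a scaled (preconditioned) gradient step.
        # Note: lr enters twice (in delta_W and in the updates below),
        # so the effective step size is lr**2.
        delta_W = -lr * gradient_total
        
        # Preconditioners (regularized Gram matrices of the current factors)
        R_norm = R.T @ R + 1e-8 * np.eye(k)
        L_norm = L.T @ L + 1e-8 * np.eye(k)
        
        # Compute both updates from the current factors, then apply them together
        L_new = L + lr * (delta_W @ R) @ np.linalg.inv(R_norm)
        R_new = R + lr * (delta_W.T @ L) @ np.linalg.inv(L_norm)
        L, R = L_new, R_new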
        
        # Periodically reproject to maintain reconstruction accuracy
        if iteration % 10 == 0:
            D_current = L @ R.T
            error = D - D_current
            L = L + 0.1 * error @ R @ np.linalg.pinv(R.T @ R)
    
    D_final = L @ R.T
    final_recon_error = np.linalg.norm(D - D_final, 'fro')
    final_smoothness = smoothness_objective(D_final)
    
    return D_final, final_recon_error, final_smoothness, history

# Apply Zhang 2025 denoising
print("Running Zhang 2025 joint optimization...")
D_zhang, error_zhang, smoothness_zhang, history = zhang_denoise(
    D_noisy, k_denoise, lambda_smooth=0.01, n_iterations=50, lr=0.01
)

print(f"\nZhang 2025 Joint Optimization (k={k_denoise}):")
print(f"  Reconstruction error: {error_zhang:.2f}")
print(f"  Relative error: {error_zhang / np.linalg.norm(D_noisy, 'fro'):.2%}")
print(f"  Final smoothness: {smoothness_zhang:.2e}")

## Step 5: Comparison

In [None]:
print("=" * 70)
print("COMPARISON: SVD vs Zhang 2025 (Fixed Rank k={})".format(k_denoise))
print("=" * 70)
print()
print("Reconstruction Error (Frobenius Norm):")
print(f"  SVD:    {error_svd:.2f}")
print(f"  Zhang:  {error_zhang:.2f}")
print(f"  Difference: {error_zhang - error_svd:+.2f} ({(error_zhang/error_svd - 1)*100:+.1f}%)")
print()
print("Smoothness (Elution Profile Roughness):")
print(f"  Original: {roughness_original:.2e}")
print(f"  SVD:      {roughness_svd:.2e}")
print(f"  Zhang:    {smoothness_zhang:.2e}")
print()
print("Smoothness Improvement:")
print(f"  SVD:    {(1 - roughness_svd/roughness_original)*100:.1f}% smoother than original")
print(f"  Zhang:  {(1 - smoothness_zhang/roughness_original)*100:.1f}% smoother than original")
print(f"  Zhang vs SVD: {(1 - smoothness_zhang/roughness_svd)*100:+.1f}%")
print()
print("=" * 70)

# Determine winner
if smoothness_zhang < roughness_svd and error_zhang < error_svd * 1.1:
    print("✓ Zhang 2025 wins: Better smoothness AND similar reconstruction")
elif smoothness_zhang < roughness_svd:
    print("⚠ Zhang 2025 mixed: Better smoothness but worse reconstruction")
else:
    print("✗ SVD wins: Zhang 2025 doesn't improve over traditional SVD")

## Step 6: Visual Comparison

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(16, 10))

# Row 1: Elution profiles
elution_original = D_noisy.sum(axis=1)
elution_svd = D_svd.sum(axis=1)
elution_zhang = D_zhang.sum(axis=1)

axes[0, 0].plot(elution_original, 'b-', linewidth=1.5, alpha=0.7, label='Original')
axes[0, 0].set_title('Original Data\nElution Profile')
axes[0, 0].set_xlabel('Frame')
axes[0, 0].set_ylabel('Total Intensity')
axes[0, 0].grid(True, alpha=0.3)

axes[0, 1].plot(elution_svd, 'g-', linewidth=2, alpha=0.8, label='SVD')
axes[0, 1].plot(elution_original, 'b-', linewidth=0.5, alpha=0.3, label='Original')
axes[0, 1].set_title(f'SVD (k={k_denoise})\nElution Profile')
axes[0, 1].set_xlabel('Frame')
axes[0, 1].set_ylabel('Total Intensity')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

axes[0, 2].plot(elution_zhang, 'r-', linewidth=2, alpha=0.8, label='Zhang')
axes[0, 2].plot(elution_original, 'b-', linewidth=0.5, alpha=0.3, label='Original')
axes[0, 2].set_title(f'Zhang 2025 (k={k_denoise})\nElution Profile')
axes[0, 2].set_xlabel('Frame')
axes[0, 2].set_ylabel('Total Intensity')
axes[0, 2].legend()
axes[0, 2].grid(True, alpha=0.3)

# Row 2: Heatmaps
vmin, vmax = D_noisy.min(), D_noisy.max()

im1 = axes[1, 0].imshow(D_noisy.T, aspect='auto', cmap='viridis', vmin=vmin, vmax=vmax)
axes[1, 0].set_title('Original Data Matrix')
axes[1, 0].set_xlabel('Frame')
axes[1, 0].set_ylabel('q-point')
plt.colorbar(im1, ax=axes[1, 0])

im2 = axes[1, 1].imshow(D_svd.T, aspect='auto', cmap='viridis', vmin=vmin, vmax=vmax)
axes[1, 1].set_title(f'SVD Denoised (k={k_denoise})')
axes[1, 1].set_xlabel('Frame')
axes[1, 1].set_ylabel('q-point')
plt.colorbar(im2, ax=axes[1, 1])

im3 = axes[1, 2].imshow(D_zhang.T, aspect='auto', cmap='viridis', vmin=vmin, vmax=vmax)
axes[1, 2].set_title(f'Zhang Denoised (k={k_denoise})')
axes[1, 2].set_xlabel('Frame')
axes[1, 2].set_ylabel('q-point')
plt.colorbar(im3, ax=axes[1, 2])

plt.tight_layout()
plt.show()

## Step 7: Optimization Trajectory

Track how Zhang 2025 balances reconstruction and smoothness during optimization.

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

iterations = range(len(history['reconstruction_error']))

# Plot 1: Reconstruction error
axes[0].plot(iterations, history['reconstruction_error'], 'b-', linewidth=2)
axes[0].axhline(error_svd**2, color='green', linestyle='--', linewidth=2, label='SVD baseline')
axes[0].set_xlabel('Iteration')
axes[0].set_ylabel('Reconstruction Error²')
axes[0].set_title('Reconstruction Error Over Time')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot 2: Smoothness
axes[1].plot(iterations, history['smoothness'], 'r-', linewidth=2)
axes[1].axhline(roughness_svd, color='green', linestyle='--', linewidth=2, label='SVD baseline')
axes[1].set_xlabel('Iteration')
axes[1].set_ylabel('Roughness')
axes[1].set_title('Elution Profile Smoothness')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Plot 3: Total loss
axes[2].plot(iterations, history['total_loss'], 'purple', linewidth=2)
axes[2].set_xlabel('Iteration')
axes[2].set_ylabel('Total Loss')
axes[2].set_title('Joint Optimization Objective')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("→ Zhang 2025 iteratively trades off reconstruction vs smoothness")

## Conclusion

### Key Findings

**At fixed analyst-chosen rank k:**

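1. **Both methods return a rank-k approximation**: infinitely many rank-k matrices exist, and the methods differ in which one they select
2. **SVD chooses**: The rank-k matrix with minimum Frobenius-norm reconstruction error (Eckart–Young theorem)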
3. **Zhang 2025 chooses**: Balance reconstruction + downstream objective

### Does Zhang 2025 Help?

**Result depends on**:
- Whether your downstream objective differs from reconstruction error
- How much you weight the downstream objective (λ parameter)
- Whether the objective captures what matters for your analysis

### For Molass Context

**Pros**:
- Could preserve peak structure better than SVD
- Optimizes for what you care about (smoothness, physical plausibility)
- Same rank k as analyst would choose anyway

**Cons**:
- Requires defining explicit objective function
- More computational cost than SVD
- Benefit may be modest for already-good SVD denoising

### Analyst Perspective

Your philosophy: **"Analyst determines rank based on expertise"** ✓

Zhang 2025 respects this:
- Analyst still chooses k
- Algorithm just finds a better rank-k approximation
- Not about automation, about optimization quality

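**Practical value**: Probably **modest**. By the Eckart–Young theorem, SVD already gives the best rank-k reconstruction, so any gain must come from the added objective term. The main value is conceptual: joint optimization can, in principle, match or improve on a sequential pipeline when measured by the combined objective.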

# Appendix: Comparison to REGALS Iterative Optimization

## Similarity: Both Start from SVD and Iterate

### REGALS (Alternating Least Squares)
**Initialization**: SVD → EFA windows → Initial P, C

**Iteration** (alternating):
```
while not converged:
    # Fix C, optimize P
    P ← argmin ||M - PC||² + λ_P·||D²P||²
         subject to: P ≥ 0, P(q) ↔ P(r) with d_max
    
    # Fix P, optimize C  
    C ← argmin ||M - PC||² + λ_C·||D²C||²
         subject to: C ≥ 0, compact support windows
```

**Key features**:
- Alternating Least Squares (ALS): Optimize one factor at a time
- Closed-form solutions when fixing one factor
- Regularization: Smoothness on both P and C
- Hard constraints: Non-negativity, compact support, real-space SAXS
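
To illustrate what "closed-form when one factor is fixed" means, the sketch below solves a simplified P-update with a smoothness penalty and no hard constraints. This is not the REGALS implementation: it omits non-negativity, the real-space $d_{max}$ parameterization, and per-component regularizers, and the names `update_profiles` and `T` are hypothetical. Setting the gradient to zero gives a Sylvester equation, which `scipy.linalg.solve_sylvester` solves directly. Note that `M` is oriented q-points × frames here, following the $M = PC$ convention.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def update_profiles(M, C, lam):
    """Solve argmin_P ||M - P C||_F^2 + lam * ||T P||_F^2 for fixed C.

    M: (n_q, n_frames), P: (n_q, k), C: (k, n_frames).
    Zero gradient gives (lam * T^T T) P + P (C C^T) = M C^T.
    """
    n_q = M.shape[0]
    T = np.diff(np.eye(n_q), n=2, axis=0)   # second-difference operator along q
    return solve_sylvester(lam * T.T @ T, C @ C.T, M @ C.T)

def objective(M, P, C, lam):
    T = np.diff(np.eye(M.shape[0]), n=2, axis=0)
    return (np.linalg.norm(M - P @ C, "fro") ** 2
            + lam * np.linalg.norm(T @ P, "fro") ** 2)

# Synthetic data (random non-negative factors plus noise)
rng = np.random.default_rng(4)
n_q, n_frames, k = 30, 40, 2
C_demo = np.abs(rng.normal(size=(k, n_frames)))
P_true = np.abs(rng.normal(size=(n_q, k)))
M_demo = P_true @ C_demo + 0.01 * rng.normal(size=(n_q, n_frames))

lam = 0.5
P_hat = update_profiles(M_demo, C_demo, lam)

# The gradient of the objective should vanish at the closed-form solution
T = np.diff(np.eye(n_q), n=2, axis=0)
grad = -2 * (M_demo - P_hat @ C_demo) @ C_demo.T + 2 * lam * T.T @ T @ P_hat
print(f"max |gradient| at solution: {np.abs(grad).max():.2e}")
```

The C-update with P fixed has the same structure with the roles of rows and columns swapped; alternating the two gives the ALS loop sketched above.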

---

### Zhang 2025 (Gradient Descent on Both Factors)
**Initialization**: SVD → L = U_k·√Σ_k, R = V_k·√Σ_k

**Iteration** (joint):
```
while not converged:
    D_current ← L·R^T
    
    # Compute joint gradient
    grad_recon ← -2·(D - D_current)
    grad_obj ← ∇Objective(D_current)
    gradient_total ← grad_recon + λ·grad_obj
    
    # Update both factors simultaneously
    L ← L - lr·(gradient_total·R)·(R^T·R)^(-1)
    R ← R - lr·(gradient_total^T·L)·(L^T·L)^(-1)
```

**Key features**:
- Simultaneous gradient descent: Update both factors together
- Scaled gradient step: Gram-matrix preconditioning, with the result kept in factored rank-k form
- Single objective: Reconstruction + downstream task
- No hard constraints in this implementation (could add)

---

## Key Differences

| Aspect | REGALS | Zhang 2025 (This Notebook) |
|--------|--------|---------------------------|
| **Update strategy** | Alternating (one at a time) | Simultaneous (both together) |
| **Subproblems** | Closed-form least squares | Gradient descent steps |
| **Constraints** | Hard (non-negativity, support windows) | Soft (regularization only) |
| **Regularization** | Smoothness on both P and C | Single downstream objective |
| **Initialization** | EFA windows → P, C | SVD → L, R |
| **Convergence** | Objective decreases monotonically (convex subproblems); global optimum not guaranteed | Local minima possible |
| **Philosophy** | Constraint-based emergence | Gradient-guided optimization |

---

## Why This Comparison Matters for Molass

**REGALS context**: Two-stage workflow
1. Stage 1: EFA determines components and windows
2. Stage 2: ALS refines within windows (above iteration)

**Molass context**: Two-stage workflow  
1. Stage 1: SVD denoising
2. Stage 2: Parametric fitting (EGH/SDM/EDM)

**Zhang 2025 insight**: Joint optimization of Stage 1 + Stage 2 is theoretically superior to sequential optimization

**Both REGALS and Molass have two-stage separation** where Stage 2 never sees original noisy data:
- REGALS: EFA windows → ALS on windowed data
- Molass: SVD denoising → Parametric fitting on denoised data

**Question for Molass**: Should denoising objective be:
- Generic smoothness (current notebook)?
- Parametric fit quality (how well EGH/SDM/EDM fits)?
- Something else?

---

## Iterative Optimization Comparison

Both methods share:
- ✓ Start from SVD initialization
- ✓ Iterative refinement toward better solution
- ✓ Balance reconstruction vs objective/constraints
- ✓ Maintain rank k throughout

Key philosophical difference:
- **REGALS**: Constraints define what's valid → optimization finds best valid solution
- **Zhang 2025**: Objective defines what's good → optimization finds best solution
- **Molass**: Parametric form defines structure → optimization finds best parameters

All three make modeling choices; the difference is **where** and **how explicitly**.