# Exploration: Alternative Objective Function Combination Rules

**Goal**: Investigate how different mathematical operators for combining constraints encode different logical relationships and lead to fundamentally different solutions.

## The Fundamental Question

Standard regularized optimization uses **additive** combination:
$$\min_{P,C} \|M - PC\|^2 + \lambda\|D^2C\|^2$$

But why addition? What about:
- **Multiplicative**: $\|M - PC\|^2 \times \lambda\|D^2C\|^2$
- **Log-additive**: $\log(\|M - PC\|^2) + \lambda\log(\|D^2C\|^2)$
- **Weighted power**: $(\|M - PC\|^2)^p + \lambda(\|D^2C\|^2)^q$
- **Max operator**: $\max(\|M - PC\|^2, \lambda\|D^2C\|^2)$

## Key Insight: Operators Encode Logic

- **Addition (+)**: "OR"-like behavior - allows trade-offs between constraints
- **Multiplication (×)**: "AND"-like coupling - each term's marginal cost scales with the size of the other
- **Log-additive**: "AND"-like (multiplicative in original space)
- **Max**: "Worst-case" logic - minimize the larger violation

**Each choice is a modeling decision about how constraints should interact!**
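As a minimal sketch of where each rule places its minimum, consider a one-parameter toy trade-off curve. The curve `A(s) = 4(1 - s)^2`, `B(s) = s^2` and the variable names are assumptions made for illustration only; they are not taken from the SEC-SAXS data used later.

```python
import numpy as np

lam = 1.0
s = np.linspace(0.0, 1.0, 1001)   # position along a hypothetical trade-off curve
A = 4.0 * (1.0 - s)**2            # toy data-fit term: falls as s increases
B = s**2                          # toy smoothness term: rises as s increases

rules = {
    "additive": A + lam * B,
    "multiplicative": A * (lam * B),
    "max": np.maximum(A, lam * B),
}

# Location of the minimum of each combined objective along the curve
best_s = {name: s[np.argmin(values)] for name, values in rules.items()}
for name, s_star in best_s.items():
    print(f"{name:<15} minimum at s = {s_star:.3f}")
```

On this toy curve the additive minimum sits near s = 0.8, the max rule settles near s = 2/3 where A = λB, and the bare product is smallest at an endpoint where one term is exactly zero. The same pair of terms therefore yields three different "optimal" points.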

---

**Context**: Supporting analysis for underdeterminedness_exploration.ipynb Level 2 discussion.  
**Date**: January 27, 2026

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import qr
from scipy.optimize import minimize

# Set random seed for reproducibility
np.random.seed(42)

# Configure matplotlib
plt.rcParams['figure.figsize'] = (14, 10)
plt.rcParams['font.size'] = 10

print("Libraries imported successfully!")

## 1. Generate Test Data

We'll use the same SEC-SAXS-like synthetic data as in underdeterminedness_exploration.ipynb for direct comparison.

## A Critical Note on "AND" vs "OR" Language

**Important**: In this notebook, "AND-like" and "OR-like" describe **optimization behavior**, not probabilistic logic.

### Gradient Analysis Reveals True Behavior:

**Additive: $f(A,B) = A + \lambda B$**
$$\frac{\partial f}{\partial A} = 1, \quad \frac{\partial f}{\partial B} = \lambda$$

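The gradients are **constant**: neither depends on the current value of the other term. If we trade with $\lambda = 1$: $(A + \epsilon) + (B - \epsilon) = A + B$ (exactly unchanged); for general $\lambda$ the change is $(1 - \lambda)\epsilon$, a fixed exchange rate.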
- **Allows marginal substitution** between constraints
- → **"OR"-like**: Can trade one for the other

**Multiplicative: $f(A,B) = A \times \lambda B$**
$$\frac{\partial f}{\partial A} = \lambda B, \quad \frac{\partial f}{\partial B} = \lambda A$$

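The gradients are **coupled**: each term's marginal cost is proportional to the current size of the other. If we trade (taking $\lambda = 1$): $(A + \epsilon)(B - \epsilon) = AB + (B-A)\epsilon - \epsilon^2$
- If $A > B$: the first-order change $(B-A)\epsilon$ is negative, so raising the larger term while lowering the smaller one reduces the product
- If $A < B$: the first-order change is positive, so the same trade is penalized
- **No fixed exchange rate**: the value of a trade depends on where you are, and equal *relative* changes in A and B count equally
- → **"AND"-like** in the sense that the two terms cannot be weighed independently; the bare product is still smallest when either factor vanishes, which is why the implementation below guards both factors with a small epsilon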

**Note**: These labels describe **optimization dynamics** and should not be read as probabilistic semantics, where multiplication expresses conjunction only for independent events: $P(A \cap B) = P(A) \times P(B)$.
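The partial derivatives and the effect of an equal-amount trade can be checked numerically. The sketch below uses central finite differences on scalar stand-ins for the two terms; the values of `A`, `B`, `lam`, and `eps` are arbitrary illustrative choices. Expanding the expressions gives a predicted change of $(1-\lambda)\epsilon$ for the additive rule and $\lambda[(B-A)\epsilon - \epsilon^2]$ for the multiplicative rule.

```python
import numpy as np

def additive(A, B, lam=1.0):
    return A + lam * B

def multiplicative(A, B, lam=1.0):
    return A * lam * B

def central_diff(f, A, B, lam, h=1e-6):
    """Central finite-difference estimate of (df/dA, df/dB)."""
    dA = (f(A + h, B, lam) - f(A - h, B, lam)) / (2 * h)
    dB = (f(A, B + h, lam) - f(A, B - h, lam)) / (2 * h)
    return dA, dB

A, B, lam, eps = 3.0, 1.0, 2.0, 1e-3   # illustrative values only

grad_add = central_diff(additive, A, B, lam)        # analytic: (1, lam)
grad_mul = central_diff(multiplicative, A, B, lam)  # analytic: (lam*B, lam*A)

# Equal-amount trade: raise A by eps, lower B by eps
delta_add = additive(A + eps, B - eps, lam) - additive(A, B, lam)
delta_mul = multiplicative(A + eps, B - eps, lam) - multiplicative(A, B, lam)

print("additive grad:      ", grad_add, " trade change:", delta_add)
print("multiplicative grad:", grad_mul, " trade change:", delta_mul)
print("predicted additive change:      ", (1 - lam) * eps)
print("predicted multiplicative change:", lam * ((B - A) * eps - eps**2))
```

The additive gradient does not move when `A` or `B` changes, whereas the multiplicative gradient is a function of both, which is the coupling discussed above.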

In [None]:
# Generate synthetic 2-component SEC-SAXS-like data
n_q = 100  # Number of q-points (scattering angles)
n_t = 50   # Number of time frames
n_comp = 2  # Number of components

# True components with intentional structure
print("Generating ground truth components...")
np.random.seed(123)

# Create smooth, well-separated concentration profiles
t = np.linspace(0, 1, n_t)
C_true = np.zeros((n_comp, n_t))
C_true[0, :] = np.exp(-50*(t - 0.3)**2)  # Peak at t=0.3
C_true[1, :] = np.exp(-50*(t - 0.7)**2)  # Peak at t=0.7

# Positive SAXS profiles
P_true = np.random.rand(n_q, n_comp) + 1.0

# Compute measured data
M = P_true @ C_true

print(f"Data matrix M: {M.shape}")
print(f"True P: {P_true.shape}, True C: {C_true.shape}")
print(f"Reconstruction error (should be zero): {np.linalg.norm(M - P_true @ C_true):.2e}")

# Visualize true components
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].plot(t, C_true[0, :], 'b-', linewidth=2, label='Component 1')
axes[0].plot(t, C_true[1, :], 'r-', linewidth=2, label='Component 2')
axes[0].set_title('True Concentration Profiles', fontweight='bold')
axes[0].set_xlabel('Time')
axes[0].set_ylabel('Concentration')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

im = axes[1].imshow(M, aspect='auto', cmap='viridis')
axes[1].set_title('Measured Data Matrix M', fontweight='bold')
axes[1].set_xlabel('Time Frame')
axes[1].set_ylabel('q-point')
plt.colorbar(im, ax=axes[1])

plt.tight_layout()
plt.show()

print("\n✓ Ground truth data generated with smooth, well-separated profiles")

## 2. Define Objective Functions with Different Combination Rules

We'll implement multiple variants and compare their behavior.
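For reference, the smoothness term $\|D^2C\|^2$ can be written with an explicit second-difference matrix acting along the time axis. The following is a minimal sketch, not the notebook's implementation: `second_difference_matrix` and `C_demo` are hypothetical names, and unit frame spacing is assumed.

```python
import numpy as np

def second_difference_matrix(n_t):
    """(n_t - 2) x n_t matrix whose rows hold the stencil [1, -2, 1]."""
    D2 = np.zeros((n_t - 2, n_t))
    idx = np.arange(n_t - 2)
    D2[idx, idx] = 1.0
    D2[idx, idx + 1] = -2.0
    D2[idx, idx + 2] = 1.0
    return D2

rng = np.random.default_rng(0)
C_demo = rng.standard_normal((2, 50))   # hypothetical (n_comp, n_t) array
D2 = second_difference_matrix(C_demo.shape[1])

# The operator acts on the time axis, so it multiplies C from the right
penalty_matrix = np.linalg.norm(C_demo @ D2.T, 'fro')**2
penalty_diff = np.sum(np.diff(C_demo, n=2, axis=1)**2)
print(np.isclose(penalty_matrix, penalty_diff))
```

Any profile that is constant or linear in time lies in the null space of this operator, so the penalty alone cannot distinguish between such profiles.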

In [None]:
def smoothness_penalty(C):
    """
    Compute smoothness penalty: ||D²C||²_F
    where D² is the second derivative operator (discrete approximation)
    """
    # Second difference along the time axis (unit spacing):
    # D2C[:, j] = C[:, j] - 2*C[:, j+1] + C[:, j+2]
    D2C = np.diff(C, n=2, axis=1)

    return np.linalg.norm(D2C, 'fro')**2

def data_fit(P, C, M):
    """Data-fit term: ||M - PC||²"""
    return np.linalg.norm(M - P @ C)**2

# Define different combination rules
def objective_additive(P, C, M, lambda_reg=1.0):
    """Standard additive: A + λB (allows trade-offs, OR-like)"""
    A = data_fit(P, C, M)
    B = smoothness_penalty(C)
    return A + lambda_reg * B

def objective_multiplicative(P, C, M, lambda_reg=1.0):
    """Multiplicative: A × λB (enforces balance, AND-like)"""
    A = data_fit(P, C, M)
    B = smoothness_penalty(C)
    # Add small epsilon to avoid zeros
    return (A + 1e-6) * (lambda_reg * B + 1e-6)

def objective_log_additive(P, C, M, lambda_reg=1.0):
    """Log-additive: log(A) + λ log(B)"""
    A = data_fit(P, C, M)
    B = smoothness_penalty(C)
    # Add small epsilon to avoid log(0)
    return np.log(A + 1e-6) + lambda_reg * np.log(B + 1e-6)

def objective_max(P, C, M, lambda_reg=1.0):
    """Max operator: max(A, λB) (worst-case, strongest AND enforcement)"""
    A = data_fit(P, C, M)
    B = smoothness_penalty(C)
    return max(A, lambda_reg * B)

def objective_weighted_power(P, C, M, lambda_reg=1.0, p=0.5, q=0.5):
    """Weighted power: A^p + λB^q"""
    A = data_fit(P, C, M)
    B = smoothness_penalty(C)
    return A**p + lambda_reg * B**q

print("Objective functions defined!")
print("\nAvailable combination rules:")
print("  1. Additive:         A + λB")
print("  2. Multiplicative:   A × λB")
print("  3. Log-additive:     log(A) + λ log(B)")
print("  4. Max:              max(A, λB)")
print("  5. Weighted power:   A^p + λB^q")

## 3. Test: How Do Different Rules Respond to the Same Data?

Let's evaluate all objective functions on the same solution to see how they score it differently.

In [None]:
# Generate a test solution (random orthogonal transformation of SVD)
from scipy.linalg import svd

U, S, Vt = svd(M, full_matrices=False)
P_svd = U[:, :n_comp] @ np.diag(np.sqrt(S[:n_comp]))
C_svd = np.diag(np.sqrt(S[:n_comp])) @ Vt[:n_comp, :]

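# Apply a random orthogonal transformation R (Q factor of a Gaussian matrix).
# Since R is orthogonal, inv(R) == R.T and P_test @ C_test == P_svd @ C_svd,
# so the data fit is unchanged while the individual factors are mixed.
R = qr(np.random.randn(n_comp, n_comp))[0]
P_test = P_svd @ R
C_test = R.T @ C_svd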

lambda_test = 1.0

print("Testing different objective functions on the same solution:")
print("="*70)
print(f"\nSolution characteristics:")
print(f"  Data-fit:  ||M - PC||² = {data_fit(P_test, C_test, M):.4e}")
print(f"  Smoothness: ||D²C||²   = {smoothness_penalty(C_test):.4f}")
print("\n" + "-"*70)

objectives = {
    'Additive (A + λB)': objective_additive(P_test, C_test, M, lambda_test),
    'Multiplicative (A × λB)': objective_multiplicative(P_test, C_test, M, lambda_test),
    'Log-additive (log A + λ log B)': objective_log_additive(P_test, C_test, M, lambda_test),
    'Max (max(A, λB))': objective_max(P_test, C_test, M, lambda_test),
    'Weighted power (A^0.5 + λB^0.5)': objective_weighted_power(P_test, C_test, M, lambda_test, 0.5, 0.5),
}

print("Objective values:")
for name, value in objectives.items():
    print(f"  {name:<35} = {value:>12.6e}")

print("\n✓ Same solution scores VERY differently under different combination rules!")
print("→ Each rule encodes different notion of 'optimal' solution")

## 4. Key Experiment: How Do Rules Respond to Trade-offs?

Let's create solutions that:
1. Fit data perfectly but are rough (high smoothness penalty)
2. Are very smooth but don't fit data well

This reveals the **logical structure** encoded by each operator.

In [None]:
# Create extreme solutions to probe logical structure

# Solution 1: true P with a noise-roughened C (high curvature).
# Note: P is held at P_true, so the added noise also moves P @ C away from M;
# the data-fit term for this solution is therefore not exactly zero.
P_perfect = P_true.copy()
C_rough = C_true.copy()
C_rough += 0.1 * np.random.randn(*C_true.shape)

# Solution 2: Poor data-fit, very smooth
C_smooth = np.zeros((n_comp, n_t))
for i in range(n_comp):
    # Ultra-smooth: just constant values
    C_smooth[i, :] = np.mean(C_true[i, :])
# Least-squares P for this smooth C. Both rows of C_smooth are constant, so
# C_smooth @ C_smooth.T is singular; use the pseudoinverse instead of inv().
P_smooth = M @ np.linalg.pinv(C_smooth)

# Solution 3: rank-2 SVD factors. M is exactly rank 2 here, so this pair fits
# M to machine precision, and C_svd is a smooth mixture of the true profiles.
P_balanced = P_svd.copy()
C_balanced = C_svd.copy()

print("Created three test solutions:")
print("="*70)

solutions = {
    'Perfect Fit, Rough': (P_perfect, C_rough),
    'Poor Fit, Very Smooth': (P_smooth, C_smooth),
    'Balanced': (P_balanced, C_balanced),
}

print("\n{:<25} {:>12} {:>12}".format("Solution", "Data-fit (A)", "Smoothness (B)"))
print("-"*70)

for name, (P, C) in solutions.items():
    A = data_fit(P, C, M)
    B = smoothness_penalty(C)
    print(f"{name:<25} {A:>12.4e} {B:>12.4f}")

print("\n" + "="*70)
print("Now let's see how different combination rules score these solutions:")
print("="*70)

In [None]:
# Evaluate all combinations
lambda_test = 1.0

objective_functions = {
    'Additive (A + λB)': objective_additive,
    'Multiplicative (A × λB)': objective_multiplicative,
    'Log-additive (log A + λ log B)': objective_log_additive,
    'Max (max(A, λB))': objective_max,
    'Weighted power (A^0.5 + λB^0.5)': objective_weighted_power,
}

print("\n{:<25} {:>15} {:>15} {:>15}".format(
    "Combination Rule", "Perfect+Rough", "Poor+Smooth", "Balanced"))
print("-"*70)

for obj_name, obj_func in objective_functions.items():
    scores = []
    for sol_name, (P, C) in solutions.items():
        if obj_name == 'Weighted power (A^0.5 + λB^0.5)':
            score = obj_func(P, C, M, lambda_test, 0.5, 0.5)
        else:
            score = obj_func(P, C, M, lambda_test)
        scores.append(score)
    
    print(f"{obj_name:<25} {scores[0]:>15.4e} {scores[1]:>15.4e} {scores[2]:>15.4e}")
    
    # Show which solution is preferred (lowest score)
    best_idx = np.argmin(scores)
    best_name = list(solutions.keys())[best_idx]
    print(f"{'→ Prefers:':<25} {best_name}")
    print()

print("="*70)
print("OBSERVATIONS:")
print("  • Additive (OR-like): Allows trading one constraint for another")
print("  • Multiplicative (AND-like): Forces balance between constraints")
print("  • Log-additive: Scale-insensitive, also AND-like in original space")
print("  • Max: Only cares about the WORST violation (strongest AND)")
print("\n→ Each rule encodes fundamentally different notion of optimality!")

## 5. Visualize Trade-off Surfaces

Let's visualize how different combination rules create different "valleys" in the objective landscape.

In [None]:
# Create grid of (A, B) values
A_values = np.logspace(-2, 2, 100)  # Data-fit values
B_values = np.linspace(0.01, 10, 100)  # Smoothness values

A_grid, B_grid = np.meshgrid(A_values, B_values)

lambda_reg = 1.0

# Compute objective surfaces
Z_additive = A_grid + lambda_reg * B_grid
Z_multiplicative = A_grid * (lambda_reg * B_grid)
Z_log_additive = np.log(A_grid + 1e-10) + lambda_reg * np.log(B_grid + 1e-10)
Z_max = np.maximum(A_grid, lambda_reg * B_grid)

# Plot
fig, axes = plt.subplots(2, 2, figsize=(14, 12))

surfaces = [
    (Z_additive, 'Additive: A + λB\n(OR-like: allows trade-offs)', axes[0, 0]),
    (Z_multiplicative, 'Multiplicative: A × λB\n(AND-like: enforces balance)', axes[0, 1]),
    (Z_log_additive, 'Log-additive: log(A) + λ log(B)\n(AND-like: scale-invariant)', axes[1, 0]),
    (Z_max, 'Max: max(A, λB)\n(Strongest AND: worst-case)', axes[1, 1]),
]

for Z, title, ax in surfaces:
    # Use log scale for A-axis
    contour = ax.contourf(A_grid, B_grid, Z, levels=20, cmap='viridis')
    ax.set_xscale('log')
    ax.set_xlabel('Data-fit term (A)', fontsize=10)
    ax.set_ylabel('Smoothness term (B)', fontsize=10)
    ax.set_title(title, fontweight='bold', fontsize=11)
    
    # Add contour lines
    ax.contour(A_grid, B_grid, Z, levels=10, colors='white', alpha=0.3, linewidths=0.5)
    
    # Mark our three test solutions
    for name, (P, C) in solutions.items():
        A_val = data_fit(P, C, M)
        B_val = smoothness_penalty(C)
        marker = 'o' if name == 'Balanced' else ('s' if 'Rough' in name else '^')
        ax.plot(A_val, B_val, marker, color='red', markersize=10, 
                markeredgecolor='white', markeredgewidth=2, label=name)
    
    ax.legend(fontsize=8, loc='upper right')
    ax.grid(True, alpha=0.3)
    plt.colorbar(contour, ax=ax)

plt.tight_layout()
plt.savefig('objective_combination_rules_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

print("✓ Figure saved: objective_combination_rules_comparison.png")
print("\nKey insight from contour plots:")
print("  • Additive: Linear trade-offs (can substitute A for B)")
print("  • Multiplicative: Hyperbolic valleys (must balance both A and B)")
print("  • Log-additive: Different scale sensitivity, also enforces balance")
print("  • Max: L-shaped optimal region (only worst matters)")

## 6. Optimization Dynamics: Trade-offs vs Balance Enforcement

Let's demonstrate explicitly how additive (+) allows trade-offs while multiplicative (×) enforces balance through coupled gradients.

In [None]:
# Create extreme test cases
print("="*70)
print("DEMONSTRATING LOGICAL STRUCTURE")
print("="*70)

# Case 1: A ≈ 0, B large (fits data perfectly, very rough)
A1, B1 = 1e-10, 100.0

# Case 2: A large, B ≈ 0 (poor fit, perfectly smooth)
A2, B2 = 100.0, 1e-10

# Case 3: A and B moderate (balanced)
A3, B3 = 1.0, 1.0

lambda_reg = 1.0

print("\nThree extreme cases:")
print(f"\nCase 1: A ≈ 0,    B = {B1:.1f}   (perfect fit, very rough)")
print(f"Case 2: A = {A2:.1f}, B ≈ 0      (poor fit, perfectly smooth)")
print(f"Case 3: A = {A3:.1f}, B = {B3:.1f}    (balanced)")

print("\n" + "-"*70)
print("How different combination rules score these:")
print("-"*70)

cases = [(A1, B1), (A2, B2), (A3, B3)]
case_names = ['Case 1\n(A≈0, B large)', 'Case 2\n(A large, B≈0)', 'Case 3\n(balanced)']

# Additive
add_scores = [A + lambda_reg * B for A, B in cases]
print(f"\nAdditive (A + λB) - allows trade-offs (OR-like):")
for name, score in zip(case_names, add_scores):
    print(f"  {name.replace(chr(10), ' '):<25} = {score:.4e}")
print(f"  → Preferred: {case_names[np.argmin(add_scores)].replace(chr(10), ' ')}")
print(f"  → Interpretation: Can trade one constraint for another at the margin")

# Multiplicative
mult_scores = [(A + 1e-10) * (lambda_reg * B + 1e-10) for A, B in cases]
print(f"\nMultiplicative (A × λB) - enforces balance (AND-like):")
for name, score in zip(case_names, mult_scores):
    print(f"  {name.replace(chr(10), ' '):<25} = {score:.4e}")
print(f"  → Preferred: {case_names[np.argmin(mult_scores)].replace(chr(10), ' ')}")
print(f"  → Interpretation: Gradients ∂/∂A = λB, ∂/∂B = λA couple the terms; the product is smallest when either factor is near zero")

# Log-additive
log_scores = [np.log(A + 1e-10) + lambda_reg * np.log(B + 1e-10) for A, B in cases]
print(f"\nLog-additive (log A + λ log B) - multiplicative in original space:")
for name, score in zip(case_names, log_scores):
    print(f"  {name.replace(chr(10), ' '):<25} = {score:.4e}")
print(f"  → Preferred: {case_names[np.argmin(log_scores)].replace(chr(10), ' ')}")
print(f"  → Interpretation: Equivalent to minimizing A^1 × B^λ")

print("\n" + "="*70)
print("KEY OBSERVATION:")
print("  • Additive (+): Allows trading constraints (∂/∂A = ∂/∂B = 1)")
print("  • Multiplicative (×): Forces balance (∂/∂A = B, ∂/∂B = A)")
print("  • Log-additive: Also enforces balance (multiplicative in original space)")
print("\n→ The choice of operator fundamentally changes solution preferences!")
print("→ Marginal optimization dynamics differ, not just global extrema!")
print("="*70)

## 7. Implications for "Model-Free" Claims

### What We've Demonstrated

1. **Mathematical operators encode constraint interaction**
   - Addition (+): "OR"-like - allows trading one constraint for another (∂/∂A = 1, ∂/∂B = λ)
   - Multiplication (×): "AND"-like - couples the constraints through the gradient (∂/∂A = λB, ∂/∂B = λA)
   - Log-additive: Also "AND"-like (multiplicative in original space)
   - Max: Strongest "AND" enforcement - only worst violation matters

2. **Different rules prefer fundamentally different solutions**
   - Same data, same regularizer, but different combinations → different optima
   - The choice is NOT a technical detail!

3. **The standard additive form has strong assumptions**
   - Additive combination ↔ Independence assumption (Bayesian interpretation)
   - Gaussian likelihood + Gaussian prior → additive in log-probability space
   - Allows marginal trade-offs between data-fit and smoothness
   - This is a **modeling choice** about constraint interaction
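The Bayesian reading of the additive form can be made explicit with a short derivation. This is a sketch under stated assumptions: i.i.d. Gaussian noise with variance $\sigma^2$ on every entry of $M$, a Gaussian smoothness prior on $C$ with scale $\tau$, and a flat prior on $P$. The symbols $\sigma$ and $\tau$ are introduced here for illustration only.

$$p(M \mid P, C) \propto \exp\!\left(-\frac{\|M - PC\|^2}{2\sigma^2}\right), \qquad p(C) \propto \exp\!\left(-\frac{\|D^2C\|^2}{2\tau^2}\right)$$

Because the posterior is the product of likelihood and prior, its negative logarithm is a sum:

$$-\log p(P, C \mid M) = \frac{1}{2\sigma^2}\|M - PC\|^2 + \frac{1}{2\tau^2}\|D^2C\|^2 + \text{const}$$

Multiplying by $2\sigma^2$ does not change the minimizer and recovers the standard objective:

$$\min_{P,C} \|M - PC\|^2 + \lambda\|D^2C\|^2, \qquad \lambda = \frac{\sigma^2}{\tau^2}$$

Under these assumptions, $\lambda$ is the ratio of noise variance to prior variance, and the plus sign is a direct consequence of treating noise and prior as independent factors.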

### The Broader Argument

**Every "model-free" method must make this choice:**
- Should constraints allow trade-offs (additive)?
- Or enforce balance (multiplicative)?
- Or worst-case optimization (max)?

**The standard additive form:**
- Allows trading data-fit for smoothness at the margin
- Assumes independence of errors and prior
- Corresponds to Gaussian noise + Gaussian smoothness prior
- **This is an implicit modeling assumption!**

**The alternatives:**
- Multiplicative: Would couple fit and smoothness through the gradient (∂/∂A = λB)
- Log-additive: Multiplicative in original space, with different scale sensitivity and different optima
- Max: Would only care about worst violation

### The Fundamental Point

**Before even choosing WHAT to regularize, you must choose HOW to combine terms.**

This is **implicit modeling** at its most fundamental level:
- Not about functional form of components
- Not about which constraints to add
- But about the **logical structure** of optimization itself!

The combination rule determines:
1. Whether constraints should allow trade-offs (additive) or enforce balance (multiplicative)
2. The optimization dynamics through gradient structure (∂/∂A, ∂/∂B)
3. What trade-offs are acceptable
4. What constitutes an "optimal" solution

**There is no "model-free" way to make this choice!**

## 8. Summary and Takeaways

### Key Findings

1. **Additive combination (A + λB)**:
   - Allows trade-offs: can substitute one constraint for another
   - Gradients constant: ∂/∂A = 1, ∂/∂B = λ
   - Corresponds to Gaussian assumptions in Bayesian framework
   - Standard choice in regularization literature

2. **Multiplicative combination (A × λB)**:
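   - Couples the two terms instead of weighting them independently
   - Coupled gradients: ∂/∂A = λB, ∂/∂B = λA, so each term's marginal cost scales with the size of the other
   - Equal relative changes in A and B count equally; the bare product is smallest when either factor reaches zero, hence the epsilon guard in `objective_multiplicative`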
   - Non-standard, no clear probabilistic interpretation

3. **Log-additive combination (log A + λ log B)**:
   - Scale-invariant behavior
   - Equivalent to minimizing $A^1 \times B^\lambda$ in original space (multiplicative)
   - Also enforces balance like multiplicative form
   - Could correspond to log-normal distributions

4. **Max combination (max(A, λB))**:
   - "Worst-case" logic: only cares about largest violation
   - Minimax optimization framework
   - Very different optimal solutions
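The equivalence claimed for the log-additive rule can be checked directly: since the logarithm is strictly increasing, $\log A + \lambda \log B$ and $A \times B^\lambda$ select the same candidate. The sketch below uses randomly drawn positive values as hypothetical stand-ins for the two terms; `lam` and the sampling ranges are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.7                                   # illustrative weight
A = rng.uniform(0.1, 10.0, size=1000)       # hypothetical data-fit values
B = rng.uniform(0.1, 10.0, size=1000)       # hypothetical smoothness values

log_additive = np.log(A) + lam * np.log(B)
power_product = A * B**lam

# Same objective up to a monotone transform, so the same candidate wins
same_argmin = np.argmin(log_additive) == np.argmin(power_product)

# Rescaling A (e.g. a change of units) only shifts the log-additive score
rescaled = np.log(100.0 * A) + lam * np.log(B)
scale_invariant = np.argmin(rescaled) == np.argmin(log_additive)

print(same_argmin, scale_invariant)
```

The second check illustrates the scale-invariance noted above: multiplying `A` by a constant adds the same offset to every candidate's score, so the preferred candidate is unaffected.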

### For the JOSS Paper Argument

This exploration strengthens the critique of "model-free" claims by showing:

**Level 1: The operator choice itself is a modeling decision**
- Before choosing regularizer form
- Before choosing constraint types
- The basic mathematics of combination encodes assumptions

**Level 2: Each choice has strong implications**
- Different operators → fundamentally different solutions
- No mathematical reason to prefer one over others a priori
- Choice depends on domain beliefs about constraint interaction

**Level 3: REGALS's additive choice is implicit modeling**
- Additive = trade-off allowance + independence + Gaussian assumptions
- Alternative choices (multiplicative) would enforce balance differently
- This choice is hidden in the mathematical formulation

**Level 4: Transparency matters**
- Explicit models state these choices upfront
- Implicit models hide them in optimization structure
- Both are modeling - difference is transparency!

### Future Research Directions

1. **Theoretical**: Characterize solution sets for each combination rule
2. **Empirical**: Test on real SEC-SAXS data - does choice matter in practice?
3. **Bayesian**: Formal probabilistic interpretation of each rule
4. **Hybrid**: Can we design adaptive rules that switch based on data?

---

**Created**: January 27, 2026  
**Context**: Supporting analysis for [underdeterminedness_exploration.ipynb](underdeterminedness_exploration.ipynb) Level 2 discussion  
**Status**: Complete - ready for paper integration