# Generalized Memory Polynomial (GMP): Cross-Memory Terms

This notebook demonstrates **Generalized Memory Polynomial (GMP)** models, which extend the standard Memory Polynomial by adding **cross-memory interaction terms**.

## Why GMP?

Standard Memory Polynomial (MP) models use only **diagonal terms**:
$$
y(t) = \sum_{n=1}^{N} \sum_{m=0}^{M-1} h_n[m] \cdot x^n(t-m)
$$
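
To make the double sum concrete, here is a minimal NumPy sketch of the diagonal model written as a matrix product. The helper `mp_regressors`, the demo sizes, and the random kernels `h_demo` are illustrative assumptions, not part of the `volterra` package. Only samples with a full memory window are kept, so an input of length $N_s$ yields $N_s - M + 1$ output rows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mp_regressors(x, memory_length, order):
    """Columns x^n(t-m) for n = 1..order, m = 0..memory_length-1 (valid rows only)."""
    # windows[i] = x[i : i + M]; reversing puts x(t - m) in column m, with t = i + M - 1
    windows = sliding_window_view(x, memory_length)[:, ::-1]
    # Stack the powers side by side: [x^1 | x^2 | ... | x^N]
    return np.hstack([windows ** n for n in range(1, order + 1)])

rng = np.random.default_rng(0)
x_demo = rng.standard_normal(200)
M_demo, N_demo = 4, 3
Phi = mp_regressors(x_demo, M_demo, N_demo)

# Hypothetical kernels h_n[m], flattened in the same column order as Phi
h_demo = 0.1 * rng.standard_normal(M_demo * N_demo)
y_demo = Phi @ h_demo

print(Phi.shape)  # (197, 12): 200 - 4 + 1 rows, 4 lags x 3 orders columns
```

Because the model is linear in the coefficients, fitting reduces to a least-squares problem on `Phi`.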

But real systems often exhibit **cross-memory effects**, such as:
- $x(t) \cdot x(t-1)$ — interaction between current and past inputs
- $x^2(t) \cdot x(t-3)$ — mixed-order memory coupling

GMP adds these cross-terms selectively:
$$
y(t) = \underbrace{\sum_{n=1}^{N} \sum_{m=0}^{M-1} h_n[m] \cdot x^n(t-m)}_{\text{Diagonal (MP)}} + \underbrace{\sum_{(n,m) \in \mathcal{L}} g_{n,m} \cdot x^{n-1}(t) \cdot x(t-m)}_{\text{Cross-terms (GMP)}}
$$

where $\mathcal{L}$ defines the **lag structure** (which cross-terms to include).
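
As a rough illustration of a lag structure, the sketch below builds cross-term regressors of the form $x^{n-1}(t) \cdot x(t-m)$, which matches the two examples listed above ($x(t) \cdot x(t-1)$ and $x^2(t) \cdot x(t-3)$). Both this functional form and the `{order: [lags]}` dictionary layout are assumptions made for the example; the `volterra` package may define its cross-terms differently.

```python
import numpy as np

def cross_term_regressors(x, lags):
    """Build columns x^(n-1)(t) * x(t-m) for each (n, m) in a lag structure.

    `lags` maps order n -> list of lags m (an assumed layout, for illustration).
    """
    max_lag = max(m for ms in lags.values() for m in ms)
    t = np.arange(max_lag, len(x))          # samples where every lag is available
    columns, labels = [], []
    for n, ms in sorted(lags.items()):
        for m in ms:
            columns.append(x[t] ** (n - 1) * x[t - m])
            labels.append((n, m))
    return np.column_stack(columns), labels

x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
lag_structure = {2: [1], 3: [3]}            # x(t)*x(t-1) and x^2(t)*x(t-3)
Phi_cross, labels = cross_term_regressors(x_demo, lag_structure)

print(labels)      # [(2, 1), (3, 3)]
print(Phi_cross)   # rows for t = 3 and t = 4: [[12, 16], [20, 50]]
```

Each entry of $\mathcal{L}$ adds exactly one column, so the size of the lag structure directly sets the number of extra coefficients.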

**Applications:**
- Power amplifier modeling (RF/microwave engineering)
- Audio distortion with intermodulation products
- Biomedical signal processing (EEG, EMG)

---

## Setup

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

from volterra import GeneralizedMemoryPolynomial

np.random.seed(123)

plt.rcParams['figure.figsize'] = (12, 4)
plt.rcParams['font.size'] = 10

---

## 1. Generate Data with Cross-Memory Effects

We'll create a system with **explicit cross-memory interaction**:
$$
y(t) = 0.8 x(t) + 0.1 x^2(t) + \underbrace{0.15 \cdot x(t) \cdot x(t-2)}_{\text{Cross-memory!}}
$$

In [None]:
# Generate input signal
fs = 48000
duration = 0.5
n_samples = int(fs * duration)

# Bandlimited noise
x_white = np.random.randn(n_samples)
sos = signal.butter(6, [200, 6000], btype='bandpass', fs=fs, output='sos')
x = signal.sosfilt(sos, x_white)
x = x / np.std(x) * 0.3

# Create delayed version for cross-term
x_delayed = np.concatenate([np.zeros(2), x[:-2]])  # x(t-2)

# True system with cross-memory interaction
y_nonlinear = (
    0.8 * x +                    # Linear term
    0.1 * x**2 +                 # Quadratic (diagonal)
    0.15 * x * x_delayed         # CROSS-MEMORY INTERACTION
)

# Add memory via IIR filter
b = [0.2, -0.38, 0.18]
a = [1.0, -1.9, 0.94]
y_clean = signal.lfilter(b, a, y_nonlinear)

# Add noise
noise = np.random.randn(n_samples) * 0.01
y = y_clean + noise

print(f"Generated {n_samples} samples with cross-memory interaction")
print("Cross-term contribution: x(t) * x(t-2)")
print(f"SNR: {10 * np.log10(np.mean(y_clean**2) / np.mean(noise**2)):.1f} dB")
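
The IIR filter is what gives the simulated system its memory, so it is worth checking how long that memory is. The sketch below uses only the denominator coefficients from the cell above and estimates, from the largest pole radius, how many samples it takes for the impulse-response envelope to fall below 1%. The 1% cutoff is an arbitrary choice for illustration.

```python
import numpy as np

a = [1.0, -1.9, 0.94]                 # denominator used in the data-generation cell
poles = np.roots(a)
radius = np.max(np.abs(poles))

# The impulse-response envelope decays like radius**k; find k where it drops below 1%
k_1pct = int(np.ceil(np.log(0.01) / np.log(radius)))

print(f"Pole magnitudes: {np.abs(poles)}")
print(f"Stable: {radius < 1}")
print(f"Envelope falls below 1% after about {k_1pct} samples")
```

The poles lie inside the unit circle, so the filter is stable, but they sit close to it. The resulting memory is much longer than a handful of taps, which is worth keeping in mind when choosing `memory_length`.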

In [None]:
# Visualize the cross-memory effect
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Compare MP-only vs. full signal (with cross-term)
y_mp_only = signal.lfilter(b, a, 0.8 * x + 0.1 * x**2)  # Without cross-term
y_with_cross = y_clean  # With cross-term

t_ms = np.arange(2000) / fs * 1000
axes[0].plot(t_ms, y_mp_only[:2000], label='MP only (no cross-term)', alpha=0.7)
axes[0].plot(t_ms, y_with_cross[:2000], label='With cross-memory', alpha=0.7)
axes[0].set_xlabel('Time (ms)')
axes[0].set_ylabel('Amplitude')
axes[0].set_title('Impact of Cross-Memory Term')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Difference (what the cross-term adds)
cross_contribution = y_with_cross - y_mp_only
axes[1].plot(t_ms, cross_contribution[:2000], color='red', alpha=0.7)
axes[1].axhline(0, color='black', linestyle='--', linewidth=0.8)
axes[1].set_xlabel('Time (ms)')
axes[1].set_ylabel('Amplitude')
axes[1].set_title('Cross-Memory Contribution')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"RMS of cross-term contribution: {np.sqrt(np.mean(cross_contribution**2)):.4f}")
print(f"RMS of full signal: {np.sqrt(np.mean(y_with_cross**2)):.4f}")
print(f"Cross-term RMS is {100 * np.sqrt(np.mean(cross_contribution**2)) / np.sqrt(np.mean(y_with_cross**2)):.1f}% of the full-signal RMS")

---

## 2. Compare MP vs. GMP

Let's fit both models and see which one captures the cross-memory effect.

In [None]:
# Split data
n_train = int(0.7 * n_samples)
x_train, x_test = x[:n_train], x[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

print(f"Training samples: {n_train}")
print(f"Testing samples: {len(x_test)}")

In [None]:
# Fit standard Memory Polynomial (diagonal only)
mp_model = GeneralizedMemoryPolynomial(
    memory_length=10,
    order=3,
    lags=None,  # None = diagonal MP
    lambda_reg=1e-6
)
mp_model.fit(x_train, y_train)

# Fit Generalized Memory Polynomial (with cross-terms)
# Custom lag structure: include cross-lags for order 1
gmp_lags = {
    1: [0, 1, 2, 3, 4, 5],  # Linear: include lags 0-5 (cross-terms!)
    2: [0, 1, 2],           # Quadratic: lags 0-2
    3: [0, 1]               # Cubic: lags 0-1
}

gmp_model = GeneralizedMemoryPolynomial(
    memory_length=10,
    order=3,
    lags=gmp_lags,  # Custom lag structure
    lambda_reg=1e-6
)
gmp_model.fit(x_train, y_train)

print("Model Parameters:")
print(f"  MP (diagonal):  {mp_model.coeffs_.size} parameters")
print(f"  GMP (with cross-terms): {gmp_model.coeffs_.size} parameters")
print(f"\nParameter count difference (GMP - MP): {gmp_model.coeffs_.size - mp_model.coeffs_.size:+d}")

In [None]:
# Predict and evaluate
y_mp_pred = mp_model.predict(x_test)
y_gmp_pred = gmp_model.predict(x_test)

# Trim ground truth to match the predictions (assumes predict() drops the first M - 1 samples)
M = 10
y_test_trimmed = y_test[M - 1:]

# Compute NMSE
def compute_nmse(y_true, y_pred):
    mse = np.mean((y_true - y_pred) ** 2)
    signal_power = np.mean(y_true ** 2)
    nmse_db = 10 * np.log10(mse / signal_power)
    return nmse_db

nmse_mp = compute_nmse(y_test_trimmed, y_mp_pred)
nmse_gmp = compute_nmse(y_test_trimmed, y_gmp_pred)

print("Test NMSE:")
print(f"  MP (diagonal):        {nmse_mp:.2f} dB")
print(f"  GMP (with cross-terms): {nmse_gmp:.2f} dB")
print(f"\nImprovement: {nmse_mp - nmse_gmp:.2f} dB")
print("\nInterpretation:")
if nmse_gmp < nmse_mp - 3:
    print("  ✅ GMP significantly outperforms MP → cross-memory effects are present!")
elif nmse_gmp < nmse_mp - 1:
    print("  ⚠️  GMP slightly better → weak cross-memory effects")
else:
    print("  ❌ No improvement → system is likely diagonal (use MP to avoid overfitting)")
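
To calibrate the dB values above, here is a small self-contained check of the NMSE definition on synthetic data. The function `nmse_db` mirrors `compute_nmse` from the cell above; the test signal is an arbitrary stand-in.

```python
import numpy as np

def nmse_db(y_true, y_pred):
    """NMSE in dB: residual power relative to the power of the reference signal."""
    mse = np.mean((y_true - y_pred) ** 2)
    return 10 * np.log10(mse / np.mean(y_true ** 2))

rng = np.random.default_rng(0)
y_ref = rng.standard_normal(1000)

# A prediction that is off by 10% in amplitude leaves 1% residual power: -20 dB
print(f"{nmse_db(y_ref, 0.9 * y_ref):.2f} dB")
# Predicting all zeros leaves 100% residual power: 0 dB
print(f"{nmse_db(y_ref, np.zeros_like(y_ref)):.2f} dB")
```

Every 10 dB of NMSE corresponds to a factor of 10 in residual power, so a 3 dB improvement means the residual power is roughly halved.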

In [None]:
# Visual comparison
fig, axes = plt.subplots(2, 2, figsize=(15, 8))

n_plot = 1000
t_ms = np.arange(n_plot) / fs * 1000

# MP prediction
axes[0, 0].plot(t_ms, y_test_trimmed[:n_plot], label='True', alpha=0.7, linewidth=1.5)
axes[0, 0].plot(t_ms, y_mp_pred[:n_plot], label='MP', alpha=0.7, linewidth=1.5, linestyle='--')
axes[0, 0].set_xlabel('Time (ms)')
axes[0, 0].set_ylabel('Amplitude')
axes[0, 0].set_title(f'Memory Polynomial (NMSE: {nmse_mp:.2f} dB)')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# GMP prediction
axes[0, 1].plot(t_ms, y_test_trimmed[:n_plot], label='True', alpha=0.7, linewidth=1.5)
axes[0, 1].plot(t_ms, y_gmp_pred[:n_plot], label='GMP', alpha=0.7, linewidth=1.5, linestyle='--')
axes[0, 1].set_xlabel('Time (ms)')
axes[0, 1].set_ylabel('Amplitude')
axes[0, 1].set_title(f'Generalized MP (NMSE: {nmse_gmp:.2f} dB)')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Error comparison
error_mp = y_test_trimmed[:n_plot] - y_mp_pred[:n_plot]
error_gmp = y_test_trimmed[:n_plot] - y_gmp_pred[:n_plot]

axes[1, 0].plot(t_ms, error_mp, alpha=0.7, color='blue', label='MP error')
axes[1, 0].plot(t_ms, error_gmp, alpha=0.7, color='green', label='GMP error')
axes[1, 0].axhline(0, color='black', linestyle='--', linewidth=0.8)
axes[1, 0].set_xlabel('Time (ms)')
axes[1, 0].set_ylabel('Prediction Error')
axes[1, 0].set_title('Error Comparison')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Error distribution
axes[1, 1].hist(error_mp, bins=50, density=True, alpha=0.5, label='MP', color='blue')
axes[1, 1].hist(error_gmp, bins=50, density=True, alpha=0.5, label='GMP', color='green')
axes[1, 1].axvline(0, color='red', linestyle='--', linewidth=2)
axes[1, 1].set_xlabel('Prediction Error')
axes[1, 1].set_ylabel('Probability Density')
axes[1, 1].set_title(f'Error Distribution (MP σ={np.std(error_mp):.4f}, GMP σ={np.std(error_gmp):.4f})')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---

## 3. Understanding GMP Lag Structures

GMP allows flexible lag structures. Let's visualize different configurations.

In [None]:
# Compare different lag structures
lag_configs = [
    {
        'name': 'Diagonal (MP)',
        'lags': None  # Default: diagonal
    },
    {
        'name': 'GMP Light',
        'lags': {1: [0, 1, 2], 2: [0, 1], 3: [0]}
    },
    {
        'name': 'GMP Medium',
        'lags': {1: [0, 1, 2, 3, 4], 2: [0, 1, 2], 3: [0, 1]}
    },
    {
        'name': 'GMP Heavy',
        'lags': {1: [0, 1, 2, 3, 4, 5, 6], 2: [0, 1, 2, 3], 3: [0, 1, 2]}
    }
]

results = []

for config in lag_configs:
    model = GeneralizedMemoryPolynomial(
        memory_length=10,
        order=3,
        lags=config['lags'],
        lambda_reg=1e-6
    )
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    
    nmse = compute_nmse(y_test_trimmed, y_pred)
    n_params = model.coeffs_.size
    
    results.append({
        'name': config['name'],
        'n_params': n_params,
        'nmse_db': nmse
    })
    
    print(f"{config['name']:15s}: {n_params:3d} params, NMSE = {nmse:6.2f} dB")

# Plot results
fig, ax = plt.subplots(figsize=(10, 6))

names = [r['name'] for r in results]
params = [r['n_params'] for r in results]
nmses = [r['nmse_db'] for r in results]

colors = ['blue', 'orange', 'green', 'red']
bars = ax.bar(names, nmses, color=colors, alpha=0.7, edgecolor='black')

# Annotate with parameter counts
for bar, n_param in zip(bars, params):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height - 1,
            f'{n_param} params',
            ha='center', va='top', fontsize=10, fontweight='bold')

ax.set_ylabel('Test NMSE (dB)', fontsize=12)
ax.set_title('Impact of Lag Structure on GMP Performance', fontsize=14, fontweight='bold')
ax.axhline(-20, color='green', linestyle='--', linewidth=1.5, alpha=0.5, label='-20 dB reference (1% residual power)')
ax.grid(True, alpha=0.3, axis='y')
ax.legend()

plt.tight_layout()
plt.show()

print("\nKey insight: More lags ≠ always better!")
print("  - Too few lags: underfitting (can't capture cross-memory)")
print("  - Too many lags: overfitting risk + increased computational cost")
print("  - Use cross-validation or model selection to choose optimal structure")
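
The cross-validation advice can be sketched as follows. To stay self-contained, this example does not use `GeneralizedMemoryPolynomial`; it fits a plain ridge regression on regressors $x^n(t-m)$ selected by an `{order: [lags]}` dictionary, which is only an assumed reading of the `lags` argument. The data are a synthetic stand-in, and the folds are contiguous blocks so that time ordering is respected.

```python
import numpy as np

def lagged_power_regressors(x, lags, max_lag):
    """Columns x^n(t-m) for each order n and lag m listed in `lags` (assumed layout)."""
    t = np.arange(max_lag, len(x))
    return np.column_stack([x[t - m] ** n for n, ms in sorted(lags.items()) for m in ms])

def ridge_fit(Phi, y, lam):
    """Closed-form ridge solution (Phi^T Phi + lam I)^-1 Phi^T y."""
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

def blocked_cv_nmse(x, y, lags, n_folds=5, lam=1e-6):
    """Average validation NMSE (dB) over contiguous folds."""
    max_lag = max(m for ms in lags.values() for m in ms)
    Phi = lagged_power_regressors(x, lags, max_lag)
    target = y[max_lag:]
    scores = []
    for val_idx in np.array_split(np.arange(len(target)), n_folds):
        train_mask = np.ones(len(target), dtype=bool)
        train_mask[val_idx] = False           # hold out one contiguous block
        coeffs = ridge_fit(Phi[train_mask], target[train_mask], lam)
        resid = target[val_idx] - Phi[val_idx] @ coeffs
        scores.append(10 * np.log10(np.mean(resid ** 2) / np.mean(target[val_idx] ** 2)))
    return float(np.mean(scores))

# Synthetic stand-in data (not the notebook's signal): y depends on x(t), x(t-1), x^2(t)
rng = np.random.default_rng(0)
x_cv = 0.3 * rng.standard_normal(4000)
x_prev = np.concatenate([[0.0], x_cv[:-1]])
y_cv = 0.8 * x_cv + 0.3 * x_prev + 0.2 * x_cv ** 2 + 0.01 * rng.standard_normal(4000)

candidates = {
    "too small": {1: [0]},
    "matched": {1: [0, 1], 2: [0]},
    "oversized": {1: [0, 1, 2, 3, 4, 5], 2: [0, 1, 2, 3], 3: [0, 1, 2]},
}
cv_scores = {name: blocked_cv_nmse(x_cv, y_cv, lags) for name, lags in candidates.items()}
for name, score in cv_scores.items():
    print(f"{name:10s}: {score:6.2f} dB")
```

The structure with the lowest average validation NMSE is the one to keep. When two candidates score almost the same, the smaller one is usually the safer choice.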

---

## 4. Coefficient Analysis: What Did GMP Learn?

Let's examine the learned coefficients to understand which cross-terms are important.

In [None]:
# Analyze GMP coefficients
gmp_coeffs = gmp_model.coeffs_

print(f"GMP coefficient matrix shape: {gmp_coeffs.shape}")
print(f"Total nonzero coefficients: {np.count_nonzero(gmp_coeffs)}")

# Visualize coefficient structure
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for order_idx in range(3):
    order = order_idx + 1
    coeffs_order = gmp_coeffs[:, order_idx]
    
    # Find which lags are active (nonzero)
    active_lags = np.where(coeffs_order != 0)[0]
    active_values = coeffs_order[active_lags]
    
    axes[order_idx].stem(active_lags, active_values, basefmt=' ')
    axes[order_idx].set_xlabel('Memory Lag')
    axes[order_idx].set_ylabel(f'Coefficient h_{order}[m]')
    axes[order_idx].set_title(f'Order {order} Coefficients ({len(active_lags)} active lags)')
    axes[order_idx].grid(True, alpha=0.3)
    
    # Highlight the most important lag
    if len(active_values) > 0:
        max_idx = np.argmax(np.abs(active_values))
        max_lag = active_lags[max_idx]
        axes[order_idx].axvline(max_lag, color='red', linestyle='--', 
                                linewidth=2, alpha=0.5, label=f'Strongest: lag {max_lag}')
        axes[order_idx].legend()

plt.tight_layout()
plt.show()

# Identify dominant cross-lags
print("\nDominant cross-lags (lags with largest absolute coefficients):")
for order_idx in range(3):
    order = order_idx + 1
    coeffs_order = gmp_coeffs[:, order_idx]
    active_lags = np.where(coeffs_order != 0)[0]
    
    if len(active_lags) > 0:
        # Sort by absolute value
        sorted_idx = np.argsort(np.abs(coeffs_order[active_lags]))[::-1]
        top_lags = active_lags[sorted_idx[:3]]  # Top 3
        top_values = coeffs_order[top_lags]
        
        print(f"  Order {order}:")
        for lag, val in zip(top_lags, top_values):
            print(f"    Lag {lag}: {val:+.4f}")
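
One caveat when reading these rankings: raw coefficient magnitudes are comparable across lags of the same order, but not across orders, because $x^n$ shrinks quickly when the input is scaled to a standard deviation of 0.3. The sketch below uses hypothetical lag-0 coefficients (not values learned by the model) and weights each one by the spread of its regressor.

```python
import numpy as np

rng = np.random.default_rng(0)
x_demo = 0.3 * rng.standard_normal(5000)      # same input scaling as in this notebook

# Hypothetical coefficients (lag 0 only) for orders 1..3, chosen for illustration
coeffs_demo = {1: 0.8, 2: 0.1, 3: 0.5}

# Weight each coefficient by the standard deviation of the regressor it multiplies
contrib = {n: abs(c) * np.std(x_demo ** n) for n, c in coeffs_demo.items()}

for n, c in coeffs_demo.items():
    print(f"order {n}: |coeff| = {abs(c):.2f}, RMS contribution = {contrib[n]:.4f}")
```

Here the cubic coefficient is more than half the size of the linear one, yet its contribution to the output is far smaller than that ratio suggests.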

---

## Summary

In this notebook, we:

1. **Generated data with cross-memory effects** (interaction between $x(t)$ and $x(t-2)$)
2. **Compared MP vs. GMP models** on held-out data using test NMSE
3. **Explored different lag structures** and their impact on performance
4. **Analyzed learned coefficients** to identify dominant cross-lags

### When to use GMP:
- ✅ **Cross-memory effects suspected** (e.g., intermodulation distortion)
- ✅ **MP underfits** (poor NMSE despite sufficient `memory_length`/`order`)
- ✅ **Domain knowledge** suggests specific lag interactions
- ❌ **No cross-memory** → stick with MP (simpler, less overfitting risk)
- ❌ **High-dimensional MIMO** → use TT-Volterra instead

### Practical tips:
1. **Start with MP** (diagonal) as baseline
2. **Add cross-lags incrementally** if MP underfits
3. **Use regularization** (`lambda_reg > 0`) to prevent overfitting
4. **Cross-validate** lag structure selection
5. **Use ModelSelector** (Notebook 03) for automatic MP vs. GMP selection
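
To illustrate the regularization tip, here is a minimal ridge-regression sketch. It assumes `lambda_reg` plays the role of a Tikhonov (ridge) penalty on the coefficients; how `GeneralizedMemoryPolynomial` actually applies it is not shown in this notebook. The basis is deliberately built from strongly correlated columns so the effect is visible.

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    """Solve min ||Phi h - y||^2 + lam ||h||^2 in closed form."""
    n_cols = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_cols), Phi.T @ y)

rng = np.random.default_rng(0)
x_demo = 0.3 * rng.standard_normal(500)
# Deliberately ill-conditioned basis: odd powers of the same sample are strongly correlated
Phi = np.column_stack([x_demo, x_demo ** 3, x_demo ** 5])
y_demo = 0.8 * x_demo + 0.01 * rng.standard_normal(500)

norms = []
for lam in [0.0, 1e-6, 1e-3, 1e-1]:
    h = ridge_fit(Phi, y_demo, lam)
    norms.append(np.linalg.norm(h))
    print(f"lambda = {lam:g}: coeffs = {np.round(h, 3)}, ||h|| = {norms[-1]:.3f}")
```

Larger penalties shrink the coefficient vector, which stabilizes the weakly determined higher-order terms at the cost of some bias.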

### Next steps:
- **Notebook 02**: Tensor-Train Volterra for full MIMO systems
- **Notebook 03**: Automatic model selection (MP vs GMP vs TT-Full)