# Automatic Model Selection: MP vs. GMP vs. TT-Full

This notebook demonstrates **automatic model selection** using information criteria (AIC/BIC) to choose the best Volterra model structure.

## The Model Selection Problem

Given a nonlinear system, which model should we use?
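- **Memory Polynomial (MP)**: diagonal kernel terms only, $O(M \cdot N)$ parameters
- **Generalized MP (GMP)**: adds cross-memory terms, $O(M \cdot N + |\mathcal{L}|)$ parameters
- **TT-Volterra**: full MIMO kernels in tensor-train form, $O(N \cdot M \cdot I \cdot r^2)$ parameters

Here $M$ is the memory length, $N$ the nonlinearity order, $I$ the number of inputs, $r$ the TT rank, and $\mathcal{L}$ the set of cross-term lags.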

**Challenge:** More complex models typically fit the training data better but risk **overfitting**.
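To put rough numbers on the complexity gap, the sketch below counts coefficients for each structure. The helper names are hypothetical, and the TT count assumes a single tensor train whose core $k$ has shape $r_{k-1} \times (M \cdot I) \times r_k$; the `volterra` package may lay out and count its parameters differently.

```python
def mp_params(M, N):
    """Diagonal memory polynomial: one coefficient per (delay, order) pair."""
    return M * N


def full_volterra_params(M, N):
    """Unstructured Volterra kernels of orders 1..N, without symmetry reduction."""
    return sum(M ** n for n in range(1, N + 1))


def tt_params(M, I, ranks):
    """Assumed layout: core k has shape ranks[k] x (M * I) x ranks[k + 1]."""
    return sum(r0 * M * I * r1 for r0, r1 in zip(ranks[:-1], ranks[1:]))


M, N = 10, 3
counts = {
    "MP": mp_params(M, N),
    "full Volterra": full_volterra_params(M, N),
    "TT (ranks 1-2-2-1, I=1)": tt_params(M, 1, [1, 2, 2, 1]),
}
for name, k in counts.items():
    print(f"{name:>24s}: {k} parameters")
```

Under these assumptions the unstructured third-order model needs over a thousand coefficients, while the diagonal and low-rank structures stay below a hundred.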

## Information Criteria

We use **AIC** (Akaike Information Criterion) and **BIC** (Bayesian Information Criterion) to balance fit quality and model complexity:

$$
\text{AIC} = n \log(\text{NMSE}) + 2k
$$
$$
\text{BIC} = n \log(\text{NMSE}) + k \log(n)
$$

where:
- $n$ = number of samples
- $k$ = number of parameters
- $\text{NMSE}$ = normalized mean squared error

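**Lower is better.** In both criteria the first term rewards fit quality and the second penalizes parameter count ($\log$ is the natural logarithm). BIC's per-parameter penalty $\log(n)$ exceeds AIC's penalty of 2 whenever $n \geq 8$, so BIC is the stricter criterion for any realistic dataset.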
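As a worked illustration of the two formulas, the sketch below scores two made-up candidates. The NMSE values, parameter counts, and sample count are hypothetical numbers chosen for the example, not results from this notebook.

```python
import math


def aic(nmse, n, k):
    """AIC as defined above: n * log(NMSE) + 2k."""
    return n * math.log(nmse) + 2 * k


def bic(nmse, n, k):
    """BIC as defined above: n * log(NMSE) + k * log(n)."""
    return n * math.log(nmse) + k * math.log(n)


n = 2400  # hypothetical number of samples used for scoring
candidates = {
    "small": {"nmse": 1.00e-3, "k": 30},
    "large": {"nmse": 0.99e-3, "k": 300},  # 1% better fit, 10x the parameters
}

scores = {
    name: {"aic": aic(c["nmse"], n, c["k"]), "bic": bic(c["nmse"], n, c["k"])}
    for name, c in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name]["aic"])
best_by_bic = min(scores, key=lambda name: scores[name]["bic"])

for name, s in scores.items():
    print(f"{name}: AIC={s['aic']:.1f}  BIC={s['bic']:.1f}")
print(f"AIC picks {best_by_aic}, BIC picks {best_by_bic}")
```

The 1% NMSE improvement lowers the fit term by only about 24, while the 270 extra parameters raise the AIC penalty by 540, so both criteria prefer the smaller model here.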

---

## Setup

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

from volterra import ModelSelector
from volterra import GeneralizedMemoryPolynomial, TTVolterraMIMO

np.random.seed(789)

plt.rcParams['figure.figsize'] = (12, 4)
plt.rcParams['font.size'] = 10

---

## 1. Scenario A: System Favors Diagonal MP

Generate data from a pure diagonal Memory Polynomial system (no cross-terms).
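A short derivation shows why a static polynomial followed by a linear filter is diagonal. The symbols $f$, $c_p$, and $h_k$ are introduced here for illustration: $f(x) = c_1 x + c_2 x^2 + c_3 x^3$ is the static nonlinearity and $h_k$ is the impulse response of the filter applied afterwards.

$$
y(t) = \sum_{k \ge 0} h_k \, f\big(x(t-k)\big) = \sum_{p=1}^{3} \sum_{k \ge 0} \left(h_k \, c_p\right) x(t-k)^p
$$

Every term involves a single delay $k$, so no products such as $x(t-k_1)\,x(t-k_2)$ with $k_1 \neq k_2$ appear. A Memory Polynomial with memory length $M$ keeps only $k = 0, \dots, M-1$; for a recursive (IIR) filter the sum is in principle infinite, so a finite $M$ truncates the tail of the impulse response.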

In [None]:
# Generate diagonal MP system
fs = 48000
n_samples = 12000

x = np.random.randn(n_samples) * 0.3

# Pure diagonal: no cross-memory terms
y_nl = 0.8 * x + 0.12 * x**2 + 0.05 * x**3

# Add memory
b, a = [0.2, -0.38, 0.18], [1.0, -1.9, 0.94]
y_clean = signal.lfilter(b, a, y_nl)
y = y_clean + np.random.randn(n_samples) * 0.01

print("Scenario A: Diagonal Memory Polynomial System")
print("  No cross-memory terms")
print(f"  SNR: {10 * np.log10(np.mean(y_clean**2) / 0.01**2):.1f} dB")

In [None]:
# Use ModelSelector to automatically choose best model
selector_A = ModelSelector(
    memory_length=10,
    order=3,
    criterion='aic',  # Can also use 'bic' or 'nmse'
    try_diagonal_mp=True,
    try_gmp=True,
    try_tt_full=True,
    tt_ranks=[1, 2, 2, 1],
    validation_split=0.2,
    verbose=True
)

print("\nFitting models and selecting best...\n")
selector_A.fit(x, y)

print(f"\n{'='*60}")
print(f"SELECTED MODEL: {selector_A.selected_model_type}")
print(f"{'='*60}")

In [None]:
# Visualize model comparison
results_A = selector_A.results

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

model_names = [r['model_type'] for r in results_A]
nmses = [r['nmse'] for r in results_A]
aics = [r['aic'] for r in results_A]
n_params = [r['n_params'] for r in results_A]

# NMSE comparison
bars = axes[0].bar(model_names, nmses, alpha=0.7, edgecolor='black')
bars[model_names.index(selector_A.selected_model_type)].set_color('green')
axes[0].set_ylabel('NMSE (linear scale)')
axes[0].set_title('Validation NMSE (lower is better)')
axes[0].grid(True, alpha=0.3, axis='y')

# AIC comparison
bars = axes[1].bar(model_names, aics, alpha=0.7, edgecolor='black')
bars[model_names.index(selector_A.selected_model_type)].set_color('green')
axes[1].set_ylabel('AIC (lower is better)')
axes[1].set_title(f'AIC: Best = {selector_A.selected_model_type}')
axes[1].grid(True, alpha=0.3, axis='y')

# Parameter count
bars = axes[2].bar(model_names, n_params, alpha=0.7, edgecolor='black')
bars[model_names.index(selector_A.selected_model_type)].set_color('green')
axes[2].set_ylabel('Number of Parameters')
axes[2].set_title('Model Complexity')
axes[2].grid(True, alpha=0.3, axis='y')

plt.suptitle('Scenario A: Diagonal MP System → Best Model Selected', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\nInterpretation for Scenario A:")
if selector_A.selected_model_type == "Diagonal-MP":
    print("  ✅ Correctly identified diagonal system!")
    print("  - Diagonal-MP has fewest parameters")
    print("  - GMP and TT-Full offer no improvement (just add complexity)")
else:
    print("  ⚠️  Selected more complex model (might be noise/overfitting)")

---

## 2. Scenario B: System Favors GMP (Cross-Memory Terms)

Generate data with explicit cross-memory interactions.
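To see why a diagonal model cannot absorb such an interaction, the sketch below fits a memoryless version of this system by ordinary least squares, once with diagonal regressors only and once with the cross-term added. It uses plain NumPy rather than the `volterra` models, and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
x = rng.standard_normal(n) * 0.3


def delay(sig, d):
    """Shift a signal by d samples, zero-padding the start."""
    return np.concatenate([np.zeros(d), sig[:len(sig) - d]])


# Memoryless nonlinearity with one cross-memory term, plus measurement noise
y = (0.8 * x + 0.1 * x**2 + 0.05 * x**3 + 0.2 * x * delay(x, 3)
     + rng.standard_normal(n) * 0.01)

# Diagonal regressors: x(t - m)^p for m = 0..3 and p = 1..3
mp_cols = [delay(x, m) ** p for m in range(4) for p in range(1, 4)]
Phi_mp = np.column_stack(mp_cols)
# Same regressors plus the single cross-memory column x(t) * x(t - 3)
Phi_gmp = np.column_stack(mp_cols + [x * delay(x, 3)])


def ls_nmse(Phi, target):
    """Least-squares fit followed by the normalized residual energy."""
    coeffs, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    resid = target - Phi @ coeffs
    return np.sum(resid**2) / np.sum(target**2)


nmse_mp = ls_nmse(Phi_mp, y)
nmse_gmp = ls_nmse(Phi_gmp, y)
print(f"diagonal only: NMSE = {nmse_mp:.2e}")
print(f"with cross-term: NMSE = {nmse_gmp:.2e}")
```

For white Gaussian input the product $x(t)\,x(t-3)$ is uncorrelated with every diagonal regressor, so the diagonal fit leaves that energy in the residual regardless of how many delays it includes.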

In [None]:
# Generate GMP system with cross-terms
x = np.random.randn(n_samples) * 0.3
x_delayed = np.concatenate([np.zeros(3), x[:-3]])  # x(t-3)

# Cross-memory interaction!
y_nl = (
    0.8 * x +
    0.1 * x**2 +
    0.05 * x**3 +
    0.2 * x * x_delayed  # CROSS-TERM: x(t) * x(t-3)
)

y_clean = signal.lfilter(b, a, y_nl)
y = y_clean + np.random.randn(n_samples) * 0.01

print("Scenario B: GMP System with Cross-Memory Terms")
print("  Cross-term: x(t) * x(t-3)")
print(f"  SNR: {10 * np.log10(np.mean(y_clean**2) / 0.01**2):.1f} dB")

In [None]:
# Use ModelSelector
selector_B = ModelSelector(
    memory_length=10,
    order=3,
    criterion='aic',
    try_diagonal_mp=True,
    try_gmp=True,
    try_tt_full=True,
    tt_ranks=[1, 2, 2, 1],
    validation_split=0.2,
    verbose=True
)

print("\nFitting models and selecting best...\n")
selector_B.fit(x, y)

print(f"\n{'='*60}")
print(f"SELECTED MODEL: {selector_B.selected_model_type}")
print(f"{'='*60}")

In [None]:
# Visualize model comparison
results_B = selector_B.results

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

model_names = [r['model_type'] for r in results_B]
nmses = [r['nmse'] for r in results_B]
aics = [r['aic'] for r in results_B]
n_params = [r['n_params'] for r in results_B]

# NMSE comparison
bars = axes[0].bar(model_names, nmses, alpha=0.7, edgecolor='black')
bars[model_names.index(selector_B.selected_model_type)].set_color('green')
axes[0].set_ylabel('NMSE (linear scale)')
axes[0].set_title('Validation NMSE (lower is better)')
axes[0].grid(True, alpha=0.3, axis='y')

# AIC comparison
bars = axes[1].bar(model_names, aics, alpha=0.7, edgecolor='black')
bars[model_names.index(selector_B.selected_model_type)].set_color('green')
axes[1].set_ylabel('AIC (lower is better)')
axes[1].set_title(f'AIC: Best = {selector_B.selected_model_type}')
axes[1].grid(True, alpha=0.3, axis='y')

# Parameter count
bars = axes[2].bar(model_names, n_params, alpha=0.7, edgecolor='black')
bars[model_names.index(selector_B.selected_model_type)].set_color('green')
axes[2].set_ylabel('Number of Parameters')
axes[2].set_title('Model Complexity')
axes[2].grid(True, alpha=0.3, axis='y')

plt.suptitle('Scenario B: GMP System with Cross-Terms → Best Model Selected', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\nInterpretation for Scenario B:")
if selector_B.selected_model_type == "GMP":
    print("  ✅ Correctly identified cross-memory effects!")
    print("  - GMP NMSE significantly lower than Diagonal-MP")
    print("  - AIC balance: better fit justifies extra parameters")
elif selector_B.selected_model_type == "Diagonal-MP":
    print("  ⚠️  Selected diagonal despite cross-terms (might need more data or stronger signal)")
else:
    print(f"  ℹ️  Selected {selector_B.selected_model_type}")

---

## 3. Scenario C: MIMO System Favors TT-Full

Generate MIMO data with cross-input interactions.
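The sketch below illustrates how a tensor-train factorization can represent a full third-order kernel over stacked multi-input delays with far fewer numbers than the dense tensor. It is a generic TT contraction with random cores written in NumPy, assuming a stacked delay vector of length $M \cdot I$; it is not the `volterra` package's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
M, I = 8, 2
d = M * I                      # length of the stacked delay vector
ranks = [1, 3, 3, 1]

# One core per kernel dimension, shape (r_prev, d, r_next)
cores = [rng.standard_normal((ranks[k], d, ranks[k + 1])) for k in range(3)]


def tt_eval(cores, u):
    """Evaluate the cubic form by contracting each core with u, left to right."""
    v = np.ones(1)
    for G in cores:
        v = v @ np.einsum("ajb,j->ab", G, u)  # (r_prev,) @ (r_prev, r_next)
    return float(v[0])


u = rng.standard_normal(d)     # stand-in for [x1 delays, x2 delays] at one time step

# Reference: build the dense d x d x d kernel and contract it with u three times
full = np.einsum("aib,bjc,ckd->ijk", *cores)
y_full = float(np.einsum("ijk,i,j,k->", full, u, u, u))
y_tt = tt_eval(cores, u)

n_tt = sum(G.size for G in cores)
print(f"TT cores store {n_tt} numbers, dense kernel stores {full.size}")
print(f"match: {np.isclose(y_tt, y_full)}")
```

Because the dense kernel indexes every pair and triple of entries in the stacked vector, products between delays of different inputs are covered, while the storage grows with the rank rather than with $d^3$.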

In [None]:
# Generate 2-input, 1-output MIMO system
x1 = np.random.randn(n_samples) * 0.25
x2 = np.random.randn(n_samples) * 0.25
x_mimo = np.column_stack([x1, x2])

# MIMO system with cross-input interactions
y_nl = (
    0.6 * x1 +
    0.4 * x2 +
    0.1 * x1**2 +
    0.08 * x2**2 +
    0.15 * x1 * x2  # CROSS-INPUT INTERACTION
)

y_clean = signal.lfilter(b, a, y_nl)
y_mimo = y_clean + np.random.randn(n_samples) * 0.01

print("Scenario C: MIMO System with Cross-Input Interactions")
print("  2 inputs, 1 output")
print("  Cross-input term: x1(t) * x2(t)")
print(f"  SNR: {10 * np.log10(np.mean(y_clean**2) / 0.01**2):.1f} dB")

In [None]:
# Use ModelSelector with MIMO input
selector_C = ModelSelector(
    memory_length=8,
    order=3,
    criterion='aic',
    try_diagonal_mp=True,
    try_gmp=True,
    try_tt_full=True,
    tt_ranks=[1, 3, 3, 1],
    validation_split=0.2,
    verbose=True
)

print("\nFitting MIMO models and selecting best...\n")
selector_C.fit(x_mimo, y_mimo)

print(f"\n{'='*60}")
print(f"SELECTED MODEL: {selector_C.selected_model_type}")
print(f"{'='*60}")

In [None]:
# Visualize model comparison
results_C = selector_C.results

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

model_names = [r['model_type'] for r in results_C]
nmses = [r['nmse'] for r in results_C]
aics = [r['aic'] for r in results_C]
n_params = [r['n_params'] for r in results_C]

# NMSE comparison
bars = axes[0].bar(model_names, nmses, alpha=0.7, edgecolor='black')
bars[model_names.index(selector_C.selected_model_type)].set_color('green')
axes[0].set_ylabel('NMSE (linear scale)')
axes[0].set_title('Validation NMSE (lower is better)')
axes[0].grid(True, alpha=0.3, axis='y')

# AIC comparison
bars = axes[1].bar(model_names, aics, alpha=0.7, edgecolor='black')
bars[model_names.index(selector_C.selected_model_type)].set_color('green')
axes[1].set_ylabel('AIC (lower is better)')
axes[1].set_title(f'AIC: Best = {selector_C.selected_model_type}')
axes[1].grid(True, alpha=0.3, axis='y')

# Parameter count
bars = axes[2].bar(model_names, n_params, alpha=0.7, edgecolor='black')
bars[model_names.index(selector_C.selected_model_type)].set_color('green')
axes[2].set_ylabel('Number of Parameters')
axes[2].set_title('Model Complexity')
axes[2].grid(True, alpha=0.3, axis='y')

plt.suptitle('Scenario C: MIMO System → Best Model Selected', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\nInterpretation for Scenario C:")
if selector_C.selected_model_type == "TT-Full":
    print("  ✅ Correctly identified need for full MIMO model!")
    print("  - TT-Full captures cross-input interactions efficiently")
    print("  - Diagonal MP/GMP treat inputs independently (underfit MIMO)")
elif selector_C.selected_model_type in ["GMP", "Diagonal-MP"]:
    print(f"  ⚠️  Selected {selector_C.selected_model_type} (TT-Full might need more data or higher rank)")
else:
    print(f"  ℹ️  Selected {selector_C.selected_model_type}")

---

## 4. AIC vs. BIC: Which Criterion to Use?

Let's compare AIC, BIC, and raw validation NMSE as selection criteria on the same system (a fresh realization of Scenario B).
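One way to read the difference between the criteria: from the AIC and BIC formulas, a model with $\Delta k$ extra parameters wins only if its NMSE ratio to the simpler model falls below $\exp(-c \, \Delta k / n)$, with $c = 2$ for AIC and $c = \log(n)$ for BIC. The sketch below evaluates that threshold; the function name and the example numbers are hypothetical.

```python
import math


def max_nmse_ratio(extra_params, n, criterion):
    """Largest NMSE_complex / NMSE_simple at which the complex model still wins."""
    if criterion == "aic":
        per_param = 2.0
    elif criterion == "bic":
        per_param = math.log(n)
    else:
        raise ValueError("criterion must be 'aic' or 'bic'")
    return math.exp(-per_param * extra_params / n)


n = 2400           # hypothetical number of samples used for scoring
extra_params = 50  # hypothetical parameter gap between two candidates

aic_ratio = max_nmse_ratio(extra_params, n, "aic")
bic_ratio = max_nmse_ratio(extra_params, n, "bic")

for name, ratio in [("AIC", aic_ratio), ("BIC", bic_ratio)]:
    print(f"{name}: needs NMSE ratio below {ratio:.4f} "
          f"({10 * math.log10(ratio):.2f} dB)")
```

With raw NMSE as the criterion the threshold is simply 1, so any improvement, however small, selects the larger model.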

In [None]:
# Compare AIC, BIC, and NMSE selection on a fresh realization of the Scenario B (GMP) system
x = np.random.randn(n_samples) * 0.3
x_delayed = np.concatenate([np.zeros(3), x[:-3]])
y_nl = 0.8 * x + 0.1 * x**2 + 0.05 * x**3 + 0.2 * x * x_delayed
y_clean = signal.lfilter(b, a, y_nl)
y = y_clean + np.random.randn(n_samples) * 0.01

selectors = {}
for criterion in ['aic', 'bic', 'nmse']:
    selector = ModelSelector(
        memory_length=10,
        order=3,
        criterion=criterion,
        try_diagonal_mp=True,
        try_gmp=True,
        try_tt_full=True,
        tt_ranks=[1, 2, 2, 1],
        validation_split=0.2,
        verbose=False
    )
    selector.fit(x, y)
    selectors[criterion] = selector
    print(f"{criterion.upper():4s} selected: {selector.selected_model_type}")

print("\nCriterion interpretation:")
print("  - AIC: Lighter penalty (2 per parameter), tends to favor more complex models than BIC")
print("  - BIC: More conservative, stronger penalty (log(n) per parameter)")
print("  - NMSE: No complexity penalty (risk of overfitting)")
print("\nRecommendation: Use AIC for exploration, BIC for production")

---

## 5. Using the Selected Model

Once ModelSelector chooses the best model, you can access it directly.
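When scoring predictions, the reference signal has to be aligned with the model output first. The helper below is a minimal sketch, assuming (as the trimming in the next cell does) that the prediction drops the leading samples and lines up with the tail of the reference; `nmse_db` is a hypothetical name, not part of the `volterra` package.

```python
import numpy as np


def nmse_db(y_true, y_pred):
    """NMSE in dB; a shorter prediction is compared against the tail of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    offset = len(y_true) - len(y_pred)
    if offset < 0:
        raise ValueError("prediction is longer than the reference signal")
    ref = y_true[offset:]
    return 10 * np.log10(np.mean((ref - y_pred) ** 2) / np.mean(ref ** 2))


# Synthetic check: a prediction that is 10% too small in amplitude
rng = np.random.default_rng(1)
y_ref = rng.standard_normal(1000)
M = 10
y_hat = 0.9 * y_ref[M - 1:]    # drops the first M - 1 samples

value = nmse_db(y_ref, y_hat)
print(f"NMSE: {value:.2f} dB")
```

A 10% amplitude error corresponds to an error-to-signal power ratio of 0.01, which is a convenient sanity check for the dB conversion.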

In [None]:
# Access the selected model from Scenario B
best_model = selector_B.best_model

print(f"Selected model type: {selector_B.selected_model_type}")
print(f"Model class: {type(best_model).__name__}")

# Use the model for prediction
x_test = np.random.randn(1000) * 0.3
x_test_delayed = np.concatenate([np.zeros(3), x_test[:-3]])
y_test_true = signal.lfilter(b, a, 0.8 * x_test + 0.1 * x_test**2 + 0.05 * x_test**3 + 0.2 * x_test * x_test_delayed)

y_test_pred = best_model.predict(x_test)

# Trim ground truth to match prediction length
M = best_model.memory_length
y_test_true_trimmed = y_test_true[M - 1:]

# Compute error
mse = np.mean((y_test_true_trimmed - y_test_pred) ** 2)
nmse_db = 10 * np.log10(mse / np.mean(y_test_true_trimmed ** 2))

print("\nTest set performance:")
print(f"  NMSE: {nmse_db:.2f} dB")
print(f"  MSE: {mse:.6f}")

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(y_test_true_trimmed[:500], label='True output', alpha=0.7, linewidth=1.5)
plt.plot(y_test_pred[:500], label=f'{selector_B.selected_model_type} prediction', alpha=0.7, linewidth=1.5, linestyle='--')
plt.xlabel('Sample')
plt.ylabel('Amplitude')
plt.title(f'Using Selected Model ({selector_B.selected_model_type}) for Prediction')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

---

## Summary

In this notebook, we:

1. **Demonstrated automatic model selection** using AIC/BIC criteria
2. **Tested three scenarios**, each built to favor one structure:
   - Diagonal MP system → Diagonal-MP is the expected choice
   - GMP system with cross-terms → GMP is the expected choice
   - MIMO system → TT-Full is the expected choice for cross-input interactions
3. **Compared selection criteria** (AIC vs. BIC vs. NMSE)
4. **Showed how to use the selected model** for prediction

### When to use ModelSelector:
- ✅ **Unknown system structure** (don't know if diagonal, GMP, or MIMO)
- ✅ **Need automatic pipeline** (no manual choice of model architecture; memory length, order, and TT ranks are still set by hand)
- ✅ **Data-driven applications** (let the data decide)
- ✅ **Research/exploration** (compare multiple architectures)
- ❌ **Production with known structure** → use specific model directly (faster)
- ❌ **Very large datasets** → ModelSelector fits multiple models (computational cost)

### ModelSelector configuration tips:
1. **Validation split**: Use 0.2-0.3 for robust selection
2. **Criterion choice**:
   - `'aic'`: Balanced, good default
   - `'bic'`: More conservative (less likely to select an over-parameterized model)
   - `'nmse'`: Only for debugging (no complexity penalty)
3. **TT ranks**: Start with `[1, 2, 2, 1]` or `[1, 3, 3, 1]`
4. **Enable/disable models**: Set `try_*=False` to skip unwanted models
5. **Verbose output**: Use `verbose=True` to see selection process
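For signals with memory, a hold-out set is usually taken as a contiguous block at the end of the record so that training and validation samples do not interleave. How `ModelSelector` applies `validation_split` internally is not shown in this notebook; the function below is a hypothetical sketch of a chronological split, not the library's behavior.

```python
def chronological_split(x, y, validation_split=0.2):
    """Split paired sequences into a leading training block and a trailing validation block."""
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must lie strictly between 0 and 1")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n_train = int(round(len(x) * (1.0 - validation_split)))
    return (x[:n_train], y[:n_train]), (x[n_train:], y[n_train:])


# Toy sequences so the split is easy to read
x_demo = list(range(10))
y_demo = [2 * v for v in x_demo]

(x_tr, y_tr), (x_val, y_val) = chronological_split(x_demo, y_demo, validation_split=0.2)
print("train:", x_tr)
print("validation:", x_val)
```

The same slicing works on NumPy arrays, including two-column MIMO inputs, because it only indexes along the first axis.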

### Practical workflow:
```python
# 1. Use ModelSelector to find best architecture
selector = ModelSelector(memory_length=10, order=3, criterion='aic')
selector.fit(x_train, y_train)

# 2. Inspect results
print(f"Selected: {selector.selected_model_type}")
print(f"Results: {selector.results}")

# 3. Use the best model
y_pred = selector.best_model.predict(x_test)

# 4. (Optional) Retrain with full dataset
if selector.selected_model_type == "GMP":
    final_model = GeneralizedMemoryPolynomial(memory_length=10, order=3)
    final_model.fit(x_full, y_full)
```

### Next steps:
- **Notebook 04**: Real-world application (instrument + room pipeline)