# Missing Value Recovery with Fourier Ratio and L1 Minimization

This notebook demonstrates how to recover missing values in signals using the Fourier Ratio framework and compressed sensing techniques from the Talagrand constant paper.

## Theory Overview

**Theorem 1.20** states that if a signal has a small Fourier Ratio (FR), it can be recovered approximately from a small number of observations using L1 minimization.
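
The notebook imports `fourier_ratio` from `src.fourier_core` without showing its definition. As a rough illustration only, the sketch below assumes FR is the ratio of the ℓ1 norm to the ℓ2 norm of the unitary DFT coefficients, which matches the "1 to √N" range listed in the summary table. The function name `fourier_ratio_sketch` and the normalization are assumptions, and the project's own implementation may differ.

```python
import numpy as np


def fourier_ratio_sketch(f):
    """Assumed definition: ||f_hat||_1 / ||f_hat||_2 for the unitary DFT f_hat."""
    f_hat = np.fft.fft(np.asarray(f, dtype=float), norm="ortho")
    l2 = np.linalg.norm(f_hat)
    if l2 == 0:
        raise ValueError("Fourier Ratio is undefined for the zero signal")
    return np.sum(np.abs(f_hat)) / l2


N_fr = 64
n_fr = np.arange(N_fr)

# A pure cosine has two nonzero DFT coefficients of equal magnitude.
tone = np.cos(2 * np.pi * 3 * n_fr / N_fr)
# A single spike has a perfectly flat spectrum.
spike = np.zeros(N_fr)
spike[0] = 1.0

fr_tone = fourier_ratio_sketch(tone)
fr_spike = fourier_ratio_sketch(spike)
print(f"FR(cosine) = {fr_tone:.4f}, FR(spike) = {fr_spike:.4f}")
```

Under this assumed definition, a spectrally concentrated signal sits near the lower end of the range and a spectrally flat one reaches √N.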

**Key Formula:**
Number of observations needed:
$$q = C \times \frac{FR^2}{\epsilon^2} \times \log^2\left(\frac{FR}{\epsilon}\right) \times \log(N)$$

**Recovery Guarantee:**
$$\|x^* - f\|_2 \leq 11.47 \times \|f\|_2 \times \epsilon$$

where $x^*$ is the recovered signal, $f$ is the true signal, $\epsilon$ is the target accuracy, $N$ is the signal length, and $C$ is an absolute constant.
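
To make the guarantee concrete, here is a minimal sketch of how the two sides of the inequality could be compared numerically. The name `check_bound_sketch` is hypothetical; the notebook's `check_theorem_bound` helper is not shown, so this only mirrors the formula above and the four return values used later.

```python
import numpy as np


def check_bound_sketch(f_true, f_rec, eps, const=11.47):
    """Compare ||f_rec - f_true||_2 with const * ||f_true||_2 * eps."""
    f_true = np.asarray(f_true, dtype=float)
    f_rec = np.asarray(f_rec, dtype=float)
    lhs = np.linalg.norm(f_rec - f_true)
    rhs = const * np.linalg.norm(f_true) * eps
    ratio = lhs / rhs if rhs > 0 else np.inf
    return lhs, rhs, ratio, bool(lhs <= rhs)


# Small hand-checkable case: ||f||_2 = 5 and the error vector has norm 0.5.
lhs_demo, rhs_demo, ratio_demo, ok_demo = check_bound_sketch(
    f_true=[3.0, 4.0], f_rec=[3.0, 4.5], eps=0.1
)
print(lhs_demo, rhs_demo, ratio_demo, ok_demo)
```

Note that the right-hand side equals $11.47\,\epsilon\,\|f\|_2$, so for $\epsilon \geq 1/11.47 \approx 0.087$ it is at least $\|f\|_2$ and even the all-zero reconstruction satisfies it. A passing check at $\epsilon = 0.1$ is therefore weak evidence on its own; the relative error is the more informative number.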

**Method:**
1. Represent signal in DCT (Discrete Cosine Transform) basis
2. Solve L1 minimization: $\min \|c\|_1$ subject to $Ac = y$, where $A$ consists of the rows of the DCT basis matrix at the observed indices and $y$ holds the observed values
3. Reconstruct signal from recovered coefficients
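
The three steps above can be sketched end to end with SciPy. This is an illustrative stand-in, not the code behind `build_dct_basis` or `recover_l1_via_lp` (neither is shown in this notebook): it assumes an orthonormal DCT-II synthesis matrix and the standard reformulation of basis pursuit as a linear program, and every name ending in `_sketch` or `_demo` is hypothetical.

```python
import numpy as np
from scipy.fft import idct
from scipy.optimize import linprog


def dct_synthesis_matrix_sketch(N):
    """Return B whose columns are orthonormal DCT-II basis vectors, so f = B @ c."""
    # The inverse DCT of each unit coefficient vector gives one basis column.
    return idct(np.eye(N), type=2, norm="ortho", axis=0)


def basis_pursuit_lp_sketch(A, y):
    """Solve min ||c||_1 subject to A c = y as a linear program."""
    _, n = A.shape
    # Split c = u - v with u, v >= 0; at the optimum ||c||_1 = sum(u) + sum(v).
    cost = np.ones(2 * n)
    A_eq = np.hstack([A, -A])
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(f"LP solver failed: {res.message}")
    return res.x[:n] - res.x[n:]


# Toy problem: three active DCT coefficients, 40 of 64 samples observed.
N_demo = 64
B_demo = dct_synthesis_matrix_sketch(N_demo)
c_true_demo = np.zeros(N_demo)
c_true_demo[[2, 7, 19]] = [1.0, -0.5, 0.8]
f_demo = B_demo @ c_true_demo

rng_demo = np.random.default_rng(0)
obs_idx_demo = np.sort(rng_demo.choice(N_demo, size=40, replace=False))
A_demo, y_demo = B_demo[obs_idx_demo, :], f_demo[obs_idx_demo]

c_rec_demo = basis_pursuit_lp_sketch(A_demo, y_demo)
f_rec_demo = B_demo @ c_rec_demo
rel_err_demo = np.linalg.norm(f_rec_demo - f_demo) / np.linalg.norm(f_demo)
print(f"relative error on the toy problem: {rel_err_demo:.2e}")
```

The equality constraint forces the reconstruction to agree with every observed sample, so this formulation suits noise-free observations; noisy data would call for a relaxed constraint instead.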

In [None]:
import sys
sys.path.insert(0, '..')

import numpy as np
import matplotlib.pyplot as plt

# Import our modules
from src.fourier_core import fourier_ratio
from src.imputation import (
    mask_observations,
    compute_q,
    build_dct_basis,
    recover_l1_via_lp,
    check_theorem_bound
)
from src.signal_utils import sample_signal, plot_reconstruction, original_signal

## 1. Generate Test Signal and Simulate Missing Data
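
The `mask_observations` helper used in this section is imported rather than defined here. A minimal sketch of what such a helper might do is given below, assuming each sample is kept independently with probability `keep_prob` and that missing entries are stored as NaN. Both choices, and the name `mask_observations_sketch`, are assumptions; the real helper may fill missing entries differently.

```python
import numpy as np


def mask_observations_sketch(f, keep_prob=0.7, seed=0):
    """Keep each sample independently with probability keep_prob (assumed behavior)."""
    rng = np.random.default_rng(seed)
    f = np.asarray(f, dtype=float)
    mask = rng.random(f.shape[0]) < keep_prob  # True where the sample is observed
    f_obs = np.where(mask, f, np.nan)          # NaN marks a missing sample
    return mask, f_obs


signal_demo = np.sin(np.linspace(0, 2 * np.pi, 160))
mask_demo, f_obs_demo = mask_observations_sketch(signal_demo, keep_prob=0.7, seed=0)
print(f"observed {mask_demo.sum()} of {mask_demo.size} samples")
```

Because the mask is random, the realized missing rate will usually differ slightly from `1 - keep_prob`, which is why the cell below reports the actual counts.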

In [None]:
# Signal parameters
sr = 32  # sampling rate (Hz)
seconds = 5.0  # duration
N = int(sr * seconds)

# Generate signal
t, f_full = sample_signal(sr, seconds)

# Compute Fourier Ratio
FR = fourier_ratio(f_full)
print(f"Signal length N = {N}")
print(f"Fourier Ratio FR = {FR:.4f}")
# The FR < 5 cutoff is a display heuristic for this demo, not a threshold from the theorem
print(f"Interpretation: {'Low complexity (good for recovery)' if FR < 5 else 'High complexity'}")

# Plot complete signal
plt.figure(figsize=(12, 4))
plt.plot(t, f_full, 'b-', label='Complete signal')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.title('Original Complete Signal')
plt.grid(True)
plt.legend()
plt.show()

In [None]:
# Simulate missing data
keep_prob = 0.7  # Keep 70% of observations (30% missing)
seed = 0  # for reproducibility

mask, f_obs = mask_observations(f_full, keep_prob=keep_prob, seed=seed)

num_observed = np.sum(mask)
num_missing = np.sum(~mask)
missing_rate = num_missing / N * 100

print(f"Total samples: {N}")
print(f"Observed: {num_observed} ({100-missing_rate:.1f}%)")
print(f"Missing: {num_missing} ({missing_rate:.1f}%)")

# Visualize observed vs missing
plt.figure(figsize=(12, 4))
plt.plot(t, f_full, 'gray', alpha=0.3, label='True signal (hidden)')
plt.scatter(t[mask], f_obs[mask], c='blue', s=15, label='Observed', zorder=3)
plt.scatter(t[~mask], f_full[~mask], c='red', s=15, label='Missing', alpha=0.5, zorder=2)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.title(f'Signal with {missing_rate:.1f}% Missing Values')
plt.legend()
plt.grid(True)
plt.show()

## 2. Compute Required Observations for Recovery

Based on Theorem 1.20, we calculate how many observations are needed.
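
As a worked instance of the formula, the sketch below evaluates $q$ directly. It assumes natural logarithms, rounds up to an integer, and optionally caps the result at the number of available samples; the name `compute_q_sketch` is hypothetical and the notebook's `compute_q` may handle these details differently.

```python
import math


def compute_q_sketch(FR, eps, N, C=1.0, max_available=None):
    """Evaluate q = C * (FR/eps)^2 * log^2(FR/eps) * log(N), rounded up."""
    if eps <= 0 or FR <= 0 or N < 2:
        raise ValueError("need FR > 0, eps > 0 and N >= 2")
    ratio = FR / eps
    q = C * ratio**2 * math.log(ratio) ** 2 * math.log(N)
    q = max(1, math.ceil(q))
    if max_available is not None:
        q = min(q, max_available)  # cannot use more samples than were observed
    return q


q_uncapped = compute_q_sketch(FR=2.0, eps=0.1, N=160)
q_capped = compute_q_sketch(FR=2.0, eps=0.1, N=160, max_available=112)
print(f"uncapped q = {q_uncapped}, capped q = {q_capped}")
```

With $C = 1$, $\epsilon = 0.1$ and $N = 160$, the formula already exceeds $N$ for every $FR \geq 1$, so in this setting the cap decides how many observations are used. The formula becomes informative for much larger $N$ or larger $\epsilon$.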

In [None]:
# Recovery parameters
eps = 0.1  # desired recovery accuracy
C = 1.0  # stand-in for the theorem's absolute constant (value chosen for this demo)

valid_idx = np.where(mask)[0]
q = compute_q(FR=FR, eps=eps, N=N, C=C, max_available=len(valid_idx))

print(f"Recovery parameters:")
print(f"  ε (accuracy) = {eps}")
print(f"  C (constant) = {C}")
print(f"  FR = {FR:.4f}")
# compute_q receives max_available, so q is presumably capped at the number of observed samples
print(f"\nObservations used for recovery: q = {q}")
print(f"Available observations: {len(valid_idx)}")
print(f"Formula: q = C × (FR²/ε²) × log²(FR/ε) × log(N)")
print(f"       = {C} × ({FR:.2f}²/{eps}²) × log²({FR:.2f}/{eps}) × log({N})")

## 3. Recover Missing Values using L1 Minimization

In [None]:
# Select q observations randomly
np.random.seed(seed)
obs_idx = np.random.choice(valid_idx, q, replace=False)
y = f_obs[obs_idx]

print(f"Using {len(y)} observations for recovery")

# Build DCT basis
B = build_dct_basis(N)

# Create measurement matrix (only rows corresponding to observed indices)
A = B[obs_idx, :]

print(f"Measurement matrix A shape: {A.shape}")
print(f"Solving: min ||c||₁ subject to Ac = y")

# Solve L1 minimization
c_rec = recover_l1_via_lp(A, y)

# Reconstruct signal from coefficients
f_rec = B @ c_rec

print(f"✓ Recovery complete")

## 4. Analyze Recovery Quality

In [None]:
# Compute errors
rel_err_full = np.linalg.norm(f_rec - f_full) / np.linalg.norm(f_full)
rel_err_missing = np.linalg.norm(f_rec[~mask] - f_full[~mask]) / np.linalg.norm(f_full[~mask])

print("Recovery Errors:")
print(f"  Relative error (full signal):    {rel_err_full:.6f}")
print(f"  Relative error (missing points): {rel_err_missing:.6f}")

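# Check Theorem 1.20 bound
# Note: 11.47 * eps >= 1 whenever eps >= 1/11.47 ≈ 0.087, so for such eps
# the bound is loose enough that even f_rec = 0 would satisfy it.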
lhs, rhs, ratio, ok = check_theorem_bound(f_full, f_rec, eps)

print("\n=== Theorem 1.20 Bound Verification ===")
print(f"||x* - f||₂         = {lhs:.6f}")
print(f"11.47 × ||f||₂ × ε  = {rhs:.6f}")
print(f"Ratio (lhs/rhs)     = {ratio:.4f}")
print(f"\nBound holds: {'✅ Yes' if ok else '❌ No'}")
if ok:
    print("The recovery satisfies the theoretical guarantee!")
else:
    print("Try adjusting ε, C, or keep_prob parameters.")

## 5. Visualize Recovery Results

In [None]:
plot_reconstruction(t, f_full, mask, f_obs, f_rec, seconds)

In [None]:
# Detailed view of reconstruction at missing points
fig, axes = plt.subplots(2, 1, figsize=(12, 8))

# Full view
axes[0].plot(t, f_full, 'k-', alpha=0.5, label='True signal')
axes[0].scatter(t[mask], f_obs[mask], c='b', s=10, label='Observed', zorder=3)
axes[0].plot(t[~mask], f_rec[~mask], 'ro', markersize=4, label='Recovered (missing)', zorder=4)
axes[0].set_xlabel('Time (s)')
axes[0].set_ylabel('Amplitude')
axes[0].set_title('Recovery: Observed vs Recovered Points')
axes[0].legend()
axes[0].grid(True)

# Error view
error_per_point = np.abs(f_rec - f_full)
axes[1].plot(t, error_per_point, 'r-', alpha=0.5, label='Absolute error')
axes[1].scatter(t[~mask], error_per_point[~mask], c='red', s=15, label='Error at missing points')
axes[1].axhline(y=np.mean(error_per_point[~mask]), color='blue', linestyle='--', 
                label=f'Mean error (missing): {np.mean(error_per_point[~mask]):.4f}')
axes[1].set_xlabel('Time (s)')
axes[1].set_ylabel('Absolute Error')
axes[1].set_title('Point-wise Recovery Error')
axes[1].legend()
axes[1].grid(True)

plt.tight_layout()
plt.show()

## 6. Parameter Sensitivity Analysis

Let's explore how different parameters affect recovery quality.

### 6.1 Effect of Missing Rate

In [None]:
# Test different missing rates
keep_probs = [0.9, 0.8, 0.7, 0.6, 0.5]
results_missing = []
np.random.seed(seed)  # make the random subsampling in the loop reproducible

for kp in keep_probs:
    mask_test, f_obs_test = mask_observations(f_full, keep_prob=kp, seed=seed)
    valid_idx_test = np.where(mask_test)[0]
    
    q_test = compute_q(FR=FR, eps=eps, N=N, C=C, max_available=len(valid_idx_test))
    
    if q_test <= len(valid_idx_test):
        obs_idx_test = np.random.choice(valid_idx_test, q_test, replace=False)
        y_test = f_obs_test[obs_idx_test]
        
        A_test = B[obs_idx_test, :]
        c_rec_test = recover_l1_via_lp(A_test, y_test)
        f_rec_test = B @ c_rec_test
        
        rel_err_test = np.linalg.norm(f_rec_test - f_full) / np.linalg.norm(f_full)
        
        results_missing.append({
            'keep_prob': kp,
            'missing_rate': (1 - kp) * 100,
            'q': q_test,
            'rel_error': rel_err_test
        })

# Display results
print("Missing Rate\tObservations Used (q)\tRelative Error")
print("="*60)
for r in results_missing:
    print(f"{r['missing_rate']:.0f}%\t\t{r['q']}\t\t\t{r['rel_error']:.6f}")

# Plot
plt.figure(figsize=(10, 5))
plt.plot([r['missing_rate'] for r in results_missing], 
         [r['rel_error'] for r in results_missing], 'o-', markersize=8)
plt.xlabel('Missing Rate (%)')
plt.ylabel('Relative Recovery Error')
plt.title('Recovery Quality vs Missing Rate')
plt.grid(True)
plt.show()

### 6.2 Effect of Recovery Accuracy Parameter ε

In [None]:
# Test different epsilon values
epsilon_values = [0.5, 0.3, 0.2, 0.1, 0.05]
results_eps = []
np.random.seed(seed)  # make the random subsampling in the loop reproducible

# Use fixed missing rate
keep_prob_fixed = 0.7
mask_fixed, f_obs_fixed = mask_observations(f_full, keep_prob=keep_prob_fixed, seed=seed)
valid_idx_fixed = np.where(mask_fixed)[0]

for eps_test in epsilon_values:
    q_test = compute_q(FR=FR, eps=eps_test, N=N, C=C, max_available=len(valid_idx_fixed))
    
    if q_test <= len(valid_idx_fixed):
        obs_idx_test = np.random.choice(valid_idx_fixed, q_test, replace=False)
        y_test = f_obs_fixed[obs_idx_test]
        
        A_test = B[obs_idx_test, :]
        c_rec_test = recover_l1_via_lp(A_test, y_test)
        f_rec_test = B @ c_rec_test
        
        rel_err_test = np.linalg.norm(f_rec_test - f_full) / np.linalg.norm(f_full)
        lhs_test, rhs_test, ratio_test, ok_test = check_theorem_bound(f_full, f_rec_test, eps_test)
        
        results_eps.append({
            'eps': eps_test,
            'q': q_test,
            'rel_error': rel_err_test,
            'bound_holds': ok_test
        })

# Display results
print(f"Fixed missing rate: {(1-keep_prob_fixed)*100:.0f}%\n")
print("ε\tObservations (q)\tRel Error\tBound Holds")
print("="*60)
for r in results_eps:
    print(f"{r['eps']:.2f}\t{r['q']}\t\t\t{r['rel_error']:.6f}\t{'✅' if r['bound_holds'] else '❌'}")

# Plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Observations vs epsilon
ax1.plot([r['eps'] for r in results_eps], [r['q'] for r in results_eps], 'o-', markersize=8)
ax1.set_xlabel('ε (accuracy parameter)')
ax1.set_ylabel('Required observations q')
ax1.set_title('Observations Needed vs Accuracy')
ax1.grid(True)
ax1.invert_xaxis()

# Error vs epsilon
ax2.plot([r['eps'] for r in results_eps], [r['rel_error'] for r in results_eps], 'o-', markersize=8)
ax2.set_xlabel('ε (accuracy parameter)')
ax2.set_ylabel('Achieved relative error')
ax2.set_title('Recovery Error vs Accuracy Parameter')
ax2.grid(True)
ax2.invert_xaxis()

plt.tight_layout()
plt.show()

## 7. Comparison with Naive Interpolation

Let's compare L1 recovery with simple linear interpolation.

In [None]:
# Linear interpolation for missing values
from scipy.interpolate import interp1d

# Use only observed points for interpolation
f_interp_func = interp1d(t[mask], f_obs[mask], kind='linear', fill_value='extrapolate')
f_interp = f_interp_func(t)

# Compute errors
err_l1 = np.linalg.norm(f_rec - f_full) / np.linalg.norm(f_full)
err_interp = np.linalg.norm(f_interp - f_full) / np.linalg.norm(f_full)

print("Method Comparison:")
print(f"L1 minimization (DCT):    {err_l1:.6f}")
print(f"Linear interpolation:     {err_interp:.6f}")
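# A ratio above 1 means L1 recovery has the lower error; below 1 means interpolation does
ratio_str = f"{err_interp/err_l1:.2f}x" if err_l1 > 0 else "n/a (L1 error is zero)"
print(f"\nError ratio (interpolation / L1): {ratio_str}")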

# Plot comparison
plt.figure(figsize=(12, 6))
plt.plot(t, f_full, 'k-', alpha=0.4, label='True signal', linewidth=2)
plt.scatter(t[mask], f_obs[mask], c='blue', s=15, label='Observed', zorder=3)
plt.plot(t, f_rec, 'r--', label=f'L1 recovery (err={err_l1:.4f})', linewidth=2)
plt.plot(t, f_interp, 'g-.', label=f'Linear interp (err={err_interp:.4f})', linewidth=2)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.title('Comparison: L1 Minimization vs Linear Interpolation')
plt.legend()
plt.grid(True)
plt.show()

## Summary

This notebook demonstrated:

1. **Missing value simulation** - creating realistic scenarios with missing data
2. **Theoretical analysis** - computing required observations based on Fourier Ratio
3. **L1 minimization recovery** - using compressed sensing to recover missing values
4. **Theorem verification** - checking Theorem 1.20 recovery bounds
5. **Parameter sensitivity** - analyzing effects of missing rate and accuracy parameter
6. **Method comparison** - L1 minimization vs simple interpolation

**Key Parameters for Imputation:**

| Parameter | Description | Typical Values |
|-----------|-------------|----------------|
| `FR` | Fourier Ratio (complexity measure) | 1 to √N |
| `keep_prob` | Observation probability (1 - missing rate) | 0.6 - 0.8 |
| `eps` | Recovery accuracy parameter | 0.1 - 0.5 |
| `C` | Universal constant multiplier | 1.0 |
| `q` | Number of observations for recovery | Computed from formula |
| `N` | Signal length | Problem-dependent |
| `seed` | Random seed for reproducibility | Any integer |

**Key Insights:**
- Signals with small FR can be recovered from few observations
- L1 minimization can outperform naive interpolation when the signal is sparse in the DCT basis; the error comparison in Section 7 shows whether it does for this signal
- The number of required observations scales as $q \propto \frac{FR^2}{\epsilon^2} \log^2\left(\frac{FR}{\epsilon}\right) \log(N)$
- Recovery error is expected to grow as the missing rate increases; Section 6.1 measures how quickly for this signal