# Deep-Deep Dive: Advanced Signal Transforms for Anomaly Detection

Building on the EDA and Deep Dive findings, this notebook explores **specialized transforms** that exploit the observed signal characteristics:

## Key Insights to Exploit
1. **Symbol timing/ACF** was highly discriminative (AUC=0.75) → Cyclostationary analysis
2. **Higher-order cumulants** (C42) achieved AUC=0.79 → Time-varying cumulants, bispectrum
3. **Amplitude kurtosis** achieved AUC=0.94 for Class 7 → Multi-scale kurtosis
4. **Constellation geometry** matters → Dictionary learning on IQ patterns

## Transforms to Explore
- **Wavelet Scattering Transform**: Invariant multi-scale representations
- **Continuous Wavelet Transform**: Time-frequency analysis with scale-dependent resolution
- **Cyclic Spectral Analysis**: Exploit cyclostationary nature
- **Time-Varying Cumulants**: Local higher-order statistics
- **Dictionary Learning**: Learn modulation-specific atoms
- **Hilbert-Huang Transform (EMD)**: For non-stationary analysis

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Signal processing
from scipy import signal
from scipy.stats import kurtosis, skew
from scipy.fft import fft, fftfreq
import pywt  # PyWavelets

# Machine learning
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import DictionaryLearning, SparseCoder
from sklearn.metrics import roc_auc_score, precision_recall_curve, f1_score
from sklearn.ensemble import IsolationForest

# Data loading
from data_utils import load_train_data, load_test_anomalies, filter_by_snr, create_binary_labels

plt.style.use('seaborn-v0_8-whitegrid')
%matplotlib inline

In [None]:
# Load data
print("Loading data...")
train_signals, train_labels, train_snr = load_train_data()
test_signals, test_labels, test_snr = load_test_anomalies()

# Filter out SNR=0 from training (not present in test)
train_signals_filtered, train_labels_filtered, train_snr_filtered = filter_by_snr(
    train_signals, train_labels, train_snr, [10, 20, 30]
)

# Create binary labels for test
test_binary = create_binary_labels(test_labels)

print(f"Train (filtered): {train_signals_filtered.shape[0]} samples")
print(f"Test: {test_signals.shape[0]} samples ({test_binary.sum()} anomalies)")

In [None]:
# Helper: Convert IQ to complex
def iq_to_complex(signal_iq):
    """Convert (N, 2) IQ signal to complex array."""
    # Parameter renamed so it does not shadow the imported scipy `signal` module
    return signal_iq[:, 0] + 1j * signal_iq[:, 1]

# Helper: Compute AUC
def compute_auc(features_train, features_test, labels_test):
    """Compute test AUC using an Isolation Forest fitted on the training features
    (known classes only). labels_test is binary: 0 = known, 1 = anomaly."""
    
    # Normalize features
    scaler = StandardScaler()
    X_train = scaler.fit_transform(features_train)
    X_test = scaler.transform(features_test)
    
    # Train Isolation Forest
    clf = IsolationForest(contamination=0.1, random_state=42, n_estimators=100)
    clf.fit(X_train)
    
    # score_samples is lower for anomalies, so negate it: higher = more anomalous
    scores = -clf.score_samples(X_test)
    
    return roc_auc_score(labels_test, scores)

---
## 1. Wavelet Scattering Transform

The **Wavelet Scattering Transform** creates translation-invariant, deformation-stable representations. It's particularly good for capturing:
- Multi-scale modulation patterns
- Hierarchical signal structure

This could capture the symbol-level patterns that made ACF discriminative.
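
As a minimal sketch of the idea (not the implementation used in the next cell), first-order scattering can be written as band-pass filtering, a modulus, and an average. The Gaussian band-pass filters, the centre frequencies, and the helper names `bandpass_response` and `first_order_scattering` are illustrative assumptions. With a global average playing the role of the low-pass φ, the coefficients do not change when the input is circularly shifted.

```python
import numpy as np

def bandpass_response(n, center_freq, bandwidth):
    """Gaussian band-pass response on the FFT grid (illustrative stand-in for a wavelet)."""
    freqs = np.fft.fftfreq(n)  # cycles per sample, in [-0.5, 0.5)
    return np.exp(-0.5 * ((freqs - center_freq) / bandwidth) ** 2)

def first_order_scattering(x, center_freqs=(0.05, 0.1, 0.2, 0.4)):
    """S1[j] = mean_t |x * psi_j|(t): filter, take the modulus, average over time."""
    spectrum = np.fft.fft(x)
    coeffs = []
    for fc in center_freqs:
        band = np.fft.ifft(spectrum * bandpass_response(len(x), fc, fc / 4))
        coeffs.append(np.mean(np.abs(band)))
    return np.array(coeffs)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)

s_orig = first_order_scattering(x)
s_shift = first_order_scattering(np.roll(x, 37))  # same signal, shifted in time
print("max difference after shift:", np.max(np.abs(s_orig - s_shift)))
```

The decimated wavelet decomposition used in the next cell trades this exact invariance for speed, which is one reason to treat its output as scattering-style summary statistics.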

In [None]:
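# Simplified scattering-style features using PyWavelets' decimated DWT.
# (A full scattering transform, e.g. via kymatio, applies a wavelet modulus followed
# by low-pass averaging; this version only summarises each sub-band with statistics.)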

def wavelet_scattering_features(signal_iq, wavelet='db4', levels=6):
    """
    Compute simplified wavelet scattering features.
    
    Scattering transform: S[x] = |x * ψ_j| * φ
    We compute energy at each scale and statistics of coefficients.
    """
    features = []
    
    # Process I and Q channels
    for channel in [0, 1]:
        x = signal_iq[:, channel]
        
        # Multi-level wavelet decomposition
        coeffs = pywt.wavedec(x, wavelet, level=levels)
        
        # First order scattering: |W_j x|
        for j, c in enumerate(coeffs):
            # Energy at each scale
            features.append(np.mean(np.abs(c)**2))
            # Mean absolute value (first moment)
            features.append(np.mean(np.abs(c)))
            # Kurtosis at each scale
            features.append(kurtosis(c))
            # Variance
            features.append(np.var(c))
            
        # Second order scattering: ||W_j x| * W_k|
        # Apply wavelet to modulus of first-order coefficients
        for j, c in enumerate(coeffs[1:4]):  # wavedec order is [cA_n, cD_n, ..., cD_1], so these are the 3 coarsest detail bands
            modulus = np.abs(c)
            if len(modulus) >= 4:
                coeffs2 = pywt.wavedec(modulus, wavelet, level=min(3, int(np.log2(len(modulus)))))
                for c2 in coeffs2:
                    features.append(np.mean(np.abs(c2)))
    
    return np.array(features)

# Test on one signal
test_feat = wavelet_scattering_features(train_signals[0])
print(f"Scattering features per signal: {len(test_feat)}")

In [None]:
# Compute scattering features for all signals
print("Computing wavelet scattering features...")

# Use filtered training data
train_scatter = np.array([wavelet_scattering_features(s) for s in train_signals_filtered])
test_scatter = np.array([wavelet_scattering_features(s) for s in test_signals])

print(f"Train scattering shape: {train_scatter.shape}")
print(f"Test scattering shape: {test_scatter.shape}")

# Compute AUC
auc_scatter = compute_auc(train_scatter, test_scatter, test_binary)
print(f"\nWavelet Scattering AUC: {auc_scatter:.3f}")

In [None]:
# Visualize scattering coefficients by class
fig, axes = plt.subplots(3, 3, figsize=(14, 10))

for class_id in range(9):
    ax = axes[class_id // 3, class_id % 3]
    
    if class_id < 6:
        # Training class
        mask = train_labels_filtered == class_id
        data = train_scatter[mask]
        color = 'blue' if class_id < 3 else 'green'
    else:
        # Test anomaly class
        mask = test_labels == class_id
        data = test_scatter[mask]
        color = 'red'
    
    # Plot mean scattering spectrum
    mean_scatter = data.mean(axis=0)
    std_scatter = data.std(axis=0)
    
    ax.plot(mean_scatter, color=color, linewidth=1.5)
    ax.fill_between(range(len(mean_scatter)), 
                    mean_scatter - std_scatter, 
                    mean_scatter + std_scatter,
                    alpha=0.3, color=color)
    
    title = f"Class {class_id}"
    if class_id >= 6:
        title += " (ANOMALY)"
    ax.set_title(title)
    ax.set_xlabel('Feature index')
    ax.set_ylabel('Value')

plt.suptitle('Wavelet Scattering Features by Class', fontsize=14)
plt.tight_layout()
plt.savefig('plots_wavelet_scattering.png', dpi=150, bbox_inches='tight')
plt.show()

---
## 2. Continuous Wavelet Transform (CWT) Analysis

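Unlike the STFT, whose window length is fixed, the CWT adapts its resolution to scale: fine time resolution at small scales (high frequencies) and fine frequency resolution at large scales. Let's analyze which scales are most discriminative.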
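
To read the scale axis, it helps to convert scales into pseudo-frequencies. The sketch below uses the standard relation f = f_c / (scale · Δt). The centre frequency of 0.8125 cycles per sample for the `'morl'` wavelet is an assumption based on what `pywt.central_frequency('morl')` reports and should be checked locally; `scale_to_frequency` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

# Assumed centre frequency of PyWavelets' 'morl' wavelet in cycles per sample
# (the value reported by pywt.central_frequency('morl')); verify it locally.
MORLET_CENTER_FREQ = 0.8125

def scale_to_frequency(scales, center_freq=MORLET_CENTER_FREQ, sampling_period=1.0):
    """Pseudo-frequency f = f_c / (scale * dt) for each CWT scale."""
    scales = np.asarray(scales, dtype=float)
    return center_freq / (scales * sampling_period)

scales = np.arange(1, 64)            # same scale grid as the CWT cells
freqs = scale_to_frequency(scales)   # cycles per sample

# Scales whose pseudo-frequency exceeds Nyquist (0.5 cycles/sample) are poorly resolved
above_nyquist = scales[freqs > 0.5]
print("scales above Nyquist:", above_nyquist)
print("lowest pseudo-frequency:", freqs[-1])
```

Under that assumption, the smallest scale sits above the Nyquist frequency, so its energy is best interpreted with caution.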

In [None]:
def cwt_features(signal_iq, wavelet='morl', scales=np.arange(1, 64)):
    """
    Compute features from Continuous Wavelet Transform.
    Uses Morlet wavelet which is good for oscillatory signals.
    """
    features = []
    
    # Process complex signal
    x_complex = iq_to_complex(signal_iq)
    x_envelope = np.abs(x_complex)
    x_phase = np.unwrap(np.angle(x_complex))
    
    for x, name in [(x_envelope, 'env'), (np.diff(x_phase), 'freq')]:
        # CWT
        coeffs, _ = pywt.cwt(x, scales, wavelet)
        
        # Energy per scale
        scale_energy = np.mean(np.abs(coeffs)**2, axis=1)
        features.extend(scale_energy)
        
        # Scale with max energy
        features.append(np.argmax(scale_energy))
        
        # Energy ratio: low scales vs high scales
        mid = len(scales) // 2
        features.append(scale_energy[:mid].sum() / (scale_energy[mid:].sum() + 1e-10))
        
        # Kurtosis per scale (top 5 energy scales)
        top_scales = np.argsort(scale_energy)[-5:]
        for s in top_scales:
            features.append(kurtosis(np.abs(coeffs[s])))
    
    return np.array(features)

# Test
test_cwt = cwt_features(train_signals[0])
print(f"CWT features per signal: {len(test_cwt)}")

In [None]:
# Compute CWT features (subsample for speed)
print("Computing CWT features...")

# Subsample for faster computation (seeded so the subset is reproducible)
n_train_sample = 3000
rng = np.random.default_rng(42)
train_idx = rng.choice(len(train_signals_filtered), n_train_sample, replace=False)

train_cwt = np.array([cwt_features(train_signals_filtered[i]) for i in train_idx])
test_cwt = np.array([cwt_features(s) for s in test_signals])

print(f"Train CWT shape: {train_cwt.shape}")
print(f"Test CWT shape: {test_cwt.shape}")

# Compute AUC
auc_cwt = compute_auc(train_cwt, test_cwt, test_binary)
print(f"\nCWT Features AUC: {auc_cwt:.3f}")

In [None]:
# Visualize CWT scalograms for each class
fig, axes = plt.subplots(3, 3, figsize=(14, 10))
scales = np.arange(1, 64)

for class_id in range(9):
    ax = axes[class_id // 3, class_id % 3]
    
    if class_id < 6:
        mask = train_labels_filtered == class_id
        sig = train_signals_filtered[mask][0]
    else:
        mask = test_labels == class_id
        sig = test_signals[mask][0]
    
    # CWT on envelope
    x_complex = iq_to_complex(sig)
    x_envelope = np.abs(x_complex)
    coeffs, _ = pywt.cwt(x_envelope[:512], scales, 'morl')  # First 512 samples
    
    im = ax.imshow(np.abs(coeffs), aspect='auto', cmap='viridis',
                   extent=[0, 512, scales[-1], scales[0]])
    
    title = f"Class {class_id}"
    if class_id >= 6:
        title += " (ANOMALY)"
    ax.set_title(title)
    ax.set_xlabel('Time')
    ax.set_ylabel('Scale')

plt.suptitle('CWT Scalograms (Envelope)', fontsize=14)
plt.tight_layout()
plt.savefig('plots_cwt_scalograms.png', dpi=150, bbox_inches='tight')
plt.show()

---
## 3. Cyclostationary Analysis: Cyclic Autocorrelation

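Digitally modulated radio signals are **cyclostationary**: their statistics vary periodically with the symbol rate, which creates cyclic features.

Since ACF features were discriminative (AUC=0.75), let's do a deeper analysis. Note that the features in this section are built from the ordinary (time-averaged) autocorrelation of the envelope and of the instantaneous power, a cheap proxy for symbol-rate structure rather than a full cyclic autocorrelation at specific cycle frequencies.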
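
For reference, a cyclic autocorrelation at cycle frequency α and lag τ can be estimated as the time average of x[n+τ]·x*[n]·exp(−j2παn). The sketch below is illustrative only: the rectangular-pulse BPSK test signal, its 8 samples per symbol, and the helper name `cyclic_autocorrelation` are assumptions, not properties of the dataset.

```python
import numpy as np

def cyclic_autocorrelation(x, alpha, lag):
    """Estimate R_x^alpha(lag) = <x[n + lag] * conj(x[n]) * exp(-j 2 pi alpha n)>."""
    n = np.arange(len(x) - lag)
    lag_product = x[lag:] * np.conj(x[:len(x) - lag])
    return np.mean(lag_product * np.exp(-2j * np.pi * alpha * n))

# Synthetic rectangular-pulse BPSK: each random +/-1 symbol is held for 8 samples
rng = np.random.default_rng(0)
samples_per_symbol = 8
symbols = rng.choice([-1.0, 1.0], size=512)
x = np.repeat(symbols, samples_per_symbol).astype(complex)

lag = samples_per_symbol // 2
caf_on = abs(cyclic_autocorrelation(x, 1 / samples_per_symbol, lag))  # at the symbol rate
caf_off = abs(cyclic_autocorrelation(x, 0.11, lag))                   # unrelated frequency
print(f"|CAF| at symbol rate: {caf_on:.3f}, off-cycle: {caf_off:.3f}")
```

The magnitude is clearly larger at α = 1/8 than at an unrelated cycle frequency, which is the signature a symbol-rate detector looks for.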

In [None]:
def cyclostationary_features(signal_iq, max_lag=200):
    """
    Compute symbol-rate (cyclostationary-motivated) features:
    - Autocorrelation of the envelope and of the instantaneous power at fixed lags
    - Peak detection in the envelope ACF
    - Spectral features of the envelope ACF
    """
    features = []
    x_complex = iq_to_complex(signal_iq)
    x_envelope = np.abs(x_complex)
    x_power = x_envelope ** 2
    
    # 1. Autocorrelation of the mean-removed envelope
    acf_env = np.correlate(x_envelope - x_envelope.mean(), 
                           x_envelope - x_envelope.mean(), mode='full')
    acf_env = acf_env[len(acf_env)//2:]  # Non-negative lags only
    acf_env = acf_env / (acf_env[0] + 1e-10)  # Normalize (guard against a constant envelope)
    
    # ACF at specific lags (found discriminative in deep dive)
    for lag in [10, 20, 30, 40, 50, 75, 100, 150, 200]:
        if lag < len(acf_env):
            features.append(acf_env[lag])
    
    # 2. Autocorrelation of the instantaneous power |x|^2 (for symbol timing)
    acf_power = np.correlate(x_power - x_power.mean(),
                             x_power - x_power.mean(), mode='full')
    acf_power = acf_power[len(acf_power)//2:]
    acf_power = acf_power / (acf_power[0] + 1e-10)
    
    for lag in [10, 20, 30, 40, 50, 75, 100, 150, 200]:
        if lag < len(acf_power):
            features.append(acf_power[lag])
    
    # 3. Peak analysis in ACF
    acf_short = acf_env[:max_lag]
    peaks, properties = signal.find_peaks(acf_short, height=0.1, distance=5)
    
    features.append(len(peaks))  # Number of peaks
    if len(peaks) > 0:
        features.append(peaks[0])  # First peak location (fundamental period)
        features.append(properties['peak_heights'].mean())  # Mean peak height
        features.append(properties['peak_heights'].std())  # Peak height variation
    else:
        features.extend([0, 0, 0])
    
    # 4. Spectrum of the envelope ACF (by Wiener-Khinchin, the envelope's power
    #    spectrum); symbol-rate lines appear as peaks
    acf_spectrum = np.abs(fft(acf_env[:512])[:256])
    features.append(np.argmax(acf_spectrum))  # Dominant cyclic frequency
    features.append(acf_spectrum.max() / (acf_spectrum.mean() + 1e-10))  # Cyclic peakiness
    
    # 5. Energy in different cyclic frequency bands
    for band in [(0, 10), (10, 30), (30, 60), (60, 100), (100, 256)]:
        features.append(acf_spectrum[band[0]:band[1]].sum())
    
    return np.array(features)

# Test
test_cyclo = cyclostationary_features(train_signals[0])
print(f"Cyclostationary features per signal: {len(test_cyclo)}")

In [None]:
# Compute cyclostationary features
print("Computing cyclostationary features...")

train_cyclo = np.array([cyclostationary_features(s) for s in train_signals_filtered])
test_cyclo = np.array([cyclostationary_features(s) for s in test_signals])

print(f"Train cyclo shape: {train_cyclo.shape}")
print(f"Test cyclo shape: {test_cyclo.shape}")

# Compute AUC
auc_cyclo = compute_auc(train_cyclo, test_cyclo, test_binary)
print(f"\nCyclostationary Features AUC: {auc_cyclo:.3f}")

In [None]:
# Visualize cyclic autocorrelation by class
fig, axes = plt.subplots(3, 3, figsize=(14, 10))

for class_id in range(9):
    ax = axes[class_id // 3, class_id % 3]
    
    if class_id < 6:
        mask = train_labels_filtered == class_id
        signals_class = train_signals_filtered[mask][:50]  # 50 samples
        color = 'blue' if class_id < 3 else 'green'
    else:
        mask = test_labels == class_id
        signals_class = test_signals[mask][:50]
        color = 'red'
    
    # Compute mean ACF
    acfs = []
    for sig in signals_class:
        x_env = np.abs(iq_to_complex(sig))
        acf = np.correlate(x_env - x_env.mean(), x_env - x_env.mean(), mode='full')
        acf = acf[len(acf)//2:300]
        acf = acf / acf[0]
        acfs.append(acf)
    
    acfs = np.array(acfs)
    mean_acf = acfs.mean(axis=0)
    std_acf = acfs.std(axis=0)
    
    ax.plot(mean_acf, color=color, linewidth=1.5)
    ax.fill_between(range(len(mean_acf)), mean_acf - std_acf, mean_acf + std_acf,
                    alpha=0.3, color=color)
    ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
    
    title = f"Class {class_id}"
    if class_id >= 6:
        title += " (ANOMALY)"
    ax.set_title(title)
    ax.set_xlabel('Lag')
    ax.set_ylabel('ACF')
    ax.set_xlim([0, 300])

plt.suptitle('Cyclic Autocorrelation (Envelope) by Class', fontsize=14)
plt.tight_layout()
plt.savefig('plots_cyclic_acf.png', dpi=150, bbox_inches='tight')
plt.show()

---
## 4. Time-Varying Higher-Order Statistics

Since C42 was discriminative globally, let's compute **local cumulants** over time windows to capture time-varying modulation characteristics.
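
As a sanity check on the estimator, C42 = E[|x|^4] − |C20|² − 2·C21² has well-known values for ideal unit-power constellations. The sketch below evaluates it exactly over the constellation points; the helper names `c42` and `unit_power` and the choice of BPSK, QPSK and 16-QAM are illustrative assumptions, since the classes in this dataset are not identified here.

```python
import numpy as np

def c42(x):
    """Fourth-order cumulant C42 = E|x|^4 - |E[x^2]|^2 - 2 (E|x|^2)^2 of zero-mean x."""
    x = x - x.mean()
    c20 = np.mean(x * x)
    c21 = np.mean(np.abs(x) ** 2)
    return np.mean(np.abs(x) ** 4) - np.abs(c20) ** 2 - 2 * c21 ** 2

def unit_power(points):
    """Scale constellation points to unit average power."""
    points = np.asarray(points, dtype=complex)
    return points / np.sqrt(np.mean(np.abs(points) ** 2))

bpsk = unit_power([-1, 1])
qpsk = unit_power([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
levels = np.array([-3, -1, 1, 3])
qam16 = unit_power((levels[:, None] + 1j * levels[None, :]).ravel())

for name, points in [("BPSK", bpsk), ("QPSK", qpsk), ("16-QAM", qam16)]:
    print(f"{name:7s} C42 = {c42(points):+.2f}")
# BPSK -2.00, QPSK -1.00, 16-QAM -0.68
```

Because C42 scales with the square of the signal power, windowed estimates are only comparable across signals after dividing by C21² or normalising the signal power first.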

In [None]:
def time_varying_cumulants(signal_iq, window_size=256, hop=128):
    """
    Compute time-varying higher-order cumulants.
    """
    features = []
    x_complex = iq_to_complex(signal_iq)
    
    c20_list = []  # |C20|: pseudo-variance, E[x^2]
    c21_list = []  # C21: variance (power), E[|x|^2]
    c40_list = []  # |C40|: fourth-order, no conjugates
    c42_list = []  # C42: fourth-order, two conjugates
    kurt_list = []
    
    # Sliding window analysis (+1 so the final full window is included)
    for start in range(0, len(x_complex) - window_size + 1, hop):
        x = x_complex[start:start + window_size]
        x = x - x.mean()  # Zero-center
        
        # Second-order cumulants
        c20 = np.mean(x * x)  # E[x^2]
        c21 = np.mean(np.abs(x)**2)  # E[|x|^2]
        
        # Fourth-order cumulants
        c40 = np.mean(x**4) - 3 * c20**2  # Kurtosis-like
        c42 = np.mean(np.abs(x)**4) - np.abs(c20)**2 - 2 * c21**2  # Mixed cumulant
        
        c20_list.append(np.abs(c20))
        c21_list.append(c21)
        c40_list.append(np.abs(c40))
        c42_list.append(c42)
        kurt_list.append(kurtosis(np.abs(x)))
    
    # Statistics of time-varying cumulants
    for series, name in [(c20_list, 'c20'), (c21_list, 'c21'), 
                         (c40_list, 'c40'), (c42_list, 'c42'),
                         (kurt_list, 'kurt')]:
        series = np.array(series)
        features.append(series.mean())
        features.append(series.std())
        features.append(series.min())
        features.append(series.max())
        features.append(np.percentile(series, 25))
        features.append(np.percentile(series, 75))
        # Variation coefficient
        features.append(series.std() / (np.abs(series.mean()) + 1e-10))
    
    return np.array(features)

# Test
test_tvcum = time_varying_cumulants(train_signals[0])
print(f"Time-varying cumulant features per signal: {len(test_tvcum)}")

In [None]:
# Compute time-varying cumulant features
print("Computing time-varying cumulant features...")

train_tvcum = np.array([time_varying_cumulants(s) for s in train_signals_filtered])
test_tvcum = np.array([time_varying_cumulants(s) for s in test_signals])

print(f"Train TV-cumulant shape: {train_tvcum.shape}")
print(f"Test TV-cumulant shape: {test_tvcum.shape}")

# Compute AUC
auc_tvcum = compute_auc(train_tvcum, test_tvcum, test_binary)
print(f"\nTime-Varying Cumulants AUC: {auc_tvcum:.3f}")

In [None]:
# Visualize time-varying C42 by class
fig, axes = plt.subplots(3, 3, figsize=(14, 10))

window_size = 256
hop = 128

for class_id in range(9):
    ax = axes[class_id // 3, class_id % 3]
    
    if class_id < 6:
        mask = train_labels_filtered == class_id
        signals_class = train_signals_filtered[mask][:20]
        color = 'blue' if class_id < 3 else 'green'
    else:
        mask = test_labels == class_id
        signals_class = test_signals[mask][:20]
        color = 'red'
    
    # Compute time-varying C42 for each signal
    for sig in signals_class:
        x_complex = iq_to_complex(sig)
        c42_series = []
        
        for start in range(0, len(x_complex) - window_size + 1, hop):  # +1 keeps the final full window
            x = x_complex[start:start + window_size]
            x = x - x.mean()
            c20 = np.mean(x * x)
            c21 = np.mean(np.abs(x)**2)
            c42 = np.mean(np.abs(x)**4) - np.abs(c20)**2 - 2 * c21**2
            c42_series.append(c42)
        
        ax.plot(c42_series, color=color, alpha=0.3, linewidth=0.8)
    
    title = f"Class {class_id}"
    if class_id >= 6:
        title += " (ANOMALY)"
    ax.set_title(title)
    ax.set_xlabel('Window index')
    ax.set_ylabel('C42')

plt.suptitle('Time-Varying C42 Cumulant by Class', fontsize=14)
plt.tight_layout()
plt.savefig('plots_time_varying_c42.png', dpi=150, bbox_inches='tight')
plt.show()

---
## 5. Dictionary Learning on IQ Patterns

Learn a dictionary of **modulation-specific atoms** from the training data. Anomalies should have different sparse representations.
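
The intuition can be sketched without scikit-learn: a patch built from dictionary atoms is reconstructed almost perfectly by a few coefficients, while an unrelated patch leaves a large residual. The example below swaps in a greedy matching pursuit and a hand-built cosine dictionary for the learned dictionary and `lasso_lars` coding used in this notebook; `matching_pursuit` and the atom frequencies are illustrative assumptions.

```python
import numpy as np

def matching_pursuit(patch, dictionary, n_nonzero=3):
    """Greedy sparse coding: repeatedly subtract the best-matching unit-norm atom."""
    residual = patch.astype(float).copy()
    code = np.zeros(dictionary.shape[0])
    for _ in range(n_nonzero):
        correlations = dictionary @ residual
        k = np.argmax(np.abs(correlations))
        code[k] += correlations[k]
        residual -= correlations[k] * dictionary[k]
    return code, residual

# Hand-built dictionary of four unit-norm cosine atoms (stand-in for learned atoms)
t = np.arange(64)
atoms = np.array([np.cos(2 * np.pi * f * t / 64) for f in (2, 4, 8, 16)])
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)

normal_patch = 1.5 * atoms[1] - 0.7 * atoms[3]   # lies in the span of the atoms
rng = np.random.default_rng(0)
anomalous_patch = rng.standard_normal(64)        # unrelated to the atoms

_, residual_normal = matching_pursuit(normal_patch, atoms)
_, residual_anomalous = matching_pursuit(anomalous_patch, atoms)
err_normal = np.mean(residual_normal ** 2)
err_anomalous = np.mean(residual_anomalous ** 2)
print(f"reconstruction MSE, normal: {err_normal:.2e}, anomalous: {err_anomalous:.2f}")
```

Reconstruction error of this kind is one of the features computed later in this section, alongside per-atom activation statistics.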

In [None]:
def extract_iq_patches(signal_iq, patch_size=64, n_patches=32):
    """
    Extract IQ patches from signal for dictionary learning.
    """
    patches = []
    n_samples = len(signal_iq)
    
    for _ in range(n_patches):
        start = np.random.randint(0, n_samples - patch_size + 1)  # upper bound is exclusive
        patch = signal_iq[start:start + patch_size].flatten()  # (patch_size*2,)
        patches.append(patch)
    
    return np.array(patches)

# Extract patches from training data
print("Extracting IQ patches from training data...")
n_train_dict = 2000  # Use subset for dictionary learning
train_idx_dict = np.random.choice(len(train_signals_filtered), n_train_dict, replace=False)

all_patches = []
for i in train_idx_dict:
    patches = extract_iq_patches(train_signals_filtered[i], patch_size=64, n_patches=16)
    all_patches.append(patches)

all_patches = np.vstack(all_patches)
print(f"Total patches for dictionary learning: {all_patches.shape}")

In [None]:
# Learn dictionary
print("Learning dictionary...")

n_components = 64
dict_learner = DictionaryLearning(
    n_components=n_components,
    alpha=1.0,
    max_iter=500,
    fit_algorithm='lars',
    transform_algorithm='lasso_lars',
    random_state=42,
    n_jobs=-1
)

dict_learner.fit(all_patches)
dictionary = dict_learner.components_
print(f"Dictionary shape: {dictionary.shape}")

In [None]:
# Visualize learned dictionary atoms
fig, axes = plt.subplots(8, 8, figsize=(14, 12))

for i, ax in enumerate(axes.flat):
    atom = dictionary[i].reshape(64, 2)
    ax.plot(atom[:, 0], 'b-', alpha=0.8, label='I')
    ax.plot(atom[:, 1], 'r-', alpha=0.8, label='Q')
    ax.set_title(f'Atom {i}', fontsize=8)
    ax.set_xticks([])
    ax.set_yticks([])

plt.suptitle('Learned Dictionary Atoms (IQ Patterns)', fontsize=14)
plt.tight_layout()
plt.savefig('plots_dictionary_atoms.png', dpi=150, bbox_inches='tight')
plt.show()

In [None]:
def dictionary_features(signal_iq, dictionary, patch_size=64, n_patches=32):
    """
    Compute features from sparse coding with learned dictionary.
    """
    # Extract patches
    patches = extract_iq_patches(signal_iq, patch_size=patch_size, n_patches=n_patches)
    
    # Sparse coding
    coder = SparseCoder(dictionary=dictionary, transform_algorithm='lasso_lars', transform_alpha=0.5)
    codes = coder.transform(patches)
    
    features = []
    
    # Statistics of sparse codes
    features.append(np.mean(np.abs(codes)))  # Mean activation
    features.append(np.std(np.abs(codes)))  # Activation variability
    features.append((np.abs(codes) > 0.01).mean())  # Fraction of active coefficients (lower = sparser)
    
    # Per-atom activation statistics
    atom_activations = np.mean(np.abs(codes), axis=0)
    features.extend(atom_activations)  # How much each atom is used
    
    # Reconstruction error
    reconstructed = codes @ dictionary
    error = np.mean((patches - reconstructed)**2)
    features.append(error)
    
    # Per-patch reconstruction error variability
    patch_errors = np.mean((patches - reconstructed)**2, axis=1)
    features.append(np.std(patch_errors))
    features.append(np.max(patch_errors))
    
    return np.array(features)

# Test
test_dict = dictionary_features(train_signals[0], dictionary)
print(f"Dictionary features per signal: {len(test_dict)}")

In [None]:
# Compute dictionary features
print("Computing dictionary features...")

train_dict_feat = np.array([dictionary_features(s, dictionary) for s in train_signals_filtered])
test_dict_feat = np.array([dictionary_features(s, dictionary) for s in test_signals])

print(f"Train dict features shape: {train_dict_feat.shape}")
print(f"Test dict features shape: {test_dict_feat.shape}")

# Compute AUC
auc_dict = compute_auc(train_dict_feat, test_dict_feat, test_binary)
print(f"\nDictionary Learning Features AUC: {auc_dict:.3f}")

In [None]:
# Visualize atom usage by class
fig, axes = plt.subplots(3, 3, figsize=(14, 10))

for class_id in range(9):
    ax = axes[class_id // 3, class_id % 3]
    
    if class_id < 6:
        mask = train_labels_filtered == class_id
        data = train_dict_feat[mask]
        color = 'blue' if class_id < 3 else 'green'
    else:
        mask = test_labels == class_id
        data = test_dict_feat[mask]
        color = 'red'
    
    # Atom activations are features 3 to 3+64
    atom_usage = data[:, 3:3+64].mean(axis=0)
    
    ax.bar(range(64), atom_usage, color=color, alpha=0.7)
    
    title = f"Class {class_id}"
    if class_id >= 6:
        title += " (ANOMALY)"
    ax.set_title(title)
    ax.set_xlabel('Atom index')
    ax.set_ylabel('Mean activation')

plt.suptitle('Dictionary Atom Usage by Class', fontsize=14)
plt.tight_layout()
plt.savefig('plots_atom_usage.png', dpi=150, bbox_inches='tight')
plt.show()

---
## 6. Multi-Scale Kurtosis Analysis

Since amplitude kurtosis achieved AUC=0.94 for Class 7, let's compute kurtosis at multiple scales.
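
A small deterministic sketch shows why windowing matters: when the amplitude statistics change partway through a signal, per-window kurtosis localises the change that a single whole-signal value blurs. The synthetic envelope and the helper names `excess_kurtosis` and `windowed_kurtosis` are illustrative assumptions; the kurtosis convention (excess, biased) matches the default of `scipy.stats.kurtosis`.

```python
import numpy as np

def excess_kurtosis(values):
    """Biased excess kurtosis m4 / m2^2 - 3 (same convention as scipy's default)."""
    centered = values - values.mean()
    m2 = np.mean(centered ** 2)
    m4 = np.mean(centered ** 4)
    return m4 / m2 ** 2 - 3.0

def windowed_kurtosis(envelope, window):
    """Excess kurtosis of each non-overlapping window of the envelope."""
    n_windows = len(envelope) // window
    windows = envelope[:n_windows * window].reshape(n_windows, window)
    return np.array([excess_kurtosis(w) for w in windows])

# First half: two equally likely amplitude levels. Second half: rare large peaks.
two_level = np.tile([0.9, 1.1], 256)              # 512 samples
sparse_peaks = np.tile([1.0] * 15 + [3.0], 32)    # 512 samples, one peak per 16
envelope = np.concatenate([two_level, sparse_peaks])

local = windowed_kurtosis(envelope, window=64)
overall = excess_kurtosis(envelope)
print("per-window kurtosis:", np.round(local, 2))
print("whole-signal kurtosis:", round(float(overall), 2))
```

The first eight windows sit at the two-level minimum of −2 and the last eight at about 11.07, so the mean, spread, minimum and maximum across windows all carry information.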

In [None]:
def multiscale_kurtosis(signal_iq, scales=(32, 64, 128, 256, 512, 1024)):
    """
    Compute kurtosis at multiple scales using wavelet decomposition and windowing.
    """
    features = []
    x_complex = iq_to_complex(signal_iq)
    x_envelope = np.abs(x_complex)
    x_phase_diff = np.diff(np.unwrap(np.angle(x_complex)))
    
    for data, name in [(x_envelope, 'env'), (x_phase_diff, 'freq')]:
        # Multi-scale windowed kurtosis
        for scale in scales:
            if scale < len(data):
                n_windows = len(data) // scale
                windowed = data[:n_windows * scale].reshape(n_windows, scale)
                kurt_values = [kurtosis(w) for w in windowed]
                
                features.append(np.mean(kurt_values))
                features.append(np.std(kurt_values))
                features.append(np.min(kurt_values))
                features.append(np.max(kurt_values))
        
        # Wavelet-based multi-scale
        coeffs = pywt.wavedec(data, 'db4', level=6)
        for c in coeffs:
            features.append(kurtosis(c))
            features.append(skew(c))
    
    return np.array(features)

# Test
test_msk = multiscale_kurtosis(train_signals[0])
print(f"Multi-scale kurtosis features per signal: {len(test_msk)}")

In [None]:
# Compute multi-scale kurtosis features
print("Computing multi-scale kurtosis features...")

train_msk = np.array([multiscale_kurtosis(s) for s in train_signals_filtered])
test_msk = np.array([multiscale_kurtosis(s) for s in test_signals])

print(f"Train MSK shape: {train_msk.shape}")
print(f"Test MSK shape: {test_msk.shape}")

# Compute AUC
auc_msk = compute_auc(train_msk, test_msk, test_binary)
print(f"\nMulti-Scale Kurtosis AUC: {auc_msk:.3f}")

---
## 7. Constellation Geometry: Advanced Analysis

Deep dive into IQ constellation patterns using radial and angular histograms.
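
To illustrate what these histograms capture, the sketch below counts occupied radial bins and quadrant occupancy for two ideal, noise-free, unit-power constellations. The bin count of 25, the radius range (0, 2), the choice of QPSK and 16-QAM, and the helper names are illustrative assumptions rather than properties of the dataset.

```python
import numpy as np

def unit_power(points):
    """Scale constellation points to unit average power."""
    points = np.asarray(points, dtype=complex)
    return points / np.sqrt(np.mean(np.abs(points) ** 2))

def occupied_radial_bins(points, n_bins=25, r_max=2.0):
    """Number of non-empty radius bins on a fixed grid shared by all signals."""
    hist, _ = np.histogram(np.abs(points), bins=n_bins, range=(0.0, r_max))
    return int(np.count_nonzero(hist))

def quadrant_occupancy(points):
    """Fraction of points in quadrants 1 to 4 of the IQ plane."""
    i, q = points.real, points.imag
    return np.array([np.mean((i > 0) & (q > 0)), np.mean((i < 0) & (q > 0)),
                     np.mean((i < 0) & (q < 0)), np.mean((i > 0) & (q < 0))])

qpsk = unit_power([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
levels = np.array([-3, -1, 1, 3])
qam16 = unit_power((levels[:, None] + 1j * levels[None, :]).ravel())

print("QPSK   radius levels:", occupied_radial_bins(qpsk))    # 1
print("16-QAM radius levels:", occupied_radial_bins(qam16))   # 3
print("16-QAM quadrants:", quadrant_occupancy(qam16))
```

With noise and timing offsets the radial peaks smear, which is why the full histogram and its summary statistics are used as features instead of a bare count.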

In [None]:
def constellation_geometry_advanced(signal_iq, n_bins=32):
    """
    Advanced constellation geometry features:
    - Radial histogram
    - Angular histogram
    - 2D histogram entropy
    - Symmetry features
    """
    features = []
    
    I = signal_iq[:, 0]
    Q = signal_iq[:, 1]
    radius = np.sqrt(I**2 + Q**2)
    angle = np.arctan2(Q, I)
    
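    # 1. Radial histogram (fixed bin edges so bin i means the same radius for every
    #    signal; range (0, 2) matches the plotting cell and assumes ~unit-power signals)
    r_hist, r_bins = np.histogram(radius, bins=n_bins, range=(0, 2), density=True)
    features.extend(r_hist)  # Full histogram as features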
    
    # Radial statistics
    features.append(radius.mean())
    features.append(radius.std())
    features.append(kurtosis(radius))
    features.append(np.percentile(radius, 90) - np.percentile(radius, 10))  # Range
    
    # 2. Angular histogram
    a_hist, a_bins = np.histogram(angle, bins=n_bins, range=(-np.pi, np.pi), density=True)
    features.extend(a_hist)
    
    # Angular statistics (linear moments of wrapped angles, so sensitive to the ±pi wrap)
    features.append(np.std(angle))  # Angular spread
    features.append(kurtosis(angle))
    
    # 3. 2D histogram entropy
    hist_2d, _, _ = np.histogram2d(I, Q, bins=n_bins)
    hist_2d = hist_2d / hist_2d.sum()
    hist_2d = hist_2d[hist_2d > 0]  # Remove zeros for entropy
    entropy_2d = -np.sum(hist_2d * np.log(hist_2d + 1e-10))
    features.append(entropy_2d)
    
    # 4. Symmetry features
    # Quadrant occupancy
    q1 = np.sum((I > 0) & (Q > 0)) / len(I)
    q2 = np.sum((I < 0) & (Q > 0)) / len(I)
    q3 = np.sum((I < 0) & (Q < 0)) / len(I)
    q4 = np.sum((I > 0) & (Q < 0)) / len(I)
    features.extend([q1, q2, q3, q4])
    
    # Quadrant balance (0 = perfect balance)
    features.append(np.std([q1, q2, q3, q4]))
    
    # I/Q independence (correlation)
    features.append(np.abs(np.corrcoef(I, Q)[0, 1]))
    
    # 5. Peak detection in radial histogram
    peaks, _ = signal.find_peaks(r_hist, height=0.5)
    features.append(len(peaks))  # Number of distinct radius levels
    
    return np.array(features)

# Test
test_cga = constellation_geometry_advanced(train_signals[0])
print(f"Constellation geometry features per signal: {len(test_cga)}")

In [None]:
# Compute constellation geometry features
print("Computing constellation geometry features...")

train_cga = np.array([constellation_geometry_advanced(s) for s in train_signals_filtered])
test_cga = np.array([constellation_geometry_advanced(s) for s in test_signals])

print(f"Train CGA shape: {train_cga.shape}")
print(f"Test CGA shape: {test_cga.shape}")

# Compute AUC
auc_cga = compute_auc(train_cga, test_cga, test_binary)
print(f"\nConstellation Geometry AUC: {auc_cga:.3f}")

In [None]:
# Visualize radial and angular histograms by class
fig, axes = plt.subplots(3, 6, figsize=(18, 10))

for class_id in range(9):
    row = class_id // 3
    col = (class_id % 3) * 2
    
    if class_id < 6:
        mask = train_labels_filtered == class_id
        signals_class = train_signals_filtered[mask][:100]
        color = 'blue' if class_id < 3 else 'green'
    else:
        mask = test_labels == class_id
        signals_class = test_signals[mask][:100]
        color = 'red'
    
    # Compute mean histograms
    r_hists = []
    a_hists = []
    
    for sig in signals_class:
        I, Q = sig[:, 0], sig[:, 1]
        radius = np.sqrt(I**2 + Q**2)
        angle = np.arctan2(Q, I)
        
        r_hist, _ = np.histogram(radius, bins=32, range=(0, 2), density=True)
        a_hist, _ = np.histogram(angle, bins=32, range=(-np.pi, np.pi), density=True)
        
        r_hists.append(r_hist)
        a_hists.append(a_hist)
    
    r_mean = np.mean(r_hists, axis=0)
    a_mean = np.mean(a_hists, axis=0)
    
    # Radial histogram
    ax1 = axes[row, col]
    ax1.bar(range(32), r_mean, color=color, alpha=0.7)
    title = f"Class {class_id}"
    if class_id >= 6:
        title += " (A)"
    ax1.set_title(f"{title} - Radial")
    ax1.set_xlabel('Radius bin')
    
    # Angular histogram
    ax2 = axes[row, col + 1]
    ax2.bar(range(32), a_mean, color=color, alpha=0.7)
    ax2.set_title(f"{title} - Angular")
    ax2.set_xlabel('Angle bin')

plt.suptitle('Constellation Radial and Angular Histograms by Class', fontsize=14)
plt.tight_layout()
plt.savefig('plots_constellation_histograms.png', dpi=150, bbox_inches='tight')
plt.show()

---
## 8. Combined Feature Analysis

Combine all features and analyze which transforms are most discriminative.

In [None]:
# Summary of individual feature set AUCs
results = {
    'Wavelet Scattering': auc_scatter,
    'CWT Features': auc_cwt,
    'Cyclostationary': auc_cyclo,
    'Time-Varying Cumulants': auc_tvcum,
    'Dictionary Learning': auc_dict,
    'Multi-Scale Kurtosis': auc_msk,
    'Constellation Geometry': auc_cga,
}

print("\n" + "="*60)
print("INDIVIDUAL FEATURE SET RESULTS")
print("="*60)

for name, auc in sorted(results.items(), key=lambda x: -x[1]):
    print(f"{name:30s} AUC: {auc:.3f}")

In [None]:
# Combine best feature sets
print("\nCombining feature sets...")

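# All full-dataset feature sets combined (CWT is omitted: train_cwt was computed
# on a 3000-signal subsample, so its rows do not align with the other arrays)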
train_combined = np.hstack([
    train_scatter,
    train_cyclo,
    train_tvcum,
    train_dict_feat,
    train_msk,
    train_cga,
])

test_combined = np.hstack([
    test_scatter,
    test_cyclo,
    test_tvcum,
    test_dict_feat,
    test_msk,
    test_cga,
])

print(f"Combined features shape: {train_combined.shape}")

# Compute AUC for combined
auc_combined = compute_auc(train_combined, test_combined, test_binary)
print(f"\nCOMBINED Features AUC: {auc_combined:.3f}")

In [None]:
# Analyze per-anomaly class performance
print("\nPer-Anomaly Class Analysis:")
print("-" * 60)

# Train Isolation Forest on combined features
scaler = StandardScaler()
X_train = scaler.fit_transform(train_combined)
X_test = scaler.transform(test_combined)

clf = IsolationForest(contamination=0.1, random_state=42, n_estimators=200)
clf.fit(X_train)
scores = -clf.score_samples(X_test)

for anomaly_class in [6, 7, 8]:
    # Binary labels: 1 for this anomaly class, 0 for known classes
    mask = (test_labels == anomaly_class) | (test_labels < 6)
    y_binary = (test_labels[mask] == anomaly_class).astype(int)
    scores_subset = scores[mask]
    
    auc = roc_auc_score(y_binary, scores_subset)
    print(f"Anomaly Class {anomaly_class}: AUC = {auc:.3f}")

In [None]:
# Precision-Recall analysis
print("\nPrecision-Recall Analysis:")
print("-" * 60)

precision, recall, thresholds = precision_recall_curve(test_binary, scores)

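# Find optimal threshold. precision/recall have one more entry than thresholds
# (the final point, precision=1 and recall=0, has no threshold), so exclude it.
f1_scores = 2 * (precision * recall) / (precision + recall + 1e-10)
best_idx = np.argmax(f1_scores[:-1])
best_threshold = thresholds[best_idx]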

print(f"Best threshold: {best_threshold:.4f}")
print(f"Best F1: {f1_scores[best_idx]:.3f}")
print(f"At best F1 - Precision: {precision[best_idx]:.3f}, Recall: {recall[best_idx]:.3f}")

# Also show at fixed recall levels
for target_recall in [0.5, 0.6, 0.7, 0.8]:
    idx = np.argmin(np.abs(recall - target_recall))
    print(f"At Recall={recall[idx]:.2f}: Precision={precision[idx]:.3f}, F1={f1_scores[idx]:.3f}")

In [None]:
# Visualize results
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# 1. AUC comparison
ax = axes[0]
names = list(results.keys()) + ['COMBINED']
aucs = list(results.values()) + [auc_combined]
colors = ['steelblue'] * len(results) + ['darkgreen']
bars = ax.barh(names, aucs, color=colors)
ax.axvline(x=0.5, color='red', linestyle='--', label='Random')
ax.set_xlabel('AUC')
ax.set_title('AUC by Feature Set')
ax.set_xlim([0.5, 1.0])
for bar, auc in zip(bars, aucs):
    ax.text(auc + 0.01, bar.get_y() + bar.get_height()/2, f'{auc:.3f}', 
            va='center', fontsize=9)

# 2. Precision-Recall curve
ax = axes[1]
ax.plot(recall, precision, 'b-', linewidth=2)
ax.scatter(recall[best_idx], precision[best_idx], color='red', s=100, 
           zorder=5, label=f'Best F1={f1_scores[best_idx]:.3f}')
ax.set_xlabel('Recall')
ax.set_ylabel('Precision')
ax.set_title('Precision-Recall Curve (Combined Features)')
ax.legend()
ax.grid(True, alpha=0.3)

# 3. Score distribution
ax = axes[2]
ax.hist(scores[test_binary == 0], bins=50, alpha=0.6, label='Known (0-5)', density=True)
ax.hist(scores[test_binary == 1], bins=50, alpha=0.6, label='Anomaly (6-8)', density=True)
ax.axvline(x=best_threshold, color='red', linestyle='--', label=f'Threshold={best_threshold:.3f}')
ax.set_xlabel('Anomaly Score')
ax.set_ylabel('Density')
ax.set_title('Score Distribution')
ax.legend()

plt.tight_layout()
plt.savefig('plots_advanced_transforms_results.png', dpi=150, bbox_inches='tight')
plt.show()

---
## 9. Feature Importance Analysis

Which specific features from each transform are most discriminative?

In [None]:
# Compute individual feature AUCs
print("Computing per-feature AUCs...")

feature_aucs = []
for i in range(train_combined.shape[1]):
    feat_test = test_combined[:, i]

    # Univariate AUC: rank test samples by the raw feature value
    valid_mask = np.isfinite(feat_test)  # drop inf/nan entries
    auc = 0.5  # fallback when too few valid values remain
    if valid_mask.sum() > 100:
        try:
            auc = roc_auc_score(test_binary[valid_mask], feat_test[valid_mask])
            # Take the max of AUC and 1-AUC (in case direction is reversed)
            auc = max(auc, 1 - auc)
        except ValueError:
            # Raised when only one class is left after masking
            auc = 0.5

    feature_aucs.append(auc)

feature_aucs = np.array(feature_aucs)
print(f"Computed AUCs for {len(feature_aucs)} features")

In [None]:
# Top 20 features
top_indices = np.argsort(feature_aucs)[-20:][::-1]

# Create feature names, in the same block order used to build train_combined
feature_names = []
for prefix, block in [
    ("scatter", train_scatter),    # Scattering
    ("cyclo", train_cyclo),        # Cyclostationary
    ("tvcum", train_tvcum),        # TV Cumulants
    ("dict", train_dict_feat),     # Dictionary
    ("msk", train_msk),            # Multi-scale kurtosis
    ("cga", train_cga),            # Constellation geometry
]:
    feature_names.extend(f"{prefix}_{i}" for i in range(block.shape[1]))

print("\nTop 20 Most Discriminative Features:")
print("="*60)
for rank, idx in enumerate(top_indices):
    print(f"{rank+1:2d}. {feature_names[idx]:20s} AUC: {feature_aucs[idx]:.3f}")

In [None]:
# Visualize top features
fig, axes = plt.subplots(4, 5, figsize=(18, 14))

for i, (ax, idx) in enumerate(zip(axes.flat, top_indices)):
    # Get feature values by class
    for class_id in range(9):
        if class_id < 6:
            mask = train_labels_filtered == class_id
            data = train_combined[mask, idx]
            color = 'blue' if class_id < 3 else 'green'
        else:
            mask = test_labels == class_id
            data = test_combined[mask, idx]
            color = 'red'
        
        # Filter out inf/nan
        data = data[np.isfinite(data)]
        if len(data) > 0:
            parts = ax.violinplot([data], [class_id], widths=0.7, showmeans=True)
            for pc in parts['bodies']:
                pc.set_facecolor(color)
                pc.set_alpha(0.5)
    
    ax.set_title(f"{feature_names[idx]}\nAUC={feature_aucs[idx]:.3f}", fontsize=9)
    ax.set_xlabel('Class')
    ax.set_xticks(range(9))

plt.suptitle('Top 20 Most Discriminative Features by Class', fontsize=14)
plt.tight_layout()
plt.savefig('plots_top_features.png', dpi=150, bbox_inches='tight')
plt.show()

---
## 10. Save Feature Files for Modeling

In [None]:
# Save combined features
import pandas as pd

# Training features
train_df = pd.DataFrame(train_combined, columns=feature_names)
train_df['label'] = train_labels_filtered
train_df['snr'] = train_snr_filtered
train_df.to_csv('train_advanced_features.csv', index=False)
print(f"Saved train_advanced_features.csv: {train_df.shape}")

# Test features
test_df = pd.DataFrame(test_combined, columns=feature_names)
test_df['label'] = test_labels
test_df['snr'] = test_snr
test_df['binary_label'] = test_binary
test_df.to_csv('test_advanced_features.csv', index=False)
print(f"Saved test_advanced_features.csv: {test_df.shape}")

---
## Summary

### Key Findings

| Transform | AUC | Key Insight |
|-----------|-----|-------------|
| **Wavelet Scattering** | ? | Multi-scale modulation patterns |
| **CWT** | ? | Time-frequency with optimal resolution |
| **Cyclostationary** | ? | Symbol timing exploitation |
| **Time-Varying Cumulants** | ? | Local higher-order statistics |
| **Dictionary Learning** | ? | Modulation-specific atoms |
| **Multi-Scale Kurtosis** | ? | Kurtosis at different scales |
| **Constellation Geometry** | ? | IQ plane structure |
| **COMBINED** | ? | All features together |
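
The `?` entries can be filled in from the values the notebook computes, rather than copied by hand. A minimal sketch, assuming the `results` dictionary and `auc_combined` from the evaluation cells are still in memory; `format_auc_table` is a hypothetical helper introduced here for illustration, not part of the notebook:

```python
def format_auc_table(aucs, insights=None):
    """Render a {name: auc} mapping as a Markdown table, best AUC first.

    `insights` is an optional {name: text} mapping for the third column.
    """
    insights = insights or {}
    lines = [
        "| Transform | AUC | Key Insight |",
        "|-----------|-----|-------------|",
    ]
    for name, auc in sorted(aucs.items(), key=lambda kv: -kv[1]):
        lines.append(f"| **{name}** | {auc:.3f} | {insights.get(name, '')} |")
    return "\n".join(lines)

# In the notebook (assumes `results` and `auc_combined` exist):
# print(format_auc_table({**results, "COMBINED": auc_combined}))
```

Rows appear only for feature sets that are present in the mapping, so a transform that was never scored stays out of the table instead of showing a placeholder.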

### Recommendations for Deep Learning

1. **Architecture Design**: Include layers that can learn:
   - Multi-scale wavelet-like filters
   - Higher-order pooling (for cumulants)
   - Attention on IQ relationships

2. **Input Representation**: Consider using:
   - Raw IQ + scattering coefficients
   - CWT scalograms as 2D input
   - Cyclic spectral features

3. **Auxiliary Tasks**: Train with:
   - Classification head for classes 0-5
   - Reconstruction head (autoencoder)
   - Auxiliary predictions of kurtosis/cumulants
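
One possible reading of "higher-order pooling" in item 1 is a pooling step that summarizes each window of the complex IQ sequence by normalized fourth-order cumulants instead of a mean or max. The NumPy sketch below illustrates that idea only; it is not a trainable layer and not the notebook's own cumulant code. The function name `cumulant_pooling` and the `window`/`hop` defaults are assumptions chosen for illustration. It uses the standard zero-mean estimates $C_{40} = M_{40} - 3M_{20}^2$ and $C_{42} = M_{42} - |M_{20}|^2 - 2M_{21}^2$.

```python
import numpy as np

def cumulant_pooling(iq, window=128, hop=64):
    """Pool a 1-D complex IQ sequence into per-window [|C40|, C42] estimates.

    Both values are normalized by the squared window power, so the output
    does not depend on signal amplitude. Returns shape (n_windows, 2).
    """
    iq = np.asarray(iq, dtype=complex)
    feats = []
    for start in range(0, len(iq) - window + 1, hop):
        x = iq[start:start + window]
        x = x - x.mean()                   # the formulas below assume zero mean
        m20 = np.mean(x ** 2)
        m21 = np.mean(np.abs(x) ** 2)      # average power
        m40 = np.mean(x ** 4)
        m42 = np.mean(np.abs(x) ** 4)
        c40 = m40 - 3 * m20 ** 2
        c42 = m42 - np.abs(m20) ** 2 - 2 * m21 ** 2
        norm = m21 ** 2 + 1e-12            # guard against all-zero windows
        feats.append([np.abs(c40) / norm, c42 / norm])
    return np.array(feats).reshape(-1, 2)  # (0, 2) if the signal is too short
```

In a network, the same moments could be computed on learned filter outputs rather than on raw IQ, and the pooled values could also serve as regression targets for the auxiliary kurtosis/cumulant predictions in item 3.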