# Harmonic Analysis and Signal Processing for Machine Learning
## CNNs, Frequency Domain Methods, and Spectral Learning

Welcome to the **frequency domain**! Harmonic analysis provides the mathematical foundation for understanding signals, images, and data through their frequency components.

### What You'll Master
By the end of this notebook, you'll understand:
1. **Fourier transforms** - Decomposing signals into frequencies
2. **Convolution theorem** - How convolution, the core operation in CNNs, becomes multiplication in the frequency domain
3. **Wavelets** - Multi-scale analysis of signals
4. **Spectral methods** - Learning in the frequency domain
5. **Graph signal processing** - Harmonic analysis on graphs
6. **Time-frequency analysis** - Joint time-frequency representations

### Why This is Revolutionary
- **CNNs**: Convolution in spatial domain = multiplication in frequency domain
- **Efficiency**: FFT-based convolution costs O(n log n) instead of O(n²) when signal and kernel have comparable length
- **Feature extraction**: Frequency components reveal hidden patterns
- **Denoising**: Separate signal from noise in frequency domain

### Real-World Applications
- **Computer vision**: Edge detection, texture analysis, image compression
- **Audio processing**: Speech recognition, music analysis, noise reduction
- **Medical imaging**: MRI reconstruction, signal enhancement
- **Finance**: Time series analysis, market trend detection

Let's dive into the frequency domain! 🎵📊

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns
from scipy import signal, fft
from scipy.ndimage import gaussian_filter
import pywt
from sklearn.datasets import load_digits, make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8')
sns.set_palette("viridis")
np.random.seed(42)

print("🎵 Harmonic Analysis toolkit loaded!")
print("Ready to explore the frequency domain!")

## 1. Fourier Transforms and Frequency Analysis

### The Fourier Transform
The **Fourier transform** decomposes a signal into its frequency components:

**Continuous Fourier Transform**:
```
F(ω) = ∫_{-∞}^{∞} f(t) e^{-iωt} dt
```

**Discrete Fourier Transform (DFT)**:
```
X[k] = Σ_{n=0}^{N-1} x[n] e^{-i2πkn/N}
```
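
To make the DFT sum concrete, the sketch below evaluates it directly as a matrix-vector product and compares the result with `np.fft.fft`. The helper name `naive_dft` is hypothetical and only meant for illustration; it costs O(N²) and is not how a library computes the transform.

```python
import numpy as np

def naive_dft(x):
    """Evaluate X[k] = sum_n x[n] * exp(-2j*pi*k*n/N) directly (O(N^2))."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)          # time index, shape (N,)
    k = n.reshape(-1, 1)      # frequency index, shape (N, 1)
    W = np.exp(-2j * np.pi * k * n / N)  # W[k, n] = e^{-i 2 pi k n / N}
    return W @ x

rng = np.random.default_rng(0)
x = rng.standard_normal(64)

X_naive = naive_dft(x)
X_fft = np.fft.fft(x)
max_abs_diff = np.max(np.abs(X_naive - X_fft))
print(f"max |naive - fft| = {max_abs_diff:.2e}")
```

The two results should agree up to floating-point rounding, which confirms that `np.fft.fft` uses the same sign and normalization convention as the formula above.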

### Key Properties
1. **Linearity**: ℱ{af + bg} = aℱ{f} + bℱ{g}
2. **Time shifting**: ℱ{f(t-a)} = e^{-iωa}ℱ{f}
3. **Frequency shifting**: ℱ{e^{iω₀t}f(t)} = F(ω-ω₀)
4. **Parseval's theorem**: ∫|f(t)|²dt = (1/2π)∫|F(ω)|²dω
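
These properties have direct discrete analogs that can be checked numerically. The sketch below assumes the DFT convention of `np.fft.fft`, where shifts are circular and Parseval's relation carries a factor 1/N instead of 1/2π. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 128
x = rng.standard_normal(N)
y = rng.standard_normal(N)
n = np.arange(N)  # sample index
k = np.arange(N)  # frequency-bin index

X = np.fft.fft(x)
Y = np.fft.fft(y)

# 1. Linearity: DFT{a x + b y} = a X + b Y
a, b = 2.0, -0.5
linearity_ok = np.allclose(np.fft.fft(a * x + b * y), a * X + b * Y)

# 2. Time shifting (circular): DFT{x[n - m]} = exp(-2j*pi*k*m/N) * X[k]
m = 5
shift_ok = np.allclose(np.fft.fft(np.roll(x, m)),
                       np.exp(-2j * np.pi * k * m / N) * X)

# 3. Frequency shifting: DFT{exp(+2j*pi*k0*n/N) * x[n]} = X[k - k0]
k0 = 3
freq_shift_ok = np.allclose(np.fft.fft(np.exp(2j * np.pi * k0 * n / N) * x),
                            np.roll(X, k0))

# 4. Parseval (discrete form): sum |x|^2 = (1/N) * sum |X|^2
energy_time = np.sum(np.abs(x) ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / N
parseval_ok = np.isclose(energy_time, energy_freq)

print(linearity_ok, shift_ok, freq_shift_ok, parseval_ok)
```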

### Convolution Theorem
The most important property for machine learning:
```
ℱ{f * g} = ℱ{f} · ℱ{g}
```
**Convolution in time = multiplication in frequency**

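This is why convolution can be computed efficiently with the FFT and why filtering is easy to reason about in the frequency domain. In practice, CNNs with small kernels usually compute convolution directly; the FFT route pays off mainly for large kernels.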
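
A quick numerical check of the theorem, as a minimal sketch: for the DFT, the product of two spectra corresponds to *circular* convolution, and zero-padding both inputs to length `len(f) + len(h) - 1` recovers ordinary linear convolution. The helper `circular_convolve` is a hypothetical name written out only for comparison.

```python
import numpy as np

def circular_convolve(f, g):
    """Direct circular convolution: out[n] = sum_m f[m] * g[(n - m) mod N]."""
    N = len(f)
    out = np.zeros(N)
    for n in range(N):
        for m in range(N):
            out[n] += f[m] * g[(n - m) % N]
    return out

rng = np.random.default_rng(2)
f = rng.standard_normal(32)
g = rng.standard_normal(32)

# Circular convolution: direct sum vs. pointwise product of DFTs
direct = circular_convolve(f, g)
via_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

# Linear convolution: zero-pad both inputs so nothing wraps around
h = rng.standard_normal(5)
L = len(f) + len(h) - 1
linear_fft = np.real(np.fft.ifft(np.fft.fft(f, L) * np.fft.fft(h, L)))
linear_direct = np.convolve(f, h, mode="full")

print(np.allclose(direct, via_fft), np.allclose(linear_direct, linear_fft))
```

The zero-padding step matters whenever the FFT result is compared with `np.convolve`: without it, the tail of the linear convolution wraps around to the start of the output.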

### Fast Fourier Transform (FFT)
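- **Complexity**: O(N log N) instead of O(N²) for evaluating the DFT sum directly
- **Algorithm**: Divide-and-conquer that exploits the symmetry and periodicity of the complex exponentials
- **Implementation**: Most commonly the Cooley-Tukey algorithm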
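
The divide-and-conquer idea can be sketched with the recursive radix-2 decimation-in-time form of Cooley-Tukey. This is a teaching sketch that assumes the input length is a power of two; the function name `fft_radix2` is hypothetical, and library FFTs use more elaborate, optimized implementations.

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of two")
    if N == 1:
        return x
    # Split into even- and odd-indexed samples and transform each half
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    # Twiddle factors e^{-i 2 pi k / N} for k = 0 .. N/2 - 1
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    # Butterfly: X[k] = E[k] + w^k O[k],  X[k + N/2] = E[k] - w^k O[k]
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

rng = np.random.default_rng(4)
x = rng.standard_normal(256)
print(np.allclose(fft_radix2(x), np.fft.fft(x)))
```

Each level of recursion does O(N) work in the butterfly step and there are log₂ N levels, which is where the O(N log N) cost comes from.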

### Why Frequency Analysis Matters in ML
- **Feature extraction**: Frequency components are informative features
- **Dimensionality reduction**: Often only a few frequencies matter
- **Noise reduction**: Separate signal from noise
- **Pattern recognition**: Many patterns have characteristic frequencies
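
As a small illustration of the dimensionality-reduction and noise-reduction points, the sketch below keeps only the `k` largest-magnitude Fourier coefficients of a noisy signal and inverts the transform. The helper `keep_top_k_frequencies` is a hypothetical name, and the test signal (5, 15 and 30 Hz sinusoids plus Gaussian noise) is an assumption chosen to match the demo later in this section.

```python
import numpy as np

def keep_top_k_frequencies(x, k):
    """Zero all but the k largest-magnitude rFFT coefficients, then invert."""
    X = np.fft.rfft(x)
    keep = np.argsort(np.abs(X))[-k:]   # indices of the k strongest bins
    X_sparse = np.zeros_like(X)
    X_sparse[keep] = X[keep]
    return np.fft.irfft(X_sparse, n=len(x))

rng = np.random.default_rng(3)
t = np.linspace(0, 2, 1000, endpoint=False)
clean = (np.sin(2 * np.pi * 5 * t)
         + 0.5 * np.sin(2 * np.pi * 15 * t)
         + 0.3 * np.sin(2 * np.pi * 30 * t))
noisy = clean + 0.1 * rng.standard_normal(len(t))

# Three complex coefficients stand in for 1000 samples
denoised = keep_top_k_frequencies(noisy, k=3)

err_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
err_denoised = np.sqrt(np.mean((denoised - clean) ** 2))
print(f"RMS error before: {err_noisy:.4f}, after: {err_denoised:.4f}")
```

This works well here because the signal is exactly sparse in frequency; for signals whose energy is spread over many frequencies, a hard top-k cutoff discards real structure along with the noise.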

In [None]:
def demonstrate_fourier_analysis():
    """Explore Fourier transforms and frequency domain analysis"""
    
    print("🎵 Fourier Analysis: Decomposing Signals into Frequencies")
    print("=" * 56)
    
    fig = plt.figure(figsize=(20, 15))
    
    # 1. Basic Fourier transform example
    print("\n1. Basic Fourier Transform")
    
    # Create a composite signal
    t = np.linspace(0, 2, 1000, endpoint=False)
    dt = t[1] - t[0]
    
    # Signal components
    f1, f2, f3 = 5, 15, 30  # Frequencies in Hz
    signal1 = np.sin(2 * np.pi * f1 * t)
    signal2 = 0.5 * np.sin(2 * np.pi * f2 * t)
    signal3 = 0.3 * np.sin(2 * np.pi * f3 * t)
    noise = 0.1 * np.random.randn(len(t))
    
    composite_signal = signal1 + signal2 + signal3 + noise
    
    # Compute FFT
    fft_signal = fft.fft(composite_signal)
    freqs = fft.fftfreq(len(t), dt)
    
    # Plot time domain
    ax1 = fig.add_subplot(3, 4, 1)
    ax1.plot(t[:200], composite_signal[:200], 'b-', linewidth=2, label='Composite')
    ax1.plot(t[:200], signal1[:200], 'r--', alpha=0.7, label=f'{f1} Hz')
    ax1.plot(t[:200], signal2[:200], 'g--', alpha=0.7, label=f'{f2} Hz')
    ax1.plot(t[:200], signal3[:200], 'm--', alpha=0.7, label=f'{f3} Hz')
    ax1.set_xlabel('Time (s)')
    ax1.set_ylabel('Amplitude')
    ax1.set_title('Time Domain Signal')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Plot frequency domain
    ax2 = fig.add_subplot(3, 4, 2)
    magnitude = np.abs(fft_signal)
    ax2.plot(freqs[:len(freqs)//2], magnitude[:len(freqs)//2], 'b-', linewidth=2)
    ax2.axvline(x=f1, color='red', linestyle='--', alpha=0.7, label=f'{f1} Hz')
    ax2.axvline(x=f2, color='green', linestyle='--', alpha=0.7, label=f'{f2} Hz')
    ax2.axvline(x=f3, color='magenta', linestyle='--', alpha=0.7, label=f'{f3} Hz')
    ax2.set_xlabel('Frequency (Hz)')
    ax2.set_ylabel('Magnitude')
    ax2.set_title('Frequency Domain (FFT)')
    ax2.set_xlim(0, 50)
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    print(f"   Original signal: mix of {f1}, {f2}, {f3} Hz + noise")
    print(f"   FFT magnitude peaks at the three component frequencies, even with noise")
    print(f"   Peak detection in frequency domain reveals hidden structure")
    
    # 2. Convolution theorem demonstration
    print("\n2. Convolution Theorem: Foundation of CNNs")
    
    # Create signals for convolution
    n = 128
    x = np.zeros(n)
    x[n//4:3*n//4] = 1  # Box signal
    
    # Gaussian kernel (like CNN filter)
    kernel_size = 15
    kernel = signal.windows.gaussian(kernel_size, std=3)
    kernel = kernel / np.sum(kernel)  # Normalize
    
    # Method 1: Direct convolution
    conv_direct = np.convolve(x, kernel, mode='same')
    
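    # Method 2: FFT-based convolution
    # Multiplying length-n FFTs gives a circular convolution whose output is
    # delayed by the kernel half-width relative to mode='same', so roll it back.
    # Wrap-around is harmless here because the box signal is zero near the edges.
    X_fft = fft.fft(x, n)
    K_fft = fft.fft(kernel, n)
    conv_fft = np.real(fft.ifft(X_fft * K_fft))
    conv_fft = np.roll(conv_fft, -(kernel_size // 2))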
    
    # Plot convolution comparison
    ax3 = fig.add_subplot(3, 4, 3)
    ax3.plot(x, 'b-', linewidth=2, label='Original signal')
    ax3.plot(conv_direct, 'r-', linewidth=2, label='Direct convolution')
    ax3.plot(conv_fft, 'g--', linewidth=2, alpha=0.8, label='FFT convolution')
    ax3.set_xlabel('Sample')
    ax3.set_ylabel('Amplitude')
    ax3.set_title('Convolution: Direct vs FFT')
    ax3.legend()
    ax3.grid(True, alpha=0.3)
    
    # Show frequency domain multiplication
    ax4 = fig.add_subplot(3, 4, 4)
    freqs_conv = fft.fftfreq(n)
    ax4.plot(freqs_conv[:n//2], np.abs(X_fft)[:n//2], 'b-', label='Signal FFT')
    ax4.plot(freqs_conv[:n//2], np.abs(K_fft)[:n//2], 'r-', label='Kernel FFT')
    ax4.plot(freqs_conv[:n//2], np.abs(X_fft * K_fft)[:n//2], 'g-', 
             linewidth=2, label='Product (convolution)')
    ax4.set_xlabel('Frequency')
    ax4.set_ylabel('Magnitude')
    ax4.set_title('Frequency Domain Multiplication')
    ax4.legend()
    ax4.grid(True, alpha=0.3)
    
    print(f"   Convolution theorem: f * g ↔ F · G")
    print(f"   Direct convolution: O(n·k) for kernel length k (O(n²) when k ≈ n); FFT convolution: O(n log n)")
    print(f"   FFT-based convolution pays off for large kernels; small CNN kernels are usually applied directly")
    
    # 3. 2D FFT for images
    print("\n3. 2D FFT for Image Analysis")
    
    # Create a simple image with different frequency components
    x_img = np.linspace(-1, 1, 64)
    y_img = np.linspace(-1, 1, 64)
    X_img, Y_img = np.meshgrid(x_img, y_img)
    
    # Create image with different frequency patterns
    image = (np.sin(5 * np.pi * X_img) * np.cos(3 * np.pi * Y_img) + 
             0.5 * np.sin(15 * np.pi * X_img) + 
             0.3 * np.cos(10 * np.pi * Y_img))
    
    # Add some noise
    image += 0.2 * np.random.randn(*image.shape)
    
    # 2D FFT
    fft_image = fft.fft2(image)
    fft_magnitude = np.abs(fft.fftshift(fft_image))
    
    # Plot original image
    ax5 = fig.add_subplot(3, 4, 5)
    im1 = ax5.imshow(image, cmap='viridis', aspect='equal')
    ax5.set_title('Original Image')
    ax5.axis('off')
    plt.colorbar(im1, ax=ax5, shrink=0.8)
    
    # Plot FFT magnitude
    ax6 = fig.add_subplot(3, 4, 6)
    im2 = ax6.imshow(np.log(1 + fft_magnitude), cmap='hot', aspect='equal')
    ax6.set_title('2D FFT Magnitude (log scale)')
    ax6.axis('off')
    plt.colorbar(im2, ax=ax6, shrink=0.8)
    
    print(f"   2D FFT reveals spatial frequency content")
    print(f"   Bright spots = dominant spatial frequencies")
    print(f"   Center = low frequencies, edges = high frequencies")
    
    # 4. Frequency-based filtering
    print("\n4. Frequency Domain Filtering")
    
    # Low-pass filter (remove high frequencies)
    def low_pass_filter(fft_img, cutoff=0.2):
        h, w = fft_img.shape
        center_h, center_w = h // 2, w // 2
        Y, X = np.ogrid[:h, :w]
        dist = np.sqrt((X - center_w)**2 + (Y - center_h)**2)
        mask = dist <= cutoff * min(h, w) / 2
        filtered_fft = fft_img.copy()
        filtered_fft[~mask] = 0
        return filtered_fft
    
    # High-pass filter (remove low frequencies)
    def high_pass_filter(fft_img, cutoff=0.1):
        h, w = fft_img.shape
        center_h, center_w = h // 2, w // 2
        Y, X = np.ogrid[:h, :w]
        dist = np.sqrt((X - center_w)**2 + (Y - center_h)**2)
        mask = dist > cutoff * min(h, w) / 2
        filtered_fft = fft_img.copy()
        filtered_fft[~mask] = 0
        return filtered_fft
    
    # Apply filters
    fft_shifted = fft.fftshift(fft_image)
    low_passed_fft = low_pass_filter(fft_shifted)
    high_passed_fft = high_pass_filter(fft_shifted)
    
    # Convert back to spatial domain
    low_passed = np.real(fft.ifft2(fft.ifftshift(low_passed_fft)))
    high_passed = np.real(fft.ifft2(fft.ifftshift(high_passed_fft)))
    
    # Plot filtered results
    ax7 = fig.add_subplot(3, 4, 7)
    im3 = ax7.imshow(low_passed, cmap='viridis', aspect='equal')
    ax7.set_title('Low-pass Filtered (Smooth)')
    ax7.axis('off')
    plt.colorbar(im3, ax=ax7, shrink=0.8)
    
    ax8 = fig.add_subplot(3, 4, 8)
    im4 = ax8.imshow(high_passed, cmap='viridis', aspect='equal')
    ax8.set_title('High-pass Filtered (Edges)')
    ax8.axis('off')
    plt.colorbar(im4, ax=ax8, shrink=0.8)
    
    print(f"   Low-pass filter: Keeps low frequencies (smooth features)")
    print(f"   High-pass filter: Keeps high frequencies (edges, details)")
    print(f"   CNN intuition: average pooling acts like a low-pass filter; edge-detecting kernels act like high-pass filters")
    
    # 5. Spectral analysis of real data
    print("\n5. Spectral Analysis for Feature Extraction")
    
    # Load digit dataset
    digits = load_digits()
    sample_digit = digits.images[0]  # First digit (8x8 image)
    
    # Compute 2D FFT of digit
    digit_fft = fft.fft2(sample_digit)
    digit_magnitude = np.abs(fft.fftshift(digit_fft))
    
    # Plot original digit
    ax9 = fig.add_subplot(3, 4, 9)
    im5 = ax9.imshow(sample_digit, cmap='gray', aspect='equal')
    ax9.set_title(f'Digit: {digits.target[0]}')
    ax9.axis('off')
    plt.colorbar(im5, ax=ax9, shrink=0.8)
    
    # Plot FFT
    ax10 = fig.add_subplot(3, 4, 10)
    im6 = ax10.imshow(np.log(1 + digit_magnitude), cmap='hot', aspect='equal')
    ax10.set_title('Digit FFT Spectrum')
    ax10.axis('off')
    plt.colorbar(im6, ax=ax10, shrink=0.8)
    
    # Extract spectral features for classification
    def extract_spectral_features(images):
        """Extract frequency domain features from images"""
        features = []
        for img in images:
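            # 2D FFT, shifted so the zero-frequency (DC) term sits at the center
            fft_img = fft.fft2(img)
            magnitude = np.abs(fft.fftshift(fft_img))
            
            # Distance of each coefficient from the DC term (radial frequency)
            h, w = magnitude.shape
            center_h, center_w = h // 2, w // 2
            Y, X = np.ogrid[:h, :w]
            dist = np.sqrt((X - center_w)**2 + (Y - center_h)**2)
            
            # Energy features: total, near the center (low), far from it (high).
            # The radius thresholds 1.5 and 3.0 are illustrative choices for 8x8 images.
            total_energy = np.sum(magnitude**2)
            low_freq_energy = np.sum(magnitude[dist <= 1.5]**2)
            high_freq_energy = np.sum(magnitude[dist >= 3.0]**2)
            
            # Radial frequency profile (mean magnitude in rings around the center)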
            
            radial_profile = []
            for r in range(1, min(h, w) // 2):
                mask = (dist >= r-0.5) & (dist < r+0.5)
                if np.any(mask):
                    radial_profile.append(np.mean(magnitude[mask]))
            
            # Combine features
            spectral_features = [
                total_energy,
                low_freq_energy / total_energy,
                high_freq_energy / total_energy
            ] + radial_profile[:3]  # First few radial components
            
            features.append(spectral_features)
        
        return np.array(features)
    
    # Extract features for a subset of digits
    n_samples = 500
    sample_images = digits.images[:n_samples]
    sample_targets = digits.target[:n_samples]
    
    spectral_features = extract_spectral_features(sample_images)
    
    # Simple classification using spectral features
    from sklearn.ensemble import RandomForestClassifier
    
    # Split data
    X_train, X_test, y_train, y_test = train_test_split(
        spectral_features, sample_targets, test_size=0.3, random_state=42)
    
    # Train classifier
    rf_spectral = RandomForestClassifier(n_estimators=50, random_state=42)
    rf_spectral.fit(X_train, y_train)
    spectral_accuracy = rf_spectral.score(X_test, y_test)
    
    # Compare with pixel-based features
    pixel_features = sample_images.reshape(n_samples, -1)
    X_train_pixel, X_test_pixel, _, _ = train_test_split(
        pixel_features, sample_targets, test_size=0.3, random_state=42)
    
    rf_pixel = RandomForestClassifier(n_estimators=50, random_state=42)
    rf_pixel.fit(X_train_pixel, y_train)
    pixel_accuracy = rf_pixel.score(X_test_pixel, y_test)
    
    # Plot feature importance
    ax11 = fig.add_subplot(3, 4, 11)
    feature_names = ['Total Energy', 'Low Freq %', 'High Freq %', 'Radial 1', 'Radial 2', 'Radial 3']
    importances = rf_spectral.feature_importances_
    ax11.bar(range(len(importances)), importances)
    ax11.set_xticks(range(len(importances)))
    ax11.set_xticklabels(feature_names, rotation=45, ha='right')
    ax11.set_ylabel('Importance')
    ax11.set_title('Spectral Feature Importance')
    ax11.grid(True, alpha=0.3)
    
    # Plot accuracy comparison
    ax12 = fig.add_subplot(3, 4, 12)
    methods = ['Spectral Features', 'Pixel Features']
    accuracies = [spectral_accuracy, pixel_accuracy]
    bars = ax12.bar(methods, accuracies, color=['orange', 'skyblue'])
    ax12.set_ylabel('Accuracy')
    ax12.set_title('Classification Accuracy')
    ax12.set_ylim(0, 1)
    
    # Add accuracy values on bars
    for bar, acc in zip(bars, accuracies):
        ax12.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
                 f'{acc:.3f}', ha='center', va='bottom')
    
    ax12.grid(True, alpha=0.3)
    
    print(f"   Spectral features: {len(spectral_features[0])} dimensions vs {pixel_features.shape[1]} pixels")
    print(f"   Spectral accuracy: {spectral_accuracy:.3f}")
    print(f"   Pixel accuracy: {pixel_accuracy:.3f}")
    print(f"   Spectral features are a compact summary; compare the two accuracies to see the trade-off")
    
    plt.tight_layout()
    plt.show()
    
    return {
        'composite_signal': composite_signal,
        'fft_signal': fft_signal,
        'freqs': freqs,
        'image': image,
        'fft_image': fft_image,
        'spectral_features': spectral_features,
        'spectral_accuracy': spectral_accuracy,
        'pixel_accuracy': pixel_accuracy
    }

fourier_results = demonstrate_fourier_analysis()