[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/GabbyTab/boofun/blob/main/notebooks/lecture6_spectral_concentration.ipynb)

# CS 294-92: Lecture 6 - Spectral Concentration and Low-Degree Learning

**Instructor:** Avishay Tal  
**Scribe & Notebook by:** Gabriel Taboada  
**Reference:** O'Donnell, *Analysis of Boolean Functions*, Chapter 3

---

## Overview

This notebook explores the connection between Boolean function analysis and learning theory:

1. **Spectral Concentration**: When is a function's Fourier weight concentrated on low-degree coefficients, and how does decision tree depth control this?
2. **Fourier Coefficient Estimation**: Using random samples to estimate individual Fourier coefficients
3. **PAC Learning**: The LMN Theorem for learning functions with spectral concentration
4. **Goldreich-Levin**: Finding all heavy Fourier coefficients with query access

---

In [None]:
# Install/upgrade boofun (required for Colab)
# This ensures you have the latest version with all features
!pip install --upgrade boofun -q

import boofun as bf
print(f"BooFun version: {bf.__version__}")

In [None]:
# Setup
import numpy as np
import matplotlib.pyplot as plt
import sys
sys.path.insert(0, '../src')

import boofun as bf
from boofun.analysis import learning, complexity

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

print("boofun loaded - using clean API")
print(f"  bf.majority(5).total_influence() = {bf.majority(5).total_influence()}")

## 1. Spectral Concentration

**Definition:** A function $f$ is **$\varepsilon$-concentrated** on degree $\leq k$ if:
$$\mathbf{W}^{>k}[f] := \sum_{|S| > k} \hat{f}(S)^2 \leq \varepsilon$$

This means the "Fourier weight" on high-degree coefficients is small.
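
As a minimal sketch of the definition, the snippet below computes the Fourier expansion by brute force and sums the squared coefficients above degree $k$. It deliberately avoids `boofun`; the helper names `fourier_coefficients` and `weight_above` are illustrative and not part of any library, and subsets are encoded as bitmasks.

```python
from itertools import product

def fourier_coefficients(f, n):
    """Brute-force Fourier expansion of f: {-1,1}^n -> R, keyed by subset bitmask."""
    points = list(product([1, -1], repeat=n))
    coeffs = {}
    for mask in range(2 ** n):
        total = 0
        for x in points:
            chi = 1
            for i in range(n):
                if (mask >> i) & 1:
                    chi *= x[i]  # chi_S(x) is the product of x_i over i in S
            total += f(x) * chi
        coeffs[mask] = total / len(points)
    return coeffs

def weight_above(coeffs, k):
    """W^{>k}[f]: sum of squared coefficients over subsets of size > k."""
    return sum(c * c for mask, c in coeffs.items() if bin(mask).count("1") > k)

def maj3(x):
    # Majority of three +/-1 inputs
    return 1 if sum(x) > 0 else -1

coeffs = fourier_coefficients(maj3, 3)
for k in range(4):
    print(k, weight_above(coeffs, k))
```

For Majority on 3 bits the weight above degree 1 comes out to $1/4$ (all of it on the single degree-3 coefficient), so the function is $\tfrac14$-concentrated on degree $\leq 1$.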

### Example: Decision Trees and Spectral Concentration

**Theorem (O'Donnell 3.2):** A depth-$d$ decision tree has all its Fourier mass on degree $\leq d$.
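
The claim can be checked by hand on a tiny tree. The sketch below assumes a toy tuple encoding for decision trees (not the `boofun` representation) and compares the tree depth with the Fourier degree found by brute force.

```python
from itertools import product

# A decision tree is either a leaf (+1 or -1) or a tuple
# (variable index, subtree if x_i = +1, subtree if x_i = -1).
# This tree queries x0, then outputs x1 or x2 accordingly.
tree = (0, (1, 1, -1), (2, 1, -1))

def evaluate(t, x):
    """Follow the path determined by x down to a leaf."""
    while not isinstance(t, int):
        var, if_plus, if_minus = t
        t = if_plus if x[var] == 1 else if_minus
    return t

def depth(t):
    """Length of the longest root-to-leaf path."""
    if isinstance(t, int):
        return 0
    return 1 + max(depth(t[1]), depth(t[2]))

def fourier_degree(f, n):
    """Largest |S| with a non-zero Fourier coefficient, by brute force."""
    points = list(product([1, -1], repeat=n))
    best = 0
    for mask in range(2 ** n):
        total = 0
        for x in points:
            chi = 1
            for i in range(n):
                if (mask >> i) & 1:
                    chi *= x[i]
            total += f(x) * chi
        if total != 0:
            best = max(best, bin(mask).count("1"))
    return best

tree_depth = depth(tree)
tree_degree = fourier_degree(lambda x: evaluate(tree, x), 3)
print(tree_depth, tree_degree)  # prints: 2 2
```

Here the tree computes $\tfrac12 x_1 + \tfrac12 x_2 + \tfrac12 x_0 x_1 - \tfrac12 x_0 x_2$, so the degree equals the depth and $\mathbf{W}^{>2}[f] = 0$.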

In [None]:
# Demonstrate spectral concentration for different functions
# Using clean API: bf.dictator(), bf.majority(), etc.
functions = [
    ("Dictator x₀", bf.dictator(0, 4)),    # Depth 1
    ("Majority-5", bf.majority(5)),         # Complex
    ("Parity-4", bf.parity(4)),             # Max degree
    ("Tribes(2,3)", bf.tribes(2, 3)),       # DNF
]

print("Spectral Concentration Analysis")
print("=" * 70)
print(f"{'Function':<15} {'DT depth':>10} {'W≤1':>10} {'W≤2':>10} {'W≤3':>10}")
print("-" * 70)

for name, f in functions:
    dt_depth = complexity.decision_tree_depth(f)
    
    # Clean API: f.W_leq(k) for spectral concentration up to degree k
    conc = [f.W_leq(k) for k in [1, 2, 3]]
    
    print(f"{name:<15} {dt_depth:>10} {conc[0]:>10.4f} {conc[1]:>10.4f} {conc[2]:>10.4f}")

print("\n💡 A depth-d tree has W≤d = 1, so DT depth caps how high the Fourier weight can sit")

---
## 2. Fourier Coefficient Estimation

**Lemma (O'Donnell 3.30):** Given $m = O(\log(1/\delta)/\varepsilon^2)$ samples, we can estimate $\hat{f}(S)$ with error $\leq \varepsilon$ with probability $\geq 1 - \delta$.

The empirical estimator is:
$$\tilde{f}(S) = \frac{1}{m} \sum_{i=1}^{m} f(x^{(i)}) \chi_S(x^{(i)})$$
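
A minimal sketch of the estimator, using only the standard library. The constant in `hoeffding_samples` is one explicit choice that follows from Hoeffding's inequality for $[-1,1]$-valued samples; the lemma itself only fixes the $O(\cdot)$ rate. The function names and the bitmask encoding of inputs (bit 1 maps to $-1$) are assumptions made for this illustration.

```python
import math
import random

def hoeffding_samples(eps, delta):
    """Samples sufficient for |estimate - f_hat(S)| <= eps with prob. >= 1 - delta."""
    return math.ceil(2 * math.log(2 / delta) / eps ** 2)

def chi(mask, x):
    """Character chi_S(x) in {-1, +1} for bitmask-encoded S and x."""
    return 1 - 2 * (bin(mask & x).count("1") % 2)

def estimate_coefficient(f, n, mask, m, rng):
    """Average f(x) * chi_S(x) over m uniformly random inputs x."""
    total = 0
    for _ in range(m):
        x = rng.randrange(2 ** n)
        total += f(x) * chi(mask, x)
    return total / m

def maj3(x):
    # Majority of 3 bits in the +/-1 convention: +1 when at most one bit is set
    return 1 if bin(x).count("1") <= 1 else -1

eps, delta = 0.1, 0.01
m = hoeffding_samples(eps, delta)
rng = random.Random(0)
estimate = estimate_coefficient(maj3, 3, 0b001, m, rng)
print(m, estimate)
```

The true value of this degree-1 coefficient of Majority-3 is $1/2$, so the printed estimate should land within $\varepsilon = 0.1$ of it.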

In [None]:
# Demonstrate Fourier coefficient estimation
from boofun.analysis.learning import FourierLearner

# Create a test function using clean API
f = bf.majority(4)
true_coeffs = f.fourier()  # Clean: f.fourier() instead of SpectralAnalyzer(f).fourier_expansion()

# Estimate coefficients with different sample sizes
sample_sizes = [50, 200, 1000]

print("Fourier Coefficient Estimation (Majority-4)")
print("=" * 70)
print(f"{'Subset':<10} {'True':>12}" + "".join(f"{'m='+str(m):>16}" for m in sample_sizes))
print("-" * 70)

# Show estimation for a few important subsets
subsets_to_show = [0, 1, 2, 4, 8, 3, 5, 15]  # Various subset masks

for s in subsets_to_show:
    subset = [i for i in range(4) if (s >> (3-i)) & 1]
    true_val = true_coeffs[s]
    
    row = f"{str(subset):<10} {true_val:>12.4f}"
    
    for m in sample_sizes:
        # Generate samples
        rng = np.random.default_rng(42)
        samples = []
        labels = []
        
        for _ in range(m):
            x = rng.integers(0, 16)
            y = 1 - 2 * int(f.evaluate(x))  # Convert to ±1
            samples.append(x)
            labels.append(y)
        
        # Estimate using empirical formula
        est = 0.0
        for x, y in zip(samples, labels):
            chi_s = 1 - 2 * (bin(x & s).count("1") % 2)
            est += y * chi_s
        est /= m
        
        error = abs(est - true_val)
        row += f" {est:>8.4f}±{error:.4f}"
    
    print(row)

---
## 3. The LMN Theorem (PAC Learning)

**Theorem (Linial-Mansour-Nisan, 1993):** 
Let $\mathcal{C}$ be a concept class such that every $f \in \mathcal{C}$ is $\varepsilon/2$-concentrated on Fourier coefficients of degree $\leq k$.
Then $\mathcal{C}$ can be learned from uniformly random examples with error at most $\varepsilon$ and confidence $1 - \delta$, in time $\text{poly}(n^k, 1/\varepsilon, \log(1/\delta))$.
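
A short sketch of why concentration suffices. Write $g$ for the estimated low-degree polynomial and $h = \text{sgn}(g)$ for the hypothesis. The symbols $\eta$ (the per-coefficient estimation error) and $N$ (the number of sets of size $\leq k$) are introduced here for illustration only.

$$
\begin{aligned}
g(x) &= \sum_{|S| \leq k} \tilde{f}(S)\,\chi_S(x), \qquad h(x) = \text{sgn}(g(x)) \\
\Pr_x[h(x) \neq f(x)] &\leq \mathbf{E}_x\left[(f(x) - g(x))^2\right] \\
&= \sum_{|S| \leq k} \left(\hat{f}(S) - \tilde{f}(S)\right)^2 + \mathbf{W}^{>k}[f] \\
&\leq N \eta^2 + \mathbf{W}^{>k}[f], \qquad N = \sum_{j=0}^{k} \binom{n}{j} = n^{O(k)}
\end{aligned}
$$

The first inequality holds because $|f(x) - g(x)| \geq 1$ whenever $\text{sgn}(g(x)) \neq f(x)$, and the equality is Parseval applied to $f - g$. Choosing $\eta$ so that $N\eta^2$ is no larger than the tail weight makes the total error at most twice the concentration parameter.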

### Learning Algorithm

1. Estimate all degree-$\leq k$ Fourier coefficients using samples
2. Construct hypothesis: $h(x) = \text{sgn}\left(\sum_{|S| \leq k} \tilde{f}(S) \chi_S(x)\right)$
3. Fourier concentration bounds the approximation error
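
The steps above can be sketched in plain Python as follows. This is an illustrative implementation, not the `LowDegreeLearner` class; the names `low_degree_learn` and `predict`, the bitmask input encoding, and the tie-breaking rule are assumptions of the sketch.

```python
import random
from itertools import combinations

def chi(mask, x):
    """Character chi_S(x) in {-1, +1} for bitmask-encoded S and x."""
    return 1 - 2 * (bin(mask & x).count("1") % 2)

def low_degree_learn(examples, n, k):
    """Step 1: estimate every coefficient of degree <= k from labelled examples."""
    masks = [sum(1 << i for i in subset)
             for size in range(k + 1)
             for subset in combinations(range(n), size)]
    m = len(examples)
    return {mask: sum(y * chi(mask, x) for x, y in examples) / m for mask in masks}

def predict(coeffs, x):
    """Step 2: sign of the estimated low-degree polynomial (ties go to +1)."""
    value = sum(c * chi(mask, x) for mask, c in coeffs.items())
    return 1 if value >= 0 else -1

def target(x):
    # Majority of 3 bits in the +/-1 convention (bit 1 maps to -1)
    return 1 if bin(x).count("1") <= 1 else -1

n, k = 3, 1
rng = random.Random(0)
examples = [(x, target(x)) for x in (rng.randrange(2 ** n) for _ in range(2000))]
coeffs = low_degree_learn(examples, n, k)
accuracy = sum(predict(coeffs, x) == target(x) for x in range(2 ** n)) / 2 ** n
print(len(coeffs), accuracy)
```

Majority-3 is a convenient target because the sign of its degree-1 part already equals the function, so a degree-1 learner can recover it exactly even though $\mathbf{W}^{>1}[f] = 1/4$.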

In [None]:
# Demonstrate the LMN learning algorithm
from boofun.analysis.learning import LowDegreeLearner

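# Target: (x₁ AND x₂) OR (x₃ AND x₄), a width-2 DNF.
# Its decision tree depth and Fourier degree are both 4, but most of its
# Fourier weight (59/64 in the ±1 convention) sits on degree ≤ 2.
# The truth table has 1s at inputs 3, 7, 11, 12, 13, 14, 15 (one of the bit pairs is 11).
tt = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1]
target = bf.create(tt)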

print("LMN Learning Example")
print("=" * 60)
print(f"Target: (x₁ AND x₂) OR (x₃ AND x₄)")
print(f"Decision tree depth: {complexity.decision_tree_depth(target)}")
print()

# Show spectral concentration using clean API: f.W_leq(k)
for k in [1, 2, 3, 4]:
    conc = target.W_leq(k)  # Clean: target.W_leq(k) instead of analyzer.spectral_concentration(k)
    print(f"W≤{k} = {conc:.4f}")

# Use the learner
learner = LowDegreeLearner(n_vars=4, max_degree=2)
samples = 200

# Generate training data
rng = np.random.default_rng(42)
X = rng.integers(0, 16, size=samples)
y = np.array([1 - 2 * int(target.evaluate(x)) for x in X])

# Train
learner.fit(X, y)

# Test accuracy
correct = 0
for x in range(16):
    true_label = 1 - 2 * int(target.evaluate(x))
    pred_label = learner.predict(x)
    if pred_label == true_label:
        correct += 1

print(f"\nLearning accuracy: {correct}/16 = {correct/16:.2%}")

---
## 4. The Goldreich-Levin Algorithm

**Problem:** Find all "heavy" Fourier coefficients without enumerating all $2^n$ subsets.

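**Theorem (Goldreich-Levin, 1989):** Given query access to $f: \{-1,1\}^n \to \{-1,1\}$ and a threshold $0 < \tau \leq 1$, there is a $\text{poly}(n, 1/\tau)$-time algorithm that, with high probability, outputs a list containing every $S$ with $|\hat{f}(S)| \geq \tau$ and only sets with $|\hat{f}(S)| \geq \tau/2$.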

### Key Idea: Self-Correction via Restrictions

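Fix a set of coordinates $J \subseteq [n]$ and a subset $S \subseteq J$. For an assignment $z$ to the remaining coordinates $\bar{J}$, let $f_{J|z}$ denote the restriction of $f$ obtained by fixing $\bar{J}$ to $z$. Then for uniformly random $z$:
$$\mathbf{E}_{z}\left[\widehat{f_{J|z}}(S)^2\right] = \sum_{T \subseteq \bar{J}} \hat{f}(S \cup T)^2$$

The right-hand side is the total Fourier weight of the "bucket" of sets that agree with $S$ on $J$, and the left-hand side can be estimated with queries. The algorithm grows $J$ one coordinate at a time, splitting each bucket in two and discarding buckets whose estimated weight falls below roughly $\tau^2/2$. By Parseval, only $O(1/\tau^2)$ buckets can survive at any stage, so heavy coefficients are found recursively without enumerating all $2^n$ subsets.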
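
The recursion can be sketched in a few lines. The code below is a simplified illustration, not the `GoldreichLevin` class: it fixes coordinates in the order $0, 1, \dots, n-1$, and estimates the weight of all sets that agree with a given prefix by averaging $f(y,z)\chi_S(y)\,f(y',z)\chi_S(y')$ over random $z$ on the unfixed coordinates and independent $y, y'$ on the fixed ones. The sample count `m`, the pruning threshold $\tau^2/2$, and all function names are assumptions of the sketch.

```python
import random

def chi(mask, x):
    """Character chi_S(x) in {-1, +1} for bitmask-encoded S and x."""
    return 1 - 2 * (bin(mask & x).count("1") % 2)

def bucket_weight(f, n, j, mask, m, rng):
    """Estimate the Fourier weight of all S whose first j coordinates match mask."""
    total = 0
    for _ in range(m):
        z = rng.randrange(2 ** (n - j)) << j   # random setting of coordinates j..n-1
        y = rng.randrange(2 ** j)              # two independent settings
        y2 = rng.randrange(2 ** j)             # of coordinates 0..j-1
        total += f(y | z) * chi(mask, y) * f(y2 | z) * chi(mask, y2)
    return total / m

def goldreich_levin(f, n, tau, m=5000, seed=0):
    """Return masks S whose bucket survives pruning at every level."""
    rng = random.Random(seed)
    buckets = [0]  # start with the single bucket that fixes no coordinates
    for j in range(1, n + 1):
        survivors = []
        for mask in buckets:
            # Split on coordinate j-1: either it is outside S or inside S
            for child in (mask, mask | (1 << (j - 1))):
                if bucket_weight(f, n, j, child, m, rng) >= tau ** 2 / 2:
                    survivors.append(child)
        buckets = survivors
    return sorted(buckets)

def maj3(x):
    # Majority of 3 bits in the +/-1 convention (bit 1 maps to -1)
    return 1 if bin(x).count("1") <= 1 else -1

heavy = goldreich_levin(maj3, 3, tau=0.4)
print([format(s, "03b") for s in heavy])
```

Majority-3 has exactly four non-zero coefficients (three of degree 1 and one of degree 3, each of magnitude $1/2$), so with $\tau = 0.4$ the search should keep those four masks and prune every other branch.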

In [None]:
# Demonstrate Goldreich-Levin algorithm
from boofun.analysis.learning import GoldreichLevin

# Create a sparse function using clean API
xor = bf.parity(4)

print("Goldreich-Levin Algorithm Demo")
print("=" * 60)
print("Target: XOR (Parity) on 4 bits")
print(f"Sparsity: {xor.sparsity()} (only one non-zero Fourier coefficient)")
print()

# True Fourier coefficients using clean API
true_coeffs = xor.fourier()

print("True heavy coefficients (|f̂(S)| > 0.1):")
for s in range(len(true_coeffs)):
    if abs(true_coeffs[s]) > 0.1:
        subset = [i for i in range(4) if (s >> (3-i)) & 1]
        print(f"  S={subset}: f̂(S) = {true_coeffs[s]:.4f}")

# Use Goldreich-Levin to find heavy coefficients
gl = GoldreichLevin(xor.n_vars, threshold=0.3)

# Define oracle function
def oracle(x):
    return 1 - 2 * int(xor.evaluate(x))

heavy = gl.find_heavy_coefficients(oracle)

print(f"\nGoldreich-Levin found {len(heavy)} heavy coefficient(s):")
for s, coeff in heavy:
    subset = [i for i in range(4) if (s >> (3-i)) & 1]
    print(f"  S={subset}: estimated f̂(S) = {coeff:.4f}")

---
## Summary

### Key Concepts

1. **Spectral Concentration**: Functions with bounded decision tree depth have Fourier weight concentrated on low degrees

2. **LMN Theorem**: If a function class has spectral concentration at degree $k$, it's PAC-learnable in time $n^{O(k)}$

3. **Fourier Coefficient Estimation**: $O(\log(1/\delta)/\varepsilon^2)$ samples suffice to estimate any $\hat{f}(S)$ within $\varepsilon$

4. **Goldreich-Levin**: Find heavy Fourier coefficients efficiently without enumeration

### Corollaries (from lecture notes)

- **Depth-d decision trees**: Learnable in time $n^{O(d)}$
- **Size-s decision trees**: Learnable to error $\varepsilon$ in time $n^{O(\log(s/\varepsilon))}$
- **Linear Threshold Functions**: Learnable in time $n^{O(1/\varepsilon^2)}$
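
The $n^{O(k)}$ running times above track the number of coefficients the low-degree algorithm has to estimate, namely $\sum_{j \leq k} \binom{n}{j}$. A quick illustrative calculation (the helper name is hypothetical):

```python
from math import comb

def num_low_degree_coefficients(n, k):
    """Number of subsets S of [n] with |S| <= k."""
    return sum(comb(n, j) for j in range(k + 1))

counts = {k: num_low_degree_coefficients(20, k) for k in (1, 2, 3)}
print(counts)  # prints: {1: 21, 2: 211, 3: 1351}
```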

### Open Questions

- Can depth-$d$ decision trees be learned in $\text{poly}(n, 2^d)$ time from random examples alone?
- Can $k$-juntas be learned in $\text{poly}(n)$ time for $k = \log n$?
- Can polynomial-size DNF/CNF formulas be learned in polynomial time from random examples alone?