In [None]:
# ============================================================================
# C03: Statistical Significance for Connectivity
# ============================================================================
#
# This notebook covers how to determine whether connectivity values are
# statistically significant. We'll learn to generate surrogate data,
# build null distributions, and apply proper multiple comparisons correction.
#
# Duration: ~70 minutes
# Prerequisites: C02 (Connectivity Matrices), basic statistics
#
# ============================================================================

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.signal import butter, filtfilt, hilbert
from scipy.fft import fft, ifft
from typing import Any, Dict, List, Optional, Tuple
from numpy.typing import NDArray

# Import from local src
import sys
sys.path.insert(0, '../../..')

from src.colors import COLORS

# Define color shortcuts for this notebook
PRIMARY_BLUE = COLORS['signal_1']      # Sky Blue
PRIMARY_RED = COLORS['negative']        # Coral Red
PRIMARY_GREEN = COLORS['signal_3']      # Sage Green
SECONDARY_PURPLE = COLORS['signal_5']   # Lavender
SECONDARY_ORANGE = COLORS['signal_4']   # Golden
SUBJECT_1 = COLORS['signal_1']          # Sky Blue
SUBJECT_2 = COLORS['signal_2']          # Rose Pink

# Plotting defaults
plt.rcParams['figure.facecolor'] = 'white'
plt.rcParams['axes.facecolor'] = 'white'
plt.rcParams['axes.grid'] = True
plt.rcParams['grid.alpha'] = 0.3

# Constants
fs = 256  # Sampling frequency (Hz)

print("Imports successful!")
print(f"NumPy version: {np.__version__}")
print(f"Sampling frequency: {fs} Hz")

## Section 1: Introduction - Why Significance Matters

Connectivity metrics **always** give you a number. You compute PLV between two channels and get 0.35. But what does that mean? Is it "high"? "Low"? "Significant"?

The answer depends on what you would expect **by chance**. Even two completely unrelated signals will show some non-zero connectivity due to random fluctuations. The critical question is: *Is our observed value unlikely to occur by chance alone?*
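To make this concrete, here is a minimal sketch that computes PLV for two independent white-noise signals. The helper name `plv`, the seed, and the signal length are illustrative assumptions rather than part of this notebook's later code; the only point is that the value is small but not zero.

```python
import numpy as np
from scipy.signal import hilbert


def plv(x, y):
    """Phase locking value from Hilbert-transform instantaneous phases."""
    phase_diff = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return float(np.abs(np.mean(np.exp(1j * phase_diff))))


rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
x = rng.standard_normal(1280)    # 5 s at 256 Hz
y = rng.standard_normal(1280)    # generated independently of x

chance_plv = plv(x, y)
print(f"PLV of two unrelated noise signals: {chance_plv:.3f}")
```

The exact number depends on the seed and signal length, which is why a reference distribution is needed before interpreting any single value.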

### Why This Matters

Without proper statistical testing:
- **False positives**: You claim connectivity that isn't really there
- **False negatives**: You miss true connectivity
- **Non-reproducible results**: Your findings won't replicate

Scientific claims require statistical validation. This notebook teaches you how to do it **correctly**.

> **Key message**: *"A connectivity value without a p-value is just a number."*

---

## Section 2: The Null Hypothesis for Connectivity

In hypothesis testing, we define two competing hypotheses:

- **Null hypothesis (H₀)**: There is NO true connectivity between the signals. Any measured connectivity is due to chance.
- **Alternative hypothesis (H₁)**: True connectivity exists between the signals.

### What Does "No Connectivity" Look Like?

Under H₀, signals may have:
- Similar spectral properties (same frequency content)
- Similar amplitude characteristics
- **But NO consistent phase or amplitude relationship**

### The Testing Procedure

1. Determine the **distribution of connectivity under H₀** (null distribution)
2. Ask: Is our observed value unlikely under this distribution?
3. If unlikely (p < α) → Reject H₀ → Claim significant connectivity

The key challenge is: **How do we build the null distribution?**

In [None]:
# ============================================================================
# VISUALIZATION 1: Conceptual Null Distribution
# ============================================================================

# Create a conceptual null distribution
np.random.seed(42)
null_distribution = np.random.beta(2, 5, 1000) * 0.5 + 0.1  # Skewed towards low values
observed_value = 0.42

# Compute p-value
pvalue = np.mean(null_distribution >= observed_value)

fig, ax = plt.subplots(figsize=(10, 6))

# Plot histogram
n, bins, patches = ax.hist(null_distribution, bins=40, density=True, 
                            color=PRIMARY_BLUE, alpha=0.7, edgecolor='white')

# Color the tail
for i, (patch, left_edge) in enumerate(zip(patches, bins[:-1])):
    if left_edge >= observed_value:
        patch.set_facecolor(PRIMARY_RED)
        patch.set_alpha(0.8)

# Add observed value line
ax.axvline(observed_value, color=PRIMARY_RED, linewidth=3, linestyle='--',
           label=f'Observed = {observed_value}')

# Annotations
ax.annotate(f'p-value = {pvalue:.3f}\n(shaded area)', 
            xy=(observed_value + 0.02, 1.5),
            fontsize=12, fontweight='bold', color=PRIMARY_RED)

ax.set_xlabel('Connectivity Value (PLV)', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title('Is Our Observation in the Tail of the Null Distribution?', 
             fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
ax.set_xlim(0, 0.7)

plt.tight_layout()
plt.show()

print(f"Observed value: {observed_value}")
print(f"P-value: {pvalue:.3f}")
if pvalue < 0.05:
    print("→ Result is SIGNIFICANT at α = 0.05")
else:
    print("→ Result is NOT significant at α = 0.05")

---

## Section 3: Surrogate Data Methods

To build a null distribution, we need data that satisfies H₀: signals with **no true connectivity**. We create this using **surrogate data**.

### What is Surrogate Data?

Surrogate data is artificial data that:
- **Preserves** certain properties of the original (e.g., power spectrum)
- **Destroys** the property we're testing (e.g., phase relationship)

### Common Surrogate Methods

| Method | Preserves | Destroys | Best For |
|--------|-----------|----------|----------|
| **Phase shuffling** | Power spectrum | Phase relationships | PLV, coherence |
| **Time shifting** | Amplitude, approximate spectrum | Temporal alignment | Quick checks |
| **Trial shuffling** | Individual trials | Trial pairing | Across-trial analyses |
| **AAFT** | Amplitude distribution + spectrum | Phase relationships | Strict tests |

### The Procedure

1. Generate surrogate data (many times)
2. Compute connectivity for each surrogate
3. Build histogram of surrogate connectivity values
4. This histogram IS the null distribution!

Let's implement the most common methods.
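As a minimal sketch of the trial-shuffling row in the table above, the function below re-pairs trials by permuting the trial axis of one signal. The function name `trial_shuffle`, the `(n_trials, n_samples)` array layout, and the random test data are assumptions made for illustration.

```python
import numpy as np


def trial_shuffle(trials_b, rng):
    """Return trials_b with its trials (rows) in a random order.

    Each trial is kept intact; only the pairing with the other signal's
    trials is broken.
    """
    perm = rng.permutation(trials_b.shape[0])
    return trials_b[perm]


rng = np.random.default_rng(0)
trials_a = rng.standard_normal((20, 512))   # hypothetical: 20 trials, 512 samples
trials_b = rng.standard_normal((20, 512))

shuffled_b = trial_shuffle(trials_b, rng)
# Surrogate pairs are now (trials_a[i], shuffled_b[i]) for each trial i
print(shuffled_b.shape)
```

A plain permutation can occasionally leave a trial paired with its original partner; with many trials this has little effect, but it is a design choice worth knowing about.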

---

## Section 4: Phase Shuffling - The Gold Standard

Phase shuffling is the **most common** method for testing phase-based connectivity (PLV, coherence). The idea is elegant:

### How It Works

1. Transform the signal to the frequency domain (FFT)
2. **Randomly shuffle the phases** while keeping magnitudes intact
3. Transform back to time domain (IFFT)

### What This Preserves and Destroys

✅ **Preserves**: Power spectrum (all magnitudes unchanged)  
❌ **Destroys**: Any phase relationship between signals

### Why This Works

If there's true phase synchronization:
- The original phases have a consistent relationship
- Shuffling makes them random → connectivity drops

If there's NO true synchronization:
- Phases were already random
- Shuffling makes no difference → similar connectivity values

Let's implement it step by step.
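The three steps above can be sketched compactly with NumPy's real-input FFT, which handles the conjugate symmetry of a real signal automatically. This is an illustrative variant under stated assumptions: the name `phase_randomize_rfft` is hypothetical, and keeping the DC and Nyquist phases unchanged is a choice made here so that the signal mean keeps its sign.

```python
import numpy as np


def phase_randomize_rfft(x, rng):
    """Phase-randomized surrogate of a real 1D signal (magnitudes preserved)."""
    n = len(x)
    spectrum = np.fft.rfft(x)                      # step 1: to frequency domain
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spectrum))
    phases[0] = np.angle(spectrum[0])              # keep DC bin as is
    if n % 2 == 0:
        phases[-1] = np.angle(spectrum[-1])        # keep Nyquist bin as is
    randomized = np.abs(spectrum) * np.exp(1j * phases)  # step 2: new phases
    return np.fft.irfft(randomized, n=n)           # step 3: back to time domain


rng = np.random.default_rng(0)
x = rng.standard_normal(512)
y = phase_randomize_rfft(x, rng)

# Magnitude spectra should agree up to floating-point error
print(np.allclose(np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(y))))
```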

In [None]:
# ============================================================================
# VISUALIZATION 2: Understanding Phase Shuffling Step by Step
# ============================================================================

def phase_shuffle(signal: NDArray[np.floating]) -> NDArray[np.floating]:
    """
    Create a phase-shuffled surrogate of a signal.
    
    Preserves the power spectrum while randomizing phase relationships.
    
    Parameters
    ----------
    signal : NDArray[np.floating]
        Input signal (1D array).
    
    Returns
    -------
    NDArray[np.floating]
        Phase-shuffled surrogate signal.
    """
    n = len(signal)
    
    # FFT
    spectrum = fft(signal)
    
    # Get magnitude and phase
    magnitude = np.abs(spectrum)
    
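    # Generate random phases for the non-negative frequency bins
    random_phases = np.random.uniform(0, 2 * np.pi, n // 2 + 1)
    
    # Build a conjugate-symmetric phase array so the inverse FFT is real.
    # The DC and Nyquist bins of a real signal are real-valued, so keep their
    # original phase (0 or pi) instead of forcing it to 0, which could flip
    # the sign of the signal mean.
    dc_phase = np.angle(spectrum[0])
    if n % 2 == 0:  # Even length
        new_phases = np.concatenate([
            [dc_phase],  # DC component (phase kept)
            random_phases[1:-1],
            [np.angle(spectrum[n // 2])],  # Nyquist (phase kept)
            -random_phases[-2:0:-1]  # Negative frequencies
        ])
    else:  # Odd length
        new_phases = np.concatenate([
            [dc_phase],  # DC component (phase kept)
            random_phases[1:],
            -random_phases[-1:0:-1]  # Negative frequencies
        ])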
    
    # Reconstruct spectrum with new phases
    surrogate_spectrum = magnitude * np.exp(1j * new_phases)
    
    # Inverse FFT
    surrogate = np.real(ifft(surrogate_spectrum))
    
    return surrogate


# Create example signal
np.random.seed(42)
t = np.arange(0, 2, 1/fs)
original = np.sin(2 * np.pi * 10 * t) + 0.3 * np.sin(2 * np.pi * 25 * t)
original += 0.2 * np.random.randn(len(t))

# Create surrogate
surrogate = phase_shuffle(original)

# Compute spectra
freq = np.fft.fftfreq(len(original), 1/fs)
spectrum_original = np.abs(fft(original))
spectrum_surrogate = np.abs(fft(surrogate))

# Plot
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Time domain - original
axes[0, 0].plot(t[:256], original[:256], color=PRIMARY_BLUE, linewidth=1.5)
axes[0, 0].set_title('Original Signal', fontsize=12, fontweight='bold')
axes[0, 0].set_xlabel('Time (s)')
axes[0, 0].set_ylabel('Amplitude')
axes[0, 0].set_xlim(0, 1)

# Time domain - surrogate
axes[0, 1].plot(t[:256], surrogate[:256], color=SECONDARY_ORANGE, linewidth=1.5)
axes[0, 1].set_title('Phase-Shuffled Surrogate', fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Time (s)')
axes[0, 1].set_ylabel('Amplitude')
axes[0, 1].set_xlim(0, 1)

# Frequency domain - original
pos_freq = freq[:len(freq)//2]
axes[1, 0].plot(pos_freq, spectrum_original[:len(freq)//2], 
                color=PRIMARY_BLUE, linewidth=1.5)
axes[1, 0].set_title('Power Spectrum (Original)', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Frequency (Hz)')
axes[1, 0].set_ylabel('Magnitude')
axes[1, 0].set_xlim(0, 50)

# Frequency domain - surrogate
axes[1, 1].plot(pos_freq, spectrum_surrogate[:len(freq)//2], 
                color=SECONDARY_ORANGE, linewidth=1.5)
axes[1, 1].set_title('Power Spectrum (Surrogate)', fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('Frequency (Hz)')
axes[1, 1].set_ylabel('Magnitude')
axes[1, 1].set_xlim(0, 50)

plt.suptitle('Phase Shuffling: Time Domain Changes, Spectrum Preserved', 
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

print("✓ Original and surrogate look different in time domain")
print("✓ But their power spectra are IDENTICAL!")
print(f"  Correlation of spectra: {np.corrcoef(spectrum_original, spectrum_surrogate)[0,1]:.6f}")

---

## Section 5: Time Shifting - A Faster Alternative

Time shifting is simpler and faster than phase shuffling. It's useful for quick exploratory analyses.

### How It Works

1. Shift one signal by a **random time lag**
2. This breaks the temporal alignment between signals

### What This Preserves and Destroys

✅ **Preserves**: Exact waveform, amplitude distribution  
❌ **Destroys**: Temporal alignment (and thus phase relationships)

### Pros and Cons

| Aspect | Assessment |
|--------|------------|
| Speed | ⚡ Very fast |
| Simplicity | 👍 Easy to implement |
| Spectrum preservation | ⚠️ Approximate (edge effects) |
| Statistical rigor | ⚠️ Less rigorous than phase shuffling |

### When to Use

- Quick exploratory analyses
- Very long signals where edge effects are negligible
- When computational speed matters
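One caveat is worth checking before relying on time shifting. For a perfectly stationary sinusoid, a circular shift only changes the constant phase offset between the two signals, so PLV stays high. The sketch below illustrates this with noise-free 10 Hz sinusoids; the helper `plv`, the seed, and the shift range are assumptions for illustration. With real, non-stationary data the shift is usually far more effective.

```python
import numpy as np
from scipy.signal import hilbert


def plv(x, y):
    """Phase locking value from Hilbert-transform instantaneous phases."""
    phase_diff = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return float(np.abs(np.mean(np.exp(1j * phase_diff))))


fs = 256
t = np.arange(0, 2, 1 / fs)                    # exactly 20 cycles of 10 Hz
x = np.sin(2 * np.pi * 10 * t)
y = np.sin(2 * np.pi * 10 * t + np.pi / 4)     # constant phase lag

rng = np.random.default_rng(1)
shift = int(rng.integers(len(t) // 10, 9 * len(t) // 10))
y_shifted = np.roll(y, shift)                  # circular time shift

plv_before = plv(x, y)
plv_after = plv(x, y_shifted)
print(f"PLV before shift: {plv_before:.3f}, after shift: {plv_after:.3f}")
```

Both values stay close to 1 here, because the shifted sinusoid is still a 10 Hz sinusoid with a different, but still constant, phase lag.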

In [None]:
# ============================================================================
# VISUALIZATION 3: Time Shifting Method
# ============================================================================

def time_shift(signal: NDArray[np.floating], 
               min_shift: Optional[int] = None,
               max_shift: Optional[int] = None) -> NDArray[np.floating]:
    """
    Create a time-shifted surrogate of a signal.
    
    Parameters
    ----------
    signal : NDArray[np.floating]
        Input signal (1D array).
    min_shift : int, optional
        Minimum shift (samples). Default: 10% of signal length.
    max_shift : int, optional
        Maximum shift (samples). Default: 90% of signal length.
    
    Returns
    -------
    NDArray[np.floating]
        Time-shifted surrogate signal (circular shift).
    """
    n = len(signal)
    
    if min_shift is None:
        min_shift = n // 10
    if max_shift is None:
        max_shift = 9 * n // 10
    
    # Random shift
    shift = np.random.randint(min_shift, max_shift)
    
    # Circular shift (wraps around)
    surrogate = np.roll(signal, shift)
    
    return surrogate


# Create example with two related signals
np.random.seed(42)
t = np.arange(0, 2, 1/fs)

# Signal 1
signal1 = np.sin(2 * np.pi * 10 * t) + 0.3 * np.random.randn(len(t))

# Signal 2: related to signal 1 (phase-locked)
signal2 = np.sin(2 * np.pi * 10 * t + np.pi/4) + 0.3 * np.random.randn(len(t))

# Create time-shifted version
signal2_shifted = time_shift(signal2)

# Plot
fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=True)

# Signal 1
axes[0].plot(t, signal1, color=SUBJECT_1, linewidth=1.2, label='Signal 1')
axes[0].set_ylabel('Amplitude', fontsize=11)
axes[0].set_title('Signal 1 (Reference)', fontsize=12, fontweight='bold')
axes[0].legend(loc='upper right')
axes[0].set_xlim(0, 1)

# Signal 2 original
axes[1].plot(t, signal2, color=SUBJECT_2, linewidth=1.2, label='Signal 2 (original)')
axes[1].set_ylabel('Amplitude', fontsize=11)
axes[1].set_title('Signal 2: Original (Phase-Locked to Signal 1)', fontsize=12, fontweight='bold')
axes[1].legend(loc='upper right')

# Signal 2 shifted
axes[2].plot(t, signal2_shifted, color=SECONDARY_ORANGE, linewidth=1.2, 
             label='Signal 2 (time-shifted)')
axes[2].set_ylabel('Amplitude', fontsize=11)
axes[2].set_xlabel('Time (s)', fontsize=11)
axes[2].set_title('Signal 2: Time-Shifted (Temporal Alignment Broken)', fontsize=12, fontweight='bold')
axes[2].legend(loc='upper right')

plt.suptitle('Time Shifting: Breaks Temporal Alignment', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

print("Time shifting is a simple way to break phase relationships.")
print("The shifted signal has the exact same samples, just in a different order.")

---

## Section 6: Building the Null Distribution

Now we combine surrogate methods with a connectivity metric to build a proper null distribution.

### The Process

1. **Compute observed connectivity** between original signals
2. **Generate N surrogates** (typically 1000-10000)
3. **Compute connectivity** for each surrogate pair
4. **The distribution of surrogate values = null distribution**

### Why N = 1000?

The number of surrogates determines the **precision** of your p-value:

| N surrogates | Minimum p-value | Precision |
|--------------|-----------------|-----------|
| 100 | 0.01 | ±0.01 |
| 1000 | 0.001 | ±0.001 |
| 10000 | 0.0001 | ±0.0001 |

For a typical significance threshold of α = 0.05, N = 1000 is usually sufficient.

Let's implement this with PLV (Phase Locking Value).
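The table above describes the resolution of the empirical p-value, that is, the smallest step between attainable values. A complementary view, sketched here with hypothetical helper names, is the Monte Carlo standard error of the estimate, which follows from the binomial distribution of the exceedance count.

```python
import numpy as np


def pvalue_resolution(n_surrogates):
    """Smallest step between attainable empirical p-values (1 / N)."""
    return 1.0 / n_surrogates


def pvalue_standard_error(p_true, n_surrogates):
    """Binomial standard error of a Monte Carlo p-value estimate."""
    return float(np.sqrt(p_true * (1.0 - p_true) / n_surrogates))


for n in (100, 1000, 10000):
    print(f"N = {n:>5}: resolution = {pvalue_resolution(n):.4f}, "
          f"SE near p = 0.05: {pvalue_standard_error(0.05, n):.4f}")
```

Near the α = 0.05 threshold the standard error shrinks only with the square root of N, so borderline results benefit most from additional surrogates.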

In [None]:
# ============================================================================
# VISUALIZATION 4: Building a Null Distribution with PLV
# ============================================================================

def compute_plv(signal1: NDArray[np.floating], 
                signal2: NDArray[np.floating]) -> float:
    """
    Compute Phase Locking Value between two signals.
    
    Parameters
    ----------
    signal1 : NDArray[np.floating]
        First signal.
    signal2 : NDArray[np.floating]
        Second signal.
    
    Returns
    -------
    float
        PLV value between 0 and 1.
    """
    # Get instantaneous phases
    phase1 = np.angle(hilbert(signal1))
    phase2 = np.angle(hilbert(signal2))
    
    # Compute phase difference
    phase_diff = phase1 - phase2
    
    # PLV is the mean resultant length
    plv = np.abs(np.mean(np.exp(1j * phase_diff)))
    
    return plv


def build_null_distribution(signal1: NDArray[np.floating],
                            signal2: NDArray[np.floating],
                            n_surrogates: int = 1000,
                            method: str = 'phase_shuffle') -> NDArray[np.floating]:
    """
    Build null distribution of PLV using surrogate data.
    
    Parameters
    ----------
    signal1 : NDArray[np.floating]
        First signal.
    signal2 : NDArray[np.floating]
        Second signal.
    n_surrogates : int
        Number of surrogates to generate.
    method : str
        'phase_shuffle' or 'time_shift'.
    
    Returns
    -------
    NDArray[np.floating]
        Array of PLV values under the null hypothesis.
    """
    null_values = np.zeros(n_surrogates)
    
    for i in range(n_surrogates):
        # Create surrogate of signal2
        if method == 'phase_shuffle':
            surrogate = phase_shuffle(signal2)
        else:
            surrogate = time_shift(signal2)
        
        # Compute PLV
        null_values[i] = compute_plv(signal1, surrogate)
    
    return null_values


# Create bandpass filter for alpha band (8-12 Hz)
def bandpass_filter(signal: NDArray[np.floating], 
                    low: float, high: float, 
                    fs: int) -> NDArray[np.floating]:
    """Apply bandpass filter to signal."""
    nyq = fs / 2
    b, a = butter(4, [low/nyq, high/nyq], btype='band')
    return filtfilt(b, a, signal)


# Create test signals: weakly phase-locked
np.random.seed(42)
t = np.arange(0, 5, 1/fs)  # 5 seconds

# Base oscillation
alpha = np.sin(2 * np.pi * 10 * t)

# Signal 1: alpha + noise
signal1 = alpha + 0.5 * np.random.randn(len(t))
signal1 = bandpass_filter(signal1, 8, 12, fs)

# Signal 2: phase-shifted alpha + noise (weak coupling)
phase_jitter = 0.3 * np.random.randn(len(t))  # Add some phase jitter
signal2 = np.sin(2 * np.pi * 10 * t + np.pi/3 + np.cumsum(phase_jitter) * 0.01)
signal2 = signal2 + 0.5 * np.random.randn(len(t))
signal2 = bandpass_filter(signal2, 8, 12, fs)

# Compute observed PLV
observed_plv = compute_plv(signal1, signal2)

# Build null distribution (use fewer for demo speed)
print("Building null distribution (500 surrogates)...")
null_dist = build_null_distribution(signal1, signal2, n_surrogates=500, method='phase_shuffle')

# Compute p-value
p_value = np.mean(null_dist >= observed_plv)

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: signals
axes[0].plot(t[:512], signal1[:512], color=SUBJECT_1, linewidth=1.2, label='Signal 1', alpha=0.8)
axes[0].plot(t[:512], signal2[:512], color=SUBJECT_2, linewidth=1.2, label='Signal 2', alpha=0.8)
axes[0].set_xlabel('Time (s)', fontsize=11)
axes[0].set_ylabel('Amplitude', fontsize=11)
axes[0].set_title('Filtered Signals (Alpha Band: 8-12 Hz)', fontsize=12, fontweight='bold')
axes[0].legend(loc='upper right')
axes[0].set_xlim(0, 2)

# Right: null distribution
n, bins, patches = axes[1].hist(null_dist, bins=40, density=True, 
                                 color=PRIMARY_BLUE, alpha=0.7, edgecolor='white')

# Color the tail
for patch, left_edge in zip(patches, bins[:-1]):
    if left_edge >= observed_plv:
        patch.set_facecolor(PRIMARY_RED)
        patch.set_alpha(0.8)

axes[1].axvline(observed_plv, color=PRIMARY_RED, linewidth=3, linestyle='--',
                label=f'Observed PLV = {observed_plv:.3f}')
axes[1].set_xlabel('PLV (Phase Locking Value)', fontsize=11)
axes[1].set_ylabel('Density', fontsize=11)
axes[1].set_title('Null Distribution from Phase-Shuffled Surrogates', fontsize=12, fontweight='bold')
axes[1].legend(loc='upper right')

# Add p-value annotation
significance = "SIGNIFICANT" if p_value < 0.05 else "NOT significant"
axes[1].annotate(f'p = {p_value:.3f}\n({significance} at α=0.05)', 
                 xy=(observed_plv, axes[1].get_ylim()[1] * 0.8),
                 fontsize=12, fontweight='bold', 
                 color=PRIMARY_GREEN if p_value < 0.05 else PRIMARY_RED)

plt.tight_layout()
plt.show()

print(f"\nObserved PLV: {observed_plv:.4f}")
print(f"Null distribution: mean = {np.mean(null_dist):.4f}, std = {np.std(null_dist):.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Result: {significance} at α = 0.05")

---

## Section 7: Computing P-Values

The **p-value** is the probability of observing a value at least as extreme as our observed value, assuming the null hypothesis is true.

### Formula

For connectivity (where higher = more evidence of connectivity):

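$$p = \frac{\text{Number of surrogates} \geq \text{observed}}{N_{\text{surrogates}}}$$

A common refinement, used by `compute_pvalue` in the next cell, adds 1 to both the numerator and the denominator. This treats the observed value as one member of the null set, so the estimated p-value can never be exactly zero.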

### Interpretation

| P-value | Interpretation |
|---------|---------------|
| p < 0.001 | Strong evidence against H₀ |
| p < 0.01 | Moderate evidence against H₀ |
| p < 0.05 | Weak evidence against H₀ |
| p ≥ 0.05 | Insufficient evidence to reject H₀ |

### Important Notes

⚠️ **P-value is NOT the probability that H₀ is true!**

It's the probability of getting data at least this extreme IF H₀ were true.

⚠️ **Threshold α is chosen BEFORE looking at data!**

Common choices: 0.05, 0.01, 0.001
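As a small worked example of the formula above, consider a toy null distribution of four values. The numbers are made up for illustration; the second estimate shows the widely used variant that adds 1 to both the count and the denominator.

```python
import numpy as np

null_values = np.array([0.10, 0.20, 0.30, 0.40])   # toy null distribution
observed = 0.35

# Only 0.40 is at least as large as the observed value
count = int(np.sum(null_values >= observed))

p_plain = count / len(null_values)                  # 1 / 4
p_corrected = (count + 1) / (len(null_values) + 1)  # (1 + 1) / (4 + 1)

print(f"count = {count}, plain p = {p_plain:.2f}, corrected p = {p_corrected:.2f}")
```

With so few surrogates the two estimates differ noticeably; with N = 1000 the difference is negligible except when the count is zero.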

In [None]:
# ============================================================================
# VISUALIZATION 5: P-Value Computation
# ============================================================================

def compute_pvalue(observed: float, 
                   null_distribution: NDArray[np.floating],
                   alternative: str = 'greater') -> float:
    """
    Compute p-value from null distribution.
    
    Parameters
    ----------
    observed : float
        Observed connectivity value.
    null_distribution : NDArray[np.floating]
        Null distribution values.
    alternative : str
        'greater': test if observed > null (typical for connectivity)
        'less': test if observed < null
        'two-sided': test if observed differs from null
    
    Returns
    -------
    float
        P-value.
    """
    n = len(null_distribution)
    
    if alternative == 'greater':
        p = (np.sum(null_distribution >= observed) + 1) / (n + 1)
    elif alternative == 'less':
        p = (np.sum(null_distribution <= observed) + 1) / (n + 1)
    else:  # two-sided
        mean_null = np.mean(null_distribution)
        deviation = np.abs(observed - mean_null)
        p = (np.sum(np.abs(null_distribution - mean_null) >= deviation) + 1) / (n + 1)
    
    return p


# Demonstrate with different observed values
np.random.seed(42)
demo_null = np.random.beta(2, 8, 1000) * 0.5  # Null distribution

observed_values = [0.15, 0.25, 0.35, 0.45]
colors = [PRIMARY_BLUE, SECONDARY_ORANGE, SECONDARY_PURPLE, PRIMARY_RED]

fig, ax = plt.subplots(figsize=(12, 6))

# Plot null distribution
ax.hist(demo_null, bins=50, density=True, color='lightgray', 
        edgecolor='white', alpha=0.7, label='Null distribution')

# Add observed values
for obs, color in zip(observed_values, colors):
    pval = compute_pvalue(obs, demo_null)
    ax.axvline(obs, color=color, linewidth=2.5, linestyle='--',
               label=f'Obs = {obs:.2f}, p = {pval:.3f}')

ax.set_xlabel('Connectivity Value', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title('Different Observed Values → Different P-Values', fontsize=14, fontweight='bold')
ax.legend(loc='upper right', fontsize=10)

# Add significance threshold
ax.axhline(y=0, color='black', linewidth=0.5)

plt.tight_layout()
plt.show()

print("Observations further in the tail → smaller p-values")
print("The further from the null, the more 'surprising' the result")

---

## Section 8: The Multiple Comparisons Problem

In connectivity analysis, we often test **many pairs** simultaneously. With a 64-channel EEG:

$$\text{Number of pairs} = \frac{64 \times 63}{2} = 2016 \text{ pairs}$$

### The Problem

If we test each pair at α = 0.05:
- Expected false positives = 2016 × 0.05 ≈ **101 false connections!**

This is called the **multiple comparisons problem** or **family-wise error rate (FWER)** inflation.

### Visual Intuition

Imagine flipping a biased coin that lands heads only 5% of the time, and flipping it 2016 times. You'd expect about 101 heads by chance alone. Similarly, testing 2016 pairs at α = 0.05 will give ~101 "significant" results even when there's NO true connectivity!

### Solutions

We need to **correct** our significance threshold. Two main approaches:

1. **Bonferroni correction**: Control the family-wise error rate (FWER)
2. **FDR correction**: Control the false discovery rate (FDR)
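The arithmetic behind these numbers can be sketched in a few lines. The family-wise error rate shown here assumes the tests are independent, which is a simplification: connectivity values from pairs that share a channel are typically correlated.

```python
n_channels = 64
alpha = 0.05

# Number of unique unordered channel pairs
n_pairs = n_channels * (n_channels - 1) // 2

# Expected number of false positives if H0 is true for every pair
expected_false_positives = n_pairs * alpha

# Probability of at least one false positive, ASSUMING independent tests
fwer_if_independent = 1.0 - (1.0 - alpha) ** n_pairs

print(f"Pairs: {n_pairs}")
print(f"Expected false positives: {expected_false_positives:.1f}")
print(f"FWER (independent tests): {fwer_if_independent:.6f}")
```

Under that assumption, at least one false positive is practically guaranteed without correction.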

In [None]:
# ============================================================================
# VISUALIZATION 6: The Multiple Comparisons Problem
# ============================================================================

np.random.seed(42)

# Simulate testing 100 pairs with NO true connectivity
n_tests = 100
alpha = 0.05

# Generate p-values under null (uniform distribution)
# When H0 is true, p-values are uniformly distributed between 0 and 1
pvalues_null = np.random.uniform(0, 1, n_tests)

# Count false positives
false_positives = np.sum(pvalues_null < alpha)

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: p-value distribution
axes[0].hist(pvalues_null, bins=20, color=PRIMARY_BLUE, edgecolor='white', alpha=0.7)
axes[0].axvline(alpha, color=PRIMARY_RED, linewidth=2, linestyle='--', 
                label=f'α = {alpha}')
axes[0].fill_between([0, alpha], 0, axes[0].get_ylim()[1] + 5, 
                      color=PRIMARY_RED, alpha=0.2, label=f'Rejected ({false_positives})')
axes[0].set_xlabel('P-value', fontsize=12)
axes[0].set_ylabel('Count', fontsize=12)
axes[0].set_title('P-Values When H₀ is TRUE for ALL Tests', fontsize=12, fontweight='bold')
axes[0].legend()
axes[0].set_ylim(0, 15)

# Right: expected vs observed false positives
n_simulations = 1000
false_positive_counts = []

for _ in range(n_simulations):
    pvals = np.random.uniform(0, 1, n_tests)
    fp = np.sum(pvals < alpha)
    false_positive_counts.append(fp)

axes[1].hist(false_positive_counts, bins=range(0, 20), color=PRIMARY_RED, 
             edgecolor='white', alpha=0.7, density=True)
axes[1].axvline(n_tests * alpha, color='black', linewidth=2, linestyle='-',
                label=f'Expected: {n_tests * alpha:.0f}')
axes[1].set_xlabel('Number of False Positives', fontsize=12)
axes[1].set_ylabel('Probability', fontsize=12)
axes[1].set_title(f'False Positives Distribution ({n_tests} tests, α={alpha})', 
                  fontsize=12, fontweight='bold')
axes[1].legend()

plt.suptitle('The Multiple Comparisons Problem: False Positives Accumulate!', 
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

print(f"With {n_tests} tests at α = {alpha}:")
print(f"  Expected false positives: {n_tests * alpha:.0f}")
print(f"  Actual false positives (this simulation): {false_positives}")
print(f"\nWith 2016 EEG pairs: expected ~{round(2016 * 0.05)} false connections!")

---

## Section 9: Bonferroni Correction

The **Bonferroni correction** is the simplest and most conservative approach.

### The Idea

Divide the significance threshold by the number of tests:

$$\alpha_{\text{corrected}} = \frac{\alpha}{N_{\text{tests}}}$$

### Example

With 100 tests and α = 0.05:

$$\alpha_{\text{corrected}} = \frac{0.05}{100} = 0.0005$$

### Properties

| Aspect | Assessment |
|--------|------------|
| Controls | Family-Wise Error Rate (FWER) |
| Conservative? | ✓ Very conservative |
| Power | ↓ Reduced (misses true effects) |
| Best for | Few tests, need strict control |

### When to Use

- You have **few** tests (< 20)
- False positives are **very costly**
- You need strict control over the chance of even one false positive across the whole family of tests

### When NOT to Use

- Many tests (> 100): too conservative
- Exploratory analysis: misses true effects
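An equivalent way to apply the correction, sketched below with made-up p-values, is to multiply each p-value by the number of tests (capped at 1) and compare against the original α. Both views give the same decisions.

```python
import numpy as np

pvals = np.array([0.0004, 0.003, 0.02, 0.04, 0.30])   # illustrative p-values
alpha = 0.05
n_tests = len(pvals)

# View 1: shrink the threshold
alpha_corrected = alpha / n_tests            # 0.05 / 5 = 0.01
significant = pvals < alpha_corrected

# View 2: inflate the p-values and keep the original threshold
adjusted = np.minimum(pvals * n_tests, 1.0)

print("Corrected alpha:", alpha_corrected)
print("Significant:    ", significant)
print("Adjusted p:     ", adjusted)
```

Reporting adjusted p-values is often convenient because readers can compare them directly against any α they prefer.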

In [None]:
# ============================================================================
# VISUALIZATION 7: Bonferroni Correction
# ============================================================================

def bonferroni_correction(pvalues: NDArray[np.floating], 
                          alpha: float = 0.05) -> Tuple[NDArray[np.bool_], float]:
    """
    Apply Bonferroni correction to p-values.
    
    Parameters
    ----------
    pvalues : NDArray[np.floating]
        Array of p-values.
    alpha : float
        Desired family-wise error rate.
    
    Returns
    -------
    Tuple[NDArray[np.bool_], float]
        Boolean mask of significant tests, and corrected alpha.
    """
    n_tests = len(pvalues)
    alpha_corrected = alpha / n_tests
    significant = pvalues < alpha_corrected
    
    return significant, alpha_corrected


# Simulate scenario with some true effects
np.random.seed(42)
n_tests = 50
n_true_effects = 5  # 5 pairs with real connectivity

# Generate p-values
pvalues = np.random.uniform(0, 1, n_tests)
# Make some small (true effects)
true_effect_indices = np.random.choice(n_tests, n_true_effects, replace=False)
pvalues[true_effect_indices] = np.random.uniform(0.001, 0.03, n_true_effects)

# Apply corrections
alpha = 0.05
alpha_bonf = alpha / n_tests

# Uncorrected
sig_uncorrected = pvalues < alpha
# Bonferroni
sig_bonf, _ = bonferroni_correction(pvalues, alpha)

# Sort for visualization
sort_idx = np.argsort(pvalues)
pvalues_sorted = pvalues[sort_idx]

# Track which are true effects
is_true_effect = np.isin(np.arange(n_tests), true_effect_indices)
is_true_effect_sorted = is_true_effect[sort_idx]

# Plot
fig, ax = plt.subplots(figsize=(14, 6))

x = np.arange(n_tests)

# Plot all p-values
colors_bars = [PRIMARY_GREEN if te else PRIMARY_BLUE for te in is_true_effect_sorted]
bars = ax.bar(x, pvalues_sorted, color=colors_bars, edgecolor='white', alpha=0.7)

# Threshold lines
ax.axhline(alpha, color=SECONDARY_ORANGE, linewidth=2, linestyle='--',
           label=f'Uncorrected α = {alpha}')
ax.axhline(alpha_bonf, color=PRIMARY_RED, linewidth=2, linestyle='-',
           label=f'Bonferroni α = {alpha_bonf:.4f}')

ax.set_xlabel('Test (sorted by p-value)', fontsize=12)
ax.set_ylabel('P-value', fontsize=12)
ax.set_title('Bonferroni Correction: Very Conservative Threshold', fontsize=14, fontweight='bold')
ax.legend(loc='upper left', fontsize=10)
ax.set_ylim(0, 0.1)
ax.set_xlim(-1, n_tests)

# Add legend for bar colors
from matplotlib.patches import Patch
legend_elements = [Patch(facecolor=PRIMARY_GREEN, alpha=0.7, label='True effect'),
                   Patch(facecolor=PRIMARY_BLUE, alpha=0.7, label='Null (no effect)')]
ax.legend(handles=legend_elements + ax.get_legend_handles_labels()[0], 
          loc='upper right', fontsize=10)

plt.tight_layout()
plt.show()

# Summary
print(f"Total tests: {n_tests}")
print(f"True effects: {n_true_effects}")
print(f"\nUncorrected (α = {alpha}):")
print(f"  Significant: {np.sum(sig_uncorrected)}")
print(f"  True positives: {np.sum(sig_uncorrected & is_true_effect)}")
print(f"  False positives: {np.sum(sig_uncorrected & ~is_true_effect)}")
print(f"\nBonferroni (α = {alpha_bonf:.5f}):")
print(f"  Significant: {np.sum(sig_bonf)}")
print(f"  True positives: {np.sum(sig_bonf & is_true_effect)}")
print(f"  False positives: {np.sum(sig_bonf & ~is_true_effect)}")

---

## Section 10: False Discovery Rate (FDR) Correction

**FDR correction** (Benjamini-Hochberg procedure) is less conservative and more commonly used in neuroimaging.

### What Does FDR Control?

Instead of controlling the probability of **any** false positive (FWER), FDR controls the **proportion** of false positives among all discoveries.

$$\text{FDR} = \mathbb{E}\left[\frac{\text{False Positives}}{\text{Total Discoveries}}\right]$$

### The Benjamini-Hochberg Procedure

1. **Sort** p-values from smallest to largest: $p_{(1)} \leq p_{(2)} \leq ... \leq p_{(N)}$
2. **Find** the largest $k$ such that: $p_{(k)} \leq \frac{k}{N} \cdot \alpha$
3. **Reject** all hypotheses with $p_{(i)} \leq p_{(k)}$

### Comparison with Bonferroni

| Aspect | Bonferroni | FDR (BH) |
|--------|------------|----------|
| Controls | FWER | FDR |
| Stringency | Very conservative | Moderate |
| Power | Low | Higher |
| Interpretation | "≤5% chance of any false positive" | "On average, ≤5% of discoveries are false" |
| Best for | Confirmatory | Exploratory |

In [None]:
# ============================================================================
# VISUALIZATION 8: FDR Correction (Benjamini-Hochberg)
# ============================================================================

def fdr_correction(pvalues: NDArray[np.floating], 
                   alpha: float = 0.05) -> Tuple[NDArray[np.bool_], NDArray[np.floating]]:
    """
    Apply FDR correction using Benjamini-Hochberg procedure.
    
    Parameters
    ----------
    pvalues : NDArray[np.floating]
        Array of p-values.
    alpha : float
        Desired false discovery rate.
    
    Returns
    -------
    Tuple[NDArray[np.bool_], NDArray[np.floating]]
        Boolean mask of significant tests, and adjusted p-values.
    """
    n = len(pvalues)
    
    # Sort p-values and keep track of original order
    sorted_idx = np.argsort(pvalues)
    sorted_pvals = pvalues[sorted_idx]
    
    # Compute BH threshold for each rank
    ranks = np.arange(1, n + 1)
    bh_threshold = ranks / n * alpha
    
    # Find the largest p-value below its threshold
    below_threshold = sorted_pvals <= bh_threshold
    if np.any(below_threshold):
        max_below = np.max(np.where(below_threshold)[0])
        reject_sorted = np.arange(n) <= max_below
    else:
        reject_sorted = np.zeros(n, dtype=bool)
    
    # Map back to original order
    reject = np.zeros(n, dtype=bool)
    reject[sorted_idx] = reject_sorted
    
    # Compute adjusted p-values
    adjusted = np.zeros(n)
    adjusted[sorted_idx] = np.minimum.accumulate(
        (sorted_pvals * n / ranks)[::-1]
    )[::-1]
    adjusted = np.minimum(adjusted, 1.0)
    
    return reject, adjusted


# Use same data as before
np.random.seed(42)
n_tests = 50
n_true_effects = 5

pvalues = np.random.uniform(0, 1, n_tests)
true_effect_indices = np.random.choice(n_tests, n_true_effects, replace=False)
pvalues[true_effect_indices] = np.random.uniform(0.001, 0.03, n_true_effects)

alpha = 0.05

# Apply corrections
sig_uncorr = pvalues < alpha
sig_bonf, alpha_bonf = bonferroni_correction(pvalues, alpha)
sig_fdr, adj_pvals = fdr_correction(pvalues, alpha)

# Track true effects
is_true_effect = np.isin(np.arange(n_tests), true_effect_indices)

# Sort for visualization
sort_idx = np.argsort(pvalues)
pvalues_sorted = pvalues[sort_idx]
is_true_effect_sorted = is_true_effect[sort_idx]

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: BH procedure visualization
x = np.arange(n_tests)
bh_line = (x + 1) / n_tests * alpha

colors_bars = [PRIMARY_GREEN if te else PRIMARY_BLUE for te in is_true_effect_sorted]
axes[0].bar(x, pvalues_sorted, color=colors_bars, edgecolor='white', alpha=0.7)
axes[0].plot(x, bh_line, color=PRIMARY_RED, linewidth=2, label='BH threshold line')
axes[0].axhline(alpha_bonf, color=SECONDARY_ORANGE, linewidth=2, linestyle='--',
                label=f'Bonferroni (α = {alpha_bonf:.4f})')

axes[0].set_xlabel('Rank', fontsize=12)
axes[0].set_ylabel('P-value', fontsize=12)
axes[0].set_title('Benjamini-Hochberg: Adaptive Threshold', fontsize=12, fontweight='bold')
axes[0].legend(loc='upper left', fontsize=10)
axes[0].set_ylim(0, 0.15)

# Right: Comparison of methods
methods = ['Uncorrected', 'Bonferroni', 'FDR (BH)']
true_positives = [np.sum(sig_uncorr & is_true_effect),
                  np.sum(sig_bonf & is_true_effect),
                  np.sum(sig_fdr & is_true_effect)]
false_positives = [np.sum(sig_uncorr & ~is_true_effect),
                   np.sum(sig_bonf & ~is_true_effect),
                   np.sum(sig_fdr & ~is_true_effect)]

x_bar = np.arange(len(methods))
width = 0.35

bars1 = axes[1].bar(x_bar - width/2, true_positives, width, 
                    label='True Positives', color=PRIMARY_GREEN, alpha=0.8)
bars2 = axes[1].bar(x_bar + width/2, false_positives, width, 
                    label='False Positives', color=PRIMARY_RED, alpha=0.8)

axes[1].axhline(n_true_effects, color='black', linestyle='--', alpha=0.5,
                label=f'Max possible TP = {n_true_effects}')

axes[1].set_xlabel('Correction Method', fontsize=12)
axes[1].set_ylabel('Count', fontsize=12)
axes[1].set_title('Comparison: True vs False Positives', fontsize=12, fontweight='bold')
axes[1].set_xticks(x_bar)
axes[1].set_xticklabels(methods)
axes[1].legend(loc='upper right', fontsize=10)
axes[1].set_ylim(0, max(true_positives + false_positives) + 1)

# Add value labels on bars
for bar in bars1:
    height = bar.get_height()
    axes[1].annotate(f'{int(height)}', xy=(bar.get_x() + bar.get_width()/2, height),
                     ha='center', va='bottom', fontsize=10, fontweight='bold')
for bar in bars2:
    height = bar.get_height()
    axes[1].annotate(f'{int(height)}', xy=(bar.get_x() + bar.get_width()/2, height),
                     ha='center', va='bottom', fontsize=10, fontweight='bold')

plt.suptitle('FDR vs Bonferroni: Better Balance of Power vs Control', 
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

print(f"True effects in data: {n_true_effects}")
print(f"\nMethod comparison:")
print(f"  Uncorrected: {np.sum(sig_uncorr)} significant ({np.sum(sig_uncorr & is_true_effect)} TP, {np.sum(sig_uncorr & ~is_true_effect)} FP)")
print(f"  Bonferroni:  {np.sum(sig_bonf)} significant ({np.sum(sig_bonf & is_true_effect)} TP, {np.sum(sig_bonf & ~is_true_effect)} FP)")
print(f"  FDR (BH):    {np.sum(sig_fdr)} significant ({np.sum(sig_fdr & is_true_effect)} TP, {np.sum(sig_fdr & ~is_true_effect)} FP)")

---

## Section 11: Permutation Testing for Group Comparisons

So far we have tested whether connectivity is **greater than expected by chance**. But often we want to compare **two groups**:

- Is connectivity higher in patients than controls?
- Is there a difference between conditions?

### Permutation Testing

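Permutation testing is a **non-parametric** approach: it makes no assumptions about the shape of the data distribution. It only requires that observations be exchangeable between groups under the null hypothesis.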

### The Procedure

1. Compute the **observed difference** between groups (e.g., mean connectivity)
2. **Pool** all observations together
3. **Randomly permute** group labels (shuffle who belongs to which group)
4. Compute the difference for this permuted data
5. Repeat steps 3-4 many times (e.g., 1000)
6. The distribution of permuted differences = **null distribution**
7. Compare observed difference to null distribution → p-value

### Why This Works

Under H₀, group membership doesn't matter. So shuffling labels should give similar results to the true labels. If the observed difference is extreme, it's unlikely to be due to chance.
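
With very small groups, the label-shuffling logic can be made exact by enumerating every possible relabeling instead of sampling random ones. The sketch below does this for two hypothetical groups of four values each; the numbers and the `_demo` names are assumptions for illustration only.

```python
from itertools import combinations
import numpy as np

# Hypothetical PLV values for two tiny groups (illustration only)
g_a = np.array([0.42, 0.51, 0.47, 0.55])
g_b = np.array([0.31, 0.36, 0.44, 0.29])

pooled_demo = np.concatenate([g_a, g_b])
n_a = len(g_a)
observed_demo = g_a.mean() - g_b.mean()

# Enumerate every way of assigning n_a of the pooled values to "group A"
null_demo = []
for idx_a in combinations(range(len(pooled_demo)), n_a):
    mask = np.zeros(len(pooled_demo), dtype=bool)
    mask[list(idx_a)] = True
    null_demo.append(pooled_demo[mask].mean() - pooled_demo[~mask].mean())
null_demo = np.array(null_demo)

# Two-sided exact p-value; the small tolerance guards against float round-off
p_exact = np.mean(np.abs(null_demo) >= np.abs(observed_demo) - 1e-12)

print(f"Relabelings enumerated: {len(null_demo)}")
print(f"Observed difference: {observed_demo:.3f}")
print(f"Exact two-sided p-value: {p_exact:.4f}")
```

Eight observations split 4/4 give only 70 relabelings, so enumeration is cheap here. With realistic group sizes the count explodes, which is why random permutations are used in practice.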

In [None]:
# ============================================================================
# VISUALIZATION 9: Permutation Testing for Group Comparison
# ============================================================================

def permutation_test(group1: NDArray[np.floating],
                     group2: NDArray[np.floating],
                     n_permutations: int = 1000,
                     statistic: str = 'mean_diff') -> Tuple[float, float, NDArray[np.floating]]:
    """
    Perform a permutation test comparing two groups.
    
    Parameters
    ----------
    group1 : NDArray[np.floating]
        Connectivity values for group 1.
    group2 : NDArray[np.floating]
        Connectivity values for group 2.
    n_permutations : int
        Number of permutations.
    statistic : str
        'mean_diff' or 't_stat'.
    
    Returns
    -------
    Tuple[float, float, NDArray[np.floating]]
        Observed statistic, p-value, and null distribution.
    """
    n1, n2 = len(group1), len(group2)
    pooled = np.concatenate([group1, group2])
    
    # Compute observed statistic
    if statistic == 'mean_diff':
        observed = np.mean(group1) - np.mean(group2)
    else:
        observed = stats.ttest_ind(group1, group2)[0]
    
    # Generate null distribution
    null_stats = np.zeros(n_permutations)
    
    for i in range(n_permutations):
        # Shuffle and split
        np.random.shuffle(pooled)
        perm_g1 = pooled[:n1]
        perm_g2 = pooled[n1:]
        
        if statistic == 'mean_diff':
            null_stats[i] = np.mean(perm_g1) - np.mean(perm_g2)
        else:
            null_stats[i] = stats.ttest_ind(perm_g1, perm_g2)[0]
    
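    # Two-sided p-value; the +1 terms count the observed statistic as one
    # permutation, so the Monte Carlo p-value can never be exactly 0
    p_value = (np.sum(np.abs(null_stats) >= np.abs(observed)) + 1) / (n_permutations + 1)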
    
    return observed, p_value, null_stats


# Simulate connectivity data from two groups
np.random.seed(42)

# Group 1 (e.g., controls): lower connectivity
group1_plv = np.random.beta(2, 5, 20) * 0.5 + 0.15
# Group 2 (e.g., patients): higher connectivity
group2_plv = np.random.beta(3, 4, 20) * 0.5 + 0.25

# Permutation test
observed_diff, pvalue, null_dist = permutation_test(group2_plv, group1_plv, n_permutations=1000)

# Also do parametric t-test for comparison
t_stat, t_pvalue = stats.ttest_ind(group2_plv, group1_plv)

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Group comparison
positions = [1, 2]
bp = axes[0].boxplot([group1_plv, group2_plv], positions=positions, widths=0.5,
                      patch_artist=True)
bp['boxes'][0].set_facecolor(SUBJECT_1)
bp['boxes'][1].set_facecolor(SUBJECT_2)
for box in bp['boxes']:
    box.set_alpha(0.7)

axes[0].scatter(np.ones(len(group1_plv)) + np.random.randn(len(group1_plv)) * 0.05, 
                group1_plv, color=SUBJECT_1, alpha=0.6, s=50)
axes[0].scatter(np.ones(len(group2_plv)) * 2 + np.random.randn(len(group2_plv)) * 0.05, 
                group2_plv, color=SUBJECT_2, alpha=0.6, s=50)

axes[0].set_xticks([1, 2])
axes[0].set_xticklabels(['Group 1\n(Controls)', 'Group 2\n(Patients)'])
axes[0].set_ylabel('PLV', fontsize=12)
axes[0].set_title('Connectivity by Group', fontsize=12, fontweight='bold')

# Add means
axes[0].scatter([1, 2], [np.mean(group1_plv), np.mean(group2_plv)], 
                color='black', s=100, marker='D', zorder=5, label='Mean')
axes[0].legend()

# Right: Null distribution
axes[1].hist(null_dist, bins=40, density=True, color=PRIMARY_BLUE, 
             edgecolor='white', alpha=0.7)
axes[1].axvline(observed_diff, color=PRIMARY_RED, linewidth=3, linestyle='--',
                label=f'Observed diff = {observed_diff:.3f}')
axes[1].axvline(-observed_diff, color=PRIMARY_RED, linewidth=3, linestyle='--', alpha=0.5)

axes[1].set_xlabel('Difference (Group 2 - Group 1)', fontsize=12)
axes[1].set_ylabel('Density', fontsize=12)
axes[1].set_title('Permutation Null Distribution', fontsize=12, fontweight='bold')
axes[1].legend(loc='upper right')

plt.suptitle(f'Permutation Test: p = {pvalue:.3f} (t-test p = {t_pvalue:.3f})', 
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

print(f"Group 1 (Controls): mean = {np.mean(group1_plv):.3f}")
print(f"Group 2 (Patients): mean = {np.mean(group2_plv):.3f}")
print(f"Observed difference: {observed_diff:.3f}")
print(f"\nPermutation test p-value: {pvalue:.3f}")
print(f"Parametric t-test p-value: {t_pvalue:.3f}")
print(f"\n→ {'Significant' if pvalue < 0.05 else 'Not significant'} difference at α = 0.05")

---

## Section 12: Bootstrap Confidence Intervals

P-values tell us whether an effect is distinguishable from chance. **Confidence intervals** tell us **how big** it plausibly is.

### What is Bootstrapping?

Bootstrapping is a resampling technique:

1. **Resample with replacement** from your data (same size as original)
2. Compute the statistic of interest
3. Repeat many times (e.g., 1000)
4. The distribution of statistics gives you uncertainty estimates

### Confidence Interval

With the percentile method, a 95% confidence interval is the range from the 2.5th to the 97.5th percentile of the bootstrap distribution of the statistic.

**Interpretation**: If we repeated the experiment many times, 95% of the intervals would contain the true value.
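
The "repeated experiments" interpretation can be checked by simulation. The sketch below assumes a known true mean (something we never have with real data), repeats a small experiment many times, and counts how often the percentile bootstrap interval contains that mean. All parameter values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator, separate from np.random.seed
true_mean = 0.4                  # assumed "true" PLV for this simulation
n_obs, n_experiments, n_boot = 30, 200, 500

covered = 0
for _ in range(n_experiments):
    sample = rng.normal(true_mean, 0.1, n_obs)
    # Draw all bootstrap resamples at once: index array of shape (n_boot, n_obs)
    idx = rng.integers(0, n_obs, size=(n_boot, n_obs))
    boot_means = sample[idx].mean(axis=1)
    low, high = np.percentile(boot_means, [2.5, 97.5])
    covered += int(low <= true_mean <= high)

coverage = covered / n_experiments
print(f"Intervals containing the true mean: {coverage:.1%}")
```

The printed coverage should land near 95%. With small samples, percentile intervals can come out somewhat narrow, so a value slightly below 95% would not be surprising.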

### Why Bootstrapping for Connectivity?

- No parametric assumptions about the distribution
- Works for complex statistics
- Provides intuitive uncertainty quantification

In [None]:
# ============================================================================
# VISUALIZATION 10: Bootstrap Confidence Intervals
# ============================================================================

def bootstrap_ci(data: NDArray[np.floating],
                 statistic: callable = np.mean,
                 n_bootstrap: int = 1000,
                 ci: float = 0.95) -> Tuple[float, float, float, NDArray[np.floating]]:
    """
    Compute bootstrap confidence interval for a statistic.
    
    Parameters
    ----------
    data : NDArray[np.floating]
        Input data array.
    statistic : callable
        Function to compute the statistic (e.g., np.mean).
    n_bootstrap : int
        Number of bootstrap resamples.
    ci : float
        Confidence level (e.g., 0.95 for 95%).
    
    Returns
    -------
    Tuple[float, float, float, NDArray[np.floating]]
        Point estimate, CI lower, CI upper, and bootstrap distribution.
    """
    n = len(data)
    point_estimate = statistic(data)
    
    # Bootstrap resampling
    bootstrap_stats = np.zeros(n_bootstrap)
    for i in range(n_bootstrap):
        resample = np.random.choice(data, size=n, replace=True)
        bootstrap_stats[i] = statistic(resample)
    
    # Percentile method for CI
    alpha = (1 - ci) / 2
    ci_lower = np.percentile(bootstrap_stats, alpha * 100)
    ci_upper = np.percentile(bootstrap_stats, (1 - alpha) * 100)
    
    return point_estimate, ci_lower, ci_upper, bootstrap_stats


# Simulate PLV measurements from multiple trials
np.random.seed(42)
n_trials = 30

# Subject 1: moderate connectivity
plv_trials_s1 = np.random.beta(4, 3, n_trials) * 0.6 + 0.2

# Subject 2: lower connectivity
plv_trials_s2 = np.random.beta(2, 4, n_trials) * 0.5 + 0.1

# Bootstrap for each subject
mean1, ci1_low, ci1_up, boot1 = bootstrap_ci(plv_trials_s1, n_bootstrap=1000)
mean2, ci2_low, ci2_up, boot2 = bootstrap_ci(plv_trials_s2, n_bootstrap=1000)

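# Bootstrap for the difference (this treats trial k of each subject as a pair;
# for unpaired trials, resample each subject independently instead)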
diff_trials = plv_trials_s1 - plv_trials_s2
mean_diff, ci_diff_low, ci_diff_up, boot_diff = bootstrap_ci(diff_trials, n_bootstrap=1000)

# Plot
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Left: Raw data with CIs
x = [1, 2]
means = [mean1, mean2]
ci_lows = [ci1_low, ci2_low]
ci_ups = [ci1_up, ci2_up]

for i, (m, low, up, color) in enumerate(zip(means, ci_lows, ci_ups, [SUBJECT_1, SUBJECT_2])):
    axes[0].bar(x[i], m, color=color, alpha=0.7, width=0.5)
    axes[0].errorbar(x[i], m, yerr=[[m - low], [up - m]], 
                     color='black', capsize=5, capthick=2, linewidth=2)

axes[0].set_xticks([1, 2])
axes[0].set_xticklabels(['Subject 1', 'Subject 2'])
axes[0].set_ylabel('Mean PLV', fontsize=12)
axes[0].set_title('Mean Connectivity with 95% CI', fontsize=12, fontweight='bold')
axes[0].set_ylim(0, 0.8)

# Middle: Bootstrap distributions
axes[1].hist(boot1, bins=40, density=True, color=SUBJECT_1, alpha=0.6, label='Subject 1')
axes[1].hist(boot2, bins=40, density=True, color=SUBJECT_2, alpha=0.6, label='Subject 2')
axes[1].axvline(mean1, color=SUBJECT_1, linewidth=2, linestyle='--')
axes[1].axvline(mean2, color=SUBJECT_2, linewidth=2, linestyle='--')
axes[1].set_xlabel('Mean PLV', fontsize=12)
axes[1].set_ylabel('Density', fontsize=12)
axes[1].set_title('Bootstrap Distributions', fontsize=12, fontweight='bold')
axes[1].legend()

# Right: Difference distribution
axes[2].hist(boot_diff, bins=40, density=True, color=SECONDARY_PURPLE, 
             alpha=0.7, edgecolor='white')
axes[2].axvline(mean_diff, color=PRIMARY_RED, linewidth=3, linestyle='--',
                label=f'Mean diff = {mean_diff:.3f}')
axes[2].axvline(0, color='black', linewidth=2, linestyle='-', alpha=0.5)
axes[2].axvspan(ci_diff_low, ci_diff_up, color=SECONDARY_PURPLE, alpha=0.2,
                label=f'95% CI: [{ci_diff_low:.3f}, {ci_diff_up:.3f}]')
axes[2].set_xlabel('Difference (S1 - S2)', fontsize=12)
axes[2].set_ylabel('Density', fontsize=12)
axes[2].set_title('Difference: Does CI Include 0?', fontsize=12, fontweight='bold')
axes[2].legend(loc='upper left', fontsize=9)

plt.tight_layout()
plt.show()

print(f"Subject 1: mean = {mean1:.3f}, 95% CI = [{ci1_low:.3f}, {ci1_up:.3f}]")
print(f"Subject 2: mean = {mean2:.3f}, 95% CI = [{ci2_low:.3f}, {ci2_up:.3f}]")
print(f"\nDifference (S1 - S2):")
print(f"  Mean = {mean_diff:.3f}")
print(f"  95% CI = [{ci_diff_low:.3f}, {ci_diff_up:.3f}]")
if ci_diff_low > 0 or ci_diff_up < 0:
    print("  → CI does NOT include 0 → Significant difference!")
else:
    print("  → CI includes 0 → NOT significant")

---

## Section 13: Effect Size - Beyond P-Values

P-values can be misleading! With large samples, even **tiny** effects become "significant." Effect size tells us **how meaningful** the effect is.

### Common Effect Size Measures

| Measure | Formula | Interpretation |
|---------|---------|----------------|
| Cohen's d | $d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}$ | 0.2 small, 0.5 medium, 0.8 large |
| Pearson's r | Correlation coefficient | Strength of relationship |
| Hedges' g | Corrected Cohen's d | Better for small samples |

### Why Effect Size Matters

- **Statistical significance ≠ Practical significance**
- A "significant" PLV difference of 0.01 may be meaningless
- Always report both p-values AND effect sizes
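
The "Corrected Cohen's d" entry in the table can be made concrete. Below is a minimal sketch of Hedges' g using the common approximate correction factor $1 - \frac{3}{4(n_1 + n_2) - 9}$. The helper name `hedges_g` is hypothetical and not part of this notebook, and the simulated data are arbitrary.

```python
import numpy as np

def hedges_g(x1, x2):
    """Cohen's d with the approximate small-sample correction (Hedges' g)."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    # Pooled variance, as in Cohen's d
    pooled_var = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    d = (x1.mean() - x2.mean()) / np.sqrt(pooled_var)
    # Approximate correction factor; shrinks d slightly toward zero
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction

rng = np.random.default_rng(1)
a = rng.normal(0.5, 0.1, 10)
b = rng.normal(0.4, 0.1, 10)
print(f"Hedges' g (n=10 per group): {hedges_g(a, b):.2f}")
```

The correction matters mostly for small groups: with 10 observations per group it shrinks d by roughly 4%, and it becomes negligible as the sample grows.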

In [None]:
# ============================================================================
# VISUALIZATION 11: Effect Size Demonstration
# ============================================================================

def cohens_d(group1: NDArray[np.floating], 
             group2: NDArray[np.floating]) -> float:
    """
    Compute Cohen's d effect size.
    
    Parameters
    ----------
    group1 : NDArray[np.floating]
        First group values.
    group2 : NDArray[np.floating]
        Second group values.
    
    Returns
    -------
    float
        Cohen's d effect size.
    """
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    
    # Pooled standard deviation
    pooled_std = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    
    return (np.mean(group1) - np.mean(group2)) / pooled_std


# Demonstrate: sample size drives the p-value, while Cohen's d measures the effect itself
np.random.seed(42)

# Scenario 1: Small sample, large effect
g1_small = np.random.normal(0.5, 0.1, 15)
g2_small = np.random.normal(0.3, 0.1, 15)

# Scenario 2: Large sample, small effect
g1_large = np.random.normal(0.35, 0.1, 200)
g2_large = np.random.normal(0.32, 0.1, 200)

# Statistics
_, p_small = stats.ttest_ind(g1_small, g2_small)
_, p_large = stats.ttest_ind(g1_large, g2_large)

d_small = cohens_d(g1_small, g2_small)
d_large = cohens_d(g1_large, g2_large)

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Small sample, large effect
bp1 = axes[0].boxplot([g1_small, g2_small], patch_artist=True)
bp1['boxes'][0].set_facecolor(SUBJECT_1)
bp1['boxes'][1].set_facecolor(SUBJECT_2)
for box in bp1['boxes']:
    box.set_alpha(0.7)
axes[0].set_xticklabels(['Group 1', 'Group 2'])
axes[0].set_ylabel('Connectivity', fontsize=12)
axes[0].set_title(f'Small Sample (n=15 per group)\np = {p_small:.3f}, Cohen\'s d = {d_small:.2f}', 
                  fontsize=12, fontweight='bold')
axes[0].set_ylim(0, 0.8)

# Right: Large sample, small effect
bp2 = axes[1].boxplot([g1_large, g2_large], patch_artist=True)
bp2['boxes'][0].set_facecolor(SUBJECT_1)
bp2['boxes'][1].set_facecolor(SUBJECT_2)
for box in bp2['boxes']:
    box.set_alpha(0.7)
axes[1].set_xticklabels(['Group 1', 'Group 2'])
axes[1].set_ylabel('Connectivity', fontsize=12)
axes[1].set_title(f'Large Sample (n=200 per group)\np = {p_large:.3f}, Cohen\'s d = {d_large:.2f}', 
                  fontsize=12, fontweight='bold')
axes[1].set_ylim(0, 0.8)

plt.suptitle('P-Values Alone Hide the Size of the Effect', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

print("Scenario 1 (small sample):")
print(f"  p = {p_small:.3f}, Cohen's d = {d_small:.2f} → LARGE effect")
print(f"\nScenario 2 (large sample):")
print(f"  p = {p_large:.3f}, Cohen's d = {d_large:.2f} → SMALL effect")
print("\n⚠️ A p-value alone does not reveal practical importance; compare the effect sizes!")

---

## Section 14: Best Practices Checklist

### Before Analysis

- [ ] Define your hypothesis BEFORE looking at data
- [ ] Pre-register your analysis plan if possible
- [ ] Choose α level (typically 0.05) BEFORE analysis
- [ ] Decide on correction method (FDR vs Bonferroni) BEFORE analysis

### During Analysis

- [ ] Use appropriate surrogate method for your connectivity metric
- [ ] Generate enough surrogates (≥1000 for α=0.05)
- [ ] Apply multiple comparisons correction
- [ ] Compute effect sizes alongside p-values
- [ ] Report confidence intervals

### Reporting Results

- [ ] Report exact p-values (not just "p < 0.05")
- [ ] Report correction method used
- [ ] Report effect sizes (Cohen's d, etc.)
- [ ] Report confidence intervals
- [ ] Be transparent about number of tests performed

### Common Pitfalls to Avoid

❌ **P-hacking**: Running many analyses until you find p < 0.05  
❌ **HARKing**: Hypothesizing After Results are Known  
❌ **Ignoring multiple comparisons**  
❌ **Confusing statistical and practical significance**  
❌ **Reporting only "significant" results**

---

## Section 15: Complete Significance Testing Pipeline

Let's put everything together in a complete pipeline for testing connectivity significance across multiple channel pairs.

In [None]:
# ============================================================================
# VISUALIZATION 12: Complete Significance Testing Pipeline
# ============================================================================

def significance_pipeline(signals: List[NDArray[np.floating]],
                          n_surrogates: int = 500,
                          alpha: float = 0.05,
                          correction: str = 'fdr') -> Dict[str, Any]:
    """
    Complete pipeline for connectivity significance testing.
    
    Parameters
    ----------
    signals : List[NDArray[np.floating]]
        List of signals (channels).
    n_surrogates : int
        Number of surrogates for null distribution.
    alpha : float
        Significance level.
    correction : str
        'bonferroni', 'fdr', or 'none'.
    
    Returns
    -------
    Dict[str, Any]
        Results dictionary with PLV values, p-values, and significance masks.
    """
    n_channels = len(signals)
    n_pairs = n_channels * (n_channels - 1) // 2
    
    # Initialize arrays
    plv_matrix = np.zeros((n_channels, n_channels))
    pvalue_matrix = np.ones((n_channels, n_channels))
    
    # Compute PLV and p-values for all pairs
    pair_idx = 0
    pvalues_flat = []
    pairs_list = []
    
    for i in range(n_channels):
        for j in range(i + 1, n_channels):
            # Observed PLV
            plv = compute_plv(signals[i], signals[j])
            plv_matrix[i, j] = plv
            plv_matrix[j, i] = plv
            
            # Null distribution
            null_values = build_null_distribution(signals[i], signals[j], 
                                                  n_surrogates=n_surrogates,
                                                  method='phase_shuffle')
            
            # P-value
            pval = compute_pvalue(plv, null_values)
            pvalue_matrix[i, j] = pval
            pvalue_matrix[j, i] = pval
            
            pvalues_flat.append(pval)
            pairs_list.append((i, j))
            pair_idx += 1
    
    pvalues_flat = np.array(pvalues_flat)
    
    # Apply correction
    if correction == 'bonferroni':
        significant_flat, alpha_corr = bonferroni_correction(pvalues_flat, alpha)
    elif correction == 'fdr':
        significant_flat, _ = fdr_correction(pvalues_flat, alpha)
        alpha_corr = alpha  # Not directly applicable for FDR
    else:
        significant_flat = pvalues_flat < alpha
        alpha_corr = alpha
    
    # Build significance matrix
    sig_matrix = np.zeros((n_channels, n_channels), dtype=bool)
    for k, (i, j) in enumerate(pairs_list):
        sig_matrix[i, j] = significant_flat[k]
        sig_matrix[j, i] = significant_flat[k]
    
    return {
        'plv_matrix': plv_matrix,
        'pvalue_matrix': pvalue_matrix,
        'significant_matrix': sig_matrix,
        'n_significant': np.sum(significant_flat),
        'correction': correction
    }


# Create simulated multi-channel data
np.random.seed(42)
n_channels = 6
n_samples = 5 * fs  # 5 seconds
channel_names = ['Fz', 'Cz', 'Pz', 'F3', 'F4', 'Oz']

# Generate signals with some true connectivity
signals = []
base_alpha = np.sin(2 * np.pi * 10 * np.arange(n_samples) / fs)

for i in range(n_channels):
    noise = 0.5 * np.random.randn(n_samples)
    if i < 3:  # First 3 channels share a 10 Hz rhythm with a fixed phase offset
        phase_shift = np.random.uniform(0, np.pi/4)
        sig = np.sin(2 * np.pi * 10 * np.arange(n_samples) / fs + phase_shift) + noise
    else:  # Last 3 channels are independent: noise only. A 10 Hz sinusoid with a
           # constant phase offset would still be phase-locked to the other channels.
        sig = np.random.randn(n_samples)
    sig = bandpass_filter(sig, 8, 12, fs)
    signals.append(sig)

# Run pipeline
print("Running significance pipeline (this may take a moment)...")
results = significance_pipeline(signals, n_surrogates=200, alpha=0.05, correction='fdr')

# Plot results
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# PLV matrix
im1 = axes[0].imshow(results['plv_matrix'], cmap='Blues', vmin=0, vmax=1)
axes[0].set_xticks(range(n_channels))
axes[0].set_yticks(range(n_channels))
axes[0].set_xticklabels(channel_names)
axes[0].set_yticklabels(channel_names)
axes[0].set_title('PLV Matrix', fontsize=12, fontweight='bold')
plt.colorbar(im1, ax=axes[0], shrink=0.8)

# P-value matrix
im2 = axes[1].imshow(results['pvalue_matrix'], cmap='Reds_r', vmin=0, vmax=0.1)
axes[1].set_xticks(range(n_channels))
axes[1].set_yticks(range(n_channels))
axes[1].set_xticklabels(channel_names)
axes[1].set_yticklabels(channel_names)
axes[1].set_title('P-Value Matrix', fontsize=12, fontweight='bold')
plt.colorbar(im2, ax=axes[1], shrink=0.8, label='p-value')

# Significance matrix
im3 = axes[2].imshow(results['significant_matrix'].astype(float), cmap='Greens', vmin=0, vmax=1)
axes[2].set_xticks(range(n_channels))
axes[2].set_yticks(range(n_channels))
axes[2].set_xticklabels(channel_names)
axes[2].set_yticklabels(channel_names)
axes[2].set_title(f'Significant (FDR, α=0.05)\n{results["n_significant"]} pairs', 
                  fontsize=12, fontweight='bold')

# Add text annotations for significance
for i in range(n_channels):
    for j in range(n_channels):
        if results['significant_matrix'][i, j]:
            axes[2].text(j, i, '✓', ha='center', va='center', 
                        fontsize=14, fontweight='bold', color='white')

plt.suptitle('Complete Significance Testing Pipeline', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

print(f"\nResults:")
print(f"  Total pairs tested: {n_channels * (n_channels - 1) // 2}")
print(f"  Significant pairs (FDR corrected): {results['n_significant']}")

---

## Section 16: Exercises

### Exercise 1: Surrogate Comparison

Compare phase shuffling and time shifting for the same signal pair:
- Generate 500 surrogates with each method
- Plot both null distributions
- Are the p-values similar?

```python
# Your code here
# Hint: Use phase_shuffle() and time_shift() functions
```

### Exercise 2: Effect of Number of Surrogates

Investigate how the number of surrogates affects p-value stability:
- Test with N = 100, 500, 1000, 5000
- Repeat each 10 times and compute the standard deviation of p-values
- At what N does the p-value stabilize?

```python
# Your code here
```

### Exercise 3: Multiple Comparisons Impact

Simulate the multiple comparisons problem:
- Generate 100 pairs of independent signals (no true connectivity)
- Test all pairs at α = 0.05
- How many false positives with no correction?
- How many with Bonferroni? With FDR?

```python
# Your code here
```

### Exercise 4: Power Analysis

Investigate statistical power:
- Generate pairs with known connectivity (PLV = 0.3, 0.5, 0.7)
- For each, run the significance test 100 times
- What proportion of tests correctly reject H₀?
- How does signal length affect power?

```python
# Your code here
```

In [None]:
# ============================================================================
# EXERCISE 1 SOLUTION: Surrogate Comparison
# ============================================================================

# Generate test signals
np.random.seed(42)
t_ex = np.arange(0, 3, 1/fs)

# Create two weakly coupled signals
sig1_ex = np.sin(2 * np.pi * 10 * t_ex) + 0.3 * np.random.randn(len(t_ex))
sig2_ex = np.sin(2 * np.pi * 10 * t_ex + np.pi/4) + 0.3 * np.random.randn(len(t_ex))

# Filter to alpha band
sig1_ex = bandpass_filter(sig1_ex, 8, 12, fs)
sig2_ex = bandpass_filter(sig2_ex, 8, 12, fs)

# Observed PLV
plv_observed_ex = compute_plv(sig1_ex, sig2_ex)

# Generate null distributions with both methods
n_surr = 500

null_phase_shuffle = np.zeros(n_surr)
null_time_shift = np.zeros(n_surr)

for i in range(n_surr):
    # Phase shuffle
    surr_ps = phase_shuffle(sig2_ex)
    null_phase_shuffle[i] = compute_plv(sig1_ex, surr_ps)
    
    # Time shift
    surr_ts = time_shift(sig2_ex)
    null_time_shift[i] = compute_plv(sig1_ex, surr_ts)

# Compute p-values
pval_ps = compute_pvalue(plv_observed_ex, null_phase_shuffle)
pval_ts = compute_pvalue(plv_observed_ex, null_time_shift)

# Plot
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

axes[0].hist(null_phase_shuffle, bins=30, alpha=0.7, color=PRIMARY_BLUE, 
             edgecolor='white', label='Phase Shuffle')
axes[0].axvline(plv_observed_ex, color=PRIMARY_RED, linewidth=2, linestyle='--',
                label=f'Observed PLV = {plv_observed_ex:.3f}')
axes[0].set_xlabel('PLV', fontsize=11)
axes[0].set_ylabel('Count', fontsize=11)
axes[0].set_title(f'Phase Shuffling\np = {pval_ps:.4f}', fontsize=12, fontweight='bold')
axes[0].legend()

axes[1].hist(null_time_shift, bins=30, alpha=0.7, color=SECONDARY_ORANGE, 
             edgecolor='white', label='Time Shift')
axes[1].axvline(plv_observed_ex, color=PRIMARY_RED, linewidth=2, linestyle='--',
                label=f'Observed PLV = {plv_observed_ex:.3f}')
axes[1].set_xlabel('PLV', fontsize=11)
axes[1].set_ylabel('Count', fontsize=11)
axes[1].set_title(f'Time Shifting\np = {pval_ts:.4f}', fontsize=12, fontweight='bold')
axes[1].legend()

plt.suptitle('Exercise 1: Comparing Surrogate Methods', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

print(f"Observed PLV: {plv_observed_ex:.4f}")
print(f"Phase shuffling: mean null = {np.mean(null_phase_shuffle):.4f}, p = {pval_ps:.4f}")
print(f"Time shifting:   mean null = {np.mean(null_time_shift):.4f}, p = {pval_ts:.4f}")
print("\n→ Compare the two p-values. Phase shuffling and time shifting build their nulls differently, so small discrepancies are expected.")

In [None]:
# ============================================================================
# EXERCISE 2 SOLUTION: Effect of Number of Surrogates
# ============================================================================

np.random.seed(42)

# Test different numbers of surrogates
n_surrogates_list = [100, 500, 1000, 5000]
n_repetitions = 10

# Store results
pvalue_results = {n: [] for n in n_surrogates_list}

# Use same signals as before
for n_surr in n_surrogates_list:
    for rep in range(n_repetitions):
        # Build null distribution
        null_vals = np.zeros(n_surr)
        for i in range(n_surr):
            surr = phase_shuffle(sig2_ex)
            null_vals[i] = compute_plv(sig1_ex, surr)
        
        # Compute p-value
        pval = compute_pvalue(plv_observed_ex, null_vals)
        pvalue_results[n_surr].append(pval)

# Compute statistics
means = [np.mean(pvalue_results[n]) for n in n_surrogates_list]
stds = [np.std(pvalue_results[n]) for n in n_surrogates_list]

# Plot
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Left: Boxplot of p-values
bp = axes[0].boxplot([pvalue_results[n] for n in n_surrogates_list], 
                      patch_artist=True, labels=[str(n) for n in n_surrogates_list])
for patch in bp['boxes']:
    patch.set_facecolor(PRIMARY_BLUE)
    patch.set_alpha(0.7)
axes[0].set_xlabel('Number of Surrogates', fontsize=11)
axes[0].set_ylabel('P-value', fontsize=11)
axes[0].set_title('P-value Variability vs N Surrogates', fontsize=12, fontweight='bold')

# Right: Standard deviation
axes[1].bar(range(len(n_surrogates_list)), stds, color=SECONDARY_ORANGE, alpha=0.7)
axes[1].set_xticks(range(len(n_surrogates_list)))
axes[1].set_xticklabels([str(n) for n in n_surrogates_list])
axes[1].set_xlabel('Number of Surrogates', fontsize=11)
axes[1].set_ylabel('Std of P-values', fontsize=11)
axes[1].set_title('P-value Stability', fontsize=12, fontweight='bold')

plt.suptitle('Exercise 2: More Surrogates = More Stable P-values', 
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

print("P-value statistics by number of surrogates:")
for n, m, s in zip(n_surrogates_list, means, stds):
    print(f"  N = {n:4d}: mean = {m:.4f}, std = {s:.4f}")
print("\n→ P-values stabilize around N = 1000. Use N ≥ 1000 for reliable results.")

In [None]:
# ============================================================================
# EXERCISE 3 SOLUTION: Multiple Comparisons Impact
# ============================================================================

np.random.seed(42)

n_pairs = 100
alpha_test = 0.05
n_surrogates_ex3 = 200  # Fewer for speed

# Generate independent signal pairs (no true connectivity)
pvalues_ex3 = []

print(f"Testing {n_pairs} independent signal pairs (no true connectivity)...")
for pair in range(n_pairs):
    # Generate two independent signals
    s1 = np.random.randn(len(t_ex))
    s2 = np.random.randn(len(t_ex))
    s1 = bandpass_filter(s1, 8, 12, fs)
    s2 = bandpass_filter(s2, 8, 12, fs)
    
    # Observed PLV
    plv = compute_plv(s1, s2)
    
    # Null distribution (fast: time shifting)
    null_vals = np.zeros(n_surrogates_ex3)
    for i in range(n_surrogates_ex3):
        null_vals[i] = compute_plv(s1, time_shift(s2))
    
    # P-value
    pval = compute_pvalue(plv, null_vals)
    pvalues_ex3.append(pval)

pvalues_ex3 = np.array(pvalues_ex3)

# Apply corrections
sig_none = pvalues_ex3 < alpha_test
sig_bonf, _ = bonferroni_correction(pvalues_ex3, alpha_test)
sig_fdr, _ = fdr_correction(pvalues_ex3, alpha_test)

# Plot
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Left: P-value distribution
axes[0].hist(pvalues_ex3, bins=20, color=PRIMARY_BLUE, edgecolor='white', alpha=0.7)
axes[0].axvline(alpha_test, color=PRIMARY_RED, linewidth=2, linestyle='--',
                label=f'α = {alpha_test}')
axes[0].set_xlabel('P-value', fontsize=11)
axes[0].set_ylabel('Count', fontsize=11)
axes[0].set_title('P-value Distribution (H₀ true for all)', fontsize=12, fontweight='bold')
axes[0].legend()

# Right: False positives by method
methods_ex3 = ['No correction', 'Bonferroni', 'FDR']
fp_counts = [np.sum(sig_none), np.sum(sig_bonf), np.sum(sig_fdr)]
colors_ex3 = [PRIMARY_RED, PRIMARY_GREEN, SECONDARY_ORANGE]

bars = axes[1].bar(methods_ex3, fp_counts, color=colors_ex3, alpha=0.7)
axes[1].axhline(n_pairs * alpha_test, color='black', linestyle='--', 
                label=f'Expected FP = {n_pairs * alpha_test:.0f}')
axes[1].set_ylabel('False Positives', fontsize=11)
axes[1].set_title('False Positives by Correction Method', fontsize=12, fontweight='bold')
axes[1].legend()

# Add value labels
for bar, count in zip(bars, fp_counts):
    axes[1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5,
                 str(count), ha='center', fontsize=12, fontweight='bold')

plt.suptitle('Exercise 3: Multiple Comparisons Problem Demonstrated', 
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

print(f"\nResults ({n_pairs} pairs tested, ALL with NO true connectivity):")
print(f"  No correction:  {np.sum(sig_none)} false positives (expected: {n_pairs * alpha_test:.0f})")
print(f"  Bonferroni:     {np.sum(sig_bonf)} false positives")
print(f"  FDR:            {np.sum(sig_fdr)} false positives")
print("\n→ Without correction, ~5% of null pairs are falsely declared significant!")

In [None]:
# ============================================================================
# EXERCISE 4 SOLUTION: Power Analysis
# ============================================================================

def generate_coupled_signals(plv_target: float, 
                              n_samples: int,
                              fs: int) -> Tuple[NDArray, NDArray]:
    """
    Generate two signals with approximately the target PLV.
    
    The coupling strength is controlled by mixing a shared oscillation
    with independent noise.
    """
    t = np.arange(n_samples) / fs
    
    # Shared oscillation (10 Hz)
    shared = np.sin(2 * np.pi * 10 * t)
    
    # Coupling factor (empirically tuned)
    coupling = plv_target ** 0.5  # Approximate
    
    # Signal 1: shared + noise
    s1 = coupling * shared + (1 - coupling) * np.random.randn(n_samples)
    
    # Signal 2: phase-shifted shared + noise
    phase_shift = np.random.uniform(0, np.pi / 4)
    s2 = coupling * np.sin(2 * np.pi * 10 * t + phase_shift) + (1 - coupling) * np.random.randn(n_samples)
    
    # Bandpass filter
    s1 = bandpass_filter(s1, 8, 12, fs)
    s2 = bandpass_filter(s2, 8, 12, fs)
    
    return s1, s2


np.random.seed(42)

# Parameters
plv_targets = [0.3, 0.5, 0.7]
signal_lengths = [1, 2, 5]  # seconds
n_tests = 50
n_surrogates_ex4 = 200

# Store power results
power_results = np.zeros((len(plv_targets), len(signal_lengths)))

print("Running power analysis (this may take a moment)...")

for i, plv_target in enumerate(plv_targets):
    for j, sig_len in enumerate(signal_lengths):
        n_samples_ex4 = int(sig_len * fs)
        n_significant = 0
        
        for test in range(n_tests):
            # Generate coupled signals
            s1, s2 = generate_coupled_signals(plv_target, n_samples_ex4, fs)
            
            # Observed PLV
            plv = compute_plv(s1, s2)
            
            # Null distribution
            null_vals = np.zeros(n_surrogates_ex4)
            for k in range(n_surrogates_ex4):
                null_vals[k] = compute_plv(s1, time_shift(s2))
            
            # P-value
            pval = compute_pvalue(plv, null_vals)
            
            if pval < 0.05:
                n_significant += 1
        
        power_results[i, j] = n_significant / n_tests

# Plot
fig, ax = plt.subplots(figsize=(10, 6))

x = np.arange(len(signal_lengths))
width = 0.25
colors_power = [PRIMARY_BLUE, SECONDARY_ORANGE, PRIMARY_GREEN]

for i, (plv_target, color) in enumerate(zip(plv_targets, colors_power)):
    offset = (i - 1) * width
    bars = ax.bar(x + offset, power_results[i] * 100, width, 
                  label=f'PLV = {plv_target}', color=color, alpha=0.8)

ax.axhline(80, color='red', linestyle='--', alpha=0.5, label='80% power threshold')
ax.set_xlabel('Signal Length (seconds)', fontsize=12)
ax.set_ylabel('Statistical Power (%)', fontsize=12)
ax.set_title('Exercise 4: Power Increases with PLV and Signal Length', 
             fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels([f'{s}s' for s in signal_lengths])
ax.legend(loc='lower right')
ax.set_ylim(0, 105)

plt.tight_layout()
plt.show()

print("\nStatistical Power (% tests correctly rejecting H₀):")
print(f"{'PLV':>8}", end='')
for sig_len in signal_lengths:
    print(f"{sig_len}s".rjust(10), end='')
print()
for i, plv_target in enumerate(plv_targets):
    print(f"{plv_target:>8.1f}", end='')
    for j in range(len(signal_lengths)):
        print(f"{power_results[i, j] * 100:>9.0f}%", end='')
    print()
print("\n→ Higher PLV and longer signals = more statistical power!")

---

## Section 17: Summary

### Key Concepts Learned

1. **Null Hypothesis Testing**
   - H₀: No true connectivity (observed values due to chance)
   - Build null distribution using surrogate data
   - P-value = probability of observing a value at least this extreme under H₀

2. **Surrogate Methods**
   - **Phase shuffling**: Preserves spectrum, destroys phase relationships
   - **Time shifting**: Faster, simpler, less rigorous
   - Use enough surrogates (≥1000 for α = 0.05)

3. **Multiple Comparisons Correction**
   - Testing many pairs inflates false positive rate
   - **Bonferroni**: Conservative, controls FWER
   - **FDR (Benjamini-Hochberg)**: Less conservative, controls the expected proportion of false discoveries among the results declared significant

4. **Beyond P-Values**
   - **Effect size** (Cohen's d): How big is the effect?
   - **Confidence intervals**: Uncertainty quantification
   - Always report both p-values AND effect sizes

5. **Permutation Testing**
   - Non-parametric group comparisons
   - No distributional assumptions
   - Shuffle group labels to build null distribution
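
The two corrections in item 3 can be sketched in a few lines. This is a minimal illustration, not the notebook's own `bonferroni_correction` and `fdr_correction`, which may differ in return values or tie handling. The helper names `bonferroni_sketch` and `bh_fdr_sketch` and the example p-values are hypothetical.

```python
import numpy as np

def bonferroni_sketch(pvalues, alpha=0.05):
    """Hypothetical helper: flag p-values at or below alpha / m (controls FWER)."""
    p = np.asarray(pvalues, dtype=float)
    return p <= alpha / p.size

def bh_fdr_sketch(pvalues, alpha=0.05):
    """Hypothetical helper: Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices that sort p ascending
    ranked = p[order]
    # Rank k (1-based) is compared against (k / m) * alpha
    thresholds = alpha * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    significant = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest rank that passes
        k_max = np.nonzero(below)[0].max()
        significant[order[:k_max + 1]] = True
    return significant

# Made-up p-values, for illustration only
example_pvalues = [0.041, 0.001, 0.216, 0.008, 0.039, 0.042, 0.060, 0.074, 0.205, 0.212]
print("Bonferroni:", int(bonferroni_sketch(example_pvalues).sum()), "significant")
print("BH-FDR:    ", int(bh_fdr_sketch(example_pvalues).sum()), "significant")
```

Because the step-up thresholds grow with rank, the FDR procedure never rejects fewer hypotheses than Bonferroni at the same α.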

### Decision Flowchart

```
Is connectivity significant?
│
├─→ Single pair → Surrogate testing (phase shuffle)
│
├─→ Many pairs → Apply correction (FDR for exploratory, Bonferroni for confirmatory)
│
└─→ Group comparison → Permutation testing
```
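
The group-comparison branch can be illustrated with a label-shuffling test on the difference in group means. This is a sketch under stated assumptions: the function name `permutation_test_sketch`, the two-sided test, the `+1` correction in the p-value, and the example connectivity values are all illustrative choices, not taken from the notebook's earlier cells.

```python
import numpy as np

def permutation_test_sketch(group_a, group_b, n_permutations=5000, seed=0):
    """Hypothetical helper: two-sided permutation test on the difference in means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = a.mean() - b.mean()

    pooled = np.concatenate([a, b])
    n_a = a.size
    null_diffs = np.empty(n_permutations)
    for i in range(n_permutations):
        # Shuffling the pooled values is equivalent to shuffling group labels
        shuffled = rng.permutation(pooled)
        null_diffs[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()

    # Count the observed statistic as one member of the null (+1 correction),
    # so the p-value can never be exactly zero
    n_extreme = np.sum(np.abs(null_diffs) >= np.abs(observed))
    pvalue = (n_extreme + 1) / (n_permutations + 1)
    return observed, pvalue

# Made-up per-subject PLV values for two groups (illustration only)
plv_group_a = [0.61, 0.58, 0.65, 0.70, 0.63, 0.59, 0.66, 0.62]
plv_group_b = [0.31, 0.35, 0.29, 0.40, 0.33, 0.36, 0.30, 0.34]
diff, pval = permutation_test_sketch(plv_group_a, plv_group_b)
print(f"Mean difference = {diff:.3f}, permutation p = {pval:.4f}")
```

Shuffling labels assumes the subjects are exchangeable under H₀. Paired or repeated-measures designs need a different scheme, such as sign flipping within subjects.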

### What's Next?

In **C04**, you'll learn about **causality and directionality**: determining not just IF signals are connected, but in which DIRECTION information flows.

---

## Section 18: Discussion Questions

1. **When would you choose Bonferroni over FDR correction?**
   - Consider the cost of false positives vs. false negatives in your research context.

2. **How many surrogates are "enough"?**
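   - It depends on the smallest p-value you need to resolve. With N surrogates, the smallest attainable p-value is roughly 1/N, so testing at α = 0.001 needs at least 1000 surrogates, and α = 0.0001 needs at least 10000. If you correct for multiple comparisons, apply this rule to the corrected threshold.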

3. **What if you test multiple frequency bands AND multiple channel pairs?**
   - You should correct for ALL tests. Some researchers use cluster-based permutation testing.

4. **Can you trust a significant result without replication?**
   - Single studies can find false positives. Replication is the gold standard.

5. **How do surrogate methods handle non-stationarity?**
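   - Standard phase shuffling assumes stationarity. For non-stationary data, consider windowed approaches that generate surrogates within short, approximately stationary segments. AAFT addresses non-Gaussian amplitude distributions, but it still assumes stationarity.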
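
The arithmetic behind question 2 can be sketched as follows. This assumes a one-sided empirical p-value with the `+1` correction, `(count + 1) / (N + 1)`, and a strict `p < α` decision rule. The notebook's `compute_pvalue` may use a different convention, and all helper names below are hypothetical.

```python
import math
import numpy as np

def empirical_pvalue_sketch(observed, null_values):
    """Hypothetical helper: one-sided p-value with +1 correction."""
    null_values = np.asarray(null_values, dtype=float)
    n_extreme = np.sum(null_values >= observed)
    return (n_extreme + 1) / (null_values.size + 1)

def min_surrogates_sketch(alpha, n_tests=1):
    """Smallest N for which 1 / (N + 1) < alpha / n_tests can hold.

    n_tests > 1 corresponds to a Bonferroni-corrected threshold.
    """
    # Small tolerance guards against floating-point round-off in the division
    return math.floor(n_tests / alpha + 1e-9)

def pvalue_standard_error_sketch(p_true, n_surrogates):
    """Binomial standard error of a Monte Carlo p-value estimate."""
    return math.sqrt(p_true * (1.0 - p_true) / n_surrogates)

for alpha, n_tests in [(0.05, 1), (0.001, 1), (0.05, 100)]:
    n_min = min_surrogates_sketch(alpha, n_tests)
    print(f"alpha = {alpha}, tests = {n_tests}: need N >= {n_min}")

for n in [100, 500, 1000, 5000]:
    se = pvalue_standard_error_sketch(0.05, n)
    print(f"N = {n:4d}: standard error near p = 0.05 is about {se:.4f}")
```

Under these assumptions, 200 surrogates give a smallest attainable p-value of 1/201, which is above a Bonferroni threshold of 0.05/100. A design like Exercise 3 therefore could not pass Bonferroni even for a genuinely coupled pair unless the number of surrogates were raised.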

---

**Congratulations!** You now understand how to properly test connectivity for statistical significance. This is crucial for making valid scientific claims about brain connectivity.

*Remember: A connectivity value without proper statistical testing is just a number: it tells you nothing about whether it's real or just noise.*