# Classification with the Forward-Forward Algorithm

## The Problem with Backpropagation

Standard backpropagation requires:
- Global error signal propagated backwards through all layers
- Surrogate gradients for non-differentiable SOEN dynamics
- Weight transport (knowing downstream weights for gradient computation)

**None of these is straightforward to implement in hardware.**

## The Forward-Forward Algorithm (Hinton, 2022)

Replace backprop with **two forward passes**:

```
POSITIVE PASS: Real data + correct label → maximize "goodness"
NEGATIVE PASS: Real data + wrong label  → minimize "goodness"
```

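**Goodness** = sum of squared activations in Hinton's formulation (any local measure works). This notebook uses the mean of squared activations so that the threshold does not scale with layer width.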

```
                    ┌─────────────────────────────────────┐
                    │     FORWARD-FORWARD LEARNING        │
                    ├─────────────────────────────────────┤
                    │                                     │
                    │  Positive: x ⊕ y_correct            │
                    │     → Layer 1 → goodness₁ ↑         │
                    │     → Layer 2 → goodness₂ ↑         │
                    │                                     │
                    │  Negative: x ⊕ y_wrong              │
                    │     → Layer 1 → goodness₁ ↓         │
                    │     → Layer 2 → goodness₂ ↓         │
                    │                                     │
                    │  Each layer learns LOCALLY!         │
                    └─────────────────────────────────────┘
```

## Why This is Hardware-Compatible

| Property | Backprop | Forward-Forward |
|----------|----------|------------------|
| Error propagation | Global, backwards | None |
| Weight updates | Requires downstream info | Local only |
| Gradient computation | Through all layers | Per-layer |
| Hardware feasibility | Difficult | Possible |

## Our Implementation

1. **Label embedding**: Concatenate one-hot label to input
2. **Goodness metric**: Mean of squared SOEN neuron outputs
3. **Per-layer loss**: Maximize goodness for positive, minimize for negative
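4. **Threshold**: Training pushes positive goodness above θ and negative goodness below it; at inference, the label hypothesis with the highest goodness is predicted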
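
Label embedding, goodness, and label selection at inference can be sketched for a single hand-built layer. This is a minimal NumPy illustration, not the SOEN model: the weight matrix `W`, the ReLU nonlinearity, and the helper `predict` are assumptions chosen so the numbers can be checked by hand.

```python
import numpy as np

LABEL_SCALE = 0.25  # same label amplitude the notebook uses later

def embed_label(x, label, n_classes=2, label_scale=LABEL_SCALE):
    """Append a scaled one-hot label to a 1-D feature vector."""
    one_hot = np.zeros(n_classes)
    one_hot[label] = label_scale
    return np.concatenate([x, one_hot])

def goodness(h):
    """Mean of squared activations for one sample."""
    return float(np.mean(h ** 2))

def predict(x, W, n_classes=2):
    """Try every label hypothesis and return the one with the highest goodness."""
    scores = []
    for c in range(n_classes):
        # ReLU is only a stand-in for the SOEN neuron response
        h = np.maximum(W @ embed_label(x, c, n_classes), 0.0)
        scores.append(goodness(h))
    return int(np.argmax(scores)), scores

# Hypothetical hand-built weights: neuron 0 sums x[0] with the label-0 input,
# neuron 1 sums x[1] with the label-1 input.
W = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
])

x = np.array([0.2, 0.05])
pred, scores = predict(x, W)
print(pred, scores)
```

For this input the class-0 hypothesis yields goodness 0.1025 against 0.065 for class 1, so the sketch predicts class 0.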

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt

from soen_toolkit.core import (
    ConnectionConfig,
    LayerConfig,
    SimulationConfig,
    SOENModelCore,
)

torch.manual_seed(42)
np.random.seed(42)

print(f"PyTorch version: {torch.__version__}")

## 1. Generate Circle-in-Ring Dataset

In [None]:
def generate_circle_ring_data(n_samples=500, inner_radius=0.3, outer_radius_min=0.5, 
                               outer_radius_max=0.8, noise=0.05):
    """
    Generate 2D classification data: circle inside a ring.
    """
    n_each = n_samples // 2
    
    # Class 0: Inner circle
    theta_inner = np.random.uniform(0, 2*np.pi, n_each)
    r_inner = np.random.uniform(0, inner_radius, n_each)
    x_inner = r_inner * np.cos(theta_inner) + np.random.normal(0, noise, n_each)
    y_inner = r_inner * np.sin(theta_inner) + np.random.normal(0, noise, n_each)
    
    # Class 1: Outer ring
    theta_outer = np.random.uniform(0, 2*np.pi, n_each)
    r_outer = np.random.uniform(outer_radius_min, outer_radius_max, n_each)
    x_outer = r_outer * np.cos(theta_outer) + np.random.normal(0, noise, n_each)
    y_outer = r_outer * np.sin(theta_outer) + np.random.normal(0, noise, n_each)
    
    X = np.vstack([
        np.column_stack([x_inner, y_inner]),
        np.column_stack([x_outer, y_outer])
    ])
    y = np.array([0] * n_each + [1] * n_each)
    
    idx = np.random.permutation(len(y))
    X, y = X[idx], y[idx]
    
    # Scale from [-1, 1] into the SOEN operating range, roughly [0.025, 0.275]
    X = (X + 1) / 2 * 0.25 + 0.025
    
    return torch.FloatTensor(X), torch.LongTensor(y)


N_SAMPLES = 500
X_data, y_data = generate_circle_ring_data(N_SAMPLES)

print(f"Dataset shape: X={X_data.shape}, y={y_data.shape}")
print(f"Class distribution: {(y_data == 0).sum().item()} inner, {(y_data == 1).sum().item()} outer")

# Visualize
plt.figure(figsize=(6, 6))
for c, color in enumerate(['blue', 'red']):
    mask = y_data == c
    plt.scatter(X_data[mask, 0], X_data[mask, 1], c=color, alpha=0.6, s=20)
plt.title('Circle vs Ring Dataset')
plt.axis('equal')
plt.grid(True, alpha=0.3)
plt.show()

## 2. Forward-Forward Data Preparation

Key idea: Embed the label into the input!

```
Original input: [x₁, x₂]           (2D)
With label:     [x₁, x₂, l₀, l₁]   (4D, one-hot label appended)

Positive sample: x with CORRECT label
Negative sample: x with WRONG label
```
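
With two classes the wrong label is simply the other class. For more classes, one way to draw a wrong label is to add a random non-zero offset and wrap around. The NumPy sketch below illustrates that trick for three classes; the variable names and the sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 3
y = rng.integers(0, n_classes, size=1000)          # true labels

# Add a random offset in [1, n_classes - 1], then wrap around.
offset = rng.integers(1, n_classes, size=y.shape)  # upper bound is exclusive
y_wrong = (y + offset) % n_classes

# The offset is never a multiple of n_classes, so the wrong label
# can never coincide with the true one.
print(np.any(y_wrong == y))   # False
print(np.unique(y_wrong))
```

Because the offset is always between 1 and `n_classes - 1`, every negative sample is guaranteed to carry an incorrect label, and no rejection sampling is needed.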

In [None]:
N_CLASSES = 2
SEQ_LEN = 50
LABEL_SCALE = 0.25  # Stronger label signal (was 0.15)

def embed_label(X, y, n_classes=2, label_scale=LABEL_SCALE):
    """
    Embed one-hot label into input.
    
    Args:
        X: [N, input_dim] input features
        y: [N] class labels (integers)
        n_classes: number of classes
        label_scale: scale for label embedding (stronger = easier to learn)
    
    Returns:
        X_embedded: [N, input_dim + n_classes]
    """
    N = X.shape[0]
    one_hot = torch.zeros(N, n_classes)
    one_hot.scatter_(1, y.unsqueeze(1), label_scale)
    return torch.cat([X, one_hot], dim=1)


def create_positive_negative_pairs(X, y, n_classes=2, label_scale=LABEL_SCALE):
    """
    Create positive and negative samples for Forward-Forward.
    
    Positive: x with correct label
    Negative: x with random wrong label
    """
    N = X.shape[0]
    
    # Positive: correct labels
    X_pos = embed_label(X, y, n_classes, label_scale)
    
    # Negative: random wrong labels
    y_wrong = (y + torch.randint(1, n_classes, (N,))) % n_classes
    X_neg = embed_label(X, y_wrong, n_classes, label_scale)
    
    return X_pos, X_neg


# Create positive and negative samples
X_pos, X_neg = create_positive_negative_pairs(X_data, y_data, N_CLASSES)

print(f"Positive samples shape: {X_pos.shape}")
print(f"Negative samples shape: {X_neg.shape}")
print(f"Label scale: {LABEL_SCALE} (stronger signal for better separation)")
print(f"\nExample positive sample (class 0): {X_pos[y_data == 0][0]}")
print(f"Example negative sample (class 0): {X_neg[y_data == 0][0]}")

In [None]:
# Repeat each static input over SEQ_LEN time steps for SOEN: [N, dim] -> [N, SEQ_LEN, dim]
X_pos_seq = X_pos.unsqueeze(1).expand(-1, SEQ_LEN, -1).clone()
X_neg_seq = X_neg.unsqueeze(1).expand(-1, SEQ_LEN, -1).clone()

print(f"SOEN input shapes:")
print(f"  Positive: {X_pos_seq.shape}")
print(f"  Negative: {X_neg_seq.shape}")

## 3. Forward-Forward SOEN Model

Architecture:
- Input: 4D (2D features + 2D one-hot label)
- Hidden layers: SingleDendrite neurons
- Goodness: **Mean** of squared outputs per layer (normalized by layer width)

```
Normalized Goodness: G = (1/d) * Σ h_j²

Benefits:
- Goodness does not grow with layer width
- [4] neurons, [16] neurons, [32,32] neurons → same threshold (0.1 in this notebook)
- No threshold re-tuning per layer size
```
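
The effect of dividing by the layer width can be checked with a small NumPy sketch. The constant activation value 0.3 and the function names are illustrative assumptions, not values taken from a trained SOEN model.

```python
import numpy as np

def goodness_sum(h):
    """Sum of squared activations per sample (grows with layer width)."""
    return np.sum(h ** 2, axis=1)

def goodness_mean(h):
    """Mean of squared activations per sample (independent of layer width)."""
    return np.mean(h ** 2, axis=1)

for d in (4, 16, 32):
    h = np.full((1, d), 0.3)   # every neuron outputs 0.3 (illustrative value)
    print(d, goodness_sum(h)[0], goodness_mean(h)[0])
```

The summed goodness scales linearly with `d`, while the mean stays at about 0.09 for every width, which is why one threshold can serve layers of different sizes.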

In [None]:
def build_ff_soen_model(hidden_dims, input_dim=4, dt=50.0):
    """
    Build a SOEN model for Forward-Forward training.
    
    Args:
        hidden_dims: List of hidden layer dimensions
        input_dim: Input dimension (features + label embedding)
    """
    sim_cfg = SimulationConfig(
        dt=dt,
        input_type="state",
        track_phi=False,
        track_power=False,
    )
    
    layers = []
    connections = []
    
    # Input layer
    layers.append(LayerConfig(
        layer_id=0,
        layer_type="Input",
        params={"dim": input_dim},
    ))
    
    # Hidden layers
    for i, hidden_dim in enumerate(hidden_dims):
        layer_id = i + 1
        
        layers.append(LayerConfig(
            layer_id=layer_id,
            layer_type="SingleDendrite",
            params={
                "dim": hidden_dim,
                "solver": "FE",
                "source_func": "Heaviside_fit_state_dep",
                "phi_offset": 0.02,
                "bias_current": 1.98,
                "gamma_plus": 0.0005,
                "gamma_minus": 1e-6,
                "learnable_params": {
                    "phi_offset": False,
                    "bias_current": False,
                    "gamma_plus": False,
                    "gamma_minus": False,
                },
            },
        ))
        
        connections.append(ConnectionConfig(
            from_layer=layer_id - 1,
            to_layer=layer_id,
            connection_type="all_to_all",
            learnable=True,
            params={"init": "xavier_uniform"},
        ))
    
    model = SOENModelCore(
        sim_config=sim_cfg,
        layers_config=layers,
        connections_config=connections,
    )
    
    return model


# Test model
HIDDEN_DIMS = [16, 16]
test_model = build_ff_soen_model(HIDDEN_DIMS, input_dim=4)
print(f"Model layers: {[l.dim for l in test_model.layers]}")
print(f"Parameters: {sum(p.numel() for p in test_model.parameters() if p.requires_grad)}")

## 4. Goodness Function and Forward-Forward Loss

### Hardware-Compatible Goodness

Goodness = mean of squared activations, with no activation normalization (such as layer normalization) applied beforehand:
$$G = \frac{1}{d} \sum_j h_j^2$$

**Hardware mapping**: This is simply the mean power in the layer!
- Each neuron's current squared ∝ power dissipation
- Sum across neurons = total layer power
- Divide by neuron count = mean power (or just use sum)

### Loss Function (Simulation Only)

$$\mathcal{L} = \underbrace{\text{softplus}(\theta - G_{pos})}_{\text{push pos above } \theta} + \underbrace{\text{softplus}(G_{neg} - \theta)}_{\text{push neg below } \theta} + \underbrace{\text{softplus}(m - (G_{pos} - G_{neg}))}_{\text{contrastive margin}}$$

**Note**: The loss function only exists during simulation training. 
The trained weights transfer to hardware; the loss does not.
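
A worked example of the three-term loss for scalar goodness values, as a NumPy sketch. The goodness values 0.20 and 0.02 are made up for illustration; `theta` and `margin` use the defaults that appear later in the notebook.

```python
import numpy as np

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return np.logaddexp(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ff_loss(g_pos, g_neg, theta=0.1, margin=0.05):
    """Three-term Forward-Forward loss for scalar goodness values."""
    return (softplus(theta - g_pos)
            + softplus(g_neg - theta)
            + softplus(margin - (g_pos - g_neg)))

separated = ff_loss(g_pos=0.20, g_neg=0.02)   # positive above, negative below the threshold
inverted = ff_loss(g_pos=0.02, g_neg=0.20)    # goodness values the wrong way round
print(separated, inverted)

# The first term equals the negative log-likelihood of "positive"
# under p = sigmoid(G - theta).
print(np.isclose(softplus(0.1 - 0.2), -np.log(sigmoid(0.2 - 0.1))))
```

The separated case gives the lower loss. Note that with goodness on the order of 0.1, each softplus term stays close to log 2 ≈ 0.693, so the total loss moves in a fairly narrow band around 3 log 2 ≈ 2.08 rather than approaching zero.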

In [None]:
def compute_goodness(activations):
    """
    Compute goodness as mean of squared activations.
    
    G = (1/d) * Σ h_j²
    
    Hardware-compatible: just measures mean power in the layer.
    No normalization needed - threshold is calibrated for SOEN dynamics.
    
    Args:
        activations: [N, dim] layer activations
    
    Returns:
        goodness: [N] goodness score per sample
    """
    return (activations ** 2).mean(dim=1)


def forward_forward_loss(goodness_pos, goodness_neg, threshold=0.1, margin=0.05):
    """
    Forward-Forward loss with contrastive term.
    
    Hardware note: This loss is only used during simulation training.
    The trained weights transfer to hardware; the loss function does not.
    
    Args:
        threshold: calibrated for SOEN activation magnitudes (~0.05-0.15)
        margin: minimum desired gap between pos and neg goodness
    """
    # Want goodness_pos > threshold
    loss_pos = F.softplus(threshold - goodness_pos).mean()
    
    # Want goodness_neg < threshold
    loss_neg = F.softplus(goodness_neg - threshold).mean()
    
    # Contrastive: directly maximize the gap (training stability)
    contrastive = F.softplus(margin - (goodness_pos - goodness_neg)).mean()
    
    return loss_pos + loss_neg + contrastive


# Test goodness computation
test_activations = torch.randn(5, 16) * 0.3  # SOEN-scale activations
print(f"Test activations shape: {test_activations.shape}")
print(f"Goodness (raw, hardware-compatible): {compute_goodness(test_activations)}")
print(f"Typical SOEN goodness range: 0.01 - 0.2")

## 5. Layer-wise Forward-Forward Training

### True Local Learning (Key for Stability!)

Each layer has its own optimizer and its own loss, so its weights are updated from that layer's goodness alone:

```
Layer 1:
    optimizer_1 = Adam(layer_1_weights)
    g_pos_1 = goodness(activations_1_pos)
    g_neg_1 = goodness(activations_1_neg)
    loss_1 = FF_loss(g_pos_1, g_neg_1)   # includes the contrastive margin term
    loss_1.backward()
    optimizer_1.step()

Layer 2:
    optimizer_2 = Adam(layer_2_weights)
    g_pos_2 = goodness(activations_2_pos)
    g_neg_2 = goodness(activations_2_neg)
    loss_2 = FF_loss(g_pos_2, g_neg_2)   # includes the contrastive margin term
    loss_2.backward()
    optimizer_2.step()
```

### Why This Helps:
1. **No gradient interference** between layers
2. **Each layer learns its own optimal representation**
3. **Often more stable** than joint optimization
4. **True local learning** - each layer only knows its own goodness
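
To make the locality claim concrete, the sketch below writes out the gradient of one layer's loss using only that layer's inputs and outputs, then checks it against finite differences. It is a NumPy illustration under simplifying assumptions: a ReLU layer stands in for the SOEN dynamics, the contrastive margin term is omitted, and all names (`layer_forward`, `local_loss`, `local_grad`) are hypothetical.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(W, x):
    """ReLU layer (stand-in for a SOEN layer): x is [N, d_in], W is [d_out, d_in]."""
    return np.maximum(x @ W.T, 0.0)

def local_loss(W, x_pos, x_neg, theta=0.1):
    g_pos = np.mean(layer_forward(W, x_pos) ** 2, axis=1)
    g_neg = np.mean(layer_forward(W, x_neg) ** 2, axis=1)
    return np.mean(softplus(theta - g_pos)) + np.mean(softplus(g_neg - theta))

def local_grad(W, x_pos, x_neg, theta=0.1):
    """Gradient of local_loss w.r.t. W from this layer's inputs and outputs only."""
    d_out = W.shape[0]
    grad = np.zeros_like(W)
    for x, sign in ((x_pos, -1.0), (x_neg, +1.0)):
        h = layer_forward(W, x)                    # [N, d_out]
        g = np.mean(h ** 2, axis=1)                # [N]
        # d loss / d g: -sigmoid(theta - g) for positives, +sigmoid(g - theta) for negatives
        dl_dg = sign * sigmoid(sign * (g - theta)) / len(x)
        # The ReLU gate is implicit: h is already 0 where the neuron is inactive
        dl_dh = dl_dg[:, None] * 2.0 * h / d_out
        grad += dl_dh.T @ x
    return grad

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.5, size=(3, 4))
x_pos = rng.uniform(0.0, 0.3, size=(8, 4))
x_neg = rng.uniform(0.0, 0.3, size=(8, 4))

analytic = local_grad(W, x_pos, x_neg)

# Central finite differences as an independent check
numeric = np.zeros_like(W)
eps = 1e-6
for idx in np.ndindex(*W.shape):
    W_hi, W_lo = W.copy(), W.copy()
    W_hi[idx] += eps
    W_lo[idx] -= eps
    numeric[idx] = (local_loss(W_hi, x_pos, x_neg) - local_loss(W_lo, x_pos, x_neg)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))
```

Nothing in `local_grad` refers to a downstream layer or its weights. For a deeper stack, the same function would apply to layer 2 with layer 1's outputs treated as fixed inputs.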

In [None]:
def get_layer_activations(model, X_seq, layer_idx):
    """Get activations from a specific layer."""
    _, layer_states = model(X_seq)
    return layer_states[layer_idx][:, -1, :]


def train_forward_forward_layerwise(model, X_pos_seq, X_neg_seq, n_epochs=200, lr=0.01, 
                                     threshold=0.1, margin=0.05, verbose=True):
    """
    Train SOEN model with Forward-Forward algorithm.
    
    Hardware-compatible design:
    - Goodness = mean squared activations (measurable as power)
    - Layer-wise training (local learning)
    - Contrastive loss for stable training (simulation only)
    
    The trained weights transfer to hardware.
    """
    model.train()
    
    hidden_layer_indices = [i for i, l in enumerate(model.layers) if l.layer_type != 'Input']
    
    # Layer-wise optimizers (simulation construct for finding good weights)
    layer_optimizers = []
    for conn_key in model.connections.keys():
        conn_params = [model.connections[conn_key]]
        layer_optimizers.append(torch.optim.Adam(conn_params, lr=lr))
    
    history = {
        'loss': [],
        'goodness_pos': [],
        'goodness_neg': [],
        'accuracy': [],
        'separation': [],
    }
    
    for epoch in range(n_epochs):
        total_loss = 0
        all_goodness_pos = []
        all_goodness_neg = []
        
        _, layer_states_pos = model(X_pos_seq)
        _, layer_states_neg = model(X_neg_seq)
        
        for layer_idx, opt in zip(hidden_layer_indices, layer_optimizers):
            opt.zero_grad()
            
            act_pos = layer_states_pos[layer_idx][:, -1, :]
            act_neg = layer_states_neg[layer_idx][:, -1, :]
            
            # Hardware-compatible goodness (no normalization)
            g_pos = compute_goodness(act_pos)
            g_neg = compute_goodness(act_neg)
            
            all_goodness_pos.append(g_pos.mean().item())
            all_goodness_neg.append(g_neg.mean().item())
            
            layer_loss = forward_forward_loss(g_pos, g_neg, threshold, margin)
            total_loss = total_loss + layer_loss.item()
            
            layer_loss.backward(retain_graph=True)
            # Clip only this layer's gradients; other layers may still hold stale .grad values
            torch.nn.utils.clip_grad_norm_(opt.param_groups[0]['params'], max_norm=1.0)
            opt.step()
        
        # Accuracy is measured on the training set (module-level X_data, y_data)
        with torch.no_grad():
            acc = evaluate_ff_accuracy(model, X_data, y_data, n_classes=N_CLASSES)
        
        mean_g_pos = np.mean(all_goodness_pos)
        mean_g_neg = np.mean(all_goodness_neg)
        separation = mean_g_pos - mean_g_neg
        
        history['loss'].append(total_loss)
        history['goodness_pos'].append(mean_g_pos)
        history['goodness_neg'].append(mean_g_neg)
        history['accuracy'].append(acc)
        history['separation'].append(separation)
        
        if verbose and (epoch + 1) % 20 == 0:
            print(f"Epoch {epoch+1}: Loss={total_loss:.4f}, "
                  f"G_pos={mean_g_pos:.4f}, G_neg={mean_g_neg:.4f}, "
                  f"Sep={separation:.4f}, Acc={acc:.4f}")
    
    return history


def evaluate_ff_accuracy(model, X, y, n_classes=2, label_scale=LABEL_SCALE):
    """
    Evaluate Forward-Forward model accuracy.
    
    Hardware-compatible inference:
    - Run forward pass for each class hypothesis
    - Sum goodness (power) across layers
    - Predict class with highest total goodness
    """
    model.eval()
    N = X.shape[0]
    
    all_goodness = []
    
    for c in range(n_classes):
        y_test = torch.full((N,), c, dtype=torch.long)
        X_test = embed_label(X, y_test, n_classes, label_scale)
        X_test_seq = X_test.unsqueeze(1).expand(-1, SEQ_LEN, -1).clone()
        
        with torch.no_grad():
            _, layer_states = model(X_test_seq)
            total_goodness = torch.zeros(N)
            for layer_idx in range(1, len(model.layers)):
                act = layer_states[layer_idx][:, -1, :]
                total_goodness += compute_goodness(act)
        
        all_goodness.append(total_goodness)
    
    goodness_matrix = torch.stack(all_goodness, dim=1)
    predictions = goodness_matrix.argmax(dim=1)
    
    accuracy = (predictions == y).float().mean().item()
    model.train()
    return accuracy


def train_forward_forward(model, X_pos_seq, X_neg_seq, n_epochs=200, lr=0.01, 
                          threshold=0.1, verbose=True):
    """Wrapper for layer-wise training."""
    return train_forward_forward_layerwise(
        model, X_pos_seq, X_neg_seq, 
        n_epochs=n_epochs, lr=lr, threshold=threshold, 
        margin=0.05, verbose=verbose
    )

## 6. Train the Model

In [None]:
# Build model - recreate data with LABEL_SCALE
X_pos, X_neg = create_positive_negative_pairs(X_data, y_data, N_CLASSES, LABEL_SCALE)
X_pos_seq = X_pos.unsqueeze(1).expand(-1, SEQ_LEN, -1).clone()
X_neg_seq = X_neg.unsqueeze(1).expand(-1, SEQ_LEN, -1).clone()

HIDDEN_DIMS = [16, 16]
THRESHOLD = 0.1  # Calibrated for SOEN activation magnitudes
N_EPOCHS = 300
LR = 0.01

print(f"Training Forward-Forward SOEN classifier (HARDWARE-COMPATIBLE)...")
print(f"Hidden dimensions: {HIDDEN_DIMS}")
print(f"Threshold: {THRESHOLD} (calibrated for SOEN)")
print(f"Label scale: {LABEL_SCALE}")
print(f"\nHardware-compatible features:")
print(f"  ✓ Goodness = mean(activations²) = power measurement")
print(f"  ✓ Label embedding = optical input modulation")
print(f"  ✓ Inference = compare goodness across class hypotheses")
print(f"  ✓ No normalization (purely local computation)")
print("=" * 60)

torch.manual_seed(42)
model = build_ff_soen_model(HIDDEN_DIMS, input_dim=4)

history = train_forward_forward(
    model, X_pos_seq, X_neg_seq,
    n_epochs=N_EPOCHS, lr=LR, threshold=THRESHOLD, verbose=True
)

print("=" * 60)
print(f"Final accuracy: {history['accuracy'][-1]:.4f}")
print(f"Final separation (G_pos - G_neg): {history['separation'][-1]:.4f}")

## 7. Training Curves

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Loss
ax1 = axes[0, 0]
ax1.plot(history['loss'], color='steelblue', lw=2)
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Forward-Forward Loss')
ax1.set_title('Training Loss')
ax1.grid(True, alpha=0.3)

# Goodness
ax2 = axes[0, 1]
ax2.plot(history['goodness_pos'], label='Positive', color='green', lw=2)
ax2.plot(history['goodness_neg'], label='Negative', color='red', lw=2)
ax2.axhline(y=THRESHOLD, color='black', linestyle='--', label=f'Threshold={THRESHOLD}')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Mean Goodness (averaged over layers)')
ax2.set_title('Goodness Separation')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Separation
ax3 = axes[1, 0]
ax3.plot(history['separation'], color='purple', lw=2)
ax3.axhline(y=0, color='black', linestyle='--', alpha=0.5)
ax3.set_xlabel('Epoch')
ax3.set_ylabel('G_pos - G_neg')
ax3.set_title('Goodness Separation (should be positive)')
ax3.grid(True, alpha=0.3)

# Accuracy
ax4 = axes[1, 1]
ax4.plot(history['accuracy'], color='coral', lw=2)
ax4.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5, label='Random')
ax4.set_xlabel('Epoch')
ax4.set_ylabel('Accuracy')
ax4.set_title('Classification Accuracy')
ax4.set_ylim(0.4, 1.05)
ax4.legend()
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 8. Visualize Goodness Distribution

In [None]:
# Compute final goodness for positive and negative samples
model.eval()
with torch.no_grad():
    _, states_pos = model(X_pos_seq)
    _, states_neg = model(X_neg_seq)
    
    # Average goodness across all layers
    g_pos_final = torch.zeros(len(X_data))
    g_neg_final = torch.zeros(len(X_data))
    n_layers = 0
    
    for layer_idx in range(1, len(model.layers)):
        g_pos_final += compute_goodness(states_pos[layer_idx][:, -1, :])
        g_neg_final += compute_goodness(states_neg[layer_idx][:, -1, :])
        n_layers += 1
    
    g_pos_final /= n_layers
    g_neg_final /= n_layers

# Plot distributions
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

ax1 = axes[0]
ax1.hist(g_pos_final.numpy(), bins=30, alpha=0.7, label='Positive', color='green')
ax1.hist(g_neg_final.numpy(), bins=30, alpha=0.7, label='Negative', color='red')
ax1.axvline(x=THRESHOLD, color='black', linestyle='--', label=f'Threshold={THRESHOLD}')
ax1.set_xlabel('Goodness (mean power)')
ax1.set_ylabel('Count')
ax1.set_title('Goodness Distribution: Positive vs Negative')
ax1.legend()

ax2 = axes[1]
goodness_diff = g_pos_final - g_neg_final
colors = ['blue' if y == 0 else 'red' for y in y_data]
ax2.scatter(range(len(goodness_diff)), goodness_diff.numpy(), c=colors, alpha=0.5, s=10)
ax2.axhline(y=0, color='black', linestyle='--')
ax2.set_xlabel('Sample Index')
ax2.set_ylabel('Goodness(Positive) - Goodness(Negative)')
ax2.set_title('Per-Sample Goodness Margin (should be > 0)')

plt.tight_layout()
plt.show()

print(f"Mean positive goodness: {g_pos_final.mean():.4f}")
print(f"Mean negative goodness: {g_neg_final.mean():.4f}")
print(f"Separation: {(g_pos_final.mean() - g_neg_final.mean()):.4f}")
print(f"Samples with correct margin (pos > neg): {(goodness_diff > 0).sum().item()}/{len(goodness_diff)}")

## 9. Decision Boundary

In [None]:
def plot_ff_decision_boundary(model, X_data, y_data, n_classes=2, label_scale=LABEL_SCALE, 
                               resolution=80, ax=None):
    """Plot decision boundary for Forward-Forward classifier."""
    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 8))
    
    x_min, x_max = X_data[:, 0].min() - 0.02, X_data[:, 0].max() + 0.02
    y_min, y_max = X_data[:, 1].min() - 0.02, X_data[:, 1].max() + 0.02
    
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, resolution),
        np.linspace(y_min, y_max, resolution)
    )
    
    grid_points = torch.FloatTensor(np.c_[xx.ravel(), yy.ravel()])
    N_grid = len(grid_points)
    
    model.eval()
    all_goodness = []
    
    for c in range(n_classes):
        y_test = torch.full((N_grid,), c, dtype=torch.long)
        X_test = embed_label(grid_points, y_test, n_classes, label_scale)
        X_test_seq = X_test.unsqueeze(1).expand(-1, SEQ_LEN, -1).clone()
        
        with torch.no_grad():
            _, layer_states = model(X_test_seq)
            total_goodness = torch.zeros(N_grid)
            for layer_idx in range(1, len(model.layers)):
                act = layer_states[layer_idx][:, -1, :]
                total_goodness += compute_goodness(act)
        
        all_goodness.append(total_goodness)
    
    goodness_matrix = torch.stack(all_goodness, dim=1)
    probs = torch.softmax(goodness_matrix * 10, dim=1)[:, 1].numpy()  # Scale for sharper boundary
    Z = probs.reshape(xx.shape)
    
    ax.contourf(xx, yy, Z, levels=50, cmap='RdBu', alpha=0.7)
    ax.contour(xx, yy, Z, levels=[0.5], colors='black', linewidths=2)
    
    for c, color in enumerate(['blue', 'red']):
        mask = y_data == c
        ax.scatter(X_data[mask, 0].numpy(), X_data[mask, 1].numpy(), c=color, 
                   s=15, alpha=0.6, edgecolors='white', linewidths=0.3)
    
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_title('Forward-Forward Decision Boundary (Hardware-Compatible)')
    ax.set_aspect('equal')
    
    return ax


fig, ax = plt.subplots(figsize=(8, 8))
plot_ff_decision_boundary(model, X_data, y_data, ax=ax)
plt.show()

## 10. Hyperparameter Exploration

In [None]:
# Try different architectures
experiments = [
    {'hidden_dims': [8]},
    {'hidden_dims': [16]},
    {'hidden_dims': [32]},
    {'hidden_dims': [8, 8]},
    {'hidden_dims': [16, 16]},
]

# Threshold calibrated for SOEN dynamics
SOEN_THRESHOLD = 0.1

results = []

print("Hyperparameter exploration (HARDWARE-COMPATIBLE)...")
print(f"Threshold: {SOEN_THRESHOLD} (calibrated for SOEN)")
print(f"Label scale: {LABEL_SCALE}")
print("=" * 60)

for exp in experiments:
    torch.manual_seed(42)
    
    model = build_ff_soen_model(exp['hidden_dims'], input_dim=4)
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    
    X_pos_exp, X_neg_exp = create_positive_negative_pairs(X_data, y_data, N_CLASSES, LABEL_SCALE)
    X_pos_seq_exp = X_pos_exp.unsqueeze(1).expand(-1, SEQ_LEN, -1).clone()
    X_neg_seq_exp = X_neg_exp.unsqueeze(1).expand(-1, SEQ_LEN, -1).clone()
    
    history_exp = train_forward_forward(
        model, X_pos_seq_exp, X_neg_seq_exp,
        n_epochs=200, lr=0.01, threshold=SOEN_THRESHOLD, verbose=False
    )
    
    final_acc = history_exp['accuracy'][-1]
    final_sep = history_exp['separation'][-1]
    results.append({
        'hidden_dims': str(exp['hidden_dims']),
        'params': n_params,
        'accuracy': final_acc,
        'separation': final_sep,
    })
    
    print(f"Hidden={str(exp['hidden_dims']):12s} | Params={n_params:5d} | "
          f"Acc={final_acc:.4f} | Sep={final_sep:.4f}")

print("=" * 60)

## 11. Compare with Backprop Baseline

In [None]:
def train_backprop_baseline(X_data, y_data, hidden_dims=(16, 16), n_epochs=300, lr=0.02):
    """Train a SOEN model with standard backpropagation for comparison."""
    
    # Build model (without label embedding, just 2D input)
    sim_cfg = SimulationConfig(
        dt=50.0,
        input_type="state",
        track_phi=False,
        track_power=False,
    )
    
    layers = [LayerConfig(layer_id=0, layer_type="Input", params={"dim": 2})]
    connections = []
    
    for i, hidden_dim in enumerate(hidden_dims):
        layer_id = i + 1
        layers.append(LayerConfig(
            layer_id=layer_id,
            layer_type="SingleDendrite",
            params={
                "dim": hidden_dim,
                "solver": "FE",
                "source_func": "Heaviside_fit_state_dep",
                "phi_offset": 0.02,
                "bias_current": 1.98,
                "gamma_plus": 0.0005,
                "gamma_minus": 1e-6,
            },
        ))
        connections.append(ConnectionConfig(
            from_layer=layer_id - 1,
            to_layer=layer_id,
            connection_type="all_to_all",
            learnable=True,
            params={"init": "xavier_uniform"},
        ))
    
    # Output layer (2 neurons for classification)
    output_id = len(hidden_dims) + 1
    layers.append(LayerConfig(
        layer_id=output_id,
        layer_type="SingleDendrite",
        params={
            "dim": 2,
            "solver": "FE",
            "source_func": "Heaviside_fit_state_dep",
            "phi_offset": 0.2,
            "bias_current": 1.98,
            "gamma_plus": 0.0005,
            "gamma_minus": 1e-6,
        },
    ))
    connections.append(ConnectionConfig(
        from_layer=output_id - 1,
        to_layer=output_id,
        connection_type="all_to_all",
        learnable=True,
        params={"init": "xavier_uniform"},
    ))
    
    model = SOENModelCore(
        sim_config=sim_cfg,
        layers_config=layers,
        connections_config=connections,
    )
    
    # Prepare data
    X_seq = X_data.unsqueeze(1).expand(-1, SEQ_LEN, -1).clone()
    
    # Train with BCE on s1 - s0
    model.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()
    
    y_target = y_data.float().unsqueeze(1)
    accuracies = []
    
    for epoch in range(n_epochs):
        optimizer.zero_grad()
        final_hist, _ = model(X_seq)
        output = final_hist[:, -1, :]
        logits = (output[:, 1] - output[:, 0]).unsqueeze(1)
        
        loss = criterion(logits, y_target)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        
        with torch.no_grad():
            preds = (torch.sigmoid(logits) > 0.5).float()
            acc = (preds == y_target).float().mean().item()
        accuracies.append(acc)
    
    return accuracies


# Train backprop baseline
print("Training backprop baseline...")
torch.manual_seed(42)
backprop_accs = train_backprop_baseline(X_data, y_data, hidden_dims=[16, 16], n_epochs=300)
print(f"Backprop final accuracy: {backprop_accs[-1]:.4f}")

In [None]:
# Compare training curves
fig, ax = plt.subplots(figsize=(10, 5))

ax.plot(history['accuracy'], label='Forward-Forward', color='coral', lw=2)
ax.plot(backprop_accs, label='Backpropagation', color='steelblue', lw=2)
ax.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5)

ax.set_xlabel('Epoch')
ax.set_ylabel('Accuracy')
ax.set_title('Forward-Forward vs Backpropagation')
ax.legend()
ax.grid(True, alpha=0.3)
ax.set_ylim(0.4, 1.05)

plt.tight_layout()
plt.show()

print(f"\nFinal Comparison:")
print(f"  Forward-Forward: {history['accuracy'][-1]:.4f}")
print(f"  Backpropagation: {backprop_accs[-1]:.4f}")

## 12. Conclusions

In [None]:
print("=" * 70)
print("CONCLUSIONS: HARDWARE-COMPATIBLE FORWARD-FORWARD FOR SOEN")
print("=" * 70)

print(f"\n1. PERFORMANCE:")
print(f"   Forward-Forward accuracy: {history['accuracy'][-1]:.4f}")
print(f"   Backpropagation accuracy: {backprop_accs[-1]:.4f}")

print(f"\n2. GOODNESS METRICS:")
print(f"   Mean positive goodness: {history['goodness_pos'][-1]:.4f}")
print(f"   Mean negative goodness: {history['goodness_neg'][-1]:.4f}")
print(f"   Separation: {history['separation'][-1]:.4f}")
print(f"   Threshold: {THRESHOLD}")

print(f"\n3. HARDWARE COMPATIBILITY:")
print(f"   ✓ Goodness = mean(I²) = power measurement (local)")
print(f"   ✓ Label embedding = optical input modulation")
print(f"   ✓ No normalization (no global computation)")
print(f"   ✓ Inference = compare goodness across hypotheses")

print(f"\n4. WHAT TRANSFERS TO HARDWARE:")
print(f"   ✓ Trained weights (synaptic strengths)")
print(f"   ✓ Network architecture (layer connectivity)")
print(f"   ✓ Inference procedure (one forward pass per class hypothesis)")
print(f"   ✗ Loss function (simulation only)")
print(f"   ✗ Optimizer (simulation only)")

print(f"\n5. INFERENCE PROCEDURE (Hardware-Implementable):")
print(f"   1. Input sample with class-0 label → measure total power")
print(f"   2. Input sample with class-1 label → measure total power")
print(f"   3. Predict class with higher power")

print("\n" + "=" * 70)