# Phase Transition Simulation

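This notebook demonstrates the sharp phase transition in RLVR (reinforcement learning with verifiable rewards), which is governed by Youden's index:

$$J = \text{TPR} - \text{FPR} = (1 - \text{FN}) - \text{FP}$$

where $\text{FN}$ and $\text{FP}$ are the false negative and false positive rates.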

- **J > 0**: Learning proceeds, bad modes decay
- **J = 0**: Neutral drift, no learning
- **J < 0**: Anti-learning, model collapses
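
As a small worked example of the definition above, the sketch below computes $J$ from the two error rates and names the regime by its sign. The helpers `youden_j` and `regime`, and the tolerance `tol`, are illustrative assumptions rather than part of the notebook.

```python
def youden_j(fp: float, fn: float) -> float:
    """Youden's index J = TPR - FPR = (1 - FN) - FP."""
    return (1.0 - fn) - fp


def regime(j: float, tol: float = 1e-12) -> str:
    """Name the regime by the sign of J (tol is a hypothetical float tolerance)."""
    if j > tol:
        return "learning"
    if j < -tol:
        return "anti-learning"
    return "neutral"


# Symmetric noise levels (FP = FN), chosen only for illustration
for fp, fn in [(0.0, 0.0), (0.2, 0.2), (0.5, 0.5), (0.6, 0.6)]:
    j = youden_j(fp, fn)
    print(f"FP={fp:.1f}, FN={fn:.1f} -> J={j:+.2f} ({regime(j)})")
```

With symmetric noise the sign of $J$ flips once the error rates pass 0.5, the point at which the reward carries no information.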

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from dataclasses import dataclass, replace

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({'font.size': 12, 'figure.figsize': (10, 6)})

## Configuration

In [None]:
@dataclass
class Config:
    T: float = 500.0      # Total time
    dt: float = 1.0       # Time step
    eta: float = 0.01     # Learning rate
    p0: float = 0.5       # Initial bad mode probability
    FP: float = 0.0       # False positive rate
    FN: float = 0.0       # False negative rate
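    n_runs: int = 10      # Ensemble size (not used by the deterministic ODE in this section)
    seed: int = 42        # Random seed (not used by the deterministic ODE in this section)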

## Core Dynamics

The mean-field ODE for the bad mode probability $p(t)$:

$$\frac{dp}{dt} = -\eta \cdot \frac{J}{\sigma} \cdot [p(1-p)]^2$$

where $\sigma = \sqrt{q(1-q)}$ and $q = (1 - \text{FN}) - J \cdot p$.
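
One way to read $q$, offered here as an interpretation rather than something the notebook states, is as the probability that a sampled response receives a positive reward. Under that assumption, a good sample (probability $1-p$) is rewarded with probability $1-\text{FN}$ and a bad sample (probability $p$) is rewarded with probability $\text{FP}$, which reproduces the expression above:

$$q = (1-p)(1-\text{FN}) + p\,\text{FP} = (1-\text{FN}) - p\,\big[(1-\text{FN}) - \text{FP}\big] = (1-\text{FN}) - J \cdot p$$

On this reading, $\sigma = \sqrt{q(1-q)}$ is the standard deviation of a Bernoulli reward with mean $q$.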

In [None]:
def compute_J(FP, FN):
    """Compute Youden's index."""
    return 1 - FN - FP

def step_p(p, eta, FP, FN, dt):
    """Advance p by one explicit Euler step of the mean-field ODE."""
    J = compute_J(FP, FN)
    q = (1 - FN) - J * p
    # Floor q(1 - q) so sigma stays positive when q reaches 0 or 1
    sigma = np.sqrt(np.clip(q * (1 - q), 1e-12, None))
    dp = -eta * (J / sigma) * (p * (1 - p))**2
    # Keep p strictly inside (0, 1)
    return np.clip(p + dp * dt, 1e-9, 1 - 1e-9)

def simulate(cfg: Config):
    """Integrate the mean-field ODE over [0, T]; returns (t, bad_prob, accuracy)."""
    steps = int(round(cfg.T / cfg.dt))
    t = np.arange(steps + 1) * cfg.dt  # time grid with spacing exactly dt
    p = np.empty(steps + 1)
    p[0] = cfg.p0
    
    for k in range(steps):
        p[k+1] = step_p(p[k], cfg.eta, cfg.FP, cfg.FN, cfg.dt)
    
    return t, p, 1 - p  # t, bad_prob, accuracy
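
The loop above is an explicit (forward) Euler integration. One quick way to check that `dt = 1.0` is fine enough is to rerun with a smaller step and compare $p(T)$. The following is a self-contained sketch, not part of the original notebook: `euler_final_p` is a hypothetical helper that mirrors `step_p`, and the values FP = FN = 0.2 are just an example.

```python
import numpy as np


def euler_final_p(p0, eta, FP, FN, T, dt):
    """Integrate the mean-field ODE with explicit Euler and return p(T)."""
    J = 1 - FN - FP
    p = p0
    for _ in range(int(round(T / dt))):
        q = (1 - FN) - J * p
        sigma = np.sqrt(max(q * (1 - q), 1e-12))  # same floor as step_p
        p += -eta * (J / sigma) * (p * (1 - p)) ** 2 * dt
        p = min(max(p, 1e-9), 1 - 1e-9)           # same clipping as step_p
    return p


coarse = euler_final_p(0.5, 0.01, 0.2, 0.2, T=500.0, dt=1.0)
fine = euler_final_p(0.5, 0.01, 0.2, 0.2, T=500.0, dt=0.1)
print(f"p(T) with dt=1.0: {coarse:.6f}")
print(f"p(T) with dt=0.1: {fine:.6f}")
print(f"difference:       {abs(coarse - fine):.2e}")
```

If the two values agree to the precision that matters for the plots, the coarser step is adequate; if not, reducing `dt` in `Config` is the simplest remedy.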

## Simulate Different Noise Regimes

In [None]:
base = Config()

scenarios = [
    ("J = 1.0 (Clean)", replace(base, FP=0.0, FN=0.0)),
    ("J = 0.6 (Moderate noise)", replace(base, FP=0.2, FN=0.2)),
    ("J = 0.2 (High noise)", replace(base, FP=0.4, FN=0.4)),
    ("J = 0.0 (Critical)", replace(base, FP=0.5, FN=0.5)),
    ("J = -0.2 (Anti-learning)", replace(base, FP=0.6, FN=0.6)),
]

results = []
for name, cfg in scenarios:
    t, p, acc = simulate(cfg)
    J = compute_J(cfg.FP, cfg.FN)
    results.append((name, J, t, p, acc))

## Visualize the Phase Transition

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

cmap = plt.cm.RdYlBu

for name, J, t, p, acc in results:
    color = cmap((J + 0.3) / 1.3)  # Map J from [-0.3, 1.0] onto the colormap range [0, 1]
    axes[0].plot(t, p, label=f'{name}', color=color, linewidth=2)
    axes[1].plot(t, acc * 100, label=f'{name}', color=color, linewidth=2)

axes[0].set_xlabel('Step t')
axes[0].set_ylabel('Bad mode probability p(t)')
axes[0].set_title('Bad Mode Decay')
axes[0].legend()
axes[0].set_ylim(0, 1)

axes[1].set_xlabel('Step t')
axes[1].set_ylabel('Accuracy (%)')
axes[1].set_title('Learning Accuracy')
axes[1].legend()
axes[1].set_ylim(0, 100)

plt.suptitle('Phase Transition: Rate vs Fate', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

## Key Insight

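When **J > 0**, the bad mode probability $p(t) \to 0$ as $t \to \infty$ for any starting point $0 < p_0 < 1$. Noise lowers $J$ and therefore slows convergence, but it changes only the *rate*, not the final outcome (the *fate*).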
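
To make the effect on the rate concrete, the sketch below measures how long the mean-field dynamics take to push $p$ below a threshold at several noise levels. It is a self-contained illustration under assumptions: `time_to_threshold`, the threshold of 0.25, and the `max_steps` cap are hypothetical choices, while the update rule copies `step_p`.

```python
import numpy as np


def time_to_threshold(FP, FN, p0=0.5, eta=0.01, dt=1.0,
                      threshold=0.25, max_steps=200_000):
    """Return the first time at which p < threshold, or None if never reached."""
    J = 1 - FN - FP
    p = p0
    for k in range(max_steps):
        if p < threshold:
            return k * dt
        q = (1 - FN) - J * p
        sigma = np.sqrt(max(q * (1 - q), 1e-12))
        p = p - eta * (J / sigma) * (p * (1 - p)) ** 2 * dt
        p = min(max(p, 1e-9), 1 - 1e-9)
    return None


# Symmetric noise levels with J > 0 (J = 1.0, 0.6, 0.2)
hit_times = {noise: time_to_threshold(FP=noise, FN=noise)
             for noise in (0.0, 0.2, 0.4)}
for noise, t_hit in hit_times.items():
    print(f"FP=FN={noise:.1f}: p < 0.25 first reached at t = {t_hit}")
```

All three runs reach the threshold; the noisier ones simply take longer.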

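When **J = 0**, there is no net learning signal: the mean-field drift $dp/dt$ vanishes, so $p(t)$ stays at $p_0$ in this deterministic simulation, and any movement in a finite-sample run would come from sampling noise alone.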

When **J < 0**, the dynamics reverse: $dp/dt > 0$, so the model *anti-learns* and $p(t) \to 1$, collapsing onto the bad modes.
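
The three regimes can be read directly off the sign of the ODE's right-hand side. The sketch below evaluates $dp/dt$ at $p = 0.5$ for one example in each regime; `drift` is a hypothetical helper that restates the mean-field formula, and the noise levels are illustrative.

```python
import numpy as np


def drift(p, FP, FN, eta=0.01):
    """Right-hand side dp/dt of the mean-field ODE at bad-mode probability p."""
    J = 1 - FN - FP
    q = (1 - FN) - J * p
    sigma = np.sqrt(max(q * (1 - q), 1e-12))
    return -eta * (J / sigma) * (p * (1 - p)) ** 2


# One symmetric noise level (FP = FN) per regime
for label, rate in [("J > 0", 0.4), ("J = 0", 0.5), ("J < 0", 0.6)]:
    print(f"{label}: dp/dt at p=0.5 is {drift(0.5, FP=rate, FN=rate):+.2e}")
```

A negative value means the bad mode shrinks, zero means it is frozen, and a positive value means it grows.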