# OrderedLearning Analysis Walkthrough

This notebook demonstrates how to use the OrderedLearning analysis tools to explore experiment results.
It uses **synthetic data** so you can run it without completing an actual experiment first.

## What you'll learn

1. How experiment metric data is structured
2. Loading experiment data with `analysis_tools.data_loader`
3. Plotting metrics with `analysis_tools.visualize`
4. Switching between dark and paper (publication-ready) styles
5. Computing convergence statistics
6. Building summary tables for comparison
7. Using the CLI analysis tools (`analyze_experiment.py`)

## 1. Generate Synthetic Experiment Data

We'll create fake metric logs that mimic what a real `mod_arithmetic` experiment produces:
two strategies ("stride" and "random"), each logging loss and accuracy curves.

In [None]:
import json
import math
import os
import tempfile

import numpy as np

# Create a temporary output directory
output_dir = tempfile.mkdtemp(prefix="ol_demo_")
experiment = "demo_experiment"

np.random.seed(42)

for strategy in ["stride", "random"]:
    strat_dir = os.path.join(output_dir, experiment, strategy)
    os.makedirs(strat_dir, exist_ok=True)

    # Generate synthetic JSONL metrics
    with open(os.path.join(strat_dir, f"{strategy}.jsonl"), "w") as f:
        for step in range(0, 5001, 10):
            progress = step / 5000
            # Stride strategy converges faster
            rate = 4.0 if strategy == "stride" else 2.5
            base_loss = 4.5 * math.exp(-rate * progress) + 0.05
            val_acc = 100.0 * (1 - math.exp(-(rate + 0.5) * progress))
            train_acc = 100.0 * (1 - math.exp(-(rate + 1.0) * progress))

            # Add Gaussian noise; clip accuracies after the noise so they stay within [0, 100]
            loss = base_loss + np.random.normal(0, 0.03)
            noisy_val = float(np.clip(val_acc + np.random.normal(0, 0.8), 0.0, 100.0))
            noisy_train = float(np.clip(train_acc + np.random.normal(0, 0.4), 0.0, 100.0))

            record = {
                "step": step,
                "hook_point": "SNAPSHOT",
                "training_metrics/loss": round(loss, 6),
                "training_metrics/val_acc": round(noisy_val, 4),
                "training_metrics/train_acc": round(noisy_train, 4),
            }
            f.write(json.dumps(record) + "\n")

    # Write minimal experiment_config.json
    config = {
        "experiment_name": experiment,
        "strategy": strategy,
        "epochs": 5000,
        "seed": 42,
        "lr": 0.001,
        "batch_size": 256,
    }
    with open(os.path.join(strat_dir, "experiment_config.json"), "w") as f:
        json.dump(config, f, indent=2)

print(f"Synthetic data created in: {output_dir}")
print(f"Strategies: stride, random")
print(f"Steps: 0 to 5000 (every 10)")
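Because the generator adds noise around closed-form curves, the noise-free validation accuracy, 100·(1 − exp(−(rate + 0.5)·progress)), crosses a threshold at a predictable step. For 95% the closed form is progress ≥ ln(20)/(rate + 0.5). The sketch below evaluates the noise-free curve on the same 10-step grid, giving a reference point for the convergence analysis in section 5; the helper name `noise_free_crossing` is ours, and because it ignores the noise, the noisy logs can cross somewhat earlier or later.

```python
import math


def noise_free_crossing(rate, threshold=95.0, max_step=5000, every=10):
    """First logged step where the noise-free val_acc curve reaches `threshold` (%).

    Mirrors the synthetic generator: val_acc = 100 * (1 - exp(-(rate + 0.5) * progress)).
    Returns None if the threshold is never reached within max_step.
    """
    for step in range(0, max_step + 1, every):
        progress = step / max_step
        val_acc = 100.0 * (1 - math.exp(-(rate + 0.5) * progress))
        if val_acc >= threshold:
            return step
    return None


# Closed form for 95%: progress >= ln(1 / 0.05) / (rate + 0.5) = ln(20) / (rate + 0.5)
predicted = {name: noise_free_crossing(rate) for name, rate in [("stride", 4.0), ("random", 2.5)]}
for name, step in predicted.items():
    print(f"{name:8s} noise-free 95% crossing at step {step}")
```

With these rates, stride's noise-free curve reaches 95% at step 3330, while random only gets there at the very last step (5000).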

## 2. Load Experiment Data

The `load_experiment_data()` function discovers strategy directories, loads JSONL (preferred) or CSV files,
normalizes columns, and returns a tidy pandas DataFrame.

In [None]:
from analysis_tools.data_loader import load_experiment_data

df = load_experiment_data(experiment, output_dir=output_dir)

print(f"Loaded {len(df)} rows across {df['strategy'].nunique()} strategies")
print(f"Columns: {list(df.columns)}")
print()
df.head(10)

The DataFrame has one row per (strategy, step) pair. Columns include:
- `step` — training step number
- `strategy` — which strategy produced this data point
- `training_metrics/loss`, `training_metrics/val_acc`, etc. — metric values from hooks
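To make the tidy layout concrete, here is a minimal sketch of how per-strategy JSONL logs could be stacked into one DataFrame. It is not the `load_experiment_data()` implementation: `load_jsonl_sketch` is a hypothetical helper that assumes the `{experiment}/{strategy}/{strategy}.jsonl` layout written by the generator cell, and it skips the CSV fallback and column normalization.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def load_jsonl_sketch(output_dir, experiment):
    """Hypothetical loader: stack {experiment}/{strategy}/{strategy}.jsonl into one tidy frame."""
    frames = []
    for strat_dir in sorted(Path(output_dir, experiment).iterdir()):
        log = strat_dir / f"{strat_dir.name}.jsonl"
        if not log.is_file():
            continue  # not a strategy directory, or no JSONL log in it
        sdf = pd.read_json(log, lines=True)  # one JSON record per line
        sdf["strategy"] = strat_dir.name
        frames.append(sdf)
    if not frames:
        raise FileNotFoundError(f"no JSONL logs under {Path(output_dir, experiment)}")
    # One row per (strategy, step), sorted so per-strategy slices are in step order
    return pd.concat(frames, ignore_index=True).sort_values(["strategy", "step"], ignore_index=True)


# Usage on a tiny throwaway layout
with tempfile.TemporaryDirectory() as tmp:
    for name in ["stride", "random"]:
        d = Path(tmp, "demo", name)
        d.mkdir(parents=True)
        rows = [{"step": s, "training_metrics/loss": 1.0 / (s + 1)} for s in (0, 10, 20)]
        (d / f"{name}.jsonl").write_text("\n".join(json.dumps(r) for r in rows) + "\n")
    tidy = load_jsonl_sketch(tmp, "demo")

print(tidy.shape)  # (6, 3)
```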

## 3. Plot Metrics

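The `analysis_tools.visualize` module provides `OLFigure` for subplot management
and `plot_time_series()` for line plots with optional EMA smoothing. To show each step explicitly,
the cell below builds the comparison by hand with plain matplotlib and the module's `ema_smooth()` helper.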

In [None]:
import matplotlib.pyplot as plt
from analysis_tools.visualize import ema_smooth
from analysis_tools.style import apply_style, STRATEGY_PALETTE

# Apply the dark theme (matches the console theme)
apply_style("dark")

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

metrics = ["training_metrics/loss", "training_metrics/val_acc"]
labels = ["Loss", "Validation Accuracy (%)"]

for i, (metric, label) in enumerate(zip(metrics, labels)):
    ax = axes[i]
    for j, strategy in enumerate(["stride", "random"]):
        sdf = df[df["strategy"] == strategy]
        raw = sdf[metric].values
        smoothed = ema_smooth(raw, weight=0.9)
        # Raw data as faint background
        ax.plot(sdf["step"], raw, alpha=0.15, color=STRATEGY_PALETTE[j])
        # Smoothed as solid line
        ax.plot(sdf["step"], smoothed, label=strategy, color=STRATEGY_PALETTE[j])
    ax.set_xlabel("Step")
    ax.set_ylabel(label)
    ax.legend()
    ax.set_title(label)

fig.suptitle("Demo Experiment: Strategy Comparison", fontsize=14)
fig.tight_layout()
plt.show()
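For reference, a common exponential moving average behind a `weight`-style smoothing parameter (the TensorBoard-style formulation) is s₀ = x₀, sₜ = w·sₜ₋₁ + (1 − w)·xₜ. The sketch below implements that recurrence with NumPy. Whether `ema_smooth()` uses exactly this form (for example, with or without debiasing) is an assumption, so treat `ema_smooth_sketch` as a hypothetical stand-in. If `ema_smooth()` follows this form, a larger weight gives a smoother but more lagged curve, which is why the smoothed lines above trail the raw data.

```python
import numpy as np


def ema_smooth_sketch(values, weight=0.9):
    """Hypothetical EMA: s[0] = x[0], s[t] = weight * s[t-1] + (1 - weight) * x[t]."""
    x = np.asarray(values, dtype=float)
    out = np.empty_like(x)
    if x.size == 0:
        return out
    out[0] = x[0]
    for t in range(1, x.size):
        out[t] = weight * out[t - 1] + (1 - weight) * x[t]
    return out


# A jump from 0 to 10 is approached gradually; the higher weight lags more
step_signal = [0.0] + [10.0] * 5
print(np.round(ema_smooth_sketch(step_signal, weight=0.5), 3))
print(np.round(ema_smooth_sketch(step_signal, weight=0.9), 3))
```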

## 4. Paper-Ready Style

Switch to the `paper` style for publication-quality figures.
This uses the Wong colorblind-friendly palette (Nature Methods, 2011)
with a white background and clean typography.

In [None]:
# Switch to paper style
apply_style("paper")

fig, ax = plt.subplots(figsize=(7, 4))

for j, strategy in enumerate(["stride", "random"]):
    sdf = df[df["strategy"] == strategy]
    smoothed = ema_smooth(sdf["training_metrics/val_acc"].values, weight=0.95)
    ax.plot(sdf["step"], smoothed, label=strategy, color=STRATEGY_PALETTE[j], linewidth=1.5)

ax.set_xlabel("Training Step")
ax.set_ylabel("Validation Accuracy (%)")
ax.legend(frameon=True)
ax.set_title("Convergence Comparison")
fig.tight_layout()
plt.show()

# Switch back to dark for remaining cells
apply_style("dark")
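If you need a similar look without `analysis_tools`, here is a minimal plain-matplotlib sketch. The eight hex codes are the Wong (2011) palette. The helper name `paper_rc`, the font size, and the spine settings are illustrative assumptions, not the values `apply_style("paper")` actually uses. Passing the dict to `matplotlib.rc_context` scopes the style to a `with` block, so there is nothing to switch back afterwards.

```python
import matplotlib
import matplotlib.pyplot as plt
from cycler import cycler
from matplotlib.colors import to_hex

# Wong, "Points of view: Color blindness", Nature Methods 8, 441 (2011)
WONG_PALETTE = [
    "#000000",  # black
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
]


def paper_rc(font_size=9):
    """Hypothetical rcParams for white-background, colorblind-safe figures."""
    return {
        "figure.facecolor": "white",
        "axes.facecolor": "white",
        "axes.prop_cycle": cycler(color=WONG_PALETTE[1:]),  # skip black for data lines
        "axes.spines.top": False,
        "axes.spines.right": False,
        "font.size": font_size,
    }


# The style applies only inside the with-block; global rcParams are restored on exit
with matplotlib.rc_context(paper_rc()):
    fig, ax = plt.subplots(figsize=(7, 4))
    for k in range(3):
        ax.plot([0, 1], [k, k + 1], label=f"series {k}")
    ax.legend(frameon=True)
    line_colors = [to_hex(line.get_color()) for line in ax.get_lines()]
    plt.close(fig)

print(line_colors)
```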

## 5. Convergence Analysis

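Find the first training step at which each strategy's raw (unsmoothed) validation accuracy reaches a target threshold.
Because the raw curve is noisy, a single upward spike can cross the threshold before the underlying trend does.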

In [None]:
threshold = 95.0

print(f"Convergence analysis: val_acc >= {threshold}%")
print("-" * 45)

for strategy in ["stride", "random"]:
    sdf = df[df["strategy"] == strategy]
    above = sdf[sdf["training_metrics/val_acc"] >= threshold]
    if len(above) > 0:
        first_step = int(above.iloc[0]["step"])
        print(f"  {strategy:15s}  reached {threshold}% at step {first_step}")
    else:
        print(f"  {strategy:15s}  did not reach {threshold}%")
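The CLI `convergence` tool in section 7 is run with `--smooth 0.9`, which presumably smooths the curve before thresholding. Here is a minimal sketch of that idea, assuming a simple EMA whose initial value is the first sample. The helper `first_crossing` is hypothetical, and the CLI's exact smoothing may differ. It shows how smoothing suppresses a spurious early crossing.

```python
import numpy as np


def first_crossing(steps, values, threshold, weight=None):
    """Return the first step where values >= threshold, optionally after EMA smoothing.

    weight=None uses the raw values; otherwise s[0] = x[0],
    s[t] = weight * s[t-1] + (1 - weight) * x[t]. Returns None if never reached.
    """
    x = np.asarray(values, dtype=float)
    if weight is not None and x.size:
        s = np.empty_like(x)
        s[0] = x[0]
        for t in range(1, x.size):
            s[t] = weight * s[t - 1] + (1 - weight) * x[t]
        x = s
    hits = np.flatnonzero(x >= threshold)
    return int(np.asarray(steps)[hits[0]]) if hits.size else None


# A single noisy spike at step 10 fools the raw check but not the smoothed one
steps = [0, 10, 20, 30]
accs = [90.0, 96.0, 94.0, 97.0]
print(first_crossing(steps, accs, 95.0))              # raw: 10
print(first_crossing(steps, accs, 95.0, weight=0.5))  # smoothed: 30
```

On the notebook's data, `first_crossing(sdf["step"].values, sdf["training_metrics/val_acc"].values, threshold, weight=0.9)` gives a smoothed counterpart to the raw check above.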

## 6. Summary Table

Aggregate final metrics across strategies for comparison.

In [None]:
# Get final values for each strategy
summary_rows = []
for strategy in ["stride", "random"]:
    sdf = df[df["strategy"] == strategy]
    final = sdf.iloc[-1]
    summary_rows.append({
        "Strategy": strategy,
        "Final Loss": f"{final['training_metrics/loss']:.4f}",
        "Final Val Acc": f"{final['training_metrics/val_acc']:.2f}%",
        "Min Loss": f"{sdf['training_metrics/loss'].min():.4f}",
        "Max Val Acc": f"{sdf['training_metrics/val_acc'].max():.2f}%",
    })

import pandas as pd
summary_df = pd.DataFrame(summary_rows).set_index("Strategy")
print("Strategy Comparison")
print("=" * 60)
print(summary_df.to_string())
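The loop above can also be written as a single pandas named aggregation. This sketch keeps the values numeric so formatting can happen at display time, and it sorts by step first so `"last"` really is the final logged step. It assumes the tidy `strategy`/`step` columns described in section 2 and uses a small stand-in frame with made-up values so it runs on its own. Swap `df_demo` for `df` to apply it to the loaded data.

```python
import pandas as pd

# Stand-in for the tidy frame returned by load_experiment_data()
df_demo = pd.DataFrame({
    "strategy": ["stride"] * 3 + ["random"] * 3,
    "step": [0, 10, 20] * 2,
    "training_metrics/loss": [4.5, 1.0, 0.2, 4.5, 2.0, 0.9],
    "training_metrics/val_acc": [0.0, 80.0, 99.0, 0.0, 50.0, 90.0],
})

summary = (
    df_demo.sort_values("step")  # so "last" picks the highest step in each group
    .groupby("strategy")
    .agg(
        final_loss=("training_metrics/loss", "last"),
        min_loss=("training_metrics/loss", "min"),
        final_val_acc=("training_metrics/val_acc", "last"),
        max_val_acc=("training_metrics/val_acc", "max"),
    )
)
print(summary.round(4))
```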

## 7. CLI Analysis Tools

The `analyze_experiment.py` script provides the same analysis capabilities from the command line.
This is useful for batch processing, scripting, or when you prefer not to use a notebook.

Available tools: `metric_plot`, `convergence`, `compare`, `correlation`, `layer_dynamics`, `weight_compare`, `export_table`.

All tools accept `--output-dir` to point at your experiment data. Plots are saved to
`{output_dir}/{experiment}/analysis/{tool_name}/`.

In [None]:
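import subprocess
import sys

# Assumes the kernel's working directory is this notebook's folder, two levels
# below the project root (os.getcwd() is the kernel's cwd, not the notebook file's path)
project_root = os.path.abspath(os.path.join(os.getcwd(), "..", ".."))

def run_cli(args):
    """Run analyze_experiment.py with the notebook's interpreter and print its output."""
    # sys.executable runs the same Python environment that has analysis_tools installed
    cmd = [sys.executable, "analyze_experiment.py"] + args
    print(f"$ python analyze_experiment.py {' '.join(args)}\n")
    result = subprocess.run(cmd, capture_output=True, text=True, cwd=project_root)
    if result.stdout:
        print(result.stdout)
    if result.returncode != 0:
        print(f"[exit code {result.returncode}]")
        if result.stderr:
            print(result.stderr)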

### metric_plot

Plot loss and accuracy curves with EMA smoothing. This produces the same kind of
overlay plot we made manually in section 3.

In [None]:
# Plot loss and val_acc with smoothing
run_cli([
    "demo_experiment", "metric_plot",
    "--metrics", "training_metrics/loss", "training_metrics/val_acc",
    "--output-dir", output_dir,
    "--smooth", "0.9",
])
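The `metric_plot` run above saves its figures under `{output_dir}/{experiment}/analysis/metric_plot/`, following the convention described at the start of this section. A small helper can list what a tool wrote. The names `analysis_dir` and `list_analysis_outputs` are hypothetical, and the file types found depend on what each tool actually saves.

```python
from pathlib import Path


def analysis_dir(output_dir, experiment, tool_name):
    """Directory where a CLI tool writes results: {output_dir}/{experiment}/analysis/{tool_name}/."""
    return Path(output_dir) / experiment / "analysis" / tool_name


def list_analysis_outputs(output_dir, experiment, tool_name):
    """Sorted list of files a tool wrote; empty if the directory does not exist yet."""
    tool_dir = analysis_dir(output_dir, experiment, tool_name)
    if not tool_dir.is_dir():
        return []
    return sorted(p for p in tool_dir.rglob("*") if p.is_file())


# In this notebook, for example:
# for path in list_analysis_outputs(output_dir, experiment, "metric_plot"):
#     print(path.relative_to(output_dir))
```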

### convergence

Find when each strategy reaches a target accuracy, and plot a bar chart comparing them.

In [None]:
# Time-to-threshold: when does val_acc first reach 95%?
run_cli([
    "demo_experiment", "convergence",
    "--metrics", "training_metrics/val_acc",
    "--threshold", "95.0",
    "--output-dir", output_dir,
    "--smooth", "0.9",
])

### export_table

Generate a Markdown or LaTeX table of final metrics, ready to paste into a paper.
Use `--bold-best` to highlight the best strategy per metric.

In [None]:
# Export a Markdown comparison table with best values bolded
run_cli([
    "demo_experiment", "export_table",
    "--metrics", "training_metrics/loss", "training_metrics/val_acc",
    "--output-dir", output_dir,
    "--table-format", "markdown",
    "--bold-best",
])
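To illustrate what "best per metric" means, here is a hypothetical sketch of a Markdown table builder that bolds the minimum for loss-like metrics and the maximum otherwise. The `lower_is_better` substring rule, the function name, and the made-up input numbers are assumptions for illustration. The real `export_table` output format may differ.

```python
def markdown_table(rows, metrics, lower_is_better=("loss",), fmt="{:.4f}"):
    """Build a Markdown table from {strategy: {metric: value}}, bolding the best value per metric.

    A metric is treated as lower-is-better if its name contains any substring in lower_is_better.
    """
    best = {}
    for m in metrics:
        values = [r[m] for r in rows.values()]
        best[m] = min(values) if any(k in m for k in lower_is_better) else max(values)

    lines = ["| Strategy | " + " | ".join(metrics) + " |",
             "|---" * (len(metrics) + 1) + "|"]
    for name, r in rows.items():
        cells = []
        for m in metrics:
            text = fmt.format(r[m])
            cells.append(f"**{text}**" if r[m] == best[m] else text)
        lines.append(f"| {name} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


table = markdown_table(
    {"stride": {"loss": 0.13, "val_acc": 98.9}, "random": {"loss": 0.41, "val_acc": 95.0}},
    metrics=["loss", "val_acc"],
    fmt="{:.2f}",
)
print(table)
```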

## 8. Cleanup

Remove the temporary synthetic data.

In [None]:
import shutil
shutil.rmtree(output_dir)
print(f"Cleaned up: {output_dir}")
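If you reuse this pattern in a script, `tempfile.TemporaryDirectory` removes the directory automatically when the `with` block exits, even if an exception is raised partway through, so no explicit `shutil.rmtree` call is needed. In a notebook, where the data must outlive a single cell, `mkdtemp()` plus the explicit cleanup above is the practical choice. A minimal sketch:

```python
import os
import tempfile

with tempfile.TemporaryDirectory(prefix="ol_demo_") as tmp_dir:
    # ... write synthetic logs and run the analysis here ...
    marker = os.path.join(tmp_dir, "marker.txt")
    with open(marker, "w") as f:
        f.write("temporary")
    existed_inside = os.path.exists(tmp_dir)

# The directory and everything in it is gone once the block exits
existed_after = os.path.exists(tmp_dir)
print(existed_inside, existed_after)  # True False
```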

## Next Steps

- **Run a real experiment:** `python run_experiment.py mod_arithmetic --strategy stride --with-hooks full --hook-jsonl`
- **Use the CLI analysis tools:** `python analyze_experiment.py mod_arithmetic metric_plot --metrics training_metrics/loss`
- **Read the getting-started guide:** [docs/getting-started.md](../../docs/getting-started.md)
- **Explore the hook reference:** [docs/instrumentation-hooks.md](../../docs/instrumentation-hooks.md)