# ALAIN‑Kit: DIY LM Arena on Poe — Tutorial Notebook

**Provider**: Poe

**Default teacher**: GPT-OSS-20B

**Model roster**: GPT-5, Claude-Sonnet-4, Grok-4, Gemini-2.5-Pro, Llama-3.1-405B, Mixtral-8x22B, GPT-OSS-20B

**Generated at**: 2025-09-19 21:50:25 CDT

Purpose: Build a Next.js App Router arena that compares two Poe models side by side, streams outputs, records a vote, updates ratings, and logs telemetry.

**Read me**: This notebook includes runnable setup checks and copy-ready TypeScript blocks you can paste into your Next.js project.

## Environment guidance
Use secrets. Do not hardcode API keys.

- Local app: put keys in `.env.local` and add `.env*` to `.gitignore`.
- Colab: store secrets with the Secrets UI. Avoid printing keys.

The cell below installs the minimal Python dependencies for smoke checks and sets the Poe base URL for OpenAI-compatible calls.


In [None]:
!pip -q install "openai>=1.0.0" "ipywidgets>=8.0.0"
%env OPENAI_BASE_URL=https://api.poe.com/v1
# Set OPENAI_API_KEY through Colab/Notebook secrets UI, not here.
# Example (do not run in shared environments):
# %env OPENAI_API_KEY=YOUR_POE_API_KEY

## Node dependencies for your Next.js app
Run in your project root:
```bash
npm install openai zustand d3
# Optional UI libs: tailwindcss @tanstack/react-query recharts framer-motion lucide-react clsx
```


# 1. Scaffold overview
You will create:

- Poe client via OpenAI SDK with baseURL override
- Model guard and defaults from env
- Telemetry types
- Health and smoke API routes
- Starter page

Paste the TypeScript files below into your Next.js project (App Router).


## Background primer: Server-Sent Events
- SSE frames use `event:` and `data:` lines separated by a blank line.
- Browser `EventSource` supports GET requests only, so the prompt travels in the query string; for large prompts, switch to POST with NDJSON read via `fetch`.
- Ensure the server writes actual newline characters (`\n`), not the two-character sequence `\` + `n`.
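
For the POST+NDJSON variant mentioned above, the client has to split the response body into lines itself, and a network chunk can end in the middle of a line. The following is a minimal sketch of that buffering, not part of the scaffold: `createNdjsonParser` is a hypothetical helper name, and the record shape `{ side, delta }` is borrowed from the `chunk` events used later in this notebook.

```typescript
// Hypothetical helper: incrementally split a text stream into NDJSON records.
// Network chunks can end mid-line, so the unfinished tail is kept in a buffer.
function createNdjsonParser<T = unknown>(onRecord: (rec: T) => void) {
  let buffer = "";
  return {
    // Feed one decoded text chunk; emits every complete line it contains.
    push(chunk: string): void {
      buffer += chunk;
      let idx = buffer.indexOf("\n");
      while (idx !== -1) {
        const line = buffer.slice(0, idx).trim();
        buffer = buffer.slice(idx + 1);
        if (line) onRecord(JSON.parse(line) as T);
        idx = buffer.indexOf("\n");
      }
    },
    // Call once at end of stream to emit a final line that lacks a trailing newline.
    flush(): void {
      const line = buffer.trim();
      buffer = "";
      if (line) onRecord(JSON.parse(line) as T);
    },
  };
}

// Usage: two records arriving across three arbitrary chunk boundaries.
type Delta = { side: string; delta: string };
const seen: Delta[] = [];
const parser = createNdjsonParser<Delta>(rec => { seen.push(rec); });
parser.push('{"side":"left","del');
parser.push('ta":"Hello"}\n{"side":"right",');
parser.push('"delta":"Hola"}');
parser.flush();
```

In a real client the chunks would come from `response.body` decoded with a streaming `TextDecoder`; the buffering logic stays the same.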


## Background primer: Elo
- Expected score for A against B: `Ea = 1 / (1 + 10^((Rb - Ra)/400))`; B's expected score is `1 - Ea`.
- Update: `new = old + K * (S - E)` with S in {1, 0.5, 0} for a win, tie, or loss.
- Use K=24 for a balanced feel in small demos; treat a "Both bad" vote as no rating change.
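
As a worked example of the two formulas above, take illustrative ratings $R_a = 1600$ and $R_b = 1400$ with $K = 24$ (these numbers are chosen for the example and do not come from any real run):

$$
E_a = \frac{1}{1 + 10^{(1400 - 1600)/400}} = \frac{1}{1 + 10^{-0.5}} \approx 0.760,
\qquad E_b = 1 - E_a \approx 0.240
$$

If the favorite A wins ($S_a = 1$, $S_b = 0$):

$$
R_a' = 1600 + 24\,(1 - 0.760) \approx 1605.8,
\qquad R_b' = 1400 + 24\,(0 - 0.240) \approx 1394.2
$$

If B wins the upset ($S_a = 0$, $S_b = 1$):

$$
R_a' = 1600 + 24\,(0 - 0.760) \approx 1581.8,
\qquad R_b' = 1400 + 24\,(1 - 0.240) \approx 1418.2
$$

The upset moves ratings roughly three times as far as the expected result, and in both cases the points one side gains equal the points the other loses.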


**Checkpoint 1**
- Visit `/api/health`
- Run the Home page smoke test
- Verify JSON response includes `usage`

### lib/poe.ts
```ts
import OpenAI from "openai";

export const poe = new OpenAI({
  apiKey: process.env.POE_API_KEY!,
  baseURL: "https://api.poe.com/v1",
});

export type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

export async function chatOnce(model: string, messages: ChatMessage[]) {
  const res = await poe.chat.completions.create({ model, messages, stream: false, max_tokens: 512 });
  const choice = res.choices?.[0];
  return {
    text: choice?.message?.content ?? "",
    usage: res.usage,
    id: res.id,
    model: res.model,
  };
}
```


### lib/models.ts
```ts
const DEFAULTS = [
  "GPT-5",
  "Claude-Sonnet-4",
  "Grok-4",
  "Gemini-2.5-Pro",
  "Llama-3.1-405B",
  "Mixtral-8x22B",
  "GPT-OSS-20B",
] as const;

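// Parse the comma-separated env override; fall back to DEFAULTS when it is unset or empty.
const fromEnv = (process.env.NEXT_PUBLIC_POE_COMPARISON_MODELS ?? "")
  .split(",")
  .map(s => s.trim())
  .filter(Boolean);

export const COMPARISON_MODELS: readonly string[] = fromEnv.length > 0 ? fromEnv : DEFAULTS;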

export type PoeModel = typeof DEFAULTS[number] | (string & {});

export function isAllowedModel(m: string): m is PoeModel {
  return COMPARISON_MODELS.includes(m);
}

export const DEFAULT_MODEL: PoeModel = (process.env.NEXT_PUBLIC_POE_DEFAULT_MODEL || "GPT-OSS-20B") as PoeModel;
```


### lib/types.ts
```ts
export type Telemetry = {
  promptHash: string;
  leftModel: string;
  rightModel: string;
  params: { temperature: number; top_p: number; max_tokens: number };
  left: { ttfbMs: number | null; totalMs: number | null; promptTokens?: number; completionTokens?: number; error?: string };
  right:{ ttfbMs: number | null; totalMs: number | null; promptTokens?: number; completionTokens?: number; error?: string };
};

export type VoteOutcome = "LEFT" | "RIGHT" | "TIE" | "BOTH_BAD";

export type DuelRecord = Telemetry & {
  vote: VoteOutcome | null;
  voterNote?: string;
  createdAt: string;
};
```


### app/api/health/route.ts
```ts
import { NextResponse } from "next/server";
import { COMPARISON_MODELS, DEFAULT_MODEL } from "@/lib/models";

export async function GET() {
  const ok = Boolean(process.env.POE_API_KEY);
  return NextResponse.json({ ok, hasKey: ok, defaultModel: DEFAULT_MODEL, models: COMPARISON_MODELS });
}
```


### app/api/smoke/route.ts
```ts
import { NextResponse } from "next/server";
import { poe } from "@/lib/poe";
import { isAllowedModel, DEFAULT_MODEL } from "@/lib/models";

export async function POST(req: Request) {
  const { model = DEFAULT_MODEL, content = "Say 'pong'" } = await req.json().catch(() => ({}));
  if (!isAllowedModel(model)) return NextResponse.json({ error: "Unknown model" }, { status: 400 });
  try {
    const res = await poe.chat.completions.create({
      model,
      messages: [{ role: "user", content }],
      max_tokens: 32,
      stream: false,
      temperature: 0,
    });
    return NextResponse.json({ id: res.id, model: res.model, text: res.choices?.[0]?.message?.content ?? "", usage: res.usage });
  } catch (e: any) {
    return NextResponse.json({ error: e?.message || "request_failed" }, { status: 500 });
  }
}
```


### app/page.tsx
```tsx
"use client";
import { useEffect, useState } from "react";

export default function Home() {
  const [health, setHealth] = useState<any>(null);
  const [loading, setLoading] = useState(false);
  const [output, setOutput] = useState("");

  useEffect(() => { fetch("/api/health").then(r => r.json()).then(setHealth); }, []);

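  async function runSmoke() {
    setLoading(true);
    setOutput("");
    try {
      const r = await fetch("/api/smoke", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ content: "pong?" }),
      });
      const json = await r.json();
      setOutput(JSON.stringify(json, null, 2));
    } catch (e) {
      setOutput(`Smoke test failed: ${String(e)}`);
    } finally {
      setLoading(false); // always re-enable the button, even on failure
    }
  }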

  return (
    <main className="mx-auto max-w-2xl p-6 space-y-4">
      <h1 className="text-2xl font-semibold">ALAIN Kit • Poe Scaffold</h1>
      <pre className="rounded bg-gray-100 p-3 text-sm overflow-auto">{JSON.stringify(health, null, 2)}</pre>
      <button onClick={runSmoke} disabled={loading} className="rounded px-4 py-2 bg-black text-white">
        {loading ? "Running..." : "Run smoke test"}
      </button>
      {output && <pre className="rounded bg-gray-100 p-3 text-sm overflow-auto">{output}</pre>}
    </main>
  );
}
```


# 2. Duel streaming, selector, and voting

You will add a server-sent events API to stream two Poe models and a UI with hidden identities and voting.


**Checkpoint 2**
- Open `/arena` and start a duel
- Confirm both panes stream and Telemetry appears after `done`
- Vote once and see identities reveal

### app/api/duel/route.ts
```ts
import { NextRequest } from "next/server";
import { poe } from "@/lib/poe";
import { isAllowedModel } from "@/lib/models";
import crypto from "crypto";

export const runtime = "nodejs";

type SummarySide = {
  ttfbMs: number | null;
  totalMs: number | null;
  usage?: { prompt_tokens?: number; completion_tokens?: number };
  error?: string;
};

type Summary = {
  promptHash: string;
  params: { temperature: number; top_p: number; max_tokens: number };
  left: SummarySide;
  right: SummarySide;
};

function sseHeaders() {
  return {
    "Content-Type": "text/event-stream; charset=utf-8",
    "Cache-Control": "no-cache, no-transform",
    Connection: "keep-alive",
    "X-Accel-Buffering": "no",
  } as Record<string, string>;
}

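const encoder = new TextEncoder();

// Serialize one SSE frame: an `event:` line, a `data:` line, then a blank line.
// Write failures (for example after the client disconnects and the writer is closed)
// are swallowed so they cannot surface as unhandled rejections.
function writeEvent(writer: WritableStreamDefaultWriter<Uint8Array>, name: string, data: unknown) {
  const payload = `event: ${name}\ndata: ${JSON.stringify(data)}\n\n`;
  return writer.write(encoder.encode(payload)).catch(() => {});
}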

export async function GET(req: NextRequest) {
  const url = new URL(req.url);
  const left = url.searchParams.get("left") || "";
  const right = url.searchParams.get("right") || "";
  const promptRaw = url.searchParams.get("prompt") || "";
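  // Fall back to defaults when a query parameter is missing or not a finite number.
  const num = (key: string, fallback: number) => {
    const raw = url.searchParams.get(key);
    const v = raw === null || raw.trim() === "" ? NaN : Number(raw);
    return Number.isFinite(v) ? v : fallback;
  };
  const temperature = num("temperature", 0.2);
  const top_p = num("top_p", 1);
  const max_tokens = Math.max(1, Math.floor(num("max_tokens", 512)));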

  if (!isAllowedModel(left) || !isAllowedModel(right)) return new Response("Unknown model", { status: 400 });

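  // searchParams.get() already percent-decodes; decoding again would throw on a literal "%".
  const prompt = promptRaw;
  if (!prompt.trim()) return new Response("Missing prompt", { status: 400 });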
  const promptHash = crypto.createHash("sha256").update(prompt).digest("hex").slice(0, 16);
  const messages = [{ role: "user" as const, content: prompt }];

  const { readable, writable } = new TransformStream<Uint8Array, Uint8Array>();
  const writer = writable.getWriter();

  const summary: Summary = {
    promptHash,
    params: { temperature, top_p, max_tokens },
    left: { ttfbMs: null, totalMs: null },
    right: { ttfbMs: null, totalMs: null },
  };

  let closed = false;
  const safeClose = () => { if (!closed) { closed = true; try { writer.close(); } catch {} } };

  async function runSide(side: "left" | "right", model: string) {
    const start = Date.now();
    let firstTokenAt: number | null = null;
    let lastChunk = Date.now();
    const softTimeoutMs = 60000;

    try {
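      const stream: any = await poe.chat.completions.create({
        model,
        messages,
        temperature,
        top_p,
        max_tokens,
        stream: true,
        // Assumption: the Poe endpoint honors OpenAI's `stream_options.include_usage`,
        // which adds a final chunk carrying `usage`. If it does not, usage stays undefined.
        stream_options: { include_usage: true },
      });

      let usage: SummarySide["usage"] = undefined;
      for await (const part of stream) {
        if (closed) break; // client disconnected; stop consuming the upstream stream
        if (part?.usage) usage = part.usage;
        const delta = part?.choices?.[0]?.delta?.content ?? "";
        if (delta) {
          lastChunk = Date.now();
          if (firstTokenAt === null) firstTokenAt = lastChunk;
          await writeEvent(writer, "chunk", { side, delta });
        }
        // This check only runs when a chunk arrives, so it bounds the gap between
        // content chunks rather than aborting a fully stalled upstream.
        if (Date.now() - lastChunk > softTimeoutMs) {
          summary[side] = { ...summary[side], error: "soft_timeout" };
          break;
        }
      }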

      const totalMs = Date.now() - start;
      const ttfbMs = firstTokenAt ? firstTokenAt - start : null;
      summary[side] = { ...summary[side], ttfbMs, totalMs, usage };
    } catch (e: any) {
      const totalMs = Date.now() - start;
      summary[side] = { ...summary[side], ttfbMs: null, totalMs, error: String(e?.message || e) };
      await writeEvent(writer, "error", { side, message: String(e?.message || e) });
    }
  }

  (async () => {
    try {
      await Promise.all([runSide("left", left), runSide("right", right)]);
      await writeEvent(writer, "done", summary);
    } catch (e: any) {
      await writeEvent(writer, "error", { message: String(e?.message || e) });
    } finally {
      safeClose();
    }
  })();

  req.signal.addEventListener("abort", safeClose);
  return new Response(readable, { headers: sseHeaders() });
}
```


### lib/elo.ts
```ts
export type Outcome = "LEFT" | "RIGHT" | "TIE" | "BOTH_BAD";

export function expectedScore(ra: number, rb: number) {
  return 1 / (1 + Math.pow(10, (rb - ra) / 400));
}

export function updateElo(ra: number, rb: number, outcome: Outcome, K = 24) {
  if (outcome === "BOTH_BAD") return [ra, rb] as const;
  const ea = expectedScore(ra, rb);
  const sa = outcome === "LEFT" ? 1 : outcome === "RIGHT" ? 0 : 0.5;
  const sb = 1 - sa;
  const na = ra + K * (sa - ea);
  const nb = rb + K * (sb - (1 - ea));
  return [Math.round(na), Math.round(nb)] as const;
}

export function recomputeElo(
  pairs: { left: string; right: string; outcome: Outcome }[],
  initial: Record<string, number> = {},
  K = 24
) {
  const ratings: Record<string, number> = { ...initial };
  for (const p of pairs) {
    if (!(p.left in ratings)) ratings[p.left] = 1500;
    if (!(p.right in ratings)) ratings[p.right] = 1500;
    const [na, nb] = updateElo(ratings[p.left], ratings[p.right], p.outcome, K);
    ratings[p.left] = na;
    ratings[p.right] = nb;
  }
  return ratings;
}
```


### lib/store.ts (Zustand client)
```ts
"use client";
import { create } from "zustand";
import { updateElo, Outcome } from "./elo";

export type DuelSummary = {
  promptHash: string;
  leftModel: string;
  rightModel: string;
  left: { ttfbMs: number | null; totalMs: number | null; completionTokens?: number; promptTokens?: number };
  right:{ ttfbMs: number | null; totalMs: number | null; completionTokens?: number; promptTokens?: number };
};

type RatingsState = {
  ratings: Record<string, number>;
  history: { summary: DuelSummary; outcome: Outcome; at: string }[];
  ensure(model: string): void;
  record(outcome: Outcome, leftModel: string, rightModel: string, summary: DuelSummary): void;
  reset(): void;
};

export const useRatings = create<RatingsState>((set, get) => ({
  ratings: {},
  history: [],
  ensure(model: string) {
    const ratings = { ...get().ratings };
    if (!(model in ratings)) ratings[model] = 1500;
    set({ ratings });
  },
  record(outcome, leftModel, rightModel, summary) {
    const ratings = { ...get().ratings };
    if (!(leftModel in ratings)) ratings[leftModel] = 1500;
    if (!(rightModel in ratings)) ratings[rightModel] = 1500;
    const [na, nb] = updateElo(ratings[leftModel], ratings[rightModel], outcome);
    ratings[leftModel] = na;
    ratings[rightModel] = nb;
    const history = [{ summary, outcome, at: new Date().toISOString() }, ...get().history].slice(0, 50);
    set({ ratings, history });
  },
  reset() { set({ ratings: {}, history: [] }); },
}));
```


### app/arena/page.tsx
```tsx
"use client";
import { useEffect, useMemo, useRef, useState } from "react";
import { COMPARISON_MODELS } from "@/lib/models";
import { useRatings } from "@/lib/store";

type SSEDone = {
  promptHash: string;
  params: { temperature: number; top_p: number; max_tokens: number };
  left: { ttfbMs: number | null; totalMs: number | null; usage?: { prompt_tokens?: number; completion_tokens?: number }; error?: string };
  right:{ ttfbMs: number | null; totalMs: number | null; usage?: { prompt_tokens?: number; completion_tokens?: number }; error?: string };
};

function Pane({ title, text }: { title: string; text: string }) {
  return (
    <div className="flex-1 rounded border p-3 min-h-[200px]">
      <div className="text-xs uppercase tracking-wide text-gray-500 mb-2">{title}</div>
      <div className="whitespace-pre-wrap text-sm leading-6">{text || ""}</div>
    </div>
  );
}

export default function ArenaPage() {
  const models = COMPARISON_MODELS as string[];
  const [leftModel, setLeftModel] = useState(models[0] || "GPT-OSS-20B");
  const [rightModel, setRightModel] = useState(models[1] || models[0] || "GPT-5");
  const [prompt, setPrompt] = useState("Write a concise explanation of HTTP/1.1 vs HTTP/2.");
  const [temperature, setTemperature] = useState(0.2);
  const [topP, setTopP] = useState(1);
  const [maxTokens, setMaxTokens] = useState(512);
  const [leftText, setLeftText] = useState("");
  const [rightText, setRightText] = useState("");
  const [streaming, setStreaming] = useState(false);
  const [masked, setMasked] = useState(true);
  const [summary, setSummary] = useState<SSEDone | null>(null);
  const esRef = useRef<EventSource | null>(null);
  const ratings = useRatings(s => s.ratings);
  const ensure = useRatings(s => s.ensure);
  const record = useRatings(s => s.record);

  useEffect(() => { ensure(leftModel); ensure(rightModel); }, [leftModel, rightModel, ensure]);
  useEffect(() => () => { esRef.current?.close?.(); }, []);

  const maskedTitles = useMemo(() => ({
    left: masked ? "Model A" : leftModel,
    right: masked ? "Model B" : rightModel,
  }), [masked, leftModel, rightModel]);

  function stopStream() {
    esRef.current?.close?.();
    esRef.current = null;
    setStreaming(false);
  }

  async function startDuel() {
    stopStream();
    setLeftText("");
    setRightText("");
    setSummary(null);
    setMasked(true); // hide identities again so the new duel is voted on blind
    setStreaming(true);
    const url = `/api/duel?left=${encodeURIComponent(leftModel)}&right=${encodeURIComponent(rightModel)}&prompt=${encodeURIComponent(prompt)}&temperature=${temperature}&top_p=${topP}&max_tokens=${maxTokens}`;
    const es = new EventSource(url);
    esRef.current = es;

    let lastAt = Date.now();
    const inactivityMs = 70000;
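    const timer = setInterval(() => {
      // Stop polling once this stream is gone or replaced, so a stale timer cannot kill a newer duel.
      if (esRef.current !== es) { clearInterval(timer); return; }
      if (Date.now() - lastAt > inactivityMs) { console.warn("SSE inactivity timeout"); clearInterval(timer); stopStream(); }
    }, 5000);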

    es.addEventListener("chunk", (e: MessageEvent) => {
      try {
        const { side, delta } = JSON.parse(e.data);
        lastAt = Date.now();
        if (side === "left") setLeftText(prev => prev + delta);
        else if (side === "right") setRightText(prev => prev + delta);
      } catch {}
    });

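    es.addEventListener("error", (e) => {
      // A server-sent `event: error` frame arrives as a MessageEvent with data and reports
      // a failure on one side only; keep the stream open so the other side and `done` still arrive.
      if (e instanceof MessageEvent && typeof e.data === "string") {
        lastAt = Date.now();
        console.warn("Duel side error", e.data);
        return;
      }
      // A plain Event here means the connection itself failed.
      console.error("SSE error", e);
      clearInterval(timer);
      stopStream();
    });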

    es.addEventListener("done", (e: MessageEvent) => {
      try { setSummary(JSON.parse(e.data)); } catch {}
      clearInterval(timer);
      stopStream();
    });
  }

  function vote(outcome: "LEFT" | "RIGHT" | "TIE" | "BOTH_BAD") {
    if (!summary) return;
    const duelSummary = {
      promptHash: summary.promptHash,
      leftModel,
      rightModel,
      left: {
        ttfbMs: summary.left?.ttfbMs ?? null,
        totalMs: summary.left?.totalMs ?? null,
        completionTokens: summary.left?.usage?.completion_tokens,
        promptTokens: summary.left?.usage?.prompt_tokens,
      },
      right: {
        ttfbMs: summary.right?.ttfbMs ?? null,
        totalMs: summary.right?.totalMs ?? null,
        completionTokens: summary.right?.usage?.completion_tokens,
        promptTokens: summary.right?.usage?.prompt_tokens,
      },
    };
    record(outcome, leftModel, rightModel, duelSummary);
    setMasked(false);
  }

  const leftRating = ratings[leftModel] ?? 1500;
  const rightRating = ratings[rightModel] ?? 1500;

  return (
    <main className="mx-auto max-w-5xl p-6 space-y-4">
      <h1 className="text-2xl font-semibold">Duel Arena</h1>
      <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
        <div>
          <label className="block text-sm">Left model</label>
          <select value={leftModel} onChange={e => setLeftModel(e.target.value)} className="w-full border rounded p-2">
            {models.map(m => <option key={m} value={m}>{m}</option>)}
          </select>
        </div>
        <div>
          <label className="block text-sm">Right model</label>
          <select value={rightModel} onChange={e => setRightModel(e.target.value)} className="w-full border rounded p-2">
            {models.map(m => <option key={m} value={m}>{m}</option>)}
          </select>
        </div>
      </div>

      <div>
        <label className="block text-sm">Prompt</label>
        <textarea value={prompt} onChange={e => setPrompt(e.target.value)} className="w-full border rounded p-2 h-28" />
      </div>

      <div className="grid grid-cols-3 gap-3">
        <div>
          <label className="block text-xs">temperature</label>
          <input type="number" step={0.1} min={0} max={1} value={temperature} onChange={e => setTemperature(Number(e.target.value))} className="w-full border rounded p-2" />
        </div>
        <div>
          <label className="block text-xs">top_p</label>
          <input type="number" step={0.1} min={0} max={1} value={topP} onChange={e => setTopP(Number(e.target.value))} className="w-full border rounded p-2" />
        </div>
        <div>
          <label className="block text-xs">max_tokens</label>
          <input type="number" step={1} min={32} max={2048} value={maxTokens} onChange={e => setMaxTokens(Number(e.target.value))} className="w-full border rounded p-2" />
        </div>
      </div>

      <div className="flex items-center gap-3">
        <button onClick={startDuel} disabled={streaming} className="rounded px-4 py-2 bg-black text-white">{streaming ? "Streaming..." : "Start duel"}</button>
        <button onClick={() => setMasked(v => !v)} className="rounded px-3 py-2 border">{masked ? "Reveal after vote" : "Mask names"}</button>
        <div className="text-sm text-gray-600">Ratings • {leftModel}: {leftRating} | {rightModel}: {rightRating}</div>
      </div>

      <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
        <Pane title={maskedTitles.left} text={leftText} />
        <Pane title={maskedTitles.right} text={rightText} />
      </div>

      <div className="flex gap-2">
        <button onClick={() => vote("LEFT")} disabled={!summary} className="rounded px-3 py-2 border">Left wins</button>
        <button onClick={() => vote("RIGHT")} disabled={!summary} className="rounded px-3 py-2 border">Right wins</button>
        <button onClick={() => vote("TIE")} disabled={!summary} className="rounded px-3 py-2 border">Tie</button>
        <button onClick={() => vote("BOTH_BAD")} disabled={!summary} className="rounded px-3 py-2 border">Both bad</button>
      </div>

      {summary && (
        <div className="rounded border p-3 text-sm">
          <div className="font-medium mb-1">Telemetry</div>
          <pre className="whitespace-pre-wrap">{JSON.stringify(summary, null, 2)}</pre>
          <div className="text-xs text-gray-600">Prompt hash: {summary.promptHash}</div>
        </div>
      )}
    </main>
  );
}
```


# 3. Ranking math, leaderboard, and history

Wire votes into ratings and render a leaderboard.


**Checkpoint 3**
- After 3+ duels, open `/leaderboard`
- Ratings and win rates should reflect votes

### lib/metrics.ts
```ts
import { Outcome, recomputeElo } from "./elo";

export type Pair = { left: string; right: string; outcome: Outcome };

export function toPairs(history: { summary: any; outcome: Outcome }[]): Pair[] {
  return history.map(h => ({ left: h.summary.leftModel, right: h.summary.rightModel, outcome: h.outcome }));
}

export function counts(history: { summary: any; outcome: Outcome }[]) {
  const played: Record<string, number> = {};
  const wins: Record<string, number> = {};
  const ties: Record<string, number> = {};
  const losses: Record<string, number> = {};
  for (const h of history) {
    const L = h.summary.leftModel; const R = h.summary.rightModel;
    played[L] = (played[L] || 0) + 1; played[R] = (played[R] || 0) + 1;
    if (h.outcome === "LEFT") { wins[L] = (wins[L] || 0) + 1; losses[R] = (losses[R] || 0) + 1; }
    else if (h.outcome === "RIGHT") { wins[R] = (wins[R] || 0) + 1; losses[L] = (losses[L] || 0) + 1; }
    else if (h.outcome === "TIE") { ties[L] = (ties[L] || 0) + 1; ties[R] = (ties[R] || 0) + 1; }
  }
  const winRate: Record<string, number> = {};
  Object.keys(played).forEach(m => {
    const w = wins[m] || 0, t = ties[m] || 0, p = played[m] || 1;
    winRate[m] = (w + 0.5 * t) / p;
  });
  return { played, wins, losses, ties, winRate };
}

export function bootstrapEloCI(history: { summary: any; outcome: Outcome }[], models: string[], B = 200, K = 24) {
  if (history.length === 0) return Object.fromEntries(models.map(m => [m, { lo: NaN, hi: NaN }])) as Record<string, { lo: number; hi: number }>;
  const pairs = toPairs(history);
  const results: Record<string, number[]> = Object.fromEntries(models.map(m => [m, [] as number[]]));
  for (let b = 0; b < B; b++) {
    const sample: typeof pairs = [];
    for (let i = 0; i < pairs.length; i++) sample.push(pairs[Math.floor(Math.random() * pairs.length)]);
    const r = recomputeElo(sample, {}, K);
    models.forEach(m => results[m].push(r[m] ?? 1500));
  }
  const ci: Record<string, { lo: number; hi: number }> = {};
  for (const m of models) {
    const arr = results[m].sort((a, b) => a - b);
    const lo = arr[Math.floor(0.16 * arr.length)] ?? NaN;
    const hi = arr[Math.floor(0.84 * arr.length)] ?? NaN;
    ci[m] = { lo: Math.round(lo), hi: Math.round(hi) };
  }
  return ci;
}
```


### app/leaderboard/page.tsx
```tsx
"use client";
import { useMemo, useState } from "react";
import { useRatings } from "@/lib/store";
import { counts, bootstrapEloCI } from "@/lib/metrics";

export default function LeaderboardPage() {
  const ratings = useRatings(s => s.ratings);
  const history = useRatings(s => s.history);
  const reset = useRatings(s => s.reset);

  const models = useMemo(() => Object.keys(ratings).sort(), [ratings]);
  const stats = useMemo(() => counts(history), [history]);

  const rows = useMemo(() => {
    return Object.entries(ratings)
      .map(([model, rating]) => ({ model, rating, played: stats.played[model] || 0, winRate: stats.winRate[model] || 0 }))
      .sort((a, b) => b.rating - a.rating);
  }, [ratings, stats]);

  const [ci, setCI] = useState<Record<string, { lo: number; hi: number }> | null>(null);
  function computeCI() { setCI(bootstrapEloCI(history, models, 200, 24)); }

  return (
    <main className="mx-auto max-w-4xl p-6 space-y-4">
      <h1 className="text-2xl font-semibold">Leaderboard</h1>
      <div className="flex gap-2 items-center">
        <button onClick={computeCI} className="rounded px-3 py-2 border">Compute CI</button>
        <button onClick={reset} className="rounded px-3 py-2 border">Reset</button>
      </div>
      <table className="w-full text-sm border-collapse">
        <thead><tr className="text-left border-b"><th>Rank</th><th>Model</th><th>Rating</th><th>68% CI</th><th>Played</th><th>Win rate</th></tr></thead>
        <tbody>
          {rows.map((r, idx) => (
            <tr key={r.model} className="border-b">
              <td className="py-2">{idx + 1}</td>
              <td className="py-2">{r.model}</td>
              <td className="py-2">{r.rating}</td>
              <td className="py-2">{ci ? `${ci[r.model]?.lo ?? "?"}–${ci[r.model]?.hi ?? "?"}` : "-"}</td>
              <td className="py-2">{r.played}</td>
              <td className="py-2">{(r.winRate * 100).toFixed(1)}%</td>
            </tr>
          ))}
        </tbody>
      </table>
    </main>
  );
}
```


# 4. Telemetry capture and normalization

Record time to first byte (TTFB), total latency, token usage, tokens per second, and milliseconds per 1k completion tokens. Store only prompt hashes, never the raw prompt text.


### lib/telemetry.ts (client)
```ts
"use client";
import { create } from "zustand";
import { persist } from "zustand/middleware";

export type SideTelem = {
  model: string;
  ttfbMs: number | null;
  totalMs: number | null;
  promptTokens?: number;
  completionTokens?: number;
  error?: string;
  refusal?: boolean;
  tps?: number | null;
  msPer1k?: number | null;
};

export type TelemetryRecord = {
  promptHash: string;
  params: { temperature: number; top_p: number; max_tokens: number };
  left: SideTelem;
  right: SideTelem;
  at: string;
};

type TelemetryState = {
  records: TelemetryRecord[];
  add: (rec: TelemetryRecord) => void;
  reset: () => void;
};

function enrich(side: Omit<SideTelem, "tps" | "msPer1k">): SideTelem {
  const total = side.totalMs ?? null;
  const ctok = side.completionTokens ?? 0;
  const tps = total && ctok > 0 ? ctok / (total / 1000) : null;
  const msPer1k = total && ctok > 0 ? total / (ctok / 1000) : null;
  return { ...side, tps, msPer1k };
}

export const useTelemetry = create<TelemetryState>()(
  persist(
    (set, get) => ({
      records: [],
      add(rec) {
        const left = enrich(rec.left);
        const right = enrich(rec.right);
        set({ records: [{ ...rec, left, right }, ...get().records].slice(0, 500) });
      },
      reset() { set({ records: [] }); },
    }),
    { name: "alain-telemetry-store" }
  )
);
```
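
A telemetry view will typically want per-model averages that skip null values, since a failed side records `null` for some metrics. The following is a minimal sketch under assumptions: `SideSample`, `meanOrNull`, and `averagesByModel` are hypothetical names, and the input mirrors only the `SideTelem` fields needed here.

```typescript
// Minimal shape mirroring the SideTelem fields this sketch needs (hypothetical type).
type SideSample = { model: string; ttfbMs: number | null; totalMs: number | null; tps?: number | null };

// Mean of the finite numeric values; null when there is nothing to average.
function meanOrNull(values: Array<number | null | undefined>): number | null {
  const nums = values.filter((v): v is number => typeof v === "number" && isFinite(v));
  return nums.length ? nums.reduce((a, b) => a + b, 0) / nums.length : null;
}

// Hypothetical aggregation: group left/right samples by model, then average each metric.
function averagesByModel(records: Array<{ left: SideSample; right: SideSample }>) {
  const byModel: Record<string, SideSample[]> = {};
  for (const rec of records) {
    for (const side of [rec.left, rec.right]) {
      (byModel[side.model] = byModel[side.model] || []).push(side);
    }
  }
  const out: Record<string, { n: number; ttfbMs: number | null; totalMs: number | null; tps: number | null }> = {};
  for (const model of Object.keys(byModel)) {
    const sides = byModel[model];
    out[model] = {
      n: sides.length,
      ttfbMs: meanOrNull(sides.map(s => s.ttfbMs)),
      totalMs: meanOrNull(sides.map(s => s.totalMs)),
      tps: meanOrNull(sides.map(s => s.tps)),
    };
  }
  return out;
}

// Usage with made-up numbers; model names "A" and "B" are placeholders.
const demo = averagesByModel([
  { left: { model: "A", ttfbMs: 100, totalMs: 1000, tps: 50 }, right: { model: "B", ttfbMs: null, totalMs: 2000, tps: null } },
  { left: { model: "A", ttfbMs: 300, totalMs: 3000, tps: 30 }, right: { model: "B", ttfbMs: 200, totalMs: 4000 } },
]);
```

Returning `null` instead of `0` for an empty metric keeps a model with no successful samples from looking artificially fast.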


### Quick check: Elo numeric example (Python)
Run this to verify the Elo math locally.


In [None]:
def expected_score(ra, rb):
    return 1 / (1 + 10 ** ((rb - ra) / 400))

def update_elo(ra, rb, outcome, K=24):
    # outcome: 'LEFT', 'RIGHT', 'TIE', or 'BOTH_BAD' (no rating change)
    if outcome == 'BOTH_BAD':
        return ra, rb
    ea = expected_score(ra, rb)
    sa = 1 if outcome=='LEFT' else 0 if outcome=='RIGHT' else 0.5
    sb = 1 - sa
    na = ra + K * (sa - ea)
    nb = rb + K * (sb - (1 - ea))
    return round(na), round(nb)

print('Ea(1500,1500)=', expected_score(1500,1500))
print('Left win ->', update_elo(1500,1500,'LEFT',24))
print('Tie ->', update_elo(1500,1500,'TIE',24))

### Synthetic SSE parsing demo (Python)
This cell simulates SSE chunks and shows how a tolerant parser collects `chunk` events and detects a final `done` message.


In [None]:
import io, json

def simulate_sse():
    stream = io.StringIO()
    msgs = [
        ('chunk', {'side':'left','delta':'Hello '}),
        ('chunk', {'side':'right','delta':'Hola '}),
        ('chunk', {'side':'left','delta':'world'}),
        ('done', {'summary': {'left':{'ttfbMs': 120, 'totalMs': 900}, 'right':{'ttfbMs':130,'totalMs':950}}}),
    ]
    for name, data in msgs:
        stream.write(f"event: {name}\n")
        stream.write("data: " + json.dumps(data) + "\n\n")
    return stream.getvalue()

sse_text = simulate_sse()
# Parse
chunks = []
summary = None
for block in sse_text.strip().split("\n\n"):
    lines = block.split("\n")
    ev = None; data = None
    for ln in lines:
        if ln.startswith("event:"): ev = ln.split(":",1)[1].strip()
        if ln.startswith("data:"): data = json.loads(ln.split(":",1)[1].strip())
    if ev == 'chunk': chunks.append(data)
    if ev == 'done': summary = data['summary']

print('Collected chunks:', chunks)
print('Summary:', summary)

# 5. Safety, fairness, and teacher auto-judge

- Add `/api/moderate` for simple local checks and optional teacher classification
- Mask identities until vote and reveal after
- Optional `/api/judge` uses the teacher to output a rubric and verdict JSON
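
A judge route has to cope with a teacher model that wraps its JSON in prose or code fences. The following is a minimal sketch of defensive parsing, assuming a verdict shape of `{ winner, rationale }`; this notebook does not define the judge's schema, so `Verdict` and `parseVerdict` are hypothetical.

```typescript
// Hypothetical verdict shape; the winner values reuse the vote outcomes from lib/elo.ts.
type Verdict = { winner: "LEFT" | "RIGHT" | "TIE" | "BOTH_BAD"; rationale: string };

const WINNERS = ["LEFT", "RIGHT", "TIE", "BOTH_BAD"];

// Parse the teacher's raw reply defensively: take the span from the first "{"
// to the last "}" and validate it, returning null for anything malformed.
function parseVerdict(raw: string): Verdict | null {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    const obj = JSON.parse(raw.slice(start, end + 1));
    if (typeof obj !== "object" || obj === null) return null;
    const winner = String(obj.winner ?? "").toUpperCase();
    if (WINNERS.indexOf(winner) === -1) return null;
    return {
      winner: winner as Verdict["winner"],
      rationale: typeof obj.rationale === "string" ? obj.rationale : "",
    };
  } catch {
    return null; // invalid JSON: treat as "no verdict" rather than throwing
  }
}

// Usage: a reply wrapped in chatter still yields a verdict.
const verdict = parseVerdict('Sure! {"winner":"left","rationale":"clearer"} Hope that helps.');
```

Treating an unparseable reply as "no verdict" keeps a flaky judge from silently recording a winner.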


# 6. Packaging, rate limits, tests, and assessments

- Add Tailwind, path aliases, and `.gitignore` entries
- Add leaky bucket and exponential backoff in `lib/ratelimit.ts`
- Provide POST NDJSON streaming variant for long prompts
- Add Vitest tests for Elo and model guard
- Add `/learn` page with MCQs
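
The leaky bucket and exponential backoff named in the list above could look roughly like this. It is a sketch under assumptions, not the contents of `lib/ratelimit.ts`: `backoffDelayMs` and `LeakyBucket` are hypothetical names, the default base and cap values are illustrative, and the clock and random source are injectable only to make the behavior testable.

```typescript
// Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(attempt: number, baseMs = 250, capMs = 8000, rand: () => number = Math.random): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(rand() * ceiling);
}

// Leaky bucket used as a meter: each request adds one unit and the level drains
// at a constant rate. A request is rejected when adding it would exceed capacity.
class LeakyBucket {
  private level = 0;
  private last: number;

  constructor(
    private capacity: number,
    private leakPerSec: number,
    private now: () => number = Date.now,
  ) {
    this.last = now();
  }

  tryAcquire(): boolean {
    const t = this.now();
    // Drain in proportion to the time elapsed since the previous call.
    this.level = Math.max(0, this.level - ((t - this.last) / 1000) * this.leakPerSec);
    this.last = t;
    if (this.level + 1 > this.capacity) return false;
    this.level += 1;
    return true;
  }
}

// Usage with a fake clock so the sequence is deterministic.
let clock = 0;
const bucket = new LeakyBucket(2, 1, () => clock);
const first = bucket.tryAcquire();  // accepted, level 1
const second = bucket.tryAcquire(); // accepted, level 2
const third = bucket.tryAcquire();  // rejected, bucket is full
clock = 1000;                       // one second later, one unit has drained
const fourth = bucket.tryAcquire(); // accepted again
```

A caller would typically check `tryAcquire()` before each upstream request and, on a retryable failure, wait `backoffDelayMs(attempt)` before trying again.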


# 7. Publishing, legal, probes, and troubleshooting

- `/api/probe` runs sequential small streams to measure TTFB and totals per model
- `/admin` page runs probes and shows results
- `/legal` page publishes attribution and disclosures
- `lib/errors.ts` normalizes common upstream failures
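
One way to normalize upstream failures is to map them onto a small set of codes with a retry hint. The following is a minimal sketch, not the contents of `lib/errors.ts`: `NormalizedError` and `normalizeError` are hypothetical, the code names are illustrative rather than Poe's own, and the sketch assumes thrown errors expose a numeric `status` property for HTTP failures.

```typescript
// Hypothetical normalized error shape; the codes are illustrative, not Poe's own.
type NormalizedError = {
  code: "auth" | "rate_limited" | "bad_request" | "upstream" | "timeout" | "unknown";
  retryable: boolean;
  message: string;
};

// Assumption: HTTP failures carry a numeric `status`; aborts use the name "AbortError".
function normalizeError(e: unknown): NormalizedError {
  const err = (typeof e === "object" && e !== null ? e : {}) as {
    status?: unknown;
    name?: unknown;
    message?: unknown;
  };
  const status = typeof err.status === "number" ? err.status : undefined;
  const message = typeof err.message === "string" ? err.message : String(e);

  if (err.name === "AbortError") return { code: "timeout", retryable: true, message };
  if (status === 401 || status === 403) return { code: "auth", retryable: false, message };
  if (status === 429) return { code: "rate_limited", retryable: true, message };
  if (status !== undefined && status >= 400 && status < 500) return { code: "bad_request", retryable: false, message };
  if (status !== undefined && status >= 500) return { code: "upstream", retryable: true, message };
  return { code: "unknown", retryable: false, message };
}

// Usage: a rate-limit response becomes a retryable, stable code.
const limited = normalizeError({ status: 429, message: "slow down" });
```

Routes can then return `code` to the client instead of raw upstream messages, and retry logic only needs to look at `retryable`.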


## QA summary and commands
- `npm run typecheck` should pass
- `npm run lint` should pass
- `npm run test` should pass Elo and model guard tests
- Manual checks:
  1. `/arena` streams a duel over GET `EventSource`
  2. Ratings update on vote
  3. `/leaderboard` renders and the CI computes
  4. `/telemetry` shows non-null averages
  5. `/admin` probe runs and returns usage where available
  6. `/legal` is visible

**Fairness checklist**: parameter parity, identity masking, prompt hashing, refusal tracking, normalization per 1k.
