Skip to content

PAIDEIA-codex 26.04.24

Choose a tag to compare

@TaewoooPark TaewoooPark released this 24 Apr 11:35
· 18 commits to main since this release

Update Catalog

26.04.24 update

Codex edition reaches study-graph parity

PAIDEIA-codex was born as a sibling of the Claude Code edition, not a fork. The user interface is different (Codex CLI, AGENTS.md instead of CLAUDE.md, a stdio MCP server instead of inline scripts), but the on-disk artifact is supposed to be byte-for-byte the same: course-index/patterns.md, errors/log.md, weakmap/weakmap_<ts>.md, cheatsheet/final.md. You should be able to fork a course folder out of the Claude edition into the Codex edition — or vice versa — and have the new runner pick up without friction.

For that to actually hold, the semantics that read those artifacts have to agree. In 0.1.x the Codex edition quietly disagreed on three of them. paideia_mcp.course_phase treated quizzes/*.md as evidence of drilling even when no problem had been solved. $paideia-blind wrote a parallel YAML schema (pattern_missed_initial: / strategy_error_type:) that every downstream reader silently skipped. And the answer-PDF lifecycle ended at "OCR succeeded" — the original scan kept living in answers/ forever, so the "most recently modified in answers/" heuristic inside $paideia-grade re-picked yesterday's file when the student uploaded a newer one today.

This release closes those disagreements. The phase ladder now advances only when the student has actually graded something. The error log has one shape across every writer — grade, blind, and any future drill. Graded scans are archived after OCR, so the next run sees the next file.

What this fixes

The richest cases are the ones where a course folder would look correct on disk but the Codex edition interpreted it differently from the Claude one:

  • Audit a seeded patterns file. Dropping a course-index/patterns.md from last semester's fork no longer flips the phase to drill. paideia-mcp.course_phase now gates drill on at least one graded pattern: entry in errors/log.md. An artifact that was never acted on is not the same signal as one the student produced, so it doesn't move the phase forward.
  • Audit a $paideia-blind error. Running $paideia-blind hw3-p2 and failing on the pattern axis now writes the same pattern: / error_type: / source: keys that $paideia-grade writes. The blind entry appears in the next $paideia-weakmap without manual edits, and paideia-mcp.course_phase's top-miss counter picks it up immediately. In 0.1.x those entries were present on disk but invisible to every consumer, because every consumer pattern-matched on pattern: and blind wrote pattern_missed_initial:.
  • Audit a mock-exam phase transition. mock fires only when an errors/log.md entry has a source: containing mock — i.e., a mock was actually graded. Seeding an empty mock/<ts>.md no longer advances the phase, which used to let the display race ahead of the student's real progress.
  • Audit a second scan of the same assignment. After $paideia-grade answers/hw3.pdf succeeds, the MCP moves answers/hw3.pdf into answers/_archive/hw3_<ts>.pdf. Re-running $paideia-grade with no positional argument no longer re-picks the stale hw3.pdf; it picks the genuinely most recent file. The converted answers/converted/hw3.md stays put and is version-controlled — only the bulky scan is archived.
  • Audit a cross-course Qwen3-VL prompt. The qwen3-vl engine used to inject the phrase "math / physics course" into every page prompt regardless of course. It now reads COURSE_NAME from .course-meta and substitutes it, so a Complex Analysis folder no longer transcribes under a generic framing. The default fallback is unchanged ("math / physics") for folders without a .course-meta.

How it is used

Existing Codex users update by pulling main. Course folders require no migration: the phase detector reads errors/log.md the same way it always did, and nothing in 0.1.x could have written the legacy blind schema into a folder that also had graded entries (the legacy keys only appeared in blind-only folders). A fresh $paideia-init-course in a new course folder seeds the canonical schema directly via bootstrap.py's updated ERRORS_LOG_SEED comment.

In a fresh session:

  • $paideia-phase computes setup → diag → drill → mock → cram → cool from three orthogonal signals: artifact-exists, has-graded-entry, mock-was-graded. Create a patterns.md but never quiz: diag. Grade one problem: drill. Grade a mock: mock. Drop a cheatsheet/final.md: cram. Delete them back: the phase regresses. Unlike 0.1.x, the display tracks what the student did, not what the filesystem declares.
  • $paideia-grade returns an extra archived_to field in its paideia-mcp.grade_pdf response. The skill can surface the archive path in its closing line if the user cares; skipping the line is fine too, the archive happens regardless.
  • $paideia-blind writes one schema, and that schema is literally the canonical one from paideia-grade/SKILL.md §6. A student migrating from 0.1.x sees their new blind entries participate in the weakmap immediately — no manual edits, no schema translation pass.

The phase tool owns correctness, not freshness

Claude Code's statusline.py re-renders the phase display on every prompt, which is why that edition introduced an mtime-indexed disk cache: the display has to be cheap enough to recompute hundreds of times per session, and the cache is the only way to keep that bounded on mature course folders. Codex CLI has no persistent statusline slot. The phase is surfaced on demand via $paideia-phase — which calls the MCP tool exactly once per invocation — or by other skills that want to know where the student is in the cycle (also once per call).

Because of that, paideia-mcp.course_phase has no cache layer and does not need one. Freshness is guaranteed by the fact that the tool re-reads errors/log.md, cheatsheet/, mock/, and .course-meta every time it runs. This is the right trade-off under Codex's semantics: the caller decided this was the moment to ask, so we read the disk. Adding a cache would save wall time only if the same skill called the tool repeatedly within a session without the course folder changing — which is not a pattern any existing skill exhibits.

This is also why the "mtime-indexed cache" patch from the Claude edition's 0.6.0 release is explicitly not ported: there is nothing here it would make faster, and introducing it would add an invalidation surface with no compensating win.

What this does not change

The three OCR engines (codex-native / qwen3-vl / tesseract), the ingest_pdfs / grade_pdf / build_course_index / course_phase MCP tool boundaries, the markdown contents of answers/converted/, the 15 skills, the coloring of $paideia-phase's output, and the .mcp.json wiring all behave identically to 0.1.x. A course folder set up under the old version continues to work without migration.

The codex-native engine is also unaffected: its prompt lives in paideia-grade/SKILL.md Step 2a, not in paideia_mcp/ocr/qwen3vl.py, so the course-name parameterization is specific to the Ollama path. Users who stay on the default engine see no prompt-level difference.

The Claude-edition SessionStart hook is intentionally not ported. Codex has no session-start hook mechanism: AGENTS.md is read every turn but it is static markdown, so it cannot inject live phase or top-miss state the way a Python script can. The user-invoked $paideia-phase is the Codex-native equivalent, and it has been the recommended first-turn command since the phase tool shipped.

Technical notes

  • plugins/paideia/paideia-mcp/paideia_mcp/phase.py gains _has_error_entries (regex-searches errors/log.md for ^\s*pattern:\s*P\d+) and _mock_was_graded (iterates source: lines and returns true if any contains mock). detect_phase is split into single-responsibility branches on those helpers. Module docstring is updated to describe the activity-based ladder so future readers don't re-introduce file-existence gating by accident.
  • plugins/paideia/paideia-mcp/paideia_mcp/ocr/qwen3vl.py renames PROMPT to PROMPT_TEMPLATE with a {course} placeholder, introduces build_prompt(course_name), and has transcribe_page / transcribe_pages thread the resolved course name through the Ollama call. _is_noise_sentence is factored out of _dedupe_loops with a Korean + English hedge-prefix tuple. _strip_ngram_tail(text, n=5, min_repeats=3) trims trailing n-gram loops that survive sentence-level dedup. _WARMUP_TIMEOUT = 60.0 is separated from _PER_PAGE_TIMEOUT = 1800.0 so a hung Ollama daemon at startup surfaces quickly instead of hiding under the long per-page ceiling.
  • plugins/paideia/paideia-mcp/paideia_mcp/ocr/__init__.py adds _resolve_course_name(project_root) which reads .course-meta::COURSE_NAME via paideia_mcp.phase.parse_meta. run_ocr plumbs the resolved course name through to qwen3vl.transcribe_pages; tesseract ignores it (no prompt to parameterize).
  • plugins/paideia/paideia-mcp/paideia_mcp/grade.py adds _archive_if_under_answers(pdf, root). Fires only when the resolved PDF path is exactly two components deep under <root>/ and the first component is answers — so absolute-path targets outside the course root are left alone and already-archived files under answers/_archive/ aren't re-archived. Timestamp is UTC YYYYMMDD-HHMMSSZ. Idempotent on missing files (returns None and leaves the source alone). Both response dicts (ocr-complete and rasterize-only) gain an archived_to field.
  • plugins/paideia/skills/paideia-grade/SKILL.md §6 is marked canonical — the single source of truth for the errors/log.md YAML schema — with a note that downstream readers (paideia-mcp.course_phase, $paideia-phase, $paideia-weakmap) pattern-match on pattern: and source:, and any drift silently hides entries. paideia-blind/SKILL.md §7 references this canonical schema and documents the strategy-axis → error_type mapping (pattern axis → pattern-missed, variable axis → wrong-variable, end-form axis → wrong-end-form), rather than defining a parallel schema.
  • plugins/paideia/skills/paideia-init-course/scripts/bootstrap.py's ERRORS_LOG_SEED comment now spells out the canonical six-key schema including source: and points at paideia-grade/SKILL.md §6. The GITIGNORE block is unchanged — answers/**/*.pdf already covers answers/_archive/*.pdf, so the archive is gitignored without an extra rule.
  • plugins/paideia/.codex-plugin/plugin.json and .claude-plugin/marketplace.json bump from 0.1.0 to 0.2.0. No new runtime dependencies. No changes to any skill's argument contract or any MCP tool's input shape.

Notes

  • The _archive_if_under_answers depth check intentionally requires exactly two path components (answers/<name>.pdf), not merely "starts with answers/". That excludes answers/_archive/… (already-archived files should never be re-archived) and answers/converted/… (which shouldn't be PDFs anyway, but the check is belt-and-suspenders). Users who manage scans in a deeper subtree — e.g. answers/scans/hw3.pdf — are left alone on purpose; the caller chose that layout, so archiving would surprise them.
  • The canonical source: values are loose on purpose. answers/converted/<stem>.md, blind/<id>, and mock/<ts>.md are the conventions, but paideia-mcp.course_phase only substring-matches on the word mock. That means any future drill that wants to be treated as a mock for phase purposes simply has to include mock in its source: value — no phase-detector change required. If multiple drills want finer-grained phase rules later, the single _SOURCE_RX helper in phase.py is the right place to extend.
  • The Claude-edition SessionStart hook is listed as "not ported" rather than "cannot be ported." If Codex CLI ever adds a session-start hook mechanism, the same two-line reminder (phase + top miss with a suggested next command) could plug in trivially — paideia-mcp.course_phase returns everything it would need. Until then, the manual workflow is: open Codex CLI in a course folder, run $paideia-phase, then read the suggestion. Three keystrokes instead of zero, but with no statusline slot and no hook to inject, there is no cheaper way.
  • Qwen3-VL's Korean hedge list was drawn from observed failure modes on real scans: 잠깐 / 음 / 아 / 근데 / 사실 are the five that showed up most often. The prefix match is literal (not tokenized), which means a sentence starting with a Korean hedge followed by , or . is dropped — but a hedge buried mid-sentence is left alone, because the surrounding context is usually load-bearing content. Adjustments to the list belong in _HEDGE_PREFIXES inside paideia_mcp/ocr/qwen3vl.py; the rest of the dedup pipeline is prefix-agnostic.
  • Cross-edition migration is now symmetric: a course folder written by the Claude edition reads correctly in the Codex edition, and vice versa. The only artifact that differs between editions is the hook / statusline wiring in .claude/settings.json, which the Claude edition's /paideia:init-course writes and the Codex edition ignores. A student can switch runners mid-semester without losing history.