GitHub - atomicstrata/epistemic

Ξ epistemic

The open source research-discipline coding agent.

epistemic gives your coding agent the norms of good ML research. Instead of running experiments, eyeballing a number, and moving on, it enforces a real method: pre-register a hypothesis, reproduce the baseline, run the experiment, attack your own claim, then decide to ship or kill — with an interactive monitor and gates that make the rules automatic.

The skills are the portable manual the agent follows step by step. The harnesses inject that manual into Claude Code, Codex, or the epistemic TUI. The gates are the safety net that enforces it where the harness supports runtime hooks.

Quickstart

Install epistemic for your agent: Claude Code, Codex CLI, Codex App, or the epistemic TUI. Setup steps for each are under Installation.

How it works

It starts from the moment your agent picks up an empirical task. Instead of jumping straight to running code, it steps back and asks what you're really trying to prove.

Once it has a rough claim, it asks Socratic-style questions, one at a time, until the hypothesis is falsifiable, the falsifier is concrete, and the budget is realistic. Before locking in, it generates 2–3 competing explanations, each with a unique disconfirming prediction, so you can pick the strongest one.

After you sign off, the agent locks the hypothesis in a pre-registration file before touching any experiment code. The prereg gate then blocks any experiment-shaped command that has no matching prereg.md — so there's no way to accidentally run something unregistered.
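A minimal TypeScript sketch of the kind of check the prereg gate performs, written as a pure function so it runs without a harness. Everything specific here is an assumption for illustration: the regular expressions that decide what counts as "experiment-shaped", reading an `H-003`-style hypothesis id from the command line, and the `preregGate` name. The real gate may classify commands differently.

```typescript
// Hypothetical sketch: not epistemic's actual gate. The patterns below are
// illustrative guesses at what an "experiment-shaped" command looks like.
const EXPERIMENT_SHAPED: RegExp[] = [
  /\bpython3?\s+\S*(train|eval|experiment)\S*\.py\b/, // training / eval scripts
  /\bmodal\s+run\b/, // Modal jobs
  /\bdocker\s+run\b/, // containerized runs
];

// Assumed convention: the command names its hypothesis, e.g. "H-003".
const HYPOTHESIS_ID = /\bH-\d{3,}\b/;

interface GateDecision {
  block: boolean;
  reason?: string;
}

function preregGate(command: string, preregisteredIds: Set<string>): GateDecision {
  // Commands that don't look like experiments pass through untouched.
  const looksLikeExperiment = EXPERIMENT_SHAPED.some((re) => re.test(command));
  if (!looksLikeExperiment) return { block: false };

  // An experiment must say which hypothesis it belongs to...
  const match = command.match(HYPOTHESIS_ID);
  if (!match) {
    return { block: true, reason: "experiment command names no hypothesis id" };
  }

  // ...and that hypothesis must already have a prereg.md.
  const id = match[0];
  if (!preregisteredIds.has(id)) {
    return { block: true, reason: `no experiments/${id}/prereg.md found` };
  }
  return { block: false };
}
```

In a real hook, `preregisteredIds` would come from scanning `experiments/*/prereg.md`; that lookup is left out to keep the sketch self-contained.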

From there it reproduces the competitor's baseline under your locked judge, runs the full experiment, applies proper statistics, and sends the claim to adversary models that each try to disprove it. If any adversary succeeds, the result is blocked. If all pass, it lands in RESULTS.md and you decide: ship, kill, pivot, or refine.
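The falsification rules described here and in the `falsification-review` skill reduce to a small decision procedure: at least two adversaries must review the claim, any successful falsification blocks it, and an unrun disconfirming experiment costing under $1 must be run first. This sketch assumes a hypothetical `AdversaryVerdict` shape; epistemic's actual verdict format is not documented in this README.

```typescript
// Hypothetical verdict shape returned by one adversary model (an assumption).
interface AdversaryVerdict {
  model: string;
  falsified: boolean;
  cheapestFalsifier: { description: string; estimatedCostUsd: number; run: boolean };
}

type ReviewOutcome =
  | { status: "blocked"; reason: string }
  | { status: "run-first"; experiments: string[] }
  | { status: "passed" };

function reviewClaim(verdicts: AdversaryVerdict[]): ReviewOutcome {
  // Rule 1: a claim needs at least two independent adversaries.
  if (verdicts.length < 2) {
    return { status: "blocked", reason: "need verdicts from at least 2 adversary models" };
  }

  // Rule 2: a single successful falsification blocks the result.
  const winner = verdicts.find((v) => v.falsified);
  if (winner) {
    return { status: "blocked", reason: `falsified by ${winner.model}` };
  }

  // Rule 3: cheap (< $1) disconfirming experiments that haven't run must run first.
  const pending = verdicts
    .map((v) => v.cheapestFalsifier)
    .filter((f) => f.estimatedCostUsd < 1 && !f.run)
    .map((f) => f.description);
  if (pending.length > 0) return { status: "run-first", experiments: pending };

  return { status: "passed" };
}
```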

Because the skills trigger automatically, you don't need to orchestrate anything. Your coding agent just has epistemic.

What you type → what happens

$ epistemic "does LoRA at rank 8 outperform rank 4 on math benchmarks"
→ Opens research-question skill, refines to a falsifiable hypothesis

$ epistemic "run the registered experiment H-003"
→ Checks prereg gate, routes to the correct compute target, logs costs

$ epistemic monitor
→ Full-screen experiment tree: running, shipped, killed, pending

$ epistemic fleet
→ Launches a parallel agent fleet across all pending hypotheses

$ /skill:falsification-review
→ Sends the current claim to ≥2 adversary models; blocks if any falsify it

$ /skill:kill-or-ship
→ Decision gate: KILL / PIVOT / REFINE / RECOMMIT / SHIP

Workflows

Ask naturally or use slash commands as shortcuts.

Command	What it does
`/skill:research-question`	Refine a rough idea into a falsifiable, pre-registerable hypothesis
`/skill:preregistration`	Lock hypothesis, judge config, and compute scaffold before running
`/skill:baseline-reproduction`	Reproduce the competitor's result under your locked judge
`/skill:experiment-execution`	Run with discipline — locked env, full sample, cost logging
`/skill:statistical-rigor`	Effect sizes, test selection, multiple-comparison correction, APA reporting
`/skill:falsification-review`	Adversary models try to disprove the claim; blocks if any succeed
`/skill:surprise-triage`	Diagnose results that diverge >15% before they reach `RESULTS.md`
`/skill:kill-or-ship`	Final decision gate with five outcomes
`/skill:verification-before-publication`	Full pre-publish checklist
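`surprise-triage` fires when a result diverges by more than 15%. The README does not say what the divergence is measured against or whether it is relative or absolute, so this sketch assumes relative divergence from a reference value (for example the prediction recorded in prereg.md, or a reproduced baseline). `relativeDivergence` and `needsSurpriseTriage` are hypothetical names.

```typescript
// Hypothetical sketch: assumes *relative* divergence from a reference value.
const SURPRISE_THRESHOLD = 0.15;

function relativeDivergence(observed: number, reference: number): number {
  if (reference === 0) {
    // Relative difference is undefined at zero; treat any nonzero result as maximal.
    return observed === 0 ? 0 : Infinity;
  }
  return Math.abs(observed - reference) / Math.abs(reference);
}

function needsSurpriseTriage(observed: number, reference: number): boolean {
  return relativeDivergence(observed, reference) > SURPRISE_THRESHOLD;
}
```

For example, an accuracy of 0.60 against a predicted 0.80 diverges by 25% and would be routed to triage before reaching `RESULTS.md`.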

Installation

epistemic CLI (macOS / Linux)

curl -fsSL https://raw.githubusercontent.com/moralespanitz/epistemic/master/install.sh | sh

To pin a version:

curl -fsSL https://raw.githubusercontent.com/moralespanitz/epistemic/master/install.sh | sh -s -- --ref v1.0.0

Installs to ~/.epistemic with a symlink in ~/.local/bin/epistemic. Requires Node.js v18+.

Default model: openrouter/deepseek/deepseek-v4-pro. If you have authenticated OpenAI Codex, epistemic uses that instead. If no provider is authenticated, Pi prompts you to run /login.

Skills only (Claude Code, Codex, any agent)

Installs just the research methodology skills — no TUI, no runtime dependency.

curl -fsSL https://raw.githubusercontent.com/moralespanitz/epistemic/master/install-skills.sh | sh

With the optional Hugging Face skills:

curl -fsSL https://raw.githubusercontent.com/moralespanitz/epistemic/master/install-skills.sh | sh -s -- --hf

Skills land in ~/.claude/skills/ and are active in Claude Code immediately.

Claude Code plugin marketplace

/plugin install epistemic@claude-plugins-official

Or via the epistemic marketplace:

/plugin marketplace add moralespanitz/epistemic
/plugin install epistemic-skills@epistemic

Hooks (prereg gate + session bootstrap) install automatically via the marketplace, or manually:

epistemic hooks install   # add hooks to ~/.claude/settings.json
epistemic hooks status    # check what's active
epistemic hooks on | off  # toggle without restart

Codex CLI

Open the plugin search interface: /plugins
Search for epistemic and select Install Plugin.

Codex App

In the Codex app, click Plugins in the sidebar.
Find epistemic in the Research section and click +.

Skills Library

Research Methodology

research-question — Socratic refinement from rough idea to falsifiable hypothesis. Generates 2–3 competing alternatives with unique disconfirming predictions; archives the ones you don't pick.
preregistration — Validates all fields, writes experiments/{id}/prereg.md, hashes the judge config → judge.lock, generates the compute scaffold (Dockerfile / modal-app.py / environment.lock), and commits. Locks in the rules before code runs.
baseline-reproduction — Reads the competitor's paper, validates HuggingFace datasets, reproduces the result under your locked judge. Baselines older than 30 days must be refreshed.
experiment-execution — Confirms prereg + locks match, routes by compute target (local / docker / modal), logs every API/compute cost to .epistemic/cost-ledger.jsonl, writes results to experiments/{id}/smokes/ — provisional only.
statistical-rigor — Assumption checking → test selection → effect sizes (Cohen's d, η², R²) alongside p-values → multiple-comparison correction → APA reporting with exact p-values.
falsification-review — Sends the claim to ≥2 adversary models. Each returns the cheapest experiment that would disprove it. If that experiment is <$1 and unrun, the agent insists on running it first.
surprise-triage — Triggered when results diverge >15%. Produces ranked explanations (sampling, judge mismatch, data leakage, ceiling effects, prompt drift, version change, bugs) and the cheapest disambiguating test for each.
kill-or-ship — Decision gate. Sunk-cost rule: killed hypotheses can't be silently revived. Expected kill-to-ship ratio ~5:1 — killing fast is the point.
verification-before-publication — Full pre-publish checklist: locks present and matching, baselines fresh, falsifier verdicts evaluated, cost ledger current, stats done, alternatives documented.
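The preregistration skill hashes the judge config into judge.lock, and the judge lock gate blocks runs whose config has changed. The hashing scheme is not documented here; this sketch assumes a SHA-256 hex digest of the config serialized as JSON with sorted keys. It uses Node's built-in `node:crypto` module; `canonicalJson`, `hashJudgeConfig`, and `judgeLockMatches` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: assumes judge.lock holds a SHA-256 hex digest of the
// judge config, serialized as JSON with object keys sorted.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Deterministic serialization: sorted keys, so key order never changes the hash.
function canonicalJson(value: Json): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

function hashJudgeConfig(config: Json): string {
  return createHash("sha256").update(canonicalJson(config)).digest("hex");
}

// Compare the current config against the stored lock (trailing newline tolerated).
function judgeLockMatches(config: Json, lockedHash: string): boolean {
  return hashJudgeConfig(config) === lockedHash.trim();
}
```

Sorting keys matters: `JSON.stringify` preserves insertion order, so two semantically identical configs could otherwise hash differently and trip the gate spuriously.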

kill-or-ship outcomes

Option	When	Effect
KILL	Spend > 1.5× cap, or >21 days stale	Write `KILLED.md`, record a lesson
PIVOT	Failed but suggests a new direction	Kill old, open a new hypothesis
REFINE	Same claim, adjusted method	Re-run from execution (needs override)
RECOMMIT	Continue past kill criteria	New cap + override
SHIP	All gates pass, falsification clean	Tag and publish
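The KILL criteria can be checked mechanically from the cost ledger. This sketch assumes a hypothetical JSONL schema for `.epistemic/cost-ledger.jsonl` (fields `hypothesis`, `costUsd`, `category`, `timestamp`) and interprets "stale" as no ledger activity for more than 21 days; both are assumptions, not epistemic's documented behavior.

```typescript
// Hypothetical ledger entry; the real .epistemic/cost-ledger.jsonl schema may differ.
interface LedgerEntry {
  hypothesis: string;
  costUsd: number;
  category: string;
  timestamp: string; // ISO 8601
}

const DAY_MS = 24 * 60 * 60 * 1000;
const KILL_SPEND_MULTIPLIER = 1.5;
const STALE_DAYS = 21;

// One JSON object per non-empty line.
function parseLedger(jsonl: string): LedgerEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LedgerEntry);
}

function killReasons(entries: LedgerEntry[], id: string, capUsd: number, now: Date): string[] {
  const mine = entries.filter((e) => e.hypothesis === id);
  const reasons: string[] = [];

  // KILL rule 1: total spend above 1.5x the cost cap.
  const spend = mine.reduce((sum, e) => sum + e.costUsd, 0);
  if (spend > KILL_SPEND_MULTIPLIER * capUsd) {
    reasons.push(`spend $${spend.toFixed(2)} exceeds 1.5x cap of $${capUsd.toFixed(2)}`);
  }

  // KILL rule 2: no ledger activity for more than 21 days (assumed meaning of "stale").
  if (mine.length > 0) {
    const lastMs = Math.max(...mine.map((e) => Date.parse(e.timestamp)));
    const idleDays = (now.getTime() - lastMs) / DAY_MS;
    if (idleDays > STALE_DAYS) {
      reasons.push(`no activity for ${Math.floor(idleDays)} days`);
    }
  }
  return reasons;
}
```

A non-empty result maps to the KILL row; the other outcomes in the table stay human decisions.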

Agents

Four bundled research agents, dispatched automatically.

Researcher — gather evidence across papers, HuggingFace, web, repos, and docs
Adversary — simulated peer review with the cheapest disconfirming experiment per claim
Statistician — assumption checking, test selection, effect sizes, APA reporting
Verifier — pre-publish checklist, lock matching, baseline freshness, cost ledger audit
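The Statistician and the `statistical-rigor` skill report effect sizes such as Cohen's d alongside p-values. For two independent samples, d is the difference in means divided by the pooled standard deviation. This is a generic textbook implementation for reference, not code taken from epistemic.

```typescript
// Cohen's d for two independent samples:
//   d   = (mean(a) - mean(b)) / s_p
//   s_p = sqrt(((n_a - 1) * var(a) + (n_b - 1) * var(b)) / (n_a + n_b - 2))
// where var() is the unbiased sample variance (divides by n - 1).

function mean(xs: number[]): number {
  return xs.reduce((acc, x) => acc + x, 0) / xs.length;
}

function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1);
}

function cohensD(a: number[], b: number[]): number {
  if (a.length < 2 || b.length < 2) {
    throw new Error("each group needs at least 2 observations");
  }
  const pooledVar =
    ((a.length - 1) * sampleVariance(a) + (b.length - 1) * sampleVariance(b)) /
    (a.length + b.length - 2);
  return (mean(a) - mean(b)) / Math.sqrt(pooledVar);
}
```

With per-seed scores from two configurations (for example LoRA rank 8 versus rank 4), a positive d means the first group scored higher on average.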

The Monitor

epistemic monitor (or /monitor in the TUI) opens a full-screen interactive view of every experiment:

Ξ epistemic · mission control   [████░░ 16%] $34/$210   2 running · 1 shipped · 1 killed

● ✓ H-001  LoRA fine-tuning…
├─▶ ▶ H-004  Scaling LoRA to 7B…
│   ◇ if acc ≥ 0.80 → ship / H-006 pivot
└─▶ ☓ H-002  High learning rate…

Key	Action
`↑` / `↓`	select an experiment
`→` / `←`	open detail / back to tree
`enter`	actions: chat / approve / reject / modify
`q`	back to the chat

The Gates (automatic)

If the skills are well written, the gates never fire — the agent follows the manual. The gates are the safety net.

Gate	Blocks
Prereg	running experiments before pre-registration
Judge / Environment lock	changing the judge or environment mid-run
Smoke	quoting provisional numbers as results
Cost ledger	(transparent — logs every call with cost + category)
Claim intercept	comparing to an unreproduced baseline
Kill criteria	overrunning 1.5× the cost cap, or going stale
Baseline staleness	comparing to a >30-day-old baseline

Overrides go in OVERRIDES.md with a mandatory reason.
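The Claim intercept and Baseline staleness gates can be expressed as one check on a baseline's reproduction date. This sketch assumes a hypothetical `Baseline` record where `reproducedAt` is `null` if the baseline was never reproduced under the locked judge; the 30-day limit comes from the gate table, and `checkBaseline` is an illustrative name.

```typescript
// Hypothetical record of a competitor baseline.
interface Baseline {
  name: string;
  reproducedAt: Date | null; // null = never reproduced under the locked judge
}

const MAX_BASELINE_AGE_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function checkBaseline(b: Baseline, now: Date): { block: boolean; reason?: string } {
  // Claim intercept: refuse comparisons against a baseline that was never reproduced.
  if (b.reproducedAt === null) {
    return { block: true, reason: `${b.name} has not been reproduced; run baseline-reproduction first` };
  }
  // Baseline staleness: refuse comparisons against a reproduction older than 30 days.
  const ageDays = (now.getTime() - b.reproducedAt.getTime()) / MS_PER_DAY;
  if (ageDays > MAX_BASELINE_AGE_DAYS) {
    return { block: true, reason: `${b.name} baseline is ${Math.floor(ageDays)} days old; refresh it` };
  }
  return { block: false };
}
```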

Tools

Internal tools the agent can call during research workflows.

Docker — isolated container execution for reproducible experiment environments
Modal — serverless GPU compute for burst training and inference
Web search — evidence gathering across papers, docs, and repos

Hugging Face (optional, requires hf auth login)

Tool	What it does
`huggingface-papers`	Read any arXiv paper as markdown; structured metadata (authors, linked models, citations)
`hf-cli`	Download/upload models & datasets, manage repos, run HF Jobs
`huggingface-datasets`	Paginate rows, full-text search, filter predicates, get Parquet URLs
`huggingface-community-evals`	Run evals locally with `inspect-ai` or `lighteval`
`huggingface-trackio`	Log metrics + alerts during training, sync real-time dashboard to HF Space
`huggingface-llm-trainer`	Fine-tune with TRL (SFT/DPO/GRPO) on HF Jobs cloud GPUs

Plugin API

@epistemic/omp exposes a typed plugin API to extend epistemic with your own commands, event handlers, and gates:

import type { EpistemicPlugin } from "@epistemic/omp";

export const myPlugin: EpistemicPlugin = (api) => {
  api.registerCommand("my-cmd", {
    description: "My custom command",
    handler: async (args, ctx) => {
      ctx.ui.notify(`Running my-cmd with: ${args}`, "info");
    },
  });

  api.on("session_start", async (_event, ctx) => {
    ctx.ui.setStatus?.("my-plugin", "● active");
  });

  api.gate(async (event, ctx) => {
    // Return { block: true, reason: "..." } to interrupt
  });
};

Method	What it does
`registerCommand(name, opts)`	Registers a `/name` slash command in the agent chat
`on(event, handler)`	Subscribes to `session_start`, `session_shutdown`, `before_agent_start`, or `tool_call`
`gate(handler)`	Registers a `tool_call` gate — return `{ block, reason }` to interrupt

How it works (layers)

Built on Pi for the agent runtime, with the epistemic skill core as the shared method contract across all harnesses. Runtime resources follow Pi's package, extension, and skill model.

Layer	What it does
Portable skill core	`using-epistemic`, `epistemic`, and stage skills — the shared method across harnesses
Harness bootstrap	Claude Code's `SessionStart` hook and the Codex plugin manifest inject the skill core at the right time
Runtime gates	Invisible enforcement that blocks rule violations automatically
Monitor	Navigate the experiment tree, drill into a hypothesis, approve / reject / modify
Fleet	Parallel agent fleet runner for multi-experiment orchestration
HF stack	6 HF skills (papers, datasets, evals, training, tracking, CLI) auto-discovered
State	File-based ledger: `HYPOTHESES.md`, `.epistemic/cost-ledger.jsonl`, `experiments/{id}/`

Repo Layout

Path	What it is
`packages/omp/`	`@epistemic/omp` — forked oh-my-pi TUI shell (Amber Lab theme, ResearchSidebar)
`src/`	Extension: gates, commands, monitor, board
`src/tui/render3d.ts`	Shared software 3D renderer (Z-buffer, Lambert shading, amber glow)
`src/cli/intro.ts`	Startup animation (3D spin → name reveal)
`skills/`	Research methodology skills + 6 HF skills
`hooks/`	Claude Code hooks (SessionStart, prereg gate)
`.pi/extensions/welcome-header/`	Persistent 3D Ξ header in the TUI
`.pi/settings.json`	Project settings (Codex default, theme, extensions)
`.claude-plugin/`	Claude Code plugin manifest and marketplace metadata
`.codex-plugin/`	Codex plugin manifest for the portable skill core
`themes/epistemic.json`	Amber-on-transparent dark theme
`tests/claude-code/`	Headless Claude Code harness tests for skill triggering

Philosophy

Pre-registration — lock the hypothesis before running a single line
Reproducibility — you can't beat what you can't run
Falsification over confirmation — guilty until proven defensible
Evidence over claims — verify before publishing

Contributing

Fork the repository
Create a branch for your work
Follow the writing-skills skill for creating and testing new and modified skills
Submit a PR

Every skill must work across all supported harnesses (Claude Code, Codex, TUI).

More

GUIDE.md — prompt-driven walkthrough and use cases
TESTING.md — the agent-driven TUI test suite (npm run verify)
Issues: https://github.com/moralespanitz/epistemic/issues
