Codex Autoresearch helps Codex turn "make this better" into a measured loop.
Give Codex a goal, a benchmark, and the files it may edit. Codex Autoresearch runs bounded experiment packets, logs each keep or discard with evidence, preserves ASI and metrics across context loss, and turns useful changes into reviewable branches.
Inspired by the AI-focused karpathy/autoresearch and pi-autoresearch. Codex Autoresearch adapts the measured-loop idea for Codex plugin workflows, repo-local benchmarks, durable session files, an evidence trail, live dashboards, and reviewable finalization.
Ask Codex to use Codex Autoresearch.
Broad prompts work:
Use $Codex Autoresearch to improve the speed of my indexer's pipeline, while keeping it memory efficient.
Use $Codex Autoresearch to keep reducing bugs in the codebase, starting with
the most obvious low-hanging fruit. Run at most 5 packets or 30 minutes,
stop if checks fail twice, and report the best kept change.
You can also hand it a sharper investigation:
Use $Codex Autoresearch to figure out why my graphql service's p99 latency is so much higher
than its p90 latency at 1 minute metric resolution. I suspect: DNS lookup, event loop throttling,
memory spike, CPU spike. For each, run the 4-5 appropriate experiments @experiments.md and if the
results are promising keep iterating, otherwise stop and report back.
Or be exact about the benchmark and scope:
Use $Codex Autoresearch to optimize my unit tests' speed. Different libraries are allowed, but try to avoid them.
Benchmark: npm test -- --runInBand
Metric: seconds, lower is better
Checks: npm test
Scope: test runner config and test helpers only
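Those lines describe a keep/discard rule: checks gate the result, and the primary metric only counts when it moves in the declared direction. Here is a minimal TypeScript sketch of such a rule, assuming a hypothetical result shape with optional secondary ceilings (such as a memory limit, in the spirit of the "memory efficient" prompt earlier). The names decide, MetricContract, and PacketResult are illustrative, not the plugin's API, and logging the first result as measure is an assumption.

```typescript
// Hypothetical shapes; the plugin's session files may store these differently.
type Direction = "lower" | "higher";

interface MetricContract {
  name: string; // e.g. "seconds"
  direction: Direction; // "lower" means lower is better
}

interface SecondaryConstraint {
  name: string; // e.g. a peak-memory metric
  max: number; // candidates above this value are rejected
}

interface PacketResult {
  primary: number;
  secondary: Record<string, number>;
  checksPassed: boolean;
}

type Decision = "keep" | "discard" | "measure" | "checks_failed";

function decide(
  contract: MetricContract,
  constraints: SecondaryConstraint[],
  bestKept: number | undefined,
  result: PacketResult,
): Decision {
  // Correctness checks gate everything: a faster but broken change is never kept.
  if (!result.checksPassed) return "checks_failed";
  // A change that breaks a known tradeoff is discarded even if the primary metric improved.
  for (const constraint of constraints) {
    const value = result.secondary[constraint.name];
    if (value === undefined || value > constraint.max) return "discard";
  }
  // Assumption: with no kept baseline yet, the packet only establishes one.
  if (bestKept === undefined) return "measure";
  const improved =
    contract.direction === "lower" ? result.primary < bestKept : result.primary > bestKept;
  return improved ? "keep" : "discard";
}
```

A real loop would usually also require the improvement to exceed benchmark noise before keeping it, for example by repeating runs and comparing medians.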
Codex should start by checking Git state, identifying the target package, creating or resuming the session, verifying the benchmark, running one packet, and logging the result with experiment details. Ask for the live dashboard when you want a visual readout or need fresh packet state in the browser.
Autoresearch stores its loop evidence in local project files and runs approved benchmark/check commands with local process permissions. Read Privacy, Terms, and Trust before using it on repos with secrets, sensitive data, external APIs, or expensive commands.
For normal Codex use, install the plugin through the Codex plugin flow for your workspace. Open Codex in the repo you want to improve, then use:
/plugins
Choose:
TheGreenCedar Autoresearch -> codex-autoresearch -> Install plugin
If your Codex build exposes terminal marketplace management for source marketplaces, add or refresh this marketplace first:
codex plugin marketplace add TheGreenCedar/codex-autoresearch
Some workspace plugin settings are managed from the Codex Apps/Plugins UI rather than the terminal. Use the UI path when the CLI marketplace command is unavailable.
Start a new Codex thread after installation or refresh.
A normal session follows this shape:
setup -> doctor -> next -> log -> state -> finalize-preview
When the goal, benchmark, metric, or scope is still fuzzy, start with one of the read-only planning surfaces before creating files:
node plugins/codex-autoresearch/scripts/autoresearch.mjs setup-plan --cwd <project>
node plugins/codex-autoresearch/scripts/autoresearch.mjs prompt-plan --cwd <project> --prompt "<plain-language goal>"
Codex Autoresearch helps Codex:
- set up the target repo, goal, primary metric, benchmark, checks, and scoped edit surface
- verify the benchmark contract and optional checks with doctor
- run one measured packet with next
- log the result as keep, discard, measure, crash, or checks_failed
- inspect the compact state before spending another packet
- preview finalization into reviewable branches when the kept evidence is ready
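The happy path repeats next, log, and state until a budget runs out. A minimal TypeScript sketch of that control flow, assuming a budget like the "at most 5 packets or 30 minutes, stop if checks fail twice" prompt earlier. Here runPacket stands in for next plus log, and Budget, LoggedPacket, and runBoundedLoop are hypothetical names. The sketch counts check failures in total rather than consecutively, which is one reading of "fail twice".

```typescript
// Hypothetical budget mirroring "at most 5 packets or 30 minutes, stop if checks fail twice".
interface Budget {
  maxPackets: number;
  maxMinutes: number;
  maxCheckFailures: number;
}

type Outcome = "keep" | "discard" | "measure" | "crash" | "checks_failed";

interface LoggedPacket {
  outcome: Outcome;
  metric?: number; // primary metric, when the benchmark produced one
}

// Inspect the logged state before spending another packet.
function budgetAllows(budget: Budget, log: LoggedPacket[], elapsedMinutes: number): boolean {
  const checkFailures = log.filter((p) => p.outcome === "checks_failed").length;
  return (
    log.length < budget.maxPackets &&
    elapsedMinutes < budget.maxMinutes &&
    checkFailures < budget.maxCheckFailures
  );
}

// Best kept change for a lower-is-better metric.
function bestKept(log: LoggedPacket[]): LoggedPacket | undefined {
  let best: LoggedPacket | undefined;
  let bestMetric = Infinity;
  for (const packet of log) {
    if (packet.outcome === "keep" && packet.metric !== undefined && packet.metric < bestMetric) {
      best = packet;
      bestMetric = packet.metric;
    }
  }
  return best;
}

function runBoundedLoop(
  budget: Budget,
  runPacket: (index: number) => LoggedPacket, // stands in for next + log
  elapsedMinutes: () => number,
): { log: LoggedPacket[]; best: LoggedPacket | undefined } {
  const log: LoggedPacket[] = [];
  while (budgetAllows(budget, log, elapsedMinutes())) {
    log.push(runPacket(log.length));
  }
  return { log, best: bestKept(log) };
}
```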
That happy path is the default help surface. The serve command, an optional live dashboard handoff, is listed in the full help. Advanced diagnostics such as prompt-plan, onboarding-packet, recommend-next, benchmark-inspect, partial-results, session-forensics, and export remain available through --help --all when a run needs deeper repair, dashboard inspection, forensics, or recovery.
When you use Codex Goal mode, codex-goal-brief turns Autoresearch state into a Goal objective draft and completion audit. It does not mutate Codex Goal state.
A packet is one measured experiment cycle: make a scoped change, run the benchmark, inspect the metric, and log the decision.
ASI means Accumulated Structured Intelligence. It is the structured memory attached to each packet decision: hypothesis, evidence, rollback reason, next action hint, and optional lane, family, or risk metadata. It tells the next Codex session what happened, what was learned, and which path deserves the next attempt.
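One way to picture an ASI entry is as a typed record that a fresh session can condense into a handoff note. The field names below paraphrase the description above and are assumptions, not the plugin's on-disk schema; handoffSummary is a hypothetical helper.

```typescript
// Hypothetical ASI record; field names paraphrase the description above.
interface PacketAsi {
  hypothesis: string;
  evidence: string;
  rollbackReason?: string; // present when the change was discarded or reverted
  nextActionHint: string;
  lane?: string;
  family?: string;
  risk?: "low" | "medium" | "high";
}

interface PacketEntry {
  id: number;
  decision: "keep" | "discard" | "measure" | "crash" | "checks_failed";
  asi: PacketAsi;
}

// Condense the ledger into a note for the next session: what was tried,
// why anything was rolled back, and which path to attempt next.
function handoffSummary(entries: PacketEntry[]): string {
  if (entries.length === 0) return "No packets logged yet.";
  const lines = entries.map((entry) => {
    const rollback = entry.asi.rollbackReason ? ` (rolled back: ${entry.asi.rollbackReason})` : "";
    return `#${entry.id} ${entry.decision}: ${entry.asi.hypothesis}${rollback}`;
  });
  // The most recent hint is the freshest guidance.
  lines.push(`Next: ${entries[entries.length - 1].asi.nextActionHint}`);
  return lines.join("\n");
}
```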
For terminal-first resumes, ask for the compact report:
node plugins/codex-autoresearch/scripts/autoresearch.mjs state --cwd <project> --report
From inside plugins/codex-autoresearch, the shorter node scripts/autoresearch.mjs ... form is equivalent.
It returns report.text for a one-screen readout and report.json for automation. Blockers outrank packet recommendations, and missing dashboard liveness includes the command to serve or verify the dashboard instead of pretending a stale view is live.
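A script consuming the automation report could apply the same priority rules. This sketch assumes a hypothetical CompactReport shape (blockers, an optional recommendation, and dashboard liveness with an optional serve command); the real report.json fields may differ, and readout is an illustrative name.

```typescript
// Hypothetical subset of report.json; the real field names may differ.
interface CompactReport {
  blockers: string[];
  recommendation?: string;
  dashboard: { live: boolean; serveCommand?: string };
}

function readout(report: CompactReport): string[] {
  const lines: string[] = [];
  if (report.blockers.length > 0) {
    // Blockers outrank packet recommendations, so the recommendation is withheld.
    lines.push(...report.blockers.map((blocker) => `BLOCKER: ${blocker}`));
  } else if (report.recommendation) {
    lines.push(`NEXT: ${report.recommendation}`);
  }
  if (!report.dashboard.live) {
    // Do not imply a stale view is live; point at the command that would fix it.
    const hint = report.dashboard.serveCommand ? `; run ${report.dashboard.serveCommand}` : "";
    lines.push(`DASHBOARD: not live${hint}`);
  }
  return lines;
}
```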
Use Codex Autoresearch when:
- the goal can be measured
- the benchmark is repeatable
- benchmark-contract files can be protected from quiet drift
- known tradeoffs can be expressed as secondary metric constraints
- correctness checks exist or can be added
- the editable scope is small enough to review
- kept work should become reviewable commits or branches
Autoresearch also suits qualitative work when that work can be turned into a checklist-measured loop: study the surface, accept evidence-backed gaps, close them, and verify that quality_gap reaches 0. That loop follows this shape:
research-setup -> quality-gap -> gap-candidates
Use a regular Codex task when:
- the work needs one careful edit
- the goal is mainly taste or judgment
- the benchmark is flaky or very expensive
- the metric can improve by weakening the benchmark
- secrets, deployment paths, or unrelated dirty files are in scope
Protected benchmark folders are recursively inspected and hashed. Keep them small, or point Autoresearch at a compact manifest/contract file instead of a large generated, cache, fixture, or data directory.
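Recursive hashing is why large protected folders are costly: every file is read on each check. The sketch below shows the general technique with Node's built-in crypto and fs modules; it is not the plugin's actual hashing scheme, and the demo folder and file names are made up. Entries are visited in sorted order so the digest is stable, and each file's relative path is hashed with its contents so renames also register as drift.

```typescript
import { createHash } from "node:crypto";
import { mkdirSync, mkdtempSync, readdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, relative, sep } from "node:path";

// Digest every regular file under root, so editing, adding, removing, or renaming one changes it.
function hashTree(root: string): string {
  const hash = createHash("sha256");
  const walk = (dir: string): void => {
    // Sort by code unit so the visiting order is stable across runs and platforms.
    const entries = readdirSync(dir, { withFileTypes: true }).sort((a, b) =>
      a.name < b.name ? -1 : a.name > b.name ? 1 : 0,
    );
    for (const entry of entries) {
      const full = join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile()) {
        const content = readFileSync(full);
        // Path and length prefix keep file boundaries unambiguous in the digest.
        const rel = relative(root, full).split(sep).join("/");
        hash.update(`${rel}\0${content.length}\0`);
        hash.update(content);
      }
    }
  };
  walk(root);
  return hash.digest("hex");
}

// Usage: snapshot a throwaway benchmark folder, then simulate quiet drift in a fixture.
const demoRoot = mkdtempSync(join(tmpdir(), "bench-contract-"));
mkdirSync(join(demoRoot, "fixtures"));
writeFileSync(join(demoRoot, "bench.config.json"), '{"runs": 5}');
writeFileSync(join(demoRoot, "fixtures", "input.txt"), "full input");
const baseline = hashTree(demoRoot);
const unchanged = hashTree(demoRoot) === baseline;
writeFileSync(join(demoRoot, "fixtures", "input.txt"), "smaller input");
const drifted = hashTree(demoRoot) !== baseline;
rmSync(demoRoot, { recursive: true, force: true });
console.log({ unchanged, drifted });
```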
Ask Codex to serve the dashboard when you want a live visual readout, packet freshness matters, or a stale/static export is confusing the decision.
The dashboard answers three questions:
- Is this live or a static snapshot?
- What is the next safe action?
- What blocks trust?
Audit view includes the deeper trace: metric formulas, lane state, watchdog quiet windows, runtime provenance, packet diagnostics, finalization readiness, ledger entries, ASI, and handoff packets.
Readout only: use the CLI to do the work. The dashboard is a visual aid, not a control surface.
For product, docs, UX, or broad research, ask for a quality-gap loop:
Use Codex Autoresearch to study this project and improve the dashboard.
Turn accepted findings into a quality-gap loop, implement them, and keep the live dashboard open.
quality_gap=0 means the accepted checklist for that round is closed. It does not mean discovery is complete. Start another round if the question is still alive.
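Read this way, quality_gap is a count of accepted checklist items that remain open. A minimal sketch under that assumption; GapItem, acceptCandidates, and qualityGap are hypothetical names, and "accepted" is modeled simply as having at least one piece of supporting evidence.

```typescript
// Hypothetical checklist item for one quality-gap round.
interface GapItem {
  title: string;
  evidence: string[]; // findings that back the gap
  accepted: boolean;
  closed: boolean;
}

// Accept only evidence-backed candidates into this round's checklist.
function acceptCandidates(candidates: GapItem[]): GapItem[] {
  return candidates.map((item) => ({ ...item, accepted: item.evidence.length > 0 }));
}

// quality_gap under this reading: accepted items that are still open.
function qualityGap(items: GapItem[]): number {
  return items.filter((item) => item.accepted && !item.closed).length;
}
```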
Ask the plugin to finalize once a loop has useful kept work mixed with exploratory history.
Finalization should:
- select only accepted/current kept evidence
- exclude session artifacts from review branches unless requested
- keep rejected, provisional, superseded, or quarantined evidence audit-visible but out of review branches
- block later-discarded, invalidated, or reverted keeps
- show dirty-tree, overlap, semantic-safety, and final-tree coverage warnings
- prepare clean review branches or a current-final-tree plan
- preserve metric evidence and verification commands
- leave cleanup until review branches are verified
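The selection rules above can be sketched as a partition of kept changes. This is an illustration only: the status names mirror the list, while KeptChange, planFinalization, and the isSessionArtifact predicate are assumptions, and a real plan also covers the dirty-tree, semantic-safety, and final-tree checks that this sketch omits.

```typescript
// Status names mirror the list above; the shapes are hypothetical.
type EvidenceStatus = "accepted" | "rejected" | "provisional" | "superseded" | "quarantined";

interface KeptChange {
  packetId: number;
  status: EvidenceStatus;
  undoneLater: boolean; // a later packet discarded, invalidated, or reverted it
  files: string[];
}

interface FinalizePlan {
  reviewable: KeptChange[]; // candidates for clean review branches
  auditOnly: KeptChange[]; // stays visible in the audit trail only
  blocked: KeptChange[]; // accepted keeps that later evidence undid
  overlaps: string[]; // files touched by more than one reviewable change
}

function planFinalization(
  changes: KeptChange[],
  isSessionArtifact: (file: string) => boolean,
): FinalizePlan {
  const plan: FinalizePlan = { reviewable: [], auditOnly: [], blocked: [], overlaps: [] };
  for (const change of changes) {
    if (change.status !== "accepted") {
      plan.auditOnly.push(change);
    } else if (change.undoneLater) {
      plan.blocked.push(change);
    } else {
      // Session artifacts stay out of review branches by default.
      plan.reviewable.push({ ...change, files: change.files.filter((f) => !isSessionArtifact(f)) });
    }
  }
  // Count each file once per change, then flag files shared across changes.
  const touches = new Map<string, number>();
  for (const change of plan.reviewable) {
    for (const file of new Set(change.files)) touches.set(file, (touches.get(file) ?? 0) + 1);
  }
  plan.overlaps = [...touches].filter(([, count]) => count > 1).map(([file]) => file);
  return plan;
}
```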
Further documentation:
- Docs index
- Concepts glossary
- Start
- Workflow diagrams
- Architecture diagrams
- Operate
- Trust
- Privacy
- Terms
- Finish
- Recipes
- Troubleshooting
- Hooks
- Maintainers
The active package lives under:
plugins/codex-autoresearch
The plugin skill lives at:
plugins/codex-autoresearch/skills/codex-autoresearch/SKILL.md
The plugin and dashboard source are written in TypeScript and developed on Node.js 24 or newer.
The package uses tsdown for Node builds, tsgo for typechecking, oxlint for linting, oxfmt for formatting, Vite for the dashboard, and npm-run-all2 for combined gates.
From plugins/codex-autoresearch:
npm install
npm run check
npm test
node scripts/autoresearch.mjs --help
Targeted checks:
npm run typecheck
npm run lint
npm run format:check
node scripts/autoresearch.mjs doctor --cwd . --check-benchmark --explain
git diff --check
For normal Codex use, refresh or uninstall the plugin from the Codex plugin surface:
/plugins
Then choose the installed codex-autoresearch plugin and use the available refresh or uninstall action.
If your Codex build exposes terminal marketplace management for source marketplaces, these commands may be available:
codex plugin marketplace upgrade thegreencedar-autoresearch
codex plugin marketplace remove thegreencedar-autoresearch
Prefer the plugin UI when the terminal marketplace commands are unavailable.
User-facing changes are tracked in CHANGELOG.md.
This project is licensed under the terms of the Apache License 2.0. Copyright (c) 2026 Albert Najjar.
