Self-improving agent runtime that learns from experience and drives LLM agents through test-verify-fix loops
Goal: install Prax, configure an AI key, run your first task — in under 5 minutes. No programming background required.
Already an experienced user? Jump to One-liner for experienced users below.
Prax needs Node.js (for the CLI wrapper) and Python 3.10+ (for the runtime). Check if you already have them:
node --version # should print v14 or higher
python3 --version # should print Python 3.10 or higher
Missing one? Install:
| OS | Install command |
|---|---|
| macOS | brew install node python@3.12 (install Homebrew first if needed) |
| Linux | sudo apt install nodejs python3 python3-pip (Debian/Ubuntu) or sudo dnf install nodejs python3 (Fedora) |
| Windows | Use WSL2 and follow the Linux commands. Native Windows is not supported yet (on the 0.5.x roadmap). |
npm install -g praxagent
Verify:
prax --version
Should print:
prax 0.5.0
(0.5.0 or higher is fine.)
See command not found?
- macOS with Homebrew Node: run export PATH=/opt/homebrew/bin:$PATH, then add that line to ~/.zshrc
- Linux: confirm that the bin/ directory under the path printed by npm prefix -g is on your $PATH
Prax needs two things to call any LLM: an endpoint URL and an API key. The procedure is the same whether you use an official API or a third-party proxy.
- Get the base_url and api_key from your service's dashboard or docs.
- Export them as environment variables. The names are arbitrary (any name your shell can export works); LLM_BASE_URL / LLM_API_KEY is just our recommended convention:
export LLM_BASE_URL="https://your-service-endpoint"
export LLM_API_KEY="your-key"
- Wire them into Prax via ~/.prax/models.yaml:
mkdir -p ~/.prax
cat > ~/.prax/models.yaml <<'YAML'
providers:
  default:
    base_url_env: LLM_BASE_URL
    api_key_env: LLM_API_KEY
    format: openai # use "anthropic" if your service speaks the Anthropic protocol
    models:
      - name: <your-model> # the exact model name your service exposes
default_model: <your-model>
YAML
Replace <your-model> with whatever model identifier your service supports (check the service's /v1/models endpoint or its dashboard).
- Verify it works:
prax providers
You should see your provider listed with the model name and a status. With LLM_API_KEY set, the status reads available. If it shows missing-credentials, the env var didn't reach Prax: re-run the export and make sure echo $LLM_API_KEY echoes the key back.
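The status check described above can be approximated in a few lines. This is a minimal sketch, not Prax's implementation: the CONFIG dictionary is a hand-written stand-in for the models.yaml shown earlier, and provider_status is a hypothetical helper that assumes the status depends only on whether the variable named by api_key_env is set.

```python
import os

# Hypothetical in-memory stand-in for the ~/.prax/models.yaml shown above.
CONFIG = {
    "providers": {
        "default": {
            "base_url_env": "LLM_BASE_URL",
            "api_key_env": "LLM_API_KEY",
            "format": "openai",
            "models": [{"name": "<your-model>"}],
        }
    },
    "default_model": "<your-model>",
}


def provider_status(provider: dict, env: dict) -> str:
    """Return 'available' when the env var named by api_key_env is non-empty.

    Assumption: only the API key decides the status, as the text above suggests.
    """
    if env.get(provider["api_key_env"]):
        return "available"
    return "missing-credentials"


if __name__ == "__main__":
    for name, provider in CONFIG["providers"].items():
        models = ", ".join(m["name"] for m in provider["models"])
        print(f"{name}: {models} -> {provider_status(provider, dict(os.environ))}")
```

Running it in the same shell where you exported the variables is a quick way to confirm they are visible to child processes before involving Prax at all.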
mkdir -p ~/Desktop/prax-hello && cd ~/Desktop/prax-hello
prax prompt "Who are you? Answer in one sentence."
Should print something like:
I'm the AI assistant running inside the Prax agent runtime; I can help you run code, tests, and automation tasks.
Congrats — Prax is working.
echo "hello world" > greeting.txt
prax prompt "Read greeting.txt, convert its contents to uppercase, and write the result back"
cat greeting.txt
Should print:
HELLO WORLD
That's the distinguishing capability — Prax doesn't just chat, it reads, writes, runs tests, and verifies in a loop. Everything below builds on this.
Got stuck at any step? Common issues:
| Symptom | Fix |
|---|---|
| Error: Model 'xxx' not found | The <your-model> name in ~/.prax/models.yaml doesn't match what your service exposes. Check its /v1/models endpoint or dashboard. |
| HTTP 401 / Unauthorized | Key typo or expired. Regenerate and re-export LLM_API_KEY. |
| Silent exit with no output | Your endpoint is unreachable. Try curl -s "$LLM_BASE_URL"/... to confirm, or point LLM_BASE_URL at a healthy endpoint. |
| Chinese characters look broken | Set export LANG=zh_CN.UTF-8 in your shell rc. |
Once Step 5 above works, you're ready to pick a usage mode.
Use Prax directly at the shell prompt — perfect for automation, cron jobs, CI/CD, or just batching work without opening an IDE.
prax prompt "run pytest -q, fix the first failure, and stop when tests pass"
prax prompt "read README.md and propose 3 concrete improvements as a checklist"
prax cron add --name daily-news --schedule "0 17 * * *" --prompt "..." # schedule recurring work
Pick a role-matched walkthrough to see a real-world pipeline end-to-end:
| Your role | Tutorial | What you'll build |
|---|---|---|
| PM / support lead | support-digest | Daily PII-redacted ticket digest, local-only processing |
| Content creator / knowledge-base hobbyist | ai-news-daily | Scrape X/知乎/Bilibili → compile Obsidian wiki → send digest |
| Release manager | release-notes | Git log → CHANGELOG + release announcement |
| DevEx / tech writer | docs-audit | Weekly "which docs drifted from the code?" report |
| Engineering lead | pr-triage | Per-PR triage that actually runs tests on both branches |
All five tutorials start with sample data — no external API or real PR needed to follow along.
If you use Claude Code and want Prax's skills, commands, and verification hooks available inside the IDE:
prax install --profile full
This copies Prax's bundled skills (4 commercial recipes + 1 content-automation pipeline + 4 developer workflows) into ~/.claude/, registers Prax MCP servers, and wires hooks so Claude Code runs Prax's verification loop on every code change.
Verify:
prax doctor --target claude
Now reopen your project in Claude Code. You'll see new /prax-status, /prax-doctor, /prax-plan, /prax-verify slash commands, the prax-planner agent, and Prax's rules automatically applied.
To undo: prax uninstall --target claude.
Note: prax install only ships a Claude Code integration today. Any LLM you configure via Step 3 still works as Prax's backend regardless of IDE choice. A skill-export path for additional IDE/CLI hosts is on the 0.5.x roadmap.
git clone https://github.com/ChanningLua/prax-agent.git && cd prax-agent
pip install -e .
export LLM_BASE_URL="https://your-service-endpoint"
export LLM_API_KEY="your-key"
# (then write ~/.prax/models.yaml as shown in Step 3 above)
prax prompt "run pytest -q, fix the failure, and stop when tests pass"
Prax can execute shell commands on your behalf. It defaults to workspace-write mode, in which files outside the project are off-limits. Use --permission-mode read-only for safe exploration.
Prax isn't just another LLM wrapper — it's a production-grade agent runtime built for real repository work.
Prax learns from experience and self-improves across sessions and projects:
- Correction Detection: Automatically detects when users correct mistakes, extracts problem-solution patterns, and applies them in future sessions (multilingual support)
- Cross-Project Experience Accumulation: Builds a global experience store at ~/.prax/experiences.json that improves performance across all your repositories
- Structured Error Recovery: Blacklists failing approaches and tries alternatives, preventing repeated mistakes within the same session
- Persistent Memory with Confidence Scoring: Two backends (JSON/SQLite) track context, decisions, and learned patterns; confidence is static per fact, with no time-based decay currently implemented
- Temporal Knowledge Graph: Tracks entity relationships and their evolution across sessions
- Checkpoint/Resume: Crash recovery ensures no work is lost, even during long-running tasks
- Trajectory Recording: Learns from execution history to identify successful patterns and avoid failure modes
These capabilities are production-ready and integrated into the core runtime — not experimental plugins.
Prax automatically learns from your work and applies that knowledge to future tasks. Here's how to work with its memory system:
Prax captures experience in these situations:
- Correction Detection: When you correct a mistake (e.g., "that's wrong", "不对", "try again"), Prax extracts the problem-solution pattern and saves it to .prax/solutions/
- Task Completion: Facts with confidence ≥ 0.7 are persisted to project memory
- Tool Failures: Failed approaches are blacklisted within the session to avoid repetition
- Verification Success: Successful test-fix patterns are recorded as experiences
- Session End: Context snapshots are saved for the next session to resume
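To make the first trigger concrete, here is a minimal sketch of phrase-based correction detection. It is an illustration under stated assumptions, not Prax's detector: the phrase list reuses only the three examples quoted above, and is_correction and extract_pattern are hypothetical names. A real detector would likely need a larger vocabulary and more context than a substring match.

```python
# Phrases taken from the examples above; a real detector would need many more.
CORRECTION_PHRASES = ("that's wrong", "不对", "try again")


def is_correction(message: str) -> bool:
    """Return True when the user message contains a known correction phrase."""
    text = message.casefold()
    return any(phrase in text for phrase in CORRECTION_PHRASES)


def extract_pattern(previous_attempt: str, correction: str) -> dict:
    """Pair the rejected attempt with the user's correction (hypothetical schema)."""
    return {"problem": previous_attempt, "solution": correction}


if __name__ == "__main__":
    reply = "That's wrong, use httpx instead of requests"
    if is_correction(reply):
        print(extract_pattern("imported requests in auth.py", reply))
```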
Check what Prax has learned:
# Project-specific memory
cat .prax/memory.json # Facts and context (JSON backend)
sqlite3 .prax/memory.db .tables # Or SQLite backend (a binary file, so list its tables with the sqlite3 CLI instead of cat)
ls .prax/solutions/ # Problem-solution patterns
# Global cross-project experiences
cat ~/.prax/experiences.json # Shared learnings (JSON backend)
sqlite3 ~/.prax/experiences.db .tables # Or SQLite backend
# Session history
ls .prax/sessions/ # Past conversation transcripts
Clean up memory when needed:
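# Clear project memory (both backends)
rm -rf .prax/memory.json .prax/memory.db .prax/solutions/
# Clear global experiences (both backends)
rm -rf ~/.prax/experiences.json ~/.prax/experiences.db
# Clear session history
rm -rf .prax/sessions/
# Full reset (also deletes ~/.prax/models.yaml and all other project and global config)
rm -rf .prax/ ~/.prax/
Prax supports two memory backends: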
| Backend | Storage | Best For | Search |
|---|---|---|---|
| local (JSON) | .prax/memory.json + ~/.prax/experiences.json | Zero-config, small projects | Linear scan |
| sqlite | .prax/memory.db + ~/.prax/experiences.db | Medium to large projects, full-text search | FTS5 index |
Configure in .prax/config.yaml:
memory:
  backend: local # or sqlite
  local:
    max_facts: 100
    fact_confidence_threshold: 0.7
    max_experiences: 500
- Facts with confidence ≥ 0.7 are persisted to memory
- Lower-confidence observations are kept in session context only
- Confidence scoring is static per fact (no time-based decay currently implemented)
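The threshold rule can be sketched as a simple filter. This is a sketch under assumptions rather than the actual memory backend: Fact and select_persistent are hypothetical names, and the way max_facts is enforced here (keep the highest-confidence facts) is a guess, since the text above does not say how the cap is applied.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fact:
    text: str
    confidence: float


def select_persistent(
    facts: List[Fact], threshold: float = 0.7, max_facts: int = 100
) -> List[Fact]:
    """Keep facts at or above the threshold, capped at max_facts.

    Assumption: when the cap is hit, the highest-confidence facts win.
    """
    kept = [fact for fact in facts if fact.confidence >= threshold]
    kept.sort(key=lambda fact: fact.confidence, reverse=True)
    return kept[:max_facts]


if __name__ == "__main__":
    observed = [
        Fact("tests run with pytest -q", 0.9),
        Fact("project uses httpx", 0.7),
        Fact("maybe uses Redis", 0.4),  # stays in session context only
    ]
    for fact in select_persistent(observed):
        print(fact)
```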
Most tools send a prompt and hope for the best. Prax runs a test-verify-fix loop: it executes your test suite, analyzes failures, edits code, and re-runs until tests pass. The verification layer is first-class — not an afterthought.
Benchmark-proven: 10/10 repository repair tasks solved in 29.56s average (vs 8/10 baseline across peer frameworks).
Dual Runtime Paths — Native CLI for automation and CI/CD, Claude Code integration for interactive development. Choose the right tool for the job.
Cross-Session Persistent Memory — Context persists when you close the terminal. Two memory backends: JSON (zero-config) and SQLite (full-text search).
Multi-Model Orchestration — OpenAI-compatible, Anthropic-compatible, and custom-protocol LLMs with explicit routing, fallback chains, and cost tracking. Switch models mid-session with /model <your-model>.
Security by Design — Permission modes (read-only, workspace-write, danger-full-access), schema validation, workspace boundaries, and full audit trail.
Built for Real Codebases — 25+ built-in tools, middleware pipeline (loop detection, quality gates), multi-language support, and interactive REPL mode.
Transparent & Measurable — Real-time cost tracking, session history and replay, benchmark suite included, open architecture for custom extensions.
$ prax "run pytest -q, fix the failure, and stop when tests pass"
▶ VerifyCommand {"command": "pytest -q"}
✗ FAILED test_auth.py::test_login - AssertionError
▶ Read {"file_path": "src/auth.py"}
▶ Edit {"file_path": "src/auth.py", ...}
▶ VerifyCommand {"command": "pytest -q"}
✓ 1 passed in 0.12s
Verification passed. Task complete.
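The loop in the transcript above can be sketched as follows. This is a simplified illustration, not the code in core/agent_loop.py: verify_fix_loop and run_verify are hypothetical names, the fix step is a plain callback standing in for the LLM-driven edit, and the cap of 25 mirrors the iteration limit listed for the agent loop in the module table.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

MAX_ITERATIONS = 25  # same cap the module table lists for the agent loop


def run_verify(command: List[str]) -> Tuple[bool, str]:
    """Run a verification command and return (passed, combined output)."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def verify_fix_loop(
    verify: Callable[[], Tuple[bool, str]],
    fix: Callable[[str], None],
    max_iterations: int = MAX_ITERATIONS,
) -> Optional[int]:
    """Alternate verify and fix until verification passes.

    Returns the number of verification runs used, or None if the cap was hit.
    """
    for attempt in range(1, max_iterations + 1):
        passed, output = verify()
        if passed:
            return attempt
        fix(output)  # in Prax this step would be an LLM-driven edit
    return None
```

A caller would wire it up as verify_fix_loop(lambda: run_verify(["pytest", "-q"]), fix=my_fix), where my_fix is whatever applies an edit based on the failure output.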
prax "explain the authentication flow in login.py"
prax "refactor auth.py error handling, replace requests with httpx"
prax "analyze project architecture, list technical debt, prioritize by impact"
prax repl
> analyze the codebase structure
> fix the SQL injection in user_query.py
> /model <your-model>
> /cost
Session: 12.4K tokens ($0.04)
Slash commands: /model, /session list, /plan, /todo show, /doctor, /cost, /help
Prax can own scheduled work end-to-end. Declare channels in .prax/notify.yaml, jobs in .prax/cron.yaml, and prax cron install writes the system-level trigger for you (LaunchAgent on macOS, crontab line on Linux).
# 1. configure an outbound channel (Feishu / Lark / Email)
cat > .prax/notify.yaml <<YAML
channels:
  daily-digest:
    provider: feishu_webhook
    url: "\${FEISHU_WEBHOOK_URL}"
YAML
# 2. schedule a daily job
prax cron add \
--name ai-news-daily \
--schedule "0 17 * * *" \
--prompt "trigger the ai-news-daily skill" \
--notify-on failure \
--notify-channel daily-digest
# 3. install the per-minute dispatcher
prax cron install
See docs/recipes/ai-news-daily.md for the full AI-news-automation recipe.
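A per-minute dispatcher has to decide, each minute, which schedules are due. The sketch below shows one way to match a five-field expression such as "0 17 * * *" against the current time. It is an assumption-laden illustration, not Prax's dispatcher: is_due and field_matches are hypothetical names, only "*", single numbers, and comma lists are supported, and all five fields are combined with AND (standard cron treats restricted day-of-month and day-of-week fields as OR).

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports '*', single numbers, and comma lists."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def is_due(schedule: str, now: datetime) -> bool:
    """Return True when the five-field schedule matches the given minute."""
    minute, hour, day, month, weekday = schedule.split()
    cron_weekday = now.isoweekday() % 7  # cron counts Sunday as 0
    return (
        field_matches(minute, now.minute)
        and field_matches(hour, now.hour)
        and field_matches(day, now.day)
        and field_matches(month, now.month)
        and field_matches(weekday, cron_weekday)
    )


if __name__ == "__main__":
    print(is_due("0 17 * * *", datetime.now()))
```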
Skills live under skills/ (bundled) or .prax/skills/ (project-local) and inject prompt guidance when their triggers match. Content / writing helpers:
| Skill | Triggers | Purpose |
|---|---|---|
| browser-scrape | 抓取 scrape twitter zhihu bilibili autocli | Drive AutoCLI to scrape 55+ sites reusing the user's Chrome login |
| knowledge-compile | 整理 compile wiki digest 知识库 | Turn raw markdown into Obsidian-ready wiki (index.md + topics/ + daily-digest.md) |
| ai-news-daily | ai-news-daily daily digest 日报 | End-to-end pipeline: scrape → compile → notify |
| chinese-coding | 中文 注释 文档 | Chinese comments/docs style guide |
Four additional commercial recipes (pr-triage, release-notes, docs-audit, support-digest) ship under skills/ as well — see the Commercial Use Cases table below for what each one does. Project-local skills in .prax/skills/ override bundled ones with the same name.
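The override and trigger rules can be sketched as a dictionary merge followed by a keyword scan. This is an illustrative sketch only: resolve_skills and matching_skills are hypothetical names, the "source" field is invented for the demo, and substring matching is an assumption about how triggers are compared against the prompt.

```python
from typing import Dict, List


def resolve_skills(bundled: Dict[str, dict], project_local: Dict[str, dict]) -> Dict[str, dict]:
    """Merge skill tables; project-local entries replace bundled ones by name."""
    return {**bundled, **project_local}


def matching_skills(prompt: str, skills: Dict[str, dict]) -> List[str]:
    """Return names of skills with at least one trigger word in the prompt."""
    text = prompt.casefold()
    return sorted(
        name
        for name, skill in skills.items()
        if any(trigger.casefold() in text for trigger in skill["triggers"])
    )


if __name__ == "__main__":
    bundled = {
        "knowledge-compile": {
            "triggers": ["整理", "compile", "wiki", "digest", "知识库"],
            "source": "skills/",
        },
        "chinese-coding": {"triggers": ["中文", "注释", "文档"], "source": "skills/"},
    }
    local = {"knowledge-compile": {"triggers": ["compile"], "source": ".prax/skills/"}}
    skills = resolve_skills(bundled, local)
    print(matching_skills("compile my notes", skills), skills["knowledge-compile"]["source"])
```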
Four recipes tuned for team/enterprise workflows — designed to ship reviewable artefacts (not "AI said so" hallucinations) and to keep destructive actions firmly in human hands.
| Case | Target user | Prax differentiator | Recipe |
|---|---|---|---|
| PR Triage Bot | Eng lead | Actually checks out the PR branch and runs tests via VerifyCommand; compares against base. No GitHub side-effects. | docs/recipes/pr-triage.md |
| Release Notes Generator | Release manager | Reads git log + issue refs, groups by Conventional Commits into Keep-a-Changelog sections, idempotent per version. Writes files; never tags/pushes/publishes. | docs/recipes/release-notes.md |
| Docs Freshness Audit | DevEx / tech writer | Diffs recently-changed source vs doc mentions, outputs an evidence-cited drift report. Never edits docs itself. | docs/recipes/docs-audit.md |
| Support Ticket Digest | PM / support lead | Zero external API calls; PII redaction runs before any LLM sees the data — compliance-grade local-only processing. | docs/recipes/support-digest.md |
Each case is 10-minute deployable, works with the cron/notify plumbing above, and has hard contractual limits baked into its SKILL.md (no auto-approve, no auto-merge, no auto-refund, no auto-edit-docs) so the agent cannot drift.
Prax achieves a 10/10 success rate on repository repair tasks and completes them in 29.56s on average, about 49% less time than the cross-framework baseline (58.44s).
| Metric | Prax | Framework Baseline | Improvement |
|---|---|---|---|
| Success Rate | 10/10 (100%) | 8/10 (80%) | +20 pp (+25% relative) |
| Average Time | 29.56s | 58.44s | -49% |
| Timeouts | 0 | 2 | -100% |
What drives these results:
- Verification-First Architecture — Test-verify-fix loops catch errors early
- Quality Gate Middleware — Loop detection and convergence guidance
- Smart Sandbox Downgrade — Verification commands bypass unnecessary overhead
- Experience-Based Learning — Correction detection, error pattern blacklisting, and cross-session memory accumulation
Benchmark methodology: 10 repeated rounds on real repository-fix tasks with session state preserved. See docs/BENCHMARKS.md for full details.
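For readers who want to check the Improvement column, the arithmetic behind it is shown below. The inputs are the figures from the table above; the variable names are just for this example.

```python
# Figures copied from the results table above.
prax_success, baseline_success = 10 / 10, 8 / 10
prax_time, baseline_time = 29.56, 58.44

# Success rate: absolute gain in percentage points and relative gain.
success_gain_pp = (prax_success - baseline_success) * 100
success_gain_rel = (prax_success / baseline_success - 1) * 100

# Time: share of the baseline time saved, and the equivalent speed-up factor.
time_reduction = (baseline_time - prax_time) / baseline_time * 100
speedup = baseline_time / prax_time

if __name__ == "__main__":
    print(f"success: +{success_gain_pp:.0f} pp, +{success_gain_rel:.0f}% relative")
    print(f"time: -{time_reduction:.1f}%, {speedup:.2f}x speed-up")
```

Note that a 49% reduction in time corresponds to roughly a 2x speed-up, which is why the two ways of stating the result look so different.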
Prax offers two runtime paths — choose the right tool for the job:
| Feature | Native Runtime | Claude Code Integration |
|---|---|---|
| Execution | CLI commands | Claude Code IDE |
| Interaction | Command-line REPL | IDE conversation interface |
| Context Management | Local JSON/SQLite | Claude Code sessions |
| Tool Integration | 25+ built-in tools | Claude Code tools + Prax extensions |
| Use Cases | Automation, CI/CD | Interactive development, code review |
- IDE Native Experience — Use Prax capabilities directly within Claude Code
- Seamless Integration — Deep integration via MCP servers and Hooks
- Security Protection — Pre-write secret scanning, pre-commit quality checks
- Session Persistence — Auto-save session state, resume from breakpoints
- Bidirectional Collaboration — Claude Code's conversational ability + Prax's verification loop
Installation lives under Way 2 · Inside Claude Code IDE above (prax install --profile full, then prax doctor --target claude).
Models — create .prax/models.yaml in your project (or edit the global ~/.prax/models.yaml from Quick Start Step 3):
default_model: <your-model>
providers:
  default:
    base_url_env: LLM_BASE_URL # any env var name; declared in Quick Start Step 3
    api_key_env: LLM_API_KEY
    format: openai # use "anthropic" if your endpoint speaks the Anthropic protocol
    models:
      - name: <your-model> # the model identifier your service exposes
To wire multiple endpoints (e.g. one OpenAI-compatible and one Anthropic-compatible), declare another providers: entry with its own base_url_env / api_key_env and list its models. Each provider can also override base_url directly instead of via env (see core/llm_client.py for the full schema).
Permission modes
| Mode | What it allows | Default |
|---|---|---|
| read-only | No file writes, no shell commands | |
| workspace-write | Modify files inside the project | ✓ |
| danger-full-access | Unrestricted | |
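The workspace boundary behind workspace-write can be illustrated with a path check. This is a minimal sketch, not Prax's enforcement code: write_allowed is a hypothetical helper, and it covers file writes only (shell-command restrictions are out of scope here).

```python
from pathlib import Path


def write_allowed(mode: str, workspace: Path, target: Path) -> bool:
    """Decide whether a file write is permitted under the given mode."""
    if mode == "read-only":
        return False
    if mode == "danger-full-access":
        return True
    if mode == "workspace-write":
        root = workspace.resolve()
        # Resolving collapses ".." segments, so escapes like ../../etc are caught.
        resolved = (root / target).resolve()
        return resolved == root or root in resolved.parents
    raise ValueError(f"unknown permission mode: {mode}")


if __name__ == "__main__":
    project = Path.cwd()
    print(write_allowed("workspace-write", project, Path("src/auth.py")))
    print(write_allowed("workspace-write", project, Path("../outside.txt")))
```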
prax --permission-mode read-only "analyze security vulnerabilities"
Runtime paths
| Flag | Behavior |
|---|---|
| --runtime-path auto | Uses Claude CLI bridge if claude is installed, otherwise native runtime (default) |
| --runtime-path native | Always use the native runtime |
| --runtime-path bridge | Always use the Claude CLI bridge; fails if claude is not installed |
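The selection rules in the table can be expressed compactly. The sketch below is an illustration, not Prax's resolver: resolve_runtime_path is a hypothetical function, and it assumes "installed" means the claude executable is discoverable on PATH, which is what shutil.which checks.

```python
import shutil
from typing import Callable, Optional


def resolve_runtime_path(
    flag: str = "auto",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Map a --runtime-path value to 'native' or 'bridge'."""
    has_claude = which("claude") is not None
    if flag == "native":
        return "native"
    if flag == "bridge":
        if not has_claude:
            raise RuntimeError("bridge requested but claude is not installed")
        return "bridge"
    if flag == "auto":
        return "bridge" if has_claude else "native"
    raise ValueError(f"unknown runtime path: {flag}")


if __name__ == "__main__":
    print(resolve_runtime_path("auto"))
```

Passing the lookup function as a parameter keeps the rule testable on machines with or without claude installed.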
Data directory
| Path | Content |
|---|---|
| .prax/sessions/ | Conversation history |
| .prax/memory.json | Project memory (auto-extracted facts, JSON backend) |
| .prax/memory.db | Project memory (SQLite backend) |
| .prax/solutions/ | Problem-solution patterns from correction detection |
| .prax/todos.json | Current task list |
| .prax/agents/ | Custom agent definitions |
| .prax/models.yaml | Model configuration |
| .prax/config.yaml | Project-level configuration (memory backend, etc.) |
| ~/.prax/ | Global config (cross-project) |
| ~/.prax/experiences.json | Global cross-project experiences (JSON backend) |
| ~/.prax/experiences.db | Global cross-project experiences (SQLite backend) |
| ~/.prax/config.yaml | User-level configuration |
Key modules:
| Path | Role |
|---|---|
| core/agent_loop.py | Core orchestration cycle (25 iter max, circuit breaker) |
| core/middleware.py | VerificationGuidance, LoopDetection, QualityGate, etc. |
| tools/verify_command.py | Bounded verification (pytest, npm test, cargo test, go test) |
| tools/sandbox_bash.py | Auto-downgrade: verify commands bypass sandbox overhead |
| core/memory/ | Pluggable backends (JSON / SQLite) |
| core/llm_client.py | Provider registry, multi-model routing |
| agents/ | Ralph (planner), Sisyphus (executor), Team (parallel) |
| workflows/ | Task decomposition and orchestration |
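To give a feel for what a loop-detection middleware might do, here is a small sketch that flags a tool call repeated with identical arguments. It is not the LoopDetection class in core/middleware.py: detect_loop is a hypothetical function, and the threshold of 3 repeats is an arbitrary value chosen for the example.

```python
import json
from collections import Counter
from typing import List, Optional


def detect_loop(tool_calls: List[dict], threshold: int = 3) -> Optional[dict]:
    """Return the first tool call seen at least `threshold` times, else None."""
    # Serialize with sorted keys so argument order does not affect equality.
    counts = Counter(json.dumps(call, sort_keys=True) for call in tool_calls)
    for serialized, seen in counts.items():
        if seen >= threshold:
            return json.loads(serialized)
    return None


if __name__ == "__main__":
    history = [{"tool": "Read", "file_path": "src/auth.py"}] * 3
    print(detect_loop(history))
```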
We welcome contributions! See CONTRIBUTING.md for:
- Development setup
- Code style guidelines
- Testing requirements
- PR process
For benchmark and reproducibility work, also see docs/BENCHMARKS.md.
MIT License — see LICENSE for details.