An AI engineering orchestrator that ships real code.
Plan, develop, test, validate, and version full software projects under a deterministic state machine — with anti-AI-scrap scanners, multi-agent collaboration, and risk-gated terminal access on top of any LLM provider.
Install · Quick start · Features · TUI · CLI reference · Configuration · Validation · Architecture
Teddy is a governance layer for AI-driven software engineering. Instead of a chat box that hopes the model behaves, Teddy gives you:
- A deterministic 19-phase state machine (idea → PR) with ACID-flavored persistence, top-level run lock, and corruption-recovering state files.
- 11 anti-AI-scrap scanners that block placeholders, secrets, dead code, fake tests, silent failures, insecure patterns (`eval`, `shell=True`, `pickle.loads`), stray `console.log`, sync I/O in async paths, hardcoded URLs, untyped boundaries, and license drift — before anything reaches `git`.
- Multi-agent collaboration per task type (PM, architect, backend, frontend, ui_ux, qa, e2e, performance, security, devops, git_manager) with live streaming, per-task Markdown transcripts, and inlined failure findings.
- A tool-calling loop any tool-capable model can drive (OpenAI, Anthropic, OpenRouter, Ollama). Skills are gated by autonomy, per-skill permission overrides, and a hard denylist that the LLM cannot bypass.
- A 17-view full-screen TUI built on `ink`, plus a daemon mode for warm-boot CLI calls and a generic background runner.
- Per-hunk diff approval (`git add -p`-style keys) and a persistent diff log under `.teddy/diffs/` — every write leaves a paper trail even in autonomous mode.
It works fully offline for scanners, docs, kanban, fs/process tooling, and templates — agents fall back to "deferred" mode when no provider is configured.
```bash
# Use it on demand without installing
npx teddy-orchestrator --help

# Or install globally
npm install -g teddy-orchestrator
teddy --help
```

The package ships as `teddy-orchestrator` on npm; the binary is `teddy`.
- Node.js ≥ 20 (tested on 20.x and 22.x)
- Git (branch / commit / PR features)
- Docker + Compose (optional — used by templates and `teddy docker`)
- An LLM provider (optional — agents go into "deferred" mode without one):
  - OpenAI (`OPENAI_API_KEY`)
  - Anthropic (`ANTHROPIC_API_KEY`)
  - OpenRouter (`OPENROUTER_API_KEY`)
  - Ollama (local, no key needed — `http://localhost:11434`)
```bash
# 1. Scaffold a new project
teddy init --template nextjs-saas --name acme \
  --type fullstack --goal "Multi-tenant reservations" \
  --non-interactive

# 2. Generate sprints and tasks
teddy plan

# 3. Drive the autonomy loop until done
teddy workflow --max-iterations 50

# 4. Watch live agent activity
teddy tui       # full-screen, 17 views
# or:
teddy monitor   # one-shot dashboard
```

Or jump straight to ad-hoc tasks:

```bash
teddy fix "auth redirect loop after Stripe checkout" --run
teddy feature "rate-limit /api/upload to 10 req/min" --run
teddy review --ai
teddy chat --continue   # resumable interactive REPL with multimodal input
```

Provider abstraction with live capability detection — each model self-reports vision support, tool-calling, streaming, max context, and pricing. Fallback chains run in declared order; the orchestrator picks the cheapest provider that can serve the requested capability.
```bash
teddy provider list              # configured providers + capabilities
teddy provider probe             # round-trip ping through fallback chain
teddy doctor --probe-providers
```

Every project moves through a finite state machine (idea → discovery → planning → implementation → validation → review → release → …). State writes are atomic + locked + mirrored to `.json.backup`; a corrupt primary auto-promotes the backup, with the bad file quarantined to `.teddy/corrupt/`. A top-level run lock prevents two `runNextTask` invocations from racing.
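The write-then-recover pattern described above can be sketched in a few lines. This is an illustrative TypeScript sketch, not Teddy's actual code; it omits locking, fsync, and quarantine, and the names `atomicWriteJson` / `readJsonWithRecovery` are hypothetical.

```typescript
import { writeFileSync, renameSync, copyFileSync, readFileSync, existsSync } from 'node:fs';

// Write JSON atomically: serialize to a temp file, mirror the previous good
// state to a .backup, then rename the temp file over the primary.
function atomicWriteJson(path: string, data: unknown): void {
  const tmp = `${path}.tmp`;
  writeFileSync(tmp, JSON.stringify(data, null, 2));
  if (existsSync(path)) copyFileSync(path, `${path}.backup`); // keep last good state
  renameSync(tmp, path); // rename is atomic on the same filesystem
}

// Read with backup promotion: if the primary is corrupt, fall back to the mirror.
function readJsonWithRecovery<T>(path: string): T {
  try {
    return JSON.parse(readFileSync(path, 'utf8')) as T;
  } catch {
    return JSON.parse(readFileSync(`${path}.backup`, 'utf8')) as T;
  }
}
```

The rename step is what makes a crash mid-write harmless: the primary is either the old file or the new one, never a half-written mix.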
| Scanner | Default | Severity | What it catches |
|---|---|---|---|
| `placeholders` | on | error | lorem ipsum, naked TODO/FIXME/XXX, `<<replace_me>>` |
| `secrets` | on | error/critical | AWS, GitHub PAT, OpenAI, Anthropic, Slack, private keys |
| `empty_catch` | on | error | `catch { }` and `except: pass` |
| `fake_tests` | on | error/warn | `expect(true).toBe(true)`, test files with no assertions |
| `dead_code` | on | warning | Files unreferenced by any other module |
| `insecure_patterns` | on | error/critical | `eval`, `new Function`, `child_process.exec`, `vm.runIn*`, `os.system`, `subprocess(... shell=True)`, `pickle.loads` |
| `console_leftover` | on | warning | Stray `console.log/debug/info/trace` outside CLI/TUI |
| `sync_io_in_async` | on | warning | `readFileSync`, `execSync`, etc. inside async function bodies |
| `hardcoded_urls` | off | warning | Pinned URLs / public IPv4s in source |
| `any_overuse` | off | warning | `: any` / `as any` in TS source |
| `missing_license` | off | info | No SPDX/Copyright header at top of file |
Each scanner is independently togglable in the `quality:` config and respects a `// TEDDY_OK <reason>` per-line escape hatch.
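To make the escape hatch concrete, here is a toy placeholder scanner in the spirit of the table above. It is an illustrative sketch, not Teddy's implementation; `scanPlaceholders` and the exact patterns are assumptions.

```typescript
type Finding = { line: number; message: string };

// Flag placeholder text, but skip any line carrying a reasoned TEDDY_OK marker.
function scanPlaceholders(source: string): Finding[] {
  const pattern = /\b(TODO|FIXME|XXX)\b|lorem ipsum|<<replace_me>>/i;
  const escape = /TEDDY_OK\s+\S/; // escape hatch requires a reason after the marker
  const findings: Finding[] = [];
  source.split('\n').forEach((text, i) => {
    if (pattern.test(text) && !escape.test(text)) {
      findings.push({ line: i + 1, message: `placeholder: ${text.trim()}` });
    }
  });
  return findings;
}
```

Note that a bare `TEDDY_OK` with no reason does not suppress the finding, which keeps the escape hatch auditable.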
Per task type, Teddy assembles a pipeline (e.g. pm → architect → backend → qa → e2e → security → git_manager). Each agent gets a curated skill list and runs through the tool-calling loop. A live event stream is printed to stderr, with full transcripts at `.teddy/transcripts/<task>.<attempt>.md`.
```text
▸ backend (BACKLOG-T01)
  → read_file src/auth.ts
  ✓ 12ms
  … I'll need to add a check before the redirect …
  → write_file src/auth.ts
  ✓ 24ms
backend → implemented
ℹ tokens=12345 calls=7 est=$0.4200
```
```bash
teddy fs tree --depth 3
teddy fs find "src/**/*.ts" --limit 50
teddy fs rm node_modules -r     # journaled
teddy fs undo                   # restored

teddy ps list --filter node
teddy ps kill 12345 --signal SIGTERM

teddy run "npm run build"
teddy run --dry-run "rm -rf node_modules"
teddy run "rm -rf /"            # ✗ refused by hard denylist
```

Risk tiers (safe / write / destructive / dangerous) drive autonomy gates. The hard denylist refuses catastrophic patterns regardless of approval: `rm -rf /`, `rm -rf $HOME`, `dd of=/dev/sda`, `mkfs.*`, fork bombs, `curl … | sh`, recursive chmod at root, Windows `format C:` / `diskpart`. Every destructive op journals to `.teddy/undo/<id>/` for `teddy fs undo [N]`.
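The tiering idea can be sketched as a small classifier. This is illustrative only — the patterns are a subset of those listed above, not Teddy's real tables, and `classify` is a hypothetical name.

```typescript
type Risk = 'safe' | 'write' | 'destructive' | 'denied';

// Hard denylist: refused regardless of autonomy mode or approval.
const HARD_DENY = [
  /\brm\s+-rf\s+\/(\s|$)/,             // rm -rf /
  /\brm\s+-rf\s+(\$HOME|~)(\s|$)/,     // rm -rf $HOME
  /\bdd\s+.*\bof=\/dev\/sd[a-z]\b/,    // dd onto a raw disk
  /\bmkfs\./,                          // mkfs.*
  /curl\s.+\|\s*sh\b/,                 // curl … | sh
];
// Risk tiers, checked from most to least severe.
const DESTRUCTIVE = [/\brm\b/, /\bkill\b/, /\bdocker\s+down\b/];
const WRITE = [/\bgit\s+commit\b/, /\bnpm\s+install\b/, />\s*\S/];

function classify(cmd: string): Risk {
  if (HARD_DENY.some((p) => p.test(cmd))) return 'denied';
  if (DESTRUCTIVE.some((p) => p.test(cmd))) return 'destructive';
  if (WRITE.some((p) => p.test(cmd))) return 'write';
  return 'safe';
}
```

The key design point is ordering: the denylist is checked first, so no tier or approval flow can reach a denied command.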
`teddy daemon start` runs a long-lived orchestrator over a Unix socket so warm-boot CLI calls skip the 500ms re-init. JSON-lines protocol, capability negotiation, and automatic fallback to direct execution if the socket isn't there.
```bash
teddy daemon start
teddy daemon status    # uptime, queue depth, active agents
teddy daemon ping      # round-trip latency
teddy daemon stop
```

```bash
teddy bg run "task run BACKLOG-T01"
teddy bg run "watch --on-change validate" --name watcher
teddy bg list / stop <id> / logs <id> --follow / clean
```

PID + manifest + log under `.teddy/bg/<id>.{json,log}`. Liveness via `kill -0`.
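The `kill -0` liveness trick mentioned above works because signal 0 delivers nothing but still fails if the PID no longer exists. A minimal sketch (the `isAlive` helper is illustrative, not Teddy's code):

```typescript
// Probe whether a PID is alive without sending a real signal.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 = existence check only
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user; still alive.
    return (err as NodeJS.ErrnoException).code === 'EPERM';
  }
}
```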
Real imports + symbols + co-change graph, mtime-fingerprinted cache. Plus a multi-provider vector index for semantic recall:
```bash
teddy graph imports / cochange / context / state / symbols / neighbors
teddy index build [--provider openai|voyage|ollama]
teddy index search "auth flow"
teddy repo-map --task "fix the Stripe webhook timeout"
```

Embeddings: OpenAI `text-embedding-3-small`, Voyage `voyage-code-2`, or Ollama `nomic-embed-text` (fully local). Auto-detected from env.
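At its core, semantic recall over an embedding index is a nearest-neighbor ranking. This toy in-memory version illustrates the idea; real indexes use provider embeddings, and the vectors and names here are made up.

```typescript
type Doc = { id: string; vector: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank documents by similarity to the query embedding, highest first.
function search(index: Doc[], query: number[], topN = 3): string[] {
  return [...index]
    .sort((x, y) => cosine(y.vector, query) - cosine(x.vector, query))
    .slice(0, topN)
    .map((d) => d.id);
}
```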
Skills are the building blocks every agent and orchestrator phase uses. Each one has typed input/output, optional JSONSchema validation, idempotency hints, cost estimates, and per-skill permission overrides.
```bash
teddy skills list                              # 52 native + MCP tools
teddy skills info read_file                    # introspection
teddy skills validate write_file --args '...'  # dry-run validation
teddy skills run scan_secrets --args '{}'
```

Categories:

- Core — `read_file`, `write_file`, `create_file`, `delete_path`, `move_path`, `copy_path`, `list_*`, `find_files`, `which_binary`, `current_directory`, `change_directory`, `run_command`, plus all five legacy scanners.
- Engineering — `analyze_repo`, `detect_stack`, `update_docs`, `update_progress`, `append_decision`, `dependency_review`, `test_generation`, `e2e_flow_design`, `refactor_planning`, `security_review`, `performance_check`.
- Operational — `git_branch`, `git_commit`, `git_push`, `create_pr`, `run_docker`, `healthcheck_service`, `docker_down`, `run_tests`, `run_playwright`, `bootstrap_project`, `start_local_environment`, `collect_logs`, `diagnose_failure`, `retry_with_strategy`, `rollback_changes`, `list_processes`, `kill_process`.
- Terminal — cross-platform fs / ps / run primitives with risk tiers.
Teddy is an MCP client — point it at any MCP server and its tools become first-class skills.
```yaml
mcp_servers:
  filesystem:
    command: npx
    args: ['-y', '@modelcontextprotocol/server-filesystem', '/tmp']
  postgres:
    command: npx
    args: ['-y', '@modelcontextprotocol/server-postgres', 'postgresql://...']
```

```bash
teddy mcp list / probe / call <server> <tool> --args '...'
```

```bash
teddy chat                  # fresh session (.teddy/sessions/<id>.jsonl)
teddy chat --continue       # resume the most recent
teddy chat --resume <id>
teddy chat --image bug.png screenshot.jpg
teddy chat list / forget <id>
```

Sessions are append-only, one message per line: a `kill -9` mid-write loses the in-flight token, never the prior history. OpenAI / Anthropic / Ollama are all wired; vision models receive image parts, text-only models silently get just the text part.
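The crash-safety property of the one-message-per-line format is easy to demonstrate. A minimal sketch (hypothetical helper names, operating on a string instead of a file for clarity):

```typescript
// Append one message as a single JSON line.
function appendMessage(log: string, msg: object): string {
  return log + JSON.stringify(msg) + '\n';
}

// Replay a session: a truncated in-flight line fails to parse and is
// dropped, while every fully-written prior line survives.
function replaySession(log: string): object[] {
  const messages: object[] = [];
  for (const line of log.split('\n')) {
    if (!line) continue;
    try {
      messages.push(JSON.parse(line));
    } catch {
      // partial final line after a crash: skip it
    }
  }
  return messages;
}
```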
`git add -p`-style keys (y/n/a/q) on every write when `autonomy.mode = manual` or when a per-skill `skill_permissions: { write_file: confirm }` is set. Every write also appends to `.teddy/diffs/<yyyy-mm-dd>.diff`, so even autonomous runs leave a paper trail.
```bash
teddy diffs list / show <day> / today
```

```bash
teddy roadmap gantt      # sprint Gantt
teddy roadmap mindmap    # task hierarchy mindmap
teddy roadmap timeline   # release timeline
teddy roadmap collab     # agent collaboration flowchart
teddy graph imports --mermaid
teddy kanban --mermaid
```

`teddy tui` launches a full-screen ink-based TUI with 17 views:
- Primary (`1`–`9`) — home, tasks, run, chat, diffs, sessions, index, bg, logs.
- Secondary (`M` opens the More menu) — plan, kanban, memory, providers, mcp, workflow, doctor, failures.

Hotkeys are mnemonic and discoverable: `?` opens help, `t` toggles theme, `q` quits, `1`–`9` jump to primary views, `M` opens the More menu. Mouse support, scrollable panes, narrow/wide layout fallback, and brand-coordinated colors.
The Kanban view auto-refreshes every 3 seconds, so a runNextTask in another pane visibly walks cards from Ready → In Progress → Validating → Completed.
| Command | Purpose |
|---|---|
| `teddy init` | Scaffold project + state + docs (with `--template`) |
| `teddy interpret` | Heuristic interpretation of a one-line product idea |
| `teddy analyze` | Detect language / framework / package manager |
| `teddy discover` | Q&A flow: blocking + non-blocking questions |
| `teddy docs` | Generate / review the `/docs` tree |
| `teddy plan [--from-prd]` | Generate sprints (baseline or PRD-derived) |
| `teddy sprint`, `teddy task` | Sprint + task lifecycle |
| `teddy task add "<title>" --type ...` | Ad-hoc task without full sprint planning |
| `teddy task plan <id>` | Read-only LLM plan preview before execution |
| `teddy fix "<bug>" [--run]` | One-shot bugfix task creation |
| `teddy feature "<desc>" [--run]` | One-shot feature task creation |
| `teddy review [--ai]` | Code review: scanners + graph + optional agent review |
| `teddy status` | Phase, sprint, task, branch, validation, tokens, $$ |
| `teddy validate [--with-tests]` | Anti-scrap scanners (and full gate when requested) |
| `teddy test`, `teddy e2e` | Test + Playwright runners |
| `teddy e2e install / scaffold` | Browser setup, baseline config |
| `teddy fs` | Cross-platform fs ops + undo journal |
| `teddy ps` | Cross-platform process list + kill |
| `teddy run "<cmd>"` | Risk-gated raw shell with live streaming |
| `teddy docker` | Compose up / down / status |
| `teddy git` | Status, PR open/update via `gh` |
| `teddy resume` | Pre-flight summary, then run next task |
| `teddy doctor [--fix]` | Host + provider readiness + optional auto-repair |
| `teddy audit` | Filter RUN_LOG / TASK_HISTORY / FAILURE_LOG |
| `teddy config` | Show / set / validate `.teddy.yml` |
| `teddy failure-report` | Markdown postmortem from journals |
| `teddy bootstrap / diagnose / rollback / logs` | Ops shortcuts |
| `teddy pricing` | Inspect / set / refresh provider pricing overlay |
| `teddy provider` | List / probe configured AI providers |
| `teddy repo-map` | Top-N relevant files for a task description |
| `teddy memory` | Cross-session memory |
| `teddy mcp` | List / probe / call MCP servers |
| `teddy graph` | Imports / cochange / context / state / symbols / neighbors |
| `teddy kanban` | Terminal kanban (or `--mermaid`) |
| `teddy monitor` | Live dashboard with refresh interval |
| `teddy roadmap` | Mermaid Gantt / mindmap / timeline / collab flowchart |
| `teddy skills` | List / `info <name>` / `validate <name>` / `run <name>` |
| `teddy watch` | Re-run validation on file change |
| `teddy chat [--continue / --resume <id> / --image <path>]` | Resumable REPL with multimodal input |
| `teddy chat list / forget <id>` | Browse / drop saved sessions |
| `teddy diffs list / show <day> / today` | Browse the persistent diff log |
| `teddy index build / search / status` | Vector embedding index (OpenAI / Voyage / Ollama) |
| `teddy daemon start / stop / status / ping` | Long-lived warm-boot RPC daemon |
| `teddy bg run / list / stop / logs / clean` | Generic background runner |
| `teddy tui` | Full-screen TUI: 17 views, branded, all features |
| `teddy workflow` | Drive the autonomy loop until terminal state |
| `teddy telemetry` | Opt-in, anonymous, local-only command logging |
| `teddy upgrade` | Self-update via npm |
| `teddy version` | Build info |
`teddy init` writes `.teddy.yml` at the project root. Highlights:
```yaml
autonomy:
  mode: assisted              # manual | assisted | autonomous | full_auto_guarded
  skill_permissions:
    write_file: confirm       # allow | confirm | deny

providers:
  default: openrouter
  fallback: [anthropic, openai, ollama]

budget:
  max_tokens: 1_000_000
  max_usd: 10
  warn_at_percent: 80
  enforce: true

quality:
  no_placeholders: true
  no_dead_code: true
  no_secrets: true
  require_tests: true
  require_docker: true
  forbid_insecure_patterns: true
  forbid_console_leftover: true
  forbid_sync_io_in_async: true
  forbid_hardcoded_urls: false
  forbid_any_overuse: false
  require_license_header: false

git:
  strategy: github-flow       # github-flow | git-flow | trunk-based
  commit_style: conventional_commits

hooks:
  pre_task: ['npm run lint:fix']
  post_validation:
    - './scripts/notify-slack.sh "$TEDDY_TASK_ID $TEDDY_VALIDATION"'

mcp_servers:
  filesystem:
    command: npx
    args: ['-y', '@modelcontextprotocol/server-filesystem', '/tmp']
```

Full schema: `teddy config show --schema`.
Every previously-static table lives as a real JSON file under the shipped `data/` directory and accepts overrides:

| File | What it controls |
|---|---|
| `data/pricing.json` | USD per 1M tokens by provider/model |
| `data/frameworks.json` | Framework detection + port inference |
| `data/dependency-policy.json` | Heavy deps + duplicate-family groups |
| `data/git-conventions.json` | Branch prefixes, commit types, hotfix marker, slug length |

Override locations (project beats user beats bundled):

- Project: `.teddy/data/<name>.json`
- User: `~/.teddy/data/<name>.json`
- Bundled: ships with the package (read-only)

For pricing: `teddy pricing show / set / refresh / path / reset` — refresh from a URL or local JSON.
`teddy validate` runs all enabled scanners. The full quality gate (`teddy validate --with-tests`) additionally runs the test suite, type checker, and lint, and blocks on any error or critical finding.

```bash
teddy validate               # scanners only
teddy validate --with-tests  # scanners + tests + typecheck + lint
teddy watch --on-change validate
```

Findings are written to `.teddy/VALIDATION_REPORT.json` and surfaced in the TUI's failures view.
```text
.teddy/
  PROJECT_STATE.json         # current phase + task + branch (atomic, locked, migrated)
  CONTEXT_MODEL.json         # structured project context
  SPRINTS.json               # sprints + tasks
  TASKS.json                 # flat task index
  VALIDATION_REPORT.json     # last quality-gate run
  RUN_LOG.jsonl              # append-only structured log (secrets redacted)
  TASK_HISTORY.jsonl         # append-only task lifecycle
  FAILURE_LOG.jsonl          # append-only failures
  PROVIDER_STATE.json        # provider-specific state
  MEMORY.md                  # cross-session memory
  data/                      # overlay tables
  cache/code-graph.json      # code-graph cache (mtime-fingerprinted)
  diffs/<yyyy-mm-dd>.diff    # daily diff log
  sessions/<id>.jsonl        # chat sessions
  transcripts/<task>.<n>.md  # per-task agent transcripts
  bg/<id>.{json,log}         # background runner manifests + logs
  undo.jsonl                 # destructive-op journal (last 100 entries)
  undo/<id>/                 # snapshots backing each entry
  corrupt/                   # quarantined corrupt state files
```
State is migrated automatically when an older `PROJECT_STATE.json` is loaded by a newer Teddy.
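Automatic migration usually means a chain of forward-only version steps applied at load time. A minimal sketch of that pattern, with invented field names (`stage`, `phase`, `branch`) that are not Teddy's actual schema:

```typescript
type State = { schemaVersion: number; [k: string]: unknown };

// One upgrade step per version; loading applies them in order.
const migrations: Record<number, (s: State) => State> = {
  // v1 → v2: rename `stage` to `phase` (illustrative)
  1: (s) => ({ ...s, phase: s.stage, stage: undefined, schemaVersion: 2 }),
  // v2 → v3: add a default branch field (illustrative)
  2: (s) => ({ ...s, branch: s.branch ?? 'main', schemaVersion: 3 }),
};

function migrate(state: State, target: number): State {
  let current = state;
  while (current.schemaVersion < target) {
    const step = migrations[current.schemaVersion];
    if (!step) throw new Error(`no migration from v${current.schemaVersion}`);
    current = step(current);
  }
  return current;
}
```

Keeping each step tiny and one-version-wide means any old state file can be walked forward deterministically.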
```yaml
- uses: ohswedd/teddy@v1.26.0
  with:
    command: validate
```

Pass any subcommand:

```yaml
- uses: ohswedd/teddy@v1.26.0
  with:
    command: workflow
    args: '--max-iterations 50 --stop-on-failure'
```

- SOLID throughout. Every collaborator has one reason to change; the orchestrator only wires.
- DIP: agents / skills / providers consume small interfaces, not concrete classes.
- OCP: register new agents, providers, skills, gates, MCP servers, data overlays without editing existing code.
- ACID-flavored persistence: atomic writes, schema validation, per-file locks, fsync, append-only journals, schema migrations.
- Graceful shutdown: SIGINT / SIGTERM / uncaughtException flush logs and tear down resources (MCP children, watchers, locks) with a 5-second hard deadline.
- Defense in depth: anti-scrap scanners, risk classifier + hard denylist on shell, autonomy gate on writes/destructive ops, undo journal on every destructive fs op, secret redaction in every log line, token + USD budget enforcement on every provider call.
- AbortSignal end-to-end through every provider call so mid-LLM cancellation actually cancels.
- Pure-function scanners so they can be tested standalone, composed in pipelines, or invoked from skills.
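The AbortSignal bullet above is worth a concrete sketch: cancellation only "actually cancels" if every async step both rejects and releases its resource when the signal fires. Illustrative only — `slowCall` stands in for a provider request and is not Teddy's API.

```typescript
// A cancellable async operation: rejects immediately if already aborted,
// and clears its timer (the "resource") when the signal fires mid-flight.
function slowCall(ms: number, signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) return reject(new Error('aborted'));
    const timer = setTimeout(() => resolve('done'), ms);
    signal.addEventListener(
      'abort',
      () => {
        clearTimeout(timer); // release the resource, don't just drop the result
        reject(new Error('aborted'));
      },
      { once: true },
    );
  });
}
```

Threading one `AbortSignal` from the CLI entry point through every provider call is what lets Ctrl-C stop a mid-LLM request instead of waiting it out.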
```bash
git clone https://github.com/Ohswedd/teddy.git
cd teddy
npm install
npm run build
npm test                                 # 468 tests
npm run lint
node dist/cli/bin.js --quiet validate    # self anti-scrap scan
```

CI runs typecheck + lint + build + test + self-validation on every push and PR (Node 20 + 22).
See CONTRIBUTING.md for the contribution checklist.
MIT — see LICENSE.
Built with care. Opinions, bug reports, and feature requests welcome at github.com/Ohswedd/teddy/issues.