An AI engineering orchestrator that ships real code.
Plan, develop, test, validate, and version full software projects under a deterministic state machine — with anti-AI-scrap scanners, multi-agent collaboration, and risk-gated terminal access on top of any LLM provider.
Install · Quick start · Features · TUI · CLI reference · Configuration · Validation · Architecture
Teddy is a governance layer for AI-driven software engineering. Instead of a chat box that hopes the model behaves, Teddy gives you:
- A deterministic 19-phase state machine (idea → PR) with ACID-flavored persistence, top-level run lock, and corruption-recovering state files.
- 11 anti-AI-scrap scanners that block placeholders, secrets, dead code, fake tests, silent failures, insecure patterns (`eval`, `shell=True`, `pickle.loads`), stray `console.log`, sync I/O in async paths, hardcoded URLs, untyped boundaries, and license drift — before anything reaches `git`.
- Multi-agent collaboration per task type (PM, architect, backend, frontend, ui_ux, qa, e2e, performance, security, devops, git_manager) with live streaming, per-task Markdown transcripts, and inlined failure findings.
- A tool-calling loop any tool-capable model can drive (OpenAI, Anthropic, OpenRouter, Ollama). Skills are gated by autonomy, per-skill permission overrides, and a hard denylist that the LLM cannot bypass.
- A 17-view full-screen TUI built on `ink`, plus a daemon mode for warm-boot CLI calls and a generic background runner.
- Per-hunk diff approval (`git add -p`-style keys) and a persistent diff log under `.teddy/diffs/` — every write leaves a paper trail even in autonomous mode.
It works fully offline for scanners, docs, kanban, fs/process tooling, and templates — agents fall back to "deferred" mode when no provider is configured.
```bash
# Use it on demand without installing
npx teddy-orchestrator --help

# Or install globally
npm install -g teddy-orchestrator
teddy --help
```

The package ships as `teddy-orchestrator` on npm; the binary is `teddy`.
- Node.js ≥ 20 (tested on 20.x and 22.x)
- Git (branch / commit / PR features)
- Docker + Compose (optional — used by templates and `teddy docker`)
- An LLM provider (optional — agents go into "deferred" mode without one):
  - OpenAI (`OPENAI_API_KEY`)
  - Anthropic (`ANTHROPIC_API_KEY`)
  - OpenRouter (`OPENROUTER_API_KEY`)
  - Ollama (local, no key needed — `http://localhost:11434`)
```bash
# 1. Scaffold a new project
teddy init --template nextjs-saas --name acme \
  --type fullstack --goal "Multi-tenant reservations" \
  --non-interactive

# 2. Generate sprints and tasks
teddy plan

# 3. Drive the autonomy loop until done
teddy workflow --max-iterations 50

# 4. Watch live agent activity
teddy tui       # full-screen, 17 views
# or:
teddy monitor   # one-shot dashboard
```

Or jump straight to ad-hoc tasks:

```bash
teddy fix "auth redirect loop after Stripe checkout" --run
teddy feature "rate-limit /api/upload to 10 req/min" --run
teddy review --ai
teddy chat --continue   # resumable interactive REPL with multimodal input
```

Provider abstraction with live capability detection — each model self-reports vision support, tool-calling, streaming, max context, and pricing. Fallback chains run in declared order; the orchestrator picks the cheapest provider that can serve the requested capability.
```bash
teddy provider list              # configured providers + capabilities
teddy provider probe             # round-trip ping through fallback chain
teddy doctor --probe-providers
```

Every project moves through a finite state machine (idea → discovery → planning → implementation → validation → review → release → …). State writes are atomic + locked + mirrored to `.json.backup`; a corrupt primary auto-promotes the backup, with the bad file quarantined to `.teddy/corrupt/`. A top-level run lock prevents two `runNextTask` invocations from racing.
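The write-then-recover pattern described above can be sketched in a few lines. This is an illustrative TypeScript sketch, not Teddy's actual code; it omits locking, fsync, and quarantine, and the names `atomicWriteJson` / `readJsonWithRecovery` are hypothetical.

```typescript
import { writeFileSync, renameSync, copyFileSync, readFileSync, existsSync } from 'node:fs';

// Write JSON atomically: serialize to a temp file, mirror the previous good
// state to a .backup, then rename the temp file over the primary.
function atomicWriteJson(path: string, data: unknown): void {
  const tmp = `${path}.tmp`;
  writeFileSync(tmp, JSON.stringify(data, null, 2));
  if (existsSync(path)) copyFileSync(path, `${path}.backup`); // keep last good state
  renameSync(tmp, path); // rename is atomic on the same filesystem
}

// Read with backup promotion: if the primary is corrupt, fall back to the mirror.
function readJsonWithRecovery<T>(path: string): T {
  try {
    return JSON.parse(readFileSync(path, 'utf8')) as T;
  } catch {
    return JSON.parse(readFileSync(`${path}.backup`, 'utf8')) as T;
  }
}
```

The rename step is what makes a crash mid-write harmless: the primary is either the old file or the new one, never a half-written mix.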
| Scanner | Default | Severity | What it catches |
|---|---|---|---|
| `placeholders` | on | error | lorem ipsum, naked TODO/FIXME/XXX, `<<replace_me>>` |
| `secrets` | on | error/critical | AWS, GitHub PAT, OpenAI, Anthropic, Slack, private keys |
| `empty_catch` | on | error | `catch { }` and `except: pass` |
| `fake_tests` | on | error/warn | `expect(true).toBe(true)`, test files with no assertions |
| `dead_code` | on | warning | Files unreferenced by any other module |
| `insecure_patterns` | on | error/critical | `eval`, `new Function`, `child_process.exec`, `vm.runIn*`, `os.system`, `subprocess(... shell=True)`, `pickle.loads` |
| `console_leftover` | on | warning | Stray `console.log/debug/info/trace` outside CLI/TUI |
| `sync_io_in_async` | on | warning | `readFileSync`, `execSync`, etc. inside async function bodies |
| `hardcoded_urls` | off | warning | Pinned URLs / public IPv4s in source |
| `any_overuse` | off | warning | `: any` / `as any` in TS source |
| `missing_license` | off | info | No SPDX/Copyright header at top of file |
Each scanner is independently togglable in the `quality:` config and respects a `// TEDDY_OK <reason>` per-line escape hatch.
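To make the escape hatch concrete, here is a toy placeholder scanner in the spirit of the table above. It is an illustrative sketch, not Teddy's implementation; `scanPlaceholders` and the exact patterns are assumptions.

```typescript
type Finding = { line: number; message: string };

// Flag placeholder text, but skip any line carrying a reasoned TEDDY_OK marker.
function scanPlaceholders(source: string): Finding[] {
  const pattern = /\b(TODO|FIXME|XXX)\b|lorem ipsum|<<replace_me>>/i;
  const escape = /TEDDY_OK\s+\S/; // escape hatch requires a reason after the marker
  const findings: Finding[] = [];
  source.split('\n').forEach((text, i) => {
    if (pattern.test(text) && !escape.test(text)) {
      findings.push({ line: i + 1, message: `placeholder: ${text.trim()}` });
    }
  });
  return findings;
}
```

Note that a bare `TEDDY_OK` with no reason does not suppress the finding, which keeps the escape hatch auditable.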
Per task type, Teddy assembles a pipeline (e.g. pm → architect → backend → qa → e2e → security → git_manager). Each agent gets a curated skill list and runs through the tool-calling loop. A live event stream is printed to stderr, with full transcripts at `.teddy/transcripts/<task>.<attempt>.md`.
```text
▸ backend (BACKLOG-T01)
  → read_file src/auth.ts
  ✓ 12ms
  … I'll need to add a check before the redirect …
  → write_file src/auth.ts
  ✓ 24ms
backend → implemented
ℹ tokens=12345 calls=7 est=$0.4200
```
```bash
teddy fs tree --depth 3
teddy fs find "src/**/*.ts" --limit 50
teddy fs rm node_modules -r     # journaled
teddy fs undo                   # restored

teddy ps list --filter node
teddy ps kill 12345 --signal SIGTERM

teddy run "npm run build"
teddy run --dry-run "rm -rf node_modules"
teddy run "rm -rf /"            # ✗ refused by hard denylist
```

Risk tiers (safe / write / destructive / dangerous) drive autonomy gates. The hard denylist refuses catastrophic patterns regardless of approval: `rm -rf /`, `rm -rf $HOME`, `dd of=/dev/sda`, `mkfs.*`, fork bombs, `curl … | sh`, recursive chmod at root, Windows `format C:` / `diskpart`. Every destructive op journals to `.teddy/undo/<id>/` for `teddy fs undo [N]`.
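The tiering idea can be sketched as a small classifier. This is illustrative only — the patterns are a subset of those listed above, not Teddy's real tables, and `classify` is a hypothetical name.

```typescript
type Risk = 'safe' | 'write' | 'destructive' | 'denied';

// Hard denylist: refused regardless of autonomy mode or approval.
const HARD_DENY = [
  /\brm\s+-rf\s+\/(\s|$)/,             // rm -rf /
  /\brm\s+-rf\s+(\$HOME|~)(\s|$)/,     // rm -rf $HOME
  /\bdd\s+.*\bof=\/dev\/sd[a-z]\b/,    // dd onto a raw disk
  /\bmkfs\./,                          // mkfs.*
  /curl\s.+\|\s*sh\b/,                 // curl … | sh
];
// Risk tiers, checked from most to least severe.
const DESTRUCTIVE = [/\brm\b/, /\bkill\b/, /\bdocker\s+down\b/];
const WRITE = [/\bgit\s+commit\b/, /\bnpm\s+install\b/, />\s*\S/];

function classify(cmd: string): Risk {
  if (HARD_DENY.some((p) => p.test(cmd))) return 'denied';
  if (DESTRUCTIVE.some((p) => p.test(cmd))) return 'destructive';
  if (WRITE.some((p) => p.test(cmd))) return 'write';
  return 'safe';
}
```

The key design point is ordering: the denylist is checked first, so no tier or approval flow can reach a denied command.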
`teddy daemon start` runs a long-lived orchestrator over a Unix socket so warm-boot CLI calls skip the 500ms re-init. JSON-lines protocol, capability negotiation, and automatic fallback to direct execution if the socket isn't there.
```bash
teddy daemon start
teddy daemon status    # uptime, queue depth, active agents
teddy daemon ping      # round-trip latency
teddy daemon stop
```

```bash
teddy bg run "task run BACKLOG-T01"
teddy bg run "watch --on-change validate" --name watcher
teddy bg list / stop <id> / logs <id> --follow / clean
```

PID + manifest + log under `.teddy/bg/<id>.{json,log}`. Liveness via `kill -0`.
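The `kill -0` liveness trick mentioned above works because signal 0 delivers nothing but still fails if the PID no longer exists. A minimal sketch (the `isAlive` helper is illustrative, not Teddy's code):

```typescript
// Probe whether a PID is alive without sending a real signal.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 = existence check only
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user; still alive.
    return (err as NodeJS.ErrnoException).code === 'EPERM';
  }
}
```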
Real imports + symbols + co-change graph, mtime-fingerprinted cache. Plus a multi-provider vector index for semantic recall:
```bash
teddy graph imports / cochange / context / state / symbols / neighbors
teddy index build [--provider openai|voyage|ollama]
teddy index search "auth flow"
teddy repo-map --task "fix the Stripe webhook timeout"
```

Embeddings: OpenAI `text-embedding-3-small`, Voyage `voyage-code-2`, or Ollama `nomic-embed-text` (fully local). Auto-detected from env.
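At its core, semantic recall over an embedding index is a nearest-neighbor ranking. This toy in-memory version illustrates the idea; real indexes use provider embeddings, and the vectors and names here are made up.

```typescript
type Doc = { id: string; vector: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank documents by similarity to the query embedding, highest first.
function search(index: Doc[], query: number[], topN = 3): string[] {
  return [...index]
    .sort((x, y) => cosine(y.vector, query) - cosine(x.vector, query))
    .slice(0, topN)
    .map((d) => d.id);
}
```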
Skills are the building blocks every agent and orchestrator phase uses. Each one has typed input/output, optional JSONSchema validation, idempotency hints, cost estimates, and per-skill permission overrides.
```bash
teddy skills list                              # 52 native + MCP tools
teddy skills info read_file                    # introspection
teddy skills validate write_file --args '...'  # dry-run validation
teddy skills run scan_secrets --args '{}'
```

Categories:

- Core — `read_file`, `write_file`, `create_file`, `delete_path`, `move_path`, `copy_path`, `list_*`, `find_files`, `which_binary`, `current_directory`, `change_directory`, `run_command`, plus all five legacy scanners.
- Engineering — `analyze_repo`, `detect_stack`, `update_docs`, `update_progress`, `append_decision`, `dependency_review`, `test_generation`, `e2e_flow_design`, `refactor_planning`, `security_review`, `performance_check`.
- Operational — `git_branch`, `git_commit`, `git_push`, `create_pr`, `run_docker`, `healthcheck_service`, `docker_down`, `run_tests`, `run_playwright`, `bootstrap_project`, `start_local_environment`, `collect_logs`, `diagnose_failure`, `retry_with_strategy`, `rollback_changes`, `list_processes`, `kill_process`.
- Terminal — cross-platform fs / ps / run primitives with risk tiers.
Teddy is an MCP client — point it at any MCP server and its tools become first-class skills.
```yaml
mcp_servers:
  filesystem:
    command: npx
    args: ['-y', '@modelcontextprotocol/server-filesystem', '/tmp']
  postgres:
    command: npx
    args: ['-y', '@modelcontextprotocol/server-postgres', 'postgresql://...']
```

```bash
teddy mcp list / probe / call <server> <tool> --args '...'
```

```bash
teddy chat                  # fresh session (.teddy/sessions/<id>.jsonl)
teddy chat --continue       # resume the most recent
teddy chat --resume <id>
teddy chat --image bug.png screenshot.jpg
teddy chat list / forget <id>
```

Sessions are append-only, one message per line: a `kill -9` mid-write loses the in-flight token, never the prior history. OpenAI / Anthropic / Ollama are all wired; vision models receive image parts, text-only models silently get just the text part.
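The crash-safety property of the one-message-per-line format is easy to demonstrate. A minimal sketch (hypothetical helper names, operating on a string instead of a file for clarity):

```typescript
// Append one message as a single JSON line.
function appendMessage(log: string, msg: object): string {
  return log + JSON.stringify(msg) + '\n';
}

// Replay a session: a truncated in-flight line fails to parse and is
// dropped, while every fully-written prior line survives.
function replaySession(log: string): object[] {
  const messages: object[] = [];
  for (const line of log.split('\n')) {
    if (!line) continue;
    try {
      messages.push(JSON.parse(line));
    } catch {
      // partial final line after a crash: skip it
    }
  }
  return messages;
}
```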
`git add -p`-style keys (y/n/a/q) on every write when `autonomy.mode = manual` or when a per-skill `skill_permissions: { write_file: confirm }` is set. Every write also appends to `.teddy/diffs/<yyyy-mm-dd>.diff`, so even autonomous runs leave a paper trail.
```bash
teddy diffs list / show <day> / today
```

```bash
teddy roadmap gantt      # sprint Gantt
teddy roadmap mindmap    # task hierarchy mindmap
teddy roadmap timeline   # release timeline
teddy roadmap collab     # agent collaboration flowchart
teddy graph imports --mermaid
teddy kanban --mermaid
```

`teddy tui` launches a full-screen ink-based TUI with 17 views:
- Primary (`1`–`9`) — home, tasks, run, chat, diffs, sessions, index, bg, logs.
- Secondary (`M` opens the More menu) — plan, kanban, memory, providers, mcp, workflow, doctor, failures.

Hotkeys are mnemonic and discoverable: `?` opens help, `t` toggles theme, `q` quits, `1`–`9` jump to primary views, `M` opens the More menu. Mouse support, scrollable panes, narrow/wide layout fallback, and brand-coordinated colors.
The Kanban view auto-refreshes every 3 seconds, so a runNextTask in another pane visibly walks cards from Ready → In Progress → Validating → Completed.
| Command | Purpose |
|---|---|
| `teddy init` | Scaffold project + state + docs (with `--template`) |
| `teddy interpret` | Heuristic interpretation of a one-line product idea |
| `teddy analyze` | Detect language / framework / package manager |
| `teddy discover` | Q&A flow: blocking + non-blocking questions |
| `teddy docs` | Generate / review the `/docs` tree |
| `teddy plan [--from-prd]` | Generate sprints (baseline or PRD-derived) |
| `teddy sprint`, `teddy task` | Sprint + task lifecycle |
| `teddy task add "<title>" --type ...` | Ad-hoc task without full sprint planning |
| `teddy task plan <id>` | Read-only LLM plan preview before execution |
| `teddy fix "<bug>" [--run]` | One-shot bugfix task creation |
| `teddy feature "<desc>" [--run]` | One-shot feature task creation |
| `teddy review [--ai]` | Code review: scanners + graph + optional agent review |
| `teddy status` | Phase, sprint, task, branch, validation, tokens, $$ |
| `teddy validate [--with-tests]` | Anti-scrap scanners (and full gate when requested) |
| `teddy test`, `teddy e2e` | Test + Playwright runners |
| `teddy e2e install / scaffold` | Browser setup, baseline config |
| `teddy fs` | Cross-platform fs ops + undo journal |
| `teddy ps` | Cross-platform process list + kill |
| `teddy run "<cmd>"` | Risk-gated raw shell with live streaming |
| `teddy docker` | Compose up / down / status |
| `teddy git` | Status, PR open/update via `gh` |
| `teddy resume` | Pre-flight summary, then run next task |
| `teddy doctor [--fix]` | Host + provider readiness + optional auto-repair |
| `teddy audit` | Filter RUN_LOG / TASK_HISTORY / FAILURE_LOG |
| `teddy config` | Show / set / validate `.teddy.yml` |
| `teddy failure-report` | Markdown postmortem from journals |
| `teddy bootstrap / diagnose / rollback / logs` | Ops shortcuts |
| `teddy pricing` | Inspect / set / refresh provider pricing overlay |
| `teddy provider` | List / probe configured AI providers |
| `teddy repo-map` | Top-N relevant files for a task description |
| `teddy memory` | Cross-session memory |
| `teddy mcp` | List / probe / call MCP servers |
| `teddy graph` | Imports / cochange / context / state / symbols / neighbors |
| `teddy kanban` | Terminal kanban (or `--mermaid`) |
| `teddy monitor` | Live dashboard with refresh interval |
| `teddy roadmap` | Mermaid Gantt / mindmap / timeline / collab flowchart |
| `teddy skills` | List / `info <name>` / `validate <name>` / `run <name>` |
| `teddy watch` | Re-run validation on file change |
| `teddy chat [--continue / --resume <id> / --image <path>]` | Resumable REPL with multimodal input |
| `teddy chat list / forget <id>` | Browse / drop saved sessions |
| `teddy diffs list / show <day> / today` | Browse the persistent diff log |
| `teddy index build / search / status` | Vector embedding index (OpenAI / Voyage / Ollama) |
| `teddy daemon start / stop / status / ping` | Long-lived warm-boot RPC daemon |
| `teddy bg run / list / stop / logs / clean` | Generic background runner |
| `teddy tui` | Full-screen TUI: 17 views, branded, all features |
| `teddy workflow` | Drive the autonomy loop until terminal state |
| `teddy telemetry` | Opt-in, anonymous, local-only command logging |
| `teddy upgrade` | Self-update via npm |
| `teddy version` | Build info |
`teddy init` writes `.teddy.yml` at the project root. Highlights:
```yaml
autonomy:
  mode: assisted              # manual | assisted | autonomous | full_auto_guarded
  skill_permissions:
    write_file: confirm       # allow | confirm | deny

providers:
  default: openrouter
  fallback: [anthropic, openai, ollama]

budget:
  max_tokens: 1_000_000
  max_usd: 10
  warn_at_percent: 80
  enforce: true

quality:
  no_placeholders: true
  no_dead_code: true
  no_secrets: true
  require_tests: true
  require_docker: true
  forbid_insecure_patterns: true
  forbid_console_leftover: true
  forbid_sync_io_in_async: true
  forbid_hardcoded_urls: false
  forbid_any_overuse: false
  require_license_header: false

git:
  strategy: github-flow       # github-flow | git-flow | trunk-based
  commit_style: conventional_commits

hooks:
  pre_task: ['npm run lint:fix']
  post_validation:
    - './scripts/notify-slack.sh "$TEDDY_TASK_ID $TEDDY_VALIDATION"'

mcp_servers:
  filesystem:
    command: npx
    args: ['-y', '@modelcontextprotocol/server-filesystem', '/tmp']
```

Full schema: `teddy config show --schema`.
Every previously-static table lives as a real JSON file under the shipped `data/` directory and accepts overrides:

| File | What it controls |
|---|---|
| `data/pricing.json` | USD per 1M tokens by provider/model |
| `data/frameworks.json` | Framework detection + port inference |
| `data/dependency-policy.json` | Heavy deps + duplicate-family groups |
| `data/git-conventions.json` | Branch prefixes, commit types, hotfix marker, slug length |

Override locations (project beats user beats bundled):

- Project: `.teddy/data/<name>.json`
- User: `~/.teddy/data/<name>.json`
- Bundled: ships with the package (read-only)

For pricing: `teddy pricing show / set / refresh / path / reset` — refresh from a URL or local JSON.
`teddy validate` runs all enabled scanners. The full quality gate (`teddy validate --with-tests`) additionally runs the test suite, type checker, and lint, and blocks on any error or critical finding.

```bash
teddy validate               # scanners only
teddy validate --with-tests  # scanners + tests + typecheck + lint
teddy watch --on-change validate
```

Findings are written to `.teddy/VALIDATION_REPORT.json` and surfaced in the TUI's failures view.
```text
.teddy/
  PROJECT_STATE.json         # current phase + task + branch (atomic, locked, migrated)
  CONTEXT_MODEL.json         # structured project context
  SPRINTS.json               # sprints + tasks
  TASKS.json                 # flat task index
  VALIDATION_REPORT.json     # last quality-gate run
  RUN_LOG.jsonl              # append-only structured log (secrets redacted)
  TASK_HISTORY.jsonl         # append-only task lifecycle
  FAILURE_LOG.jsonl          # append-only failures
  PROVIDER_STATE.json        # provider-specific state
  MEMORY.md                  # cross-session memory
  data/                      # overlay tables
  cache/code-graph.json      # code-graph cache (mtime-fingerprinted)
  diffs/<yyyy-mm-dd>.diff    # daily diff log
  sessions/<id>.jsonl        # chat sessions
  transcripts/<task>.<n>.md  # per-task agent transcripts
  bg/<id>.{json,log}         # background runner manifests + logs
  undo.jsonl                 # destructive-op journal (last 100 entries)
  undo/<id>/                 # snapshots backing each entry
  corrupt/                   # quarantined corrupt state files
```
State is migrated automatically when an older `PROJECT_STATE.json` is loaded by a newer Teddy.
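Automatic migration usually means a chain of forward-only version steps applied at load time. A minimal sketch of that pattern, with invented field names (`stage`, `phase`, `branch`) that are not Teddy's actual schema:

```typescript
type State = { schemaVersion: number; [k: string]: unknown };

// One upgrade step per version; loading applies them in order.
const migrations: Record<number, (s: State) => State> = {
  // v1 → v2: rename `stage` to `phase` (illustrative)
  1: (s) => ({ ...s, phase: s.stage, stage: undefined, schemaVersion: 2 }),
  // v2 → v3: add a default branch field (illustrative)
  2: (s) => ({ ...s, branch: s.branch ?? 'main', schemaVersion: 3 }),
};

function migrate(state: State, target: number): State {
  let current = state;
  while (current.schemaVersion < target) {
    const step = migrations[current.schemaVersion];
    if (!step) throw new Error(`no migration from v${current.schemaVersion}`);
    current = step(current);
  }
  return current;
}
```

Keeping each step tiny and one-version-wide means any old state file can be walked forward deterministically.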
```yaml
- uses: ohswedd/teddy@v1.26.0
  with:
    command: validate
```

Pass any subcommand:

```yaml
- uses: ohswedd/teddy@v1.26.0
  with:
    command: workflow
    args: '--max-iterations 50 --stop-on-failure'
```

- SOLID throughout. Every collaborator has one reason to change; the orchestrator only wires.
- DIP: agents / skills / providers consume small interfaces, not concrete classes.
- OCP: register new agents, providers, skills, gates, MCP servers, data overlays without editing existing code.
- ACID-flavored persistence: atomic writes, schema validation, per-file locks, fsync, append-only journals, schema migrations.
- Graceful shutdown: SIGINT / SIGTERM / uncaughtException flush logs and tear down resources (MCP children, watchers, locks) with a 5-second hard deadline.
- Defense in depth: anti-scrap scanners, risk classifier + hard denylist on shell, autonomy gate on writes/destructive ops, undo journal on every destructive fs op, secret redaction in every log line, token + USD budget enforcement on every provider call.
- AbortSignal end-to-end through every provider call so mid-LLM cancellation actually cancels.
- Pure-function scanners so they can be tested standalone, composed in pipelines, or invoked from skills.
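The AbortSignal bullet above is worth a concrete sketch: cancellation only "actually cancels" if every async step both rejects and releases its resource when the signal fires. Illustrative only — `slowCall` stands in for a provider request and is not Teddy's API.

```typescript
// A cancellable async operation: rejects immediately if already aborted,
// and clears its timer (the "resource") when the signal fires mid-flight.
function slowCall(ms: number, signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) return reject(new Error('aborted'));
    const timer = setTimeout(() => resolve('done'), ms);
    signal.addEventListener(
      'abort',
      () => {
        clearTimeout(timer); // release the resource, don't just drop the result
        reject(new Error('aborted'));
      },
      { once: true },
    );
  });
}
```

Threading one `AbortSignal` from the CLI entry point through every provider call is what lets Ctrl-C stop a mid-LLM request instead of waiting it out.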
```bash
git clone https://github.com/Ohswedd/teddy.git
cd teddy
npm install
npm run build
npm test                                 # 468 tests
npm run lint
node dist/cli/bin.js --quiet validate    # self anti-scrap scan
```

CI runs typecheck + lint + build + test + self-validation on every push and PR (Node 20 + 22).
See CONTRIBUTING.md for the contribution checklist.
MIT — see LICENSE.
Built with care. Opinions, bug reports, and feature requests welcome at github.com/Ohswedd/teddy/issues.