
🧸 Teddy

An AI engineering orchestrator that ships real code.

Plan, develop, test, validate, and version full software projects under a deterministic state machine — with anti-AI-scrap scanners, multi-agent collaboration, and risk-gated terminal access on top of any LLM provider.


Install · Quick start · Features · TUI · CLI reference · Configuration · Validation · Architecture


What Teddy is

Teddy is a governance layer for AI-driven software engineering. Instead of a chat box that hopes the model behaves, Teddy gives you:

  • A deterministic 19-phase state machine (idea → PR) with ACID-flavored persistence, top-level run lock, and corruption-recovering state files.
  • 11 anti-AI-scrap scanners that block placeholders, secrets, dead code, fake tests, silent failures, insecure patterns (eval, shell=True, pickle.loads), stray console.log, sync I/O in async paths, hardcoded URLs, untyped boundaries, and license drift — before anything reaches git.
  • Multi-agent collaboration per task type (PM, architect, backend, frontend, ui_ux, qa, e2e, performance, security, devops, git_manager) with live streaming, per-task Markdown transcripts, and inlined failure findings.
  • A tool-calling loop any tool-capable model can drive (OpenAI, Anthropic, OpenRouter, Ollama). Skills are gated by autonomy, per-skill permission overrides, and a hard denylist that the LLM cannot bypass.
  • A 17-view full-screen TUI built on ink plus a daemon mode for warm-boot CLI calls and a generic background runner.
  • Per-hunk diff approval (git add -p keys) and a persistent diff log under .teddy/diffs/ — every write leaves a paper trail even in autonomous mode.

It works fully offline for scanners, docs, kanban, fs/process tooling, and templates — agents fall back to a "deferred" mode when no provider is configured.


Install

# Use it on demand without installing
npx teddy-orchestrator --help

# Or install globally
npm install -g teddy-orchestrator
teddy --help

The package ships as teddy-orchestrator on npm; the binary is teddy.

Requirements

  • Node.js ≥ 20 (tested on 20.x and 22.x)
  • Git (branch / commit / PR features)
  • Docker + Compose (optional — used by templates and teddy docker)
  • An LLM provider (optional — agents go into "deferred" mode without one)
    • OpenAI (OPENAI_API_KEY)
    • Anthropic (ANTHROPIC_API_KEY)
    • OpenRouter (OPENROUTER_API_KEY)
    • Ollama (local, no key needed — http://localhost:11434)

Quick start

# 1. Scaffold a new project
teddy init --template nextjs-saas --name acme \
  --type fullstack --goal "Multi-tenant reservations" \
  --non-interactive

# 2. Generate sprints and tasks
teddy plan

# 3. Drive the autonomy loop until done
teddy workflow --max-iterations 50

# 4. Watch live agent activity
teddy tui                  # full-screen, 17 views
# or:
teddy monitor              # one-shot dashboard

Or jump straight to ad-hoc tasks:

teddy fix     "auth redirect loop after Stripe checkout" --run
teddy feature "rate-limit /api/upload to 10 req/min"     --run
teddy review  --ai
teddy chat    --continue   # resumable interactive REPL with multimodal input

Features

Multi-provider AI with capability registry

Provider abstraction with live capability detection — each model self-reports vision support, tool-calling, streaming, max context, and pricing. Fallback chains run in declared order; the orchestrator picks the cheapest provider that can serve the requested capability.

teddy provider list           # configured providers + capabilities
teddy provider probe          # round-trip ping through fallback chain
teddy doctor --probe-providers
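Under the hood, fallback selection is essentially a first-match walk over the declared order. A minimal sketch (ignoring pricing for brevity; provider names and capability strings below are illustrative, not Teddy's internals):

```shell
# Hypothetical: return the first declared provider whose self-reported
# capability list contains the requested capability.
pick_provider() {
  cap="$1"
  while read -r name caps; do
    case " $caps " in
      *" $cap "*) echo "$name"; return 0 ;;
    esac
  done
  return 1
}

printf '%s\n' \
  'openrouter tools streaming' \
  'anthropic tools vision streaming' \
  'ollama tools' | pick_provider vision
# → anthropic
```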

19-phase deterministic FSM

Every project moves through a finite state machine (idea → discovery → planning → implementation → validation → review → release → …). State writes are atomic + locked + mirrored to .json.backup; a corrupt primary auto-promotes the backup with the bad file quarantined to .teddy/corrupt/. A top-level run lock prevents two runNextTask invocations from racing.
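The recovery scheme boils down to temp-file-plus-rename with a mirrored backup. A minimal POSIX sketch of the idea (illustrative only — file names and helper are not Teddy's actual code):

```shell
# Illustrative: atomic state write via temp file + rename (rename is atomic
# on POSIX within one filesystem), plus a mirrored .backup for recovery.
atomic_write() {
  tmp="$1.tmp.$$"
  printf '%s' "$2" > "$tmp" && mv "$tmp" "$1"
  cp "$1" "$1.backup"
}

atomic_write /tmp/PROJECT_STATE.json '{"phase":"planning"}'
cat /tmp/PROJECT_STATE.json.backup
# → {"phase":"planning"}
```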

Anti-AI-scrap scanners (11 of them)

| Scanner | Default | Severity | What it catches |
| --- | --- | --- | --- |
| placeholders | on | error | lorem ipsum, naked TODO/FIXME/XXX, <<replace_me>> |
| secrets | on | error/critical | AWS, GitHub PAT, OpenAI, Anthropic, Slack, private keys |
| empty_catch | on | error | catch { } and except: pass |
| fake_tests | on | error/warn | expect(true).toBe(true), test files with no assertions |
| dead_code | on | warning | Files unreferenced by any other module |
| insecure_patterns | on | error/critical | eval, new Function, child_process.exec, vm.runIn*, os.system, subprocess(... shell=True), pickle.loads |
| console_leftover | on | warning | Stray console.log/debug/info/trace outside CLI/TUI |
| sync_io_in_async | on | warning | readFileSync, execSync, etc. inside async function bodies |
| hardcoded_urls | off | warning | Pinned URLs / public IPv4s in source |
| any_overuse | off | warning | : any / as any in TS source |
| missing_license | off | info | No SPDX/Copyright header at top of file |

Each scanner is independently togglable in the quality: config block and respects a // TEDDY_OK <reason> per-line escape hatch.
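A toy version of the escape hatch, assuming a grep-style line scan (not Teddy's actual scanner):

```shell
# Toy placeholders scanner: report TODO/FIXME/XXX markers, but skip any
# line that carries a TEDDY_OK <reason> annotation.
scan_placeholders() {
  grep -nE 'TODO|FIXME|XXX' "$1" | grep -v 'TEDDY_OK' || true
}

printf '%s\n' \
  'const a = 1; // TODO implement' \
  'const b = 2; // TODO later TEDDY_OK tracked in issue backlog' > /tmp/sample.ts
scan_placeholders /tmp/sample.ts
# → 1:const a = 1; // TODO implement
```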

Multi-agent collaboration

Per task type, Teddy assembles a pipeline (e.g. pm → architect → backend → qa → e2e → security → git_manager). Each agent gets a curated skill list and runs through the tool-calling loop. Live event stream printed to stderr, full transcripts at .teddy/transcripts/<task>.<attempt>.md.

▸ backend (BACKLOG-T01)
  → read_file src/auth.ts
  ✓ 12ms
  … I'll need to add a check before the redirect …
  → write_file src/auth.ts
  ✓ 24ms
  backend → implemented
ℹ tokens=12345 calls=7 est=$0.4200

Cross-platform terminal layer

teddy fs tree --depth 3
teddy fs find "src/**/*.ts" --limit 50
teddy fs rm node_modules -r       # journaled
teddy fs undo                     # restored

teddy ps list --filter node
teddy ps kill 12345 --signal SIGTERM

teddy run "npm run build"
teddy run --dry-run "rm -rf node_modules"
teddy run "rm -rf /"              # ✗ refused by hard denylist

Risk tiers (safe / write / destructive / dangerous) drive autonomy gates. The hard denylist refuses catastrophic patterns regardless of approval: rm -rf /, rm -rf $HOME, dd of=/dev/sda, mkfs.*, fork bombs, curl … | sh, recursive chmod at root, Windows format C: / diskpart. Every destructive op journals to .teddy/undo/<id>/ for teddy fs undo [N].
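The "regardless of approval" property just means the denylist check runs before any autonomy or approval logic. A deliberately naive sketch of that shape — Teddy's real classifier is pattern-based and much stricter:

```shell
# Naive denylist: refuse commands containing catastrophic substrings.
# Illustrative only; a substring match like this would also over-match
# (e.g. "rm -rf /tmp/x"), which a real pattern classifier avoids.
is_denied() {
  case "$1" in
    *'rm -rf /'*|*'rm -rf $HOME'*|*'dd of=/dev/sd'*|*'mkfs.'*|*'| sh'*) return 0 ;;
    *) return 1 ;;
  esac
}

is_denied 'rm -rf /'      && echo 'refused'
is_denied 'npm run build' || echo 'allowed'
# → refused
# → allowed
```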

Daemon + warm-boot RPC

teddy daemon start runs a long-lived orchestrator over a Unix socket so warm-boot CLI calls skip the 500ms re-init. JSON-line protocol, capability negotiation, automatic fallback to direct execution if the socket isn't there.

teddy daemon start
teddy daemon status     # uptime, queue depth, active agents
teddy daemon ping       # round-trip latency
teddy daemon stop

Background runner

teddy bg run "task run BACKLOG-T01"
teddy bg run "watch --on-change validate" --name watcher
teddy bg list / stop <id> / logs <id> --follow / clean

PID + manifest + log under .teddy/bg/<id>.{json,log}. Liveness via kill -0.
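Signal 0 performs the existence and permission checks without delivering anything, which is what makes kill -0 a cheap liveness probe:

```shell
# Liveness via signal 0: succeeds iff the PID exists (and is signalable).
alive() { kill -0 "$1" 2>/dev/null; }

sleep 5 & pid=$!
alive "$pid" && echo 'running'
kill "$pid"; wait "$pid" 2>/dev/null || true
alive "$pid" || echo 'gone'
# → running
# → gone
```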

Code graph + vector embedding index

Real imports + symbols + co-change graph, mtime-fingerprinted cache. Plus a multi-provider vector index for semantic recall:

teddy graph imports / cochange / context / state / symbols / neighbors
teddy index build [--provider openai|voyage|ollama]
teddy index search "auth flow"
teddy repo-map --task "fix the Stripe webhook timeout"

OpenAI text-embedding-3-small, Voyage voyage-code-2, or Ollama nomic-embed-text (fully local). Auto-detected from env.

52 native skills + dynamic MCP tools

Skills are the building blocks every agent and orchestrator phase uses. Each one has typed input/output, optional JSONSchema validation, idempotency hints, cost estimates, and per-skill permission overrides.

teddy skills list                              # 52 native + MCP tools
teddy skills info read_file                    # introspection
teddy skills validate write_file --args '...'  # dry-run validation
teddy skills run scan_secrets --args '{}'

Categories:

  • Core: read_file, write_file, create_file, delete_path, move_path, copy_path, list_*, find_files, which_binary, current_directory, change_directory, run_command, plus all five legacy scanners.
  • Engineering: analyze_repo, detect_stack, update_docs, update_progress, append_decision, dependency_review, test_generation, e2e_flow_design, refactor_planning, security_review, performance_check.
  • Operational: git_branch, git_commit, git_push, create_pr, run_docker, healthcheck_service, docker_down, run_tests, run_playwright, bootstrap_project, start_local_environment, collect_logs, diagnose_failure, retry_with_strategy, rollback_changes, list_processes, kill_process.
  • Terminal: cross-platform fs / ps / run primitives with risk tiers.

MCP client

Teddy is an MCP client — point it at any MCP server and its tools become first-class skills.

mcp_servers:
  filesystem:
    command: npx
    args: ['-y', '@modelcontextprotocol/server-filesystem', '/tmp']
  postgres:
    command: npx
    args: ['-y', '@modelcontextprotocol/server-postgres', 'postgresql://...']
teddy mcp list / probe / call <server> <tool> --args '...'

Resumable chat with multimodal input

teddy chat                       # fresh session (.teddy/sessions/<id>.jsonl)
teddy chat --continue            # resume the most recent
teddy chat --resume <id>
teddy chat --image bug.png screenshot.jpg
teddy chat list / forget <id>

Sessions are append-only, one message per line: a kill -9 mid-write loses only the in-flight token, never the prior history. OpenAI / Anthropic / Ollama are all wired; vision models receive the image parts, text-only models silently get just the text part.
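The crash-safety claim follows from the format itself — each message is a single appended line, so a torn write can only affect the final line. A minimal demonstration (hypothetical path, not a real session file):

```shell
# One JSON object per line, append-only: prior lines are never rewritten.
SESS=/tmp/demo-session.jsonl
rm -f "$SESS"
printf '%s\n' '{"role":"user","content":"hi"}'         >> "$SESS"
printf '%s\n' '{"role":"assistant","content":"hello"}' >> "$SESS"
wc -l < "$SESS"
# → 2
```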

Per-hunk diff approval + persistent diff log

git add -p keys (y/n/a/q) on every write when autonomy.mode = manual or when a per-skill skill_permissions: { write_file: confirm } is set. Every write also appends to .teddy/diffs/<yyyy-mm-dd>.diff so even autonomous runs leave a paper trail.

teddy diffs list / show <day> / today

Mermaid output everywhere

teddy roadmap gantt              # sprint Gantt
teddy roadmap mindmap            # task hierarchy mindmap
teddy roadmap timeline           # release timeline
teddy roadmap collab             # agent collaboration flowchart
teddy graph imports --mermaid
teddy kanban --mermaid

TUI

teddy tui launches a full-screen ink-based TUI with 17 views:

Primary (1–9): home, tasks, run, chat, diffs, sessions, index, bg, logs.

Secondary (M opens the More menu): plan, kanban, memory, providers, mcp, workflow, doctor, failures.

Hotkeys are mnemonic and discoverable: ? opens help, t toggles theme, q quits, 1–9 jump to primary views, M opens the More menu. Mouse support, scrollable panes, narrow/wide layout fallback, and brand-coordinated colors.

The Kanban view auto-refreshes every 3 seconds, so a runNextTask in another pane visibly walks cards from Ready → In Progress → Validating → Completed.


CLI reference

| Command | Purpose |
| --- | --- |
| teddy init | Scaffold project + state + docs (with --template) |
| teddy interpret | Heuristic interpretation of a one-line product idea |
| teddy analyze | Detect language / framework / package manager |
| teddy discover | Q&A flow: blocking + non-blocking questions |
| teddy docs | Generate / review the /docs tree |
| teddy plan [--from-prd] | Generate sprints (baseline or PRD-derived) |
| teddy sprint, task | Sprint + task lifecycle |
| teddy task add "<title>" --type ... | Ad-hoc task without full sprint planning |
| teddy task plan <id> | Read-only LLM plan preview before execution |
| teddy fix "<bug>" [--run] | One-shot bugfix task creation |
| teddy feature "<desc>" [--run] | One-shot feature task creation |
| teddy review [--ai] | Code review: scanners + graph + optional agent review |
| teddy status | Phase, sprint, task, branch, validation, tokens, $$ |
| teddy validate [--with-tests] | Anti-scrap scanners (and full gate when requested) |
| teddy test, e2e | Test + Playwright runners |
| teddy e2e install / scaffold | Browser setup, baseline config |
| teddy fs | Cross-platform fs ops + undo journal |
| teddy ps | Cross-platform process list + kill |
| teddy run "<cmd>" | Risk-gated raw shell with live streaming |
| teddy docker | Compose up / down / status |
| teddy git | Status, PR open/update via gh |
| teddy resume | Pre-flight summary then run next task |
| teddy doctor [--fix] | Host + provider readiness + optional auto-repair |
| teddy audit | Filter RUN_LOG / TASK_HISTORY / FAILURE_LOG |
| teddy config | Show / set / validate .teddy.yml |
| teddy failure-report | Markdown postmortem from journals |
| teddy bootstrap / diagnose / rollback / logs | Ops shortcuts |
| teddy pricing | Inspect / set / refresh provider pricing overlay |
| teddy provider | List / probe configured AI providers |
| teddy repo-map | Top-N relevant files for a task description |
| teddy memory | Cross-session memory |
| teddy mcp | List / probe / call MCP servers |
| teddy graph | Imports / cochange / context / state / symbols / neighbors |
| teddy kanban | Terminal kanban (or --mermaid) |
| teddy monitor | Live dashboard with refresh interval |
| teddy roadmap | Mermaid Gantt / mindmap / timeline / collab flowchart |
| teddy skills | List / info <name> / validate <name> / run <name> |
| teddy watch | Re-run validation on file change |
| teddy chat [--continue / --resume <id> / --image <path>] | Resumable REPL with multimodal input |
| teddy chat list / forget <id> | Browse / drop saved sessions |
| teddy diffs list / show <day> / today | Browse the persistent diff log |
| teddy index build / search / status | Vector embedding index, OpenAI/Voyage/Ollama |
| teddy daemon start / stop / status / ping | Long-lived warm-boot RPC daemon |
| teddy bg run / list / stop / logs / clean | Generic background runner |
| teddy tui | Full-screen TUI: 17 views, branded, all features |
| teddy workflow | Drive the autonomy loop until terminal state |
| teddy telemetry | Opt-in, anonymous, local-only command logging |
| teddy upgrade | Self-update via npm |
| teddy version | Build info |

Configuration

teddy init writes .teddy.yml at the project root. Highlights:

autonomy:
  mode: assisted              # manual | assisted | autonomous | full_auto_guarded
  skill_permissions:
    write_file: confirm       # allow | confirm | deny

providers:
  default: openrouter
  fallback: [anthropic, openai, ollama]

budget:
  max_tokens: 1_000_000
  max_usd: 10
  warn_at_percent: 80
  enforce: true

quality:
  no_placeholders: true
  no_dead_code: true
  no_secrets: true
  require_tests: true
  require_docker: true
  forbid_insecure_patterns: true
  forbid_console_leftover: true
  forbid_sync_io_in_async: true
  forbid_hardcoded_urls: false
  forbid_any_overuse: false
  require_license_header: false

git:
  strategy: github-flow       # github-flow | git-flow | trunk-based
  commit_style: conventional_commits

hooks:
  pre_task: ['npm run lint:fix']
  post_validation:
    - './scripts/notify-slack.sh "$TEDDY_TASK_ID $TEDDY_VALIDATION"'

mcp_servers:
  filesystem:
    command: npx
    args: ['-y', '@modelcontextprotocol/server-filesystem', '/tmp']

Full schema: teddy config show --schema.

Data overlays (no recompile)

Every previously-static table lives as a real JSON file under the shipped data/ directory and accepts overrides:

| File | What it controls |
| --- | --- |
| data/pricing.json | USD per 1M tokens by provider/model |
| data/frameworks.json | Framework detection + port inference |
| data/dependency-policy.json | Heavy deps + duplicate-family groups |
| data/git-conventions.json | Branch prefixes, commit types, hotfix marker, slug length |

Override locations (project beats user beats bundled):

  • Project: .teddy/data/<name>.json
  • User: ~/.teddy/data/<name>.json
  • Bundled: ships with the package (read-only)
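Resolution is a first-match lookup down that list. Sketched here with illustrative paths (not Teddy's actual resolver):

```shell
# First match wins: project overlay, then user overlay, then bundled copy.
resolve_overlay() {
  name="$1"; shift
  for dir in "$@"; do
    [ -f "$dir/$name" ] && { echo "$dir/$name"; return 0; }
  done
  return 1
}

mkdir -p /tmp/demo/proj /tmp/demo/user
echo '{}' > /tmp/demo/user/pricing.json
resolve_overlay pricing.json /tmp/demo/proj /tmp/demo/user /tmp/demo/bundled
# → /tmp/demo/user/pricing.json  (no project copy, so the user copy wins)
```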

For pricing: teddy pricing show / set / refresh / path / reset — refresh from a URL or local JSON.


Validation

teddy validate runs all enabled scanners. The full quality gate (teddy validate --with-tests) additionally runs the test suite, type checker, and lint, and blocks on any error or critical finding.

teddy validate                  # scanners only
teddy validate --with-tests     # scanners + tests + typecheck + lint
teddy watch --on-change validate

Findings are written to .teddy/VALIDATION_REPORT.json and surfaced in the TUI's failures view.


State files

.teddy/
  PROJECT_STATE.json         # current phase + task + branch (atomic, locked, migrated)
  CONTEXT_MODEL.json         # structured project context
  SPRINTS.json               # sprints + tasks
  TASKS.json                 # flat task index
  VALIDATION_REPORT.json     # last quality-gate run
  RUN_LOG.jsonl              # append-only structured log (secrets redacted)
  TASK_HISTORY.jsonl         # append-only task lifecycle
  FAILURE_LOG.jsonl          # append-only failures
  PROVIDER_STATE.json        # provider-specific state
  MEMORY.md                  # cross-session memory
  data/                      # overlay tables
  cache/code-graph.json      # code-graph cache (mtime-fingerprinted)
  diffs/<yyyy-mm-dd>.diff    # daily diff log
  sessions/<id>.jsonl        # chat sessions
  transcripts/<task>.<n>.md  # per-task agent transcripts
  bg/<id>.{json,log}         # background runner manifests + logs
  undo.jsonl                 # destructive-op journal (last 100 entries)
  undo/<id>/                 # snapshots backing each entry
  corrupt/                   # quarantined corrupt state files

State is migrated automatically when an older PROJECT_STATE.json is loaded by a newer Teddy.


GitHub Action

- uses: ohswedd/teddy@v1.26.0
  with:
    command: validate

Pass any subcommand:

- uses: ohswedd/teddy@v1.26.0
  with:
    command: workflow
    args: '--max-iterations 50 --stop-on-failure'

Architecture

  • SOLID throughout. Every collaborator has one reason to change; the orchestrator only wires them together.
  • DIP: agents / skills / providers consume small interfaces, not concrete classes.
  • OCP: register new agents, providers, skills, gates, MCP servers, data overlays without editing existing code.
  • ACID-flavored persistence: atomic writes, schema validation, per-file locks, fsync, append-only journals, schema migrations.
  • Graceful shutdown: SIGINT / SIGTERM / uncaughtException flush logs and tear down resources (MCP children, watchers, locks) with a 5-second hard deadline.
  • Defense in depth: anti-scrap scanners, risk classifier + hard denylist on shell, autonomy gate on writes/destructive ops, undo journal on every destructive fs op, secret redaction in every log line, token + USD budget enforcement on every provider call.
  • AbortSignal end-to-end through every provider call so mid-LLM cancellation actually cancels.
  • Pure-function scanners so they can be tested standalone, composed in pipelines, or invoked from skills.

Development

git clone https://github.com/Ohswedd/teddy.git
cd teddy
npm install
npm run build
npm test                                     # 468 tests
npm run lint
node dist/cli/bin.js --quiet validate        # self anti-scrap scan

CI runs typecheck + lint + build + test + self-validation on every push and PR (Node 20 + 22).

See CONTRIBUTING.md for the contribution checklist.


License

MIT — see LICENSE.


Built with care. Opinions, bug reports, and feature requests welcome at github.com/Ohswedd/teddy/issues.
