AgentPilot MCP Server

Infrastructure Moat First — the stateful task engine + AI-native tool registry + Model Router for MCP

AgentPilot is an MCP (Model Context Protocol) server that gives AI agents persistent, auditable task infrastructure. It stores goals as task trees in SQLite, enforces valid state transitions via a deterministic state machine, provides full-text tool search through FTS5, and routes tasks to optimal models through a zero-LLM keyword-based rule engine. No embedded LLM — the server's value is infrastructure that no model upgrade can absorb.

Design Philosophy: Model-layer features (prompt engineering, reflection heuristics, auto-decomposition) are absorbed by Anthropic/OpenAI within 6 months. Infrastructure-layer features (persistent state, database triggers, state machine enforcement, full-text search, rule engine routing) are structurally defensible. The server provides the reliable substrate; AI agents bring their own intelligence.

Quick Start

git clone https://github.com/anomalyco/agent-pilot.git
cd agent-pilot
npm install
npm run build

The server runs on stdio. On first launch it auto-creates the SQLite database (data/orchestrator.db), initializes schema (tasks, tools, audit_log, snapshots, archived_tasks, model_config), and seeds the default model router plan (Plan B — China models).

npm start

Full API Reference

All 30 tools. Every tool returns { error: string } on failure. Schema validation via Zod.

Phase 1: Core Engine

Tool	Description	Key Inputs	Returns
`task_plan`	Create a task plan — stores goal + decomposed subtasks as a tree in SQLite. Subtasks with `depends_on` start as `blocked`.	`goal`, `subtasks[]`, `context?`	`root_task`, `subtasks[]`
`task_next`	Get the next actionable task. Two modes: `executor` (pick pending with deps met) or `reviewer` (pick `pending_review`).	`task_id?`, `mode?`, `include_blocked?`	`next_task` (nullable), `summary`
`task_update`	Transition a task to a new status. Enforces valid transitions (see state machine). Auto-cascades parent completion when all siblings are done.	`task_id`, `status`, `result?`, `error?`, `review_comment?`, `tool_name?`	`task`, `parent_auto_completed?`, `parent_status?`
`task_reflect`	Gather execution history + heuristic suggestions (retry advice, review queue, blocked deps, success rate warnings). No LLM.	`task_id`, `goal?`	`task_tree[]`, `summary`, `suggestions[]`
`task_status`	View a task tree in `flat` or `tree` format. Defaults to latest root.	`task_id?`, `format?`	`root`, `tasks` (flat or nested tree)

Phase 2: Task Lifecycle

Tool	Description	Key Inputs	Returns
`task_cancel`	Cancel a task and optionally cascade to all descendants. Only cancellable: `pending`, `in_progress`, `failed`, `blocked`.	`task_id`, `cascade?`, `reason?`	`cancelled`, `cascaded_count`, `reason`
`task_export`	Export a task tree recursively to JSON (nested tree) or CSV (flat rows). Creates parent directories automatically.	`task_id`, `filepath?`, `format?`	`exported_to`, `task_count`
`task_import`	Import a task tree from a JSON file (produced by `task_export`). Re-IDs all nodes, remaps `parent_id` and `depends_on`. Transactional.	`filepath`	`root_id`, `imported_count`
`task_duplicate`	Deep-clone an existing task tree. All tasks reset to `pending`, new UUIDs, remapped relationships.	`task_id`	`original_id`, `new_root_id`, `duplicated_count`
`task_batch_update`	Update multiple tasks atomically. Validates each transition individually — partial success allowed.	`task_ids[]`, `status`, `result?`, `error?`, `review_comment?`	`succeeded`, `failed`, `failures[]`, `cascaded_parent_count?`
`task_dependency_graph`	Visualize a task tree's dependency graph. Formats: `mermaid` (flowchart), `ascii` (text tree), `json` (nodes + edges).	`task_id?`, `format?`	`graph` (format-dependent), `format`

Phase 3: Tool Registry

Tool	Description	Key Inputs	Returns
`tool_register`	Register a new tool in the AI-native tool registry. Syncs to FTS5 for full-text search. Platform growth mechanism.	`name`, `description`, `schema`, `provider`, `tags?`	Registered tool object
`tool_search`	Search the tool registry with natural language via FTS5 full-text search. Returns relevance-ranked results. Falls back to LIKE if FTS5 unavailable.	`query`, `limit?`, `tags?`	`results[]` (with `relevance_score`)
`tool_update`	Update tool metadata. Only provided fields are changed. Syncs FTS5 index on description/tag changes.	`name`, `description?`, `schema?`, `provider?`, `tags?`	Updated tool object
`tool_deprecate`	Deprecate a tool. Prefixes description with `[DEPRECATED]`, removes from FTS5 index. Data preserved in DB.	`name`, `replacement?`	Deprecated tool info
`tool_export`	Export all registered tools to a JSON file. Optionally include deprecated tools.	`filepath?`, `include_deprecated?`	`exported_to`, `tool_count`
`tool_import`	Bulk import tools from a JSON array. Modes: `merge` (upsert) or `replace` (clear first). Syncs FTS5.	`filepath`, `mode?`	`imported`, `skipped`, `errors[]`
`tool_stats`	Registry statistics: total/deprecated/active counts, per-provider breakdown, top tags, recently added. Read-only.	(none)	`total`, `deprecated`, `active`, `per_provider[]`, `top_tags[]`, `recently_added[]`

Phase 4: Quality & Accountability

Tool	Description	Key Inputs	Returns
`task_audit_log`	Retrieve task status change history. Powered by SQLite triggers (`AFTER UPDATE OF status`). Filter by task_id or browse globally.	`task_id?`, `limit?`	`entries[]` (old_status, new_status, changed_at)
`task_metrics`	Aggregate metrics across ALL task trees: avg completion rate, avg time, most failed titles, avg retries, review ratio, status breakdown. Read-only.	(none)	Full metrics object (see types.ts)
`task_rollback`	Revert a task to its previous status using the audit log. Cannot rollback terminal states (`completed`, `cancelled`). Inserts rollback audit entry.	`task_id`	`task_id`, `rolled_back_from`, `rolled_back_to`
`task_snapshot`	Create a point-in-time snapshot of an entire task tree. Stores full recursive JSON in the `snapshots` table. Git-like versioning for task trees.	`task_id`, `label?`	`snapshot_id`, `task_count`, `created_at`

Phase 5: Data Lifecycle

Tool	Description	Key Inputs	Returns
`task_archive`	Archive a completed/approved task tree. Moves to `archived_tasks`, cleans up snapshots + audit log + depends_on refs. Cascade required for non-leaf nodes.	`task_id`, `cascade?`	`archived`, `title`, `subtrees_archived`, `archived_at`
`task_archive_list`	List archived tasks with optional status filter and pagination.	`status?`, `limit?`, `offset?`	`tasks[]`, `total`
`task_archive_restore`	Restore an archived task tree back to active tasks. Validates parent/dep references, fails on ID conflicts.	`task_id`, `cascade?`	`restored`, `title`, `subtrees_restored`

Phase 5.5: Model Router

Tool	Description	Key Inputs	Returns
`model_classify`	Classify a task description into 1 of 8 categories via keyword-based rule engine. Zero LLM dependency. Returns confidence + all category scores.	`task_description`	`category`, `category_name`, `confidence`, `matched_keywords[]`, `all_scores[]`
`model_route`	Route a task to the best model. Classifies internally, then matches against available models using active plan presets. Falls back gracefully.	`task_description`, `available_models?`, `plan_preset?`	`model`, `category`, `confidence`, `reason`, `estimated_cost_per_1M`, `plan_used`
`model_config`	Manage model routing configuration. Actions: `list` (view active plan + mappings), `switch` (change plan), `set` (override category model), `reset` (restore defaults). Pure CRUD.	`action`, `plan?`, `category?`, `primary_model?`, `fallback_model?`	Depends on action (see types.ts)

Phase 7: System

Tool	Description	Key Inputs	Returns
`system_info`	Database infrastructure stats: table row counts, DB file size, WAL status, journal mode, checkpoint info. Read-only introspection.	(none)	`total_tasks`, `total_tools`, `total_snapshots`, `total_audit_log_entries`, `database_file_size_bytes`, `journal_mode`, `wal_exists`, `shm_exists`, `table_row_counts`
`data_integrity_check`	Scan the database for 10 categories of issues: invalid statuses, orphan parents, broken depends_on refs, circular dependencies, retry overflow, orphan snapshots, etc. Optionally auto-repair.	`repair?`	`issues_found`, `issues[]`, `healthy`, `repair_applied?`

State Machine

The state machine enforces valid transitions on every task_update and task_batch_update call. Invalid transitions return a clear error with the list of allowed next states.

stateDiagram-v2
  [*] --> pending
  pending --> in_progress
  pending --> cancelled
  pending --> completed
  in_progress --> pending_review
  in_progress --> completed
  in_progress --> failed
  in_progress --> blocked
  in_progress --> cancelled
  pending_review --> approved
  pending_review --> needs_revision
  approved --> completed
  needs_revision --> pending
  failed --> pending
  failed --> cancelled
  blocked --> pending
  blocked --> cancelled
  blocked --> completed
  completed --> [*]
  cancelled --> [*]

Key behaviors:

completed and cancelled are terminal — no further transitions allowed
needs_revision bumps retry_count and routes back to pending (Agent A re-executes)
Parent auto-cascade: when all siblings reach completed/approved, the parent auto-transitions to completed recursively up the chain
blocked tasks auto-resolve to pending when all dependencies complete
Subtasks with depends_on specified at plan creation start as blocked

Dual-Review Workflow

AgentPilot supports a two-agent review pattern without any LLM coordination:

Agent A (Executor): task_next(mode=executor) picks the next pending task with dependencies met
Agent A executes the task, then: task_update(task_id, status=pending_review, result=...)
Agent B (Reviewer): task_next(mode=reviewer) picks tasks waiting for review (sorted by priority)
Agent B reviews and decides:
- Approve: task_update(task_id, status=approved) — cascades to completed
- Reject: task_update(task_id, status=needs_revision, review_comment=...) — bumps retry_count, routes back to Agent A's queue
Auto-cascade: when all subtasks of a parent are approved/completed, the parent auto-completes

This is enforced entirely by the state machine — no agent-to-agent communication, no shared memory, no LLM coordination needed. The database is the coordination layer.

Model Router

A zero-LLM keyword-based rule engine that classifies tasks into 8 categories and routes them to optimal models per plan. No embedded LLM — pure deterministic matching.

Classification Engine (`model_classify`)

8 categories with Chinese keyword matching:

Category	Name	Example Keywords
`high_logic`	高邏輯推理	分析, 評估, 推理, 策略, 規劃
`code_generation`	代碼生成	代碼, 編程, debug, Vibe Coding
`creative_writing`	創意寫作	創作, 故事, 文案, 寫作
`info_reading`	資訊閱讀	總結, 摘要, 翻譯, PDF, 閱讀
`simple_repeat`	簡單重複	格式化, 批量, 轉換, 整理
`image_understanding`	圖片理解	圖片, 截圖, OCR, 識別
`image_generation`	圖片生成	畫, 設計, 海報, 文生圖
`video_audio`	影片/音頻	影片, 音頻, 字幕, 語音

Weighted keyword matching + input pattern matching + output type matching. Fallback to high_logic on low confidence (<0.2) or if no keywords match.

Plan A: International Models

Category	Primary	Fallback	Output Cost/1M
high_logic	claude-opus-4-7	claude-sonnet-4-6	$75 / $15
code_generation	claude-sonnet-4-6	gpt-5-4	$15 / $15
creative_writing	minimax-m2-5	claude-sonnet-4-6	$1.20 / $15
info_reading	gemini-2-5-flash-lite	deepseek-v4-flash	$0.40 / $0.28
simple_repeat	deepseek-v4-flash	gemini-2-5-flash-lite	$0.28 / $0.40
image_understanding	claude-sonnet-4-6	gemini-3-1-pro	$15 / $12
image_generation	dall-e-4	stable-diffusion-4	~$0.04-0.12/img
video_audio	gemini-3-1-pro	gpt-5-4	$12 / $15

Strategy: Flagship models for complex reasoning + code; cheap models for reading + repeat tasks. Blend cost ~30-50% of always-strongest.

Plan B: China Models (Default)

Category	Primary	Fallback	Output Cost/1M
high_logic	deepseek-v4-pro	glm-5-1	¥6 / ¥24
code_generation	deepseek-v4-pro	kimi-k2-5	¥6 / free
creative_writing	minimax-m2-5	deepseek-v4-pro	$1.20 / ¥6
info_reading	deepseek-v4-flash	glm-4-flash	¥2 / ¥1
simple_repeat	deepseek-v4-flash	glm-4-flash	¥2 / ¥1
image_understanding	glm-4-6v	minimax-vl-01	¥2
image_generation	cogview4	doubao-seed-2	~¥0.1-0.3/img
video_audio	kling-2-6	glm-4-voice	usage-based

Strategy: Domestic models cost 1-2 orders of magnitude less than international. Use flagships aggressively. DeepSeek Pro at ¥6/1M (2.5折 promo through May 31, 2026).

Plan C: GLM Lazy Mode

All GLM series — one ecosystem, no thinking required:

Category	Primary	Fallback	Output Cost/1M
high_logic	glm-5-1	glm-5	¥12
code_generation	glm-5-1	glm-5	¥12
creative_writing	glm-5-1	glm-5	¥12
info_reading	glm-4-flash	glm-5-1	¥1
simple_repeat	glm-4-flash	glm-5-1	¥1
image_understanding	glm-4-6v	glm-4v-flash	¥2
image_generation	cogview4	glm-image	~¥0.1-0.3/img
video_audio	glm-4-voice	glm-4-vision	usage-based

Auto-selects model variant by task complexity: glm-4-flash for quick, glm-5 for standard, glm-5-1 for complex.

How to Switch Plans

// Switch to Plan A (International)
{ "action": "switch", "plan": "A" }

// Switch to Plan C (GLM Lazy)
{ "action": "switch", "plan": "C" }

// View current config
{ "action": "list" }

// Override a specific category
{ "action": "set", "category": "high_logic", "primary_model": "openai/gpt-5-4" }

// Restore plan defaults
{ "action": "reset" }

Cost Savings Estimate

Scenario	Always Strongest	Dynamic Routing	Savings
Plan A: 60% reading tasks (Gemini $0.40 vs Claude $15)	$15/M tokens	~$5/M tokens	~67%
Plan B: Reading + repeat on Flash (¥2 vs Pro ¥6)	¥6/M tokens	~¥2.5/M tokens	~58%
Plan C: Quick tasks on Flash (¥1 vs ¥12 GLM-5.1)	¥12/M tokens	~¥3/M tokens	~75%

Dynamic routing reduces blended cost by ~60% vs always routing to the strongest model. The classifier's deterministic matching means no LLM inference cost is spent on the routing decision itself.

Architecture: The Infrastructure Moat

These four pillars make AgentPilot structurally defensible against model upgrades:

1. SQLite — Persistent State

Task state survives model restarts. Tasks, statuses, results, dependencies, and history all live in a single-file WAL-mode SQLite database.
Full audit trail via triggers. AFTER UPDATE OF status ON tasks automatically inserts into audit_log. Every status change is permanently recorded with old/new values and timestamps.
WAL mode enables concurrent readers during writes — multiple AI agents can query task state simultaneously.
Recursive CTEs power tree operations (walking subtrees, finding descendants, dependency resolution) as single queries.
Model upgrades cannot replicate this — they have no persistent storage layer. Every model session starts fresh. The database accumulates institutional memory over time.

2. State Machine — Enforced Valid Transitions

9 states, 17 transitions. Every task_update call validates the transition against VALID_TRANSITIONS before execution. Invalid transitions (e.g., completed -> in_progress) are rejected with a clear error message listing allowed next states.
Parent auto-cascade. When all siblings reach completed or approved, the parent auto-completes recursively. This is built on database queries, not LLM reasoning loops.
Dual-review states (pending_review, approved, needs_revision) encode a two-agent quality gate directly in the state machine.
Models cannot enforce this — they can suggest transitions but have no mechanism to validate or reject them against a consistent schema. The state machine provides a single source of truth for what can happen next.

3. FTS5 — Full-Text Tool Search

Tools registered in the tools table are auto-indexed in a virtual tools_fts table using SQLite's FTS5 engine.
Natural language queries (tool_search query="code generation with Python") return relevance-ranked results with normalized scores (0-1).
Tag filtering intersects FTS5 results with category constraints.
Model upgrades cannot create this — FTS5 is a compiled C extension built into SQLite. It provides tokenization, stemming, BM25 ranking, and phrase matching that would require a custom search index to replicate. Models have no indexed query capability.

4. Rule Engine — Zero-LLM Classification

model_classify uses 8 categories x 3 keyword layers (keywords, input patterns, output types) with weighted scoring. Zero API calls, zero latency beyond CPU.
Deterministic, auditable, reproducible. Given the same input, the classifier always produces the same output.
Models cannot replace this for routing — using an LLM to decide which LLM to call creates a bootstrapping problem: you need to spend inference cost and latency before you even know which model to route to. The rule engine eliminates this dilema entirely.

Register in Claude Code

Add to your Claude Code MCP configuration (~/.claude/claude_desktop_config.json or project .mcp.json):

{
  "mcpServers": {
    "agent-pilot": {
      "command": "node",
      "args": ["/home/administrator/agent-pilot/dist/index.js"],
      "env": {}
    }
  }
}

Register in OpenCode

Add to your OpenCode configuration (opencode.json or ~/.config/opencode/opencode.json):

{
  "mcpServers": {
    "agent-pilot": {
      "command": "node",
      "args": ["/home/administrator/agent-pilot/dist/index.js"]
    }
  }
}

Restart Claude Code / OpenCode after adding the configuration. Run system_info as a smoke test to verify the connection and database initialization.

Database

Path: data/orchestrator.db (auto-created on first run)
Tables: tasks, tools, tools_fts (FTS5), audit_log, snapshots, archived_tasks, model_config, app_config
Triggers: trg_tasks_status_audit — automatic audit logging on every status change
Journal mode: WAL (Write-Ahead Logging) for concurrent read/write
First-run seeding: Plan B model config (China models) loaded from src/config/plan-b.json

Project Structure

src/
  index.ts            # MCP server entry point, tool registration (ListTools + CallTool)
  types.ts            # Zod schemas, TypeScript types, MCP response helper
  db.ts               # SQLite initialization, schema, FTS5, triggers, seeding
  state-machine.ts    # VALID_TRANSITIONS, task_next, task_update, parent cascade
  planner.ts          # task_plan, task_status (tree/flat viewer)
  reflector.ts        # task_reflect — heuristic suggestions engine
  model-router.ts     # model_classify, model_route, model_config — rule engine + routing
  tool-registry.ts    # tool_register, tool_search, tool_update, tool_deprecate, tool_stats
  canceller.ts        # task_cancel with cascade
  exporter.ts         # task_export — JSON/CSV export
  importer.ts         # task_import — JSON import with ID remapping
  duplicator.ts       # task_duplicate — deep clone with reset
  batch-updater.ts    # task_batch_update — partial success
  dependency-graph.ts # task_dependency_graph — Mermaid/ASCII/JSON
  audit-log.ts        # task_audit_log — query SQLite trigger-audited history
  archiver.ts         # task_archive, task_archive_list, task_archive_restore
  metrics.ts          # task_metrics — aggregate stats across all trees
  rollback.ts         # task_rollback — revert via audit log
  snapshot.ts         # task_snapshot — Git-like versioning
  tool-export.ts      # tool_export — registry export to JSON
  tool-import.ts      # tool_import — bulk import with merge/replace
  integrity.ts        # data_integrity_check — 10-category scan + auto-repair
  system-info.ts      # system_info — DB introspection
  config/
    plan-a.json       # International models
    plan-b.json       # China models (default)
    plan-c.json       # GLM Lazy mode
test/
  test-all.ts         # End-to-end MCP test suite

Development

npm run build    # TypeScript compile + copy plan configs
npm run dev      # Build + start
npm test         # Run e2e test suite

Build output goes to dist/. Plan configs are copied from src/config/ to dist/config/ during build.

Version: 1.0.0 | License: MIT | Database: SQLite (WAL) | Model Router: Plan A/B/C (May 2026 models)

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
data		data
dist		dist
logs		logs
node_modules		node_modules
src		src
test		test
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
ROADMAP.md		ROADMAP.md
build-all.sh		build-all.sh
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

Folders and files

Latest commit

History

Repository files navigation

AgentPilot MCP Server

Quick Start

Full API Reference

Phase 1: Core Engine

Phase 2: Task Lifecycle

Phase 3: Tool Registry

Phase 4: Quality & Accountability

Phase 5: Data Lifecycle

Phase 5.5: Model Router

Phase 7: System

State Machine

Dual-Review Workflow

Model Router

Classification Engine (model_classify)

Plan A: International Models

Plan B: China Models (Default)

Plan C: GLM Lazy Mode

How to Switch Plans

Cost Savings Estimate

Architecture: The Infrastructure Moat

1. SQLite — Persistent State

2. State Machine — Enforced Valid Transitions

3. FTS5 — Full-Text Tool Search

4. Rule Engine — Zero-LLM Classification

Register in Claude Code

Register in OpenCode

Database

Project Structure

Development

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Classification Engine (`model_classify`)

Packages