A self-evolving superintelligence harness substrate, providing a reliable, observable, and recoverable execution environment for agent systems.
A next-generation agent collaboration framework designed on the principles of System, Control, and Information (SCI) theory.
Unlike platforms driven by mere prompt stacking, this project is built on the rigorous foundation of SCI theory (Systematics, Cybernetics, and Informatics) to establish an underlying system order:
- Systematics (seeing the whole): A six-role lane separation architecture that enables higher-order coordination and emergence (1+1>2).
- Informatics (understanding communication): Transactional fact chains, structured protocol communication, and memory curation.
- Cybernetics (achieving goals): A Workflow Orchestrator driving two-stage routing and cybernetic feedback loops.

Moss Harness Architecture:

| Layer | Components |
|---|---|
| Systematics | Coordinator, Planner, Reviewer, Executor, Evaluator (with Memory Curator for cross-run curation) |
| Informatics | Structured Protocols, Transactional Fact Chain, Memory |
| Cybernetics | Workflow Orchestrator, Two-Stage Routing, Feedback Loop |
| Observability | Execution Tracing, Evaluation, Analytics |

- Four-Layer Architecture - Strict separation of Strategy (Governance), Harness (Orchestration/Substrate), App (Case Studies), and Observability.
- Multi-Agent Lanes - 6 dedicated role lanes for specialized collaboration, reducing single-agent hallucinations.
- Two-Stage Routing - Coarse Intent classification + precise Policy Evaluation.
- Feedback-Driven Loop - Review and evaluation results act as control logic that dynamically alters execution paths.
- Transactional Claiming - "Facts before broadcast." Agents autonomously scan the Task Board and claim work atomically.
- Memory & Emergence System - Cross-session learning that automatically extracts reusable patterns and candidate expert agents.
- Read-Only Observability - State changes are persisted as a fact chain, providing timeline-based Execution Tracing, Quality Evaluation, and Analytics.
- MCP Integration - Standardized interfaces for external tools and systems.
- Constraint Guardrails - 4-level constraint system (Hard/Soft/Guidelines/Preferences).
- Skill System - Reusable agent capability modules and code rule checks.
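Transactional claiming ("facts before broadcast") can be illustrated with a minimal in-memory sketch. Everything here (`TaskBoard`, `scan`, `claim`, the `version` field) is hypothetical and not the moss-harness API; it assumes a claim is guarded by a compare-and-set on a task version, which is one common way to make claiming atomic.

```typescript
// Hypothetical sketch of atomic task claiming; not the moss-harness API.
type TaskStatus = "open" | "claimed";

interface Task {
  id: string;
  lane: string;
  status: TaskStatus;
  owner?: string;
  version: number; // bumped on every state change
}

interface ClaimFact {
  taskId: string;
  agent: string;
  version: number;
}

class TaskBoard {
  private tasks: Record<string, Task> = {};
  readonly facts: ClaimFact[] = []; // append-only record of successful claims

  add(id: string, lane: string): void {
    this.tasks[id] = { id, lane, status: "open", version: 0 };
  }

  // Agents scan autonomously for open work in their own lane.
  scan(lane: string): Task[] {
    return Object.keys(this.tasks)
      .map((id) => this.tasks[id])
      .filter((t) => t.lane === lane && t.status === "open");
  }

  // Compare-and-set: the claim succeeds only if the task is still open and
  // unchanged since the version the agent saw during its scan.
  claim(id: string, agent: string, seenVersion: number): boolean {
    const task = this.tasks[id];
    if (!task || task.status !== "open" || task.version !== seenVersion) return false;
    const version = task.version + 1;
    this.tasks[id] = { ...task, status: "claimed", owner: agent, version };
    this.facts.push({ taskId: id, agent, version });
    return true;
  }
}

const board = new TaskBoard();
board.add("T1", "executor");
const [task] = board.scan("executor");
const first = board.claim(task.id, "executor-a", task.version);  // true
const second = board.claim(task.id, "executor-b", task.version); // false: already claimed
console.log(first, second, board.facts.length);
```

In a single-threaded process this check is trivially atomic; against a shared store the same version check would typically run inside one transaction or a conditional update.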
Unlike traditional agent frameworks that rely on "experience-driven" design or flat capability lists, Moss Harness builds underlying order based on mature system theories:
| Dimension | Traditional Frameworks | Moss Harness |
|---|---|---|
| Architecture | Experience-driven, API wrappers | Architecture-first, SCI theory driven |
| Routing | Static single-path or LLM free-roam | Two-Stage Routing (Intent + Policy) |
| Quality Control | Simple pass/fail or post-hoc logs | Cybernetic feedback that directly alters the workflow |
| Task Claiming | Centralized Dispatch | Transactional Autonomous Claiming |
The six-role separation architecture ensures execution quality through strict permission and objective isolation:
Coordinator → Planner → Reviewer → Executor → Evaluator, with a feedback loop that routes review and evaluation results back upstream.
Memory Curator operates across runs to curate knowledge.
- Planner/Reviewer/Evaluator: Read-only permissions, focusing on requirements analysis, plan review, and objective assessment.
- Executor: Read/write and execution permissions, focusing on code implementation and testing.
- A Planner never evaluates its own plan; an Executor never reviews its own code.
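Lane permissions and objective isolation can be sketched as a permission table plus a guard against self-judgment. The permission sets mirror the role table later in this README; the names `LANE_PERMISSIONS`, `can`, and `canJudge` are hypothetical, and treating only the Reviewer and Evaluator lanes as judging lanes is an assumption drawn from the bullets above.

```typescript
// Hypothetical sketch; permission sets mirror this README's role table.
type Role = "Coordinator" | "Planner" | "Reviewer" | "Executor" | "Evaluator" | "MemoryCurator";
type Permission = "read" | "write" | "exec" | "test";

const LANE_PERMISSIONS: Record<Role, Permission[]> = {
  Coordinator: ["read"],
  Planner: ["read"],
  Reviewer: ["read"],
  Executor: ["read", "write", "exec"],
  Evaluator: ["read", "test"],
  MemoryCurator: ["read", "exec"],
};

interface Artifact {
  id: string;
  author: string; // agent id that produced the plan or code
}

function can(role: Role, permission: Permission): boolean {
  return LANE_PERMISSIONS[role].indexOf(permission) !== -1;
}

// Objective isolation: only judging lanes may judge, and never their own output.
function canJudge(judgeId: string, judgeRole: Role, artifact: Artifact): boolean {
  const isJudgingLane = judgeRole === "Reviewer" || judgeRole === "Evaluator";
  return isJudgingLane && artifact.author !== judgeId;
}

const plan: Artifact = { id: "plan-1", author: "agent-7" };
console.log(can("Planner", "write"));               // false
console.log(can("Executor", "exec"));               // true
console.log(canJudge("agent-9", "Reviewer", plan)); // true
console.log(canJudge("agent-7", "Reviewer", plan)); // false: own artifact
```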
Moss Harness is not a linear workflow; it is a cybernetic closed-loop system powered by the Workflow Orchestrator:
- Role Lane → Orchestrator → Fact (persist) → Audit → Broadcast
- Execution → Execution Tracing / Evaluation / Analytics
- Reviewer / Evaluator → quality / confidence → Feedback
- Memory & Governance → inform the next run
Correction must change the path, not just add a log entry. This is the core purpose of the Orchestrator, which resides in the Harness layer.
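The persist, audit, broadcast order above can be sketched as an append-only chain in which subscribers only ever hear about facts that already exist. `FactChain`, the auditor callback, and the choice to withhold broadcast when an audit finds a problem are assumptions for illustration, not the Orchestrator's actual behavior.

```typescript
// Hypothetical sketch of Fact (persist) -> Audit -> Broadcast; not the Orchestrator API.
interface ChainFact {
  seq: number;
  lane: string;
  kind: string;
  payload: string;
}

type Auditor = (fact: ChainFact) => string | null; // problem description, or null if clean
type Subscriber = (fact: ChainFact) => void;

class FactChain {
  private readonly facts: ChainFact[] = [];
  private readonly findings: string[] = [];
  private readonly subscribers: Subscriber[] = [];
  private readonly auditor: Auditor;

  constructor(auditor: Auditor) {
    this.auditor = auditor;
  }

  subscribe(subscriber: Subscriber): void {
    this.subscribers.push(subscriber);
  }

  record(lane: string, kind: string, payload: string): ChainFact {
    // 1. Persist first: the fact exists before anyone is told about it.
    const fact: ChainFact = { seq: this.facts.length + 1, lane, kind, payload };
    this.facts.push(fact);
    // 2. Audit the persisted fact; a finding is kept and (by assumption) blocks broadcast.
    const problem = this.auditor(fact);
    if (problem !== null) {
      this.findings.push(`#${fact.seq}: ${problem}`);
      return fact;
    }
    // 3. Broadcast last, so subscribers only see persisted, audited facts.
    for (const s of this.subscribers) s(fact);
    return fact;
  }

  audits(): readonly string[] {
    return this.findings;
  }

  size(): number {
    return this.facts.length;
  }
}

const chain = new FactChain((f) => (f.payload.length === 0 ? "empty payload" : null));
const heard: number[] = [];
chain.subscribe((f) => heard.push(f.seq));
chain.record("planner", "plan", "split auth into three tasks");
chain.record("executor", "patch", "");
console.log(heard, chain.audits(), chain.size());
```

Note that the failed fact is still persisted: the audit finding becomes part of the record rather than disappearing.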
# Clone the repository
git clone https://github.com/cybernetix-lab/moss-harness.git
cd moss-harness
# Install dependencies and build
npm install
npm --prefix apps/mosscli run build
mosscli is the first terminal case application built on top of moss-harness:
# View help
node apps/mosscli/dist/cli/index.js --help
# Run a harness case application
node apps/mosscli/dist/cli/index.js run --goal "Implement user authentication"
# Check task status
node apps/mosscli/dist/cli/index.js status
# Run execution tracing and evaluation analytics
node apps/mosscli/dist/cli/index.js trace
node apps/mosscli/dist/cli/index.js evaluate
The project is strictly organized following the four-layer architecture (Strategy -> Harness -> App -> Observability):
moss-harness/
├── apps/               # [App Layer] Case applications
│   ├── agent-cli/      # Legacy bash CLI application
│   └── mosscli/        # TypeScript CLI-first validation app
├── configs/            # [Strategy Layer] Configuration center
│   ├── agents/         # Agent template configurations
│   ├── constraints/    # System constraints and tool policies
│   ├── orchestration/  # Orchestration and lane configs
│   ├── protocols/      # Structured communication protocols
│   ├── rules/          # Rules and tool constraints
│   ├── skills/         # Skill registry
│   └── telemetry/      # Telemetry configurations
├── deployments/        # Infrastructure orchestration
│   ├── docker/         # Docker Compose configurations
│   ├── helm/           # Helm Charts configurations
│   └── k8s-operator/   # Kubernetes Operator and CRDs
├── docs/               # Documentation and architecture specs
├── evals/              # [Observability Layer] Evaluation framework and metrics
├── integrations/       # [Harness Layer] MCP and external Skill extensions
│   ├── extensions/     # Core extensions (e.g., Mailbox system)
│   ├── mcp/            # MCP server definitions
│   └── skills/         # Skill definitions (React, Security, etc.)
├── observability/      # [Observability Layer] Prometheus and Grafana configs
├── runtime/            # [Harness Layer] TypeScript core implementation
│   ├── agents/         # Role Agent implementations
│   ├── context/        # Policies and context compaction
│   ├── memory/         # Memory and curation system
│   ├── orchestration/  # Workflow Orchestrator and routing
│   ├── sandbox/        # Execution sandbox management
│   ├── storage/        # Storage implementations (SQLite, Base)
│   ├── subagent/       # Sub-agent registry and scheduler
│   └── telemetry/      # Telemetry collectors and metrics
├── scripts/            # [Harness Layer] DevOps and management scripts
├── src/                # [Harness Layer] Shell core source code (Legacy)
│   ├── core/           # Legacy bash Workflow Orchestrator
│   ├── agents/         # Legacy bash Role Agent implementations
│   └── memory/         # Legacy bash Memory system
└── tests/              # Test cases (Bats/Jest)
Skills are reusable capability modules provided at the App and Harness layers, either directly mounted or via MCP:
# Activate a skill (Example)
./scripts/skill-activate.sh code-review

| Skill Category | Associated Role | Description |
|---|---|---|
| architecture-design | Planner | Architecture patterns and technical solution output |
| security-review | Reviewer | Security checks for code and plans |
| test-driven-dev | Executor | TDD red-green cycle implementation |
| knowledge-extraction | Memory Curator | Extracting candidate expert traits from context |

This project adopts a six-role multi-agent architecture and curates Experts within each Lane.
| Role Category | Responsibility | Permissions | Expert Agent Example |
|---|---|---|---|
| Coordinator | Intent recognition, requirement clarification | Read-only | api_coordinator |
| Planner | Requirement analysis, task breakdown, design | Read-only | db_planner |
| Reviewer | Risk identification, plan evaluation, suggestions | Read-only | sec_reviewer |
| Executor | Code implementation, test writing, self-testing | Read/Write/Exec | frontend_executor |
| Evaluator | Quality assessment, requirement validation | Read/Test | perf_evaluator |
| Memory Curator | Context compression, knowledge archiving | Read/Exec | doc_curator |
1. User submits intent.
2. Coordinator clarifies requirements and outputs a Requirement Task.
3. Planner claims the requirement and outputs an Execution Plan.
4. Reviewer performs a structured review (APPROVED / NEEDS_REVISION).
5. Executor claims and executes the code implementation.
6. Evaluator assesses implementation quality (PASS / NEEDS_IMPROVEMENT).
7. The result branches:
   - PASS/EXCELLENT: the Memory Curator curates knowledge.
   - NEEDS_IMPROVEMENT: Orchestrator routing returns the task to the Executor/Planner to fix.
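The branch at the end of this flow can be read as a small state machine. The sketch below is hypothetical: lane names follow the roles above, sending NEEDS_IMPROVEMENT back to the Executor by default (rather than the Planner) is an assumption, and `nextLane` is not a moss-harness function.

```typescript
// Hypothetical sketch of the lane flow and verdict routing; not the Orchestrator API.
type Lane = "Coordinator" | "Planner" | "Reviewer" | "Executor" | "Evaluator" | "MemoryCurator" | "Done";
type Verdict = "APPROVED" | "NEEDS_REVISION" | "PASS" | "EXCELLENT" | "NEEDS_IMPROVEMENT" | null;

// Assumption: the caller decides whether an evaluation failure is a plan or a code problem.
function nextLane(current: Lane, verdict: Verdict, fixLane: "Planner" | "Executor" = "Executor"): Lane {
  switch (current) {
    case "Coordinator":
      return "Planner";
    case "Planner":
      return "Reviewer";
    case "Reviewer":
      // NEEDS_REVISION routes back to the Planner instead of continuing.
      return verdict === "APPROVED" ? "Executor" : "Planner";
    case "Executor":
      return "Evaluator";
    case "Evaluator":
      return verdict === "PASS" || verdict === "EXCELLENT" ? "MemoryCurator" : fixLane;
    case "MemoryCurator":
    default:
      return "Done";
  }
}

// One run: the first review asks for revision, the first evaluation asks for improvement.
const verdicts: Verdict[] = [null, null, "NEEDS_REVISION", null, "APPROVED", null, "NEEDS_IMPROVEMENT", null, "PASS", null];
let lane: Lane = "Coordinator";
const path: Lane[] = [lane];
for (const v of verdicts) {
  lane = nextLane(lane, v);
  path.push(lane);
}
console.log(path.join(" -> "));
```

Because the verdict selects the next lane, a failed review or evaluation changes the path itself instead of only being logged.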
Based on the objective fact chain produced by the Evaluator, the system supports metrics for agents in each lane:
# Evaluate a single agent
./scripts/agent-eval.sh run planner
# View evaluation report
./scripts/agent-eval.sh report planner
When a pattern successfully recurs multiple times within the same lane, the Memory Curator records it as a Candidate Expert. After review, it is automatically promoted to a formal Expert Agent.
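The candidate-expert rule can be sketched as counting successful recurrences of a pattern per lane. The `RunRecord` shape, the `patternKey` field, and the numeric threshold are hypothetical; the README states only that a pattern must recur "multiple times" and that promotion follows review, so review is left outside this function.

```typescript
// Hypothetical sketch: how a "pattern" is identified and the threshold are assumptions.
interface RunRecord {
  lane: string;
  patternKey: string;
  succeeded: boolean;
}

interface CandidateExpert {
  lane: string;
  patternKey: string;
  successes: number;
}

function findCandidateExperts(runs: RunRecord[], minSuccesses: number): CandidateExpert[] {
  const tally: Record<string, CandidateExpert> = {};
  for (const run of runs) {
    if (!run.succeeded) continue; // only successful recurrences count
    const key = `${run.lane}::${run.patternKey}`; // recurrence is counted within one lane
    if (!tally[key]) tally[key] = { lane: run.lane, patternKey: run.patternKey, successes: 0 };
    tally[key].successes += 1;
  }
  return Object.keys(tally)
    .map((key) => tally[key])
    .filter((c) => c.successes >= minSuccesses);
}

const runs: RunRecord[] = [
  { lane: "planner", patternKey: "schema-first", succeeded: true },
  { lane: "planner", patternKey: "schema-first", succeeded: true },
  { lane: "planner", patternKey: "schema-first", succeeded: true },
  { lane: "executor", patternKey: "schema-first", succeeded: true },
  { lane: "planner", patternKey: "big-bang-rewrite", succeeded: false },
];
// Candidates still need review before promotion to a formal Expert Agent.
console.log(findCandidateExperts(runs, 3));
```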
# Analyze agent performance and curated patterns
./scripts/agent-evolve.sh analyze planner
A cybernetics-based feedback loop directly intervenes in Workflow Orchestrator routing:
- Task Governance: Based on risk feedback from the Reviewer/Evaluator, dynamically decides whether to "proceed", "rework", or "circuit break".
- Learning Progression: Automatically breaks down collection, extraction, synthesis, and validation subtasks based on knowledge gaps.
- Coarse Intent Classification: `intent-classifier` decides which Policy family to use (Governance vs. Learning).
- Precise Policy Evaluation: `task-governance` or `learning-progression` calculates the exact next stage (e.g., `NEEDS_REVISION` → route back to Planner).
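The two routing stages can be sketched as a classifier that picks a policy family, followed by the selected policy computing the next step. The policy names come from the README; the `RoutingSignal` shape, the string results, and the rework budget that triggers a circuit break are hypothetical.

```typescript
// Hypothetical sketch of two-stage routing; names beyond those in the README are assumptions.
type PolicyFamily = "task-governance" | "learning-progression";

interface RoutingSignal {
  kind: "task" | "knowledge-gap";
  verdict?: "APPROVED" | "NEEDS_REVISION" | "PASS" | "NEEDS_IMPROVEMENT";
  reworkCount?: number;
}

const MAX_REWORK = 3; // assumed rework budget before the circuit breaks

// Stage 1 (coarse): pick the policy family.
function intentClassifier(signal: RoutingSignal): PolicyFamily {
  return signal.kind === "knowledge-gap" ? "learning-progression" : "task-governance";
}

// Stage 2 (precise): task governance decides proceed / rework / circuit break.
function taskGovernance(signal: RoutingSignal): string {
  if ((signal.reworkCount ?? 0) >= MAX_REWORK) return "circuit-break";
  if (signal.verdict === "NEEDS_REVISION") return "rework:Planner";
  if (signal.verdict === "NEEDS_IMPROVEMENT") return "rework:Executor";
  return "proceed";
}

// Stage 2 (precise): learning progression breaks a knowledge gap into subtasks.
function learningProgression(): string {
  return "subtasks:collect,extract,synthesize,validate";
}

function route(signal: RoutingSignal): string {
  return intentClassifier(signal) === "task-governance" ? taskGovernance(signal) : learningProgression();
}

console.log(route({ kind: "task", verdict: "NEEDS_REVISION", reworkCount: 1 })); // rework:Planner
console.log(route({ kind: "task", verdict: "NEEDS_REVISION", reworkCount: 3 })); // circuit-break
console.log(route({ kind: "task", verdict: "APPROVED" }));                        // proceed
console.log(route({ kind: "knowledge-gap" })); // subtasks:collect,extract,synthesize,validate
```

Keeping the coarse classifier separate means a new policy family can be added without touching the existing policies.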
export MOSS_AGENT=planner # Activate a specific role agent
export MOSS_PERMISSION_LEVEL=strict # Constraint level
export MOSS_TELEMETRY_ENABLED=true # Enable telemetry and observability
Model configurations are located in configs/agents/ or config/models.yaml:
model:
  provider: anthropic
  model: claude-3-5-sonnet
  temperature: 0.2
  max_tokens: 4096
You can easily route tasks to local, privacy-preserving models using the built-in local profile (which uses the OpenAI-compatible API format):
# In configs/agents/models.yaml
agent_models:
  executor:
    profile: local
profiles:
  local:
    provider: openai-compatible
    base_url: http://localhost:11434/v1 # Example for Ollama
    model: qwen2.5-coder:7b
System rules are not just prompts; they are hard-coded governance strategies:
- Level 4 (Hard): Sandbox isolation that prevents unauthorized execution.
- Level 3 (Soft): Fact-chain protocols whose records must be persisted in a structured form.
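Leveled enforcement can be sketched as rules tagged with a level, where only a Hard violation blocks the action. The rule names, the `ProposedAction` shape, the numbering of Guidelines (2) and Preferences (1), and the choice to report rather than block Soft violations are all assumptions for illustration.

```typescript
// Hypothetical sketch of leveled constraint checks; rule names and checks are assumptions.
type ConstraintLevel = 4 | 3 | 2 | 1; // 4 Hard, 3 Soft, 2 Guidelines, 1 Preferences

interface ProposedAction {
  command: string;
  sandboxed: boolean;
  factPersisted: boolean;
}

interface ConstraintRule {
  name: string;
  level: ConstraintLevel;
  complies: (action: ProposedAction) => boolean;
}

interface ConstraintResult {
  allowed: boolean;
  violations: string[];
}

const RULES: ConstraintRule[] = [
  { name: "sandbox-isolation", level: 4, complies: (a) => a.sandboxed },
  { name: "fact-chain-persisted", level: 3, complies: (a) => a.factPersisted },
];

function enforce(action: ProposedAction, rules: ConstraintRule[] = RULES): ConstraintResult {
  const broken = rules.filter((r) => !r.complies(action));
  return {
    // Assumption: only a Hard (level 4) violation blocks; lower levels are reported as feedback.
    allowed: broken.every((r) => r.level < 4),
    violations: broken.map((r) => `L${r.level}:${r.name}`),
  };
}

const blocked = enforce({ command: "rm -rf build", sandboxed: false, factPersisted: true });
const flagged = enforce({ command: "npm test", sandboxed: true, factPersisted: false });
console.log(blocked); // { allowed: false, violations: [ 'L4:sandbox-isolation' ] }
console.log(flagged); // { allowed: true, violations: [ 'L3:fact-chain-persisted' ] }
```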
- ARCHITECTURE.md - Deeper system shape, runtime logic, and four-layer boundaries.
- docs/design-philosophy.md - Why SCI theory matters to the architecture.
- docs/agent-collaboration.md - Agent collaboration workflow instructions.
- apps/mosscli/README.md - `mosscli` as an app-layer case application.
- CONTRIBUTING.md - Engineering conventions for contributing to the substrate.
We welcome all forms of contribution! Please follow the Architecture-First principle and check CONTRIBUTING.md to learn how to participate in building the substrate.
This project is licensed under the MIT License.
- Design philosophy inspired by Systematics, Cybernetics, and Informatics (SCI theory).
- Architecture design inspired by executive control and learning memory mechanisms in neuroscience.