cybernetix-lab/moss-harness


🚀 Moss Harness

License: MIT

A self-evolving superintelligence harness substrate, providing a reliable, observable, and recoverable execution environment for agent systems.

A next-generation agent collaboration framework designed on the principles of System, Control, and Information (SCI) theory.

English | 中文


✨ Core Features

Scientific Design Philosophy

Rather than relying on prompt stacking, this project grounds its underlying system order in SCI theory (Systematics, Cybernetics, and Informatics):

  • Systematics: seeing the whole. A six-role lane separation architecture that enables higher-order coordination and emergence (1+1>2).
  • Informatics: understanding communication. Transactional fact chains, structured protocol communication, and memory curation.
  • Cybernetics: achieving goals. A Workflow Orchestrator driving two-stage routing and cybernetic feedback loops.
┌───────────────────────────────────────────────────────────────────────────┐
│                         Moss Harness Architecture                         │
├───────────────┬───────────────────────────────────────────────────────────┤
│ Systematics   │ Coordinator → Planner → Reviewer → Executor → Evaluator   │
│               │ (with Memory Curator for cross-run curation)              │
├───────────────┼───────────────────────────────────────────────────────────┤
│ Informatics   │ Structured Protocols │ Transactional Fact Chain │ Memory  │
├───────────────┼───────────────────────────────────────────────────────────┤
│ Cybernetics   │ Workflow Orchestrator │ Two-Stage Routing │ Feedback Loop │
├───────────────┼───────────────────────────────────────────────────────────┤
│ Observability │ Execution Tracing │ Evaluation │ Analytics                │
└───────────────┴───────────────────────────────────────────────────────────┘

Key Capabilities

  • 🏛️ Four-Layer Architecture - Strict separation of Strategy (Governance), Harness (Orchestration/Substrate), App (Case Studies), and Observability.
  • 🤖 Multi-Agent Lanes - 6 dedicated role lanes for specialized collaboration, so that no single agent's hallucinations go unchecked.
  • 🔀 Two-Stage Routing - Coarse intent classification followed by precise policy evaluation.
  • 🔄 Feedback-Driven Loop - Review and evaluation results act as control signals that dynamically alter execution paths.
  • 📋 Transactional Claiming - "Facts before broadcast." Agents autonomously scan the Task Board and claim work atomically.
  • 🧠 Memory & Emergence System - Cross-session learning that automatically extracts reusable patterns and candidate expert agents.
  • 📊 Read-Only Observability - State changes are persisted as a fact chain, which provides timeline-based Execution Tracing, Quality Evaluation, and Analytics.
  • 🔌 MCP Integration - Standardized interfaces for external tools and systems.
  • 🔒 Constraint Guardrails - 4-level constraint system (Hard/Soft/Guidelines/Preferences).
  • 🛠️ Skill System - Reusable agent capability modules and code rule checks.
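Transactional claiming can be pictured as a check-and-set on a task's status: a claim succeeds only while the task is still open, and the claim is recorded as a fact before any other agent is told about it. The TypeScript below is a minimal in-memory sketch; `TaskBoard`, `ClaimFact`, `claim`, and `onClaim` are hypothetical names, not the moss-harness runtime API, and an implementation backed by the storage layer would need a database transaction or conditional update to get the same atomicity.

```typescript
// Minimal in-memory sketch of transactional claiming.
// TaskBoard, ClaimFact, claim() and onClaim() are hypothetical, not the moss-harness API.
type TaskStatus = "open" | "claimed" | "done";

interface Task {
  id: string;
  lane: string; // role lane allowed to claim the task, e.g. "executor"
  status: TaskStatus;
  claimedBy?: string;
}

interface ClaimFact {
  seq: number;
  taskId: string;
  agentId: string;
}

class TaskBoard {
  private tasks = new Map<string, Task>();
  private facts: ClaimFact[] = [];
  private listeners: Array<(fact: ClaimFact) => void> = [];

  add(task: Task): void {
    this.tasks.set(task.id, { ...task });
  }

  onClaim(listener: (fact: ClaimFact) => void): void {
    this.listeners.push(listener);
  }

  // Check-and-set: succeeds only if the task is still open and in the caller's lane.
  // JavaScript runs this method to completion without interleaving; a shared
  // database would need a transaction or a conditional UPDATE for the same guarantee.
  claim(taskId: string, agentId: string, lane: string): boolean {
    const task = this.tasks.get(taskId);
    if (task === undefined || task.status !== "open" || task.lane !== lane) {
      return false;
    }
    task.status = "claimed";
    task.claimedBy = agentId;
    // Facts before broadcast: record the claim first, then notify listeners.
    const fact: ClaimFact = { seq: this.facts.length + 1, taskId, agentId };
    this.facts.push(fact);
    for (const listener of this.listeners) listener(fact);
    return true;
  }

  factCount(): number {
    return this.facts.length;
  }
}

// Usage: two executors race for the same task; only the first claim succeeds.
const board = new TaskBoard();
const announced: string[] = [];
board.onClaim((fact) => announced.push(`${fact.agentId} claimed ${fact.taskId}`));
board.add({ id: "t1", lane: "executor", status: "open" });
const firstClaim = board.claim("t1", "executor-a", "executor");
const secondClaim = board.claim("t1", "executor-b", "executor");
console.log(firstClaim, secondClaim); // prints: true false
```

Keeping the fact append and the broadcast inside the same method is what enforces "facts before broadcast": listeners can never observe a claim that has not already been recorded.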

🎯 Why Choose Moss Harness?

1. Scientific Foundation

Unlike traditional agent frameworks that rely on "experience-driven" design or flat capability lists, Moss Harness builds underlying order based on mature system theories:

| Dimension | Traditional Frameworks | Moss Harness |
|---|---|---|
| Architecture | Experience-driven, API wrappers | Architecture-first, driven by SCI theory |
| Routing | Static single path or free-roaming LLM | Two-Stage Routing (Intent + Policy) |
| Quality Control | Simple pass/fail or after-the-fact logs | Cybernetic feedback that directly alters the workflow |
| Task Claiming | Centralized dispatch | Transactional autonomous claiming |

2. Preventing Optimism Bias

The six-role separation architecture protects execution quality by strictly isolating each role's permissions and objectives:

Coordinator → Planner → Reviewer → Executor → Evaluator
                 ↑                                │
                 └────────── Feedback ────────────┘
      Memory Curator operates across runs to curate knowledge
  • Planner/Reviewer/Evaluator: read-only access to the codebase (the Evaluator may additionally run tests), focused on requirements analysis, plan review, and objective assessment respectively.
  • Executor: read/write and execution permissions, focused on code implementation and testing.
  • A Planner never evaluates its own plan; an Executor never reviews its own code.
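The permission split and the no-self-review rule reduce to two small checks. This is a sketch only: the access sets mirror the Permissions column of the Agent Role Model table in this README, while the `Role`/`Access` names, the `memory_curator` spelling, and the assumption that plans are assessed by the Reviewer and code by the Reviewer or Evaluator are illustrative choices.

```typescript
// Hypothetical encoding of the role permissions from the Agent Role Model table.
type Role = "coordinator" | "planner" | "reviewer" | "executor" | "evaluator" | "memory_curator";
type Access = "read" | "write" | "exec" | "test";

const ROLE_ACCESS: Record<Role, Access[]> = {
  coordinator: ["read"],
  planner: ["read"],
  reviewer: ["read"],
  executor: ["read", "write", "exec"],
  evaluator: ["read", "test"],
  memory_curator: ["read", "exec"],
};

function can(role: Role, access: Access): boolean {
  return ROLE_ACCESS[role].indexOf(access) !== -1;
}

interface Artifact {
  kind: "plan" | "code";
  authorId: string;
}

// Separation of duties: nobody assesses an artifact they produced.
// Which roles may assess which artifact kind is an assumption for this sketch.
function mayAssess(assessorId: string, assessorRole: Role, artifact: Artifact): boolean {
  if (assessorId === artifact.authorId) return false;
  if (artifact.kind === "plan") return assessorRole === "reviewer";
  return assessorRole === "reviewer" || assessorRole === "evaluator";
}

const plan: Artifact = { kind: "plan", authorId: "planner-1" };
console.log(can("planner", "write"));                   // prints: false
console.log(mayAssess("planner-1", "planner", plan));   // prints: false
console.log(mayAssess("reviewer-1", "reviewer", plan)); // prints: true
```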

3. A Truly Closed-Loop Orchestrator

Moss Harness is not a linear workflow; it is a cybernetic closed-loop system powered by the Workflow Orchestrator:

Role Lane → Orchestrator → Fact (persist) → Audit → Broadcast
            ↓ execution   ↓ Execution Tracing / Evaluation / Analytics
            Feedback ← quality / confidence ← Reviewer / Evaluator
            Memory & Governance → inform the next run

Correction must change the path, not just add a log entry. This is the core purpose of the Orchestrator residing in the Harness layer.
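The persist, audit, broadcast ordering in the loop above can be sketched as an append-only log whose subscribers (tracing, evaluation, analytics) only ever receive facts that were already stored and accepted. `FactChain`, the fact shape, and the example audit rule are assumptions for illustration; the README does not specify what the Audit step checks.

```typescript
// Minimal sketch of "persist, audit, broadcast"; FactChain and the audit rule are hypothetical.
interface Fact {
  seq: number;
  role: string;
  type: string; // e.g. "plan_submitted", "code_changed"
  payload: Record<string, unknown>;
}

type Auditor = (fact: Fact) => string | null; // rejection reason, or null if acceptable
type Subscriber = (fact: Fact) => void;

class FactChain {
  private readonly log: Fact[] = [];
  private readonly subscribers: Subscriber[] = [];
  private readonly audit: Auditor;

  constructor(audit: Auditor) {
    this.audit = audit;
  }

  subscribe(subscriber: Subscriber): void {
    this.subscribers.push(subscriber);
  }

  // 1) persist, 2) audit, 3) broadcast. A rejected fact stays in the log so the
  // timeline remains complete, but it is never broadcast to other lanes.
  record(role: string, type: string, payload: Record<string, unknown>): boolean {
    const fact: Fact = { seq: this.log.length + 1, role, type, payload };
    this.log.push(fact);
    if (this.audit(fact) !== null) return false;
    for (const subscriber of this.subscribers) subscriber(fact);
    return true;
  }

  timeline(): ReadonlyArray<Fact> {
    return this.log;
  }
}

// Example audit rule (assumed): only the Executor may record code changes.
const chain = new FactChain((fact) =>
  fact.type === "code_changed" && fact.role !== "executor" ? "role may not change code" : null,
);
const delivered: string[] = [];
chain.subscribe((fact) => delivered.push(fact.type));
chain.record("planner", "plan_submitted", { steps: 3 });
chain.record("reviewer", "code_changed", { file: "auth.ts" });
console.log(chain.timeline().length, delivered.join(",")); // prints: 2 plan_submitted
```

Because observers only subscribe to the chain and never write to it, this also matches the read-only observability described under Key Capabilities.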


🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/cybernetix-lab/moss-harness.git
cd moss-harness

# Install dependencies and build
npm install
npm --prefix apps/mosscli run build

Basic Usage (App Layer: mosscli)

mosscli is the first terminal application built on top of moss-harness, serving as a case study for the App layer:

# View help
node apps/mosscli/dist/cli/index.js --help

# Run the harness against a goal
node apps/mosscli/dist/cli/index.js run --goal "Implement user authentication"

Observability Commands

# Check task status
node apps/mosscli/dist/cli/index.js status

# Run execution tracing and evaluation analytics
node apps/mosscli/dist/cli/index.js trace
node apps/mosscli/dist/cli/index.js evaluate

📁 Project Structure

The project is strictly organized following the four-layer architecture (Strategy -> Harness -> App -> Observability):

moss-harness/
├── apps/                      # [App Layer] Case applications
│   ├── agent-cli/             # Legacy bash CLI application
│   └── mosscli/               # TypeScript CLI-first validation app
├── configs/                   # [Strategy Layer] Configuration center
│   ├── agents/                # Agent template configurations
│   ├── constraints/           # System constraints and tool policies
│   ├── orchestration/         # Orchestration and lane configs
│   ├── protocols/             # Structured communication protocols
│   ├── rules/                 # Rules and tool constraints
│   ├── skills/                # Skill registry
│   └── telemetry/             # Telemetry configurations
├── deployments/               # Infrastructure orchestration
│   ├── docker/                # Docker Compose configurations
│   ├── helm/                  # Helm Charts configurations
│   └── k8s-operator/          # Kubernetes Operator and CRDs
├── docs/                      # Documentation and architecture specs
├── evals/                     # [Observability Layer] Evaluation framework and metrics
├── integrations/              # [Harness Layer] MCP and external Skill extensions
│   ├── extensions/            # Core extensions (e.g., Mailbox system)
│   ├── mcp/                   # MCP server definitions
│   └── skills/                # Skill definitions (React, Security, etc.)
├── observability/             # [Observability Layer] Prometheus and Grafana configs
├── runtime/                   # [Harness Layer] TypeScript core implementation
│   ├── agents/                # Role Agent implementations
│   ├── context/               # Policies and context compaction
│   ├── memory/                # Memory and curation system
│   ├── orchestration/         # Workflow Orchestrator and routing
│   ├── sandbox/               # Execution sandbox management
│   ├── storage/               # Storage implementations (SQLite, Base)
│   ├── subagent/              # Sub-agent registry and scheduler
│   └── telemetry/             # Telemetry collectors and metrics
├── scripts/                   # [Harness Layer] DevOps and management scripts
├── src/                       # [Harness Layer] Shell core source code (Legacy)
│   ├── core/                  # Legacy bash Workflow Orchestrator
│   ├── agents/                # Legacy bash Role Agent implementations
│   └── memory/                # Legacy bash Memory system
└── tests/                     # Test cases (Bats/Jest)

🛠️ Skill System

Skills are reusable capability modules provided at the App and Harness layers, either directly mounted or via MCP:

# Activate a skill (Example)
./scripts/skill-activate.sh code-review

Built-in Skill Categories

| Skill | Associated Role | Description |
|---|---|---|
| architecture-design | Planner | Architecture patterns and technical solution output |
| security-review | Reviewer | Security checks for code and plans |
| test-driven-dev | Executor | TDD red-green cycle implementation |
| knowledge-extraction | Memory Curator | Extracting candidate expert traits from context |
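A registry binding each skill to its associated role might look like this. The entries come from the table above; the registry shape, `activateSkill`, and the rule that only the associated role may activate a skill are assumptions (the README only says "associated"), and the actual registry format under configs/skills/ is not shown in this README.

```typescript
// Hypothetical skill registry built from the table above; not the configs/skills/ format.
type Role = "coordinator" | "planner" | "reviewer" | "executor" | "evaluator" | "memory_curator";

interface SkillEntry {
  name: string;
  role: Role;
  description: string;
}

const SKILLS: SkillEntry[] = [
  { name: "architecture-design", role: "planner", description: "Architecture patterns and technical solution output" },
  { name: "security-review", role: "reviewer", description: "Security checks for code and plans" },
  { name: "test-driven-dev", role: "executor", description: "TDD red-green cycle implementation" },
  { name: "knowledge-extraction", role: "memory_curator", description: "Extracting candidate expert traits from context" },
];

// Assumption: a skill may only be activated by its associated role.
function activateSkill(skillName: string, role: Role): SkillEntry {
  const skill = SKILLS.find((s) => s.name === skillName);
  if (skill === undefined) throw new Error(`unknown skill: ${skillName}`);
  if (skill.role !== role) throw new Error(`${skillName} belongs to the ${skill.role} lane, not ${role}`);
  return skill;
}

console.log(activateSkill("security-review", "reviewer").description); // prints: Security checks for code and plans
```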

🤖 Agent Role Model

This project adopts a six-role multi-agent architecture and curates Expert agents within each lane.

| Role | Responsibility | Permissions | Expert Agent Example |
|---|---|---|---|
| Coordinator | Intent recognition, requirement clarification | Read-only | api_coordinator |
| Planner | Requirement analysis, task breakdown, design | Read-only | db_planner |
| Reviewer | Risk identification, plan evaluation, suggestions | Read-only | sec_reviewer |
| Executor | Code implementation, test writing, self-testing | Read/Write/Exec | frontend_executor |
| Evaluator | Quality assessment, requirement validation | Read/Test | perf_evaluator |
| Memory Curator | Context compression, knowledge archiving | Read/Exec | doc_curator |

Workflow

User submits intent
    ↓
Coordinator clarifies requirements and outputs Requirement Task
    ↓
Planner claims requirement, outputs Execution Plan
    ↓
Reviewer performs structured review (APPROVED / NEEDS_REVISION)
    ↓
Executor claims and executes code implementation
    ↓
Evaluator assesses implementation quality (PASS / NEEDS_IMPROVEMENT)
    ↓
┌───┴────────────────┐
│                    │
PASS/EXCELLENT       NEEDS_IMPROVEMENT
│                    │
Memory Curator       (Orchestrator Routing)
curates knowledge    Returns to Executor/Planner to fix
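The workflow is essentially a state machine in which review and evaluation verdicts select the next stage. In this sketch the verdict strings come from the diagram, while the stage identifiers and the choice to send NEEDS_IMPROVEMENT back to the Executor (the diagram also allows the Planner) are assumptions.

```typescript
// Sketch of the workflow as a state machine; stage names and routing choices are assumptions.
type Stage = "coordinator" | "planner" | "reviewer" | "executor" | "evaluator" | "memory_curator" | "done";
type Verdict = "APPROVED" | "NEEDS_REVISION" | "PASS" | "EXCELLENT" | "NEEDS_IMPROVEMENT";

function nextStage(stage: Stage, verdict?: Verdict): Stage {
  if (stage === "coordinator") return "planner";
  if (stage === "planner") return "reviewer";
  if (stage === "reviewer") return verdict === "APPROVED" ? "executor" : "planner";
  if (stage === "executor") return "evaluator";
  if (stage === "evaluator") {
    // NEEDS_IMPROVEMENT goes back to the Executor here; the diagram also allows the Planner.
    return verdict === "PASS" || verdict === "EXCELLENT" ? "memory_curator" : "executor";
  }
  return "done"; // memory_curator (and done) end the run
}

// Walks the workflow, consuming one scripted verdict at each review or evaluation step.
function runWorkflow(verdicts: Verdict[]): Stage[] {
  const visited: Stage[] = [];
  let stage: Stage = "coordinator";
  let i = 0;
  while (stage !== "done") {
    visited.push(stage);
    const needsVerdict = stage === "reviewer" || stage === "evaluator";
    if (needsVerdict && i >= verdicts.length) throw new Error(`no verdict left for ${stage}`);
    stage = nextStage(stage, needsVerdict ? verdicts[i++] : undefined);
  }
  return visited;
}

console.log(runWorkflow(["NEEDS_REVISION", "APPROVED", "PASS"]).join(" → "));
// prints: coordinator → planner → reviewer → planner → reviewer → executor → evaluator → memory_curator
```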

📊 Evaluation & Emergence

Agent Evaluation

The system provides per-lane agent metrics derived from the objective fact chain produced by the Evaluator:

# Evaluate a single agent
./scripts/agent-eval.sh run planner

# View evaluation report
./scripts/agent-eval.sh report planner
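Per-lane metrics of this kind can be derived by aggregating verdict facts. The sketch assumes each fact records the agent, its lane, and a verdict, and computes a simple pass rate; the field names and the metric itself are illustrative and not necessarily what agent-eval.sh reports.

```typescript
// Sketch: per-lane pass rates from verdict facts. Field and metric names are assumptions.
type Verdict = "APPROVED" | "NEEDS_REVISION" | "PASS" | "EXCELLENT" | "NEEDS_IMPROVEMENT";

interface VerdictFact {
  agent: string; // e.g. "db_planner"
  lane: string;  // e.g. "planner"
  verdict: Verdict;
}

interface LaneMetrics {
  total: number;
  passed: number;
  passRate: number; // passed / total, between 0 and 1
}

function laneMetrics(facts: VerdictFact[]): Map<string, LaneMetrics> {
  const counts = new Map<string, { total: number; passed: number }>();
  for (const f of facts) {
    const c = counts.get(f.lane) ?? { total: 0, passed: 0 };
    c.total += 1;
    if (f.verdict !== "NEEDS_REVISION" && f.verdict !== "NEEDS_IMPROVEMENT") c.passed += 1;
    counts.set(f.lane, c);
  }
  const result = new Map<string, LaneMetrics>();
  counts.forEach((c, lane) => result.set(lane, { total: c.total, passed: c.passed, passRate: c.passed / c.total }));
  return result;
}

const metrics = laneMetrics([
  { agent: "db_planner", lane: "planner", verdict: "NEEDS_REVISION" },
  { agent: "db_planner", lane: "planner", verdict: "APPROVED" },
  { agent: "frontend_executor", lane: "executor", verdict: "PASS" },
]);
console.log(metrics.get("planner")); // prints: { total: 2, passed: 1, passRate: 0.5 }
```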

Pattern Extraction & Evolution

When a pattern recurs successfully multiple times within the same lane, the Memory Curator records it as a Candidate Expert. Once the candidate passes review, it is automatically promoted to a formal Expert Agent.

# Analyze agent performance and curated patterns
./scripts/agent-evolve.sh analyze planner
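The recurrence rule can be sketched as a success counter keyed by lane and pattern, with promotion gated on an explicit review decision. The threshold of 3, the `lane/pattern` key format, and the `ExpertCurator` class are assumptions; the README does not say how many recurrences are required or how the review is carried out.

```typescript
// Sketch of candidate-expert detection; threshold, key format and class name are assumptions.
interface PatternObservation {
  lane: string;       // e.g. "planner"
  pattern: string;    // hypothetical pattern id, e.g. "schema-first-migration"
  succeeded: boolean;
}

type ExpertStatus = "candidate" | "expert";

class ExpertCurator {
  private readonly successes = new Map<string, number>();
  private readonly status = new Map<string, ExpertStatus>();
  private readonly threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  // Count successful recurrences per lane and pattern; flag a candidate at the threshold.
  observe(obs: PatternObservation): void {
    if (!obs.succeeded) return;
    const key = `${obs.lane}/${obs.pattern}`;
    const count = (this.successes.get(key) ?? 0) + 1;
    this.successes.set(key, count);
    if (count >= this.threshold && !this.status.has(key)) {
      this.status.set(key, "candidate");
    }
  }

  // Promotion requires an approving review of an existing candidate.
  review(key: string, approved: boolean): void {
    if (approved && this.status.get(key) === "candidate") {
      this.status.set(key, "expert");
    }
  }

  statusOf(key: string): ExpertStatus | undefined {
    return this.status.get(key);
  }
}

const curator = new ExpertCurator();
const key = "planner/schema-first-migration";
curator.observe({ lane: "planner", pattern: "schema-first-migration", succeeded: true });
curator.observe({ lane: "planner", pattern: "schema-first-migration", succeeded: true });
console.log(curator.statusOf(key)); // prints: undefined
curator.observe({ lane: "planner", pattern: "schema-first-migration", succeeded: true });
console.log(curator.statusOf(key)); // prints: candidate
curator.review(key, true);
console.log(curator.statusOf(key)); // prints: expert
```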

🔬 Feedback-Driven Loop

A cybernetics-based feedback loop that directly intervenes in the Workflow Orchestrator routing:

Core Mechanisms

  • Task Governance: Uses risk feedback from the Reviewer/Evaluator to dynamically decide whether to "proceed", "rework", or "circuit break".
  • Learning Progression: Automatically breaks work down into collection, extraction, synthesis, and validation subtasks based on identified knowledge gaps.

Two-Stage Routing

  1. Coarse Intent Classification: intent-classifier decides which Policy family to use (Governance vs Learning).
  2. Precise Policy Evaluation: task-governance or learning-progression calculates the exact next stage (e.g., NEEDS_REVISION β†’ route back to Planner).
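The two stages can be sketched as a classifier that picks a policy family, followed by a policy function that returns the next step. The README names intent-classifier, task-governance, and learning-progression but not their interfaces, so everything below is hypothetical: a keyword heuristic stands in for the classifier, and a rework budget of 3 is assumed as the circuit-breaking condition.

```typescript
// Two-stage routing sketch. The policy names come from the README; these functions,
// fields and thresholds are hypothetical.
type PolicyFamily = "governance" | "learning";
type Verdict = "APPROVED" | "NEEDS_REVISION" | "PASS" | "NEEDS_IMPROVEMENT";

type Decision =
  | { action: "proceed"; next: string }
  | { action: "rework"; next: string }
  | { action: "circuit_break"; reason: string }
  | { action: "spawn_subtasks"; subtasks: string[] };

interface RoutingInput {
  text: string;            // task description or feedback being routed
  verdict?: Verdict;       // latest Reviewer/Evaluator verdict, if any
  reworkCount: number;     // how often this task has already looped back
  knowledgeGaps: string[]; // topics the learning policy should cover
}

const MAX_REWORK = 3; // assumed rework budget before the circuit breaks

// Stage 1: coarse intent classification (a keyword heuristic stands in for a real classifier).
function classifyIntent(input: RoutingInput): PolicyFamily {
  return /(learn|research|study)/i.test(input.text) ? "learning" : "governance";
}

// Stage 2a: task governance turns verdicts into proceed / rework / circuit break.
function taskGovernance(input: RoutingInput): Decision {
  if (input.verdict === undefined) return { action: "proceed", next: "planner" };
  if (input.verdict === "APPROVED") return { action: "proceed", next: "executor" };
  if (input.verdict === "PASS") return { action: "proceed", next: "memory_curator" };
  if (input.reworkCount >= MAX_REWORK) {
    return { action: "circuit_break", reason: `rework budget of ${MAX_REWORK} exhausted` };
  }
  return { action: "rework", next: input.verdict === "NEEDS_REVISION" ? "planner" : "executor" };
}

// Stage 2b: learning progression expands each knowledge gap into four subtasks.
function learningProgression(input: RoutingInput): Decision {
  const phases = ["collect", "extract", "synthesize", "validate"];
  const subtasks: string[] = [];
  for (const gap of input.knowledgeGaps) {
    for (const phase of phases) subtasks.push(`${phase}:${gap}`);
  }
  return { action: "spawn_subtasks", subtasks };
}

function route(input: RoutingInput): Decision {
  return classifyIntent(input) === "learning" ? learningProgression(input) : taskGovernance(input);
}

console.log(route({ text: "Fix login bug", verdict: "NEEDS_REVISION", reworkCount: 1, knowledgeGaps: [] }));
// prints: { action: 'rework', next: 'planner' }
console.log(route({ text: "Research OAuth flows", reworkCount: 0, knowledgeGaps: ["oauth"] }).action);
// prints: spawn_subtasks
```

Splitting classification from policy evaluation keeps the cheap, coarse decision separate from the detailed one, so each policy can evolve independently.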

⚙️ Configuration

Environment Variables

export MOSS_AGENT=planner            # Activate a specific role agent
export MOSS_PERMISSION_LEVEL=strict  # Constraint level
export MOSS_TELEMETRY_ENABLED=true   # Enable telemetry and observability
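Code that consumes these variables might parse them into a typed settings object. In this sketch the default permission level (`strict`), the accepted role identifiers, and the treatment of `MOSS_TELEMETRY_ENABLED` as a case-insensitive `true` flag are assumptions; the environment is passed in as a plain object so the same function works with `process.env` or a test fixture.

```typescript
// Sketch of parsing the MOSS_* variables into typed settings; defaults are assumptions.
type Role = "coordinator" | "planner" | "reviewer" | "executor" | "evaluator" | "memory_curator";

interface MossSettings {
  agent: Role | null;        // MOSS_AGENT (null when unset)
  permissionLevel: string;   // MOSS_PERMISSION_LEVEL
  telemetryEnabled: boolean; // MOSS_TELEMETRY_ENABLED
}

const ROLES: Role[] = ["coordinator", "planner", "reviewer", "executor", "evaluator", "memory_curator"];

function isRole(value: string): value is Role {
  return (ROLES as string[]).indexOf(value) !== -1;
}

function loadSettings(env: Record<string, string | undefined>): MossSettings {
  const agent = env["MOSS_AGENT"];
  if (agent !== undefined && !isRole(agent)) {
    throw new Error(`MOSS_AGENT must be one of ${ROLES.join(", ")}, got "${agent}"`);
  }
  return {
    agent: agent === undefined ? null : (agent as Role),
    permissionLevel: env["MOSS_PERMISSION_LEVEL"] ?? "strict", // assumed default
    telemetryEnabled: (env["MOSS_TELEMETRY_ENABLED"] ?? "false").toLowerCase() === "true",
  };
}

// In Node this could be loadSettings(process.env); a literal fixture keeps the example self-contained.
const settings = loadSettings({ MOSS_AGENT: "planner", MOSS_TELEMETRY_ENABLED: "true" });
console.log(settings.agent, settings.permissionLevel, settings.telemetryEnabled); // prints: planner strict true
```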

Model Configuration

Model configurations are located in configs/agents/ or config/models.yaml:

model:
  provider: anthropic
  model: claude-3-5-sonnet
  temperature: 0.2
  max_tokens: 4096

Local Models (Ollama, vLLM, LM Studio)

You can route tasks to local, privacy-preserving models with the built-in local profile, which uses the OpenAI-compatible API format:

# In configs/agents/models.yaml
agent_models:
  executor:
    profile: local

profiles:
  local:
    provider: openai-compatible
    base_url: http://localhost:11434/v1  # Example for Ollama
    model: qwen2.5-coder:7b
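Assuming the default `model` block and the `agent_models`/`profiles` blocks live in one configuration, an agent's model could be resolved as below: an agent bound to a profile uses that profile, and every other agent falls back to the default. The TypeScript shapes mirror the YAML keys shown above, already parsed; `resolveModel` and the fallback rule are hypothetical, not the project's config loader.

```typescript
interface ModelSpec {
  provider: string;
  model: string;
  base_url?: string;
  temperature?: number;
  max_tokens?: number;
}

interface ModelsConfig {
  model: ModelSpec;                                   // default model block
  agent_models?: Record<string, { profile: string }>; // per-agent profile bindings
  profiles?: Record<string, ModelSpec>;               // named profiles such as "local"
}

// Hypothetical resolution rule: a bound profile wins, otherwise use the default block.
function resolveModel(config: ModelsConfig, agent: string): ModelSpec {
  const binding = config.agent_models?.[agent];
  if (binding === undefined) return config.model;
  const profile = config.profiles?.[binding.profile];
  if (profile === undefined) {
    throw new Error(`agent "${agent}" references unknown profile "${binding.profile}"`);
  }
  return profile;
}

// Parsed equivalent of the two YAML snippets above.
const exampleConfig: ModelsConfig = {
  model: { provider: "anthropic", model: "claude-3-5-sonnet", temperature: 0.2, max_tokens: 4096 },
  agent_models: { executor: { profile: "local" } },
  profiles: {
    local: { provider: "openai-compatible", base_url: "http://localhost:11434/v1", model: "qwen2.5-coder:7b" },
  },
};

console.log(resolveModel(exampleConfig, "executor").model); // prints: qwen2.5-coder:7b
console.log(resolveModel(exampleConfig, "planner").model);  // prints: claude-3-5-sonnet
```

This sketch returns the profile as-is; a real loader might instead merge profile fields over the defaults (for example, to inherit `temperature`). For LM Studio or vLLM, `base_url` would typically point at their default OpenAI-compatible endpoints, http://localhost:1234/v1 and http://localhost:8000/v1 respectively.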

Governance Constraints

System rules are not just prompts; they are hard-coded governance strategies:

  • Level 4 (Hard): Unauthorized execution is isolated in a sandbox.
  • Level 3 (Soft): Facts must be persisted to the fact chain following the structured protocols.
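Only Levels 4 and 3 are described here; Key Capabilities names the full set as Hard, Soft, Guidelines, and Preferences. The sketch below assumes those map to Levels 4 through 1, and that a Hard violation blocks the action, a Soft violation requires a fix, and the lower levels only add notes. The names and both example rules are illustrative.

```typescript
// Graded constraint enforcement sketch; level semantics and rule names are assumptions.
type Level = 1 | 2 | 3 | 4; // 4 = Hard, 3 = Soft, 2 = Guideline, 1 = Preference
type Outcome = "allow" | "needs_fix" | "block";

interface Action {
  role: string;
  kind: "exec" | "write" | "record_fact";
  sandboxed: boolean;  // whether the action runs inside the execution sandbox
  structured: boolean; // whether a recorded fact follows the protocol schema
}

interface Constraint {
  id: string;
  level: Level;
  check: (action: Action) => boolean; // true when the action complies
}

const LEVEL_NAMES: Record<Level, string> = { 4: "hard", 3: "soft", 2: "guideline", 1: "preference" };

function enforce(action: Action, constraints: Constraint[]): { outcome: Outcome; notes: string[] } {
  let outcome: Outcome = "allow";
  const notes: string[] = [];
  for (const c of constraints) {
    if (c.check(action)) continue;
    notes.push(`${LEVEL_NAMES[c.level]}: ${c.id}`);
    if (c.level === 4) return { outcome: "block", notes }; // Hard: stop immediately
    if (c.level === 3) outcome = "needs_fix";              // Soft: must be fixed
    // Levels 2 and 1 only leave a note.
  }
  return { outcome, notes };
}

const rules: Constraint[] = [
  { id: "exec-must-be-sandboxed", level: 4, check: (a) => a.kind !== "exec" || a.sandboxed },
  { id: "facts-must-be-structured", level: 3, check: (a) => a.kind !== "record_fact" || a.structured },
];

console.log(enforce({ role: "executor", kind: "exec", sandboxed: false, structured: true }, rules).outcome);
// prints: block
console.log(enforce({ role: "planner", kind: "record_fact", sandboxed: true, structured: false }, rules).outcome);
// prints: needs_fix
```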

📚 Docs Map


🤝 Contributing

We welcome all forms of contribution! Please follow the Architecture-First principle and check CONTRIBUTING.md to learn how to participate in building the substrate.


📄 License

This project is licensed under the MIT License.


🙏 Acknowledgments

  • Design philosophy inspired by Systematics, Cybernetics, and Informatics (SCI theory).
  • Architecture design inspired by executive control and learning memory mechanisms in neuroscience.

About

A production-grade AI Agent Harness engineering template providing a reliable, observable, and recoverable Agent runtime environment.
