MorphSAT is a testbed for studying when a local LLM agent should stop gathering evidence and commit to an action.
An LLM agent with tool access can loop indefinitely — calling tools, reading results, calling more tools — without ever deciding. Or it can commit prematurely on insufficient evidence. MorphSAT wraps the agent's loop in a structured cognitive control stack — an external state machine that accumulates evidence, tracks posture, and holds decision authority. The model proposes actions; the gate decides when and how to commit.
The current checkpoint (v9) adds a dual-agent recomputation gate — two independent agents analyze the same alert, and disagreement routes to human review. On adversarial scenarios designed to fool single agents, disagreement precision is 100% (every disagreement involved a real error). The underlying single-agent loop (v8.3) achieves 100% accuracy on the standard 20-scenario benchmark.
For cognitive architecture researchers: See `docs/COGNITIVE_ARCHITECTURE_TRANSLATION.md` for a term mapping to Soar, ACT-R, and active inference. See `docs/morphsat_technical_note.md` for the 2-page technical note with full results.
Nine versions tested on a 20-scenario security alert triage benchmark (Qwen2.5-Coder-7B, temperature 0, simulated tools):
| Version | Mechanism | Accuracy | Key finding |
|---|---|---|---|
| v1 | Static FSA constraints | 55% | 0 useful interventions — too weak |
| v2 | Fixed tool-call counter | 67.5% | Any pressure helps |
| v3 | Adaptive budget (2/3/5) | 55-67.5% | Ceiling irrelevant, floor matters |
| v4 | Evidence-pressure gate | 65% | Best escalation (77.8%), best pre-v7 |
| v5 | + pattern memory | 62.5% | Learned threats without tolerance |
| v6 | + bidirectional pressure | 55% | Novelty-as-penalty is the wrong abstraction |
| v7 | Anticipatory posture control | 70% | Benign recovery 78.6% (was 35.7%) |
| v8.0 | + gate authority (assists) | 90% | Model follows structured direction |
| v8.2 | + classifier/threshold fixes | 97.5% | 2 bugs: false yara match, threshold mismatch |
| v8.3 | + early-verdict guard | 100% | Blocks premature verdicts before min evidence |
| v9 | + dual-agent recomputation gate | 100% precision | Independent second agent; disagreement = escalate. Catches overconfident errors (0.95 conf, wrong). |
Three experimental conditions, 20 scenarios each (60 total runs):
| Condition | Accuracy | Benign | Suspicious | Escalate | Description |
|---|---|---|---|---|---|
| model_decides | 85.0% | 100% | 75.0% | 77.8% | Model alone, monitor runs silently |
| gate_overrides | 100% | 100% | 100% | 100% | Gate replaces model verdict |
| gate_assists | 100% | 100% | 100% | 100% | Gate steers model via strong prompt |
- `gate_overrides` corrected 6 model errors (6 helped, 0 hurt).
- `gate_assists` achieved 100% model agreement — the model followed the monitor's direction in every case.
- Escalation accuracy: +22.2pp from `model_decides` to `gate_assists`.
Key insight: The control structure, not the model, is the decision authority. The same 7B model achieves 85% accuracy alone and 100% when embedded in the MorphSAT control loop. The gap is structural, not prompt engineering: an external state machine accumulates evidence, tracks posture, and communicates direction through a typed interface.
Layer 1: FSA lifecycle gate
Legal task-state transitions. Blocks impossible sequences.
Layer 2: Evidence sensors
Bidirectional classification: each tool result produces
(threat_delta, safety_delta). Coincidence detection boosts
on multi-signal convergence. Sidecar confidence from model output.
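A toy illustration of how a bidirectional sensor could turn one tool result into a (threat_delta, safety_delta) pair. The keyword tables, weights, and coincidence boost below are illustrative placeholders, not MorphSAT's actual classifier:

```python
def classify_evidence(tool_result: str) -> tuple[float, float]:
    """Toy bidirectional classifier: one tool result -> (threat_delta, safety_delta).
    Keyword tables and weights are illustrative, not MorphSAT's real sensor."""
    threat_keywords = {"malware": 0.4, "obfuscated": 0.3, "c2": 0.5}
    safety_keywords = {"signed": 0.3, "systemd": 0.2, "whitelisted": 0.4}
    text = tool_result.lower()
    threat = sum(w for k, w in threat_keywords.items() if k in text)
    safety = sum(w for k, w in safety_keywords.items() if k in text)
    # Coincidence detection: multiple independent threat signals converging
    if sum(1 for k in threat_keywords if k in text) >= 2:
        threat *= 1.5
    return (min(threat, 1.0), min(safety, 1.0))
```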
Layer 3: Shadow monitor (v7+)
Hidden posture state machine wrapping the agent's loop.
Novelty → ORIENT → bounded investigation → decide.
Safe evidence decays protective posture (tolerance).
The model never sees these states — they control what
happens AROUND the model.
Layer 4: Gate authority (v8+)
When the monitor commits to a direction, it communicates
that direction to the model (gate_assists) or overrides
the model's verdict entirely (gate_overrides).
Layer 5: Early-verdict guard (v8.3)
Blocks the model from issuing a verdict before gathering
minimum evidence (2 tool calls). Structural, not prompt-based.
Layer 6: Dual-agent recomputation gate (v9)
Second agent analyzes same alert independently (different
prompt framing, never sees first agent's output). Verdicts
compared: AGREE → emit. DISAGREE → escalate to human.
Unconditional — no confidence gating (overconfident errors
are the exact failure mode this catches).
Layer 7: Dual-store memory
Threat patterns and tolerance patterns stored separately.
Familiarity modulates future posture (the strange loop).
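The split matters because familiarity should only ever relax posture through the tolerance store, never by eroding threat knowledge. A toy illustration (not MorphSAT's SplitMemoryStore):

```python
class ToySplitMemory:
    """Illustrative dual-store memory (not MorphSAT's SplitMemoryStore).
    Threat and tolerance patterns are kept apart so familiarity can relax
    posture without overwriting what is known to be dangerous."""

    def __init__(self) -> None:
        self.threat: dict[str, int] = {}     # pattern -> times seen as threat
        self.tolerance: dict[str, int] = {}  # pattern -> times resolved benign

    def record(self, pattern: str, verdict: str) -> None:
        store = self.tolerance if verdict == "benign" else self.threat
        store[pattern] = store.get(pattern, 0) + 1

    def posture_bias(self, pattern: str) -> float:
        # Positive tightens posture; negative relaxes it (familiar and safe)
        return self.threat.get(pattern, 0) - 0.5 * self.tolerance.get(pattern, 0)
```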
Layer 8: Receipts
Turn-by-turn JSON audit: state, evidence, posture, outcomes.
Every decision is reproducible. SHA256-stamped.
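A hedged sketch of what SHA256-stamping a turn record can look like, assuming canonical JSON serialization; field names are illustrative, not MorphSAT's receipt schema:

```python
import hashlib
import json

def stamp_receipt(turn: dict) -> dict:
    """Attach a SHA256 content hash to one turn's audit record.
    Canonical JSON (sorted keys, fixed separators) keeps the hash stable."""
    payload = json.dumps(turn, sort_keys=True, separators=(",", ":"))
    return {"receipt": turn,
            "sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest()}

def verify_receipt(stamped: dict) -> bool:
    payload = json.dumps(stamped["receipt"], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest() == stamped["sha256"]
```

Any later edit to the record changes the hash, so tampering is detectable at verification time.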
NORMAL ──→ ORIENTING ──→ SAFE_DISTANCE ──→ NORMAL (safe recovery)
│ └──→ ESCALATE_READY (threat confirmed)
│ └──→ ABSTAIN_READY (contradictory)
│
└──→ INVESTIGATING ──→ COMMIT_READY (clear evidence)
│ ESCALATE_READY (high threat)
│ ABSTAIN_READY (contradictory)
└──→ SWARM_CALL (multi-axis pressure)
Budget guards from any state: max tools, evidence loop, no new info → force commit
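The budget guards can be read as a single force-commit predicate applied from any state; a sketch with illustrative thresholds (not MorphSAT's actual limits):

```python
def should_force_commit(tool_calls: int, last_evidence_novel: bool,
                        repeated_evidence: int, max_tools: int = 5) -> bool:
    """Budget guard, checked from any posture state. Thresholds illustrative."""
    if tool_calls >= max_tools:        # hard ceiling on tool calls
        return True
    if repeated_evidence >= 2:         # evidence loop: same result keeps recurring
        return True
    if tool_calls >= 3 and not last_evidence_novel:  # no new info late in loop
        return True
    return False
```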
Full architecture diagram: See `docs/morphsat_control_diagram.md` for control flow, the shadow state machine, the Soar mapping, and a worked example (supply_01 trace).
pip install morphsat

Or from source:
git clone https://github.com/echo313unfolding/MorphSAT.git
cd MorphSAT
pip install -e ".[dev]"

from morphsat import MorphSATGate, TaskState, TaskEvent
gate = MorphSATGate()
state, legal, action = gate.step(TaskEvent.NEW_TASK)
assert state == TaskState.PLANNING
assert legal is True

from morphsat import ShadowMonitor, SplitMemoryStore
memory = SplitMemoryStore("/tmp/memory.json")
monitor = ShadowMonitor(memory=memory)
monitor.initialize(alert_text="Unknown binary in /tmp")
# Monitor enters ORIENT if alert is novel
print(monitor.state) # ShadowState.ORIENTING
# Feed evidence — monitor transitions through posture states
action = monitor.process_evidence("check_hash", "Hash not in VirusTotal")
print(monitor.state) # ShadowState.INVESTIGATING
print(action.action) # "CONTINUE"
action = monitor.process_evidence("check_parent", "Parent: systemd")
print(monitor.state) # ShadowState.COMMIT_READY
print(action.action) # "COMMIT"
print(action.direction) # "benign"
# Gate authority: use monitor.last_action.direction to steer the model
# gate_assists: "The controller concluded this is BENIGN. Issue verdict."
# gate_overrides: verdict = monitor.last_action.direction (model discarded)
# Close episode — updates memory for next run (the strange loop)
monitor.close_episode("benign", confidence=0.8)

from morphsat import DualAgentGate, AgentResult
def my_agent_runner(system_prompt: str, alert_text: str, context=None) -> AgentResult:
"""Your agent loop here — call LLM, simulate tools, extract verdict."""
...
gate = DualAgentGate(
primary_prompt="You are a security triage agent...",
verifier_prompt="You are an independent security analyst...",
runner=my_agent_runner,
)
result = gate.run(alert_text="Suspicious binary in /tmp")
if result.escalated:
# Agents disagreed — route to human review
print(f"ESCALATE: {result.disagreement_detail}")
else:
# Agents agreed — emit verdict
print(f"Verdict: {result.final_verdict} (conf={result.final_confidence})")

Design decision: Dual-agent is unconditional. Pre-flight analysis showed overconfident wrong answers (0.95 confidence on an incorrect verdict). Confidence does not predict error. See `docs/RECOMPUTATION_GATED_ARCHITECTURE.md` for the full thesis.
10 adversarial scenarios (ambiguous alerts designed to fool single agents), Qwen2.5-Coder-3B:
| Metric | Value |
|---|---|
| Single-agent accuracy | 50-60% |
| Disagreement rate | 20-30% |
| Disagreement precision | 100% (every disagreement involved a real error) |
| Errors caught by disagreement | 2-3 per run |
On the standard 20-scenario set without the full control stack, agreement rate is 80% with 100% disagreement precision. The G5 gate criterion (>85% agreement on the standard set) was narrowly missed — the agents disagree more often than ideal on easy cases, but every disagreement flags a genuine error. The recomputation gate adds no false alarms, at the cost of routing ~20% of verdicts to human review.
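One way to compute disagreement precision from paired run logs, under the reading that a disagreement "catches a real error" when the primary verdict would have been wrong; the record structure here is hypothetical, not MorphSAT's log format:

```python
def disagreement_precision(runs: list[dict]) -> float:
    """runs: one dict per scenario with 'primary', 'verifier', 'truth' verdicts.
    Precision = fraction of disagreements where the primary verdict was wrong,
    i.e. where escalating to a human actually caught a would-be error."""
    disagreements = [r for r in runs if r["primary"] != r["verifier"]]
    if not disagreements:
        return 1.0  # no alarms raised, so no false alarms either
    caught = [r for r in disagreements if r["primary"] != r["truth"]]
    return len(caught) / len(disagreements)
```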
morphsat/
├── morphsat/
│ ├── __init__.py # Public API
│ ├── core.py # FSA gate, TaskState/TaskEvent, classify_event
│ ├── token.py # Token adjacency scoring (4-lane structure)
│ ├── pressure_gate.py # v4 evidence-pressure gate
│ ├── commit_gate.py # v6 bidirectional commit gate + split memory
│ ├── shadow_monitor.py # v7 anticipatory posture controller
│ ├── recomp_gate.py # v9 dual-agent recomputation gate
│ └── receipt.py # Receipt wrapping with SHA256 content hash
├── tests/
│ ├── test_core.py # 31 tests: FSA structure, transitions, receipts
│ ├── test_token.py # 22 tests: lane scoring, temperature, masking
│ ├── test_shadow_monitor.py # 22 tests: v7 posture predictions
│ └── test_recomp_gate.py # 14 tests: dual-agent agreement, disagreement, receipts
├── docs/
│ ├── PRESSURE_GATE_SPEC.md
│ ├── COGNITIVE_ARCHITECTURE_TRANSLATION.md
│ ├── RECOMPUTATION_GATED_ARCHITECTURE.md # Four-layer RGA thesis
│ ├── morphsat_technical_note.md # 2-page technical note (v8.3 results)
│ └── morphsat_control_diagram.md # Architecture diagrams + Soar mapping
├── receipts/
│ ├── v7_shadow_monitor/ # v7 benchmark receipts (single-seed + 3-seed)
│ └── morphsat_v83_early_verdict_guard/ # v8.3 benchmark receipt (60 runs, 20 scenarios x 3 conditions, 100% gate modes)
├── tools/
│ └── bench_gate_authority.py # Gate authority benchmark harness
└── pyproject.toml
123/123 tests passing (Python 3.10).
- N=20 scenario benchmark with simulated tool responses
- Temperature=0 (deterministic) — no stochastic variance across seeds
- Qwen2.5-Coder-7B doing security triage — not its primary domain
- The shadow monitor is tested on one task type (alert triage)
- 100% gate_assists accuracy is an upper bound on this benchmark, not a claim about arbitrary inputs
- The evidence classifier is keyword-based, not learned
- This is a research testbed, not a production system
| Project | Description |
|---|---|
| helix-substrate | Calibration-free neural network compression (HXQ). |
| sentinel-hybrid-stack | Hybrid SSM-Transformer security monitoring pipeline. |
| helix-codec | Standalone C99 tensor codec library. |
| hxq-solana | Codec-aware provenance for off-chain artifacts on Solana. |
MIT — see LICENSE.