MorphSAT


Structured commit control for LLM agent loops. The model proposes; the gate decides.

What is this?

MorphSAT is a testbed for studying when a local LLM agent should stop gathering evidence and commit to an action.

An LLM agent with tool access can loop indefinitely — calling tools, reading results, calling more tools — without ever deciding. Or it can commit prematurely on insufficient evidence. MorphSAT wraps the agent's loop in a structured cognitive control stack — an external state machine that accumulates evidence, tracks posture, and holds decision authority. The model proposes actions; the gate decides when and how to commit.
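The propose/decide split described above can be sketched as a minimal loop. This is a hypothetical illustration, not the MorphSAT API (which appears under Quick start): the model only proposes, and an external gate holds decision authority.

```python
# Minimal sketch of the propose/decide split (hypothetical names, not
# the MorphSAT API): the model proposes actions, the gate decides.

def run_loop(model_propose, gate_decide, max_turns=10):
    """model_propose(evidence) -> (action, payload); gate_decide(evidence) -> verdict or None."""
    evidence = []
    for _ in range(max_turns):
        action, payload = model_propose(evidence)
        evidence.append((action, payload))
        verdict = gate_decide(evidence)   # gate holds decision authority
        if verdict is not None:
            return verdict
    return "escalate"                     # budget exhausted -> force a decision

# Toy example: the gate commits once two pieces of evidence accumulate
verdict = run_loop(
    model_propose=lambda ev: ("check_hash", "clean"),
    gate_decide=lambda ev: "benign" if len(ev) >= 2 else None,
)
print(verdict)  # benign
```

The point of the shape is that neither infinite looping nor premature commitment is possible: the gate bounds the loop and also withholds the verdict until its own conditions are met.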

The current checkpoint (v9) adds a dual-agent recomputation gate — two independent agents analyze the same alert, and disagreement routes to human review. On adversarial scenarios designed to fool single agents, disagreement precision is 100% (every disagreement involved a real error). The underlying single-agent loop (v8.3) achieves 100% accuracy on the standard 20-scenario benchmark.

For cognitive architecture researchers: See docs/COGNITIVE_ARCHITECTURE_TRANSLATION.md for a term mapping to Soar, ACT-R, and active inference. See docs/morphsat_technical_note.md for the 2-page technical note with full results.

The proof chain

Nine versions tested on a 20-scenario security alert triage benchmark (Qwen2.5-Coder-7B, temperature 0, simulated tools):

| Version | Mechanism | Accuracy | Key finding |
|---|---|---|---|
| v1 | Static FSA constraints | 55% | 0 useful interventions — too weak |
| v2 | Fixed tool-call counter | 67.5% | Any pressure helps |
| v3 | Adaptive budget (2/3/5) | 55–67.5% | Ceiling irrelevant, floor matters |
| v4 | Evidence-pressure gate | 65% | Best escalation (77.8%), best pre-v7 |
| v5 | + pattern memory | 62.5% | Learned threats without tolerance |
| v6 | + bidirectional pressure | 55% | Novelty-as-penalty is the wrong abstraction |
| v7 | Anticipatory posture control | 70% | Benign recovery 78.6% (was 35.7%) |
| v8.0 | + gate authority (assists) | 90% | Model follows structured direction |
| v8.2 | + classifier/threshold fixes | 97.5% | 2 bugs: false yara match, threshold mismatch |
| v8.3 | + early-verdict guard | 100% | Blocks premature verdicts before min evidence |
| v9 | + dual-agent recomputation gate | 100% precision | Independent second agent; disagreement = escalate. Catches overconfident errors (0.95 conf, wrong). |

v8.3 result

Three experimental conditions, 20 scenarios each (60 total runs):

| Condition | Accuracy | Benign | Suspicious | Escalate | Description |
|---|---|---|---|---|---|
| model_decides | 85.0% | 100% | 75.0% | 77.8% | Model alone, monitor runs silently |
| gate_overrides | 100% | 100% | 100% | 100% | Gate replaces model verdict |
| gate_assists | 100% | 100% | 100% | 100% | Gate steers model via strong prompt |
  • gate_overrides corrected 6 model errors (6 helped, 0 hurt).
  • gate_assists achieved 100% model agreement — the model followed the monitor's direction in every case.
  • Escalation accuracy: +22.2pp from model_decides to gate_assists.

Key insight: The control structure, not the model, is the decision authority. The same 7B model achieves 85% accuracy alone and 100% when embedded in the MorphSAT control loop. The gap is structural, not prompt engineering: an external state machine accumulates evidence, tracks posture, and communicates direction through a typed interface.

Architecture

Layer 1: FSA lifecycle gate
         Legal task-state transitions. Blocks impossible sequences.

Layer 2: Evidence sensors
         Bidirectional classification: each tool result produces
         (threat_delta, safety_delta). Coincidence detection boosts
         on multi-signal convergence. Sidecar confidence from model output.
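The bidirectional classification idea can be sketched with a keyword-based sensor. The keyword lists and weights below are invented for illustration (the real classifier lives in the library); the point is that each tool result contributes both a threat delta and a safety delta rather than a single score.

```python
# Sketch of a bidirectional evidence sensor (assumed keyword lists and
# weights, not MorphSAT's actual classifier): every tool result yields
# a (threat_delta, safety_delta) pair instead of one scalar.

THREAT_WORDS = {"malware", "beacon", "obfuscated", "packed"}
SAFE_WORDS = {"signed", "systemd", "whitelisted", "clean"}

def classify_result(text: str) -> tuple[float, float]:
    words = set(text.lower().split())
    threat_delta = 0.3 * len(words & THREAT_WORDS)   # pushes toward escalate
    safety_delta = 0.3 * len(words & SAFE_WORDS)     # decays protective posture
    return threat_delta, safety_delta

t, s = classify_result("parent is systemd and binary is signed")
print(t, s)  # threat 0, safety from two safe signals
```

Keeping the two deltas separate is what allows safe evidence to relax posture without a threat signal ever being "cancelled out" by unrelated benign observations.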

Layer 3: Shadow monitor (v7+)
         Hidden posture state machine wrapping the agent's loop.
         Novelty → ORIENT → bounded investigation → decide.
         Safe evidence decays protective posture (tolerance).
         The model never sees these states — they control what
         happens AROUND the model.

Layer 4: Gate authority (v8+)
         When the monitor commits to a direction, it communicates
         that direction to the model (gate_assists) or overrides
         the model's verdict entirely (gate_overrides).

Layer 5: Early-verdict guard (v8.3)
         Blocks the model from issuing a verdict before gathering
         minimum evidence (2 tool calls). Structural, not prompt-based.
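The guard is simple enough to sketch in full. This is a hypothetical class, assuming only the rule described above: no verdict is structurally possible before the minimum number of tool calls.

```python
# Sketch of an early-verdict guard (hypothetical class, assuming the
# described rule: verdicts are blocked before a minimum evidence count).

class EarlyVerdictGuard:
    def __init__(self, min_tool_calls: int = 2):
        self.min_tool_calls = min_tool_calls
        self.tool_calls = 0

    def record_tool_call(self) -> None:
        self.tool_calls += 1

    def allow_verdict(self) -> bool:
        return self.tool_calls >= self.min_tool_calls

guard = EarlyVerdictGuard()
assert not guard.allow_verdict()    # blocked: no evidence gathered yet
guard.record_tool_call()
guard.record_tool_call()
assert guard.allow_verdict()        # allowed after minimum evidence
```

Because the check sits outside the model, it cannot be talked around by a prompt: a premature verdict is rejected regardless of how confident the model sounds.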

Layer 6: Dual-agent recomputation gate (v9)
         Second agent analyzes same alert independently (different
         prompt framing, never sees first agent's output). Verdicts
         compared: AGREE → emit. DISAGREE → escalate to human.
         Unconditional — no confidence gating (overconfident errors
         are the exact failure mode this catches).

Layer 7: Dual-store memory
         Threat patterns and tolerance patterns stored separately.
         Familiarity modulates future posture (the strange loop).
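The dual-store shape can be sketched as two separate pattern counters. The class below is hypothetical (MorphSAT's store is `SplitMemoryStore`); it shows only the described mechanism: threat and tolerance memories are kept apart, and familiarity in either one biases the opening posture of the next episode.

```python
# Sketch of a dual-store memory (hypothetical shape): threat patterns
# and tolerance patterns never share a counter, so learned safety
# cannot erode learned threat, and vice versa.

class DualStore:
    def __init__(self):
        self.threat_patterns: dict[str, int] = {}
        self.tolerance_patterns: dict[str, int] = {}

    def record(self, pattern: str, outcome: str) -> None:
        store = self.threat_patterns if outcome == "threat" else self.tolerance_patterns
        store[pattern] = store.get(pattern, 0) + 1

    def initial_posture(self, pattern: str) -> str:
        threat = self.threat_patterns.get(pattern, 0)
        safe = self.tolerance_patterns.get(pattern, 0)
        if threat > safe:
            return "ORIENTING"   # familiar threat -> heightened posture
        if safe > 0:
            return "NORMAL"      # familiar-safe -> relaxed posture
        return "ORIENTING"       # novel -> orient first

mem = DualStore()
mem.record("binary_in_tmp", "benign")
print(mem.initial_posture("binary_in_tmp"))  # NORMAL
```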

Layer 8: Receipts
         Turn-by-turn JSON audit: state, evidence, posture, outcomes.
         Every decision is reproducible. SHA256-stamped.
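A SHA256-stamped receipt reduces to hashing a canonical serialization of the turn log. The functions below are an assumed format for illustration; the library's actual schema is in `morphsat/receipt.py`.

```python
# Sketch of SHA256-stamped receipts (assumed format, not MorphSAT's
# actual schema): stamp a turn-by-turn log, then verify it later.

import hashlib
import json

def stamp_receipt(turns: list[dict]) -> dict:
    body = json.dumps(turns, sort_keys=True)      # canonical serialization
    return {"turns": turns, "sha256": hashlib.sha256(body.encode()).hexdigest()}

def verify_receipt(receipt: dict) -> bool:
    body = json.dumps(receipt["turns"], sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest() == receipt["sha256"]

r = stamp_receipt([{"state": "ORIENTING", "evidence": "hash_unknown"}])
assert verify_receipt(r)            # any tampering with the log breaks the stamp
```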

Shadow monitor states

NORMAL ──→ ORIENTING ──→ SAFE_DISTANCE ──→ NORMAL (safe recovery)
              │                  └──→ ESCALATE_READY (threat confirmed)
              │                  └──→ ABSTAIN_READY (contradictory)
              │
              └──→ INVESTIGATING ──→ COMMIT_READY (clear evidence)
                        │            ESCALATE_READY (high threat)
                        │            ABSTAIN_READY (contradictory)
                        └──→ SWARM_CALL (multi-axis pressure)

Budget guards from any state: max tools, evidence loop, no new info → force commit

Full architecture diagram: See docs/morphsat_control_diagram.md for control flow, shadow state machine, Soar mapping, and a worked example (supply_01 trace).

Install

pip install morphsat

Or from source:

git clone https://github.com/echo313unfolding/MorphSAT.git
cd MorphSAT
pip install -e ".[dev]"

Quick start

FSA lifecycle gate

from morphsat import MorphSATGate, TaskState, TaskEvent

gate = MorphSATGate()
state, legal, action = gate.step(TaskEvent.NEW_TASK)
assert state == TaskState.PLANNING
assert legal is True

Shadow monitor + gate authority (v8.3)

from morphsat import ShadowMonitor, SplitMemoryStore

memory = SplitMemoryStore("/tmp/memory.json")
monitor = ShadowMonitor(memory=memory)
monitor.initialize(alert_text="Unknown binary in /tmp")

# Monitor enters ORIENT if alert is novel
print(monitor.state)  # ShadowState.ORIENTING

# Feed evidence — monitor transitions through posture states
action = monitor.process_evidence("check_hash", "Hash not in VirusTotal")
print(monitor.state)       # ShadowState.INVESTIGATING
print(action.action)       # "CONTINUE"

action = monitor.process_evidence("check_parent", "Parent: systemd")
print(monitor.state)       # ShadowState.COMMIT_READY
print(action.action)       # "COMMIT"
print(action.direction)    # "benign"

# Gate authority: use monitor.last_action.direction to steer the model
# gate_assists: "The controller concluded this is BENIGN. Issue verdict."
# gate_overrides: verdict = monitor.last_action.direction (model discarded)

# Close episode — updates memory for next run (the strange loop)
monitor.close_episode("benign", confidence=0.8)

Dual-agent recomputation gate (v9)

from morphsat import DualAgentGate, AgentResult

def my_agent_runner(system_prompt: str, alert_text: str, context=None) -> AgentResult:
    """Your agent loop here — call LLM, simulate tools, extract verdict."""
    ...

gate = DualAgentGate(
    primary_prompt="You are a security triage agent...",
    verifier_prompt="You are an independent security analyst...",
    runner=my_agent_runner,
)

result = gate.run(alert_text="Suspicious binary in /tmp")

if result.escalated:
    # Agents disagreed — route to human review
    print(f"ESCALATE: {result.disagreement_detail}")
else:
    # Agents agreed — emit verdict
    print(f"Verdict: {result.final_verdict} (conf={result.final_confidence})")

Design decision: Dual-agent is unconditional. Pre-flight analysis showed overconfident wrong answers (0.95 confidence on incorrect verdict). Confidence does not predict error. See docs/RECOMPUTATION_GATED_ARCHITECTURE.md for the full thesis.
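The unconditional rule can be stated in a few lines. The function below is a hypothetical illustration of the design decision, not `DualAgentGate` internals: confidence is recorded, but never consulted when deciding whether to escalate.

```python
# Sketch of the unconditional comparison rule (hypothetical function):
# confidence is passed through but never gates the escalation decision.

def compare_verdicts(primary: tuple[str, float], verifier: tuple[str, float]) -> dict:
    v1, c1 = primary
    v2, c2 = verifier
    if v1 == v2:
        return {"escalated": False, "verdict": v1, "confidence": min(c1, c2)}
    # Disagreement always escalates, even at 0.95 confidence, because
    # overconfident wrong answers are the failure mode this layer catches.
    return {"escalated": True, "detail": f"{v1}({c1}) vs {v2}({c2})"}

r = compare_verdicts(("malicious", 0.95), ("benign", 0.7))
print(r["escalated"])  # True
```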

v9 benchmark (WO-RECOMP-04)

10 adversarial scenarios (ambiguous alerts designed to fool single agents), Qwen2.5-Coder-3B:

| Metric | Value |
|---|---|
| Single-agent accuracy | 50–60% |
| Disagreement rate | 20–30% |
| Disagreement precision | 100% (every disagreement involved a real error) |
| Errors caught by disagreement | 2–3 per run |

On the standard 20-scenario set without the full control stack, the agreement rate is 80% with 100% disagreement precision. The G5 gate (>85% agreement on the standard set) was narrowly missed — the agents disagree more than ideal on easy cases, but every disagreement flags a genuine error. The recomputation gate therefore adds no false alarms, at the cost of routing ~20% of verdicts to human review.
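The trade-off above is a simple calculation. The scenario counts below are derived from the reported rates (80% agreement on 20 scenarios implies 4 disagreements) and are illustrative rather than quoted from the benchmark receipts.

```python
# Working through the reported trade-off: disagreement precision is
# real errors caught / disagreements, and review load is the fraction
# of verdicts that disagreement routes to a human.

def disagreement_stats(n_runs: int, n_disagreements: int, n_real_errors: int):
    precision = n_real_errors / n_disagreements
    review_load = n_disagreements / n_runs
    return precision, review_load

# Standard 20-scenario set: 80% agreement -> 4 disagreements, all real errors
precision, load = disagreement_stats(20, 4, 4)
print(precision, load)  # 1.0 0.2
```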

Project structure

morphsat/
├── morphsat/
│   ├── __init__.py           # Public API
│   ├── core.py               # FSA gate, TaskState/TaskEvent, classify_event
│   ├── token.py              # Token adjacency scoring (4-lane structure)
│   ├── pressure_gate.py      # v4 evidence-pressure gate
│   ├── commit_gate.py        # v6 bidirectional commit gate + split memory
│   ├── shadow_monitor.py     # v7 anticipatory posture controller
│   ├── recomp_gate.py        # v9 dual-agent recomputation gate
│   └── receipt.py            # Receipt wrapping with SHA256 content hash
├── tests/
│   ├── test_core.py          # 31 tests: FSA structure, transitions, receipts
│   ├── test_token.py         # 22 tests: lane scoring, temperature, masking
│   ├── test_shadow_monitor.py # 22 tests: v7 posture predictions
│   └── test_recomp_gate.py   # 14 tests: dual-agent agreement, disagreement, receipts
├── docs/
│   ├── PRESSURE_GATE_SPEC.md
│   ├── COGNITIVE_ARCHITECTURE_TRANSLATION.md
│   ├── RECOMPUTATION_GATED_ARCHITECTURE.md  # Four-layer RGA thesis
│   ├── morphsat_technical_note.md      # 2-page technical note (v8.3 results)
│   └── morphsat_control_diagram.md     # Architecture diagrams + Soar mapping
├── receipts/
│   ├── v7_shadow_monitor/    # v7 benchmark receipts (single-seed + 3-seed)
│   └── morphsat_v83_early_verdict_guard/  # v8.3 benchmark receipt (60 runs, 20 scenarios x 3 conditions, 100% gate modes)
├── tools/
│   └── bench_gate_authority.py  # Gate authority benchmark harness
└── pyproject.toml

123/123 tests passing (Python 3.10).

Caveats

  • N=20 scenario benchmark with simulated tool responses
  • Temperature=0 (deterministic) — no stochastic variance across seeds
  • Qwen2.5-Coder-7B doing security triage — not its primary domain
  • The shadow monitor is tested on one task type (alert triage)
  • 100% gate_assists accuracy is an upper bound on this benchmark, not a claim about arbitrary inputs
  • The evidence classifier is keyword-based, not learned
  • This is a research testbed, not a production system

Companion projects

| Project | Description |
|---|---|
| helix-substrate | Calibration-free neural network compression (HXQ). |
| sentinel-hybrid-stack | Hybrid SSM-Transformer security monitoring pipeline. |
| helix-codec | Standalone C99 tensor codec library. |
| hxq-solana | Codec-aware provenance for off-chain artifacts on Solana. |

License

MIT — see LICENSE.
