A multi-agent MCP server that acts as a meta-orchestration layer between the client and the user. Problems flow through 7 specialized LLM-backed agents that analyse, strategise, plan, verify, execute, and score solutions in parallel competing branches. Failed approaches are remembered as anti-patterns; winning tactics become reusable patterns across sessions.
Parallel strategy fan-out with blackboard coordination:
```
┌───────────────────────────────────────────────┐
│ 1. INTAKE                                     │
│   Reception  → Issue classification           │
│   Master     → Client identity                │
│   Strategist → N competing strategies         │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────┐
│ 2. PARALLEL STRATEGY FAN-OUT                            │
│                                                         │
│   ┌─ Strategy 1: Taktik Planner → Judge → Mission ─┐    │
│   ├─ Strategy 2: Taktik Planner → Judge → Mission ─┤    │
│   └─ Strategy 3: Taktik Planner → Judge → Mission ─┘    │
│                                                         │
│   Judge rejects?     → Retry within branch (max 3×)     │
│   All branches fail? → Re-strategise (max 2 rounds)     │
└───────────────────────┬─────────────────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ 3. EVALUATION                                 │
│   Jury   → 5-dimension scoring per mission    │
│   Master → Synthesise final answer            │
│   Master → Distribute learnings (memories)    │
└───────────────────────────────────────────────┘
```
All agents read/write to a shared BLACKBOARD (SQLite + sqlite-vss)
| Agent | Temp | Role | Reads | Writes |
|---|---|---|---|---|
| Reception | 0.1 | Intake, classify, structure | Client input, past memories | Issue, Session |
| Master | 0.3 | Orchestrate, teach, synthesise | All notes and knowledge | Identity, learnings, answer |
| Strategist | 0.9 | Creative strategy generation | Issue, long-term memories | N ranked Strategies |
| Taktik Planner | 0.8 | Step-by-step execution plan | Strategy, short-term memories | Taktik with steps + skills |
| Judge | 0.1 | Verify for errors and gaps | Taktik, postcondition | Verification, error notes |
| Mission | 0.2 | Execute steps, record results | Verified Taktik | Step-by-step results |
| Jury | 0.2 | Score and compare missions | All results, Judge notes | 5-dim scores, rankings |
| Dimension | Weight |
|---|---|
| Correctness | 30% |
| Completeness | 25% |
| Robustness | 20% |
| Elegance | 15% |
| Efficiency | 10% |
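A mission's overall score is a weighted combination of the five dimensions. A minimal sketch of that aggregation, using the weights from the table above (the function name and input shape are illustrative, not the project's actual API):

```python
# Dimension weights from the scoring table; they sum to 1.0.
WEIGHTS = {
    "correctness": 0.30,
    "completeness": 0.25,
    "robustness": 0.20,
    "elegance": 0.15,
    "efficiency": 0.10,
}

def aggregate_score(metrics: dict[str, float]) -> float:
    """Combine per-dimension scores (each in [0, 1]) into one weighted score."""
    return sum(WEIGHTS[dim] * metrics.get(dim, 0.0) for dim in WEIGHTS)
```

With these weights, a mission that is fully correct but scores zero everywhere else still reaches only 0.30 — just at the "adequate" boundary.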
Two brains with three scopes:
| Scope | Lifetime | Teaches | Contains |
|---|---|---|---|
| Short-term | Current session | Taktik Planner | Attempt results, observations |
| Long-term | Persists forever | Strategist | Good/bad practices |
| Permanent | Never decays | Strategist | Proven patterns, anti-patterns |
Memory features: embedding-based recall (sqlite-vss, 384-dim), confidence decay on contradiction, near-duplicate supersession, relevance tracking.
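Two of these features reduce to simple rules. A sketch under the default config values (`confidence_decay_rate` 0.05, `similarity_threshold` 0.15); function names are illustrative:

```python
def decay_confidence(confidence: float, rate: float = 0.05) -> float:
    """Reduce a memory's confidence after a contradicting observation,
    clamped at zero (default rate matches confidence_decay_rate)."""
    return max(0.0, confidence - rate)

def should_supersede(distance: float, threshold: float = 0.15) -> bool:
    """A new memory supersedes an old one when their embedding distance
    falls below similarity_threshold, i.e. they are near-duplicates."""
    return distance < threshold
```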
```
PLANNED ──branch starts──▸ IN_PROGRESS ──mission succeeds──▸ SUCCEEDED
                                │
                                └── mission fails ──▸ FAILED

Jury scores:
  score ≥ 0.75 → "proven"
  score ≥ 0.30 → "adequate"
  score < 0.30 → "archived"
```
Score thresholds drive memory creation: proven strategies become reusable pattern memories; archived strategies become anti_pattern warnings.
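The labelling rule, sketched with the default thresholds (`proven_threshold` 0.75, `archive_threshold` 0.30); the function name is illustrative:

```python
def label_strategy(score: float,
                   proven_threshold: float = 0.75,
                   archive_threshold: float = 0.30) -> str:
    """Map a Jury score to a strategy label using the configured thresholds."""
    if score >= proven_threshold:
        return "proven"      # stored as a reusable `pattern` memory
    if score >= archive_threshold:
        return "adequate"
    return "archived"        # stored as an `anti_pattern` warning
```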
Every problem is structured by the Reception Agent into:
| Field | Purpose |
|---|---|
| `who` | Who has the problem |
| `where_location` | Where it is happening |
| `why_reason` | Why it is happening |
| `precondition` | True before the problem |
| `postcondition` | True after the problem is solved |
| `classification` | bug, architecture, performance, refactor, security, testing, deployment, documentation |
| `severity` | critical, high, medium, low, info |
| `key_points` | Step-by-step breakdown |
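As a rough shape, the structured Issue might look like the dataclass below. The real project defines Pydantic models in `src/ss/blackboard/models.py`; this is an illustrative sketch, not the actual model:

```python
from dataclasses import dataclass, field

@dataclass
class Issue:
    """Illustrative sketch of Reception's structured output (fields per the table)."""
    who: str                 # who has the problem
    where_location: str      # where it is happening
    why_reason: str          # why it is happening
    precondition: str        # true before the problem
    postcondition: str       # true after the problem is solved
    classification: str      # e.g. "bug", "performance", "security"
    severity: str            # "critical" | "high" | "medium" | "low" | "info"
    key_points: list[str] = field(default_factory=list)  # step-by-step breakdown
```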
LLM vs. Deterministic Boundary
| LLM does | Code does |
|---|---|
| Problem decomposition | State management (SQLite) |
| Strategy generation | Pipeline orchestration |
| Taktik planning | Parallel branch execution |
| Verification critique | Score aggregation & weighting |
| Mission execution | Memory lifecycle (decay, supersession) |
| Performance evaluation | Event streaming & observability |
| Output synthesis | Vector indexing (sqlite-vss) |
Client profile tracking
The Master Agent builds a profile of each client across sessions:
| Field | Example |
|---|---|
| `expertise_level` | beginner, intermediate, advanced, expert |
| `known_domains` | `["python", "devops", "frontend"]` |
| `communication_style` | concise, detailed, visual |
| `total_sessions` | 42 |
The Reception Agent uses this to tailor intake; the Master Agent uses it to calibrate the final answer.
```bash
git clone <repo-url>
cd ss
pip install -e ".[dev]"

# stdio transport (for Claude Code)
ss

# or explicitly
python -m ss.server
```
```json
{
  "mcpServers": {
    "ss": {
      "command": "/path/to/ss/.venv/bin/python",
      "args": ["-m", "ss.server"]
    }
  }
}
```
| Setting | Default | Description |
|---|---|---|
| `db_path` | `data/ss.db` | SQLite database location |
| `embedding_model` | `all-MiniLM-L6-v2` | Sentence-transformers model |
| `embedding_dimension` | `384` | Vector dimensions |
| `num_strategies` | `3` | Strategies per session |
| `max_judge_retries` | `3` | Judge rejection retries per branch |
| `max_restrategize_rounds` | `2` | Full re-strategise rounds |
| `max_concurrent_sampling` | `1` | Concurrent MCP sampling calls |
| `similarity_threshold` | `0.15` | Distance for memory supersession |
| `confidence_decay_rate` | `0.05` | Per-contradiction confidence loss |
| `proven_threshold` | `0.75` | Score to label a strategy "proven" |
| `archive_threshold` | `0.30` | Score to label a strategy "archived" |
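The settings above map onto the configuration dataclass in `src/ss/config.py`. A sketch of what that might look like, with the defaults from the table (the actual class may carry different field names or extra fields):

```python
from dataclasses import dataclass

@dataclass
class Config:
    """Illustrative configuration sketch; defaults mirror the settings table."""
    db_path: str = "data/ss.db"
    embedding_model: str = "all-MiniLM-L6-v2"
    embedding_dimension: int = 384
    num_strategies: int = 3
    max_judge_retries: int = 3
    max_restrategize_rounds: int = 2
    max_concurrent_sampling: int = 1
    similarity_threshold: float = 0.15
    confidence_decay_rate: float = 0.05
    proven_threshold: float = 0.75
    archive_threshold: float = 0.30
```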
```bash
pytest tests/ -v  # 239 tests
```
Session Lifecycle Tools (4)
| Tool | Description |
|---|---|
| `solve` | Start a new problem-solving session. The pipeline runs async; returns a `session_id` immediately. |
| `get_events` | Poll for pipeline events after a given event ID. Use for streaming progress. |
| `get_result` | Get the final result of a completed session. |
| `cancel` | Cancel a running session. |
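A typical client calls `solve`, then polls `get_events` until a terminal event arrives. A minimal sketch of that loop, with the MCP tool-call layer stubbed out as a plain callable (in a real client, `get_events` would be an MCP tool invocation, and you would sleep between empty polls):

```python
def poll_events(get_events, last_event_id=0):
    """Yield events after `last_event_id` until the session finishes.

    `get_events` is any callable taking the last seen event ID and
    returning a list of newer events as dicts with "id" and "type" keys.
    """
    terminal = {"session_completed", "session_failed"}
    while True:
        for event in get_events(last_event_id):
            last_event_id = event["id"]
            yield event
            if event["type"] in terminal:
                return
```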
Memory & Learning Tools (2)
| Tool | Description |
|---|---|
| `recall` | Search memories by semantic similarity. Filter by type: `good_practice`, `bad_practice`, `pattern`, `anti_pattern`, `knowledge`, `insight`. |
| `get_session_history` | List past sessions, optionally filtered by client. |
Observability Tools (2)
| Tool | Description |
|---|---|
| `inspect_session` | Full session trace: issue, strategies, taktiks, missions, scores, notes, memories. |
| `get_agent_notes` | Notes written by agents during a session. Filter by agent: `reception`, `master`, `strategist`, `taktik_planner`, `judge`, `mission`, `jury`. |
Streaming Events
| Event | Payload |
|---|---|
| `session_created` | `{ session_id, problem_summary }` |
| `agent_started` | `{ agent_name, phase }` |
| `reception_intake` | `{ issue_summary, classification, who, where, why }` |
| `master_joined` | `{ client_identity }` |
| `strategies_generated` | `{ count, strategies: [{ id, description, rank }] }` |
| `taktik_planned` | `{ strategy_id, steps_count, required_skills }` |
| `judge_verified` | `{ taktik_id, verified, rejection_reason? }` |
| `judge_rejected_loop` | `{ taktik_id, attempt, max_attempts, reason }` |
| `mission_started` | `{ strategy_id, mission_id }` |
| `mission_step` | `{ mission_id, step_index, action, outcome }` |
| `mission_completed` | `{ mission_id, status }` |
| `jury_scored` | `{ scores: [{ strategy_id, score, metrics }] }` |
| `master_synthesized` | `{ winning_strategy_id, final_answer }` |
| `memory_created` | `{ type, scope, content_preview }` |
| `session_completed` | `{ session_id, status, total_events, duration_ms }` |
| `session_failed` | `{ session_id, error, last_agent }` |
| Phase | Calls | Parallelism |
|---|---|---|
| Intake (Reception + Master + Strategist) | 3 | Sequential |
| Execution (Taktik + Judge + Mission) × N strategies | 3 × N | Parallel branches |
| Evaluation (Jury + Master synthesis) | 2 | Sequential |
| **Total for 3 strategies** | **14** | |
Worst case with retries: 2 rounds × 3 strategies × 3 judge retries = 18 taktik attempts.
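The counts above reduce to two small formulas, sketched here (function names are illustrative):

```python
def llm_calls(num_strategies: int = 3) -> int:
    """Happy-path LLM calls: intake (3) + one branch triple per strategy
    (Taktik + Judge + Mission) + evaluation (2)."""
    return 3 + 3 * num_strategies + 2

def worst_case_taktik_attempts(rounds: int = 2, strategies: int = 3,
                               retries: int = 3) -> int:
    """Upper bound on Taktik attempts across all re-strategise rounds."""
    return rounds * strategies * retries
```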
```
Session ─┬── Issue
         ├── Strategy ─┬── Taktik ── Mission ── MissionResult
         │             └── StrategyScore
         ├── SessionEvent (append-only stream)
         └── AgentNote

Memory (good_practice | bad_practice | pattern | anti_pattern | knowledge | insight)
ClientProfile ── ClientIssueHistory
EmbeddingRegistry ←→ vss_* virtual tables (sqlite-vss, 384-dim)
```
Every content-bearing entity is indexed in sqlite-vss for embedding-based similarity search.
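Conceptually, similarity search ranks stored embeddings by cosine similarity to the query. A pure-Python sketch of that idea (the real store delegates nearest-neighbour search to sqlite-vss virtual tables instead of scanning; names here are illustrative):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def recall(query: list[float], memories: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k memories most similar to the query embedding."""
    ranked = sorted(memories,
                    key=lambda mid: cosine_similarity(query, memories[mid]),
                    reverse=True)
    return ranked[:k]
```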
```
src/ss/
├── server.py              # FastMCP server — 8 tools
├── config.py              # Configuration dataclass
├── blackboard/
│   ├── database.py        # SQLite connection, WAL mode, sqlite-vss, migrations
│   ├── schema.sql         # Full DDL — 13 tables, 14 indexes, 6 vss tables
│   ├── models.py          # Pydantic models (Session, Issue, Strategy, Taktik, ...)
│   └── repository.py      # CRUD operations per entity type
├── vectors/
│   ├── encoder.py         # Lazy-loading SentenceTransformer wrapper
│   └── store.py           # VectorStore with sqlite-vss (index, search, find_similar)
├── memory/
│   ├── manager.py         # Store, recall, supersede, distribute learnings
│   ├── client_profile.py  # Client identity tracking across sessions
│   └── cleanup.py         # Expiry, confidence decay, garbage collection
├── agents/
│   ├── base.py            # BaseAgent ABC — LLM calls, events, notes, memory recall
│   ├── reception.py       # Reception Agent (temp 0.1) — intake, classify, structure
│   ├── master.py          # Master Agent (temp 0.3) — orchestrate, teach, synthesise
│   ├── strategist.py      # Strategist Agent (temp 0.9) — creative strategy generation
│   ├── taktik_planner.py  # Taktik Planner Agent (temp 0.8) — step-by-step planning
│   ├── judge.py           # Judge Agent (temp 0.1) — verify for errors and gaps
│   ├── mission.py         # Mission Agent (temp 0.2) — execute and record results
│   └── jury.py            # Jury Agent (temp 0.2) — score and compare missions
├── pipeline/
│   ├── runner.py          # SessionRunner — full pipeline orchestration
│   ├── branch.py          # StrategyBranch — parallel Taktik → Judge → Mission
│   └── events.py          # EventType enum, event creation helpers
└── sampling/
    └── adapter.py         # MCP sampling wrapper with semaphore concurrency control
```
```
tests/
├── test_blackboard/    # Models, database, repository (59 tests)
├── test_vectors/       # Encoder, vector store (41 tests)
├── test_sampling/      # Sampling adapter (12 tests)
├── test_memory/        # Manager, cleanup (29 tests)
├── test_agents/        # Base agent + all 7 agents (82 tests)
├── test_pipeline/      # Events, branch, runner (16 tests)
└── test_integration/   # End-to-end pipeline (12 tests — full 239 total)
```
| Phase | Status |
|---|---|
| 1 — Core Pipeline (7 agents, blackboard, parallel fan-out) | Complete |
| 2 — Memory System (sqlite-vss, decay, supersession, learning) | Complete |
| 3 — Observability (event streaming, session inspection, agent notes) | Complete |
| 4 — Direct LLM Provider (fallback when MCP sampling unavailable) | Planned |
| 5 — Dashboard (web UI for session traces, memory browser, metrics) | Planned |
| 6 — Skill Discovery (awesome-agent-skills-mcp integration in Taktik Planner) | Planned |
See LICENSE for details.