Model Quartermaster

CortexPrism edited this page Jun 17, 2026 · 1 revision

Model Quartermaster (MQM)

MQM is a learning-based model selection engine that dynamically routes requests to the most appropriate LLM based on task characteristics, historical performance, cost constraints, and learned patterns.

Architecture

6-Signal Prediction Engine

| Signal | Source | Purpose |
| --- | --- | --- |
| Historical performance | Per-task category stats | Which model performed best for this task type |
| Episodic memory hits | Memory system | Similar past requests and their outcomes |
| Cost optimization | Provider cost data | Token/$ efficiency |
| Quality estimation | Reflection feedback | Confidence and correctness scores |
| Trajectory patterns | Recent model usage | Context from the current session |
| Reflection feedback | Post-turn analysis | Self-assessment of response quality |
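
One way to picture how six signals could feed a single prediction is a weighted average of per-signal scores. The sketch below is illustrative only: the signal keys, the `combine_signals` and `pick_model` helpers, and the assumption that each signal yields a score in [0, 1] are hypothetical and are not taken from the MQM source.

```python
from typing import Mapping, Tuple

# Hypothetical keys for the six signals listed above.
SIGNALS = (
    "historical_performance",
    "episodic_memory",
    "cost_optimization",
    "quality_estimation",
    "trajectory_patterns",
    "reflection_feedback",
)


def combine_signals(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of the available signal scores (assumed to lie in [0, 1])."""
    weighted_sum = 0.0
    total_weight = 0.0
    for name in SIGNALS:
        if name not in scores:
            continue  # a signal with no data simply does not contribute
        w = weights.get(name, 0.0)
        weighted_sum += w * scores[name]
        total_weight += w
    return weighted_sum / total_weight if total_weight > 0 else 0.0


def pick_model(
    candidates: Mapping[str, Mapping[str, float]],
    weights: Mapping[str, float],
) -> Tuple[str, float]:
    """Return the best-scoring candidate model and its combined score."""
    best = max(candidates, key=lambda m: combine_signals(candidates[m], weights))
    return best, combine_signals(candidates[best], weights)
```

In this sketch the combined score doubles as the confidence value that a later stage would compare against the decision thresholds.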

Decision Modes

| Mode | Confidence | Behavior |
| --- | --- | --- |
| enforce | > 0.85 | Override model selection entirely |
| suggest | > 0.65 and ≤ 0.85 | Inject hint into system prompt |
| defer | ≤ 0.65 | Use default provider |
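
The thresholds above translate directly into a small lookup. This is a minimal sketch; the function name `decision_mode` is hypothetical, and only the 0.85 and 0.65 cut-offs come from the table.

```python
def decision_mode(confidence: float) -> str:
    """Map a prediction confidence to one of the three decision modes."""
    if confidence > 0.85:
        return "enforce"  # override model selection entirely
    if confidence > 0.65:
        return "suggest"  # inject a hint into the system prompt
    return "defer"        # fall back to the default provider
```

Note that the boundaries are exclusive on the upper modes: a confidence of exactly 0.85 yields `suggest`, and exactly 0.65 yields `defer`.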

Adaptive Learning

Signal weights update via EMA:

new_weight = old_weight + learning_rate × (reward - old_weight)

The learning rate starts at 0.05 and decays with the number of observations: learning_rate = 0.05 × 0.995^observations
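
The update rule can be sketched in a few lines. The sketch assumes the decay is read as a base rate of 0.05 multiplied by 0.995 raised to the observation count; the function names are hypothetical.

```python
BASE_LEARNING_RATE = 0.05  # initial rate from the text above
DECAY = 0.995              # per-observation decay factor from the text above


def learning_rate(observations: int) -> float:
    """Learning rate after a given number of observations (assumed reading of the decay)."""
    return BASE_LEARNING_RATE * DECAY ** observations


def update_weight(old_weight: float, reward: float, observations: int) -> float:
    """EMA step: move the weight toward the reward by the current learning rate."""
    lr = learning_rate(observations)
    return old_weight + lr * (reward - old_weight)
```

With no prior observations, a weight of 0.5 and a reward of 1.0 move the weight by 0.05 × 0.5, giving 0.525. As observations accumulate the step shrinks, so later rewards shift the weights less than early ones.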

Observation-First Startup

MQM starts in observe-only mode. It records performance data for the first 50 LLM calls before activating and making predictions.
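
A warm-up gate of this kind can be sketched as a simple counter. The class name `ObservationGate` and its interface are hypothetical; only the 50-call threshold comes from the description above.

```python
class ObservationGate:
    """Stays in observe-only mode until enough LLM calls have been recorded."""

    def __init__(self, warmup_calls: int = 50) -> None:
        self.warmup_calls = warmup_calls
        self.calls_seen = 0

    def record_call(self) -> None:
        """Count one observed LLM call."""
        self.calls_seen += 1

    @property
    def active(self) -> bool:
        """True once the warm-up period is over and predictions may be made."""
        return self.calls_seen >= self.warmup_calls
```

A caller would record every LLM call and consult `active` before asking for a prediction, falling back to the default provider while it is still false.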

Arbiter Strategies

| Strategy | Preference | Confidence Required | Best For |
| --- | --- | --- | --- |
| conservative | Cheaper models | High | Cost-sensitive deployments |
| balanced | Cost/quality balance | Standard | General use (default) |
| aggressive | Highest quality | Lower | Quality-critical tasks |

Task Categorization

Requests are automatically classified into:

  • code — Code generation, debugging, refactoring
  • analysis — Data analysis, summarization, evaluation
  • creative — Writing, design, ideation
  • factual — Research, lookups, verification
  • conversation — Chat, clarification, general Q&A

Database Schema

5 tables in cortex.db (migration 019):

  • mqm_model_stats — Per-model performance metrics
  • mqm_signal_weights — Learned signal importance
  • mqm_decisions — Full audit trail per decision
  • mqm_session_state — Per-session tracking
  • mqm_patterns — Learned tool-sequence patterns

Pipeline Integration

MQM runs as a pipeline hook at the pre-llm and post-llm stages: it provides a model recommendation before each LLM call and records the outcome afterward.
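
The two-stage hook can be pictured roughly as follows. This is a sketch under assumptions: the `MqmHook` class, the request dictionary shape, and the `recommend` callback are all hypothetical and do not reflect the actual pipeline API.

```python
from typing import Callable, List, Optional, Tuple


class MqmHook:
    """Illustrative hook with one method per pipeline stage."""

    def __init__(self, recommend: Callable[[str], Optional[str]]) -> None:
        self._recommend = recommend  # maps a task category to a model name, or None
        self.outcomes: List[Tuple[str, str, float]] = []

    def pre_llm(self, request: dict) -> dict:
        """Before the call: swap in the recommended model if there is one."""
        model = self._recommend(request["task_category"])
        if model is not None:
            request = {**request, "model": model}
        return request

    def post_llm(self, request: dict, reward: float) -> None:
        """After the call: record which model handled which category, and how well."""
        self.outcomes.append((request["model"], request["task_category"], reward))
```

Keeping the recommendation and the recording in separate methods mirrors the split between the two stages: the first can change the request, the second only observes.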

CLI

cortex mqm stats            # Performance statistics per model
cortex mqm decisions        # Recent routing decisions
cortex mqm weights          # Current signal weights
cortex mqm accuracy         # Prediction accuracy metrics

Web UI

The Quartermaster unified page provides:

  • Tool Orchestration — QM patterns, decisions, stats
  • Model Intelligence — MQM model stats, accuracy trends, signal weights
  • Settings — Enable/disable, pin provider, choose strategy
