Model Quartermaster
MQM is a learning-based model selection engine that dynamically routes requests to the most appropriate LLM based on task characteristics, historical performance, cost constraints, and learned patterns.
| Signal | Source | Purpose |
|---|---|---|
| Historical performance | Per-task category stats | Which model performed best for this task type |
| Episodic memory hits | Memory system | Similar past requests and their outcomes |
| Cost optimization | Provider cost data | Token/$ efficiency |
| Quality estimation | Reflection feedback | Confidence and correctness scores |
| Trajectory patterns | Recent model usage | Context from the current session |
| Reflection feedback | Post-turn analysis | Self-assessment of response quality |
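One way to picture how these signals could feed a single routing confidence is a weighted average. This is a minimal sketch, not MQM's confirmed scoring rule: the signal keys, the `combineSignals` function, and the assumption that each signal is normalized to the range 0 to 1 are all hypothetical.

```typescript
// Hypothetical short names for the six signals in the table above.
type SignalName =
  | "historical"
  | "episodic"
  | "cost"
  | "quality"
  | "trajectory"
  | "reflection";

type SignalValues = Record<SignalName, number>;

// Assumption: each score is in [0, 1] and weights are non-negative.
// Returns the weighted average of the scores, or 0 if every weight is 0.
function combineSignals(scores: SignalValues, weights: SignalValues): number {
  let weightedSum = 0;
  let weightTotal = 0;
  for (const name of Object.keys(scores) as SignalName[]) {
    weightedSum += weights[name] * scores[name];
    weightTotal += weights[name];
  }
  return weightTotal > 0 ? weightedSum / weightTotal : 0;
}
```

Normalizing by the total weight keeps the result in the same 0 to 1 range as the inputs, which makes it directly comparable to confidence thresholds.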
| Mode | Confidence | Behavior |
|---|---|---|
| `enforce` | > 0.85 | Override model selection entirely |
| `suggest` | > 0.65 | Inject hint into system prompt |
| `defer` | ≤ 0.65 | Use default provider |
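The threshold table maps directly onto a small function. The sketch below uses the thresholds stated above; the names `RoutingMode` and `routingMode` are illustrative and not taken from the MQM source.

```typescript
type RoutingMode = "enforce" | "suggest" | "defer";

// Thresholds from the mode table.
const ENFORCE_THRESHOLD = 0.85;
const SUGGEST_THRESHOLD = 0.65;

// Maps a confidence value to a routing mode (hypothetical helper).
function routingMode(confidence: number): RoutingMode {
  if (confidence > ENFORCE_THRESHOLD) return "enforce"; // override the model
  if (confidence > SUGGEST_THRESHOLD) return "suggest"; // hint in system prompt
  return "defer"; // fall back to the default provider
}
```

Both comparisons are strict, so a confidence of exactly 0.85 yields `suggest` and exactly 0.65 yields `defer`, matching the table.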
Signal weights update via EMA:
new_weight = old_weight + learning_rate × (reward - old_weight)
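The learning rate decays as observations accumulate: it starts at 0.05 and shrinks by a factor of 0.995 per observation (read here as 0.05 × 0.995^observations).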
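The update rule can be sketched in a few lines. This assumes the decay schedule means `0.05 × 0.995^observations`, which is an interpretation of the notation above rather than a confirmed detail; the function names are illustrative.

```typescript
const INITIAL_LEARNING_RATE = 0.05;
const DECAY_FACTOR = 0.995;

// Assumed schedule: 0.05 × 0.995^observations.
function learningRate(observations: number): number {
  return INITIAL_LEARNING_RATE * Math.pow(DECAY_FACTOR, observations);
}

// EMA step: new_weight = old_weight + learning_rate × (reward - old_weight)
function updateWeight(
  oldWeight: number,
  reward: number,
  observations: number,
): number {
  return oldWeight + learningRate(observations) * (reward - oldWeight);
}
```

Under this reading, early rewards move a weight by up to 5% of the gap between reward and weight, and later rewards move it progressively less, so the weights settle as evidence accumulates.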
MQM starts in observe-only mode. It records performance data for the first 50 LLM calls before activating and making predictions.
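The warm-up behavior amounts to a counter that gates predictions. A minimal sketch follows, assuming MQM becomes active once 50 calls have been recorded; the `WarmupGate` class is hypothetical.

```typescript
const WARMUP_CALLS = 50; // observe-only window stated in the text

// Hypothetical gate: counts observed LLM calls and reports whether
// predictions may be made yet.
class WarmupGate {
  private observed = 0;

  // Call once per completed LLM call while recording performance data.
  recordCall(): void {
    this.observed += 1;
  }

  // True once the observe-only window has been filled.
  isActive(): boolean {
    return this.observed >= WARMUP_CALLS;
  }
}
```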
| Strategy | Preference | Confidence Required | Best For |
|---|---|---|---|
| `conservative` | Cheaper models | High | Cost-sensitive deployments |
| `balanced` | Cost/quality balance | Standard | General use (default) |
| `aggressive` | Highest quality | Lower | Quality-critical tasks |
Requests are automatically classified into:
- `code`: Code generation, debugging, refactoring
- `analysis`: Data analysis, summarization, evaluation
- `creative`: Writing, design, ideation
- `factual`: Research, lookups, verification
- `conversation`: Chat, clarification, general Q&A
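The five categories can be modeled as a union type. The source does not describe how classification actually works, so the keyword rules below are a purely illustrative stand-in and `classifyTask` is a hypothetical name.

```typescript
type TaskCategory =
  | "code"
  | "analysis"
  | "creative"
  | "factual"
  | "conversation";

// Hypothetical keyword rules; the real classifier is not documented here.
const KEYWORD_RULES: Array<[TaskCategory, RegExp]> = [
  ["code", /\b(debug|refactor|function|compile|stack trace)\b/i],
  ["analysis", /\b(analy[sz]e|summari[sz]e|evaluate|compare)\b/i],
  ["creative", /\b(story|poem|brainstorm|design)\b/i],
  ["factual", /\b(who|when|where|verify|look up)\b/i],
];

// Returns the first matching category, falling back to "conversation".
function classifyTask(prompt: string): TaskCategory {
  for (const [category, pattern] of KEYWORD_RULES) {
    if (pattern.test(prompt)) return category;
  }
  return "conversation";
}
```

Whatever the real mechanism is, a closed union type like this lets per-category statistics be keyed safely by task type.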
5 tables in cortex.db (migration 019):
- `mqm_model_stats`: Per-model performance metrics
- `mqm_signal_weights`: Learned signal importance
- `mqm_decisions`: Full audit trail per decision
- `mqm_session_state`: Per-session tracking
- `mqm_patterns`: Learned tool-sequence patterns
MQM runs as a pipeline hook at pre-llm and post-llm stages, providing model recommendations before each LLM call and recording outcomes afterward.
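The two-stage hook can be sketched as an interface with one method per stage. Every type and name below (`LlmRequest`, `LlmOutcome`, `PipelineHook`, `RecordingHook`) is hypothetical; the actual CortexPrism hook API is not shown in this page.

```typescript
// Hypothetical request and outcome shapes.
interface LlmRequest {
  sessionId: string;
  prompt: string;
  model: string;
}

interface LlmOutcome {
  model: string;
  reward: number; // assumed quality score in [0, 1]
}

interface PipelineHook {
  preLlm(request: LlmRequest): LlmRequest; // runs at the pre-llm stage
  postLlm(request: LlmRequest, outcome: LlmOutcome): void; // post-llm stage
}

// Applies a model recommendation before the call and records the outcome after.
class RecordingHook implements PipelineHook {
  readonly outcomes: LlmOutcome[] = [];
  private readonly recommend: (request: LlmRequest) => string | undefined;

  constructor(recommend: (request: LlmRequest) => string | undefined) {
    this.recommend = recommend;
  }

  preLlm(request: LlmRequest): LlmRequest {
    const model = this.recommend(request);
    // No recommendation means the request passes through unchanged.
    return model ? { ...request, model } : request;
  }

  postLlm(_request: LlmRequest, outcome: LlmOutcome): void {
    this.outcomes.push(outcome);
  }
}
```

Returning a new request object rather than mutating the input keeps the pre-stage side-effect free, which makes it easier to audit what the hook changed.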
    cortex mqm stats      # Performance statistics per model
    cortex mqm decisions  # Recent routing decisions
    cortex mqm weights    # Current signal weights
    cortex mqm accuracy   # Prediction accuracy metrics

The Quartermaster unified page provides:
- Tool Orchestration — QM patterns, decisions, stats
- Model Intelligence — MQM model stats, accuracy trends, signal weights
- Settings — Enable/disable, pin provider, choose strategy
- Model Routing — Cascade and threshold routing strategies
- LLM Providers — All 24 supported providers
- Configuration — MQM config options
CortexPrism — Open-source agentic AI harness · MIT License · Built with Deno 2.x + TypeScript