Model Quartermaster

Model Quartermaster (MQM)

MQM is a learning-based model selection engine that dynamically routes requests to the most appropriate LLM based on task characteristics, historical performance, cost constraints, and learned patterns.

Architecture

6-Signal Prediction Engine

Signal	Source	Purpose
Historical performance	Per-task category stats	Which model performed best for this task type
Episodic memory hits	Memory system	Similar past requests and their outcomes
Cost optimization	Provider cost data	Token/$ efficiency
Quality estimation	Reflection feedback	Confidence and correctness scores
Trajectory patterns	Recent model usage	Context from the current session
Reflection feedback	Post-turn analysis	Self-assessment of response quality
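One way to picture how six signals could feed a single prediction is a weighted average per candidate model. The sketch below is illustrative only: the signal keys, the `combine_signals` and `pick_model` helpers, and the assumption that every signal is normalized to a score in [0, 1] are hypothetical and not taken from the MQM source.

```python
# Hypothetical signal keys mirroring the six rows of the table above.
SIGNALS = (
    "historical_performance",
    "episodic_memory",
    "cost_optimization",
    "quality_estimation",
    "trajectory_patterns",
    "reflection_feedback",
)


def combine_signals(scores: dict, weights: dict) -> float:
    """Weighted average of the available per-signal scores (assumed to lie in [0, 1])."""
    weighted_sum = 0.0
    total_weight = 0.0
    for name in SIGNALS:
        if name in scores:  # signals with no data are simply skipped
            weight = weights.get(name, 1.0)
            weighted_sum += weight * scores[name]
            total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else 0.0


def pick_model(candidates: dict, weights: dict) -> tuple:
    """Return (model_name, combined_score) for the best-scoring candidate."""
    best = max(candidates, key=lambda model: combine_signals(candidates[model], weights))
    return best, combine_signals(candidates[best], weights)
```

In this sketch the combined score doubles as the confidence value that the decision modes compare against their thresholds; whether MQM derives confidence this way is an assumption.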

Decision Modes

Mode	Confidence	Behavior
`enforce`	> 0.85	Override model selection entirely
`suggest`	> 0.65 and ≤ 0.85	Inject hint into system prompt
`defer`	≤ 0.65	Use default provider
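The thresholds in the table map directly onto a small function. This is a minimal sketch; the function name `decision_mode` is hypothetical, and only the two cut-offs (0.85 and 0.65) come from the table.

```python
def decision_mode(confidence: float) -> str:
    """Map a prediction confidence to one of the three decision modes."""
    if confidence > 0.85:
        return "enforce"  # override model selection entirely
    if confidence > 0.65:
        return "suggest"  # inject a hint into the system prompt
    return "defer"        # fall back to the default provider
```

Note that the boundaries are exclusive on the upper modes: a confidence of exactly 0.85 yields `suggest`, and exactly 0.65 yields `defer`.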

Adaptive Learning

Signal weights are updated with an exponential moving average (EMA):

new_weight = old_weight + learning_rate × (reward - old_weight)

The learning rate starts at 0.05 and decays with each observation: learning_rate = 0.05 × 0.995^observations
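The update rule can be sketched in a few lines. The code assumes the decay means the rate starts at 0.05 and is multiplied by 0.995 for each observation seen so far; the constant and function names are hypothetical.

```python
BASE_LEARNING_RATE = 0.05  # initial rate, from the text above
DECAY = 0.995              # per-observation decay factor (assumed interpretation)


def learning_rate(observations: int) -> float:
    """Learning rate after a given number of observations."""
    return BASE_LEARNING_RATE * DECAY ** observations


def update_weight(old_weight: float, reward: float, observations: int) -> float:
    """EMA step: move the weight toward the reward by the current learning rate."""
    rate = learning_rate(observations)
    return old_weight + rate * (reward - old_weight)
```

Because the rate only shrinks, early observations move a weight the most and later ones fine-tune it. Under this reading, a weight of 0.5 receiving a reward of 1.0 at zero observations moves by 0.05 × 0.5 = 0.025.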

Observation-First Startup

MQM starts in observe-only mode. It records performance data for the first 50 LLM calls before activating and making predictions.
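The warm-up behavior amounts to a counter that gates predictions. The class below is a hypothetical sketch, not MQM's implementation; only the figure of 50 calls comes from the text.

```python
class ObservationGate:
    """Stay in observe-only mode until enough LLM calls have been recorded."""

    def __init__(self, warmup_calls: int = 50):
        self.warmup_calls = warmup_calls
        self.calls_seen = 0

    def record_call(self) -> None:
        """Count one observed LLM call (performance data would be stored here)."""
        self.calls_seen += 1

    @property
    def active(self) -> bool:
        """True once the warm-up window has been fully observed."""
        return self.calls_seen >= self.warmup_calls
```

A caller would check `gate.active` before asking for a prediction and call `gate.record_call()` after every LLM call, regardless of mode.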

Arbiter Strategies

Strategy	Preference	Confidence Required	Best For
`conservative`	Cheaper models	High	Cost-sensitive deployments
`balanced`	Cost/quality balance	Standard	General use (default)
`aggressive`	Highest quality	Lower	Quality-critical tasks

Task Categorization

Requests are automatically classified into:

code — Code generation, debugging, refactoring
analysis — Data analysis, summarization, evaluation
creative — Writing, design, ideation
factual — Research, lookups, verification
conversation — Chat, clarification, general Q&A

Database Schema

5 tables in cortex.db (migration 019):

mqm_model_stats — Per-model performance metrics
mqm_signal_weights — Learned signal importance
mqm_decisions — Full audit trail per decision
mqm_session_state — Per-session tracking
mqm_patterns — Learned tool-sequence patterns
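To confirm that the migration has been applied, one could check that all five tables exist. The sketch below assumes cortex.db is a SQLite database, which the text does not state; the helper name `missing_mqm_tables` is hypothetical, and no column layout is assumed.

```python
import sqlite3

# Table names as listed above.
MQM_TABLES = (
    "mqm_model_stats",
    "mqm_signal_weights",
    "mqm_decisions",
    "mqm_session_state",
    "mqm_patterns",
)


def missing_mqm_tables(conn: sqlite3.Connection) -> list:
    """Return the MQM tables that are not present in the connected database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    present = {name for (name,) in rows}
    return [table for table in MQM_TABLES if table not in present]
```

Usage would look like `missing_mqm_tables(sqlite3.connect("cortex.db"))`, where the file path is also an assumption; an empty list means all five tables are in place.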

Pipeline Integration

MQM runs as a pipeline hook at pre-llm and post-llm stages, providing model recommendations before each LLM call and recording outcomes afterward.
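The two-stage hook can be sketched as an object with one method per stage. Everything here is hypothetical: the `MqmHook` and `Outcome` names, the method signatures, and the single numeric `reward` are illustrative assumptions rather than the actual pipeline API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Outcome:
    """What the post-llm stage records about one call (fields are assumed)."""
    model: str
    task_category: str
    reward: float


@dataclass
class MqmHook:
    recommend: Callable[[str], Optional[str]]  # task category -> model name, or None
    default_model: str
    outcomes: list = field(default_factory=list)

    def pre_llm(self, task_category: str) -> str:
        """Before the call: use the recommendation if there is one, else the default."""
        return self.recommend(task_category) or self.default_model

    def post_llm(self, model: str, task_category: str, reward: float) -> None:
        """After the call: record the outcome so later predictions can learn from it."""
        self.outcomes.append(Outcome(model, task_category, reward))
```

Keeping the recommender as a plain callable means the hook itself stays trivial, and the prediction logic can be swapped or disabled (for example during the observe-only phase) without touching the pipeline wiring.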

CLI

cortex mqm stats            # Performance statistics per model
cortex mqm decisions        # Recent routing decisions
cortex mqm weights          # Current signal weights
cortex mqm accuracy         # Prediction accuracy metrics
