Add embedding-based semantic intent routing policy #39

@carlosatta

Summary

Introduce a semantic routing layer that classifies incoming requests by intent using embeddings, before applying the existing complexity-based LLM policy.

The goal is to improve routing accuracy by first understanding the nature of the task, not only its complexity.

Problem

The current routing flow can estimate request complexity, but it does not explicitly identify the task domain.

This can lead to suboptimal model selection in cases such as:

  • SQL or data tasks routed to generic models
  • rewrite / extraction tasks sent to overly expensive models
  • tool-oriented tasks not separated from general chat

Goal

Add a new routing policy that can:

  • classify a request into a semantic intent
  • map that intent to a candidate model pool
  • pass the narrowed pool to the existing complexity policy
  • support fallback for unknown or ambiguous classifications

Proposed behavior

High-level flow:

  1. Receive user request
  2. Generate embedding for the request text
  3. Compare it against predefined intent embeddings
  4. Select the intent with the highest similarity score
  5. Resolve the candidate model pool for that intent
  6. Run the existing LLM complexity policy only within that pool
  7. Select final model
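
The flow above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: `toy_embed` is a hypothetical stand-in for a real embedding provider (it just counts words), each intent is represented by the mean of its example embeddings (the issue does not specify how intent embeddings are built), and `pick_by_complexity` is a placeholder for the existing complexity policy.

```python
import math
from collections import Counter

# Intent definitions copied from the example configuration in this issue.
INTENTS = {
    "coding": {
        "examples": [
            "write a python function to reverse a linked list",
            "fix this typescript bug",
            "generate unit tests for this class",
        ],
        "candidate_models": ["qwen-coder", "claude-sonnet", "claude-opus"],
    },
    "sql_data": {
        "examples": [
            "write a sql query to find top customers by revenue",
            "fix this postgres join",
            "convert this question into sql",
        ],
        "candidate_models": ["deepseek-chat", "claude-sonnet", "claude-opus"],
    },
}


def toy_embed(text):
    """Hypothetical stand-in for an embedding provider: sparse word counts."""
    return dict(Counter(text.lower().split()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Mean of several sparse vectors (assumed way to build an intent embedding)."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {k: v / len(vectors) for k, v in total.items()}


def pick_by_complexity(request, pool):
    """Placeholder for the existing complexity policy; here it takes the first model."""
    return pool[0]


def route(request):
    request_vec = toy_embed(request)                          # step 2
    scores = {
        name: cosine(request_vec, centroid([toy_embed(e) for e in spec["examples"]]))
        for name, spec in INTENTS.items()
    }                                                         # step 3
    top_intent = max(scores, key=scores.get)                  # step 4
    pool = INTENTS[top_intent]["candidate_models"]            # step 5
    return top_intent, pick_by_complexity(request, pool)      # steps 6-7
```

In a real router the intent embeddings would presumably be computed once at startup and cached, rather than recomputed per request as in this sketch.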

Initial intents

Suggested initial intent categories:

  • coding
  • sql_data
  • tool_use
  • reasoning
  • general_chat
  • rewrite_extract
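
One possible way to pin these categories down in code is an enumeration, so that config keys can be validated against a known set. The `Intent` class and `parse_intent` helper below are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Optional


class Intent(str, Enum):
    """The suggested initial intent categories from this issue."""
    CODING = "coding"
    SQL_DATA = "sql_data"
    TOOL_USE = "tool_use"
    REASONING = "reasoning"
    GENERAL_CHAT = "general_chat"
    REWRITE_EXTRACT = "rewrite_extract"


def parse_intent(name: str) -> Optional[Intent]:
    """Return the matching Intent, or None if the name is not a known category."""
    try:
        return Intent(name)
    except ValueError:
        return None
```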

Functional requirements

  • Add a new routing policy: semantic_intent
  • Support pluggable embedding providers
  • Support local and remote embedding backends
  • Store intent definitions and example utterances in config
  • Compute similarity between request embedding and intent embeddings
  • Return:
    • top intent
    • top score
    • second-best intent
    • margin between first and second
    • classification status: confident, ambiguous, unknown
  • Map each intent to a candidate model pool
  • Allow configurable fallback behavior for ambiguous and unknown classifications
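
A rough sketch of what the provider interface and the returned classification could look like. All names here (`EmbeddingProvider`, `IntentClassification`, `classify_scores`) are hypothetical. The status rule is also an assumption inferred from the config key names: `unknown` when the top score is below `absolute_threshold`, `ambiguous` when the margin to the runner-up is below `ambiguity_threshold`, and `confident` otherwise.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


class EmbeddingProvider(Protocol):
    """Hypothetical interface that a local or remote backend would implement."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


@dataclass(frozen=True)
class IntentClassification:
    top_intent: str
    top_score: float
    second_intent: Optional[str]
    margin: float
    status: str  # "confident", "ambiguous", or "unknown"


def classify_scores(
    scores: dict[str, float],
    absolute_threshold: float = 0.60,
    ambiguity_threshold: float = 0.08,
) -> IntentClassification:
    """Turn per-intent similarity scores into a classification result."""
    if not scores:
        raise ValueError("at least one intent score is required")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_intent, top_score = ranked[0]
    if len(ranked) > 1:
        second_intent, second_score = ranked[1]
    else:
        # Assumption: with a single intent, treat the runner-up score as 0.
        second_intent, second_score = None, 0.0
    margin = top_score - second_score

    # Assumed rule: check the absolute threshold first, then the margin.
    if top_score < absolute_threshold:
        status = "unknown"
    elif margin < ambiguity_threshold:
        status = "ambiguous"
    else:
        status = "confident"
    return IntentClassification(top_intent, top_score, second_intent, margin, status)
```

Checking the absolute threshold before the margin means a request that matches nothing well is reported as `unknown` even if two weak intents happen to be close together; the opposite ordering would also be defensible.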

Configuration

Example configuration:

{
  "routing": {
    "policy": "semantic_intent",
    "embedding_provider": "openai",
    "embedding_model": "text-embedding-3-small",
    "absolute_threshold": 0.60,
    "ambiguity_threshold": 0.08,
    "fallback_policy": "llm_complexity"
  },
  "intents": {
    "coding": {
      "examples": [
        "write a python function to reverse a linked list",
        "fix this typescript bug",
        "generate unit tests for this class"
      ],
      "candidate_models": ["qwen-coder", "claude-sonnet", "claude-opus"]
    },
    "sql_data": {
      "examples": [
        "write a sql query to find top customers by revenue",
        "fix this postgres join",
        "convert this question into sql"
      ],
      "candidate_models": ["deepseek-chat", "claude-sonnet", "claude-opus"]
    }
  }
}
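
As an illustration of how a router might consume this configuration, the sketch below resolves a candidate pool from a classification outcome. `resolve_pool` and `all_models` are hypothetical helpers, the embedded config is a trimmed copy of the example (the `examples` lists are omitted), and the fallback behavior is an assumption: for `ambiguous` or `unknown` results, the complexity policy is given the un-narrowed union of all configured models.

```python
import json

# Trimmed copy of the example configuration ("examples" omitted for brevity).
CONFIG_JSON = """
{
  "routing": {
    "policy": "semantic_intent",
    "absolute_threshold": 0.60,
    "ambiguity_threshold": 0.08,
    "fallback_policy": "llm_complexity"
  },
  "intents": {
    "coding": {
      "candidate_models": ["qwen-coder", "claude-sonnet", "claude-opus"]
    },
    "sql_data": {
      "candidate_models": ["deepseek-chat", "claude-sonnet", "claude-opus"]
    }
  }
}
"""


def all_models(config):
    """Union of every intent's candidate models, in first-seen order."""
    seen = []
    for spec in config["intents"].values():
        for model in spec["candidate_models"]:
            if model not in seen:
                seen.append(model)
    return seen


def resolve_pool(config, intent, status):
    """Return the model pool that the complexity policy should choose from."""
    if status == "confident" and intent in config["intents"]:
        return list(config["intents"][intent]["candidate_models"])
    # Assumed fallback: hand the full, un-narrowed pool to the complexity policy.
    return all_models(config)


config = json.loads(CONFIG_JSON)
```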

Labels

enhancement (new feature or request)