# Arch-Router Multimodal Routing Example

This notebook demonstrates how to use [Arch-Router-1.5B](https://huggingface.co/katanemo/Arch-Router-1.5B) for intelligent request routing in a multimodal LLM system.

## Overview

**Arch-Router** is a lightweight model designed to classify user intents and route requests to the most appropriate backend model. It supports:

- **Hard questions** → Route to powerful reasoning models (e.g., GPT-5)
- **Chit-chat** → Route to efficient conversational models (e.g., Nemotron)
- **Image understanding** → Route to vision-language models (e.g., Nemotron-VL)
- **Retry requests** → Handle cases where users indicate previous answers were incorrect

## Architecture

```
User Request → Arch-Router → Intent Classification → Model Selection → Response
                    ↓
            [hard_question, chit_chat, image_understanding, image_question, try_again, other]
```

---

## Prerequisites


In [None]:
%%capture
import sys
python = sys.executable

!{python} -m ensurepip --upgrade
!{python} -m pip install --upgrade pip setuptools wheel
!{python} -m pip install --upgrade --force-reinstall python-dotenv
%pip install aiohttp

Restart the kernel so that the newly installed packages are picked up.

In [None]:
from IPython.display import clear_output
clear_output(wait=True)

import IPython
IPython.Application.instance().kernel.do_shutdown(True)

Clone the repository, or skip this cell if you are using the Brev launchable, which already has the source code.

In [None]:
!git clone https://github.com/NVIDIA-AI-Blueprints/llm-router.git
!cd llm-router && git checkout experimental

In [None]:
import os
os.chdir('../llm-router')

Before running this notebook, you need to deploy the Arch-Router model using vLLM. Run the following Docker command on a Linux machine with GPU support:

In [None]:
# =====================================================
# PREREQUISITE: Deploy Arch-Router with Docker (Linux)
# =====================================================
# Run this command in your terminal BEFORE running this notebook.
# The router model must be running on a GPU server.

!docker run -d --rm --runtime nvidia --gpus "device=0" \
    --name arch_router \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8011:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model katanemo/Arch-Router-1.5B

# After starting, verify it's running:
# curl http://localhost:8011/health

print("✓ Make sure the Arch-Router Docker container is running before proceeding.")
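
`docker run -d` returns as soon as the container starts, while vLLM may still be downloading and loading the model, so the `/health` endpoint can be unreachable for a while. The sketch below polls until the server answers. It is not part of the original notebook: `http_health_probe` and `wait_until_healthy` are hypothetical helper names, and the timeout values are arbitrary assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_health_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True when a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return False


def wait_until_healthy(probe: Callable[[], bool],
                       max_wait_s: float = 300.0,
                       interval_s: float = 5.0) -> bool:
    """Call `probe` repeatedly until it succeeds or `max_wait_s` elapses."""
    deadline = time.monotonic() + max_wait_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example call (assumes the port mapping from the Docker command above):
# wait_until_healthy(lambda: http_health_probe("http://localhost:8011/health"))
```

The probe is passed in as a callable so the polling loop can be exercised without a running server.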

---

## 1. Setup and Imports

In [None]:
%pip install uv

In [None]:
!uv pip install .

First, we set up the Python path and import required libraries.

In [None]:
import sys
import os

# Standard library imports
import json
import asyncio
import time
import logging
from typing import Any, Dict, List, Tuple, Optional
from functools import lru_cache

# Third-party imports
import requests
from pydantic import Field
from transformers import AutoTokenizer

# NAT Framework imports
from nat.builder.builder import Builder
from nat.builder.function_info import FunctionInfo
from nat.cli.register_workflow import register_function
from nat.data_models.function import FunctionBaseConfig

# Configure logging for the notebook
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

print("✓ Standard imports loaded successfully")
print("✓ NAT Framework imports loaded successfully")

## 2. Configuration

Set up the remote model endpoint and model name. The router uses a vLLM server hosting the Arch-Router model.

In [None]:
# =====================================================
# Remote Model Configuration
# =====================================================

# Set your Arch-Router endpoint (vLLM server URL)
# Default: http://localhost:8011 if running locally via Docker
REMOTE_MODEL_URL = os.getenv("ARCH_ROUTER_ENDPOINT", "http://localhost:8011")
MODEL_NAME = "katanemo/Arch-Router-1.5B"

# Global tokenizer (lazy loaded)
tokenizer = None

print(f"✓ Configuration set:")
print(f"  - Remote Model URL: {REMOTE_MODEL_URL}")
print(f"  - Model Name: {MODEL_NAME}")
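
Later cells build URLs such as `f"{REMOTE_MODEL_URL}/health"`, so a trailing slash in `ARCH_ROUTER_ENDPOINT` yields `http://localhost:8011//health`, and a value without a scheme is rejected by `requests`. A small validation sketch, assuming you want to fail fast on a malformed value; `resolve_endpoint` is a hypothetical helper, not part of the repository.

```python
import os
from urllib.parse import urlparse


def resolve_endpoint(env_var: str = "ARCH_ROUTER_ENDPOINT",
                     default: str = "http://localhost:8011") -> str:
    """Read the endpoint from the environment, validate it, and strip trailing slashes."""
    raw = os.getenv(env_var, default).strip()
    parsed = urlparse(raw)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"{env_var} must look like http(s)://host:port, got {raw!r}")
    return raw.rstrip("/")
```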

## 3. Tokenizer and Health Check Functions

Utility functions to load the tokenizer and verify the remote model is available.

In [None]:
def _load_tokenizer():
    """Lazy load the tokenizer on first use."""
    global tokenizer
    if tokenizer is None:
        logger.info(f"Loading tokenizer for {MODEL_NAME}...")
        try:
            tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
            logger.info("Tokenizer loaded successfully")
        except Exception as e:
            logger.error(f"Failed to load tokenizer: {e}")
            raise
    return tokenizer


def _check_remote_model():
    """Check if remote model is available."""
    try:
        response = requests.get(f"{REMOTE_MODEL_URL}/health", timeout=5)
        if response.status_code == 200:
            logger.info(f"Remote model at {REMOTE_MODEL_URL} is available")
            return True
        # Treat any non-200 status as unavailable
        logger.error(f"Remote model at {REMOTE_MODEL_URL} returned HTTP {response.status_code}")
        return False
    except Exception as e:
        logger.error(f"Remote model at {REMOTE_MODEL_URL} is not available: {e}")
        return False

print("✓ Tokenizer and health check functions defined")

## 4. Prompt Templates

The Arch-Router model uses specific prompt templates for optimal performance. These templates structure the conversation and route definitions for the model to analyze.

In [None]:
# =====================================================
# Prompt Templates (Use as provided for best performance)
# =====================================================

TASK_INSTRUCTION = """
You are a helpful assistant designed to find the best suited route.
You are provided with route description within <routes></routes> XML tags:
<routes>

{routes}

</routes>

<conversation>

{conversation}

</conversation>
"""

FORMAT_PROMPT = """
Your task is to decide which route is best suit with user intent on the conversation in <conversation></conversation> XML tags.  Follow the instruction:
1. If the latest intent from user is irrelevant or user intent is full filled, response with other route {"route": "other"}.
2. You must analyze the route descriptions and find the best match route for user latest intent. 
3. You only response the name of the route that best matches the user's request, use the exact name in the <routes></routes>.

Based on your analysis, provide your response in the following JSON formats if you decide to match any route:
{"route": "route_name"} 
"""

print("✓ Prompt templates defined")
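
Note that `FORMAT_PROMPT` contains literal JSON braces (`{"route": "other"}`), so only `TASK_INSTRUCTION` can safely be passed through `str.format`; the format instructions have to be appended afterwards as plain text. The sketch below illustrates this with abbreviated stand-in templates (`TASK_TEMPLATE` and `FORMAT_SUFFIX` are hypothetical and much shorter than the real prompts).

```python
import json

# Abbreviated stand-ins for TASK_INSTRUCTION and FORMAT_PROMPT
TASK_TEMPLATE = "<routes>\n{routes}\n</routes>\n<conversation>\n{conversation}\n</conversation>\n"
FORMAT_SUFFIX = 'Respond with JSON such as {"route": "route_name"}\n'

routes = [{"name": "chit_chat", "description": "Casual conversation."}]
conversation = [{"role": "user", "content": "Hi there!"}]

# Works: only the template with named placeholders is formatted
prompt = TASK_TEMPLATE.format(
    routes=json.dumps(routes),
    conversation=json.dumps(conversation),
) + FORMAT_SUFFIX

# Fails: formatting the concatenated string trips over the literal JSON braces
try:
    (TASK_TEMPLATE + FORMAT_SUFFIX).format(routes="[]", conversation="[]")
    formatted_whole = True
except (KeyError, ValueError):
    formatted_whole = False

print(prompt)
print("Whole-string format succeeded:", formatted_whole)
```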

## 5. JSON Encoder Utility

Custom JSON encoder that handles Pydantic models and other non-serializable objects commonly used in the routing pipeline.

In [None]:
class PydanticEncoder(json.JSONEncoder):
    """Custom JSON encoder for Pydantic models and non-serializable objects."""
    
    def default(self, obj):
        # Handle Pydantic models
        if hasattr(obj, 'model_dump'):
            return obj.model_dump()
        # Handle dict-like objects
        if hasattr(obj, '__dict__'):
            return obj.__dict__
        # Handle iterables (except strings)
        if hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes)):
            try:
                return list(obj)
            except TypeError:
                pass
        return super().default(obj)

print("✓ PydanticEncoder class defined")
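
To see what the encoder buys you, the sketch below serializes an object that exposes `model_dump()` and a generator, neither of which plain `json.dumps` accepts. `FakeMessage` is a hypothetical stand-in for a Pydantic model, and `DemoEncoder` is a condensed copy of the class above so that the example runs on its own.

```python
import json


class FakeMessage:
    """Hypothetical stand-in for a Pydantic model: it only exposes model_dump()."""

    def __init__(self, role: str, content: str):
        self.role = role
        self.content = content

    def model_dump(self) -> dict:
        return {"role": self.role, "content": self.content}


class DemoEncoder(json.JSONEncoder):
    """Condensed version of PydanticEncoder for illustration."""

    def default(self, obj):
        # default() is only called for objects json cannot serialize natively
        if hasattr(obj, "model_dump"):
            return obj.model_dump()
        if hasattr(obj, "__iter__") and not isinstance(obj, (str, bytes)):
            return list(obj)
        return super().default(obj)


payload = {
    "messages": [FakeMessage("user", "hello")],
    "tags": (tag for tag in ["a", "b"]),  # a generator, not a list
}
encoded = json.dumps(payload, cls=DemoEncoder)
print(encoded)
```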

## 6. Route Configuration

Define the available routes and their descriptions. Each route maps to a specific intent type that the router will classify incoming requests into.

The `MAP_INTENT_TO_PIPELINE` dictionary maps classified intents to the appropriate backend model.

In [None]:
# =====================================================
# Route Configuration
# =====================================================

# Define available routes with their descriptions
route_config = [
    {
        "name": "hard_question",
        "description": "A question that requires deep reasoning, or complex problem solving, or if the user asks for careful thinking or careful consideration",
    },
    {
        "name": "chit_chat",
        "description": "Any social chit chat, small talk, or casual conversation.",
    },
    {
        "name": "try_again",
        "description": "Only if the user explicitly says the previous answer was incorrect or incomplete.",
    },
    {
        "name": "image_understanding",
        "description": "A question that requires understanding an image.",
    },
    {
        "name": "image_question",
        "description": "A question that requires the assistant to see the user eg a question about their appearance, environment, scene or surroundings.",
    },
]

# Pre-compute routes JSON once to avoid repeated serialization
_ROUTES_JSON_CACHED = json.dumps(route_config, cls=PydanticEncoder)

# Map classified intents to backend models
MAP_INTENT_TO_PIPELINE = {
    "other": "nvidia/nvidia-nemotron-nano-9b-v2",
    "chit_chat": "nvidia/nvidia-nemotron-nano-9b-v2",
    "hard_question": "gpt-5-chat",
    "image_understanding": "nvidia/nemotron-nano-12b-v2-vl",
    "image_question": "nvidia/nemotron-nano-12b-v2-vl",
    "try_again": "gpt-5-chat",
}

print("✓ Route configuration defined:")
for route in route_config:
    print(f"  - {route['name']}: {route['description'][:50]}...")
print(f"\n✓ Intent-to-model mapping:")
for intent, model in MAP_INTENT_TO_PIPELINE.items():
    print(f"  - {intent} → {model}")
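
Because the route list and the intent-to-model mapping are maintained separately, a route added to one but not the other would only surface as a `KeyError` at request time. A small consistency check, sketched under the assumption that `other` must always be mapped because the prompt can return it; `find_unmapped_routes` is a hypothetical helper.

```python
from typing import Dict, List


def find_unmapped_routes(routes: List[Dict[str, str]],
                         intent_to_model: Dict[str, str]) -> List[str]:
    """Return route names (plus the implicit 'other') that have no backend model."""
    expected = {route["name"] for route in routes} | {"other"}
    return sorted(expected - set(intent_to_model))


# Illustrative data, deliberately missing a mapping for 'try_again'
sample_routes = [
    {"name": "chit_chat", "description": "Casual conversation."},
    {"name": "try_again", "description": "User says the answer was wrong."},
]
sample_mapping = {"other": "small-model", "chit_chat": "small-model"}

missing = find_unmapped_routes(sample_routes, sample_mapping)
print("Unmapped routes:", missing)
```

With the notebook's own objects, the call would be `find_unmapped_routes(route_config, MAP_INTENT_TO_PIPELINE)`.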

## 7. Helper Functions

These helper functions handle:
- **Image redaction**: Removes image data from multimodal conversations (router is text-only but can detect image-related intent from text)
- **Prompt formatting**: Constructs the prompt for the Arch-Router model
- **Response parsing**: Extracts the route decision from model output

In [None]:
def redact_images_from_conversation(conversation: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Remove image data from conversation while preserving text context.
    
    The router model doesn't support images directly, but can still determine
    if the text intent requires image understanding capabilities.
    """
    redacted = []
    for i, msg in enumerate(conversation):
        msg_copy = msg.copy()
        content = msg_copy.get("content")
        
        # If content is a list (multimodal format), process it
        if isinstance(content, list):
            text_parts = []
            
            for item in content:
                logger.info(f"  Item: {type(item)}, {item if not isinstance(item, dict) else list(item.keys())}")
                if isinstance(item, dict):
                    if item.get("type") == "text":
                        item_text = item.get("text", "")
                        text = f"<new msg>{item_text} </msg>"
                        text_parts.append(text)
                    elif item.get("type") == "image_url":
                        # Skip image content
                        continue
            
            # Combine text parts
            combined_text = " ".join(text_parts)
            msg_copy["content"] = combined_text
        
        redacted.append(msg_copy)
    
    return redacted


def format_prompt(conversation: List[Dict[str, Any]]) -> str:
    """Create the system prompt for the router model.
    
    Uses pre-computed routes JSON for efficiency.
    """
    return (
        TASK_INSTRUCTION.format(
            routes=_ROUTES_JSON_CACHED,
            conversation=json.dumps(conversation, cls=PydanticEncoder)
        )
        + FORMAT_PROMPT
    )


@lru_cache(maxsize=128)
def _parse_route_response(response: str) -> str:
    """Parse and cache route responses to avoid repeated JSON parsing."""
    try:
        return json.loads(response)["route"]
    except json.JSONDecodeError:
        # Handle single quote format
        import ast
        return ast.literal_eval(response)["route"]


def materialize_iterator(obj):
    """Recursively convert ValidatorIterator and other iterables to lists."""
    if hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, dict)):
        try:
            return [materialize_iterator(item) for item in obj]
        except TypeError:
            pass
    elif isinstance(obj, dict):
        return {k: materialize_iterator(v) for k, v in obj.items()}
    return obj

print("✓ Helper functions defined")
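
`_parse_route_response` assumes the model returns nothing but a JSON (or Python-literal) object. A more defensive variant is sketched below, assuming you prefer falling back to `other` over raising when the output has surrounding text or names an unknown route. `parse_route` and its parameters are hypothetical, not part of the repository.

```python
import ast
import json
import re
from typing import Iterable


def parse_route(raw: str, known_routes: Iterable[str], fallback: str = "other") -> str:
    """Extract the 'route' value from model output, falling back when it is unusable."""
    known = set(known_routes)
    match = re.search(r"\{.*?\}", raw, flags=re.DOTALL)
    if not match:
        return fallback
    snippet = match.group(0)
    # Try strict JSON first, then a Python literal (handles single quotes)
    for loader in (json.loads, ast.literal_eval):
        try:
            parsed = loader(snippet)
        except (ValueError, SyntaxError, TypeError):
            continue
        if isinstance(parsed, dict):
            route = parsed.get("route")
            if isinstance(route, str) and route in known:
                return route
            return fallback
    return fallback


ROUTES = {"hard_question", "chit_chat", "try_again", "other"}
print(parse_route('Sure! {"route": "try_again"}', ROUTES))
```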

## 8. Core Routing Function

The main routing function that:
1. Takes a conversation (list of messages)
2. Redacts any images from multimodal content
3. Formats the prompt for the Arch-Router model
4. Calls the remote vLLM API
5. Parses the response to determine the best route
6. Returns the appropriate backend model to use

In [None]:
def get_route_from_conversation(conversation: List[Dict[str, Any]]) -> str:
    """Determine the best route for the conversation using the remote Arch-Router model.
    
    Args:
        conversation: List of message dictionaries with 'role' and 'content' keys
        
    Returns:
        str: The name of the selected route (e.g., 'hard_question', 'chit_chat', etc.)
    """
    inference_start = time.perf_counter()
    
    # Redact images from messages because the router does not support them
    # But it can still determine if the text intent requires image understanding
    redacted_conversation = redact_images_from_conversation(conversation)
    
    # ===== FORMAT PROMPT =====
    prompt_start = time.perf_counter()
    route_prompt = format_prompt(redacted_conversation)
    prompt_time = time.perf_counter() - prompt_start
    
    # ===== CONSTRUCT MESSAGES =====
    construct_start = time.perf_counter()
    messages = [
        {"role": "user", "content": route_prompt},
    ]
    construct_time = time.perf_counter() - construct_start

    # ===== ENCODE (TOKENIZE) =====
    # Not needed for remote API, but keeping for timing consistency
    encode_start = time.perf_counter()
    encode_time = time.perf_counter() - encode_start

    # ===== GENERATION (REMOTE API CALL) =====
    generation_start = time.perf_counter()
    try:
        # Call remote vLLM OpenAI-compatible API
        response = requests.post(
            f"{REMOTE_MODEL_URL}/v1/chat/completions",
            json={
                "model": MODEL_NAME,
                "messages": messages,
                "max_tokens": 32,
                "temperature": 0.3,
                "top_p": 0.9,
            },
            timeout=30,
        )
        response.raise_for_status()
        result = response.json()
        response_text = result["choices"][0]["message"]["content"]
    except Exception as e:
        logger.error(f"Failed to call remote model: {e}")
        raise
    
    generation_time = time.perf_counter() - generation_start

    # ===== DECODING =====
    decode_start = time.perf_counter()
    # Response is already decoded text from remote API
    decode_time = time.perf_counter() - decode_start
    
    # Use cached parser
    route = _parse_route_response(response_text)
    
    total_time = time.perf_counter() - inference_start
    
    # Log timing breakdown
    logger.info(
        f"Route inference timing breakdown | "
        f"Format: {prompt_time*1000:.2f}ms | "
        f"Construct: {construct_time*1000:.2f}ms | "
        f"Encode: {encode_time*1000:.2f}ms | "
        f"Generate: {generation_time*1000:.2f}ms | "
        f"Decode: {decode_time*1000:.2f}ms | "
        f"Total: {total_time*1000:.2f}ms"
    )
    logger.debug(f"Route: {route}, Response: {response_text[:100]}")
    
    return route


def route_request(conversation: List[Dict[str, Any]]) -> Tuple[str, str]:
    """Route a conversation to the appropriate backend model.
    
    This is a simplified synchronous version for demonstration purposes.
    
    Args:
        conversation: List of message dictionaries
        
    Returns:
        Tuple of (model_name, intent_name)
    """
    response_start = time.perf_counter()
    
    # Get the route from the conversation
    user_intent = get_route_from_conversation(conversation)
    
    total_response_time = time.perf_counter() - response_start
    
    logger.info(f"User intent: {user_intent} (total response time: {total_response_time*1000:.2f}ms)")
    return MAP_INTENT_TO_PIPELINE[user_intent], user_intent

print("✓ Core routing functions defined")
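
`route_request` raises if the router call fails or if the model returns a route name that is missing from `MAP_INTENT_TO_PIPELINE`. One possible hardening, sketched under the assumption that degrading to the `other` route is acceptable for your application; `route_with_fallback` is a hypothetical wrapper, and the classifier is injected so the example runs without a server.

```python
import logging
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger("router_fallback_demo")

Conversation = List[Dict[str, Any]]


def route_with_fallback(classify: Callable[[Conversation], str],
                        conversation: Conversation,
                        intent_to_model: Dict[str, str],
                        default_intent: str = "other") -> Tuple[str, str]:
    """Return (model_name, intent), using default_intent on errors or unknown routes."""
    try:
        intent = classify(conversation)
    except Exception as exc:
        logger.warning("Router call failed (%s); using %r", exc, default_intent)
        intent = default_intent
    if intent not in intent_to_model:
        logger.warning("Unknown route %r; using %r", intent, default_intent)
        intent = default_intent
    return intent_to_model[intent], intent


# Stub classifiers standing in for get_route_from_conversation
def failing_classifier(_conversation: Conversation) -> str:
    raise ConnectionError("router unreachable")


demo_mapping = {"other": "small-model", "hard_question": "large-model"}
demo_conversation = [{"role": "user", "content": "Hello"}]

print(route_with_fallback(failing_classifier, demo_conversation, demo_mapping))
```

In the notebook, the call would be `route_with_fallback(get_route_from_conversation, conversation, MAP_INTENT_TO_PIPELINE)`.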

---

## 9. Example Usage

Let's test the router with different types of user messages to see how it classifies intents and routes to appropriate models.

**Note**: These examples require the Arch-Router Docker container to be running.

In [None]:
# First, check if the remote model is available
print("Checking remote model availability...")
is_available = _check_remote_model()

if is_available:
    print("\n✓ Remote model is available! You can run the examples below.")
else:
    print("\n✗ Remote model is not available.")
    print("  Please start the Arch-Router Docker container first.")
    print(f"  Expected endpoint: {REMOTE_MODEL_URL}")

In [None]:
# =====================================================
# Test Cases for Different Intent Types
# =====================================================

test_conversations = [
    # Test 1: Chit-chat
    {
        "name": "Chit-chat",
        "messages": [{"role": "user", "content": "Hey! How's it going? Nice weather today, isn't it?"}]
    },
    
    # Test 2: Hard question (requires reasoning)
    {
        "name": "Hard Question",
        "messages": [{"role": "user", "content": "Can you carefully think through this problem: If a train leaves station A at 3pm traveling at 60mph, and another train leaves station B at 4pm traveling at 80mph towards station A, and the stations are 200 miles apart, when and where will they meet?"}]
    },
    
    # Test 3: Image understanding request
    {
        "name": "Image Understanding",
        "messages": [{"role": "user", "content": "Can you analyze this image and tell me what objects are in it?"}]
    },
    
    # Test 4: Image question (about user's surroundings)
    {
        "name": "Image Question (Surroundings)",
        "messages": [{"role": "user", "content": "What can you see in my room? Can you describe my surroundings?"}]
    },
    
    # Test 5: Try again request
    {
        "name": "Try Again",
        "messages": [
            {"role": "user", "content": "What is 2+2?"},
            {"role": "assistant", "content": "2+2 equals 5."},
            {"role": "user", "content": "That's wrong! Please try again with the correct answer."}
        ]
    },
    
    # Test 6: General question (should route to 'other')
    {
        "name": "General Question",
        "messages": [{"role": "user", "content": "What is the capital of France?"}]
    },
]

print("Test conversations defined. Run the next cell to test routing.")
print(f"\nTotal test cases: {len(test_conversations)}")
for i, test in enumerate(test_conversations, 1):
    print(f"  {i}. {test['name']}")

In [None]:
# =====================================================
# Run Routing Tests
# =====================================================
# Note: This requires the Arch-Router Docker container to be running

print("Running routing tests...\n")
print("=" * 70)

results = []
for test in test_conversations:
    print(f"\nTest: {test['name']}")
    print(f"Input: {test['messages'][-1]['content'][:60]}...")
    
    try:
        model, intent = route_request(test['messages'])
        results.append({
            "test": test['name'],
            "intent": intent,
            "model": model,
            "status": "success"
        })
        print(f"→ Intent: {intent}")
        print(f"→ Routed to: {model}")
    except Exception as e:
        results.append({
            "test": test['name'],
            "intent": "error",
            "model": "N/A",
            "status": str(e)
        })
        print(f"→ Error: {e}")
    
    print("-" * 70)

print("\n" + "=" * 70)
print("SUMMARY")
print("=" * 70)
for r in results:
    status = "✓" if r['status'] == 'success' else "✗"
    print(f"{status} {r['test']}: {r['intent']} → {r['model']}")
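
The summary shows which intent each test received, but not whether it was the intended one. A scoring sketch follows, assuming each test name has a single expected intent. `score_routing` is a hypothetical helper, and `sample_results` is made-up illustrative data, not real router output.

```python
from typing import Dict, List, Tuple

# Expected intents, keyed by the test names used above
EXPECTED_INTENTS = {
    "Chit-chat": "chit_chat",
    "Hard Question": "hard_question",
    "Image Understanding": "image_understanding",
    "Image Question (Surroundings)": "image_question",
    "Try Again": "try_again",
    "General Question": "other",
}


def score_routing(results: List[Dict[str, str]],
                  expected: Dict[str, str]) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Return (accuracy, mismatches) where each mismatch is (test, expected, actual)."""
    scored = [r for r in results if r["test"] in expected]
    mismatches = [
        (r["test"], expected[r["test"]], r["intent"])
        for r in scored
        if r["intent"] != expected[r["test"]]
    ]
    accuracy = (len(scored) - len(mismatches)) / len(scored) if scored else 0.0
    return accuracy, mismatches


# Hypothetical results in the same shape as the `results` list built above
sample_results = [
    {"test": "Chit-chat", "intent": "chit_chat"},
    {"test": "Hard Question", "intent": "hard_question"},
    {"test": "Try Again", "intent": "try_again"},
    {"test": "General Question", "intent": "chit_chat"},
]

accuracy, mismatches = score_routing(sample_results, EXPECTED_INTENTS)
print(f"Accuracy: {accuracy:.0%}")
for name, want, got in mismatches:
    print(f"  {name}: expected {want}, got {got}")
```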

## 10. Multimodal Example

The router can also handle multimodal messages (with images). The images are redacted for the router, but it can still determine intent from the text context.

In [None]:
# =====================================================
# Multimodal Message Example
# =====================================================

# Example multimodal message with text and image
multimodal_conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image? Can you describe it in detail?"},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQSk..."}}
        ]
    }
]

print("Testing multimodal message...")
print(f"Original message has {len(multimodal_conversation[0]['content'])} parts")

# Show the redaction process
redacted = redact_images_from_conversation(multimodal_conversation)
print(f"\nRedacted message: {redacted[0]['content']}")

# Route the request (requires remote model)
try:
    model, intent = route_request(multimodal_conversation)
    print(f"\n→ Intent: {intent}")
    print(f"→ Routed to: {model}")
except Exception as e:
    print(f"\n→ Error (remote model may not be available): {e}")
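
To make the redaction output concrete, here is a condensed, logging-free sketch of the same per-message transformation applied to a message with two text parts and one image. `text_only_view` is a hypothetical name, and the image payload is a truncated placeholder.

```python
from typing import Any, Dict


def text_only_view(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `message` whose content keeps only the wrapped text parts."""
    content = message.get("content")
    if not isinstance(content, list):
        # Plain-string content passes through unchanged
        return message.copy()
    parts = [
        f"<new msg>{item.get('text', '')} </msg>"
        for item in content
        if isinstance(item, dict) and item.get("type") == "text"
    ]
    return {**message, "content": " ".join(parts)}


sample = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Look at this."},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
        {"type": "text", "text": "What breed is the dog?"},
    ],
}

view = text_only_view(sample)
print(view["content"])
```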

---

## 11. NAT Framework Integration

This section registers the router as an objective function with the NAT framework for production deployment.

The `hf_intent_objective_fn` is registered as a function that can be used with the `sfc_router` component to handle incoming chat requests and route them to the appropriate backend model.

In [None]:
# =====================================================
# NAT Framework Integration
# =====================================================

from nat_sfc_router.schema.openai_chat_request import OpenAIChatRequest


class HFIntentObjectiveConfig(FunctionBaseConfig, name="hf_intent_objective_fn"):
    """HF intent objective function for best route."""
    pass


@register_function(config_type=HFIntentObjectiveConfig)
async def hf_intent_objective_fn(config: HFIntentObjectiveConfig,
                                 _builder: Builder):
    """HF intent objective function for best route."""

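    # Check if remote model is available and warn early instead of failing on the first request
    if not _check_remote_model():
        logger.warning(f"Arch-Router at {REMOTE_MODEL_URL} is not reachable; routing calls will fail until it is up")
    
    # Load tokenizer (model is remote); the remote inference path below does not use it
    loaded_tokenizer = _load_tokenizer()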

    async def _response_fn(chat_request: OpenAIChatRequest) -> Tuple[str, str]:
        """HF intent objective function for best route."""
        response_start = time.perf_counter()

        # ===== EXTRACT MESSAGES =====
        extract_start = time.perf_counter()
        messages = chat_request.messages
        extract_time = time.perf_counter() - extract_start

        if messages:
            # ===== CONVERT TO DICT =====
            dict_convert_start = time.perf_counter()
            last_msg = messages[-1]
            last_msg_dict = last_msg.model_dump() if hasattr(last_msg, 'model_dump') else dict(last_msg)
            dict_convert_time = time.perf_counter() - dict_convert_start

            # ===== MATERIALIZE ITERATORS =====
            materialize_start = time.perf_counter()
            last_msg_dict = materialize_iterator(last_msg_dict)
            materialize_time = time.perf_counter() - materialize_start

            # Assign a list containing only the last message's dictionary
            messages_dict = [last_msg_dict]
            
            logger.debug(
                f"Message preparation timing | "
                f"Extract: {extract_time*1000:.2f}ms | "
                f"Dict convert: {dict_convert_time*1000:.2f}ms | "
                f"Materialize: {materialize_time*1000:.2f}ms"
            )
        else:
            # Handle the case where the list of messages is empty
            messages_dict = []
            logger.warning("No messages received in chat request")

        # Run model inference (blocking call in event loop)
        user_intent = get_route_from_conversation(messages_dict)
        
        total_response_time = time.perf_counter() - response_start

        logger.info(f"User intent: {user_intent} (total response time: {total_response_time*1000:.2f}ms)")
        return MAP_INTENT_TO_PIPELINE[user_intent], ""
    

    yield FunctionInfo.from_fn(
        _response_fn,
        description="Demonstrative objective function for best model.")

print("✓ HFIntentObjectiveConfig class defined")
print("✓ hf_intent_objective_fn registered with NAT framework")
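
As the comment in `_response_fn` notes, `get_route_from_conversation` makes a blocking `requests.post` call inside the event loop, so concurrent requests wait on each other. One common mitigation, sketched here as an assumption rather than as how the NAT framework handles it, is to offload the call with `asyncio.to_thread` (Python 3.9+). `blocking_classify` is a hypothetical stand-in for the real routing function.

```python
import asyncio
import time
from typing import Any, Dict, List


def blocking_classify(conversation: List[Dict[str, Any]]) -> str:
    """Hypothetical stand-in for get_route_from_conversation (a blocking HTTP call)."""
    time.sleep(0.05)  # simulate network latency
    return "chit_chat"


async def classify_async(conversation: List[Dict[str, Any]]) -> str:
    # Run the blocking function in a worker thread so the event loop stays responsive
    return await asyncio.to_thread(blocking_classify, conversation)


async def main():
    conversations = [[{"role": "user", "content": f"hello {i}"}] for i in range(4)]
    start = time.perf_counter()
    routes = await asyncio.gather(*(classify_async(c) for c in conversations))
    return routes, time.perf_counter() - start


routes, elapsed = asyncio.run(main())
print(routes, f"{elapsed * 1000:.0f}ms")
```

Inside `_response_fn`, the equivalent change would be `user_intent = await asyncio.to_thread(get_route_from_conversation, messages_dict)`. In a notebook, where an event loop is already running, use `await main()` instead of `asyncio.run(main())`.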

## 12. NAT Framework Example Usage

Now let's demonstrate how to use the NAT-registered `hf_intent_objective_fn` with sample chat requests. This shows how the function would be invoked in a production NAT pipeline.


In [None]:
# =====================================================
# NAT Framework Example Usage
# =====================================================
# This demonstrates how to use the registered hf_intent_objective_fn
# with the NAT framework in a production-like scenario.

async def run_nat_framework_example():
    """Run example routing requests through the NAT framework function."""
    
    # Create the config instance
    config = HFIntentObjectiveConfig()
    
    # In production, this is handled by the NAT Builder
    async with hf_intent_objective_fn(config, None) as func_info:
        response_fn = func_info.single_fn
        
        # Define test requests using OpenAI chat format
        test_requests = [
            {
                "name": "Chit-chat",
                "messages": [{"role": "user", "content": "Hey there! How's your day going?"}]
            },
            {
                "name": "Hard Question", 
                "messages": [{"role": "user", "content": "Please think carefully: What are the philosophical implications of Gödel's incompleteness theorems?"}]
            },
            {
                "name": "Image Request",
                "messages": [{"role": "user", "content": "Can you look at this photo and tell me what you see?"}]
            },
            {
                "name": "Retry Request",
                "messages": [
                    {"role": "user", "content": "What is 15 * 7?"},
                    {"role": "assistant", "content": "15 * 7 = 95"},
                    {"role": "user", "content": "That's incorrect, please try again."}
                ]
            },
        ]
        
        print("=" * 70)
        print("NAT FRAMEWORK ROUTING EXAMPLES")
        print("=" * 70)
        
        for test in test_requests:
            print(f"\n📨 Test: {test['name']}")
            print(f"   Input: {test['messages'][-1]['content'][:50]}...")
            
            try:
                # Create an OpenAIChatRequest-like object
                # In production, this comes from the incoming API request
                chat_request = OpenAIChatRequest(
                    model="router",
                    messages=test['messages']
                )
                
                # Call the response function (this is what NAT does internally)
                model, _ = await response_fn(chat_request)
                
                print(f"   ✓ Routed to: {model}")
                
            except Exception as e:
                print(f"   ✗ Error: {e}")
            
            print("-" * 70)
        
        print("\n✓ NAT Framework example completed!")

# Run the async example
print("Running NAT Framework example...")
print("Note: This requires the Arch-Router Docker container to be running.\n")

try:
    # Use asyncio.run() or await depending on environment
    import nest_asyncio
    nest_asyncio.apply()
    asyncio.run(run_nat_framework_example())
except ImportError:
    # nest_asyncio not available, try direct run
    try:
        asyncio.run(run_nat_framework_example())
    except RuntimeError:
        # Already in an async context (e.g., Jupyter)
        await run_nat_framework_example()
except Exception as e:
    print(f"Error running example: {e}")
    print("\nMake sure:")
    print("  1. The Arch-Router Docker container is running")
    print("  2. All previous cells have been executed")
    print(f"  3. The endpoint {REMOTE_MODEL_URL} is accessible")


Once you're done, stop the Arch-Router container to free up resources for the next notebook.

In [None]:
!docker kill arch_router

---

## Conclusion

This notebook demonstrated how to use the Arch-Router model for intelligent request routing:

1. **Setup**: Deploy the model using vLLM with Docker
2. **Configuration**: Define routes and their descriptions
3. **Routing**: Send user messages to the router to classify intent
4. **Model Selection**: Route to the appropriate backend model based on intent
5. **NAT Integration**: Register the router as a NAT framework objective function

### Key Benefits

- **Cost Efficiency**: Route simple queries to smaller, cheaper models
- **Quality**: Route complex queries to more capable models
- **Multimodal Support**: Automatically detect when vision capabilities are needed
- **Low Latency**: The 1.5B router model is fast and lightweight
- **Production Ready**: NAT framework integration for seamless deployment

### Next Steps

- Customize the `route_config` for your specific use cases
- Integrate with your backend model serving infrastructure
- See `2_Embedding_NN_Training.ipynb` for training your own router model