# Lab 1C: Add Model Router to Landing Zone

Deploy the **Model Router** to the Landing Zone and enable it for project spokes.

## What Gets Deployed

| Resource | Purpose |
|----------|--------|
| Model Router Deployment | Intelligent model selection that routes queries to the optimal model |

## What is Model Router?

Model Router automatically selects the best model for each query based on:
- **Performance**: Matches query complexity to model capabilities
- **Cost**: Uses smaller models for simple queries, larger models for complex ones  
- **Latency**: Optimizes response time by routing appropriately

> ⚠️ **Prerequisite**: Complete **Lab 1A** and **Lab 1B** first.

## Step 1: Load Existing Configuration

In [None]:
import os

env_file = '/workspaces/getting-started-with-foundry/.env'
with open(env_file) as f:
    for line in f:
        line = line.strip()
        if line and not line.startswith('#') and '=' in line:
            key, value = line.split('=', 1)
            os.environ[key] = value

AI_ENDPOINT = os.environ['AI_ENDPOINT']
APIM_URL = os.environ['APIM_URL']
APIM_KEY = os.environ['APIM_KEY']
MODEL_NAME = os.environ['MODEL_NAME']

print(f"AI Endpoint: {AI_ENDPOINT}\nAPIM URL: {APIM_URL}\nModel: {MODEL_NAME}")
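
The loader above copies each `KEY=VALUE` line into `os.environ` as written, so quoted values keep their quotes. A minimal sketch of a slightly more forgiving parser follows; `load_env_file` is a hypothetical helper name, not part of the lab, and the quote handling is an assumption about how the `.env` file might be written.

```python
import os
from pathlib import Path


def load_env_file(path, environ=None):
    """Parse KEY=VALUE lines from a .env file into `environ` and return the parsed pairs."""
    environ = os.environ if environ is None else environ
    loaded = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        loaded[key] = value  # later lines override earlier ones
    environ.update(loaded)
    return loaded
```

Passing a plain `dict` as `environ` makes it possible to inspect what would be loaded without touching the process environment.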

## Step 2: Set Variables

In [2]:
RG = "foundry-lz-parent"
LOCATION = "eastus2"

## Step 3: Deploy Model Router and gpt-4.1-nano to Landing Zone

Model Router requires underlying models to route to. We'll add gpt-4.1-nano for simple queries.

⏱️ Takes ~1-2 minutes

In [None]:
import subprocess

!az deployment group create -g "{RG}" --template-file main.bicep -o table

AI_ACCOUNT_NAME = subprocess.run(f'az deployment group show -g "{RG}" -n main --query properties.outputs.aiAccountName.value -o tsv', 
                                  shell=True, capture_output=True, text=True).stdout.strip()

print(f"\nDeploying gpt-4.1-nano to {AI_ACCOUNT_NAME}...")
!az cognitiveservices account deployment create -g "{RG}" -n "{AI_ACCOUNT_NAME}" \
  --deployment-name gpt-4.1-nano --model-name gpt-4.1-nano --model-version 2025-04-14 \
  --model-format OpenAI --sku-capacity 30 --sku-name GlobalStandard -o table
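
The cell above reads `stdout` from `subprocess.run` without checking the exit code, so a failed `az` call would leave `AI_ACCOUNT_NAME` empty. The sketch below shows one way to surface such failures; `run_cli` is a hypothetical helper, and it is demonstrated with the Python interpreter rather than `az` so that it runs without Azure access.

```python
import subprocess
import sys


def run_cli(args):
    """Run a command given as a list of arguments and return its trimmed stdout.

    Raises RuntimeError carrying stderr when the command exits non-zero.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{args[0]} failed ({result.returncode}): {result.stderr.strip()}")
    return result.stdout.strip()


# Demonstration with the Python interpreter instead of the Azure CLI.
print(run_cli([sys.executable, "-c", "print('ok')"]))
```

In the lab, the same helper could presumably wrap the account-name lookup by passing `["az", "deployment", "group", "show", "-g", RG, "-n", "main", "--query", "properties.outputs.aiAccountName.value", "-o", "tsv"]`. Passing a list instead of a shell string also avoids quoting issues.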

## Step 4: Get Model Router Outputs

In [4]:
import subprocess, json
from pathlib import Path

r = subprocess.run(f'az deployment group show -g "{RG}" -n main --query properties.outputs -o json', shell=True, capture_output=True, text=True)
MODEL_ROUTER_NAME = json.loads(r.stdout)['modelRouterName']['value']

env_file = Path("/workspaces/getting-started-with-foundry/.env")
with open(env_file, 'a') as f:
    f.write(f"\nMODEL_ROUTER_NAME={MODEL_ROUTER_NAME}\n")

print(f"✅ Model Router: {MODEL_ROUTER_NAME}")

✅ Model Router: model-router
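
Because the cell opens `.env` in append mode, re-running it adds another `MODEL_ROUTER_NAME=` line each time. A minimal sketch of an idempotent alternative is shown here; `upsert_env_var` is a hypothetical helper, not part of the lab.

```python
from pathlib import Path


def upsert_env_var(path, key, value):
    """Set KEY=value in a .env file, replacing any existing entry for KEY."""
    env_path = Path(path)
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    # Keep every line except earlier assignments of the same key.
    kept = [ln for ln in lines if not ln.strip().startswith(f"{key}=")]
    kept.append(f"{key}={value}")
    env_path.write_text("\n".join(kept) + "\n")
```

Calling `upsert_env_var(env_file, "MODEL_ROUTER_NAME", MODEL_ROUTER_NAME)` repeatedly leaves exactly one entry for the key, placed at the end of the file.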


## Step 5: Test Model Router Directly (Landing Zone)

First, verify that Model Router responds when called directly against the landing zone endpoint.

In [None]:
!pip install openai azure-identity -q

In [6]:
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")
client = AzureOpenAI(azure_endpoint=AI_ENDPOINT, azure_ad_token_provider=token_provider, api_version="2024-10-21")

response = client.chat.completions.create(model=MODEL_ROUTER_NAME, messages=[{"role": "user", "content": "What is 2+2?"}])
print(f"Simple query → Model: {response.model}\nResponse: {response.choices[0].message.content}")

Simple query → Model: gpt-4.1-nano-2025-04-14
Response: 2 + 2 equals 4.


In [7]:
response = client.chat.completions.create(model=MODEL_ROUTER_NAME, 
    messages=[{"role": "user", "content": "Explain supervised vs unsupervised machine learning with use cases."}])
print(f"Complex query → Model: {response.model}\nResponse: {response.choices[0].message.content[:400]}...")

Complex query → Model: gpt-4.1-mini-2025-04-14
Response: Certainly! Here's an explanation of supervised vs. unsupervised machine learning along with common use cases:

### Supervised Learning

**Definition:**  
Supervised learning is a type of machine learning where the algorithm is trained on a labeled dataset. This means that each training example is paired with an output label. The goal is for the model to learn a mapping from inputs to outputs so it...


## Step 6: Update Spoke Project Connection

Now we need to update the spoke project's APIM connection to include the model-router in its available models.

In [None]:
SPOKE_ACCOUNT = os.environ.get('SPOKE_ACCOUNT')
SPOKE_PROJECT = os.environ.get('SPOKE_PROJECT')
APIM_CONNECTION = os.environ.get('APIM_CONNECTION')

if SPOKE_ACCOUNT:
    print(f"Spoke: {SPOKE_ACCOUNT} / {SPOKE_PROJECT}\nConnection: {APIM_CONNECTION}")
else:
    print("⚠️ Run Lab 1B first")

In [9]:
import subprocess, json, tempfile

if SPOKE_ACCOUNT:
    SPOKE_RG = "foundry-child-1"
    models_config = [
        {"name": MODEL_NAME, "properties": {"model": {"name": MODEL_NAME, "version": "", "format": "OpenAI"}}},
        {"name": MODEL_ROUTER_NAME, "properties": {"model": {"name": MODEL_ROUTER_NAME, "version": "", "format": "OpenAI"}}}
    ]
    
    sub_id = subprocess.run('az account show --query id -o tsv', shell=True, capture_output=True, text=True).stdout.strip()
    connection_uri = f"https://management.azure.com/subscriptions/{sub_id}/resourceGroups/{SPOKE_RG}/providers/Microsoft.CognitiveServices/accounts/{SPOKE_ACCOUNT}/projects/{SPOKE_PROJECT}/connections/{APIM_CONNECTION}?api-version=2025-04-01-preview"
    
    existing = json.loads(subprocess.run(f'az rest --method GET --uri "{connection_uri}" -o json', shell=True, capture_output=True, text=True).stdout)
    
    update_body = {"properties": {"category": "ApiManagement", "target": existing['properties']['target'], "authType": "ApiKey",
        "credentials": {"key": APIM_KEY}, "metadata": {"deploymentInPath": "true", "inferenceAPIVersion": "2024-10-21", "models": json.dumps(models_config)}}}
    
    with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
        json.dump(update_body, f)
        body_file = f.name
    
    result = subprocess.run(f'az rest --method PUT --uri "{connection_uri}" --body @{body_file}', shell=True, capture_output=True, text=True)
    os.unlink(body_file)
    
    print(f"✅ Connection updated with models: {MODEL_NAME}, {MODEL_ROUTER_NAME}" if result.returncode == 0 else f"❌ Error: {result.stderr}")
else:
    print("⏭️ Skipping - run Lab 1B first")

✅ Connection updated with models: gpt-4.1-mini, model-router
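
The connection's `metadata["models"]` value is a JSON string, not a nested object, and each entry repeats the deployment name in two places. The sketch below factors that shape into a small function so more models can be added without copying the dictionary literal; `build_models_metadata` is a hypothetical helper that mirrors the structure used in the cell above.

```python
import json


def build_models_metadata(model_names):
    """Return the JSON string stored under metadata["models"] for the given deployment names."""
    entries = [
        {"name": name, "properties": {"model": {"name": name, "version": "", "format": "OpenAI"}}}
        for name in dict.fromkeys(model_names)  # de-duplicate while keeping order
    ]
    return json.dumps(entries)


# Example with the two deployment names shown in this lab's output.
print(build_models_metadata(["gpt-4.1-mini", "model-router"]))
```

With this helper, the `"models"` field in `update_body` could be written as `build_models_metadata([MODEL_NAME, MODEL_ROUTER_NAME])`.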


## Step 7: Test Model Router from Spoke via Agent

Test that the spoke project can access model-router via the APIM gateway using the Agent/Responses API.

> **Note**: In spoke projects, you must use agents with the Responses API - direct chat completions don't work with APIM connections.

In [10]:
!pip install azure-ai-projects==2.0.0b2 azure-identity -q


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.2[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [11]:
if SPOKE_ACCOUNT:
    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient
    from azure.ai.projects.models import PromptAgentDefinition
    
    SPOKE_ENDPOINT = os.environ.get('SPOKE_ENDPOINT', '')
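    # Strip the scheme and domain suffix; rstrip("/") tolerates endpoints saved without a trailing slash
    account_host = SPOKE_ENDPOINT.replace("https://", "").rstrip("/").replace(".cognitiveservices.azure.com", "")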
    PROJECT_ENDPOINT = f"https://{account_host}.services.ai.azure.com/api/projects/{SPOKE_PROJECT}"
    
    project_client = AIProjectClient(credential=DefaultAzureCredential(), endpoint=PROJECT_ENDPOINT)
    openai_client = project_client.get_openai_client()
    
    GATEWAY_MODEL_ROUTER = f"{APIM_CONNECTION}/{MODEL_ROUTER_NAME}"
    agent = project_client.agents.create_version(agent_name="model-router-agent",
        definition=PromptAgentDefinition(model=GATEWAY_MODEL_ROUTER, instructions="You are a helpful assistant."))
    
    print(f"✅ Agent: {agent.name} v{agent.version} using {GATEWAY_MODEL_ROUTER}")
else:
    print("⏭️ Skipping - run Lab 1B first")

✅ Agent: model-router-agent v1 using landing-zone-apim/model-router
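
The project endpoint in the cell above is derived from `SPOKE_ENDPOINT` by string replacement. As a sketch of a more defensive variant, the function below parses the URL and takes the first host label as the account name; `project_endpoint` is a hypothetical helper, and it assumes the account name is always the first label of the host, as it is in the lab's endpoint.

```python
from urllib.parse import urlparse


def project_endpoint(spoke_endpoint, project_name):
    """Build the Foundry project endpoint from an account endpoint URL and a project name."""
    host = urlparse(spoke_endpoint).hostname
    if not host:
        raise ValueError(f"Not a valid endpoint URL: {spoke_endpoint!r}")
    account = host.split(".", 1)[0]  # first host label, assumed to be the account name
    return f"https://{account}.services.ai.azure.com/api/projects/{project_name}"


# "my-spoke" and "demo" are placeholder names for illustration only.
print(project_endpoint("https://my-spoke.cognitiveservices.azure.com/", "demo"))
```

An empty or malformed `SPOKE_ENDPOINT` raises a `ValueError` here instead of silently producing an unusable URL.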


In [12]:
if SPOKE_ACCOUNT:
    from IPython.display import display, Markdown, HTML
    
    def test_query(query, label, expected_model):
        response = openai_client.responses.create(input=query,
            extra_body={"agent": {"name": agent.name, "version": agent.version, "type": "agent_reference"}})
        text = response.output_text if hasattr(response, 'output_text') else str(response.output)
        model = response.model
        match = "✅" if expected_model in model else "❌"
        return {"label": label, "expected": expected_model, "model": model, "match": match, "response": text[:150]}
    
    results = [
        test_query("What is 2+2?", "Simple", "nano"),
        test_query("Explain supervised vs unsupervised ML with algorithms and use cases.", "Complex", "mini")
    ]
    
    display(Markdown("## 🧪 Model Router Verification"))
    
    rows = "".join(f"<tr><td>{r['label']}</td><td>{r['expected']}</td><td>{r['model']}</td><td>{r['match']}</td></tr>" for r in results)
    display(HTML(f"<table><tr><th>Query Type</th><th>Expected</th><th>Actual Model</th><th>Result</th></tr>{rows}</table>"))
    
    display(Markdown("### 📋 Sample Responses"))
    for r in results:
        display(Markdown(f"**{r['label']}:** {r['response']}..."))
else:
    print("⏭️ Skipping")

## 🧪 Model Router Verification

| Query Type | Expected | Actual Model | Result |
|------------|----------|--------------|--------|
| Simple | nano | gpt-4.1-nano-2025-04-14 | ✅ |
| Complex | mini | gpt-4.1-mini-2025-04-14 | ✅ |


### 📋 Sample Responses

**Simple:** 2 + 2 = 4...

**Complex:** Certainly! Here's an explanation of supervised vs unsupervised machine learning, including common algorithms and use cases for each.

---

## Supervis...

## Done!

Model Router is now deployed in the landing zone and accessible from spoke projects via Agents.

### Verified Model Routing Behavior

| Query Type | Expected Model | Why |
|------------|----------------|-----|
| Simple (e.g., "What is 2+2?") | `gpt-4.1-nano` | Cost-effective for trivial queries |
| Complex (e.g., ML explanation) | `gpt-4.1-mini` | More capable for detailed responses |

### Key Patterns

| Pattern | Description |
|---------|-------------|
| Gateway Model Format | `<connection-name>/<model-id>` (e.g., `landing-zone-apim/model-router`) |
| Agent Required | Spoke projects must use Agent/Responses API, not direct chat completions |
| PromptAgentDefinition | Defines agent with model and instructions |
| Responses API | Invoke agents via `openai_client.responses.create()` with `agent_reference` |
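
The two string and payload shapes in the table can be captured in small helpers. This is only a sketch: `gateway_model` and `agent_reference` are hypothetical names, and the payload mirrors the `extra_body` used in Step 7 rather than any documented schema.

```python
def gateway_model(connection_name, model_id):
    """Return the '<connection-name>/<model-id>' string used as the agent's model."""
    if not connection_name or not model_id:
        raise ValueError("connection_name and model_id must both be non-empty")
    return f"{connection_name}/{model_id}"


def agent_reference(agent_name, agent_version):
    """Return the extra_body payload that points a Responses API call at an agent version."""
    return {"agent": {"name": agent_name, "version": agent_version, "type": "agent_reference"}}


print(gateway_model("landing-zone-apim", "model-router"))
print(agent_reference("model-router-agent", "1"))
```

In the lab these would be used as `PromptAgentDefinition(model=gateway_model(APIM_CONNECTION, MODEL_ROUTER_NAME), ...)` and `openai_client.responses.create(input=query, extra_body=agent_reference(agent.name, agent.version))`.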

### Model Router Pool (v2025-05-19)

Model Router can route to: GPT-4.1, GPT-4.1-mini, GPT-4.1-nano, o4-mini
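
The verification in Step 7 checks for a substring such as `mini`, which would also match `o4-mini` from this pool. A minimal sketch of a stricter check strips the date suffix from `response.model` and compares against the pool names listed above; `base_model`, `routing_counts`, and `ROUTER_POOL` are hypothetical names, and the suffix pattern assumes model IDs end in `-YYYY-MM-DD` as they do in this lab's output.

```python
import re
from collections import Counter

# Pool members as listed in this lab for router version 2025-05-19.
ROUTER_POOL = {"gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano", "o4-mini"}

_DATE_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")


def base_model(model_id):
    """Strip a trailing -YYYY-MM-DD version suffix from a model ID."""
    return _DATE_SUFFIX.sub("", model_id)


def routing_counts(model_ids):
    """Count how many responses were served by each underlying model."""
    return Counter(base_model(m) for m in model_ids)


observed = ["gpt-4.1-nano-2025-04-14", "gpt-4.1-mini-2025-04-14"]
print(routing_counts(observed))
print(all(base_model(m) in ROUTER_POOL for m in observed))
```

Collecting `response.model` over a batch of queries and passing the list to `routing_counts` gives a quick view of how traffic is distributed across the pool.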

---

## Cleanup (Optional)

In [13]:
# To remove model-router deployment:
# !az cognitiveservices account deployment delete -g "{RG}" -n "foundry-hub-{suffix}" --deployment-name "model-router" --yes