<a id='0'></a>
### 0️⃣ Initialize Notebook Variables

Configure the following variables according to your environment before running the notebook:

In [None]:
import os
import sys, json, requests, time
sys.path.insert(1, '../shared')  # add the shared directory to the Python path
import utils
from apimtools import APIMClientTool

inference_api_version = "2024-05-01-preview"

# ============================================================================
# REQUIRED: Update these values for your environment
# ============================================================================
governance_hub_resource_group = "REPLACE"  ## specify the resource group name where the Governance Hub is located
location = "REPLACE"  ## e.g., "eastus", "westus2", etc.

# ============================================================================
# OPTIONAL: LLM Backend Configuration (pre-configured with sample values from main.bicepparam)
# These values match the AI Foundry backends defined in the template
# ============================================================================
llm_backends_config = [
    {
        "backendId": "aif-citadel-primary",
        "backendType": "ai-foundry",
        "endpoint": "https://REPLACE-0.services.ai.azure.com/models",  # Replace "REPLACE" with your AI Foundry resource token
        "authScheme": "managedIdentity",
        "supportedModels": ["gpt-4o", "gpt-4o-mini", "DeepSeek-R1", "Phi-4"],
        "priority": 1,
        "weight": 100
    },
    {
        "backendId": "aif-citadel-secondary",
        "backendType": "ai-foundry",
        "endpoint": "https://REPLACE-1.services.ai.azure.com/models",  # Replace "REPLACE" with your AI Foundry resource token
        "authScheme": "managedIdentity",
        "supportedModels": ["gpt-5", "DeepSeek-R1"],
        "priority": 2,
        "weight": 50
    }
]

# Fallback name for the APIM managed identity; an identity auto-discovered in step 3 takes precedence
apim_managed_identity_name = ""  # Leave empty to auto-discover
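Running later cells with a leftover `REPLACE` placeholder tends to fail deep inside a deployment. A quick pre-flight check over the values above could catch that early. `find_config_problems` is a hypothetical helper, and the required keys are assumed from the sample entries rather than taken from the Bicep template.

```python
from typing import Any

# Keys each backend entry is assumed to need, mirroring the sample entries above
REQUIRED_BACKEND_KEYS = ("backendId", "backendType", "endpoint", "authScheme", "supportedModels")


def find_config_problems(resource_group: str, location: str, backends: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems found in the notebook configuration (hypothetical helper)."""
    problems: list[str] = []
    if not resource_group or resource_group == "REPLACE":
        problems.append("governance_hub_resource_group is not set")
    if not location or location == "REPLACE":
        problems.append("location is not set")
    seen_ids: set[str] = set()
    for index, backend in enumerate(backends):
        label = backend.get("backendId", f"backend #{index}")
        # Empty values (e.g. an empty supportedModels list) count as missing
        for key in REQUIRED_BACKEND_KEYS:
            if not backend.get(key):
                problems.append(f"{label}: missing '{key}'")
        if "REPLACE" in str(backend.get("endpoint", "")):
            problems.append(f"{label}: endpoint still contains the REPLACE placeholder")
        if label in seen_ids:
            problems.append(f"{label}: duplicate backendId")
        seen_ids.add(label)
    return problems
```

In the notebook this could run as `for problem in find_config_problems(governance_hub_resource_group, location, llm_backends_config): utils.print_warning(problem)`.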

<a id='1'></a>
### 1️⃣ Verify Azure CLI and Connected Subscription

Ensure the Azure CLI is authenticated and set to the correct subscription:

In [None]:
output = utils.run("az account show", "Retrieved az account", "Failed to get the current az account")

if output.success and output.json_data:
    current_user = output.json_data['user']['name']
    tenant_id = output.json_data['tenantId']
    subscription_id = output.json_data['id']

    utils.print_info(f"Current user: {current_user}")
    utils.print_info(f"Tenant ID: {tenant_id}")
    utils.print_info(f"Subscription ID: {subscription_id}")
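Step 1 reads `user.name`, `tenantId`, and `id` from the JSON that `az account show` prints. If you want the notebook to stop early when the CLI points at the wrong subscription, a small guard along these lines could help. `summarize_account` and `expected_subscription_id` are hypothetical names, not part of `utils`.

```python
import json
from typing import Optional


def summarize_account(account_json: str, expected_subscription_id: Optional[str] = None) -> dict:
    """Extract the fields this notebook uses from `az account show` output.

    When the hypothetical `expected_subscription_id` is given, a mismatch raises
    so that later deployment cells don't target the wrong subscription.
    """
    account = json.loads(account_json)
    summary = {
        "user": account["user"]["name"],
        "tenant_id": account["tenantId"],
        "subscription_id": account["id"],
    }
    if expected_subscription_id and summary["subscription_id"] != expected_subscription_id:
        raise ValueError(
            f"Active subscription {summary['subscription_id']} does not match "
            f"expected {expected_subscription_id}; run `az account set --subscription <id>`"
        )
    return summary
```

Since `output.json_data` is already parsed in the cell above, you could pass `json.dumps(output.json_data)` or adapt the function to accept a dict directly.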

<a id='init'></a>
### ⚙️ Initialize APIM Client Tool

👉 Initialize the APIM client to interact with your existing Governance Hub deployment:

In [None]:
try:
    apimClientTool = APIMClientTool(
        governance_hub_resource_group
    )
    apimClientTool.initialize()
    
    apim_resource_name = apimClientTool.apim_resource_name
    apim_resource_gateway_url = str(apimClientTool.apim_resource_gateway_url)
    
    utils.print_ok("APIM Client Tool initialized successfully!")
    utils.print_info(f"APIM Resource Name: {apim_resource_name}")
    utils.print_info(f"APIM Gateway URL: {apim_resource_gateway_url}")
    
except Exception as e:
    utils.print_error(f"Error initializing APIM Client Tool: {e}")

<a id='2'></a>
### 2️⃣ Extract Current APIM Backend-Pools Configuration

Retrieve and analyze the existing backend pools and backends configured in your APIM instance:

In [None]:
# Extract current backends from APIM using the SDK
utils.print_info("Extracting current APIM backends configuration...")

try:
    # APIMClientTool.get_backends() queries APIM through the Azure SDK rather than the CLI
    existing_backends, existing_backend_pools = apimClientTool.get_backends()
    
except Exception as e:
    utils.print_error(f"Error extracting backends: {e}")
    existing_backends = []
    existing_backend_pools = []

In [None]:
# Get supported models from the policy fragment (if exists)
try:
    supported_models_from_policy = apimClientTool.get_policy_fragment_supported_models("set-backend-pools")
    utils.print_ok("Supported models in APIM policy fragment 'set-backend-pools':")
    for model in supported_models_from_policy:
        print(f"  • {model}")
except Exception as e:
    utils.print_warning(f"Could not retrieve policy fragment (may not exist yet): {e}")
    supported_models_from_policy = []

In [None]:
# Display summary of current configuration
utils.print_info("\n" + "="*60)
utils.print_info("CURRENT APIM BACKEND CONFIGURATION SUMMARY")
utils.print_info("="*60)

if existing_backends:
    print("\n📋 Individual Backends:")
    for backend in existing_backends:
        print(f"  • {backend['name']}")
        print(f"    URL: {backend['url']}")
        if backend['supportedModels']:
            print(f"    Models: {', '.join(backend['supportedModels'])}")

if existing_backend_pools:
    print("\n📦 Backend Pools:")
    for pool in existing_backend_pools:
        print(f"  • {pool['name']}")
        for svc in pool['services']:
            print(f"    - {svc.get('id', 'N/A')} (priority: {svc.get('priority', 'N/A')}, weight: {svc.get('weight', 'N/A')})")

if supported_models_from_policy:
    print(f"\n🤖 Total Supported Models: {len(supported_models_from_policy)}")
    print(f"   {', '.join(supported_models_from_policy)}")
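Before deploying, it can help to see which configured models the `set-backend-pools` fragment does not list yet. The sketch below compares model names exactly (case-sensitive) and assumes the fragment uses the same names as `supportedModels`. `preview_model_changes` is a hypothetical helper.

```python
def preview_model_changes(backends: list[dict], policy_models: list[str]) -> tuple[list[str], list[str]]:
    """Split configured models into (new, already_supported) relative to the policy fragment."""
    # Unique model names across all configured backends, in sorted (case-sensitive) order
    configured = sorted({model for backend in backends for model in backend.get("supportedModels", [])})
    current = set(policy_models)
    new_models = [m for m in configured if m not in current]
    existing = [m for m in configured if m in current]
    return new_models, existing
```

For example: `new_models, existing_models = preview_model_changes(llm_backends_config, supported_models_from_policy)`. Models that are in the fragment but no longer in the configuration are not reported by this sketch.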

<a id='3'></a>
### 3️⃣ Discover Managed Identity for APIM Authentication

Auto-discover or specify the user-assigned managed identity used by APIM:

In [None]:
# Discover managed identity from APIM using the SDK
utils.print_info("Discovering managed identity configuration...")

# Use the APIMClientTool's get_managed_identity_info method
managed_identity_info = apimClientTool.get_managed_identity_info()

managed_identity_client_id = managed_identity_info.get('clientId')
managed_identity_name = managed_identity_info.get('name') or apim_managed_identity_name
managed_identity_resource_group = managed_identity_info.get('resourceGroup') or governance_hub_resource_group

if not managed_identity_client_id:
    utils.print_warning("Could not auto-discover managed identity. Please specify it manually in the configuration.")
else:
    utils.print_info(f"Client ID: {managed_identity_client_id}")

if managed_identity_name:
    utils.print_ok(f"Managed Identity Name: {managed_identity_name}")
    utils.print_ok(f"Managed Identity Resource Group: {managed_identity_resource_group}")
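ARM identifies a user-assigned managed identity by a resource ID of the form `/subscriptions/{id}/resourceGroups/{rg}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{name}`, and APIM's `identity.userAssignedIdentities` map is keyed by such IDs. If auto-discovery fails but you have the ID from the portal, a parser like the hypothetical `parse_user_assigned_identity_id` below can recover the name and resource group that the parameter file needs. Whether `get_managed_identity_info` works this way internally is not shown here.

```python
def parse_user_assigned_identity_id(resource_id: str) -> dict[str, str]:
    """Split an ARM resource ID for a user-assigned managed identity into its parts."""
    parts = resource_id.strip("/").split("/")
    # Expected segments: subscriptions/{sub}/resourceGroups/{rg}/providers/
    #                    Microsoft.ManagedIdentity/userAssignedIdentities/{name}
    # Segment names are compared case-insensitively, as ARM treats them.
    if (
        len(parts) != 8
        or parts[0].lower() != "subscriptions"
        or parts[2].lower() != "resourcegroups"
        or parts[4].lower() != "providers"
        or parts[5].lower() != "microsoft.managedidentity"
        or parts[6].lower() != "userassignedidentities"
    ):
        raise ValueError(f"Not a user-assigned identity resource ID: {resource_id}")
    return {"subscriptionId": parts[1], "resourceGroup": parts[3], "name": parts[7]}
```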

<a id='4'></a>
### 4️⃣ Generate LLM Backend Parameter File

Generate a customizable `.bicepparam` file with the full list of LLM backends to be integrated with APIM:

In [None]:
# Configure the LLM backends for deployment
# You can modify the llm_backends_config list defined in the initialization cell

utils.print_info("LLM Backends to be deployed:")
for backend in llm_backends_config:
    print(f"\n  🔗 {backend['backendId']}")
    print(f"     Type: {backend['backendType']}")
    print(f"     Endpoint: {backend['endpoint']}")
    print(f"     Auth: {backend['authScheme']}")
    print(f"     Models: {', '.join(backend['supportedModels'])}")
    print(f"     Priority: {backend.get('priority', 1)}, Weight: {backend.get('weight', 100)}")
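Each backend's `priority` and `weight` matter only relative to other backends serving the same model. In APIM load-balanced backend pools, backends with a lower priority value are used first, and weights split traffic among backends that share a priority. Assuming the template builds one pool per model from these fields (the grouping logic in `main.bicep` is not shown here), the hypothetical `routing_plan` below previews the priority tiers and traffic shares per model:

```python
from collections import defaultdict


def routing_plan(backends: list[dict]) -> dict[str, list[list[tuple[str, float]]]]:
    """For each model, list priority tiers (lowest priority value first) with each
    backend's share of traffic inside its tier, derived from relative weights."""
    # model -> priority -> [(backendId, weight), ...]
    by_model: dict = defaultdict(lambda: defaultdict(list))
    for backend in backends:
        priority = backend.get("priority", 1)
        weight = backend.get("weight", 100)
        for model in backend["supportedModels"]:
            by_model[model][priority].append((backend["backendId"], weight))

    plan: dict[str, list[list[tuple[str, float]]]] = {}
    for model, tiers in by_model.items():
        plan[model] = []
        for priority in sorted(tiers):
            members = tiers[priority]
            total = sum(weight for _, weight in members) or 1  # avoid division by zero
            plan[model].append([(backend_id, weight / total) for backend_id, weight in members])
    return plan
```

With the sample configuration, `routing_plan(llm_backends_config)["DeepSeek-R1"]` would show `aif-citadel-primary` as the first tier and `aif-citadel-secondary` as the fallback tier.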

In [None]:
# Generate the .bicepparam file content
bicep_dir = "../bicep/infra/llm-backend-onboarding"
params_file = os.path.join(bicep_dir, "llm-backends-generated-local.bicepparam")

# Format backends array for Bicep
def format_backend_for_bicep(backend):
    models_str = "\n      ".join([f"'{m}'" for m in backend['supportedModels']])
    return f"""  {{
    backendId: '{backend['backendId']}'
    backendType: '{backend['backendType']}'
    endpoint: '{backend['endpoint']}'
    authScheme: '{backend['authScheme']}'
    supportedModels: [
      {models_str}
    ]
    priority: {backend.get('priority', 1)}
    weight: {backend.get('weight', 100)}
  }}"""

backends_bicep_str = "\n".join([format_backend_for_bicep(b) for b in llm_backends_config])

params_content = f"""using './main.bicep'

// ============================================================================
// LLM Backend Onboarding - Generated Parameter File
// Generated: {time.strftime('%Y-%m-%d %H:%M:%S')}
// ============================================================================

// ============================================================================
// API Management (APIM) Configuration
// ============================================================================
param apim = {{
  subscriptionId: '{subscription_id}'
  resourceGroupName: '{governance_hub_resource_group}'
  name: '{apim_resource_name}'
}}

// ============================================================================
// APIM Managed Identity Configuration
// ============================================================================
param apimManagedIdentity = {{
  subscriptionId: '{subscription_id}'
  resourceGroupName: '{managed_identity_resource_group}'
  name: '{managed_identity_name}'
}}

// ============================================================================
// LLM Backend Configuration Array
// ============================================================================
param llmBackendConfig = [
{backends_bicep_str}
]

// ============================================================================
// Circuit Breaker Configuration
// ============================================================================
param configureCircuitBreaker = true
"""

# Write the parameter file
utils.print_info(f"Generating parameter file: {params_file}")
with open(params_file, 'w') as f:
    f.write(params_content)

utils.print_ok("Parameter file generated successfully!")
print("\n" + "="*60)
print("GENERATED PARAMETER FILE CONTENT:")
print("="*60)
print(params_content)
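`format_backend_for_bicep` interpolates values straight into single-quoted Bicep strings. That works for the sample values but would break on a value containing `'` or `${`. Bicep escapes these with a backslash (`\'`, `\$`, and `\\` for a literal backslash). A minimal escaping helper, hypothetical and not part of the notebook, might look like this:

```python
def bicep_string(value: str) -> str:
    """Render a Python string as a single-quoted Bicep string literal."""
    escaped = (
        value.replace("\\", "\\\\")   # escape backslashes first so later escapes aren't doubled
        .replace("'", "\\'")          # single quotes delimit Bicep strings
        .replace("${", "\\${")        # '${' would otherwise start string interpolation
    )
    return f"'{escaped}'"
```

It could replace the hand-written quotes in the template, for example `endpoint: {bicep_string(backend['endpoint'])}` instead of `endpoint: '{backend['endpoint']}'`. For the plain sample values the generated file would be unchanged.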

<a id='5'></a>
### 5️⃣ Deploy LLM Backend Onboarding Bicep

Deploy the LLM backends, backend pools, and policy fragments to APIM:

In [None]:
# Deploy the LLM backend onboarding
deployment_name = f"llm-backend-onboarding-{time.strftime('%Y%m%d%H%M%S')}"
template_file = os.path.join(bicep_dir, "main.bicep")

utils.print_info(f"Starting deployment: {deployment_name}")
utils.print_info(f"Template: {template_file}")
utils.print_info(f"Parameters: {params_file}")

# Run the subscription-level deployment
deployment_cmd = f"az deployment sub create --name {deployment_name} --location {location} --template-file {template_file} --parameters {params_file}"

output = utils.run(
    deployment_cmd,
    f"Deployment '{deployment_name}' succeeded",
    f"Deployment '{deployment_name}' failed"
)

if output.success:
    utils.print_ok("Deployment completed successfully!")
    
    # Display deployment outputs if available
    outputs = output.json_data.get('properties', {}).get('outputs', {}) if output.json_data else {}
    
    if outputs:
        print("\n" + "="*60)
        print("DEPLOYMENT OUTPUTS:")
        print("="*60)
        
        for key, value in outputs.items():
            print(f"  {key}: {value.get('value')}")
    else:
        utils.print_info("No deployment outputs returned.")
else:
    utils.print_error("Deployment failed. Check the error messages above.")
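The deployment cell builds `az deployment sub create` as one string, so a template or parameter path containing spaces would be split into separate arguments. Building the command as a list keeps each value intact, and `shlex.join` turns it back into a safely quoted string for display or for helpers that only accept strings. `build_deployment_args` and `as_display_string` are hypothetical helpers, and whether `utils.run` accepts a list is not established here.

```python
import shlex
import time
from typing import Optional


def build_deployment_args(location: str, template_file: str, params_file: str,
                          name: Optional[str] = None) -> list[str]:
    """Build the subscription-scope deployment command as an argument list."""
    name = name or f"llm-backend-onboarding-{time.strftime('%Y%m%d%H%M%S')}"
    return [
        "az", "deployment", "sub", "create",
        "--name", name,
        "--location", location,
        "--template-file", template_file,
        "--parameters", params_file,
    ]


def as_display_string(args: list[str]) -> str:
    """Quote each argument so the command can be copied into a POSIX shell."""
    return shlex.join(args)
```

The list form could be passed to `subprocess.run(args, capture_output=True, text=True)`. On Windows the CLI entry point is `az.cmd`, so running it without a shell may require resolving it first, for example with `shutil.which("az")`.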

<a id='6'></a>
### 6️⃣ Verify Deployed Configuration

Verify that the backends, pools, and policy fragments were created successfully:

In [None]:
# Re-initialize APIM client to pick up new backends
apimClientTool.initialize()

# Get updated supported models from policy fragment
try:
    updated_supported_models = apimClientTool.get_policy_fragment_supported_models("set-backend-pools")
    utils.print_ok("Updated supported models in APIM policy fragment 'set-backend-pools':")
    for model in updated_supported_models:
        print(f"  • {model}")
except Exception as e:
    utils.print_error(f"Error retrieving policy fragment: {e}")
    updated_supported_models = []

In [None]:
# Display summary of current configuration
utils.print_info("\n" + "="*60)
utils.print_info("CURRENT APIM BACKEND CONFIGURATION SUMMARY")
utils.print_info("="*60)

if existing_backends:
    print("\nüìã Individual Backends:")
    for backend in existing_backends:
        print(f"  ‚Ä¢ {backend['name']}")
        print(f"    URL: {backend['url']}")
        if backend['supportedModels']:
            print(f"    Models: {', '.join(backend['supportedModels'])}")

if existing_backend_pools:
    print("\nüì¶ Backend Pools:")
    for pool in existing_backend_pools:
        print(f"  ‚Ä¢ {pool['name']}")
        for svc in pool['services']:
            print(f"    - {svc.get('id', 'N/A')} (priority: {svc.get('priority', 'N/A')}, weight: {svc.get('weight', 'N/A')})")

if supported_models_from_policy:
    print(f"\nü§ñ Total Supported Models: {len(supported_models_from_policy)}")
    print(f"   {', '.join(supported_models_from_policy)}")
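The summary above is meant for reading. A check like the following flags configured backends and models that did not appear after deployment. It assumes that each deployed APIM backend is named exactly after its `backendId` and that the policy fragment lists models by their `supportedModels` names; if the template adds prefixes or suffixes, adjust the comparison. `verify_deployment` is a hypothetical helper.

```python
def verify_deployment(configured: list[dict], deployed_backend_names: list[str],
                      policy_models: list[str]) -> dict[str, list[str]]:
    """Report configured backend IDs and models that don't show up after deployment."""
    deployed = set(deployed_backend_names)
    supported = set(policy_models)
    missing_backends = [b["backendId"] for b in configured if b["backendId"] not in deployed]
    configured_models = {m for b in configured for m in b["supportedModels"]}
    missing_models = sorted(configured_models - supported)
    return {"missingBackends": missing_backends, "missingModels": missing_models}
```

In the notebook, one possible call is `deployed, _ = apimClientTool.get_backends()` followed by `verify_deployment(llm_backends_config, [b["name"] for b in deployed], updated_supported_models)`.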

---
## 🧪 Test Deployed Models

The following sections test the deployed models through both the Universal LLM API and Azure OpenAI API endpoints.

<a id='test-universal'></a>
### 🧪 Test via Universal LLM API (models/chat/completions)

Test the deployed models using the Universal LLM API which routes based on the `model` field in the request body:
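With the Universal LLM API, the URL is the same for every model and the `model` field in the JSON body selects the backend pool. The sketch below, using a hypothetical `build_universal_request` helper and a placeholder gateway URL, shows that request shape and normalizes the trailing slash so the path joins cleanly whether or not the discovered endpoint ends in `/`.

```python
from urllib.parse import urlencode


def build_universal_request(base_url: str, model: str, messages: list[dict],
                            api_version: str) -> tuple[str, dict]:
    """Build the URL and JSON body for a Universal LLM API call (model chosen by the body)."""
    query = urlencode({"api-version": api_version})
    url = f"{base_url.rstrip('/')}/models/chat/completions?{query}"
    payload = {"model": model, "messages": messages}  # routing key lives in the body
    return url, payload
```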

In [None]:
# Discover the Universal LLM API endpoint
apimClientTool.discover_api("models")
azure_endpoint_models = str(apimClientTool.azure_endpoint)
chat_completions_url_models = f"{azure_endpoint_models}models/chat/completions?api-version={inference_api_version}"

utils.print_info(f"Universal LLM API Endpoint: {chat_completions_url_models}")

# Get an API key from subscriptions
if apimClientTool.apim_subscriptions:
    api_key = apimClientTool.apim_subscriptions[-1].get("key")
    utils.print_ok(f"Using subscription: {apimClientTool.apim_subscriptions[-1].get('name')}")
else:
    utils.print_error("No APIM subscriptions found. Please create a subscription first.")
    api_key = None

In [None]:
# Test each supported model via Universal LLM API
if api_key and updated_supported_models:
    utils.print_info(f"\nTesting {len(updated_supported_models)} models via Universal LLM API...\n")
    
    test_messages = [
        {"role": "system", "content": "You are a helpful assistant. Be concise."},
        {"role": "user", "content": "What is 2+2? Answer in one word."}
    ]
    
    for model_name in updated_supported_models[:3]:  # Test first 3 models
        utils.print_info(f"Testing model: {model_name}")
        
        payload = {
            "model": model_name,
            "messages": test_messages
        }
        
        try:
            response = requests.post(
                chat_completions_url_models,
                headers={"api-key": api_key},
                json=payload,
                timeout=60
            )
            
            utils.print_response_code(response)
            
            if response.status_code == 200:
                data = response.json()
                answer = data.get("choices", [{}])[0].get("message", {}).get("content", "No response")
                region = response.headers.get("x-ms-region", "unknown")
                print(f"  💬 Response: {answer}")
                print(f"  📍 Backend Region: {region}")
                utils.print_ok(f"Model '{model_name}' - SUCCESS\n")
            else:
                utils.print_error(f"Model '{model_name}' - FAILED: {response.text}\n")
                
        except Exception as e:
            utils.print_error(f"Model '{model_name}' - ERROR: {str(e)}\n")
else:
    utils.print_warning("Cannot run tests - missing API key or supported models")
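The test cells read the reply with `data.get("choices", [{}])[0]`, which raises `IndexError` when the body contains `"choices": []`; the surrounding `try` then reports it as a generic error. A defensive extractor such as the hypothetical `extract_answer` returns a default instead. It assumes the standard chat-completions body shape (`choices[0].message.content`).

```python
def extract_answer(data: dict, default: str = "No response") -> str:
    """Return the first choice's message content from a chat-completions JSON body."""
    choices = data.get("choices") or []  # treats a missing key and an empty list alike
    if not choices:
        return default
    message = choices[0].get("message") or {}
    return message.get("content") or default  # content can be null, e.g. for tool calls
```

It would slot in as `answer = extract_answer(response.json())` in either test loop.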

<a id='test-openai'></a>
### 🧪 Test via Azure OpenAI API (openai/deployments/{model}/chat/completions)

Test the deployed models using the Azure OpenAI compatible API which uses the deployment name in the URL path:
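Here the deployment name is a path segment and the body carries no `model` field. Percent-encoding the segment with `urllib.parse.quote(..., safe='')` keeps names with unusual characters from altering the path; typical names such as `gpt-4o` or `DeepSeek-R1` pass through unchanged. `build_azure_openai_request` is a hypothetical helper and the gateway URL is a placeholder.

```python
from urllib.parse import quote, urlencode


def build_azure_openai_request(base_url: str, deployment: str, messages: list[dict],
                               api_version: str) -> tuple[str, dict]:
    """Build the URL and body for an Azure OpenAI-style call (deployment named in the path)."""
    path = f"openai/deployments/{quote(deployment, safe='')}/chat/completions"
    url = f"{base_url.rstrip('/')}/{path}?{urlencode({'api-version': api_version})}"
    payload = {"messages": messages}  # no "model" field; the path segment selects the deployment
    return url, payload
```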

In [None]:
# Discover the Azure OpenAI API endpoint
try:
    apimClientTool.discover_api("openai")
    azure_endpoint_openai = str(apimClientTool.azure_endpoint)
    utils.print_info(f"Azure OpenAI API Base Endpoint: {azure_endpoint_openai}")
except Exception as e:
    utils.print_warning(f"Azure OpenAI API not found in APIM: {e}")
    azure_endpoint_openai = None

In [None]:
# Test models via Azure OpenAI API format
if api_key and azure_endpoint_openai and updated_supported_models:
    utils.print_info("\nTesting models via Azure OpenAI API format...\n")
    
    test_messages = [
        {"role": "system", "content": "You are a helpful assistant. Be concise."},
        {"role": "user", "content": "What is the capital of France? Answer in one word."}
    ]
    
    for model_name in updated_supported_models[:3]:  # Test first 3 models
        utils.print_info(f"Testing model: {model_name}")
        
        # Azure OpenAI format uses deployment name in URL path
        chat_completions_url_openai = f"{azure_endpoint_openai}openai/deployments/{model_name}/chat/completions?api-version={inference_api_version}"
        
        payload = {
            "messages": test_messages  # No model field needed - it's in the URL
        }
        
        try:
            response = requests.post(
                chat_completions_url_openai,
                headers={"api-key": api_key},
                json=payload,
                timeout=60
            )
            
            utils.print_response_code(response)
            
            if response.status_code == 200:
                data = response.json()
                answer = data.get("choices", [{}])[0].get("message", {}).get("content", "No response")
                region = response.headers.get("x-ms-region", "unknown")
                print(f"  💬 Response: {answer}")
                print(f"  📍 Backend Region: {region}")
                utils.print_ok(f"Model '{model_name}' - SUCCESS\n")
            else:
                utils.print_error(f"Model '{model_name}' - FAILED: {response.text}\n")
                
        except Exception as e:
            utils.print_error(f"Model '{model_name}' - ERROR: {str(e)}\n")
else:
    utils.print_warning("Cannot run Azure OpenAI API tests - missing API key, endpoint, or supported models")

<a id='test-sdk'></a>
### 🧪 Test using Azure OpenAI Python SDK

Test using the official Azure OpenAI Python SDK:

In [None]:
from openai import AzureOpenAI

if api_key and azure_endpoint_openai and updated_supported_models:
    model_name = updated_supported_models[0]  # Use first available model
    utils.print_info(f"Testing with Azure OpenAI SDK using model: {model_name}")
    
    try:
        client = AzureOpenAI(
            azure_endpoint=azure_endpoint_openai,
            api_key=api_key,
            api_version=inference_api_version
        )
        
        response = client.chat.completions.create(
            model=model_name,
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Say 'Hello from Azure OpenAI SDK!'"}
            ]
        )
        
        utils.print_ok("SDK Test Successful!")
        print(f"💬 Response: {response.choices[0].message.content}")
        print(f"📊 Usage: {response.usage.total_tokens} tokens")
        
    except Exception as e:
        utils.print_error(f"SDK Test Failed: {str(e)}")
else:
    utils.print_warning("Cannot run SDK test - missing prerequisites")

<a id='test-streaming'></a>
### 🧪 Test Streaming Response

Test streaming responses using the Azure OpenAI SDK:

In [None]:
from openai import AzureOpenAI

if api_key and azure_endpoint_openai and updated_supported_models:
    model_name = updated_supported_models[0]  # Use first available model
    utils.print_info(f"Testing streaming with model: {model_name}")
    
    try:
        client = AzureOpenAI(
            azure_endpoint=azure_endpoint_openai,
            api_key=api_key,
            api_version=inference_api_version
        )
        
        start_time = time.time()
        
        response = client.chat.completions.with_raw_response.create(
            model=model_name,
            messages=[
                {"role": "user", "content": "Count from 1 to 10 with commas between each number."}
            ],
            stream=True
        )
        
        print(f"📡 x-ms-region: {response.headers.get('x-ms-region', 'unknown')}")
        print(f"📡 x-ms-stream: {response.headers.get('x-ms-stream', 'N/A')}")
        print("\n💬 Streaming response:")
        
        completion = response.parse()
        collected_content = []
        
        for chunk in completion:
            if chunk.choices and chunk.choices[0].delta.content:
                content = chunk.choices[0].delta.content
                collected_content.append(content)
                print(content, end='', flush=True)
        
        elapsed = time.time() - start_time
        print(f"\n\n✅ Stream completed in {elapsed:.2f} seconds")
        print(f"📝 Full response: {''.join(collected_content)}")
        
    except Exception as e:
        utils.print_error(f"Streaming Test Failed: {str(e)}")
else:
    utils.print_warning("Cannot run streaming test - missing prerequisites")
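The streaming loop skips chunks whose `choices` list is empty or whose `delta.content` is `None`. Azure OpenAI can, for example, send a first chunk with no choices that carries prompt filter results, and the first content chunk often carries only the role. The aggregation logic can be exercised offline with stand-in objects. `collect_stream_text` and `fake_chunk` are hypothetical, and `fake_chunk` only imitates the attribute shape of the SDK's chunk objects.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def collect_stream_text(chunks: Iterable) -> str:
    """Concatenate delta content from chat-completion stream chunks."""
    parts: list[str] = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a filter-results chunk with no choices
            continue
        delta = chunk.choices[0].delta
        if delta is not None and delta.content:  # role-only chunks have content=None
            parts.append(delta.content)
    return "".join(parts)


def fake_chunk(content: Optional[str] = None, has_choice: bool = True) -> SimpleNamespace:
    """Build a stand-in shaped like an SDK stream chunk, for offline testing."""
    if not has_choice:
        return SimpleNamespace(choices=[])
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])
```

In the streaming cell, `collect_stream_text(response.parse())` would produce the same text as `''.join(collected_content)`, minus the incremental printing.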

---
## 📊 Summary

This notebook completed the following tasks:

1. ✅ **Extracted** current APIM backend-pools configurations
2. ✅ **Generated** a customizable LLM backend parameter file (`.bicepparam`)
3. ✅ **Deployed** the LLM onboarding Bicep templates
4. ✅ **Tested** the deployed models through:
   - Universal LLM API (`/models/chat/completions`)
   - Azure OpenAI API (`/openai/deployments/{model}/chat/completions`)
   - Azure OpenAI Python SDK
   - Streaming responses

### Next Steps

- Modify the `llm_backends_config` in the initialization cell to add more backends
- Re-run the parameter-file generation and deployment cells (steps 4 and 5) to update the APIM configuration
- Use the generated parameter file as a template for CI/CD pipelines