# APIM ❤️ Azure AI Foundry

## Azure AI Foundry OpenAI Compatibility lab
![flow](../../images/azure-ai-foundry-openai-compatibility.gif)

A playground for configuring [Azure AI Foundry](https://azure.microsoft.com/en-us/products/ai-foundry) behind Azure API Management so that it exposes an OpenAI-compatible API for VS Code extensions such as [Cline](https://marketplace.visualstudio.com/items?itemName=saoudrizwan.claude-dev), for LangChain, and for other OpenAI-compatible clients. This lab demonstrates how to set up the endpoints, switch models through deployment names, and integrate the API with development tools.

### Key Features
- ✅ **OpenAI API Compatibility**: Works with the OpenAI SDK and OpenAI-compatible extensions
- 🔄 **Model Switching**: Switch models by changing deployment names in the endpoint
- 🔧 **VS Code Extension Ready**: Optimized for Cline and other AI coding assistants
- 🛡️ **Secure**: Uses managed identity and Azure API Management for security
- 📊 **Monitoring**: Built-in token tracking and usage analytics
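Switching models by deployment name comes down to substituting the deployment name into the endpoint URL template. The sketch below assumes a hypothetical template shaped like the `deploymentUrlTemplate` output used later in this notebook; the `deployment_url` helper and the example host are illustrative, not part of the lab.

```python
from urllib.parse import quote

# Hypothetical template; in the lab the real one comes from the Bicep output
# openaiCompatibleEndpoints['deploymentUrlTemplate'].
TEMPLATE = "https://apim-example.azure-api.net/openai/deployments/{deployment-name}/chat/completions"


def deployment_url(template: str, deployment_name: str) -> str:
    """Substitute a deployment name into the URL template, percent-encoding it."""
    if "{deployment-name}" not in template:
        raise ValueError("template has no {deployment-name} placeholder")
    return template.replace("{deployment-name}", quote(deployment_name, safe=""))


# Switching models only changes the path segment, not the host or the key.
for name in ["gpt-4o", "gpt-4o-mini"]:
    print(deployment_url(TEMPLATE, name))
```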

[View policy configuration](policy.xml)

### TOC
- [0️⃣ Initialize notebook variables](#0)
- [1️⃣ Create deployment using 🦾 Bicep](#1)
- [2️⃣ Get the deployment outputs](#2)
- [3️⃣ Install Python packages](#3)
- [4️⃣ Configure VS Code Extensions](#4)
- [🧪 Test OpenAI compatibility](#test)
- [📊 Monitor usage](#monitor)
- [🗑️ Clean up resources](#clean)

### Prerequisites
- [Python 3.12 or later](https://www.python.org/) installed
- [VS Code](https://code.visualstudio.com/) installed with the [Jupyter notebook extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) enabled
- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli) installed
- [An Azure Subscription](https://azure.microsoft.com/free/) with Contributor permissions
- [Sign in to Azure with Azure CLI](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively)
- Azure AI Foundry project with deployed models
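Before running the lab, it can help to confirm the local tooling. This minimal sketch checks only the Python version and whether the `az` executable is on the `PATH`; the `check_prerequisites` helper is hypothetical, and it does not verify that you are signed in (the `az account show` call in step 0️⃣ covers that).

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 12)) -> dict:
    """Report whether the local tools listed in the prerequisites are available."""
    return {
        "python_ok": sys.version_info >= min_python,
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "az_cli_found": shutil.which("az") is not None,
    }


status = check_prerequisites()
for key, value in status.items():
    print(f"{key}: {value}")
```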

<a id='0'></a>
### 0️⃣ Initialize notebook variables

- Resources will be suffixed by a unique string based on your subscription ID.
- Adjust the location parameters according to your preferences and to the [product availability by Azure region](https://azure.microsoft.com/explore/global-infrastructure/products-by-region/?cdn=disable&products=cognitive-services,api-management).
- Configure your Azure AI Foundry project details
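The notebook imports `get_unique_identifier` from `../../shared/utils`, whose implementation is not shown here. As an illustration of how a stable suffix can be derived from a subscription ID, here is a minimal sketch using a SHA-256 hash; `make_suffix` is a hypothetical helper and is not claimed to match what the shared utility or Bicep's `uniqueString()` produces.

```python
import hashlib


def make_suffix(subscription_id: str, length: int = 8) -> str:
    """Derive a stable, lowercase hex suffix from a subscription ID (hypothetical helper)."""
    digest = hashlib.sha256(subscription_id.lower().encode("utf-8")).hexdigest()
    return digest[:length]


sub = "00000000-0000-0000-0000-000000000000"  # placeholder subscription ID
print(f"rg-ai-foundry-openai-compatibility-{make_suffix(sub)}")
```

Because the suffix depends only on the subscription ID, re-running the notebook targets the same resource group instead of creating a new one.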

In [None]:
import os, sys, json
sys.path.insert(1, '../../shared')  # add the shared directory to the Python path
from utils import get_unique_identifier

# Read the subscription ID of the account currently signed in to the Azure CLI
subscription_id = !az account show --query id --output tsv
subscription_id = subscription_id[0]

# Azure location and resource group name
location = "eastus"
resource_group_name = f"rg-ai-foundry-openai-compatibility-{get_unique_identifier()}"

# Azure AI Foundry configuration
# Replace these with your actual Azure AI Foundry project details
ai_foundry_project_name = "your-ai-foundry-project"  # Update this
ai_foundry_resource_group = "your-ai-foundry-rg"     # Update this
ai_foundry_subscription_id = subscription_id          # Assumes the AI Foundry project is in the same subscription

# Available models in your Azure AI Foundry project
available_models = [
    "gpt-4o",
    "gpt-4o-mini", 
    "gpt-35-turbo",
    "gpt-4"
]

# Default model for testing
default_model = "gpt-4o"

# APIM Configuration
apim_sku = "Consumption"  # or "Developer", "Standard", "Premium"

print(f"🔍 Subscription ID: {subscription_id}")
print(f"📍 Location: {location}")
print(f"📦 Resource Group: {resource_group_name}")
print(f"🤖 AI Foundry Project: {ai_foundry_project_name}")
print(f"🎯 Default Model: {default_model}")
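The AI Foundry settings above ship with `your-...` placeholders, and deploying with them unchanged would point the backend at a project that does not exist. A small guard, sketched here with a hypothetical `find_unset_placeholders` helper and example values, can flag settings that still need updating.

```python
def find_unset_placeholders(settings: dict, prefix: str = "your-") -> list:
    """Return the names of settings that still hold template placeholder values."""
    return [
        name
        for name, value in settings.items()
        if isinstance(value, str) and value.startswith(prefix)
    ]


# Example values; in the notebook, pass the real variables instead.
example = {
    "ai_foundry_project_name": "your-ai-foundry-project",
    "ai_foundry_resource_group": "rg-foundry-prod",
}
unset = find_unset_placeholders(example)
if unset:
    print(f"⚠️ Update these settings before deploying: {', '.join(unset)}")
```

In the notebook, call it with `{"ai_foundry_project_name": ai_foundry_project_name, "ai_foundry_resource_group": ai_foundry_resource_group}` before running the deployment cell.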

<a id='1'></a>
### 1️⃣ Create deployment using 🦾 Bicep

This step will create:
- Azure API Management instance
- Azure AI Foundry backend configuration
- OpenAI-compatible API with policies
- Subscriptions for VS Code extensions
- Application Insights for monitoring

In [None]:
# Create resource group
!az group create --name {resource_group_name} --location {location}

In [None]:
# Deploy the Bicep template
deployment_name = f"ai-foundry-openai-compatibility-{get_unique_identifier()}"

deployment_command = f"""az deployment group create \
  --resource-group {resource_group_name} \
  --template-file main.bicep \
  --name {deployment_name} \
  --parameters location={location} \
  --parameters apimSku={apim_sku} \
  --parameters aiFoundryConfig='{{"projectName":"{ai_foundry_project_name}","resourceGroupName":"{ai_foundry_resource_group}","subscriptionId":"{ai_foundry_subscription_id}"}}'"""

print(f"🚀 Deploying infrastructure...\n")
print(f"📋 Command: {deployment_command}\n")

!{deployment_command}
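The command above builds the `aiFoundryConfig` JSON by hand inside an f-string and relies on shell single quotes, which is easy to get wrong and does not work in every shell. A sketch of an alternative, assuming the Azure CLI accepts inline JSON for object parameters in `KEY=VALUE` form, builds an argument list with `json.dumps` so no shell quoting is involved. `build_deployment_command` is a hypothetical helper; the parameter names are the ones used in the cell above.

```python
import json
import shlex


def build_deployment_command(resource_group: str, deployment_name: str, location: str,
                             apim_sku: str, ai_foundry_config: dict) -> list:
    """Build `az deployment group create` arguments without hand-escaping JSON."""
    return [
        "az", "deployment", "group", "create",
        "--resource-group", resource_group,
        "--template-file", "main.bicep",
        "--name", deployment_name,
        "--parameters", f"location={location}",
        "--parameters", f"apimSku={apim_sku}",
        "--parameters", f"aiFoundryConfig={json.dumps(ai_foundry_config)}",
    ]


args = build_deployment_command(
    "rg-demo", "deploy-demo", "eastus", "Consumption",
    {"projectName": "my-project", "resourceGroupName": "my-rg", "subscriptionId": "0000"},
)
print(shlex.join(args))  # shell-safe string for logging
# To execute it: subprocess.run(args, check=True)
```

On Windows, `az` is a `.cmd` wrapper, so you may need to pass the full path returned by `shutil.which("az")` as the first argument.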

<a id='2'></a>
### 2️⃣ Get the deployment outputs

Extract the necessary information from the deployment to configure VS Code extensions.

In [None]:
# Get deployment outputs
outputs = !az deployment group show --resource-group {resource_group_name} --name {deployment_name} --query properties.outputs --output json
# The CLI prints multi-line JSON, so join all captured lines before parsing
outputs_json = json.loads("\n".join(outputs))

# Extract values
apim_service_name = outputs_json['apimServiceName']['value']
apim_gateway_url = outputs_json['apimResourceGatewayURL']['value']
subscription_keys = outputs_json['apimSubscriptionKeys']['value']
openai_endpoints = outputs_json['openaiCompatibleEndpoints']['value']
vscode_configs = outputs_json['vsCodeExtensionConfigs']['value']

# Get primary subscription key for Cline extension
cline_subscription_key = subscription_keys[0]['primaryKey']

print(f"✅ Deployment completed successfully!\n")
print(f"🔗 APIM Gateway URL: {apim_gateway_url}")
print(f"🔑 Subscription Key (Cline): {cline_subscription_key[:8]}...")
print(f"🎯 Base API URL: {openai_endpoints['baseUrl']}")
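ARM deployment outputs arrive wrapped as `{"name": {"type": ..., "value": ...}}`, which is why the cell above indexes `['value']` on every entry. The sketch below shows the same parsing as two small, hypothetical helpers: one joins the lines captured from `az ... --output json`, the other strips the `type`/`value` wrapper. The sample lines are made-up example data.

```python
import json


def parse_cli_json(lines) -> dict:
    """Join the lines captured from an `az ... --output json` call and parse them."""
    return json.loads("\n".join(lines))


def flatten_outputs(outputs: dict) -> dict:
    """Map each ARM deployment output to its plain value, dropping the type wrapper."""
    return {name: entry["value"] for name, entry in outputs.items()}


captured = [
    "{",
    '  "apimServiceName": {"type": "String", "value": "apim-demo"},',
    '  "apimResourceGatewayURL": {"type": "String", "value": "https://apim-demo.azure-api.net"}',
    "}",
]
values = flatten_outputs(parse_cli_json(captured))
print(values["apimServiceName"])
```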

<a id='3'></a>
### 3️⃣ Install Python packages

Install required packages for testing OpenAI compatibility.

In [None]:
%pip install openai requests
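To confirm the packages are available to the kernel, you can query their installed versions with `importlib.metadata`. This is a minimal sketch; `installed_version` is a hypothetical helper. If a package was already imported before `%pip install` upgraded it, restart the kernel so the new version is loaded.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for pkg in ("openai", "requests"):
    print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```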

<a id='4'></a>
### 4️⃣ Configure VS Code Extensions

The following cell prints configuration examples for popular VS Code extensions:

In [None]:
print("🔧 VS Code Extension Configurations\n")
print("=" * 50)

# Cline Extension Configuration
print("\n📱 Cline Extension (claude-dev)")
print("-" * 30)
cline_config = vscode_configs['clineExtension']['settings']
print("Add these settings to your VS Code settings.json:")
print(json.dumps(cline_config, indent=2))

# Generic OpenAI Configuration
print("\n🔌 Generic OpenAI-Compatible Extensions")
print("-" * 40)
generic_config = vscode_configs['genericOpenAI']
print(f"Base URL: {generic_config['endpoint']}")
print(f"API Key: {generic_config['apiKey'][:8]}...")
print(f"Model: {generic_config['model']}")

# Model switching examples
print("\n🔄 Model Switching Examples")
print("-" * 25)
for model in available_models:
    endpoint = openai_endpoints['deploymentUrlTemplate'].replace('{deployment-name}', model)
    print(f"Model '{model}': {endpoint}")

print("\n💡 Tips:")
print("- To switch models, change the deployment name in the endpoint URL")
print("- Use the base URL for extensions that send model name in request body")
print("- Use deployment-specific URLs for extensions that specify model via endpoint")
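Instead of copying the printed settings by hand, you could merge them into a `settings.json` file. This sketch assumes the file is plain JSON: VS Code also accepts comments in `settings.json` (JSONC), which `json.loads` rejects, so such files need manual editing. `merge_settings` and the `example.baseUrl` key are hypothetical; in the notebook you would pass `cline_config` and the path to your own settings file.

```python
import json
import tempfile
from pathlib import Path


def merge_settings(settings_path: Path, new_settings: dict) -> dict:
    """Merge new keys into a plain-JSON settings file and return the merged dict."""
    text = settings_path.read_text(encoding="utf-8") if settings_path.exists() else ""
    current = json.loads(text) if text.strip() else {}
    current.update(new_settings)  # new values overwrite existing keys of the same name
    settings_path.write_text(json.dumps(current, indent=2), encoding="utf-8")
    return current


# Demonstrate on a throwaway file rather than your real VS Code settings.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "settings.json"
    path.write_text('{"editor.fontSize": 14}', encoding="utf-8")
    merged = merge_settings(path, {"example.baseUrl": "https://apim-demo.azure-api.net/openai"})
    print(merged)
```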

<a id='test'></a>
### 🧪 Test OpenAI compatibility

Test the API with both the OpenAI SDK and direct HTTP requests to confirm compatibility.

In [None]:
# Test with OpenAI SDK
from openai import OpenAI
import time

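# Initialize the OpenAI client against the APIM endpoint that fronts Azure AI Foundry.
# The SDK sends api_key as a Bearer token; the key is also sent in APIM's default
# subscription header in case the API policy does not read the Bearer token.
client = OpenAI(
    base_url=openai_endpoints['baseUrl'],
    api_key=cline_subscription_key,
    default_headers={'Ocp-Apim-Subscription-Key': cline_subscription_key}
)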

print("🧪 Testing OpenAI SDK compatibility...\n")

# Test messages
messages = [
    {"role": "system", "content": "You are a helpful assistant that responds concisely."},
    {"role": "user", "content": "What is Azure AI Foundry? Please respond in 2-3 sentences."}
]

try:
    start_time = time.time()
    
    # Make the API call
    response = client.chat.completions.create(
        model=default_model,
        messages=messages,
        max_tokens=150,
        temperature=0.7
    )
    
    response_time = time.time() - start_time
    
    print(f"✅ Success! Response time: {response_time:.2f} seconds")
    print(f"📝 Response: {response.choices[0].message.content}")
    
    if response.usage:
        print(f"📊 Token usage:")
        print(f"   - Total tokens: {response.usage.total_tokens}")
        print(f"   - Prompt tokens: {response.usage.prompt_tokens}")
        print(f"   - Completion tokens: {response.usage.completion_tokens}")

except Exception as e:
    print(f"❌ Error: {str(e)}")
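Both the SDK response (via `response.model_dump()`) and the raw HTTP response (via `response.json()`) follow the chat-completions JSON shape, so one helper can summarize either. This is a sketch with a hypothetical `summarize_completion` function and made-up sample data; it tolerates a missing `usage` block by reporting zeros.

```python
def summarize_completion(payload: dict) -> dict:
    """Extract the reply text, model, and token counts from a chat-completions JSON body."""
    usage = payload.get("usage") or {}
    return {
        "content": payload["choices"][0]["message"]["content"],
        "model": payload.get("model"),
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }


sample = {
    "model": "gpt-4o",
    "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hi!"}, "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}
summary = summarize_completion(sample)
print(summary)
```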

In [None]:
# Test deployment-specific endpoint
import requests

print("🧪 Testing deployment-specific endpoint...\n")

# Test with deployment-specific URL
deployment_url = openai_endpoints['deploymentUrlTemplate'].replace('{deployment-name}', default_model)

headers = {
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': cline_subscription_key
}

payload = {
    'messages': [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'Hello! Can you confirm this API is working?'}
    ],
    'max_tokens': 50,
    'temperature': 0.5
}

try:
    response = requests.post(deployment_url, json=payload, headers=headers, timeout=30)
    
    if response.status_code == 200:
        result = response.json()
        print(f"✅ Deployment endpoint test successful!")
        print(f"📝 Response: {result['choices'][0]['message']['content']}")
        print(f"🔗 Endpoint: {deployment_url}")
    else:
        print(f"❌ HTTP {response.status_code}: {response.text}")

except Exception as e:
    print(f"❌ Error: {str(e)}")
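When rate-limiting or token-limit policies are in place, APIM typically answers excess requests with HTTP 429, often with a `Retry-After` header. A minimal retry sketch, assuming the caller supplies a zero-argument `send` function returning an object with `status_code` and `headers` (such as a `requests` response), is shown below. `post_with_retry` is a hypothetical helper; the fake responses stand in for real HTTP calls.

```python
import random
import time
from types import SimpleNamespace


def post_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429/503 response or attempts run out (max_attempts >= 1)."""
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code not in (429, 503) or attempt == max_attempts:
            return response
        retry_after = response.headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)  # server-suggested wait in seconds
        else:
            # Exponential backoff with a little jitter; HTTP-date values also land here.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
        sleep(delay)


responses = iter([
    SimpleNamespace(status_code=429, headers={"Retry-After": "2"}),
    SimpleNamespace(status_code=429, headers={}),
    SimpleNamespace(status_code=200, headers={}),
])
delays = []
result = post_with_retry(lambda: next(responses), sleep=delays.append)
print(result.status_code, delays)
```

With the real endpoint, wrap the request in a lambda: `post_with_retry(lambda: requests.post(deployment_url, json=payload, headers=headers, timeout=30))`.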

In [None]:
# Test model switching
print("🔄 Testing model switching...\n")

for model in available_models[:2]:  # Test first 2 models to save tokens
    print(f"Testing model: {model}")
    
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
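                {"role": "user", "content": "Hello! Which model are you? Just state your model name."}
            ],
            max_tokens=30
        )
        
        # Models cannot always name themselves reliably; response.model reports the model that served the request
        print(f"  ✅ {model} (served by {response.model}): {response.choices[0].message.content}")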
        
    except Exception as e:
        print(f"  ❌ {model}: {str(e)}")
    
    print()
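The loop above queries models one at a time. To compare several deployments more quickly, you could run the calls in a small thread pool and record failures per model instead of stopping. This sketch uses a hypothetical `compare_models` helper and a fake `ask` function; parallel calls still count against any rate limits, so keep `max_workers` small.

```python
from concurrent.futures import ThreadPoolExecutor


def compare_models(models, ask, max_workers=4) -> dict:
    """Run ask(model) for each model in parallel; map model -> reply or error text."""
    def safe_ask(model):
        try:
            return model, ask(model)
        except Exception as exc:  # record the failure and keep going
            return model, f"error: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(safe_ask, models))  # map preserves input order


def fake_ask(model):
    if model == "gpt-4":
        raise RuntimeError("deployment not found")
    return f"reply from {model}"


results = compare_models(["gpt-4o", "gpt-4o-mini", "gpt-4"], fake_ask)
for name, reply in results.items():
    print(f"{name}: {reply}")
```

With the real client, `ask` could be `lambda m: client.chat.completions.create(model=m, messages=[{"role": "user", "content": "Hi"}], max_tokens=30).choices[0].message.content`.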

<a id='monitor'></a>
### 📊 Monitor usage

Check Application Insights for usage metrics and logs.

In [None]:
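# Get Application Insights information
# Assumes the Bicep template names the component "appi-<unique suffix>"; verify the name in the portal
app_insights_name = f"appi-{get_unique_identifier()}"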

print("📊 Monitoring Information\n")
print("=" * 30)
print(f"🔍 Application Insights: {app_insights_name}")
print(f"📈 Resource Group: {resource_group_name}")
print(f"🌐 Portal Link: https://portal.azure.com/#@/resource/subscriptions/{subscription_id}/resourceGroups/{resource_group_name}/providers/Microsoft.Insights/components/{app_insights_name}")

print("\n📋 Available Metrics:")
print("- Token usage per subscription")
print("- Request latency")
print("- Error rates")
print("- Model usage distribution")
print("- Client IP tracking")

print("\n🔍 KQL Queries to try in Application Insights:")
print("\n1. Token usage by model:")
print('customMetrics | where name == "llm.usage.completion_tokens" | summarize sum(value) by tostring(customDimensions.Model)')

print("\n2. Request count by subscription:")
print('requests | summarize count() by tostring(customDimensions["Subscription ID"])')

print("\n3. Average response time:")
print('requests | summarize avg(duration) by bin(timestamp, 5m)')
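You can also run these KQL queries from the notebook with `az monitor app-insights query`, which is provided by the Azure CLI `application-insights` extension. The sketch below only builds the argument list; `app_insights_query_args` is a hypothetical helper, and `--offset` sets how far back the query looks (for example `1h`).

```python
import shlex

QUERIES = {
    "avg_response_time": "requests | summarize avg(duration) by bin(timestamp, 5m)",
}


def app_insights_query_args(app_name, resource_group, kql, offset="1h"):
    """Build arguments for `az monitor app-insights query`."""
    return [
        "az", "monitor", "app-insights", "query",
        "--app", app_name,
        "--resource-group", resource_group,
        "--analytics-query", kql,
        "--offset", offset,
    ]


args = app_insights_query_args("appi-demo", "rg-demo", QUERIES["avg_response_time"])
print(shlex.join(args))
# To execute it: subprocess.run(args, check=True, capture_output=True, text=True)
```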

### 🎯 Next Steps

Your Azure AI Foundry OpenAI-compatible API is now ready! Here's what you can do next:

#### 🔧 Configure VS Code Extensions
1. **Cline Extension**: Copy the settings printed in step 4️⃣ into your VS Code settings.json
2. **Other Extensions**: Use the provided endpoint URLs and API key
3. **Test**: Send a chat request from your extension to confirm it reaches the API

#### 🔄 Model Management
- Switch models by changing the deployment name in the endpoint URL
- Add new models to your Azure AI Foundry project
- Update the `available_models` list in this notebook

#### 📊 Monitoring
- Check Application Insights for usage metrics
- Set up alerts for high token usage
- Monitor response times and error rates

#### 🛡️ Security
- Rotate subscription keys regularly
- Configure IP restrictions if needed
- Set up rate limiting policies

#### 🚀 Production Considerations
- Upgrade to a higher APIM tier for production workloads
- Configure custom domains
- Set up backup regions for high availability

<a id='clean'></a>
### 🗑️ Clean up resources

When you're finished with the lab, remove all resources to avoid extra charges.

In [None]:
# Uncomment the line below to delete the resource group and all resources
# !az group delete --name {resource_group_name} --yes --no-wait

print("To clean up resources, run:")
print(f"az group delete --name {resource_group_name} --yes")
print("\nOr use the clean-up-resources notebook for a guided approach.")
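If you prefer a guarded cleanup from Python instead of uncommenting the `!az` line, the following sketch runs `az group delete` only when `confirm=True` and otherwise returns the command for review. `delete_resource_group` is a hypothetical helper; `--yes` skips the CLI prompt, so the `confirm` flag is the only safeguard.

```python
import subprocess


def delete_resource_group(name: str, confirm: bool = False, no_wait: bool = True) -> list:
    """Delete a resource group with the Azure CLI only when confirm=True.

    Returns the command that was (or would be) run so it can be reviewed first.
    """
    cmd = ["az", "group", "delete", "--name", name, "--yes"]
    if no_wait:
        cmd.append("--no-wait")  # return immediately; deletion continues in Azure
    if confirm:
        subprocess.run(cmd, check=True)
    return cmd


print(delete_resource_group("rg-demo"))  # dry run: prints the command only
```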