# APIM ❤️ Microsoft Foundry

## AI Foundry with APIM Model Gateway lab - Static Model Listing

![flow](../../images/foundry-model-gateway.gif)

This lab demonstrates how to configure Azure API Management (APIM) as a **Model Gateway** for Azure AI Foundry. By connecting APIM as a model gateway, you can leverage APIM's enterprise-grade features including rate limiting, caching, monitoring, security policies, and load balancing for AI model inference requests made through Microsoft Foundry Agents Service.

### What you'll learn:

- Deploy Azure AI Foundry with AI Services and model deployments
- Configure Azure API Management (the notebook defaults to the Basic v2 SKU via `apim_sku`)
- Connect APIM as a Model Gateway to Foundry
- Test inference requests through the APIM gateway
- Monitor and trace gateway traffic

### Architecture:

```
Foundry Agent → APIM Gateway (Model Gateway) → AI Services → OpenAI Models
                      ↓
              Policies, Monitoring,
              Rate Limiting, Caching
```

### Prerequisites

- [Python 3.12 or later version](https://www.python.org/) installed
- [VS Code](https://code.visualstudio.com/) installed with the [Jupyter notebook extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) enabled
- [Python environment](https://code.visualstudio.com/docs/python/environments#_creating-environments) with the packages from [requirements.txt](../../requirements.txt) installed (for example, run `pip install -r ../../requirements.txt` in your terminal)
- [An Azure Subscription](https://azure.microsoft.com/free/) with [Contributor](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#contributor) + [RBAC Administrator](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#role-based-access-control-administrator) or [Owner](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#owner) roles
- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli) installed and [signed in to your Azure subscription](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively)

▶️ Click `Run All` to execute all steps sequentially, or execute them `Step by Step`...

<a id='0'></a>
### 0️⃣ Initialize notebook variables

Adjust the configuration below to customize your deployment.

In [None]:
#### Install required packages for Foundry v2
%pip install requests 'azure-ai-agents>=1.2.0b5' 'azure-ai-projects>=2.0.0b1'

In [None]:
import os, sys, json
sys.path.insert(1, '../../shared')  # add the shared directory to the Python path
import utils

deployment_name = os.path.basename(os.path.dirname(globals()['__vsc_ipynb_file__']))
resource_group_name = f"lab-{deployment_name}"
resource_group_location = "swedencentral"

# AI Services configuration
aiservices_config = [{"name": "models-foundry", "location": "swedencentral", "weight": 1},    ## Models Foundry will be the inferencing AI service.
                      {"name": "agents-foundry", "location": "swedencentral", "weight": 0}]   ## Agents Foundry shouldn't have any models deployed to it. 

# Models configuration
models_config = [
    {"name": "gpt-4o-mini", "publisher": "OpenAI", "version": "2024-07-18", "sku": "GlobalStandard", "capacity": 10, "aiservice": "models-foundry"},
    {"name": "gpt-4.1-mini", "publisher": "OpenAI", "version": "2025-04-14", "sku": "GlobalStandard", "capacity": 10, "aiservice": "models-foundry"}
]

# APIM configuration
apim_sku = 'Basicv2'  # Using Basicv2 SKU as specified
apim_subscriptions_config = [
    {"name": "foundry-subscription", "displayName": "Foundry AI Gateway Subscription"}
]

# API configuration
inference_api_path = "inference"
inference_api_type = "PassThrough"
inference_api_version = "v1"
foundry_project_name = "foundry-project"

utils.print_ok('Notebook initialized')
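
Because each entry in `models_config` points at an AI service by name, a typo only surfaces once the Bicep deployment fails. The following is a minimal sketch of a pre-flight check, assuming the same dictionary shapes as the cell above; `validate_lab_config` and the `sample_*` names are hypothetical helpers, not part of the lab's `utils` module.

```python
def validate_lab_config(aiservices, models):
    """Return a list of problems; an empty list means the config is consistent."""
    problems = []
    service_names = [svc["name"] for svc in aiservices]
    if len(service_names) != len(set(service_names)):
        problems.append("AI service names must be unique")
    for model in models:
        # Every model must target one of the declared AI services.
        if model["aiservice"] not in service_names:
            problems.append(
                f"model '{model['name']}' targets unknown AI service '{model['aiservice']}'"
            )
        if model["capacity"] <= 0:
            problems.append(f"model '{model['name']}' needs a positive capacity")
    return problems


# Same shape as aiservices_config and models_config in the cell above.
sample_aiservices = [
    {"name": "models-foundry", "location": "swedencentral", "weight": 1},
    {"name": "agents-foundry", "location": "swedencentral", "weight": 0},
]
sample_models = [
    {"name": "gpt-4o-mini", "publisher": "OpenAI", "version": "2024-07-18",
     "sku": "GlobalStandard", "capacity": 10, "aiservice": "models-foundry"},
]

problems = validate_lab_config(sample_aiservices, sample_models)
print(problems or "config looks consistent")
```

In the notebook itself, the check could be called as `validate_lab_config(aiservices_config, models_config)` before the deployment step.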

<a id='1'></a>
### 1️⃣ Verify the Azure CLI and the connected Azure subscription

The following command confirms that the Azure CLI is signed in and shows the user, tenant, and subscription it is connected to.

In [None]:
output = utils.run("az account show", "Retrieved az account", "Failed to get the current az account")

if output.success and output.json_data:
    current_user = output.json_data['user']['name']
    tenant_id = output.json_data['tenantId']
    subscription_id = output.json_data['id']

    utils.print_info(f"Current user: {current_user}")
    utils.print_info(f"Tenant ID: {tenant_id}")
    utils.print_info(f"Subscription ID: {subscription_id}")
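
The implementation of `utils.run` is not shown in this notebook. As a rough sketch of what the step amounts to, the code below calls `az account show` directly with `subprocess` and reads the same three fields from its JSON output. The function names and the sample identifiers are placeholders chosen for illustration.

```python
import json
import shutil
import subprocess


def read_account(raw_json):
    """Extract the user, tenant, and subscription from `az account show` JSON text."""
    account = json.loads(raw_json)
    return {
        "user": account.get("user", {}).get("name"),
        "tenant_id": account.get("tenantId"),
        "subscription_id": account.get("id"),
    }


def az_account_show():
    """Run `az account show` and return the parsed summary, or None on failure."""
    az_path = shutil.which("az")  # resolves az or az.cmd depending on the platform
    if az_path is None:
        return None
    completed = subprocess.run(
        [az_path, "account", "show", "--output", "json"],
        capture_output=True, text=True,
    )
    if completed.returncode != 0:
        return None
    return read_account(completed.stdout)


# Placeholder values, not real identifiers.
sample = (
    '{"id": "00000000-0000-0000-0000-000000000000", '
    '"tenantId": "11111111-1111-1111-1111-111111111111", '
    '"user": {"name": "user@example.com", "type": "user"}}'
)
print(read_account(sample))
```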

<a id='2'></a>
### 2️⃣ Create deployment using 🦾 Bicep

This lab uses [Bicep](https://learn.microsoft.com/azure/azure-resource-manager/bicep/overview?tabs=bicep) to declaratively define all the resources including:

- **Log Analytics Workspace** - For centralized logging
- **Application Insights** - For monitoring and telemetry
- **API Management (Basic v2 SKU by default, set via `apim_sku`)** - AI Gateway with enterprise features
- **AI Foundry** - Azure AI Services with project management
- **Model Deployments** - OpenAI models (gpt-4o-mini and gpt-4.1-mini)
- **Model Gateway Connection** - Connects APIM as a gateway to Foundry

The deployment also configures RBAC permissions and creates the necessary connections. Change the parameters above or the [main.bicep](main.bicep) directly to try different configurations.

In [None]:
# Create the resource group if it doesn't exist
utils.create_resource_group(resource_group_name, resource_group_location)

# Define the Bicep parameters
bicep_parameters = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "apimSku": { "value": apim_sku },
        "aiServicesConfig": { "value": aiservices_config },
        "modelsConfig": { "value": models_config },
        "apimSubscriptionsConfig": { "value": apim_subscriptions_config },
        "inferenceAPIPath": { "value": inference_api_path },
        "inferenceAPIType": { "value": inference_api_type },
        "foundryProjectName": { "value": foundry_project_name },
    }
}

# Write the parameters to the params.json file
with open('params.json', 'w') as bicep_parameters_file:
    bicep_parameters_file.write(json.dumps(bicep_parameters, indent=2))

# Run the deployment
output = utils.run(
    f"az deployment group create --name {deployment_name} --resource-group {resource_group_name} --template-file main.bicep --parameters params.json",
    f"Deployment '{deployment_name}' succeeded", 
    f"Deployment '{deployment_name}' failed"
)
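
The parameters file wraps every value in a `{"value": ...}` object under a top-level `parameters` key. A small sketch of a helper that builds that structure from plain Python values follows; `to_arm_parameters` is a hypothetical name and is not part of the lab's `utils` module.

```python
import json


def to_arm_parameters(values):
    """Wrap plain Python values in the deployment parameters file structure."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        # Each parameter becomes {"value": <python value>}.
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


document = to_arm_parameters({"apimSku": "Basicv2", "inferenceAPIPath": "inference"})
print(json.dumps(document, indent=2))
```

Keeping the wrapping in one place makes it harder to forget the `value` level when a new parameter is added.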

<a id='3'></a>
### 3️⃣ Get the deployment outputs

Retrieve the deployment outputs including gateway URLs, subscription keys, and Foundry endpoints.

In [None]:
# Obtain all of the outputs from the deployment
output = utils.run(
    f"az deployment group show --name {deployment_name} -g {resource_group_name}", 
    f"Retrieved deployment: {deployment_name}", 
    f"Failed to retrieve deployment: {deployment_name}"
)

if output.success and output.json_data:
    log_analytics_id = utils.get_deployment_output(output, 'logAnalyticsWorkspaceId', 'Log Analytics Id')
    apim_service_id = utils.get_deployment_output(output, 'apimServiceId', 'APIM Service Id')
    apim_resource_gateway_url = utils.get_deployment_output(output, 'apimResourceGatewayURL', 'APIM API Gateway URL')
    foundry_project_endpoint = utils.get_deployment_output(output, 'agentsfoundryProjectEndpoint', 'Foundry Project Endpoint')
    foundry_ai_services_endpoint = utils.get_deployment_output(output, 'agentsfoundryAIServicesEndpoint', 'Foundry AI Services Endpoint')
    model_gateway_url = utils.get_deployment_output(output, 'aiGatewayUrl', 'AI Gateway URL (APIM)')
    model_gateway_connection = utils.get_deployment_output(output, 'aiGatewayConnectionName', 'AI Gateway Connection Name')
    
    apim_subscriptions = json.loads(utils.get_deployment_output(output, 'apimSubscriptions').replace("\'", "\""))
    for subscription in apim_subscriptions:
        subscription_name = subscription['name']
        subscription_key = subscription['key']
        utils.print_info(f"Subscription Name: {subscription_name}")
        utils.print_info(f"Subscription Key: ****{subscription_key[-4:]}")
    
    api_key = apim_subscriptions[0].get("key")
    
    utils.print_ok("\n✅ AI Gateway Configuration Complete!")
    utils.print_info("APIM is now configured as an AI Gateway for Foundry")
    utils.print_info(f"Connection Name: {model_gateway_connection}")
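
`az deployment group show` returns the template outputs under `properties.outputs`, each as an object with a `type` and a `value`. The sketch below reads one output from that structure and masks a key the same way the cell above does. `read_output`, `mask_secret`, and the sample gateway URL are illustrative assumptions; the real `utils.get_deployment_output` may behave differently.

```python
def read_output(deployment, name, default=None):
    """Return the value of a named deployment output, or `default` if it is absent."""
    outputs = deployment.get("properties", {}).get("outputs") or {}
    entry = outputs.get(name)
    return default if entry is None else entry.get("value", default)


def mask_secret(secret, visible=4):
    """Hide all but the last `visible` characters of a secret for logging."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "****" + secret[-visible:]


# Hypothetical deployment JSON; the URL is a placeholder.
sample_deployment = {
    "properties": {
        "outputs": {
            "aiGatewayUrl": {"type": "String", "value": "https://example.azure-api.net/inference"},
        }
    }
}
print(read_output(sample_deployment, "aiGatewayUrl"))
print(mask_secret("abcd1234efgh5678"))
```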

<a id='test'></a>
### 🧪 Test the APIM Model Gateway

Now let's test the gateway by sending a chat completion request through APIM. The request will flow through APIM's policies before reaching the AI Services endpoint.

**Tip:** Use the [tracing tool](../../tools/tracing.ipynb) to track the behavior and troubleshoot the [policy](policy.xml).

In [None]:
import requests
import json

# Prepare the request headers
headers = {
    "Content-Type": "application/json",
    "api-key": api_key
}

for model in models_config:

    # Construct the full API URL
    api_url = f"{model_gateway_url}/openai/deployments/{model['name']}/chat/completions?api-version=2024-12-01-preview"

    # Prepare the request body
    payload = {
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful AI assistant."
            },
            {
                "role": "user",
                "content": "Tell me a short joke about AI."
            }
        ],
        "max_tokens": 500,
        "temperature": 0.7,
        "model": model['name']
    }

    utils.print_info(f"🚀 Sending request to: {api_url}")

    # Send the request
    try:
        response = requests.post(api_url, headers=headers, json=payload)
        response.raise_for_status()
        
        result = response.json()
        
        utils.print_ok("Request successful!")
        utils.print_info(f"Model: {result['model']}")
        utils.print_info(f"Completion tokens: {result['usage']['completion_tokens']}")
        utils.print_info(f"Total tokens: {result['usage']['total_tokens']}")
        utils.print_info("📝 Response:")
        print(result['choices'][0]['message']['content'] + "\n")
        
    except requests.exceptions.RequestException as e:
        utils.print_error(f"❌ Request failed: {str(e)}")
        # e.response is None when no HTTP response was received (for example, a connection error)
        if e.response is not None:
            utils.print_error(f"Response: {e.response.text}")
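
If a rate-limit policy is applied at the gateway, APIM answers throttled calls with HTTP 429 and usually a `Retry-After` header. Whether this lab's [policy](policy.xml) does so is not shown here, so treat the following as a sketch under that assumption. `send_with_retry` is a hypothetical helper that waits for the advertised delay, or falls back to exponential backoff, before trying again; `FakeResponse` stands in for a real `requests` response so the example runs offline.

```python
import time


def send_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code != 429 or attempt == max_attempts:
            return response
        try:
            # Prefer the delay advertised by the gateway, in seconds.
            delay = float(response.headers.get("Retry-After"))
        except (TypeError, ValueError):
            # Header missing or not a number: fall back to exponential backoff.
            delay = base_delay * 2 ** (attempt - 1)
        sleep(delay)


class FakeResponse:
    """Minimal stand-in exposing the two attributes the helper reads."""

    def __init__(self, status_code, headers=None):
        self.status_code = status_code
        self.headers = headers or {}


queued = [FakeResponse(429, {"Retry-After": "3"}), FakeResponse(429), FakeResponse(200)]
waits = []  # record the delays instead of actually sleeping
final = send_with_retry(lambda: queued.pop(0), sleep=waits.append)
print(final.status_code, waits)
```

With the real client, the callable would wrap the request from the cell above, for example `send_with_retry(lambda: requests.post(api_url, headers=headers, json=payload))`.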

In [None]:
# Before running the sample:
#    pip install --pre "azure-ai-projects>=2.0.0b1"
#    pip install azure-identity

import os
import json
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.projects.models import PromptAgentDefinition

agent_id=None

credential = DefaultAzureCredential()

project_client = AIProjectClient(
    endpoint=foundry_project_endpoint,
    credential=credential,
)

# Reference the model through the gateway connection.
# The model name uses the path format {model_gateway_connection}/{model_name}; at the moment this format is only usable from the SDK.
model = f'{model_gateway_connection}/gpt-4.1-mini'

agent = project_client.agents.create_version(
    agent_name="my-test-agent",
    definition=PromptAgentDefinition(
        model=model,
        instructions='you are a helpful but sarcastic assistant.'
    ),
)
print(f"V2 Agent created (id: {agent.id}, name: {agent.name}, version: {agent.version})")

agents = project_client.agents.list()
print(f"\n📋 List of agents in project '{foundry_project_endpoint}':")
for agent in agents:
    print(f"- {agent.name} (id: {agent.id})")

In [None]:
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

project_client = AIProjectClient(
    endpoint=foundry_project_endpoint,
    credential=DefaultAzureCredential(),
)

myAgent = "my-test-agent"
# Get an existing agent
agent = project_client.agents.get(agent_name=myAgent)
utils.print_ok(f"Retrieved agent: {agent.name}")

openai_client = project_client.get_openai_client()

# it's not required but helps with the traceability of conversations in the logs to have a conversation id, especially when testing with multiple requests.
conversation_id = openai_client.conversations.create().id

# Reference the agent to get a response
response = openai_client.responses.create(
    input=[{"role": "user", "content": "Tell me what you can help with. and tell me a funny short joke."}],
    conversation=conversation_id,
    extra_body={"agent": {"name": agent.name, "type": "agent_reference"}},
)

utils.print_info(f"📝 Response: {response.output_text}")

<a id='verify'></a>
### 🔍 Verify Model Gateway Connection in Azure Portal

You can verify the Model Gateway connection in the Azure Portal:

1. Navigate to your **AI Services** resource in the Azure Portal
2. Go to **Connections** under the **Resource Management** section
3. Look for the connection named `ai-gateway`
4. Verify the connection type is **ApiManagement**
5. Check that the target URL points to your APIM gateway

The connection enables Foundry Agents Service to route model requests through APIM, giving you:
- **Rate limiting** and throttling
- **Caching** for improved performance
- **Monitoring** and analytics
- **Security policies** and authentication
- **Load balancing** across multiple backends

<a id='monitor'></a>
### 📊 Monitor Gateway Traffic

View gateway metrics and logs in:

- **Application Insights**: Telemetry, performance metrics, and distributed tracing
- **Log Analytics**: Query logs using KQL
- **APIM Analytics**: Built-in gateway analytics and dashboards

Example KQL query for Log Analytics:

```kusto
ApiManagementGatewayLogs
| where OperationId == "chat-completions"
| project TimeGenerated, Method, Url, BackendResponseCode, ResponseSize, TotalTime
| order by TimeGenerated desc
```
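
When the same query is reused for different operations or time windows, it can help to generate it from Python. The sketch below only builds the query text, adding a time filter with `ago()` and a row limit with `take`; `build_gateway_logs_query` and its parameters are hypothetical, and running the query against a workspace is left out.

```python
def build_gateway_logs_query(operation_id, lookback="1h", limit=50):
    """Return a KQL query over ApiManagementGatewayLogs for one operation."""
    # Escape backslashes and double quotes so the ID stays inside the string literal.
    safe_operation = operation_id.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        "ApiManagementGatewayLogs",
        f"| where TimeGenerated > ago({lookback})",
        f'| where OperationId == "{safe_operation}"',
        "| project TimeGenerated, Method, Url, BackendResponseCode, ResponseSize, TotalTime",
        "| order by TimeGenerated desc",
        f"| take {int(limit)}",
    ])


query = build_gateway_logs_query("chat-completions", lookback="1h", limit=20)
print(query)
```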

<a id='clean'></a>
### 🗑️ Clean up resources

When you're finished with the lab, you should remove all your deployed resources from Azure to avoid extra charges and keep your Azure subscription uncluttered.

Use the [clean-up-resources notebook](clean-up-resources.ipynb) for that.