# APIM ❤️ AI Foundry

## Session Awareness lab

Playground to demonstrate session awareness capabilities in [Azure API Management backend pools](https://learn.microsoft.com/azure/api-management/backends?tabs=bicep) when using the [OpenAI Responses API](https://learn.microsoft.com/azure/ai-foundry/openai/how-to/responses) for multi-turn conversations.

### Overview

The OpenAI Responses API carries conversation state across calls by referencing the response ID of the previous call (`previous_response_id`). When a backend pool has no session affinity, consecutive requests can be routed to different backend instances, and a backend that never saw the earlier response cannot continue the conversation. This lab demonstrates:

1. **Backend Pool without Session Affinity**: Shows how conversation state breaks when requests are distributed across different backends
2. **Backend Pool with Session Affinity**: Shows how session affinity ensures requests from the same conversation stay on the same backend
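
The two scenarios can be illustrated offline with a toy model. This is a minimal sketch, not how APIM is implemented: `Backend`, `Pool`, and `run_conversation` are hypothetical names, each backend keeps its own in-memory response store, and the pool alternates strictly between backends (real load balancing may not).

```python
import itertools
import uuid


class Backend:
    """Hypothetical stand-in for one AI Foundry instance with a local response store."""

    def __init__(self, region):
        self.region = region
        self.responses = {}  # response_id -> text, visible to this backend only

    def create(self, text, previous_response_id=None):
        # A chained turn fails if this backend never stored the earlier response.
        if previous_response_id is not None and previous_response_id not in self.responses:
            raise LookupError(f"{previous_response_id} not found in {self.region}")
        response_id = f"resp_{uuid.uuid4().hex[:8]}"
        self.responses[response_id] = text
        return response_id


class Pool:
    """Toy round-robin pool; sticky=True pins each session to its first backend."""

    def __init__(self, backends, sticky=False):
        self.sticky = sticky
        self._next = itertools.cycle(backends)
        self._pinned = {}  # session key -> backend

    def route(self, session):
        if self.sticky:
            if session not in self._pinned:
                self._pinned[session] = next(self._next)
            return self._pinned[session]
        return next(self._next)


def run_conversation(pool, session="client-1"):
    """Send two chained turns; return True if the second turn finds the first."""
    first_id = pool.route(session).create("turn 1")
    try:
        pool.route(session).create("turn 2", previous_response_id=first_id)
        return True
    except LookupError:
        return False


def make_backends():
    return [Backend("eastus"), Backend("westus")]


print("without affinity:", run_conversation(Pool(make_backends())))               # False
print("with affinity:", run_conversation(Pool(make_backends(), sticky=True)))     # True
```

In the toy model the second turn fails without affinity because it lands on `westus`, which never stored the first response.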

### Key Technologies Demonstrated

- **Azure API Management**: Load balancing and backend pool management
- **Session Affinity**: Cookie-based session stickiness in APIM
- **OpenAI Responses API**: Stateful API requiring conversation continuity
- **Multi-region Deployment**: AI Foundry instances in different Azure regions
- **Managed Identity**: Secure authentication to AI services

### Prerequisites

- [Python 3.12 or later version](https://www.python.org/) installed
- [VS Code](https://code.visualstudio.com/) installed with the [Jupyter notebook extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) enabled
- [Python environment](https://code.visualstudio.com/docs/python/environments#_creating-environments) with the packages from [requirements.txt](../../requirements.txt) installed (for example, run `pip install -r requirements.txt` in your terminal)
- [An Azure Subscription](https://azure.microsoft.com/free/) with [Contributor](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#contributor) + [RBAC Administrator](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#role-based-access-control-administrator) or [Owner](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#owner) roles
- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli) installed and [Signed into your Azure subscription](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively)

▶️ Click `Run All` to execute all steps sequentially, or execute them `Step by Step`... 

<a id='0'></a>
### 0️⃣ Initialize notebook variables

- Resources will be suffixed by a unique string based on your subscription id.
- Adjust the location parameters according to your preferences and the [product availability by Azure region.](https://azure.microsoft.com/explore/global-infrastructure/products-by-region/?cdn=disable&products=cognitive-services,api-management)
- Adjust the models and versions according to the [availability by region.](https://learn.microsoft.com/azure/ai-services/openai/concepts/models)

In [None]:
import os, sys, json, requests, time, subprocess
sys.path.insert(1, '../../shared')  # add the shared directory to the Python path
import utils

deployment_name = os.path.basename(os.path.dirname(globals()['__vsc_ipynb_file__']))
resource_group_name = f"lab-{deployment_name}" # change the name to match your naming style
resource_group_location = "eastus"

aiservices_config = [
    {"name": "foundry1", "location": "eastus"},
    {"name": "foundry2", "location": "westus"}
]

models_config = [
    {"name": "gpt-4o-mini", "publisher": "OpenAI", "version": "2024-07-18", "sku": "GlobalStandard", "capacity": 1}
]

apim_sku = "Basicv2"  # SKU name is "Basicv2", not "BasicV2_1"
apim_subscriptions_config = [
    {
        "name": "session-awareness-subscription",
        "displayName": "Session Awareness Lab Subscription",
        "allowTracing": True,
        "state": "active"
    }
]

print("✅ Variables initialized")
print(f"📍 Resource group: {resource_group_name}")
print(f"📍 Location: {resource_group_location}")
print(f"🤖 AI Services: {len(aiservices_config)} instances")
print(f"🚀 APIM SKU: {apim_sku}")

<a id='1'></a>
### 1️⃣ Verify the Azure CLI and the connected Azure subscription

The following command confirms that the Azure CLI is signed in and shows the user, tenant, and subscription it is connected to.

In [None]:
output = utils.run("az account show", "Retrieved az account", "Failed to get the current az account")

if output.success and output.json_data:
    current_user = output.json_data['user']['name']
    tenant_id = output.json_data['tenantId']
    subscription_id = output.json_data['id']

    print(f"✅ Current user: {current_user}")
    print(f"✅ Tenant ID: {tenant_id}")
    print(f"✅ Subscription ID: {subscription_id}")

<a id='2'></a>
### 2️⃣ Create deployment using 🦾 Bicep

This lab uses [Bicep](https://learn.microsoft.com/azure/azure-resource-manager/bicep/overview?tabs=bicep) to declaratively define all the resources that will be deployed in the specified resource group. This will create:
- 2 AI Foundry instances in different regions (eastus, westus)
- Each with a gpt-4o-mini model deployment
- Azure API Management instance
- Backend pool (initially without session affinity)
- Inference API with load balancing policies

The lab initially deploys with a standard backend pool. We'll enable session affinity later to demonstrate the difference.
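
The deployment reads its inputs from an ARM deployment-parameters file, where every parameter is wrapped in a `{"value": ...}` object. The sketch below shows that envelope in isolation; `build_parameters` and `write_parameters` are hypothetical helper names, not part of the lab's `utils` module.

```python
import json


def build_parameters(values):
    """Wrap plain Python values in the ARM deployment-parameters envelope."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


def write_parameters(path, values):
    """Serialize the envelope to `path` and return the path for later use."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_parameters(values), handle, indent=2)
    return path


params = build_parameters({
    "apimSku": "Basicv2",
    "aiServicesConfig": [
        {"name": "foundry1", "location": "eastus"},
        {"name": "foundry2", "location": "westus"},
    ],
})
print(json.dumps(params["parameters"]["apimSku"]))  # {"value": "Basicv2"}
```

Returning the path from `write_parameters` keeps the file name and the open file handle in separate variables, which avoids accidentally printing or reusing the handle.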

In [None]:
# Create the resource group if it doesn't exist
utils.create_resource_group(resource_group_name, resource_group_location)

# Set up deployment parameters
bicep_parameters_file = "params.json"

# Create the parameters for the Bicep template (ARM deployment parameters format)
bicep_parameters = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "aiServicesConfig": {"value": aiservices_config},
        "modelsConfig": {"value": models_config},
        "apimSku": {"value": apim_sku},
        "apimSubscriptionsConfig": {"value": apim_subscriptions_config},
        "inferenceAPIType": {"value": "AzureOpenAI"},
        "inferenceAPIPath": {"value": "inference"},
        "foundryProjectName": {"value": deployment_name}
    }
}

# Write the parameters to the params.json file
with open(bicep_parameters_file, 'w') as params_file:
    params_file.write(json.dumps(bicep_parameters))

print("✅ Parameters file created successfully")
print(f"📄 Parameters file: {bicep_parameters_file}")
print(f"🎯 Using APIM SKU: {apim_sku}")

# Run the deployment
output = utils.run(f"az deployment group create --name {deployment_name} --resource-group {resource_group_name} --template-file main.bicep --parameters params.json",
    f"Deployment '{deployment_name}' succeeded", f"Deployment '{deployment_name}' failed")

<a id='3'></a>
### 3️⃣ Get the deployment outputs

Retrieve the required outputs from the Bicep deployment.

In [None]:
# Obtain all of the outputs from the deployment
output = utils.run(f"az deployment group show --name {deployment_name} -g {resource_group_name}", f"Retrieved deployment: {deployment_name}", f"Failed to retrieve deployment: {deployment_name}")

if output.success and output.json_data:
    deployment_outputs = output.json_data['properties']['outputs']
    
    # Extract values safely
    apim_service_id = deployment_outputs.get('apimServiceId', {}).get('value', 'not-found')
    apim_resource_gateway_url = deployment_outputs.get('apimResourceGatewayURL', {}).get('value', 'not-found')
    apim_subscriptions = deployment_outputs.get('apimSubscriptions', {}).get('value', [])
    foundry_instances = deployment_outputs.get('foundryInstances', {}).get('value', [])
    apim_service_name = deployment_outputs.get('apimServiceName', {}).get('value', 'not-found')
    
    # Get the subscription key for testing
    subscription_key = 'not-found'
    if apim_subscriptions:
        if isinstance(apim_subscriptions[0], dict):
            subscription_key = apim_subscriptions[0].get('key', 'not-found')
        else:
            print(f"Warning: Unexpected subscription structure: {type(apim_subscriptions[0])}")
    
    print(f"📡 APIM Gateway URL: {apim_resource_gateway_url}")
    print(f"🔧 APIM Service Name: {apim_service_name}")
    print(f"🔑 Subscription Count: {len(apim_subscriptions) if apim_subscriptions else 0}")
    print(f"🧠 Foundry Instances: {len(foundry_instances) if foundry_instances else 0}")
    print(f"🗝️  Subscription Key: {subscription_key[:8]}..." if subscription_key != 'not-found' and subscription_key != 'unknown' else "❌ No subscription key found")
    
    if subscription_key != 'not-found' and apim_resource_gateway_url != 'not-found':
        print("\n✅ Ready to test session affinity behavior!")
    else:
        print("\n❌ Missing required deployment outputs")
else:
    print("❌ Failed to retrieve deployment outputs")
    raise Exception("Could not get deployment outputs")

<a id='4'></a>
### 4️⃣ Test Session Awareness Without Affinity

First, let's test the backend pool **without session affinity**. We'll use the OpenAI Responses API to start a conversation and then try to continue it. When the two requests hit different backends, the conversation breaks because the response referenced by `previous_response_id` is stored only on the backend that created it.
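
The chaining pattern behind this test can be sketched without any network access. In this minimal sketch, `chain_turns` is a hypothetical helper and `fake_create` is an offline stand-in for `client.responses.create`; it only assumes that the real call accepts `model`, `input`, and `previous_response_id` keywords and returns an object with an `id`, as the test cell does.

```python
from types import SimpleNamespace


def chain_turns(create, model, prompts):
    """Send prompts in order, linking each turn to the previous response ID.

    `create` is any callable with the keyword interface of
    `client.responses.create`; passing it in keeps the helper testable offline.
    """
    previous_id = None
    responses = []
    for prompt in prompts:
        kwargs = {"model": model, "input": prompt}
        if previous_id is not None:
            kwargs["previous_response_id"] = previous_id
        response = create(**kwargs)
        previous_id = response.id
        responses.append(response)
    return responses


def fake_create(**kwargs):
    """Offline stand-in: records each call and returns an object with an id."""
    fake_create.calls.append(kwargs)
    return SimpleNamespace(id=f"resp_{len(fake_create.calls)}")


fake_create.calls = []

turns = chain_turns(fake_create, "gpt-4o-mini", ["Explain photosynthesis", "Summarize that"])
print([turn.id for turn in turns])                     # ['resp_1', 'resp_2']
print(fake_create.calls[1]["previous_response_id"])    # resp_1
```

Only the first turn omits `previous_response_id`; every later turn depends on the backend being able to look up the ID it is given.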

In [None]:
import httpx
from openai import AzureOpenAI

# Custom transport that logs request cookies, Set-Cookie headers, and backend routing headers
class BackendLoggingTransport(httpx.HTTPTransport):
    def handle_request(self, request):
        cookies = request.headers.get('cookie', 'None')
        print(f"🍪 Request cookies: {cookies}")
        
        response = super().handle_request(request)
        
        # Log session affinity cookies
        set_cookies = response.headers.get_list('set-cookie')
        for cookie in set_cookies:
            print(f"🍪 Session cookie: {cookie}")
        
        # Log backend info from APIM policy
        request_id = response.headers.get('x-request-id', 'unknown')
        backend_region = response.headers.get('x-backend-region', 'unknown')
        
        print(f"🆔 Request ID: {request_id}")
        print(f"🌍 Backend Region: {backend_region}")
        print("-" * 50)
        
        return response

def get_response_text(response):
    """Extract text from the OpenAI Responses API response structure"""
    try:
        return response.output[0].content[0].text
    except (IndexError, AttributeError):
        return str(response)[:100] + "..."

def test_session_affinity():
    """Run a two-turn Responses API conversation through APIM and report whether the state survived."""
    print("🧪 Testing Session Affinity with OpenAI Responses API")
    print("=" * 60)
    
    # Create HTTP client with cookie support and logging
    http_client = httpx.Client(
        transport=BackendLoggingTransport(),
    )
    
    # Create OpenAI client
    client = AzureOpenAI(
        api_key=subscription_key,
        base_url=f"{apim_resource_gateway_url}/inference/openai",
        api_version="2025-03-01-preview",
        http_client=http_client
    )
    
    try:
        print("\n1️⃣ Creating first response...")
        first_response = client.responses.create(
            model="gpt-4o-mini",
            input="Explain photosynthesis in simple terms"
        )
        
        print(f"✅ First response ID: {first_response.id}")
        print(f"📄 Content: {get_response_text(first_response)[:100]}...")
        
        print("\n2️⃣ Creating second response with previous_response_id...")
        second_response = client.responses.create(
            model="gpt-4o-mini", 
            input="Can you summarize that so it can be understood by a college freshman?",
            previous_response_id=first_response.id
        )
        
        print(f"✅ Second response ID: {second_response.id}")
        print(f"📄 Content: {get_response_text(second_response)[:100]}...")
        
        print("\n✅ SUCCESS: Both requests completed.")
        print("💡 The second request reached the backend that stores the first response.")
        
    except Exception as e:
        print(f"❌ FAILURE: {e}")
        if "not found" in str(e).lower():
            print("💡 This is the expected error when session affinity is broken!")
            print("💡 The previous_response_id exists on a different backend.")
    
    finally:
        # Clean up
        http_client.close()
    
    print("\n" + "=" * 60)
    print("🔍 ANALYSIS:")
    print("- If you see different regions between requests → session affinity broken")
    print("- If you get 'not found' error → previous_response_id is on different backend") 
    print("- If both requests succeed → session affinity is working")
    print("- Check for APIM-Backend-Affinity cookies in the logs above")

# Run the test
test_session_affinity()

<a id='5'></a>
### 5️⃣ Enable Session Affinity on Backend Pool

Now let's enable session affinity on the backend pool. This configuration ensures that requests from the same client session are routed to the same backend instance, maintaining conversation state for the OpenAI Responses API.

Session affinity uses cookies to track which backend instance should handle requests from a specific client session.
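
On the client side, cookie-based affinity only works if the cookie returned by the gateway is sent back on the next request. The lab's `httpx.Client` does this automatically; the sketch below spells out the round trip with the standard library. The helper names and the sample header value are made up for illustration, and real cookie values should be treated as opaque.

```python
from http.cookies import SimpleCookie

AFFINITY_COOKIE = "APIM-Backend-Affinity"  # cookie name used in this lab's configuration


def extract_affinity(set_cookie_headers, name=AFFINITY_COOKIE):
    """Return the affinity cookie value from a list of Set-Cookie header strings, or None."""
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)
        if name in jar:
            return jar[name].value
    return None


def cookie_header(value, name=AFFINITY_COOKIE):
    """Build the Cookie request header a client would send on the next turn."""
    return f"{name}={value}"


# Made-up Set-Cookie header for illustration only.
headers = ["APIM-Backend-Affinity=abc123; Path=/; HttpOnly"]
value = extract_affinity(headers)
print(cookie_header(value))  # APIM-Backend-Affinity=abc123
```

A client that discards cookies between requests (for example, by creating a new HTTP client per call) would lose affinity even when the pool is configured for it.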

In [None]:
# Enable session affinity on the backend pool
print("🔄 Enabling session affinity on backend pool...")

backend_pool_name = "inference-backend-pool"

# First get current backend pool configuration to preserve services
get_result = utils.run(
    f'az rest --method get --url "https://management.azure.com{apim_service_id}/backends/{backend_pool_name}?api-version=2023-05-01-preview"',
    "✅ Retrieved backend pool configuration",
    "❌ Failed to get backend pool configuration"
)

if get_result.success and get_result.json_data:
    current_config = get_result.json_data
    current_services = current_config.get('properties', {}).get('pool', {}).get('services', [])
    print(f"📋 Found {len(current_services)} services in backend pool")
    
    # Create the session affinity configuration with existing services
    session_affinity_config = {
        "properties": {
            "pool": {
                "services": current_services,  # Preserve existing services
                "sessionAffinity": {
                    "enabled": True,
                    "affinityType": "Cookie",
                    "cookieName": "APIM-Backend-Affinity",
                    "sessionId": {
                        "source": "cookie",  # Session ID comes from cookie
                        "name": "APIM-Backend-Affinity"  # Cookie name for session ID
                    }
                }
            }
        }
    }
else:
    print("⚠️ Could not get current services, using minimal configuration...")
    # Create minimal session affinity configuration
    session_affinity_config = {
        "properties": {
            "pool": {
                "sessionAffinity": {
                    "enabled": True,
                    "affinityType": "Cookie",
                    "cookieName": "APIM-Backend-Affinity",
                    "sessionId": {
                        "source": "cookie",
                        "name": "APIM-Backend-Affinity"
                    }
                }
            }
        }
    }

# Apply session affinity to the backend pool
result = utils.run(
    f'az rest --method patch --url "https://management.azure.com{apim_service_id}/backends/{backend_pool_name}?api-version=2023-05-01-preview" --body \'{json.dumps(session_affinity_config)}\'',
    "✅ Backend pool session affinity enabled",
    "❌ Failed to enable backend pool session affinity"
)

if result.success:
    print("✅ Session affinity enabled on backend pool!")
    print("🍪 Cookie-based session stickiness: APIM-Backend-Affinity")
    print("🎯 Ready to test session awareness!")
else:
    print("❌ Failed to enable session affinity on backend pool")

<a id='6'></a>
### 6️⃣ Test Session Affinity

Now let's test the session affinity to verify it's working correctly. With session affinity enabled, requests from the same client should stick to the same backend, allowing the OpenAI Responses API to maintain conversation state across multiple requests.

In [None]:
# Test session affinity behavior
test_session_affinity()

# 🎯 Lab Summary

## What We Accomplished

This lab demonstrated session awareness in Azure API Management backend pools using the OpenAI Responses API:

### ✅ **Core Functionality Tested:**
- **Backend Pool Load Balancing**: Multiple AI Foundry instances across regions
- **Session Affinity**: Cookie-based session stickiness in APIM
- **OpenAI Responses API**: Stateful conversations with response chaining
- **Managed Identity**: Secure authentication to AI services

### 🔧 **Key Configuration:**
- **Without Session Affinity**: Requests are distributed across backends, which can break conversation state
- **With Session Affinity**: Requests stick to the same backend, maintaining conversation state

### 💡 **When to Use Session Affinity:**
- OpenAI Responses API (multi-turn conversations)
- OpenAI Assistants API (thread-based conversations)  
- Any stateful AI workload requiring conversation continuity

### 🎉 **Success Criteria:**
- Without affinity: "not found" errors when using `previous_response_id`
- With affinity: Both requests succeed, same backend region, session cookies present
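
These criteria can be expressed as a small classifier over what one test run observed. This is a hedged sketch: `assess` and its labels are hypothetical, and it assumes the backend regions were collected from the `x-backend-region` header logged by the test.

```python
def assess(regions, error=None):
    """Classify one two-turn test run.

    regions: backend regions observed per request (e.g. from `x-backend-region`).
    error:   text of the exception raised by the second turn, if any.
    """
    if error is not None:
        # The lab treats a "not found" error as the signature of lost state.
        return "state-lost" if "not found" in error.lower() else "error"
    if len(set(regions)) == 1:
        return "same-backend"
    return "mixed"


print(assess(["eastus", "westus"], "Response resp_1 not found"))  # state-lost
print(assess(["eastus", "eastus"]))                                # same-backend
```

A single `same-backend` result does not prove affinity, because two requests can reach the same backend by chance; repeating the run several times gives a more convincing signal.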

The test relies only on the response headers and cookies returned through APIM, so no additional diagnostics are needed to tell the two configurations apart.

<a id='cleanup'></a>
### 🗑️ Clean up resources

When you're finished with the lab, run the [clean-up-resources notebook](clean-up-resources.ipynb) to remove all deployed resources from Azure and avoid extra charges.