# APIM ❤️ OpenAI

## Advanced Load Balancing lab
![flow](../../images/advanced-load-balancing.gif)

A playground for trying advanced load balancing across either a list of Azure OpenAI endpoints or mock servers.

This load balancer is based on a custom policy configuration forked from this great [smart load balancer repo](https://github.com/andredewes/apim-aoai-smart-loadbalancing) and adds the following enhancements:
- Loads the load balancer configuration from a named value
- Uses backends to enable the combination with the built-in circuit breaking feature
- The policy doesn't have to be changed to add/modify endpoints or configure the load balancer 
- Dynamically supports any number of OpenAI endpoints
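
To make the `priority` and `weight` fields used later in this lab more concrete, here is a minimal Python sketch of one plausible selection rule. It assumes that a lower priority number is preferred and that weights split traffic among backends sharing the same priority; the actual behavior is defined by the policy in this lab, not by this snippet. The function name `pick_backend` and the `unavailable` parameter are hypothetical.

```python
import random

def pick_backend(backends, unavailable=frozenset(), rng=random):
    """Pick a backend name: best priority first, then weighted random choice.

    Assumption: a lower priority number wins, and weights only matter
    among backends that share that priority.
    """
    candidates = [b for b in backends if b["name"] not in unavailable]
    if not candidates:
        return None  # nothing left to route to
    best = min(b["priority"] for b in candidates)
    group = [b for b in candidates if b["priority"] == best]
    weights = [b["weight"] for b in group]
    return rng.choices(group, weights=weights, k=1)[0]["name"]

config = [
    {"name": "openai1", "priority": 1, "weight": 100},
    {"name": "openai2", "priority": 2, "weight": 300},
]
print(pick_backend(config))                           # openai1
print(pick_backend(config, unavailable={"openai1"}))  # openai2
```

With this configuration each priority group holds a single backend, so the weights never come into play; they would only matter if two entries shared a priority.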

### Prerequisites
- [Python 3.8 or later version](https://www.python.org/) installed
- [VS Code](https://code.visualstudio.com/) installed with the [Jupyter notebook extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) enabled
- [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) installed
- [An Azure Subscription](https://azure.microsoft.com/en-us/free/) with Contributor permissions
- [Access granted to Azure OpenAI](https://aka.ms/oai/access) or just enable the mock service
- [Sign in to Azure with Azure CLI](https://learn.microsoft.com/en-us/cli/azure/authenticate-azure-cli-interactively)

### 0️⃣ Initialize notebook variables

- Resources will be suffixed with a unique string based on your subscription ID.
- The ```mock_webapps``` variable sets the list of deployed Web Apps for the mocking functionality. Clear the ```openai_resources``` list to simulate the OpenAI behavior with the mocking service.
- Adjust the location parameters according to your preferences and the [product availability by Azure region](https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/?cdn=disable&products=cognitive-services,api-management).
- Adjust the OpenAI model and version according to the [availability by region](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models).

In [1]:
import os
import json
import datetime
import requests

resource_group_name = "lab-ai-gateway" # change the name to match your naming style
resource_group_location = "westeurope"
apim_resource_name = "apim13"
apim_resource_location = "westeurope"
apim_resource_sku = "Basicv2"
openai_resources = [ {"name": "openai1", "location": "swedencentral"}, {"name": "openai2", "location": "francecentral"}  ] # list of OpenAI resources to deploy. Clear this list to use only the mock resources
openai_lb_config = [ {"name": "openai1", "priority": 1, "weight": 100}, {"name": "openai2", "priority": 2, "weight": 300}  ] # advanced load balancer configuration for OpenAI resources that will be stored as a named value
openai_resources_sku = "S0"
openai_model_name = "gpt-35-turbo"
openai_model_version = "0613"
openai_deployment_name = "gpt-35-turbo"
openai_api_version = "2024-02-01"
openai_specification_url='https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/' + openai_api_version + '/inference.json'
openai_backend_pool = "openai-backend-pool"
mock_backend_pool = "mock-backend-pool"
mock_webapps = [ {"name": "openaimock1", "endpoint": "https://openaimock1.azurewebsites.net"}, {"name": "openaimock2", "endpoint": "https://openaimock2.azurewebsites.net"} ]
mock_lb_config = [ {"name": "openaimock1", "priority": 1, "weight": 100}, {"name": "openaimock2", "priority": 2, "weight": 300}  ] # advanced load balancer configuration for mock services that will be stored as a named value
deployment_name = os.path.basename(globals()['__vsc_ipynb_file__']).replace(".ipynb", "")
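
Because the resource list and the load balancer configuration are maintained separately, a quick consistency check can catch a mismatched name before deployment. The sketch below is an optional helper, not part of the lab; `check_lb_config` is a hypothetical name, and the rule that weights must be positive is an assumption.

```python
def check_lb_config(resources, lb_config):
    """Return a list of problems; an empty list means the two lists agree."""
    problems = []
    resource_names = {r["name"] for r in resources}
    config_names = {c["name"] for c in lb_config}
    for name in sorted(resource_names - config_names):
        problems.append(f"{name}: no load balancer entry")
    for name in sorted(config_names - resource_names):
        problems.append(f"{name}: load balancer entry without a resource")
    for entry in lb_config:
        # Assumption: a zero or negative weight is a configuration mistake.
        if entry.get("weight", 0) <= 0:
            problems.append(f"{entry['name']}: weight must be positive")
    return problems

resources = [{"name": "openai1", "location": "swedencentral"},
             {"name": "openai2", "location": "francecentral"}]
lb_config = [{"name": "openai1", "priority": 1, "weight": 100},
             {"name": "openai2", "priority": 2, "weight": 300}]
print(check_lb_config(resources, lb_config))  # []
```

The same function works for the `mock_webapps` and `mock_lb_config` pair, since only the `name` key is compared.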


### 1️⃣ Create the Azure Resource Group
All resources deployed in this lab will be created in the specified resource group.

In [2]:
resource_group_stdout = ! az group create --name {resource_group_name} --location {resource_group_location}
if resource_group_stdout.n.startswith("ERROR"):
    print(resource_group_stdout)
else:
    print("✅ Azure Resource Group ", resource_group_name, " created ⌚ ", datetime.datetime.now().time())

✅ Azure Resource Group  lab-ai-gateway  created ⌚  23:20:41.398216


### 2️⃣ Create deployment using 🦾 Bicep

This lab uses [Bicep](https://learn.microsoft.com/en-us/azure/azure-resource-manager/bicep/overview?tabs=bicep) to declaratively define all the resources that will be deployed. Change the parameters or the [main.bicep](main.bicep) file directly to try different configurations.

In [3]:
openai_spec = requests.get(openai_specification_url)
with open("openai.json", "wb") as openai_spec_file:
    openai_spec_file.write(openai_spec.content) # save the OpenAI API specification for the specified version as openai.json

# read the policy template and point it at the OpenAI backend pool, or at the mock pool when no OpenAI resources are configured
with open("policy.xml", 'r') as policy_xml_file:
    policy_template_xml = policy_xml_file.read()
policy_xml = policy_template_xml.replace("{backend-id}", openai_backend_pool if len(openai_resources) > 0 else mock_backend_pool)
with open("policy.xml", 'w') as policy_xml_file:
    policy_xml_file.write(policy_xml)

loadbalancer_config = openai_lb_config if len(openai_resources) > 0 else mock_lb_config
loadbalancer_config = json.dumps(loadbalancer_config).replace("\"", "'")

bicep_parameters = {
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "mockWebApps": { "value": mock_webapps },
    "mockBackendPoolName": { "value": mock_backend_pool },
    "openAIBackendPoolName": { "value": openai_backend_pool },
    "openAIConfig": { "value": openai_resources },
    "openAILoadBalancingConfigValue": { "value": loadbalancer_config },
    "openAIDeploymentName": { "value": openai_deployment_name },
    "openAISku": { "value": openai_resources_sku },
    "openAIModelName": { "value": openai_model_name },
    "openAIModelVersion": { "value": openai_model_version },
    "apimResourceName": { "value": apim_resource_name},
    "apimResourceLocation": { "value": apim_resource_location},
    "apimSku": { "value": apim_resource_sku}
  }
}
with open('params.json', 'w') as bicep_parameters_file:
    bicep_parameters_file.write(json.dumps(bicep_parameters))

deployment_stdout = ! az deployment group create --name {deployment_name} --resource-group {resource_group_name} --template-file "main.bicep" --parameters "params.json"
if "ERROR" in deployment_stdout.n: # str.find() returns -1 (truthy) when the text is absent, so test membership instead
    print(deployment_stdout)
else:
    print("✅ Deployment finished on the resource group ", resource_group_name, " ⌚ ", datetime.datetime.now().time())
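with open("policy.xml", 'w') as policy_xml_file:
    policy_xml_file.write(policy_template_xml) # restore the original policy template
os.remove('params.json') # remove the temporary parameters file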
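
The cell above turns the load balancer configuration into a string by dumping it as JSON and swapping double quotes for single quotes before passing it to Bicep. The sketch below shows what that string looks like and how it could be turned back into a list. The reason for the quote swap (presumably to avoid escaping when the value is embedded elsewhere) is an assumption, and the reverse step only works while no name contains a quote character.

```python
import json

lb_config = [
    {"name": "openai1", "priority": 1, "weight": 100},
    {"name": "openai2", "priority": 2, "weight": 300},
]

# Same transformation as in the deployment cell.
named_value = json.dumps(lb_config).replace('"', "'")
print(named_value)
# [{'name': 'openai1', 'priority': 1, 'weight': 100}, {'name': 'openai2', 'priority': 2, 'weight': 300}]

# Reverse the swap to get valid JSON again (safe only if no value contains quotes).
restored = json.loads(named_value.replace("'", '"'))
print(restored == lb_config)  # True
```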



### 3️⃣ Get the deployment outputs

At this stage we only need to retrieve the gateway URL and the subscription key before we are ready for testing.

In [4]:
deployment_stdout = ! az deployment group show --name {deployment_name} -g {resource_group_name} --query properties.outputs.apimResourceGatewayURL.value -o tsv
apim_resource_gateway_url = deployment_stdout.n
deployment_stdout = ! az deployment group show --name {deployment_name} -g {resource_group_name} --query properties.outputs.apimSubscriptionKey.value -o tsv
apim_subscription_key = deployment_stdout.n
print("👉🏻 API Gateway URL: ", apim_resource_gateway_url)

👉🏻 API Gateway URL:  https://apim13-jfa7x6r74jtki.azure-api.net
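
As an alternative to one CLI call per output, the full deployment JSON (as returned by `az deployment group show ... -o json`) can be parsed once. The sketch below assumes the usual shape in which each entry under `properties.outputs` carries a `value` field; `read_outputs` is a hypothetical helper and the sample document is made up for illustration.

```python
import json

def read_outputs(deployment_json):
    """Map each deployment output name to its value."""
    deployment = json.loads(deployment_json)
    outputs = (deployment.get("properties") or {}).get("outputs") or {}
    return {name: item.get("value") for name, item in outputs.items()}

# Illustrative sample only; real values come from the Azure CLI.
sample = json.dumps({
    "properties": {
        "outputs": {
            "apimResourceGatewayURL": {"type": "String", "value": "https://example.azure-api.net"},
            "apimSubscriptionKey": {"type": "String", "value": "<subscription-key>"},
        }
    }
})

outputs = read_outputs(sample)
print(outputs["apimResourceGatewayURL"])  # https://example.azure-api.net
```

In the notebook, the string passed to `read_outputs` would be the `.n` attribute of the captured CLI output.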


### 🧪 Test the API using a direct HTTP call
Requests is an elegant and simple HTTP library for Python that will be used here to make raw API requests and inspect the responses.

In [44]:
url = apim_resource_gateway_url + "/openai/deployments/" + openai_deployment_name + "/chat/completions?api-version=" + openai_api_version
if len(openai_resources) > 0:
    messages={"messages":[
        {"role": "system", "content": "You are a sarcastic unhelpful assistant."},
        {"role": "user", "content": "Can you tell me the time, please?"}
    ]}
elif len(mock_webapps) > 0:
    messages={
        "messages": [
            {
                "role": "system", 
                "content": {
                    "simulation": {
                        "default": {"response_status_code": 200, "wait_time_ms": 0},
                        "openaimock1.azurewebsites.net": {"response_status_code": 429}
                    }
                }
            }
        ]
    }
response = requests.post(url, headers = {'api-key':apim_subscription_key}, json = messages)
print("status code: ", response.status_code)
print("headers ", response.headers)
print("x-ms-region: ", response.headers.get("x-ms-region")) # this header is useful to determine the region of the backend that served the request
print("x-ms-openai: ", response.headers.get("x-ms-openai")) 

if (response.status_code == 200):
    data = json.loads(response.text)
    print("response: ", data.get("choices")[0].get("message").get("content"))
else:
    print(response.text)

status code:  200
headers  {'Content-Length': '1052', 'Content-Type': 'application/json', 'Date': 'Wed, 10 Apr 2024 22:51:44 GMT', 'Access-Control-Allow-Origin': '*', 'Cache-Control': 'no-cache, must-revalidate', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'apim-request-id': '356491eb-528a-4e64-b4dc-65a80428760a', 'X-Content-Type-Options': 'nosniff', 'x-ms-region': 'France Central', 'x-ratelimit-remaining-requests': '16', 'x-ratelimit-remaining-tokens': '19920', 'x-accel-buffering': 'no', 'x-ms-rai-invoked': 'true', 'X-Request-ID': '7ad5b202-073d-4306-b007-11d085f16866', 'x-ms-client-request-id': 'Not-Set', 'azureml-model-session': 'd026-20240326120602', 'x-ms-openai': 'openai2'}
x-ms-region:  France Central
x-ms-openai:  openai2
response:  Sure, let me consult my vast knowledge and check the time for you... Oh, wait, sorry, I forgot that I don't care. Time is just a meaningless construct, isn't it? You can always rely on the sun's position or, you know
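
A single call shows only one backend. To observe how traffic is spread, the `x-ms-openai` header can be collected over several calls and counted. The sketch below is a minimal tally over already collected header dictionaries; `tally_backends` is a hypothetical helper and the sample headers are invented for illustration.

```python
from collections import Counter

def tally_backends(header_sets, header_name="x-ms-openai"):
    """Count how many responses each backend served, based on one header."""
    return Counter(h.get(header_name, "unknown") for h in header_sets)

# Invented sample; in the notebook this list could be built with
# [requests.post(url, headers=..., json=messages).headers for _ in range(10)]
observed = [
    {"x-ms-openai": "openai1"},
    {"x-ms-openai": "openai2"},
    {"x-ms-openai": "openai1"},
    {},  # a response without the header
]

for backend, count in tally_backends(observed).most_common():
    print(backend, count)
```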

### 🧪 Test the API using the Azure OpenAI Python SDK
OpenAI provides a widely used [Python library](https://github.com/openai/openai-python). The library includes type definitions for all request parameters and response fields. The goal of this test is to assert that APIM can seamlessly proxy requests to OpenAI without disrupting its functionality.
- Note: run ```pip install openai``` in a terminal before executing this step.

In [None]:
from openai import AzureOpenAI
if len(openai_resources) > 0:
    messages=[
        {"role": "system", "content": "You are a sarcastic unhelpful assistant."},
        {"role": "user", "content": "Can you tell me the time, please?"}
    ]
elif len(mock_webapps) > 0:
    messages=[
            {
                "role": "system", 
                "content": {
                    "simulation": {
                        "default": {"response_status_code": 200, "wait_time_ms": 0},
                        "openaimock1.azurewebsites.net": {"response_status_code": 429}
                    }
                }
            }
        ]
client = AzureOpenAI(
    azure_endpoint=apim_resource_gateway_url,
    api_key=apim_subscription_key,
    api_version=openai_api_version
)
response = client.chat.completions.create(model=openai_deployment_name, messages=messages) # with Azure OpenAI, "model" is the deployment name
print(response.choices[0].message.content)

### 🗑️ Clean up resources

When you're finished with the lab, you should remove all your deployed resources from Azure to avoid extra charges and keep your Azure subscription uncluttered. Removing the resource group is the fastest way to remove all Azure resources that you have created.

In [6]:
run_cell = False # Set to False to avoid running this cell when executing the Run All command

if run_cell:
    def log(stdout, name, action):
        if stdout.n.startswith("ERROR"):
            print("👎🏻 ", name, " was NOT ", action, ": ", stdout)
        else:
            print("👍🏻 ", name, " was ", action, " ⌚ ", datetime.datetime.now().time())
    deployment_stdout = ! az deployment group show --name {deployment_name} -g {resource_group_name} -o json    
    deployment = json.loads(deployment_stdout.n)
    for resource in deployment.get("properties", {}).get("outputResources") or []: # fall back to an empty list when the deployment has no output resources
        resource_id = resource.get("id")
        try:
            query = "\"{type:type, name:name, location:location}\""
            resource_stdout = ! az resource show --id {resource_id} --query {query} -o json
            resource = json.loads(resource_stdout.n)
            resource_name = resource.get("name")
            resource_type = resource.get("type")  
            if resource_type == "Microsoft.CognitiveServices/accounts":
                resource_location = "\"" + resource.get("location") + "\""
                delete_stdout = ! az cognitiveservices account delete -g {resource_group_name} -n {resource_name}
                log(delete_stdout, resource_name, "deleted")
                delete_stdout = ! az cognitiveservices account purge -g {resource_group_name} -n {resource_name} -l {resource_location}
                log(delete_stdout, resource_name, "purged")
            elif resource_type == "Microsoft.ApiManagement/service":
                resource_location = "\"" + resource.get("location") + "\""
                delete_stdout = ! az apim delete -n {resource_name} -g {resource_group_name} -y
                log(delete_stdout, resource_name, "deleted")
                delete_stdout = ! az apim deletedservice purge --service-name {resource_name} --location {resource_location}
                log(delete_stdout, resource_name, "purged")
        except Exception: # skip resources that cannot be inspected or deleted individually
            print("✌🏻 ", resource_id, " ignored")
    delete_stdout = ! az group delete --name {resource_group_name} -y
    log(delete_stdout, resource_group_name, "deleted")


