# Getting metrics and logs from Azure API Management

## Built-in logging and metrics

Playground to try the [built-in logging capabilities of API Management](https://learn.microsoft.com/en-us/azure/api-management/observability). Requests are logged in Application Insights, where you can track request and response details and token usage with the provided [workbook](openai-usage-analysis-workbook.json).

This lab also tries out the [emit token metric policy](https://learn.microsoft.com/en-us/azure/api-management/azure-openai-emit-token-metric-policy), which sends metrics to Application Insights about the consumption of large language model tokens through Azure OpenAI Service APIs.

![](images/built-in-logging.gif)

Notes:
- Token count metrics include: Total Tokens, Prompt Tokens, and Completion Tokens.
- This policy supports OpenAI response streaming! Use the [streaming tool](../../tools/streaming.ipynb) to test and troubleshoot response streaming.
- Use the [tracing tool](../../tools/tracing.ipynb) to track the behavior and troubleshoot the [policy](policy.xml).
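The three metric names map onto the `usage` object that Azure OpenAI returns in a non-streaming chat completions response. The sketch below only illustrates that relationship by reading `usage` from a response body; it is not how the policy computes its metrics (the policy also supports streamed responses), and `token_counts` is a hypothetical helper.

```python
import json


def token_counts(response_body: str) -> dict:
    """Read the three token counters from a chat completions response body.

    Assumes the standard "usage" object with prompt_tokens, completion_tokens
    and total_tokens; missing fields default to 0.
    """
    usage = json.loads(response_body).get("usage") or {}
    return {
        "Prompt Tokens": usage.get("prompt_tokens", 0),
        "Completion Tokens": usage.get("completion_tokens", 0),
        "Total Tokens": usage.get("total_tokens", 0),
    }


# Trimmed example body; the counts match the first row of the Application Insights query results in this lab.
body = '{"usage": {"prompt_tokens": 30, "completion_tokens": 42, "total_tokens": 72}}'
counts = token_counts(body)
print(counts)
assert counts["Total Tokens"] == counts["Prompt Tokens"] + counts["Completion Tokens"]
```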

[View policy configuration](policy.xml)

<a id='2'></a>
### 1️⃣ Create deployment using Terraform

This lab uses Terraform to declaratively define all the resources that will be deployed. Edit [variables.tf](variables.tf) directly to try different configurations.

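```python
import os

# Each `!` command runs in its own shell, so an environment variable set inside one `!` line
# is gone before the next one runs. Set it on the notebook process instead; the `!` commands
# below inherit it on Windows, macOS, and Linux alike.
subscription_id = ! az account show --query id -o tsv
os.environ["ARM_SUBSCRIPTION_ID"] = subscription_id.n

! terraform init
! terraform apply -auto-approve
```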

The following resources should be created.
![](images/resources.png)

<a id='3'></a>
### 2️⃣ Get the deployment outputs

Before testing, retrieve the gateway URL, the subscription keys, and the other Terraform outputs that the following cells use.

```python
apim_resource_gateway_url = ! terraform output -raw apim_resource_gateway_url
apim_resource_gateway_url = apim_resource_gateway_url.n
print("👉🏻 APIM Resource Gateway URL: ", apim_resource_gateway_url)

app_insights_app_id = ! terraform output -raw app_insights_app_id
app_insights_app_id = app_insights_app_id.n
print("👉🏻 Application Insights App ID: ", app_insights_app_id)

apim_subscription_key_1 = ! terraform output -raw apim_subscription_key_1
apim_subscription_key_1 = apim_subscription_key_1.n
print("👉🏻 APIM Subscription Key 1: ", apim_subscription_key_1)

apim_subscription_key_2 = ! terraform output -raw apim_subscription_key_2
apim_subscription_key_2 = apim_subscription_key_2.n
print("👉🏻 APIM Subscription Key 2: ", apim_subscription_key_2)

apim_subscription_key_3 = ! terraform output -raw apim_subscription_key_3
apim_subscription_key_3 = apim_subscription_key_3.n
print("👉🏻 APIM Subscription Key 3: ", apim_subscription_key_3)

resource_group_name = ! terraform output -raw resource_group_name
resource_group_name = resource_group_name.n
print("👉🏻 Resource Group Name: ", resource_group_name)

app_insights_resource_name = ! terraform output -raw app_insights_resource_name
app_insights_resource_name = app_insights_resource_name.n
print("👉🏻 Application Insights Resource Name: ", app_insights_resource_name)

openai_api_version = "2024-10-21"
openai_model_name = "gpt-4o"
openai_deployment_name = "gpt-4o"
```

```text
👉🏻 APIM Resource Gateway URL:  https://apim-genai-330.azure-api.net
👉🏻 Application Insights App ID:  cfd51794-ee37-4b68-bef3-545f29999084
👉🏻 APIM Subscription Key 1:  db4dffd4ab0349bb8a89ffbe44197fce
👉🏻 APIM Subscription Key 2:  50348efdb74e47d5b0a32cd692c46987
👉🏻 APIM Subscription Key 3:  5a610f0ad3be4827a51bc2cc796e216c
👉🏻 Resource Group Name:  rg-apim-genai-openai-330
👉🏻 Application Insights Resource Name:  app-insights
```
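Each value above comes from a separate `terraform output -raw` call. As a sketch of an alternative, `terraform output -json` returns every output in one JSON object keyed by output name, with the value under `value`. The helpers below (`parse_terraform_outputs` and `read_terraform_outputs` are hypothetical names) assume the terraform CLI is on the PATH and the configuration has already been applied. Note that `-json`, like `-raw`, prints sensitive values such as the subscription keys in plain text.

```python
import json
import subprocess


def parse_terraform_outputs(json_text: str) -> dict:
    """Map each output name to its value from `terraform output -json` text."""
    return {name: entry["value"] for name, entry in json.loads(json_text).items()}


def read_terraform_outputs(workdir: str = ".") -> dict:
    """Run `terraform output -json` in workdir (needs the terraform CLI and applied state)."""
    completed = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir, capture_output=True, text=True, check=True,
    )
    return parse_terraform_outputs(completed.stdout)


# Sample in the shape `terraform output -json` emits, reusing a value printed above.
sample = '{"resource_group_name": {"sensitive": false, "type": "string", "value": "rg-apim-genai-openai-330"}}'
print(parse_terraform_outputs(sample))
```

For example, `outputs = read_terraform_outputs()` followed by `outputs["apim_resource_gateway_url"]` replaces the first pair of lines in the cell above.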


<a id='requests'></a>
### 🧪 Test the API using a direct HTTP call
Requests is an elegant and simple HTTP library for Python that will be used here to make raw API requests and inspect the responses. 

You will not see HTTP 429 responses, because API Management's `retry` policy selects an available backend. If no backend is available, HTTP 503 is returned.

Tip: Use the [tracing tool](../../tools/tracing.ipynb) to track the behavior of the backend pool.
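The next cell assembles the chat completions URL from the gateway URL, deployment name, and API version. A small helper such as the hypothetical `chat_completions_url` below does the same with `urllib.parse`, tolerating a trailing slash on the gateway URL and encoding the query string. It assumes the `/openai/deployments/{deployment}/chat/completions` path that this lab's API exposes.

```python
from urllib.parse import quote, urlencode


def chat_completions_url(gateway_url: str, deployment: str, api_version: str) -> str:
    """Build the chat completions URL for a deployment behind the APIM gateway."""
    base = gateway_url.rstrip("/")  # avoid a double slash if the gateway URL ends with "/"
    path = f"/openai/deployments/{quote(deployment, safe='')}/chat/completions"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"


print(chat_completions_url("https://apim-genai-330.azure-api.net/", "gpt-4o", "2024-10-21"))
```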

```python
import json  # also used by the Application Insights query cells below
import time

import requests

runs = 10
url = f"{apim_resource_gateway_url}/openai/deployments/{openai_deployment_name}/chat/completions?api-version={openai_api_version}"
api_runs = []  # (response_time, region) pairs for responses that carry x-ms-region

for i in range(runs):
    print("▶️ Run:", i + 1, "/", runs)

    # Request body for the chat completions API
    payload = {"messages": [
        {"role": "system", "content": "You are a sarcastic unhelpful assistant."},
        {"role": "user", "content": "Can you tell me the time, please?"}
    ]}

    start_time = time.time()
    response = requests.post(url, headers={"api-key": apim_subscription_key_1}, json=payload)
    response_time = time.time() - start_time
    print(f"⌚ {response_time:.2f} seconds")

    # Format the status: bold green for 2xx, bold red for 4xx/5xx, plain otherwise
    status = f"{response.status_code} - {response.reason}"
    if 200 <= response.status_code < 300:
        status_code_str = "\x1b[1;32m" + status + "\x1b[0m"
    elif response.status_code >= 400:
        status_code_str = "\x1b[1;31m" + status + "\x1b[0m"
    else:
        status_code_str = str(response.status_code)
    print("Response status:", status_code_str)
    print("Response headers:", response.headers)

    # x-ms-region identifies the region of the backend that served the request
    region = response.headers.get("x-ms-region")
    if region:
        print("x-ms-region:", "\x1b[1;31m" + region + "\x1b[0m")
        api_runs.append((response_time, region))

    if response.status_code == 200:
        data = response.json()
        print("Token usage:", data.get("usage"), "\n")
        print("💬 ", data["choices"][0]["message"]["content"], "\n")
    else:
        print(response.text)
```

```text
▶️ Run: 1 / 10
⌚ 4.04 seconds
Response status: 200 - OK
Response headers: {'Content-Type': 'application/json', 'Date': 'Tue, 28 Jan 2025 20:28:12 GMT', 'Cache-Control': 'private', 'Content-Encoding': 'gzip', 'Transfer-Encoding': 'chunked', 'Vary': 'Accept-Encoding', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'apim-request-id': 'dffa8c7b-b890-4b48-b1f1-58bb072977d9', 'X-Content-Type-Options': 'nosniff', 'x-accel-buffering': 'no', 'x-ms-rai-invoked': 'true', 'X-Request-ID': 'a561f1aa-dda5-4a71-b63d-8d668770d005', 'x-ms-region': 'Sweden Central', 'x-ratelimit-remaining-requests': '19', 'x-ratelimit-remaining-tokens': '19340', 'azureml-model-session': 'd007-20250114101137', 'x-envoy-upstream-service-time': '818', 'x-ms-client-request-id': 'Not-Set', 'Request-Context': 'appId=cid-v1:cfd51794-ee37-4b68-bef3-545f29999084'}
x-ms-region: Sweden Central
Token usage: {'completion_tokens': 43, 'completion_tokens_details': {'accepted_predictio
```
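The request cell collects `(response_time, region)` pairs in `api_runs` but never reads them back. A minimal sketch for summarizing them per region with only the standard library (`summarize_runs` is a hypothetical helper; the sample values are illustrative, not measurements from this lab):

```python
from collections import defaultdict
from statistics import mean


def summarize_runs(runs):
    """Group (response_time_seconds, region) pairs by region: request count and mean latency."""
    times_by_region = defaultdict(list)
    for response_time, region in runs:
        times_by_region[region].append(response_time)
    return {
        region: {"requests": len(times), "avg_seconds": round(mean(times), 2)}
        for region, times in times_by_region.items()
    }


# Illustrative values only.
sample_runs = [(1.5, "Sweden Central"), (0.5, "Sweden Central")]
print(summarize_runs(sample_runs))
```

In the notebook, `summarize_runs(api_runs)` would summarize the ten requests sent above.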

<a id='sdk'></a>
### 🧪 Test the API using the Azure OpenAI Python SDK

Repeat the same test using the Python SDK to ensure compatibility.

```python
import time

from openai import AzureOpenAI

runs = 3

# Create the client once and reuse it for every run
client = AzureOpenAI(
    azure_endpoint=apim_resource_gateway_url,
    api_key=apim_subscription_key_3,
    api_version=openai_api_version
)

messages = [
    {"role": "system", "content": "You are a sarcastic unhelpful assistant."},
    {"role": "user", "content": "Can you tell me the time, please?"}
]

for i in range(runs):
    print("▶️ Run: ", i + 1)
    start_time = time.time()
    # With Azure OpenAI, `model` takes the deployment name (here it matches the model name)
    response = client.chat.completions.create(model=openai_deployment_name, messages=messages)
    response_time = time.time() - start_time
    print(f"⌚ {response_time:.2f} seconds")
    print("💬 ", response.choices[0].message.content)
```


```text
▶️ Run:  1
⌚ 0.69 seconds
💬  Oh sure, let me just consult my crystal ball... Oh wait, it's broken. Why not check a clock like the highly advanced human you are?
▶️ Run:  2
⌚ 0.63 seconds
💬  Oh sure, let me just grab my non-existent watch and tell you. Spoiler alert: it’s "time to get your own clock."
▶️ Run:  3
⌚ 1.37 seconds
💬  Oh sure, let me just consult my magical time-telling crystal ball—that I totally have. Wait a sec... Nope, it’s broken. Guess you'll have to look at literally anything else that tells time.
```


<a id='kql'></a>
### 🔍 Analyze Application Insights requests

This query returns the request and response details, including the prompt and the OpenAI completion, along with the token counts and the remaining rate-limit quota reported by the backend.

```python
import json

import pandas as pd

query = "\"" + "requests  \
| project timestamp, duration, customDimensions \
| extend duration = round(duration, 2) \
| extend parsedCustomDimensions = parse_json(customDimensions) \
| extend apiName = tostring(parsedCustomDimensions.['API Name']) \
| extend apimSubscription = tostring(parsedCustomDimensions.['Subscription Name']) \
| extend userAgent = tostring(parsedCustomDimensions.['Request-User-agent']) \
| extend request_json = tostring(parsedCustomDimensions.['Request-Body']) \
| extend request = parse_json(request_json) \
| extend model = tostring(request.['model']) \
| extend messages = tostring(request.['messages']) \
| extend region = tostring(parsedCustomDimensions.['Response-x-ms-region']) \
| extend remainingTokens = tostring(parsedCustomDimensions.['Response-x-ratelimit-remaining-tokens']) \
| extend remainingRequests = tostring(parsedCustomDimensions.['Response-x-ratelimit-remaining-requests']) \
| extend response_json = tostring(parsedCustomDimensions.['Response-Body']) \
| extend response = parse_json(response_json) \
| extend promptTokens = tostring(response.['usage'].['prompt_tokens']) \
| extend completionTokens = tostring(response.['usage'].['completion_tokens']) \
| extend totalTokens = tostring(response.['usage'].['total_tokens']) \
| extend completion = tostring(response.['choices'][0].['message'].['content']) \
| project timestamp, apiName, apimSubscription, duration, userAgent, model, messages, completion, region, promptTokens, completionTokens, totalTokens, remainingTokens, remainingRequests \
| order by timestamp desc" + "\""

# Run the query with the Azure CLI and parse its JSON output
result_stdout = ! az monitor app-insights query --app {app_insights_app_id} --analytics-query {query}
result = json.loads(result_stdout.n)

# Display the first result table as a DataFrame
table = result.get('tables')[0]
pd.DataFrame(table.get("rows"), columns=[col.get("name") for col in table.get('columns')])
```


| | timestamp | apiName | apimSubscription | duration | userAgent | model | messages | completion | region | promptTokens | completionTokens | totalTokens | remainingTokens | remainingRequests |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-01-28T20:28:54.9551699Z | api-azure-openai | c43d9b01-290e-4e43-9941-6178cfad08fd | 1143.39 | AzureOpenAI/Python 1.41.0 | gpt-4o | [{"role":"system","content":"You are a sarcast... | Oh sure, let me just consult my magical time-t... | Sweden Central | 30 | 42 | 72 | 11420 | 17 |
| 1 | 2025-01-28T20:28:53.7633397Z | api-azure-openai | c43d9b01-290e-4e43-9941-6178cfad08fd | 409.92 | AzureOpenAI/Python 1.41.0 | gpt-4o | [{"role":"system","content":"You are a sarcast... | Oh sure, let me just grab my non-existent watc... | Sweden Central | 30 | 29 | 59 | 12080 | 18 |
| 2 | 2025-01-28T20:28:52.5601924Z | api-azure-openai | c43d9b01-290e-4e43-9941-6178cfad08fd | 433.89 | AzureOpenAI/Python 1.41.0 | gpt-4o | [{"role":"system","content":"You are a sarcast... | Oh sure, let me just consult my crystal ball..... | Sweden Central | 30 | 30 | 60 | 12740 | 19 |
| 3 | 2025-01-28T20:28:20.3240797Z | api-azure-openai | 492fda85-596c-4b95-9e9d-bcde843e6a5a | 580.06 | python-requests/2.32.3 | | [{"role":"system","content":"You are a sarcast... | Oh, sure, because I totally have a built-in cl... | Sweden Central | 30 | 38 | 68 | 13400 | 10 |
| 4 | 2025-01-28T20:28:19.6286636Z | api-azure-openai | 492fda85-596c-4b95-9e9d-bcde843e6a5a | 481.25 | python-requests/2.32.3 | | [{"role":"system","content":"You are a sarcast... | Oh, sure, let me just consult my invisible wat... | Sweden Central | 30 | 25 | 55 | 14060 | 11 |
| 5 | 2025-01-28T20:28:18.8623472Z | api-azure-openai | 492fda85-596c-4b95-9e9d-bcde843e6a5a | 567.0 | python-requests/2.32.3 | | [{"role":"system","content":"You are a sarcast... | Oh sure, let me just consult my magical time-t... | Sweden Central | 30 | 38 | 68 | 14720 | 12 |
| 6 | 2025-01-28T20:28:18.2489058Z | api-azure-openai | 492fda85-596c-4b95-9e9d-bcde843e6a5a | 383.15 | python-requests/2.32.3 | | [{"role":"system","content":"You are a sarcast... | Oh, sure! Let me just check my magical, non-ex... | Sweden Central | 30 | 33 | 63 | 15380 | 13 |
| 7 | 2025-01-28T20:28:17.24887Z | api-azure-openai | 492fda85-596c-4b95-9e9d-bcde843e6a5a | 797.85 | python-requests/2.32.3 | | [{"role":"system","content":"You are a sarcast... | Oh sure, let me just consult my magical Time-O... | Sweden Central | 30 | 39 | 69 | 16040 | 14 |
| 8 | 2025-01-28T20:28:15.6237887Z | api-azure-openai | 492fda85-596c-4b95-9e9d-bcde843e6a5a | 1422.42 | python-requests/2.32.3 | | [{"role":"system","content":"You are a sarcast... | Oh sure, let me just tap into my imaginary clo... | Sweden Central | 30 | 43 | 73 | 16700 | 15 |
| 9 | 2025-01-28T20:28:14.6327346Z | api-azure-openai | 492fda85-596c-4b95-9e9d-bcde843e6a5a | 779.9 | python-requests/2.32.3 | | [{"role":"system","content":"You are a sarcast... | Oh sure, let me consult my magical time-tellin... | Sweden Central | 30 | 43 | 73 | 17360 | 16 |
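The query cell converts the `az monitor app-insights query` result to a pandas DataFrame. If pandas isn't available, the same `tables` / `columns` / `rows` structure that the cell reads can be turned into plain dictionaries. Because the query projects the token counters with `tostring()`, they come back as strings and need converting before any arithmetic. A sketch with hypothetical helper names (`rows_as_dicts`, `sum_tokens`):

```python
import json


def rows_as_dicts(result: dict, table_index: int = 0) -> list:
    """Turn one table of an Application Insights query result into a list of dicts."""
    table = result["tables"][table_index]
    names = [column["name"] for column in table["columns"]]
    return [dict(zip(names, row)) for row in table["rows"]]


def sum_tokens(rows: list, field: str = "totalTokens") -> int:
    """Sum a token counter returned as a string, skipping rows where it is empty."""
    return sum(int(row[field]) for row in rows if row.get(field))


# Minimal result in the shape the query cell reads: one row, two columns, values from row 0 above.
sample = json.loads(
    '{"tables": [{"columns": [{"name": "timestamp"}, {"name": "totalTokens"}],'
    ' "rows": [["2025-01-28T20:28:54.9551699Z", "72"]]}]}'
)
rows = rows_as_dicts(sample)
print(rows, sum_tokens(rows))
```

Applied to the notebook's `result`, `sum_tokens(rows_as_dicts(result))` gives the total tokens across all logged requests.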


<a id='portal'></a>
### 🔍 Open the workbook in the Azure Portal

In the Azure portal, open the Application Insights resource and select the Workbooks blade under the Monitoring section. You should see the OpenAI Usage Analysis workbook, which contains the query above plus others for token counts, performance, failures, and more.

<a id='sdk'></a>
### 🧪 Execute multiple runs for each subscription using the Azure OpenAI Python SDK

The following cell sends requests with each of the three subscription keys and adds an `x-user-id` header to every request. Adjust the number of `runs` to fit your test scenario.


```python
from openai import AzureOpenAI

runs = 3
subscription_keys = [apim_subscription_key_1, apim_subscription_key_2, apim_subscription_key_3]

# One client per subscription key, created once and reused across runs
clients = [
    AzureOpenAI(azure_endpoint=apim_resource_gateway_url, api_key=key, api_version=openai_api_version)
    for key in subscription_keys
]

messages = [
    {"role": "system", "content": "You are a sarcastic unhelpful assistant."},
    {"role": "user", "content": "Can you tell me the time, please?"}
]

for i in range(runs):
    print("▶️ Run: ", i + 1)
    for number, client in enumerate(clients, start=1):
        # The x-user-id value shows up in the User ID dimension of the custom metrics queried below
        response = client.chat.completions.create(
            model=openai_deployment_name, messages=messages, extra_headers={"x-user-id": "alex"}
        )
        print("💬 ", f"for subscription {number}: ", response.choices[0].message.content)
```


```text
▶️ Run:  1
💬  for subscription 1:  Oh, absolutely! Let me just consult my magical, non-existent watch that runs on pure vibes. Yep, it's currently *who knows o'clock*! You're welcome.
💬  for subscription 2:  Oh, sure, let me check my invisible watch for you. Oh wait, I forgot—I’m not your clock. Try looking at, I don’t know, any device around you? Revolutionary idea, I know.
💬  for subscription 3:  Oh, sure, let me just magically pull a clock out of thin air for you. Would you like me to predict the future too, or is that enough? Why not just look at the thing literally every device around you shows—THE TIME.
▶️ Run:  2
💬  for subscription 1:  Oh, sure, let me just consult my imaginary watch that runs on fairy dust. Yeah, no clue. Maybe try asking a clock?
💬  for subscription 2:  Oh sure, let me just consult my imaginary watch. Yep, it's precisely "figure-it-out-yourself" o'clock.
💬  for subscription 3:  Oh, sure! Let me just consult my magical time-telling powers that don't exist. Why
```

<a id='kql'></a>
### 🔍 Analyze Application Insights custom metrics with a KQL query

This query retrieves the "Total Tokens" custom metric emitted by the policy, together with its Client IP, API ID, Subscription ID, and User ID dimensions.

```python
import pandas as pd
import json

query = "\"" + "customMetrics  \
| where name == 'Total Tokens' \
| extend parsedCustomDimensions = parse_json(customDimensions) \
| extend clientIP = tostring(parsedCustomDimensions.['Client IP']) \
| extend apiId = tostring(parsedCustomDimensions.['API ID']) \
| extend apimSubscription = tostring(parsedCustomDimensions.['Subscription ID']) \
| extend UserId = tostring(parsedCustomDimensions.['User ID']) \
| project timestamp, value, clientIP, apiId, apimSubscription, UserId \
| order by timestamp asc" + "\""

result_stdout = ! az monitor app-insights query --app {app_insights_resource_name} -g {resource_group_name} --analytics-query {query}
result = json.loads(result_stdout.n)

table = result.get('tables')[0]
df = pd.DataFrame(table.get("rows"), columns=[col.get("name") for col in table.get('columns')])
df['timestamp'] = pd.to_datetime(df['timestamp']).dt.strftime('%H:%M')

df
```


| | timestamp | value | clientIP | apiId | apimSubscription | UserId |
|---|---|---|---|---|---|---|
| 0 | 20:28 | 672 | 176.177.25.47 | api-azure-openai | 492fda85-596c-4b95-9e9d-bcde843e6a5a | |
| 1 | 20:28 | 191 | 176.177.25.47 | api-azure-openai | c43d9b01-290e-4e43-9941-6178cfad08fd | |
| 2 | 20:35 | 141 | 176.177.25.47 | api-azure-openai | c43d9b01-290e-4e43-9941-6178cfad08fd | alex |
| 3 | 20:35 | 199 | 176.177.25.47 | api-azure-openai | a62ca781-d4f2-4f1b-bb49-0416b0352d70 | alex |
| 4 | 20:35 | 178 | 176.177.25.47 | api-azure-openai | 492fda85-596c-4b95-9e9d-bcde843e6a5a | alex |
| 5 | 20:36 | 67 | 176.177.25.47 | api-azure-openai | c43d9b01-290e-4e43-9941-6178cfad08fd | alex |


### 🔍 See the metrics on the Azure Portal

Open the Application Insights resource, navigate to the Metrics blade, and select the metric namespace defined in the policy (openai). Choose the "Total Tokens" metric with the Sum aggregation, then apply splitting by 'Subscription ID' to view the values for each subscription.
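Splitting "Total Tokens" by a dimension with the Sum aggregation groups the metric values by that dimension and adds them up. The sketch below reproduces that arithmetic on the six rows returned by the custom metrics query above, using a hypothetical `sum_by` helper; rows with no User ID are kept under the empty string. The portal chart depends on the selected time range, so its totals may differ from this offline calculation.

```python
from collections import Counter

# (value, apimSubscription, UserId) copied from the custom metrics query output above.
rows = [
    (672, "492fda85-596c-4b95-9e9d-bcde843e6a5a", ""),
    (191, "c43d9b01-290e-4e43-9941-6178cfad08fd", ""),
    (141, "c43d9b01-290e-4e43-9941-6178cfad08fd", "alex"),
    (199, "a62ca781-d4f2-4f1b-bb49-0416b0352d70", "alex"),
    (178, "492fda85-596c-4b95-9e9d-bcde843e6a5a", "alex"),
    (67, "c43d9b01-290e-4e43-9941-6178cfad08fd", "alex"),
]


def sum_by(rows, key_index):
    """Sum the metric value (column 0) grouped by the column at key_index."""
    totals = Counter()
    for row in rows:
        totals[row[key_index]] += row[0]
    return dict(totals)


print(sum_by(rows, 1))  # Total Tokens per subscription
print(sum_by(rows, 2))  # Total Tokens per user
```

With the `df` from the query cell, the pandas equivalent is `df.groupby("apimSubscription")["value"].sum()`.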

![result](images/result.png)


## View logs and metrics on Azure Managed Grafana dashboard

You can also view the logs and metrics on an Azure Managed Grafana dashboard. Open your Azure Managed Grafana instance, sign in, go to the Dashboards section, and import the following two dashboards:
1. Dashboard ID 16604, to view logs and metrics from API Management.
2. The JSON file [./grafana-dashboard-apim-openai.json](./grafana-dashboard-apim-openai.json), to view logs and metrics from Application Insights.
   Alternatively, import this dashboard from [Grafana dashboard ID: 22552](https://grafana.com/grafana/dashboards/22552).

![](images/grafana-dashboard-openai.png)