# APIM ❤️ all the Models

## Google Gemini API lab
![flow](../../images/google-gemini-api.gif)

Playground to try the [Google Gemini models](https://learn.microsoft.com/en-us/azure/api-management/openai-compatible-google-gemini-api) with the AI Gateway using the OpenAI-compatible endpoint.

This lab demonstrates how to import an OpenAI-compatible Google Gemini API into Azure API Management, enabling you to:
- Use the familiar OpenAI SDK and API format with Gemini models
- Apply Azure API Management policies for rate limiting, caching, and security
- Monitor token consumption and usage through Azure Monitor

### Prerequisites

- [Python 3.12 or later version](https://www.python.org/) installed
- [VS Code](https://code.visualstudio.com/) installed with the [Jupyter notebook extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) enabled
- [Python environment](https://code.visualstudio.com/docs/python/environments#_creating-environments) with the packages from [requirements.txt](../../requirements.txt) installed (run `pip install -r requirements.txt` in your terminal)
- [An Azure Subscription](https://azure.microsoft.com/free/) with [Contributor](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#contributor) + [RBAC Administrator](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#role-based-access-control-administrator) or [Owner](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#owner) roles
- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli) installed and [signed in to your Azure subscription](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively)
- [Google AI Studio API Key](https://aistudio.google.com/apikey) - Create a Gemini API key

▶️ Click `Run All` to execute all steps sequentially, or execute them `Step by Step`...

<a id='0'></a>
### 0️⃣ Initialize notebook variables

- Resources will be suffixed by a unique string based on your subscription id.
- Adjust the location parameters according to your preferences and the [product availability by Azure region](https://azure.microsoft.com/explore/global-infrastructure/products-by-region/?cdn=disable&products=cognitive-services,api-management).
- Obtain a [Gemini API Key](https://aistudio.google.com/apikey) to be able to use Gemini models

**IMPORTANT:** Please DO NOT check in the Notebook with your API Key still in the cell below

In [None]:
import os, sys, json
sys.path.insert(1, '../../shared')  # add the shared directory to the Python path
import utils

deployment_name = os.path.basename(os.path.dirname(globals()['__vsc_ipynb_file__']))
resource_group_name = f"lab-{deployment_name}" # change the name to match your naming style
resource_group_location = "eastus2"

# 🔑 Add your Gemini API key here (get one from https://aistudio.google.com/apikey)
gemini_api_key = "xxxxxxxxx"  # Replace with your Gemini API key

# Gemini model configuration
gemini_model = "gemini-3-flash-preview"  # Available models: gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash, etc.
gemini_api_path = "gemini/openai"  # API path in APIM

# APIM configuration
apim_sku = "Basicv2"  # Change to your desired APIM SKU
apim_subscriptions_config = [
    {"name": "subscription1", "displayName": "Subscription 1"}, 
    {"name": "subscription2", "displayName": "Subscription 2"}
]

utils.print_ok('Notebook initialized')
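
To follow the warning above about not checking in the key, one option is to read it from an environment variable instead of typing it into the cell. The sketch below is only an illustration: the variable name `GEMINI_API_KEY` and the helper `load_gemini_api_key` are assumptions, not part of the lab's `utils` module.

```python
import os

PLACEHOLDER_KEY = "xxxxxxxxx"  # the placeholder value used in the cell above


def load_gemini_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the key from the environment, failing early if it is missing.

    The environment variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key or key == PLACEHOLDER_KEY:
        raise ValueError(f"Set the {env_var} environment variable to your Gemini API key")
    return key
```

With this in place, the assignment in the cell could become `gemini_api_key = load_gemini_api_key()`, so the key never appears in the saved notebook.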

<a id='1'></a>
### 1️⃣ Verify the Azure CLI and the connected Azure subscription

The following command confirms that the Azure CLI is installed and signed in, and shows the user, tenant, and subscription it is connected to.

In [None]:
output = utils.run("az account show", "Retrieved az account", "Failed to get the current az account")

if output.success and output.json_data:
    current_user = output.json_data['user']['name']
    tenant_id = output.json_data['tenantId']
    subscription_id = output.json_data['id']

    utils.print_info(f"Current user: {current_user}")
    utils.print_info(f"Tenant ID: {tenant_id}")
    utils.print_info(f"Subscription ID: {subscription_id}")

<a id='2'></a>
### 2️⃣ Create deployment using 🦾 Bicep

This lab uses [Bicep](https://learn.microsoft.com/azure/azure-resource-manager/bicep/overview?tabs=bicep) to declaratively define all the resources that will be deployed in the specified resource group. Change the parameters or the [main.bicep](main.bicep) directly to try different configurations. 

In [None]:
# Create the resource group if it doesn't exist
utils.create_resource_group(resource_group_name, resource_group_location)

# Define the Bicep parameters
bicep_parameters = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "apimSku": { "value": apim_sku },
        "geminiApiKey": { "value": gemini_api_key },
        "geminiAPIPath": { "value": gemini_api_path },
        "apimSubscriptionsConfig": { "value": apim_subscriptions_config },
    }
}

# Write the parameters to the params.json file
with open('params.json', 'w') as bicep_parameters_file:
    bicep_parameters_file.write(json.dumps(bicep_parameters))

# Run the deployment
output = utils.run(f"az deployment group create --name {deployment_name} --resource-group {resource_group_name} --template-file main.bicep --parameters params.json --verbose",
    f"Deployment '{deployment_name}' succeeded", f"Deployment '{deployment_name}' failed")
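
Note that `params.json` as written above contains the Gemini API key in plain text. A possible alternative, sketched here with the standard library only, is to write the parameters to a temporary file that is removed once the deployment command returns. `temporary_parameters_file` is a hypothetical helper for illustration, not something provided by `utils`.

```python
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_parameters_file(parameters: dict):
    """Write parameters to a temp JSON file and delete it when the block exits."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        # os.fdopen takes ownership of the descriptor and closes it for us
        with os.fdopen(fd, "w") as handle:
            json.dump(parameters, handle)
        yield path
    finally:
        # Remove the file even if the deployment raised an exception
        if os.path.exists(path):
            os.remove(path)
```

Inside a `with temporary_parameters_file(bicep_parameters) as params_path:` block, the yielded path would take the place of `params.json` in the `--parameters` argument.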

<a id='3'></a>
### 3️⃣ Get the deployment outputs

Retrieve the required outputs from the Bicep deployment.

In [None]:
# Obtain all of the outputs from the deployment
output = utils.run(f"az deployment group show --name {deployment_name} -g {resource_group_name}", 
                   f"Retrieved deployment: {deployment_name}", 
                   f"Failed to retrieve deployment: {deployment_name}")

if output.success and output.json_data:
    log_analytics_id = utils.get_deployment_output(output, 'logAnalyticsWorkspaceId', 'Log Analytics Id')
    apim_resource_name = utils.get_deployment_output(output, 'apimResourceName', 'APIM Resource Name')
    apim_service_id = utils.get_deployment_output(output, 'apimServiceId', 'APIM Service Id')
    apim_resource_gateway_url = utils.get_deployment_output(output, 'apimResourceGatewayURL', 'APIM API Gateway URL')
    app_insights_name = utils.get_deployment_output(output, 'applicationInsightsName', 'Application Insights Name')
    
    apim_subscriptions = json.loads(utils.get_deployment_output(output, 'apimSubscriptions').replace("\'", "\""))
    for subscription in apim_subscriptions:
        subscription_name = subscription['name']
        subscription_key = subscription['key']
        utils.print_info(f"Subscription Name: {subscription_name}")
        utils.print_info(f"Subscription Key: ****{subscription_key[-4:]}")
    api_key = apim_subscriptions[0].get("key") # default api key to the first subscription key
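
For orientation, `az deployment group show` returns each Bicep output under `properties.outputs.<name>.value`. The helper below is a hypothetical stand-in that illustrates that shape on a made-up sample; it is not the implementation of `utils.get_deployment_output`.

```python
def read_deployment_output(deployment: dict, name: str, default=None):
    """Look up one output value in the JSON returned by `az deployment group show`."""
    outputs = deployment.get("properties", {}).get("outputs") or {}
    entry = outputs.get(name)
    return default if entry is None else entry.get("value", default)


# Illustrative sample only; the resource name is invented for this sketch
sample = {
    "properties": {
        "outputs": {
            "apimResourceName": {"type": "String", "value": "apim-example"}
        }
    }
}

print(read_deployment_output(sample, "apimResourceName"))  # apim-example
print(read_deployment_output(sample, "missingOutput", "n/a"))  # n/a
```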

<a id='4'></a>
### 🧪 Test the Gemini API with OpenAI SDK

Test the Gemini API using the OpenAI Python SDK. Since Gemini provides an OpenAI-compatible endpoint, you can use the familiar OpenAI SDK to interact with Gemini models.

In [None]:
from openai import OpenAI

# Create the OpenAI client pointing to the APIM gateway
# Note: We use the standard OpenAI client (not AzureOpenAI) since Gemini uses the standard OpenAI API format
# We pass the APIM subscription key via the default_headers since APIM expects it in the 'api-key' header
client = OpenAI(
    base_url=f"{apim_resource_gateway_url}/{gemini_api_path}",
    api_key="not-used",  # Required by OpenAI client but we use default_headers for APIM
    default_headers={"api-key": api_key}
)

# Make a chat completion request
response = client.chat.completions.create(
    model=gemini_model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Azure API Management and how can it help with AI workloads?"}
    ],
    max_tokens=500
)

# Print the response
print("💬 Response:")
print(response.choices[0].message.content)
print(f"\n💵 Usage - Prompt Tokens: {response.usage.prompt_tokens}, Completion Tokens: {response.usage.completion_tokens}, Total Tokens: {response.usage.total_tokens}")

<a id='5'></a>
### 🧪 Test with Streaming

With a streaming API call, the response is sent back incrementally in chunks via an [event stream](https://developer.mozilla.org/docs/Web/API/Server-sent_events/Using_server-sent_events#event_stream_format). In Python, you can iterate over these events with a for loop.

In [None]:
from openai import OpenAI

# Create the OpenAI client pointing to the APIM gateway
client = OpenAI(
    base_url=f"{apim_resource_gateway_url}/{gemini_api_path}",
    api_key="not-used",
    default_headers={"api-key": api_key}
)

# Make a streaming chat completion request
stream = client.chat.completions.create(
    model=gemini_model,
    messages=[
        {"role": "system", "content": "You are a creative storyteller."},
        {"role": "user", "content": "Tell me a short story about a robot learning to paint."}
    ],
    max_tokens=300,
    stream=True
)

print("💬 Streaming Response:")
for chunk in stream:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)

print("\n\n✅ Streaming complete!")
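
Under the hood, the SDK is reading an event stream. As a rough sketch of what that looks like, the snippet below parses OpenAI-style `data:` lines and joins the content fragments. It assumes the Gemini OpenAI-compatible endpoint follows the same convention as OpenAI (one JSON chunk per `data:` line, terminated by `data: [DONE]`); the sample lines are hand-written for illustration, not captured from a real response.

```python
import json


def iter_stream_text(lines):
    """Yield content fragments from OpenAI-style `data:` event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample events, shaped like chat completion chunks
sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_stream_text(sample_lines)))  # Hello, world
```

The same generator could be fed from `requests` by posting with `stream=True` and passing `response.iter_lines(decode_unicode=True)`.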

<a id='6'></a>
### 🧪 Test with direct HTTP requests

You can also make direct HTTP requests to the Gemini API through APIM using the OpenAI-compatible chat completions endpoint.

In [None]:
import requests

# Construct the API URL
api_url = f"{apim_resource_gateway_url}/{gemini_api_path}/chat/completions"

# Set up headers
headers = {
    "Content-Type": "application/json",
    "api-key": api_key
}

# Request payload
payload = {
    "model": gemini_model,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How are you today?"}
    ],
    "max_tokens": 100
}

# Make the request
response = requests.post(api_url, headers=headers, json=payload, timeout=60)  # timeout avoids hanging indefinitely

if response.status_code == 200:
    data = response.json()
    print("💬 Response:")
    print(data['choices'][0]['message']['content'])
    print(f"\n💵 Total Tokens: {data['usage']['total_tokens']}")
else:
    print(f"❌ Error: {response.status_code}")
    print(response.text)
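
If a rate-limiting policy is applied at the gateway, requests over the limit are typically rejected with `429 Too Many Requests`, often with a `Retry-After` header. The sketch below shows one way a caller might decide how long to wait before retrying. The function name, the backoff constants, and the choice to also retry on 5xx responses are assumptions for illustration, not behavior defined by this lab.

```python
def retry_delay_seconds(status_code, headers, attempt, base_delay=1.0, max_delay=30.0):
    """Return seconds to wait before retrying, or None if the call should not be retried.

    `attempt` is zero-based: 0 for the first retry, 1 for the second, and so on.
    """
    if status_code != 429 and status_code < 500:
        return None  # success or a client error that a retry will not fix
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return min(float(retry_after), max_delay)
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff
    # Exponential backoff, capped at max_delay
    return min(base_delay * (2 ** attempt), max_delay)


print(retry_delay_seconds(429, {"Retry-After": "7"}, attempt=0))  # 7.0
print(retry_delay_seconds(429, {}, attempt=2))  # 4.0
print(retry_delay_seconds(401, {}, attempt=0))  # None
```

With `requests`, `response.status_code` and `response.headers` can be passed in directly, followed by `time.sleep(delay)` when the result is not `None`.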

<a id='7'></a>
### 📊 View Token Metrics in Azure Monitor

The API policy emits token metrics that can be viewed in Azure Monitor. Run the following query to see token usage over time.

In [None]:
import time
import pandas as pd

# Pause briefly before querying; ingestion can take a few minutes, so re-run this cell if no rows come back
print("⏳ Waiting for metrics to be ingested (this may take a few minutes)...")
time.sleep(10)

# Query Application Insights for token metrics (emitted by azure-openai-emit-token-metric policy)
query = "\"" + "customMetrics \
| where name == 'Total Tokens' \
| extend parsedCustomDimensions = parse_json(customDimensions) \
| extend clientIP = tostring(parsedCustomDimensions.['Client IP']) \
| extend apiId = tostring(parsedCustomDimensions.['API ID']) \
| extend apimSubscription = tostring(parsedCustomDimensions.['Subscription ID']) \
| extend UserId = tostring(parsedCustomDimensions.['User ID']) \
| project timestamp, value, clientIP, apiId, apimSubscription, UserId \
| order by timestamp desc \
| take 20" + "\""

output = utils.run(f"az monitor app-insights query --app {app_insights_name} -g {resource_group_name} --analytics-query {query}",
    "Retrieved token metrics", "Failed to retrieve token metrics (metrics may take a few minutes to appear)")

if output.success and output.json_data:
    table = output.json_data['tables'][0]
    df = pd.DataFrame(table.get("rows"), columns=[col.get("name") for col in table.get('columns')])
    if not df.empty:
        df['timestamp'] = pd.to_datetime(df['timestamp']).dt.strftime('%Y-%m-%d %H:%M:%S')
        print("📊 Token Metrics:")
        print(df.to_string(index=False))
    else:
        print("⚠️ No token metrics found yet. Metrics may take a few minutes to appear in Application Insights.")
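
The query result can also be summarized per APIM subscription. The sketch below does this with plain Python over the same `columns`/`rows` table shape that the cell above reads; `total_tokens_by` is a hypothetical helper and the sample rows are invented for illustration, not real measurements.

```python
from collections import defaultdict


def total_tokens_by(table: dict, key_column: str = "apimSubscription") -> dict:
    """Sum the `value` column of a query result table, grouped by key_column."""
    names = [col.get("name") for col in table.get("columns", [])]
    totals = defaultdict(float)
    for row in table.get("rows", []):
        record = dict(zip(names, row))  # pair each cell with its column name
        totals[record.get(key_column)] += float(record.get("value") or 0)
    return dict(totals)


# Invented sample in the shape of output.json_data['tables'][0]
sample_table = {
    "columns": [{"name": "timestamp"}, {"name": "value"}, {"name": "apimSubscription"}],
    "rows": [
        ["2025-01-01T10:00:00Z", 120, "subscription1"],
        ["2025-01-01T10:01:00Z", 80, "subscription1"],
        ["2025-01-01T10:02:00Z", 45, "subscription2"],
    ],
}

print(total_tokens_by(sample_table))  # {'subscription1': 200.0, 'subscription2': 45.0}
```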

### 🗑️ Clean up resources

When you're finished with the lab, you should remove all your deployed resources from Azure to avoid extra charges and keep your Azure subscription uncluttered.
Use the [clean-up-resources notebook](clean-up-resources.ipynb) for that.