# APIM ❤️ all the Models

## AWS Bedrock lab
![flow](../../images/aws-bedrock.gif)

Playground to try the [AWS Bedrock models](https://learn.microsoft.com/en-us/azure/api-management/amazon-bedrock-passthrough-llm-api) with the AI Gateway. Consumed tokens, prompts and completions are logged automatically into Azure Monitor.

### Prerequisites

- [Python 3.12 or later](https://www.python.org/) installed
- [VS Code](https://code.visualstudio.com/) installed with the [Jupyter notebook extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) enabled
- A [Python environment](https://code.visualstudio.com/docs/python/environments#_creating-environments) with the packages from [requirements.txt](../../requirements.txt) installed, for example by running `pip install -r requirements.txt` in your terminal
- [An Azure Subscription](https://azure.microsoft.com/free/) with [Contributor](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#contributor) + [RBAC Administrator](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#role-based-access-control-administrator) or [Owner](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#owner) roles
- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli) installed and [signed in to your Azure subscription](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively)

#### Create AWS Access Keys
1. Navigate to the AWS IAM Console
2. Go to Users → Your Username → Security Credentials
3. Create an access key
4. Copy both the Access Key ID and the Secret Access Key into the variables below
5. Adjust the region and model ID, and make sure that your account has been granted access to that model in Amazon Bedrock
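The access key pair never travels with the client's request: the notebook stores it in APIM as secure named values, and the gateway authenticates to Bedrock on the client's behalf, which is why the boto3 clients later in the notebook pass empty credentials. Assuming that authentication uses AWS Signature Version 4 (the standard scheme for Bedrock runtime calls), the secret key only feeds an HMAC chain that derives a signing key scoped to one date, region, and service. The following is a minimal standard-library sketch of that derivation. The helper names are hypothetical, the secret is a placeholder, and the signing name `bedrock` for the `bedrock-runtime` endpoint is our assumption.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_sigv4_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive a SigV4 signing key (hypothetical helper name).

    date_stamp is the request date in YYYYMMDD format, e.g. "20250301".
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_string(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex signature that ends up in the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder secret for illustration only; never hard-code real keys.
# "bedrock" as the signing name for bedrock-runtime is an assumption.
key = derive_sigv4_signing_key("EXAMPLE-SECRET", "20250301", "us-east-1", "bedrock")
print(len(key))  # 32-byte HMAC-SHA256 digest
```

Because each derived key is tied to a single date, region, and service, a signing key exposed in one context cannot simply be reused in another.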

▶️ Click `Run All` to execute all steps sequentially, or execute them `Step by Step`... 


<a id='0'></a>
### 0️⃣ Initialize notebook variables

- Resources will be suffixed by a unique string based on your subscription ID.
- Adjust the location parameters according to your preferences and the [product availability by Azure region.](https://azure.microsoft.com/explore/global-infrastructure/products-by-region/?cdn=disable&products=cognitive-services,api-management) 
- Adjust the models and versions according to the [availability by region.](https://learn.microsoft.com/azure/ai-services/openai/concepts/models) 

In [None]:
import os, sys, json
sys.path.insert(1, '../../shared')  # add the shared directory to the Python path
import utils

deployment_name = os.path.basename(os.path.dirname(globals()['__vsc_ipynb_file__']))
resource_group_name = f"lab-{deployment_name}" # change the name to match your naming style
resource_group_location = "westeurope"

apim_sku = 'Basicv2'
apim_subscriptions_config = [{"name": "subscription1", "displayName": "Subscription 1"}]

inference_api_path = "inference"  # path to the inference API in the APIM service
inference_api_version = "2025-03-01-preview"

aws_bedrock_access_key = '' # this value will be stored as APIM secure named value. In production consider using Key Vault
aws_bedrock_secret_key = '' # this value will be stored as APIM secure named value. In production consider using Key Vault
aws_bedrock_model_id = 'us.anthropic.claude-3-5-haiku-20241022-v1:0' # change to your inference profile
aws_bedrock_region = 'us-east-1' # change to the region where the model is being used
aws_bedrock_service_url = f'https://bedrock-runtime.{aws_bedrock_region}.amazonaws.com'
aws_bedrock_service_name = 'bedrock-runtime'

utils.print_ok('Notebook initialized')

<a id='1'></a>
### 1️⃣ Verify the Azure CLI and the connected Azure subscription

The following command verifies that the Azure CLI is signed in to your Azure subscription and displays the current user, tenant ID, and subscription ID.
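The cell relies on the JSON that `az account show` prints. If you want the same check without the lab's `utils` helper, one minimal approach is to run the CLI through `subprocess` and read the same three fields the cell uses (`user.name`, `tenantId`, `id`). The function names are hypothetical, and the sample JSON is made up for illustration.

```python
import json
import subprocess


def parse_account(raw_json: str) -> dict:
    """Pick the fields the notebook prints from `az account show` output."""
    account = json.loads(raw_json)
    return {
        "current_user": account["user"]["name"],
        "tenant_id": account["tenantId"],
        "subscription_id": account["id"],
    }


def read_account() -> dict:
    """Run `az account show`; raises CalledProcessError if the CLI is not signed in."""
    result = subprocess.run(
        ["az", "account", "show", "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return parse_account(result.stdout)


# Made-up sample output, so the parser can be tried without the CLI
sample = (
    '{"id": "00000000-0000-0000-0000-000000000000",'
    ' "tenantId": "11111111-1111-1111-1111-111111111111",'
    ' "user": {"name": "user@example.com", "type": "user"}}'
)
print(parse_account(sample)["current_user"])  # user@example.com
```

On Windows the CLI is installed as `az.cmd`, so you may need to pass the full path returned by `shutil.which("az")` instead of `"az"`.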

In [None]:
output = utils.run("az account show", "Retrieved az account", "Failed to get the current az account")

if output.success and output.json_data:
    current_user = output.json_data['user']['name']
    tenant_id = output.json_data['tenantId']
    subscription_id = output.json_data['id']

    utils.print_info(f"Current user: {current_user}")
    utils.print_info(f"Tenant ID: {tenant_id}")
    utils.print_info(f"Subscription ID: {subscription_id}")

<a id='2'></a>
### 2️⃣ Create deployment using 🦾 Bicep

This lab uses [Bicep](https://learn.microsoft.com/azure/azure-resource-manager/bicep/overview?tabs=bicep) to declaratively define all the resources that will be deployed in the specified resource group. Change the parameters or the [main.bicep](main.bicep) directly to try different configurations. 
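The deployment cell wraps every value in `{ "value": ... }` inside an ARM deployment parameters document. A small helper (the names `build_parameters_file` and `write_parameters_file` are hypothetical) makes that wrapping explicit, so adding a Bicep parameter only means adding one key to a plain dict. The AWS keys still end up in `params.json` on disk, so treat that file as sensitive.

```python
import json
from pathlib import Path

SCHEMA = "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#"


def build_parameters_file(values: dict) -> dict:
    """Wrap plain values in the ARM deployment parameters format."""
    return {
        "$schema": SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


def write_parameters_file(values: dict, path: str = "params.json") -> Path:
    """Write the parameters document to disk and return its path."""
    target = Path(path)
    target.write_text(json.dumps(build_parameters_file(values), indent=2), encoding="utf-8")
    return target


params = build_parameters_file({
    "apimSku": "Basicv2",
    "inferenceAPIPath": "inference",
})
print(params["parameters"]["apimSku"])  # {'value': 'Basicv2'}
```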

In [None]:
# Create the resource group if it doesn't exist
utils.create_resource_group(resource_group_name, resource_group_location)

# Define the Bicep parameters
bicep_parameters = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "apimSku": { "value": apim_sku },
        "apimSubscriptionsConfig": { "value": apim_subscriptions_config },
        "inferenceAPIPath": { "value": inference_api_path },
        "awsBedrockAccessKey": { "value": aws_bedrock_access_key },
        "awsBedrockSecretKey": { "value": aws_bedrock_secret_key },
        "awsBedrockServiceURL": { "value": aws_bedrock_service_url }
    }
}

# Write the parameters to the params.json file
with open('params.json', 'w') as bicep_parameters_file:
    bicep_parameters_file.write(json.dumps(bicep_parameters))

# Run the deployment
output = utils.run(f"az deployment group create --name {deployment_name} --resource-group {resource_group_name} --template-file main.bicep --parameters params.json",
    f"Deployment '{deployment_name}' succeeded", f"Deployment '{deployment_name}' failed")

<a id='3'></a>
### 3️⃣ Get the deployment outputs

Retrieve the deployment outputs: the Log Analytics workspace ID, the API Management service ID and gateway URL, and the subscription keys needed for testing.
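The cell parses the `apimSubscriptions` output by swapping single quotes for double quotes before calling `json.loads`, which breaks as soon as a value contains an apostrophe. Assuming the helper returns either JSON or a Python-literal string (the quote swap suggests the latter), a sturdier sketch tries JSON first and falls back to `ast.literal_eval`. It also shows the key masking used when printing. The helper names and sample data are hypothetical.

```python
import ast
import json


def parse_subscriptions(raw: str) -> list:
    """Parse the apimSubscriptions output whether it is JSON or a Python literal."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # str() of a Python list of dicts uses single quotes, which JSON rejects
        return ast.literal_eval(raw)


def mask_key(key: str, visible: int = 4) -> str:
    """Show only the last `visible` characters of a secret."""
    return "****" + key[-visible:]


raw_output = "[{'name': 'subscription1', 'key': 'abc123def456'}]"
subscriptions = parse_subscriptions(raw_output)
for subscription in subscriptions:
    print(subscription["name"], mask_key(subscription["key"]))  # subscription1 ****f456
```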

In [None]:
# Obtain all of the outputs from the deployment
output = utils.run(f"az deployment group show --name {deployment_name} -g {resource_group_name}", f"Retrieved deployment: {deployment_name}", f"Failed to retrieve deployment: {deployment_name}")

if output.success and output.json_data:
    log_analytics_id = utils.get_deployment_output(output, 'logAnalyticsWorkspaceId', 'Log Analytics Id')
    apim_service_id = utils.get_deployment_output(output, 'apimServiceId', 'APIM Service Id')
    apim_resource_gateway_url = utils.get_deployment_output(output, 'apimResourceGatewayURL', 'APIM API Gateway URL')
    apim_subscriptions = json.loads(utils.get_deployment_output(output, 'apimSubscriptions').replace("\'", "\""))
    for subscription in apim_subscriptions:
        subscription_name = subscription['name']
        subscription_key = subscription['key']
        utils.print_info(f"Subscription Name: {subscription_name}")
        utils.print_info(f"Subscription Key: ****{subscription_key[-4:]}")
    api_key = apim_subscriptions[0].get("key") # default api key to the first subscription key

<a id='requests'></a>
### 🧪 Test the inference API

Note: install the `boto3` library (`pip install boto3`) before running this step.
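The `converse` call takes the conversation as a list of `messages` plus an optional `system` list, and returns the reply under `output.message.content` with token counts under `usage`. The following pure-Python sketch builds the request arguments and reads the response. The helper names are hypothetical, and the response dict is hand-written for illustration, not real model output.

```python
def build_converse_kwargs(model_id: str, system_prompt: str, user_text: str,
                          max_tokens: int = 2000, temperature: float = 0.0) -> dict:
    """Keyword arguments for a boto3 bedrock-runtime `converse` call."""
    return {
        "modelId": model_id,
        # The system prompt goes in its own parameter, not in `messages`
        "system": [{"text": system_prompt}],
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature},
    }


def read_converse_response(response: dict) -> tuple:
    """Return (reply text, total tokens) from a `converse` response."""
    blocks = response["output"]["message"]["content"]
    text = "".join(block.get("text", "") for block in blocks)
    return text, response["usage"]["totalTokens"]


# Hand-written stand-in for a real response (not actual model output)
fake_response = {
    "output": {"message": {"role": "assistant",
                           "content": [{"text": "Sure, it's time to buy a watch."}]}},
    "usage": {"inputTokens": 20, "outputTokens": 9, "totalTokens": 29},
    "stopReason": "end_turn",
}
kwargs = build_converse_kwargs("us.anthropic.claude-3-5-haiku-20241022-v1:0",
                               "You are a sarcastic, unhelpful assistant.",
                               "Can you tell me the time, please?")
print(read_converse_response(fake_response))  # ("Sure, it's time to buy a watch.", 29)
```

With the APIM-backed client created in the cell, the same arguments would be passed as `bedrock.converse(**kwargs)`.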

In [None]:
import boto3
import logging

# Enable botocore debug logging to trace HTTP calls
# logging.basicConfig()
# logging.getLogger('botocore').setLevel(logging.DEBUG)
# logging.getLogger('boto3').setLevel(logging.DEBUG)

session = boto3.Session()
bedrock = session.client(
    region_name=aws_bedrock_region,
    service_name=aws_bedrock_service_name,
    endpoint_url=f"{apim_resource_gateway_url}/{inference_api_path}",
    aws_access_key_id='',  # Leave empty as APIM will set the values
    aws_secret_access_key='', # Leave empty as APIM will set the values
)

# Register event handler to add custom headers
event_system = bedrock.meta.events
def add_custom_header(params, **kwargs):
    params["headers"]['api-key'] = api_key
event_system.register(f'before-call.{aws_bedrock_service_name}.Converse', add_custom_header)

# The Converse API takes the system prompt in the separate `system` parameter;
# `messages` holds the conversation turns and must start with a user turn.
system_prompt = [
    { "text": "You are a sarcastic, unhelpful assistant." }
]
user_message = {
    "role": "user",
    "content": [
        { "text": "Can you tell me the time, please?" }
    ],
}
response = bedrock.converse(
    modelId=aws_bedrock_model_id,
    system=system_prompt,
    messages=[user_message],
    inferenceConfig={
        "maxTokens": 2000,
        "temperature": 0
    },
)

print("💵 Total Tokens: ", response['usage']['totalTokens'])
print("💬 Response: ", response['output']['message']['content'][0]['text'])


<a id='sdk'></a>
### 🧪 Test with streaming
With a streaming API call, the response is sent back incrementally in chunks as a stream of events. Bedrock encodes this stream in the AWS event stream format rather than as [server-sent events](https://developer.mozilla.org/docs/Web/API/Server-sent_events/Using_server-sent_events#event_stream_format), but boto3 decodes it for you, so in Python you can iterate over the events with a `for` loop.
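Besides `contentBlockDelta` events, a `converse_stream` response also emits events such as `messageStart`, `messageStop` (with the stop reason), and a final `metadata` event carrying token `usage` and latency `metrics`. The following sketch assembles the full reply, the stop reason, and the token usage from such a stream. It is exercised here against a hand-written list of events instead of a live call, and the function name is hypothetical.

```python
def collect_stream(events) -> dict:
    """Assemble text, stop reason, and usage from ConverseStream events."""
    parts = []
    result = {"text": "", "stop_reason": None, "usage": None}
    for event in events:
        if "contentBlockDelta" in event:
            # Text deltas; deltas without a "text" key (e.g. tool use) add nothing
            parts.append(event["contentBlockDelta"]["delta"].get("text", ""))
        elif "messageStop" in event:
            result["stop_reason"] = event["messageStop"].get("stopReason")
        elif "metadata" in event:
            result["usage"] = event["metadata"].get("usage")
    result["text"] = "".join(parts)
    return result


# Hand-written stand-in for a live stream (not actual model output)
fake_events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "1, 2, "}, "contentBlockIndex": 0}},
    {"contentBlockDelta": {"delta": {"text": "3"}, "contentBlockIndex": 0}},
    {"contentBlockStop": {"contentBlockIndex": 0}},
    {"messageStop": {"stopReason": "end_turn"}},
    {"metadata": {"usage": {"inputTokens": 25, "outputTokens": 6, "totalTokens": 31},
                  "metrics": {"latencyMs": 400}}},
]
print(collect_stream(fake_events)["text"])  # 1, 2, 3
```

Against a live call you would pass `response["stream"]`. A live stream can typically be consumed only once, so use either this helper or the printing loop, not both.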

In [None]:
import boto3, time

session = boto3.Session()
bedrock = session.client(
    region_name=aws_bedrock_region,
    service_name=aws_bedrock_service_name,
    endpoint_url=f"{apim_resource_gateway_url}/{inference_api_path}",
    aws_access_key_id='',  # Leave empty as APIM will set the values
    aws_secret_access_key='', # Leave empty as APIM will set the values
)

# Register event handler to add custom headers
event_system = bedrock.meta.events
def add_custom_header(params, **kwargs):
    params["headers"]['api-key'] = api_key
event_system.register(f'before-call.{aws_bedrock_service_name}.ConverseStream', add_custom_header)

message = {
    "role": "user",
    "content": [
        { "text": "Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ..." } 
    ],
}

start_time = time.time()  # start the clock just before sending the request
response = bedrock.converse_stream(
    modelId=aws_bedrock_model_id,
    messages=[message],
    inferenceConfig={
        "maxTokens": 2000,
        "temperature": 0.0
    }
)

stream = response.get('stream')
for event in stream:
    if "contentBlockDelta" in event:
        chunk_time = time.time() - start_time  # seconds elapsed since the request was sent
        chunk_message = event['contentBlockDelta']['delta'].get('text', '')  # deltas without text (e.g. tool use) print as empty
        print(f"Message received {chunk_time:.2f} seconds after request: {chunk_message}")  # print the delay and text




<a id='kql'></a>
### 🔍 Display model usage


In [None]:
import pandas as pd

query = "model_usage"

output = utils.run(f"az monitor log-analytics query -w {log_analytics_id} --analytics-query \"{query}\"", "Retrieved log analytics query output", "Failed to retrieve log analytics query output") 
if output.success and output.json_data:
    table = output.json_data
    display(pd.DataFrame(table))


<a id='kql2'></a>
### 🔍 Display prompts and completions


In [None]:
import pandas as pd

query = "prompts_and_completions"

output = utils.run(f"az monitor log-analytics query -w {log_analytics_id} --analytics-query \"{query}\"", "Retrieved log analytics query output", "Failed to retrieve log analytics query output") 
if output.success and output.json_data:
    table = output.json_data
    display(pd.DataFrame(table))


<a id='clean'></a>
### 🗑️ Clean up resources

When you're finished with the lab, you should remove all your deployed resources from Azure to avoid extra charges and keep your Azure subscription uncluttered.
Use the [clean-up-resources notebook](clean-up-resources.ipynb) for that.