# APIM and Azure OpenAI with Private Endpoint

This playground shows how to use Azure API Management (APIM) to privately access an Azure OpenAI resource that sits behind a private endpoint.

![](images/architecture.png)

### 1️⃣ Create deployment using Terraform

This lab uses Terraform to declaratively define all the resources that will be deployed. Edit [variables.tf](variables.tf) directly to try different configurations.

In [None]:
import os

# Each `!` line runs in its own subshell, so store the ID in the kernel's environment;
# the `!` commands below inherit it on Windows, macOS, and Linux alike.
subscription_id = ! az account show --query id -o tsv
os.environ["ARM_SUBSCRIPTION_ID"] = subscription_id.n.strip()

! terraform init
! terraform apply -auto-approve

The following resources will be created.

![](images/resources.png)

### 2️⃣ Get the deployment outputs

Before testing, we only need to retrieve the APIM gateway URL and the subscription key from the Terraform outputs.

In [None]:
apim_resource_gateway_url = ! terraform output -raw apim_resource_gateway_url
apim_resource_gateway_url = apim_resource_gateway_url.n
print("👉🏻 APIM Resource Gateway URL: ", apim_resource_gateway_url)

apim_subscription_key = ! terraform output -raw apim_subscription_key
apim_subscription_key = apim_subscription_key.n
print("👉🏻 APIM Subscription Key: ", apim_subscription_key)

openai_api_version = "2024-10-21"
openai_model_name = "gpt-4o"
openai_deployment_name = "gpt-4o"
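
Outside a notebook, the same outputs can be read without the `!` syntax. The sketch below assumes `terraform` is on the `PATH` and relies on the documented shape of `terraform output -json`, where each output is an object with a `value` field. The function names `parse_terraform_outputs` and `read_terraform_outputs` are hypothetical helpers, not part of this lab.

```python
import json
import subprocess


def parse_terraform_outputs(raw_json: str) -> dict:
    """Map each output name to its value, given `terraform output -json` text."""
    outputs = json.loads(raw_json)
    return {name: entry["value"] for name, entry in outputs.items()}


def read_terraform_outputs(workdir: str = ".") -> dict:
    """Run `terraform output -json` in `workdir` and return name -> value."""
    completed = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,  # raise if terraform exits with a non-zero status
    )
    return parse_terraform_outputs(completed.stdout)


# Usage (requires an applied Terraform state, so it is not executed here):
# outputs = read_terraform_outputs()
# apim_resource_gateway_url = outputs["apim_resource_gateway_url"]
# apim_subscription_key = outputs["apim_subscription_key"]
```

Reading all outputs in one call avoids invoking Terraform once per value, at the cost of loading sensitive values such as the subscription key into a single dictionary.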

<a id='requests'></a>
### 🧪 Test the API using a direct HTTP call
Requests is an elegant and simple HTTP library for Python that will be used here to make raw API requests and inspect the responses. 

You will not see HTTP 429s returned, because API Management's `retry` policy selects an available backend. If no backend is viable, an HTTP 503 is returned.

Tip: Use the [tracing tool](../../tools/tracing.ipynb) to track the behavior of the backend pool.
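
Since an HTTP 503 is the one failure a caller may still see, a client can choose to retry it with a short backoff. The following is a minimal sketch under that assumption, not something this lab configures: `call_with_retry` is a hypothetical helper, and it takes any `send` callable returning `(status_code, body)` so the logic can be tried without a live gateway.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a status other than 503 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 503:
            break
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body


# Usage with the variables from this notebook (not executed here):
# def send():
#     r = requests.post(url, headers={"api-key": apim_subscription_key}, json=messages)
#     return r.status_code, r.text
# status, body = call_with_retry(send)
```

Only 503 is retried here; other error codes are returned to the caller unchanged so they stay visible during testing.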

In [None]:
import json
import requests

url = apim_resource_gateway_url + "/openai/deployments/" + openai_deployment_name + "/chat/completions?api-version=" + openai_api_version

messages={"messages":[
    {"role": "system", "content": "You are a sarcastic unhelpful assistant."},
    {"role": "user", "content": "Can you tell me the time, please?"}
]}

response = requests.post(url, headers = {'api-key':apim_subscription_key}, json = messages)

# Check the response status code and apply formatting
if 200 <= response.status_code < 300:
    status_code_str = '\x1b[1;32m' + str(response.status_code) + " - " + response.reason + '\x1b[0m'  # Bold and green
elif response.status_code >= 400:
    status_code_str = '\x1b[1;31m' + str(response.status_code) + " - " + response.reason + '\x1b[0m'  # Bold and red
else:
    status_code_str = str(response.status_code)  # No formatting
    
# Print the response status with the appropriate formatting
print("Response status:", status_code_str)

if (response.status_code == 200):
    data = json.loads(response.text)
    print("Token usage:", data.get("usage"), "\n")
    print("💬 ", data.get("choices")[0].get("message").get("content"), "\n")
else:
    print(response.text)   
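
The request URL in the cell above is assembled by string concatenation, which breaks if the gateway URL ends with a slash. A small helper can make the pattern explicit. This is a sketch only: `chat_completions_url` is a hypothetical name, and the example gateway host is made up.

```python
from urllib.parse import quote, urlencode


def chat_completions_url(gateway_url: str, deployment_name: str, api_version: str) -> str:
    """Build the chat completions URL exposed through the APIM gateway."""
    base = gateway_url.rstrip("/")  # tolerate a trailing slash on the gateway URL
    path = f"/openai/deployments/{quote(deployment_name, safe='')}/chat/completions"
    query = urlencode({"api-version": api_version})
    return f"{base}{path}?{query}"


# Example with a made-up gateway host:
print(chat_completions_url("https://example.azure-api.net/", "gpt-4o", "2024-10-21"))
```

With the notebook's variables, the call would be `chat_completions_url(apim_resource_gateway_url, openai_deployment_name, openai_api_version)`.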

<a id='sdk'></a>
### 🧪 Test the API using the Azure OpenAI Python SDK

Repeat the same test using the Python SDK to ensure compatibility.

In [None]:
from openai import AzureOpenAI

messages=[
    {"role": "system", "content": "You are a sarcastic unhelpful assistant."},
    {"role": "user", "content": "Can you tell me the time, please?"}
]

client = AzureOpenAI(
    azure_endpoint=apim_resource_gateway_url,
    api_key=apim_subscription_key,
    api_version=openai_api_version
)

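# With Azure OpenAI, `model` takes the deployment name, which may differ from the underlying model name.
response = client.chat.completions.create(model=openai_deployment_name, messages=messages)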

print("💬 ", response.choices[0].message.content)
