# APIM ❤️ OpenAI

## SLM self hosting lab
![flow](../../images/slm-self-hosting.gif)

Playground to try the self-hosted [Phi-3 Small Language Model (SLM)](https://azure.microsoft.com/blog/introducing-phi-3-redefining-whats-possible-with-slms/) through the [APIM self-hosted gateway](https://learn.microsoft.com/azure/api-management/self-hosted-gateway-overview) with OpenAI API compatibility.

Phi-3-Mini-4K-Instruct is a lightweight, state-of-the-art open model with 3.8B parameters, trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality, reasoning-dense properties. The model belongs to the Phi-3 family; the Mini version comes in two variants, 4K and 128K, which denote the context length (in tokens) each can support.

The model has undergone a post-training process that incorporates both supervised fine-tuning and direct preference optimization to improve instruction following and safety. When assessed against benchmarks testing common sense, language understanding, math, code, long context, and logical reasoning, Phi-3-Mini-4K-Instruct showed robust, state-of-the-art performance among models with fewer than 13 billion parameters.

The APIM self-hosted gateway is a containerized version of the default managed gateway. It's useful for scenarios such as placing gateways in the same environments where you host your APIs, as in this experiment, where we self-host the Phi-3 API. This enables use cases where the SLM runs on-premises.

### Prerequisites

- [Python 3.12 or later version](https://www.python.org/) installed
- [VS Code](https://code.visualstudio.com/) installed with the [Jupyter notebook extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) enabled
- [Python environment](https://code.visualstudio.com/docs/python/environments#_creating-environments) with the packages from [requirements.txt](../../requirements.txt) installed, or run `pip install -r requirements.txt` in your terminal
- [An Azure Subscription](https://azure.microsoft.com/free/) with [Contributor](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#contributor) + [RBAC Administrator](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#role-based-access-control-administrator) or [Owner](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#owner) roles
- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli) installed and [Signed into your Azure subscription](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively)
- [Docker Desktop](https://www.docker.com/products/docker-desktop/) installed

▶️ Click `Run All` to execute all steps sequentially, or execute them `Step by Step`...

<a id='0'></a>
### 0️⃣ Initialize notebook variables

- Resources will be suffixed by a unique string based on your subscription id.
- Adjust the location parameters according to your preferences and the [product availability by Azure region.](https://azure.microsoft.com/explore/global-infrastructure/products-by-region/?cdn=disable&products=cognitive-services,api-management)
- Adjust the OpenAI model and version according to the [availability by region.](https://learn.microsoft.com/azure/ai-services/openai/concepts/models)

In [None]:
import os, sys, json
sys.path.insert(1, '../../shared')  # add the shared directory to the Python path
import utils

deployment_name = os.path.basename(os.path.dirname(globals()['__vsc_ipynb_file__']))
resource_group_name = f"lab-{deployment_name}" # change the name to match your naming style
resource_group_location = "westeurope" 

apim_sku = 'Developer'
self_hosted_gateway_name = "self-hosted-gateway"
model = "phi-3.5-mini"
openai_api_version = "2024-10-21"

utils.print_ok('Notebook initialized')


<a id='1'></a>
### 1️⃣ Verify the Azure CLI and the connected Azure subscription

The following command checks that the Azure CLI is signed in to your Azure subscription and prints the current user, tenant ID, and subscription ID. Make sure you're also running a recent version of the Azure CLI (`az upgrade` updates it).
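To check the CLI version programmatically, a minimal sketch could parse the JSON that `az version` prints, where the `azure-cli` key holds the version string. The helper names and the `2.60.0` minimum are hypothetical and only illustrative; this lab does not state a required version.

```python
import json


def parse_cli_version(az_version_json: str) -> tuple:
    """Return the azure-cli version from `az version` output as a tuple of ints."""
    data = json.loads(az_version_json)
    return tuple(int(part) for part in data["azure-cli"].split("."))


def is_cli_at_least(az_version_json: str, minimum: str) -> bool:
    """Compare the installed azure-cli version against a minimum such as "2.60.0"."""
    required = tuple(int(part) for part in minimum.split("."))
    return parse_cli_version(az_version_json) >= required


# Usage (requires the Azure CLI on PATH; on Windows you may need shell=True):
# import subprocess
# out = subprocess.run(["az", "version"], capture_output=True, text=True, check=True).stdout
# print(is_cli_at_least(out, "2.60.0"))  # hypothetical minimum version
```

Tuples compare element by element, so `(2, 61, 0) >= (2, 60, 0)` behaves like a version comparison without extra dependencies.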

In [None]:
output = utils.run("az account show", "Retrieved az account", "Failed to get the current az account")

if output.success and output.json_data:
    current_user = output.json_data['user']['name']
    tenant_id = output.json_data['tenantId']
    subscription_id = output.json_data['id']

    utils.print_info(f"Current user: {current_user}")
    utils.print_info(f"Tenant ID: {tenant_id}")
    utils.print_info(f"Subscription ID: {subscription_id}")
    

<a id='2'></a>
### 2️⃣ Create deployment using 🦾 Bicep

This lab uses [Bicep](https://learn.microsoft.com/azure/azure-resource-manager/bicep/overview?tabs=bicep) to declaratively define all the resources that will be deployed in the specified resource group. Change the parameters or edit [main.bicep](main.bicep) directly to try different configurations.
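The parameters file written in the next cell follows the ARM `deploymentParameters` schema, where every parameter value is wrapped in a `{"value": ...}` object. A small sketch of a helper that builds that envelope from plain values; `build_parameters_file` is a hypothetical name, not part of the lab's `utils` module.

```python
PARAMS_SCHEMA = "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#"


def build_parameters_file(params: dict) -> dict:
    """Wrap plain values in the {"value": ...} envelope used by ARM/Bicep parameter files."""
    return {
        "$schema": PARAMS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in params.items()},
    }


# Usage: produces the same structure as the bicep_parameters dict in the next cell
# build_parameters_file({"apimSku": "Developer", "selfHostedGatewayName": "self-hosted-gateway"})
```

Keeping the parameter names as plain dictionary keys makes it easy to add or remove Bicep parameters without repeating the envelope for each one.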


In [None]:
# Create the resource group if it doesn't exist
utils.create_resource_group(resource_group_name, resource_group_location)

# Define the Bicep parameters
bicep_parameters = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "apimSku": { "value": apim_sku },
        "selfHostedGatewayName": { "value": self_hosted_gateway_name },
        "openAIAPIVersion": { "value": openai_api_version }
    }
}

# Write the parameters to the params.json file
with open('params.json', 'w') as bicep_parameters_file:
    bicep_parameters_file.write(json.dumps(bicep_parameters))

# Run the deployment
output = utils.run(f"az deployment group create --name {deployment_name} --resource-group {resource_group_name} --template-file main.bicep --parameters params.json",
    f"Deployment '{deployment_name}' succeeded", f"Deployment '{deployment_name}' failed")

<a id='3'></a>
### 3️⃣ Get the deployment outputs

Retrieve the required outputs from the Bicep deployment.
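`utils.get_deployment_output` comes from the lab's shared helpers, and its implementation isn't shown here. Assuming it reads the standard ARM layout, where each output sits under `properties.outputs.<name>.value` in the `az deployment group show` JSON, an equivalent hypothetical sketch (with an illustrative masking rule for secrets such as the subscription key) might look like this:

```python
def get_output(deployment: dict, name: str):
    """Return the value of a named output from `az deployment group show` JSON."""
    outputs = deployment.get("properties", {}).get("outputs") or {}
    if name not in outputs:
        raise KeyError(f"Deployment output '{name}' not found")
    return outputs[name]["value"]


def mask(secret: str, visible: int = 4) -> str:
    """Hypothetical masking rule: keep the first `visible` characters, hide the rest."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)


# Usage with a trimmed-down deployment document:
# sample = {"properties": {"outputs": {"apimResourceName": {"type": "String", "value": "apim-abc"}}}}
# get_output(sample, "apimResourceName")
```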

In [None]:
import datetime
# Obtain all of the outputs from the deployment
output = utils.run(f"az deployment group show --name {deployment_name} -g {resource_group_name}", f"Retrieved deployment: {deployment_name}", f"Failed to retrieve deployment: {deployment_name}")

if output.success and output.json_data:
    apim_resource_name = utils.get_deployment_output(output, 'apimResourceName', 'APIM Resource Name')
    apim_resource_id = utils.get_deployment_output(output, 'apimResourceId', 'APIM Resource Id')
    apim_subscription_key = utils.get_deployment_output(output, 'apimSubscriptionKey', 'APIM Subscription Key (masked)', True)
    apim_config_endpoint = f"{apim_resource_name}.configuration.azure-api.net"
    apim_resource_gateway_url = "http://localhost"

    print("👉🏻 API Gateway URL: ", apim_resource_gateway_url)

<a id='41'></a>
### 4️⃣ Option 1: Run Phi-3.5-mini with [AI Foundry Local](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-local/get-started)

1. [Install Foundry Local](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-local/get-started#quickstart) in your local environment
2. Install the Foundry Local SDK: `pip install foundry-local-sdk`

The following code is based on the [official docs](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-local/how-to/how-to-integrate-with-inference-sdks?pivots=programming-language-python#use-openai-sdk-with-foundry-local).
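For longer answers it can help to print tokens as they arrive. A minimal sketch, assuming the Foundry Local endpoint supports the OpenAI SDK's `stream=True` option; `stream_chat` is a hypothetical helper that takes the `client` created in the next cell.

```python
def stream_chat(client, model_id: str, prompt: str) -> str:
    """Stream a chat completion, printing text as it arrives, and return the full reply."""
    stream = client.chat.completions.create(
        model=model_id,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        # Some chunks (e.g. role-only or final chunks) carry no text
        if chunk.choices and chunk.choices[0].delta.content:
            text = chunk.choices[0].delta.content
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)


# Usage after running the next cell:
# stream_chat(client, manager.get_model_info(alias).id, "What is the golden ratio?")
```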

In [None]:
import openai
from foundry_local import FoundryLocalManager

# By using an alias, the most suitable model will be downloaded 
# to your end-user's device. 
alias = model
# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)
print(manager.endpoint)

# The remaining code uses the OpenAI Python SDK to interact with the local model.
# Configure the client to use the local Foundry service
client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key  # API key is not required for local usage
)
# Set the model to use and generate a response
response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}]
)
print(response.choices[0].message.content)

<a id='41'></a>
### 4️⃣ Option 2: Run the Phi-3 API locally with Docker

The following commands (commented out by default) build the container image and then create a container instance to run it.
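`docker run -d` returns immediately, while the API inside the container may need more time before it accepts connections. A minimal standard-library sketch, assuming the API is published on local port 5000 as in the `docker run` command in the next cell, polls the port before you send requests. `wait_for_port` is a hypothetical helper, and an open TCP port only shows that the server is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Return True once host:port accepts TCP connections, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage after starting the container:
# wait_for_port("localhost", 5000)
```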

In [None]:
# ! docker build -t phi-3 phy-3/.
# ! docker run -d -p 5000:5000 -v phi2-models:/model_cache phi-3

<a id='42'></a>
### 4️⃣ Option 3: Run the Phi-3 API locally without a container

We recommend running the following commands in a separate terminal, since `flask run` keeps running and would block the notebook.

In [None]:
# Each "!" command runs in its own subshell, so "cd" is chained with each command
# ! cd phy-3 && pip install --no-cache-dir -r requirements.txt
# ! cd phy-3 && flask --app app.py --debug run

<a id='5'></a>
### 5️⃣ Run the APIM self-hosted gateway with docker

In a production environment, we recommend using Kubernetes. Follow this [guide](https://learn.microsoft.com/azure/api-management/how-to-self-hosted-gateway-on-kubernetes-in-production) to run the APIM self-hosted gateway in production.
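The next cell asks Azure Resource Manager for a gateway access token through the `generateToken` action. The API documents a maximum token lifetime of 30 days, which is presumably why the notebook uses 29. A sketch of the request body as a validated helper, assuming UTC timestamps in `yyyy-MM-ddTHH:mm:ssZ` form are accepted (the original cell sends a local, timezone-less ISO timestamp instead); `build_token_request` is a hypothetical name.

```python
import datetime

MAX_TOKEN_DAYS = 30  # documented upper bound for a gateway token's lifetime


def build_token_request(days_valid: int = 29, key_type: str = "primary") -> dict:
    """Build the body for the gateway generateToken action."""
    if key_type not in ("primary", "secondary"):
        raise ValueError("key_type must be 'primary' or 'secondary'")
    if not 0 < days_valid < MAX_TOKEN_DAYS:
        raise ValueError(f"days_valid must be between 1 and {MAX_TOKEN_DAYS - 1}")
    expiry = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(days=days_valid)
    return {
        "properties": {
            "keyType": key_type,
            # ISO 8601 in UTC, without fractional seconds
            "expiry": expiry.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    }


# Usage: serialize with json.dumps to get a JSON body for `az rest --body`
# import json; print(json.dumps(build_token_request()))
```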

In [None]:
import datetime

# Generate a token for the self-hosted gateway
request = {
    "properties": {    
        "keyType": "primary",
        "expiry": (datetime.datetime.now() + datetime.timedelta(days=29)).isoformat()
    }
}
output = utils.run(f"az rest --method post --uri {apim_resource_id}/gateways/{self_hosted_gateway_name}/generateToken?api-version=2023-05-01-preview --body \"{str(request)}\"",
        "Generated gateway token ", "Failed to generate gateway token")            
self_hosted_gateway_auth = output.json_data.get('value') if output.success and output.json_data else None


In [None]:
! docker run -d -p 80:8080 -p 443:8081 -e config.service.endpoint="{apim_config_endpoint}" -e config.service.auth="GatewayKey {self_hosted_gateway_auth}" -e runtime.deployment.artifact.source="Azure Portal" -e runtime.deployment.mechanism=Docker --name {self_hosted_gateway_name} mcr.microsoft.com/azure-api-management/gateway:v2
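The same container could also be started from plain Python by passing the arguments as a list to `subprocess.run`, which avoids shell quoting around the `GatewayKey <token>` value (it contains a space). A sketch that only builds the argument list; `gateway_docker_args` is a hypothetical helper and mirrors the ports, environment variables, and image of the command above.

```python
def gateway_docker_args(name: str, config_endpoint: str, gateway_token: str,
                        http_port: int = 80, https_port: int = 443,
                        image: str = "mcr.microsoft.com/azure-api-management/gateway:v2") -> list:
    """Build the `docker run` argument list for the APIM self-hosted gateway."""
    env = {
        "config.service.endpoint": config_endpoint,
        "config.service.auth": f"GatewayKey {gateway_token}",
        "runtime.deployment.artifact.source": "Azure Portal",
        "runtime.deployment.mechanism": "Docker",
    }
    # The gateway listens on 8080 (HTTP) and 8081 (HTTPS) inside the container
    args = ["docker", "run", "-d", "-p", f"{http_port}:8080", "-p", f"{https_port}:8081"]
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    args += ["--name", name, image]
    return args


# Usage (requires Docker):
# import subprocess
# subprocess.run(gateway_docker_args(self_hosted_gateway_name, apim_config_endpoint,
#                                    self_hosted_gateway_auth), check=True)
```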

<a id='requests'></a>
### 🧪 Test the API using a direct HTTP call

Requests is an elegant and simple HTTP library for Python that will be used here to make raw API requests and inspect the responses.

⚠️ The requests might take a few minutes, depending on your environment.
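Because a first call can be slow, an explicit timeout and a status check make failures easier to diagnose. A sketch that builds the same request as the next cell with `requests.Request(...).prepare()`, so the URL, headers, and body can be inspected before anything is sent; the helper names and the 300-second timeout are assumptions.

```python
import requests


def build_chat_request(gateway_url: str, api_version: str, subscription_key: str,
                       model_id: str, prompt: str) -> requests.PreparedRequest:
    """Prepare (but do not send) a chat completions call through the gateway."""
    return requests.Request(
        "POST",
        f"{gateway_url}/inference/chat/completions",
        params={"api-version": api_version},
        headers={"api-key": subscription_key},
        json={"model": model_id, "messages": [{"role": "user", "content": prompt}]},
    ).prepare()


def send_chat_request(prepared: requests.PreparedRequest, timeout: float = 300.0) -> str:
    """Send the prepared request and return the assistant's reply text."""
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]


# Usage:
# req = build_chat_request(apim_resource_gateway_url, openai_api_version, apim_subscription_key,
#                          manager.get_model_info(model).id, "What's the capital of Portugal?")
# print(send_chat_request(req))
```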

In [None]:
import requests
from foundry_local import FoundryLocalManager

url = apim_resource_gateway_url + "/inference/chat/completions?api-version=" + openai_api_version

# Resolve the model ID that Foundry Local serves for the alias
manager = FoundryLocalManager(model)

payload = {
    "model": manager.get_model_info(model).id,
    "messages": [
        {"role": "user", "content": "What's the capital of Portugal?"}
    ]
}

headers = {
    "Content-Type": "application/json",
    "api-key": apim_subscription_key
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()  # surface gateway or backend errors before parsing the body
print(response.json()["choices"][0]["message"]["content"])

<a id='sdk'></a>
### 🧪 Test the API using the [Azure AI Inference library](https://learn.microsoft.com/en-us/python/api/overview/azure/ai-inference-readme?view=azure-python-preview)



In [None]:
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
from azure.ai.inference.models import SystemMessage, UserMessage
from foundry_local import FoundryLocalManager

manager = FoundryLocalManager(model)

client = ChatCompletionsClient(
    endpoint=f"{apim_resource_gateway_url}/inference",
    credential=AzureKeyCredential(apim_subscription_key),
    api_version=openai_api_version
)

response = client.complete(
    messages=[
        UserMessage("What's the capital of Portugal?"),
    ],
    model=manager.get_model_info(model).id
)

print(response.choices[0].message.content)


<a id='clean'></a>
### 🗑️ Clean up resources

When you're finished with the lab, you should remove all your deployed resources from Azure to avoid extra charges and keep your Azure subscription uncluttered.
Use the [clean-up-resources notebook](clean-up-resources.ipynb) for that.