# Custom LLM Application Deployment to **Azure AI Foundry** as Managed Endpoint with **Tracing & Evaluation**

**Requirements** - To benefit from this tutorial, you will need:
* A basic understanding of the Azure AI Foundry SDK, the Azure Machine Learning SDK, and Large Language Models
* An Azure Machine Learning Workspace and an Azure Container Registry


**Actions** - 
* Create a **Python server API** based on Flask.
* Deploy LLM completion and evaluation to an **AI Foundry Managed Online Endpoint**.
* Trace the custom LLM application.
* Evaluate the results.

Managed online endpoints provide an easy-to-manage inference server for your ML workloads, which makes them a good fit for LLM-based applications. Because this application needs its own REST service, we do not use the default endpoint Docker image; we build a custom Docker image instead.
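
To make the REST contract concrete, here is a minimal sketch of the routes the custom image is expected to serve: a `/health` route for the liveness and readiness probes and a `/chat/products` route for queries. It is written as a plain WSGI callable using only the standard library rather than Flask (a Flask app is itself a WSGI app), and the response shape, the `app` function, and the `call` helper are illustrative assumptions, not the tutorial's actual server code.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Minimal WSGI app exposing the routes this tutorial relies on."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "/")

    if method == "GET" and path == "/health":
        # Used by the endpoint's liveness and readiness probes.
        status, body = "200 OK", {"status": "ok"}
    elif method == "POST" and path == "/chat/products":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        request = json.loads(environ["wsgi.input"].read(length) or b"{}")
        # Placeholder: a real server would call the LLM here.
        status, body = "200 OK", {"query": request.get("query"), "answer": None}
    else:
        status, body = "404 Not Found", {"error": "unknown route"}

    payload = json.dumps(body).encode("utf-8")
    start_response(
        status,
        [("Content-Type", "application/json"), ("Content-Length", str(len(payload)))],
    )
    return [payload]


def call(method, path, data=b""):
    """Invoke the app in-process, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update(
        {
            "REQUEST_METHOD": method,
            "PATH_INFO": path,
            "CONTENT_LENGTH": str(len(data)),
            "wsgi.input": io.BytesIO(data),
        }
    )
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)
```

Calling `call("GET", "/health")` exercises the probe route without a running server, which is a convenient way to check the routes before building the image.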

**Outline** - 
1. Prepare Dependencies
2. Deploy to Managed Online Endpoint
3. Test
4. Clean up resources

# 1. Prepare Dependencies

In [None]:
%pip install azure-identity azure-ai-ml
%pip install azure-ai-projects requests  # imported by the cells below

In [4]:
import os
from pathlib import Path
# from opentelemetry import trace
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import ConnectionType
from azure.identity import DefaultAzureCredential
from azure.core.credentials import AzureKeyCredential
# from azure.search.documents import SearchClient
# from config import ASSET_PATH, get_logger

In [None]:
from azure.ai.ml import MLClient

### 1.1 Set workspace details

In [2]:
SUBSCRIPTION_ID = "<Subscription ID>"  # Azure Subscription ID
RESOURCE_GROUP = "<Resource Group>"           # AI Foundry Resource Group
AML_WORKSPACE_NAME = "<AI Foundry Project>"     # AI Foundry Project

### 1.2 Login to your Azure account

In [3]:
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
    AzureCliCredential,
)

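try:
    credential = DefaultAzureCredential(additionally_allowed_tenants=["*"])
    # Constructing the credential does not authenticate; request a token to check that it works
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential(additionally_allowed_tenants=["*"])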

# If the login above doesn't work, uncomment the line below and log in using a device code
# !az login --use-device-code

### 1.3 Create Container Registry and Docker Image

In [4]:
import re
from azure.ai.ml import (
    MLClient,
)

ml_client = MLClient(credential, SUBSCRIPTION_ID, RESOURCE_GROUP, AML_WORKSPACE_NAME)
ws = ml_client.workspaces.get(AML_WORKSPACE_NAME)

# Get the Azure Container Registry associated with the workspace
acr = ws.container_registry

# Parse the ACR resource Id for the ACR name
match_object = re.match(r".+?registries\/(.+)", acr)
ACR_NAME = match_object.group(1)
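
The parsing step above assumes the workspace already has a container registry attached and that the resource ID has the usual ARM shape. A slightly more defensive sketch is shown below; the function name `acr_name_from_resource_id` and the example resource ID are hypothetical, and the behavior for a missing registry is an assumption about how you may want to fail.

```python
import re


def acr_name_from_resource_id(resource_id):
    """Return the registry name from an ACR ARM resource ID, or raise ValueError."""
    if not resource_id:
        # Assumption: a workspace without an attached registry reports no ID here.
        raise ValueError("The workspace has no attached container registry.")
    # Capture the path segment that follows ".../registries/".
    match = re.search(r"/registries/([^/]+)", resource_id, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Not an ACR resource ID: {resource_id!r}")
    return match.group(1)


# Hypothetical resource ID, for illustration only.
example_id = (
    "/subscriptions/0000/resourceGroups/rg/providers/"
    "Microsoft.ContainerRegistry/registries/myregistry"
)
example_name = acr_name_from_resource_id(example_id)
```

In the notebook this could be used as `ACR_NAME = acr_name_from_resource_id(ws.container_registry)`, so that a missing registry fails with a readable message instead of an `AttributeError` on `match_object`.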

In [None]:
# Build the image in your ACR
ACR_IMAGE_NAME = "serving"

!az acr build --image {ACR_IMAGE_NAME} --registry {ACR_NAME} ./environment/serving/. --resource-group {RESOURCE_GROUP}
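
The `!az` line above only works inside a notebook. If you run the same step from a plain Python script, one option is to assemble the command as an argument list and hand it to `subprocess`. The sketch below only builds the list; the helper name `acr_build_command`, its `tag` parameter, and the explicit `latest` tag are assumptions made for illustration.

```python
def acr_build_command(image, registry, resource_group,
                      source="./environment/serving/.", tag="latest"):
    """Build the argument list for `az acr build` (does not run it)."""
    return [
        "az", "acr", "build",
        "--image", f"{image}:{tag}",      # tag the image explicitly
        "--registry", registry,
        "--resource-group", resource_group,
        source,                           # build context directory
    ]


# Hypothetical values, for illustration only.
command = acr_build_command("serving", "myregistry", "my-rg")
# To execute it outside a notebook (requires the Azure CLI on PATH):
# import subprocess; subprocess.run(command, check=True)
```

Passing a list instead of a single shell string avoids quoting problems if a resource group or path contains spaces.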

# 2. Managed Online Endpoint
### 2.1 Create Endpoint

In [None]:
# Create an endpoint
import datetime

from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
)

from azure.ai.ml import (
    MLClient,
)

time = str(datetime.datetime.now().strftime("%m%d%H%M%f"))
online_endpoint_name = f"aml-llm-app1-{time}"

ml_client = MLClient(credential, SUBSCRIPTION_ID, RESOURCE_GROUP, AML_WORKSPACE_NAME)

# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="online endpoint for LLM app",
    auth_mode="key",
)

endpoint = ml_client.begin_create_or_update(endpoint).result()

print(endpoint)
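
The timestamp suffix keeps the endpoint name unique, but it also adds 14 characters to the name. The sketch below generates the name the same way and checks it against a naming rule before any Azure call is made. The rule encoded in `NAME_PATTERN` (start with a letter, then letters, digits, or hyphens, 3 to 32 characters in total) is an assumption based on the documented limits for online endpoints; verify it against the current Azure documentation. The helper name `make_endpoint_name` is hypothetical.

```python
import datetime
import re

# Assumed naming rule: a letter first, then letters, digits, or hyphens; 3-32 characters.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9-]{2,31}$")


def make_endpoint_name(prefix="aml-llm-app1", now=None):
    """Return '<prefix>-<mmddHHMM + microseconds>' or raise if the name breaks the rule."""
    now = now or datetime.datetime.now()
    name = f"{prefix}-{now.strftime('%m%d%H%M%f')}"
    if not NAME_PATTERN.match(name):
        raise ValueError(f"Invalid endpoint name ({len(name)} characters): {name}")
    return name


# Fixed timestamp so the result is reproducible.
sample_name = make_endpoint_name(now=datetime.datetime(2025, 1, 2, 3, 4, 5, 678))
```

With the default prefix the generated name is 27 characters long, so a prefix longer than 17 characters would exceed the assumed 32-character limit.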

### 2.2 Deploy to the Endpoint

In [None]:
from azure.ai.ml.entities import (
    ManagedOnlineDeployment,
    OnlineRequestSettings,
    Model,
    Environment,
)

deployment_name = f"deploy-{time}"
sk_deployment = ManagedOnlineDeployment(
    name=deployment_name,
    model=Model(path="../aifoundrydemo"),
    request_settings=OnlineRequestSettings(request_timeout_ms=60000),
    environment=Environment(
        image=f"{ACR_NAME}.azurecr.io/{ACR_IMAGE_NAME}:latest",
        name="serving",
        description="A generic serving environment, allowing customer to provide their own entry point to bring up an http server",
        inference_config={
            "liveness_route": {"port": 5001, "path": "/health"},
            "readiness_route": {"port": 5001, "path": "/health"},
            "scoring_route": {"port": 5001, "path": "/"},
        },
    ),
    environment_variables={
        "AZUREML_SERVING_ENTRYPOINT": "./aifoundrydemo/entry.sh",
        "AIPROJECT_CONNECTION_STRING": "eastus.api.azureml.ms;<SUBSCRIPTION ID>;<RESOURCE GROUP>;<PROJECT NAME>",  # AI Project connection string
        "AISEARCH_INDEX_NAME": "products-index",
        "EMBEDDINGS_MODEL": "text-embedding-3-small",
        "INTENT_MAPPING_MODEL": "gpt-4o-mini",
        "CHAT_MODEL": "gpt-4o",
        "EVALUATION_MODEL": "gpt-4o",
        "APPLICATION_INSIGHTS_RESOURCE_ID": "/subscriptions/<SUBSCRIPTION ID>/resourceGroups/<RESOURCE GROUP>/providers/Microsoft.Insights/components/<APP INSIGHT INSTANCE NAME>",  # Application Insights resource ID
        "AOAI_CONNECTION_NAME": "<AI CONNECTION NAME>",  # AI connection name from AI Foundry
        "AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED": "true",  # Enable content tracing; environment variable values must be strings
    },
    endpoint_name=online_endpoint_name,
    instance_type="Standard_F2s_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(sk_deployment).result()

endpoint.traffic = {deployment_name: 100}
ml_client.begin_create_or_update(endpoint).result()
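
Environment variables reach the container as strings, so values such as booleans or port numbers are best converted before they are passed to `environment_variables`. The sketch below shows one way to do that and to assemble the semicolon-separated project connection string used above. The helper names are hypothetical, and the lowercase `"true"`/`"false"` convention is an assumption about what the consuming code expects.

```python
def to_env_value(value):
    """Convert a Python value to the string form an environment variable can hold."""
    if isinstance(value, bool):
        # Check bool first: str(True) would give "True", not "true".
        return "true" if value else "false"
    return str(value)


def build_environment_variables(settings):
    """Return a copy of `settings` with every value converted to a string."""
    return {key: to_env_value(value) for key, value in settings.items()}


def project_connection_string(host, subscription_id, resource_group, project_name):
    """Join the four parts in the '<host>;<subscription>;<resource group>;<project>' layout."""
    return ";".join([host, subscription_id, resource_group, project_name])


# Hypothetical values, for illustration only.
env = build_environment_variables(
    {
        "AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED": True,
        "PORT": 5001,
        "AIPROJECT_CONNECTION_STRING": project_connection_string(
            "eastus.api.azureml.ms", "sub-id", "my-rg", "my-project"
        ),
    }
)
```

The resulting dictionary can be passed directly as `environment_variables=env` when constructing the deployment.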

In [None]:
print(endpoint.identity.principal_id)   # Get the principal ID of the endpoint to grant RBAC access to services like Key Vault, workspace storage, App insights etc.

# 5. Test
Now that the endpoint has been deployed, let's test it.

In [60]:
import requests, json
from urllib.parse import urlsplit

url_parts = urlsplit(endpoint.scoring_uri)
url = url_parts.scheme + "://" + url_parts.netloc

token = ml_client.online_endpoints.get_keys(name=online_endpoint_name).primary_key
headers = {"Authorization": "Bearer " + token, "Content-Type": "application/json"}
payload = json.dumps(
    {
        # Pass the query to the endpoint and enable telemetry to Application Insights
        "query": "Items to carry for Everest expedition",
        "enable-telemetry": True,
    }
)

# Call the endpoint to get the LLM response
response = requests.post(f"{url}/chat/products", headers=headers, data=payload)
print("GenAI Response:\n", response)

GenAI Response:
 <Response [200]>
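
Both test cells derive the base URL from the scoring URI and build the same headers. That logic can be factored into a small helper, sketched below with the standard library only. The function name `build_request` and the example host name are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def build_request(scoring_uri, key, route):
    """Return (url, headers) for calling `route` on the endpoint's host."""
    parts = urlsplit(scoring_uri)
    # Keep only scheme and host; drop the scoring path, query, and fragment.
    base = urlunsplit((parts.scheme, parts.netloc, "", "", ""))
    url = f"{base}/{route.lstrip('/')}"
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    return url, headers


# Hypothetical scoring URI and key, for illustration only.
example_url, example_headers = build_request(
    "https://my-endpoint.eastus.inference.ml.azure.com/score",
    "abc",
    "/chat/products",
)
```

In the notebook, the call would look like `url, headers = build_request(endpoint.scoring_uri, token, "chat/products")`, followed by the same `requests.post` call as above.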


In [61]:
import requests, json
from urllib.parse import urlsplit

url_parts = urlsplit(endpoint.scoring_uri)
url = url_parts.scheme + "://" + url_parts.netloc

token = ml_client.online_endpoints.get_keys(name=online_endpoint_name).primary_key
headers = {"Authorization": "Bearer " + token, "Content-Type": "application/json"}

# Generate an evaluation for the prior LLM conversation
response = requests.get(f"{url}/chat/evaluations", headers=headers)
print("Evaluation Response:\n", response)

Evaluation Response:
 <Response [200]>
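
Printing the response object only shows the HTTP status, not the completion or the evaluation scores. The sketch below formats a status code and body for display, pretty-printing the body when it is JSON. The helper name `describe_response` is hypothetical, and no assumption is made about the fields the service returns.

```python
import json


def describe_response(status_code, text):
    """Return a readable summary of an HTTP response body."""
    if not 200 <= status_code < 300:
        # Truncate error bodies so a long HTML error page does not flood the output.
        return f"HTTP {status_code}: {text[:200]}"
    try:
        return json.dumps(json.loads(text), indent=2)
    except ValueError:
        # Body is not JSON; return it unchanged.
        return text


summary = describe_response(200, '{"a": 1}')
```

With the `requests` response from the cells above, this would be called as `print(describe_response(response.status_code, response.text))`.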


# 6. Clean up resources

### 6.1 Delete the endpoint

In [None]:
ml_client.online_endpoints.begin_delete(name=online_endpoint_name)

### 6.2 Delete the ACR Image

In [None]:
!az acr repository delete --name {ACR_NAME} --image {ACR_IMAGE_NAME}:latest --yes