## Deploying a fine-tuned model with Model-As-Service Serverless

This notebook shows how to deploy a fine-tuned model as a serverless endpoint on Azure AI Model-As-Service.

**Note**: It waits for the fine-tuned model to be available, so it is safe to run it before the fine-tuning job has completed.

#### Model
We will use the model fine-tuned in the previous [2_finetune.ipynb](./2_finetune.ipynb) notebook.

#### Pre-requisites
- As in the [1_gen.ipynb](./1_gen.ipynb) notebook, you need to subscribe to the Marketplace offering. This should already be done, but here is the [documentation](https://aka.ms/raft-llama-31-learn-deploy-405b) in case you worked around this step in the previous notebook.


In [None]:
%pip install azure-ai-ml

In [None]:
from azure.ai.ml import MLClient
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
)
from azure.ai.ml.entities import MarketplaceSubscription, ServerlessEndpoint

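try:
    credential = DefaultAzureCredential()
    # Check that the default credential can actually obtain a token.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to an interactive browser login.
    credential = InteractiveBrowserCredential()

try:
    client = MLClient.from_config(credential=credential)
except Exception:
    print("Please create a workspace configuration file in the current directory.")
    # Re-raise: without a client, the cells below cannot run.
    raise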

# Get AzureML workspace object.
workspace = client._workspaces.get(client.workspace_name)
workspace_id = workspace._workspace_id

We will use Meta Llama 3.1 405B Instruct as the teacher model to generate the dataset.

In [None]:
teacher_model_name: str = "Meta-Llama-3.1-405B-Instruct"
registry_name: str = "azureml-meta"

In [None]:
registry_ml_client = MLClient(credential, registry_name=registry_name)

print(f"Searching for model {teacher_model_name}")
model = registry_ml_client.models.get(teacher_model_name, label="latest")
print(f"Found model {teacher_model_name} in registry {registry_name}")

Let's subscribe to the model. This requires having accepted the provider's Marketplace terms at least once in the Model Catalog UI beforehand.

In [None]:
subscription_name = teacher_model_name.replace(".", "-").replace("_", "-")
model_id = "/".join(model.id.split("/")[:-2])
print(f"Subscribing to {subscription_name} for model ID {model_id}")
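
The split-and-join in the cell above can be tried in isolation. The following is a minimal sketch: `strip_version` is a hypothetical helper name, and the asset ID is a made-up example that assumes the ID ends in a `versions/<n>` suffix. The real value comes from `model.id`.

```python
def strip_version(asset_id: str) -> str:
    """Drop the last two '/'-separated segments, as the cell above does."""
    return "/".join(asset_id.split("/")[:-2])


# Hypothetical asset ID, for illustration only.
example_id = (
    "azureml://registries/azureml-meta/models/"
    "Meta-Llama-3.1-405B-Instruct/versions/1"
)
print(strip_version(example_id))
```

If the ID does not end in a two-segment version suffix, this would silently drop the wrong segments, so it is worth printing the result (as the cell above does) before subscribing.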

In [None]:
from azure.core.exceptions import ResourceExistsError
marketplace_subscription = MarketplaceSubscription(
    model_id=model_id,
    name=subscription_name,
)

try:
    marketplace_subscription = client.marketplace_subscriptions.begin_create_or_update(marketplace_subscription).result()
except ResourceExistsError as ex:
    print(f"Marketplace subscription {subscription_name} already exists for model {model_id}")

Deploy the model as a serverless endpoint.

In [None]:
endpoint_name = f"{model.name}-raft".replace(".", "-").replace("_", "-")[:64]
print(f"Deploying model {model.name} as endpoint {endpoint_name}")
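
The same replace-and-truncate pattern is used for both the subscription name and the endpoint name. A minimal sketch of it as a reusable function follows; `to_resource_name` is a hypothetical helper, not part of the Azure SDK, and the 64-character limit is simply the value used in the cell above.

```python
def to_resource_name(raw: str, max_len: int = 64) -> str:
    """Replace dots and underscores with hyphens, then truncate."""
    return raw.replace(".", "-").replace("_", "-")[:max_len]


name = to_resource_name("Meta-Llama-3.1-405B-Instruct-raft")
print(name)
```

Because the truncation is a plain slice, two long model names that share their first 64 characters would map to the same endpoint name.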

In [None]:
from azure.core.exceptions import ResourceNotFoundError

try:
    serverless_endpoint = client.serverless_endpoints.get(endpoint_name)
    print(f"Found existing endpoint {endpoint_name}")
except ResourceNotFoundError:
    # The endpoint does not exist yet: create it once and block until it is ready.
    print("Waiting for deployment to complete...")
    serverless_endpoint = ServerlessEndpoint(name=endpoint_name, model_id=model_id)
    serverless_endpoint = client.serverless_endpoints.begin_create_or_update(serverless_endpoint).result()
    print("Deployment complete")


Let's extract the endpoint URL, name, and keys, and store them in the shared state to pass on to the next notebook.

In [None]:
endpoint = client.serverless_endpoints.get(endpoint_name)
endpoint_keys = client.serverless_endpoints.get_keys(endpoint_name)

# Update the shared `.env.state` env file with the newly deployed finetuned model endpoint
from utils import update_state

update_state("COMPLETION_OPENAI_BASE_URL", endpoint.scoring_uri)
update_state("COMPLETION_OPENAI_API_KEY", endpoint_keys.primary_key)
update_state("COMPLETION_OPENAI_DEPLOYMENT", endpoint.name)
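
The `update_state` helper comes from the repository's `utils` module and is not shown in this notebook. As a rough illustration of the idea only, here is a sketch of how a key could be inserted or replaced in a dotenv-style file. `upsert_env_var` is a hypothetical function written for this example and is an assumption about the behavior, not the actual implementation.

```python
import tempfile
from pathlib import Path


def upsert_env_var(path: Path, key: str, value: str) -> None:
    """Set KEY=VALUE in a dotenv-style file, replacing an existing KEY line."""
    lines = path.read_text().splitlines() if path.exists() else []
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        if line.startswith(f"{key}="):
            lines[i] = new_line  # overwrite the existing entry
            break
    else:
        lines.append(new_line)  # key not present yet
    path.write_text("\n".join(lines) + "\n")


# Demonstrate on a temporary file rather than the real `.env.state`.
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / ".env.state"
    upsert_env_var(state_file, "COMPLETION_OPENAI_DEPLOYMENT", "first")
    upsert_env_var(state_file, "COMPLETION_OPENAI_DEPLOYMENT", "second")
    contents = state_file.read_text()
print(contents)
```

Writing the same key twice leaves a single line with the latest value, which is the property that makes re-running the notebook safe in this sketch.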

Test that the teacher model is deployed and available.

In [None]:
import requests

url = f"{endpoint.scoring_uri}/v1/chat/completions"

payload = {
    "messages":[ { "role":"user","content":"What do you know?" } ],
    "max_tokens":1024
}
headers = {"Content-Type": "application/json", "Authorization": endpoint_keys.primary_key}

response = requests.post(url, json=payload, headers=headers)

response.json()
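
To pull just the generated text out of the response, something like the following sketch could be used. It assumes the endpoint returns an OpenAI-style chat completion body with the text under `choices[0].message.content`, which the `/v1/chat/completions` path suggests but which is worth confirming against the actual output above. `extract_reply` and the sample body are hypothetical.

```python
def extract_reply(body: dict) -> str:
    """Return the first choice's message content from a chat completion body."""
    choices = body.get("choices") or []
    if not choices:
        # Error responses typically carry no choices; surface the raw body.
        raise ValueError(f"No choices in response: {body}")
    return choices[0]["message"]["content"]


# Hypothetical response body, for illustration only.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ]
}
reply = extract_reply(sample)
print(reply)
```

In the notebook, this would be called as `extract_reply(response.json())` after checking `response.status_code`.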