## Deploying a fine-tuned model with Model-As-Service Serverless

This notebook shows how to deploy a fine-tuned model serverless on Azure AI Model-As-Service.

**Note**: The notebook waits for the fine-tuned model to become available, so it is safe to run it before the fine-tuning job has completed.

#### Model
We will use the model fine-tuned in the previous [2_finetune.ipynb](./2_finetune.ipynb) notebook.

#### Pre-requisites
As in the [1_gen.ipynb](./1_gen.ipynb) notebook, you need to subscribe to the Marketplace offering. This should already be done, but here is the [documentation](https://aka.ms/raft-llama-31-learn-deploy-405b) in case you worked around it in the previous notebook.

The requirements should have been installed automatically if you opened the project in a Dev Container or Codespaces. If not, uncomment the following cell to install them.


## Overview
![](./doc/raft-process-deploy.png)

In [1]:
#%pip install azure-ai-ml

In [None]:
from azure.ai.ml import MLClient
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
)
from azure.ai.ml.entities import MarketplaceSubscription, ServerlessEndpoint

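try:
    # Prefer the default credential chain and check that it can actually get a token
    credential = DefaultAzureCredential()
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to an interactive browser login when no default credential works
    credential = InteractiveBrowserCredential()

try:
    client = MLClient.from_config(credential=credential)
except Exception:
    print("Please create a workspace configuration file in the current directory.")
    # Re-raise: without a client, the rest of the notebook cannot run
    raise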

# Get AzureML workspace object.
workspace = client.workspaces.get(client.workspace_name)
workspace_id = workspace._workspace_id

Let's read the name of the fine-tuned model from the shared state environment.

In [3]:
import os
from dotenv import load_dotenv

# Variables passed by previous notebooks
load_dotenv(".env.state")

FINETUNED_MODEL_NAME = os.getenv("FINETUNED_MODEL_NAME")
FINETUNED_MODEL_FORMAT = os.getenv("FINETUNED_MODEL_FORMAT")
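
If the previous notebook has not run yet, `os.getenv` silently returns `None` and the failure only shows up later. A minimal sketch of a fail-fast check, assuming a hypothetical helper named `require_env` (it is not part of this project's `utils` module):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Hypothetical wording; adjust the hint to your own workflow
        raise RuntimeError(f"{name} is not set; check that .env.state was written by the previous notebook")
    return value
```

Calling `require_env("FINETUNED_MODEL_NAME")` right after `load_dotenv` would surface a missing value immediately.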

The fine-tuning job might still be training, so let's wait until the model is ready.

In [None]:
from utils import wait_for_model

print(f"Waiting for fine tuned model {FINETUNED_MODEL_NAME} to complete training...")
model = wait_for_model(client, FINETUNED_MODEL_NAME)
print(f"Model {FINETUNED_MODEL_NAME} is ready")
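
The implementation of `wait_for_model` lives in the project's `utils` module and is not shown here. As a rough illustration of the general idea only, a polling loop could look like the sketch below. The name `poll_until_ready`, its parameters, and the convention that `fetch` returns `None` while the resource is not ready are all assumptions made for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_ready(
    fetch: Callable[[], Optional[T]],
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` repeatedly until it returns a non-None value or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Resource not ready after {timeout_s} seconds")
        # Wait before the next attempt to avoid hammering the service
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.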

Let's subscribe to the model. This requires having accepted the provider's Marketplace terms at least once in the Model Catalog UI beforehand.

In [None]:
base_model_id = model.properties["baseModelId"]
model_id = model.id
subscription_name = base_model_id.split("/")[-1].replace(".", "-").replace("_", "-")
print(f"Subscribing to {subscription_name} for model ID {base_model_id}")
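
To make the name derivation concrete, here is the same transformation applied to a made-up base model ID. The ID string is purely illustrative and the function name `to_subscription_name` is hypothetical; the real value comes from `model.properties["baseModelId"]`.

```python
def to_subscription_name(base_model_id: str) -> str:
    """Take the last path segment and replace characters that the code above swaps for hyphens."""
    return base_model_id.split("/")[-1].replace(".", "-").replace("_", "-")


# Illustrative ID only, not read from a real workspace
example_id = "azureml://registries/example-registry/models/Example_Model-3.1"
print(to_subscription_name(example_id))  # Example-Model-3-1
```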

The Asset ID required to deploy the model is not currently exposed through the Python SDK so we're constructing it using the information we have on hand.

**Note**: Because we are constructing the Asset ID (a blob storage path) indirectly, a backend change could break this code. If that happens, look up the Asset ID field on the fine-tuned model's catalog page to find the new expected form and adjust the template below.

In [None]:
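# Keep the asset-specific tail of the model's resource ID (everything from the 10th path segment on).
# Built outside the f-string because nested double quotes are a syntax error before Python 3.12.
model_asset_path = "/".join(model.id.split("/")[9:])
# Note: the location is hardcoded to westus3
model_asset_id = f"azureml://locations/westus3/workspaces/{workspace_id}/{model_asset_path}"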
print(f"Deploying model asset id {model_asset_id}")
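
The slicing is easier to follow on a concrete value. The sketch below assumes `model.id` follows the usual Azure resource ID layout (`/subscriptions/.../workspaces/<name>/models/<name>/versions/<version>`); the IDs and the function name `build_model_asset_id` are placeholders invented for this example.

```python
def build_model_asset_id(model_arm_id: str, workspace_id: str, location: str = "westus3") -> str:
    """Rebuild the asset ID from the tail of a model resource ID, as the cell above does."""
    # Segments 0-8 cover the leading empty string, subscription, resource group,
    # provider and workspace; segment 9 onward is the asset-specific part.
    asset_path = "/".join(model_arm_id.split("/")[9:])
    return f"azureml://locations/{location}/workspaces/{workspace_id}/{asset_path}"


# Placeholder IDs for illustration only
arm_id = (
    "/subscriptions/0000/resourceGroups/my-rg/providers/"
    "Microsoft.MachineLearningServices/workspaces/my-ws/models/my-model/versions/1"
)
print(build_model_asset_id(arm_id, "ws-guid"))
# azureml://locations/westus3/workspaces/ws-guid/models/my-model/versions/1
```

If the resource ID layout differs in your environment, the index `9` is the value to revisit.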

In [None]:
from azure.core.exceptions import ResourceExistsError
marketplace_subscription = MarketplaceSubscription(
    model_id=base_model_id,
    name=subscription_name,
)

try:
    marketplace_subscription = client.marketplace_subscriptions.begin_create_or_update(marketplace_subscription).result()
except ResourceExistsError as ex:
    print(f"Marketplace subscription {subscription_name} already exists for model {base_model_id}")

Deploy the model as a serverless endpoint

In [None]:
# The endpoint name is deterministic based only on the model name which is assumed to contain a hash of the dataset 
# because if the finetuned model for that specific dataset is already deployed, we don't want to deploy it again
endpoint_name = f"{model.name}".replace(".", "-").replace("_", "-")[:64]
print(f"Deploying model {model.name} as endpoint {endpoint_name}")

In [None]:
from azure.core.exceptions import ResourceNotFoundError

try:
    # Reuse the endpoint if this model was already deployed
    serverless_endpoint = client.serverless_endpoints.get(endpoint_name)
    print(f"Found existing endpoint {endpoint_name}")
except ResourceNotFoundError:
    # Deploy using the constructed asset ID and block until provisioning finishes
    serverless_endpoint = ServerlessEndpoint(name=endpoint_name, model_id=model_asset_id)
    print("Waiting for deployment to complete...")
    serverless_endpoint = client.serverless_endpoints.begin_create_or_update(serverless_endpoint).result()
    print("Deployment complete")

Let's extract the endpoint URL, name, and keys and store them in the shared state so the next notebook can use them.

In [None]:
endpoint = client.serverless_endpoints.get(endpoint_name)
endpoint_keys = client.serverless_endpoints.get_keys(endpoint_name)

# Update the shared `.env.state` env file with the newly deployed finetuned model endpoint
from utils import update_state

update_state("FINETUNED_OPENAI_BASE_URL", endpoint.scoring_uri)
update_state("FINETUNED_OPENAI_API_KEY", endpoint_keys.primary_key)
update_state("FINETUNED_OPENAI_DEPLOYMENT", endpoint.name)
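
`update_state` comes from the project's `utils` module and its implementation is not shown here. As an illustration of what an update-or-append on a `KEY=value` file involves, here is a standard-library sketch. The function name `upsert_env_var` is hypothetical, and the sketch ignores quoting and comments, which a real `.env` writer would need to handle.

```python
from pathlib import Path


def upsert_env_var(path: str, key: str, value: str) -> None:
    """Replace the line for `key` in a KEY=value file, or append it if missing."""
    file = Path(path)
    lines = file.read_text().splitlines() if file.exists() else []
    entry = f"{key}={value}"
    for i, line in enumerate(lines):
        if line.startswith(f"{key}="):
            lines[i] = entry  # overwrite the existing value
            break
    else:
        lines.append(entry)  # key not present yet
    file.write_text("\n".join(lines) + "\n")
```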

Test that the finetuned model is deployed and available

In [None]:
import requests

print(f"Testing deployed {FINETUNED_MODEL_FORMAT} model at {endpoint.scoring_uri}")
url = f"{endpoint.scoring_uri}/v1/completions" if FINETUNED_MODEL_FORMAT == "completion" else f"{endpoint.scoring_uri}/v1/chat/completions"

prompt = "What do you know?"
payload = {"max_tokens": 1024, "prompt": [prompt]} if FINETUNED_MODEL_FORMAT == "completion" else {
    "messages":[ { "role":"user","content":prompt } ],
    "max_tokens":1024
}
headers = {"Content-Type": "application/json", "Authorization": endpoint_keys.primary_key}

response = requests.post(url, json=payload, headers=headers)

response.json()
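
The two conditional expressions above pick both the route and the payload shape from the model format. A small sketch that groups the two decisions in one place, assuming a hypothetical helper named `build_request` (unlike the cell above, it also strips a trailing slash from the base URL):

```python
from typing import Any, Dict, Tuple


def build_request(base_url: str, model_format: str, prompt: str, max_tokens: int = 1024) -> Tuple[str, Dict[str, Any]]:
    """Return the (url, payload) pair for a completion or chat model."""
    base = base_url.rstrip("/")
    if model_format == "completion":
        # Completion models take a list of raw prompts
        return f"{base}/v1/completions", {"prompt": [prompt], "max_tokens": max_tokens}
    # Anything else is treated as a chat model, which takes a list of role-tagged messages
    payload = {"messages": [{"role": "user", "content": prompt}], "max_tokens": max_tokens}
    return f"{base}/v1/chat/completions", payload
```

With this helper the test cell would reduce to `url, payload = build_request(endpoint.scoring_uri, FINETUNED_MODEL_FORMAT, prompt)` followed by the same `requests.post` call.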

## Next step -> Evaluation

Open [4_eval.ipynb](./4_eval.ipynb) to start evaluating the deployed student model.