## Deploying an embedding model with Model-As-Service Serverless

This notebook shows how to deploy an embedding model as a serverless endpoint on Azure AI Model-As-Service using the Python SDK.

You can also bring your own embedding model or deploy it manually using either [Azure ML Studio](https://aka.ms/raft-llama-31-learn-deploy-405b) or [Azure AI Studio](https://aka.ms/raft-llama-31-learn-deploy-405b-ai-studio).

**Note**: An Azure ML Workspace is the same as an Azure AI Hub, so you can move back and forth between the two transparently.

If you choose to bring your own embedding model or deploy it manually, set the following environment variables in `.env` and this notebook will skip deployment. You can also skip this notebook entirely, but note that its last cell checks that the endpoint is up and running.

```
EMBEDDING_AZURE_OPENAI_ENDPOINT=<BASE_URL>
EMBEDDING_AZURE_OPENAI_API_KEY=<API_KEY>
EMBEDDING_AZURE_OPENAI_DEPLOYMENT=<DEPLOYMENT>
```
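
Before relying on a manually provided endpoint, it can help to confirm that all three variables are actually set. The snippet below is a minimal sketch of such a check; `missing_embedding_vars` is a hypothetical helper and not part of this notebook, which itself only tests `EMBEDDING_AZURE_OPENAI_ENDPOINT` to decide whether to skip deployment.

```python
import os

# Variable names taken from the `.env` snippet above.
REQUIRED_VARS = (
    "EMBEDDING_AZURE_OPENAI_ENDPOINT",
    "EMBEDDING_AZURE_OPENAI_API_KEY",
    "EMBEDDING_AZURE_OPENAI_DEPLOYMENT",
)


def missing_embedding_vars(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_embedding_vars()` after `load_dotenv(".env")` returns an empty list when the configuration is complete.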

#### Model
We will use `text-embedding-ada-002` as the embedding model to generate the RAFT synthetic dataset.

#### Pre-requisites
- Authenticate to Azure using `az login --use-device-code`
- Existing workspace `config.json` file, either created by the previous `0_a_workspace.ipynb` notebook or brought over.

In [None]:
%pip install azure-ai-ml

In [None]:
from azure.ai.ml import MLClient
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
)
from azure.ai.ml.entities import ServerlessEndpoint

# Prefer the default credential chain; fall back to an interactive browser login.
try:
    credential = DefaultAzureCredential()
    credential.get_token("https://management.azure.com/.default")
except Exception:
    credential = InteractiveBrowserCredential()

# Load the workspace details from config.json; stop here if the file is missing.
try:
    client = MLClient.from_config(credential=credential)
except Exception:
    print("Please create a workspace configuration file in the current directory.")
    raise

# Get AzureML workspace object.
workspace = client._workspaces.get(client.workspace_name)
workspace_id = workspace._workspace_id

We will use `text-embedding-ada-002` from the `azure-openai` registry as the embedding model.

In [None]:
teacher_embedding_model_name: str = "text-embedding-ada-002"
embedding_registry_name: str = "azure-openai"

In [None]:
registry_ml_client = MLClient(credential, registry_name=embedding_registry_name)

print(f"Searching for model {teacher_embedding_model_name}")
model = registry_ml_client.models.get(teacher_embedding_model_name, label="latest")
print(f"Found model {teacher_embedding_model_name} in registry {embedding_registry_name}")

In [None]:
model_id = "/".join(model.id.split("/")[:-2])
model_id
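
The cell above drops the last two path segments of the asset ID, which removes the trailing `versions/<n>` part so the endpoint refers to the model rather than one pinned version. A minimal sketch of that string manipulation follows; the example ID and its version number are illustrative assumptions, not values returned by the registry.

```python
def strip_version(asset_id: str) -> str:
    """Drop the trailing '/versions/<n>' segments from an asset ID."""
    return "/".join(asset_id.split("/")[:-2])


# Illustrative ID only; the real value comes from `model.id`.
example_id = "azureml://registries/azure-openai/models/text-embedding-ada-002/versions/2"
print(strip_version(example_id))
# azureml://registries/azure-openai/models/text-embedding-ada-002
```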

Deploy the model as a serverless endpoint

In [None]:
endpoint_name = f"{model.name}".replace(".", "-").replace("_", "-")[:64]
print(f"Deploying model {model.name} as endpoint {endpoint_name}")
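
The endpoint name is derived from the model name by replacing dots and underscores with hyphens and truncating to 64 characters. The sketch below wraps the same expression in a hypothetical function, `to_endpoint_name`, to show its effect on a couple of inputs; the second model name is made up for illustration.

```python
def to_endpoint_name(model_name: str, max_len: int = 64) -> str:
    """Replace '.' and '_' with '-' and truncate to `max_len` characters."""
    return model_name.replace(".", "-").replace("_", "-")[:max_len]


print(to_endpoint_name("text-embedding-ada-002"))  # text-embedding-ada-002
print(to_endpoint_name("my_model.v1"))             # my-model-v1
```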

In [None]:
import os
from dotenv import load_dotenv

load_dotenv(".env")

endpoint_base_url = os.getenv("EMBEDDING_AZURE_OPENAI_ENDPOINT")
endpoint_api_key = os.getenv("EMBEDDING_AZURE_OPENAI_API_KEY")
endpoint_deployment_name = os.getenv("EMBEDDING_AZURE_OPENAI_DEPLOYMENT")
endpoint_api_version = os.getenv("EMBEDDING_OPENAI_API_VERSION")

In [None]:
if endpoint_base_url:
    print(f"Skipping endpoint deployment because an existing embedding endpoint was provided: {endpoint_base_url}")
else:
    from azure.core.exceptions import ResourceNotFoundError

    try:
        serverless_endpoint = client.serverless_endpoints.get(endpoint_name)
        print(f"Found existing endpoint {endpoint_name}")
    except ResourceNotFoundError:
        # The endpoint does not exist yet, so create it once and wait for completion.
        print("Waiting for deployment to complete...")
        serverless_endpoint = client.serverless_endpoints.begin_create_or_update(
            ServerlessEndpoint(name=endpoint_name, model_id=model_id)
        ).result()
        print("Deployment complete")


Let's extract the endpoint URL, name, and keys and store them in the shared state so the next notebook can use them.

In [None]:
if not endpoint_base_url:

    endpoint = client.serverless_endpoints.get(endpoint_name)
    endpoint_keys = client.serverless_endpoints.get_keys(endpoint_name)

    # Update the shared `.env.state` env file with the newly deployed embedding model endpoint
    from utils import update_state

    endpoint_base_url = endpoint.scoring_uri
    endpoint_api_key = endpoint_keys.primary_key
    endpoint_deployment_name = endpoint.name

    update_state("EMBEDDING_AZURE_OPENAI_ENDPOINT", endpoint_base_url)
    update_state("EMBEDDING_AZURE_OPENAI_API_KEY", endpoint_api_key)
    update_state("EMBEDDING_AZURE_OPENAI_DEPLOYMENT", endpoint_deployment_name)

Test that the embedding model is deployed and available

In [None]:
from openai import AzureOpenAI

oai_client = AzureOpenAI(
    api_key=endpoint_api_key,
    api_version=endpoint_api_version,
    azure_endpoint=endpoint_base_url,
)

# Embed a short test string; the notebook displays the resulting vector.
oai_client.embeddings.create(input=["Hello"], model=endpoint_deployment_name).data[0].embedding
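
The test cell prints the raw vector, which is hard to verify by eye. As a rough sanity check, the sketch below validates the vector's length and contents. It assumes the 1536-dimensional output of `text-embedding-ada-002`; `check_embedding` is a hypothetical helper, and the expected dimension would need adjusting for a different model.

```python
import math

# Assumption: text-embedding-ada-002 returns 1536-dimensional vectors.
EXPECTED_DIM = 1536


def check_embedding(vector, expected_dim=EXPECTED_DIM):
    """Return True if `vector` has the expected length and only finite floats."""
    return len(vector) == expected_dim and all(
        isinstance(x, float) and math.isfinite(x) for x in vector
    )
```

Passing the vector returned by `oai_client.embeddings.create(...)` to `check_embedding` should yield `True` when the endpoint is healthy.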