# Score OpenAI models in batch using Batch Endpoints

In this notebook you will learn how to create an MLflow model pointing to an OpenAI model to perform batch computation of embeddings.

## 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

### 1.1. Import the required libraries

In [None]:
from azure.ai.ml import MLClient, Input
from azure.ai.ml.entities import (
    BatchEndpoint,
    ModelBatchDeployment,
    ModelBatchDeploymentSettings,
    Model,
    AmlCompute,
    BatchRetrySettings,
    CodeConfiguration,
    Environment,
)
from azure.ai.ml.constants import AssetTypes, BatchDeploymentOutputAction
from azure.identity import DefaultAzureCredential

### 1.2. Configure workspace details and get a handle to the workspace

To connect to a workspace, we need identifier parameters: a subscription, a resource group, and a workspace name. We will use these details in the `MLClient` from `azure.ai.ml` to get a handle to the required Azure Machine Learning workspace. We use [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for this tutorial. Check the [configuration notebook](../../jobs/configuration.ipynb) for more details on how to configure credentials and connect to a workspace.

In [None]:
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace = "<AML_WORKSPACE_NAME>"

In [None]:
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)
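
A common mistake at this point is to leave one of the template values (such as `<SUBSCRIPTION_ID>`) unreplaced, which only fails later when the client makes its first call. As a minimal sketch, assuming placeholders always have the `<UPPER_CASE>` form used above, a small check could look like this. The helper name `find_unset_values` is hypothetical and not part of the Azure SDK.

```python
import re

# Matches template placeholders such as "<SUBSCRIPTION_ID>".
_PLACEHOLDER = re.compile(r"^<[A-Z_]+>$")


def find_unset_values(**settings: str) -> list:
    """Return the names of settings that are empty or still hold a placeholder."""
    return [
        name
        for name, value in settings.items()
        if not value or _PLACEHOLDER.match(value)
    ]


# Hypothetical values, for illustration only.
missing = find_unset_values(
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group="my-resource-group",
    workspace="<AML_WORKSPACE_NAME>",
)
print(missing)
```

In the notebook, the same helper could be called with the `subscription_id`, `resource_group`, and `workspace` variables before constructing the `MLClient`.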

If you are working in an Azure Machine Learning compute, you can simply load the workspace details from its configuration:

In [None]:
ml_client = MLClient.from_config(DefaultAzureCredential())

## 2. Registering the model

### 2.1 About the model

We are going to compute embeddings from text using the OpenAI model `text-embedding-ada-002`. To create a batch deployment in a reusable way, we are going to create an MLflow model associated with this OpenAI model. The flavor of this model is OpenAI.

### 2.2 Registering the model in the workspace

In [None]:
model_name = "text-embedding-ada-002"
model_local_path = "model"

model = ml_client.models.create_or_update(
    Model(name=model_name, path=model_local_path, type=AssetTypes.MLFLOW_MODEL)
)

Let's get the model:

In [None]:
model = ml_client.models.get(name=model_name, label="latest")

## 3. Create Batch Endpoint

Batch endpoints are endpoints used for batch inferencing on large volumes of data over a period of time. Batch endpoints receive pointers to data and run jobs asynchronously to process the data in parallel on compute clusters. Batch endpoints store outputs to a data store for further analysis.
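
To make the idea of parallel processing more concrete, the sketch below shows how a list of input files could be split into mini-batches, each of which is handed to one call of the scoring code. This is only an illustration of the concept behind the `mini_batch_size` setting used later in this notebook, not the service's actual scheduling logic. The file names and the helper `make_mini_batches` are hypothetical.

```python
from typing import Iterator, List, Sequence


def make_mini_batches(
    files: Sequence[str], mini_batch_size: int
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `mini_batch_size` files."""
    if mini_batch_size < 1:
        raise ValueError("mini_batch_size must be at least 1")
    for start in range(0, len(files), mini_batch_size):
        yield list(files[start : start + mini_batch_size])


# Hypothetical input files, for illustration only.
files = [f"data/part-{i}.jsonl" for i in range(5)]
batches = list(make_mini_batches(files, mini_batch_size=2))

for index, batch in enumerate(batches):
    print(index, batch)
```

With a mini-batch size of 1, as configured later in this notebook, every file becomes its own mini-batch.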

### 3.1 Configure the endpoint

First, let's create the endpoint that is going to host the batch deployments. To ensure that our endpoint name is unique, let's create a random suffix to append to it. 

> In general, you won't need this technique because you will use more meaningful names. Skip the cell that appends the random suffix if that is your case:

In [None]:
endpoint_name = "text-embedding-ada"

In [None]:
import random
import string

# Creating a unique endpoint name by including a random suffix
allowed_chars = string.ascii_lowercase + string.digits
endpoint_suffix = "".join(random.choice(allowed_chars) for x in range(5))
endpoint_name = f"{endpoint_name}-{endpoint_suffix}"

print(f"Endpoint name: {endpoint_name}")

Let's configure the endpoint:

In [None]:
endpoint = BatchEndpoint(
    name=endpoint_name,
    description="An endpoint to generate embeddings in batch for the ADA-002 model from OpenAI",
)

### 3.2 Create the endpoint
Using the `MLClient` created earlier, we will now create the endpoint in the workspace. `begin_create_or_update` starts the endpoint creation and returns a poller; calling `.result()` on it blocks until the operation finishes.

In [None]:
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

## 4. Create a batch deployment

A deployment is a set of resources required for hosting the model that does the actual inferencing. We will create a deployment for our endpoint using the `ModelBatchDeployment` class.

### 4.1 Creating a scoring script to work with the model

MLflow models don't require a scoring script. However, we are going to provide one to allow the batch endpoint to:

- Read any type of input data including JSON, JSONL, text files, etc.
- Generate outputs in JSON format, which is very appealing for embeddings.

The scoring script looks as follows:

In [None]:
!cat code/batch_driver.py
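
For readers who don't have the file at hand, the following is a minimal sketch of the general shape a batch scoring script takes: an `init()` function that runs once per worker and a `run(mini_batch)` function that receives a list of file paths. It is not the content of `code/batch_driver.py`. The `embed` function is a hypothetical stand-in that returns toy vectors instead of calling OpenAI, and the `text` field in the JSONL records is an assumed schema.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def embed(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for the model call; returns toy 2-d vectors."""
    return [[float(len(t)), float(len(t.split()))] for t in texts]


def read_texts(path: Path) -> List[str]:
    """Read one input file: JSON records for .jsonl, otherwise one text per line."""
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    if path.suffix == ".jsonl":
        # Assumes each record has a "text" field (hypothetical schema).
        return [json.loads(ln)["text"] for ln in lines]
    return lines


def init() -> None:
    """Called once per worker; a real script would load the model here."""


def run(mini_batch: List[str]) -> List[Dict]:
    """Called once per mini-batch with a list of file paths."""
    results = []
    for file_path in mini_batch:
        texts = read_texts(Path(file_path))
        for text, vector in zip(texts, embed(texts)):
            results.append(
                {"file": os.path.basename(file_path), "text": text, "embedding": vector}
            )
    return results


# Local smoke test with a temporary JSONL file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.jsonl"
    sample.write_text('{"text": "hello world"}\n{"text": "batch"}\n', encoding="utf-8")
    init()
    rows = run([str(sample)])

print(json.dumps(rows[0]))
```

Running the functions locally like this, before deploying, is a cheap way to catch parsing errors in the input handling.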

### 4.2 Creating the compute

Batch deployments can run on any Azure Machine Learning compute that already exists in the workspace. That means that multiple batch deployments can share the same compute infrastructure. In this example, we are going to work on an Azure Machine Learning compute cluster called `batch-cluster-lp`. Let's verify that the compute exists in the workspace and create it otherwise.

In [None]:
compute_name = "batch-cluster-lp"

# Create the cluster only if no compute with this name exists in the workspace.
if not any(c.name == compute_name for c in ml_client.compute.list()):
    compute_cluster = AmlCompute(
        name=compute_name, description="amlcompute", min_instances=0, max_instances=5
    )
    ml_client.begin_create_or_update(compute_cluster).result()

### 4.3 Creating the environment

Our scoring script requires additional packages like `datasets`. We are creating an environment for it:

In [None]:
environment = Environment(
    name="batch-openai-mlflow",
    conda_file="environment/conda.yaml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)

### 4.4 Configuring the deployment

We create the associated deployment. Take a look at how the `environment_variables` section is created.

In [None]:
openai_api_base = "https://<deployment>.openai.azure.com/"

> Replace `https://<deployment>.openai.azure.com/` with the URL of the deployment in Azure OpenAI hosting the model.

In [None]:
deployment = ModelBatchDeployment(
    name="default",
    description="The default deployment for generating embeddings",
    endpoint_name=endpoint.name,
    model=model,
    environment=environment,
    code_configuration=CodeConfiguration(code="code", scoring_script="batch_driver.py"),
    compute=compute_name,
    settings=ModelBatchDeploymentSettings(
        instance_count=1,
        max_concurrency_per_instance=1,
        mini_batch_size=1,
        output_action=BatchDeploymentOutputAction.SUMMARY_ONLY,
        retry_settings=BatchRetrySettings(max_retries=1, timeout=9999),
        logging_level="info",
        error_threshold=-1,
        environment_variables={
            "OPENAI_API_TYPE": "azure_ad",
            "OPENAI_API_VERSION": "2023-03-15-preview",
            "OPENAI_API_BASE": openai_api_base,
        },
    ),
)
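
The values in `environment_variables` are exposed to the scoring code as ordinary environment variables. As a minimal sketch of how a script could read and validate them at startup, assuming the three variable names configured above, consider the following. The helper `read_openai_settings` is hypothetical, and the URL in the example is a placeholder.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = ("OPENAI_API_TYPE", "OPENAI_API_VERSION", "OPENAI_API_BASE")


def read_openai_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the required settings, failing early if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}


# Illustrative values mirroring the deployment above; the URL is a placeholder.
example_env = {
    "OPENAI_API_TYPE": "azure_ad",
    "OPENAI_API_VERSION": "2023-03-15-preview",
    "OPENAI_API_BASE": "https://example.openai.azure.com/",
}
settings = read_openai_settings(example_env)
print(settings["OPENAI_API_TYPE"])
```

Failing early with a clear message tends to be easier to diagnose in job logs than an authentication error raised deep inside a client library.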

### 4.5 Create the deployment
Using the `MLClient` created earlier, we will now create the deployment in the workspace. `begin_create_or_update` starts the deployment creation and returns a poller; calling `.result()` on it blocks until the operation finishes.

In [None]:
ml_client.batch_deployments.begin_create_or_update(deployment).result()

Once created, let's configure this new deployment as the default one:

In [None]:
endpoint = ml_client.batch_endpoints.get(endpoint.name)
endpoint.defaults.deployment_name = deployment.name
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

In [None]:
print(f"The default deployment is {endpoint.defaults.deployment_name}")

### 4.6 Testing the deployment

Once the deployment is created, it is ready to receive jobs.

#### 4.6.1 Creating an input for the deployment

In [None]:
input = Input(type=AssetTypes.URI_FOLDER, path="data")

#### 4.6.2 Invoke the deployment

Using the `MLClient` created earlier, we can invoke the endpoint with the `invoke` method, which takes the following parameters:
- `endpoint_name` - Name of the endpoint
- `input` - The `Input` pointing to where the input data is located

In [None]:
job = ml_client.batch_endpoints.invoke(endpoint_name=endpoint.name, input=input)

Notice how we are not indicating the deployment name in the invoke operation. That's because the endpoint automatically routes the job to the default deployment. Since our endpoint has only one deployment, that one is the default. You can target a specific deployment by passing the `deployment_name` parameter.

#### 4.6.3 Get the details of the invoked job

Let's get the details of the invoked job:

In [None]:
ml_client.jobs.get(job.name)

We can wait for the job to finish using the following code:

In [None]:
ml_client.jobs.stream(job.name)
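
If you prefer not to stream logs, an alternative is to poll the job status until it reaches a terminal state. The sketch below shows that pattern with a generic `get_status` callable so it can run without a workspace. The set of terminal status names is an assumption about the values the service reports, and `wait_for_job` is a hypothetical helper rather than an SDK function.

```python
import time
from typing import Callable

# Assumed terminal status values; adjust if your jobs report different ones.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll `get_status` until a terminal state is reached or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still in state {status!r} after timeout")
        time.sleep(poll_seconds)


# Simulated sequence of statuses, for illustration only.
fake_statuses = iter(["Queued", "Running", "Completed"])
final_status = wait_for_job(lambda: next(fake_statuses), poll_seconds=0.0)
print(final_status)
```

In this notebook, `get_status` could be `lambda: ml_client.jobs.get(job.name).status`.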

### 4.7 Exploring the results

The deployment creates a child job that executes the scoring. We can get the details of it using the following code:

In [None]:
scoring_job = list(ml_client.jobs.list(parent_job_name=job.name))[0]

In [None]:
print("Job name:", scoring_job.name)
print("Job status:", scoring_job.status)
print(
    "Job duration:",
    scoring_job.creation_context.last_modified_at
    - scoring_job.creation_context.created_at,
)

#### 4.7.1 Download the results

The outputs generated by the deployment job will be placed in an output named `score`:

In [None]:
ml_client.jobs.download(name=scoring_job.name, download_path=".", output_name="score")

The output predictions will look like the following:

In [None]:
import pandas as pd

embeddings = pd.read_json("named-outputs/score/embeddings.jsonl", lines=True)
embeddings
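
For very large outputs, loading the whole file into a DataFrame may not be practical. As a minimal sketch, a JSONL file can also be read one record at a time with the standard library. The field names `text` and `embedding` in the sample below are hypothetical and are not guaranteed to match the schema the deployment produces.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterator


def iter_jsonl(path: Path) -> Iterator[Dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


# Write a tiny sample file so the example runs without the job output.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "embeddings.jsonl"
    sample.write_text(
        '{"text": "first", "embedding": [0.1, 0.2]}\n'
        '{"text": "second", "embedding": [0.3, 0.4]}\n',
        encoding="utf-8",
    )
    records = list(iter_jsonl(sample))

print(len(records), sorted(records[0]))
```

To inspect the real output, point `iter_jsonl` at `named-outputs/score/embeddings.jsonl` and print the keys of the first record.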

## 5. Clean up resources

Clean up the resources created.

In [None]:
ml_client.batch_endpoints.begin_delete(endpoint_name).result()