# How to create an Azure AI Content Safety enabled Llama batch endpoint (Preview)
This notebook walks you through the steps to create an __Azure AI Content Safety__ enabled __Llama__ batch endpoint.

This notebook is in preview.

### The steps are:
1. Create an __Azure AI Content Safety__ resource to moderate the request from the user and the response from the __Llama__ batch endpoint.
2. Create a new __Azure AI Content Safety__ enabled __Llama__ batch endpoint with a custom score.py that integrates with the __Azure AI Content Safety__ resource to moderate both the user's request and the response from the __Llama__ model. To let the custom score.py authenticate successfully to the __Azure AI Content Safety__ resource, batch inferencing uses an __environment variable__ to pass the resource's access key to the script, which then uses the key directly. This approach has a security cost: anyone in your organization who has access to the endpoint can read the access key from the environment variable and use it to access the Azure AI Content Safety resource.
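
The custom score.py is not shown in this notebook. As a rough illustration of step 2, a scoring script could read the two environment variables set on the deployment (`CONTENT_SAFETY_ENDPOINT` and `CONTENT_SAFETY_KEY`) and build a text-analysis request as sketched below. The API version, the response shape, the severity threshold, and the helper names (`build_text_analyze_request`, `max_severity`, `is_blocked`) are assumptions made for this example and are not taken from `score_batch.py`.

```python
import json
import os
import urllib.request

# Assumption: REST route and API version; check the Content Safety docs for current values.
API_VERSION = "2023-10-01"


def build_text_analyze_request(text, endpoint=None, key=None):
    """Build (but do not send) a text-analysis request for Azure AI Content Safety."""
    endpoint = endpoint or os.environ["CONTENT_SAFETY_ENDPOINT"]
    key = key or os.environ["CONTENT_SAFETY_KEY"]
    url = f"{endpoint.rstrip('/')}/contentsafety/text:analyze?api-version={API_VERSION}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def max_severity(response_json):
    """Highest severity in a response assumed to look like
    {"categoriesAnalysis": [{"category": "...", "severity": 0}, ...]}."""
    items = response_json.get("categoriesAnalysis", [])
    return max((item.get("severity", 0) for item in items), default=0)


def is_blocked(response_json, threshold=2):
    # The threshold is a hypothetical policy choice, not a value from this notebook.
    return max_severity(response_json) >= threshold
```

Sending the request with `urllib.request.urlopen(...)` and parsing the JSON body would then feed `is_blocked`, once for the user's prompt and once for the model's output.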
  

### 1. Prerequisites
#### 1.1 Check List:
- [x] You have created a new Python virtual environment for this notebook.
- [x] The identity you use to execute this notebook (yourself or your VM) needs the __Contributor__ role on the resource group that contains the AML Workspace you specified, because this notebook creates an Azure AI Content Safety resource using that identity.
- [x] Required only if you choose the UAI approach: the identity executing this notebook (yourself or your VM) needs the __Owner__ role on the resource group that contains the specified AML Workspace, because the notebook creates a new UAI and assigns it the roles required to create the Azure AI Content Safety enabled Llama endpoint.

#### 1.2 Assign variables for the workspace and deployment

In [1]:
# The public registry name contains Llama models
registry_name = "msft-meta-preview"

# Name of the Llama model to be deployed
# available_llama_models_text_generation = ["Llama-2-7b", "Llama-2-13b"]
# available_llama_models_chat_complete = ["Llama-2-7b-chat", "Llama-2-13b-chat"]
model_name = "Llama-2-7b"

endpoint_name = "bani-llama-endpt"  # Replace with your endpoint name
deployment_name = "bani-llama-dep"  # Replace with your deployment name (lower case only)
sku_name = "Standard_NC6"  # Name of the SKU (instance type). Check the model list in the parent folder (inference) for the most suitable SKU for your model (Default: Standard_DS2_v2)

environment_name = "for-llama-env"  # Replace with your environment name

#### 1.3 Install Dependencies(as needed)

In [None]:
# uncomment the following lines to install the required packages
# %pip install azure-identity==1.13.0
# %pip install azure-mgmt-cognitiveservices==13.4.0
# %pip install azure-ai-ml==1.8.0
# %pip install azure-mgmt-msi==7.0.0
# %pip install azure-mgmt-authorization==3.0.0

#### 1.4 Get credential

In [2]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

#### 1.5 Configure workspace 

In [3]:
from azure.ai.ml import MLClient

try:
    ml_client = MLClient.from_config(credential=credential)
except Exception as ex:
    # enter details of your AML workspace
    subscription_id = "b17253fa-f327-42d6-9686-f3e553e24763"
    resource_group = "bani-llm"
    workspace = "bani-llm-ws"

    # get a handle to the workspace
    ml_client = MLClient(credential, subscription_id, resource_group, workspace, logging_enable=True,)


subscription_id = ml_client.subscription_id
resource_group = ml_client.resource_group_name
workspace = ml_client.workspace_name

print(f"Connected to workspace {workspace}")


Connected to workspace bani-llm-ws


#### 1.6 Assign variables for Azure Content Safety
Currently, Azure AI Content Safety is available only in a limited set of regions. The cell below prints the SKUs and locations available to your subscription.


__NOTE__: before you choose the region to deploy the Azure AI Content Safety, please be aware that your data will be transferred to the region you choose and by selecting a region outside your current location, you may be allowing the transmission of your data to regions outside your jurisdiction. It is important to note that data protection and privacy laws may vary between jurisdictions. Before proceeding, we strongly advise you to familiarize yourself with the local laws and regulations governing data transfer and ensure that you are legally permitted to transmit your data to an overseas location for processing. By continuing with the selection of a different region, you acknowledge that you have understood and accepted any potential risks associated with such data transmission. Please proceed with caution.

In [None]:
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

acs_client = CognitiveServicesManagementClient(credential, subscription_id)


# settings for the Azure AI Content Safety resource
# we will choose existing AACS resource if it exists, otherwise create a new one
# name of azure ai content safety resource, has to be unique
import time

aacs_name = f"{endpoint_name}-aacs-{str(time.time()).replace('.','')}"
available_aacs_locations = ["east us", "west europe"]

# create a new Cognitive Services Account
kind = "ContentSafety"
aacs_sku_name = "S0"
aacs_location = available_aacs_locations[0]


print("Available SKUs:")
aacs_skus = acs_client.resource_skus.list()
print("SKU Name\tSKU Tier\tLocations")
for sku in aacs_skus:
    if sku.kind == "ContentSafety":
        locations = ",".join(sku.locations)
        print(sku.name + "\t" + sku.tier + "\t" + locations)

print(
    f"Choose a new Azure AI Content Safety resource in {aacs_location} with SKU {aacs_sku_name}"
)
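
Rather than reading the printed table by eye, the locations that offer a given SKU can be collected programmatically. The sketch below works on any objects with `kind`, `name`, and `locations` attributes, like the items iterated above. The helper name `locations_for` and the sample values are hypothetical.

```python
from types import SimpleNamespace


def locations_for(skus, kind="ContentSafety", sku_name="S0"):
    """Collect, in lower case, the locations where the given kind/SKU is offered."""
    found = set()
    for sku in skus:
        if sku.kind == kind and sku.name == sku_name:
            found.update(loc.lower() for loc in (sku.locations or []))
    return sorted(found)


# Stand-in data shaped like the SKU items above (hypothetical values).
sample = [
    SimpleNamespace(kind="ContentSafety", name="S0", locations=["EASTUS"]),
    SimpleNamespace(kind="ContentSafety", name="S0", locations=["WESTEUROPE"]),
    SimpleNamespace(kind="OpenAI", name="S0", locations=["EASTUS"]),
]
print(locations_for(sample))  # ['eastus', 'westeurope']
```

With the real client, `locations_for(acs_client.resource_skus.list())` would give the candidate regions; the location strings may need normalizing before they are compared with `aacs_location`.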

### 2. Create Azure AI Content Safety

In [None]:
from azure.mgmt.cognitiveservices.models import Account, Sku, AccountProperties


parameters = Account(
    sku=Sku(name=aacs_sku_name),
    kind=kind,
    location=aacs_location,
    properties=AccountProperties(
        custom_sub_domain_name=aacs_name, public_network_access="Enabled"
    ),
)
# How many seconds to wait between checking the status of an async operation.
wait_time = 10


def find_acs(accounts):
    return next(
        x
        for x in accounts
        if x.kind == "ContentSafety"
        and x.location == aacs_location
        and x.sku.name == aacs_sku_name
    )


try:
    # check if AACS exists
    aacs = acs_client.accounts.get(resource_group, aacs_name)
    print(f"Found existing Azure AI content safety Account {aacs.name}.")
except Exception:
    try:
        # check if there is an existing AACS resource within the same resource group
        # (find_acs raises StopIteration when no account matches)
        aacs = find_acs(acs_client.accounts.list_by_resource_group(resource_group))
        print(
            f"Found existing Azure AI content safety Account {aacs.name} in resource group {resource_group}."
        )
    except Exception:
        print(f"Creating Azure AI content safety Account {aacs_name}.")
        acs_client.accounts.begin_create(resource_group, aacs_name, parameters).wait()
        print("Resource created.")
        aacs = acs_client.accounts.get(resource_group, aacs_name)


aacs_endpoint = aacs.properties.endpoint
aacs_resource_id = aacs.id
print(f"AACS endpoint is {aacs_endpoint}")
print(f"AACS ResourceId is {aacs_resource_id}")

aacs_access_key = acs_client.accounts.list_keys(
    resource_group_name=resource_group, account_name=aacs.name
).key1
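
Because the access key ends up in an environment variable on the deployment, it is worth keeping it out of notebook output as well. A minimal sketch of a masking helper for any debug printing follows; the name `mask_secret` is hypothetical.

```python
def mask_secret(value, visible=4):
    """Return the secret with all but the last `visible` characters replaced by '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


print(mask_secret("abcdef1234"))  # ******1234
```

For example, `print(f"AACS key: {mask_secret(aacs_access_key)}")` confirms that a key was retrieved without exposing it.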

### 3. Create Azure AI Content Safety enabled Llama batch endpoint

#### 3.1 Check if Llama model is available in the AML registry.

In [8]:
reg_client = MLClient(
    credential,
    subscription_id=subscription_id,
    resource_group_name=resource_group,
    registry_name=registry_name,
    logging_enable=True,
)
version_list = list(
    reg_client.models.list(model_name)
)  # list available versions of the model
llama_model = None
if len(version_list) == 0:
    raise Exception(f"No model named {model_name} found in registry")
else:
    model_version = version_list[0].version
    llama_model = reg_client.models.get(model_name, model_version)
    print(
        f"Using model name: {llama_model.name}, version: {llama_model.version}, id: {llama_model.id} for inferencing"
    )

Using model name: Llama-2-7b, version: 6, id: azureml://registries/msft-meta-preview/models/Llama-2-7b/versions/6 for inferencing
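
The cell above takes the first entry returned by the listing, which is not guaranteed to be the newest version. If the newest one is wanted explicitly, a sketch like the following could be used. It assumes versions are numeric strings such as `"6"`; the helper name `latest_version` is hypothetical.

```python
from types import SimpleNamespace


def latest_version(models):
    """Return the model with the numerically largest version.

    Non-numeric versions are ranked below all numeric ones.
    """

    def sort_key(model):
        version = str(model.version)
        return (1, int(version)) if version.isdigit() else (0, 0)

    return max(models, key=sort_key)


# Stand-in objects with a `version` attribute (hypothetical values).
candidates = [SimpleNamespace(version=v) for v in ["2", "10", "6"]]
print(latest_version(candidates).version)  # 10
```

In the notebook this would read `model_version = latest_version(version_list).version`.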


#### 3.2 Create Llama batch endpoint
This step may take a few minutes.

In [4]:
from azure.ai.ml.entities import BatchEndpoint
import logging
logging.basicConfig(level=logging.DEBUG)
endpoint_name = "bani-batch-4"
# Check if the endpoint already exists in the workspace
try:
    endpoint = ml_client.batch_endpoints.get(endpoint_name)
    print("---Endpoint already exists---")
except Exception:
    # Create a batch endpoint if it doesn't exist

    # Define the endpoint
    endpoint = BatchEndpoint(
        name=endpoint_name, description="Test endpoint for model"
    )

    # Trigger the endpoint creation
    try:
        ml_client.begin_create_or_update(endpoint).wait()
        print("\n---Endpoint created successfully---\n")
    except Exception as err:
        raise RuntimeError(
            f"Endpoint creation failed. Detailed Response:\n{err}"
        ) from err


---Endpoint already exists---


#### 3.3 Create environment for Llama endpoint


In [5]:
from IPython.display import display, HTML
from azure.ai.ml.entities import Environment, BuildContext

try:
    env = ml_client.environments.get(environment_name, label="latest")
    print("---Environment already exists---")
except:
    print("---Creating environment---")
    env = Environment(
        name=environment_name, build=BuildContext(path="./llama-files/docker_env")
    )
    ml_client.environments.create_or_update(env)
    env = ml_client.environments.get(environment_name, label="latest")
    print("---Please use link below to check build status---")


display(
    HTML(
        f"""
             <a href="https://ml.azure.com/environments/{environment_name}/version/{env.version}?wsid=/subscriptions/{subscription_id}/resourceGroups/{resource_group}/providers/Microsoft.MachineLearningServices/workspaces/{workspace}">
                Click here to check env build status in AML studio
             </a>
             """
    )
)


---Environment already exists---


In [6]:
from azure.ai.ml import Input
from azure.ai.ml.entities import (
    BatchEndpoint,
    ModelBatchDeployment,
    ModelBatchDeploymentSettings,
    Model,
    AmlCompute,
    Data,
    BatchRetrySettings,
    CodeConfiguration,
    Environment,
)
from azure.ai.ml.constants import AssetTypes, BatchDeploymentOutputAction

# Create the GPU compute cluster for the batch deployment if it does not exist yet
compute_name = "gpu-cluster"
if not any(c.name == compute_name for c in ml_client.compute.list()):
    compute_cluster = AmlCompute(
        name=compute_name,
        description="GPU cluster compute",
        size=sku_name,  # instance type assigned in section 1.2
        min_instances=0,
        max_instances=2,
    )
    ml_client.compute.begin_create_or_update(compute_cluster).result()


#### 3.4 Deploy Llama model
This step may take a few minutes.

In [9]:
import os
deployment = ModelBatchDeployment(
    name=deployment_name,
    endpoint_name=endpoint.name,
    model=llama_model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="llama-files/score",
        scoring_script="score_batch.py",
    ),
    compute=compute_name,
    environment_variables={
        "MLFLOW_MODEL_FOLDER": os.path.basename(llama_model.path),
        # Pass the values retrieved in section 2, not the variable names as strings
        "CONTENT_SAFETY_ENDPOINT": aacs_endpoint,
        "CONTENT_SAFETY_KEY": aacs_access_key,
    },
    settings=ModelBatchDeploymentSettings(
        instance_count=1,
        max_concurrency_per_instance=1,
        mini_batch_size=1,
        output_action=BatchDeploymentOutputAction.APPEND_ROW,
        output_file_name="predictions.csv",
        retry_settings=BatchRetrySettings(max_retries=3, timeout=3000),
        logging_level="info",
    ),
)
# Trigger the deployment creation
try:
    ml_client.begin_create_or_update(deployment).wait()
    print("\n---Deployment created successfully---\n")
except Exception as err:
    raise RuntimeError(
        f"Deployment creation failed. Detailed Response:\n{err}"
    ) from err

Class ModelBatchDeploymentSettings: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class ModelBatchDeployment: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


---Deployment created successfully---





In [None]:
endpoint = ml_client.batch_endpoints.get(endpoint_name)
endpoint.defaults.deployment_name = deployment.name
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

### 4. Test the Safety Enabled Llama batch endpoint.

In [None]:
data_path = "llama-files/data"
dataset_name = "summary-small"

summary_data = Data(
    path=data_path,
    type=AssetTypes.URI_FOLDER,
    description="A sample of the dataset for text generation, in CSV file format",
    name=dataset_name,
)

In [None]:
summary_data = ml_client.data.create_or_update(summary_data)

In [None]:
from time import sleep

print(f"Waiting for data asset {dataset_name}", end="")
while not any(filter(lambda m: m.name == dataset_name, ml_client.data.list())):
    sleep(10)
    print(".", end="")

print(" [DONE]")
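
The loop above polls indefinitely if the data asset never appears. A bounded variant is sketched below. The helper name `wait_until` and its default timeout and interval are hypothetical choices.

```python
import time


def wait_until(predicate, timeout=300, interval=10, sleep=time.sleep, clock=time.monotonic):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the predicate succeeded, False if the wait timed out.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In this notebook the call could look like `wait_until(lambda: any(d.name == dataset_name for d in ml_client.data.list()))`, with a failure raised if it returns `False`.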

In [None]:
summary_data = ml_client.data.get(name=dataset_name, label="latest")

In [None]:
input = Input(type=AssetTypes.URI_FOLDER, path=summary_data.id)

In [None]:

import os
from azure.ai.ml.entities import BatchEndpoint
endpoint_name = "bani-batch-4"
# Check if the endpoint already exists in the workspace
try:
    endpoint = ml_client.batch_endpoints.get(endpoint_name)
    print("---Endpoint already exists---")
except Exception:
    # Create a batch endpoint if it doesn't exist

    # Define the endpoint
    endpoint = BatchEndpoint(
        name=endpoint_name, description="Test endpoint for model"
    )

    # Trigger the endpoint creation
    try:
        ml_client.begin_create_or_update(endpoint).wait()
        print("\n---Endpoint created successfully---\n")
    except Exception as err:
        raise RuntimeError(
            f"Endpoint creation failed. Detailed Response:\n{err}"
        ) from err
deployment = ModelBatchDeployment(
    name=deployment_name,
    endpoint_name=endpoint.name,
    model=llama_model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="llama-files/score",
        scoring_script="score_batch.py",
    ),
    compute=compute_name,
    environment_variables={
        "MLFLOW_MODEL_FOLDER": os.path.basename(llama_model.path),
        "CONTENT_SAFETY_ENDPOINT": aacs_endpoint,
        "CONTENT_SAFETY_KEY": aacs_access_key,
    },
    settings=ModelBatchDeploymentSettings(
        instance_count=1,
        max_concurrency_per_instance=1,
        mini_batch_size=1,
        output_action=BatchDeploymentOutputAction.APPEND_ROW,
        output_file_name="predictions.csv",
        retry_settings=BatchRetrySettings(max_retries=3, timeout=3000),
        logging_level="info",
    ),
)
# Trigger the deployment creation
try:
    ml_client.begin_create_or_update(deployment).wait()
    print("\n---Deployment created successfully---\n")
except Exception as err:
    raise RuntimeError(
        f"Deployment creation failed. Detailed Response:\n{err}"
    ) from err

endpoint = ml_client.batch_endpoints.get(endpoint_name)
endpoint.defaults.deployment_name = deployment.name
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()



In [None]:
job = ml_client.batch_endpoints.invoke(endpoint_name=endpoint.name, input=input)
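
Invoking the endpoint only submits the batch job. To wait for it and fetch the results, the job can be streamed and its output downloaded. The sketch below wraps the two `ml_client.jobs` calls in a helper; the helper name `stream_and_download` and the output name `"score"` are assumptions, so check the job's outputs in AML studio if the download comes back empty.

```python
def stream_and_download(ml_client, job, download_path=".", output_name="score"):
    """Block until the batch job finishes, then download the named output.

    Assumption: the batch job exposes its predictions under `output_name`.
    """
    # Streams the job logs and returns once the job reaches a terminal state.
    ml_client.jobs.stream(job.name)
    # Downloads the output (for this deployment, the predictions.csv file).
    ml_client.jobs.download(
        name=job.name, download_path=download_path, output_name=output_name
    )
    return download_path
```

For example, `stream_and_download(ml_client, job, download_path="./batch-output")` would place the file configured as `output_file_name` under that folder.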

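In [None]:
# Inspect the deployment while the endpoint still exists
ml_client.batch_deployments.get(deployment_name, endpoint.name)

In [None]:
# Clean up: delete the endpoint (this also removes its deployments)
ml_client.batch_endpoints.begin_delete(endpoint.name).result()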