# How to create an Azure AI Content Safety enabled Llama 2 batch endpoint (Preview)
### This notebook will walk you through the steps to create an __Azure AI Content Safety__ enabled __Llama 2__ batch endpoint.
### This notebook is in preview.
### The steps are:
1. Create an __Azure AI Content Safety__ resource to moderate the requests from users and the responses from the __Llama 2__ batch endpoint.
2. Create a new __Azure AI Content Safety__ enabled __Llama 2__ batch endpoint with a custom score.py that calls the __Azure AI Content Safety__ resource to moderate both the user's request and the __Llama 2__ model's response. For batch inferencing, the custom score.py authenticates to the __Azure AI Content Safety__ resource with an access key passed in through an __Environment variable__, so the script can use the key directly. This option is less secure than authenticating with a managed identity: anyone in your organization who has access to the endpoint can read the access key from the environment variable and use it to access the __Azure AI Content Safety__ resource.
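The custom score.py is not shown in this notebook. As a rough sketch only, a scoring script could read the two environment variables set on the deployment in section 3.5 (`CONTENT_SAFETY_ENDPOINT` and `CONTENT_SAFETY_KEY`) and build a text-analysis request for the Azure AI Content Safety REST API. The `/contentsafety/text:analyze` path, the `2023-10-01` API version, and the helper name `build_analyze_request` are assumptions; check them against the REST reference for your resource.

```python
import json
import os


def build_analyze_request(text, api_version="2023-10-01"):
    """Build (url, headers, body) for a Content Safety text-analysis call.

    Hypothetical helper: the path and api_version are assumptions, not taken
    from this notebook's score.py.
    """
    # Both values are injected through the batch deployment's environment_variables
    endpoint = os.environ["CONTENT_SAFETY_ENDPOINT"].rstrip("/")
    key = os.environ["CONTENT_SAFETY_KEY"]
    url = f"{endpoint}/contentsafety/text:analyze?api-version={api_version}"
    headers = {
        # The access key travels in this header; anyone who can read the
        # environment variable can make the same call
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    body = json.dumps({"text": text})
    return url, headers, body
```

To send the request, these values could be passed to `urllib.request.Request(url, data=body.encode("utf-8"), headers=headers, method="POST")`. The key is read at call time, which is exactly why anyone with access to the deployment's environment can reuse it.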
  

### 1. Prerequisites
#### 1.1 Check List:
- [x] You have created a new Python virtual environment for this notebook.
- [x] The identity you use to run this notebook (yourself or your VM) needs the __Contributor__ role on the resource group that contains the AML workspace you specified, because this notebook creates an Azure AI Content Safety resource with that identity.


#### 1.2 Assign variables for the workspace and deployment

In [None]:
import random

# The public registry that contains the Llama 2 models
registry_name = "azureml-meta"

# Name of the Llama 2 model to deploy
# available_llama_models_text_generation = ["Llama-2-7b", "Llama-2-13b", "Llama-2-70b"]
# Use the model that suits your workload; this example uses Llama-2-7b
model_name = "Llama-2-7b"
# This notebook has been tested with version 4 of "Llama-2-7b", "Llama-2-13b", and "Llama-2-70b"

endpoint_name = f"batch-{random.randint(0, 10000)}"  # Replace with your endpoint name
deployment_name = "batch-dep"  # Replace with your deployment name (lowercase only)
# Name of the SKU (instance type). Check the model list in the parent folder (inference)
# to find the most suitable SKU for your model (Default: Standard_DS2_v2)
sku_name = "Standard_ND40rs_v2"

environment_name = "llama-model-env"  # Replace with your environment name
compute_name = "nd40-src"  # Name of the compute cluster that runs the batch job

#### 1.3 Install dependencies (as needed)

In [None]:
# uncomment the following lines to install the required packages
# %pip install azure-identity==1.13.0
# %pip install azure-mgmt-cognitiveservices==13.4.0
# %pip install azure-ai-ml==1.8.0
# %pip install azure-mgmt-msi==7.0.0
# %pip install azure-mgmt-authorization==3.0.0
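Because the cell above pins exact versions, it can help to confirm what is actually installed in your virtual environment before continuing. A small sketch using the standard-library `importlib.metadata` module; `find_mismatched_pins` is a hypothetical helper, not part of any Azure SDK.

```python
from importlib import metadata

# Versions pinned in the %pip cell above
PINNED = {
    "azure-identity": "1.13.0",
    "azure-mgmt-cognitiveservices": "13.4.0",
    "azure-ai-ml": "1.8.0",
    "azure-mgmt-msi": "7.0.0",
    "azure-mgmt-authorization": "3.0.0",
}


def find_mismatched_pins(pins, get_version=metadata.version):
    """Return {package: installed_version} for packages that differ from their pin.

    The installed version is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


# An empty dict means every pinned package is installed at the pinned version
print(find_mismatched_pins(PINNED))
```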

#### 1.4 All required Imports


In [None]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import Account, Sku, AccountProperties
from IPython.display import display, HTML
from azure.ai.ml import Input
from azure.ai.ml.entities import (
    BatchEndpoint,
    ModelBatchDeployment,
    ModelBatchDeploymentSettings,
    Model,
    AmlCompute,
    Data,
    BuildContext,
    BatchRetrySettings,
    CodeConfiguration,
    Environment,
)

#### 1.5 Get credential

In [None]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

#### 1.6 Configure workspace 

In [None]:
try:
    ml_client = MLClient.from_config(credential=credential)
except Exception as ex:
    # enter details of your AML workspace
    subscription_id = "subscription_id"
    resource_group = "resource_group"
    workspace = "workspace"

    # get a handle to the workspace
    ml_client = MLClient(
        credential,
        subscription_id,
        resource_group,
        workspace,
        logging_enable=True,
    )


subscription_id = ml_client.subscription_id
resource_group = ml_client.resource_group_name
workspace = ml_client.workspace_name

print(f"Connected to workspace {workspace}")

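#### 1.7 Assign variables for Azure AI Content Safety
Currently, Azure AI Content Safety is available in only a limited set of regions. The next cell prints each Content Safety SKU with the locations where it is offered, and defaults to the first entry in `available_aacs_locations`.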


__NOTE__: before you choose the region to deploy the Azure AI Content Safety, please be aware that your data will be transferred to the region you choose and by selecting a region outside your current location, you may be allowing the transmission of your data to regions outside your jurisdiction. It is important to note that data protection and privacy laws may vary between jurisdictions. Before proceeding, we strongly advise you to familiarize yourself with the local laws and regulations governing data transfer and ensure that you are legally permitted to transmit your data to an overseas location for processing. By continuing with the selection of a different region, you acknowledge that you have understood and accepted any potential risks associated with such data transmission. Please proceed with caution.
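Location names can appear in more than one form (for example `East US`, `east us`, `eastus`, or `EASTUS`), so comparing them as raw strings can miss a match. A minimal sketch, assuming you want to check whether a SKU is offered in your chosen region regardless of spelling; the helper names and the sample SKU record are hypothetical.

```python
def normalize_location(name):
    """Lower-case a location name and drop spaces, e.g. 'East US' -> 'eastus'."""
    return name.replace(" ", "").lower()


def sku_available_in(sku_locations, wanted_location):
    """Return True if wanted_location appears in sku_locations after normalization."""
    wanted = normalize_location(wanted_location)
    return any(normalize_location(loc) == wanted for loc in sku_locations)


# Hypothetical SKU record with the fields printed in the next cell
example_sku = {"name": "S0", "kind": "ContentSafety", "locations": ["EASTUS"]}
print(sku_available_in(example_sku["locations"], "east us"))  # True
print(sku_available_in(example_sku["locations"], "west europe"))  # False
```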

In [None]:
acs_client = CognitiveServicesManagementClient(credential, subscription_id)


# settings for the Azure AI Content Safety resource
# we will choose existing AACS resource if it exists, otherwise create a new one
# name of azure ai content safety resource, has to be unique
import time

aacs_name = f"{endpoint_name}-aacs"
available_aacs_locations = ["eastus", "westeurope"]  # Azure location names for East US and West Europe

# create a new Cognitive Services Account
kind = "ContentSafety"
aacs_sku_name = "S0"
aacs_location = available_aacs_locations[0]


print("Available SKUs:")
aacs_skus = acs_client.resource_skus.list()
print("SKU Name\tSKU Tier\tLocations")
for sku in aacs_skus:
    if sku.kind == "ContentSafety":
        locations = ",".join(sku.locations)
        print(f"{sku.name}\t{sku.tier}\t{locations}")

print(
    f"Selected location {aacs_location} and SKU {aacs_sku_name} for the Azure AI Content Safety resource"
)

### 2. Create Azure AI Content Safety

In [None]:
parameters = Account(
    sku=Sku(name=aacs_sku_name),
    kind=kind,
    location=aacs_location,
    properties=AccountProperties(
        custom_sub_domain_name=aacs_name, public_network_access="Enabled"
    ),
)
# Return the first Content Safety account that matches the chosen location and SKU, or None
def find_acs(accounts):
    return next(
        (
            x
            for x in accounts
            if x.kind == "ContentSafety"
            and x.location == aacs_location
            and x.sku.name == aacs_sku_name
        ),
        None,
    )


try:
    # check if an AACS resource with this name already exists
    aacs = acs_client.accounts.get(resource_group, aacs_name)
    print(f"Found existing Azure AI Content Safety account {aacs.name}.")
except Exception:
    # check if there is an existing AACS resource within the same resource group
    aacs = find_acs(acs_client.accounts.list_by_resource_group(resource_group))
    if aacs is not None:
        print(
            f"Found existing Azure AI Content Safety account {aacs.name} in resource group {resource_group}."
        )
    else:
        print(f"Creating Azure AI Content Safety account {aacs_name}.")
        acs_client.accounts.begin_create(resource_group, aacs_name, parameters).result()
        print("Resource created.")
        aacs = acs_client.accounts.get(resource_group, aacs_name)

aacs_endpoint = aacs.properties.endpoint
aacs_resource_id = aacs.id
print(f"AACS endpoint is {aacs_endpoint}")
print(f"AACS ResourceId is {aacs_resource_id}")

aacs_access_key = acs_client.accounts.list_keys(
    resource_group_name=resource_group, account_name=aacs.name
).key1
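The cell above follows a three-step order: reuse the account named `aacs_name` if it exists, otherwise reuse any Content Safety account in the resource group that matches the chosen location and SKU, otherwise create a new one. The same decision can be sketched with plain dictionaries standing in for the management-client objects; `choose_aacs_action` and the dictionary keys are hypothetical.

```python
def choose_aacs_action(accounts, wanted_name, location, sku_name):
    """Return ("reuse", name) or ("create", wanted_name).

    `accounts` is a list of dicts with hypothetical keys
    "name", "kind", "location" and "sku".
    """
    # 1. An account with the requested name already exists
    for acct in accounts:
        if acct["name"] == wanted_name:
            return ("reuse", acct["name"])
    # 2. Any Content Safety account with the same location and SKU
    match = next(
        (
            a
            for a in accounts
            if a["kind"] == "ContentSafety"
            and a["location"] == location
            and a["sku"] == sku_name
        ),
        None,
    )
    if match is not None:
        return ("reuse", match["name"])
    # 3. Nothing suitable: create a new account
    return ("create", wanted_name)
```

Reusing an existing account avoids creating a new billable resource on every run, which matters here because `endpoint_name` (and therefore `aacs_name`) is randomized.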

### 3. Create Azure AI Content Safety enabled Llama 2 batch endpoint

#### 3.1 Check if Llama 2 model is available in the AML registry.

In [None]:
reg_client = MLClient(
    credential,
    subscription_id=subscription_id,
    resource_group_name=resource_group,
    registry_name=registry_name,
)
version_list = list(
    reg_client.models.list(model_name)
)  # list available versions of the model
llama_model = None
if len(version_list) == 0:
    raise Exception(f"No model named {model_name} found in registry")
else:
    # Version 4 is the version this notebook was tested with
    model_version = "4"
    llama_model = reg_client.models.get(model_name, model_version)
    print(
        f"Using model name: {llama_model.name}, version: {llama_model.version}, id: {llama_model.id} for inferencing"
    )
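The cell above hard-codes version `4` and uses the registry listing only to confirm that the model exists. If you would rather fall back to the newest available version when `4` is missing, a sketch of that choice could look like this; `pick_model_version` is hypothetical and assumes version strings are plain integers.

```python
def pick_model_version(available_versions, preferred="4"):
    """Return `preferred` if it is available, else the highest integer version."""
    if not available_versions:
        raise ValueError("no versions available")
    if preferred in available_versions:
        return preferred
    # Compare numerically so that "10" sorts above "9"
    return max(available_versions, key=int)
```

With the registry listing above, this could be called as `pick_model_version([m.version for m in version_list])`. Note that versions other than 4 have not been tested with this notebook.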

#### 3.2 Create the environment for the Llama 2 endpoint


In [None]:
try:
    env = ml_client.environments.get(environment_name, label="latest")
    print("---Environment already exists---")
except:
    print("---Creating environment---")
    env = Environment(
        name=environment_name, build=BuildContext(path="./llama-files/docker_env")
    )
    ml_client.environments.create_or_update(env)
    env = ml_client.environments.get(environment_name, label="latest")
    print("---Please use link below to check build status---")


display(
    HTML(
        f"""
             <a href="https://ml.azure.com/environments/{environment_name}/version/{env.version}?wsid=/subscriptions/{subscription_id}/resourceGroups/{resource_group}/providers/Microsoft.MachineLearningServices/workspaces/{workspace}">
                Click here to check env build status in AML studio
             </a>
             """
    )
)

#### 3.3 Create a compute cluster to run the batch job on


In [None]:
from azure.ai.ml.constants import AssetTypes, BatchDeploymentOutputAction

if not any(filter(lambda m: m.name == compute_name, ml_client.compute.list())):
    compute_cluster = AmlCompute(
        name=compute_name,
        size=sku_name,
        min_instances=0,
        max_instances=2,
    )
    ml_client.compute.begin_create_or_update(compute_cluster).result()

#### 3.4 Create Llama 2 batch endpoint
This step may take a few minutes.

In [None]:
from azure.ai.ml.entities import BatchEndpoint

# Check if the endpoint already exists in the workspace
try:
    endpoint = ml_client.batch_endpoints.get(endpoint_name)
    print("---Endpoint already exists---")
except Exception:
    # Create a batch endpoint if it doesn't exist

    # Define the endpoint
    endpoint = BatchEndpoint(name=endpoint_name, description="Test endpoint for model")

    # Trigger the endpoint creation
    try:
        ml_client.begin_create_or_update(endpoint).wait()
        print("\n---Endpoint created successfully---\n")
    except Exception as err:
        raise RuntimeError(
            f"Endpoint creation failed. Detailed Response:\n{err}"
        ) from err

#### 3.5 Deploy the Llama 2 model
This step may take a few minutes.

In [None]:
deployment = ModelBatchDeployment(
    name=deployment_name,
    endpoint_name=endpoint.name,
    model=llama_model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="llama-files/score/default",
        scoring_script="score_batch.py",
    ),
    compute=compute_name,
    settings=ModelBatchDeploymentSettings(
        instance_count=1,
        max_concurrency_per_instance=1,
        mini_batch_size=1,
        output_action=BatchDeploymentOutputAction.APPEND_ROW,
        output_file_name="predictions.csv",
        retry_settings=BatchRetrySettings(max_retries=3, timeout=3000),
        logging_level="info",
        environment_variables={
            "CONTENT_SAFETY_ENDPOINT": aacs_endpoint,
            "CONTENT_SAFETY_KEY": aacs_access_key,
        },
    ),
)
# Trigger the deployment creation
try:
    ml_client.begin_create_or_update(deployment).wait()
    print("\n---Deployment created successfully---\n")
except Exception as err:
    raise RuntimeError(
        f"Deployment creation failed. Detailed Response:\n{err}"
    ) from err
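The settings above control how the input is split and processed. For a file-based input, `mini_batch_size` is the number of files passed to one call of the scoring script, and at most `instance_count * max_concurrency_per_instance` mini-batches run at the same time. A small worked calculation, assuming a hypothetical input folder of 10 CSV files and mini-batches that each take roughly the same time:

```python
import math


def batch_plan(num_files, mini_batch_size, instance_count, max_concurrency_per_instance):
    """Return (mini_batches, parallel_slots, rounds) for a file-based batch job."""
    mini_batches = math.ceil(num_files / mini_batch_size)
    parallel_slots = instance_count * max_concurrency_per_instance
    # Rounds needed if every mini-batch takes roughly the same time
    rounds = math.ceil(mini_batches / parallel_slots)
    return mini_batches, parallel_slots, rounds


# Values from the deployment above, with a hypothetical 10-file input folder
print(batch_plan(10, 1, 1, 1))  # (10, 1, 10)
# Hypothetically scaling out to the cluster's max_instances=2
print(batch_plan(10, 1, 2, 1))  # (10, 2, 5)
```

With `BatchRetrySettings(max_retries=3, timeout=3000)`, each mini-batch should also get up to 3000 seconds before it is treated as timed out and retried, up to 3 times, so a long Llama 2 generation per file has room to finish.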

#### 3.6 Update the batch endpoint to set the default deployment
This step may take a few minutes.

In [None]:
endpoint = ml_client.batch_endpoints.get(endpoint_name)
endpoint.defaults.deployment_name = deployment.name
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

### 4. Prepare to test.

#### 4.1 Input data preparation.

In [None]:
data_path = "llama-files/data"
dataset_name = "input-data-small"

input_data = Data(
    path=data_path,
    type=AssetTypes.URI_FOLDER,
    description="A sample of the dataset for text generation, in CSV file format",
    name=dataset_name,
)

In [None]:
input_data = ml_client.data.create_or_update(input_data)

In [None]:
from time import sleep

print(f"Waiting for data asset {dataset_name}", end="")
while not any(filter(lambda m: m.name == dataset_name, ml_client.data.list())):
    sleep(10)
    print(".", end="")

print(" [DONE]")
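The loop above polls every 10 seconds with no upper bound, so it waits forever if the data asset never appears. A sketch of the same polling with a timeout; `wait_until` is a hypothetical helper, and the predicate you pass is up to you.

```python
import time


def wait_until(predicate, timeout_s=600, interval_s=10, sleep=time.sleep, clock=time.monotonic):
    """Call predicate() until it returns True or timeout_s seconds elapse.

    Returns True on success and False on timeout. `sleep` and `clock` can be
    swapped out, for example in tests.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For the data asset above, this could be called as `wait_until(lambda: any(d.name == dataset_name for d in ml_client.data.list()))`, raising an error yourself when it returns `False`.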

In [None]:
input_data = ml_client.data.get(name=dataset_name, label="latest")

In [None]:
input = Input(type=AssetTypes.URI_FOLDER, path=input_data.id)

#### 4.2 Invoke the endpoint

Now invoke the endpoint to start a batch scoring job:

In [None]:
job = ml_client.batch_endpoints.invoke(endpoint_name=endpoint.name, input=input)

#### 4.3 Get the details of the invoked job
Get the details of the invoked job:

In [None]:
ml_client.jobs.get(job.name)

We can wait for the job to finish using the following code:

In [None]:
ml_client.jobs.stream(job.name)

#### 4.4 Download the results

The deployment creates a child job that executes the scoring. We can get the details of it using the following code:

In [None]:
scoring_job = list(ml_client.jobs.list(parent_job_name=job.name))[0]

In [None]:
print("Job name:", scoring_job.name)
print("Job status:", scoring_job.status)
print(
    "Job duration:",
    scoring_job.creation_context.last_modified_at
    - scoring_job.creation_context.created_at,
)

The outputs generated by the deployment job will be placed in an output named `score`:

In [None]:
ml_client.jobs.download(name=scoring_job.name, download_path=".", output_name="score")

The downloaded output file (`predictions.csv`) contains one line per input file, and each line contains one array per row of that input file. An empty array, `[]`, means that Azure AI Content Safety filtered out the model's response.
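The exact moderation logic lives in the custom score.py, which is not shown here. As a hedged sketch of how a scoring step could end up emitting `[]`, the function below checks the prompt and then the model's response with a caller-supplied analyzer, and returns an empty list when any category's severity reaches a threshold. The analyzer interface, the threshold of 4, and the function names are assumptions for illustration; the category names follow the Content Safety harm categories (Hate, SelfHarm, Sexual, Violence).

```python
def moderated_generate(prompt, generate, analyze, threshold=4):
    """Return [response] if both prompt and response pass moderation, else [].

    generate(prompt) -> str         : stands in for the Llama 2 model call
    analyze(text) -> dict[str, int] : stands in for a Content Safety call that
                                      returns a severity per category
    """

    def blocked(text):
        # Blocked if any category's severity is at or above the threshold
        return any(severity >= threshold for severity in analyze(text).values())

    if blocked(prompt):
        return []  # the user's request is filtered
    response = generate(prompt)
    if blocked(response):
        return []  # the model's response is filtered
    return [response]


# Hypothetical stand-in for the Content Safety service
def fake_analyze(text):
    violence = 6 if "fight" in text else 0
    return {"Hate": 0, "SelfHarm": 0, "Sexual": 0, "Violence": violence}
```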

#### 4.5 Clean up resources
Delete the endpoint:

In [None]:
ml_client.batch_endpoints.begin_delete(endpoint.name).result()