# How to create an Azure AI Content Safety enabled LLaMA online endpoint
### This notebook will walk you through the steps to create an __Azure AI Content Safety__ enabled __LLaMA__ online endpoint.
### The steps are:
1. Create an __Azure AI Content Safety__ resource to moderate both the requests from users and the responses from the __LLaMA__ online endpoint.
2. Create a new __Azure AI Content Safety__ enabled __LLaMA__ online endpoint with a custom score.py that calls the __Azure AI Content Safety__ resource to moderate the user's request and the __LLaMA__ model's response. For the custom score.py to authenticate successfully to the __Azure AI Content Safety__ resource, there are two options:
    1. __UAI__ (recommended, but more complex): create a __User Assigned Identity (UAI)__ and assign the appropriate roles to it. The custom score.py can then obtain an access token for the __UAI__ from the AAD server and use it to access the Azure AI Content Safety resource.
    2. __Environment variable__ (simpler, but less secure): pass the access key of the __Azure AI Content Safety__ resource to the custom score.py through an environment variable, so that score.py can use the key directly. This is less secure than the first option: anyone in your organization with access to the endpoint can read the key from the environment variable and use it to access the Azure AI Content Safety resource.
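
As a rough illustration of the two options above, the following sketch shows how a scoring script might choose its authentication header. It is not the notebook's actual score.py: `build_auth_headers` and the `get_token` callback are hypothetical names, and the token scope is an assumption. `CONTENT_SAFETY_KEY` is the environment variable this notebook passes to the deployment, and `Ocp-Apim-Subscription-Key` is the usual key header for Azure Cognitive Services.

```python
from typing import Callable, Mapping, Optional

# Assumption: token scope commonly used for Azure Cognitive Services.
COGNITIVE_SCOPE = "https://cognitiveservices.azure.com/.default"


def build_auth_headers(
    env: Mapping[str, str],
    get_token: Optional[Callable[[str], str]] = None,
) -> dict:
    """Return HTTP headers for a Content Safety call.

    Uses the access key from CONTENT_SAFETY_KEY when it is set (option 2).
    Otherwise asks the hypothetical `get_token` callback for a bearer
    token issued to the UAI (option 1).
    """
    key = env.get("CONTENT_SAFETY_KEY")
    if key:
        return {"Ocp-Apim-Subscription-Key": key}
    if get_token is not None:
        return {"Authorization": f"Bearer {get_token(COGNITIVE_SCOPE)}"}
    raise RuntimeError("No Content Safety credential is configured.")


# Option 2: the key comes from the environment.
key_headers = build_auth_headers({"CONTENT_SAFETY_KEY": "<key>"})
# Option 1: no key is present, so a token provider is used instead.
token_headers = build_auth_headers({}, get_token=lambda scope: "<token>")
```

In a real deployment the first argument would be `os.environ`, and `get_token` would wrap whichever identity library the scoring script uses.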
  

### 1. Prerequisites
#### 1.1 Check List:
- [x] You have created a new Python virtual environment for this notebook.
- [x] The identity used to execute this notebook (yourself or your VM) needs the __Contributor__ role on the resource group that contains the AML Workspace you specify, because this notebook creates an Azure AI Content Safety resource using that identity.
- [x] Required only if you choose the UAI approach: the identity executing this notebook (yourself or your VM) needs the __Owner__ role on the resource group that contains the specified AML Workspace, because the notebook creates a new UAI and assigns it the roles required to create the Azure AI Content Safety enabled LLaMA endpoint.

#### 1.2 Install Dependencies (as needed)

In [None]:
# uncomment the following lines to install the required packages
# %pip install azure-identity==1.13.0
# %pip install azure-mgmt-cognitiveservices==13.4.0
# %pip install azure-ai-ml==1.8.0
# %pip install azure-mgmt-msi==7.0.0
# %pip install azure-mgmt-authorization==3.0.0

#### 1.3 Get credential

In [None]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()


#### 1.4 Configure workspace 

In [None]:
from azure.ai.ml import MLClient

try:
    ml_client = MLClient.from_config(credential=credential)
except Exception as ex:
    # enter details of your AML workspace
    subscription_id = "<SUBSCRIPTION_ID>"
    resource_group = "<RESOURCE_GROUP>"
    workspace = "<AML_WORKSPACE_NAME>"

    # get a handle to the workspace
    ml_client = MLClient(credential, subscription_id, resource_group, workspace)



subscription_id = ml_client.subscription_id
resource_group = ml_client.resource_group_name
workspace = ml_client.workspace_name

print(f"Connected to workspace {workspace}")
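
`MLClient.from_config` reads the workspace details from a `config.json` file. The sketch below writes such a file with the three fields it expects. The values are placeholders, and the file is written to a temporary directory only to keep the example side-effect free. In practice you would place it in the notebook's working directory (or download it from the Azure portal).

```python
import json
import tempfile
from pathlib import Path

# Placeholder values; replace them with your own workspace details.
workspace_config = {
    "subscription_id": "<SUBSCRIPTION_ID>",
    "resource_group": "<RESOURCE_GROUP>",
    "workspace_name": "<AML_WORKSPACE_NAME>",
}

# Written to a temporary directory here so the example leaves no files behind.
config_dir = Path(tempfile.mkdtemp())
config_path = config_dir / "config.json"
config_path.write_text(json.dumps(workspace_config, indent=2))

print(f"Wrote workspace config to {config_path}")
```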

#### 1.5 Assign variables for the workspace and deployment

In [None]:
# Name of the public registry that contains the LLaMA models
registry_name="azureml-preview-test1"

# Name of the LLaMA model to be deployed
# available_llama_models_pre_trained = ["Llama-2-7b", "Llama-2-13b"]
# available_llama_models_fine_tuned = ["Llama-2-7b-chat", "Llama-2-13b-chat"]
model_name="Llama-2-13b"

endpoint_name="llama-cs-test-guid1" # Replace with your endpoint name
deployment_name="llama-2-7b-2" # Replace with your deployment name
# Name of the SKU (instance type). Check the model list in the parent folder (inference)
# to find the most suitable SKU for your model (default: Standard_DS2_v2).
sku_name="Standard_NC24s_v3"

environment_name=f"{endpoint_name}-env" # Replace with your environment name


#### 1.6 Assign variables for Azure AI Content Safety
Azure AI Content Safety is currently available only in a limited set of regions. The cell below prints the locations offered for each SKU.


__NOTE__: Before you choose the region in which to deploy Azure AI Content Safety, be aware that your data will be transferred to that region. By selecting a region outside your current location, you may be allowing your data to be transmitted to regions outside your jurisdiction. Data protection and privacy laws vary between jurisdictions. Before proceeding, we strongly advise you to familiarize yourself with the local laws and regulations governing data transfer and to ensure that you are legally permitted to transmit your data to an overseas location for processing. By continuing with a different region, you acknowledge that you have understood and accepted any potential risks associated with such data transmission. Please proceed with caution.

In [None]:
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
acs_client = CognitiveServicesManagementClient(credential, subscription_id)



# Settings for the Azure AI Content Safety (AACS) resource.
# An existing AACS resource is reused if one is found; otherwise a new one is created.
# The resource name must be unique, so a timestamp is appended to it.
import time
aacs_name = f"{endpoint_name}-aacs-{str(time.time()).replace('.','')}"
available_aacs_locations = ['east us', 'west europe']

# Settings used if a new Cognitive Services account has to be created
kind = "ContentSafety"
aacs_sku_name = "S0"
aacs_location = available_aacs_locations[0]


print("Available SKUs:")
aacs_skus = acs_client.resource_skus.list()
print("SKU Name\tSKU Tier\tLocations")
for sku in aacs_skus:
    if sku.kind == "ContentSafety":
        locations = ",".join(sku.locations or [])
        # f-string formatting avoids a TypeError if a field such as tier is None
        print(f"{sku.name}\t{sku.tier}\t{locations}")

print(f"Selected location {aacs_location} and SKU {aacs_sku_name} for the Azure AI Content Safety resource")
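
The resource name above gets its uniqueness from a timestamp. As an alternative (not what this notebook does), the sketch below builds the name from a short random suffix and checks it before any Azure call is made. `make_aacs_name` is a hypothetical helper, and the 64-character limit and allowed character set are assumptions about Cognitive Services account naming that you should confirm against the Azure naming rules.

```python
import re
import uuid

# Assumptions: upper bound and allowed characters for account names.
MAX_NAME_LENGTH = 64
NAME_PATTERN = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")


def make_aacs_name(prefix: str) -> str:
    """Build a candidate resource name from a prefix and a random suffix."""
    name = f"{prefix}-aacs-{uuid.uuid4().hex[:8]}"
    if len(name) > MAX_NAME_LENGTH or not NAME_PATTERN.match(name):
        raise ValueError(f"Invalid resource name: {name!r}")
    return name


candidate = make_aacs_name("llama-cs-test-guid1")
print(candidate)
```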

### 2. Create Azure AI Content Safety

In [None]:
from azure.mgmt.cognitiveservices.models import Account, Sku, AccountProperties


parameters = Account(sku=Sku(name=aacs_sku_name), kind=kind, location=aacs_location, properties= AccountProperties(custom_sub_domain_name=aacs_name, public_network_access="Enabled"))
# How many seconds to wait between checking the status of an async operation.
wait_time = 10

def find_acs(accounts):
    return next(x for x in accounts if x.kind == 'ContentSafety' and x.location == aacs_location and x.sku.name == aacs_sku_name)

try:
    # Check whether an AACS resource with this name already exists
    aacs = acs_client.accounts.get(resource_group, aacs_name)
    print(f"Found existing Azure AI content safety Account {aacs.name}.")
except Exception:
    try:
        # Otherwise, look for a matching AACS resource in the same resource group
        aacs = find_acs(acs_client.accounts.list_by_resource_group(resource_group))
        print(f"Found existing Azure AI content safety Account {aacs.name} in resource group {resource_group}.")
    except Exception:
        # Nothing to reuse (find_acs raises StopIteration), so create a new resource
        print(f"Creating Azure AI content safety Account {aacs_name}.")
        acs_client.accounts.begin_create(resource_group, aacs_name, parameters).wait()
        print("Resource created.")
        aacs = acs_client.accounts.get(resource_group, aacs_name)



aacs_endpoint = aacs.properties.endpoint
aacs_resource_id = aacs.id
print(f"AACS endpoint is {aacs_endpoint}")
print(f"AACS ResourceId is {aacs_resource_id}")

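aacs_access_key = acs_client.accounts.list_keys(resource_group_name=resource_group, account_name=aacs.name).key1
# Avoid printing the full access key: notebook output is often saved and shared
print(f"AACS access key retrieved (ends with ...{aacs_access_key[-4:]})")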
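
To show what the endpoint and key retrieved above are used for, here is a minimal sketch of a text moderation request built with the standard library only. It is an illustration, not the notebook's score.py. The `text:analyze` path, the `2023-10-01` API version, and the `categoriesAnalysis` response field are assumptions based on the generally available Content Safety REST API, and a preview API version may use a different response shape. `build_analyze_request` and `max_severity` are hypothetical helper names.

```python
import json
import urllib.request

API_VERSION = "2023-10-01"  # assumption: GA version of the text analysis API


def build_analyze_request(endpoint: str, key: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks AACS to analyze `text`."""
    url = f"{endpoint.rstrip('/')}/contentsafety/text:analyze?api-version={API_VERSION}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )


def max_severity(response: dict) -> int:
    """Return the highest severity across all categories in a parsed response."""
    items = response.get("categoriesAnalysis", [])
    return max((item.get("severity", 0) for item in items), default=0)


request = build_analyze_request(
    "https://example.cognitiveservices.azure.com/", "<key>", "Hello"
)
print(request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` and parse the body with `json.load`.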

### 3. Create Azure AI Content Safety enabled LLaMA online endpoint

#### 3.1 Check if LLaMA model is available in the AML registry.

In [None]:

reg_client = MLClient(credential, subscription_id=subscription_id, resource_group_name=resource_group, registry_name=registry_name)
version_list = list(reg_client.models.list(model_name)) # list available versions of the model
llama_model = None
if len(version_list) == 0:
    print("Model not found in registry")
else:
    model_version = version_list[0].version
    llama_model = reg_client.models.get(model_name, model_version)
    print(
        f"Using model name: {llama_model.name}, version: {llama_model.version}, id: {llama_model.id} for inferencing"
    )
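
The cell above uses the first entry returned by the registry, which is not necessarily the newest version. If you want the newest one explicitly, a small helper along these lines could choose it. This is a sketch: `latest_version` is a hypothetical name, and it assumes versions are mostly numeric strings such as `"1"` or `"10"`.

```python
def latest_version(versions):
    """Return the highest version, comparing numerically when possible.

    Numeric versions rank above non-numeric ones; non-numeric versions
    are compared as plain strings. Returns None for an empty list.
    """
    if not versions:
        return None

    def sort_key(version):
        if version.isdigit():
            return (1, int(version), "")
        return (0, 0, version)

    return max(versions, key=sort_key)


print(latest_version(["2", "10", "1"]))
```

With the notebook's variables, this would be called as `latest_version([m.version for m in version_list])`.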

#### 3.2 Create LLaMA online endpoint
This step may take a few minutes.

In [None]:
from azure.ai.ml.entities import ManagedOnlineEndpoint
# Check if the endpoint already exists in the workspace
try:
    endpoint = ml_client.online_endpoints.get(endpoint_name)
    print("---Endpoint already exists---")
except:
    # Create an online endpoint if it doesn't exist

    # Define the endpoint
    endpoint = ManagedOnlineEndpoint(name=endpoint_name, description="Test endpoint for model")

    # Trigger the endpoint creation
    try:
        ml_client.begin_create_or_update(endpoint).wait()
        print("\n---Endpoint created successfully---\n")
    except Exception as err:
        raise RuntimeError(f"Endpoint creation failed. Detailed Response:\n{err}") from err

#### 3.3 Create environment for LLaMA endpoint


In [None]:
from azure.ai.ml.entities import (
    Environment,
    BuildContext
)
try:
    env = ml_client.environments.get(environment_name, label="latest")
    print("---Environment already exists---")
except Exception:
    print("---Creating environment---")
    env = Environment(name=environment_name, build=BuildContext(path="./docker_env"))
    # Keep the returned entity so that `env` refers to the registered environment and its version
    env = ml_client.environments.create_or_update(env)
   

#### 3.4 Deploy LLaMA model
This step may take a few minutes.

In [None]:
from azure.ai.ml.entities import (
    CodeConfiguration,
    OnlineRequestSettings,
    ManagedOnlineDeployment
)

# Define the deployment
# Update the model version as necessary
deployment = ManagedOnlineDeployment(
    name=deployment_name,
    endpoint_name=endpoint_name,
    model=llama_model.id,
    instance_type=sku_name,
    instance_count=1,
    code_configuration=CodeConfiguration(
        code="./score", scoring_script="score.py"
    ),
    environment = env,
    environment_variables= {
        "MLFLOW_MODEL_FOLDER":"mlflow_model_folder",
        "CONTENT_SAFETY_ENDPOINT": aacs_endpoint,
        "CONTENT_SAFETY_KEY": aacs_access_key
    },
    request_settings= OnlineRequestSettings(request_timeout_ms=90000)
)

# Trigger the deployment creation
try:
    ml_client.begin_create_or_update(deployment).wait()
    print("\n---Deployment created successfully---\n")
except Exception as err:
    raise RuntimeError(f"Deployment creation failed. Detailed Response:\n{err}") from err
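
The deployment above hands the Content Safety endpoint and key to score.py, which moderates both the request and the response. The sketch below outlines that flow under stated assumptions and is not the actual score.py. `moderated_generate`, the `generate` and `severity_of` callbacks, and the threshold value are all hypothetical. Returning an empty string for flagged content matches the behavior described in the test section of this notebook.

```python
from typing import Callable, List

SEVERITY_THRESHOLD = 2  # hypothetical cut-off; tune it for your use case


def moderated_generate(
    prompts: List[str],
    generate: Callable[[str], str],
    severity_of: Callable[[str], int],
    threshold: int = SEVERITY_THRESHOLD,
) -> List[str]:
    """Moderate each prompt, call the model, then moderate the completion."""
    results = []
    for prompt in prompts:
        # Step 1: block unsafe user input before it reaches the model.
        if severity_of(prompt) >= threshold:
            results.append("")
            continue
        # Step 2: generate, then block unsafe model output.
        completion = generate(prompt)
        results.append("" if severity_of(completion) >= threshold else completion)
    return results


# Stubs stand in for the LLaMA model and the Content Safety call.
outputs = moderated_generate(
    ["Hello", "I wanna kill you"],
    generate=lambda prompt: prompt + " there",
    severity_of=lambda text: 4 if "kill" in text else 0,
)
print(outputs)
```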


### 4. Test the Safety Enabled LLaMA online endpoint.

In [None]:
import os

test_src_dir = "./safety-llama-test"
os.makedirs(test_src_dir, exist_ok=True)
print(f"test script directory: {test_src_dir}")
sample_data = os.path.join(test_src_dir, "sample-request.json")

In [None]:
import json
with open(sample_data, "w") as f:
    json.dump({
      "input_data": {
        "columns": [
          "input_string",
          "input_string",
          "input_string"
        ],
        "index": [0, 1, 2],
        "data": [
          "Hello",
          "My name is John and I ",
          "I wanna kill you"  # This prompt contains a hateful message, so the endpoint returns an empty string for it
        ]
      }
    }, f)
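
The request body written above follows a fixed pattern: one `input_string` column and one index entry per prompt. A small helper such as the following could build it for any list of prompts. `build_request` is a hypothetical name, and the structure simply mirrors the sample request in this notebook.

```python
import json


def build_request(prompts):
    """Build a request body with one column and one index entry per prompt."""
    return {
        "input_data": {
            "columns": ["input_string"] * len(prompts),
            "index": list(range(len(prompts))),
            "data": list(prompts),
        }
    }


payload = build_request(["Hello", "My name is John and I "])
print(json.dumps(payload, indent=2))
```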
      


In [None]:
ml_client.online_endpoints.invoke(
        endpoint_name=endpoint_name, 
        deployment_name=deployment_name,
        request_file=sample_data)