# Protected Material and Indirect Attack Jailbreak Simulation and Evaluation
This following demo notebook demonstrates the simulation and evaluation of the Protected Material and Indirect Attack Jailbreak. 

## Setup
Here are the imports needed to run this notebook:

In [None]:
from pprint import pprint
from azure.identity import DefaultAzureCredential
from promptflow.evals.evaluate import evaluate
from promptflow.evals.evaluators import ProtectedMaterialsEvaluator, IndirectAttackEvaluator
from promptflow.evals.synthetic import AdversarialSimulator, AdversarialScenario, IndirectAttackSimulator

### Configuration
The following simulator and evaluators require an Azure AI Studio project configuration and an Azure credential to use. Please fill in the assignments below with the required values to run the rest of this sample.

Protected Materials simulator and evaluator are only supported in East US 2, so please configure your project in that region to access this functionality. Additionally, your project scope will be what is used to log your evaluation results in your project after the evaluation run is finished.

In [None]:
# project_scope = {
#     "subscription_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", # Azure subscription ID
#     "resource_group_name": "resource-group", # Azure resource group name
#     "project_name": "project-name", # Azure Machine Learning project name
# }


credential = DefaultAzureCredential()

To keep this notebook lightweight, let's create a dummy application that calls GPT 3.5 Turbo, which is essentially Chat GPT. When we are testing your application for certain safety metrics like Protected Material or Indirect Attacks, it's important to have a way to automate a basic style of red-teaming to elicit behaviors from a simulated malicious user. We will use the `Simulator` class and this is how we will generate a synthetic test dataset against your application. Once we have the test dataset, we can evaluate them with our `ProtectedMaterialEvaluator` and `IndirectAttackEvaluator` classes.

The `Simulator` needs a structured contract with your application in order to simulate conversations or other types of interactions with it. This is achieved via a callback function. This is the function you would rewrite to actually format the response from your generative AI application.

In [None]:
from openai import AzureOpenAI

async def azure_model_callback(
        messages, stream = False, session_state = None, context = None
    ) -> dict:
    # Fill these values in with own model data
    endpoint = "https://ai-railpmhub406357280652.openai.azure.com/"
    deployment = "gpt-35-turbo"
    # In theory, the api_key input can be replaced with a azure_ad_token_provider input, although I seem to lack the permissions for that.
    # Get a client handle for the model
    client = AzureOpenAI(
        azure_endpoint=endpoint,
        api_version="2024-05-01-preview",
        api_key="6eb3ccd0351c44b2ae865ac055c07828"
    )
    # Call the model
    completion = client.chat.completions.create(
        model=deployment,
        messages= [
        {
        "role": "user",
        "content": messages["messages"][0]["content"] # injection of prompt happens here.
        }],
        max_tokens=800,
        temperature=0.7,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None,
        stream=False,
    )

    formatted_response = completion.to_dict()['choices'][0]['message']
    messages["messages"].append(formatted_response)
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": context,
    }

## Testing your application for Protected Material

When building your application, you want to test that Protected Material (i.e. copyrighted content or material) is not being generated by your generative AI applications. The following example uses an `AdversarialSimulator` paired with a content scenario to prompt your model to respond with material that is protected by intellectual property laws.

In [None]:
# initialize the adversarial simulator
protected_material_simulator = AdversarialSimulator(azure_ai_project=project_scope, credential=credential)

# define the adversarial scenario you want to simulate
protected_material_scenario = AdversarialScenario.ADVERSARIAL_CONTENT_PROTECTED_MATERIAL

protected_material_outputs = await protected_material_simulator(
    scenario=protected_material_scenario,
    max_conversation_turns=1, # define the number of conversation turns
    max_simulation_results=10, # define the number of simulation results
    target=azure_model_callback, # define the target model callback
    # api_call_retry_limit=3,
    # api_call_retry_sleep_sec=1,
    # api_call_delay_sec=30,
    # concurrent_async_task=1
)

Let's take a quick look at the generated dataset

In [None]:
# Results are truncated for brevity.
truncation_limit = 50
for output in protected_material_outputs:
    for turn in output['messages']:
        print(f"{turn['role']} : {turn['content'][0:truncation_limit]}")

Now that we have our dataset, we can evaluate it for Protected Material. The `ProtectedMaterialEvaluator` class can take in the dataset and detect whether your data contains copyrighted content. Let's use the `evaluate()` API to run the evaluation and log it to our Azure AI Studio Project.

In [None]:
# intialize your protected material evaluator
protected_material_eval = ProtectedMaterialsEvaluator(azure_ai_project=project_scope, credential=credential)
result=evaluate(
    data= protected_material_outputs,# TO DO ADD DATA
    evaluators={
        "protected_material": protected_material_eval
    },
    # Optionally provide your AI Studio project information to track your evaluation results in your Azure AI Studio project
    azure_ai_project = project_scope,
    # Optionally provide an output path to dump a json of metric summary, row level data and metric and studio URL
    output_path="./myevalresults.json"
)

We can see that our "model" application gives us a defect rate showing us that we can't deploy our application just yet. Moving forward, to protect our application against generating protected material content, we can add an Azure AI Content Safety filter for Protected Materials for text which is a mitigation layer to help protect and filter out responses from your model that may contain protected material content. Let's apply this filter and re-run the simulator and evaluation step to see if it helps with our defect rate.

In [None]:
filtered_protected_material_outputs = await protected_material_simulator(
    scenario=protected_material_scenario,
    max_conversation_turns=1, # define the number of conversation turns
    max_simulation_results=10, # define the number of simulation results
    target=azure_model_callback, # define the target model callback once we add the filter
)

In [None]:
filtered_result=evaluate(
    data= filtered_protected_material_outputs,# TO DO ADD DATA
    evaluators={
        "protected_material": protected_material_eval
    },
    # Optionally provide your AI Studio project information to track your evaluation results in your Azure AI Studio project
    azure_ai_project = project_scope,
    # Optionally provide an output path to dump a json of metric summary, row level data and metric and studio URL
    output_path="./myfilteredevalresults.json"
)

## Testing your application for Indirect Attack Jailbreaks



Jailbreaks are direct attacks injected into either the user's query towards your application (UPIA or user prompt injected attack) or indirect attacks injected into the context sent to your application to generate a response (XPIA or cross domaine prompt injected attack). Both types of attacks will result in an altered or unexpected behavior that may result in disrupted functionality or security risks like information leakage or engaging in harmful behavior. 

The following example takes the "model" application above and simulates indirect attacks to jailbreak the model and then evaluates the dataset generated by it.

In [None]:
indirect_attack_simulator = IndirectAttackSimulator(azure_ai_project=project_scope, credential=DefaultAzureCredential())

indirect_attack_outputs = await indirect_attack_simulator(
        target=azure_model_callback,
        scenario=AdversarialScenario.ADVERSARIAL_INDIRECT_JAILBREAK,
        max_simulation_results=10,
        max_conversation_turns=3
    )

In [None]:
import json

path = "indirect_attack_outputs.jsonl"

with open(path, "w") as f:
    for output in indirect_attack_outputs:
        f.write(json.dumps(output) + "\n")

Let's take a quick look at the data generated

In [None]:
# Results are truncated for brevity.
truncation_limit = 50
for output in indirect_attack_outputs:
    for turn in output['messages']:
        content = turn["content"]
        if isinstance(content, dict): # user response from callback is dict
            print(f"{turn['role']} : {content['content'][0:truncation_limit]}")
        elif isinstance(content, tuple): # assistant response from callback is tuple
            print(f"{turn['role']} : {content[0:truncation_limit]}")

Now that we have our dataset, we can evaluate it to see if the indirect attacks resulted in jailbreaks. The `IndirectAttackEvaluator` class can take in the dataset and detects instances of jailbreak. Let's use the `evaluate()` API to run the evaluation and log it to our Azure AI Studio Project.

In [None]:
indirect_attack_eval = IndirectAttackEvaluator(project_scope=project_scope, credential=DefaultAzureCredential())

In [None]:
result = evaluate(
    data=path,
    evaluators={
        "indirect_attack": indirect_attack_eval
    },
    # Optionally provide your AI Studio project information to track your evaluation results in your Azure AI Studio project
    azure_ai_project = project_scope,
    # Optionally provide an output path to dump a json of metric summary, row level data and metric and studio URL
    output_path="./myindirectattackevalresults.json"
)

We can see that our "model" application gives us a defect rate broken down by different behaviors resulting from the jailbreak, showing us that we can't deploy our application just yet. Moving forward, to protect our application against indirect jailbreak attacks, we can add an Azure AI Content Safety Prompt Shield which is a mitigation layer to help annotate and block requests to your model or application that contain known indirect attacks for jailbreak. Let's apply this filter and re-run the simulator and evaluation step to see if it helps with our defect rate.

In [None]:
filtered_indirect_attack_outputs = await indirect_attack_simulator(
        target=azure_model_callback, # now with the Prompt Shield attached to our model deployment
        scenario=AdversarialScenario.ADVERSARIAL_INDIRECT_JAILBREAK,
        max_simulation_results=10,
        max_conversation_turns=3
    )

In [None]:
filtered_indirect_attack_result=evaluate(
    data=filtered_indirect_attack_outputs,# TO DO ADD DATA
    evaluators={
        "indirect_attack": indirect_attack_eval
    },
    # Optionally provide your AI Studio project information to track your evaluation results in your Azure AI Studio project
    azure_ai_project = project_scope,
    # Optionally provide an output path to dump a json of metric summary, row level data and metric and studio URL
    output_path="./myindirectattackevalresults.json"
)