# Prompt Injection Rule Demonstration Notebook

This notebook demonstrates what the Shield Prompt Injection Rule can detect, focusing on two kinds of prompt injection:

- Role play
- Instruction manipulation

Prerequisites:
- A Shield environment and API key.

The notebook walks through the following steps:

1. Create a new task for Prompt Injection evaluation 
2. Arthur Benchmark dataset evaluation 
   1. Run the examples against a pre-configured Shield task from Step 1 
   2. View our results 
3. Additional examples evaluation using datasets referenced in our documentation: https://shield.docs.arthur.ai/docs/prompt-injection#benchmarks
   1. Run the examples against a pre-configured Shield task from Step 1 
   2. View our results

#### Configure Shield Test Env Details

In [None]:
%pip install datasets
%pip install scikit-learn
%pip install matplotlib

In [None]:
from datasets import load_dataset, concatenate_datasets
import pandas as pd
from os.path import abspath, join
import sys
from datetime import datetime

utils_path = abspath(join('..', 'utils'))
if utils_path not in sys.path:
    sys.path.append(utils_path)

from shield_utils import setup_env, set_up_task_and_rule, run_shield_evaluation
from analysis_utils import print_performance_metrics, granular_result_dfs

pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

setup_env(base_url="<URL>", api_key="<API_KEY>")
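
Pasting the URL and key into the cell above makes them easy to leak when the notebook is shared. A minimal sketch of reading them from the environment instead; the variable names `SHIELD_BASE_URL` and `SHIELD_API_KEY` and the helper `read_shield_settings` are hypothetical, chosen only for illustration:

```python
import os


def read_shield_settings(environ=os.environ):
    """Return (base_url, api_key) read from environment variables.

    The variable names are assumptions made for this sketch, not names
    that Shield or shield_utils define.
    """
    required = ("SHIELD_BASE_URL", "SHIELD_API_KEY")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return environ["SHIELD_BASE_URL"], environ["SHIELD_API_KEY"]
```

The two values could then be passed on as `setup_env(base_url=base_url, api_key=api_key)`.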

---
### 1. Setup: Configure a test task and enable the Prompt Injection Rule

In [None]:
prompt_injection_rule_config =  {
    "name": "Prompt Injection Rule",
    "type": "PromptInjectionRule",
    "apply_to_prompt": True,
    "apply_to_response": False
}

# Create the task, archive every rule except the one passed in, then create that rule
prompt_injection_rule, prompt_injection_task = set_up_task_and_rule(prompt_injection_rule_config, "prompt-injection-task")

print(prompt_injection_rule)
print(prompt_injection_task)

---
### 2. Arthur benchmark dataset evaluation

In [13]:
prompt_injection_benchmark_df_arthur = pd.read_csv("./arthur_benchmark_datasets/prompt_injection_benchmark.csv")

prompt_injection_benchmark_df_arthur["label"] = prompt_injection_benchmark_df_arthur["labels"].map({0: False, 1: True})

#### 2.1 Run the examples against a pre-configured Shield task from Step 1

In [None]:
prompt_injection_benchmark_df_arthur = run_shield_evaluation(prompt_injection_benchmark_df_arthur, prompt_injection_task, prompt_injection_rule)
current_datetime = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
prompt_injection_benchmark_df_arthur.to_csv(f"./results/prompt_injection_benchmark_df_arthur_{current_datetime}.csv")
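
`DataFrame.to_csv` does not create missing directories, so the save in this cell fails if `./results` does not already exist. A minimal sketch of a helper that creates the directory and builds a timestamped, filesystem-safe file name; `timestamped_results_path` and its parameters are hypothetical names, not part of `shield_utils`:

```python
from datetime import datetime
from pathlib import Path


def timestamped_results_path(stem, results_dir="./results", now=None):
    """Return results_dir/<stem>_<timestamp>.csv, creating results_dir if needed."""
    now = now or datetime.now()
    directory = Path(results_dir)
    # parents/exist_ok make the call safe on both first and repeated runs
    directory.mkdir(parents=True, exist_ok=True)
    # Hyphens and underscores only: colons are not valid in Windows file names
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return directory / f"{stem}_{stamp}.csv"
```

With it, the save could read `df.to_csv(timestamped_results_path("prompt_injection_benchmark_df_arthur"))`.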

#### 2.2 Analyze Results

In [None]:
print_performance_metrics(prompt_injection_benchmark_df_arthur)

arthur_fn, arthur_fp, arthur_tp, arthur_tn = granular_result_dfs(prompt_injection_benchmark_df_arthur)

In [None]:
arthur_fp
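
`print_performance_metrics` and `granular_result_dfs` live in the local `analysis_utils` module, whose source is not shown here. As a rough illustration of what the four returned buckets mean and how summary metrics follow from them, the sketch below works on plain dictionaries; the `prediction` key is an assumed name for the rule's verdict, not necessarily the column that `run_shield_evaluation` writes:

```python
def bucket_results(rows, label_key="label", prediction_key="prediction"):
    """Split rows into true/false positives/negatives.

    "Positive" means the rule flagged the prompt as an injection.
    Both key names are assumptions made for this sketch.
    """
    buckets = {"tp": [], "fp": [], "fn": [], "tn": []}
    for row in rows:
        actual, predicted = bool(row[label_key]), bool(row[prediction_key])
        if predicted:
            buckets["tp" if actual else "fp"].append(row)
        else:
            buckets["fn" if actual else "tn"].append(row)
    return buckets


def summarize(buckets):
    """Compute accuracy, precision, recall and F1 from the four buckets."""
    tp, fp, fn, tn = (len(buckets[k]) for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

False positives are benign prompts that the rule flagged; false negatives are injections that it missed.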

---
### 3. Load benchmark datasets from https://shield.docs.arthur.ai/docs/prompt-injection#benchmarks

- **DISCLAIMER**: This is for demonstration and guidance purposes only and does not reflect the performance of the model behind the Shield score, as sampling techniques may not be optimal.
- **DISCLAIMER 2**: This dataset contains German prompts, and the Arthur prompt injection model is trained only on English. We do our best to filter out the German entries, but some may slip through.

In [None]:
%pip install langdetect

In [10]:
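from langdetect import DetectorFactory, detect

# langdetect is non-deterministic by default; fix the seed so the filter is reproducible
DetectorFactory.seed = 0

def is_english(text):
    """Return True if langdetect classifies the text as English."""
    try:
        return detect(text) == 'en'
    except Exception:
        # detect() raises on text it cannot classify (e.g. empty strings); treat those rows as non-English
        return False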

number_samples_from_each = 50

prompt_injections_dataset = load_dataset("deepset/prompt-injections")

prompt_injections_dataset_combined = concatenate_datasets([prompt_injections_dataset['train'], prompt_injections_dataset['test']])

prompt_injections_dataset_combined_df = prompt_injections_dataset_combined.to_pandas()

# Filter out non-English entries
prompt_injections_dataset_combined_df_en = prompt_injections_dataset_combined_df[prompt_injections_dataset_combined_df['text'].apply(is_english)]

# Drop incomplete rows before sampling so the sample is not shrunk afterwards
prompt_injections_dataset_combined_df_en_sample = prompt_injections_dataset_combined_df_en.dropna().sample(frac=0.5, random_state=33).head(number_samples_from_each)

prompt_injections_dataset_combined_df_en_sample['label'] = prompt_injections_dataset_combined_df_en_sample['label'].map({0: False, 1: True})
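
The `sample(...).head(...)` call draws rows without regard to their label, so the share of injections in the 50-row sample is left to chance. One possible alternative is to draw a fixed number of rows per label. The sketch below shows the idea on plain dictionaries; `stratified_sample` is a hypothetical helper, not something the notebook or its utilities provide:

```python
import random
from collections import defaultdict


def stratified_sample(rows, per_label, label_key="label", seed=33):
    """Draw up to `per_label` rows for each distinct label, reproducibly."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    sample = []
    # Iterate labels in sorted order so the result does not depend on input order of groups
    for label in sorted(by_label):
        group = by_label[label]
        # Take the whole group when it is smaller than the requested count
        sample.extend(rng.sample(group, min(per_label, len(group))))
    return sample
```

In pandas, `df.groupby('label').sample(n=..., random_state=33)` offers a similar per-label draw, though it raises an error when a group has fewer than `n` rows.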

#### 3.1 Run the examples against a pre-configured Shield task from Step 1

In [13]:
prompt_injections_dataset_combined_df_en_sample = run_shield_evaluation(prompt_injections_dataset_combined_df_en_sample, prompt_injection_task, prompt_injection_rule)
current_datetime = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
prompt_injections_dataset_combined_df_en_sample.to_csv(f"./results/prompt_injections_dataset_combined_df_en_sample_{current_datetime}.csv")

#### 3.2 Analyze Results

In [None]:
print_performance_metrics(prompt_injections_dataset_combined_df_en_sample)

custom_fn, custom_fp, custom_tp, custom_tn = granular_result_dfs(prompt_injections_dataset_combined_df_en_sample)

In [None]:
custom_fn