# Sensitive Data Rule Demonstration Notebook

This notebook walks through the steps below to demonstrate what the Shield Sensitive Data Rule can do.

Prerequisites:
- A Shield environment and API key.

1. Create a new task for Sensitive Data evaluation
2. Evaluate the Arthur benchmark dataset (passwords dataset)
   1. Run the examples against the Shield task configured in Step 1
   2. Analyze the results

#### Configure Shield Test Env Details

In [None]:
%pip install datasets
%pip install scikit-learn
from datasets import load_dataset, concatenate_datasets
import pandas as pd
from os.path import abspath, join
import sys
from datetime import datetime 

utils_path = abspath(join('..', 'utils'))
if utils_path not in sys.path:
    sys.path.append(utils_path)

from shield_utils import setup_env, set_up_task_and_rule, run_shield_evaluation
from analysis_utils import print_performance_metrics, granular_result_dfs


pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

setup_env(base_url="<URL>", api_key="<API_KEY>")
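
To avoid pasting the URL and API key into the notebook, the two values can be read from environment variables before calling `setup_env`. The snippet below is a minimal sketch of that idea. The helper `read_shield_credentials` and the variable names `SHIELD_BASE_URL` and `SHIELD_API_KEY` are assumptions made for this example and are not part of `shield_utils`.

```python
import os
from typing import Mapping, Tuple


def read_shield_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (base_url, api_key) from a mapping, failing loudly if either is missing."""
    # SHIELD_BASE_URL and SHIELD_API_KEY are hypothetical names chosen for this sketch.
    required = ("SHIELD_BASE_URL", "SHIELD_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["SHIELD_BASE_URL"], env["SHIELD_API_KEY"]


# Usage in this notebook (commented out because it needs real credentials):
# base_url, api_key = read_shield_credentials()
# setup_env(base_url=base_url, api_key=api_key)
```

Passing the mapping as a parameter keeps the helper easy to check with a plain dictionary instead of the real environment.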

---
### 1. Setup: Configure a test task and enable the Sensitive Data Rule

In [None]:
sensitive_data_password_rule = {
    "name": "Sensitive Data Password Rule",
    "type": "ModelSensitiveDataRule",
    "apply_to_prompt": True,
    "apply_to_response": False,
    "config": {
        "examples": [
            {
                "example": "Hi, Your password 5d9kpQDXRiVf for our online therapy portal has been set successfully.",
                "result": True
            },
            {
                "example": "Mr. Conroy's password, h78WwcpT3s6B has leaked online.",
                "result": True
            },
            {
                "example": "During this time, your login credentials Remington31 and VXBDni7EO5od will be reset..",
                "result": True
            },
            {
                "example": "Hey Agnes, your login details for the upcoming clinical trial are as follows: IP: 73ac:d53c:f2bb:94a5:86f3:1dca:e1ec:2b4e, Password: WlBi79j58UsW.",
                "result": True
            },
            {
                "example": "Can't access database for financial report. Hassie34's password last updated to mWGhCVGWuB8o.",
                "result": True
            },
            {
                "example": "Linnea_OConnell, please submit your final Psychopharmacology thesis for review by 11/32.",
                "result": False
            },
            {
                "example": "Dear client, thank you for your payment of NT$200 for our acupuncture services.",
                "result": False
            },
            {
                "example": "Surely, Nolan and Sons would not want to restrict its healthcare services to people less than 91 years old years of age in the 40753 region?",
                "result": False
            },
            {
                "example": "The name tag for the new staff member, Angel has arrived. You will find it in the Apt. 634.",
                "result": False
            },
            {
                "example": "Your password has been reset.",
                "result": False
            }
        ]
    }
}

# Create the task, archive every rule except the one passed in, then create that rule
sensitive_data_password_rule, sensitive_data_password_task = set_up_task_and_rule(sensitive_data_password_rule, "sensitive-data-password-task")

print(sensitive_data_password_rule)
print(sensitive_data_password_task)
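
The rule is driven by its labeled examples, so it can help to check the dictionary's shape and label balance before sending it to Shield. The following is a small sketch of such a check. `summarize_examples` and `demo_rule` are hypothetical names introduced here, and the check assumes nothing about Shield beyond the `config.examples` layout shown in the cell above. Because that cell rebinds `sensitive_data_password_rule` to the value returned by `set_up_task_and_rule`, a check like this would need to run on the dictionary before that call.

```python
from collections import Counter


def summarize_examples(rule: dict) -> Counter:
    """Validate each labeled example and count positive vs. negative labels."""
    counts = Counter()
    for i, item in enumerate(rule["config"]["examples"]):
        text = item.get("example")
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"example {i} has no text")
        if not isinstance(item.get("result"), bool):
            raise ValueError(f"example {i} needs a boolean 'result'")
        counts[item["result"]] += 1
    return counts


# A two-example stand-in with the same shape as the rule defined above
demo_rule = {
    "config": {
        "examples": [
            {"example": "Mr. Conroy's password, h78WwcpT3s6B has leaked online.", "result": True},
            {"example": "Your password has been reset.", "result": False},
        ]
    }
}
print(summarize_examples(demo_rule))
```

The rule in this notebook pairs five positive examples with five negative ones; a count like this makes any imbalance visible if the examples are edited later.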

---
### 2. Arthur benchmark dataset evaluation

**Remove the second line of the next cell (the `.head(10)` call) to evaluate the entire dataset. A full run takes time and consumes AOAI credits.**

In [3]:
sensitive_data_password_benchmark_df_arthur = pd.read_csv("./arthur_benchmark_datasets/sensitive_data_password.csv")

sensitive_data_password_benchmark_df_arthur = sensitive_data_password_benchmark_df_arthur.head(10)

#### 2.1 Run the examples against the pre-configured Shield task from Step 1

In [6]:
sensitive_data_password_benchmark_df_arthur = run_shield_evaluation(sensitive_data_password_benchmark_df_arthur, sensitive_data_password_task, sensitive_data_password_rule)
# Use a timestamp without spaces or colons so the filename is valid on every operating system
current_datetime = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
sensitive_data_password_benchmark_df_arthur.to_csv(f"./results/sensitive_data_password_benchmark_df_arthur_{current_datetime}.csv")

#### 2.2 Analyze Results

In [None]:
print_performance_metrics(sensitive_data_password_benchmark_df_arthur)

arthur_fn, arthur_fp, arthur_tp, arthur_tn = granular_result_dfs(sensitive_data_password_benchmark_df_arthur)
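
For readers who want to see how the four result groups relate to summary metrics, here is a minimal sketch that derives precision, recall, F1, and accuracy from confusion-matrix counts. It uses the standard definitions and is not necessarily what `print_performance_metrics` computes internally. `classification_metrics` is a hypothetical helper, and the counts passed to it are illustrative, not benchmark results.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    # Guard each ratio against an empty denominator
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Illustrative counts only, not results from the benchmark
print(classification_metrics(tp=4, fp=1, fn=1, tn=4))
```

Assuming the four objects returned by `granular_result_dfs` are DataFrames, their row counts (for example `len(arthur_tp)`) could be passed in as the arguments.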