# Ensuring no Personally Identifiable Information (PII) is leaked

<img src="shared_data/PII.png" width="700"/>

Start by setting up the notebook to minimize warnings and importing the required libraries:

In [1]:
# Warning control
import warnings
warnings.filterwarnings("ignore")
%env TOKENIZERS_PARALLELISM=true

env: TOKENIZERS_PARALLELISM=true


In [2]:
# Type hints
from typing import Optional, Any, Dict

# Standard imports
import time
from openai import OpenAI



# Presidio imports
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

# Guardrails imports
from guardrails import Guard, OnFailAction, install
from guardrails.validator_base import (
    FailResult,
    PassResult,
    ValidationResult,
    Validator,
    register_validator,
)


## Using Microsoft Presidio to detect PII

You'll use two components of Microsoft Presidio: the **analyzer**, which identifies PII in a given text, and the **anonymizer**, which can mask out PII in text:

In [3]:
presidio_analyzer = AnalyzerEngine()
presidio_anonymizer = AnonymizerEngine()

See the analyzer in action:

In [4]:
# First, let's analyze the text
text = "Thank you for contacting us, Sarah! I've located your account using the email address sarah.johnson@gmail.com. I can see that your order #45821 was shipped to 156 Maple Street Austin. Is there anything else I can help you with today?"
analysis = presidio_analyzer.analyze(text, language='en')

In [9]:
analysis

[type: EMAIL_ADDRESS, start: 86, end: 109, score: 1.0,
 type: PERSON, start: 29, end: 34, score: 0.85,
 type: LOCATION, start: 163, end: 175, score: 0.85,
 type: LOCATION, start: 163, end: 182, score: 0.85,
 type: DATE_TIME, start: 227, end: 232, score: 0.85,
 type: URL, start: 86, end: 94, score: 0.5,
 type: URL, start: 100, end: 109, score: 0.5,
 type: IN_PAN, start: 14, end: 24, score: 0.05]
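
Each result is a character span into `text` plus a confidence score. As a quick sanity check, the sketch below slices the original string with the offsets printed above and drops low-confidence hits. It uses plain tuples copied by hand from that output rather than Presidio's result objects, and the `0.4` cutoff is an arbitrary value chosen for illustration.

```python
# The same text that was passed to the analyzer.
text = "Thank you for contacting us, Sarah! I've located your account using the email address sarah.johnson@gmail.com. I can see that your order #45821 was shipped to 156 Maple Street Austin. Is there anything else I can help you with today?"

# (entity_type, start, end, score), copied by hand from the analyzer output above.
spans = [
    ("EMAIL_ADDRESS", 86, 109, 1.0),
    ("PERSON", 29, 34, 0.85),
    ("LOCATION", 163, 175, 0.85),
    ("LOCATION", 163, 182, 0.85),
    ("DATE_TIME", 227, 232, 0.85),
    ("URL", 86, 94, 0.5),
    ("URL", 100, 109, 0.5),
    ("IN_PAN", 14, 24, 0.05),
]

# Map each span back to the substring it covers.
matched = [(etype, text[start:end], score) for etype, start, end, score in spans]
for etype, snippet, score in matched:
    print(f"{etype:<14} {score:<5} {snippet!r}")

# Keep only hits at or above an illustrative confidence cutoff.
MIN_SCORE = 0.4  # assumption: arbitrary threshold for this example
confident = [m for m in matched if m[2] >= MIN_SCORE]
```

The lowest-scoring hit, `IN_PAN`, covers the word "contacting", a false positive that the cutoff removes. `AnalyzerEngine.analyze` also accepts `entities` and `score_threshold` arguments if you prefer to filter at the source.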

Now try out the anonymizer:

In [10]:
# Then, we can anonymize the text using the analysis output
print(presidio_anonymizer.anonymize(text=text, analyzer_results=analysis))

text: Thank you for <IN_PAN> us, <PERSON>! I've located your account using the email address <EMAIL_ADDRESS>. I can see that your order #45821 was shipped to 156 <LOCATION>. Is there anything else I can help you with <DATE_TIME>?
items:
[
    {'start': 211, 'end': 222, 'entity_type': 'DATE_TIME', 'text': '<DATE_TIME>', 'operator': 'replace'},
    {'start': 156, 'end': 166, 'entity_type': 'LOCATION', 'text': '<LOCATION>', 'operator': 'replace'},
    {'start': 87, 'end': 102, 'entity_type': 'EMAIL_ADDRESS', 'text': '<EMAIL_ADDRESS>', 'operator': 'replace'},
    {'start': 27, 'end': 35, 'entity_type': 'PERSON', 'text': '<PERSON>', 'operator': 'replace'},
    {'start': 14, 'end': 22, 'entity_type': 'IN_PAN', 'text': '<IN_PAN>', 'operator': 'replace'}
]
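
The anonymized text contains a single `<LOCATION>` and no `<URL>` placeholders, so overlapping detections were resolved before replacement, and the `items` list runs from the end of the text to the start. The sketch below is one way to reproduce that result in plain Python. It is not Presidio's implementation: `resolve_overlaps` and `replace_spans` are hypothetical helper names, and the overlap policy (keep the higher score, then the longer span) is an assumption.

```python
from typing import List, Tuple

Span = Tuple[str, int, int, float]  # (entity_type, start, end, score)

def resolve_overlaps(spans: List[Span]) -> List[Span]:
    """Greedily keep the best span among any that overlap (assumed policy)."""
    kept: List[Span] = []
    # Highest score first; ties go to the longer span.
    for span in sorted(spans, key=lambda s: (-s[3], -(s[2] - s[1]))):
        _, start, end, _ = span
        if all(end <= k[1] or start >= k[2] for k in kept):
            kept.append(span)
    return kept

def replace_spans(text: str, spans: List[Span]) -> str:
    """Replace each kept span with <ENTITY_TYPE>, working right to left."""
    ordered = sorted(resolve_overlaps(spans), key=lambda s: s[1], reverse=True)
    # Starting from the end keeps the earlier offsets valid after each replacement.
    for etype, start, end, _ in ordered:
        text = text[:start] + f"<{etype}>" + text[end:]
    return text

text = "Thank you for contacting us, Sarah! I've located your account using the email address sarah.johnson@gmail.com. I can see that your order #45821 was shipped to 156 Maple Street Austin. Is there anything else I can help you with today?"

# Spans copied by hand from the analyzer output shown earlier.
spans = [
    ("EMAIL_ADDRESS", 86, 109, 1.0),
    ("PERSON", 29, 34, 0.85),
    ("LOCATION", 163, 175, 0.85),
    ("LOCATION", 163, 182, 0.85),
    ("DATE_TIME", 227, 232, 0.85),
    ("URL", 86, 94, 0.5),
    ("URL", 100, 109, 0.5),
    ("IN_PAN", 14, 24, 0.05),
]

anonymized = replace_spans(text, spans)
print(anonymized)
```

Replacing from right to left matters because each placeholder has a different length from the text it replaces, which would shift every offset that follows it.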



## Real-time stream validation

Here you'll use the `DetectPII` validator from the Guardrails Hub to anonymize text generated by an LLM in real time.

First, set up a new guard that uses the `DetectPII` validator to check the **output** of the LLM. The `pii_entities` argument lists the entity types to look for. This time, you'll set `on_fail` to `"fix"`, which replaces the detected PII before it is shown to the user:

In [11]:
from guardrails.hub import DetectPII

guard = Guard().use(
    DetectPII(pii_entities=["PHONE_NUMBER", "EMAIL_ADDRESS"], on_fail="fix")
)

Now use the guard in a call to an LLM to anonymize the output. Passing `stream=True` runs the validator on the output as it streams in, so PII is replaced before each chunk is shown to the user:

In [12]:
from IPython.display import clear_output

validated_llm_req = guard(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a chatbot."},
        {
            "role": "user",
            "content": "Write a short 2-sentence paragraph about an unnamed protagonist while interspersing some made-up 10 digit phone numbers for the protagonist.",
        },
    ],
    stream=True,
)

validated_output = ""
for chunk in validated_llm_req:
    clear_output(wait=True)
    validated_output = "".join([validated_output, chunk.validated_output])
    print(validated_output)
    time.sleep(1)

The protagonist wandered the bustling city streets, lost in thought as they dialed <PHONE_NUMBER> on their phone, hoping for a connection that would bring clarity to their troubled mind. With each unanswered call to <PHONE_NUMBER>, they felt a sense of isolation growing within them, a reminder of the distance between themselves and the world around them.
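
PII can straddle chunk boundaries in a stream, so a check that looked at each raw chunk in isolation could miss a phone number split across two chunks. The sketch below illustrates one way to handle that: buffer chunks until a sentence ends, then redact. It is a plain-Python illustration, not how Guardrails or `DetectPII` is implemented; `redact_stream`, both regular expressions, and the sample chunks are made up for this example.

```python
import re
from typing import Iterable, Iterator

# Assumption: a "phone number" is 10 digits, optionally split 3-3-4 by '-', '.', or a space.
PHONE_RE = re.compile(r"\b\d{3}[-. ]?\d{3}[-. ]?\d{4}\b")
# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
SENTENCE_END = re.compile(r"[.!?]\s")

def redact_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield redacted text one complete sentence at a time."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete sentence currently in the buffer.
        match = SENTENCE_END.search(buffer)
        while match:
            sentence, buffer = buffer[:match.end()], buffer[match.end():]
            yield PHONE_RE.sub("<PHONE_NUMBER>", sentence)
            match = SENTENCE_END.search(buffer)
    if buffer:
        # Flush whatever is left when the stream ends.
        yield PHONE_RE.sub("<PHONE_NUMBER>", buffer)

# Made-up chunks in which the first number is split across a chunk boundary.
chunks = ["They dialed 555-01", "2-3456 twice. ", "Then 5550129876 rang."]
pieces = list(redact_stream(chunks))
print("".join(pieces))
```

The trade-off is latency: nothing is shown until a full sentence has arrived, in exchange for never displaying half of a number before it can be recognized.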
