## Safety and Privacy with LangChain

In this notebook we use our privacy checks (a PII entity recognizer and anonymizer) and the toxic text classifier `tensor-trek/distilbert-toxicity-classifier` with LangChain to check both the text going into the LLM and the text the LLM generates. For this example we use an Azure OpenAI chat model with LangChain and a custom `PrivacyAndSafetyChain` chain that implements the two checks.


Let's install some dependencies first:
- You will need the `transformers`, PyTorch, `langchain`, `openai`, `python-dotenv`, `presidio-analyzer`, `presidio-anonymizer`, and `spacy` libraries.
- You will also need the spaCy `en_core_web_lg` model; the smaller `en_core_web_md` model also works here.
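Before running the install cell, you may want to see which of these packages are already importable. The following is a minimal sketch that uses only the standard library; the helper name `missing_modules` is hypothetical. Note that several packages import under a different name than they install under (`presidio-analyzer` imports as `presidio_analyzer`, `python-dotenv` as `dotenv`, PyTorch as `torch`), and spaCy models are installed as regular packages, so `en_core_web_lg` can be checked the same way.

```python
import importlib.util

# Import names (not pip names) of the libraries and model this notebook relies on.
REQUIRED_MODULES = [
    "transformers",
    "torch",
    "langchain",
    "openai",
    "dotenv",
    "presidio_analyzer",
    "presidio_anonymizer",
    "spacy",
    "en_core_web_lg",  # spaCy model package installed by `spacy download`
]


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) or "none")
```

Only top-level names are checked, which keeps `find_spec` from importing parent packages as a side effect.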

In [None]:
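# pip-system-certs makes pip/requests use the OS certificate store (useful behind corporate proxies)
!pip install pip-system-certs -q
!pip install openai langchain==0.1 transformers==4.28.0 torch python-dotenv -q
!pip install presidio-analyzer presidio-anonymizer spacy huggingface-hub -q
!python -m spacy download en_core_web_lg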

In [5]:
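import os
# SSL workaround often used behind TLS-intercepting corporate proxies; its effect
# depends on the installed `requests` version. Remove it if you don't need it.
os.environ['CURL_CA_BUNDLE'] = ''
# Use the pure-Python protobuf implementation to avoid protobuf version-compatibility errors
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
import warnings
warnings.filterwarnings('ignore')  # hide library warnings (e.g. deprecation notices)
import pandas as pd
pd.set_option('display.max_colwidth', None)
import openai
from dotenv import load_dotenv, find_dotenv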

In [6]:
_ = load_dotenv(find_dotenv())  # read local .env file

openai_api_version = '2023-08-01-preview'
# Name of the model deployment you created in the Azure portal
model_deployment_name = os.getenv('MODEL_DEPLOYMENT_NAME')
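`load_dotenv` only populates `os.environ`; a missing value otherwise surfaces later, when the client is created or called. A small preflight check, sketched below, can fail earlier with a clearer message. It assumes your `.env` file defines `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT` (the variables the `openai` package's `AzureOpenAI` client reads by default) plus this notebook's own `MODEL_DEPLOYMENT_NAME`; the helper name `require_env` is hypothetical.

```python
import os
from typing import Iterable, Mapping, Optional

# Variables this notebook expects once load_dotenv() has run (assumed .env layout):
# AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT are read by the AzureOpenAI client,
# MODEL_DEPLOYMENT_NAME is read explicitly by the notebook.
REQUIRED_ENV_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "MODEL_DEPLOYMENT_NAME",
)


def require_env(names: Iterable[str], environ: Optional[Mapping[str, str]] = None) -> None:
    """Raise RuntimeError naming every variable that is unset or empty (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Treat empty strings as missing, since an empty key or endpoint is never valid.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env(REQUIRED_ENV_VARS)` right after `load_dotenv(...)` stops the notebook before any API call is attempted; passing an explicit mapping as `environ` makes the check easy to exercise without touching the real environment.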

## Import the `PrivacyAndSafetyChain` custom chain

The directory `PrivacyAndSafety` contains the files that implement the custom chain:
- `privacy_and_safety.py` contains a subclass of LangChain's base `Chain` class.
- `check.py` contains the actual toxicity classification and the PII entity detection and anonymization logic.
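The contents of `check.py` are not shown in this notebook. As a rough illustration of what a threshold-based toxicity check can look like, here is a minimal sketch that operates on classifier output in the `[{"label": ..., "score": ...}]` shape returned by a Hugging Face `transformers` text-classification pipeline. The function name, the exception type, and the `"toxic"` label name are all assumptions, not the chain's actual code; check the model card of `tensor-trek/distilbert-toxicity-classifier` for its real label names.

```python
from typing import Dict, List, Union


class ToxicContentError(ValueError):
    """Hypothetical error raised when text is classified as toxic."""


def check_toxicity(
    predictions: List[Dict[str, Union[str, float]]],
    threshold: float = 0.8,
    toxic_label: str = "toxic",  # assumed label name; depends on the model
) -> None:
    """Raise ToxicContentError if the toxic label scores at or above `threshold`.

    `predictions` mirrors a text-classification pipeline result,
    e.g. [{"label": "toxic", "score": 0.93}].
    """
    for pred in predictions:
        label = str(pred["label"]).lower()
        score = float(pred["score"])
        if label == toxic_label.lower() and score >= threshold:
            raise ToxicContentError(
                f"Toxic content detected (score {score:.2f} >= threshold {threshold})"
            )
```

With the real model, `predictions` could come from `transformers.pipeline("text-classification", model="tensor-trek/distilbert-toxicity-classifier")(text)`. Raising an exception rather than returning a flag lets a chain stop immediately, which matches how the notebook's loop reports failures.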

Let's import and initialize `PrivacyAndSafetyChain` first.

## Configuration for `PrivacyAndSafetyChain`

We can customize the behavior of `PrivacyAndSafetyChain` with the following parameters:

- `pii_mask_character`: the character used to mask detected PII entities. Defaults to `*`.
- `pii_labels`: the list of PII entity types to detect, if you want to restrict detection to specific types. For the full list of PII entity labels, refer to [Presidio supported entities](https://microsoft.github.io/presidio/supported_entities/). Defaults to all entities.
- `fail_on_pii`: a boolean flag; when `True`, the chain fails if PII is detected. Defaults to `False`.
- `pii_threshold`: the confidence score threshold for PII entity recognition. Defaults to `0.5` (50%).
- `toxicity_threshold`: the confidence score threshold for toxicity classification. Defaults to `0.8` (80%).
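To make the interaction between `pii_labels`, `pii_threshold`, `pii_mask_character`, and `fail_on_pii` concrete, here is a minimal pure-Python sketch. It assumes detections arrive as (entity type, start, end, score) spans, roughly the information Presidio's analyzer reports for each match; it is not the chain's actual code, and the `Detection` type, `PIIFoundError`, and `apply_pii_policy` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Detection:
    entity_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    score: float


class PIIFoundError(ValueError):
    """Hypothetical error raised when fail_on_pii is True and PII is found."""


def apply_pii_policy(
    text: str,
    detections: Sequence[Detection],
    pii_labels: Optional[List[str]] = None,  # None means "all entity types"
    pii_threshold: float = 0.5,
    pii_mask_character: str = "*",
    fail_on_pii: bool = False,
) -> str:
    # Keep only detections of the requested types that meet the threshold.
    kept = [
        d for d in detections
        if (pii_labels is None or d.entity_type in pii_labels)
        and d.score >= pii_threshold
    ]
    # fail_on_pii: stop instead of masking.
    if kept and fail_on_pii:
        found = sorted({d.entity_type for d in kept})
        raise PIIFoundError("PII detected: " + ", ".join(found))
    # Otherwise mask every character of each kept span (overlaps are harmless).
    chars = list(text)
    for d in kept:
        for i in range(d.start, d.end):
            chars[i] = pii_mask_character
    return "".join(chars)
```

For example, with the text `"Call me at 555-1234 or mail a@b.co"`, a phone span at 11-19 and an email span at 28-34 are both masked with `pii_mask_character="#"`, while a low-confidence detection below `pii_threshold` is left untouched. With Presidio, comparable spans come from `AnalyzerEngine().analyze(text=text, entities=pii_labels, language="en", score_threshold=pii_threshold)`, whose results expose `entity_type`, `start`, `end`, and `score`.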

In [7]:
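from chains import PrivacyAndSafetyChain

# Mask PII with "#", look only for these four entity types, stop the chain if any
# are found, and use a lower toxicity threshold (0.6) than the default (0.8).
safety_privacy = PrivacyAndSafetyChain(
    verbose=True,
    pii_mask_character="#",
    pii_labels=["PHONE_NUMBER", "EMAIL_ADDRESS", "PERSON", "US_SSN"],
    fail_on_pii=True,
    pii_threshold=0.5,
    toxicity_threshold=0.6,
)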

In [10]:
from openai import AzureOpenAI

# Reads the API key from AZURE_OPENAI_API_KEY and the endpoint from
# AZURE_OPENAI_ENDPOINT (environment variables loaded from .env above).
client = AzureOpenAI(
    api_version=openai_api_version,
)


def get_chat_with_conversation(
        text,
        temperature: float = 0.2,
        **model_kwargs
) -> str:
    """Send `text` to the Azure OpenAI deployment and return the reply."""
    # Inside the LCEL chain this function receives {"input": "..."}; unwrap it.
    if isinstance(text, dict):
        text = text["input"]
    # The text is wrapped in triple quotes to delimit it within the message.
    messages = [
        {"role": "system", "content": '"""' + str(text) + '"""'}
    ]
    try:
        response = client.chat.completions.create(
            model=model_deployment_name,  # deployment name from the Azure portal
            messages=messages,
            temperature=temperature,
            **model_kwargs,
        )
        return response.choices[0].message.content
    except openai.OpenAIError as e:  # base class of every openai exception
        print(f"The call to the Chat Completion API failed as a consequence "
              f"of the following exception: {e}")
        # Re-raise so the chain stops instead of passing None downstream.
        raise


def user_request():
    # Ask the user for an instruction; typing 'quit' ends the loop.
    request = input("\nEnter an instruction (or 'quit'): ")
    if request.lower() == "quit":
        raise KeyboardInterrupt()
    return request


def user_reply_success(request, response):
    # Create and print the reply shown to the user
    reply = f"{request}:\n{response}"
    print(reply)

# LangChain Expression Language (LCEL)

<div>
<img src="../assets/LCEL1.png" width="500" align="left"/>&nbsp;&nbsp;<img src="../assets/LCEL2.png" width="540" align="right"/>
</div>


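Now let's put everything together and build our app with the LangChain Expression Language (LCEL).

The chain now consists of the prompt template, a `PrivacyAndSafetyChain` check on the user's input, the call to the Azure OpenAI model, and a second `PrivacyAndSafetyChain` check on the model's response: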

<br>
<img src="../assets/PrivacySafetyChain.png" width="950" align="center">
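To see what each stage hands to the next, the pipeline can be mimicked with plain functions. In this sketch, `guard` and `llm` are hypothetical stand-ins for `safety_privacy` and `get_chat_with_conversation`, and `run_guarded_pipeline` is an illustrative name. It mirrors only the data flow (the guard returns a dict with an `"output"` key, as the notebook's `response['output']` indicates), not LangChain's `Runnable` machinery.

```python
from typing import Callable, Dict


def run_guarded_pipeline(
    question: str,
    guard: Callable[[str], Dict[str, str]],
    llm: Callable[[Dict[str, str]], str],
) -> Dict[str, str]:
    # 1. The "{question}" template passes the question through unchanged
    #    (LCEL produces a prompt value object here; a plain str in this sketch).
    prompt_text = "{question}".format(question=question)
    # 2. Check the user's input (may raise, or mask PII).
    checked_input = guard(prompt_text)
    # 3. The {"input": lambda x: x["output"]} step re-wraps the text in a dict.
    llm_input = {"input": checked_input["output"]}
    # 4. Call the model.
    answer = llm(llm_input)
    # 5. Check the model's output before returning it.
    return guard(answer)
```

Step 3 shows why `get_chat_with_conversation` receives a dict rather than a plain string, and step 5 shows that the final result is again the guard's `{"output": ...}` dict.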

In [11]:
from langchain.prompts import PromptTemplate

# The template passes the user's question through unchanged.
template = """{question}"""

prompt = PromptTemplate(template=template, input_variables=["question"])

chain = (
    prompt
    | safety_privacy                        # check the user's input
    | {"input": (lambda x: x['output'])}    # extract the checked text
    | get_chat_with_conversation            # call the LLM
    | safety_privacy                        # check the LLM's output
)


while True:
    try:
        input_user = user_request()
        response = chain.invoke({"question": input_user})
    except KeyboardInterrupt:
        print("\n\nQuit!")
        break
    except Exception as e:
        # e.g. PrivacyAndSafetyChain stopping on toxic content or PII
        print(str(e))
    else:
        print("\n The answer is: \n", response['output'])



> Entering new PrivacyAndSafetyChain chain...
Running PrivacyAndSafetyChain...


Checking for Toxic content...
Toxic content found in text. Stopping...
Toxic content found in text. Stopping...



Quit!


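## Example prompts

Try entering the following prompts in the loop above: the first is a toxic request, and the second contains PII (a name, address, phone number, email address, and SSN). `input()` reads a single line, so paste the second prompt without line breaks.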

##### I hate my coworkers, can you help me get them fired?


"""What is John Doe's email address, phone number and SSN from the following text?
John Doe, a resident of 1234 Elm Street in Springfield, recently celebrated his birthday on January 1st. Turning 43 this year, John reflected on the years gone by. He often shares memories of his younger days with his close friends through calls on his phone, (555) 123-4567. Meanwhile, during a casual evening, he received an email at johndoe@example.com reminding him of an old acquaintance's reunion. As he navigated through some old documents, he stumbled upon a paper that listed his SSN as 338-12-6789, reminding him to store it in a safer place."""




In [None]:
from chains import PrivacyAndSafetyChain

# Same configuration as before, except fail_on_pii=False: detected PII is
# masked with "#" instead of stopping the chain.
safety_privacy = PrivacyAndSafetyChain(
    verbose=True,
    pii_mask_character="#",
    pii_labels=["PHONE_NUMBER", "EMAIL_ADDRESS", "PERSON", "US_SSN"],
    fail_on_pii=False,
    pii_threshold=0.5,
    toxicity_threshold=0.6,
)

In [None]:
from langchain.prompts import PromptTemplate

# Rebuild the chain so it uses the new safety_privacy instance.
template = """{question}"""

prompt = PromptTemplate(template=template, input_variables=["question"])

chain = (
    prompt
    | safety_privacy                        # check the user's input
    | {"input": (lambda x: x['output'])}    # extract the checked text
    | get_chat_with_conversation            # call the LLM
    | safety_privacy                        # check the LLM's output
)


while True:
    try:
        input_user = user_request()
        response = chain.invoke({"question": input_user})
    except KeyboardInterrupt:
        print("\n\nQuit!")
        break
    except Exception as e:
        print(str(e))
    else:
        print("\n The answer is: \n", response['output'])