# Demo - PII Checker

In this demo, we show how to build a chain with the Azure OpenAI Service that prevents personally identifiable information (PII) from being exposed in generated answers.

The process uses Azure OpenAI to self-evaluate its answer, identify potential PII, and remediate it before the answer is returned to the end user.
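
The control flow of that process can be sketched without any model calls. This is a minimal illustration, not the notebook's implementation: `pii_safe_answer` and the three callables it takes are hypothetical names, and the choice to skip remediation when the PII list is empty is an assumption. In the notebook, each callable would correspond to one of the LLM-backed steps shown later.

```python
from typing import Callable


def pii_safe_answer(
    document_text: str,
    summarize: Callable[[str], str],
    find_pii: Callable[[str], str],
    redact: Callable[[str, str], str],
) -> str:
    """Run summarize -> find PII -> redact and return the remediated answer."""
    draft = summarize(document_text)   # Step 1: generate the answer
    pii_list = find_pii(draft)         # Step 2: self-evaluate for PII
    if not pii_list.strip():           # Assumption: empty list means nothing to redact
        return draft
    return redact(draft, pii_list)     # Final step: rewrite without the PII


# Stand-in callables so the sketch runs without an Azure OpenAI deployment.
def fake_summarize(text: str) -> str:
    return "Inspection of 123 Main Street found no issues."


def fake_find_pii(summary: str) -> str:
    return "- 123 Main Street (address)"


def fake_redact(summary: str, pii_list: str) -> str:
    return summary.replace("123 Main Street", "[REDACTED]")


print(pii_safe_answer("raw report text", fake_summarize, fake_find_pii, fake_redact))
```

Passing the steps in as callables keeps the orchestration testable on its own; the real chains can be swapped in once they are defined.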

# Generate Summaries for extracted documents

In this demo, we generate a single summary for a large document. The generated answer will likely include some PII, such as an address or phone number.


## Step 1: Generate a summary from an existing large document

In [2]:
import json
import os

from dotenv import load_dotenv
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.llms import AzureOpenAI

# Load environment variables such as DEPLOYMENT_NAME from a .env file.
load_dotenv()
file = 'data/SAMPLE-RESIDENTIAL-REPORT.json'

# temperature=0 keeps the output as deterministic as possible.
llm = AzureOpenAI(temperature=0, deployment_name=os.getenv('DEPLOYMENT_NAME'), max_tokens=1000)

with open(file) as f:
    file_json = json.load(f)

# Wrap each extracted page in a Document so the summarize chain can process it.
docs = [Document(page_content=page['page_content']) for page in file_json['content']]


In [3]:
from langchain.prompts import PromptTemplate
prompt_template = """Below is an environmental analysis report. Review the details and summarize the claim details, cost, and action plan:


"{text}"


CONCISE SUMMARY:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
# map_reduce summarizes each page separately (map), then combines the partial
# summaries into one (reduce); the same prompt is used for both stages.
summary_chain = load_summarize_chain(llm,
                                     chain_type="map_reduce",
                                     map_prompt=PROMPT,
                                     combine_prompt=PROMPT)
summary_response = summary_chain({"input_documents": docs}, return_only_outputs=True)
print(summary_response['output_text'])

 Kaizen Safety Solutions, LLC conducted an environmental analysis of 123 Main Street, a single family residence that had been damaged by a fire. The analysis found that the air quality was within acceptable limits, but that there were elevated levels of asbestos and lead present in the soil. The estimated cost of the project is $25,000, and the action plan includes the removal and replacement of the soil, as well as monitoring of the area for asbestos and lead levels.


## Step 2: Review the summary and identify any PII

In this step, we ask the GPT model to evaluate the answer generated in the previous step.

In [4]:
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
pii_prompt_template = """Review the following text and create a bullet list of any personally identifiable information (PII) such as names, addresses, phone numbers, email addresses, social security numbers, or any other information that could be used to identify an individual.


"{text}"

PII LIST:"""
PII_PROMPT = PromptTemplate(template=pii_prompt_template, input_variables=["text"])
pii_chain = LLMChain(llm=llm, 
                     prompt=PII_PROMPT, 
                     verbose=False,
                     output_key='PII')
response = pii_chain.predict(text=summary_response['output_text'])
print(response)


- 123 Main Street (address)
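
The model returns the PII list as free text, so later code may want it in a structured form. The following is a minimal sketch, assuming the response keeps the `- value (type)` bullet shape shown above; `parse_pii_list` is a hypothetical helper, not part of LangChain, and a model is not guaranteed to keep this format.

```python
import re
from typing import List, Tuple

# One bullet per line: "- <value>" optionally followed by "(<type>)".
_BULLET = re.compile(r"^\s*[-*•]\s+(?P<value>.+?)(?:\s*\((?P<kind>[^()]*)\))?\s*$")


def parse_pii_list(response: str) -> List[Tuple[str, str]]:
    """Turn a bullet-list response into (value, type) pairs."""
    items = []
    for line in response.splitlines():
        match = _BULLET.match(line)
        if match:
            kind = (match.group("kind") or "unknown").strip()
            items.append((match.group("value").strip(), kind))
    return items


print(parse_pii_list("\n- 123 Main Street (address)\n"))
```

Lines that are not bullets are ignored, and a bullet without a parenthesized type is labeled `unknown` rather than dropped.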


## Final Step: Revise the summary and redact the PII

This is the remediation step, where we ask the GPT model to rewrite the answer and remove the PII identified in the previous step.

In [5]:
final_prompt_template = """Using the PII list, redact all PII in the summary and rewrite the summary:


Summary:
"{text}"

PII List:
"{PII}"

REWRITTEN SUMMARY:"""
FINAL_PROMPT = PromptTemplate(template=final_prompt_template, input_variables=["text", "PII"])
final_chain = LLMChain(llm=llm, 
                     prompt=FINAL_PROMPT, 
                     output_key='revised_summary',
                     verbose=False)
revise_summary = final_chain.predict(text=summary_response['output_text'], PII=response)
print(revise_summary)


Kaizen Safety Solutions, LLC conducted an environmental analysis of a single family residence that had been damaged by a fire. The analysis found that the air quality was within acceptable limits, but that there were elevated levels of asbestos and lead present in the soil. The estimated cost of the project is $25,000, and the action plan includes the removal and replacement of the soil, as well as monitoring of the area for asbestos and lead levels.
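
Because the rewrite is itself produced by the model, a deterministic check can confirm that the flagged values are really gone. This is a minimal sketch and an addition to the notebook's flow: `leaked_pii` is a hypothetical helper, and a case-insensitive substring match is an assumption that will miss paraphrased or reformatted values.

```python
from typing import Iterable, List


def leaked_pii(revised_summary: str, pii_values: Iterable[str]) -> List[str]:
    """Return the flagged values that still appear verbatim in the summary."""
    haystack = revised_summary.casefold()
    return [
        value
        for value in pii_values
        if value.strip() and value.casefold() in haystack
    ]


flagged = ["123 Main Street"]
before = "Kaizen Safety Solutions, LLC conducted an environmental analysis of 123 Main Street, a single family residence."
after = "Kaizen Safety Solutions, LLC conducted an environmental analysis of a single family residence."

print(leaked_pii(before, flagged))
print(leaked_pii(after, flagged))
```

A non-empty result is a signal to rerun the remediation step or to withhold the answer.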
