# Detect Personally Identifiable Information (PII)

Overview: https://learn.microsoft.com/en-us/azure/ai-services/language-service/personally-identifiable-information/overview?tabs=text-pii&wt.mc_id=MVP_322781

Supported Categories: https://learn.microsoft.com/en-us/azure/ai-services/language-service/personally-identifiable-information/concepts/entity-categories?source=recommendations&wt.mc_id=MVP_322781

## Install Library

In [None]:
%pip install azure-ai-textanalytics

## Load Azure Configurations

In [1]:
import os

# Load Azure configurations from environment variables
# Ensure that AZURE_AI_LANGUAGE_KEY and AZURE_AI_LANGUAGE_ENDPOINT are set in your environment
language_key = os.environ.get('AZURE_AI_LANGUAGE_KEY')
language_endpoint = os.environ.get('AZURE_AI_LANGUAGE_ENDPOINT')
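`os.environ.get` returns `None` when a variable is missing, so a misconfigured environment only surfaces later, when the client is created. A minimal sketch of a fail-fast lookup follows; `require_env` is a hypothetical helper name introduced here for illustration, not part of the Azure SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        # Fail early with a message that names the missing variable
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this helper, the two lookups above could be written as `require_env('AZURE_AI_LANGUAGE_KEY')` and `require_env('AZURE_AI_LANGUAGE_ENDPOINT')`.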

## Create a Text Analytics Client

In [2]:
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Authenticate the client using Azure Key and Endpoint
def authenticate_client():
    """
    Authenticates the Azure Text Analytics client using the provided key and endpoint.

    Returns:
        TextAnalyticsClient: An authenticated client for Azure Text Analytics.
    """
    ta_credential = AzureKeyCredential(language_key)
    text_analytics_client = TextAnalyticsClient(
        endpoint=language_endpoint,
        credential=ta_credential
    )
    return text_analytics_client

# Initialize the client
client = authenticate_client()

## Recognize PII Entities Function

In [None]:
def pii_recognition(client, documents):
    """
    Recognizes Personally Identifiable Information (PII) entities in the provided documents
    and prints the redacted text and entity details for each one.

    Args:
        client (TextAnalyticsClient): The authenticated Azure Text Analytics client.
        documents (list): A list of documents to analyze.

    Returns:
        None
    """
    # Call the Azure Text Analytics API to recognize PII entities
    response = client.recognize_pii_entities(documents, language="en")

    # Results are returned in input order, so numbering from 1 matches the input list
    for doc_idx, doc in enumerate(response, start=1):
        print(f"Document #{doc_idx}:")
        # Report failed documents instead of dropping them, which would shift the numbering
        if doc.is_error:
            print(f"Error: {doc.error.code} - {doc.error.message}")
            print("\n")
            continue
        print(f"Redacted Text: {doc.redacted_text}")
        for entity in doc.entities:
            print(f"Entity: {entity.text}")
            print(f"\tCategory: {entity.category}")
            print(f"\tConfidence Score: {entity.confidence_score}")
            print(f"\tOffset: {entity.offset}")
            print(f"\tLength: {entity.length}")
        print("\n")  # Add a newline for better readability between documents
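The `offset` and `length` fields printed above locate each entity inside the original string. As a rough illustration of how they relate to the redacted text, the sketch below masks spans locally. `mask_entities` is a hypothetical helper, the entities are stand-in objects rather than real SDK results, and it assumes offsets count Unicode code points so that they line up with Python string indexing.

```python
from types import SimpleNamespace


def mask_entities(text, entities, mask_char="*"):
    """Replace each entity span (offset, length) in `text` with mask characters."""
    chars = list(text)
    for entity in entities:
        start = entity.offset
        end = start + entity.length
        # Overwrite the span with the same number of mask characters
        chars[start:end] = mask_char * (end - start)
    return "".join(chars)


# Stand-in entities carrying only the fields this sketch needs
sample_text = "Jane Smith's SSN is 859-98-0987 and her email is jane.smith@gmail.com"
sample_entities = [
    SimpleNamespace(offset=0, length=10),   # person name
    SimpleNamespace(offset=20, length=11),  # SSN
    SimpleNamespace(offset=49, length=20),  # email address
]

masked = mask_entities(sample_text, sample_entities)
print(masked)
# **********'s SSN is *********** and her email is ********************
```

Because each span is replaced by the same number of characters, the masked string keeps the length of the input, so the offsets remain valid for it as well.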

In [10]:
documents = [
        "Jane Smith's SSN is 859-98-0987 and her email is jane.smith@gmail.com",
        "The employee's phone number is 555-555-5555.",
        """Jane Smith, a freelancer working with Global Finance Corp., uses the ABA routing number 021000021 for her bank transactions. 
        For international transfers, her bank's SWIFT code is CHASUS33. 
        She uses a credit card with the number 4111 1111 1111 1111. 
        Her International Banking Account Number (IBAN) is GB29 NWBK 6016 1331 9268 19. 
        Additionally, her Social Security Number (SSN), which is a government and country-specific identification in the United States, is 123-45-6789.
        Jane's passport number is 123456789, and her driver's license number is D1234567.
        """,
        """John Doe, an employee at Tech Solutions Inc., can be reached at (123) 456-7890. 
        His office is located at 1234 Elm Street, Springfield, IL 62704. 
        You can email him at john.doe@techsolutions.com or visit the company's website at https://www.techsolutions.com. 
        His work computer's IP Address is 192.168.1.1. He was born on January 15, 1985, making his age 40. 
        For financial transactions, his account number is 123456789.
        """
]
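The triple-quoted documents above carry their source indentation and line breaks into the request, so that whitespace also appears in the redacted text and counts toward entity offsets. If this is unwanted, one option is to collapse whitespace before sending. This is a minimal sketch: `normalize_whitespace` is a hypothetical helper and the sample string is shortened for illustration.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into a single space."""
    return " ".join(text.split())


# Illustrative sample with the same indentation pattern as the documents above
raw = """John Doe can be reached at (123) 456-7890. 
        His office is located at 1234 Elm Street.
        """

clean = normalize_whitespace(raw)
print(clean)
```

Offsets returned by the service then refer to the normalized string, not the original one, so the same normalized text should be used when slicing entities out.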

In [None]:
pii_recognition(client, documents)

Document #1:
Redacted Text: **********'s SSN is *********** and her email is ********************
Entity: Jane Smith
	Category: Person
	Confidence Score: 1.0
	Offset: 0
	Length: 10
Entity: 859-98-0987
	Category: USSocialSecurityNumber
	Confidence Score: 0.85
	Offset: 20
	Length: 11
Entity: jane.smith@gmail.com
	Category: Email
	Confidence Score: 0.8
	Offset: 49
	Length: 20


Document #2:
Redacted Text: The ********'s phone number is ************.
Entity: employee
	Category: PersonType
	Confidence Score: 0.98
	Offset: 4
	Length: 8
Entity: 555-555-5555
	Category: PhoneNumber
	Confidence Score: 0.8
	Offset: 31
	Length: 12


Document #3:
Redacted Text: **********, a ********** working with ********************, uses the ABA routing number ********* for her bank transactions. 
        For international transfers, her bank's SWIFT code is ********. 
        She uses a credit card with the number ******************** 
        Her International Banking Account Number (IBAN) is ****************