# Azure Language - PII Redaction

This notebook demonstrates how to use the Azure Language Analyze Text API to
detect and redact Personally Identifiable Information (PII) from a medical report.

## Prerequisites

Set the following environment variables:
- `FOUNDRY_ENDPOINT` - Azure Language service endpoint
- `FOUNDRY_API_KEY` - API key for authentication
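
A missing variable otherwise surfaces later as a bare `KeyError`. The sketch below checks both up front and reports every missing name at once. The helper `require_env` and the stand-in values are illustrative assumptions, not part of the notebook.

```python
import os
from collections.abc import Mapping


def require_env(names: list[str], environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the requested variables, or raise listing every missing one."""
    # Treat unset and empty values the same way.
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in names}


# Example with a stand-in mapping instead of the real environment.
demo_env = {"FOUNDRY_ENDPOINT": "https://example.invalid", "FOUNDRY_API_KEY": "dummy"}
settings = require_env(["FOUNDRY_ENDPOINT", "FOUNDRY_API_KEY"], demo_env)
print(sorted(settings))
```

Calling `require_env(["FOUNDRY_ENDPOINT", "FOUNDRY_API_KEY"])` with no second argument checks the real process environment.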

In [None]:
import json
import os

import requests
from rich.console import Console
from rich.panel import Panel
from rich.table import Table

## Load Sample Data

Load the medical report containing PII from the sample data file.

In [None]:
with open("05-PIIRedaction-data/medical_report.txt", encoding="utf-8") as f:
    report_text = f.read()

print(f"Loaded medical report: {len(report_text)} characters")
print(f"\nFirst 500 characters:\n{report_text[:500]}...")

## API Setup

Configure the Azure Language API endpoint and authentication.

In [None]:
endpoint = os.environ["FOUNDRY_ENDPOINT"]
api_key = os.environ["FOUNDRY_API_KEY"]

# Construct the Analyze Text API URL (tolerate a trailing slash on the endpoint)
api_url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version=2024-11-01"

headers = {
    "Ocp-Apim-Subscription-Key": api_key,
    "Content-Type": "application/json",
}

print(f"API Endpoint: {endpoint}")
print("API Key: [CONFIGURED]")

## Make API Call

Send the medical report to the PII Entity Recognition API.
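
The service limits how long a single document can be, so a long report may need to be split across several documents. The sketch below splits text into chunks while recording each chunk's starting offset, so entity offsets can be mapped back to the full report. The helper `chunk_text` is hypothetical, and the 5,120 character limit mentioned in the usage note is an assumption to check against the current service limits.

```python
def chunk_text(text: str, max_chars: int) -> list[tuple[int, str]]:
    """Split text into (start_offset, chunk) pairs of at most max_chars characters.

    Prefers to break just after a newline so lines are less likely to be cut.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Look for the last newline inside the window; fall back to a hard cut.
            newline = text.rfind("\n", start, end)
            if newline > start:
                end = newline + 1
        chunks.append((start, text[start:end]))
        start = end
    return chunks


# Tiny illustration with an artificially small limit.
sample = "aaa\nbbbb\ncc"
chunks = chunk_text(sample, max_chars=5)
print(chunks)
```

With the real report, something like `chunk_text(report_text, 5120)` would give one document per chunk. An entity's position in the full text is then the chunk's start offset plus the entity's own offset.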

In [None]:
# Prepare the document for the API
documents = [{"id": "1", "language": "en", "text": report_text}]

# Build the request payload
payload = {
    "kind": "PiiEntityRecognition",
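    "parameters": {
        "modelVersion": "latest",
        # Report offsets in Unicode code points so they line up with Python
        # string indexing when redacting manually from offsets.
        "stringIndexType": "UnicodeCodePoint",
    },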
    "analysisInput": {"documents": documents},
}

# Make the API request
response = requests.post(api_url, headers=headers, json=payload, timeout=30)
response.raise_for_status()
result = response.json()

print("API call successful!")
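
`raise_for_status()` only catches HTTP-level failures. The sketch below also checks for per-document errors. It assumes the response carries them in a `results.errors` list next to `results.documents`, each with an `id` and an `error.message`; verify that shape against the raw response printed in the next section. The helper `first_document` and the stand-in response are hypothetical.

```python
def first_document(result: dict) -> dict:
    """Return the first analyzed document, or raise if the service reported errors."""
    results = result.get("results", {})
    errors = results.get("errors", [])
    if errors:
        details = []
        for err in errors:
            message = err.get("error", {}).get("message", "unknown error")
            details.append(f"document {err.get('id', '?')}: {message}")
        raise ValueError("Analyze Text reported document errors: " + "; ".join(details))
    documents = results.get("documents", [])
    if not documents:
        raise ValueError("Response contained no analyzed documents")
    return documents[0]


# Stand-in response shaped like the assumed structure (not real API output).
ok_response = {"results": {"documents": [{"id": "1", "entities": []}], "errors": []}}
print(first_document(ok_response)["id"])
```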

## Raw API Results

Display the complete JSON response from the API.

In [None]:
print(json.dumps(result, indent=2))

## Human-Friendly Results

Present the PII detection and redaction results in a readable format.

In [None]:
console = Console()

# Extract document result
doc_result = result["results"]["documents"][0]
entities = doc_result["entities"]
redacted_text = doc_result.get("redactedText", "")

# Create PII entities table
table = Table(title="Detected PII Entities")
table.add_column("Entity Text", style="red")
table.add_column("Category", style="cyan")
table.add_column("Subcategory", style="dim")
table.add_column("Confidence", justify="center")

for entity in entities:
    subcategory = entity.get("subcategory", "-")
    confidence = f"{entity['confidenceScore']:.0%}"
    table.add_row(entity["text"], entity["category"], subcategory, confidence)

console.print(table)

# Group by category for summary
categories: dict[str, list[str]] = {}
for entity in entities:
    cat = entity["category"]
    if cat not in categories:
        categories[cat] = []
    categories[cat].append(entity["text"])

# Category summary
console.print("\n")
summary_table = Table(title="PII Summary by Category")
summary_table.add_column("Category", style="bold cyan")
summary_table.add_column("Count", justify="center", style="red")
summary_table.add_column("Examples", style="dim")

for cat, items in sorted(categories.items()):
    examples = ", ".join(items[:3])
    if len(items) > 3:
        examples += f" (+{len(items) - 3} more)"
    summary_table.add_row(cat, str(len(items)), examples)

console.print(summary_table)
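
The confidence score shown in the table can also be used to filter entities. The sketch below assumes entities are dicts with the `text`, `category`, and `confidenceScore` keys used above. The helper `filter_by_confidence`, the 0.8 threshold, and the sample entities are illustrative, not service output.

```python
from collections import Counter


def filter_by_confidence(entities: list[dict], threshold: float) -> list[dict]:
    """Keep only entities whose confidenceScore is at least the threshold."""
    return [e for e in entities if e["confidenceScore"] >= threshold]


# Made-up entities for illustration only.
sample_entities = [
    {"text": "Jane Doe", "category": "Person", "confidenceScore": 0.98},
    {"text": "555-0100", "category": "PhoneNumber", "confidenceScore": 0.80},
    {"text": "Monday", "category": "DateTime", "confidenceScore": 0.45},
]

confident = filter_by_confidence(sample_entities, threshold=0.8)
counts = Counter(e["category"] for e in confident)
print(dict(counts))
```

For redaction, a low threshold (or none at all) is usually the safer choice, because any entity dropped by the filter stays in the text.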

## Redacted Document

Display the document with PII entities redacted.

In [None]:
from rich.markup import escape

# Display redacted text if available from API
if redacted_text:
    # escape() keeps square brackets in the text from being parsed as Rich markup
    console.print(Panel(escape(redacted_text), title="Redacted Medical Report (API Output)"))
else:
    # Manually redact using entity offsets
    # Assumes offsets count Unicode code points, as Python string slicing does
    # Sort entities by offset in reverse order to preserve positions
    sorted_entities = sorted(entities, key=lambda e: e["offset"], reverse=True)

    redacted = report_text
    for entity in sorted_entities:
        start = entity["offset"]
        end = start + entity["length"]
        replacement = f"[{entity['category']}]"
        redacted = redacted[:start] + replacement + redacted[end:]

    console.print(Panel(escape(redacted), title="Redacted Medical Report (Manual Redaction)"))

# Warning panel listing the categories that were actually detected
detected_categories = sorted({entity["category"] for entity in entities})
category_lines = "\n".join(f"• {cat}" for cat in detected_categories) or "• (none)"

warning = f"""
[bold yellow]⚠️ PII Detection Summary:[/bold yellow]

The service detected [bold red]{len(entities)}[/bold red] PII entities in this document, in the following categories:

{category_lines}

[dim]PII redaction is essential for HIPAA compliance and protecting patient privacy.[/dim]
"""

console.print(Panel(warning, title="Privacy Alert", border_style="yellow"))
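
The manual fallback above replaces spans from the end of the text backwards, so earlier offsets stay valid. The sketch below shows that step on a made-up sentence. It assumes offsets and lengths count Unicode code points, which is how Python indexes strings. The helper `redact_by_offsets` and the sample data are illustrative.

```python
def redact_by_offsets(text: str, entities: list[dict]) -> str:
    """Replace each entity span with a [Category] placeholder."""
    redacted = text
    # Work right to left: replacing a later span never shifts earlier offsets.
    for entity in sorted(entities, key=lambda e: e["offset"], reverse=True):
        start = entity["offset"]
        end = start + entity["length"]
        redacted = redacted[:start] + f"[{entity['category']}]" + redacted[end:]
    return redacted


# Made-up sentence and entity spans for illustration only.
sample_text = "Patient Jane Doe can be reached at 555-0100."
demo_entities = [
    {"category": "Person", "offset": 8, "length": 8},
    {"category": "PhoneNumber", "offset": 35, "length": 8},
]
print(redact_by_offsets(sample_text, demo_entities))
```

Processing the spans left to right would require adjusting every later offset by the length difference of each replacement, which is why the reverse order is simpler.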