# Azure Document Intelligence API Demo

This notebook demonstrates how to use Azure Document Intelligence REST APIs for document analysis, including:
- **Read Model**: Extract text from PDFs and scanned documents
- **Business Card Model**: Extract information from business cards
- **Bank Check Model**: Extract details from US bank checks
- **Receipt Model**: Extract item details from receipts including handwritten tips

## Prerequisites

Set the following environment variables:
- `AZURE_DI_ENDPOINT` - Azure Document Intelligence endpoint URL
- `AZURE_DI_KEY` - Azure Document Intelligence API key
- `AZURE_DI_REGION` - Azure region (e.g., eastus)
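
As a quick pre-flight check, the three variables listed above can be validated before any request is sent. This is a minimal sketch; `REQUIRED_VARS` and `missing_settings` are hypothetical names introduced for illustration and are not part of the notebook code.

```python
import os
from typing import List, Mapping

# Names taken from the prerequisites list above.
REQUIRED_VARS = ("AZURE_DI_ENDPOINT", "AZURE_DI_KEY", "AZURE_DI_REGION")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_settings()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All Document Intelligence settings are present")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.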

## Resources

- [Document Intelligence Documentation](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/?view=doc-intel-4.0.0)
- [Read Model](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/prebuilt/read?view=doc-intel-4.0.0)
- [Business Card Model](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/prebuilt/business-card?view=doc-intel-4.0.0)
- [Bank Check Model](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/prebuilt/bank-check?view=doc-intel-4.0.0)
- [Receipt Model](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/prebuilt/receipt?view=doc-intel-4.0.0)


## Setup: Import Libraries and Configure Environment

In [None]:
import os
import requests
import json
import time
from typing import Dict, Any, List, Optional
from io import BytesIO
from PIL import Image, ImageDraw
from rich import print_json
import base64

# Load environment variables
AZURE_DI_ENDPOINT = os.getenv("AZURE_DI_ENDPOINT", "https://tech901-aif-foundryresource.cognitiveservices.azure.com/")
AZURE_DI_KEY = os.getenv("AZURE_DI_KEY", "<placeholder-key>")
AZURE_DI_REGION = os.getenv("AZURE_DI_REGION", "eastus")

# Verify configuration
if AZURE_DI_KEY == "<placeholder-key>":
    print("⚠️ Warning: Please set AZURE_DI_KEY environment variable")
else:
    print("✅ Configuration loaded successfully")
    print(f"   Endpoint: {AZURE_DI_ENDPOINT}")
    print(f"   Region: {AZURE_DI_REGION}")

## Helper Functions

In [None]:
def _poll_for_result(operation_url: str, poll_interval: float = 1.0, max_wait: float = 120.0) -> Dict[str, Any]:
    """
    Poll the operation URL until analysis completes.

    Args:
        operation_url: The Operation-Location URL to poll
        poll_interval: Seconds between poll attempts
        max_wait: Maximum seconds to wait before timeout

    Returns:
        Dictionary containing the completed analysis result
    """
    headers = {"Ocp-Apim-Subscription-Key": AZURE_DI_KEY}
    elapsed = 0.0
    
    while elapsed < max_wait:
        result = requests.get(operation_url, headers=headers)
        result.raise_for_status()
        result_json = result.json()
        status = result_json.get("status")
        
        if status == "succeeded":
            return result_json
        elif status == "failed":
            error_info = result_json.get("error", {})
            raise Exception(f"Analysis failed: {error_info.get('message', result_json)}")
        elif status in ("notStarted", "running"):
            time.sleep(poll_interval)
            elapsed += poll_interval
        else:
            raise Exception(f"Unknown status: {status}")
    
    raise TimeoutError(f"Analysis did not complete within {max_wait} seconds")


def analyze_document_from_url(
    document_url: str,
    model_id: str,
    api_version: str = "2024-11-30"
) -> Dict[str, Any]:
    """
    Analyze a document using Document Intelligence API.

    Args:
        document_url: URL of the document to analyze
        model_id: ID of the model to use (e.g., 'prebuilt-read', 'prebuilt-layout')
        api_version: API version to use

    Returns:
        Dictionary containing the API response
    """
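    # Strip any trailing slash so the path joins cleanly whether or not the endpoint ends in "/"
    url = f"{AZURE_DI_ENDPOINT.rstrip('/')}/documentintelligence/documentModels/{model_id}:analyze"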
    
    headers = {
        "Ocp-Apim-Subscription-Key": AZURE_DI_KEY,
        "Content-Type": "application/json",
    }
    
    params = {"api-version": api_version}
    body = {"urlSource": document_url}
    
    try:
        response = requests.post(url, params=params, headers=headers, json=body)
        response.raise_for_status()
        
        operation_url = response.headers.get("Operation-Location")
        if not operation_url:
            raise Exception("No Operation-Location header in response")
        
        return _poll_for_result(operation_url)
    except requests.exceptions.HTTPError as e:
        print(f"Error: {e.response.status_code}")
        print(f"Response: {e.response.text}")
        raise


def analyze_document_from_file(
    file_path: str,
    model_id: str,
    api_version: str = "2024-11-30"
) -> Dict[str, Any]:
    """
    Analyze a local document file using Document Intelligence API.

    Args:
        file_path: Path to the document file
        model_id: ID of the model to use
        api_version: API version to use

    Returns:
        Dictionary containing the API response
    """
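    # Strip any trailing slash so the path joins cleanly whether or not the endpoint ends in "/"
    url = f"{AZURE_DI_ENDPOINT.rstrip('/')}/documentintelligence/documentModels/{model_id}:analyze"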
    
    headers = {
        "Ocp-Apim-Subscription-Key": AZURE_DI_KEY,
        "Content-Type": "application/octet-stream",
    }
    
    params = {"api-version": api_version}
    
    with open(file_path, "rb") as f:
        try:
            response = requests.post(url, params=params, headers=headers, data=f)
            response.raise_for_status()
            
            operation_url = response.headers.get("Operation-Location")
            if not operation_url:
                raise Exception("No Operation-Location header in response")
            
            return _poll_for_result(operation_url)
        except requests.exceptions.HTTPError as e:
            print(f"Error: {e.response.status_code}")
            print(f"Response: {e.response.text}")
            raise


def print_response(response: Dict[str, Any]):
    """Pretty print API response."""
    print_json(json.dumps(response, indent=2))


def extract_text_from_read_result(response: Dict[str, Any]) -> str:
    """Extract human-friendly text from Read model response."""
    text = ""
    if "analyzeResult" in response:
        result = response["analyzeResult"]
        if "paragraphs" in result:
            for paragraph in result["paragraphs"]:
                text += paragraph.get("content", "") + "\n"
    return text.strip()


def extract_check_fields(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract key fields from bank check response."""
    fields = {}
    if "analyzeResult" in response and "documents" in response["analyzeResult"]:
        if len(response["analyzeResult"]["documents"]) > 0:
            doc_fields = response["analyzeResult"]["documents"][0].get("fields", {})
            for field_name in ["CheckNumber", "RoutingNumber", "AccountNumber", "Amount", "PayTo", "Date", "Memo", "Signature"]:
                if field_name in doc_fields:
                    field_data = doc_fields[field_name]
                    fields[field_name] = field_data.get("content", field_data.get("value", ""))
    return fields


def display_image(image_path: str) -> Image.Image:
    """Load and display an image from a local path."""
    img = Image.open(image_path)
    return img


def extract_receipt_fields(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract key fields from receipt response."""
    fields = {}
    if "analyzeResult" in response and "documents" in response["analyzeResult"]:
        if len(response["analyzeResult"]["documents"]) > 0:
            doc_fields = response["analyzeResult"]["documents"][0].get("fields", {})
            # The v4.0 receipt schema names the tax field "TotalTax"; "Tax" is kept as a fallback
            for field_name in ["MerchantName", "MerchantPhoneNumber", "MerchantAddress", "TransactionDate", "TransactionTime", "Items", "Subtotal", "Tax", "TotalTax", "TaxDetails", "Total", "Tip", "ReceiptType", "CountryRegion"]:
                if field_name in doc_fields:
                    field_data = doc_fields[field_name]
                    if field_name == "Items" and "valueArray" in field_data:
                        fields[field_name] = [
                            {
                                "description": item.get("valueObject", {}).get("Description", {}).get("content", ""),
                                "quantity": item.get("valueObject", {}).get("Quantity", {}).get("content", ""),
                                "price": item.get("valueObject", {}).get("Price", {}).get("content", ""),
                                "total_price": item.get("valueObject", {}).get("TotalPrice", {}).get("content", "")
                            }
                            for item in field_data["valueArray"]
                        ]
                    elif field_name == "TaxDetails" and "valueArray" in field_data:
                        fields[field_name] = [
                            {
                                "description": item.get("valueObject", {}).get("Description", {}).get("content", ""),
                                "rate": item.get("valueObject", {}).get("Rate", {}).get("content", ""),
                                "net_amount": item.get("valueObject", {}).get("NetAmount", {}).get("content", "")
                            }
                            for item in field_data["valueArray"]
                        ]
                    else:
                        fields[field_name] = field_data.get("content", field_data.get("value", ""))
    return fields
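
The extraction helpers above read each field's `content` (the raw text) and fall back to a `value` key. In REST responses the typed value usually sits under a type-specific key such as `valueString`, `valueNumber`, or `valueCurrency`, so a generic accessor can be handy. The sketch below is one possible approach; `field_value` is a hypothetical helper and `sample_fields` is made-up data for illustration, not real API output.

```python
from typing import Any, Dict


def field_value(field: Dict[str, Any]) -> Any:
    """Return the typed value of a field, falling back to its raw text."""
    for key, value in field.items():
        # Typed values live under keys such as valueString or valueCurrency.
        if key.startswith("value"):
            return value
    return field.get("content", "")


# Made-up sample data shaped like the "fields" object of an analyzed document.
sample_fields = {
    "MerchantName": {"type": "string", "valueString": "Contoso", "content": "Contoso"},
    "Total": {
        "type": "currency",
        "valueCurrency": {"amount": 14.5, "currencyCode": "USD"},
        "content": "$14.50",
    },
    "Memo": {"type": "string", "content": "rent"},
}

for name, data in sample_fields.items():
    print(name, "->", field_value(data))
```

Returning the typed value keeps numbers and currency amounts usable for arithmetic, while `content` remains available when only display text is needed.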

## 1. Read Model: Extract Text from Documents

The Read model is optimized for extracting text from PDFs, scanned images, and Office documents. It supports:
- Printed and handwritten text
- Multiple languages
- Text structure (pages, paragraphs, lines, and words); table extraction requires the Layout model
- Higher resolution for dense text

In [None]:
# Use local PDF file (Azure can't fetch many remote URLs due to firewalls/redirects)
pdf_path = "documents/EmployeeRights_FFCRA.pdf"

print("📄 Analyzing PDF document with Read model...")
print(f"   File: {pdf_path}")

try:
    response = analyze_document_from_file(pdf_path, "prebuilt-read")
    print("\n✅ Read model analysis complete!")
except Exception as e:
    print(f"❌ Error: {str(e)}")
    response = None  # Lets later cells detect that the analysis failed

### View Raw API Response

In [None]:
# Show the complete API response structure
if response:
    print("API Response (JSON):")
    print_response(response)

### Extract and Display Human-Friendly Text

In [None]:
# Extract readable text from the API response
if response:
    extracted_text = extract_text_from_read_result(response)

    print("📝 Extracted Text:")
    print("=" * 80)
    print(extracted_text[:1000] + "..." if len(extracted_text) > 1000 else extracted_text)
    print("=" * 80)
    print(f"\nTotal text length: {len(extracted_text)} characters")

    if "analyzeResult" in response:
        pages = len(response["analyzeResult"].get("pages", []))
        paragraphs = len(response["analyzeResult"].get("paragraphs", []))
        print(f"Pages detected: {pages}")
        print(f"Paragraphs detected: {paragraphs}")
else:
    print("⚠️ Read response not available. Run the Read analysis cell first.")
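
Beyond paragraph text, each page in a Read result carries word-level entries with a confidence score, which can help spot poorly scanned pages. The following is a rough sketch under that assumption; `page_confidences` is a hypothetical helper and `sample_response` is made-up data that only mimics the response shape.

```python
from statistics import mean
from typing import Any, Dict, List


def page_confidences(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Summarize word count and mean word confidence for each page."""
    summaries = []
    for page in response.get("analyzeResult", {}).get("pages", []):
        scores = [w["confidence"] for w in page.get("words", []) if "confidence" in w]
        summaries.append({
            "page": page.get("pageNumber"),
            "words": len(scores),
            "mean_confidence": mean(scores) if scores else None,
        })
    return summaries


# Made-up sample shaped like a Read response (not real API output).
sample_response = {
    "analyzeResult": {
        "pages": [
            {"pageNumber": 1, "words": [
                {"content": "Employee", "confidence": 1.0},
                {"content": "Rights", "confidence": 0.5},
            ]},
            {"pageNumber": 2, "words": []},
        ]
    }
}

for summary in page_confidences(sample_response):
    print(summary)
```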

## 2. Business Card Model: Extract Information from Business Cards

⚠️ **Note**: The prebuilt business card model is available only in Document Intelligence v3.1 and earlier; it was deprecated in v4.0. This notebook targets v4.0, so the example below uses the Layout model (`prebuilt-layout`) instead.

The Layout model returns the card's text and bounding regions rather than structured contact fields:

In [None]:
# Use local business card image with prebuilt-layout model
# Note: prebuilt-businessCard was deprecated in Document Intelligence v4.0
business_card_path = "documents/businessCard.png"

print("🗂️ Analyzing business card with Layout model...")
print(f"   File: {business_card_path}")

try:
    # Use prebuilt-layout since businessCard model is deprecated
    bc_response = analyze_document_from_file(
        business_card_path,
        "prebuilt-layout",
        api_version="2024-11-30"
    )
    print("\n✅ Business card analysis complete!")
except Exception as e:
    print(f"❌ Error: {str(e)}")
    bc_response = None

### View Business Card Response

In [None]:
if bc_response:
    print("Business Card API Response (JSON):")
    print_response(bc_response)

### Display Business Card Image with Detected Regions

In [None]:
# Display the business card image with detected text region polygons
business_card_path = "documents/businessCard.png"
img = Image.open(business_card_path).convert("RGB")
draw = ImageDraw.Draw(img)

if bc_response and "analyzeResult" in bc_response:
    result = bc_response["analyzeResult"]
    
    # Get page dimensions for coordinate scaling
    if "pages" in result and len(result["pages"]) > 0:
        page = result["pages"][0]
        page_width = page.get("width", 1)
        page_height = page.get("height", 1)
        img_width, img_height = img.size
        
        # Scale factors (API returns coordinates in page units)
        scale_x = img_width / page_width
        scale_y = img_height / page_height
        
        # Draw polygons for detected lines/words
        colors = ["red", "blue", "green", "orange", "purple", "cyan", "magenta"]
        
        # Draw line-level polygons
        if "lines" in page:
            for idx, line in enumerate(page.get("lines", [])):
                if "polygon" in line:
                    poly = line["polygon"]
                    points = [(poly[i] * scale_x, poly[i+1] * scale_y) for i in range(0, len(poly), 2)]
                    if len(points) >= 3:
                        draw.polygon(points, outline=colors[idx % len(colors)], width=2)
        
        # Also draw paragraph-level polygons (thicker green outline) on top of the line-level ones
        if "paragraphs" in result:
            for idx, para in enumerate(result.get("paragraphs", [])):
                if "boundingRegions" in para:
                    for region in para["boundingRegions"]:
                        if "polygon" in region:
                            poly = region["polygon"]
                            points = [(poly[i] * scale_x, poly[i+1] * scale_y) for i in range(0, len(poly), 2)]
                            if len(points) >= 3:
                                draw.polygon(points, outline="green", width=3)

print("🖼️ Business card image with detected text regions:")
img
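
The drawing cells in this notebook each convert the flat `polygon` list into scaled `(x, y)` points inline. One way to factor that step out is a small helper; `polygon_to_points` is a hypothetical name and is not part of the notebook code.

```python
from typing import List, Sequence, Tuple


def polygon_to_points(
    polygon: Sequence[float], scale_x: float = 1.0, scale_y: float = 1.0
) -> List[Tuple[float, float]]:
    """Convert a flat [x1, y1, x2, y2, ...] list into scaled (x, y) tuples."""
    if len(polygon) % 2 != 0:
        raise ValueError("polygon must contain an even number of coordinates")
    # Pair every even-indexed x with the odd-indexed y that follows it.
    return [(x * scale_x, y * scale_y) for x, y in zip(polygon[0::2], polygon[1::2])]


# A rectangle in page units, scaled by 100 pixels per unit on both axes.
points = polygon_to_points([0.5, 0.5, 2.0, 0.5, 2.0, 1.0, 0.5, 1.0], scale_x=100, scale_y=100)
print(points)
```

The resulting list can be passed directly to `ImageDraw.Draw(...).polygon(...)`, as the cells above do with their inline version.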

### Extract and Display Business Card Fields

In [None]:
if bc_response:
    # Layout model returns text content rather than structured business card fields
    print("📇 Extracted Business Card Content (using Layout model):")
    print("=" * 80)
    
    if "analyzeResult" in bc_response:
        result = bc_response["analyzeResult"]
        
        # Extract paragraphs/lines of text
        if "paragraphs" in result:
            print("\n📝 Detected Text:")
            for para in result["paragraphs"]:
                content = para.get("content", "")
                if content.strip():
                    print(f"  • {content}")
        
        # Show page info
        if result.get("pages"):
            page = result["pages"][0]
            print("\n📄 Page Info:")
            print(f"  Dimensions: {page.get('width', 'N/A')} x {page.get('height', 'N/A')} {page.get('unit', '')}")
            print(f"  Lines detected: {len(page.get('lines', []))}")
    
    print("=" * 80)
    print("\n💡 Note: For structured field extraction, the deprecated prebuilt-businessCard")
    print("   model would extract ContactNames, JobTitles, Emails, etc. automatically.")
else:
    print("⚠️ Business card response not available.")
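
Because the Layout model returns plain paragraphs, contact details have to be picked out afterwards. The sketch below shows one heuristic way to do that with regular expressions. The patterns are deliberately simple and will miss or over-match some formats; `guess_contacts` and the sample paragraphs are hypothetical and only illustrate the idea.

```python
import re
from typing import Dict, List

# Simple heuristic patterns; not a full validation of emails or phone numbers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def guess_contacts(paragraphs: List[str]) -> Dict[str, List[str]]:
    """Collect email-like and phone-like strings from paragraph text."""
    emails: List[str] = []
    phones: List[str] = []
    for text in paragraphs:
        emails.extend(EMAIL_RE.findall(text))
        phones.extend(PHONE_RE.findall(text))
    return {"emails": emails, "phones": phones}


# Made-up paragraph contents, as might be read from a business card.
sample_paragraphs = [
    "Avery Smith",
    "Senior Researcher",
    "avery.smith@contoso.com",
    "Tel: +1 (987) 123-4567",
]
print(guess_contacts(sample_paragraphs))
```

With a real response, the input list could be built from the `content` of each entry in `analyzeResult["paragraphs"]`.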

## 3. Bank Check Model: Extract Information from Checks

The bank check model (v4.0) extracts data from US bank checks including:
- Check number
- Routing number
- Account number
- Payment amount
- Payee information
- Date
- Signature detection

In [None]:
# Use a public sample check image
check_url = "https://online.citi.com/JRS/forms/images/Check.jpg"

print("💳 Analyzing bank check with Bank Check model...")
print(f"   URL: {check_url}")

try:
    check_response = analyze_document_from_url(
        check_url,
        "prebuilt-check.us",
        api_version="2024-11-30"
    )
    print("\n✅ Bank check analysis complete!")
except Exception as e:
    print(f"❌ Error: {str(e)}")
    print("   (Check model may not be available with current endpoint/key)")
    check_response = None

### View Bank Check Response

In [None]:
if check_response:
    print("Bank Check API Response (JSON):")
    print_response(check_response)

### Display Bank Check Image with Detected Regions

In [None]:
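# Display the bank check image with detected field polygons.
# The analysis above fetched the check by URL, so this cell assumes a local copy
# of the same image has been saved to documents/Check.jpg.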
check_path = "documents/Check.jpg"
img = Image.open(check_path).convert("RGB")
draw = ImageDraw.Draw(img)

if check_response and "analyzeResult" in check_response:
    result = check_response["analyzeResult"]
    
    # Get page dimensions for coordinate scaling
    if "pages" in result and len(result["pages"]) > 0:
        page = result["pages"][0]
        page_width = page.get("width", 1)
        page_height = page.get("height", 1)
        img_width, img_height = img.size
        
        # Scale factors (API returns coordinates in page units)
        scale_x = img_width / page_width
        scale_y = img_height / page_height
        
        # Draw polygons for detected fields
        if "documents" in result and len(result["documents"]) > 0:
            doc = result["documents"][0]
            fields = doc.get("fields", {})
            
            colors = ["red", "blue", "green", "orange", "purple", "cyan", "magenta", "yellow"]
            color_idx = 0
            
            for field_name, field_data in fields.items():
                if "boundingRegions" in field_data:
                    for region in field_data["boundingRegions"]:
                        if "polygon" in region:
                            poly = region["polygon"]
                            points = [(poly[i] * scale_x, poly[i+1] * scale_y) for i in range(0, len(poly), 2)]
                            if len(points) >= 3:
                                draw.polygon(points, outline=colors[color_idx % len(colors)], width=2)
                                # Add field label near the first point
                                draw.text((points[0][0], points[0][1] - 15), field_name, fill=colors[color_idx % len(colors)])
                    color_idx += 1

print("🖼️ Bank check image with detected field boundaries:")
img

### Extract and Display Check Information

In [None]:
if check_response and "analyzeResult" in check_response:
    result = check_response["analyzeResult"]
    
    print("💰 Extracted Check Information:")
    print("=" * 80)
    
    # Display all detected fields from the document
    if "documents" in result and len(result["documents"]) > 0:
        doc = result["documents"][0]
        fields = doc.get("fields", {})
        
        if not fields:
            print("  No structured fields detected.")
        else:
            for field_name, field_data in fields.items():
                # Get the value - could be in 'content', 'value', or 'valueString'
                value = field_data.get("content") or field_data.get("valueString") or field_data.get("value", "")
                confidence = field_data.get("confidence", 0)
                print(f"  {field_name}: {value} (confidence: {confidence:.1%})")
    
    # Also show any text content detected
    if "content" in result:
        print("\n📝 Raw Text Content:")
        print("-" * 40)
        print(result["content"][:500] + "..." if len(result.get("content", "")) > 500 else result.get("content", ""))
    
    print("=" * 80)
else:
    print("⚠️ Check response not available.")
    print("   Make sure to run the bank check analysis cell first.")
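
The cell above prints a confidence score next to every field. For financial documents it can be useful to flag low-confidence fields for manual review. The following is a minimal sketch; the `0.8` threshold is an arbitrary assumption, `low_confidence_fields` is a hypothetical helper, and `sample_check_fields` is made-up data rather than real model output.

```python
from typing import Any, Dict, List, Tuple


def low_confidence_fields(
    fields: Dict[str, Dict[str, Any]], threshold: float = 0.8
) -> List[Tuple[str, float]]:
    """Return (field name, confidence) pairs below the threshold, lowest first."""
    flagged = [
        (name, data["confidence"])
        for name, data in fields.items()
        if data.get("confidence") is not None and data["confidence"] < threshold
    ]
    return sorted(flagged, key=lambda item: item[1])


# Made-up sample shaped like the "fields" object of an analyzed check.
sample_check_fields = {
    "PayTo": {"content": "Contoso Ltd.", "confidence": 0.99},
    "Amount": {"content": "$100.00", "confidence": 0.62},
    "Memo": {"content": "rent", "confidence": 0.45},
}

for name, confidence in low_confidence_fields(sample_check_fields):
    print(f"Review {name}: confidence {confidence:.0%}")
```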

## 4. Receipt Model: Extract Information from Receipts

The receipt model extracts data from sales receipts including:
- Merchant information (name, phone, address)
- Transaction details (date, time, total)
- Line items with descriptions, quantities, and prices
- Tax information and breakdown
- Tip amounts (including handwritten tips)
- Receipt type and country/region detection

In [None]:
# Use a local copy of a receipt sample from Azure-Samples
print("🧾 Analyzing receipt with Receipt model...")

try:
    receipt_response = analyze_document_from_file(
        "documents/receipt-with-tips.png",
        "prebuilt-receipt",
        api_version="2024-11-30"
    )
    print("\n✅ Receipt analysis complete!")
except Exception as e:
    print(f"❌ Error: {str(e)}")
    receipt_response = None

### View Receipt Response

In [None]:
if receipt_response:
    print("Receipt API Response (JSON):")
    print_response(receipt_response)

### Display Receipt Image

In [None]:
# Display the receipt image with detected field polygons
receipt_path = "documents/receipt-with-tips.png"
img = Image.open(receipt_path).convert("RGB")
draw = ImageDraw.Draw(img)

if receipt_response and "analyzeResult" in receipt_response:
    result = receipt_response["analyzeResult"]
    
    # Get page dimensions for coordinate scaling
    if "pages" in result and len(result["pages"]) > 0:
        page = result["pages"][0]
        page_width = page.get("width", 1)
        page_height = page.get("height", 1)
        img_width, img_height = img.size
        
        # Scale factors (API returns coordinates in page units)
        scale_x = img_width / page_width
        scale_y = img_height / page_height
        
        # Draw polygons for detected fields
        if "documents" in result and len(result["documents"]) > 0:
            doc = result["documents"][0]
            fields = doc.get("fields", {})
            
            colors = ["red", "blue", "green", "orange", "purple", "cyan", "magenta"]
            color_idx = 0
            
            for field_name, field_data in fields.items():
                if "boundingRegions" in field_data:
                    for region in field_data["boundingRegions"]:
                        if "polygon" in region:
                            # Polygon is a flat list: [x1, y1, x2, y2, x3, y3, x4, y4]
                            poly = region["polygon"]
                            points = [(poly[i] * scale_x, poly[i+1] * scale_y) for i in range(0, len(poly), 2)]
                            if len(points) >= 3:
                                draw.polygon(points, outline=colors[color_idx % len(colors)], width=2)
                    color_idx += 1

print("🖼️ Receipt image with detected field boundaries:")
img

### Extract and Display Receipt Fields

In [None]:
if receipt_response:
    fields = extract_receipt_fields(receipt_response)
    
    print("🧾 Extracted Receipt Information:")
    print("=" * 80)
    
    # Display basic receipt info
    if "MerchantName" in fields and fields["MerchantName"]:
        print(f"  Merchant: {fields['MerchantName']}")
    if "MerchantPhoneNumber" in fields and fields["MerchantPhoneNumber"]:
        print(f"  Phone: {fields['MerchantPhoneNumber']}")
    if "MerchantAddress" in fields and fields["MerchantAddress"]:
        print(f"  Address: {fields['MerchantAddress']}")
    if "TransactionDate" in fields and fields["TransactionDate"]:
        print(f"  Date: {fields['TransactionDate']}")
    if "TransactionTime" in fields and fields["TransactionTime"]:
        print(f"  Time: {fields['TransactionTime']}")
    if "ReceiptType" in fields and fields["ReceiptType"]:
        print(f"  Receipt Type: {fields['ReceiptType']}")
    if "CountryRegion" in fields and fields["CountryRegion"]:
        print(f"  Country/Region: {fields['CountryRegion']}")
    
    # Display items
    if "Items" in fields and fields["Items"]:
        print("\n  📋 Line Items:")
        for i, item in enumerate(fields["Items"], 1):
            if item["description"]:
                print(f"    {i}. {item['description']}", end="")
                if item["quantity"]:
                    print(f" (Qty: {item['quantity']})", end="")
                if item["price"]:
                    print(f" @ {item['price']}", end="")
                if item["total_price"]:
                    print(f" = {item['total_price']}", end="")
                print()
    
    # Display amounts
    print("\n  💰 Amounts:")
    if "Subtotal" in fields and fields["Subtotal"]:
        print(f"    Subtotal: {fields['Subtotal']}")
    # The v4.0 receipt schema names this field "TotalTax"; "Tax" is kept as a fallback
    tax = fields.get("TotalTax") or fields.get("Tax")
    if tax:
        print(f"    Tax: {tax}")
    if "Tip" in fields and fields["Tip"]:
        print(f"    Tip (including handwritten): {fields['Tip']}")
    if "Total" in fields and fields["Total"]:
        print(f"    Total: {fields['Total']}")
    
    # Display tax details if available
    if "TaxDetails" in fields and fields["TaxDetails"]:
        print("\n  📊 Tax Details:")
        for tax in fields["TaxDetails"]:
            if tax["description"]:
                print(f"    {tax['description']}: {tax['rate']} (Net: {tax['net_amount']})")
    
    print("=" * 80)
else:
    print("⚠️ Receipt response not available. The model extracts:")
    expected_fields = {
        "Receipt Information": [
            "MerchantName - Business name",
            "MerchantPhoneNumber - Contact number",
            "MerchantAddress - Business address",
            "TransactionDate - Date of transaction",
            "TransactionTime - Time of transaction",
            "ReceiptType - Type of receipt (e.g., hotel, restaurant, retail)",
            "CountryRegion - Country/region code"
        ],
        "Line Items": [
            "Items - Array of purchased items with quantity and price",
            "Item.Description - Item name/description",
            "Item.Quantity - Quantity purchased",
            "Item.Price - Unit price",
            "Item.TotalPrice - Total for line item"
        ],
        "Financial Details": [
            "Subtotal - Subtotal amount",
            "Tax - Total tax amount",
            "Tip - Tip amount (including handwritten tips)",
            "Total - Final total amount",
            "TaxDetails - Array of tax breakdowns with rate and amount"
        ]
    }
    for category, items in expected_fields.items():
        print(f"\n{category}:")
        for item in items:
            print(f"  - {item}")
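
Since handwritten tips are a common source of recognition errors, a simple arithmetic check can help: subtotal plus tax plus tip should match the total. The sketch below assumes amounts arrive as `valueCurrency` objects with an `amount` key and that the tax field is named `TotalTax` (with `Tax` as a fallback). `currency_amount` and `totals_consistent` are hypothetical helpers, and the sample data is made up.

```python
from typing import Any, Dict, Optional


def currency_amount(field: Optional[Dict[str, Any]]) -> Optional[float]:
    """Read the numeric amount from a currency field, if present."""
    if not field:
        return None
    return field.get("valueCurrency", {}).get("amount")


def totals_consistent(doc_fields: Dict[str, Any], tolerance: float = 0.01) -> Optional[bool]:
    """Check subtotal + tax + tip against total; None if data is missing."""
    total = currency_amount(doc_fields.get("Total"))
    subtotal = currency_amount(doc_fields.get("Subtotal"))
    if total is None or subtotal is None:
        return None
    # Assumed field names: "TotalTax" first, "Tax" as a fallback.
    tax = currency_amount(doc_fields.get("TotalTax") or doc_fields.get("Tax")) or 0.0
    tip = currency_amount(doc_fields.get("Tip")) or 0.0
    return abs(subtotal + tax + tip - total) <= tolerance


# Made-up receipt fields for illustration.
sample_receipt_fields = {
    "Subtotal": {"valueCurrency": {"amount": 20.0}},
    "TotalTax": {"valueCurrency": {"amount": 1.6}},
    "Tip": {"valueCurrency": {"amount": 4.0}},
    "Total": {"valueCurrency": {"amount": 25.6}},
}
print("Totals consistent:", totals_consistent(sample_receipt_fields))
```

With a real response, `doc_fields` would be `receipt_response["analyzeResult"]["documents"][0]["fields"]`.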

## Summary: Document Intelligence Models

| Model | Use Case | Key Fields | Format Support |
|-------|----------|-----------|----------------|
| **Read** | Extract all text from documents | Text content, paragraphs, lines | PDF, Images, Office documents |
| **Business Card** (v3.1 and earlier) | Extract contact information | Names, emails, phones, addresses | Images (JPG, PNG, etc.) |
| **Bank Check** | Extract check details | Check #, routing #, account #, amount | Images (JPG, PNG, etc.) |
| **Receipt** | Extract line items and amounts from receipts | Items, merchant info, total, tip (incl. handwritten) | Images (JPG, PNG, etc.) |

## API Version Notes

- **v4.0 (2024-11-30)**: Latest GA version with enhanced features
- **v3.1 (2023-07-31)**: Includes business card model (deprecated in v4.0)
- **v3.0 (2022-08-31)**: Previous GA version
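
The versions listed above differ in more than the `api-version` parameter: the v3.x GA releases are served under a `formrecognizer` route, whereas the helpers in this notebook use the v4.0 `documentintelligence` route. The sketch below shows how an analyze URL might be built per version; `ROUTES` and `analyze_url` are hypothetical names, the endpoint is a placeholder, and only the three versions listed here are covered.

```python
# Route prefix per GA API version (only the versions listed in this notebook).
ROUTES = {
    "2024-11-30": "documentintelligence",
    "2023-07-31": "formrecognizer",
    "2022-08-31": "formrecognizer",
}


def analyze_url(endpoint: str, model_id: str, api_version: str = "2024-11-30") -> str:
    """Build the analyze URL for a model under the given API version."""
    if api_version not in ROUTES:
        raise ValueError(f"Unsupported api_version: {api_version}")
    base = endpoint.rstrip("/")
    return f"{base}/{ROUTES[api_version]}/documentModels/{model_id}:analyze?api-version={api_version}"


# Placeholder endpoint for illustration only.
print(analyze_url("https://example.cognitiveservices.azure.com/", "prebuilt-businessCard", "2023-07-31"))
print(analyze_url("https://example.cognitiveservices.azure.com", "prebuilt-receipt"))
```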

## Additional Resources

- [REST API Documentation](https://learn.microsoft.com/en-us/rest/api/aiservices/operation-groups?view=rest-aiservices-v4.0%20(2024-11-30))
- [Document Intelligence Studio](https://documentintelligence.ai.azure.com)
- [Model Schema Definitions](https://github.com/Azure-Samples/document-intelligence-code-samples/tree/main/schema/2024-11-30-ga)