## Setup

This quickstart is a reference for the most common use cases of the Azure Document Intelligence prebuilt models (**prebuilt-read** and **prebuilt-layout**).

Some add-on capabilities are also explored, together with the **markdown output format** of the layout model.
This option is particularly useful when the results need to be served as context to an LLM, as demonstrated in the last section of this notebook.

### Import libraries

In [1]:
import os
import json
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient, DocumentIntelligenceAdministrationClient
from azure.ai.documentintelligence.models import (
    AnalyzeDocumentRequest,
    AnalyzeOutputOption,
    AnalyzeResult,
    DocumentAnalysisFeature,
)
# import base64
import pandas as pd

In [2]:
# check the version of the azure-ai-documentintelligence package
import importlib.metadata
print(importlib.metadata.version("azure-ai-documentintelligence"))

1.0.2


### Document Intelligence client

In [3]:
# Load environment variables from .env file
load_dotenv(override=True)

True

In [4]:
# Check whether your deployment is single-service (Azure Document Intelligence resource) or multi-service (Azure AI Services resource), and use the endpoint and key of that resource
azure_docintelligence_endpoint = os.environ.get('AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT')
azure_docintelligence_key = os.environ.get('AZURE_DOCUMENT_INTELLIGENCE_KEY')
print(f'Current endpoint: {azure_docintelligence_endpoint}')

Current endpoint: https://westus2.api.cognitive.microsoft.com/
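
Because `os.environ.get` returns `None` for a missing variable, a misconfigured `.env` file only surfaces later, when the client is built. A minimal sketch of a fail-fast check, assuming a hypothetical helper named `require_env` (it is not part of the Azure SDK or of `python-dotenv`):

```python
import os


def require_env(name: str, environ=os.environ) -> str:
    """Return the stripped value of an environment variable, or raise a clear error."""
    value = (environ.get(name) or "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Hypothetical usage in this notebook (commented out so the sketch runs anywhere):
# azure_docintelligence_endpoint = require_env("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT")
# azure_docintelligence_key = require_env("AZURE_DOCUMENT_INTELLIGENCE_KEY")
```

The `environ` parameter exists only so the helper can be exercised with a plain dictionary instead of the real process environment.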


In [5]:
document_intelligence_client = DocumentIntelligenceClient(
    endpoint=azure_docintelligence_endpoint, 
    credential=AzureKeyCredential(azure_docintelligence_key),
    # api_version="2024-11-30" # v4.0 (default)
)

### Document Intelligence admin client

In [6]:
document_intelligence_admin_client = DocumentIntelligenceAdministrationClient(
    endpoint=azure_docintelligence_endpoint, 
    credential=AzureKeyCredential(azure_docintelligence_key),
    api_version="2024-11-30",  # v4.0 (default)
)

In [7]:
document_intelligence_admin_client.get_resource_details()

{'customDocumentModels': {'count': 3, 'limit': 20000}, 'customNeuralDocumentModelBuilds': {'used': 0, 'quota': 20, 'quotaResetDateTime': '2025-07-01T00:00:00Z'}}
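
The resource details above can be turned into remaining headroom with simple arithmetic. A minimal sketch, assuming the details are available as a mapping with the same keys as the output shown above; `remaining_quota` is a hypothetical helper, and the sample dictionary simply copies that output:

```python
def remaining_quota(details: dict) -> dict:
    """Compute how many custom models and neural builds are still available."""
    models = details["customDocumentModels"]
    builds = details["customNeuralDocumentModelBuilds"]
    return {
        "custom_models_left": models["limit"] - models["count"],
        "neural_builds_left": builds["quota"] - builds["used"],
    }


# Sample copied from the output above
sample_details = {
    "customDocumentModels": {"count": 3, "limit": 20000},
    "customNeuralDocumentModelBuilds": {
        "used": 0,
        "quota": 20,
        "quotaResetDateTime": "2025-07-01T00:00:00Z",
    },
}

print(remaining_quota(sample_details))
# {'custom_models_left': 19997, 'neural_builds_left': 20}
```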

## Classify document

In [8]:
# Read the local file in binary mode
with open("composed_model/f1040_7.pdf", "rb") as file:
    poller = document_intelligence_client.begin_analyze_document(
        model_id="ComposeModel",
        body=file
    )

In [9]:
# Block until the long-running operation completes (up to the given timeout), then return the deserialized result
result: AnalyzeResult = poller.result(timeout=1000)

In [10]:
# Write the result to a JSON file
with open("composed_model/f1040_7.json", "w") as f:
    json.dump(result.as_dict(), f, indent=2)

In [11]:
# docType and confidence are available for implementing confidence-based routing
for document in result["documents"]:
    document_type = document["docType"]
    document_type_confidence = document["confidence"]
    print(f"Document type: {document_type} (confidence: {document_type_confidence})")

Document type: 1040Form (confidence: 0.812)
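
The `docType` and `confidence` values printed above are enough to drive a simple routing rule. A minimal sketch, assuming a hypothetical `route_document` helper and an illustrative threshold of 0.8; neither comes from the SDK, and the threshold should be tuned against your own documents. It works on plain dictionaries such as the entries of `result.as_dict()["documents"]`:

```python
def route_document(document: dict, threshold: float = 0.8) -> str:
    """Return a routing decision based on the document-level confidence.

    The threshold is an illustrative assumption, not a recommended value.
    """
    confidence = document.get("confidence") or 0.0
    if confidence >= threshold:
        # Confident enough: hand over to the pipeline for this document type
        return f"auto:{document['docType']}"
    # Otherwise send the document to a human reviewer
    return "manual_review"


# Sample values taken from the output above
sample_document = {"docType": "1040Form", "confidence": 0.812}
print(route_document(sample_document))                  # auto:1040Form
print(route_document(sample_document, threshold=0.9))   # manual_review
```

Keeping the decision in one small function makes it easy to change the threshold per document type later without touching the analysis code.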


In [None]:
# Each field in the trained custom extraction model has its own confidence score
for document in result["documents"]:
    print(json.dumps(document["fields"], indent=2, default=str))

{
  "FirstName": "{'type': 'string', 'valueString': 'Arshavin', 'content': 'Arshavin', 'boundingRegions': [{'pageNumber': 1, 'polygon': [0.505, 1.595, 0.995, 1.595, 0.995, 1.715, 0.505, 1.715]}], 'confidence': 0.995, 'spans': [{'offset': 632, 'length': 8}]}",
  "LastName": "{'type': 'string', 'valueString': 'Andrea', 'content': 'Andrea', 'boundingRegions': [{'pageNumber': 1, 'polygon': [3.325, 1.6, 3.715, 1.6, 3.715, 1.71, 3.325, 1.71]}], 'confidence': 0.995, 'spans': [{'offset': 641, 'length': 6}]}",
  "State": "{'type': 'string', 'valueString': 'PA', 'content': 'PA', 'boundingRegions': [{'pageNumber': 1, 'polygon': [5.065, 2.605, 5.23, 2.605, 5.23, 2.71, 5.065, 2.71]}], 'confidence': 0.995, 'spans': [{'offset': 1000, 'length': 2}]}",
  "City": "{'type': 'string', 'valueString': 'Philadelphia', 'content': 'Philadelphia', 'boundingRegions': [{'pageNumber': 1, 'polygon': [0.505, 2.595, 1.175, 2.595, 1.175, 2.73, 0.505, 2.73]}], 'confidence': 0.995, 'spans': [{'offset': 981, 'length': 12