## Setup


The goal of this quickstart is to provide a reference for the most common use cases
of interacting with the prebuilt models of Azure Document Intelligence (prebuilt-read and prebuilt-layout).

Some add-on capabilities are also explored,
together with the markdown output format of the layout model.
This option is particularly useful when the results are passed as context to an LLM,
as demonstrated in the **Chat with your document** section of this notebook.

### Document Intelligence Configuration

- Create a new **Document Intelligence** resource within the same Resource Group. During configuration, select the free pricing tier.  
- Once the resource is created, copy its **key** and **endpoint** values into the `../infra/credentials.env` file as `AZURE_DOCUMENT_INTELLIGENCE_KEY` and `AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT`.



### Import libraries

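If you have not installed the `azure-ai-documentintelligence` package yet, install it with `pip`, either by uncommenting the cell below or by running the command in the terminal of your compute. The notebook also uses `python-dotenv`, `pandas` and `openai`.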

In [1]:
#!pip install azure-ai-documentintelligence==1.0.1
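Once installed, you can check from inside the notebook which version is actually present. A minimal stdlib sketch using `importlib.metadata`; `installed_version` is a hypothetical helper name, not part of the SDK:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("azure-ai-documentintelligence"))
```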

In [1]:
import os
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeResult
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
from azure.ai.documentintelligence.models import DocumentAnalysisFeature
# import base64
import pandas as pd

### Document Intelligence client

In [2]:
# Load environment variables from .env file
load_dotenv(dotenv_path='../infra/credentials.env', override=True)

True

In [3]:
# Check whether your deployment is single-service (Azure Document Intelligence resource) or multi-service (Azure AI Services resource), and use that resource's key and endpoint
azure_docintelligence_endpoint = os.environ.get('AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT')
azure_docintelligence_key = os.environ.get('AZURE_DOCUMENT_INTELLIGENCE_KEY')
print(f'Current endpoint: {azure_docintelligence_endpoint}')

Current endpoint: https://ep-di-standalone.cognitiveservices.azure.com
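`os.environ.get` in the cell above returns `None` when a variable is missing, so a typo in `credentials.env` only surfaces later, in the client cell, with a less specific error. A minimal fail-fast sketch, assuming the variable names used above; `require_env` is a hypothetical helper, not part of any SDK:

```python
import os
from typing import Dict


def require_env(*names: str) -> Dict[str, str]:
    """Return the requested environment variables, raising one clear error if any are missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Usage after load_dotenv(...):
# settings = require_env("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT", "AZURE_DOCUMENT_INTELLIGENCE_KEY")
```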


In [4]:
document_intelligence_client = DocumentIntelligenceClient(
    endpoint=azure_docintelligence_endpoint, 
    credential=AzureKeyCredential(azure_docintelligence_key),
    # api_version="2024-11-30" # v4.0 (default)
)

## Sample document

In [6]:
# a lot of test files in different formats are available in this repo:
# https://github.com/Azure-Samples/cognitive-services-REST-api-samples/tree/master/curl/form-recognizer

In [7]:
# for an example of how to use a local file, see the Prebuilt-layout --> Key-value pairs section

In [5]:
# get the document file from a URL
formUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf"

In [9]:
#formUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/invoice-logic-apps-tutorial.pdf"

In [10]:
#formUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/invoice_sample.jpg"

## Analyze document

### Prebuilt-read

In [6]:
poller = document_intelligence_client.begin_analyze_document(
    model_id="prebuilt-read",
    body=AnalyzeDocumentRequest(url_source=formUrl),
)

In [7]:
# An instance of AnalyzeDocumentLROPoller that returns AnalyzeResult. 
# (LRO = long-running operation)
poller

<azure.ai.documentintelligence._operations._patch.AnalyzeDocumentLROPoller at 0x7f4a3f299570>

In [8]:
# The result() method is designed to retrieve the result of a long-running operation (LRO), 
# which is a common pattern in cloud services where certain tasks, such as analyzing data or deploying resources, take time to complete.
# It abstracts the complexity of polling and waiting, handling the operation's result once it is available.

# Returns: The deserialized resource of the long running operation, if one is available
result: AnalyzeResult = poller.result(timeout=1000)
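`poller.result()` hides a poll-until-done loop: the service accepts the analyze request, hands back an operation to track, and the client checks its status until it reaches a terminal state (`timeout` is in seconds). The sketch below is a generic illustration of that pattern, not the SDK's implementation; the status names are assumed to resemble the service's, and `poll_until_done` is a hypothetical helper. If you want to inspect progress yourself, azure-core's `LROPoller` also exposes `done()`, `status()` and `wait()`.

```python
import time
from typing import Callable, Optional

# Terminal states assumed to resemble the service's operation statuses (compared lower-cased)
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def poll_until_done(
    get_status: Callable[[], str],
    interval: float = 1.0,
    timeout: Optional[float] = None,
) -> str:
    """Call get_status() every `interval` seconds until a terminal state is reached or `timeout` expires."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        status = get_status().lower()
        if status in TERMINAL_STATES:
            return status
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"operation still '{status}' after {timeout} seconds")
        time.sleep(interval)


# Demo with a fake operation that completes on the third status check
_fake_statuses = iter(["notStarted", "running", "succeeded"])
print(poll_until_done(lambda: next(_fake_statuses), interval=0.1))  # → succeeded
```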

In [9]:
print(result)

{'apiVersion': '2024-11-30', 'modelId': 'prebuilt-read', 'stringIndexType': 'textElements', 'content': 'UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549\nFORM 10-Q\n☒ QUARTERLY REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\nFor the Quarterly Period Ended March 31, 2020\nOR\n☐ TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\nFor the Transition Period From to Commission File Number 001-37845\nMICROSOFT CORPORATION\nWASHINGTON (STATE OF INCORPORATION) ONE MICROSOFT WAY, REDMOND, WASHINGTON 98052-6399 (425) 882-8080 www.microsoft.com/investor\n91-1144442 (I.R.S. ID)\nSecurities registered pursuant to Section 12(b) of the Act:\nTitle of each class\nTrading Symbol\nName of exchange on which registered\nCommon stock, $0.00000625 par value per share\nMSFT\nNASDAQ\n2.125% Notes due 2021\nMSFT\nNASDAQ\n3.125% Notes due 2028\nMSFT\nNASDAQ\n2.625% Notes due 2033\nMSFT\nNASDAQ\nSecurities registered purs

In [10]:
print(result.content)

UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549
FORM 10-Q
☒ QUARTERLY REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934
For the Quarterly Period Ended March 31, 2020
OR
☐ TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934
For the Transition Period From to Commission File Number 001-37845
MICROSOFT CORPORATION
WASHINGTON (STATE OF INCORPORATION) ONE MICROSOFT WAY, REDMOND, WASHINGTON 98052-6399 (425) 882-8080 www.microsoft.com/investor
91-1144442 (I.R.S. ID)
Securities registered pursuant to Section 12(b) of the Act:
Title of each class
Trading Symbol
Name of exchange on which registered
Common stock, $0.00000625 par value per share
MSFT
NASDAQ
2.125% Notes due 2021
MSFT
NASDAQ
3.125% Notes due 2028
MSFT
NASDAQ
2.625% Notes due 2033
MSFT
NASDAQ
Securities registered pursuant to Section 12(g) of the Act:
NONE
Indicate by check mark whether the registrant (1) has filed all reports required to be file

In [11]:
# list the public attributes of the result (names not starting with an underscore)
print([attr for attr in dir(result) if not attr.startswith('_')])



In [12]:
# experiment with prebuilt read model: it does not return tables
if result.tables:
    print(f"I've found {len(result.tables)} tables.")
else:
    print("I haven't found any tables.")

I haven't found any tables.


### Prebuilt-layout

In [13]:
poller = document_intelligence_client.begin_analyze_document(
    model_id="prebuilt-layout", 
    body=AnalyzeDocumentRequest(url_source=formUrl),  # url_source or bytes_source is required (urlSource / base64Source in the REST API)
)
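When calling the REST endpoint directly instead of the SDK, the JSON body carries exactly one of `urlSource` or `base64Source`; the Python SDK builds this body for you from `AnalyzeDocumentRequest(url_source=...)` or `AnalyzeDocumentRequest(bytes_source=...)`. A stdlib sketch of building the REST body by hand; `build_analyze_body` is a hypothetical helper:

```python
import base64
import json
from typing import Optional


def build_analyze_body(url: Optional[str] = None, data: Optional[bytes] = None) -> str:
    """Build the JSON body for a REST analyze call: exactly one of urlSource or base64Source."""
    if (url is None) == (data is None):
        raise ValueError("provide exactly one of url or data")
    if url is not None:
        return json.dumps({"urlSource": url})
    # base64Source carries the raw file bytes encoded as a base64 string
    return json.dumps({"base64Source": base64.b64encode(data).decode("ascii")})


print(build_analyze_body(url="https://example.com/sample-layout.pdf"))
# → {"urlSource": "https://example.com/sample-layout.pdf"}

# Usage with a local file:
# with open("assets/sample-invoice.pdf", "rb") as f:
#     body = build_analyze_body(data=f.read())
```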

In [14]:
result: AnalyzeResult = poller.result(timeout=1000)

In [15]:
print(result)

{'apiVersion': '2024-11-30', 'modelId': 'prebuilt-layout', 'stringIndexType': 'textElements', 'content': 'UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549\nFORM 10-Q\n☐ ☒ :selected: QUARTERLY REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the Quarterly Period Ended March 31, 2020 OR :unselected: TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the Transition Period From to\nCommission File Number 001-37845\nMICROSOFT CORPORATION\nWASHINGTON (STATE OF INCORPORATION) ONE MICROSOFT WAY, REDMOND, WASHINGTON 98052-6399 (425) 882-8080 www.microsoft.com/investor\n91-1144442 (I.R.S. ID)\nSecurities registered pursuant to Section 12(b) of the Act:\nTitle of each class\nTrading Symbol\nName of exchange on which registered\nCommon stock, $0.00000625 par value per share\nMSFT\nNASDAQ\n2.125% Notes due 2021\nMSFT\nNASDAQ\n3.125% Notes due 2028\nMSFT\nNASDAQ\n2.625% Notes due 2033\nMSFT\nNASDAQ\nSec

In [16]:
type(result)

azure.ai.documentintelligence.models._models.AnalyzeResult

In [17]:
result.model_id

'prebuilt-layout'

In [18]:
result.api_version

'2024-11-30'

In [19]:
print(result.content)

UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549
FORM 10-Q
☐ ☒ :selected: QUARTERLY REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the Quarterly Period Ended March 31, 2020 OR :unselected: TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the Transition Period From to
Commission File Number 001-37845
MICROSOFT CORPORATION
WASHINGTON (STATE OF INCORPORATION) ONE MICROSOFT WAY, REDMOND, WASHINGTON 98052-6399 (425) 882-8080 www.microsoft.com/investor
91-1144442 (I.R.S. ID)
Securities registered pursuant to Section 12(b) of the Act:
Title of each class
Trading Symbol
Name of exchange on which registered
Common stock, $0.00000625 par value per share
MSFT
NASDAQ
2.125% Notes due 2021
MSFT
NASDAQ
3.125% Notes due 2028
MSFT
NASDAQ
2.625% Notes due 2033
MSFT
NASDAQ
Securities registered pursuant to Section 12(g) of the Act: NONE
Indicate by check mark whether the registrant (1) has filed all rep

In [20]:
if result.tables:
    print(f"I've found {len(result.tables)} tables.")

I've found 2 tables.


#### Tables parsing

In [21]:
if result.tables:
    for table_idx, table in enumerate(result.tables):
        print(
            f"Table # {table_idx} has {table.row_count} rows and "
            f"{table.column_count} columns"
        )
        if table.bounding_regions:
            for region in table.bounding_regions:
                print(
                    f"Table # {table_idx} location on page: {region.page_number} is {region.polygon}"
                )
        for cell in table.cells:
            print(
                f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'"
            )
            if cell.bounding_regions:
                for region in cell.bounding_regions:
                    print(
                        f"...content on page {region.page_number} is within bounding polygon '{region.polygon}'"
                    )

Table # 0 has 5 rows and 3 columns
Table # 0 location on page: 1 is [0.5912, 4.9025, 6.9694, 4.9034, 6.9718, 5.7826, 0.5926, 5.782]
...Cell[0][0] has text 'Title of each class'
...content on page 1 is within bounding polygon '[0.5702, 4.8825, 3.8454, 4.8825, 3.8389, 5.1255, 0.5636, 5.1255]'
...Cell[0][1] has text 'Trading Symbol'
...content on page 1 is within bounding polygon '[3.8454, 4.8825, 5.3713, 4.8825, 5.3713, 5.1255, 3.8389, 5.1255]'
...Cell[0][2] has text 'Name of exchange on which registered'
...content on page 1 is within bounding polygon '[5.3713, 4.8825, 6.93, 4.8825, 6.93, 5.1123, 5.3713, 5.1255]'
...Cell[1][0] has text 'Common stock, $0.00000625 par value per share'
...content on page 1 is within bounding polygon '[0.5636, 5.1255, 3.8389, 5.1255, 3.8389, 5.2962, 0.5636, 5.2962]'
...Cell[1][1] has text 'MSFT'
...content on page 1 is within bounding polygon '[3.8389, 5.1255, 5.3713, 5.1255, 5.3713, 5.2962, 3.8389, 5.2962]'
...Cell[1][2] has text 'NASDAQ'
...content on pag
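Each `polygon` printed above is a flat list of eight numbers: the x/y coordinates of the region's four corners. For PDF input the page unit is inches (for images it is pixels). A small sketch that reduces a polygon to an axis-aligned bounding box, for example to crop or highlight a table; `polygon_to_bbox` is a hypothetical helper:

```python
from typing import Sequence, Tuple


def polygon_to_bbox(polygon: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert a flat [x1, y1, x2, y2, ...] polygon into (x_min, y_min, x_max, y_max)."""
    if len(polygon) < 2 or len(polygon) % 2:
        raise ValueError("polygon must contain an even number of coordinates")
    xs = polygon[0::2]  # even positions are x coordinates
    ys = polygon[1::2]  # odd positions are y coordinates
    return min(xs), min(ys), max(xs), max(ys)


# Table #0 polygon from the output above (page 1, inches)
print(polygon_to_bbox([0.5912, 4.9025, 6.9694, 4.9034, 6.9718, 5.7826, 0.5926, 5.782]))
# → (0.5912, 4.9025, 6.9718, 5.7826)
```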

In [22]:
# table to dataframe
# list to store all dataframes (created before the check, so later cells also work when no table is found)
dataframes = []
if result.tables:
    for table_idx, table in enumerate(result.tables):
        # report the data rows only: the first row is promoted to header below
        print(
            f"Table # {table_idx} has {table.row_count - 1} rows and "
            f"{table.column_count} columns"
        )
        # initialize an empty dataframe with the correct dimensions
        df = pd.DataFrame(index=range(table.row_count), columns=range(table.column_count))
        for cell in table.cells:
            # Assign the cell content to the correct location in the dataframe
            df.at[cell.row_index, cell.column_index] = cell.content        
        # promote the first row as column headers
        df.columns = df.iloc[0]  # Set the first row as the header
        df = df[1:].reset_index(drop=True)  # Drop the first row and reset the index
        
        # add the current dataframe to the list of dataframes
        dataframes.append(df)  

Table # 0 has 4 rows and 3 columns
Table # 1 has 1 rows and 2 columns


In [23]:
len(dataframes)

2

In [24]:
dataframes[0]

|   | Title of each class | Trading Symbol | Name of exchange on which registered |
|---|---|---|---|
| 0 | Common stock, $0.00000625 par value per share | MSFT | NASDAQ |
| 1 | 2.125% Notes due 2021 | MSFT | NASDAQ |
| 2 | 3.125% Notes due 2028 | MSFT | NASDAQ |
| 3 | 2.625% Notes due 2033 | MSFT | NASDAQ |
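The conversion above assumes that the first row is the header and that no cell spans several rows or columns. Table cells in the SDK also carry `row_span`, `column_span` and `kind` (for example `"columnHeader"`), so a more defensive builder can fill merged cells and pick header rows from the cell kind. A stdlib sketch, using `SimpleNamespace` stand-ins with the same attribute names as the SDK objects; `table_to_rows` is a hypothetical helper:

```python
from types import SimpleNamespace
from typing import Any, List, Tuple


def table_to_rows(table: Any) -> Tuple[List[str], List[List[str]]]:
    """Return (header, body_rows) for a table-like object with row_count, column_count and cells."""
    grid = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    header_rows = set()
    for cell in table.cells:
        row_span = getattr(cell, "row_span", None) or 1
        col_span = getattr(cell, "column_span", None) or 1
        # Repeat the text of a merged cell in every grid position it covers
        for r in range(cell.row_index, cell.row_index + row_span):
            for c in range(cell.column_index, cell.column_index + col_span):
                grid[r][c] = cell.content
        if getattr(cell, "kind", None) == "columnHeader":
            header_rows.update(range(cell.row_index, cell.row_index + row_span))
    # Fall back to the first row when no cell is tagged as a column header
    header_idx = sorted(header_rows) or [0]
    header = []
    for c in range(table.column_count):
        # Join multi-row headers, skipping texts already repeated by a merged header cell
        parts: List[str] = []
        for r in header_idx:
            if grid[r][c] and grid[r][c] not in parts:
                parts.append(grid[r][c])
        header.append(" / ".join(parts))
    body = [grid[r] for r in range(table.row_count) if r not in header_idx]
    return header, body


# Stand-in objects mimicking the SDK table and cell attributes (hypothetical data)
demo = SimpleNamespace(
    row_count=2,
    column_count=2,
    cells=[
        SimpleNamespace(row_index=0, column_index=0, content="Item", kind="columnHeader"),
        SimpleNamespace(row_index=0, column_index=1, content="Amount", kind="columnHeader"),
        SimpleNamespace(row_index=1, column_index=0, content="Subtotal", kind="content", column_span=2),
    ],
)
print(table_to_rows(demo))  # → (['Item', 'Amount'], [['Subtotal', 'Subtotal']])

# With the SDK result (pandas imported as pd, as earlier in the notebook):
# header, rows = table_to_rows(result.tables[0])
# df = pd.DataFrame(rows, columns=header)
```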


#### Key-value pairs

In [25]:
# this example demonstrates how to pass a file stream to the API
with open('assets/sample-invoice.pdf', "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-layout",
        f,  # Pass the file stream as a positional argument
        features=[DocumentAnalysisFeature.KEY_VALUE_PAIRS],
        content_type="application/octet-stream",  # overrides the default "application/json" for binary data
    )

In [26]:
result: AnalyzeResult = poller.result()
print(f"I've found {len(result.key_value_pairs)} key-value pairs.")

I've found 24 key-value pairs.


In [27]:
# verbose print of key-value pairs
result.key_value_pairs

[{'key': {'content': 'INVOICE:', 'boundingRegions': [{'pageNumber': 1, 'polygon': [6.8368, 1.4018, 7.4325, 1.4024, 7.4323, 1.5448, 6.8367, 1.5442]}], 'spans': [{'offset': 75, 'length': 8}]}, 'value': {'content': 'INV-100', 'boundingRegions': [{'pageNumber': 1, 'polygon': [7.49, 1.4019, 8.013, 1.4007, 8.0133, 1.5445, 7.4903, 1.5457]}], 'spans': [{'offset': 84, 'length': 7}]}, 'confidence': 0.997},
 {'key': {'content': 'INVOICE DATE:', 'boundingRegions': [{'pageNumber': 1, 'polygon': [6.2044, 1.5987, 7.1897, 1.5999, 7.1895, 1.7504, 6.2042, 1.7492]}], 'spans': [{'offset': 92, 'length': 13}]}, 'value': {'content': '11/15/2019', 'boundingRegions': [{'pageNumber': 1, 'polygon': [7.2456, 1.5985, 8.0075, 1.5971, 8.0078, 1.7494, 7.2459, 1.7508]}], 'spans': [{'offset': 106, 'length': 10}]}, 'confidence': 0.997},
 {'key': {'content': 'DUE DATE:', 'boundingRegions': [{'pageNumber': 1, 'polygon': [6.4762, 1.8088, 7.1845, 1.8117, 7.1838, 1.9611, 6.4756, 1.9582]}], 'spans': [{'offset': 117, 'length':

In [28]:
print("----Key-value pairs found in document----")
if result.key_value_pairs:
    for kv_pair in result.key_value_pairs:
        key = kv_pair.key.content if kv_pair.key else "None"
        value = kv_pair.value.content if kv_pair.value else "None"
        print(f"Key: {key}, \nValue: {value}")
        print("--")


----Key-value pairs found in document----
Key: INVOICE:, 
Value: INV-100
--
Key: INVOICE DATE:, 
Value: 11/15/2019
--
Key: DUE DATE:, 
Value: 12/15/2019
--
Key: CUSTOMER NAME:, 
Value: MICROSOFT CORPORATION
--
Key: SERVICE PERIOD:, 
Value: 10/14/2019 - 11/14/2019
--
Key: CUSTOMER ID:, 
Value: CID-12345
--
Key: BILL TO:, 
Value: Microsoft Finance
123 Bill St,
Redmond WA, 98052
--
Key: SHIP TO:, 
Value: Microsoft Delivery
123 Ship St,
Redmond WA, 98052
--
Key: SERVICE ADDRESS:, 
Value: Microsoft Services 123 Service St, Redmond WA, 98052
--
Key: REQUISITIONER, 
Value: None
--
Key: DATE, 
Value: 3/4/2021
3/5/2021
3/6/2021
--
Key: ITEM CODE, 
Value: A123
B456
C789
--
Key: DESCRIPTION, 
Value: Consulting Services
Document Fee
Printing Fee
--
Key: QTY, 
Value: 2
3
10
--
Key: UM, 
Value: hours
pages
--
Key: PRICE, 
Value: $30.00
$10.00
$1.00
--
Key: TAX, 
Value: $6.00
$3.00
$1.00
--
Key: AMOUNT, 
Value: $60.00
$30.00
$10.00
--
Key: SUBTOTAL, 
Value: $100.00
--
Key: SALES TAX, 
Value: $10.00
-
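For downstream use, the key-value pairs are often easier to handle as a plain dictionary. A sketch that strips the trailing colon from keys and keeps `None` when no value was detected (as for `REQUISITIONER` above); `kv_pairs_to_dict` is a hypothetical helper, and a repeated key overwrites the earlier entry:

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, Optional


def kv_pairs_to_dict(pairs: Optional[Iterable[Any]]) -> Dict[str, Optional[str]]:
    """Map each key's text (trailing colon removed) to its value text, or None when no value was detected."""
    out: Dict[str, Optional[str]] = {}
    for pair in pairs or []:
        if pair.key is None:
            continue
        key = pair.key.content.strip().rstrip(":").strip()
        # A repeated key overwrites the earlier entry
        out[key] = pair.value.content if pair.value is not None else None
    return out


# Stand-ins shaped like the SDK key-value pairs (values taken from the output above)
demo_pairs = [
    SimpleNamespace(key=SimpleNamespace(content="INVOICE:"), value=SimpleNamespace(content="INV-100")),
    SimpleNamespace(key=SimpleNamespace(content="REQUISITIONER"), value=None),
]
print(kv_pairs_to_dict(demo_pairs))  # → {'INVOICE': 'INV-100', 'REQUISITIONER': None}

# With the SDK result: invoice_fields = kv_pairs_to_dict(result.key_value_pairs)
```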

#### Markdown output

In [29]:
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    body=AnalyzeDocumentRequest(url_source=formUrl),
    output_content_format="markdown" # default "text"
)

In [30]:
print(formUrl)

https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf


In [31]:
# retrieve the file name from the URL
file_name = os.path.basename(formUrl)
print(f"File name: {file_name}")
# file name without extension
file_name_without_ext = os.path.splitext(file_name)[0]

File name: sample-layout.pdf
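`os.path.basename` works for this URL, but a URL with a query string (for example a blob URL carrying a SAS token) would leak `?...` into the file name. A slightly more robust sketch using `urllib.parse`; `file_stem_from_url` is a hypothetical helper and the blob URL below is made up:

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def file_stem_from_url(url: str) -> str:
    """Return the file name without extension, ignoring any query string or fragment."""
    path = unquote(urlparse(url).path)  # urlparse splits off ?query and #fragment
    return PurePosixPath(path).stem


print(file_stem_from_url("https://example.blob.core.windows.net/docs/sample-layout.pdf?sv=2024&sig=abc"))
# → sample-layout
```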


In [32]:
result: AnalyzeResult = poller.result()

In [33]:
print(result.content)

# UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549


## FORM 10-Q

☐
☒
☒
QUARTERLY REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF
1934
For the Quarterly Period Ended March 31, 2020
OR
☐
TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF
1934
For the Transition Period From
to

Commission File Number 001-37845


## MICROSOFT CORPORATION

WASHINGTON
(STATE OF INCORPORATION)
ONE MICROSOFT WAY, REDMOND, WASHINGTON 98052-6399
(425) 882-8080
www.microsoft.com/investor

91-1144442
(I.R.S. ID)

Securities registered pursuant to Section 12(b) of the Act:


<table>
<tr>
<th>Title of each class</th>
<th>Trading Symbol</th>
<th>Name of exchange on which registered</th>
</tr>
<tr>
<td>Common stock, $0.00000625 par value per share</td>
<td>MSFT</td>
<td>NASDAQ</td>
</tr>
<tr>
<td>2.125% Notes due 2021</td>
<td>MSFT</td>
<td>NASDAQ</td>
</tr>
<tr>
<td>3.125% Notes due 2028</td>
<td>MSFT</td>
<td>NASDAQ</td>
</tr>
<tr>
<t
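In the markdown output, tables are emitted as HTML `<table>` elements (visible above) rather than pipe tables. If you need them back as Python data, a stdlib `html.parser` sketch can pull out the cell texts; `extract_tables` is a hypothetical helper, and it does not expand `rowspan`/`colspan` attributes:

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class HtmlTableExtractor(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <table> and <tr>."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.tables: List[List[List[str]]] = []  # tables -> rows -> cell texts
        self._cell: Optional[List[str]] = None  # text chunks of the cell being read

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("th", "td") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_endtag(self, tag: str) -> None:
        if tag in ("th", "td") and self._cell is not None:
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data: str) -> None:
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(markdown: str) -> List[List[List[str]]]:
    """Return every HTML table in a markdown string as a list of rows of cell texts."""
    parser = HtmlTableExtractor()
    parser.feed(markdown)
    parser.close()
    return parser.tables


sample = """Securities registered pursuant to Section 12(b) of the Act:

<table>
<tr><th>Title of each class</th><th>Trading Symbol</th></tr>
<tr><td>2.125% Notes due 2021</td><td>MSFT</td></tr>
</table>
"""
print(extract_tables(sample))
# → [[['Title of each class', 'Trading Symbol'], ['2.125% Notes due 2021', 'MSFT']]]

# With the SDK result: tables = extract_tables(result.content)
```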

In [34]:
# save result content to file
with open(f"assets/{file_name_without_ext}.md", "w") as f:
    f.write(result.content)  # Write the string content directly

## Chat with your document (basic)

In [35]:
from openai import AzureOpenAI

In [36]:
# Load environment variables from .env file
load_dotenv(dotenv_path='../infra/credentials.env', override=True)

# Read the Azure OpenAI endpoint from the environment (the key is read when the client is created)
azure_openai_endpoint = os.environ.get('AZURE_OPENAI_ENDPOINT')
print(f'Current endpoint: {azure_openai_endpoint}')

Current endpoint: https://ai-aifoundryupskillinghub687267079310.openai.azure.com/


### Azure OpenAI client

In [37]:
client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version="2024-05-01-preview",  # "2024-08-01-preview"
)

### Prompt template

In [38]:
question = "What quarterly period does this form cover?"

In [39]:
document_prompt = f"""
Given the markdown-formatted content extracted, answer the following question using only the information contained in the content.
---

Answer concisely and factually. If the information is not present, reply: "Not specified in the document."
---

Markdown Content:
{result.content}
---

Question:
{question}
"""

### Chat

In [40]:
document_response = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a document understanding assistant.",
        },
        {
            "role": "user", 
            "content": document_prompt,
        },
    ],
    model="gpt-4o",  # name of your Azure OpenAI model deployment
    temperature=0.0, # for stable results
)

print(document_response.choices[0].message.content)

The quarterly period covered by this form is the period ended March 31, 2020.
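Because the same system message and message layout are reused for every question, wrapping the template in a function makes it easy to ask several questions against the same document. A sketch that reuses the prompt text from the template above; `DOCUMENT_PROMPT_TEMPLATE` and `build_messages` are hypothetical names:

```python
from typing import Dict, List

SYSTEM_PROMPT = "You are a document understanding assistant."

# Same wording as the document_prompt f-string above, with named placeholders
DOCUMENT_PROMPT_TEMPLATE = """
Given the markdown-formatted content extracted, answer the following question using only the information contained in the content.
---

Answer concisely and factually. If the information is not present, reply: "Not specified in the document."
---

Markdown Content:
{content}
---

Question:
{question}
"""


def build_messages(content: str, question: str, system_prompt: str = SYSTEM_PROMPT) -> List[Dict[str, str]]:
    """Assemble the system and user chat messages for one document question."""
    user_prompt = DOCUMENT_PROMPT_TEMPLATE.format(content=content, question=question)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# Usage (assumes the `client`, `result` and `question` objects defined above):
# response = client.chat.completions.create(
#     messages=build_messages(result.content, question), model="gpt-4o", temperature=0.0
# )
```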


### Questions generation

In [41]:
questions_prompt = f"""
Given the following markdown-formatted content, generate a list of 5 relevant questions that can be used to verify the correct processing and comprehension of the document by an automated pipeline.

The questions could cover:
- Document metadata and structure
- Company information
- Securities details
- Compliance and filing status
- Shares outstanding

Make sure the questions are clear, factual, and refer only to the information available in the text.

---

Markdown Content:
{result.content}

---

Now, list the questions based on the markdown content above.
"""

In [42]:
questions_response = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a document understanding assistant.",
        },
        {
            "role": "user", 
            "content": questions_prompt,
        },
    ],
    model="gpt-4o",  # name of your Azure OpenAI model deployment
    temperature=0.0, # for stable results
)

print(questions_response.choices[0].message.content)

1. What is the Commission File Number associated with the Form 10-Q for Microsoft Corporation?

2. What is the par value per share of Microsoft's common stock as registered on NASDAQ?

3. Has Microsoft Corporation filed all reports required by Section 13 or 15(d) of the Securities Exchange Act of 1934 during the preceding 12 months?

4. What is the total number of shares outstanding for Microsoft's common stock as of April 24, 2020?

5. What is the state of incorporation for Microsoft Corporation as mentioned in the document?
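To feed the generated questions back into the Q&A prompt, the numbered list in the model's answer has to be split into individual strings. A regex sketch, assuming the model keeps the `1.` / `2)` numbering seen above (LLM formatting is not guaranteed); `parse_numbered_list` is a hypothetical helper:

```python
import re
from typing import List

# A line that starts with a number followed by "." or ")", e.g. "1. What is ..." or "2) What is ..."
NUMBERED_ITEM = re.compile(r"^[ \t]*\d+[.)][ \t]+(.+)$", re.MULTILINE)


def parse_numbered_list(text: str) -> List[str]:
    """Return the text of each numbered item in an LLM answer, in order."""
    return [match.group(1).strip() for match in NUMBERED_ITEM.finditer(text)]


sample_answer = "1. What is the Commission File Number?\n\n2. What is the state of incorporation?\n"
print(parse_numbered_list(sample_answer))
# → ['What is the Commission File Number?', 'What is the state of incorporation?']

# With the response above: questions = parse_numbered_list(questions_response.choices[0].message.content)
```

Each parsed question can then be assigned to `question` and the prompt template cell re-run.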


## Searchable PDF

In the same directory as this notebook, you'll find the `searchable_pdf.py` script.  
This script uses the **Azure Document Intelligence** service to convert an image into a searchable PDF.  
You can customize the input image by modifying the `pdf_path` variable in the `main` function.

To run the script, execute the following command:

In [43]:
!python searchable_pdf.py

https://ep-di-standalone.cognitiveservices.azure.com
File downloaded successfully as invoice_sample_search.pdf
