# Document Extraction with Azure AI Document Intelligence and Azure OpenAI GPT-4o (Text Only)

**Before running this notebook, ensure you have selected the correct Python kernel. If running in the `devcontainer` environment, this is likely to be 3.12.11 at `/usr/local/python/current/bin/python`.**

![Example devcontainer notebook kernel](../../../../images/python-notebook-kernel.png)

This sample demonstrates how to extract structured data from any document using Azure AI Document Intelligence and Azure OpenAI GPT models.

![Data Extraction](../../../../images/extraction-document-intelligence-openai.png)

This is achieved by the following process:

- Analyze a document using Azure AI Document Intelligence's `prebuilt-layout` model to extract the structure as Markdown.
- Construct a system prompt that defines the instructions for extracting structured data from documents.
- Construct a user prompt that includes the extraction instructions specific to the type of document, followed by the Markdown content of the document.
- Use the Azure OpenAI chat completions API with the GPT-4o model to generate a structured output from the content.

## Objectives

By the end of this sample, you will have learned how to:

- Convert a document to Markdown format using Azure AI Document Intelligence.
- Use prompt engineering techniques to instruct GPT-4o to extract structured data from a type of document.
- Use the [Structured Outputs feature](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/structured-outputs?tabs=python-secure) to extract structured data from a document using Azure OpenAI's GPT-4o model.
- Use the analysis result from Azure AI Document Intelligence to determine the confidence of the extracted structured output.
- Use the [logprobs](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#request-body:~:text=False-,logprobs,-integer) parameter in an OpenAI request to determine the confidence of the extracted structured output.

## Useful Tips

- Combine this technique with a [page classification](../../classification/README.md) approach to reduce the number of pages to extract from to only those that match your criteria for extraction.

## Setup

### Install dependencies

In [36]:
#%pip install azure-ai-documentintelligence

### Import modules

This sample takes advantage of the following Python dependencies:

- **azure-ai-documentintelligence** to interface with the Azure AI Document Intelligence API for analyzing documents.
- **openai** to interface with the Azure OpenAI chat completions API to generate structured extraction outputs using the GPT-4o model.
- **azure-identity** to securely authenticate with deployed Azure Services using Microsoft Entra ID credentials.

The following local components are also used:

- [**acord25**](../../modules/samples/models/acord25.py) to provide the expected structured output JSON schema for ACORD 25 certificate of liability insurance forms.
- [**accuracy_evaluator**](../../modules/samples/evaluation/accuracy_evaluator.py) to evaluate the output of the extraction process against the expected results.
- [**document_intelligence_confidence**](../../modules/samples/confidence/document_intelligence_confidence.py) to calculate the confidence of the extraction process based on the analysis result from the Azure AI Document Intelligence API.
- [**openai_confidence**](../../modules/samples/confidence/openai_confidence.py) to calculate the confidence of the extraction process based on the `logprobs` response from the OpenAI API request.
- [**document_processing_result**](../../modules/samples/models/document_processing_result.py) to store the results of the extraction process as a file.
- [**stopwatch**](../../modules/samples/utils/stopwatch.py) to measure the end-to-end execution time for the extraction process.
- [**app_settings**](../../modules/samples/app_settings.py) to access environment variables from the `.env` file.

In [6]:
import sys
sys.path.append('../../modules/') # Import local modules

from IPython.display import display, Markdown
import os
import pandas as pd
from dotenv import dotenv_values
import json
from openai import AzureOpenAI
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeResult, DocumentContentFormat
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from concurrent.futures import ThreadPoolExecutor

from samples.app_settings import AppSettings
from samples.utils.stopwatch import Stopwatch
from samples.utils.storage_utils import create_json_file
from samples.models.document_processing_result import DataExtractionResult

from samples.models.acord25 import Acord25
from samples.confidence.confidence_utils import merge_confidence_values
from samples.confidence.openai_confidence import evaluate_confidence as evaluate_openai_confidence
from samples.confidence.document_intelligence_confidence import evaluate_confidence as evaluate_di_confidence
from samples.evaluation.accuracy_evaluator import AccuracyEvaluator
from samples.evaluation.comparison import get_extraction_comparison

### Configure the Azure services

To use Azure AI Document Intelligence and Azure OpenAI, their SDKs are used to create client instances using a deployed endpoint and authentication credentials.

For this sample, the credentials of the Azure CLI are used to authenticate with the deployed services.

In [7]:
# Set the working directory to the root of the repo
working_dir = os.path.abspath('../../../../')
settings = AppSettings(dotenv_values(f"{working_dir}/.env"))
sample_path = f"{working_dir}/samples/python/extraction/text"
sample_name = "document-extraction-gpt"

# Configure the default credential for accessing Azure services using Azure CLI credentials
credential = DefaultAzureCredential(
    exclude_workload_identity_credential=True,
    exclude_developer_cli_credential=True,
    exclude_environment_credential=True,
    exclude_managed_identity_credential=True,
    exclude_powershell_credential=True,
    exclude_shared_token_cache_credential=True,
    exclude_interactive_browser_credential=True
)

openai_token_provider = get_bearer_token_provider(credential, 'https://cognitiveservices.azure.com/.default')

openai_client = AzureOpenAI(
    azure_endpoint=settings.azure_openai_endpoint,
    azure_ad_token_provider=openai_token_provider,
    api_version=settings.azure_openai_api_version
)

document_intelligence_client = DocumentIntelligenceClient(
    endpoint=settings.azure_ai_services_endpoint,
    credential=credential
)

### Establish the expected output

To measure the accuracy of the extraction process, the expected output is loaded in the following code block from a JSON metadata file that describes a completed [ACORD 25 certificate of liability insurance](../../../assets/forms/ACORD_25_Completed.pdf).

> **Note**: More invoice examples can be found in the [assets folder](../../../assets/invoices). These examples include the PDF file and an associated JSON metadata file that provides the expected structured output. You can add your own scenarios by following the same structure.

The expected output has been defined by a human evaluating the document.

In [39]:
path = f"{working_dir}/samples/assets/forms/"
metadata_fname = "acord25.json" # Change this to the file you want to evaluate
metadata_fpath = f"{path}{metadata_fname}"

with open(metadata_fpath, "r") as f:
    data = json.load(f)

#expected = Acord25(**data['0_expected'])
pdf_fname = data['fname']
pdf_fpath = f"{path}{pdf_fname}"

#invoice_evaluator = AccuracyEvaluator(match_keys=['product_code', 'description']) # Product Code and Descriptions are used to match the extracted product items data
print(f"Evaluating:  {pdf_fname}")

Evaluating:  ACORD_25_Completed.pdf


In [42]:
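from pydantic import ValidationError

try:
    # Validate the expected values against the Acord25 schema.
    expected = Acord25.model_validate(data['0_expected'], strict=False)
except ValidationError as e:
    print(f"Validation failed: {e}")
    # Fall back to an unvalidated instance so the later cells can still run.
    expected = Acord25.model_construct(**data['0_expected'])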

## Extract data from the document

The following code block executes the data extraction process using Azure AI Document Intelligence and Azure OpenAI's GPT-4o model.

It performs the following steps:

1. Get the document bytes from the provided file path. _Note: In this example, we are processing a local document, however, you can use any document storage location of your choice, such as Azure Blob Storage._
2. Use Azure AI Document Intelligence to analyze the structure of the document and convert it to Markdown format using the pre-built layout model.
3. Using Azure OpenAI's GPT-4o model and its [Structured Outputs feature](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/structured-outputs?tabs=python-secure), extract a structured data transfer object (DTO) from the content of the Markdown.

In [11]:
with Stopwatch() as di_stopwatch:
    with open(pdf_fpath, "rb") as f:
        poller = document_intelligence_client.begin_analyze_document(
            model_id="prebuilt-layout",
            body=f,
            output_content_format=DocumentContentFormat.MARKDOWN,
            content_type="application/pdf"
        )

    result: AnalyzeResult = poller.result()

markdown = result.content

In [12]:
# Doc AI Raw Output
result_d = result.as_dict()
print(json.dumps(result_d, indent=2))

{
  "apiVersion": "2024-11-30",
  "modelId": "prebuilt-layout",
  "stringIndexType": "textElements",
  "content": "<figure>\n\nACORD\u00ae\n\u00ae\n\n</figure>\n\n\n# CERTIFICATE OF LIABILITY INSURANCE\n\nDATE (MM/DD/YYYY)\n\n05/05/2025\n\nTHIS CERTIFICATE IS ISSUED AS A MATTER OF INFORMATION ONLY AND CONFERS NO RIGHTS UPON THE CERTIFICATE HOLDER. THIS\nCERTIFICATE DOES NOT AFFIRMATIVELY OR NEGATIVELY AMEND, EXTEND OR ALTER THE COVERAGE AFFORDED BY THE POLICIES\nBELOW. THIS CERTIFICATE OF INSURANCE DOES NOT CONSTITUTE A CONTRACT BETWEEN THE ISSUING INSURER(S), AUTHORIZED\nREPRESENTATIVE OR PRODUCER, AND THE CERTIFICATE HOLDER.\n\nIMPORTANT: If the certificate holder is an ADDITIONAL INSURED, the policy(ies) must have ADDITIONAL INSURED provisions or be endorsed.\nIf SUBROGATION IS WAIVED, subject to the terms and conditions of the policy, certain policies may require an endorsement. A statement on\nthis certificate does not confer rights to the certificate holder in lieu of such endors

In [13]:
# Displays the output of the Azure AI Document Intelligence pre-built layout analysis in Markdown format.
display(Markdown(markdown))

<figure>

ACORD®
®

</figure>


# CERTIFICATE OF LIABILITY INSURANCE

DATE (MM/DD/YYYY)

05/05/2025

THIS CERTIFICATE IS ISSUED AS A MATTER OF INFORMATION ONLY AND CONFERS NO RIGHTS UPON THE CERTIFICATE HOLDER. THIS
CERTIFICATE DOES NOT AFFIRMATIVELY OR NEGATIVELY AMEND, EXTEND OR ALTER THE COVERAGE AFFORDED BY THE POLICIES
BELOW. THIS CERTIFICATE OF INSURANCE DOES NOT CONSTITUTE A CONTRACT BETWEEN THE ISSUING INSURER(S), AUTHORIZED
REPRESENTATIVE OR PRODUCER, AND THE CERTIFICATE HOLDER.

IMPORTANT: If the certificate holder is an ADDITIONAL INSURED, the policy(ies) must have ADDITIONAL INSURED provisions or be endorsed.
If SUBROGATION IS WAIVED, subject to the terms and conditions of the policy, certain policies may require an endorsement. A statement on
this certificate does not confer rights to the certificate holder in lieu of such endorsement(s).

PRODUCER

Contoso Insurance

1122 Insurers Way

Nowheresville, MA 12345

INSURED

Adventure Works Construction Co.

123 Icanfixit Dr

Mechanicsville, VA 23111


<table>
<tr>
<td>CONTACT NAME:</td>
<td colspan="2">Brian Walker</td>
</tr>
<tr>
<td>PHONE (A/C, No, Ext):</td>
<td>804 241-0019</td>
<td>FAX (A/C, No):</td>
</tr>
<tr>
<td>E-MAIL ADDRESS:</td>
<td colspan="2">brwalker@microsoft.com</td>
</tr>
</table>


<table>
<tr>
<th>INSURER(S) AFFORDING COVERAGE</th>
<th>NAIC #</th>
</tr>
<tr>
<td>INSURER A : ABC Insurance</td>
<td>1234</td>
</tr>
<tr>
<td>INSURER B : Progressive Casualty Insurance Company</td>
<td>24260</td>
</tr>
<tr>
<td>INSURER C :</td>
<td></td>
</tr>
<tr>
<td>INSURER D :</td>
<td></td>
</tr>
<tr>
<td>INSURER E :</td>
<td></td>
</tr>
<tr>
<td>INSURER F :</td>
<td></td>
</tr>
</table>


COVERAGES

CERTIFICATE NUMBER: 2025050500123

REVISION NUMBER:

THIS IS TO CERTIFY THAT THE POLICIES OF INSURANCE LISTED BELOW HAVE BEEN ISSUED TO THE INSURED NAMED ABOVE FOR THE POLICY PERIOD
INDICATED. NOTWITHSTANDING ANY REQUIREMENT, TERM OR CONDITION OF ANY CONTRACT OR OTHER DOCUMENT WITH RESPECT TO WHICH THIS
CERTIFICATE MAY BE ISSUED OR MAY PERTAIN, THE INSURANCE AFFORDED BY THE POLICIES DESCRIBED HEREIN IS SUBJECT TO ALL THE TERMS,
EXCLUSIONS AND CONDITIONS OF SUCH POLICIES. LIMITS SHOWN MAY HAVE BEEN REDUCED BY PAID CLAIMS.


<table>
<tr>
<th>INSR LTR</th>
<th colspan="2">TYPE OF INSURANCE</th>
<th>ADDL INSD</th>
<th>SUBR WVD</th>
<th>POLICY NUMBER</th>
<th>POLICY EFF (MM/DD/YYYY)</th>
<th>POLICY EXP (MM/DD/YYYY)</th>
<th colspan="2">LIMITS</th>
</tr>
<tr>
<td rowspan="7">A</td>
<td colspan="2">☒ COMMERCIAL GENERAL LIABILITY</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>EACH OCCURRENCE</td>
<td>$ 1,000,000</td>
</tr>
<tr>
<td colspan="2">☐ ☐ CLAIMS-MADE ☒ OCCUR</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td rowspan="2"></td>
<td>DAMAGE TO RENTED PREMISES (Ea occurrence)</td>
<td>$ 300,000</td>
</tr>
<tr>
<td colspan="2">☒ XCU/8FPD/OCP</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>MED EXP (Any one person)</td>
<td>$ 5,000</td>
</tr>
<tr>
<td colspan="2">☒ Separation of Insureds</td>
<td></td>
<td></td>
<td>CGL-1234</td>
<td rowspan="2">01/01/2025</td>
<td rowspan="2">01/01/2026</td>
<td>PERSONAL &amp; ADV INJURY</td>
<td>$ 1,000,000</td>
</tr>
<tr>
<td colspan="2">GEN'L AGGREGATE LIMIT APPLIES PER:</td>
<td></td>
<td></td>
<td></td>
<td>GENERAL AGGREGATE</td>
<td>$ 2,000,000</td>
</tr>
<tr>
<td colspan="2">☐ POLICY ☒ PRO- ☐ JECT LOC</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>PRODUCTS - COMP/OP AGG</td>
<td>$ 2,000,000</td>
</tr>
<tr>
<td colspan="2">☐ OTHER:</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>$</td>
</tr>
<tr>
<td rowspan="5">B</td>
<td colspan="2">AUTOMOBILE LIABILITY</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>COMBINED SINGLE LIMIT (Ea accident)</td>
<td>$ 500,000</td>
</tr>
<tr>
<td colspan="2">☒ ANY AUTO</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td rowspan="4"></td>
<td>BODILY INJURY (Per person)</td>
<td>$ 100,000</td>
</tr>
<tr>
<td colspan="2" rowspan="3">☐ OWNED AUTOS ONLY ☐ SCHEDULED ☐ HIRED AUTOS AUTOS ONLY NON-OWNED ☐ ☐ ☐ AUTOS ONLY</td>
<td></td>
<td></td>
<td>PAP-9876</td>
<td></td>
<td>BODILY INJURY (Per accident)</td>
<td>$ 500,000</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td>PROPERTY DAMAGE (Per accident)</td>
<td>$ 100,000</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>$</td>
</tr>
<tr>
<td rowspan="3"></td>
<td colspan="2">☐ UMBRELLA LIAB ☐ OCCUR</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>EACH OCCURRENCE</td>
<td>$</td>
</tr>
<tr>
<td>☐ EXCESS LIAB</td>
<td>☐ CLAIMS-MADE</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>AGGREGATE</td>
<td>$</td>
</tr>
<tr>
<td colspan="2">☐ ☐ DED RETENTION $</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>$</td>
</tr>
<tr>
<td rowspan="4">C</td>
<td colspan="2" rowspan="4">WORKERS COMPENSATION AND EMPLOYERS' LIABILITY ANYPROPRIETOR/PARTNER/EXECUTIVE Y/N OFFICER/MEMBEREXCLUDED? N (Mandatory in NH) If yes, describe under DESCRIPTION OF OPERATIONS below</td>
<td rowspan="4">N/A</td>
<td rowspan="4"></td>
<td></td>
<td></td>
<td rowspan="4"></td>
<td>☒ PER STATUTE ☐ OTH- ER</td>
<td></td>
</tr>
<tr>
<td rowspan="2">WC-5678</td>
<td rowspan="2"></td>
<td>E.L. EACH ACCIDENT</td>
<td>$ 1,000,000</td>
</tr>
<tr>
<td>E.L. DISEASE - EA EMPLOYEE</td>
<td>$ 1,000,000</td>
</tr>
<tr>
<td></td>
<td></td>
<td>E.L. DISEASE - POLICY LIMIT</td>
<td>$ 1,000,000</td>
</tr>
<tr>
<td></td>
<td colspan="2"></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</table>


DESCRIPTION OF OPERATIONS / LOCATIONS / VEHICLES (ACORD 101, Additional Remarks Schedule, may be attached if more space is required)


<table>
<tr>
<th>CERTIFICATE HOLDER</th>
<th>CANCELLATION</th>
</tr>
<tr>
<td rowspan="2">123 Leasing 456 RandomHwy Lazytown, NC 98712</td>
<td>SHOULD ANY OF THE ABOVE DESCRIBED POLICIES BE CANCELLED BEFORE THE EXPIRATION DATE THEREOF, NOTICE WILL BE DELIVERED IN ACCORDANCE WITH THE POLICY PROVISIONS.</td>
</tr>
<tr>
<td>AUTHORIZED REPRESENTATIVE Jane Doe</td>
</tr>
</table>


<!-- PageFooter="@1988-2015 ACORD CORPORATION. All rights reserved." -->
<!-- PageFooter="ACORD 25 (2016/03)" -->
<!-- PageFooter="The ACORD name and logo are registered marks of ACORD" -->
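
The Markdown above carries the form's tables as HTML `<table>` elements. Should you want to inspect those cells in code rather than pass them only to the model, a minimal sketch using the standard library's `html.parser` could look like the following. The `TableCellParser` class is a hypothetical helper, not part of this sample's modules, and it ignores `rowspan` and `colspan`, so merged cells are not expanded.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Hypothetical helper: collects the text of <th>/<td> cells, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # Completed rows, each a list of cell strings
        self._row = None    # Cells of the row currently being read
        self._cell = None   # Text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# A fragment copied from the Markdown output above.
snippet = """<table>
<tr>
<th>INSURER(S) AFFORDING COVERAGE</th>
<th>NAIC #</th>
</tr>
<tr>
<td>INSURER A : ABC Insurance</td>
<td>1234</td>
</tr>
</table>"""

parser = TableCellParser()
parser.feed(snippet)
parser.close()
print(parser.rows)
```

Feeding the full `markdown` string instead of `snippet` would collect the rows of every table in the document into one list.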


In [14]:
system_prompt = """You are an AI assistant that extracts data from documents."""

In [15]:
# Prepare the user content for the OpenAI API including any specific details for processing this type of document, and the document text.
user_content = []

In [16]:
user_text_prompt = """Extract the data from this ACORD 25 certificate of liability insurance.
- If a value is not present, provide null.
- Dates should be in the format YYYY-MM-DD."""

user_content.append({
    "type": "text",
    "text": user_text_prompt
})

user_content.append({
    "type": "text",
    "text": markdown
})
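
The prompt asks for dates as YYYY-MM-DD, while the form itself prints them as MM/DD/YYYY (for example `05/05/2025`). Because a prompt instruction is not a guarantee, it can be worth checking the returned values after extraction. The sketch below is one possible check; `to_iso_date` is a hypothetical helper and is not part of this sample's modules.

```python
from datetime import datetime
from typing import Optional


def to_iso_date(value: Optional[str]) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if it is missing or unparseable.

    Accepts values that are already ISO formatted as well as the
    MM/DD/YYYY format printed on the form.
    """
    if not value:
        return None
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


print(to_iso_date("05/05/2025"))  # Form-style date
print(to_iso_date("2026-01-01"))  # Already in the requested format
print(to_iso_date("N/A"))         # Not a date
```

A value that comes back as `None` here is a reasonable candidate for manual review.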

In [18]:
with Stopwatch() as oai_stopwatch:
    completion = openai_client.beta.chat.completions.parse(
        model=settings.azure_openai_chat_deployment,
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": user_content
            }
        ],
        response_format=Acord25,
        max_tokens=4096,
        temperature=0.1,
        top_p=0.1,
        logprobs=True # Enabled to determine the confidence of the response.
    )

### Understanding the Structured Outputs JSON schema

Using [Pydantic's JSON schema feature](https://docs.pydantic.dev/latest/concepts/json_schema/), the [Acord25](../../modules/samples/models/acord25.py) data model is automatically converted to a JSON schema when passed as the `response_format` parameter of the OpenAI chat completions request.

The JSON schema is used to instruct the GPT-4o model to generate a strict output that adheres to the structure defined. The approach using Pydantic makes it easier for developers to manage the data structure in code, with helpful descriptions and examples that will be included in the final JSON schema.
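
In a schema generated this way, an optional field is typically expressed as an `anyOf` that lists the value type alongside `null`, which is what lets the model return `null` for values missing from the document. As a rough sketch of how to read that shape, the hypothetical helper below lists the nullable properties of a schema object. The `address_schema` dictionary is a hand-written fragment that mirrors the shape Pydantic emits; it is not the full `Acord25` schema.

```python
def nullable_properties(schema: dict) -> list:
    """Hypothetical helper: names of properties whose `anyOf` options include null."""
    names = []
    for name, prop in schema.get("properties", {}).items():
        options = prop.get("anyOf", [])
        if any(option.get("type") == "null" for option in options):
            names.append(name)
    return names


# Hand-written fragment in the shape Pydantic emits for optional string fields.
address_schema = {
    "description": "A class representing an address in a form.",
    "properties": {
        "street": {
            "anyOf": [{"type": "string"}, {"type": "null"}],
            "description": "Street address, e.g. 123 456th St.",
            "title": "Street",
        },
        "city": {
            "anyOf": [{"type": "string"}, {"type": "null"}],
            "description": "Name of city, town, village, etc., e.g. New York",
            "title": "City",
        },
    },
}

print(nullable_properties(address_schema))
```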

The following cell prints the JSON schema that the OpenAI request derives from the `Acord25` data model:

In [44]:
# Highlight the schema sent to the OpenAI model
print(json.dumps(Acord25.model_json_schema(), indent=2))

{
  "$defs": {
    "Address": {
      "description": "A class representing an address in a form.",
      "properties": {
        "street": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "description": "Street address, e.g. 123 456th St.",
          "title": "Street"
        },
        "city": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "description": "Name of city, town, village, etc., e.g. New York",
          "title": "City"
        },
        "state": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "description": "Name of State or local administrative division, e.g. NY",
          "title": "State"
        },
        "postal_code": {
      

## Visualize the outputs

To provide context for the execution of the code, the following code blocks visualize the outputs of the data extraction process.

This includes:

- The Markdown representation of the document structure as determined by Azure AI Document Intelligence.
- The accuracy of the structured data extraction comparing the expected output with the output generated by Azure OpenAI's GPT-4o model.
- The confidence score of the structured data extraction by comparing against the Azure AI Document Intelligence analysis.
- The execution time of the end-to-end process.
- The total number of tokens consumed by the GPT-4o model.
- The side-by-side comparison of the expected output and the output generated by Azure OpenAI's GPT-4o model.

### Understanding Accuracy vs Confidence

When using AI to extract structured data, both confidence and accuracy are essential for different but complementary reasons.

- **Accuracy** measures how close the AI model's output is to a ground truth or expected output. It reflects how well the model's predictions align with reality.
  - Accuracy ensures consistency in the extraction process, which is crucial for downstream tasks using the data.
- **Confidence** represents the AI model's internal assessment of how certain it is about its predictions.
  - Confidence indicates how certain the model is about each prediction, so a low score is a useful signal for human reviewers to step in for manual verification.

High accuracy and high confidence are ideal, but in practice, there is often a trade-off between the two. While accuracy cannot always be self-assessed, confidence scores can and should be used to prioritize manual verification of low-confidence predictions.
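
To make the distinction concrete, the sketch below scores a few fields for accuracy against expected values and separately flags low-confidence fields for review. Everything in it is illustrative: the helper functions are hypothetical, the field names are not taken from the `Acord25` model, and the confidence scores are made up. The sample's own `accuracy_evaluator` and confidence modules may work differently.

```python
def field_accuracy(expected: dict, actual: dict) -> dict:
    """Score each expected field 1.0 on an exact match with the extraction, else 0.0."""
    return {key: 1.0 if actual.get(key) == value else 0.0 for key, value in expected.items()}


def fields_to_review(confidence: dict, threshold: float = 0.8) -> list:
    """List the fields whose confidence falls below the review threshold."""
    return sorted(key for key, score in confidence.items() if score < threshold)


# Illustrative values only: field names and confidence scores are made up.
sample_expected = {
    "producer": "Contoso Insurance",
    "certificate_number": "2025050500123",
    "insured": "Adventure Works Construction Co.",
}
sample_extracted = {
    "producer": "Contoso Insurance",
    "certificate_number": "2025050500128",  # Wrong last digit
    "insured": "Adventure Works Construction Co.",
}
sample_confidence = {"producer": 0.99, "certificate_number": 0.97, "insured": 0.62}

sample_accuracy = field_accuracy(sample_expected, sample_extracted)
overall_accuracy = sum(sample_accuracy.values()) / len(sample_accuracy)

print(sample_accuracy)
print(f"Overall accuracy: {overall_accuracy:.2f}")
print(fields_to_review(sample_confidence))
```

In this made-up case the certificate number is wrong yet confident, while the insured name is correct yet flagged for review. Confidence thresholds reduce the review workload, but only a comparison against ground truth reveals confident mistakes.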

In [43]:
# Gets the parsed Acord25 object from the completion response.
form = completion.choices[0].message.parsed

expected_dict = expected.model_dump()
form_dict = form.model_dump()

In [25]:
# Determines the accuracy of the extracted data against the expected values.
#accuracy = invoice_evaluator.evaluate(expected=expected_dict, actual=invoice_dict)


In [26]:
# Determines the confidence of the extracted data using both the OpenAI and Azure Document Intelligence responses.
di_confidence = evaluate_di_confidence(form_dict, result)
oai_confidence = evaluate_openai_confidence(form_dict, completion.choices[0])

confidence = merge_confidence_values(di_confidence, oai_confidence)
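
When `logprobs=True` is set, each generated token in `completion.choices[0].logprobs.content` carries a `logprob`, the natural logarithm of the probability the model assigned to that token. As a rough illustration of how such values can become a confidence score, the sketch below converts log probabilities to probabilities and averages them over the tokens of one value. This is an assumption for illustration only; the sample's `openai_confidence` module may aggregate differently.

```python
import math


def token_confidence(logprob: float) -> float:
    """Convert a token's log probability into a probability between 0 and 1."""
    return math.exp(logprob)


def span_confidence(logprobs: list) -> float:
    """Geometric mean of the token probabilities that make up one extracted value.

    Computed as exp(mean(logprobs)), which avoids multiplying many small numbers.
    """
    if not logprobs:
        return 0.0
    return math.exp(sum(logprobs) / len(logprobs))


# Made-up log probabilities for the tokens of a single extracted value.
value_logprobs = [-0.01, -0.02, -0.30]

print([round(token_confidence(lp), 3) for lp in value_logprobs])
print(round(span_confidence(value_logprobs), 3))
```

A single uncertain token pulls the score of the whole value down, which is the desired behavior when one wrong digit invalidates a policy number.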

In [27]:
# Gets the total execution time of the data extraction process.
total_elapsed = di_stopwatch.elapsed + oai_stopwatch.elapsed

# Gets the prompt tokens and completion tokens from the completion response.
prompt_tokens = completion.usage.prompt_tokens
completion_tokens = completion.usage.completion_tokens

In [34]:
# Save the output of the data extraction result.
accuracy = 100  # Placeholder value: the accuracy evaluator is not used in this example.
extraction_result = DataExtractionResult(form_dict, confidence, accuracy, prompt_tokens, completion_tokens, total_elapsed)

create_json_file(f"{sample_path}/{sample_name}.{pdf_fname}.json", extraction_result)

In [None]:
# Display the outputs of the data extraction process.
df = pd.DataFrame([
    {
 #       "Accuracy": f"{accuracy['overall'] * 100:.2f}%",
        "Confidence": f"{confidence['_overall'] * 100:.2f}%",
        "Execution Time": f"{total_elapsed:.2f} seconds",
        "Document Intelligence Execution Time": f"{di_stopwatch.elapsed:.2f} seconds",
        "OpenAI Execution Time": f"{oai_stopwatch.elapsed:.2f} seconds",
        "Prompt Tokens": prompt_tokens,
        "Completion Tokens": completion_tokens
    }
])

display(df)
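# The comparison needs the per-field accuracy dictionary produced by the accuracy evaluator.
# `accuracy` is a plain number while the evaluator is disabled, so the comparison is skipped in that case.
if isinstance(accuracy, dict):
    display(get_extraction_comparison(expected_dict, form_dict, confidence, accuracy['accuracy']))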