# 💡 Fast, Accurate Parsing of Invoices with LandingAI

This notebook demonstrates how to use the `landingai-ade` Python package to extract structured information from invoices using LandingAI's Agentic Document Extraction (ADE) service.

We'll walk through:
- Parsing documents with the ADE Parse API.
- Defining a custom schema for invoices using `pydantic` or JSON.
- Extracting the desired fields with the ADE Extract API.
- Viewing structured field extractions and metadata.

Not covered:
- Connecting to upstream document sources.
- Inserting `parse()` and `extract()` results into structured tables.
- Optimizing pipeline throughput.

> 📎 Supported formats: `.pdf`, `.png`, `.jpg`, `.jpeg`. (More coming soon)

In [1]:
# ---
# Title: Fast, Accurate Parsing of Invoices with LandingAI
# Author: Andrea Kropp
# Description: How to apply a custom extraction schema to pull fields out of photos and PDFs of invoices.
# Target Audience: Developers, Product Managers
# Content Type: How-To
# Publish Date: 2025-10-06
# ADE Version: landingai-ade-0.17.1
# Change Log:
#    - v1.0: Initial draft
# ---

### ✨ Install LandingAI's Agentic Document Extraction

```bash
pip install landingai-ade
```

### 🗝️ Obtain and Set an API Key

Obtain your API Key from the Visual Playground at https://va.landing.ai/settings/api-key

Read about options for setting your API key at https://docs.landing.ai/ade/agentic-api-key
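
The cells below read the key from the `VISION_AGENT_API_KEY` environment variable. As a minimal sketch, the check below fails early with a readable message when the key is missing. `require_api_key` is a hypothetical helper written for this example and is not part of `landingai_ade`.

```python
import os


def require_api_key(env_var: str = "VISION_AGENT_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Add it to your .env file or export it in your shell."
        )
    return key
```

Calling `require_api_key()` right after `load_dotenv()` would surface a missing key before any request is sent.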


## 📦 Setup and Imports

In [2]:
# Standard libraries
import os
import json
from dotenv import load_dotenv
from datetime import date
from pathlib import Path

In [3]:
from landingai_ade import LandingAIADE

In [4]:
# Helper functions to go along with ADE
from utilities import *

In [21]:
# Reload utilities after editing utilities.py (only needed during development)
import importlib
import utilities
importlib.reload(utilities)
from utilities import *

In [5]:
# Load settings (including the VISION_AGENT_API_KEY) from the .env file
load_dotenv()

True

In [6]:
client = LandingAIADE(apikey=os.environ.get("VISION_AGENT_API_KEY"))
print("Authenticated client initialized")

Authenticated client initialized


In [7]:
import landingai_ade
print(landingai_ade.__version__)

0.17.1


In [8]:
# --- Import your Pydantic schema class ---
from invoice_schema import InvoiceExtractionSchema  # 👈 imports the Invoice model
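
The contents of `invoice_schema.py` are not shown in this notebook. Purely as an illustration, here is a sketch of what a small invoice schema might look like in JSON form. The field names are hypothetical, loosely borrowed from the column names in the summary tables near the end of this notebook, and this is not the actual `InvoiceExtractionSchema`.

```python
import json

# Hypothetical schema for illustration only; not the real InvoiceExtractionSchema.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier as printed."},
        "invoice_date": {"type": "string", "description": "Invoice date in YYYY-MM-DD format."},
        "currency": {"type": "string", "description": "Currency code, e.g. USD."},
        "total_due": {"type": "number", "description": "Total amount due."},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
    "required": ["invoice_number", "total_due"],
}

# Serialize to a JSON string for inspection or storage.
schema_json = json.dumps(invoice_schema, indent=2)
print(schema_json)
```

How a schema is passed to the Extract API depends on the client version, so check the ADE documentation before relying on this shape.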

## 📁 Define Input and Output Directories

Specify where your documents are located and where results will be saved.


In [9]:
# Define input and output directory paths
base_dir = Path(os.getcwd())
input_folder = base_dir / "input_folder"
results_folder = base_dir / "results_folder"
groundings_folder = base_dir / "groundings_folder"

# Create output folders if they don't exist
results_folder.mkdir(parents=True, exist_ok=True)
groundings_folder.mkdir(parents=True, exist_ok=True)

## 🗂️ Collect Document File Paths

This block filters input files for supported formats.

In [10]:
# Collect all document file paths in input folder with supported extensions
# Convert each Path object to a string to ensure compatibility with parse()

file_paths = [
    str(p)
    for p in input_folder.iterdir()
    if p.suffix.lower() in [".pdf", ".png", ".jpg", ".jpeg"]
]
file_paths[0:5]

['/Users/andreakropp/Documents/Github/andrea-kropp/ade_demos/Invoices/input_folder/invoice_12.pdf',
 '/Users/andreakropp/Documents/Github/andrea-kropp/ade_demos/Invoices/input_folder/invoice_13.pdf',
 '/Users/andreakropp/Documents/Github/andrea-kropp/ade_demos/Invoices/input_folder/invoice_9.pdf',
 '/Users/andreakropp/Documents/Github/andrea-kropp/ade_demos/Invoices/input_folder/invoice_11.pdf',
 '/Users/andreakropp/Documents/Github/andrea-kropp/ade_demos/Invoices/input_folder/invoice_10.pdf']
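
`iterdir()` returns entries in arbitrary order, as the listing above shows. If a reproducible order matters, one option is to sort the paths and skip anything that is not a regular file. The sketch below assumes a hypothetical helper named `collect_documents`; it is not part of `utilities.py`.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def collect_documents(folder: Path) -> list[Path]:
    """Return supported files in `folder`, sorted by path."""
    return sorted(
        p for p in folder.iterdir()
        # lower() lets upper-case extensions such as ".PDF" pass the filter
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

The sort is lexicographic, so `invoice_10.pdf` comes before `invoice_9.pdf`.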

### Thumbnails for the Invoices in the Demo

<img src="images/invoices_to_parse.png" width="80%" alt="Invoice image preview">

## Single Invoice Parsing

In [11]:
# Send a single invoice for parsing

client = LandingAIADE()
single_result = client.parse(document=Path(file_paths[0]))

print(f"Number of chunks: {len(single_result.chunks)}")
print("Global markdown:", single_result.markdown[:200] + "...")

Number of chunks: 46
Global markdown: <a id='6e515e41-b3b2-41c5-befc-4bce78d79414'></a>

<::logo: Condor
condor
The logo features the word "condor" in a bold, sans-serif font, followed by a circular emblem containing a stylized bird or wi...


In [None]:
# Explore the contents

# single_result.markdown
# single_result.chunks
# single_result.metadata
# single_result.splits
# single_result.grounding
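
One quick way to get a feel for a parse result is to tally its chunks by type. The sketch below relies only on each chunk having a `type` attribute, as the printed `Chunk(...)` objects in this notebook do. `count_chunk_types` is a hypothetical helper, and the demo uses stand-in objects rather than a real response.

```python
from collections import Counter
from types import SimpleNamespace


def count_chunk_types(chunks) -> Counter:
    """Tally chunks by their `type` attribute."""
    return Counter(chunk.type for chunk in chunks)


# Stand-in objects that mimic only the `type` attribute of real chunks.
demo_chunks = [
    SimpleNamespace(type="logo"),
    SimpleNamespace(type="text"),
    SimpleNamespace(type="text"),
    SimpleNamespace(type="table"),
]
print(count_chunk_types(demo_chunks))
```

With a real result, the call would be `count_chunk_types(single_result.chunks)`.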

In [17]:
# Send a single invoice for parsing and write the results to file
# Function found in utilities.py
single_result_parse_save = parse_and_save(document_path=file_paths[1], client=client, output_dir=results_folder)
single_result_parse_save

Parse results saved to: /Users/andreakropp/Documents/Github/andrea-kropp/ade_demos/Invoices/results_folder/parse_invoice_13.json


ParseResponse(chunks=[Chunk(id='0a8bc7ea-1c99-4ca8-8caa-ed65b66dd0dc', grounding=ChunkGrounding(box=ChunkGroundingBox(bottom=0.04086274281144142, left=0.3944183588027954, right=0.505725085735321, top=0.021210074424743652), page=0), markdown="<a id='0a8bc7ea-1c99-4ca8-8caa-ed65b66dd0dc'></a>\n\nTax Invoice", type='marginalia'), Chunk(id='d17da48c-2102-49e5-a343-5f37b7dc68f0', grounding=ChunkGrounding(box=ChunkGroundingBox(bottom=0.03962094336748123, left=0.6299343705177307, right=0.8244345784187317, top=0.017798636108636856), page=0), markdown="<a id='d17da48c-2102-49e5-a343-5f37b7dc68f0'></a>\n\n(ORIGINAL FOR RECIPIENT)", type='text'), Chunk(id='6a9482b2-56fc-473b-add8-7fd2b70fdec8', grounding=ChunkGrounding(box=ChunkGroundingBox(bottom=0.402715265750885, left=0.06802284717559814, right=0.49336951971054077, top=0.04924289882183075), page=0), markdown="<a id='6a9482b2-56fc-473b-add8-7fd2b70fdec8'></a>\n\nKANDHAN METAL COMPANY\nOLD NO: 12, NEW NO:33,JANI BATCHA STREET.\nROYAPETTAH,CHENNA
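
The `ChunkGroundingBox` values in the output above all fall between 0 and 1, which suggests they are fractions of the page width and height. Under that assumption, a box can be converted to pixel coordinates for drawing on a rendered page image. `box_to_pixels` is a hypothetical helper, and the demo box uses values rounded from the first chunk above.

```python
from types import SimpleNamespace


def box_to_pixels(box, page_width: int, page_height: int) -> tuple[int, int, int, int]:
    """Convert a normalized box to (left, top, right, bottom) in pixels."""
    return (
        round(box.left * page_width),
        round(box.top * page_height),
        round(box.right * page_width),
        round(box.bottom * page_height),
    )


# Stand-in with the same attribute names as ChunkGroundingBox, values rounded.
demo_box = SimpleNamespace(left=0.394, top=0.021, right=0.506, bottom=0.041)
print(box_to_pixels(demo_box, page_width=1000, page_height=2000))
```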

In [18]:
# Send a single invoice for parsing, save output, send for extraction, save output
# Function found in utilities.py
single_result_full_pipe = parse_extract_save(file_paths[2], client, InvoiceExtractionSchema, output_dir=results_folder)
single_result_full_pipe

Parse results saved to: /Users/andreakropp/Documents/Github/andrea-kropp/ade_demos/Invoices/results_folder/parse_invoice_9.json
Extract results saved to: /Users/andreakropp/Documents/Github/andrea-kropp/ade_demos/Invoices/results_folder/extract_invoice_9.json


(ParseResponse(chunks=[Chunk(id='b8d98675-73c6-4b7c-ae20-83f5136e3518', grounding=ChunkGrounding(box=ChunkGroundingBox(bottom=0.096154123544693, left=0.08743181824684143, right=0.44194576144218445, top=0.033484891057014465), page=0), markdown="<a id='b8d98675-73c6-4b7c-ae20-83f5136e3518'></a>\n\n<::logo: Freshworks\nfreshworks\nA stylized leaf-like symbol composed of multiple facets in shades of gray.::>", type='logo'), Chunk(id='2315934d-7178-445e-8557-45fa2008605a', grounding=ChunkGrounding(box=ChunkGroundingBox(bottom=0.21089597046375275, left=0.08651575446128845, right=0.4332708418369293, top=0.10671643912792206), page=0), markdown="<a id='2315934d-7178-445e-8557-45fa2008605a'></a>\n\nFreshworks Inc., (formerly known as Freshdesk Inc.)\n2950 S. Delaware St,\nSuite 201, San Mateo, CA 94403,\nU.S.A.\nPhone: +1 (866) 832 3090\nTax ID: 33-1218825\nTax Reg #: 33-1218825", type='text'), Chunk(id='c3068653-57d0-4361-9464-760eaa962d8c', grounding=ChunkGrounding(box=ChunkGroundingBox(bottom

## 🧩 Parallel ADE Parsing with Progress Tracking

This section **parses and extracts documents in parallel** using the LandingAI Agentic Document Extraction (ADE) client.
It scans the input directory for all `.pdf`, `.png`, `.jpg`, and `.jpeg` files, sends each file to the ADE API,
and saves the parse and extract results to the specified output folder.

Key features:
- ⚡ **Parallel processing** with `ThreadPoolExecutor` to speed up large batches
- 📊 **Real-time progress bar** using `tqdm` to visualize progress
- 💾 **Automatic result saving** via `parse_extract_save()` from `utilities.py`
- 🧱 **Robust handling**: a failed file prints an error message and is skipped without stopping the batch
- 🧠 **Results aggregation**: each successful `(parse_result, extract_result)` tuple is stored in `results_summary`

After execution, you'll see:
- A live progress bar showing completion
- A status message for each saved result file
- A summary of how many documents were successfully parsed and extracted

In [19]:
from pathlib import Path
from landingai_ade import LandingAIADE
from concurrent.futures import ThreadPoolExecutor, as_completed
import time
from tqdm import tqdm
from utilities import parse_extract_save
from invoice_schema import InvoiceExtractionSchema

# --- CONFIG ---
input_dir = Path("input_folder")
output_dir = Path("results_folder")
output_dir.mkdir(parents=True, exist_ok=True)

max_workers = 10  # adjust for your system and ADE rate limits
pause_between_requests = 0.2  # small delay to avoid hitting rate limits

# --- CLIENT ---
client = LandingAIADE()

# --- FILE LIST ---
file_paths = [p for p in input_dir.glob("*.*") if p.suffix.lower() in (".pdf", ".png", ".jpg", ".jpeg")]
print(f"Found {len(file_paths)} documents to parse and extract.")

# --- WORKER FUNCTION ---
def process_file(path: Path):
    try:
        # Parse AND extract using the utility function
        parse_result, extract_result = parse_extract_save(
            path, 
            client, 
            InvoiceExtractionSchema, 
            output_dir=output_dir
        )
        time.sleep(pause_between_requests)
        return (parse_result, extract_result)  # 👈 return both results as tuple
    except Exception as e:
        print(f"❌ {path.name} failed: {e}")
        return None

# --- PARALLEL EXECUTION WITH PROGRESS BAR ---
results_summary = []
with ThreadPoolExecutor(max_workers=max_workers) as executor:
    futures = [executor.submit(process_file, p) for p in file_paths]
    # tqdm progress bar updates as futures complete
    for future in tqdm(as_completed(futures), total=len(futures), desc="Processing documents"):
        result = future.result()
        if result is not None:
            results_summary.append(result)

# --- SUMMARY ---
# results_summary already excludes failed documents, so its length is the success count
success_count = len(results_summary)
print(f"\n✅ Completed {success_count}/{len(file_paths)} documents successfully.")
print("📊 Each document has been parsed AND extracted with structured data.")

Found 27 documents to parse and extract.


Processing documents:   0%|          | 0/27 [00:00<?, ?it/s]

Parse results saved to: results_folder/parse_invoice_9.json
Parse results saved to: results_folder/parse_invoice_12.json
Parse results saved to: results_folder/parse_invoice_16.json
Parse results saved to: results_folder/parse_invoice_14.json
Parse results saved to: results_folder/parse_invoice_8.json
Parse results saved to: results_folder/parse_invoice_17.json
Parse results saved to: results_folder/parse_invoice_11.json
Parse results saved to: results_folder/parse_invoice_13.json
Parse results saved to: results_folder/parse_invoice_15.json
Extract results saved to: results_folder/extract_invoice_9.json
Parse results saved to: results_folder/parse_invoice_10.json


Processing documents:   4%|▎         | 1/27 [00:13<05:47, 13.36s/it]

Extract results saved to: results_folder/extract_invoice_17.json


Processing documents:   7%|▋         | 2/27 [00:14<02:37,  6.30s/it]

Extract results saved to: results_folder/extract_invoice_8.json


Processing documents:  11%|█         | 3/27 [00:17<01:52,  4.67s/it]

Extract results saved to: results_folder/extract_invoice_16.json


Processing documents:  15%|█▍        | 4/27 [00:18<01:13,  3.20s/it]

Extract results saved to: results_folder/extract_invoice_11.json


Processing documents:  19%|█▊        | 5/27 [00:18<00:47,  2.16s/it]

Parse results saved to: results_folder/parse_invoice_3.json
Parse results saved to: results_folder/parse_invoice_27.json
Extract results saved to: results_folder/extract_invoice_13.json
Extract results saved to: results_folder/extract_invoice_14.json


Processing documents:  26%|██▌       | 7/27 [00:22<00:39,  1.95s/it]

Extract results saved to: results_folder/extract_invoice_12.json


Processing documents:  30%|██▉       | 8/27 [00:23<00:28,  1.48s/it]

Extract results saved to: results_folder/extract_invoice_10.json


Processing documents:  33%|███▎      | 9/27 [00:23<00:19,  1.10s/it]

Parse results saved to: results_folder/parse_invoice_2.json
Parse results saved to: results_folder/parse_invoice_26.json
Parse results saved to: results_folder/parse_invoice_18.json
Parse results saved to: results_folder/parse_invoice_19.json
Extract results saved to: results_folder/extract_invoice_27.json


Processing documents:  37%|███▋      | 10/27 [00:31<00:53,  3.14s/it]

Extract results saved to: results_folder/extract_invoice_3.json


Processing documents:  41%|████      | 11/27 [00:31<00:36,  2.26s/it]

Parse results saved to: results_folder/parse_invoice_1.json
Extract results saved to: results_folder/extract_invoice_18.json


Processing documents:  44%|████▍     | 12/27 [00:33<00:31,  2.10s/it]

Extract results saved to: results_folder/extract_invoice_26.json


Processing documents:  48%|████▊     | 13/27 [00:34<00:23,  1.70s/it]

Extract results saved to: results_folder/extract_invoice_2.json


Processing documents:  52%|█████▏    | 14/27 [00:35<00:20,  1.55s/it]

Parse results saved to: results_folder/parse_invoice_24.json
Parse results saved to: results_folder/parse_invoice_21.json
Extract results saved to: results_folder/extract_invoice_19.json


Processing documents:  56%|█████▌    | 15/27 [00:39<00:27,  2.26s/it]

Extract results saved to: results_folder/extract_invoice_1.json


Processing documents:  59%|█████▉    | 16/27 [00:40<00:20,  1.84s/it]

Parse results saved to: results_folder/parse_invoice_20.json
Parse results saved to: results_folder/parse_invoice_5.json
Parse results saved to: results_folder/parse_invoice_4.json
Parse results saved to: results_folder/parse_invoice_25.json
Extract results saved to: results_folder/extract_invoice_15.json


Processing documents:  63%|██████▎   | 17/27 [00:43<00:22,  2.29s/it]

Extract results saved to: results_folder/extract_invoice_21.json


Processing documents:  67%|██████▋   | 18/27 [00:45<00:18,  2.09s/it]

Extract results saved to: results_folder/extract_invoice_24.json
Parse results saved to: results_folder/parse_invoice_22.json


Processing documents:  70%|███████   | 19/27 [00:45<00:12,  1.59s/it]

Extract results saved to: results_folder/extract_invoice_5.json
Parse results saved to: results_folder/parse_invoice_6.json


Processing documents:  74%|███████▍  | 20/27 [00:49<00:15,  2.20s/it]

Extract results saved to: results_folder/extract_invoice_20.json


Processing documents:  78%|███████▊  | 21/27 [00:49<00:10,  1.77s/it]

Parse results saved to: results_folder/parse_invoice_23.json
Parse results saved to: results_folder/parse_invoice_7.json
Extract results saved to: results_folder/extract_invoice_22.json


Processing documents:  81%|████████▏ | 22/27 [00:53<00:10,  2.17s/it]

Extract results saved to: results_folder/extract_invoice_4.json


Processing documents:  85%|████████▌ | 23/27 [00:54<00:07,  1.88s/it]

Extract results saved to: results_folder/extract_invoice_6.json


Processing documents:  89%|████████▉ | 24/27 [00:56<00:06,  2.14s/it]

Extract results saved to: results_folder/extract_invoice_25.json


Processing documents:  93%|█████████▎| 25/27 [00:58<00:03,  1.87s/it]

Extract results saved to: results_folder/extract_invoice_7.json


Processing documents:  96%|█████████▋| 26/27 [01:01<00:02,  2.21s/it]

Extract results saved to: results_folder/extract_invoice_23.json


Processing documents: 100%|██████████| 27/27 [01:12<00:00,  2.70s/it]


✅ Completed 27/27 documents successfully.
📊 Each document has been parsed AND extracted with structured data.
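
The worker above returns `None` on failure, so the names of failed files appear only in the printed messages. If failed files need to be retried later, one option is to map each future back to its input path and collect errors separately. The sketch below shows that pattern with a stand-in worker. `run_batch` and `fake_worker` are hypothetical names, and a real worker would call `parse_extract_save` instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(paths, worker, max_workers: int = 4):
    """Run `worker` on each path in parallel; return (results, failures)."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its input so failures can be attributed.
        future_to_path = {executor.submit(worker, p): p for p in paths}
        for future in as_completed(future_to_path):
            path = future_to_path[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going if one document fails
                failures[path] = str(exc)
    return results, failures


# Stand-in worker for demonstration only.
def fake_worker(path: str) -> str:
    if "bad" in path:
        raise ValueError("unreadable file")
    return f"parsed {path}"


results, failures = run_batch(["a.pdf", "bad.pdf", "c.png"], fake_worker)
print(sorted(results), failures)
```

The keys of `failures` can then be fed back into `run_batch` for a second attempt.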





In [20]:
results_summary[0:5]

[(ParseResponse(chunks=[Chunk(id='91a9fca9-fe21-4cbb-9564-f42f8d5c5d97', grounding=ChunkGrounding(box=ChunkGroundingBox(bottom=0.096154123544693, left=0.08743181824684143, right=0.44194576144218445, top=0.033484891057014465), page=0), markdown="<a id='91a9fca9-fe21-4cbb-9564-f42f8d5c5d97'></a>\n\n<::logo: Freshworks\nfreshworks\nA stylized leaf-like symbol composed of multiple facets in shades of gray.::>", type='logo'), Chunk(id='aa7a2b51-6d87-48d0-a809-bb6a948a707f', grounding=ChunkGrounding(box=ChunkGroundingBox(bottom=0.21089597046375275, left=0.08651575446128845, right=0.4332708418369293, top=0.10671643912792206), page=0), markdown="<a id='aa7a2b51-6d87-48d0-a809-bb6a948a707f'></a>\n\nFreshworks Inc., (formerly known as Freshdesk Inc.)\n2950 S. Delaware St,\nSuite 201, San Mateo, CA 94403,\nU.S.A.\nPhone: +1 (866) 832 3090\nTax ID: 33-1218825\nTax Reg #: 33-1218825", type='text'), Chunk(id='69b3493a-bbb7-4e13-8923-968322bbee95', grounding=ChunkGrounding(box=ChunkGroundingBox(botto

## Summary Tables Containing Invoice Details

In [23]:
# Prepare summary tables using the output from the step above
# Functions located in utilities.py

invoice_summaries = create_invoice_summary_tables(results_summary)

### Invoice Parsing - One Row per Invoice

In [29]:
invoice_markdown = invoice_summaries[0]
invoice_markdown

Unnamed: 0,RUN_ID,INVOICE_UUID,DOCUMENT_NAME,AGENTIC_DOC_VERSION,MARKDOWN
0,ab51eee8-43ab-4c60-974f-d37043e93f76,8a016234-4c23-4927-aa41-f7841f70c611,invoice_9.pdf,dpt-2-20250919,<a id='91a9fca9-fe21-4cbb-9564-f42f8d5c5d97'><...
1,ab51eee8-43ab-4c60-974f-d37043e93f76,83aa2614-39b8-418b-90b7-e726ebd3c90a,invoice_17.pdf,dpt-2-20250919,<a id='960d63e5-ea32-4c17-b4b7-56eeaee689f5'><...
2,ab51eee8-43ab-4c60-974f-d37043e93f76,74954e69-8064-4966-a0cc-d4111458af17,invoice_8.PDF,dpt-2-20250919,<a id='a24acd30-2126-4953-854b-b4d9635480ea'><...
3,ab51eee8-43ab-4c60-974f-d37043e93f76,116fc387-7366-4838-9ff5-de97eff28186,invoice_16.pdf,dpt-2-20250919,<a id='7e376974-b72e-4f94-bddc-25b2f9858525'><...
4,ab51eee8-43ab-4c60-974f-d37043e93f76,fcfa7140-e234-4eb5-be2e-d356993b6f3f,invoice_11.pdf,dpt-2-20250919,<a id='4dfa9093-3dae-4da1-be94-840c8643bd24'><...
5,ab51eee8-43ab-4c60-974f-d37043e93f76,47b62e70-4187-418c-8cf1-3c1635ee1dc7,invoice_13.pdf,dpt-2-20250919,<a id='63de6f08-16f2-41a5-a702-99292aa34f2f'><...
6,ab51eee8-43ab-4c60-974f-d37043e93f76,dfb98d68-9dd7-474e-a03a-b77cc8465db9,invoice_14.pdf,dpt-2-20250919,<a id='da35e71a-b1f0-49a6-8f42-3800abc9f42e'><...
7,ab51eee8-43ab-4c60-974f-d37043e93f76,e396238d-7846-4950-aa90-8af0c316f2d4,invoice_12.pdf,dpt-2-20250919,<a id='6a7520cc-51fe-40ca-a316-6a23a56e6f2b'><...
8,ab51eee8-43ab-4c60-974f-d37043e93f76,371b2d4a-63e9-468b-8275-f878220c0a8f,invoice_10.pdf,dpt-2-20250919,<a id='d5f0469f-6bee-459a-b00c-2bd0e5eb00f3'><...
9,ab51eee8-43ab-4c60-974f-d37043e93f76,4b1db45a-6de2-4bda-94d3-c606276bb1ce,invoice_27.pdf,dpt-2-20250919,<a id='dfc89375-e0e6-4458-983d-e18c64748754'><...


### Invoice Chunks - One Row per Chunk

In [33]:
invoice_chunks = invoice_summaries[1]
invoice_chunks

Unnamed: 0,RUN_ID,INVOICE_UUID,DOCUMENT_NAME,chunk_id,chunk_type,text,page,box_l,box_t,box_r,box_b
0,ab51eee8-43ab-4c60-974f-d37043e93f76,8a016234-4c23-4927-aa41-f7841f70c611,invoice_9.pdf,91a9fca9-fe21-4cbb-9564-f42f8d5c5d97,logo,<a id='91a9fca9-fe21-4cbb-9564-f42f8d5c5d97'><...,0,0.087432,0.033485,0.441946,0.096154
1,ab51eee8-43ab-4c60-974f-d37043e93f76,8a016234-4c23-4927-aa41-f7841f70c611,invoice_9.pdf,aa7a2b51-6d87-48d0-a809-bb6a948a707f,text,<a id='aa7a2b51-6d87-48d0-a809-bb6a948a707f'><...,0,0.086516,0.106716,0.433271,0.210896
2,ab51eee8-43ab-4c60-974f-d37043e93f76,8a016234-4c23-4927-aa41-f7841f70c611,invoice_9.pdf,69b3493a-bbb7-4e13-8923-968322bbee95,text,<a id='69b3493a-bbb7-4e13-8923-968322bbee95'><...,0,0.596189,0.039222,0.842186,0.191678
3,ab51eee8-43ab-4c60-974f-d37043e93f76,8a016234-4c23-4927-aa41-f7841f70c611,invoice_9.pdf,30d86071-0659-4537-bdce-7486b4093fa4,text,<a id='30d86071-0659-4537-bdce-7486b4093fa4'><...,0,0.087368,0.234845,0.314343,0.321745
4,ab51eee8-43ab-4c60-974f-d37043e93f76,8a016234-4c23-4927-aa41-f7841f70c611,invoice_9.pdf,8ead2ed5-e678-48a9-b6a6-b639adf25234,text,<a id='8ead2ed5-e678-48a9-b6a6-b639adf25234'><...,0,0.597768,0.235228,0.860476,0.268732
...,...,...,...,...,...,...,...,...,...,...,...
360,ab51eee8-43ab-4c60-974f-d37043e93f76,b29bd092-49ea-4410-9eff-d52c0f184640,invoice_23.pdf,c39203b6-f21c-44e2-b5ae-c136238d0be8,table,<a id='c39203b6-f21c-44e2-b5ae-c136238d0be8'><...,0,0.342401,0.342025,0.891367,0.550549
361,ab51eee8-43ab-4c60-974f-d37043e93f76,b29bd092-49ea-4410-9eff-d52c0f184640,invoice_23.pdf,3f35e4ce-53a9-4c59-acce-6591bd8434fb,attestation,<a id='3f35e4ce-53a9-4c59-acce-6591bd8434fb'><...,0,0.704699,0.564972,0.893611,0.651845
362,ab51eee8-43ab-4c60-974f-d37043e93f76,b29bd092-49ea-4410-9eff-d52c0f184640,invoice_23.pdf,66e3f664-912f-4d9a-a7c9-79a22e545dff,text,<a id='66e3f664-912f-4d9a-a7c9-79a22e545dff'><...,0,0.063023,0.649520,0.898202,0.725029
363,ab51eee8-43ab-4c60-974f-d37043e93f76,b29bd092-49ea-4410-9eff-d52c0f184640,invoice_23.pdf,571d90ff-05b9-486b-8143-527148eefd2c,text,<a id='571d90ff-05b9-486b-8143-527148eefd2c'><...,0,0.087943,0.741623,0.337569,0.820169
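
The implementation of `create_invoice_summary_tables` lives in `utilities.py` and is not shown here. As a rough sketch of how a one-row-per-chunk table like the one above could be assembled, the hypothetical function below flattens chunk objects into dictionaries, using the attribute names visible in the printed `Chunk(...)` output and a subset of the column names above. It is not the author's implementation.

```python
from types import SimpleNamespace


def chunks_to_rows(document_name: str, chunks) -> list[dict]:
    """Flatten chunk objects into one dict per chunk."""
    rows = []
    for chunk in chunks:
        box = chunk.grounding.box
        rows.append({
            "DOCUMENT_NAME": document_name,
            "chunk_id": chunk.id,
            "chunk_type": chunk.type,
            "text": chunk.markdown,
            "page": chunk.grounding.page,
            "box_l": box.left,
            "box_t": box.top,
            "box_r": box.right,
            "box_b": box.bottom,
        })
    return rows


# Stand-in chunk mimicking the attribute layout printed earlier in this notebook.
demo_chunk = SimpleNamespace(
    id="demo-id",
    type="text",
    markdown="Tax Invoice",
    grounding=SimpleNamespace(
        page=0,
        box=SimpleNamespace(left=0.39, top=0.02, right=0.51, bottom=0.04),
    ),
)
rows = chunks_to_rows("invoice_13.pdf", [demo_chunk])
print(rows[0])
```

A list of such dictionaries can be turned into a table with `pandas.DataFrame(rows)`.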


### Invoice Contents Main - One Row per Invoice

In [36]:
invoice_main = invoice_summaries[2]
invoice_main.head(10)

Unnamed: 0,RUN_ID,INVOICE_UUID,DOCUMENT_NAME,AGENTIC_DOC_VERSION,INVOICE_DATE_RAW,INVOICE_DATE,INVOICE_NUMBER,ORDER_DATE,PO_NUMBER,STATUS,...,SHIP_VIA,SHIP_DATE,TRACKING_NUMBER,CURRENCY,TOTAL_DUE_RAW,TOTAL_DUE,SUBTOTAL,TAX,SHIPPING,HANDLING_FEE
0,ab51eee8-43ab-4c60-974f-d37043e93f76,8a016234-4c23-4927-aa41-f7841f70c611,invoice_9.pdf,dpt-2-20250919,"May 26, 2021",2021-05-26,FCL233308,,,PAID,...,,,,USD,$0.00,0.0,,,,
1,ab51eee8-43ab-4c60-974f-d37043e93f76,83aa2614-39b8-418b-90b7-e726ebd3c90a,invoice_17.pdf,dpt-2-20250919,Dated 10-Dec-14,2014-12-10,2014/00355,,,,...,By Road,,,,2800.00,2800.0,2800.0,,,
2,ab51eee8-43ab-4c60-974f-d37043e93f76,74954e69-8064-4966-a0cc-d4111458af17,invoice_8.PDF,dpt-2-20250919,08-18-23,2023-08-18,412824,,,,...,,,,,90.53,90.53,87.0,3.53,,
3,ab51eee8-43ab-4c60-974f-d37043e93f76,116fc387-7366-4838-9ff5-de97eff28186,invoice_16.pdf,dpt-2-20250919,"Mar 20, 2023",2023-03-20,,,,,...,,,,USD,"0,00 US$",0.0,1529.94,110.92,,
4,ab51eee8-43ab-4c60-974f-d37043e93f76,fcfa7140-e234-4eb5-be2e-d356993b6f3f,invoice_11.pdf,dpt-2-20250919,08/30/2021,2021-08-30,2071221,,,,...,,,,,$1800.87,1800.87,,,,
5,ab51eee8-43ab-4c60-974f-d37043e93f76,47b62e70-4187-418c-8cf1-3c1635ee1dc7,invoice_13.pdf,dpt-2-20250919,2-Dec-2021,2021-12-02,812,,,,...,,,AP39TD4595,INR,6021446.00,6021446.0,5102920.0,918525.6,,0.4
6,ab51eee8-43ab-4c60-974f-d37043e93f76,dfb98d68-9dd7-474e-a03a-b77cc8465db9,invoice_14.pdf,dpt-2-20250919,23.02.2019,2019-02-23,40458946,,,,...,UPS,,,EUR,"77,24 EUR",77.24,,,,
7,ab51eee8-43ab-4c60-974f-d37043e93f76,e396238d-7846-4950-aa90-8af0c316f2d4,invoice_12.pdf,dpt-2-20250919,27.03.2025,2025-03-27,11828454,2025-03-27,,,...,,,,USD,2579.96,2579.96,,,,
8,ab51eee8-43ab-4c60-974f-d37043e93f76,371b2d4a-63e9-468b-8275-f878220c0a8f,invoice_10.pdf,dpt-2-20250919,15-MAY-25,2025-05-15,1000110140,,,,...,,,,USD,0.00,0.0,,,,
9,ab51eee8-43ab-4c60-974f-d37043e93f76,4b1db45a-6de2-4bda-94d3-c606276bb1ce,invoice_27.pdf,dpt-2-20250919,02-Mar-2025,2025-03-02,TRX5FPX4C-20,,,,...,,,,USD,103.93 USD,103.93,103.93,0.0,,


### Invoice Line Items - One Row per Unique Item Purchased

In [37]:
invoice_items = invoice_summaries[3]
invoice_items.head(10)

Unnamed: 0,RUN_ID,INVOICE_UUID,DOCUMENT_NAME,AGENTIC_DOC_VERSION,LINE_INDEX,LINE_NUMBER,SKU,DESCRIPTION,QUANTITY,UNIT_PRICE,PRICE,AMOUNT,TOTAL
0,ab51eee8-43ab-4c60-974f-d37043e93f76,8a016234-4c23-4927-aa41-f7841f70c611,invoice_9.pdf,dpt-2-20250919,0,,,Freshcaller Phone Credits,10.0,1.0,,10.0,
1,ab51eee8-43ab-4c60-974f-d37043e93f76,83aa2614-39b8-418b-90b7-e726ebd3c90a,invoice_17.pdf,dpt-2-20250919,0,1.0,,Tata Photon 3G Plan 750@7GB,1.0,2800.0,,2800.0,
2,ab51eee8-43ab-4c60-974f-d37043e93f76,74954e69-8064-4966-a0cc-d4111458af17,invoice_8.PDF,dpt-2-20250919,0,,,Zazzles 2 Selarid Feline 5-15lb Revolution Ge*...,2.0,,,42.0,
3,ab51eee8-43ab-4c60-974f-d37043e93f76,74954e69-8064-4966-a0cc-d4111458af17,invoice_8.PDF,dpt-2-20250919,1,,,1 HCP Combo Vaccine (1 Year) The HCP vaccine w...,1.0,,,45.0,
4,ab51eee8-43ab-4c60-974f-d37043e93f76,74954e69-8064-4966-a0cc-d4111458af17,invoice_8.PDF,dpt-2-20250919,2,,,1 Technician Appointment,1.0,,,0.0,
5,ab51eee8-43ab-4c60-974f-d37043e93f76,74954e69-8064-4966-a0cc-d4111458af17,invoice_8.PDF,dpt-2-20250919,3,,,Visa payment,,,,-90.53,
6,ab51eee8-43ab-4c60-974f-d37043e93f76,116fc387-7366-4838-9ff5-de97eff28186,invoice_16.pdf,dpt-2-20250919,0,,PF4AJSC9,Lenovo ThinkPad X1 Carbon Gen 10 21CB - 180-de...,1.0,1529.94,,1529.94,
7,ab51eee8-43ab-4c60-974f-d37043e93f76,fcfa7140-e234-4eb5-be2e-d356993b6f3f,invoice_11.pdf,dpt-2-20250919,0,,,Apple iPhone 12 Pro Max 256 GB Excellent Condi...,3.0,600.29,,1800.87,
8,ab51eee8-43ab-4c60-974f-d37043e93f76,47b62e70-4187-418c-8cf1-3c1635ee1dc7,invoice_13.pdf,dpt-2-20250919,0,,7602,ALUMINIUM SCRAPS,26440.0,193.0,,5102920.0,
9,ab51eee8-43ab-4c60-974f-d37043e93f76,dfb98d68-9dd7-474e-a03a-b77cc8465db9,invoice_14.pdf,dpt-2-20250919,0,1.0,242900,Dunlop Nylon Max Grip Jazz III Players Pack - ...,1.0,3.66,,3.66,3.66
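
A simple sanity check on extracted line items is to confirm that `QUANTITY * UNIT_PRICE` matches `AMOUNT`. The sketch below applies that check to three rows copied from the table above. `amount_matches` is a hypothetical helper, and the one-cent tolerance is an assumption chosen to absorb floating-point rounding.

```python
import math

# (description, quantity, unit_price, amount) copied from the table above.
line_items = [
    ("Freshcaller Phone Credits", 10.0, 1.0, 10.0),
    ("Apple iPhone 12 Pro Max 256 GB Excellent Condi...", 3.0, 600.29, 1800.87),
    ("ALUMINIUM SCRAPS", 26440.0, 193.0, 5102920.0),
]


def amount_matches(quantity: float, unit_price: float, amount: float, tol: float = 0.01) -> bool:
    """True when quantity * unit_price equals amount within `tol`."""
    return math.isclose(quantity * unit_price, amount, abs_tol=tol)


# Collect descriptions of rows whose arithmetic does not add up.
mismatches = [desc for desc, qty, price, amt in line_items if not amount_matches(qty, price, amt)]
print(mismatches)
```

Rows with a missing quantity or unit price, such as several in the table above, would need to be skipped before applying this check.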


## Save Structured Results

Save the four summary tables to local CSV files. They could also be inserted into a database or used for other downstream tasks.

In [38]:
# Save each DataFrame as a CSV file inside the results_folder
invoice_markdown.to_csv(results_folder / "invoice_markdown.csv", index=False)
invoice_chunks.to_csv(results_folder / "invoice_chunks.csv", index=False)
invoice_main.to_csv(results_folder / "invoice_main.csv", index=False)
invoice_items.to_csv(results_folder / "invoice_items.csv", index=False)

## ✅ Wrap-Up

You’ve now used LandingAI’s ADE to:
- Parse and extract data from invoices, whether the originals are images or PDFs.
- Define custom fields using a `pydantic` schema.
- Run Agentic Document Extraction on a batch of documents and save the results.
- Save the extracted results as structured tables in CSV files.

To learn more, visit the [LandingAI Documentation](https://docs.landing.ai/ade/ade-overview).